statanalysis.utils_md package

Submodules

statanalysis.utils_md.compute_ppf_and_p_value.get_f_value(cf, ddl)

statanalysis.utils_md.compute_ppf_and_p_value.get_p_value(Z: float, tail: str, test: str, ddl: int | None = None, debug=False)

Compute the p-value of Z under the distribution selected by test:

(if test=="t_test") Student distribution T(df=ddl) with ddl degrees of freedom
(if test=="z_test") normal distribution N(0, 1)
(if test=="f_test") Fisher distribution F(ddl[0], ddl[1])

Depending on tail:

right: return P(T > Z)
left: return P(T < Z)
middle: return P(T < -|Z|) + P(T > |Z|) => return 2*P(T > |Z|)

statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_f_test(Z: float, dfn: int, dfd: int, debug: bool = False)

Compute the p-value under the Fisher distribution F(dfn, dfd), with dfn and dfd degrees of freedom

Utils

[F-distribution - wiki](https://en.wikipedia.org/wiki/F-distribution)
The tail is always right because a Fisher-distributed statistic is positive

Parameters:

Z (float) – observed value of the F statistic
dfn (int) – numerator degrees of freedom
dfd (int) – denominator degrees of freedom
debug (bool, optional) – _description_. Defaults to False.

Raises:

Exception – _description_

Returns:

the right-tail p-value P(F > Z)

Return type:

float

statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_from_tail(prob, tail, debug=False)

Compute the p-value from prob = F(Z), the value of the cumulative distribution function at Z, and from the tail. If tail=Tails.middle, the distribution is assumed symmetric, because the one-sided probability is doubled. Depending on tail:

right: return P(N > Z) = 1 - F(Z) = 1 - prob

left: return P(N < Z) = F(Z) = prob

middle: return P(N < -|Z|) + P(N > |Z|) => return 2*P(N > |Z|)

statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_t_test(Z: float, ddl, tail: str, debug: bool = False)

Compute the p-value under the Student distribution T(df=ddl), with ddl degrees of freedom. Depending on tail:

right: return P(T > Z)

left: return P(T < Z)

middle: return P(T < -|Z|) + P(T > |Z|) => return 2*P(T > |Z|)

statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_z_test(Z: float, tail: str, debug=False)

Compute the p-value under the normal distribution N(0, 1). Depending on tail:

right: return P(N > Z)

left: return P(N < Z)

middle: return P(N < -|Z|) + P(N > |Z|) => return 2*P(N > |Z|)
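The three tail rules can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the function name `z_test_p_value` is hypothetical, and it is an assumption that `tail` takes the string values listed under `Tails` (`'tail-left'`, `'tail-middle'`, `'tail-right'`).

```python
from statistics import NormalDist


def z_test_p_value(z: float, tail: str) -> float:
    """p-value of z under N(0, 1) (hypothetical helper, for illustration)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "tail-right":
        return 1.0 - std_normal.cdf(z)  # P(N > z)
    if tail == "tail-left":
        return std_normal.cdf(z)  # P(N < z)
    if tail == "tail-middle":
        # P(N < -|z|) + P(N > |z|) = 2 * P(N > |z|) by symmetry
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown tail: {tail!r}")


p_two_sided = z_test_p_value(1.96, "tail-middle")  # close to 0.05
```

The middle case relies on the symmetry of N(0, 1), so the same shortcut would not apply to a skewed distribution such as F.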

statanalysis.utils_md.compute_ppf_and_p_value.get_t_value(cf, ddl)

statanalysis.utils_md.compute_ppf_and_p_value.get_z_value(cf)

statanalysis.utils_md.constraints.check_equal_var(*samples, alpha=0.05)

Check whether the samples have equal variances.

Parameters: alpha (float, optional) – significance level of the test. Defaults to COMMON_ALPHA_FOR_HYPH_TEST.

Utils - uses the Levene test, which is [more robust than Fisher or Bartlett to non-normality of the data](https://fr.wikipedia.org/wiki/Test_de_Bartlett)

Returns: _description_
Return type: _type_
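A Levene-based equal-variance check might look like the sketch below. It calls `scipy.stats.levene`, which exists in SciPy. The wrapper name `variances_look_equal` and the choice to return True when the test does not reject are assumptions, since the return value of `check_equal_var` is not documented here.

```python
from scipy import stats


def variances_look_equal(*samples, alpha=0.05):
    """Return True when the Levene test does not reject equal variances.

    Hypothetical helper; the real check_equal_var may behave differently.
    """
    # scipy.stats.levene centers on the median by default
    statistic, p_value = stats.levene(*samples)
    return bool(p_value > alpha)


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [x + 1 for x in a]   # same spread, shifted location
c = [x * 10 for x in a]  # spread multiplied by 10

same_spread = variances_look_equal(a, b)
different_spread = variances_look_equal(a, c)
```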

statanalysis.utils_md.constraints.check_hyp_min_sample(n: int, p: int | None = None)

statanalysis.utils_md.constraints.check_hyp_min_samples(p1: float, p2: float, n1: int, n2: int, overall=False)

statanalysis.utils_md.constraints.check_or_get_alpha_for_hyph_test(alpha=None)

statanalysis.utils_md.constraints.check_or_get_cf_for_conf_inte(confidence=None)

statanalysis.utils_md.constraints.check_sample_normality(residuals: list, debug=False, alpha=None)

Check whether residuals follow a normal distribution - test_implemented

Parameters:

residuals (list) – list of float or array-like (will be flattened)
debug (bool, optional) – _description_. Defaults to False.

Returns:

True if all tests passed

Return type:

bool

statanalysis.utils_md.constraints.check_zero_to_one_constraint(*args)

statanalysis.utils_md.constraints.print(*args)

statanalysis.utils_md.estimate_std.compute_slope_std(X, y, y_hat, debug=False, skipcst=True)

statanalysis.utils_md.estimate_std.estimate_std(sample)

Unlike the population standard deviation, this estimator divides by (n-1), which corresponds to the standard deviation estimator used in the t-test
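The difference between dividing by n and by (n-1) can be checked with the standard library. This is a sketch only; `sample_std` is a hypothetical name and not the package's function.

```python
import math
from statistics import pstdev, stdev


def sample_std(sample):
    """Standard deviation estimator that divides by (n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    squared_deviations = sum((x - mean) ** 2 for x in sample)
    return math.sqrt(squared_deviations / (n - 1))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_value = pstdev(data)   # divides by n
estimator_value = sample_std(data)  # divides by n - 1, same as statistics.stdev
```

For any sample with at least two distinct values the (n-1) estimator is strictly larger than the population formula.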

statanalysis.utils_md.preprocessing.clear_list(L: list) → ndarray

Remove nan values from a list

Parameters: L (list) – a 1-dim array (n,). In any case, the data will be flattened

How should missing values be handled properly?

weighting
In any case, it would be good to know how removing missing values affects the distribution of L

Returns: array of shape (n,)
Return type: 1-dim array

Examples

>>> import numpy as np
>>> from statanalysis.utils_md.preprocessing import clear_list
>>> A = np.array([
...     [1,3],
...     [4,3],
...     [5,3],
...     [7,np.nan]
...     ])
>>> y = np.array([6,np.nan,3,2])
>>> A1 = clear_list(A)
>>> y1 = clear_list(y)
>>> print("A1: ",A1)
A1:  [1. 3. 4. 3. 5. 3.]
>>> print("y1: ",y1)
y1:  [6. 3. 2.]

statanalysis.utils_md.preprocessing.clear_list_pair(L1, L2) → Tuple[ndarray, ndarray]

Remove nan values from 2 lists: any observation with a nan value in L1 or L2 is removed from both

Parameters:

L1 (list) – a 1-dim array (n,). In any case, the data will be flattened
L2 (list) – a 1-dim array (n,). In any case, the data will be flattened

How should missing values be handled properly?

weighting
In any case, it would be good to know how removing missing values affects the distribution of L

Raises: L1 and L2 have different size – lists must be of the same size
Returns: L1 of shape (n,), L2 of shape (n,)
Return type: tuple of two 1-dim arrays

Examples

>>> import numpy as np
>>> from statanalysis.utils_md.preprocessing import clear_list_pair
>>> y1 = np.array([4, 8,np.nan,2])
>>> y2 = np.array([6,np.nan,36,9])
>>> y1,y2 = clear_list_pair(y1, y2)
>>> print("y1: ",y1)
y1:  [4. 2.]
>>> print("y2: ",y2)
y2:  [6. 9.]

statanalysis.utils_md.preprocessing.clear_mat_vec(A, y) → Tuple[ndarray, ndarray]

Remove nan values from a matrix and a corresponding vector: any observation with a nan value in A or y is removed from both

Parameters:

A (2-dimensional array (n,p)) –
y (1-dimensional array (n,)) –

Others:

How should missing values be handled properly?
- weighting
- In any case, it would be good to know how removing missing values affects the distribution of the data

Raises:

L1 and L2 have different size – A and y must have the same number of observations

Returns:

2-dim array (A restricted to the complete observations)
1-dim array (y restricted to the same observations)

Examples

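>>> import numpy as np
>>> from statanalysis.utils_md.preprocessing import clear_mat_vec
>>> A = np.array([
...     [1,3],
...     [4,3],
...     [5,3],
...     [7,np.nan]
...     ])
>>> y = np.array([6,np.nan,3,2])
>>> A1,y1 = clear_mat_vec(A,y)
>>> print("A1: ",A1)
A1:  [[1. 3.]
 [5. 3.]]
>>> print("y1: ",y1)
y1:  [6. 3.]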
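For readers who want to see how such row-wise removal can be done, here is a minimal NumPy sketch. It is not the package's code: `drop_incomplete_rows` is a hypothetical name, and raising `ValueError` on a size mismatch is an assumption.

```python
import numpy as np


def drop_incomplete_rows(A, y):
    """Drop every observation with a nan in its row of A or in y."""
    A = np.asarray(A, dtype=float)          # expected shape (n, p)
    y = np.asarray(y, dtype=float).ravel()  # expected shape (n,)
    if A.shape[0] != y.shape[0]:
        raise ValueError("A and y must have the same number of observations")
    # Keep a row only if it has no nan in A and no nan in y
    keep = ~np.isnan(A).any(axis=1) & ~np.isnan(y)
    return A[keep], y[keep]


A = np.array([[1, 3], [4, 3], [5, 3], [7, np.nan]])
y = np.array([6, np.nan, 3, 2])
A_clean, y_clean = drop_incomplete_rows(A, y)
```

Using one boolean mask for both arrays keeps the rows of A aligned with the entries of y after removal.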

class statanalysis.utils_md.refactoring.Confidence_data(parameter: float, sample_size: int, confidence: int, marginOfError: float, interval: tuple)

Bases: object

confidence: int

interval: tuple

marginOfError: float

parameter: float

sample_size: int

class statanalysis.utils_md.refactoring.HypothesisValidationData(testPassed: bool, obj: dict = None)

Bases: object

obj: dict = None

testPassed: bool

class statanalysis.utils_md.refactoring.Hypothesis_data(parameter: float, pnull: float, std_stat_eval: float, tail: str, sample_size: int, alpha: int, Z: float, p_value: float, reject_null: bool)

Bases: object

Z: float

alpha: int

p_value: float

parameter: float

pnull: float

reject_null: bool

sample_size: int

std_stat_eval: float

tail: str

class statanalysis.utils_md.refactoring.RegressionFisherTestData(DFE: float = None, SSE: float = None, MSE: float = None, DFR: float = None, SSR: float = None, MSR: float = None, DFT: float = None, SST: float = None, MST: float = None, R_carre: float = None, R_carre_adj: float = None, F_stat: float = None, p_value: float = None, reject_null: float = None)

Bases: object

DFE: float = None

DFR: float = None

DFT: float = None

F_stat: float = None

MSE: float = None

MSR: float = None

MST: float = None

R_carre: float = None

R_carre_adj: float = None

SSE: float = None

SSR: float = None

SST: float = None

p_value: float = None

reject_null: float = None

class statanalysis.utils_md.refactoring.Tails

Bases: object

INF_SYMB = 'p<p0'

NEQ_SYMB = 'p!=p0'

SUP_SYMB = 'p>p0'

get_tail_from_symb()

left = 'tail-left'

middle = 'tail-middle'

norm_tail()

right = 'tail-right'
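The constants above suggest a mapping from the symbol of the alternative hypothesis to a tail. The sketch below is an assumption about that mapping: the signature and behavior of `get_tail_from_symb()` are not documented here, and `tail_from_symbol` is a hypothetical stand-in.

```python
# String values copied from the Tails attributes listed above
SYMBOL_TO_TAIL = {
    "p<p0": "tail-left",     # INF_SYMB -> left
    "p!=p0": "tail-middle",  # NEQ_SYMB -> middle
    "p>p0": "tail-right",    # SUP_SYMB -> right
}


def tail_from_symbol(symbol: str) -> str:
    """Map an alternative-hypothesis symbol to a tail name (assumed mapping)."""
    key = symbol.replace(" ", "")  # tolerate "p != p0"
    if key not in SYMBOL_TO_TAIL:
        raise ValueError(f"unknown symbol: {symbol!r}")
    return SYMBOL_TO_TAIL[key]


tail = tail_from_symbol("p != p0")
```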
