statanalysis.utils_md package

Submodules

statanalysis.utils_md.compute_ppf_and_p_value.get_f_value(cf, ddl)
statanalysis.utils_md.compute_ppf_and_p_value.get_p_value(Z: float, tail: str, test: str, ddl: int | None = None, debug=False)
Get the p-value based on:
  • (if test=="t_test") Student distribution T(df=ddl) with ddl degrees of freedom

  • (if test=="z_test") normal distribution N(0, 1)

  • (if test=="f_test") Fisher distribution F(ddl[0], ddl[1])

Depending on tail:
  • right: return P(T > Z)

  • left: return P(T < Z)

  • middle: return P(T < -|Z|) + P(T > |Z|) => return 2*P(T > |Z|)

statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_f_test(Z: float, dfn: int, dfd: int, debug: bool = False)

Get the p-value based on the Fisher distribution F(dfn, dfd), with dfn and dfd degrees of freedom.

Parameters:
  • Z (float) – observed value of the test statistic

  • dfn (int) – numerator degrees of freedom

  • dfd (int) – denominator degrees of freedom

  • debug (bool, optional) – _description_. Defaults to False.

Raises:

Exception – _description_

Returns:

_description_

Return type:

_type_

statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_from_tail(prob, tail, debug=False)

Get the p-value from a CDF value prob = F(Z) and a tail. If tail=Tails.middle, the distribution is assumed symmetric, because the one-tailed probability is doubled. Depending on tail:

  • right: return P(N > Z) = 1- F(Z) = 1 - prob

  • left: return P(N < Z) = F(Z) = prob

  • middle: return P(N < -|Z|) + P(N > |Z|) => return 2*P(N > |Z|)
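The three cases above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the name p_value_from_cdf is hypothetical, and the tail strings are taken from the Tails class constants listed in this reference.

```python
def p_value_from_cdf(prob: float, tail: str) -> float:
    """Convert a CDF value prob = F(Z) into a p-value for the given tail."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    if tail == "tail-right":
        return 1.0 - prob          # P(N > Z) = 1 - F(Z)
    if tail == "tail-left":
        return prob                # P(N < Z) = F(Z)
    if tail == "tail-middle":
        # Symmetric distribution: 2 * P(N > |Z|) = 2 * min(F(Z), 1 - F(Z))
        return 2.0 * min(prob, 1.0 - prob)
    raise ValueError(f"unknown tail: {tail!r}")
```

Using min(prob, 1 - prob) lets the two-sided case work whether Z is positive or negative, which is only valid under the symmetry assumption stated above.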

statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_t_test(Z: float, ddl, tail: str, debug: bool = False)

Get the p-value based on the Student distribution T(df=ddl) with ddl degrees of freedom. Depending on tail:

  • right: return P(T > Z)

  • left: return P(T < Z)

  • middle: return P(T < -|Z|) + P(T > |Z|) => return 2*P(T > |Z|)

statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_z_test(Z: float, tail: str, debug=False)

Get the p-value based on the normal distribution N(0, 1). Depending on tail:

  • right: return P(N > Z)

  • left: return P(N < Z)

  • middle: return P(N < -|Z|) + P(N > |Z|) => return 2*P(N > |Z|)
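As a rough sketch of the z-test case, the standard library's statistics.NormalDist provides the N(0, 1) CDF. The function name z_test_p_value is hypothetical and the code is an illustration under that assumption, not the package's own implementation.

```python
from statistics import NormalDist


def z_test_p_value(z: float, tail: str) -> float:
    """p-value of a z statistic under the standard normal N(0, 1)."""
    cdf = NormalDist().cdf  # defaults to mu=0, sigma=1
    if tail == "tail-right":
        return 1.0 - cdf(z)             # P(N > Z)
    if tail == "tail-left":
        return cdf(z)                   # P(N < Z)
    if tail == "tail-middle":
        return 2.0 * (1.0 - cdf(abs(z)))  # 2 * P(N > |Z|)
    raise ValueError(f"unknown tail: {tail!r}")


# A two-sided test at z = 1.96 gives a p-value close to 0.05.
p_two_sided = z_test_p_value(1.96, "tail-middle")
```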

statanalysis.utils_md.compute_ppf_and_p_value.get_t_value(cf, ddl)
statanalysis.utils_md.compute_ppf_and_p_value.get_z_value(cf)
statanalysis.utils_md.constraints.check_equal_var(*samples, alpha=0.05)

_summary_

Parameters:

alpha (_type_, optional) – _description_. Defaults to COMMON_ALPHA_FOR_HYPH_TEST.

Utils - uses the Levene test, which is [more robust than the Fisher or Bartlett tests to non-normality of the data](https://fr.wikipedia.org/wiki/Test_de_Bartlett)

Returns:

_description_

Return type:

_type_
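A Levene-based equal-variance check might look like the sketch below. It assumes SciPy is available and uses scipy.stats.levene; the function name equal_variances and the decision rule (keep the equal-variance hypothesis when the p-value is at least alpha) are assumptions for illustration, not the documented behaviour of check_equal_var.

```python
from scipy import stats


def equal_variances(*samples, alpha: float = 0.05) -> bool:
    """Return True when Levene's test does not reject equal variances."""
    # levene() defaults to center='median' (the Brown-Forsythe variant),
    # which is the robust choice for non-normal data.
    statistic, p_value = stats.levene(*samples)
    return bool(p_value >= alpha)


a = [1, 2, 3, 4, 5]
same_spread = equal_variances(a, [11, 12, 13, 14, 15])   # shifted copy of a
wider_spread = equal_variances(a, [0, 10, 20, 30, 40])   # ten times the spread
```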

statanalysis.utils_md.constraints.check_hyp_min_sample(n: int, p: int | None = None)
statanalysis.utils_md.constraints.check_hyp_min_samples(p1: float, p2: float, n1: int, n2: int, overall=False)
statanalysis.utils_md.constraints.check_or_get_alpha_for_hyph_test(alpha=None)
statanalysis.utils_md.constraints.check_or_get_cf_for_conf_inte(confidence=None)
statanalysis.utils_md.constraints.check_sample_normality(residuals: list, debug=False, alpha=None)

Check whether residuals follow a normal distribution - test_implemented

Parameters:
  • residuals (list) – list of floats or array-like (will be flattened)

  • debug (bool, optional) – _description_. Defaults to False.

Returns:

True if all tests passed

Return type:

bool

statanalysis.utils_md.constraints.check_zero_to_one_constraint(*args)
statanalysis.utils_md.constraints.print(*args)
statanalysis.utils_md.estimate_std.compute_slope_std(X, y, y_hat, debug=False, skipcst=True)
statanalysis.utils_md.estimate_std.estimate_std(sample)

Unlike the population standard deviation, this estimator divides by (n-1), which corresponds to the standard deviation estimator used in the t-test.
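The difference between the two divisors can be illustrated with the standard library alone. This is a small sketch with made-up sample values, not the package's code: statistics.pstdev divides by n, while statistics.stdev divides by (n-1).

```python
import math
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values only
n = len(sample)
mean = sum(sample) / n
sum_sq = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations

population_std = math.sqrt(sum_sq / n)        # divides by n
sample_std = math.sqrt(sum_sq / (n - 1))      # divides by n - 1 (t-test estimator)

# The manual formulas agree with the standard library helpers.
assert math.isclose(population_std, statistics.pstdev(sample))
assert math.isclose(sample_std, statistics.stdev(sample))
```

The (n-1) estimator is always slightly larger than the n-divisor version, and the gap shrinks as the sample grows.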

statanalysis.utils_md.preprocessing.clear_list(L: list) ndarray

Remove NaN values from a list

Parameters:

L (list) – a 1-dim array (n,). In any case, the data will be flattened

Open question: how should missing values be handled properly?
  • weighting

  • In any case, it would be good to know how removing missing values affects the distribution of L

Returns:

array of shape (n,)

Return type:

1-dim array

Examples

>>> A = np.array([
...         [1,3],
...         [4,3],
...         [5,3],
...         [7,np.nan]
...         ])
>>> y = np.array([6,np.nan,3,2])
>>> A1 = clear_list(A)
>>> y1 = clear_list(y)
>>> print("A1: ",A1)
A1:  array([1, 3, 4, 3, 5, 3])
>>> print("y: ",y1)
y:  array([6. 3. 2.])
statanalysis.utils_md.preprocessing.clear_list_pair(L1, L2) Tuple[ndarray, ndarray]

Remove NaN values from 2 lists (drop every observation containing a NaN value in L1 or L2)

Parameters:
  • L1 (list) – a 1-dim array (n,). In any case, the data will be flattened

  • L2 (list) – a 1-dim array (n,). In any case, the data will be flattened

Open question: how should missing values be handled properly?
  • weighting

  • In any case, it would be good to know how removing missing values affects the distribution of L

Raises:

L1 and L2 have different sizes – lists must be of the same size

Returns:

  • 1-dim array (L1 of shape (n,))

  • 1-dim array (L2 of shape (n,))

Examples

>>> y1 = np.array([4, 8,np.nan,2])
>>> y2 = np.array([6,np.nan,36,9])
>>> y1,y2 = clear_list_pair(y1, y2)
>>> print("y1: ",y1)
y1:  array([4, 2])
>>> print("y2: ",y2)
y2:  array([6. 9.])
statanalysis.utils_md.preprocessing.clear_mat_vec(A, y) Tuple[ndarray, ndarray]

Remove NaN values from a matrix and a corresponding vector (drop every observation containing a NaN value in A or y)

Parameters:
  • A (2-dimensional array (n,p)) –

  • y (1-dimensional array (n,)) –

Open question: how should missing values be handled properly?
  • weighting

  • In any case, it would be good to know how removing missing values affects the distribution of the data

Raises:

A and y have different sizes – they must be of the same size

Returns:

  • 2-dim array (A without the rows containing NaN)

  • 1-dim array (y without the corresponding entries)

Examples

>>> A = np.array([
...         [1,3],
...         [4,3],
...         [5,3],
...         [7,np.nan]
...         ])
>>> y = np.array([6,np.nan,3,2])
>>> A1,y1 = clear_mat_vec(A,y)
>>> print("A1: ",A1)
A1:  [[1. 3.]
     [5. 3.]]
>>> print("y: ",y1)
y:  [6. 3.]
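The row-wise removal shown in this example could be sketched with a boolean mask. This is a minimal illustration assuming NumPy and a 2-dimensional A; drop_nan_rows is a hypothetical name, and the code is not the package's actual implementation.

```python
import numpy as np


def drop_nan_rows(A, y):
    """Drop every observation i where A[i, :] or y[i] contains a NaN."""
    A = np.asarray(A, dtype=float)          # expected shape (n, p)
    y = np.asarray(y, dtype=float).ravel()  # flattened to shape (n,)
    if A.shape[0] != y.shape[0]:
        raise ValueError("A and y must have the same number of observations")
    # Keep a row only if it is NaN-free in both A and y.
    keep = ~np.isnan(A).any(axis=1) & ~np.isnan(y)
    return A[keep], y[keep]


A = np.array([[1, 3], [4, 3], [5, 3], [7, np.nan]])
y = np.array([6, np.nan, 3, 2])
A1, y1 = drop_nan_rows(A, y)  # keeps rows 0 and 2 only
```

Building a single mask and applying it to both arrays keeps A and y aligned, which is the point of removing observations pairwise rather than cleaning each array separately.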
class statanalysis.utils_md.refactoring.Confidence_data(parameter: float, sample_size: int, confidence: int, marginOfError: float, interval: tuple)

Bases: object

confidence: int
interval: tuple
marginOfError: float
parameter: float
sample_size: int
class statanalysis.utils_md.refactoring.HypothesisValidationData(testPassed: bool, obj: dict = None)

Bases: object

obj: dict = None
testPassed: bool
class statanalysis.utils_md.refactoring.Hypothesis_data(parameter: float, pnull: float, std_stat_eval: float, tail: str, sample_size: int, alpha: int, Z: float, p_value: float, reject_null: bool)

Bases: object

Z: float
alpha: int
p_value: float
parameter: float
pnull: float
reject_null: bool
sample_size: int
std_stat_eval: float
tail: str
class statanalysis.utils_md.refactoring.RegressionFisherTestData(DFE: float = None, SSE: float = None, MSE: float = None, DFR: float = None, SSR: float = None, MSR: float = None, DFT: float = None, SST: float = None, MST: float = None, R_carre: float = None, R_carre_adj: float = None, F_stat: float = None, p_value: float = None, reject_null: float = None)

Bases: object

DFE: float = None
DFR: float = None
DFT: float = None
F_stat: float = None
MSE: float = None
MSR: float = None
MST: float = None
R_carre: float = None
R_carre_adj: float = None
SSE: float = None
SSR: float = None
SST: float = None
p_value: float = None
reject_null: float = None
class statanalysis.utils_md.refactoring.Tails

Bases: object

INF_SYMB = 'p<p0'
NEQ_SYMB = 'p!=p0'
SUP_SYMB = 'p>p0'
get_tail_from_symb()
left = 'tail-left'
middle = 'tail-middle'
norm_tail()
right = 'tail-right'

Module contents