statanalysis.utils_md package
Submodules
- statanalysis.utils_md.compute_ppf_and_p_value.get_f_value(cf, ddl)
- statanalysis.utils_md.compute_ppf_and_p_value.get_p_value(Z: float, tail: str, test: str, ddl: int | None = None, debug=False)
- Get the p-value based on:
(if test=="t_test") Student distribution T(df=ddl) with ddl degrees of freedom
(if test=="z_test") normal distribution N(0, 1)
(if test=="f_test") Fisher distribution F(ddl[0], ddl[1])
- if tail
- statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_f_test(Z: float, dfn: int, dfd: int, debug: bool = False)
Get the p-value based on the Fisher distribution F(dfn, dfd) with dfn and dfd degrees of freedom.
- Utils
[F-distribution - wiki](https://en.wikipedia.org/wiki/F-distribution)
The tail is on the right because the F statistic is non-negative.
- Parameters:
Z (float) – observed value of the test statistic
dfn (int) – numerator degrees of freedom
dfd (int) – denominator degrees of freedom
debug (bool, optional) – Defaults to False.
- Raises:
Exception – _description_
- Returns:
the p-value
- Return type:
_type_
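In symbols, the right-tail p-value described for this function is the survival function of the F distribution evaluated at the observed statistic. The notation below is introduced here for illustration only: $F_{\mathrm{dfn},\,\mathrm{dfd}}$ denotes a Fisher-distributed variable and $\mathrm{CDF}$ its cumulative distribution function.

```latex
p \;=\; P\!\left(F_{\mathrm{dfn},\,\mathrm{dfd}} \ge Z\right)
  \;=\; 1 - \mathrm{CDF}_{F(\mathrm{dfn},\,\mathrm{dfd})}(Z)
```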
- statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_from_tail(prob, tail, debug=False)
Get the p-value from a CDF value and a tail. If tail=Tails.middle, the distribution is assumed symmetric, because F(Z) is doubled.
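The tail handling can be sketched as follows. This is a minimal illustration and not the package's implementation: the helper name `p_value_from_cdf` and the tail labels "left", "right" and "middle" are assumptions (the source only names `Tails.middle`), and doubling the smaller tail is one common way to realise the "double F(Z)" rule for a symmetric distribution. The standard normal from the standard library stands in for whichever distribution produced `prob`.

```python
from statistics import NormalDist


def p_value_from_cdf(prob: float, tail: str) -> float:
    """Turn a CDF value F(Z) into a p-value for the given tail (hypothetical helper)."""
    if tail == "left":
        return prob
    if tail == "right":
        return 1.0 - prob
    if tail == "middle":
        # Two-sided case: double the smaller tail, which assumes a symmetric distribution.
        return 2.0 * min(prob, 1.0 - prob)
    raise ValueError(f"unknown tail: {tail!r}")


prob = NormalDist().cdf(1.96)  # F(Z) for Z = 1.96 under N(0, 1)
print(p_value_from_cdf(prob, "right"))   # one-sided p-value
print(p_value_from_cdf(prob, "middle"))  # two-sided p-value, close to 0.05
```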
- statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_t_test(Z: float, ddl, tail: str, debug: bool = False)
Get the p-value based on the Student distribution T(df=ddl) with ddl degrees of freedom, according to tail.
- statanalysis.utils_md.compute_ppf_and_p_value.get_p_value_z_test(Z: float, tail: str, debug=False)
Get the p-value based on the normal distribution N(0, 1), according to tail.
- statanalysis.utils_md.compute_ppf_and_p_value.get_t_value(cf, ddl)
- statanalysis.utils_md.compute_ppf_and_p_value.get_z_value(cf)
- statanalysis.utils_md.constraints.check_equal_var(*samples, alpha=0.05)
Check whether the samples have equal variances.
- Parameters:
alpha (float, optional) – significance level of the test. Defaults to COMMON_ALPHA_FOR_HYPH_TEST (0.05 in the signature above).
Utils - uses the Levene test, which is [more robust than Fisher or Bartlett to non-normality of the data](https://fr.wikipedia.org/wiki/Test_de_Bartlett)
- Returns:
_description_
- Return type:
_type_
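A Levene-based check of this kind might look like the sketch below. It relies on `scipy.stats.levene`, which is a real SciPy function, but the helper name `equal_variance_not_rejected` and the boolean return value are assumptions: the source does not document what `check_equal_var` returns.

```python
from scipy import stats


def equal_variance_not_rejected(*samples, alpha=0.05):
    """Return True when Levene's test does not reject equal variances (hypothetical helper)."""
    statistic, p_value = stats.levene(*samples)  # SciPy's default centers on the median
    return bool(p_value >= alpha)


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [x + 100 for x in a]  # same spread, shifted location
c = [x * 100 for x in a]  # spread scaled by a factor of 100

print(equal_variance_not_rejected(a, b))  # equal spread: not rejected
print(equal_variance_not_rejected(a, c))  # very different spread: rejected
```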
- statanalysis.utils_md.constraints.check_hyp_min_sample(n: int, p: int | None = None)
- statanalysis.utils_md.constraints.check_hyp_min_samples(p1: float, p2: float, n1: int, n2: int, overall=False)
- statanalysis.utils_md.constraints.check_or_get_alpha_for_hyph_test(alpha=None)
- statanalysis.utils_md.constraints.check_or_get_cf_for_conf_inte(confidence=None)
- statanalysis.utils_md.constraints.check_sample_normality(residuals: list, debug=False, alpha=None)
Check whether the residuals look normally distributed.
- Parameters:
residuals (list) – list of floats or array-like (will be flattened)
debug (bool, optional) – Defaults to False.
- Returns:
True if all tests passed
- Return type:
bool
- statanalysis.utils_md.constraints.check_zero_to_one_constraint(*args)
- statanalysis.utils_md.constraints.print(*args)
- statanalysis.utils_md.estimate_std.compute_slope_std(X, y, y_hat, debug=False, skipcst=True)
- statanalysis.utils_md.estimate_std.estimate_std(sample)
Unlike the plain standard deviation, which divides by n, this divides by (n-1), corresponding to the standard deviation estimator used in the t-test.
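The difference between the two divisors can be seen with the standard library alone. This is only an illustration of the n versus (n-1) convention, using `statistics.pstdev` and `statistics.stdev` as stand-ins; it does not call `estimate_std` itself.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, sum of squared deviations 32

population_std = statistics.pstdev(sample)  # divides by n
sample_std = statistics.stdev(sample)       # divides by n - 1

print(population_std)  # sqrt(32 / 8)
print(sample_std)      # sqrt(32 / 7), slightly larger
```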
- statanalysis.utils_md.preprocessing.clear_list(L: list) → ndarray
Remove NaN values from a list.
- Parameters:
L (list) – a 1-dim array (n,). In any case, the data will be flattened
- Open question: should missing values be handled properly (for example with weights) rather than simply dropped?
In any case, it would be good to know how removing missing values affects the distribution of L
- Returns:
array of shape (n,)
- Return type:
1-dim array
Examples
>>> A = np.array([ [1,3], [4,3], [5,3], [7,np.nan] ])
>>> y = np.array([6,np.nan,3,2])
>>> A1 = clear_list(A)
>>> y1 = clear_list(y)
>>> print("A1: ",A1)
A1: array([1, 3, 4, 3, 5, 3])
>>> print("y: ",y1)
y: array([6. 3. 2.])
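For readers who want to see the idea without the package, flattening and NaN removal can be sketched with NumPy as below. The helper name `drop_nan` is hypothetical and the code is an assumption about the general technique, not the body of `clear_list`.

```python
import numpy as np


def drop_nan(values):
    """Flatten the input and drop NaN entries (hypothetical stand-in, not clear_list itself)."""
    arr = np.asarray(values, dtype=float).ravel()  # flatten to shape (n,)
    return arr[~np.isnan(arr)]                     # keep only non-NaN entries


y = np.array([6, np.nan, 3, 2])
print(drop_nan(y))
```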
- statanalysis.utils_md.preprocessing.clear_list_pair(L1, L2) → Tuple[ndarray, ndarray]
Remove NaN values from two lists: any observation with a NaN in L1 or L2 is dropped from both.
- Parameters:
L1 (list) – a 1-dim array (n,). In any case, the data will be flattened
L2 (list) – a 1-dim array (n,). In any case, the data will be flattened
- Open question: should missing values be handled properly (for example with weights) rather than simply dropped?
In any case, it would be good to know how removing missing values affects the distributions of L1 and L2
- Raises:
L1 and L2 have different sizes – lists must be of the same size
- Returns:
1-dim array: L1 of shape (n,)
1-dim array: L2 of shape (n,)
- Return type:
tuple of two 1-dim arrays
Examples
>>> y1 = np.array([4, 8,np.nan,2])
>>> y2 = np.array([6,np.nan,36,9])
>>> y1,y2 = clear_list_pair(y1, y2)
>>> print("y1: ",y1)
y1: array([4, 2])
>>> print("y2: ",y2)
y2: array([6. 9.])
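Pairwise removal keeps the two lists aligned by building one mask from both inputs. The sketch below illustrates that technique with NumPy; `drop_nan_pairs` is a hypothetical name, and raising `ValueError` is an assumption, since the source does not state which exception type is used.

```python
import numpy as np


def drop_nan_pairs(first, second):
    """Drop every position where either input is NaN (hypothetical stand-in)."""
    first = np.asarray(first, dtype=float).ravel()
    second = np.asarray(second, dtype=float).ravel()
    if first.shape != second.shape:
        raise ValueError("lists must be of the same size")
    keep = ~(np.isnan(first) | np.isnan(second))  # True where both values are present
    return first[keep], second[keep]


y1, y2 = drop_nan_pairs([4, 8, np.nan, 2], [6, np.nan, 36, 9])
print(y1, y2)
```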
- statanalysis.utils_md.preprocessing.clear_mat_vec(A, y) → Tuple[ndarray, ndarray]
Remove NaN values from a matrix and a corresponding vector: any observation with a NaN in A or y is dropped from both.
- Parameters:
A (2-dimensional array (n,p)) –
y (1-dimensional array (n,)) –
- Open question: should missing values be handled properly (for example with weights) rather than simply dropped?
In any case, it would be good to know how removing missing values affects the distribution of the data
- Raises:
A and y have different sizes – they must have the same number of observations
- Returns:
2-dim array (A without the rows that contained a NaN in A or y)
1-dim array (y without the corresponding entries)
Examples
>>> A = np.array([ [1,3], [4,3], [5,3], [7,np.nan] ])
>>> y = np.array([6,np.nan,3,2])
>>> A1,y1 = clear_mat_vec(A,y)
>>> print("A1: ",A1)
A1: [[1. 3.] [5. 3.]]
>>> print("y: ",y1)
y: [6. 3.]
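Row-wise removal for a design matrix and its response can be sketched the same way, with the mask computed per row of the matrix. `drop_nan_rows` is a hypothetical name and the `ValueError` is an assumption; the sketch reproduces the result of the documented example but is not the package's code.

```python
import numpy as np


def drop_nan_rows(A, y):
    """Drop observations with a NaN in any column of A or in y (hypothetical stand-in)."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if A.shape[0] != y.shape[0]:
        raise ValueError("A and y must have the same number of observations")
    keep = ~(np.isnan(A).any(axis=1) | np.isnan(y))  # row is kept only if fully observed
    return A[keep], y[keep]


A = np.array([[1, 3], [4, 3], [5, 3], [7, np.nan]])
y = np.array([6, np.nan, 3, 2])
A1, y1 = drop_nan_rows(A, y)
print(A1)
print(y1)
```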
- class statanalysis.utils_md.refactoring.Confidence_data(parameter: float, sample_size: int, confidence: int, marginOfError: float, interval: tuple)
Bases: object
- confidence: int
- interval: tuple
- marginOfError: float
- parameter: float
- sample_size: int
- class statanalysis.utils_md.refactoring.HypothesisValidationData(testPassed: bool, obj: dict = None)
Bases: object
- obj: dict = None
- testPassed: bool
- class statanalysis.utils_md.refactoring.Hypothesis_data(parameter: float, pnull: float, std_stat_eval: float, tail: str, sample_size: int, alpha: int, Z: float, p_value: float, reject_null: bool)
Bases: object
- Z: float
- alpha: int
- p_value: float
- parameter: float
- pnull: float
- reject_null: bool
- sample_size: int
- std_stat_eval: float
- tail: str
- class statanalysis.utils_md.refactoring.RegressionFisherTestData(DFE: float = None, SSE: float = None, MSE: float = None, DFR: float = None, SSR: float = None, MSR: float = None, DFT: float = None, SST: float = None, MST: float = None, R_carre: float = None, R_carre_adj: float = None, F_stat: float = None, p_value: float = None, reject_null: float = None)
Bases: object
- DFE: float = None
- DFR: float = None
- DFT: float = None
- F_stat: float = None
- MSE: float = None
- MSR: float = None
- MST: float = None
- R_carre: float = None
- R_carre_adj: float = None
- SSE: float = None
- SSR: float = None
- SST: float = None
- p_value: float = None
- reject_null: float = None
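The field names of RegressionFisherTestData match the usual regression ANOVA table. Assuming the common convention that E stands for error (residual), R for regression and T for total, and that R_carre is the coefficient of determination $R^2$ ("carré" is French for "squared"), the fields would be related as below. These are the standard textbook identities, given as a reading aid; the source does not confirm that the package computes them this way.

```latex
\begin{aligned}
SST &= SSR + SSE, \qquad DFT = DFR + DFE \\
MSR &= \frac{SSR}{DFR}, \qquad MSE = \frac{SSE}{DFE}, \qquad MST = \frac{SST}{DFT} \\
R^2 &= 1 - \frac{SSE}{SST}, \qquad R^2_{\mathrm{adj}} = 1 - \frac{MSE}{MST} \\
F_{\mathrm{stat}} &= \frac{MSR}{MSE}, \qquad p = P\!\left(F_{DFR,\,DFE} \ge F_{\mathrm{stat}}\right)
\end{aligned}
```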