Chi Square test for homogeneity and Independence

Homogeneity

1. State (Hypothesis Formulation)

Null Hypothesis ( $H_{0}$ ): The distributions of the categorical variable are the same across all populations/groups.
Alternative Hypothesis ( $H_{a}$ ): The distributions of the categorical variable are not the same across all populations/groups (i.e., at least two groups differ).
Mathematical Representation:
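- $H_{0}$ : For every category $j$, $p_{j, 1} = p_{j, 2} = ... = p_{j, n}$, where $p_{j, k}$ is the proportion of group $k$ that falls in category $j$ and $n$ is the number of groups.
- $H_{a}$ : For at least one category, at least two of these proportions are different.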
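To make the hypotheses concrete, the sketch below computes each group's sample distribution (its row proportions) from a two-way table. $H_{0}$ says the population versions of these rows are identical. The counts and the `row_proportions` helper are hypothetical and serve only as an illustration.

```python
def row_proportions(observed):
    """Return each group's sample distribution across the categories."""
    return [[count / sum(row) for count in row] for row in observed]

# Hypothetical counts: 2 groups (rows) x 3 categories (columns).
observed = [[30, 20, 10],
            [20, 30, 40]]

for group, props in enumerate(row_proportions(observed), start=1):
    print(f"Group {group}:", [round(p, 3) for p in props])
```

If the rows look alike, the data are consistent with $H_{0}$; the test quantifies whether the visible differences are larger than chance alone would explain.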

2. Check Conditions

To ensure the validity of the Chi-Square test for homogeneity, we must satisfy these conditions:

Randomness: The data must come from independent random samples or a properly randomized experiment.
Large Sample Size: Each expected count should be at least 5 in all categories.
Independence:
- Each observation must be independent.
- If sampling without replacement, the total population must be at least 10 times the sample size (10% condition).
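The two numeric conditions can be checked mechanically. The following is a minimal sketch, assuming the counts are stored as a list of rows (one row per group) and that the population size behind each sample is known. The function name `check_conditions`, its parameters, and the example numbers are all hypothetical. Randomness cannot be verified from the counts and has to be judged from how the data were collected.

```python
def check_conditions(observed, population_sizes, min_expected=5):
    """Check the large-sample and 10% conditions for a two-way table.

    observed: list of rows, one row of category counts per group.
    population_sizes: population size behind each group's sample (assumed known).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count for each cell: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    large_sample_ok = all(e >= min_expected for row in expected for e in row)

    # 10% condition: each population is at least 10 times its sample size.
    ten_percent_ok = all(pop >= 10 * n
                         for pop, n in zip(population_sizes, row_totals))
    return large_sample_ok, ten_percent_ok

observed = [[30, 20, 10], [20, 30, 40]]  # hypothetical counts
print(check_conditions(observed, population_sizes=[1000, 2000]))
```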

3. Calculation

The Chi-Square statistic is calculated using:

$$χ^{2} = \sum_{i = 1}^{r} \sum_{j = 1}^{c} \frac{(\text{Observed}_{ij} - \text{Expected}_{ij})^{2}}{\text{Expected}_{ij}}$$

Where:

$r$ is the number of rows (groups),
$c$ is the number of columns (categories),
Observed $_{ij}$ represents the actual count in row $i$ and column $j$ ,
Expected $_{ij}$ is calculated as:

$$\text{Expected}_{ij} = \frac{(\text{Row Total})_i \times (\text{Column Total})_j}{\text{Grand Total}}$$

Degrees of Freedom (df):

$$df = (\text{Number of Rows} - 1) \times (\text{Number of Columns} - 1)$$
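The formulas for the expected counts, the statistic, and the degrees of freedom can be sketched in plain Python as follows. The function name `chi_square_statistic` and the example table are hypothetical. The table is assumed to be a list of rows with equal length and no empty row or column.

```python
def chi_square_statistic(observed):
    """Return (chi2, df, expected) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected_ij = (row total)_i * (column total)_j / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (Observed - Expected)^2 / Expected over every cell.
    chi2 = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(observed, expected)
               for obs, exp in zip(obs_row, exp_row))

    # df = (number of rows - 1) * (number of columns - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

observed = [[30, 20, 10], [20, 30, 40]]  # hypothetical counts
chi2, df, expected = chi_square_statistic(observed)
print(f"chi2 = {chi2:.4f}, df = {df}")
print("expected:", expected)
```

For this table the row totals are 60 and 90 and every column total is 50, so the expected counts are 20 in the first row and 30 in the second.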

P-value Calculation:
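Given the observed statistic $χ^{2}$ and $df$ , the p-value is the probability that a chi-square random variable $X$ with $df$ degrees of freedom is at least as large as the observed statistic:

$$p_{val} = P (X \geq χ^{2}) = 1 - \frac{γ ( \frac{df}{2} , \frac{χ^{2}}{2} )}{Γ ( \frac{df}{2} )}$$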

where:

$γ (s, x)$ is the Lower incomplete gamma function:

$$γ (s, x) = \int_{0}^{x} t^{s - 1} e^{- t} d t$$

$Γ (s)$ is the Gamma function:

$$Γ (s) = \int_{0}^{\infty} t^{s - 1} e^{- t} d t$$

The ratio of the two is the regularized lower incomplete gamma function:

$$P (s, x) = \frac{γ ( s , x )}{Γ ( s )}$$

It is implemented as `scipy.special.gammainc()`.

Using `scipy.special.gammainc` for p-value computation:

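import scipy.special as sp

def chi2_p_value(chi2_val, df):
    """Upper-tail p-value of a chi-square statistic with df degrees of freedom."""
    # gammainc(s, x) is the regularized lower incomplete gamma function P(s, x).
    p_val = 1 - sp.gammainc(df / 2, chi2_val / 2)
    return p_val
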
# Example usage
chi2_val = 15.2  # Chi-Square statistic
df = 6           # Degrees of freedom
p_value = chi2_p_value(chi2_val, df)
print(f"P-value: {p_value:.5f}")
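As a sanity check, the same p-value can be obtained in other ways. The sketch below compares the formula from the text with `scipy.special.gammaincc` (the regularized upper incomplete gamma function) and with `scipy.stats.chi2.sf` (the chi-square survival function). It then runs `scipy.stats.chi2_contingency` on a hypothetical table of counts, which performs the whole calculation from raw data.

```python
import scipy.special as sp
from scipy import stats

chi2_val, df = 15.2, 6

p_lower = 1 - sp.gammainc(df / 2, chi2_val / 2)  # formula from the text
p_upper = sp.gammaincc(df / 2, chi2_val / 2)     # upper tail computed directly
p_sf = stats.chi2.sf(chi2_val, df)               # chi-square survival function
print(f"{p_lower:.5f} {p_upper:.5f} {p_sf:.5f}")

# Starting from raw counts (hypothetical table), one call does every step.
observed = [[30, 20, 10], [20, 30, 40]]
stat, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi2 = {stat:.4f}, df = {dof}, p = {p_value:.6f}")
```

The three p-values should agree to many decimal places. For very small p-values, computing the upper tail directly avoids the rounding error introduced by subtracting a number close to 1 from 1.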

4. Decision Rule

If $p_{val} > α$ (common choices: $α = 0.05, 0.01$ ):
- Fail to reject $H_{0}$ → The data do not provide sufficient evidence that the distributions differ across groups.
If $p_{val} \leq α$ :
- Reject $H_{0}$ → At least two groups have significantly different distributions.
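The rule amounts to a single comparison. The helper `decide` below is a hypothetical sketch with $α = 0.05$ as its default significance level.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

p_value = 0.0188  # roughly the p-value for chi2 = 15.2 with df = 6
print(decide(p_value))              # compared against alpha = 0.05
print(decide(p_value, alpha=0.01))  # compared against a stricter alpha
```

The same p-value can lead to different decisions at different significance levels, so $α$ should be fixed before looking at the data.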

Additional Step:
If we reject $H_{0}$ , we can analyze which categories contribute most to the differences by examining:

$$\frac{(\text{Observed}_{ij} - \text{Expected}_{ij})^{2}}{\text{Expected}_{ij}}$$
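A minimal sketch of this follow-up analysis is shown below. It returns the contribution of every cell so that the largest ones can be identified. The function name `cell_contributions` and the table of counts are hypothetical.

```python
def cell_contributions(observed):
    """Return (Observed - Expected)^2 / Expected for every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    contributions = []
    for i, row in enumerate(observed):
        contrib_row = []
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            contrib_row.append((obs - exp) ** 2 / exp)
        contributions.append(contrib_row)
    return contributions

observed = [[30, 20, 10], [20, 30, 40]]  # hypothetical counts
contrib = cell_contributions(observed)
for row in contrib:
    print([round(value, 3) for value in row])
```

The contributions add up to the $χ^{2}$ statistic. In this table the middle category contributes nothing, so the difference between the groups comes from the first and last categories.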

5. Conclusion

Based on the test results:

If $H_{0}$ is not rejected, we conclude that there is not sufficient statistical evidence of a difference in distributions across the groups. This is not the same as showing that the distributions are identical.
If $H_{0}$ is rejected, we conclude that there is sufficient statistical evidence to suggest that at least two groups have different distributions.

Final conclusion should be written in context:

We have sufficient statistical evidence that [in context].
