Homogeneity

1. State (Hypothesis Formulation)

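  • Null Hypothesis ($H_0$): The distributions of the categorical variable are the same across all populations/groups.
  • Alternative Hypothesis ($H_a$): The distributions of the categorical variable are not the same across all populations/groups (i.e., at least two groups differ).
  • Mathematical Representation (with $p_{ij}$ the probability of category $j$ in group $i$, for $r$ groups):
    • $H_0$: $p_{1j} = p_{2j} = \cdots = p_{rj}$ for every category $j$
    • $H_a$: At least two of the probabilities are different, i.e., $p_{ij} \neq p_{kj}$ for some category $j$ and some groups $i \neq k$.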

2. Check Conditions

To ensure the validity of the Chi-Square test for homogeneity, we must satisfy these conditions:

  1. Randomness: The data must come from independent random samples or a properly randomized experiment.
  2. Large Sample Size: The expected count in every cell (each group-category combination) should be at least 5.
  3. Independence:
    • Each observation must be independent.
    • If sampling without replacement, the total population must be at least 10 times the sample size (10% condition).
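
The large sample size condition can be checked before running the test. The following is a minimal sketch, assuming each expected count is computed as row total × column total / grand total; the helper names `expected_counts` and `large_counts_ok` and the counts in `observed` are hypothetical and serve only as illustration.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def large_counts_ok(table, minimum=5):
    """Return True when every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


# Hypothetical data: rows are groups, columns are categories.
observed = [
    [30, 20, 10],
    [25, 25, 10],
]
print(large_counts_ok(observed))
```

If the check fails, a common remedy is to combine sparse categories before testing.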

3. Calculation

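The Chi-Square statistic is calculated using:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\text{Observed}_{ij} - \text{Expected}_{ij})^2}{\text{Expected}_{ij}}$$

Where:

  • $r$ is the number of rows (groups),
  • $c$ is the number of columns (categories),
  • $\text{Observed}_{ij}$ represents the actual count in row $i$ and column $j$,
  • $\text{Expected}_{ij}$ is calculated as: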

$$\text{Expected}_{ij} = \frac{(\text{Row Total})_i \times (\text{Column Total})_j}{\text{Grand Total}}$$

  • Degrees of Freedom (df):

$$df = (\text{Number of Rows} - 1) \times (\text{Number of Columns} - 1)$$

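  • P-value Calculation:
    Given $\chi^2$ and $df$, the p-value is the upper-tail probability of the chi-square distribution, computed as:

$$p = 1 - \frac{\gamma\left(\frac{df}{2}, \frac{\chi^2}{2}\right)}{\Gamma\left(\frac{df}{2}\right)}$$

where:

  • $\gamma(s, x)$ is the lower incomplete gamma function: $\gamma(s, x) = \int_0^x t^{s-1} e^{-t} \, dt$
  • $\Gamma(s)$ is the Gamma function: $\Gamma(s) = \int_0^\infty t^{s-1} e^{-t} \, dt$
  • The regularized lower incomplete gamma function:

$$P(s, x) = \frac{\gamma(s, x)}{\Gamma(s)}$$

which, evaluated at $s = df/2$ and $x = \chi^2/2$, is the chi-square cumulative distribution function, and is implemented in scipy.special.gammainc().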

Using scipy.special.gammainc for p-value computation:
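import scipy.special as sp

def chi2_p_value(chi2_val, df):
    """Upper-tail p-value of a chi-square statistic with df degrees of freedom."""
    # gammainc(a, x) is the regularized lower incomplete gamma function,
    # so 1 - gammainc(df / 2, chi2_val / 2) is the upper-tail probability.
    p_val = 1 - sp.gammainc(df / 2, chi2_val / 2)
    return p_val

# Example usage
chi2_val = 15.2  # Chi-Square statistic
df = 6           # Degrees of freedom
p_value = chi2_p_value(chi2_val, df)
print(f"P-value: {p_value:.5f}")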
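
To connect the calculation steps end to end, the sketch below starts from a table of observed counts and computes the expected counts, the Chi-Square statistic, the degrees of freedom, and the p-value. The counts in `observed` are hypothetical, and the comparison against `scipy.stats.chi2_contingency` is included only as a sanity check on the manual computation, not as the method used in these notes.

```python
import numpy as np
import scipy.special as sp
from scipy.stats import chi2_contingency

# Hypothetical counts: rows are groups, columns are categories.
observed = np.array([
    [30, 20, 10],
    [25, 25, 10],
    [15, 30, 15],
])

# Expected count per cell: row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()
expected = row_totals * col_totals / grand_total

# Chi-Square statistic: sum of (Observed - Expected)^2 / Expected over all cells.
chi2_val = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail p-value via the regularized lower incomplete gamma function.
p_value = 1 - sp.gammainc(df / 2, chi2_val / 2)
print(f"chi2 = {chi2_val:.4f}, df = {df}, p-value = {p_value:.5f}")

# Sanity check against SciPy's built-in contingency-table test.
ref_chi2, ref_p, ref_df, ref_expected = chi2_contingency(observed, correction=False)
assert np.isclose(chi2_val, ref_chi2) and np.isclose(p_value, ref_p) and df == ref_df
```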

4. Decision Rule

  • If $p\text{-value} > \alpha$ (common choices: $\alpha = 0.05$ or $0.01$):
    • Fail to reject $H_0$ → The distributions across groups are not significantly different.
  • If $p\text{-value} \leq \alpha$:
    • Reject $H_0$ → At least two groups have significantly different distributions.
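
The rule compares the p-value with the chosen significance level. A minimal sketch follows; the function name `decide` and the default `alpha=0.05` are illustrative assumptions, not part of the procedure itself.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    if p_value <= alpha:
        return "Reject H0"
    return "Fail to reject H0"


# Hypothetical p-values, for illustration only.
print(decide(0.012))
print(decide(0.300))
```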

Additional Step:
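If we reject $H_0$, we can analyze which categories contribute most to the differences by examining each cell's contribution to the statistic:

$$\frac{(\text{Observed}_{ij} - \text{Expected}_{ij})^2}{\text{Expected}_{ij}}$$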
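
A minimal sketch of this follow-up step, assuming the quantity examined is each cell's contribution (Observed − Expected)² / Expected to the Chi-Square statistic. The helper name `cell_contributions` and the counts are hypothetical.

```python
def cell_contributions(table):
    """Per-cell contribution (Observed - Expected)^2 / Expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        out = []
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            out.append((obs - exp) ** 2 / exp)
        contributions.append(out)
    return contributions


# Hypothetical counts: rows are groups, columns are categories.
observed = [
    [30, 20, 10],
    [25, 25, 10],
    [15, 30, 15],
]
contrib = cell_contributions(observed)

# Locate the (row, column) cell with the largest contribution.
cells = [(i, j) for i in range(len(contrib)) for j in range(len(contrib[0]))]
largest = max(cells, key=lambda ij: contrib[ij[0]][ij[1]])
print(f"Largest contribution at row {largest[0]}, column {largest[1]}")
```

Cells with large contributions are where the observed counts depart most from what the null hypothesis predicts.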


5. Conclusion

Based on the test results:

  • If $H_0$ is not rejected, we conclude that there is not sufficient statistical evidence that the distributions differ across the groups.
  • If $H_0$ is rejected, we conclude that there is sufficient statistical evidence to suggest that at least two groups have different distributions.

Final conclusion should be written in context:

We have sufficient statistical evidence that [in context].

Independence