
State

We want to determine whether a machine-learning–based classifier (RandomForest) performs better than a rule-based detector when both models are applied to the same set of test URLs.

Because each URL is evaluated by both classifiers, the data are paired, and the outcome for each classifier is categorical (correct or incorrect). Therefore, the appropriate analysis is McNemar's test, a chi-square test for paired categorical data that uses only the URLs on which the two classifiers disagree.

Let
$p_{R M}$ = proportion of URLs where the Rule-based model is correct and the ML model is incorrect
$p_{M R}$ = proportion of URLs where the ML model is correct and the Rule-based model is incorrect
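
As a minimal sketch of how these two proportions could be tallied from per-URL results, assuming each classifier's outcome is stored as a list of booleans (True = correct). The function name `paired_table` and the toy data are hypothetical and are not part of the original analysis.

```python
from collections import Counter

def paired_table(rule_correct, ml_correct):
    """Tally the four cells of the paired 2x2 correctness table."""
    n = len(rule_correct)
    if n == 0 or n != len(ml_correct):
        raise ValueError("both classifiers must be scored on the same non-empty URL set")
    # Each URL contributes one (rule outcome, ML outcome) pair.
    counts = Counter(zip(rule_correct, ml_correct))
    b = counts[(True, False)]   # Rule correct, ML wrong
    c = counts[(False, True)]   # Rule wrong, ML correct
    return {
        "both_correct": counts[(True, True)],
        "b": b,
        "c": c,
        "both_wrong": counts[(False, False)],
        "p_RM": b / n,
        "p_MR": c / n,
    }

# Hypothetical toy data: one entry per URL, True = classified correctly.
rule_correct = [True, True, False, False, True, False]
ml_correct = [True, False, True, True, True, False]
table = paired_table(rule_correct, ml_correct)
print(table)
```

Only the two discordant counts, b and c, enter the test; URLs where both classifiers agree carry no information about which one is better.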

Plan

Overall Classification Error

Null Hypothesis
$H_{0} : p_{R M} = p_{M R}$
There is no difference in performance between the two classifiers.

Alternative Hypothesis
$H_{a} : p_{M R} > p_{R M}$
The ML-based classifier makes fewer errors than the rule-based classifier.

Conditions

Paired data: Each URL is classified by both models.
Categorical outcomes: Each classification is either correct or incorrect.
Large sample condition: The total number of discordant pairs satisfies
$b + c \geq 10$
From the output,
$b = 4750, c = 46002, b + c = 50752$
All conditions are satisfied.

Do

Using the chi-square test for paired categorical data, the test statistic is
$\chi^{2} = \frac{(\lvert b - c \rvert - 1)^{2}}{b + c}$
Substituting the observed values:
$\chi^{2} = \frac{(\lvert 4750 - 46002 \rvert - 1)^{2}}{50752} = \frac{41251^{2}}{50752} \approx 33528.6294$
The corresponding p-value, from the chi-square distribution with 1 degree of freedom, is
$p_{val} \approx 10^{-7283}$
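
The calculation above can be sketched in plain Python. A p-value this small underflows to zero in double precision, so this sketch works with its base-10 logarithm instead. It assumes the leading-order asymptotic approximation erfc(y) ≈ exp(-y²) / (y·√π), which is accurate for large y. The function names are hypothetical.

```python
import math

def mcnemar_corrected(b: int, c: int) -> float:
    """Continuity-corrected McNemar chi-square statistic (1 degree of freedom)."""
    return (abs(b - c) - 1) ** 2 / (b + c)

def log10_chi2_sf_1df(x: float) -> float:
    """Approximate log10 of P(X > x) for X ~ chi-square(1), for large x.

    Uses P(X > x) = erfc(sqrt(x / 2)) together with the leading asymptotic
    term erfc(y) ~ exp(-y**2) / (y * sqrt(pi)). This is an approximation,
    not an exact tail computation.
    """
    y = math.sqrt(x / 2)
    ln_p = -y * y - math.log(y) - 0.5 * math.log(math.pi)
    return ln_p / math.log(10)

b, c = 4750, 46002  # discordant counts from the paired table
stat = mcnemar_corrected(b, c)
log10_p = log10_chi2_sf_1df(stat)
print(f"chi2 = {stat:.4f}")
print(f"log10(p) ~ {log10_p:.1f}")
```

The chi-square tail is two-sided by construction. For the one-sided alternative stated in the Plan, the p-value is half of this, which lowers log10(p) by about 0.3 and leaves the conclusion unchanged.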

Conclude

Because the p-value is far smaller than the significance level $\alpha = 0.05$, we reject the null hypothesis.

There is extremely strong statistical evidence that the ML-based classifier has a lower overall error rate than the rule-based detector when applied to the same URLs.

Paired 2×2 Table #1: Overall Correctness (TEST)

Rows = Rule-based classifier, Columns = ML-based classifier

	ML Correct	ML Wrong	Row Total
Rule Correct	61,645	4,750	66,395
Rule Wrong	46,002	8,810	54,812
Column Total	107,647	13,560	121,207

Discordant cells (used in paired chi-square / McNemar test):

$b = 4750$ (Rule correct, ML wrong)
$c = 46002$ (Rule wrong, ML correct)
