The results of such an experiment can be summarized as a table in which every cell contains the number of times a specific combination of characteristics is observed (e.g., stressed new words at the start of a sentence). The question to be answered is whether or not the observed frequencies deviate from a known distribution or, alternatively, whether subsets from the table (rows, columns) have identical frequency distributions.
In principle, it is possible to enumerate all possible frequency distributions and to calculate a level of significance under the null hypothesis (H0). However, this is not practical, because the number of possible frequency distributions can become very large. A more efficient approach uses the fact that the individual observations follow a binomial or multinomial distribution, which can be approximated with a Normal distribution. Because the variance estimated from the observed numbers follows a Chi-square distribution, this yields a very robust and versatile family of tests: the Chi-square tests.
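Consider N observations that fall into two categories, with N1 observations of case 1 and N2 observations of case 2:
N = N1 + N2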
We want to know how likely this result would be under the null hypothesis that the probability of observing case 1 is p and of case 2 is 1 - p. We start with the known facts of the binomial distribution:
E(N1) = N * p, E(N2) = N * (1 - p), Var(N1) = Var(N2) = N * p * (1 - p)
Define X1 and X2 as:
X1 = (N1 - N * p) / SQRT( N * p * (1 - p) )
X2 = (N2 - N * (1 - p) )/ SQRT( N * p * (1 - p) )
X1 = - X2
(i.e., (N2 - N * (1 - p) ) = (N - N1 - N + N * p) = -(N1 - N * p) )
For large values of N1 and N2, X1 and X2 will approximately follow a Standard Normal distribution. If N1 and N2 were independent, the summed variance of the observed frequencies around their expected values would be X1^2 + X2^2. However, they are not independent: because X1 = -X2, this sum counts the same deviation twice. The summed variance is therefore smaller. It is half this sum or, better, a weighted sum (X^2 is called Chi-square):
X^2 = (1-p) * X1^2 + p * X2^2
The choice of the weighting will not be motivated here, but it will prove very convenient.
If we write this equation out in full, we get:
X^2 = (1-p) * (N1 - N * p)^2 / (N * p * (1 - p)) + p * (N2 - N * (1 - p))^2 / (N * p * (1 - p))
which simplifies to:
X^2 = (N1 - N * p)^2 / (N * p) + (N2 - N * (1 - p))^2 / (N * (1 - p))
and finally, in terms of the expected values:
X^2 = (N1 - E(N1))^2 / E(N1) + (N2 - E(N2))^2 / E(N2)
For large values of E(N1) and E(N2), the values of X^2 follow a Chi-square distribution with 1 degree of freedom. This degree of freedom is the sum of the weighting factors used to calculate X^2, i.e., p + (1-p) = 1. It takes into account that there is only one value that can be chosen freely, i.e., either N1 or N2. The other value is then fixed by the fact that N = N1 + N2. H0 can be tested using standard tables for the Chi-square distribution.
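The two-category calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the example counts (60 and 40 out of 100, with p = 0.5) are hypothetical and not taken from the text. Instead of a table lookup, the sketch uses the fact that a Chi-square variable with 1 degree of freedom is the square of a Standard Normal variable, so its upper tail probability equals erfc(sqrt(X^2 / 2)).

```python
import math

def chi_square_two_categories(n1, n2, p):
    """Return (X^2, p_value) for counts n1, n2 under H0: P(case 1) = p."""
    n = n1 + n2
    e1, e2 = n * p, n * (1 - p)        # expected counts E(N1), E(N2)
    sd = math.sqrt(n * p * (1 - p))    # binomial standard deviation
    x1 = (n1 - e1) / sd                # standardized deviations, x1 == -x2
    x2 = (n2 - e2) / sd

    # Weighted sum from the text and the form in terms of expected values.
    weighted = (1 - p) * x1 ** 2 + p * x2 ** 2
    x_sq = (n1 - e1) ** 2 / e1 + (n2 - e2) ** 2 / e2
    assert math.isclose(weighted, x_sq)  # both forms agree

    # 1 degree of freedom: P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(x_sq / 2))
    return x_sq, p_value

# Hypothetical example: 60 vs. 40 observations, H0: p = 0.5
x_sq, p_value = chi_square_two_categories(60, 40, 0.5)
print(f"X^2 = {x_sq:.2f}, p = {p_value:.4f}")
```

The internal assertion checks, for the given input, that the weighted sum and the expected-value form give the same number, as the algebra above says they should.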
The last formulation of X^2 is very convenient. There is no explicit value of p to choose, because the expected values generally follow directly from H0. More importantly, this formulation can be used unaltered for more complex cases.
With k categories, where category i has probability pi under H0 and observed count Ni:
Sum i=1,k (pi) = 1
Sum i=1,k (Ni) = N
For each category we find:
E(Ni) = N * pi, Var(Ni) = N * pi * (1 - pi)
Define:
Xi = (Ni - N * pi) / SQRT( N * pi * (1 - pi) )
All Xi can be approximated with a Standard Normal distribution. The summed variance can be calculated as:
X^2 = Sum i=1,k ( (1-pi) * Xi^2 )
These weighting factors ensure that the variance will sum up correctly. Again, X^2 will follow a Chi-square distribution, but now with k-1 degrees of freedom (the sum of all (1-pi) factors). As before, the factors (1-pi) cancel, and X^2 can be rewritten in terms of E(Ni):
X^2 = Sum i=1,k ( (Ni - E(Ni))^2 / E(Ni) )
Note: the correct derivation of these formulas is based on (the inverse of) the covariance matrix and involves no weighting factors. The "derivation" given here is only meant to give some feeling for the matter.
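A minimal sketch of the k-category calculation follows. The function name, the example counts, and the probabilities are hypothetical. The sketch computes both the weighted sum and the form in terms of E(Ni), so that their agreement can be checked numerically, and compares X^2 with 5.991, the value found in standard Chi-square tables for 2 degrees of freedom at the 0.05 level.

```python
import math

def chi_square_goodness_of_fit(observed, probs):
    """Return (X^2, weighted sum, degrees of freedom) for k categories."""
    n = sum(observed)

    # Form in terms of expected values: Sum (Ni - E(Ni))^2 / E(Ni)
    x_sq = sum((n_i - n * p_i) ** 2 / (n * p_i)
               for n_i, p_i in zip(observed, probs))

    # Weighted sum of squared standardized deviations: Sum (1 - pi) * Xi^2
    weighted = sum(
        (1 - p_i) * ((n_i - n * p_i) / math.sqrt(n * p_i * (1 - p_i))) ** 2
        for n_i, p_i in zip(observed, probs)
    )

    dof = len(observed) - 1  # equals the sum of all (1 - pi) factors
    return x_sq, weighted, dof

# Hypothetical example: three categories with H0 probabilities 1/4, 1/2, 1/4
x_sq, weighted, dof = chi_square_goodness_of_fit([30, 50, 20], [0.25, 0.5, 0.25])
critical_value = 5.991  # standard table value for 2 DoF at the 0.05 level
print(f"X^2 = {x_sq:.3f} (weighted form: {weighted:.3f}), DoF = {dof}")
print("reject H0" if x_sq > critical_value else "do not reject H0")
```

Both forms are computed here only to illustrate that they coincide. In practice the second form is the one to use, since it needs only the observed and expected counts.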
For a table with observed counts Nij and expected counts Eij, the procedure to calculate X^2 is the same as before: sum all terms ( (Nij - Eij)^2 / Eij ), i.e.,
X^2 = Sum i,j ( (Nij - Eij)^2 / Eij )
The number of degrees of freedom is more difficult to determine. It is the number of table cell values that can be chosen freely while keeping all the row and column totals fixed. In most cases, this is:
Degrees of Freedom = (Number of Rows - 1) * (Number of Columns - 1)
For a 2*2 table, the degrees of freedom would be 1.
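The table calculation can be sketched as follows. One assumption is made loudly here: the expected values Eij are estimated from the row and column totals as (row total * column total) / N, which is the usual choice when H0 states that all rows have identical frequency distributions. The text above does not spell out how Eij is obtained, and the function name and example table are hypothetical. No continuity correction is applied in this sketch.

```python
def chi_square_table(table):
    """Return (X^2, degrees of freedom) for a table of observed counts.

    Assumption: Eij = (row total i) * (column total j) / N, i.e. H0 says
    that all rows share the same frequency distribution.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    x_sq = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count
            x_sq += (n_ij - e_ij) ** 2 / e_ij

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return x_sq, dof

# Hypothetical 2*2 example
x_sq, dof = chi_square_table([[20, 30], [30, 20]])
print(f"X^2 = {x_sq:.2f}, DoF = {dof}")
```

For this example all four expected values are 25, so every cell contributes 1 to X^2. The result can be compared with 3.841, the standard table value for 1 degree of freedom at the 0.05 level.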
Even with the continuity correction, most textbooks advise using the Chi-square approximation only when every expected value is larger than 5, i.e., all Eij > 5.
To assess the accuracy we compare the exact results of a Sign test with the results of a Chi-square approximation of this test. Below we have tabulated the significance levels side-by-side (with and without a continuity correction).
DoF = 1, E(N+) = E(N-) = N/2

     Sign Test     |    Chi-square    | With Continuity Correction
 N+  N-   N  p<=   | E    X^2   p<=   | X^2   p<=
-------------------+------------------+-------------
  6   0   6  0.031 | 3    6     0.014 | 4.17  0.041
  7   0   7  0.016 | 3.5  7     0.008 | 5.14  0.023
  8   1   9  0.039 | 4.5  5.44  0.020 | 4     0.046
  9   1  10  0.022 | 5    6.4   0.011 | 4.9   0.027
 10   2  12  0.039 | 6    5.33  0.021 | 4.08  0.043
 11   2  13  0.023 | 6.5  6.43  0.011 | 4.92  0.027
 11   3  14  0.057 | 7    4.57  0.033 | 3.5   0.061
 12   3  15  0.035 | 7.5  5.4   0.020 | 4.27  0.039
 13   4  17  0.049 | 8.5  4.76  0.029 | 3.76  0.052
 14   4  18  0.031 | 9    5.56  0.018 | 4.5   0.034
 15   5  20  0.041 | 10   5     0.025 | 4.05  0.044

It is evident that the continuity correction is indispensable for small numbers of observations or degrees of freedom. In this example, the difference between the results of the exact Sign test and the Chi-square approximation with continuity correction becomes smaller than 0.005 when the expected number of observations becomes larger than 5. Moreover, the Chi-square test with continuity correction is more conservative than the exact test (i.e., the p calculated with the corrected Chi-square test is always larger than the p calculated with the Sign test), which makes this a "safe" approximation: you err on the side of caution.
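The comparison in the table can be reproduced along the following lines. This is a sketch under assumptions: the function names are hypothetical, the Sign test is taken to be two-sided, and the continuity correction is taken to be the usual one that subtracts 0.5 from each |N - E| before squaring. Both assumptions are consistent with the tabulated values. The Chi-square p value uses the 1 degree of freedom identity p = erfc(sqrt(X^2 / 2)).

```python
import math

def sign_test_p(n_plus, n_minus):
    """Exact two-sided Sign test p value under H0: P(+) = P(-) = 1/2."""
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    # Probability of a split at least as extreme in one tail, then doubled.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def chi_square_p(n_plus, n_minus, continuity=False):
    """Return (X^2, p value) with 1 DoF and E(N+) = E(N-) = N/2."""
    e = (n_plus + n_minus) / 2
    shift = 0.5 if continuity else 0.0  # assumed form of the correction
    x_sq = sum(max(abs(obs - e) - shift, 0.0) ** 2 / e
               for obs in (n_plus, n_minus))
    return x_sq, math.erfc(math.sqrt(x_sq / 2))

# A few rows of the table
for n_plus, n_minus in [(6, 0), (9, 1), (15, 5)]:
    exact = sign_test_p(n_plus, n_minus)
    x_sq, p_plain = chi_square_p(n_plus, n_minus)
    x_sq_cc, p_cc = chi_square_p(n_plus, n_minus, continuity=True)
    print(f"{n_plus:3d} {n_minus:3d}  sign p={exact:.3f}  "
          f"X^2={x_sq:.2f} p={p_plain:.3f}  "
          f"corrected X^2={x_sq_cc:.2f} p={p_cc:.3f}")
```

Note that math.comb requires Python 3.8 or later.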