OBJECTIVE
Verify whether the difference between the observed frequencies and the expected frequencies is significant.
DESCRIPTION
A chi-square test is used to test the frequencies of independent observations (it is not suitable for paired data) and has two main purposes:
- Test of independence: to determine whether two categorical variables are associated, for example whether civil status affects the kind of service that customers buy;
- Test of goodness of fit: to determine whether the observed values differ from the expected values (for example, whether a sample taken from a population follows the expected population distribution or a theoretical distribution).
In either case the method is the same: we apply a chi-square test to the observed values and the expected values. When using a chi-square test, bear in mind that it is sensitive to the sample size (it is not appropriate with fewer than 50 observations) and needs a minimum expected frequency in each bin or class (at least 5). If these conditions are not met, we should consider using Fisher’s exact test.
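For reference, the statistic behind both uses can be written as follows. The symbols are notation introduced here, not taken from the template: O_i is the observed frequency in cell or class i, E_i the expected frequency, r and c the number of rows and columns of the contingency table, and k the number of classes in a goodness-of-fit test.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
\qquad
\text{df}_{\text{independence}} = (r - 1)(c - 1)
\qquad
\text{df}_{\text{goodness of fit}} = k - 1
```

The p-value is the probability that a chi-square variable with the corresponding degrees of freedom exceeds the computed statistic.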
TEST OF INDEPENDENCE
In the example we test the independence of the variables “civil status” and “service level” chosen by customers.
Observed Values vs Expected Values in a Chi-Square Test
For this test we assume that the probability of choosing a specific service level and the probability of being married, single, or divorced are independent events. Under this assumption, we compare the actual distribution of service levels purchased within each civil status with the expected distribution based on the independent probabilities of the two events (see the template). In other words, the test compares the expected frequencies with the actual frequencies. The Excel formula “=CHITEST” is then applied, and if the resulting p-value is smaller than 0.05 (or a different alpha), we reject the null hypothesis (that the two variables are independent) and infer that the two variables are associated. In other words, we can say that civil status is related to the level of service purchased.
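As a rough illustration of the same calculation outside Excel, the sketch below derives the expected frequencies from the row and column totals and then computes the statistic and p-value. The counts are hypothetical (they are not the data from the template), and `chi2_sf` and `independence_test` are helper names invented for this example; only the Python standard library is used.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    x = stat / 2.0
    # Starting point: Q(1, x) = exp(-x) for even df, Q(1/2, x) = erfc(sqrt(x)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    # Step up with Q(a + 1, x) = Q(a, x) + x^a * exp(-x) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q


def independence_test(table):
    """Return (expected, statistic, df, p_value) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Under independence, each expected count is row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows = married, single, divorced;
# columns = three service levels.
observed = [
    [20, 30, 50],
    [40, 30, 30],
    [15, 15, 20],
]

expected, stat, df, p_value = independence_test(observed)
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject independence: the two variables appear to be associated.")
else:
    print("No evidence against independence.")
```

With these made-up counts the p-value falls below 0.05, so the null hypothesis of independence would be rejected.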
GOODNESS OF FIT
Goodness of fit can be calculated in Excel using the same formula (=CHITEST), which is applied to a column (or row) of observed values and a column of expected values. For example, if we throw a die, we expect each number to have the same probability of appearing (1/6), but we suspect that the die is loaded. In our example we throw the die 60 times and expect each number to come up 10 times. The null hypothesis is that the observed frequencies follow the expected distribution (that is, the die is fair); if the p-value is lower than 0.05, we reject it. Since the p-value in our example is larger than 0.05, we cannot reject the null hypothesis, so we have no evidence that the die is loaded.
Data and Results of a Goodness-of-Fit Chi-Square Test
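The die example can be sketched in a few lines of Python. The observed counts below are hypothetical (they are not the values from the template), and instead of a p-value the sketch compares the statistic with the tabulated critical value for 5 degrees of freedom at alpha = 0.05, which leads to the same decision.

```python
# Hypothetical results of 60 throws: how often each face (1 to 6) came up.
observed = [8, 12, 9, 11, 13, 7]

throws = sum(observed)
# A fair die is expected to show each face in 1/6 of the throws.
expected = [throws / 6] * 6

# Chi-square statistic: sum of (observed - expected)^2 / expected.
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Critical value of the chi-square distribution for df = 5 and alpha = 0.05,
# taken from standard tables (an input to this sketch, not computed here).
CRITICAL_VALUE = 11.070

reject_fair_die = stat > CRITICAL_VALUE
print(f"chi-square = {stat:.2f}, df = {df}, critical value = {CRITICAL_VALUE}")
print("Evidence that the die is loaded" if reject_fair_die
      else "No evidence that the die is loaded")
```

Here the statistic is far below the critical value, which corresponds to a p-value above 0.05, so the hypothesis of a fair die is not rejected.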
We can also compare our observed values with a theoretical distribution.
TEST OF PROPORTIONS
The chi-square test can also be used instead of the z-test in the test of proportions (see 44. TEST OF PROPORTIONS) when the assumptions for the parametric test are not met. In this case we compare the observed proportion with the expected proportion using a double-entry contingency table (the same table used in the independence test, but with only two categories per row and two categories per column).
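A minimal sketch of the 2x2 case follows, assuming the two rows are two groups whose proportions of "success" are being compared. The counts are hypothetical, and no continuity correction is applied. With one degree of freedom the p-value reduces to a single `math.erfc` call, and the statistic equals the square of the pooled two-proportion z statistic, which is why the two tests lead to the same two-sided decision here.

```python
import math

# Hypothetical 2x2 table: rows = group A, group B; columns = success, failure.
table = [
    [30, 70],
    [45, 55],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected counts if both groups share the same proportion of successes.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

# For df = 1, the upper tail of the chi-square distribution is erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2.0))

# Pooled two-proportion z statistic, for comparison.
p_a = table[0][0] / row_totals[0]
p_b = table[1][0] / row_totals[1]
pooled = col_totals[0] / grand_total
std_err = math.sqrt(pooled * (1 - pooled) * (1 / row_totals[0] + 1 / row_totals[1]))
z = (p_b - p_a) / std_err

print(f"chi-square = {stat:.4f}, p-value = {p_value:.4f}, z squared = {z * z:.4f}")
```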
TEMPLATE
Download the Chi-square Excel Template