80 Fundamental Models for Business Analysts: 48. CHI-SQUARE

OBJECTIVE

Test whether the difference between the observed frequencies and the expected frequencies is statistically significant.

DESCRIPTION

A chi-square test is used to test the frequencies of independent observations (not suitable for paired testing) with two main purposes:

Test of independence: to determine whether two categorical variables are associated, for example whether civil status is related to the kind of service that customers buy;

Test of goodness of fit: to determine the difference between the observed values and the expected values (for example whether a sample taken from a population follows the expected population distribution or a theoretical distribution).

In either case the method is the same: we compute a chi-square statistic from the observed values and the expected values. When using a chi-square test, bear in mind that it is sensitive to sample size (as a rule of thumb, it is not appropriate with fewer than 50 observations) and that it needs a minimum expected frequency in each bin or class (at least 5). If these conditions are not met, we should consider using Fisher’s exact test.
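As a rough illustration of the conditions above, the following Python sketch computes the chi-square statistic, that is the sum over all bins of (observed - expected)² / expected, and checks the two rules of thumb. The function names and the sample counts are assumptions made for this illustration; they are not part of the Excel template.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all bins (two flat lists of equal length)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def meets_rules_of_thumb(observed, expected, min_total=50, min_expected=5):
    """Check the sample-size and minimum-frequency conditions quoted in the text.

    The defaults mirror the thresholds mentioned above (50 observations,
    expected frequency of at least 5 per bin).
    """
    return sum(observed) >= min_total and min(expected) >= min_expected


# Hypothetical counts for four bins with equal expected frequencies.
observed = [18, 22, 27, 13]
expected = [20, 20, 20, 20]

statistic = chi_square_statistic(observed, expected)
ok = meets_rules_of_thumb(observed, expected)
print(f"chi-square = {statistic:.2f}, conditions met: {ok}")
```

If the check fails, the chi-square approximation may be unreliable for the data at hand.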

TEST OF INDEPENDENCE

In the example we are testing the independence of the variables “civil status” and “service level” chosen by customers.

Observed Values vs Expected Values in a Chi-Square Test

For this test we assume that the probability of having a specific service level and the probability of being married, single, or divorced are independent events. Under this assumption, we compare the actual distribution of service levels within each civil status with the expected distribution based on the independent probabilities of the two events (see the template). In other words, the test compares the expected frequencies with the actual frequencies. The Excel formula “=CHITEST” (named CHISQ.TEST in newer Excel versions) is then applied, and if the resulting p-value is smaller than 0.05 (or a different alpha), we reject the null hypothesis (that the two variables are independent) and infer that the two variables are associated. In other words, we can say that civil status is associated with the level of service purchased.
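The procedure described above can be sketched in Python using only the standard library. The counts below are hypothetical (the original figure's data are not reproduced here), and the helper names `chi2_sf` and `independence_test` are assumptions of this sketch. The p-value helper uses the closed-form recurrence for the chi-square upper tail, which holds for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)             # Q(1, h) = exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))  # Q(1/2, h) = erfc(sqrt(h))
    # Recurrence: Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1.0
    return q


def independence_test(table):
    """Return (statistic, df, p_value, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence: expected count = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf(statistic, df), expected


# Hypothetical counts: rows = married, single, divorced;
# columns = basic, standard, premium service level.
observed = [
    [20, 30, 50],
    [40, 35, 25],
    [30, 35, 35],
]

statistic, df, p_value, expected = independence_test(observed)
print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject independence: the two variables appear to be associated.")
else:
    print("No evidence against independence at the 5% level.")
```

With these invented counts the p-value falls below 0.05, so the sketch would reject independence; real data may of course lead to the opposite conclusion.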

GOODNESS OF FIT

Goodness of fit can be calculated in Excel using the same formula (=CHITEST), which is applied to a column (or row) with observed values and a column with expected values. For example, if we throw a die, we expect each number to have the same probability of appearing (1/6), but we suspect that the die has been loaded. In our example we throw the die 60 times and expect each number to come up 10 times. If the p-value is lower than 0.05, we reject the null hypothesis, which here is that the observed frequencies follow the expected distribution (that is, the die is fair). Since the p-value in our example is larger than 0.05, we cannot reject the null hypothesis: the data give no evidence that the die is loaded.

Data and Results of a Goodness-of-Fit Chi-Square Test
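The die example can be reproduced with a short Python sketch. The observed counts below are invented for illustration (the original data are not reproduced here); only the 60 throws and the expected 10 per face come from the text. The helper name `chi2_sf` is an assumption of this sketch.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence: Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1.0
    return q


# Hypothetical outcome of 60 throws, one count per face (1 to 6).
observed = [8, 12, 9, 11, 13, 7]
throws = sum(observed)
expected = [throws / 6] * 6  # a fair die: 10 per face for 60 throws

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1       # number of categories minus one
p_value = chi2_sf(statistic, df)

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the die may be loaded.")
else:
    print("Cannot reject the null hypothesis: no evidence that the die is loaded.")
```

For these invented counts the p-value is well above 0.05, which matches the situation described in the text.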

We can also compare our observed values with a theoretical distribution.

TEST OF PROPORTIONS

The chi-square test can also be used instead of the z-test in the test of proportions (see 44. TEST OF PROPORTIONS) when the assumptions for using the parametric test are not met. In this case we compare the observed proportion with the expected proportion using a 2×2 contingency table (the same double-entry table used in the independence test, but with only two categories per row and two categories per column).
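A minimal Python sketch of the 2×2 case follows, assuming two customer groups whose purchase proportions we want to compare. The counts and group labels are hypothetical. The shortcut formula used here is algebraically the same as summing (observed - expected)² / expected over the four cells, and no continuity correction is applied.

```python
import math

# Hypothetical data: buyers and non-buyers in two customer groups.
table = [
    [45, 55],  # group A: 45 buyers, 55 non-buyers
    [30, 70],  # group B: 30 buyers, 70 non-buyers
]

(a, b), (c, d) = table
n = a + b + c + d

# Shortcut for a 2x2 table: n * (ad - bc)^2 / (product of the four marginal totals).
statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"proportions: {a / (a + b):.2f} vs {c / (c + d):.2f}")
print(f"chi-square = {statistic:.2f}, p-value = {p_value:.4f}")
```

With small counts, the rule of thumb on minimum expected frequencies matters most in this 2×2 setting, which is where Fisher’s exact test is usually considered.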

TEMPLATE

Download the Chi-square Excel Template

Tuesday, February 11, 2020
