OBJECTIVE
Analyze the interrelations among several variables and explain them with a reduced number of variables.
DESCRIPTION
A principal component analysis (PCA) analyzes the interrelations among a large number of variables to find a small number of variables (components) that explain most of the variance of the original variables. This method is usually performed as the first step in a series of analyses; for example, it can be used when there are too many predictor variables relative to the number of observations, or to avoid multicollinearity.
Suppose that a company is collecting responses about several characteristics of a product, say a new shampoo: color, smell, cleanliness, and shine. After a PCA, it finds that the four original variables can be reduced to two components[1]:
- Component “quality”: color and smell;
- Component “effect on hair”: cleanliness and shine.
Even though it is possible to run a PCA in Excel with complex calculations or special add-ins,[2] I suggest using a proper statistical tool. Here I will only give some guidelines for performing a PCA.
First of all, the analysis starts with a covariance or correlation matrix. I suggest using a correlation matrix, since a covariance matrix is not appropriate if the variables have different scales or very different variances: the variables with the largest variances would dominate the components. Then eigenvectors (the directions of variance) and eigenvalues (the amount of variance in each direction) are calculated. We now have as many components as variables, each one with its own eigenvalue.
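As a rough illustration of this first step, here is a minimal sketch in Python with NumPy. The ratings are made up for the shampoo example, and the function name `correlation_pca` is my own; a statistical package would normally do this for you.

```python
import numpy as np

def correlation_pca(data):
    """Eigen-decompose the correlation matrix of `data` (rows = observations)."""
    corr = np.corrcoef(data, rowvar=False)            # variables are in columns
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigh is meant for symmetric matrices
    order = np.argsort(eigenvalues)[::-1]             # largest eigenvalue first
    return corr, eigenvalues[order], eigenvectors[:, order]

# Made-up ratings (1-10) from eight respondents: color, smell, cleanliness, shine
ratings = np.array([
    [7, 8, 5, 4],
    [6, 6, 8, 9],
    [9, 9, 6, 6],
    [4, 5, 7, 7],
    [8, 7, 3, 4],
    [5, 4, 9, 8],
    [3, 4, 4, 5],
    [6, 7, 6, 5],
], dtype=float)

corr, eigenvalues, eigenvectors = correlation_pca(ratings)
print("eigenvalues:", np.round(eigenvalues, 3))
print("share of variance:", np.round(eigenvalues / eigenvalues.sum(), 3))
```

With a correlation matrix the eigenvalues add up to the number of variables, so each eigenvalue divided by that total is the share of variance explained by its component.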
The more variance (eigenvalue) that a component explains, the more important it is. There are several approaches that we can use to choose the number of components to retain:
- Define a threshold before the analysis:
  - choose all the components with an eigenvalue above a certain value (usually > 1);
  - choose a priori a specific number of components (then verify the total variance explained and other validity tests);
  - choose the first components that together explain at least a given percentage of the variance, for example 80% if the results are used for descriptive purposes, or higher if they will be used in other statistical analyses (Figure 74);
- Use a scree plot (Figure 75) and “cut” the line at the main inflexion point, or at one of the main inflexion points where the total variance explained is acceptable (for example, in Figure 75 the first four components can be chosen, since there is an important inflexion point, although they explain only 60% of the variance).
Figure 75: Scree Plot
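The threshold rules above can be sketched in a few lines of plain Python. The eigenvalues below are hypothetical (eight variables, not the data behind the figures), and `choose_components` is just an illustrative helper name.

```python
def choose_components(eigenvalues, min_eigenvalue=1.0, min_variance=0.80):
    """Return how many components each threshold rule would retain."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)

    # Cumulative share of variance explained by the first 1, 2, ... components
    cumulative = []
    running = 0.0
    for value in ordered:
        running += value / total
        cumulative.append(running)

    # Rule 1: keep the components whose eigenvalue exceeds the threshold
    n_by_eigenvalue = sum(1 for value in ordered if value > min_eigenvalue)

    # Rule 2: keep the first components that reach the target share of variance
    n_by_variance = next(
        (i + 1 for i, share in enumerate(cumulative) if share >= min_variance),
        len(ordered),
    )
    return n_by_eigenvalue, n_by_variance, cumulative

# Hypothetical eigenvalues from a correlation matrix of eight variables
eigenvalues = [3.0, 1.6, 1.1, 0.9, 0.6, 0.4, 0.3, 0.1]
n_by_eigenvalue, n_by_variance, cumulative = choose_components(eigenvalues)
print("eigenvalue > 1 rule:", n_by_eigenvalue)
print("80% variance rule:", n_by_variance)
print("cumulative variance:", [round(share, 3) for share in cumulative])
```

In this made-up case the two rules disagree (three components versus four), which is common in practice; the list of cumulative shares is the same information that a scree plot shows graphically.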
The next step is to analyze the correlation coefficients between variables and components, arranged in a matrix of variables by components. Ideally, each variable has a high correlation with only one component, so that each component can be defined conceptually (smell and color = component “quality”). However, even if we cannot explain the resulting components conceptually, we have to bear in mind that the main objective of a PCA is to reduce a large number of variables to a manageable number of components; interpreting the components is not strictly necessary. In chapter 64, EXPLORATORY FACTOR ANALYSIS, PCA will be used as the method for a factor analysis, and I will introduce optimization methods, factor scoring, and validity tests.
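To make the matrix of variables and components more concrete, the sketch below computes it with NumPy, assuming a PCA on the correlation matrix. In that case the correlation between a variable and a component equals the eigenvector entry multiplied by the square root of the eigenvalue. The ratings are made up and `loadings_and_scores` is a hypothetical helper, not part of any library.

```python
import numpy as np

def loadings_and_scores(data, n_components):
    """Correlations between variables and components, plus component scores."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # standardize each variable
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    keep = np.argsort(eigenvalues)[::-1][:n_components]        # largest eigenvalues first
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    scores = z @ eigenvectors[:, keep]                          # one row per respondent
    return loadings, scores

# Made-up ratings (1-10) from eight respondents
variables = ["color", "smell", "cleanliness", "shine"]
ratings = np.array([
    [7, 8, 5, 4],
    [6, 6, 8, 9],
    [9, 9, 6, 6],
    [4, 5, 7, 7],
    [8, 7, 3, 4],
    [5, 4, 9, 8],
    [3, 4, 4, 5],
    [6, 7, 6, 5],
], dtype=float)

loadings, scores = loadings_and_scores(ratings, n_components=2)
for name, row in zip(variables, loadings):
    print(f"{name:12s}", np.round(row, 2))
```

Each row shows how strongly one variable correlates with each retained component. The sign of a whole column is arbitrary, so it is the size of the correlations that matters when naming a component.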
TEMPLATE
[1] In spite of this example, PCA is usually performed when we have a larger number of variables.