Principal Component Analysis (PCA) is a powerful tool when you have many variables and want to understand what they can explain together. As its name suggests, PCA finds the combinations of your variables that best explain the phenomena you observed. In this sense, PCA is useful when you want to reduce the number of variables. A common scenario is that you have n variables and want to combine them into 3 or 4 variables without losing much of the information the original data contain. More mathematically, PCA finds linear projections of your data that preserve as much of the information (variance) in the data as possible.
PCA is one of the methods to try if you have lots of Likert-scale data and want to understand what they tell you. Let's say we asked participants four 7-point Likert questions about what they care about when choosing a new computer, and got the following results.
Participant | Price | Software | Aesthetics | Brand |
---|---|---|---|---|
P1 | 6 | 5 | 3 | 4 |
P2 | 7 | 3 | 2 | 2 |
P3 | 6 | 4 | 4 | 5 |
P4 | 5 | 7 | 1 | 3 |
P5 | 7 | 7 | 5 | 5 |
P6 | 6 | 4 | 2 | 3 |
P7 | 5 | 7 | 2 | 1 |
P8 | 6 | 5 | 4 | 4 |
P9 | 3 | 5 | 6 | 7 |
P10 | 1 | 3 | 7 | 5 |
P11 | 2 | 6 | 6 | 7 |
P12 | 5 | 7 | 7 | 6 |
P13 | 2 | 4 | 5 | 6 |
P14 | 3 | 5 | 6 | 5 |
P15 | 1 | 6 | 5 | 5 |
P16 | 2 | 3 | 7 | 7 |
Now what you want to know is what combination of these four variables can explain the phenomena you observed. I will walk through this with example R code.
Let's prepare the same data shown in the table above.
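Here is one way to enter the table as a data frame (the name `data` is just my choice):

```r
# Survey responses from the table above, one column per question (P1 to P16).
data <- data.frame(
  Price      = c(6, 7, 6, 5, 7, 6, 5, 6, 3, 1, 2, 5, 2, 3, 1, 2),
  Software   = c(5, 3, 4, 7, 7, 4, 7, 5, 5, 3, 6, 7, 4, 5, 6, 3),
  Aesthetics = c(3, 2, 4, 1, 5, 2, 2, 4, 6, 7, 6, 7, 5, 6, 5, 7),
  Brand      = c(4, 2, 5, 3, 5, 3, 1, 4, 7, 5, 7, 6, 6, 5, 5, 7)
)
data
```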
At this point, the data look pretty much the same as the table above. Now we do PCA. In R, there are two standard functions for PCA: prcomp() and princomp(). prcomp() computes the principal components through a singular value decomposition of the (centered) data matrix, whereas princomp() uses an eigendecomposition of the covariance or correlation matrix. In practice the two give essentially the same results (up to the signs of the loadings), and the output of princomp() is convenient to work with, so I use princomp() here. Note that the results below come from PCA on the correlation matrix (i.e., with each variable standardized), which you request with the cor=TRUE argument.
And here is the result of the PCA.
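A minimal way to run it and print everything we will look at below (the object name `pca` is my choice):

```r
# PCA on the correlation matrix (cor = TRUE standardizes each variable).
pca <- princomp(data, cor = TRUE)

# Print the standard deviations, proportions of variance, and loadings.
summary(pca, loadings = TRUE)
```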
I will explain how to interpret this result in the next section.
Let's take a look at the loadings table, which contains the coefficients for the “new” variables. (Blank cells are coefficients whose absolute values are below 0.1; R omits them by default to make the table easier to read.)
| | Comp.1 | Comp.2 | Comp.3 | Comp.4 |
|---|---|---|---|---|
| Price | -0.523 | | 0.848 | |
| Software | -0.177 | 0.977 | -0.120 | |
| Aesthetics | 0.597 | 0.134 | 0.295 | -0.734 |
| Brand | 0.583 | 0.167 | 0.423 | 0.674 |
From the loadings table, PCA found four new variables, Comp.1 to Comp.4, which together explain the same information as the original four variables (Price, Software, Aesthetics, and Brand). Comp.1 is calculated as follows (strictly speaking, from the standardized variables, because we ran PCA on the correlation matrix):
```
Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand
```
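These coefficients are simply the first column of the loadings matrix, so you can also pull them out directly:

```r
# Coefficients of Comp.1 (the first column of the loadings matrix).
pca$loadings[, 1]
```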
Thus, PCA successfully found a new combination of the variables, which is good. The next thing we want to know is how much power each new variable has to explain the information that the original data contain. For this, you need to look at the Standard deviation and Cumulative Proportion (of Variance) rows in the result.
| | Comp.1 | Comp.2 | Comp.3 | Comp.4 |
|---|---|---|---|---|
| Standard deviation | 1.56 | 0.98 | 0.68 | 0.38 |
| Cumulative Proportion | 0.61 | 0.85 | 0.96 | 1.00 |
Standard deviation is the standard deviation of each new variable. PCA picks the combinations of the variables so that the new variables have the largest possible standard deviations, so generally a larger standard deviation means a more useful variable. A common heuristic is to keep the new variables whose standard deviations are roughly over 1.0 (so we take Comp.1 and Comp.2). The 1.0 threshold makes sense here because we ran PCA on the correlation matrix: every standardized original variable has a standard deviation of exactly 1, so a component below 1.0 carries less information than a single original variable.
Another way to decide how many new variables to keep is to look at the cumulative proportion of variance, which tells you how much of the information in the original data is described by the new variables taken together. For instance, Comp.1 alone describes 61% of the information in the original data; Comp.1 and Comp.2 together describe 85%. A common rule of thumb is that about 80% indicates the data are described well. So, in this example, we keep Comp.1 and Comp.2, and ignore Comp.3 and Comp.4.
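If you prefer a visual check, a scree plot shows the variance of each component; a common practice is to keep the components before the point where the curve flattens out:

```r
# Scree plot of the component variances.
screeplot(pca, type = "lines")
```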
In this manner, we can reduce the number of variables (in this example, from four to two). Your next task is to understand what each new variable means in the context of your data. As we have seen, the first new variable is calculated as follows:
```
Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand
```
It is a very good idea to plot the data to see what this new variable means. You can use the scores, which are the values that each data point takes on the variables modeled by PCA.
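With princomp(), the scores are stored in the fitted object. Here is a quick way to look at and plot the Comp.1 scores (the plotting choices are mine):

```r
# Comp.1 score for each participant.
pca$scores[, 1]

# Plot the Comp.1 score of each participant.
plot(pca$scores[, 1], type = "h",
     xlab = "Participant", ylab = "Comp.1 score")
```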
From the plot (sorry, I was too lazy to upload the graph, but you can quickly generate it yourself), you can see that Participants 1-8 get negative values and the other participants get positive values. It seems that this new variable indicates whether a user cares about Price and Software or about Aesthetics and Brand when choosing a computer. So we could name this variable something like a “Feature/Fashion index.” There is no definitive answer in this part of PCA; you need to go through your data and make sense of what the new variables mean yourself.
Once you have done the analysis with PCA, you may want to look into whether the new variables can predict some phenomenon well. This is a bit like machine learning: checking whether the features can classify the data well. Let's say you asked the participants one more question in your survey, which OS they are using (Windows or Mac, coded as 0 or 1), and the results look like this.
Participant | Price | Software | Aesthetics | Brand | OS |
---|---|---|---|---|---|
P1 | 6 | 5 | 3 | 4 | 0 |
P2 | 7 | 3 | 2 | 2 | 0 |
P3 | 6 | 4 | 4 | 5 | 0 |
P4 | 5 | 7 | 1 | 3 | 0 |
P5 | 7 | 7 | 5 | 5 | 1 |
P6 | 6 | 4 | 2 | 3 | 0 |
P7 | 5 | 7 | 2 | 1 | 0 |
P8 | 6 | 5 | 4 | 4 | 0 |
P9 | 3 | 5 | 6 | 7 | 1 |
P10 | 1 | 3 | 7 | 5 | 1 |
P11 | 2 | 6 | 6 | 7 | 0 |
P12 | 5 | 7 | 7 | 6 | 1 |
P13 | 2 | 4 | 5 | 6 | 1 |
P14 | 3 | 5 | 6 | 5 | 1 |
P15 | 1 | 6 | 5 | 5 | 1 |
P16 | 2 | 3 | 7 | 7 | 1 |
What we are going to do here is see whether the new variables given by PCA can predict which OS people are using. OS is 0 or 1 in our case, which means the dependent variable is binomial, so we use logistic regression. I will skip the details of logistic regression here; if you are interested, they are available on a separate page.
First, we prepare the data about OS.
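For example, as a vector in the same participant order as the table (the name `OS` is my choice):

```r
# OS responses (0 or 1) for P1 through P16.
OS <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1)
```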
Then, fit a logistic regression model that predicts OS from the first variable we found through PCA (i.e., Comp.1).
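In R, logistic regression is a binomial GLM; here is a minimal sketch (the object name `model` is mine):

```r
# Logistic regression of OS on the Comp.1 scores.
model <- glm(OS ~ pca$scores[, 1], family = binomial)
summary(model)
```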
Now you have the logistic regression model.
Let's see how well this model predicts which OS the participants use. You can use the fitted() function to see the predictions.
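Continuing with the model object from above:

```r
# Predicted probability that each participant's OS is 1.
fitted(model)
```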
These values represent the predicted probabilities of being 1. For example, the model gives about a 15% chance that Participant 1 is using OS 1, based on the variable derived by PCA. Thus, Participant 1 is more likely to be using OS 0, which agrees with the survey response. In this way, PCA can be combined with regression models to estimate the probability of a phenomenon or to make predictions.
A method with a similar concept and name that you may have heard of is factor analysis. I explain the difference between PCA and factor analysis on the factor analysis page.