Principal Component Analysis (PCA)


Principal Component Analysis (PCA) is a powerful tool when you have many variables and you want to look into what these variables can explain. As its name suggests, PCA finds the combination of your variables which best explains the phenomena. In this sense, PCA is useful when you want to reduce the number of variables. One common scenario is that you have n variables and you want to combine them into 3 or 4 variables without losing much of the information the original data have. More mathematically, PCA tries to find linear projections of your data which preserve the information your data have.

PCA is one of the methods you may want to try if you have a lot of Likert data and want to understand what these data tell you. Let's say we asked the participants four 7-point Likert questions about what they care about when choosing a new computer, and got results like this.

  • Price: A new computer is cheap to you (1: strongly disagree – 7: strongly agree),
  • Software: The OS on a new computer allows you to use software you want to use (1: strongly disagree – 7: strongly agree),
  • Aesthetics: The appearance of a new computer is appealing to you (1: strongly disagree – 7: strongly agree),
  • Brand: The brand of the OS on a new computer is appealing to you (1: strongly disagree – 7: strongly agree)

Now what you want to find out is what combination of these four variables can explain the phenomena you observed. I will explain this with an example in R.

R code example

Let's prepare the same data shown in the table above.

Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)
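As a quick sanity check (my addition, not part of the original analysis), you can look at the pairwise correlations before running PCA; PCA is most useful when some of the variables are strongly correlated, as they appear to be here.

```r
# Correlation check before PCA (illustrative addition to the tutorial)
Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)

# Price tends to move against Aesthetics and Brand, which move together
round(cor(data), 2)
```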

At this point, data looks pretty much the same as the table above. Now, we do PCA. In R, there are two functions for PCA: prcomp() and princomp(). prcomp() computes the components through a singular value decomposition of the (centered, and optionally scaled) data, whereas princomp() uses an eigendecomposition of the covariance matrix, or of the correlation matrix if you specify cor=T. In practice the two give essentially the same results (up to possible sign flips of the loadings), and the results gained from princomp() have some features convenient for us, so here I use princomp().
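If you want to check the similarity of the two functions yourself, here is a quick sketch (my addition). With scale.=TRUE, prcomp() standardizes the variables, matching cor=T in princomp(); note that prcomp() stores the scores in $x and the loadings in $rotation rather than $scores and $loadings.

```r
# Comparing princomp() and prcomp() on the same data (illustrative addition)
Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)

pca1 <- princomp(data, cor=T)      # eigendecomposition of the correlation matrix
pca2 <- prcomp(data, scale.=TRUE)  # SVD of the standardized data

pca1$sdev      # component standard deviations
pca2$sdev      # should be essentially identical
pca2$rotation  # loadings; whole columns may have flipped signs vs. princomp()
```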

pca <- princomp(data, cor=T)
summary(pca, loadings=T)

And here is the result of the PCA.

Importance of components:
                          Comp.1    Comp.2    Comp.3     Comp.4
Standard deviation     1.5589391 0.9804092 0.6816673 0.37925777
Proportion of Variance 0.6075727 0.2403006 0.1161676 0.03595911
Cumulative Proportion  0.6075727 0.8478733 0.9640409 1.00000000

Loadings:
           Comp.1 Comp.2 Comp.3 Comp.4
Price      -0.523         0.848
Software   -0.177  0.977        -0.120
Aesthetics  0.597  0.134  0.295 -0.734
Brand       0.583  0.167  0.423  0.674

I will explain how to interpret this result in the next section.

Interpretation of the results of PCA

Let's take a look at the table of loadings, which are the coefficients for the “new” variables.

           Comp.1 Comp.2 Comp.3 Comp.4
Price      -0.523         0.848
Software   -0.177  0.977        -0.120
Aesthetics  0.597  0.134  0.295 -0.734
Brand       0.583  0.167  0.423  0.674

From this table of loadings, PCA found four new variables, Comp.1 to Comp.4, which together can explain the same information as the original four variables (Price, Software, Aesthetics, and Brand). Comp.1 is calculated as follows:

Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand
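You can verify this formula by hand: standardize each variable and multiply by the loadings. One subtlety (my addition): princomp() standardizes using the population standard deviation (divisor n), while R's scale() uses the sample standard deviation (divisor n - 1), so a small correction factor is needed to reproduce the scores exactly.

```r
# Reconstructing Comp.1 by hand (illustrative addition to the tutorial)
Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)
pca <- princomp(data, cor=T)

# Standardize with divisor-n standard deviations to match princomp()
n <- nrow(data)
z <- scale(data) * sqrt(n / (n - 1))

# Apply the Comp.1 loadings by hand
comp1 <- as.vector(z %*% pca$loadings[, 1])

# comp1 should line up with pca$scores[,1] (possibly with all signs flipped,
# since the sign of a whole component is arbitrary)
cor(comp1, pca$scores[, 1])
```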

Thus, PCA successfully found a new combination of the variables, which is good. The next thing we want to know is how much power each of the new variables has to explain the information that the original data have. For this, you need to look at Standard deviation and Cumulative Proportion (of Variance) in the result.

                      Comp.1 Comp.2 Comp.3 Comp.4
Standard deviation      1.56   0.98   0.68   0.38
Cumulative Proportion   0.61   0.85   0.96   1.00

Standard deviation is the standard deviation of each new variable. PCA calculates the combination of the variables such that the new variables have large standard deviations; thus, generally, a larger standard deviation means a more useful variable. A common heuristic is to take all the new variables whose standard deviations are roughly over 1.0 (so here we take Comp.1 and Comp.2).

Another way to determine how many new variables to take is to look at the cumulative proportion of variance, which describes how much of the information in the original data can be captured by the combination of the new variables. For instance, with only Comp.1, we can describe 61% of the information the original data have. If we use Comp.1 and Comp.2, we can describe 85% of it. A cumulative proportion of roughly 80% is generally considered to describe the data well. So, in this example, we can take Comp.1 and Comp.2, and ignore Comp.3 and Comp.4.
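A common way to visualize this decision (my addition to the tutorial) is a scree plot, which R provides via screeplot(); you look for the point where the variances level off and keep the components before it.

```r
# Scree plot and the numbers behind the two heuristics (illustrative addition)
Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)
pca <- princomp(data, cor=T)

screeplot(pca, type="lines")  # variances of Comp.1 - Comp.4; look for the bend

pca$sdev                              # standard deviations for the > 1.0 heuristic
cumsum(pca$sdev^2) / sum(pca$sdev^2)  # cumulative proportion of variance
```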

In this manner, we can reduce the number of variables (in this example, from 4 variables to 2). Your next task is to understand what the new variables mean in the context of your data. As we have seen, the first new variable can be calculated as follows:

Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand

It is a very good idea to plot the data to see what this new variable means. You can use pca$scores to take the values of each new variable modeled by PCA.

plot(pca$scores[,1])
barplot(pca$scores[,1])

With the graphs (sorry, I was kinda lazy and did not upload them, but you can quickly generate them by yourself), you can see that Participants 1 - 8 get negative values, and the other participants get positive values. It seems that this new variable indicates whether a user cares about Price and Software or about Aesthetics and Brand for her computer. So, we could name this variable a “Feature/Fashion index” or something similar. There is no definitive answer for this part of PCA; you need to go through your data and make sense of what the new variables mean by yourself.

PCA and Logistic regression

Once you have done the analysis with PCA, you may want to look into whether the new variables can predict some phenomena well. This is kinda like machine learning: whether the features can classify the data well. Let's say you asked the participants one more thing in your survey, namely which OS they are using (Windows or Mac), and the results are like this.


Here, what we are going to do is see whether the new variables given by PCA can predict the OS people are using. OS is 0 or 1 in our case, which means the dependent variable is binomial; thus, we are going to do logistic regression. I will skip the details of logistic regression here. If you are interested, the details of logistic regression are available on a separate page.

First, we prepare the data about OS.

OS <- c(0,0,0,0,1,0,0,0,1,1,0,1,1,1,1,1)

Then, fit the first variable we found through PCA (i.e., Comp.1) to a logistic function.

model <- glm(OS ~ pca$scores[,1], family=binomial)
summary(model)

Now you get the logistic function model.

Call:
glm(formula = OS ~ pca$scores[, 1], family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.19746  -0.44586   0.01932   0.60018   1.65268

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     -0.08371    0.74216  -0.113   0.9102
pca$scores[, 1]  1.42973    0.62129   2.301   0.0214 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 22.181  on 15  degrees of freedom
Residual deviance: 12.033  on 14  degrees of freedom
AIC: 16.033

Number of Fisher Scoring iterations: 5

Let's see how well this model predicts the kind of OS. You can use the fitted() function to see the predictions.

fitted(model)
         1          2          3          4          5          6          7
0.15173723 0.04159449 0.34968733 0.04406133 0.25520745 0.07808633 0.02649166
         8          9         10         11         12         13         14
0.21744454 0.89433079 0.93612411 0.91057994 0.73428648 0.85190931 0.76285170
        15         16
0.78149889 0.96410841

These values represent the probability of being 1. For example, based on the variable derived by PCA, we expect a 15% chance that Participant 1 is using OS 1. Thus, in this case, Participant 1 is more likely to be using OS 0, which agrees with the survey response. In this way, PCA can be used with regression models for calculating the probability of a phenomenon or making a prediction.
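To summarize how well the model does overall (my addition, not part of the original tutorial), you can threshold these probabilities at 0.5 and compare the hard predictions with the actual responses.

```r
# Classification accuracy of the PCA + logistic regression model
# (illustrative addition to the tutorial)
Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)
pca <- princomp(data, cor=T)
OS <- c(0,0,0,0,1,0,0,0,1,1,0,1,1,1,1,1)
model <- glm(OS ~ pca$scores[,1], family=binomial)

# Predict OS 1 when the fitted probability exceeds 0.5
predicted <- ifelse(fitted(model) > 0.5, 1, 0)
table(predicted, OS)   # confusion matrix
mean(predicted == OS)  # fraction classified correctly
```

With the fitted values shown above, only Participants 5 and 11 are misclassified, so 14 of the 16 participants are predicted correctly.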

Difference between PCA and Factor Analysis

A concept with a similar name to PCA that you may have heard of is Factor Analysis. I explain the difference between PCA and Factor Analysis on the factor analysis page.


Guest, 2015/01/06 08:04
Please, what software can I use for this analysis? Can I use SPSS or Excel?
Annyanka, 2015/02/25 00:26
Thank you for your post. It was very helpful! I was wondering: if one uses prcomp(), would your pca$scores[,1] be equivalent to pca$x[,1] for prcomp()?
Jorge G Casanovas PhD, 2015/03/17 08:19
Excellent, just solved my question on GLM with PCA factors as independent variables on R. Thanks and congratulations for the Blog.
Scott, 2015/07/13 18:31
Could you please upload the image for plot(pca$scores[,1]) and barplot(pca$scores[,1])?

Some description of the images would be great, too! Thanks!
Nelson, 2016/01/20 08:50
This is an excellent tutorial on PCA.
Would you explain in a bit more detail how to apply variables 1 and 2? To my understanding,

Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand
Comp.2 = 0 * Price + 0.977 * Software + 0.134 * Aesthetics + 0.167 * Brand

I understand how to apply the components individually, but I cannot figure out how to apply both (comp.1 & comp.2)
Guest, 2016/03/14 08:13
I have around 150 independent variables which are highly correlated. I want to run clogit. I tried PCA to reduce the number of variables, but when I run the clogit with PC scores, the predicted value is exactly one. How many PCs should be considered for regression?
Raghuram , 2017/02/13 11:35

I have gone through the entire article on PCA. I understood that in your example PC1 and PC2 are significant after considering sdev and cumulative variance. What is the next step? Do I need to append these PC1 and PC2 at the end of my original dataset? Or what should I do knowing PC1 and PC2?
hcistats/pca.txt · Last modified: 2014/08/14 05:24 by Koji Yatani
