User Tools

Site Tools


Generalized Linear Model (GLM)


Generalized Linear Model (GLM) is a more generic framework that supports statistical analysis with some sort of linear formation of a model. You can think that GLM is like an upper class of linear regression and logistic regression. These regressions are special cases of GLM. In GLM, your model should look like below.


where f is an function. In other words,


where is the inverse function of f. What this formula means: we first do linear calculation with the coefficients ( and independent variables (x). Then, we apply transformation to it by using the function , and the result is the predicted dependent variable. So, if , it is linear regression. If , it is logistic regression. By the way, is called the inverse-logit function, and the logit function is x / (1 - x). So far so good? :)

The nice thing about GLM is that we can handle other data distributions than the normal distribution by putting an appropriate function to as we do in logistic regression. With logistic regression, the data distribution of the dependent variable becomes the probability for binomial data. And there are some common data distributions and function used in GLM.

  • Poisson model: The data distribution of the dependent variable is Poisson, and (this means that the transformation for the dependent variable is the exponential function). This model is used for count data which are non-negative integers.
  • logistic-binomial model: This uses the inverse-logit function for the transformation () and uses the binomial distribution for the dependent variable. This model is used when the dependent variable represents the number of trials of interest out of the all trials (e.g., the count of the trials in which the participants successfully completed the tasks).
  • Multinomial logit and probit model: This is an extended model of logistic and probit regressions for ordinal or categorical data with more than two options. This model uses the logit or probit function for transformation, and uses the multinomial distribution for the dependent variable.

There are more models used in GLM , but the the models above are probably useful for an analysis in the context of HCI research, so I am going to explain these models in this page.

One thing you should note that the methods explained in this page does not cover multilevel models (often referred as mixed-effect models). If I explain it with terms which are more familiar to HCI folks, you need to use slightly different procedures to accommodate repeated-measure factors. This will be explained in a separate page, but understanding GLM is definitely useful to understand multilevel models.

Poisson model

This model uses the Poisson distribution. It is well known that the Poisson distribution represents the probability of the number of events occurrence within a period of time. For instance, the number of car accidents in a city may follow the Poisson distribution.

R code example

Let's take a look at an example of the following hypothetical data. You have studied how many emails 20 smart phone users (Device A and Device B) sent to someone from their mobile devices during the period of the study. You have the average number of the emails per day for each participant, and the data are like this.

ParticipantDeviceUnlimited Data PlanEmail counts
P1Device Ayes10
P2Device Ano5
P3Device Ayes9
P4Device Ayes4
P5Device Ayes6
P6Device Ayes14
P7Device Ano2
P8Device Ayes3
P9Device Ano8
P10Device Ayes10
P11Device Bno11
P12Device Byes15
P13Device Bno4
P14Device Byes5
P15Device Byes7
P16Device Byes9
P17Device Byes12
P18Device Bno6
P19Device Bno5
P20Device Bno8

In addition to the different devices, you also recorded whether each participant had an unlimited data plan, which you expected is likely to affect the number of emails sent from the mobile devices. So, now your hypothesis is that the number of emails sent from mobile devices can be influenced by the difference of the devices and whether the participants had an unlimited data plan. OK, it's time to do regression! First, prepare the data.

Device <- factor(c('A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B')) DataPlan <- factor(c('Y','N','Y','Y','Y','Y','N','Y','N','Y','N','Y','N','Y','Y','Y','Y','N','N','N')) Count <- c(10,5,9,4,6,14,2,3,8,10,11,15,4,5,7,9,12,6,5,8) data <- data.frame(Device, DataPlan, Count)

Please note that Device and DataPlan are treated as categorical data here. If your independent variable is continuous, you don't have to use factor(), of course. We the run Poisson regression.

model <- glm(Count ~ Device + DataPlan, data=data, family=poisson) summary(model) Call: glm(formula = Count ~ Device + DataPlan, family = poisson, data = data) Deviance Residuals: Min 1Q Median 3Q Max -1.9870 -0.9788 -0.1896 0.7300 1.9684 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.6702 0.1812 9.219 <2e-16 *** DeviceB 0.2187 0.1651 1.324 0.1855 DataPlanY 0.3923 0.1765 2.222 0.0263 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 32.725 on 19 degrees of freedom Residual deviance: 26.802 on 17 degrees of freedom AIC: 108.43 Number of Fisher Scoring iterations: 4

So the deviance has been improved by about 6 points, which is not too bad. And the model we have is as follows.


where DeviceB = 1 if the participant used Device B, and otherwise 0, and DeviceY = 1 if the participant had an unlimited data plan, and otherwise 0. Now we look at the exponential part, which is


Let's take a deeper look at this model. First, we see the difference between Device A and Device B. If DeviceB = 1, u will have a exp(0.2187) = 1.24 times increment. So the difference of the devices may have an effect on encouraging people to send more emails; however, DeviceB is not statistically significant (p = 0.19), so we cannot strongly argue that there is such an effect, and we probably need more data.

How about the effect of the data plan? If DataPlanY = 1, u will have a exp(0.3923) = 1.48 times increment, which sounds something. Moreover, DataPlanY is statistically significant (p <0.05). Thus, we can think that an unlimited data plan has a positive effect on the number of emails sent from the mobile devices.


So, it seems that the data plan matters for the usage of emails on mobile devices, but there is a concern about this analysis, not in terms of the method itself, but in terms of the observation. Users who had unlimited data plans happened to use emails more often than users who don't have unlimited data plans. If this is the case, our analysis is not correct. If you have noticed this before, you are a very sharp reader :). Yes, it is a concern, and we have a solution for it. Let's say you also measure how many emails each participant sent from all types of devices they used (e.g., desktop machines, laptops, internet tablets, etc), which are shown as “Total email counts” in the table below.

ParticipantDeviceUnlimited Data PlanEmail countsTotal email counts
P1Device Ayes1012
P2Device Ano515
P3Device Ayes913
P4Device Ayes45
P5Device Ayes68
P6Device Ayes1417
P7Device Ano210
P8Device Ayes36
P9Device Ano822
P10Device Ayes1014
P11Device Bno1120
P12Device Byes1518
P13Device Bno411
P14Device Byes511
P15Device Byes712
P16Device Byes913
P17Device Byes1228
P18Device Bno610
P19Device Bno514
P20Device Bno810

We use this “Total email count” as a baseline for each participant. This baseline is called offset in GLM. With the offset, the model becomes like this.


where j is the index of the data point (y_j means the dependent variable of j-th data point). This model allows us to have an adjustment (offset) for each data point. More intuitively, we are going to use the rate of the number of sent emails from the mobile devices over the total number of the sent emails. Thus, x_ji is the rate, and o_j is the total number of the sent emails. We can do this in the glm() function by using the offset option. But you have to calculate its logarithmic.

TotalCount <- c(12,15,13,5,8,17,10,6,22,14,20,18,11,11,12,13,28,10,14,10) data <- data.frame(Device, DataPlan, Count, TotalCount) model <- glm(Count ~ Device + DataPlan, data=data, family=poisson, offset=log(TotalCount)) summary(model) Call: glm(formula = Count ~ Device + DataPlan, family = poisson, data = data, offset = log(TotalCount)) Deviance Residuals: Min 1Q Median 3Q Max -1.5878 -0.5418 0.1302 0.6890 1.5710 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.81515 0.17070 -4.775 1.79e-06 *** DeviceB -0.01994 0.16235 -0.123 0.9022 DataPlanY 0.41366 0.17353 2.384 0.0171 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 18.135 on 19 degrees of freedom Residual deviance: 12.132 on 17 degrees of freedom AIC: 93.765 Number of Fisher Scoring iterations: 4

So, we still have a significant effect of the data plan.


It's not completely done even if you consider the offset. Another problem you may have is overdispersion. We have built a model with some (often very) strong assumptions: we only have two factors and the distribution of the dependent variable follows the Poisson distribution. Thus, the predicted values by our model might be good enough to describe the variance we observed in the data. If this is the case, the model is not appropriate enough to use. This is what we call overdispersion, and often happens in Poisson regression.

To see whether the model suffers from overdispersion, we can calculate the sum of square of the standardized residuals (i.e., values of the difference between the observed values and predicted values), and compare it to the chi-squared distribution.

Here is the procedure. You first need to calculate the predicted value.

predict_count <- predict(model, type="response") z <- (Count - predict_count) / sqrt(predict_count) z 1 2 3 4 5 6 7 0.6944586 -0.6359551 0.1012997 0.3571653 0.2789212 0.7771620 -1.1530480 8 9 10 11 12 13 14 -0.5069579 -0.5565284 0.2056396 0.7887425 0.9282716 -0.3534704 -0.8253118 15 16 17 18 19 20 -0.3112302 0.1611358 -1.4864222 0.7977789 -0.4356561 1.7579941

Then, calculate the sum of square of z. If this is much larger than 1, it implies that you have overdispersion. We need two values: the sample size and the number of coefficients. In our case, they are 20 and 3, respectively.

sum(z^2)/(20 - 3) 0.7166947

We also can do a test with the chi-squared distribution.

1 - pchisq(sum(z^2), 20 - 3) 0.7888812

If this value is below 0.05, the test indicates that you have the dispersion. In this example, we don't seem to have the dispersion, which is good news.

What if we have the overdispersion? We then need to adjust the standard error for each coefficient by multiplying them by the root of the sum of square of z. For example, supposed that our sum of square of z turned out 10 in our example. The coefficient for DataPlanY is 0.41366 with the standard error 0.17353. This means that the coefficient is likely [0.23983, 0.58689]. With the adjustment for the overdispersion, the standard error becomes 0.17353 * sqrt(10) = 0.54875. Thus, the adjusted coefficient is likely [-0.13509, 0.96241], which means that it can be zero (which means that the effect of DataPlanY can be zero). Fortunately, you don't have to do this manually, and you can re-fit the model by using quasipoisson as follows.

model <- glm(Count ~ Device + DataPlan, data=data, family=quasipoisson, offset=log(TotalCount)) summary(model)

It also calculates the p value based on the adjustment for the overdispersion.

logistic-binomial model

This is basically the logistic regression, but we are going to use the binomial distribution for the dependent variable. In this way, we can model the number of trials of interest out of the all trials, like a success rate or an error rate.

R code example

Let's say you have a game application in which the participants can hit monsters by a mouse click on them. In your experiment, you set different time limits (e.g., 10, 20, 30, 40, 50, and 60 sec), and measured the success rate (i.e., the number of the monsters the participants could clicked during the trial out of the pre-defiend number of the monsters). Now, what you want to do is to create a model of the success rate with the game time limit. Let's say your data look like this.

ParticipantTimeClickedNot-clickedSuccess rate
P110 sec10900.10
P220 sec23770.23
P330 sec40600.40
P440 sec70300.70
P550 sec82180.82
P660 sec9640.96
P710 sec12880.12
P820 sec20800.20
P930 sec35650.35
P1040 sec58420.58
P1150 sec76240.76
P1260 sec90100.90

(Well, you will have much more data in reality, but I put 12 samples here to make the example simple.) Let's try to do logistic regression for this data. First, we prepare the data in R.

Time <- c(10,20,30,40,50,60,10,20,30,40,50,60) Success <- c(10,23,40,70,82,96,12,20,35,58,76,90) Failure <- 100 - Success GameResult <- cbind(Success, Failure)

Note that we prepare both the numbers of “Success” and “Failure”, and make it into one variable, called “GameResult”. In this way, the logistic regression is going to use the binomial distribution. Now, we do the same thing with the glm() function.

model <- glm(GameResult ~ Time, family = binomial) summary(model)

You get the result.

Call: glm(formula = GameResult ~ Time, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.17893 -0.58216 0.02050 0.50723 1.84797 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.142800 0.191896 -16.38 <2e-16 *** Time 0.091572 0.005147 17.79 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 491.313 on 11 degrees of freedom Residual deviance: 10.378 on 10 degrees of freedom AIC: 68.458 Number of Fisher Scoring iterations: 4

You can do a similar analysis as described in the previous section. For instance, the model greatly improves the deviance, which indicates that this model is very powerful. Time is statistically significant on predicting the success rate.

This model also often suffers from the overdispersion, so you have to be careful about it. You can find more details on how to calculate the overdispersion in the previous section. You can use quasibinomial to adjust the model for overdispersion.

Poisson model or logistic-binomial model

The Poisson model is similar to the logistic-binomial model, particularly in a sense that both treat count data, and you may wonder how we should choose which model to use. Here is the rule of thumb.

  • If the dependent variable does not have a upper limit, you should use the Poisson model.
  • Otherwise, you should use the logistic-binomial model.

Multinomial logit model

(coming soon).


The content in this page is largely what I learned from the following book. Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill. It is the best book I have read about regression analysis (well, I haven't read that many books, but I'm sure this is an excellent book).


Guest, 2015/08/06 15:00
in glm binomial model work fine but multinomial models does not supported. i need it.
Guest, 2016/08/27 09:46
Just a small correction.
The logit function is:
f(x) = ln[x/(1-x)]
f(x) = x/(1-x)
Jeoriupsfoups, 2017/05/27 21:52
All the managed hosting customers are offered with 30 days money back policy too. So remember to take the DLAB frequently and very early.Another benefit of hiring Southern Trust Property Management Professionals to manage the rentals is that your tenants will always have somebody to talk to every time a problem arises in the apartments.Pointed in the Online world is the greatest alternative as it contains a chiropractic care database in order to find a better localized chiropractic doctor's business for Sarasota. <a href=>Wholesale NFL Jerseys</a>. So ask those who have taken it and get an up to date assessment on what to anticipate.<a href=>Cheap Jerseys</a>.<a href=>Wholesale Jerseys From China</a>. The test is based off a made up language and requires you to respond to questions by shifting the grammar structure of the sentence. That should be appealing for that youthful end users to seem greatly remarkable with their skullcandy with robotic concept or with skullcandy earphones that happen to be served with striking shades. You can still look for different types that may work well with you but remember to talk to people who knows what they are doing. Such people can choose monthly subscription option and divert part of their finance to develop their site. They show great interest with their studies and to go along with other students and staff. They can present you very good insight as to what to expect.Uncovering Real World Antivirus On line Protection Policies 2021Maloof Coy &nbsp;&nbsp;Submitted 2014 06 02 03:57:06 Balloon printing with customized printouts are moreover Good direction of advertising your office environment.<br>Visit my site锟斤拷
Jeoriupsfoups, 2017/06/02 11:46
Rings and earring must also be removed before the start of match. If you think the men and ladies are inventive in promoting school spirit, wait til you get a load of the little ones they have with them to the games. In front of the midfielders there are forwards and centers. With help from inner pistons it forces the brake pads against the rotors. <a href=>Wholesale Cheap Jerseys Free Shipping</a>. It would be futile to force things because you simply do not have the financial capacity to do it. They also offer retirement offers, 401k and life insurance. Heading can be painful in beginning because you must hit the ball with right part of your head. Car brake repair knowledge is always good to have because you by no means know if or when your car’s brake system will allow you to down. When the army shows up, payments willget fed. If you website does not look like it is professionally created it will steer people away. Each to do with going to be the even when enthusiasts sat on the basis of observing going to be the back again and fourth banter and also all your family members tube feature films so that you have Toney and Jackson, an authentic UFC battle has as an all in one final point been organized enchanting Rampage."At that time, I was just listening for offers,<a href=>Wholesale NFL Jerseys China</a>, to see what was outthere,<a href=>Wholesale Cheap Jerseys China</a>," Beltre said.<br>Visit my site
Jeoriupsfoups, 2017/06/15 04:07
Many buyers spend much attentionto the vamp material and ignore the lining, which is not a good idea. Working professionals can also consult counselors for developing adequate communication skills.S.Use a natural language when selling your football exactly where most of us look at hunting web site. <a href=>Wholesale Soccer Jerseys</a>.<a href=>Wholesale Soccer Jerseys From China</a>. Vin Dicarlo's plan contains numerous video clips as well as recommendations and there's no problem that it is amazingly thorough in addition to well written method.(I recently discovered that my elderly mother, suffering from a variety of ailments, including dementia, was on 17 different medications. To remedy this, use a cleanser and toner in your face before you go to bed. It's better that will schedule your article submission to every single article directory hence your articles may appear on top pages of that directories. Otherwise, you could pull a muscle and be out of commission for a while. Many of these find bad or cheap stethoscopes, which usually break after a few uses.The Ohio even has a college campus for teaching International Studies; whereby students are taught educational programs that include East san, Latin,<a href=>Wholesale Jerseys Cheap</a>, European, African and Slavic courses.<br>Visit my site:
Jeoriupsfoups, 2017/06/17 05:08
Above all, don't stop breastfeeding! The breasts must be drained. <a href=>Cheap Jerseys Wholesale</a>. Hence, in turn a bad sanitation might initiate issues, which will remain with you always and may even instigate major harm to you.It does not accede coffee weight loss, but it does promise FAT LOSS which is what we are really after!It is also a low glycemic, and diabetic friendly blend."In 31 appearances, he is 2-6 with a 4. The overhead is essentially a doubles shot,<a href=>Cheap Jerseys</a>, because in singles the chances of passing the net man are greater than lobbing over his head, while in doubles two men cover the net so easily that the best way to open the court is to lob one man back. These are just two of the recommended supplements for aging dogs. Truly people obtain confused regarding which tattoo supplies is the best as well as which tattoo machines kit to start with is easily the most appropriate for all of them. Aside from that,<a href=>Cheap Jerseys From China</a>, they can also add features that are going to be useful to you. You do not want your hair to get in your busy days ahead.Great clothing is just the start with fashion.<br>Visit my site锟斤拷
Jeoriupsfoups, 2017/06/20 14:45
Truly people obtain confused regarding which tattoo supplies is the best as well as which tattoo machines kit to start with is easily the most appropriate for all of them.He has a limited no-trade clause and Texas must decide if they want to continue to use him as a super utilityman or if his offensive production isn't high enough. In performing my exploration on coffee weight loss I discovered much conflicting information as well, <a href=>Wholesale NFL Jerseys</a>, but most troubling was the significant fact that for most all of the product claims made, there was no proof to be recognized… no long term exploration or clinical trials,<a href=>Wholesale China Jerseys</a>, just generalities inferring that on the grounds that this compound has been demonstrated to elevate weight loss by one means or another, if you introduce it into coffee the splendid result is coffee weight loss. If you are consideringstarting your very own tattoo shop, or if you simply wish to improve your talents like a tattoo artist as well as pamper yourself with beautiful tattoos, the most issues to keep in mind is that you look for a good tattoo supply. They also need certain vitamins and minerals to maintain a healthy body.51 ERA this season.<a href=>Wholesale NFL Jerseys China</a>. They will burn her in her shift. This often helps mothers retain that special bond even at times like this.natyanjali. This will help stimulate and maintain a good milk flow.<br>Visit my site
Jeoriupsfoups, 2017/06/21 07:43
”“You cowards! Gareth! Gaheris! Make him to do something. This is a technical field, <a href=>Cheap NFL Jerseys Free Shipping</a>, and should be done to a high standard to ensure that the response is good and the system is available when necessary.In order to improve your situation there a number of actions you can take which will not only improve your situation but also make them great experiences too.Even if you have the knowledge. The symptoms resemble the flu: fatigue, fever,<a href=>Cheap Jerseys Free Shipping</a>, chills, and aches.Additionally, a benefit of this program is the accessibility of the information. The smallest aspects within an institution's setting are the factors that affirm or ruin its graphic. Once you’re happy with your pump you can start preparing to express An important point that may often be missed by mother is to be relaxed, in body and in mind. Not everyone is aware of potential risks and risk factors for dogs.<a href=>Cheap NFL Jerseys China</a>.In case your chapter hazes, this suggestion almost certainly won't be extremely handy.<br>Visit my site
Please enter your comment. You cannot remove your comments by yourself. So double-check before you submit.:
If you can't read the letters on the image, download this .wav file to get them read to you.
hcistats/glm.txt · Last modified: 2014/08/14 05:27 by Koji Yatani

Page Tools