hcistats:datatype

There are four kinds of data you encounter in an analysis: **Nominal** (Categorical), **Ordinal**, **Interval**, and **Ratio**.

**Nominal**: Data that indicate categories. There is no notion of ordering in nominal data. Examples are *Techniques (technique A, technique B, …)*, *Gender (male and female)*, and *Occupation (students, professional programmers, …)*.

**Ordinal**: Data that you can order (there is a notion of smaller or larger), but for which the difference between two consecutive values is not necessarily equal. Responses to a Likert-scale question are an example.

**Interval**: Data that you can order and for which the difference between two consecutive values is the same everywhere on the scale, but which have no absolute zero, so negative values are meaningful. The best-known example is temperature in C or F. 0 C and 0 F are arbitrarily defined, and negative values can be used with these units. But the difference between 0 C and 1 C is the same as the difference between 100 C and 101 C.

**Ratio**: Data that you can order, for which the difference between two consecutive values is the same everywhere on the scale (so they are also interval), and which have an absolute zero. This means that a meaningful negative value of ratio data does not exist (in statistics). Examples are weight, height, length, time, speed, and error rate. Counts can often be treated as ratio data.
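The practical difference between interval and ratio data can be checked with a little arithmetic. The sketch below is only an illustration with made-up values (the helper names are not from any library): the ratio of two temperatures changes when you switch units, while the ratio of two lengths does not.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def metres_to_centimetres(m):
    """Convert a length from metres to centimetres."""
    return m * 100


# Interval data: the ratio of two temperatures depends on the unit,
# because the zero point of each scale is arbitrary.
warm_c, cool_c = 20.0, 10.0
ratio_c = warm_c / cool_c
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Differences are still meaningful: a 1-degree step in C is the same
# size in F (about 1.8 degrees) wherever it sits on the scale.
gap_low_f = celsius_to_fahrenheit(1.0) - celsius_to_fahrenheit(0.0)
gap_high_f = celsius_to_fahrenheit(101.0) - celsius_to_fahrenheit(100.0)

# Ratio data: the ratio of two lengths is the same in any unit,
# because zero length means "no length" in every unit.
long_m, short_m = 2.0, 1.0
ratio_m = long_m / short_m
ratio_cm = metres_to_centimetres(long_m) / metres_to_centimetres(short_m)

print("temperature ratio in C:", ratio_c)
print("temperature ratio in F:", round(ratio_f, 2))
print("length ratio in m and cm:", ratio_m, ratio_cm)
```

This is why a statement like "twice as hot" is not meaningful for C or F, while "twice as long" is meaningful for length.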

The different types of data support different mathematical operations. In the table, O marks an operation that is meaningful for that type of data and X marks one that is not.

| Operations | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Frequency count, mode, chi-square | O | O | O | O |
| Median, percentile | X | O | O | O |
| Add, subtract, mean, variance, correlation, regression | X | X | O | O |
| Geometric/harmonic mean, coefficient of variation, logarithms | X | X | X | O |
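One way to keep the table at hand during an analysis is to encode it as a lookup. The following is a minimal sketch using only Python's standard `statistics` module; the names `LEVELS`, `MINIMUM_LEVEL`, `is_allowed`, and `allowed_operations` and the sample data are hypothetical, and only one or two operations per table row are included.

```python
import statistics

# Types of data ordered from the fewest to the most allowed operations.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# The lowest type of data for which each operation is meaningful,
# taken from the rows of the table (a subset of the operations only).
MINIMUM_LEVEL = {
    "mode": "nominal",
    "median": "ordinal",
    "mean": "interval",
    "variance": "interval",
    "geometric_mean": "ratio",
}


def is_allowed(operation, level):
    """Return True if the operation is meaningful for the given type of data."""
    return LEVELS.index(level) >= LEVELS.index(MINIMUM_LEVEL[operation])


def allowed_operations(level):
    """List the operations that are meaningful for the given type of data."""
    return [op for op in MINIMUM_LEVEL if is_allowed(op, level)]


# Hypothetical nominal data: only counting and the mode make sense.
occupations = ["student", "programmer", "student", "designer"]
most_common = statistics.mode(occupations)

# Hypothetical ratio data (performance time in msec): everything is allowed,
# including the geometric mean.
times_ms = [820.0, 905.0, 760.0, 1010.0]
typical_time = statistics.geometric_mean(times_ms)

print(allowed_operations("ordinal"))
print(most_common, round(typical_time, 1))
```

Calling `allowed_operations` with the type of your dependent variable gives a quick reminder of which summaries you can report for it.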

Thus, ratio data allow the most mathematical operations, followed by interval, ordinal, and nominal data. **It is better to design your experiment so that your dependent variable (what you measure) is ratio**, because that lets you run the widest variety of analyses.

Statistical tests also distinguish two kinds of variables.

**Independent variables**: The dimensions you are testing the data against.

**Dependent variables**: The values you are using for your test. Usually you have only one dependent variable per test.

Let's say you are comparing the performance time of two interaction techniques (Technique A and Technique B). Your independent variable is *techniques*, which is nominal: you are comparing performance time across the techniques, and there is no concept of ordering for the techniques. Your dependent variable is *performance time* (msec), which is ratio: you can order times, the millisecond is an equal unit, and zero means no elapsed time. **It is important to figure out which dependent and independent variables you use and what types of data they are before jumping into any kind of statistical test.** The types of data determine the statistical methods you can use. In particular, **it is generally a good idea to make your dependent variable interval or ratio** because it allows you to do a wider variety of statistical analyses than nominal or ordinal data do.
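The technique comparison can be laid out in code before any test is chosen. This is a minimal sketch with made-up trial data; `trials` and `summarize_by_group` are hypothetical names. Because the dependent variable is ratio, the mean and standard deviation are meaningful summaries for each level of the nominal independent variable.

```python
import statistics
from collections import defaultdict

# Hypothetical trials: (technique, performance time in msec).
# "technique" is the nominal independent variable;
# "performance time" is the ratio dependent variable.
trials = [
    ("A", 812.0), ("A", 790.0), ("A", 845.0), ("A", 801.0),
    ("B", 903.0), ("B", 877.0), ("B", 921.0), ("B", 890.0),
]


def summarize_by_group(records):
    """Group the dependent values by independent-variable level and summarize."""
    groups = defaultdict(list)
    for level, value in records:
        groups[level].append(value)
    return {
        level: {
            "n": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }
        for level, values in groups.items()
    }


summary = summarize_by_group(trials)
mean_difference = summary["B"]["mean"] - summary["A"]["mean"]

for level, stats in summary.items():
    print(level, stats["n"], stats["mean"], round(stats["stdev"], 1))
print("difference in means (B - A):", mean_difference)
```

Which statistical test to run on these two groups is a separate decision; the point here is only that the variable types are explicit before you make it.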

One thing you may need to consider is how to treat the data from your Likert-scale questions. If you can assume that the differences between any two adjacent options are equal, you can treat the responses as interval data. For instance, if your options are *strongly agree*, *agree*, *neutral*, *disagree*, and *strongly disagree*, you may be able to treat them as interval data. However, if your options are something like *use it every day*, *use it once a week*, *use it once a month*, *use it once a year*, and *have never used it*, it is probably safer to treat them as ordinal data.
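The two treatments lead to different summaries. The sketch below uses made-up responses and hypothetical names (`encode`, `AGREEMENT_SCALE`, `USES_PER_YEAR`); the uses-per-year figures are rough assumptions added only to show why the frequency options are unevenly spaced.

```python
import statistics

# Agreement scale, ordered from lowest to highest.
AGREEMENT_SCALE = [
    "strongly disagree", "disagree", "neutral", "agree", "strongly agree",
]


def encode(responses, scale):
    """Map each response label to its 1-based rank on the scale."""
    rank = {label: i + 1 for i, label in enumerate(scale)}
    return [rank[r] for r in responses]


# Hypothetical responses from five participants.
responses = ["agree", "neutral", "strongly agree", "agree", "disagree"]
codes = encode(responses, AGREEMENT_SCALE)

# Ordinal treatment: only the order is trusted, so report the median.
ordinal_summary = statistics.median(codes)

# Interval treatment: assumes equal spacing between options, so the mean is allowed.
interval_summary = statistics.mean(codes)

# Frequency options, with an assumed approximate number of uses per year.
USES_PER_YEAR = {
    "have never used it": 0,
    "use it once a year": 1,
    "use it once a month": 12,
    "use it once a week": 52,
    "use it every day": 365,
}
values = list(USES_PER_YEAR.values())
# Gaps between adjacent options are far from equal.
gaps = [later - earlier for earlier, later in zip(values, values[1:])]

print("codes:", codes)
print("median:", ordinal_summary, "mean:", interval_summary)
print("gaps between frequency options:", gaps)
```

The unequal gaps in the last list are the reason a mean of rank codes would be hard to interpret for the frequency question.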

hcistats/datatype.txt · Last modified: 2014/03/29 00:20 by Koji Yatani

## Discussion

"it is generally a good idea to make your dependent variable interval or ratio because it allows you to do a wider variety of statistical analyses than nominal or ordinal"

I would like to suggest adding a phrase after the word "because", such as "it is more powerful for analyzing data and", or something like that. As the order of the four variable types in the table (from left to right) suggests, the type with the most analytical power is ratio and the one with the least is nominal. One implication is that it is generally considered better not to convert an interval variable (e.g., IQ) or a ratio variable (e.g., age, as people often do with this variable) to an ordinal variable, because you will lose the analytical power of the interval/ratio variable, unless using the original variable is contraindicated.

However, there are some situations in which you would be better off choosing a dichotomous variable such as gender as the dependent variable. For example, you might want to predict a person's gender from explanatory variables related to a certain set of human behaviors, such as speed of learning (measured by detecting whether users click a set of options correctly) when a person is given a new (particular) piece of machinery, tool, computer program, or application (or to predict buy/no-buy of a product as the dependent variable, given gender, age, etc. as independent variables). You may want to analyze this using logistic regression. Or you may be interested in the hazard of failure a machine faces over time when machinists misuse it in a certain way or simply fail to maintain it properly, given the gender and age of the machinist among other covariates. In that case you may want to use survival analysis and compute a hazard ratio (of failure: failure vs. non-failure). These two analytical tools are just a couple of examples. They are much more powerful than computing odds ratios from 2x2 tables while ignoring important covariates: they give you, respectively, the odds ratio of being female per unit increase in speed (or the odds ratio of purchase for females contrasted with males), or the hazard ratio of machine failure over time when maintenance is not followed. They give you more insight into the given observation or experiment by considering many aspects of human behavior in the human-machine interface, as in the examples above.