Types of Data
There are four kinds of data you may encounter in an analysis: nominal (categorical), ordinal, interval, and ratio.
- Nominal: Data that indicate categories, with no notion of ordering. Examples are technique (technique A, technique B, …), gender (male, female), and occupation (student, professional programmer, …).
- Ordinal: Data that can be ordered (one value is smaller or larger than another), but the differences between adjacent values are not necessarily equal. Responses to a Likert-scale question are a common example.
- Interval: Data that can be ordered and in which equal differences between values mean the same thing, but which have no absolute zero. Because the zero point is arbitrary, negative values can be meaningful. The best-known example is temperature in degrees Celsius or Fahrenheit: 0 °C and 0 °F are artificially defined points, and negative temperatures are common, yet the difference between 0 °C and 1 °C is the same as the difference between 100 °C and 101 °C.
- Ratio: Data that have all the properties of interval data (ordered, with equal differences) plus an absolute zero, meaning "none of the quantity". As a result, ratios between values are meaningful (2 seconds is twice as long as 1 second), and a meaningful negative value of ratio data does not exist. Examples are weight, height, length, time (durations), speed, and error rate. Counts can often be treated as ratio data as well.
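The difference between interval and ratio data can be checked numerically. The sketch below is a minimal illustration; the helper names `celsius_to_fahrenheit` and `celsius_to_kelvin` are hypothetical, and the only facts it relies on are the standard conversion formulas. It shows that equal differences stay equal when you switch interval scales, while a ratio such as "20 °C is twice 10 °C" changes with the arbitrary zero, and a ratio of durations does not change with the unit.

```python
import math

def celsius_to_fahrenheit(c: float) -> float:
    """Convert between two interval scales (different unit size and zero)."""
    return c * 9 / 5 + 32

def celsius_to_kelvin(c: float) -> float:
    """Kelvin starts at absolute zero, so kelvin temperatures are ratio data."""
    return c + 273.15

# Equal differences stay equal on another interval scale:
# 0 -> 1 C and 100 -> 101 C are both a 1.8 F step.
step_low = celsius_to_fahrenheit(1) - celsius_to_fahrenheit(0)
step_high = celsius_to_fahrenheit(101) - celsius_to_fahrenheit(100)
print(math.isclose(step_low, step_high))  # True

# Ratios of interval values are not preserved: they depend on where zero is.
ratio_c = 20.0 / 10.0                                                  # 2.0
ratio_f = celsius_to_fahrenheit(20.0) / celsius_to_fahrenheit(10.0)   # 68 / 50
ratio_k = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)           # about 1.035

# Ratios of ratio data survive a change of unit: 2 s vs. 1 s is 2x in any unit.
ratio_seconds = 2.0 / 1.0
ratio_ms = 2000.0 / 1000.0
```

Only the kelvin ratio says anything physical about "how much hotter", because only kelvin has an absolute zero.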
Each type of data supports a different set of mathematical operations (O = meaningful, X = not meaningful):
| Operation | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Frequency count, mode, chi-square | O | O | O | O |
| Add, subtract, mean, variance, correlation, regression | X | X | O | O |
| Geometric/harmonic mean, coefficient of variation, logarithms | X | X | X | O |
Ratio data therefore support the widest range of mathematical operations, followed by interval, ordinal, and nominal data. If possible, design your experiment so that your dependent variable (what you measure) is ratio data, because this opens up the widest variety of analyses.
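Because each level of measurement supports everything the levels below it support, the operations table can be encoded as "the lowest level at which an operation becomes meaningful". The sketch below assumes that encoding; the names `Level`, `MIN_LEVEL`, `permits`, and `allowed_operations` are hypothetical and not part of any statistics library.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered from fewest to most permitted operations."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest level at which each row of the operations table is marked O.
MIN_LEVEL = {
    "frequency count, mode, chi-square": Level.NOMINAL,
    "add, subtract, mean, variance, correlation, regression": Level.INTERVAL,
    "geometric/harmonic mean, coefficient of variation, logarithms": Level.RATIO,
}

def permits(level: Level, operation: str) -> bool:
    """Return True if `operation` is meaningful for data at `level`."""
    return level >= MIN_LEVEL[operation]

def allowed_operations(level: Level) -> list:
    """List every operation group that is meaningful at `level`."""
    return [op for op, minimum in MIN_LEVEL.items() if level >= minimum]
```

For example, `permits(Level.ORDINAL, "add, subtract, mean, variance, correlation, regression")` is `False`, which is why a plain mean of ordinal responses needs justification.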
Statistical tests also involve two distinct kinds of variables:
- Independent variables: The factors you manipulate or group by, that is, the conditions you compare the data across.
- Dependent variables: The values you measure and analyze. Usually a single test has only one dependent variable.
Suppose you are comparing the performance time of two interaction techniques (Technique A and Technique B). Your independent variable is technique, which is nominal: you compare performance time across the techniques, and there is no ordering among them. Your dependent variable is performance time in milliseconds, which is ratio: times can be ordered, a millisecond is the same size everywhere on the scale, and zero time is an absolute zero. Before running any statistical test, identify your independent and dependent variables and the type of data each one is, because the types of data determine which statistical methods you can use.
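A minimal sketch of how the two variables play different roles in this example. The times below are made-up numbers for illustration only, not results from any study, and `mean_by_group` is a hypothetical helper. The nominal independent variable is used only to group the trials; arithmetic (here, a mean) is applied only to the ratio dependent variable.

```python
from statistics import mean

# Hypothetical performance times in milliseconds (illustrative, not real data).
# Independent variable: technique (nominal). Dependent variable: time (ratio).
trials = [
    ("Technique A", 512), ("Technique A", 498), ("Technique A", 530), ("Technique A", 475),
    ("Technique B", 430), ("Technique B", 455), ("Technique B", 441), ("Technique B", 462),
]

def mean_by_group(rows):
    """Group the dependent variable by each level of the independent variable."""
    groups = {}
    for technique, time_ms in rows:
        groups.setdefault(technique, []).append(time_ms)
    # Means are meaningful because time is ratio data; the technique labels
    # are only used as group keys, never in arithmetic.
    return {technique: mean(times) for technique, times in groups.items()}

means = mean_by_group(trials)
print(means)
```

A real comparison would go on to a significance test chosen for a nominal independent variable and a ratio dependent variable; the grouping step above is the same either way.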
One thing you may need to decide is how to treat data from Likert-scale questions. If you can assume that the differences between any two adjacent options are equal, you can treat the responses as interval data. For instance, with the options strongly agree, agree, neutral, disagree, and strongly disagree, treating the responses as interval data may be reasonable. However, with options such as use it every day, use it once a week, use it once a month, use it once a year, and have never used it, the gaps between options are clearly unequal, so it is safer to treat the responses as ordinal data.
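The two treatments lead to different summary statistics. The sketch below, under the assumption that options are coded by rank starting at 1 (a common convention, not something the scale itself dictates), reports a mean for the agreement scale (interval treatment) and a median category for the frequency-of-use scale (ordinal treatment). The response lists and the `encode` helper are hypothetical examples.

```python
from statistics import mean, median, median_low

# Agreement scale: gaps between options are plausibly equal, so interval
# treatment (rank codes 1..5, then a mean) may be defensible.
AGREEMENT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Frequency-of-use scale: gaps are clearly unequal, so only order is trusted.
FREQUENCY = [
    "have never used it",
    "use it once a year",
    "use it once a month",
    "use it once a week",
    "use it every day",
]

def encode(responses, scale):
    """Map each response label to its rank on the scale (1 = lowest)."""
    return [scale.index(r) + 1 for r in responses]

agreement_codes = encode(["agree", "agree", "neutral", "strongly agree", "disagree"], AGREEMENT)
agreement_mean = mean(agreement_codes)      # interval treatment
agreement_median = median(agreement_codes)  # ordinal alternative

frequency_codes = encode(
    ["use it once a week", "have never used it", "use it every day", "use it once a month"],
    FREQUENCY,
)
# Ordinal treatment: median_low always returns one of the observed codes,
# so the result maps back to an actual option label.
frequency_median = FREQUENCY[median_low(frequency_codes) - 1]
```

Using `median_low` rather than `median` avoids reporting a value halfway between two categories, which has no meaning when the gaps are unequal.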