Tips for R

This page collects commands and techniques that I have found useful for analyzing data quickly and effectively. If you are new to R, read the introductory PDF provided by the R Project first. Many other R materials are available online, so a web search is worthwhile. You may also find useful tips in Rtips (http://pj.freefaculty.org/R/Rtips.html).

Install packages

For example, to install the car package:

install.packages("car")

You will be asked to select one of the sites (mirrors) from which to download the package. R then downloads and installs the package for you.

Include packages

For example, to load the car package:

library(car)

R also loads the other packages that the package depends on.
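A shared script often needs to confirm that a package is installed before loading it. The following is a minimal sketch of one way to do that. The helper name load_or_stop is hypothetical (it is not part of R); requireNamespace() and library() are base R functions.

```r
# Hypothetical helper: attach a package only if it is already installed.
load_or_stop <- function(pkg) {
  # requireNamespace() returns TRUE or FALSE and does not attach the package.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed; run install.packages() first.")
  }
  # character.only = TRUE lets library() take the name from a variable.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

load_or_stop("stats")   # 'stats' ships with R, so this succeeds
```

Calling the helper with a name that is not installed stops with an error instead of silently continuing.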

Changing the working directory

You can check the current working directory with

getwd()

To change the working directory, use setwd(). On Windows, each backslash in the path must be doubled.

setwd("C:\\Users\\koji\\Documents")
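Because setwd() invisibly returns the previous working directory, a script can switch folders and then go back. The sketch below assumes tempdir() as a stand-in for a real project folder, and uses file.path() to build a path without typing doubled backslashes.

```r
# tempdir() stands in for a real project folder (an assumption for this sketch).
target <- tempdir()

old <- setwd(target)   # setwd() invisibly returns the previous directory
print(getwd())         # now inside the temporary folder
setwd(old)             # return to the original directory

# file.path() joins components with "/", which R also accepts on Windows.
docs <- file.path("C:", "Users", "koji", "Documents")
print(docs)
```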

Read/import a csv file

You can read a csv file with

dat = read.csv("data.csv", header=TRUE)

If your csv file does not have a header (i.e., a first row containing the names of the variables), use header=FALSE. The imported data (dat) is a dataframe.

Write/export a csv file

You can write a csv file with

write.csv(dat, "data.csv")
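One detail worth knowing is that write.csv() also writes the row names by default, so reading the file back yields an extra first column. The following sketch illustrates this with a small made-up dataframe and a temporary file (both are assumptions for the example).

```r
# Small made-up dataframe and a temporary file path for the example.
dat <- data.frame(A = c(1, 2, 3), B = c("x", "y", "z"))
path <- tempfile(fileext = ".csv")

write.csv(dat, path)                      # row names are written as a first column
with_rownames <- read.csv(path)
names(with_rownames)                      # "X" "A" "B"

write.csv(dat, path, row.names = FALSE)   # leave the row names out
back <- read.csv(path)
names(back)                               # "A" "B"
```

Passing row.names = FALSE is usually what you want when the file will be read back into R or opened in a spreadsheet.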

Manipulating a dataframe

A dataframe is a very useful structure for representing data in R. It is like a table, and it supports many different operations.

Creating a dataframe

You can use data.frame to create a dataframe.

A = c(1,1,1,1,1)
B = c(2,2,2,2,2)
C = c(3,3,3,3,3)
data = data.frame(A,B,C)
data
  A B C
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 1 2 3

You can also pick out a specific column by using $.

data$A
[1] 1 1 1 1 1

To take a row, index it like this.

data[1,]
  A B C
1 1 2 3

You can also write data[,1] to select a column (in this case, the first column). Please note that indices start at 1, not at 0 as arrays or lists do in many programming languages.
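Rows and columns can also be selected by name, by range, or by a logical condition. The sketch below rebuilds the same small dataframe so that it runs on its own.

```r
data <- data.frame(A = c(1, 1, 1, 1, 1),
                   B = c(2, 2, 2, 2, 2),
                   C = c(3, 3, 3, 3, 3))

data[2, "B"]           # a single cell: row 2, column B
data[, c("A", "C")]    # two columns selected by name
data[1:3, ]            # the first three rows
data[["A"]]            # the same vector as data$A
data[data$A == 1, ]    # rows where a condition holds (all five rows here)
```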

Adding and removing a column

You can add a column at the end by assigning to a new name.

data$D = c(4,4,4,4,4)
data
  A B C D
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 1 2 3 4
5 1 2 3 4

You can remove a column by assigning NULL to it (c() with no arguments also returns NULL).

data$C = NULL
data
  A B D
1 1 2 4
2 1 2 4
3 1 2 4
4 1 2 4
5 1 2 4

Changing the names of the columns

You can use colnames() to see or change the names of the columns.

colnames(data)
[1] "A" "B" "D"
colnames(data) = c("AA", "BB", "DD")
data
  AA BB DD
1  1  2  4
2  1  2  4
3  1  2  4
4  1  2  4
5  1  2  4

Adding and removing a row

You can add a row at the end with rbind(). This example starts again from the original dataframe with columns A, B, and C.

data = rbind(data, c(2,4,6))
data
  A B C
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 1 2 3
6 2 4 6

You can remove any specific row by putting "-" in front of the index. In this case, we are removing the sixth row (which we've just added).

data[-6,]
  A B C
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 1 2 3
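Negative indexing returns a copy and leaves the original dataframe unchanged, so the result has to be assigned back to keep the change. The sketch below uses a small made-up dataframe to show this, along with removing rows by condition.

```r
# Made-up three-row dataframe for the example.
data <- data.frame(A = c(1, 1, 2), B = c(2, 2, 4), C = c(3, 3, 6))

data[-3, ]                    # prints a copy without row 3
nrow(data)                    # still 3: data itself was not modified

data <- data[-3, ]            # assign the result to actually drop the row
data <- data[data$B != 4, ]   # a condition can be used instead of an index
nrow(data)                    # 2
```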

Using the summary function

summary() is a very useful function for getting general information about a variable.

G = factor(c("M","M","M","M","M","F","F","F","F","F"))
A = c(4,6,2,3,1,4,5,3,2,4)
B = c(5,7,2,3,6,5,7,7,4,5)
C = c(9,6,4,7,8,5,7,6,7,9)
data = data.frame(G,A,B,C)
summary(data)
 G           A              B              C
 F:5   Min.   :1.00   Min.   :2.00   Min.   :4.00
 M:5   1st Qu.:2.25   1st Qu.:4.25   1st Qu.:6.00
       Median :3.50   Median :5.00   Median :7.00
       Mean   :3.40   Mean   :5.10   Mean   :6.80
       3rd Qu.:4.00   3rd Qu.:6.75   3rd Qu.:7.75
       Max.   :6.00   Max.   :7.00   Max.   :9.00

You can see various statistics (e.g., the mean, median, min, and max) with summary(). Because G is a factor (i.e., nominal data), you only see the count for each level. This is a common way to use summary(), but the function reports different information depending on the kind of object you pass to it. So if you want some general information about a variable, try summary().
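To illustrate how the output of summary() depends on its argument, the sketch below calls it on a numeric vector, a factor, a character vector, and a fitted linear model. The variable names are made up for the example; cars is a dataset that ships with R.

```r
x <- c(4, 6, 2, 3, 1)
num_summary <- summary(x)          # numeric: min, quartiles, mean, max
print(num_summary)

g <- factor(c("M", "M", "F"))
fac_summary <- summary(g)          # factor: count for each level
print(fac_summary)

s <- c("M", "M", "F")
print(summary(s))                  # character: length, class, and mode only

fit <- lm(dist ~ speed, data = cars)
fit_summary <- summary(fit)        # model: coefficients, R-squared, and more
print(fit_summary)
```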

by() function

You can use the by() function to apply a specific function to each group in a dataframe. In the following example, I apply summary() to Group M and Group F.

G = factor(c("M","M","M","M","M","F","F","F","F","F"))
A = c(4,6,2,3,1,4,5,3,2,4)
B = c(5,7,2,3,6,5,7,7,4,5)
C = c(9,6,4,7,8,5,7,6,7,9)
data = data.frame(G,A,B,C)
by(data, data$G, summary)
data$G: F
 G           A             B             C
 F:5   Min.   :2.0   Min.   :4.0   Min.   :5.0
 M:0   1st Qu.:3.0   1st Qu.:5.0   1st Qu.:6.0
       Median :4.0   Median :5.0   Median :7.0
       Mean   :3.6   Mean   :5.6   Mean   :6.8
       3rd Qu.:4.0   3rd Qu.:7.0   3rd Qu.:7.0
       Max.   :5.0   Max.   :7.0   Max.   :9.0
---------------------------------------------------------
data$G: M
 G           A             B             C
 F:0   Min.   :1.0   Min.   :2.0   Min.   :4.0
 M:5   1st Qu.:2.0   1st Qu.:3.0   1st Qu.:6.0
       Median :3.0   Median :5.0   Median :7.0
       Mean   :3.2   Mean   :4.6   Mean   :6.8
       3rd Qu.:4.0   3rd Qu.:6.0   3rd Qu.:8.0
       Max.   :6.0   Max.   :7.0   Max.   :9.0
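When only one statistic per group is needed, tapply() and aggregate() (both in base R) return more compact results than by(). This is a sketch of that alternative using the same A and B values as above; it is a different technique, not a replacement for by().

```r
G <- factor(c("M","M","M","M","M","F","F","F","F","F"))
A <- c(4,6,2,3,1,4,5,3,2,4)
B <- c(5,7,2,3,6,5,7,7,4,5)
data <- data.frame(G, A, B)

# Mean of A within each level of G, returned as a named array.
mean_A <- tapply(data$A, data$G, mean)
print(mean_A)

# Means of A and B per group, returned as a dataframe with one row per group.
group_means <- aggregate(cbind(A, B) ~ G, data = data, FUN = mean)
print(group_means)
```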

Apply family

The following examples use a dataframe, but the functions in the apply family also work with a matrix or a list.

apply() function

You can do calculations for each row or column by using the apply() function. The second argument selects rows (1) or columns (2). The following example calculates the sum for each row and each column.

A = c(4,6,2,3,1,4,5,3,2,4)
B = c(5,7,2,3,6,5,7,7,4,5)
C = c(9,6,4,7,8,5,7,6,7,9)
data = data.frame(A,B,C)
apply(data, 1, sum)
 [1] 18 19  8 13 15 14 19 16 13 18
apply(data, 2, sum)
 A  B  C
34 51 68
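apply() accepts any function, including one written inline, and base R also provides rowSums(), colSums(), and colMeans() as shortcuts for the most common cases. A minimal sketch using the same data (the result variable names are made up for the example):

```r
A <- c(4,6,2,3,1,4,5,3,2,4)
B <- c(5,7,2,3,6,5,7,7,4,5)
C <- c(9,6,4,7,8,5,7,6,7,9)
data <- data.frame(A, B, C)

row_totals <- rowSums(data)    # same values as apply(data, 1, sum)
col_means  <- colMeans(data)   # same values as apply(data, 2, mean)

# An inline function: the range (max minus min) of each column.
col_ranges <- apply(data, 2, function(x) max(x) - min(x))
print(col_ranges)
```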

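lapply() and sapply() functions

You can apply a specific function to each column by using the lapply() or sapply() function. lapply() returns a list, while sapply() simplifies the result to a vector where possible.

lapply(data, mean)
$A
[1] 3.4

$B
[1] 5.1

$C
[1] 6.8

sapply(data, mean)
  A   B   C
3.4 5.1 6.8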
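The shape of the result is the main practical difference between these functions. The sketch below compares them on the same data and adds vapply(), a base R relative of sapply() that checks each result against a template. The result variable names are made up for the example.

```r
A <- c(4,6,2,3,1,4,5,3,2,4)
B <- c(5,7,2,3,6,5,7,7,4,5)
C <- c(9,6,4,7,8,5,7,6,7,9)
data <- data.frame(A, B, C)

as_list <- lapply(data, mean)    # always a list, one element per column
as_vec  <- sapply(data, mean)    # simplified to a named numeric vector
ranges  <- sapply(data, range)   # each result has length 2, so this is a 2 x 3 matrix

# vapply() stops with an error if a result is not a single number.
checked <- vapply(data, mean, numeric(1))
print(checked)
```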
