getting started with r

October 23, 2012

I installed the “roasted marshmallows” version of R (2.15.1) from www.r-project.org, which went smoothly on my MacBookPro running Lion. I was happy to find its REPL ran easily on the command line in Terminal.

R is a free implementation of a dialect of the S language, the statistics and graphics environment for which John Chambers won the ACM Software Systems award. S was consciously designed to blur the distinction between users and programmers. S is a high-level programming language, with similarities to Scheme and Python. It is a good system for rapid development of statistical applications.
R fundamentals

R has amazing built-in primitives and libraries for what scientists like to do with data and incredible graphing options. I struggled to find good, simple resources to approach learning R as a programmer. I loved this tutorial. Here are my cliff notes:

The most fundamental objects are vectors — basically an array, where index starts at 1.
Names of objects are case sensitive
Comments starts with ‘#’

Examples:

n <- 25
v1 <- c(2,5,1,9)        # create a vector, combine a list of columns
v2 <- numeric(4)        # initialize a vector of specific length (with zeros)
v3 <- rep(3, 10)        # initialize a vector with number: repeat 3, 10x
v4 <- 1:4                  # specify a vector using the range 1-4
v4 <- 4:1                  # you can count down, 4,3,2,1
v4 <- 3:-1                 # even to negative numbers: 3,2,1,0,-1
v4 <- seq(1, 1.5, by=.1)   # easy to generate a sequence of equality spaced non-integers
v5 <- c(v3, v4, 7)        # c will combine or concatenate vectors
v2[1] <- 4         # use bracket notation to access (or set) an element
v2[1:3] <- 4:6     # you can even access (or set) a range

For those of us who already know it from Math class (or computer graphics), vector math in R works the way you would expect:

Adding vectors of different length doesn’t really make sense in real life (although maybe there’s an application for that I don’t know about), but R conveniently defines that the shorter vector is repeated as often a needed to match the length of the longer vector.

Like many languages, and more importantly, like Math, functions are a name with its arguments enclosed in parentheses. Here are some common ones:

max(v1)
mean(v1)
sum(v1)
prod(v1)
length(v1)
unique(v1)

By using parentheses for grouping, one can combine several expressions that involve
functions.

(sum(v2^2) - (sum(v2)^2)/length(v2)) / (length(v2) - 1)

A simpler way to get the same result would be to use the var function.

var(v2)

The standard deviation is computed by using the expression sqrt(var(v2))

Comparisons are cool.


> 1  1:2  v6  v6[v6 < 3]    # find all elements in a vector which are  length(v6[v6 < 3])    # count how many are < 3
[1] 2