R Programming/Advanced programming
Conditional execution
- Help for programming:
> ?Control
if accepts a single condition: an expression that evaluates to one logical value (TRUE or FALSE), not a vector.
if (condition){ statement } else { alternative }
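As a minimal sketch of the template above, the hypothetical function sign_label() (the name is chosen here for illustration and is not part of R) branches on a single logical value and chains a second test with else if.

```r
# Hypothetical helper for illustration: classify a single number.
sign_label <- function(v) {
  # Each condition evaluates to a single TRUE or FALSE.
  if (v < 0) {
    "negative"
  } else if (v == 0) {
    "zero"
  } else {
    "positive"
  }
}

sign_label(-3)
sign_label(0)
sign_label(2.5)
```

A function returns the value of the last expression evaluated, so each branch simply names its result.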
The ifelse() function takes the condition as its first argument, the value to return where the condition is true as its second argument, and the value to return where the condition is false as its third argument. Unlike if, the condition can be a vector, and the result has the same length as the condition. For instance, we generate a sequence from 1 to 10 and keep the values which are lower than 5 or greater than 8, replacing the others with 0.
> x <- 1:10
> ifelse(x<5 | x>8, x, 0)
[1] 1 2 3 4 0 0 0 0 9 10
Loops
R provides three ways to write loops: for, repeat and while. The for statement is very simple. You define an index (here k) and a vector to iterate over (in the example below, 1:5), and you specify the action you want between braces.
> for (k in 1:5){
+ print(k)
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
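The same pattern works for any vector, not only 1:5. The following is a minimal sketch with made-up data; the names values and squares are illustrative. It stores the results in a preallocated vector instead of printing them.

```r
values <- c(2, 5, 7)                 # illustrative data
squares <- numeric(length(values))   # preallocate the result vector

# seq_along(values) yields the indices 1, 2, 3 and is safe when values is empty
for (i in seq_along(values)) {
  squares[i] <- values[i]^2
}

print(squares)
```

Using seq_along() rather than 1:length(values) avoids iterating over c(1, 0) when the vector has length zero.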
When it is not possible to use the for statement, you can use repeat or while with a stopping rule. Be careful with these loops: if the stopping rule is misspecified, the loop will never end. In the two examples below, values are drawn from the standard normal distribution as long as the value drawn is lower than 1. The cat() function is used to display the current value on screen. Note that the repeat version exits before printing the final draw, whereas the while version prints it.
> repeat {
+ g <- rnorm(1)
+ if (g > 1.0) break
+ cat(g,"\n")
+ }
-1.214395
0.6393124
0.05505484
-1.217408
> g <- 0
> while (g < 1){
+ g <- rnorm(1)
+ cat(g,"\n")
+ }
-0.08111594
0.1732847
-0.2428368
0.3359238
-0.2080000
0.05458533
0.2627001
1.009195
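One way to guard against a loop that never ends is to cap the number of iterations. The sketch below is an assumption layered on the examples above, not part of them: max_iter and n_iter are illustrative names, and set.seed() is used only to make the run reproducible.

```r
set.seed(1)        # only for reproducibility
max_iter <- 1000   # illustrative upper bound on the number of draws
n_iter <- 0
g <- 0

# Stop when a draw reaches 1 or when the cap is hit, whichever comes first
while (g < 1 && n_iter < max_iter) {
  g <- rnorm(1)
  n_iter <- n_iter + 1
}

cat("stopped after", n_iter, "draws; last value:", g, "\n")
```

After the loop, comparing n_iter with max_iter tells you whether the stopping rule was met or the cap was reached.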
Implicit loops
Explicit loops are often slow in R, and it is better to avoid them when a vectorized function or one of the apply functions below can do the same job.
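As an illustration of avoiding an explicit loop, the sketch below computes the same sum of squares twice, once with for and once with vectorized arithmetic. The variable names are illustrative and no timing is claimed here; system.time() can be used to compare the two on a given machine.

```r
x <- 1:1000

# Explicit loop: accumulate one element at a time
total_loop <- 0
for (v in x) {
  total_loop <- total_loop + v^2
}

# Vectorized: square the whole vector at once, then sum
total_vec <- sum(x^2)

# Both approaches give the same result
all.equal(total_loop, total_vec)
```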
- apply() applies a function over the margins of a matrix or an array. The margin may be the rows of a matrix (1) or the columns (2).
- lapply() applies a function to each element of a list or vector (for a data frame, each column) and returns a list.
- sapply() is similar but the output is simplified. It may be a vector or a matrix depending on the function.
- tapply() applies a function to the values of a vector within each level of a factor.
> N <- 10
> x1 <- rnorm(N)
> x2 <- rnorm(N) + x1 + 1
> male <- rbinom(N,1,.48)
> y <- 1 + x1 + x2 + male + rnorm(N)
> mydat <- data.frame(y,x1,x2,male)
> lapply(mydat,mean) # returns a list
$y
[1] 3.247

$x1
[1] 0.1415

$x2
[1] 1.29

$male
[1] 0.5

> sapply(mydat,mean) # returns a vector
     y     x1     x2   male
3.2468 0.1415 1.2900 0.5000
> apply(mydat,1,mean) # applies the function to each row
[1] 1.1654 2.8347 -0.9728 0.6512 -0.0696 3.9206 -0.2492 3.1060 2.0478 0.5116
> apply(mydat,2,mean) # applies the function to each column
     y     x1     x2   male
3.2468 0.1415 1.2900 0.5000
> tapply(mydat$y,mydat$male,mean) # applies the function to each level of the factor
    0     1
1.040 5.454
- See also aggregate(), which is similar to tapply() but is applied to a data frame instead of a vector.
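A minimal sketch of aggregate() on a small made-up data frame follows. The names df, score, hours and group are illustrative and do not come from the example above. The formula interface is used here to compute the mean of every other column within each level of group.

```r
# Illustrative data: two numeric columns and one grouping column
df <- data.frame(
  score = c(1, 3, 5, 7),
  hours = c(2, 4, 6, 8),
  group = c("a", "a", "b", "b")
)

# Mean of score and hours within each group; the result is a data frame
res <- aggregate(. ~ group, data = df, FUN = mean)
print(res)
```

Where tapply() would need one call per column, aggregate() summarises all the columns in a single call and keeps the grouping variable as a column of the result.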
Iterators
- Loops in R are generally slow. Iterators may be more efficient than loops. See this entry in the Revolution Computing Blogs.