R Programming/Control Structures
Conditional execution
edit- Help for programming :
> ?Control
if accepts a unidimensional condition.
> if (condition){
+ statement
+ }
> else{
+ alternative
+ }
The unidimensional condition may be one of TRUE or FALSE, T or F, 1 or 0 or a statement using the truth operators:
- x == y "x is equal to y"
- x != y "x is not equal to y"
- x > y "x is greater than y"
- x < y "x is less than y"
- x <= y "x is less than or equal to y"
- x >= y "x is greater than or equal to y"
And may combine these using the & or && operators for AND. | or || are the operators for OR.
> if(TRUE){
+ print("This is true")
+ }
[1] "This is true"
> x <- 2 # x gets the value 2
> if(x==3){
+ print("This is true")
+ } else {
+ print("This is false")
+ }
[1] "This is false"
> y <- 4 # y gets the value 4
> if(x==2 && y>2){
+ print("x equals 2 and y is greater than 2")
+ }
[1] "x equals 2 and y is greater than 2"
The ifelse() command takes as first argument the condition, as second argument the treatment if the condition is true and as third argument the treatment if the condition is false. In that case, the condition can be a vector. For instance we generate a sequence from 1 to 10 and we want to display values which are lower than 5 and greater than 8.
> x <- 1:10
> ifelse(x<5 | x>8, x, 0)
[1] 1 2 3 4 0 0 0 0 9 10
Sets
editR has some very useful handlers for sets to select a subset of a vector:
> x = runif(10)
> x<.5
[1] TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE
> x
[1] 0.32664759 0.57826623 0.98171138 0.01718607 0.24564238 0.62190808 0.74839301
[8] 0.32957783 0.19302650 0.06013694
> x[x<.5]
[1] 0.32664759 0.01718607 0.24564238 0.32957783 0.19302650 0.06013694
to exclude a subset of a vector:
> x = 1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
> x[-1:-5]
[1] 6 7 8 9 10
Loops
editImplicit loops
editR has support for implicit loops, which is called vectorization. This is built-in to many functions and standard operators. for example, the +
operator can add two arrays of numbers without the need for an explicit loop.
Implicit Loops are generally slow, and it is better to avoid them when it is possible.
- apply() can apply a function to elements of a matrix or an array. This may be the rows of a matrix (1) or the columns (2).
- lapply() applies a function to each column of a dataframe and returns a list.
- sapply() is similar but the output is simplified. It may be a vector or a matrix depending on the function.
- tapply() applies the function for each level of a factor.
> N <- 10
> x1 <- rnorm(N)
> x2 <- rnorm(N) + x1 + 1
> male <- rbinom(N,1,.48)
> y <- 1 + x1 + x2 + male + rnorm(N)
> mydat <- data.frame(y,x1,x2,male)
> lapply(mydat,mean) # returns a list
$y
[1] 3.247
$x1
[1] 0.1415
$x2
[1] 1.29
$male
[1] 0.5
> sapply(mydat,mean) # returns a vector
y x1 x2 male
3.2468 0.1415 1.2900 0.5000
> apply(mydat,1,mean) # applies the function to each row
[1] 1.1654 2.8347 -0.9728 0.6512 -0.0696 3.9206 -0.2492 3.1060 2.0478 0.5116
> apply(mydat,2,mean) # applies the function to each column
y x1 x2 male
3.2468 0.1415 1.2900 0.5000
> tapply(mydat$y,mydat$male,mean) # applies the function to each level of the factor
0 1
1.040 5.454
- See also aggregate() which is similar to tapply() but is applied to a dataframe instead of a vector.
Explicit loops
editR provides three ways to write loops: for, repeat and while. The for statement is excessively simple. You simply have to define index (here k) and a vector (in the example below the vector is 1:5) and you specify the action you want between braces.
> for (k in 1:5){
+ print(k)
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
When it is not possible to use the for statement, you can also use break or while by specifying a breaking rules. One should be careful with this kind of loops since if the breaking rules is misspecified the loop will never end. In the two examples below the standard normal distribution is drawn in as long as the value is lower than 1. The cat() function is used to display the present value on screen.
> repeat {
+ g <- rnorm(1)
+ if (g > 1.0) break
+ cat(g,"\n")
+ }
-1.214395
0.6393124
0.05505484
-1.217408
> g <- 0
> while (g < 1){
+ g <- rnorm(1)
+ cat(g,"\n")
+ }
-0.08111594
0.1732847
-0.2428368
0.3359238
-0.2080000
0.05458533
0.2627001
1.009195
The next
statement can be used to discontinue one particular cycle and skip to the “next”.
> for (k in 1:10) {
+ if(k==8) {
+ print("skipped")
+ next
+ }
+ print(k)
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] "skipped"
[1] 9
[1] 10
Iterators
edit- Loops in R are generally slow. iterators may be more efficient than loops. See this entry in the Revolution Computing Blogs