2 R: First Impressions

Type values and mathematical formulas into R’s command prompt

1 + 1

## [1] 2

Assign values to symbols (variables)

x = 1
x + x

## [1] 2

Invoke functions such as c(), which takes any number of values and returns a single vector

x = c(1, 2, 3)
x

## [1] 1 2 3

R functions, such as sqrt(), often operate efficiently on vectors

y = sqrt(x)
y

## [1] 1.000000 1.414214 1.732051

There are often several ways to accomplish a task in R

x = c(1, 2, 3)
x

## [1] 1 2 3

x <- c(4, 5, 6)
x

## [1] 4 5 6

x <- 7:9
x

## [1] 7 8 9

10:12 -> x
x

## [1] 10 11 12

Sometimes R does ‘surprising’ things that can be fun to figure out

x <- c(1, 2, 3) -> y
x

## [1] 1 2 3

y

## [1] 1 2 3
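
One way to figure out the example above: according to ?Syntax, rightward assignment (->) binds more tightly than leftward assignment (<-), so the expression is evaluated as x <- (c(1, 2, 3) -> y). A minimal sketch that checks this reading:

```r
# Rightward assignment (->) has higher precedence than leftward (<-),
# so this line is evaluated as x <- (c(1, 2, 3) -> y)
x <- c(1, 2, 3) -> y

identical(x, y)   # both symbols now hold the same values
```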

2.1 R Data types: vector and list

‘Atomic’ vectors

Types include integer, numeric (floating-point; real), complex, logical, character, raw (bytes)

people <- c("Lori", "Yubo", "Greg", "Nitesh", "Valerie", "Herve")
people

## [1] "Lori"    "Yubo"    "Greg"    "Nitesh"  "Valerie" "Herve"
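
The types listed above can be inspected with typeof(). The sketch below uses illustrative values that are not part of the original example. Note that typeof() reports numeric (floating-point) values as "double".

```r
typeof(1L)          # integer (the L suffix requests an integer)
typeof(1)           # double, i.e., numeric / floating-point
typeof(1+2i)        # complex
typeof(TRUE)        # logical
typeof("a")         # character
typeof(as.raw(1))   # raw

# An atomic vector holds a single type; mixed input is coerced
# to the most general type present (here, character)
mixed <- c(1, "a", TRUE)
typeof(mixed)
```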

Atomic vectors can be named

population <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)
population

##   Buffalo Rochester  New York 
##    259000    210000   8400000

log10(population)

##   Buffalo Rochester  New York 
##  5.413300  5.322219  6.924279

Statistical concepts like NA (“not available”)

truthiness <- c(TRUE, FALSE, NA)
truthiness

## [1]  TRUE FALSE    NA

Logical concepts like ‘and’ (&), ‘or’ (|), and ‘not’ (!)

!truthiness

## [1] FALSE  TRUE    NA

truthiness | !truthiness

## [1] TRUE TRUE   NA

truthiness & !truthiness

## [1] FALSE FALSE    NA

Numerical concepts like infinity (Inf) or not-a-number (NaN, e.g., 0 / 0)

undefined_numeric_values <- c(NA, 0/0, NaN, Inf, -Inf)
undefined_numeric_values

## [1]   NA  NaN  NaN  Inf -Inf

sqrt(undefined_numeric_values)

## Warning in sqrt(undefined_numeric_values): NaNs produced

## [1]  NA NaN NaN Inf NaN
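
These special values can be told apart with the predicate functions is.na(), is.nan(), is.finite(), and is.infinite(). A short sketch using the same vector as above (the variable name v is only for illustration):

```r
v <- c(NA, 0/0, NaN, Inf, -Inf)

is.na(v)         # TRUE for NA and also for NaN
is.nan(v)        # TRUE only for NaN
is.infinite(v)   # TRUE only for Inf and -Inf
is.finite(v)     # FALSE for every element of this vector
```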

Common string manipulations

toupper(people)

## [1] "LORI"    "YUBO"    "GREG"    "NITESH"  "VALERIE" "HERVE"

substr(people, 1, 3)

## [1] "Lor" "Yub" "Gre" "Nit" "Val" "Her"

R is a green consumer – recycling short vectors to align with long vectors

x <- 1:3
x * 2            # '2' (vector of length 1) recycled to c(2, 2, 2)

## [1] 2 4 6

truthiness | NA

## [1] TRUE   NA   NA

truthiness & NA

## [1]    NA FALSE    NA
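
Recycling is not limited to vectors of length 1. The sketch below, with made-up values, recycles a length-2 vector along a length-6 vector. When the longer length is not a multiple of the shorter one, R still recycles but issues a warning.

```r
x <- 1:6
shifted <- x + c(10, 20)   # c(10, 20) is recycled three times
shifted                    # 1+10, 2+20, 3+10, 4+20, 5+10, 6+20

# The logical examples above work the same way: the single NA
# is stretched to the length of truthiness
truthiness <- c(TRUE, FALSE, NA)
identical(truthiness | NA, truthiness | c(NA, NA, NA))
```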

It’s very common to nest operations, which can be simultaneously compact, confusing, and expressive ([: subset; <: less than)

substr(tolower(people), 1, 3)

## [1] "lor" "yub" "gre" "nit" "val" "her"

population[population < 1000000]

##   Buffalo Rochester 
##    259000    210000

Lists

The list type can contain other vectors, including other lists

frenemies = list(
    friends=c("Larry", "Richard", "Vivian"),
    enemies=c("Dick", "Mike")
)
frenemies

## $friends
## [1] "Larry"   "Richard" "Vivian" 
## 
## $enemies
## [1] "Dick" "Mike"

[ subsets one list to create another list, [[ extracts a list element

frenemies[1]

## $friends
## [1] "Larry"   "Richard" "Vivian"

frenemies[c("enemies", "friends")]

## $enemies
## [1] "Dick" "Mike"
## 
## $friends
## [1] "Larry"   "Richard" "Vivian"

frenemies[["enemies"]]

## [1] "Dick" "Mike"
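
The difference between [ and [[ shows up in the class and length of the result. A minimal sketch, re-creating the list from above so that it runs on its own:

```r
frenemies <- list(
    friends = c("Larry", "Richard", "Vivian"),
    enemies = c("Dick", "Mike")
)

class(frenemies["enemies"])      # still a list, of length 1
class(frenemies[["enemies"]])    # the character vector itself
length(frenemies["enemies"])     # 1 list element
length(frenemies[["enemies"]])   # 2 names

# $ is a convenient shorthand for [[ with a literal name
identical(frenemies$enemies, frenemies[["enemies"]])
```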

Factors

Character-like vectors, but with values restricted to specific levels

sex = factor(c("Male", "Male", "Female"),
             levels=c("Female", "Male", "Hermaphrodite"))
sex

## [1] Male   Male   Female
## Levels: Female Male Hermaphrodite

sex == "Female"

## [1] FALSE FALSE  TRUE

table(sex)

## sex
##        Female          Male Hermaphrodite 
##             1             2             0

sex[sex == "Female"]

## [1] Female
## Levels: Female Male Hermaphrodite
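
Under the hood a factor stores integer codes that index into its levels. The sketch below re-creates the factor from above and shows this, along with droplevels() for discarding levels that are not used:

```r
sex <- factor(c("Male", "Male", "Female"),
              levels = c("Female", "Male", "Hermaphrodite"))

levels(sex)               # the allowed values, in order
as.integer(sex)           # integer codes: position of each value in levels(sex)
nlevels(sex)              # 3, even though only 2 levels are observed

levels(droplevels(sex))   # unused level removed
```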

2.2 Classes: data.frame and beyond

Variables are often related to one another in a highly structured way, e.g., two ‘columns’ of data in a spreadsheet

x = rnorm(1000)       # 1000 random normal deviates
y = x + rnorm(1000)   # another 1000 deviates, as a function of x
plot(y ~ x)           # relationship between x and y

Convenient to manipulate them together

data.frame(): like columns in a spreadsheet

df = data.frame(X=x, Y=y)
head(df)           # first 6 rows

##             X          Y
## 1  0.03638925  0.5489812
## 2 -0.41545524  0.1326022
## 3 -0.07465566 -0.5745222
## 4 -0.54492524  1.0485564
## 5  1.09338400 -0.4200256
## 6  0.95695268  1.6142163

plot(Y ~ X, df)    # same as above

See all data with View(df). Summarize data with summary(df)

summary(df)

##        X                   Y            
##  Min.   :-3.907163   Min.   :-4.897088  
##  1st Qu.:-0.671598   1st Qu.:-0.969528  
##  Median : 0.005968   Median : 0.078811  
##  Mean   : 0.014603   Mean   : 0.002155  
##  3rd Qu.: 0.700852   3rd Qu.: 0.991753  
##  Max.   : 3.133759   Max.   : 3.973073

Easy to manipulate data in a coordinated way, e.g., access column X with $ and subset for just those values greater than 0

positiveX = df[df$X > 0,]
head(positiveX)

##             X          Y
## 1  0.03638925  0.5489812
## 5  1.09338400 -0.4200256
## 6  0.95695268  1.6142163
## 7  1.97637428  0.9102277
## 9  2.26707137  1.0087697
## 10 1.26675817  1.5059712

plot(Y ~ X, positiveX)
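
The df[rows, columns] pattern generalizes to combined conditions and to selecting columns by name. A sketch under the same setup as above; set.seed() and the names bothPositive and positiveY are added here for illustration only:

```r
set.seed(42)                         # assumption: fixed seed for reproducibility
x <- rnorm(1000)
y <- x + rnorm(1000)
df <- data.frame(X = x, Y = y)

positiveX <- df[df$X > 0, ]                  # rows where X > 0, all columns
bothPositive <- df[df$X > 0 & df$Y > 0, ]    # combine conditions with &
positiveY <- df[df$X > 0, "Y"]               # one column: returns a plain vector

# subset() expresses the same row selection without repeating df$
all.equal(positiveX, subset(df, X > 0))
```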

R is introspective – ask it about itself

class(df)

## [1] "data.frame"

dim(df)

## [1] 1000    2

colnames(df)

## [1] "X" "Y"

matrix() is a related class, where all elements have the same type (a data.frame() requires elements within a column to be the same type, but elements between columns can be different types).
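
The contrast can be seen by building both objects from small, made-up values (the names m, d, id, and name are illustrative only):

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, all integer
dim(m)
typeof(m)

# Columns of a data.frame can have different types
d <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
sapply(d, class)

# Converting to a matrix forces a single type (here, character)
typeof(as.matrix(d))
```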

A scatterplot makes one want to fit a linear model (do a regression analysis)

Use a formula to describe the relationship between variables
Variables in the formula are looked up in the second argument (here, df)
```
fit <- lm(Y ~ X, df)
```
Visualize the points, and add the regression line
```
plot(Y ~ X, df)
abline(fit, col="red", lwd=3)
```

Summarize the fit as an ANOVA table

anova(fit)

## Analysis of Variance Table
## 
## Response: Y
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## X           1 1063.5 1063.55    1058 < 2.2e-16 ***
## Residuals 998 1003.2    1.01                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

N.B. – ‘Type I’ sums-of-squares, so order of independent variables matters; use drop1() for ‘Type III’. See DataCamp Quick-R
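
The order dependence only becomes visible with more than one predictor. The sketch below uses simulated data with two correlated predictors; the names a, b, and resp, the seed, and the sample size are assumptions for illustration and not part of the original analysis. It compares the sequential table from anova() under both orderings with the marginal tests from drop1().

```r
set.seed(123)                # assumption: fixed seed for reproducibility
n <- 200
a <- rnorm(n)
b <- a + rnorm(n)            # b is correlated with a
resp <- a + b + rnorm(n)

fit_ab <- lm(resp ~ a + b)
fit_ba <- lm(resp ~ b + a)

anova(fit_ab)                # sequential: a first, then b given a
anova(fit_ba)                # sequential: b first, then a given b
drop1(fit_ab, test = "F")    # each term tested given all the others
```

In the sequential tables the sum of squares attributed to a depends on whether it enters before or after b, whereas drop1() gives the same answer for either ordering.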

Introspection – what class is fit? What methods can I apply to an object of that class?

class(fit)

## [1] "lm"

methods(class=class(fit))

##  [1] add1           alias          anova          case.names     coerce         confint       
##  [7] cooks.distance deviance       dfbeta         dfbetas        drop1          dummy.coef    
## [13] effects        extractAIC     family         formula        hatvalues      influence     
## [19] initialize     kappa          labels         logLik         model.frame    model.matrix  
## [25] nobs           plot           predict        print          proj           qr            
## [31] residuals      rstandard      rstudent       show           simulate       slotsFromS3   
## [37] summary        variable.names vcov          
## see '?methods' for accessing help and source code
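
Several of the methods listed above can be applied directly to a fitted model. A minimal sketch that re-creates a fit like the one above; set.seed() and the newX data.frame are added here for illustration:

```r
set.seed(42)                       # assumption: fixed seed for reproducibility
x <- rnorm(1000)
y <- x + rnorm(1000)
df <- data.frame(X = x, Y = y)
fit <- lm(Y ~ X, df)

confint(fit)                       # confidence intervals for intercept and slope
head(residuals(fit))               # observed minus fitted values
summary(fit)                       # coefficients, R-squared, and more

# Predict Y for new values of X (newX is a hypothetical example)
newX <- data.frame(X = c(-1, 0, 1))
predict(fit, newdata = newX)
```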

2.3 Help!

Help is available in RStudio or interactively

Check out the help page for rnorm()
```
?rnorm
```
‘Usage’ section describes how the function can be used
```
rnorm(n, mean = 0, sd = 1)
```
Arguments, some with default values. Arguments are matched first by name, then by position
‘Arguments’ section describes what the arguments are supposed to be
‘Value’ section describes the return value
‘Examples’ section illustrates use
Help pages often include citations to relevant technical documentation, references to related functions, and obscure details
Can be intimidating, but in the end actually very useful
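
The matching rule can be checked with rnorm() itself. In this sketch the seed is reset before each call so that the three calls draw the same random numbers; the variable names are illustrative only.

```r
args(rnorm)                            # shows the arguments and their defaults

set.seed(1)
by_position <- rnorm(3, 10, 2)         # n = 3, mean = 10, sd = 2

set.seed(1)
by_name <- rnorm(sd = 2, mean = 10, n = 3)   # named arguments, in any order

set.seed(1)
mixed <- rnorm(3, sd = 2, mean = 10)   # names matched first, then 3 fills n

identical(by_position, by_name)
identical(by_position, mixed)
```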

A.1 – Using R

Martin Morgan Martin.Morgan@RoswellPark.org
Lori Shepherd Lori.Shepherd@RoswellPark.org

2 March 2017

Contents

1 RStudio: A Quick Tour

2 R: First Impressions

2.1 R Data types: vector and list

2.2 Classes: data.frame and beyond

2.3 Help!
