1 Example: t-test

t.test()

x: vector of univariate measurements
y: factor describing experimental design
var.equal=TRUE: appropriate for relatively small experiments where no additional information available?
formula: alternative representation, y ~ x.

head(sleep)

##   extra group ID
## 1   0.7     1  1
## 2  -1.6     1  2
## 3  -0.2     1  3
## 4  -1.2     1  4
## 5  -0.1     1  5
## 6   3.4     1  6

plot(extra ~ group, data = sleep)

## Traditional interface
with(sleep, t.test(extra[group == 1], extra[group == 2]))

## 
##  Welch Two Sample t-test
## 
## data:  extra[group == 1] and extra[group == 2]
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean of x mean of y 
##      0.75      2.33

## Formula interface
t.test(extra ~ group, sleep)

## 
##  Welch Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

## equal variance between groups
t.test(extra ~ group, sleep, var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 18, p-value = 0.07919
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.363874  0.203874
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

lm() and anova()

lm(): fit linear model.
anova(): statisitcal evaluation.

## linear model; compare to t.test(var.equal=TRUE)
fit <- lm(extra ~ group, sleep)
anova(fit)

## Analysis of Variance Table
## 
## Response: extra
##           Df Sum Sq Mean Sq F value  Pr(>F)  
## group      1 12.482 12.4820  3.4626 0.07919 .
## Residuals 18 64.886  3.6048                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Under the hood: formula: translated into model matrix, used in lm.fit().
With (implicit) intercept 1, last coefficient of model matrix reflects group effect
With intercept 0, contrast between effects of coefficient 1 and coefficient 2 reflect group effect

## underlying model, used in `lm.fit()`
model.matrix(extra ~ group, sleep)     # last column indicates group effect

##    (Intercept) group2
## 1            1      0
## 2            1      0
## 3            1      0
## 4            1      0
## 5            1      0
## 6            1      0
## 7            1      0
## 8            1      0
## 9            1      0
## 10           1      0
## 11           1      1
## 12           1      1
## 13           1      1
## 14           1      1
## 15           1      1
## 16           1      1
## 17           1      1
## 18           1      1
## 19           1      1
## 20           1      1
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"

model.matrix(extra ~ 0 + group, sleep) # contrast between columns

##    group1 group2
## 1       1      0
## 2       1      0
## 3       1      0
## 4       1      0
## 5       1      0
## 6       1      0
## 7       1      0
## 8       1      0
## 9       1      0
## 10      1      0
## 11      0      1
## 12      0      1
## 13      0      1
## 14      0      1
## 15      0      1
## 16      0      1
## 17      0      1
## 18      0      1
## 19      0      1
## 20      0      1
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"

Covariate – fit base model containing only covariate, test improvement in fit when model includes factor of interest

fit0 <- lm(extra ~ ID, sleep)
fit1 <- lm(extra ~ ID + group, sleep)
anova(fit0, fit1)

## Analysis of Variance Table
## 
## Model 1: extra ~ ID
## Model 2: extra ~ ID + group
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1     10 19.290                                
## 2      9  6.808  1    12.482 16.501 0.002833 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

t.test(extra ~ group, sleep, var.equal=TRUE, paired=TRUE)

## 
##  Paired t-test
## 
## data:  extra by group
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean of the differences 
##                   -1.58

genefilter::rowttests()

t-tests for gene expression data
useful for exploratory analysis, but statistically sub-optimal
x: matrix of expression values
features x samples (reverse of how a ‘statistician’ would represent the data – samples x features)
fac: factor of one or two levels describing experimental design

Limitations

Assumes features are independent
Ignores common experimental design
Ignores multiple testing

Consequences

Poor estimate of between-group variance for each feature
Elevated false discovery rate

Supplement 2: RNA-Seq Statistical Issues

Martin Morgan (martin.morgan@roswellpark.org)
Roswell Park Cancer Institute, Buffalo, NY
5 - 9 October, 2015

Contents

1 Example: t-test

2 Common experimental designs

Supplement 2: RNA-Seq Statistical Issues

Martin Morgan (martin.morgan@roswellpark.org) Roswell Park Cancer Institute, Buffalo, NY 5 - 9 October, 2015

Contents

1 Example: t-test

2 Common experimental designs

Martin Morgan (martin.morgan@roswellpark.org)
Roswell Park Cancer Institute, Buffalo, NY
5 - 9 October, 2015