Author: Martin Morgan
Date: 22 July, 2019
1 + 2
## [1] 3
x = c(1, 2, 3)
1:3 # sequence of integers from 1 to 3
## [1] 1 2 3
x + c(4, 5, 6) # vectorized
## [1] 5 7 9
x + 4 # recycling
## [1] 5 6 7
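Recycling also applies when the shorter vector has more than one element. The following is a minimal sketch of that behavior; the variable names `long`, `recycled`, and `msg` are introduced here for illustration only.

```r
x <- c(1, 2, 3)

# Element-wise arithmetic: both vectors have length 3
x + c(4, 5, 6)                 # 5 7 9

# Recycling: the shorter vector is repeated to match the longer one
long <- 1:6
recycled <- long + c(10, 20)   # 11 22 13 24 15 26

# R warns when the longer length is not a multiple of the shorter length;
# here the warning message is captured instead of printed
msg <- tryCatch(1:3 + 1:2, warning = function(w) conditionMessage(w))
msg
```

Silent recycling is convenient for scalars such as `x + 4`, but it can hide length mismatches, so it is worth checking lengths when two longer vectors are combined.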
Vectors
numeric(), character(), logical(), integer(), complex(), …
NA: ‘not available’
factor(): values from restricted set of ‘levels’.
Operations
==, <, <=, >, >=, …
| (or), & (and), ! (not)
[, e.g., x[c(2, 3)]
[<-, e.g., x[c(1, 3)] = x[c(1, 3)]
is.na()
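The vector types and operations listed above can be combined in a few lines. This is a small sketch with made-up values; the names `keep` and `f` and the factor levels are assumptions chosen for illustration.

```r
x <- c(1, NA, 3, 4)

is.na(x)                  # FALSE TRUE FALSE FALSE
x > 2                     # FALSE NA TRUE TRUE; comparing with NA gives NA

# Combine logical vectors, then use the result to subset
keep <- !is.na(x) & x > 2
x[keep]                   # 3 4

# Subset-assignment replaces only the selected elements
x[is.na(x)] <- 0
x                         # 1 0 3 4

# A factor stores values drawn from a fixed set of levels
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(f)                 # "low" "high"
table(f)                  # counts per level
```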
Functions
x = rnorm(100)
y = x + rnorm(100)
plot(x, y)
data.frame
df <- data.frame(Independent = x, Dependent = y)
head(df)
## Independent Dependent
## 1 0.8658385 0.8357491
## 2 -1.2530897 -2.3004453
## 3 0.6287058 1.8726218
## 4 -0.4357103 -1.7256617
## 5 -0.9183898 -0.8309443
## 6 -0.1622652 -1.0660857
df[1:5, 1:2]
## Independent Dependent
## 1 0.8658385 0.8357491
## 2 -1.2530897 -2.3004453
## 3 0.6287058 1.8726218
## 4 -0.4357103 -1.7256617
## 5 -0.9183898 -0.8309443
df[1:5, ]
## Independent Dependent
## 1 0.8658385 0.8357491
## 2 -1.2530897 -2.3004453
## 3 0.6287058 1.8726218
## 4 -0.4357103 -1.7256617
## 5 -0.9183898 -0.8309443
plot(Dependent ~ Independent, df) # 'formula' interface
df[, 1], df[, "Independent"], df[[1]], df[["Independent"]], df$Independent
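These forms all extract a single column as a vector. The sketch below compares them on a small data.frame, assuming the columns are named Independent and Dependent as in the data.frame created earlier; the seed and the variable names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
df <- data.frame(Independent = rnorm(5), Dependent = rnorm(5))

by_position  <- df[, 1]
by_name      <- df[, "Independent"]
by_list_pos  <- df[[1]]
by_list_name <- df[["Independent"]]
by_dollar    <- df$Independent

# `$` accepts an unambiguous abbreviation of the column name ...
abbreviated <- df$Indep
# ... but `[[` requires the exact name by default and returns NULL otherwise
not_found <- df[["Indep"]]

# drop = FALSE keeps the result as a one-column data.frame
one_column <- df[, "Independent", drop = FALSE]
class(one_column)         # "data.frame"
```

Spelling out the full column name is the safer habit, since partial matching only works with `$`.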
Exercise: plot only values with Dependent > 0, Independent > 0
Select rows
ridx <- (df$Dependent > 0) & (df$Independent > 0)
Plot subset
plot(Dependent ~ Independent, df[ridx, ])
Skin the cat another way
plot(
Dependent ~ Independent, df,
subset = (Dependent > 0) & (Independent > 0)
)
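The row index used above is an ordinary logical vector, so it can be inspected before plotting. As a hedged sketch, the code below counts the selected rows and checks that base R's `subset()` selects the same rows; the seed and the names `by_index` and `by_subset` are assumptions for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
x <- rnorm(100)
df <- data.frame(Independent = x, Dependent = x + rnorm(100))

# Logical index: TRUE where both conditions hold
ridx <- (df$Dependent > 0) & (df$Independent > 0)
sum(ridx)                 # number of rows selected (TRUE counts as 1)

# Two ways to select the same rows
by_index  <- df[ridx, ]
by_subset <- subset(df, Dependent > 0 & Independent > 0)
all.equal(by_index, by_subset)
```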
fit <- lm(Dependent ~ Independent, df) # linear model -- regression
anova(fit) # summary table
## Analysis of Variance Table
##
## Response: Dependent
## Df Sum Sq Mean Sq F value Pr(>F)
## Independent 1 89.009 89.009 97.886 < 2.2e-16 ***
## Residuals 98 89.113 0.909
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(Dependent ~ Independent, df)
abline(fit)
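Beyond the ANOVA table and the fitted line, the model object can be queried directly. This is a minimal sketch using simulated data like that above; the seed and the `new_data` values are arbitrary assumptions, so the numbers will differ from the output shown in this tutorial.

```r
set.seed(123)  # arbitrary seed so the sketch is reproducible
x <- rnorm(100)
y <- x + rnorm(100)
df <- data.frame(Independent = x, Dependent = y)

fit <- lm(Dependent ~ Independent, df)

# Estimated intercept and slope
cf <- coef(fit)
cf

# Proportion of variance explained
summary(fit)$r.squared

# Predictions at new values of the independent variable
new_data <- data.frame(Independent = c(-1, 0, 1))
pred <- predict(fit, new_data)
pred
```

The predictions are the intercept plus the slope times each new value, which is the line drawn by `abline(fit)`.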
lm(): plain-old function
fit: an object of class “lm”
anova(): a generic with a specific method for class “lm”
class(fit)
## [1] "lm"
methods(class="lm")
## [1] add1 alias anova case.names
## [5] coerce confint cooks.distance deviance
## [9] dfbeta dfbetas drop1 dummy.coef
## [13] effects extractAIC family formula
## [17] hatvalues influence initialize kappa
## [21] labels logLik model.frame model.matrix
## [25] nobs plot predict print
## [29] proj qr residuals rstandard
## [33] rstudent show simulate slotsFromS3
## [37] summary variable.names vcov
## see '?methods' for accessing help and source code
?"plot" # plain-old-function or generic
?"plot.formula" # method
?"plot.lm" # method for object of class 'lm', plot(fit)
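The relationship between a generic and its methods can be seen by writing a small one. In this sketch, `describe()` and its methods are hypothetical names invented for illustration and are not part of R; `cars` is a dataset that ships with R.

```r
# A generic dispatches on the class of its first argument
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) "an object"

# Method chosen for objects of class "lm"
describe.lm <- function(x, ...) {
    paste("a linear model with", length(coef(x)), "coefficients")
}

fit <- lm(dist ~ speed, cars)
describe(fit)             # "a linear model with 2 coefficients"
describe(1:3)             # "an object"
```

Calling `anova(fit)` or `plot(fit)` works in a similar way: the generic looks at `class(fit)` and calls the method written for "lm".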
library(ggplot2)
ggplot(df, aes(x = Independent, y = Dependent)) +
geom_point() + geom_smooth(method = "lm")
Attach the package with library(ggplot2), once per session
Bioconductor started in 2002 as a platform for understanding analysis of microarray data
1,750 packages. Domains of expertise:
Important themes
Resources
A distinctive feature of Bioconductor – use of objects for representing data
library(Biostrings)
dna <- DNAStringSet(c("AACTCC", "CTGCA"))
dna
## A DNAStringSet instance of length 2
## width seq
## [1] 6 AACTCC
## [2] 5 CTGCA
reverseComplement(dna)
## A DNAStringSet instance of length 2
## width seq
## [1] 6 GGAGTT
## [2] 5 TGCAG
Web site, https://bioconductor.org
1750 ‘software’ packages, https://bioconductor.org/packages
Discovery and use, e.g., DESeq2
Also:
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 19
##
## Matrix products: default
## BLAS: /home/msmith/Applications/R/R-3.6.0/lib/libRblas.so
## LAPACK: /home/msmith/Applications/R/R-3.6.0/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] Biostrings_2.52.0 XVector_0.24.0 IRanges_2.18.1
## [4] S4Vectors_0.22.0 BiocGenerics_0.30.0 ggplot2_3.2.0
## [7] BiocStyle_2.12.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.1 pillar_1.4.2 compiler_3.6.0
## [4] BiocManager_1.30.4 zlibbioc_1.30.0 tools_3.6.0
## [7] digest_0.6.20 evaluate_0.13 tibble_2.1.3
## [10] gtable_0.3.0 pkgconfig_2.0.2 rlang_0.4.0
## [13] yaml_2.2.0 xfun_0.7 withr_2.1.2
## [16] stringr_1.4.0 dplyr_0.8.1 knitr_1.23
## [19] grid_3.6.0 tidyselect_0.2.5 glue_1.3.1
## [22] R6_2.4.0 rmarkdown_1.12 bookdown_0.10
## [25] purrr_0.3.2 magrittr_1.5 scales_1.0.0
## [28] codetools_0.2-16 htmltools_0.3.6 assertthat_0.2.1
## [31] colorspace_1.4-1 labeling_0.3 stringi_1.4.3
## [34] lazyeval_0.2.2 munsell_0.5.0 crayon_1.3.4
Research reported in this tutorial was supported by the National Human Genome Research Institute and the National Cancer Institute of the National Institutes of Health under award numbers U41HG004059 and U24CA180996.
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 633974).