---
title: "Data Analysis"
output: html_document
---
Lab #5
Differential expression
1.) Get the GEO rat ketogenic brain data set (rat_KD.txt).
[rat_KD.txt](http://tinyurl.com/data-uruguay)
2a.) Load into R, using read.table() function and header=T argument.
```{r}
rat = read.table("rat_KD.txt", sep = "\t", header = T)
dimnames(rat)[[1]] <- rat[,1]
rat = rat[,-1]
```
2b.) Set the gene names to row names and remove this first column.
3.) Use the t-test function to calculate the difference per gene between the control diet and ketogenic diet classes. (Hint: use the colnames() function to determine where one class ends and the other begins).
```{r}
colnames(rat)
t.test(rat[1,1:6], rat[1,7:11])
y <- t.test(rat[2,1:6], rat[2, 7:11])
dim(rat)
ttestRat <- function(df, grp1, grp2) {
x = df[grp1]
y = df[grp2]
x = as.numeric(x)
y = as.numeric(y)
results = t.test(x, y)
results$p.value
}
rawpvalue = apply(rat, 1, ttestRat, grp1 = c(1:6), grp2 = c(7:11))
```
4.) Plot a histogram of the p-values.
```{r}
hist(rawpvalue)
```
5.) Log2 the data, calculate the mean for each gene per group. Then calculate the fold change between the groups (control vs. ketogenic diet). hint: log2(ratio)
```{r}
##transform our data into log2 base.
rat = log2(rat)
#calculate the mean of each gene per control group
control = apply(rat[,1:6], 1, mean)
#calcuate the mean of each gene per test group
test = apply(rat[, 7:11], 1, mean)
#confirming that we have a vector of numbers
class(control)
#confirming we have a vector of numbers
class(test)
#because our data is already log2 transformed, we can take the difference between the means. And this is our log2 Fold Change or log2 Ratio == log2(control / test)
foldchange <- control - test
```
6.) Plot a histogram of the fold change values.
```{r}
hist(foldchange, xlab = "log2 Fold Change (Control vs Test)")
```
7.) Transform the p-value (-1*log(p-value)) and create a volcano plot using ggplot2.
```{r}
results = cbind(foldchange, rawpvalue)
results = as.data.frame(results)
results$probename <- rownames(results)
library(ggplot2)
volcano = ggplot(data = results, aes(x = foldchange, y = -1*log10(rawpvalue)))
volcano + geom_point()
```