The tbdata
GatingSet is part of a data set of intracellular cytokine staining (ICS) samples from TB-infected and non-infected subjects, stimulated with TB-specific and non-specific peptides. The samples were assayed for Ag-specific response to stimulation. The data were analyzed in Lin et al. (2015), using results from manual gating.
flowFrame
is an R object that corresponds to a single FCS file.flowSet
is a collection of flowFrame
s, all with the same markers and channels.GatingHierarchy
is a flowFrame
with its associated data transformation, compensation, and gating information defining cell populations.GatingSet
is a collection of GatingHierarchy
objects, or equivalently a flowSet
, with all transformation, compensation, and gating information.The GatingSet associates each sample with a hierarchy of gates that define cell popualtions. The gates can be one, two, three or high-dimensional. Any gates supported by the flowCore
infrastructure can be attached to a a hierarchy of gates in a GatingSet
.
The GatingSet lets us visualize FCM data analysis at the individual event-level (single-cells). Data are stored on disk using HDF5 and are read-in as needed. This has a very efficient memory footprint and allows us to analyze large data sets on modest hardware.
It provides a conveninent and unified way to interact with FCM data for performing hierarchical or non-hierarchical automated gating by managing cell population definitions and their relationships.
If your aren’t yet using this infrastructure, you should be.
There are at least 50 different algorithms in the literature for performing automated gating of FCM data. 75% of those are available for BioConductor.
There is no “one best algorithm” for FCM gating. Different algorithms have different strengths (Aghaeepour2013a).
OpenCyto provides a mechansim to use multiple algorithms to identify different cell populations. Complex high-dimensional methods are rarely necessary, and the focus is on simple methods that work robustly rather than fancy approaches that work on one data set.
The general idea is that for a given FCM assay utilizing a set staining panel generated by one lab, a general strategy to identify cell populations of interest should be applicable to any set of data produced by the lab.
OpenCyto formalizes this in the form of a gatingTemplate
. This template defines cell populations in terms of their phenotypic markers, parent populations, and gating methods / algorithms used to define them. The template is in the form of a csv
file that is easier to create than an R script. openCyto will take the template and a data set and produce data-driven gates for each cell population and sample.
Our goal here will be to gate the tbdata
data set. We have multiple samples per subject. One sample is unstimulated, the other is stimulated with ESAT-6 peptide. We’ll focus on identifying antigen-specific CD4+ T cells.
To this end, we need to compare the stimulated and non-stimulated samples within each subject.
The data were imported from a manual gating analysis. We’ll begin by deleting the manual gates, as we’re not interested in them for this demo. We’ll also rename some phenotypic data columns so that we can use faceting more easily.
The general gating scheme we’ll apply here is root/Singlets/CD14-/nonDebris/Lymphocytes/Live/CD3/CD4/Functional markers
.
#Delete existing gates
Rm("Singlets",tbdata)
Next, we’ll use the new API add_pop
in openCyto to build our new automated gating strategy.
We’ll grab a subset of the data for testing.
tbdata_subset = subset(tbdata,`PID`%in%unique(pData(tbdata)$`PID`)[1:2])
We’ll build up our gating template incrementally.
template = add_pop(
tbdata_subset, alias = "Singlets", pop = "Singlets", parent = "root", dims = "FSC-A,FSC-H",gating_method = "singletGate",gating_args =
"wider_gate=TRUE,subsample_pct=0.1"
)
Plotting it, we see it looks reasonable.
ggcyto(tbdata_subset,mapping = aes(x = `FSC-A`,y = `FSC-H`)) + geom_hex(bins =
50) + geom_gate("Singlets") + xlim(c(0,2e5))
Next we add CD14- cells. Lets’ see what this dimension looks like and decide on a good gating method.
ggcyto(tbdata_subset,mapping = aes(x = "CD14"),subset = "Singlets") + geom_histogram(binwidth =
100) + xlim(c(-100,4096)) + facet_wrap( ~ PID,scales = "free")
Mindensity should be able to handle this easily.
template = rbind(
template,add_pop(
tbdata_subset,alias = "CD14n", "CD14-", parent = "Singlets", dims = "CD14", gating_method =
"mindensity",groupBy = "PID",collapseDataForGating = TRUE
)
)
ggcyto(tbdata_subset,mapping = aes(x = "CD14"),subset = "Singlets") + geom_histogram(binwidth =
100) + xlim(c(-100,4096)) + facet_wrap( ~ PID,scales = "free") + geom_gate("CD14n")
template = template[1:2,]
template = rbind(
template,add_pop(
tbdata_subset,alias = "nonDebris",pop = "nonDebris+",parent = "CD14n",dims = "FSC-A",gating_method =
"boundary",collapseDataForGating = FALSE,gating_args = "min=40000,max=2.5e5"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "FSC-A",y = "SSC-A"),subset = "CD14n") + geom_hex(bins = 100) +
geom_gate()
template = rbind(
template, add_pop(
tbdata_subset,alias = "Lymphocytes",pop = "Lymphocytes+",parent = "nonDebris",dims = "FSC-A,SSC-A",gating_method =
"flowClust.2d",preprocessing_method = "prior_flowClust",preprocessing_args="K=3",collapseDataForGating = FALSE,gating_args =
"quantile=0.95,target=c(80000,50000),K=3"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "FSC-A",y = "SSC-A"),subset = "nonDebris") +
geom_hex(bins = 50) + geom_gate("Lymphocytes")
template = rbind(
template, add_pop(
tbdata_subset,alias = "Live",pop = "Live+",parent = "Lymphocytes",dims = "<Am Cyan-A>",gating_method = "boundary",gating_args = "min=0,max=2000",collapseDataForGating = FALSE
)
)
ggcyto(tbdata_subset,mapping = aes(x = "AViD",y="SSC-A"),subset = "Lymphocytes") +
geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") +
geom_gate("Live")
template = rbind(
template, add_pop(
tbdata_subset,alias = "CD3",pop = "CD3+",parent = "Live",dims = "CD3",gating_method = "mindensity"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "CD3",y = "FSC-A"),subset = "Live") +
geom_hex(bins=100) + ggcyto_par_set(limits = "instrument") + geom_gate("CD3")
template = rbind(
template, add_pop(
tbdata_subset,alias = "*",pop = "CD4+/-CD8+/-",dims = "CD4,CD8",gating_method =
"mindensity2",gating_args="gate_range=c(500,4000)",parent = "CD3")
)
ggcyto(tbdata_subset,mapping = aes(x = "CD4",y = "CD8"),subset = "CD3") +
geom_hex(bins=100) + ggcyto_par_set(limits = "data") + geom_gate()+xlim(0,3000)+ylim(0,4000)
In order to make inferences on the proportions of cytokine positive cells in the stimulated and non-stimulated samples within each subject, we should, ideally, apply the same gate to both samples within each subject. We can do this using the groupBy
column in the template, to specify the Patient ID (PID
) as the grouping variable, and set collapseDataForGating
to TRUE
in order to use both samples together to derive the gate.
The cytokines of interest are:
kable(pData(parameters(getData(tbdata_subset[[1]])))[,1:2],row.names=FALSE)
name | desc |
---|---|
Time | NA |
FSC-A | NA |
FSC-H | NA |
SSC-A | NA |
IFNg | |
AViD | |
CD14 | |
IL2 | |
CD3 | |
CD154 | |
PE Cy55-A | NA |
IL22 | |
IL4 | |
IL17a | |
CD4 | |
TNFa | |
CD8 |
AViD, CD14, CD3, CD4 and CD8 are phenotypic markers. We want to gate IL2, IFNg, CD154, IL22, IL4, IL17a and TNFa marginally for CD4+/CD8- T cells.
template = rbind(
template, add_pop(
tbdata_subset,alias = "TNF",pop = "TNF+",dims = "TNFa",gating_method = "tailgate",gating_args = "auto_tol=TRUE",parent =
"CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "TNFa",y = "SSC-A"),subset = "CD4+CD8-") +
geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
facet_grid(PID ~ Peptide,scale = "free") + geom_gate()
template = rbind(
template, add_pop(
tbdata_subset,alias = "IL2",pop = "IL2+",dims = "IL2",gating_method = "tailgate",gating_args = "auto_tol=TRUE",parent =
"CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "IL2",y = "SSC-A"),subset = "CD4+CD8-") +
geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
facet_grid(PID ~ Peptide,scale = "free") + geom_gate()
template = rbind(
template, add_pop(
tbdata_subset,alias = "IL22",pop = "IL22+",dims = "IL22",gating_method =
"tailgate",gating_args = "auto_tol=TRUE,adjust=2",parent = "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "IL22",y = "SSC-A"),subset = "CD4+CD8-") +
geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
facet_grid(PID ~ Peptide,scale = "free") + geom_gate()
template = rbind(
template, add_pop(
tbdata_subset,alias = "IL4",pop = "IL4+",dims = "IL4",gating_method = "tailgate",gating_args = "auto_tol=TRUE",parent =
"CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "IL4",y = "SSC-A"),subset = "CD4+CD8-") +
geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
facet_grid(PID ~ Peptide,scale = "free") + geom_gate()
template = rbind(
template, add_pop(
tbdata_subset,alias = "CD154",pop = "CD154+",dims = "CD154",gating_method =
"tailgate",gating_args = "auto_tol=TRUE",parent = "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "CD154",y = "SSC-A"),subset = "CD4+CD8-") +
geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
facet_grid(PID ~ Peptide,scale = "free") + geom_gate()
template = rbind(
template, add_pop(
tbdata_subset,alias = "IFNg",pop = "IFNg+",dims = "IFNg",gating_method =
"tailgate",gating_args = "auto_tol=TRUE",parent = "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "IFNg",y = "SSC-A"),subset = "CD4+CD8-") +
geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
facet_grid(PID ~ Peptide,scale = "free") + geom_gate()
template = rbind(
template, add_pop(
tbdata_subset,alias = "IL17a",pop = "IL17a+",dims = "IL17a",gating_method =
"tailgate",gating_args = "auto_tol=TRUE",parent = "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
)
)
ggcyto(tbdata_subset,mapping = aes(x = "IL17a",y = "SSC-A"),subset = "CD4+CD8-") +
geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
facet_grid(PID ~ Peptide,scale = "free") + geom_gate()
Now that we’ve built up our gating template we can apply it to the remainder of our data set.
We write out our gating template.
tmp = tempfile(fileext=".csv")
write.csv(template,tmp,row.names=FALSE)
gt = gatingTemplate(tmp)
data(tbdata)
Rm("Singlets",tbdata) #Remove any gates attached to the GatingSet
We do the gating in parallel.
library(parallel)
#clean up our gating set
gating(gt,tbdata,parallel_type="multicore",mc.cores=7)
We can examine the population statistics for the different cell populations. These should be relatively stable across samples. If there are some outliers, that is cause for concern and suggests we need to go back and look at those samples.
Barplots of the coefficient of variation of each cell population, faceted by its parent population, and the peptide stimulation.
stats = getPopStats(tbdata) #extract stats
stats[,prop := Count/ParentCount] #compute the cell proportions
## name Population Parent Count ParentCount prop
## 1: 353385.fcs Singlets root 197002 216918 0.908186504
## 2: 353385.fcs CD14n Singlets 191215 197002 0.970624664
## 3: 353385.fcs nonDebris CD14n 90583 191215 0.473723296
## 4: 353385.fcs Lymphocytes nonDebris 78449 90583 0.866045505
## 5: 353385.fcs Live Lymphocytes 76475 78449 0.974837155
## ---
## 604: 557788.fcs IL22 CD4+CD8- 12319 25232 0.488229233
## 605: 557788.fcs IL2 CD4+CD8- 61 25232 0.002417565
## 606: 557788.fcs TNF CD4+CD8- 1115 25232 0.044189918
## 607: 557788.fcs CD4-CD8+ CD3 34955 80621 0.433571898
## 608: 557788.fcs CD4+CD8+ CD3 95 80621 0.001178353
stats = merge(stats,pData(tbdata),by="name")
data.table::setDT(stats)
ggplot(stats[,.(CV = mad(prop) / median(prop)),.(Parent,Peptide,Population)]) +
geom_bar(position = "dodge",stat = "identity") + aes(y = CV,x = Population,fill =
Peptide) + theme(axis.text.x = element_text(angle = 90,hjust = 1)) + facet_wrap( ~
Parent,scales = "free_x")
Proportions of each cell population.
ggplot(stats) + geom_boxplot() + aes(y = prop,x = Population) + geom_jitter() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)) + facet_wrap( ~
Parent,scales = "free")
There are some clear outliers in the CD8 population. We can have a closer look at those samples.
nm = stats[,.(name,rZ = (prop - median(prop)) / mad(prop)),.(Population)][,.(outlier = abs(rZ) >
3),.(name,Population)][outlier == TRUE][order(name)][Population %in% "CD8+"]$name
ggcyto(tbdata[as.character(nm)],mapping = aes(x = "CD8"),subset = "CD3") +
geom_histogram(binwidth = 25) + geom_gate("CD8+")
We see the issue is one sample where the automated gate completely failed to find the CD8+ cells. For some reason they are really poorly resolved in this sample, even in the original manual gating.
We can decide to set a hard threshold at 1750 for the CD8+ cells. We adjust all the gates for the CD8 containing cell populations
for(n in nm){
mygate = getGate(tbdata[[n]],"CD8+") #extract a gates
mygate@min["<PerCP Cy55 Blue-A>"] = 1750 #adjust the min value for CD8
setGate(tbdata[[n]],"CD8+",mygate) #set the gate for the cell population
recompute(tbdata[[n]]) # recompute the statistics.
#again for the others
mygate = getGate(tbdata[[n]],"CD4-CD8+")
mygate@min["<PerCP Cy55 Blue-A>"]=1750
mygate@max["<PerCP Cy55 Blue-A>"]=Inf
setGate(tbdata[[n]],"CD4-CD8+",mygate)
mygate = getGate(tbdata[[n]],"CD4+CD8-")
mygate@max["<PerCP Cy55 Blue-A>"]=1750
setGate(tbdata[[n]],"CD4+CD8-",mygate)
mygate = getGate(tbdata[[n]],"CD4-CD8-")
mygate@max["<PerCP Cy55 Blue-A>"]=1750
setGate(tbdata[[n]],"CD4-CD8-",mygate)
mygate = getGate(tbdata[[n]],"CD4+CD8+")
mygate@min["<PerCP Cy55 Blue-A>"]=1750
setGate(tbdata[[n]],"CD4+CD8+",mygate)
recompute(tbdata[[n]],"CD4-CD8+")
recompute(tbdata[[n]],"CD4+CD8-")
recompute(tbdata[[n]],"CD4-CD8-")
recompute(tbdata[[n]],"CD4+CD8+")
}
plotGate(tbdata[as.character(nm)],c("CD4+CD8-","CD4-CD8+","CD4-CD8-","CD4+CD8+"),
xbin=256,margin=FALSE)
The above at least reflects what we expect from the biology, few double positive T-cells.
We’ll have a look at the proportion of cytokine positive cells in ESAT-6 and DMSO stimulated samples for each subject.
library(scales)
#custom hyperbolic arcsine transformation.
asinh_trans = function (c=800)
{
trans_new(name = "asinh", transform = function(x) asinh(x *
c), inverse = function(x) sinh(x)/c)
}
stats = merge(pData(tbdata),getPopStats(tbdata),by="name")
data.table::setDT(stats)
stats = stats[Parent=="CD4+CD8-"][,prop:=Count/ParentCount]
stats=reshape2::dcast(stats,PID+Population+Parent~Peptide)
ggplot(stats) + geom_boxplot(outlier.colour = NA) + geom_jitter(aes(fill =
Population),position = position_jitterdodge()) + aes(x = Population,y =
pmax(`ESAT-6` - DMSO,0),col = Population) + facet_wrap( ~ Parent,scales =
"free") + theme(axis.text.x = element_text(angle = 90,hjust = 1)) +
scale_y_continuous("Background Corrected Proportion",trans =
"asinh") +
ggtitle("Marginal Background Corrected Proportions\nof Cytokine Producing CD4 Cells") +
scale_x_discrete("Marginal Cytokine")
There are some clear marginal responses in IFNg, IL2, IL17a. We can look at combinatorial subsets of cells using the COMPASS framework, which models antigen-specific responses jointly in all cell subsets.
COMPASS will test whether the ESAT-6 stimulation is greater than the DMSO stimulation for each subject and each cell subset.
It returns a probability of response.
library(COMPASS)
data = getSingleCellExpression(tbdata,nodes = c("IL2","IFNg","CD154","IL17a","IL4","TNF","IL22")) #single-cell expression of seven markers
counts = as.vector(getPopStats(tbdata,subpopulations = "CD4+CD8-",statistic =
"count")[,.(name,Count)]) #parent population cell counts
counts = as.vector(as.matrix(counts[,2,with = FALSE]))
names(counts) = names(data)
meta = pData(tbdata) #metadata
CC = COMPASSContainer(
data = data,counts = counts,meta = meta,individual_id = "PID",sample_id = "name"
)
set.seed(100)
compass_result = COMPASS(CC,treatment = Peptide == "ESAT-6",control = Peptide ==
"DMSO")
p = plot(compass_result,threshold=0.01)
ggplot(cbind(compass_result$data$meta,PFS = FunctionalityScore(compass_result))) +
geom_boxplot(outlier.colour = NA) + geom_jitter(aes(col = known_response)) +
aes(x = known_response,y = PFS) + theme_gray()
The differences between the known_response
variable and the calculated polyfunctionality scores are due to
Hopefully this provides you with a sufficient introduction to some of the new features and new API of openCyto.
Lin, Lin, Greg Finak, Kevin Ushey, Chetan Seshadri, Thomas R Hawn, Nicole Frahm, Thomas J Scriba, et al. 2015. “COMPASS identifies T-cell subsets correlated with clinical outcomes.” Nature Biotechnology, May.