Cytometry data with ggCyto
The ggCyto package
The ggCyto package was developed at the RGLab by Mike Jiang. It is new and under active development.
Here we aim to demonstrate some of its functionality.
By overloading ggplot’s fortify method, we make cytometry data fully compatible with ggplot.
It usually doesn’t make sense to visualize hundreds of FCS samples, so we’ll subset the data for visualization, restricting ourselves to the first two subjects. We can use subset on the GatingSet object to subset by pData variables. This is standard Bioconductor behavior.
# Subset the data for a demo of visualization.
ptids <- unique(pData(tbdata)[["PID"]])[1:2]
tbdata <- subset(tbdata, `PID` %in% ptids)
Rm("CD4",tbdata)
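The subsetting logic above can be tried without any cytometry data. The following is a minimal sketch in base R, assuming a toy data frame `pd` (hypothetical, with made-up values) standing in for `pData(tbdata)`; it only illustrates how the first two subject IDs are picked and how `%in%` selects the matching samples.

```r
# Toy stand-in for pData(tbdata); the values are made up for illustration.
pd <- data.frame(
  name = c("a.fcs", "b.fcs", "c.fcs", "d.fcs", "e.fcs", "f.fcs"),
  PID  = c("P1", "P1", "P2", "P2", "P3", "P3"),
  Stim = c("DMSO", "ESAT-6", "DMSO", "ESAT-6", "DMSO", "ESAT-6"),
  stringsAsFactors = FALSE
)

# First two distinct subject IDs, in order of appearance.
ptids <- unique(pd[["PID"]])[1:2]

# Keep only the samples (rows) that belong to those subjects.
pd_sub <- pd[pd[["PID"]] %in% ptids, ]
pd_sub
```

On a GatingSet, `subset` applies the same kind of logical condition to the sample annotation and keeps the matching samples.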
Furthermore, we’ll focus on the CD3+ cell subsets for this demonstration. These are extracted into a flowSet.
# extract the CD3 population
fs <- getData(tbdata, "CD3")
The simplest way to visualize FCM data is via one-dimensional histograms or density plots. This is supported using the standard ggplot2 geom_xxxx interface.
Here we specify that we want a histogram, and we map the aesthetic x to the variable CD4, which corresponds to the dimension/marker we want to plot.
ggCyto automatically facets by the name variable, which usually represents individual FCS files.
p <- ggcyto(fs, aes(x = CD4))
p1 <- p + geom_histogram(bins = 60)
p1
ggCyto will show you the full range of the data, which is often more than the instrument range. We can restrict the range to the instrument range using the limits argument of ggcyto_par_set.
Valid values are data and instrument.
myPars <- ggcyto_par_set(limits = "instrument")
p1 + myPars
We can print the default parameter settings using ggcyto_par_default.
# print the default settings
ggcyto_par_default()
## $limits
## [1] "data"
##
## $facet
## facet_wrap(name)
##
## $hex_fill
## continuous_scale(aesthetics = "fill", scale_name = "gradientn",
## palette = gradient_n_pal(colours, values, space), na.value = na.value,
## trans = "sqrt", guide = guide)
##
## $lab
## $labels
## [1] "both"
##
## attr(,"class")
## [1] "labs_cyto"
##
## attr(,"class")
## [1] "ggcyto_par"
Of course, other geometries are supported. geom_density will generate a density plot rather than a histogram.
# a single filled density layer; an unfilled one underneath would be hidden
p <- p + geom_density(fill = "black") + myPars
p
As you saw, the default faceting uses the name variable. However, any variable defined in the pData slot of the flowSet is valid.
kable(pData(fs))
| | Peptide | Stim | PID | EXPERIMENT NAME | name | known_response |
|---|---|---|---|---|---|---|
| 353385.fcs | DMSO | General | 01-0917 | 130517_TB-ICS_ACS_GP | 353385.fcs | non-responder |
| 353387.fcs | ESAT-6 | General | 01-0917 | 130517_TB-ICS_ACS_GP | 353387.fcs | non-responder |
| 353421.fcs | DMSO | General | 01-0996 | 130517_TB-ICS_ACS_GP | 353421.fcs | responder |
| 353423.fcs | ESAT-6 | General | 01-0996 | 130517_TB-ICS_ACS_GP | 353423.fcs | responder |
Here we facet by Peptide stimulation and known_response (which comes from previous analysis).
# change faceting (default is facet_wrap(~name))
p + facet_grid(known_response ~ Peptide)
The typical view of FCM data is a two-dimensional dot plot. Hexagonal binning is a popular and rapid way to view the data.
Again, axis limits need to be specified, since by default ggcyto will present all the data, including outliers that have unusually large positive or negative values.
# 2d hexbin
p <- ggcyto(fs, aes(x = CD4, y = CD8)) + geom_hex(bins = 60) + ylim(c(-100,4e3)) + xlim(c(-100,3e3))
p
The default colour scale can be changed using scale_fill_gradient or scale_fill_gradientn.
For example, a ColorBrewer palette (PiYG) with a square root transform of the counts.
library(RColorBrewer) # provides brewer.pal()
p + scale_fill_gradientn(colours = brewer.pal(n = 8, name = "PiYG"), trans = "sqrt")
Or grayscale.
p + scale_fill_gradient(trans = "sqrt", low = "gray", high = "black")
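To see why a square root transform is useful for bin counts, the sketch below compares where a few counts fall on a colour scale normalised to the range 0 to 1, with and without the transform. The counts are hypothetical, and the rescaling is only a rough approximation of what a continuous fill scale does internally, not ggplot2's actual code.

```r
# Hypothetical hexbin counts spanning three orders of magnitude.
counts <- c(1, 10, 100, 1000)

# Position of each count on a 0-1 colour scale, linear vs square root.
linear_pos <- (counts - min(counts)) / (max(counts) - min(counts))
sqrt_pos   <- (sqrt(counts) - sqrt(min(counts))) /
              (sqrt(max(counts)) - sqrt(min(counts)))

round(data.frame(counts, linear_pos, sqrt_pos), 3)
```

With the linear scale, the low and intermediate counts are squeezed into the first tenth of the colour range; the square root scale spreads them out, so sparse regions of the plot remain distinguishable.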
geom_density2d behaves as expected.
ggcyto(fs, aes(x = CD4, y = CD8))+ geom_hex(bins = 60)+geom_density2d(colour = "black")+ylim(c(-100,4e3)) + xlim(c(-100,3e3))
It’s possible to plot gates on top of the data.
One way to do so is to extract the gate from the GatingSet and add it explicitly.
# add geom_gate layer
p <- ggcyto(fs, aes(x = CD4, y = CD8)) + geom_hex(bins = 60) + ylim(c(-100,4e3)) + xlim(c(-100,3e3))
g <- getGate(tbdata, "CD4+")
p <- p + geom_gate(g)
p
Overlay statistics for cell populations.
# add geom_stats
p + geom_stats()
We can make the same plots using a GatingSet object rather than a flowSet. Again, dimensions are mapped to aesthetics using marker names.
# use a customized range to override the default data limits
myPars <- ggcyto_par_set(limits = list(y = c(-100,4e3), x = c(-100,3e3)))
p <- ggcyto(tbdata, aes(x = CD4, y = CD8), subset = "CD3")
p <- p + geom_hex(bins = 64) + myPars
p
To label the axes with marker names only, rather than both channel and marker names, use labs_cyto.
#only display marker on axis
p <- p + labs_cyto("marker")
p
When plotting gates, we don’t need to extract them explicitly as they’re part of the object.
One gate.
# add gate
p + geom_gate("CD4+CD8-")
Two gates.
# add two gates
p <- p + geom_gate(c("CD4+CD8-","CD4-CD8-"))
p
Overlay population statistics.
p + geom_stats()
Overlay population statistics for just one population.
# add stats just for one specific gate
p + geom_stats("CD4+CD8-")
Change the background color, style, and report the count rather than the percentage.
# change stats type, background color and position
p + geom_stats("CD4+CD8-", type = "count", size = 6, color = "white", fill = "black", adjust = 0.3)
As you can see, there is a great deal of flexibility in using the ggplot2 interface to interact with FCM plots.
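The difference between the percentage and the count reported by geom_stats can be made concrete with a small base R sketch. The events and the rectangular gate boundaries below are made up, and the sketch assumes the percentage is taken relative to the plotted parent population; it is not how ggCyto computes the statistics internally.

```r
# Made-up events: one row per cell, two fluorescence dimensions.
events <- data.frame(
  CD4 = c(2500, 2200, 300, 150, 2700, 100, 2400, 250),
  CD8 = c(200, 150, 2600, 100, 300, 2900, 100, 3000)
)

# A hypothetical rectangular "CD4+CD8-" gate: high CD4, low CD8.
in_gate <- events$CD4 > 1000 & events$CD8 < 1000

n_gate <- sum(in_gate)                 # analogous to type = "count"
pct    <- 100 * n_gate / nrow(events)  # percentage of the parent population

c(count = n_gate, percent = pct)
```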
Say you want to plot the CD4 and CD8 cell populations, but don’t necessarily know the parent population.
To do this with ggCyto would look like:
# 'subset' is omitted
p <- ggcyto(tbdata, aes(x = CD4, y = CD8)) + geom_hex(bins = 64) + myPars + geom_gate(c("CD4+CD8-", "CD4-CD8-"))
p
We define the dimensions and the gates. There’s no need to specify the parent population. ggCyto will subset the parent population and plot the relevant events.
The subset argument allows us to explicitly subset a parent population. When subset is specified, ggCyto plots all child populations.
Rm("CD8+",tbdata)
Rm("CD4+",tbdata)
p <- ggcyto(tbdata, aes(x = CD4, y = CD8), subset = "CD3") + geom_hex(bins = 64) + geom_gate() + geom_stats() + myPars
p
By default, ggCyto plots the data in the transformed space (if it’s been transformed). For FCM data processed by FlowJo, this is in [0,4096], or so-called channel space.
Because we store the data transformation, we can transform the axes and show the raw fluorescence intensities on the x and y axes using axis_x_inverse_trans and axis_y_inverse_trans.
p + axis_x_inverse_trans() + axis_y_inverse_trans()
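The idea behind the inverse-transformed axes can be sketched in base R: the data stay in channel space, and only the axis labels are mapped back to raw intensities. The transform below is a hypothetical four-decade log scale chosen for illustration; `to_channel` and `from_channel` are made-up names, and this is an assumption, not the transformation that FlowJo or ggCyto actually stores.

```r
# Hypothetical 4-decade log transform onto a 0-4096 channel scale.
# Illustrative assumption only; the real stored transform may differ.
to_channel   <- function(x)  4096 / 4 * log10(x)
from_channel <- function(ch) 10^(ch * 4 / 4096)

# Axis breaks chosen in channel space, where the data are plotted ...
breaks_channel <- c(0, 1024, 2048, 3072, 4096)

# ... and the raw-intensity labels an inverse-transformed axis would show.
labels_raw <- from_channel(breaks_channel)
data.frame(breaks_channel, labels_raw)
```

The positions of the events on the plot do not change; only the tick labels are computed through the inverse function.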
ggcyto object
We have defined a ggcyto object that delays transforming the data until it is plotted. This makes things a little faster, as we don’t have to do any melting or reshaping of the underlying data until we need it.
The ggcyto object is entirely ggplot2 compatible in terms of adding layers and parameters.
class(p)
## [1] "ggcyto_GatingSet" "ggcyto_flowSet" "ggcyto"
## [4] "gg" "ggplot"
class(p$data)
## [1] "GatingSet"
## attr(,"package")
## [1] "flowWorkspace"
You can use as.ggplot to return a regular ggplot object.
# To return a regular ggplot object
p <- as.ggplot(p)
class(p)
class(p$data) # it is fortified now
ggplot directly on flowSet objects:
Please report bugs or unexpected behaviour by opening an issue on our GitHub page: http://github.com/RGLab/ggcyto