%\VignetteIndexEntry{Introduction to EBImage, an image processing and analysis toolkit for R}
%\VignetteDepends{}
%\VignetteKeywords{image processing, visualization}
%\VignettePackage{EBImage}
\documentclass[10pt,a4paper]{article}
\RequirePackage{amsfonts,amsmath,amstext,amssymb,amscd}
\usepackage{graphicx}
\usepackage{verbatim}
\usepackage{hyperref}
\usepackage{color}
\definecolor{darkblue}{rgb}{0.2,0.0,0.4}
\topmargin -1.5cm
\oddsidemargin -0cm % read Lamport p.163
\evensidemargin -0cm % same as oddsidemargin but for left-hand pages
\textwidth 17cm
\textheight 24.5cm
\newcommand{\lib}[1]{{\mbox{\normalfont\textsf{#1}}}}
\newcommand{\file}[1]{{\mbox{\normalfont\textsf{'#1'}}}}
\newcommand{\R}{{\mbox{\normalfont\textsf{R}}}}
\newcommand{\EBImage}{{\mbox{\normalfont\textsf{EBImage}}}}
\newcommand{\Rfunction}[1]{{\mbox{\normalfont\texttt{#1}}}}
\newcommand{\Robject}[1]{{\mbox{\normalfont\texttt{#1}}}}
\newcommand{\Rpackage}[1]{{\mbox{\normalfont\textsf{#1}}}}
\newcommand{\Rclass}[1]{{\mbox{\normalfont\textit{#1}}}}
\newcommand{\code}[1]{{\mbox{\normalfont\texttt{#1}}}}
\newcommand{\email}[1]{\mbox{\href{mailto:#1}{\textcolor{darkblue}{\normalfont{#1}}}}}
\newcommand{\web}[2]{\mbox{\href{#2}{\textcolor{darkblue}{\normalfont{#1}}}}}
%\usepackage[baseurl={http://www.ebi.ac.uk/~osklyar/EBImage/},pdftitle={EBImage - Image Processing Toolkit For R},pdfauthor={Oleg Sklyar},pdfsubject={EBImage},pdfkeywords={image processing},pagebackref,bookmarks,colorlinks,linkcolor=darkblue,citecolor=darkblue,pagecolor=darkblue,raiselinks,plainpages,pdftex]{hyperref}
\SweaveOpts{keep.source=TRUE,eps=FALSE}

\begin{document}

\begin{figure}
\begin{center}
\scalebox{0.2}{\includegraphics{logo.png}}
\end{center}
\end{figure}

%------------------------------------------------------------
\title{Introduction to \Rpackage{EBImage}, an image processing and analysis toolkit for R}
%------------------------------------------------------------
\author{Oleg Sklyar, Wolfgang Huber\\\email{osklyar@ebi.ac.uk}}

\maketitle
\tableofcontents

%%--------------------------------------------
\section{Introduction}
%%--------------------------------------------

In this manual we demonstrate the \EBImage{} package. In creating this
package, the application that we have in mind is the analysis of sets of
images acquired in cell-based RNAi or compound screens with automated
microscopy readout, and the extraction of numerical descriptors of the
cells in these images. Relations between the descriptor values and
biological phenotypes are sought. Some of the choices we have made in the
design of the package reflect this application. This vignette gives a
general overview of the package functionality; it does not provide a
recipe for the analysis of cell-based RNAi screens.

The package makes use of two external libraries for some of its
functionality, \lib{ImageMagick} and \lib{GTK+}. In particular,
\lib{ImageMagick} is used for image import and export, and we provide an
interface to many of its image processing capabilities. \lib{GTK+}
provides facilities for the display of and user interaction with images.
Reliance on these external (to R and the EBImage package) libraries makes
the installation of \EBImage{} a bit more tricky than that of a
self-contained package. Please see the vignette
\textit{\Rpackage{EBImage} installation HOWTO} for hints on the
installation.
%%--------------------------------------------
\section{Importing and exporting images}
%%--------------------------------------------
%
<<>>=
library("EBImage")
@
%
\EBImage{} provides two functions to read images, \Rfunction{readImage}
and \Rfunction{chooseImage}. They both allow users to specify whether the
result should be in grayscale or RGB mode. The \Rfunction{chooseImage}
function provides a dialogue window that lets you interactively choose an
image from the file system. It is available if the package was compiled
with GTK+ support (this is always the case on Windows):
%
<<eval=FALSE>>=
x = chooseImage(TrueColor)
@
%
For programmatic use, the \Rfunction{readImage} function can read local
files, or files on the internet via the HTTP or anonymous FTP protocols.
Multiple files can be loaded into a single object, an image stack. If the
source image has multiple frames, they will all be read into a stack.
Images and image stacks are stored in objects of class \code{Image}:
%
<<>>=
imgdir = file.path(system.file(package="EBImage"), "images")
fG = dir(imgdir, pattern="_G.tif", full.names=TRUE)
iG = readImage(fG[1], Grayscale)
class(iG)
dim(iG)
fR = dir(imgdir, pattern="_R.tif", full.names=TRUE)
iR = readImage(fR[1])
@
%
The images are 16-bit grayscale TIFFs and were measured at two different
wavelengths corresponding to two different stainings: DNA content
measured in the green channel (suffix G) and a cytoplasmic protein in red
(suffix R).

Images can be read from remote URLs as in the following example:
%
<<eval=FALSE>>=
baseurl = "http://www.ebi.ac.uk/~osklyar/BioC2007/data"
a = readImage(paste(baseurl, c("Gene1_R.tif", "Gene2_R.tif"), sep="/"))
@

The \Rfunction{writeImage} function can be used to write images to files.
%
<<>>=
outdir = tempdir()
writeImage(iR, file.path(outdir, "test.png"))
compression(iR)
writeImage(iR[,,1], file.path(outdir, "test.jpg"), quality=90)
file.info(dir(outdir, full.names=TRUE))[,c(1,5)]
@
%% FIXME:
%% - the default for 'quality' should be 1, not 0.95, and should be part
%%   of the function signature (because this is where users expect to see
%%   the default value of function arguments).
%% - why is compression a slot of the Image class? It does not
%%   belong there. It should be an argument of the writeImage function.

%%------------------------------------------------------
\section{Displaying images and accessing the image data}
%%------------------------------------------------------

The preferred method for displaying objects of class \Rclass{Image} is
the \Rfunction{display} function. The \Rfunction{animate} function cycles
through the frames of an image stack and shows them as an animation.
%
<<eval=FALSE>>=
display(iR)
animate(iR)
@
%
\Rfunction{display} and \Rfunction{animate} create their own windows,
using \lib{GTK+} or \lib{ImageMagick} functions. Sometimes it may also be
useful to use the \Rfunction{image} method for the \Rclass{Image} class.
This method plots the image on the current R device.
%
<<>>=
image(iR[,,1])
@
%
One can access the data in the same way as for a 3-dimensional array,
where the first two dimensions correspond to the $x$- and $y$-directions
and applications can use the third dimension for the $z$-direction, for
different channels, for a time covariate, or indeed for any other
experimental covariate. Alternatively, the size of the third dimension
can be set to 1, which corresponds to a simple 2-dimensional image. The
\Rclass{Image} class contains (inherits from) the base R class
\Rclass{array}, and one can use all the statistical summaries and
plotting functions for arrays.
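For instance, summary statistics can be computed for each frame by
applying ordinary R functions over the third dimension of the array. The
following small illustration (not needed for the analysis below)
summarizes the pixel intensities of the four frames of \code{iR}:
%
<<>>=
## per-frame mean intensity and selected quantiles, obtained by
## treating the Image object as a plain 3-dimensional array
apply(iR, 3, mean)
apply(iR, 3, quantile, probs=c(0.01, 0.5, 0.99))
@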
The class relation can be checked directly:
%
<<>>=
is(iG, "array")
@
<<echo=FALSE>>=
stopifnot(is(iG, "array"))
@
%
For example, for the image stack \code{iR}, which consists of 4 images,
we can draw histograms of the intensities in each image (see
Figure~\ref{figure:histsbefore}).
%
<<hists, fig=TRUE>>=
par(mfrow=c(2,2))
for(i in 1:4)
  hist(iR[,,i], breaks=20, xlim=c(0,1))
@
%
\begin{figure}[tp]
\begin{center}
\includegraphics[width=0.7\textwidth]{AnalysisWithEBImage-hists}
\caption{\label{figure:histsbefore}%
Histograms of intensities of the four images in \file{Gene1\_{}R.tif}.
Note that, in general, histograms may be subject to binning artifacts,
and smooth density estimates (as for example provided by the
\Rfunction{density} function from the \Rpackage{stats} package or the
\Rfunction{multidensity} function from the \Rpackage{geneplotter}
package) are often more appropriate.}
\end{center}
\end{figure}
%
Try additionally the following commands to explore the structure of the
data:
%
<<>>=
dim(iR)
range(iR)
print(iR)
str(iR)
@
%
Try to do the same for \code{iG}, the images of the DNA content.

%%--------------------------------------------
\section{Image processing}
%%--------------------------------------------

One often distinguishes between image processing, operations that map
images into transformed images, and image analysis, operations that
extract features of interest or compute numerical or categorical
summaries. One of the simplest image processing operations is
\emph{normalization} of the image intensities, i.\,e.\ adjusting the
intensities from different images such that they are in the same range,
\begin{equation}
x\mapsto \frac{x-b}{s}.
\end{equation}
This is provided by the \Rfunction{normalize} function. The default
target range is $[0,1]$, which corresponds to $b=\min\{x\}$,
$s=\max\{x\}-\min\{x\}$. The minimum and maximum can be computed either
for each image separately, or simultaneously for all images in a stack.
Let us try the following two normalizations and compare the results.
%
<<>>=
iRnns = normalize(iR, separate=FALSE)
apply(iRnns, 3, range)
iRn = normalize(iR, separate=TRUE)
apply(iRn, 3, range)
iGn = normalize(iG, separate=TRUE)
@
%
More generally, one might also normalize not with respect to the range
but with respect to other distributional properties, such as
inter-quantile ranges or other measures of location and scale.

Functions from \lib{ImageMagick} are provided to manipulate images, such
as \Rfunction{enhance}, \Rfunction{contrast}, \Rfunction{despeckle} and
\Rfunction{denoise}. Try them out on the images in \code{iRn}. Such
manipulations can be useful for visualisation, but their results are
likely not appropriate for quantitative analyses.

Sometimes it is useful to apply non-linear transformations. When
displaying a grayscale image, this can improve contrast in regions of
high interest. For example, a power transformation $x\mapsto x^{\alpha}$
with $0<\alpha<1$ will enhance contrast at lower intensities and suppress
contrast at higher intensities. Transformations are sometimes also used
in model-based data analyses. For example, when the noise component of a
signal has a constant coefficient of variation, rather than a constant
standard deviation, then a logarithmic transformation is often
convenient. Box-Cox type transformations have many uses in regression
analysis.

Let us define several functions for transforming the range $[0,1]$ into
itself.
%
<<>>=
modif1 = function(x) sin((x-1.2)^3)+1
modif2 = function(x, s) (exp(-s*x) - 1) / (exp(-s) - 1)
modif3 = function(x) x^0.5
@
%
The graphs of these functions are shown in Figure~\ref{figure:modif}.
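These functions operate elementwise, so they can be applied to whole
\code{Image} objects in the same way as to numeric vectors. As a quick
check (a small sketch, not needed for the analysis below), we can verify
that \code{modif2} indeed maps the endpoints of $[0,1]$ onto themselves:
%
<<>>=
## modif2 maps 0 to 0 and 1 to 1 for any non-zero steepness s
modif2(c(0, 0.5, 1), s=4)
modif2(c(0, 0.5, 1), s=-4)
@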
%
<<functions, fig=TRUE>>=
x = (0:100)/100
plot(x, x, type="l", xlim=c(0,1), ylim=c(0,1), col="gray", lwd=1, lty=2,
  xlab="source intensity", ylab="target intensity")
lines(x, modif2(x,4), col="#10508B", lwd=2)
text(0.3, 0.83, "modif2(s=4)", col="#10508B")
lines(x, modif2(x,-4), col="#4E82B5", lwd=2)
text(0.7, 0.16, "modif2(s=-4)", col="#4E82B5")
lines(x, modif3(x), col="#24A072", lwd=2)
text(0.2, 0.37, "modif3 (sqrt)", col="#24A072")
lines(x, x^1.5, col="#421C80", lwd=2)
text(0.52, 0.3, "x^1.5", col="#421C80")
lines(x, modif1(x), col="red", lwd=2)
text(0.52, 0.6, "modif1", col="red")
@

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{AnalysisWithEBImage-functions}
\caption{\label{figure:modif}Non-linear transformations of the range
$[0,1]$ onto itself.}
\end{center}
\end{figure}

Transformations of image size and orientation can be performed using the
functions \Rfunction{resize} and \Rfunction{rotate}. The following
increases the size of the image by a factor of 1.3 or rotates it by 15
degrees:
%
<<>>=
a1 = resize(iRn, dim(iRn)[1]*1.3)
a2 = rotate(iRn, 15)
@
<<eval=FALSE>>=
display(a1)
display(a2)
@

% --- I don't like this: ----
%We will use \code{modif1} to improve the contrast and proceed as
%follows. The modification used (if any) is problem specific and is
%determined by the quality of original images:
%
% <<>>=
% iGn = modif1(iGn)
% iRn = modif1(iRn)
% @
%
% The histogrammes after the normalization and the above
% transformation are given in Figure~\ref{figure:histsafter}:
% <<>>=
% split.screen(c(2,2))
% for ( i in 1:4 ) {
%   screen(i)
%   hist(iRn[,,i], xlim=c(0,1))
% }
% close.screen(all=TRUE)
% @
% \begin{figure}[htbp]
%   \begin{center}
%     \includegraphics[width=0.7\textwidth]{03.pdf}
%     \caption{\label{figure:histsafter} Histogrammes of
%       individual images in \file{Gene1\_{}R.tif} after the normalization.}
%   \end{center}
% \end{figure}

%%--------------------------------------------
\section{Colour modes}
%%--------------------------------------------

For visual representations, images can be converted between grayscale and
true colour modes. A grayscale image can also be converted into one of
the RGB channels if required, and channels can be added together as in
the following example.
%
<<>>=
iRG = channel(iRn, "asred") + channel(iGn, "asgreen")
@
%
<<eval=FALSE>>=
display(iRG)
display(channel(iRG, "gray"))
display(channel(iRG, "asred"))
@
<<>>=
writeImage(iRG[,,1], "iRG.png")
@
%
The image \Robject{iRG} is shown in Figure~\ref{figure:screenshot}.
%
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.49\textwidth]{iRG.png}
\caption{\label{figure:screenshot}The image \Robject{iRG}, a
false-colour representation of the data in the two channels
\Robject{iRn} and \Robject{iGn}.}
\end{center}
\end{figure}
%
The \Rfunction{channel} function can also be used to convert vectors
containing colour data from one format to another.
Grayscale to RGB integers:
%
<<>>=
ch = channel(c(0.2, 0.5), "rgb")
ch
sprintf("%X", ch)
@

\noindent Grayscale to X11 hexadecimal color strings:
<<>>=
channel(c(0.2, 0.5), "x11")
@

\noindent Color strings to RGB:
<<>>=
channel(c("red","green","#0000FF"), "rgb")
@

\noindent 24-bit RGB integers to grayscale:
<<>>=
channel(as.integer(c(0x0000f0, 0x00f000, 0xf00000, 0x808080)), "gray")
@

%%-------------------------------------------------------
\section{Creating images and further data manipulation}
%%-------------------------------------------------------

Images can be created either using the default constructor
\Rfunction{new} for class \code{Image} or using a wrapper function,
\Rfunction{Image}:
%
<<>>=
a = Image(runif(200*100), c(200,100))
@
<<>>=
writeImage(a, "AnalysisWithEBImage-random.png")
@
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{AnalysisWithEBImage-random}
\caption{\label{figure:imx} An image of uniform random numbers.}
\end{center}
\end{figure}

Simple data manipulations can be performed by subsetting. For example,
the following lines represent simple thresholding:
<<>>=
a = iRn
a[ a > 0.6 ] = 1.0
a[ a <= 0.6] = 0.0
@
\noindent On a grayscale image the values of $0$ and $1$ in the above
example create a black-and-white mask. If we now want to mark the
background in blue, say, and the foreground in red, this can be achieved
as follows:
<<>>=
b = channel(a, "rgb")
b[ a >= 0.1] = channel("red", "rgb")
b[ a < 0.1 ] = channel("#114D90", "rgb")
@
<<eval=FALSE>>=
display(b)
@
<<>>=
writeImage(b[,,1], "06.png")
rm(a,b,x)
gc()
@
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{06.png}
\caption{\label{figure:marked} Colour-marked binary mask.}
\end{center}
\end{figure}

%%--------------------------------------------
\section{Image segmentation and image analysis}
%%--------------------------------------------

The purpose of segmentation is to mask the objects of interest from the
background prior to identifying them. The quality of the segmentation
will generally define the quality of the subsequent object indexing and
feature extraction. We need something better than the mask in
Figure~\ref{figure:marked}.

For further object indexing we will make use of the fact that we have two
images corresponding to the same field of view under the microscope --
one of the nuclear staining and one of the cytoplasmic protein. Assuming
that every cell has a nucleus, we will use the indexed nuclei (after
segmentation, indexing and feature extraction) to index the cells.
Therefore, we start with segmenting the nuclei in the normalized images
\code{iGn}.

The function \Rfunction{thresh} provides an implementation of an adaptive
threshold filter that takes into account inequalities in background
intensity across the image. For \code{iGn} the segmented image can be
obtained as follows:
<<>>=
mask = thresh(iGn, 15, 15, 0.002)
@
The parameters \Robject{w}, \Robject{h} of the function are related to
the size of the objects we expect to find in the image: objects of a
different size would require adjustment of these parameters. The
\Robject{offset} is determined by the local intensity differences. Try
using different parameters and compare the segmentations. The quality of
segmentation is vital for good object indexing and feature extraction,
and it is therefore worth spending time on tuning the parameters. For
comparable results, images across the experiment should be segmented
using the same parameter set.
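One simple way to enforce this is to keep the parameters in one place and
reuse them for every image, for instance in a small wrapper function. The
chunk below is only a sketch: the helper name \code{segmentNuclei} and
its default values are ours, not part of \EBImage{}.
<<eval=FALSE>>=
## hypothetical helper: one fixed parameter set, reused for every
## image stack in the experiment
segmentNuclei = function(img, w=15, h=15, offset=0.002)
  thresh(img, w, h, offset)
mask = segmentNuclei(iGn)
@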
Using one fixed parameter set might lead to artifacts in the segmentation
of some images, but it ensures that the same types of cells look similar
across different images. The result of the above segmentation is shown in
Figure~\ref{figure:nt1}.
<<>>=
writeImage(mask[,,1], "07.png")
@
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{07.png}
\caption{\label{figure:nt1} Preliminary nuclei segmentation.}
\end{center}
\end{figure}

Some further smoothing of the mask is necessary. A useful set of
instruments for this is provided by \emph{mathematical morphology},
implemented in the morphological operators \Rfunction{dilate},
\Rfunction{erode}, \Rfunction{opening} and \Rfunction{closing}:
<<>>=
mk3 = morphKern(3)
mk5 = morphKern(5)
mask = dilate(erode(closing(mask, mk5), mk3), mk5)
@
<<>>=
writeImage(mask[,,1], "08.png")
@
\noindent Here, several operators were used sequentially. You can observe
the results of each of these operators separately by looking at the
intermediate images. You can also try different kernels, i.\,e.\
different parameters for the function \Rfunction{morphKern}. The current
result is shown in Figure~\ref{figure:nt2}.
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{08.png}
\caption{\label{figure:nt2} Nuclei segmentation after smoothing and noise
removal.}
\end{center}
\end{figure}

As the next step, one needs to index the regions in the segmented image
that correspond to the different objects. A classic algorithm for this is
computing the distance map transform followed by the \code{watershed}
transform (see Figure~\ref{figure:wsn1}):
<<>>=
sG = watershed( distmap(mask), 1.5, 1)
@
<<>>=
writeImage(normalize(sG[,,1]), "09.png")
@
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{09.png}
\caption{\label{figure:wsn1} Nuclei segmentation by watershed (before
artifact removal).}
\end{center}
\end{figure}

Finally, when we are happy with the result of the watershed transform, we
can remove nuclei that are too small, too dark, or fall on the edge of
the image (see Figure~\ref{figure:wsn2}):
<<>>=
ft = hullFeatures(sG)   ## need these for edge and perimeter
mf = moments(sG, iGn)   ## need these for intensity and size
for ( i in seq_along(ft) ) ft[[i]] = cbind(ft[[i]], mf[[i]])
sG = rmObjects(sG, lapply(ft, function(x)
  which(x[,"h.s"] < 150 | x[,"h.s"] > 10000 | x[,"int"] < 30 |
        0.4 * x[,"h.p"] < x[,"h.edge"] ) ))
@
<<>>=
writeImage(normalize(sG[,,1]), "10.png")
@
\noindent Here \code{h.s} and \code{h.p} stand for the hull size and
perimeter, \code{h.edge} for the number of pixels at the image edge and
\code{int} for the intensity of the region (as returned by
\Rfunction{hullFeatures} and \Rfunction{moments}). Investigate the
structure of \code{ft} and \code{mf} and explain what kind of objects
were removed.

What we have finally obtained is an \code{IndexedImage} for the nuclei,
where each nucleus is given an index from $1$ to \code{max(sG)}. One can
now directly use functions like \Rfunction{getFeatures} or
\Rfunction{moments} to obtain numerical descriptors of each nucleus.
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{10.png}
\caption{\label{figure:wsn2} Nuclei segmentation by watershed (after
artifact removal).}
\end{center}
\end{figure}

In principle, the same distance-map/watershed algorithm could be used to
segment the cells; however, we often find that neighbouring cells touch
each other, which leads to segmentation errors.
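To see why, the same recipe could be tried on the cytoplasmic channel
directly. The following sketch (not evaluated, and not part of the
analysis below) mirrors the steps used above for the nuclei; touching
cells then tend to be merged or split incorrectly:
<<eval=FALSE>>=
## naive approach: threshold the cytoplasm channel and split it by
## watershed, without using the nuclei as seeds
cmask = thresh(iRn, 25, 25, 0.005)
cmask = dilate(erode(closing(cmask, mk5), mk3), mk5)
cellsWS = watershed(distmap(cmask), 1.5, 1)
@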
Instead, we can use the already identified nuclei as seed points to
detect the corresponding cells -- assuming that each cell has exactly one
nucleus. This method falls short of detecting multi-nuclear cells, but it
improves the quality of detection for all other cells tremendously.

We start similarly to the nuclei segmentation; however, instead of using
\code{watershed} we use \code{propagate}, supplying it with an
\code{IndexedImage} of the seed points (nuclei). The function implements
an elegant algorithm that produces a Voronoi segmentation using a metric
that combines the Euclidean distance and the intensity differences
between different pixels in the image:
<<>>=
mask = thresh(blur(iRn,4,1.5), 25, 25, 0.005)
mask = erode( erode( dilate(mask,mk5), mk5), mk3 )
sR = propagate( iRn, sG, mask, 1e-5, 1.6)
@
<<>>=
writeImage(normalize(sR[,,1]), "11.png")
@
\noindent A weighting factor is used in \Rfunction{propagate} to give
more weight either to the Euclidean distance or to the intensity-driven
one. We use a very low value of $10^{-5}$, essentially minimizing the
effect of the Euclidean distance. Also note that we used the
\Rfunction{blur} filter to obtain the initial mask. In the case of the
cells we already have seed points and know where the cells are; therefore
the mask we use is larger and smoother, in order to accommodate the small
local variations in the image of every individual cell. The result is
shown in Figure~\ref{figure:cells}.
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{11.png}
\caption{\label{figure:cells} Cell segmentation by the
\Rfunction{propagate} function (before artifact removal).}
\end{center}
\end{figure}

Again, some artifacts need to be removed. In all subsequent computations,
the number of objects in an indexed image (as obtained from
\Rfunction{watershed} or \Rfunction{propagate}) is determined by finding
the maximum value. Assume that this value is $N$ for \code{sR}. If the
image contains pixels indexed with $N$, but is missing pixels with some
other index smaller than $N$, the corresponding objects will be reported
with all-zero descriptors, in particular with size $0$. $N$ can be
smaller than the original number of nuclei, as it can happen that no cell
is identified for some nuclei. There can be many reasons for this: the
cell may have been masked out, or it was too small or too dark, etc. In
order to preserve the 1-to-1 match of nuclei to cells, \code{max(sG)}
must be equal to $N$, so we mask out all nuclei with indexes larger than
$N$:
<<>>=
for ( i in 1:dim(sR)[3] ) {
  x = sG[,,i]
  x[ x > max(sR[,,i]) ] = 0.0
  sG[,,i] = x
}
@
Now that we have ensured the 1-to-1 match of nuclei to cells, we can
remove cells that are too small or too large to be plausible, are on the
edge of the image, are too dark, etc. We also remove the corresponding
nuclei (the result is given in Figure~\ref{figure:wsn3}):
<<>>=
ft = hullFeatures(sR)
mf = moments(sR, iRn)
for ( i in seq_along(ft) ) ft[[i]] = cbind(ft[[i]], mf[[i]])
index = lapply(ft, function(x) which( x[,"h.s"] < 150 |
  x[,"h.s"] > 15000 | x[,"int"]/x[,"h.s"] < 0.1 |
  0.3 * x[,"h.p"] < x[,"h.edge"] ))
sR = rmObjects(sR, index)
sG = rmObjects(sG, index)
@
\noindent See above for the meaning of the column names in \code{x}.
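As a quick sanity check (a small sketch, assuming the objects defined
above and not required for the remaining steps), one can verify that
cells and nuclei still correspond frame by frame:
<<eval=FALSE>>=
## after removing the same objects from both indexed images, every
## frame should contain matching sets of cell and nucleus indexes
for (i in 1:dim(sR)[3])
  stopifnot(max(sG[,,i]) == max(sR[,,i]))
@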
<<>>=
writeImage(normalize(sR[,,1]), "12.png")
@
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{12.png}
\caption{\label{figure:wsn3} Cell segmentation by propagate (after
artifact removal).}
\end{center}
\end{figure}

Finally, having the indexed images for cells and nuclei, the full set of
descriptors can be extracted using the \Rfunction{getFeatures} function:
<<>>=
sG = getFeatures(sG, iGn)
sR = getFeatures(sR, iRn)
nucl  = do.call("rbind", features(sG))
cells = do.call("rbind", features(sR))
stopifnot(identical(dim(nucl), dim(cells)))
@
The resulting matrices have \Sexpr{nrow(nucl)} rows (one for each
cell/nucleus) and \Sexpr{ncol(nucl)} columns (one for each object
descriptor). You can now try out the following visualisations, the first
of which is shown in Figure~\ref{figure:done}:
<<>>=
rgb = paintObjects(sR, iRG)
rgb = paintObjects(sG, rgb)
ct = tile(stackObjects(sR, iRn))
nt = tile(stackObjects(sG, iGn))
@
<<>>=
writeImage(rgb[,,1], "13.png")
@
The painted image \code{rgb} is shown in Figure~\ref{figure:done};
\code{ct} and \code{nt} contain tiled images of the individual cells and
nuclei, as produced by \Rfunction{stackObjects} and \Rfunction{tile}.
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=0.5\textwidth]{13.png}
\caption{\label{figure:done} Results of detection.}
\end{center}
\end{figure}

\end{document}