\section{Convert data from plate order to multi-dimensional SGI-array}\label{sec:convert}

\subsection{Preliminaries}
Load the package \Biocexptpkg{HD2013SGI}.
<<>>=
library("HD2013SGI")
@
%
Load the screening data into the workspace.
<<>>=
data("featuresPerWell", package="HD2013SGI")
@
%
After image segmentation and feature extraction, a feature vector is available for each image field. The feature data are represented as a two-dimensional table with dimensions \Sexpr{nrow(featuresPerWell$data)} image fields $\times$ \Sexpr{ncol(featuresPerWell$data)} phenotypic features. There are \Sexpr{length(unique(featuresPerWell$Anno$plate))} plates in the screen. On those plates, \Robject{NROW} rows and \Robject{NCOL} columns were used, and \Robject{NFIELD} fields were imaged per well:
<<>>=
NROW = 15
NCOL = 23
NFIELD = 4
@

In this section, these data are rearranged into a 6-dimensional array with one dimension each for target genes, siRNA target designs, query genes, siRNA query designs, phenotypic features, and biological replicates. An output directory is created in which the data array will be stored.
<<>>=
dir.create(file.path("result","data"), recursive=TRUE, showWarnings=FALSE)
@

\subsection{Parse plate barcodes to annotate the plates}
Plate barcodes and plate numbers are extracted from the screen annotation. The plate annotation is summarized in a data frame.
<<>>=
# each plate contributes NFIELD*NCOL*NROW consecutive rows;
# take the first row of every plate to get one barcode per plate
plates = featuresPerWell$Anno[seq(1, nrow(featuresPerWell$Anno),
                                  by=NFIELD*NCOL*NROW), "plate"]
PlateAnnotation = HD2013SGI:::parsePlateBarcodes(plates)
head(PlateAnnotation)
@

The names of all target designs, query genes, query designs, and replicates are extracted for all sample experiments.
<<>>=
S = which(PlateAnnotation$queryGroup=="sample")
tdnames = unique(PlateAnnotation$targetDesign[S])
qnames = unique(PlateAnnotation$queryGene[S])
qdnames = unique(PlateAnnotation$queryDesign[S])
repnames = unique(PlateAnnotation$replicate[S])
@

\subsection{Reorder data}
In this step, we create the array \Robject{D} and fill it with the data.
The measurements for the target genes are spread over the three dimensions \Robject{field}, \Robject{col} and \Robject{row} (image field within a well, plate column, and plate row). The mean of the measurements in the \Sexpr{NFIELD} fields per well is then taken.
<<>>=
D = array(0, dim=c(field=NFIELD, col=NCOL, row=NROW,
                   features=dim(featuresPerWell$data)[2],
                   targetDesign=length(tdnames),
                   query=length(qnames),
                   queryDesign=length(qdnames),
                   replicate=length(repnames)))
dimnames(D) = list(field=seq_len(NFIELD),
                   col=seq_len(NCOL),
                   row=LETTERS[seq_len(NROW)+1],
                   features=dimnames(featuresPerWell$data)[[2]],
                   targetDesign=tdnames,
                   queryGene=qnames,
                   queryDesign=qdnames,
                   replicate=repnames)

# fill D plate by plate: each combination of target design, query gene,
# query design and replicate identifies one plate
for (td in tdnames) {
  for (q in qnames) {
    for (qd in qdnames) {
      for (r in repnames) {
        plate = PlateAnnotation$plate[
          which((PlateAnnotation$targetDesign == td) &
                (PlateAnnotation$queryGene    == q ) &
                (PlateAnnotation$queryDesign  == qd) &
                (PlateAnnotation$replicate    == r )) ]
        I = which(featuresPerWell$Anno$plate == plate)
        D[,,,,td,q,qd,r] = as.vector(featuresPerWell$data[I,])
      }
    }
  }
}

D[is.na(D)] = 0
# mean over the four image fields (hard-codes NFIELD = 4);
# faster than D = apply(D,2:8,mean,na.rm=TRUE)
D = (D[1,,,,,,,] + D[2,,,,,,,] + D[3,,,,,,,] + D[4,,,,,,,])/4
dim(D)
@
%
The dimensions are reordered, and the row and column dimensions are merged into a single dimension for target genes. The data are saved to disk.
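
Merging the column and row dimensions works because R stores arrays in column-major order, so resetting \Robject{dim} fuses adjacent dimensions with the first index varying fastest. The following is a minimal sketch on a toy array, not part of the analysis itself; the sizes and the names \Robject{A}, \Robject{ncolToy}, \Robject{nrowToy}, \Robject{rowsToy} and \Robject{dnToy} are hypothetical and chosen only for illustration.
<<>>=
# toy sizes (hypothetical, much smaller than the screen)
ncolToy = 3
nrowToy = 2
rowsToy = LETTERS[seq_len(nrowToy) + 1]   # "B" "C"

# array with dimensions col x row x feature;
# every entry stores the name of the well it belongs to
A = array(NA_character_, dim = c(ncolToy, nrowToy, 2),
          dimnames = list(col = seq_len(ncolToy), row = rowsToy,
                          feature = c("f1", "f2")))
for (i in seq_len(ncolToy)) {
  for (j in seq_len(nrowToy)) {
    A[i, j, ] = sprintf("%s%d", rowsToy[j], i)
  }
}

# merge col and row into one dimension; assigning dim drops the dimnames,
# so they are saved first and re-attached afterwards
dnToy = dimnames(A)
dim(A) = c(prod(dim(A)[1:2]), dim(A)[3])
dimnames(A) = c(list(well = sprintf("%s%d",
                                    rep(rowsToy, each = ncolToy),
                                    rep(seq_len(ncolToy), times = nrowToy))),
                dnToy[3])

# every entry must sit in the row labelled with its own well name
stopifnot(all(A == rownames(A)))
@
The check only passes when the labels are generated with the column index varying fastest, that is, with the row letter repeated \Robject{each = ncolToy} times.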
%
<<>>=
# after averaging, the dimensions of D are
# col, row, features, targetDesign, queryGene, queryDesign, replicate;
# move features behind queryDesign
D = aperm(D,c(1,2,4,5,6,3,7))
dn = dimnames(D)
# merge col and row (col varies fastest) into a single target gene dimension
dim(D) = c(prod(dim(D)[1:2]),dim(D)[3:7])
dimnames(D) = c(list(targetGene =
                       sprintf("%s%d",rep(LETTERS[seq_len(NROW)+1],each=NCOL),
                               rep(seq_len(NCOL),times=NROW))),
                dn[3:7])

datamatrixfull = list(D = D)
save(datamatrixfull, file=file.path("result","data","datamatrixfull.rda"))
@
%
The raw data are now represented in the \Sexpr{length(dim(D))}-dimensional array \Robject{D} with dimensions
\begin{center}
\begin{tabular}{lrrl}
& \Sexpr{dim(datamatrixfull$D)[1]} & target genes \\
$\times$ & \Sexpr{dim(datamatrixfull$D)[2]} & siRNA target designs \\
$\times$ & \Sexpr{dim(datamatrixfull$D)[3]} & query genes \\
$\times$ & \Sexpr{dim(datamatrixfull$D)[4]} & siRNA query designs \\
$\times$ & \Sexpr{dim(datamatrixfull$D)[5]} & phenotypic features\\
$\times$ & \Sexpr{dim(datamatrixfull$D)[6]} & biological replicates
\end{tabular}
\end{center}

A precomputed version of the \Robject{datamatrix} is available from the package and can be loaded with \Robject{data(datamatrix, package="HD2013SGI")}.
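
Because all dimensions of \Robject{D} are named, subsets can be extracted by name rather than by position. A small sketch on a toy array with the same dimension names follows; the array \Robject{X}, its sizes, and all labels in it are made up for illustration and do not correspond to genes or features in the screen.
<<>>=
# toy array with the same dimension names as D (sizes and labels hypothetical)
X = array(seq_len(2 * 2 * 2 * 1 * 3 * 2),
          dim = c(2, 2, 2, 1, 3, 2),
          dimnames = list(targetGene   = c("B1", "B2"),
                          targetDesign = c("1", "2"),
                          queryGene    = c("qA", "qB"),
                          queryDesign  = c("1"),
                          features     = c("f1", "f2", "f3"),
                          replicate    = c("1", "2")))

# feature "f2" for all target genes and query genes,
# target design "1", query design "1", replicate "1"
M = X[, "1", , "1", "f2", "1"]
dim(M)        # target genes x query genes
dimnames(M)
@
Dimensions indexed with a single name are dropped, so \Robject{M} is a matrix of target genes by query genes.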