Contents

1 Installation

The sosta package can be installed from Bioconductor as follows:

if (!requireNamespace("BiocManager")) {
    install.packages("BiocManager")
}
BiocManager::install("sosta")

2 Setup

For this vignette, we will need several additional packages:

library("sosta")
library("dplyr")
library("tidyr")
library("ggplot2")
library("sf")
library("SpatialExperiment")

theme_set(theme_bw())

3 Introduction

As an example, we will load an simulated dataset with three images and three cell types A, B and C, which are stored in a SpatialExperiment object:

# load the data
data("sostaSPE")
sostaSPE
#> class: SpatialExperiment 
#> dim: 0 5641 
#> metadata(0):
#> assays(0):
#> rownames: NULL
#> rowData names(0):
#> colnames: NULL
#> colData names(3): cellType imageName sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x y
#> imgData names(0):
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
    as.data.frame() |>
    ggplot(aes(x = x, y = y, color = cellType)) +
    geom_point(size = 0.25) +
    facet_wrap(~imageName) +
    coord_equal()

The goal is to reconstruct, or segment, the structures given by cell type A.

4 Structure reconstruction

4.1 Reconstruction of structures for one image

In this example, we will reconstruct cellular “structures” based on the point pattern density of the cells of type A. We will start with estimating parameters that are used for reconstruction afterwards. For one image, this can be illustrated as follows:

shapeIntensityImage(
    sostaSPE,
    marks = "cellType",
    imageCol = "imageName",
    imageId = "image1",
    markSelect = "A"
)

We see the pixel-wise density image on the left and a histogram of the intensity values on the right. The estimated threshold is roughly the mean between the two modes of the (truncated) pixel intensity distribution. The dimensions of the pixel image are specified on the bottom left. The dimensions correspond to the density image, setting a higher value for the dim parameter will result in a higher resolution density image but will not change how many points are within a pixel. This can be changed by varying the smoothing parameter (bndw).

This was done for one image. The function estimateReconstructionParametersSPE returns two plots with the estimated parameters for each image. The parameter bndw is the bandwidth parameter that is used for estimating the intensity profile of the point pattern. The parameter thres is the estimated parameter for the density threshold for reconstruction.

n <- estimateReconstructionParametersSPE(
    sostaSPE,
    marks = "cellType",
    imageCol = "imageName",
    markSelect = "A",
    plotHist = TRUE
)

We will use the mean of the two estimated vectors as our parameters.

(thresSPE <- mean(n$thres))
#> [1] 0.04467379
(bndwSPE <- mean(n$bndw))
#> [1] 3.804571

We can now use the function reconstructShapeDensity to segment the cell-type-A structure into regions, the result of which is sf polygon (Pebesma 2018).

(struct <- reconstructShapeDensityImage(
    sostaSPE,
    marks = "cellType",
    imageCol = "imageName",
    imageId = "image1",
    markSelect = "A",
    bndw = bndwSPE,
    dim = 500,
    thres = thresSPE
))
#> Simple feature collection with 3 features and 0 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 0.1061074 ymin: 0.1104626 xmax: 127.8393 ymax: 127.9601
#> CRS:           NA
#>                     sostaPolygon
#> 1 POLYGON ((3.938104 113.6409...
#> 2 POLYGON ((96.41697 125.6588...
#> 3 POLYGON ((47.11194 127.4487...

We can then plot both the points and the segmented polygon.

cbind(
    colData(sostaSPE[, sostaSPE$imageName == "image1"]),
    spatialCoords(sostaSPE[, sostaSPE$imageName == "image1"])
) |>
    as.data.frame() |>
    ggplot(aes(x = x, y = y, color = cellType)) +
    geom_point(size = 0.25) +
    facet_wrap(~imageName) +
    coord_equal() +
    geom_sf(
        data = struct,
        fill = NA,
        color = "darkblue",
        inherit.aes = FALSE, # this is important
        linewidth = 1
    )
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.

If no arguments are given, the function reconstructShapeDensityImage estimates the reconstruction parameters internally. The bandwidth is estimated using cross validation using the function bw.diggle of the spatstat.explore package. The threshold is estimated by taking the mean between the two modes of the pixel intensity distribution as illustrated above.

struct2 <- reconstructShapeDensityImage(
    sostaSPE,
    marks = "cellType",
    imageCol = "imageName",
    imageId = "image1",
    markSelect = "A",
    dim = 500
)
cbind(
    colData(sostaSPE[, sostaSPE$imageName == "image1"]),
    spatialCoords(sostaSPE[, sostaSPE$imageName == "image1"])
) |>
    as.data.frame() |>
    ggplot(aes(x = x, y = y, color = cellType)) +
    geom_point(size = 0.25) +
    facet_wrap(~imageName) +
    coord_equal() +
    geom_sf(
        data = struct2,
        fill = NA,
        color = "darkblue",
        inherit.aes = FALSE, # this is important
        linewidth = 1
    )
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.

4.2 Reconstruction of structures for all images

The function reconstructShapeDensitySPE reconstructs the cell-type-A structure for all images in the spe object. We use the estimated parameters from above.

allStructs <- reconstructShapeDensitySPE(
    sostaSPE,
    marks = "cellType",
    imageCol = "imageName",
    markSelect = "A",
    bndw = bndwSPE,
    thres = thresSPE,
    nCores = 1
)

4.3 Intersection with cells

Next, we assign cells in the SingleCellExperiment object to their specific structure.

assign <- assingCellsToStructures(sostaSPE, allStructs,
    imageCol = "imageName", nCores = 1
)
sostaSPE$structAssign <- assign
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
    as.data.frame() |>
    ggplot(aes(x = x, y = y, color = structAssign)) +
    geom_point(size = 0.25) +
    facet_wrap(~imageName) +
    coord_equal()

5 Structure level metrics

5.1 Proportion of cell types within structures

Using the function cellTypeProportions, we can estimate the proportion of cell types witin each individual structure.

cellTypeProportions(sostaSPE, "structAssign", "cellType")
#>                  A          B          C
#> image1_2 0.8302326 0.08837209 0.08139535
#> image1_1 0.7996255 0.12172285 0.07865169
#> image1_3 0.8333333 0.12500000 0.04166667
#> image2_3 0.8770764 0.07475083 0.04817276
#> image2_2 0.8609626 0.09447415 0.04456328
#> image3_7 0.8088235 0.10160428 0.08957219
#> image3_2 0.6200000 0.28000000 0.10000000
#> image3_5 0.8421053 0.10526316 0.05263158
#> image3_6 0.7307692 0.19230769 0.07692308
#> image3_4 0.7894737 0.21052632 0.00000000
#> image3_1 0.8000000 0.20000000 0.00000000
#> image3_3 1.0000000 0.00000000 0.00000000

5.2 Shape Metrics

The function totalShapeMetrics calculates a set of metrics related to the shape of the structures

(shapeMetrics <- totalShapeMetrics(allStructs))
#>                  image1_1     image1_2    image1_3  image2_1     image2_2
#> Area         4250.2792228 3574.6477409 226.9959779 1.3752774 4295.1222139
#> Compactness     0.1346764    0.2519490   0.6080376 0.4581489    0.1953486
#> Eccentricity    0.7875707    0.4641678   0.8363778 0.9999739    0.5647329
#> Circularity     0.4723775    0.5970360   0.8846882 0.5943689    0.4862921
#> Solidity        0.5963449    0.8051348   0.9769469 0.8936170    0.6614292
#> Curl            0.6224990    0.3932054   0.2617308 0.3922730    0.4574191
#> fibreLength   300.7421880  192.5589303  25.2605053 2.5265850  245.3105698
#> fibreWidth     14.1326338   18.5639157   8.9862010 0.5443226   17.5089162
#>                  image2_3   image3_1    image3_2   image3_3     image3_4
#> Area         4746.6716180 51.8620571 417.5778496 12.6875650 145.71079859
#> Compactness     0.2150477  0.5718048   0.6331764  0.6342179   0.35221739
#> Eccentricity    0.4239889  0.5710073   0.7120634  0.9381917   0.27046967
#> Circularity     0.5075727  0.7502777   0.8892681  0.9053567   0.45173303
#> Solidity        0.7430950  0.9653074   0.9832153  0.9440389   0.96512887
#> Curl            0.4753033  0.1632771   0.1889006  0.2827801   0.09665593
#> fibreLength   243.8668860 12.8415084  32.7783740  5.7028999  31.41216792
#> fibreWidth     19.4641909  4.0386266  12.7394315  2.2247567   4.63867374
#>                 image3_5    image3_6     image3_7
#> Area         186.3240867 234.8507530 6621.3393593
#> Compactness    0.6428357   0.5221272    0.1752898
#> Eccentricity   0.8736600   0.7688595    0.7542569
#> Circularity    0.9288492   0.7443243    0.5683033
#> Solidity       0.9787015   0.9320010    0.6863814
#> Curl           0.2514767   0.2936578    0.6032739
#> fibreLength   21.5160903  29.6774666  324.0514214
#> fibreWidth     8.6597558   7.9134367   20.4329897

cbind(allStructs, t(shapeMetrics)) |>
    ggplot() +
    geom_sf(aes(fill = Area)) +
    facet_wrap(~imageName)

6 Cell level metrics

6.1 Distance to structure border

Using the function minBoundaryDistances, we can compute the distance to the border structure. Negative values indicate that the points lie inside the structure.

sostaSPE$minDist <- minBoundaryDistances(
    sostaSPE, "imageName",
    "structAssign", allStructs
)

cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
    as.data.frame() |>
    ggplot(aes(x = x, y = y, color = minDist)) +
    geom_point(size = 0.25) +
    facet_wrap(~imageName) +
    coord_equal() +
    scale_colour_gradient2() +
    geom_sf(
        data = allStructs,
        fill = NA,
        inherit.aes = FALSE
    ) +
    facet_wrap(~imageName)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.

This information can be used to define border cells by thresholding to a range of positive and negative values.

sostaSPE$border <- ifelse(abs(sostaSPE$minDist) < 3, TRUE, FALSE)


cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
    as.data.frame() |>
    ggplot(aes(x = x, y = y, color = border)) +
    geom_point(size = 0.25) +
    facet_wrap(~imageName) +
    coord_equal() +
    geom_sf(
        data = allStructs,
        fill = NA,
        inherit.aes = FALSE
    ) +
    facet_wrap(~imageName)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.

Alternatively, borders can be defined using st_difference and st_buffer on the structure object. In this case we will have an sf polygon that correspond to the border region.

borders <- lapply(
    st_geometry(allStructs),
    \(x) st_difference(st_buffer(x, 3), st_buffer(x, -3))
) |>
    # both needed for proper conversion
    st_as_sfc() |> st_as_sf()

borders$imageName <- allStructs$imageName
borders$borderID <- paste0("border_", allStructs$structID)

sostaSPE$borderSf <- assingCellsToStructures(sostaSPE,
    borders,
    imageCol = "imageName",
    uniqueId = "borderID",
    nCores = 1
)
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
    as.data.frame() |>
    ggplot(aes(x = x, y = y, color = borderSf)) +
    geom_point(size = 0.25) +
    facet_wrap(~imageName) +
    coord_equal() +
    geom_sf(
        data = borders,
        fill = NA,
        inherit.aes = FALSE
    ) +
    facet_wrap(~imageName)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.

7 Session Info

sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ggfortify_0.4.17            imcdatasets_1.15.0         
#>  [3] cytomapper_1.19.0           EBImage_4.49.0             
#>  [5] tidyr_1.3.1                 sosta_0.99.5               
#>  [7] SpatialExperiment_1.17.0    SingleCellExperiment_1.29.2
#>  [9] SummarizedExperiment_1.37.0 Biobase_2.67.0             
#> [11] GenomicRanges_1.59.1        GenomeInfoDb_1.43.4        
#> [13] IRanges_2.41.3              S4Vectors_0.45.4           
#> [15] MatrixGenerics_1.19.1       matrixStats_1.5.0          
#> [17] sf_1.0-19                   lmerTest_3.1-3             
#> [19] lme4_1.1-36                 Matrix_1.7-3               
#> [21] ggplot2_3.5.1               ExperimentHub_2.15.0       
#> [23] AnnotationHub_3.15.0        BiocFileCache_2.15.1       
#> [25] dbplyr_2.5.0                BiocGenerics_0.53.6        
#> [27] generics_0.1.3              dplyr_1.1.4                
#> [29] BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] splines_4.6.0           later_1.4.1             bitops_1.0-9           
#>   [4] filelock_1.0.3          tibble_3.2.1            svgPanZoom_0.3.4       
#>   [7] polyclip_1.10-7         lifecycle_1.0.4         Rdpack_2.6.3           
#>  [10] lattice_0.22-6          MASS_7.3-65             magrittr_2.0.3         
#>  [13] sass_0.4.9              rmarkdown_2.29          jquerylib_0.1.4        
#>  [16] yaml_2.3.10             httpuv_1.6.15           sp_2.2-0               
#>  [19] spatstat.sparse_3.1-0   DBI_1.2.3               minqa_1.2.8            
#>  [22] RColorBrewer_1.1-3      abind_1.4-8             purrr_1.0.4            
#>  [25] RCurl_1.98-1.16         rappdirs_0.3.3          GenomeInfoDbData_1.2.14
#>  [28] ggrepel_0.9.6           spatstat.utils_3.1-3    terra_1.8-29           
#>  [31] units_0.8-7             goftest_1.2-3           spatstat.random_3.3-3  
#>  [34] svglite_2.1.3           codetools_0.2-20        DelayedArray_0.33.6    
#>  [37] tidyselect_1.2.1        raster_3.6-31           UCSC.utils_1.3.1       
#>  [40] farver_2.1.2            viridis_0.6.5           spatstat.explore_3.4-2 
#>  [43] jsonlite_1.9.1          e1071_1.7-16            systemfonts_1.2.1      
#>  [46] smoothr_1.0.1           tools_4.6.0             Rcpp_1.0.14            
#>  [49] glue_1.8.0              gridExtra_2.3           SparseArray_1.7.7      
#>  [52] xfun_0.51               HDF5Array_1.35.16       shinydashboard_0.7.2   
#>  [55] withr_3.0.2             numDeriv_2016.8-1.1     BiocManager_1.30.25    
#>  [58] fastmap_1.2.0           boot_1.3-31             rhdf5filters_1.19.2    
#>  [61] digest_0.6.37           R6_2.6.1                mime_0.13              
#>  [64] colorspace_2.1-1        tensor_1.5              jpeg_0.1-11            
#>  [67] spatstat.data_3.1-6     RSQLite_2.3.9           h5mread_0.99.4         
#>  [70] class_7.3-23            httr_1.4.7              htmlwidgets_1.6.4      
#>  [73] S4Arrays_1.7.3          pkgconfig_2.0.3         gtable_0.3.6           
#>  [76] blob_1.2.4              XVector_0.47.2          htmltools_0.5.8.1      
#>  [79] bookdown_0.42           fftwtools_0.9-11        scales_1.3.0           
#>  [82] png_0.1-8               spatstat.univar_3.1-2   reformulas_0.4.0       
#>  [85] knitr_1.50              rjson_0.2.23            nlme_3.1-167           
#>  [88] curl_6.2.1              nloptr_2.2.1            proxy_0.4-27           
#>  [91] cachem_1.1.0            rhdf5_2.51.2            stringr_1.5.1          
#>  [94] BiocVersion_3.21.1      KernSmooth_2.23-26      parallel_4.6.0         
#>  [97] vipor_0.4.7             AnnotationDbi_1.69.0    pillar_1.10.1          
#> [100] grid_4.6.0              vctrs_0.6.5             promises_1.3.2         
#> [103] xtable_1.8-4            beeswarm_0.4.0          evaluate_1.0.3         
#> [106] tinytex_0.56            magick_2.8.5            cli_3.6.4              
#> [109] locfit_1.5-9.12         compiler_4.6.0          rlang_1.1.5            
#> [112] crayon_1.5.3            labeling_0.4.3          classInt_0.4-11        
#> [115] ggbeeswarm_0.7.2        stringi_1.8.4           viridisLite_0.4.2      
#> [118] deldir_2.0-4            BiocParallel_1.41.2     nnls_1.6               
#> [121] munsell_0.5.1           Biostrings_2.75.4       tiff_0.1-12            
#> [124] spatstat.geom_3.3-6     patchwork_1.3.0         bit64_4.6.0-1          
#> [127] Rhdf5lib_1.29.1         KEGGREST_1.47.0         shiny_1.10.0           
#> [130] rbibutils_2.3           memoise_2.0.1           bslib_0.9.0            
#> [133] bit_4.6.0

References

Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439. https://doi.org/10.32614/RJ-2018-009.