In this vignette we show different functionalities of sosta
. First we perform a density-based reconstruction of structures based on spatial coordinates. Next, we will calcuate different metrics on the structure and cell level.
The sosta package can be installed from Bioconductor as follows:
if (!requireNamespace("BiocManager")) {
install.packages("BiocManager")
}
BiocManager::install("sosta")
For this vignette, we will need several additional packages:
library("sosta")
library("dplyr")
library("tidyr")
library("ggplot2")
library("sf")
library("SpatialExperiment")
theme_set(theme_bw())
As an example, we will load an simulated dataset with three images and three cell types A, B and C, which are stored in a SpatialExperiment object:
# load the data
data("sostaSPE")
sostaSPE
#> class: SpatialExperiment
#> dim: 0 5641
#> metadata(0):
#> assays(0):
#> rownames: NULL
#> rowData names(0):
#> colnames: NULL
#> colData names(3): cellType imageName sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x y
#> imgData names(0):
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = cellType)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal()
The goal is to reconstruct, or segment, the structures given by cell type A.
In this example, we will reconstruct cellular “structures” based on the point pattern density of the cells of type A. We will start with estimating parameters that are used for reconstruction afterwards. For one image, this can be illustrated as follows:
shapeIntensityImage(
sostaSPE,
marks = "cellType",
imageCol = "imageName",
imageId = "image1",
markSelect = "A"
)
We see the pixel-wise density image on the left and a histogram of the intensity values on the right. The estimated threshold is roughly the mean between the two modes of the (truncated) pixel intensity distribution. The dimensions of the pixel image are specified on the bottom left. The dimensions correspond to the density image, setting a higher value for the dim
parameter will result in a higher resolution density image but will not change how many points are within a pixel. This can be changed by varying the smoothing parameter (bndw
).
This was done for one image. The function estimateReconstructionParametersSPE
returns two plots with the estimated parameters for each image. The parameter bndw
is the bandwidth parameter that is used for estimating the intensity profile of the point pattern. The parameter thres
is the estimated parameter for the density threshold for reconstruction.
n <- estimateReconstructionParametersSPE(
sostaSPE,
marks = "cellType",
imageCol = "imageName",
markSelect = "A",
plotHist = TRUE
)
We will use the mean of the two estimated vectors as our parameters.
(thresSPE <- mean(n$thres))
#> [1] 0.04467379
(bndwSPE <- mean(n$bndw))
#> [1] 3.804571
We can now use the function reconstructShapeDensity
to segment the cell-type-A structure into regions, the result of which is sf polygon (Pebesma 2018).
(struct <- reconstructShapeDensityImage(
sostaSPE,
marks = "cellType",
imageCol = "imageName",
imageId = "image1",
markSelect = "A",
bndw = bndwSPE,
dim = 500,
thres = thresSPE
))
#> Simple feature collection with 3 features and 0 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: 0.1061074 ymin: 0.1104626 xmax: 127.8393 ymax: 127.9601
#> CRS: NA
#> sostaPolygon
#> 1 POLYGON ((3.938104 113.6409...
#> 2 POLYGON ((96.41697 125.6588...
#> 3 POLYGON ((47.11194 127.4487...
We can then plot both the points and the segmented polygon.
cbind(
colData(sostaSPE[, sostaSPE$imageName == "image1"]),
spatialCoords(sostaSPE[, sostaSPE$imageName == "image1"])
) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = cellType)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
geom_sf(
data = struct,
fill = NA,
color = "darkblue",
inherit.aes = FALSE, # this is important
linewidth = 1
)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
If no arguments are given, the function reconstructShapeDensityImage
estimates the reconstruction parameters internally. The bandwidth is estimated using cross validation using the function bw.diggle
of the spatstat.explore package. The threshold is estimated by taking the mean between the two modes of the pixel intensity distribution as illustrated above.
struct2 <- reconstructShapeDensityImage(
sostaSPE,
marks = "cellType",
imageCol = "imageName",
imageId = "image1",
markSelect = "A",
dim = 500
)
cbind(
colData(sostaSPE[, sostaSPE$imageName == "image1"]),
spatialCoords(sostaSPE[, sostaSPE$imageName == "image1"])
) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = cellType)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
geom_sf(
data = struct2,
fill = NA,
color = "darkblue",
inherit.aes = FALSE, # this is important
linewidth = 1
)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
The function reconstructShapeDensitySPE
reconstructs the cell-type-A structure for all images in the spe
object. We use the estimated parameters from above.
allStructs <- reconstructShapeDensitySPE(
sostaSPE,
marks = "cellType",
imageCol = "imageName",
markSelect = "A",
bndw = bndwSPE,
thres = thresSPE,
nCores = 1
)
Next, we assign cells in the SingleCellExperiment
object to their specific structure.
assign <- assingCellsToStructures(sostaSPE, allStructs,
imageCol = "imageName", nCores = 1
)
sostaSPE$structAssign <- assign
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = structAssign)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal()
Using the function cellTypeProportions
, we can estimate the proportion of cell types witin each individual structure.
cellTypeProportions(sostaSPE, "structAssign", "cellType")
#> A B C
#> image1_2 0.8302326 0.08837209 0.08139535
#> image1_1 0.7996255 0.12172285 0.07865169
#> image1_3 0.8333333 0.12500000 0.04166667
#> image2_3 0.8770764 0.07475083 0.04817276
#> image2_2 0.8609626 0.09447415 0.04456328
#> image3_7 0.8088235 0.10160428 0.08957219
#> image3_2 0.6200000 0.28000000 0.10000000
#> image3_5 0.8421053 0.10526316 0.05263158
#> image3_6 0.7307692 0.19230769 0.07692308
#> image3_4 0.7894737 0.21052632 0.00000000
#> image3_1 0.8000000 0.20000000 0.00000000
#> image3_3 1.0000000 0.00000000 0.00000000
The function totalShapeMetrics
calculates a set of metrics related to the shape of the structures
(shapeMetrics <- totalShapeMetrics(allStructs))
#> image1_1 image1_2 image1_3 image2_1 image2_2
#> Area 4250.2792228 3574.6477409 226.9959779 1.3752774 4295.1222139
#> Compactness 0.1346764 0.2519490 0.6080376 0.4581489 0.1953486
#> Eccentricity 0.7875707 0.4641678 0.8363778 0.9999739 0.5647329
#> Circularity 0.4723775 0.5970360 0.8846882 0.5943689 0.4862921
#> Solidity 0.5963449 0.8051348 0.9769469 0.8936170 0.6614292
#> Curl 0.6224990 0.3932054 0.2617308 0.3922730 0.4574191
#> fibreLength 300.7421880 192.5589303 25.2605053 2.5265850 245.3105698
#> fibreWidth 14.1326338 18.5639157 8.9862010 0.5443226 17.5089162
#> image2_3 image3_1 image3_2 image3_3 image3_4
#> Area 4746.6716180 51.8620571 417.5778496 12.6875650 145.71079859
#> Compactness 0.2150477 0.5718048 0.6331764 0.6342179 0.35221739
#> Eccentricity 0.4239889 0.5710073 0.7120634 0.9381917 0.27046967
#> Circularity 0.5075727 0.7502777 0.8892681 0.9053567 0.45173303
#> Solidity 0.7430950 0.9653074 0.9832153 0.9440389 0.96512887
#> Curl 0.4753033 0.1632771 0.1889006 0.2827801 0.09665593
#> fibreLength 243.8668860 12.8415084 32.7783740 5.7028999 31.41216792
#> fibreWidth 19.4641909 4.0386266 12.7394315 2.2247567 4.63867374
#> image3_5 image3_6 image3_7
#> Area 186.3240867 234.8507530 6621.3393593
#> Compactness 0.6428357 0.5221272 0.1752898
#> Eccentricity 0.8736600 0.7688595 0.7542569
#> Circularity 0.9288492 0.7443243 0.5683033
#> Solidity 0.9787015 0.9320010 0.6863814
#> Curl 0.2514767 0.2936578 0.6032739
#> fibreLength 21.5160903 29.6774666 324.0514214
#> fibreWidth 8.6597558 7.9134367 20.4329897
cbind(allStructs, t(shapeMetrics)) |>
ggplot() +
geom_sf(aes(fill = Area)) +
facet_wrap(~imageName)
Using the function minBoundaryDistances
, we can compute the distance to the border structure. Negative values indicate that the points lie inside the structure.
sostaSPE$minDist <- minBoundaryDistances(
sostaSPE, "imageName",
"structAssign", allStructs
)
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = minDist)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
scale_colour_gradient2() +
geom_sf(
data = allStructs,
fill = NA,
inherit.aes = FALSE
) +
facet_wrap(~imageName)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
This information can be used to define border cells by thresholding to a range of positive and negative values.
sostaSPE$border <- ifelse(abs(sostaSPE$minDist) < 3, TRUE, FALSE)
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = border)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
geom_sf(
data = allStructs,
fill = NA,
inherit.aes = FALSE
) +
facet_wrap(~imageName)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
Alternatively, borders can be defined using st_difference
and st_buffer
on the structure object. In this case we will have an sf
polygon that correspond to the border region.
borders <- lapply(
st_geometry(allStructs),
\(x) st_difference(st_buffer(x, 3), st_buffer(x, -3))
) |>
# both needed for proper conversion
st_as_sfc() |> st_as_sf()
borders$imageName <- allStructs$imageName
borders$borderID <- paste0("border_", allStructs$structID)
sostaSPE$borderSf <- assingCellsToStructures(sostaSPE,
borders,
imageCol = "imageName",
uniqueId = "borderID",
nCores = 1
)
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = borderSf)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
geom_sf(
data = borders,
fill = NA,
inherit.aes = FALSE
) +
facet_wrap(~imageName)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ggfortify_0.4.17 imcdatasets_1.15.0
#> [3] cytomapper_1.19.0 EBImage_4.49.0
#> [5] tidyr_1.3.1 sosta_0.99.5
#> [7] SpatialExperiment_1.17.0 SingleCellExperiment_1.29.2
#> [9] SummarizedExperiment_1.37.0 Biobase_2.67.0
#> [11] GenomicRanges_1.59.1 GenomeInfoDb_1.43.4
#> [13] IRanges_2.41.3 S4Vectors_0.45.4
#> [15] MatrixGenerics_1.19.1 matrixStats_1.5.0
#> [17] sf_1.0-19 lmerTest_3.1-3
#> [19] lme4_1.1-36 Matrix_1.7-3
#> [21] ggplot2_3.5.1 ExperimentHub_2.15.0
#> [23] AnnotationHub_3.15.0 BiocFileCache_2.15.1
#> [25] dbplyr_2.5.0 BiocGenerics_0.53.6
#> [27] generics_0.1.3 dplyr_1.1.4
#> [29] BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.6.0 later_1.4.1 bitops_1.0-9
#> [4] filelock_1.0.3 tibble_3.2.1 svgPanZoom_0.3.4
#> [7] polyclip_1.10-7 lifecycle_1.0.4 Rdpack_2.6.3
#> [10] lattice_0.22-6 MASS_7.3-65 magrittr_2.0.3
#> [13] sass_0.4.9 rmarkdown_2.29 jquerylib_0.1.4
#> [16] yaml_2.3.10 httpuv_1.6.15 sp_2.2-0
#> [19] spatstat.sparse_3.1-0 DBI_1.2.3 minqa_1.2.8
#> [22] RColorBrewer_1.1-3 abind_1.4-8 purrr_1.0.4
#> [25] RCurl_1.98-1.16 rappdirs_0.3.3 GenomeInfoDbData_1.2.14
#> [28] ggrepel_0.9.6 spatstat.utils_3.1-3 terra_1.8-29
#> [31] units_0.8-7 goftest_1.2-3 spatstat.random_3.3-3
#> [34] svglite_2.1.3 codetools_0.2-20 DelayedArray_0.33.6
#> [37] tidyselect_1.2.1 raster_3.6-31 UCSC.utils_1.3.1
#> [40] farver_2.1.2 viridis_0.6.5 spatstat.explore_3.4-2
#> [43] jsonlite_1.9.1 e1071_1.7-16 systemfonts_1.2.1
#> [46] smoothr_1.0.1 tools_4.6.0 Rcpp_1.0.14
#> [49] glue_1.8.0 gridExtra_2.3 SparseArray_1.7.7
#> [52] xfun_0.51 HDF5Array_1.35.16 shinydashboard_0.7.2
#> [55] withr_3.0.2 numDeriv_2016.8-1.1 BiocManager_1.30.25
#> [58] fastmap_1.2.0 boot_1.3-31 rhdf5filters_1.19.2
#> [61] digest_0.6.37 R6_2.6.1 mime_0.13
#> [64] colorspace_2.1-1 tensor_1.5 jpeg_0.1-11
#> [67] spatstat.data_3.1-6 RSQLite_2.3.9 h5mread_0.99.4
#> [70] class_7.3-23 httr_1.4.7 htmlwidgets_1.6.4
#> [73] S4Arrays_1.7.3 pkgconfig_2.0.3 gtable_0.3.6
#> [76] blob_1.2.4 XVector_0.47.2 htmltools_0.5.8.1
#> [79] bookdown_0.42 fftwtools_0.9-11 scales_1.3.0
#> [82] png_0.1-8 spatstat.univar_3.1-2 reformulas_0.4.0
#> [85] knitr_1.50 rjson_0.2.23 nlme_3.1-167
#> [88] curl_6.2.1 nloptr_2.2.1 proxy_0.4-27
#> [91] cachem_1.1.0 rhdf5_2.51.2 stringr_1.5.1
#> [94] BiocVersion_3.21.1 KernSmooth_2.23-26 parallel_4.6.0
#> [97] vipor_0.4.7 AnnotationDbi_1.69.0 pillar_1.10.1
#> [100] grid_4.6.0 vctrs_0.6.5 promises_1.3.2
#> [103] xtable_1.8-4 beeswarm_0.4.0 evaluate_1.0.3
#> [106] tinytex_0.56 magick_2.8.5 cli_3.6.4
#> [109] locfit_1.5-9.12 compiler_4.6.0 rlang_1.1.5
#> [112] crayon_1.5.3 labeling_0.4.3 classInt_0.4-11
#> [115] ggbeeswarm_0.7.2 stringi_1.8.4 viridisLite_0.4.2
#> [118] deldir_2.0-4 BiocParallel_1.41.2 nnls_1.6
#> [121] munsell_0.5.1 Biostrings_2.75.4 tiff_0.1-12
#> [124] spatstat.geom_3.3-6 patchwork_1.3.0 bit64_4.6.0-1
#> [127] Rhdf5lib_1.29.1 KEGGREST_1.47.0 shiny_1.10.0
#> [130] rbibutils_2.3 memoise_2.0.1 bslib_0.9.0
#> [133] bit_4.6.0
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439. https://doi.org/10.32614/RJ-2018-009.