---
title: "Providing Bioimage Dataset for ExperimentHub"
author:
- name: Satoshi Kume
  email: satoshi.kume.1984@gmail.com
date: "`r Sys.Date()`"
graphics: no
package: BioImageDbs
output:
  BiocStyle::html_document:
    toc_float: true
bibliography: BioImageDbs.bib
csl: bj.csl
vignette: >
  %\VignetteIndexEntry{Providing Bioimage Dataset for ExperimentHub}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{ExperimentHub}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---
```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```

**Last modified:** `r file.info("BioImageDbs.Rmd")$mtime`

**Compiled:** `r Sys.time()`

# Utilisation and prospects of bioimage datasets
In recent years, the need for machine learning-based data analysis has grown in
bioimaging research. Machine learning is an inductive, data-driven approach:
models for tasks such as image segmentation and classification are built from
the image data itself. The publication and sharing of bioimage datasets
[@Williams2017], as well as knowledge creation through the annotation of
bioimages with metadata [@Kobayashi2018;@Kume2017], are therefore important
issues. At present, there is no commonly used format for sharing bioimage
datasets, and the data are scattered among various repositories. As a result,
each image repository manages its data (the images themselves and metadata
such as image format, instruments/microscopes and biosamples) in its own
format.

Data analysis and quantification using such images usually require several
image pre-processing steps, which depend on the analysis environment used.
Supervised learning, however, starts even earlier, with finding a repository
that holds a bioimage dataset containing both the original images and their
corresponding supervised labels. Once the repository has been found, the image
data must be downloaded, loaded into the analysis environment and converted
into a format suitable for the analysis package. These steps are time-consuming
and delay the main analysis. In addition, most image repositories do not
publish their data in a format that can be readily read and processed in R
(.Rdata, etc.), so the data are not easy for R users to work with.

To support supervised learning on bioimage data, BioImageDbs provides R list
objects that hold the original images and their corresponding supervised
labels, converted into 4D or 5D arrays. Once retrieved from ExperimentHub, the
data can be used for deep learning with Keras/TensorFlow [@Chollet2017kerasR]
and for other machine learning methods without further pre-processing.

Many image analysis packages are available in R; however, image analysis still
lacks standardisation. The use of common, open datasets is an essential step
towards standardising and comparing analytical methods. Providing image arrays
through ExperimentHub is also intended to support applications such as (1)
comparing models on data shared among R users and (2) applying predictions
to new datasets through transfer learning and fine-tuning based on these arrays.

# Fetch Bioimage Datasets from ExperimentHub
The `BioImageDbs` package provides the metadata for the bioimage datasets,
which are preprocessed into array format and stored in
`r Biocpkg("ExperimentHub")`.

First we load/update the `ExperimentHub` resource.
```{r load-lib, message = FALSE}
library(ExperimentHub)
eh <- ExperimentHub()
```
Next we list all BioImageDbs entries from `ExperimentHub`.
```{r list-BioImageDbs}
query(eh, "BioImage")
```
We can inspect the metadata registered in ExperimentHub for these records
with `mcols()`.

```{r confirm-metadata}
mcols(query(eh, "BioImage"))
```
We can restrict the query to the records of a single dataset, here
`LM_id0001`, as follows.

```{r query-mouse}
qr <- query(eh, c("BioImageDbs", "LM_id0001"))
qr
#Import data
#BioImageDbs_image_Dat <- qr[[1]]
```
# 5D Arrays from the ExperimentHub
The array dimensions are ordered according to the channels_last format (the
default) in R/Keras. A 5D array has the shape (batch, spatial_dim1,
spatial_dim2, spatial_dim3, channels). The batch size equals the number of 3D
image sets. The number of channels is 1 for greyscale images and 3 for RGB
images.

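
The layout described above can be checked directly on an array. The following is a minimal base R sketch that uses a randomly filled dummy array in place of a real BioImageDbs object; the array `x5d` and its sizes (2 image sets of 16 x 16 x 8 voxels, 1 channel) are assumptions made for illustration only.

```r
# Hypothetical stand-in for a 5D array from BioImageDbs:
# 2 image sets, each a 16 x 16 x 8 greyscale volume (1 channel).
x5d <- array(runif(2 * 16 * 16 * 8 * 1), dim = c(2, 16, 16, 8, 1))

# Name the dimensions according to the channels_last layout.
shape <- setNames(dim(x5d),
                  c("batch", "spatial_dim1", "spatial_dim2",
                    "spatial_dim3", "channels"))
shape

# Extract the first 3D image set. With the default drop = TRUE, the
# batch and channel dimensions of length 1 are removed.
vol1 <- x5d[1, , , , 1]
dim(vol1)

# Extract a single 2D slice (the 4th position along spatial_dim3).
slice4 <- x5d[1, , , 4, 1]
dim(slice4)
```

Indexing the first dimension selects an image set, and indexing the last one selects a channel, which is what the channels_last ordering implies.
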
# 4D Arrays from the ExperimentHub
The array dimensions are ordered according to the channels_last format (the
default) in R/Keras. A 4D array has the shape (batch, height, width,
channels). The batch size equals the number of 2D images.

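
As an illustration of the 4D layout, the sketch below subsets a dummy array in base R. The array `x4d` and its sizes (5 RGB images of 32 x 24 pixels) are assumptions chosen for the example and do not correspond to a particular dataset.

```r
# Hypothetical stand-in for a 4D array from BioImageDbs:
# 5 RGB images of 32 (height) x 24 (width) pixels.
x4d <- array(runif(5 * 32 * 24 * 3), dim = c(5, 32, 24, 3))

n_images   <- dim(x4d)[1]   # batch
n_channels <- dim(x4d)[4]   # 1 for greyscale, 3 for RGB

# Third image as a height x width x channels array.
img3 <- x4d[3, , , ]
dim(img3)

# Keep all four dimensions when subsetting, e.g. to build a mini-batch
# of images 1 and 2 that is still a valid 4D input.
mini_batch <- x4d[1:2, , , , drop = FALSE]
dim(mini_batch)

# Selecting a single channel drops the channel axis; it can be
# restored by resetting the dimensions.
grey   <- x4d[, , , 1]                          # batch x height x width
grey4d <- array(grey, dim = c(dim(grey), 1))    # channel axis restored
dim(grey4d)
```

Using `drop = FALSE` when subsetting along the batch axis avoids silently losing a dimension when only one image is selected.
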
# Visualization of gif images from the ExperimentHub
For quick inspection, we also provide GIF files of some of the arrays.
These files can be displayed with the `magick::image_read()` function.

```{r}
qr <- query(eh, c("BioImageDbs", ".gif"))
qr
#EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_data
qr[1]
##Display the gif image
#magick::image_read(qr[[1]])
```
```{r Fig001, fig.cap = "EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[1]])
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0001.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig002, fig.cap = "EM_id0002_Drosophila_brain_region_5dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[2]])
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0002.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig003, fig.cap = "EM_id0003_J558L_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[9]])
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0003.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig004, fig.cap = "EM_id0004_PrHudata_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[10]])
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0004.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig008, fig.cap = "EM_id0005_Mouse_Kidney_2D_All_Mito_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0005_Mouse_Kidney_2D_Mito.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig009, fig.cap = "EM_id0005_Mouse_Kidney_2D_All_Nuc_1024_4dtensor.Rds", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0005_Mouse_Kidney_2D_Nuc.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig010, fig.cap = "EM_id0006_Rat_Liver_2D_All_Mito_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0006_Rat_Liver_Mito.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig011, fig.cap = "EM_id0006_Rat_Liver_2D_All_Nuc_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0006_Rat_Liver_Nuc.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig012, fig.cap = "EM_id0007_Mouse_Kidney_MultiScale_All_Low_Glomerulus_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0007_Mouse_Kidney_MultiScale_Glomerulus.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig013, fig.cap = "EM_id0007_Mouse_Kidney_MultiScale_All_Middle_Podocyte_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0007_Mouse_Kidney_MultiScale_Podocyte.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig014, fig.cap = "EM_id0008_Human_NB4_2D_All_Cel_512_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0008_Human_NB4_Cell.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig015, fig.cap = "EM_id0008_Human_NB4_2D_All_Nuc_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0008_Human_NB4_Nuc.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig016, fig.cap = "EM_id0009_MurineBMMC_All_512_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0009.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig017, fig.cap = "EM_id0010_HumanBlast_All_512_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0010.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig018, fig.cap = "EM_id0011_HumanJurkat_All_512_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0011.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig005, fig.cap = "LM_id0001_DIC_C2DH_HeLa_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[3]])
options(EBImage.display = "raster")
img <- system.file("images", "LM_id0001.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig006, fig.cap = "LM_id0002_PhC_C2DH_U373_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[5]])
options(EBImage.display = "raster")
img <- system.file("images", "LM_id0002.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
```{r Fig007, fig.cap = "LM_id0003_Fluo_N2DH_GOWT1_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[7]])
options(EBImage.display = "raster")
img <- system.file("images", "LM_id0003.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```
# A simple execution command using Keras/TensorFlow
We select a data array and a label array from the data list and assign
them to variables. These variables are then passed, for example, as the `x`
and `y` arguments of the Keras `fit()` function.
The Keras model must be defined and compiled before execution.

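```{r, eval = FALSE}
## Not run: requires the keras package and a compiled Keras model (`model`)
library(keras)

qr <- query(eh, c("BioImageDbs"))
BioImageData <- qr[[1]]

# Image array (x) and its supervised labels (y) from the training set
data <- BioImageData$Train$Train_Original
labels <- BioImageData$Train$Train_GroundTruth
dim(data); dim(labels)

# Train the model on the arrays
model %>% fit(x = data, y = labels)
```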
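
Before the arrays are passed to `fit()`, it may help to confirm that images and labels are aligned and to hold out some samples for validation. The following is a minimal base R sketch on dummy arrays. The helper `split_batch()`, the 80/20 ratio and the array sizes are illustrative assumptions and are not part of BioImageDbs or Keras.

```r
# Dummy 4D stand-ins for the image and label arrays (10 samples).
data   <- array(runif(10 * 16 * 16 * 1), dim = c(10, 16, 16, 1))
labels <- array(rbinom(10 * 16 * 16 * 1, 1, 0.5), dim = c(10, 16, 16, 1))

# Images and labels should share the batch and spatial dimensions.
stopifnot(identical(dim(data)[1:3], dim(labels)[1:3]))

# Hypothetical helper: split paired 4D arrays along the batch axis.
split_batch <- function(x, y, train_fraction = 0.8, seed = 1) {
  set.seed(seed)
  n <- dim(x)[1]
  idx <- sample(n)                        # shuffled sample indices
  n_train <- floor(train_fraction * n)
  tr <- idx[seq_len(n) <= n_train]        # first part for training
  va <- idx[seq_len(n) > n_train]         # remainder for validation
  list(
    x_train = x[tr, , , , drop = FALSE],
    y_train = y[tr, , , , drop = FALSE],
    x_val   = x[va, , , , drop = FALSE],
    y_val   = y[va, , , , drop = FALSE]
  )
}

parts <- split_batch(data, labels)
sapply(parts, dim)
```

With Keras, the resulting pieces could then be supplied as `x = parts$x_train`, `y = parts$y_train` and `validation_data = list(parts$x_val, parts$y_val)` in the call to `fit()`.
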
# About the imaging dataset and its metadata in BioImageDbs
The datasets in BioImageDbs were built from the following published open data:

1. For cellular ultra-microstructures, electron microscopy-based imaging data of
   the mouse B myeloma cell line J558L (e.g. EM_id0003_J558L_4dTensor.Rda) [@Morath2013],
   primary human T cells isolated from peripheral blood mononuclear cells
   (e.g. EM_id0004_PrHudata_4dTensor.Rda) [@Morath2013], human NB-4 cells
   (e.g. EM_id0008_Human_NB4_2D_All_Cel_512_4dTensor.Rds) [@Kume2017],
   murine bone marrow-derived mast cells (e.g. EM_id0009_MurineBMMC_All_512_4dTensor.Rds) [@Morath2013],
   human blasts (e.g. EM_id0010_HumanBlast_All_512_4dTensor.Rds) [@Morath2013] and
   the human T-cell line Jurkat (e.g. EM_id0011_HumanJurkat_All_512_4dTensor.Rds) [@Morath2013]
   were used.
2. For bio-tissue ultra-microstructures, electron microscopy-based imaging data
   of the mouse brain (e.g. EM_id0001_Brain_CA1_hippocampus_region_5dTensor.Rda)
   [@Lucchi2012;@Lucchi2013bib], Drosophila brain (e.g. EM_id0002_Drosophila_brain_region_5dTensor.Rda)
   [@Cardona2010;@ArgandaCarreras2015], mouse kidney (e.g. EM_id0005_Mouse_Kidney_2D_All_Nuc_1024_4dtensor.Rds) [@Kume2016]
   and rat liver (e.g. EM_id0006_Rat_Liver_2D_All_Mito_1024_4dtensor.Rds) [@Kume2016] were used.
3. For cell tracking, light microscopy-based imaging data of human HeLa
   cells on flat glass (e.g. LM_id0001_DIC_C2DH_HeLa_4dTensor.Rda) [@Maska2014;@Ulman2017],
   human glioblastoma-astrocytoma U373 cells on a polyacrylamide substrate
   (e.g. LM_id0002_PhC_C2DH_U373_4dTensor.Rda) [@Maska2014;@Ulman2017] and GFP-GOWT1 mouse stem
   cells (e.g. LM_id0003_Fluo_N2DH_GOWT1_4dTensor.Rda) [@Bartova2011] were used.

The supervised labels are provided as arrays with binary or multiple values.
Details are given in the metadata file of BioImageDbs.

Some of the cell tracking data were obtained from the [Cell Tracking Challenge](http://celltrackingchallenge.net/2d-datasets/).

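
Because the label arrays may hold either binary or multiple values, it can be useful to check which values occur before training. The sketch below does this in base R and then expands a multi-valued label array into one channel per class. The dummy array `labels`, its class values (0, 1 and 2) and the one-hot layout are assumptions for illustration; the actual label values of each dataset are described in its metadata.

```r
# Hypothetical multi-valued label array: 2 images of 4 x 4 pixels,
# 1 channel, with class values 0, 1 and 2.
labels <- array(rep(0:2, length.out = 2 * 4 * 4), dim = c(2, 4, 4, 1))

# Inspect which values occur and how often.
classes <- sort(unique(as.vector(labels)))
table(labels)

# Binary labels have exactly two distinct values.
is_binary <- length(classes) == 2

# One-hot encode: one channel per class, in channels_last order.
one_hot <- array(0, dim = c(dim(labels)[1:3], length(classes)))
for (k in seq_along(classes)) {
  one_hot[, , , k] <- labels[, , , 1] == classes[k]
}
dim(one_hot)
```

Each pixel of `one_hot` has exactly one channel set to 1, so the encoded array keeps the batch, height and width of the original labels.
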
# Session information {.unnumbered}
```{r sessionInfo, echo=FALSE}
sessionInfo()
```
# References