---
title: "2. Tensor arithmetic by DelayedTensor"
author:
- name: Koki Tsuyuzaki
affiliation: Laboratory for Bioinformatics Research,
RIKEN Center for Biosystems Dynamics Research
- name: Itoshi Nikaido
affiliation: Laboratory for Bioinformatics Research,
RIKEN Center for Biosystems Dynamics Research
email: k.t.the-answer@hotmail.co.jp
graphics: no
package: DelayedTensor
output:
BiocStyle::html_document:
toc_float: true
vignette: |
%\VignetteIndexEntry{TensorArithmetic}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```
**Authors**: `r packageDescription("DelayedTensor")[["Author"]] `
**Last modified:** `r file.info("DelayedTensor_2.Rmd")$mtime`
**Compiled**: `r date()`
# Setting
```{r Setting 1, echo=TRUE}
suppressPackageStartupMessages(library("DelayedTensor"))
suppressPackageStartupMessages(library("DelayedArray"))
suppressPackageStartupMessages(library("HDF5Array"))
suppressPackageStartupMessages(library("DelayedRandomArray"))
darr1 <- RandomUnifArray(c(2,3,4))
darr2 <- RandomUnifArray(c(2,3,4))
```
There are several settings in `r Biocpkg("DelayedTensor")`.
First, the sparsity of the intermediate `r Biocpkg("DelayedArray")` objects
calculated inside `r Biocpkg("DelayedTensor")` is set by `setSparse`.
Note that the sparse mode is experimental.
Whether it contributes to higher speed and lower memory is quite dependent
on the sparsity of the `r Biocpkg("DelayedArray")`,
and the current implementation does not recognize the block size,
which may cause out-of-memory errors, when the data is extremely huge.
Here, we specify `as.sparse` as `FALSE`
(this is also the default value for now).
```{r Setting 2, echo=TRUE}
DelayedTensor::setSparse(as.sparse=FALSE)
```
Next, the verbose message is suppressed by `setVerbose`.
This is useful when we want to monitor the calculation process.
Here we specify `as.verbose` as `FALSE`
(this is also the default value for now).
```{r Setting 3, echo=TRUE}
DelayedTensor::setVerbose(as.verbose=FALSE)
```
The block size of block processing is specified by `setAutoBlockSize`.
When the sparse mode is off, all the functions of `r Biocpkg("DelayedTensor")`
are performed as block processing,
in which each block vector/matrix/tensor is expanded to memory space
from on-disk file incrementally so as not to exceed the specified size.
Here, we specify the block size as `1E+8`.
```{r Setting 4, echo=TRUE}
setAutoBlockSize(size=1E+8)
```
Finally, the temporal directory to store the intermediate HDF5 files during running `r Biocpkg("DelayedTensor")` is specified by `setHDF5DumpDir`.
Note that in many systems the `/var` directory has the storage limitation, so if there is no enough space, user should specify the other directory.
```{r Setting 5, echo=TRUE}
# tmpdir <- paste(sample(c(letters,1:9), 10), collapse="")
# dir.create(tmpdir, recursive=TRUE))
tmpdir <- tempdir()
setHDF5DumpDir(tmpdir)
```
These specified values are also extracted by each getter function.
```{r Setting 6, echo=TRUE}
DelayedTensor::getSparse()
DelayedTensor::getVerbose()
getAutoBlockSize()
getHDF5DumpDir()
```
# Tensor Arithmetic Operations
## Unfold/Fold Operations
Unfold (a.k.a. matricizing) operations are used to reshape a tensor into a
matrix.
![Figure 1: Unfold/Fold Operasions](Figure2_1.png)
In `unfold`, `row_idx` and `col_idx` are specified to set which modes are used
as the row/column.
```{r Unfold/Fold operations 1, echo=TRUE}
dmat1 <- DelayedTensor::unfold(darr1, row_idx=c(1,2), col_idx=3)
dmat1
```
`fold` is the inverse operation of `unfold`, which is used to reshape
a matrix into a tensor.
In `fold`, `row_idx`/`col_idx` are specified to set which modes correspond
the row/column of the output tensor and `modes`
is specified to set the mode of the output tensor.
```{r Unfold/Fold operations 2, echo=TRUE}
dmat1_to_darr1 <- DelayedTensor::fold(dmat1,
row_idx=c(1,2), col_idx=3, modes=dim(darr1))
dmat1_to_darr1
identical(as.array(darr1), as.array(dmat1_to_darr1))
```
There are some wrapper functions of `unfold` and `fold`.
For example, in `k_unfold`, mode `m` is used as the row, and the other modes
are is used as the column.
`k_fold` is the inverse operation of `k_unfold`.
```{r Unfold/Fold operations 3, echo=TRUE}
dmat2 <- DelayedTensor::k_unfold(darr1, m=1)
dmat2_to_darr1 <- k_fold(dmat2, m=1, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat2_to_darr1))
dmat3 <- DelayedTensor::k_unfold(darr1, m=2)
dmat3_to_darr1 <- k_fold(dmat3, m=2, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat3_to_darr1))
dmat4 <- DelayedTensor::k_unfold(darr1, m=3)
dmat4_to_darr1 <- k_fold(dmat4, m=3, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat4_to_darr1))
```
In `rs_unfold`, mode `m` is used as the row, and the other modes
are is used as the column.
`rs_fold` and `rs_unfold` also perform the same operations.
On the other hand, `cs_unfold` specifies the mode `m` as the column
and the other modes are specified as the column.
`cs_fold` is the inverse operation of `cs_unfold`.
```{r Unfold/Fold operations 4, echo=TRUE}
dmat8 <- DelayedTensor::cs_unfold(darr1, m=1)
dmat8_to_darr1 <- DelayedTensor::cs_fold(dmat8, m=1, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat8_to_darr1))
dmat9 <- DelayedTensor::cs_unfold(darr1, m=2)
dmat9_to_darr1 <- DelayedTensor::cs_fold(dmat9, m=2, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat9_to_darr1))
dmat10 <- DelayedTensor::cs_unfold(darr1, m=3)
dmat10_to_darr1 <- DelayedTensor::cs_fold(dmat10, m=3, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat10_to_darr1))
```
In `matvec`, m=2 is specified as unfold.
`unmatvec` is the inverse operation of `matvec`.
```{r Unfold/Fold operations 5, echo=TRUE}
dmat11 <- DelayedTensor::matvec(darr1)
dmat11_darr1 <- DelayedTensor::unmatvec(dmat11, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat11_darr1))
```
`ttm` multiplies a tensor by a matrix.
`m` specifies in which mode the matrix will be multiplied.
```{r Unfold/Fold operations 7, echo=TRUE}
dmatZ <- RandomUnifArray(c(10,4))
DelayedTensor::ttm(darr1, dmatZ, m=3)
```
`ttl` multiplies a tensor by multiple matrices.
`ms` specifies in which mode these matrices will be multiplied.
```{r Unfold/Fold operations 6, echo=TRUE}
dmatX <- RandomUnifArray(c(10,2))
dmatY <- RandomUnifArray(c(10,3))
dlizt <- list(dmatX = dmatX, dmatY = dmatY)
DelayedTensor::ttl(darr1, dlizt, ms=c(1,2))
```
## Vectorization
`vec` collapses a `r Biocpkg("DelayedArray")` into
a 1D `r Biocpkg("DelayedArray")` (vector).
![Figure 2: Vectorization](Figure2_2.png)
```{r Vectorization, echo=TRUE}
DelayedTensor::vec(darr1)
```
## Norm Operations
`fnorm` calculates the Frobenius norm of a `r Biocpkg("DelayedArray")`.
![Figure 3: Norm Operations](Figure2_3.png)
```{r Norm operations 1, echo=TRUE}
DelayedTensor::fnorm(darr1)
```
`innerProd` calculates the inner product value of two
`r Biocpkg("DelayedArray")`.
```{r Norm operations 2, echo=TRUE}
DelayedTensor::innerProd(darr1, darr2)
```
## Outer Product
Inner product multiplies two tensors and collapses to 0D tensor (norm).
On the other hand, the outer product is an operation that leaves all subscripts intact.
![Figure 4: Outer Product](Figure2_4.png)
```{r Outer Product, echo=TRUE}
DelayedTensor::outerProd(darr1[,,1], darr2[,,1])
```
## Diagonal Operations
Using `DelayedDiagonalArray`, we can originally create a diagonal
`r Biocpkg("DelayedArray")` by specifying the dimensions (modes) and the values.
![Figure 5: Diagonal Operations](Figure2_5.png)
```{r Diagonal operations 1, echo=TRUE}
dgdarr <- DelayedTensor::DelayedDiagonalArray(c(5,6,7), 1:5)
dgdarr
```
Similar to the `diag` of the `r CRANpkg("base")` package,
the `diag` of `r Biocpkg("DelayedTensor")` is used to extract
and assign values to `r Biocpkg("DelayedArray")`.
```{r Diagonal operations 2, echo=TRUE}
DelayedTensor::diag(dgdarr)
```
```{r Diagonal operations 3, echo=TRUE}
DelayedTensor::diag(dgdarr) <- c(1111, 2222, 3333, 4444, 5555)
DelayedTensor::diag(dgdarr)
```
## Mode-wise Operations
`modeSum` calculates the summation for a given mode `m` of
a `r Biocpkg("DelayedArray")`.
The mode specified as `m` is collapsed into 1D as follows.
![Figure 6: Mode-wise Operations](Figure2_6.png)
```{r Mode-wise operations 1, echo=TRUE}
DelayedTensor::modeSum(darr1, m=1)
DelayedTensor::modeSum(darr1, m=2)
DelayedTensor::modeSum(darr1, m=3)
```
Similar to `modeSum`, `modeMean` calculates the average value
for a given mode `m` of a `r Biocpkg("DelayedArray")`.
```{r Mode-wise operations 2, echo=TRUE}
DelayedTensor::modeMean(darr1, m=1)
DelayedTensor::modeMean(darr1, m=2)
DelayedTensor::modeMean(darr1, m=3)
```
## Tensor Product Operations
There are some tensor specific product such as Hadamard product,
Kronecker product, and Khatri-Rao product.
### Hadamard Product
Suppose a tensor $A \in \Re ^{I \times J}$ and
a tensor $B \in \Re ^{I \times J}$.
Hadamard product is defined as the element-wise product of $A$ and $B$.
![Figure 7: Hadamard Product](Figure2_7.png)
Hadamard product can be extended to higher-order tensors.
$$
A \circ B = \begin{bmatrix}
a_{11}b_{11} & a_{12}b_{12} & \cdots & a_{1J}b_{1J} \\
a_{21}b_{21} & a_{22}b_{22} & \cdots & a_{2J}b_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
a_{I1}b_{I1} & a_{I2}b_{I2} & \cdots & a_{IJ}b_{IJ} \\
\end{bmatrix}
$$
`hadamard` calculates Hadamard product of two `r Biocpkg("DelayedArray")`
objects.
```{r Tensor product operations 1, echo=TRUE}
prod_h <- DelayedTensor::hadamard(darr1, darr2)
dim(prod_h)
```
`hadamard_list` calculates Hadamard product of multiple
`r Biocpkg("DelayedArray")` objects.
```{r Tensor product operations 2, echo=TRUE}
prod_hl <- DelayedTensor::hadamard_list(list(darr1, darr2))
dim(prod_hl)
```
### Kronecker Product
Suppose a tensor $A \in \Re ^{I \times J}$ and
a tensor $B \in \Re ^{K \times L}$.
Kronecker product is defined as all the possible combination of element-wise
product and the dimensions of output tensor are ${IK \times JL}$.
![Figure 8: Kronecker Product](Figure2_8.png)
Kronecker product can be extended to higher-order tensors.
$$
A \otimes B = \begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1J}B \\
a_{21}B & a_{22}B & \cdots & a_{2J}B \\
\vdots & \vdots & \ddots & \vdots \\
a_{I1}B & a_{I2}B & \cdots & a_{IJ}B \\
\end{bmatrix}
$$
`kronecker` calculates Kronecker product of two `r Biocpkg("DelayedArray")`
objects.
```{r Tensor product operations 3, echo=TRUE}
prod_kron <- DelayedTensor::kronecker(darr1, darr2)
dim(prod_kron)
```
`kronecker_list` calculates Kronecker product of multiple
`r Biocpkg("DelayedArray")` objects.
```{r Tensor product operations 4, echo=TRUE}
prod_kronl <- DelayedTensor::kronecker_list(list(darr1, darr2))
dim(prod_kronl)
```
### Khatri-Rao Product
Suppose a tensor $A \in \Re ^{I \times J}$ and
a tensor $B \in \Re ^{K \times J}$.
Khatri-Rao product is defined as the column-wise Kronecker product
and the dimensions of output tensor is ${IK \times J}$.
$$
A \odot B = \begin{bmatrix}
a_{1} \otimes a_{1} & a_{2} \otimes a_{2} & \cdots & a_{J} \otimes a_{J} \\
\end{bmatrix}
$$
![Figure 9: Khatri-Rao Product](Figure2_9.png)
Khatri-Rao product can only be used for 2D tensors (matrices).
`khatri_rao` calculates Khatri-Rao product of two `r Biocpkg("DelayedArray")`
objects.
```{r Tensor product operations 5, echo=TRUE}
prod_kr <- DelayedTensor::khatri_rao(darr1[,,1], darr2[,,1])
dim(prod_kr)
```
`khatri_rao_list` calculates Khatri-Rao product of multiple
`r Biocpkg("DelayedArray")` objects.
```{r Tensor product operations 6, echo=TRUE}
prod_krl <- DelayedTensor::khatri_rao_list(list(darr1[,,1], darr2[,,1]))
dim(prod_krl)
```
## Utilities Functions
`list_rep` replicates an arbitrary number of any R object.
```{r Utilities 1, echo=TRUE}
str(DelayedTensor::list_rep(darr1, 3))
```
### Bind Operations
`modebind_list` collapses multiple `r Biocpkg("DelayedArray")` objects
into single `r Biocpkg("DelayedArray")` object.
`m` specifies the collapsed dimension.
![Figure 10: Bind Operations](Figure2_10.png)
```{r Utilities 2, echo=TRUE}
dim(DelayedTensor::modebind_list(list(darr1, darr2), m=1))
dim(DelayedTensor::modebind_list(list(darr1, darr2), m=2))
dim(DelayedTensor::modebind_list(list(darr1, darr2), m=3))
```
`rbind_list` is the row-wise `modebind_list` and
collapses multiple 2D `r Biocpkg("DelayedArray")` objects
into single `r Biocpkg("DelayedArray")` object.
```{r Utilities 3, echo=TRUE}
dim(DelayedTensor::rbind_list(list(darr1[,,1], darr2[,,1])))
```
`cbind_list` is the column-wise `modebind_list` and
collapses multiple 2D `r Biocpkg("DelayedArray")` objects
into single `r Biocpkg("DelayedArray")` object.
```{r Utilities 4, echo=TRUE}
dim(DelayedTensor::cbind_list(list(darr1[,,1], darr2[,,1])))
```
# Session information {.unnumbered}
```{r sessionInfo, echo=FALSE}
sessionInfo()
```