---
title: "Basic usage of GBScleanR"
author: 
  -name: "Tomoyuki Furuta"
  affiliation: Institute of Plant Science and Resources, Okayama University, Okayama, Japan
  email: f.tomoyuki@okayama-u.ac.jp
date: "Aug 27, 2024"
package: GBScleanR
output: 
  BiocStyle::html_document:
    toc: true
    toc_float: true
bibliography: references.bib
vignette: >
  %\VignetteIndexEntry{Basic usage of GBScleanR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.pos = 'H', fig.align = "center", warning = FALSE, message = FALSE)
```

# Introduction
The `r Biocpkg("GBScleanR")` package has been developed to conduct error correction on genotype data obtained via NGS-based genotyping methods such as RAD-seq and GBS [@Miller2007; @Elshire2011]. It is designed to estimate true genotypes along chromosomes from given allele read counts in the VCF file generated by SNP callers like GATK and TASSEL-GBS [@McKenna2010; @Glaubitz2014]. The current implementation supports genotype data of a mapping population derived from two or more diploid parents followed by selfings, sibling crosses, or random crosses. e.g. F$_2$ and 8-way RILs. Our method supposes markers to be biallelic and ordered along chromosomes by mapping reads on a reference genome sequence. To support smooth access to large size genotype data, every input VCF file is first converted to a Genomic Data Structure (GDS) file [@Zheng2012]. The current implementation does not allow non-biallelic markers, and those markers in an input VCF file will be automatically removed from a resultant GDS file. `r Biocpkg("GBScleanR")` also provides functions for data visualization, filtering, and loading/writing a VCF file. Furthermore, the data structure of the GDS file created via this package is compatible with those used in the `r Biocpkg("SNPRelate")`, `r Biocpkg("GWASTools")` and `r Biocpkg("GENESIS")` packages those are designed to handle large variant data and conduct downstream analyses including regression analysis [@Zheng2012; @Gogarten2012; @Gogarten2019]. In this document, we first walk through the utility functions implemented in `r Biocpkg("GBScleanR")` to introduce a basic usage. Then, the core function of `r Biocpkg("GBScleanR")` which estimates true genotypes for error correction will be introduced.

# New Functions in GBScleanR Version 2
The latest release of `r Biocpkg("GBScleanR")` is version 2. The major update in this version is that `r Biocpkg("GBScleanR")` supports polyploid populations. `r Biocpkg("GBScleanR")` tries to estimate the haplotype phases of founder and offspring samples based on given read counts. To process your data as of a polyploid population, set the `ploidy` argument in the `loadGDS()` function.
```{r eval = FALSE}
gds <- loadGDS(x = "/path/to/your/gds", ploidy = 4) # For tetraploid
gds <- loadGDS(x = "/path/to/your/gds", ploidy = 6) # For hexaploid
```
You can use any other functions in the `r Biocpkg("GBScleanR")` package for your polyploid population in the same way for diploids.
\  

In addition, the `setDominantMarkers()` function enables you to force `r Biocpkg("GBScleanR")` to treat some markers as dominant markers. In the iterative parameter optimization of the genotype estimation step, `r Biocpkg("GBScleanR")` calculates the marker-wise reference-read-bias. We can assume that reads of either reference or alternative allele only can be observed at dominant markers and the marker-wise reference-read-bias must be 1 or 0 for dominant markers that give only reference or alternative allele reads, respectively.
\  

As another important change in estGeno() since version 2, the function now requires the node argument to explicitly specify whether raw or filtered read counts should be used for genotype estimation. If setCallFilter() has not been executed, the user-specified node argument will be ignored, and the "raw" node will be used by default.
\  

The data visualization by `r Biocpkg("GBScleanR")` was also enhanced by new plot functions: `plotReadRatio()` and `plotDosage()`. See the "[Plot Read Ratio and Dosage](#plot_dosage)" section for the details.


# Prerequisites
This package internally uses the following packages.  
- `r CRANpkg("ggplot2")`  
- `r CRANpkg("dplyr")`  
- `r CRANpkg("tidyr")`  
- `r CRANpkg("expm")`  
- `r Biocpkg("gdsfmt")`  
- `r Biocpkg("SeqArray")`  
\  

# Installation
You can install `r Biocpkg("GBScleanR")` from the Bioconductor repository with the following code.
```{r eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("GBScleanR")
```
\  

The code below let you install the package from the github repository.
The package released in the github usually get frequent update more than that in Bioconductor due to the release schedule.
```{r eval=FALSE}
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("tomoyukif/GBScleanR", build_vignettes = TRUE)
```

# Error while Handling a GDS File
You may face to the following error message or similar one if you killed the process that 
was accessing a GDS file. 
```{r eval=FALSE}
Stream Read Error, need 12 byte(s) but receive 0
```
This error message indicates the corruption of the GDS file and you cannot access it anymore.
In this case, please remake a GDS file using the `gbsrVCF2GDS()` function.

# Loading Data
The main class of the `r Biocpkg("GBScleanR")` package is `GbsrGenotypData` which inherits the `GenotypeData` class in the `r Biocpkg("SeqArray")` package. The `gbsrGenotypeData` class object has three slots: `sample`, `marker`, and `scheme`. The `data` slot holds genotype data as a `gds.class` object which is defined in the `gdsfmt` package while `snpAnnot` and `scanAnnot` contain objects storing annotation information of SNPs and samples, which are the `SnpAnnotationDataFrame` and `ScanAnnotationDataFrame` objects defined in the `r Biocpkg("GWASTools")` package. See the vignette of `r Biocpkg("GWASTools")` for more detail. `r Biocpkg("GBScleanR")` follows the way of `r Biocpkg("GWASTools")` in which a unique genotyping instance (genotyped sample) is called "scan".
\ 

Load the package.
```{r warning=FALSE, message=FALSE}
library("GBScleanR")
```
\ 

`r Biocpkg("GBScleanR")` only supports a VCF file as input. As an example data, we use sample genotype data for a biparental F2 population derived from inbred parents.
```{r}
vcf_fn <- system.file("extdata", "sample.vcf", package = "GBScleanR")
gds_fn <- tempfile("sample", fileext = ".gds")
```
\ 

As mentioned above, the `GbsrGenotypeData` class requires genotype data in the `gds.class` object which enable us quick access to the genotype data without loading the whole data on RAM. At the beginning of the processing, we need to convert data format of our genotype data from VCF to GDS. This conversion can be achieved using `gbsrVCF2GDS()` as shown below. A compressed VCF file (.vcf.gz) is also acceptable. 
```{r}
# `force = TRUE` allow the function to over write the GDS file,
# even if a GDS file exists at `out_fn`.
gbsrVCF2GDS(vcf_fn = vcf_fn, out_fn = gds_fn, force = TRUE, verbose = FALSE)
```
\ 

Once we converted the VCF to the GDS, we can create a `GbsrGenotypeData` instance for our data.
```{r}
gds <- loadGDS(gds_fn, verbose = FALSE)
```
The first time to load a newly produced GDS file will take long time due to data
reformatting for quick access.


# Utility Methods
## Getters
Getter functions allow you to retrieve basic information of genotype data, e.g. the number of SNPs and samples, chromosome names, physical position of SNPs and alleles.
```{r}
# Number of samples
nsam(gds)
```
\ 
```{r}
# Number of SNPs
nmar(gds) 
```
\ 
```{r}
# Indices of chromosome ID of all markers
head(getChromosome(gds)) 
```
\ 
```{r}
# Chromosome names of all markers
head(getChromosome(gds)) 
```
\ 
```{r}
# Position (bp) of all markers
head(getPosition(gds)) 
```
\ 
```{r}
# Reference allele of all markers
head(getAllele(gds)) 
```
\ 
```{r}
# SNP IDs
head(getMarID(gds)) 
```
\ 
```{r}
# sample IDs
head(getSamID(gds)) 
```
\ 

The function `getGenotype()` returns genotype call data in which integer numbers 0, 1, and 2 indicate the number of reference allele.
```{r}
geno <- getGenotype(gds)
```
\ 

The function `getRead()` returns read count data as a list with two elements `ref` and `alt` containing reference read counts and alternative read counts, respectively.
```{r}
geno <- getRead(gds)
```



## Data Summarization
### Collect Basic Summary Statistics
`countGenotype()` and `countRead()` are class methods of `GbsrGenotypeData` and they summarize genotype counts and read counts per marker and per sample. 
```{r}
gds <- countGenotype(gds)
gds <- countRead(gds)
```
\ 

### Visualize Basic Summary Statistics
These summary statistics can be visualized via plotting functions.
With the values obtained via `countGenotype()`, we can plot histograms of missing rate , heterozygosity, reference allele frequency as shown below.
```{r fig.alt="Missing rate per marker and per sample."}
# Histgrams of missing rate
histGBSR(gds, stats = "missing") 
```
\ 
```{r fig.alt="Heterozygosity per marker and per sample."}
# Histgrams of heterozygosity
histGBSR(gds, stats = "het") 
```
\ 
```{r fig.alt="Reference allele frequency per marker and per sample."}
# Histgrams of reference allele frequency
histGBSR(gds, stats = "raf") 
```
\ 

With the values obtained via `countRead()`, we can plot histograms of total read depth , allele read depth , reference read frequency as shown below.
```{r fig.alt="Total read depth per marker and per sample."}
# Histgrams of total read depth
histGBSR(gds, stats = "dp") 
```
\ 
```{r fig.alt="Reference read depth per marker and per sample."}
# Histgrams of allelic read depth
histGBSR(gds, stats = "ad_ref") 
```
\ 
```{r fig.alt="Alternative read depth per marker and per sample."}
# Histgrams of allelic read depth
histGBSR(gds, stats = "ad_alt") 
```
\ 
```{r fig.alt="Reference read per marker and per sample."}
# Histgrams of reference allele frequency
histGBSR(gds, stats = "rrf") 
```
\ 
```{r fig.alt="Mean of reference read depth per marker and per sample."}
# Histgrams of mean allelic read depth
histGBSR(gds, stats = "mean_ref") 
```
\ 
```{r fig.alt="SD of reference read depth per marker and per sample."}
# Histgrams of standard deviation of read depth
histGBSR(gds, stats = "sd_ref") 
```
\ 
```{r fig.alt="SD of alternative read depth per marker and per sample."}
# Histgrams of standard deviation of read depth
histGBSR(gds, stats = "sd_ref") 
```
\ 

`plotGBSR()` and `pairsGBSR()` provide other ways to visualize statistics. `plotGBSR()` draws a line plot of a specified statistics per marker along each chromosome. `pairsGBSR()` give us a two-dimensional scatter plot to visualize relationship between statistics.
```{r}
plotGBSR(gds, stats = "missing")
```
\ 

```{r}
plotGBSR(gds, stats = "geno")
```
\ 

```{r}
pairsGBSR(gds, stats1 = "missing", stats2 = "dp")
```
\ 

### Getter Methods for Summary Statistics
The statistics obtained via `countGenotype()`, `countReat()`, and `calcReadStats()` are sotred in the `snpAnnot` and `scanAnnot` slots. They can be retrieved using getter functions as follows.
```{r}
# Reference genotype count per marker
head(getCountGenoRef(gds, target = "marker")) 
# Reference genotype count per sample
head(getCountGenoRef(gds, target = "sample")) 
```
\ 
```{r}
# Heterozygote count per marker
head(getCountGenoHet(gds, target = "marker")) 
# Heterozygote count per sample
head(getCountGenoHet(gds, target = "sample")) 
```
\ 
```{r}
# Alternative genotype count per marker
head(getCountGenoAlt(gds, target = "marker")) 
# Alternative genotype count per sample
head(getCountGenoAlt(gds, target = "sample"))
```
\ 
```{r}
# Missing count per marker
head(getCountGenoMissing(gds, target = "marker")) 
# Missing count per sample
head(getCountGenoMissing(gds, target = "sample")) 
```
\ 
```{r}
# Reference allele count per marker
head(getCountAlleleRef(gds, target = "marker")) 
# Reference allele count per sample
head(getCountAlleleRef(gds, target = "sample")) 
```
\ 
```{r}
# Alternative allele count per marker
head(getCountAlleleAlt(gds, target = "marker")) 
# Alternative allele count per sample
head(getCountAlleleAlt(gds, target = "sample")) 
```
\ 
```{r}
# Missing allele count per marker
head(getCountAlleleMissing(gds, target = "marker")) 
# Missing allele count per sample
head(getCountAlleleMissing(gds, target = "sample")) 
```
\ 
```{r}
# Reference read count per marker
head(getCountReadRef(gds, target = "marker")) 
# Reference read count per sample
head(getCountReadRef(gds, target = "sample")) 
```
\ 
```{r}
# Alternative read count per marker
head(getCountReadAlt(gds, target = "marker")) 
# Alternative read count per sample
head(getCountReadAlt(gds, target = "sample")) 
```
\ 
```{r}
# Sum of reference and alternative read counts per marker
head(getCountRead(gds, target = "marker")) 
# Sum of reference and alternative read counts per sample
head(getCountRead(gds, target = "sample")) 
```
\ 
```{r}
# Mean of reference allele read count per marker
head(getMeanReadRef(gds, target = "marker")) 
# Mean of reference allele read count per sample
head(getMeanReadRef(gds, target = "sample"))
```
\ 
```{r}
# Mean of Alternative allele read count per marker
head(getMeanReadAlt(gds, target = "marker")) 
# Mean of Alternative allele read count per sample
head(getMeanReadAlt(gds, target = "sample")) 
```
\ 
```{r}
# SD of reference allele read count per marker
head(getSDReadRef(gds, target = "marker")) 
# SD of reference allele read count per sample
head(getSDReadRef(gds, target = "sample")) 
```
\ 
```{r}
# SD of Alternative allele read count per marker
head(getSDReadAlt(gds, target = "marker")) 
# SD of Alternative allele read count per sample
head(getSDReadAlt(gds, target = "sample"))
```
\ 
```{r}
# Minor allele frequency per marker
head(getMAF(gds, target = "marker")) 
# Minor allele frequency per sample
head(getMAF(gds, target = "sample")) 
```
\ 
```{r}
# Minor allele count per marker
head(getMAC(gds, target = "marker")) 
# Minor allele count per sample
head(getMAC(gds, target = "sample")) 
```
\ 

You can get the proportion of each genotype call with `prop = TRUE`.
```{r}
head(getCountGenoRef(gds, target = "marker", prop = TRUE))
head(getCountGenoHet(gds, target = "marker", prop = TRUE))
head(getCountGenoAlt(gds, target = "marker", prop = TRUE))
head(getCountGenoMissing(gds, target = "marker", prop = TRUE))
```
\ 

The proportion of each allele counts.
```{r}
head(getCountAlleleRef(gds, target = "marker", prop = TRUE))
head(getCountAlleleAlt(gds, target = "marker", prop = TRUE))
head(getCountAlleleMissing(gds, target = "marker", prop = TRUE))
```
\ 

The proportion of each allele read counts.
```{r}
head(getCountReadRef(gds, target = "marker", prop = TRUE))
head(getCountReadAlt(gds, target = "marker", prop = TRUE))
```


# Filtering and Subsetting
## Filtering Data
Based on the statistics we obtained, we can filter out less reliable markers and samples using `setMarFilter()` and `setSamFilter()`, respectively.
```{r}
gds <- setMarFilter(gds, missing = 0.2, het = c(0.1, 0.9), maf = 0.05)
gds <- setSamFilter(gds, missing = 0.8, het = c(0.25, 0.75))
```
\ 

`setCallFilter()` is another type of filtering which works on each genotype call that is a data point at a marker in a sample. We can replace some genotype calls with missing based on the specified criteria. If you would like to filter out less reliable genotype calls that are only supported by less than 5 reads, set the arguments as below.
```{r}
gds <- setCallFilter(gds, dp_count = c(5, Inf))
```
\ 

If needed to remove genotype calls supported by too many reads, which might be the results of mismapping from repetitive sequences, set as follows.
```{r}
# Filtering genotype calls based on total read counts
gds <- setCallFilter(gds, dp_qtile = c(0, 0.9))

# Filtering genotype calls based on reference read counts 
# and alternative read counts separately.
gds <- setCallFilter(gds, ref_qtile = c(0, 0.9), alt_qtile = c(0, 0.9))
```
\ 

Usually reference reads and alternative reads show different data distributions. Thus, we can set the different thresholds for them via `dp_qtile`, `ref_qtile`, and `alt_qtile` to filter out genotype calls based on quantiles of total, reference, and alternative read counts in each sample.
\ 

Here, the following codes filter out calls supported by less than 5 reads and then filter out markers showing a missing rate of more than 20%.
```{r}
gds <- setCallFilter(gds, dp_count = c(5, Inf))
gds <- setMarFilter(gds, missing = 0.2)
```
\

#### Important Instruction for Filtering
Based on our study using simulation data and real data for a rice F2 population derived from a cross between distant relatives (cultivar x wild species), we recommend the setting of `ref_qtile = c(0, 0.9), alt_qtile = c(0, 0.9)` to filter out markers with over represented reads. If your population contains samples that have only either of reference or alternative reads at the majority of markers, filtering with `ref_qtile = c(0, 0.9), alt_qtile = c(0, 0.9)` will set missing to a large portion of markers for the samples. In that case, it is better to set `dp_qtile = c(0, 0.9)`. In addition, the error correction by GBScleanR does not require any filtering for markers based on missing rate, heterozygosity, and allele frequency. Therefore, `setMarFilter()` and `setSamFilter()` will be used only when you have specific markers and samples that should be removed. In that case, please specify marker IDs and sample IDs to the `id` argument of `setMarFilter()` and `setSamFilter()`, respectively.
```{r}
gds <- setCallFilter(gds, ref_qtile = c(0, 0.9), alt_qtile = c(0, 0.9))
invalid_mar <- getMarID(gds)[1:5]
gds <- setMarFilter(gds, id = invalid_mar)
invalid_sam <- getSamID(gds)[1:3]
gds <- setSamFilter(gds, id = invalid_sam)
```




In addition to those statistics based filtering functions, `r Biocpkg("GBScleanR")` provides a filtering function based on relative marker positions. Markers locating too close each other usually have redundant information, especially if those markers are closer each other than the read length, in which case the markers are supported by completely (or almost) the same set of reads. To select only one marker from those markers, we can sue `thinMarker()`. This function selects one marker having the least missing rate from each stretch of the specified length. If some markers have the least missing rate, select the first marker in the stretch.
```{r}
# Here we select only one marker from each 150 bp stretch.
gds <- thinMarker(gds, range = 150) 
```
\ 

We can obtain the summary statistics using `countGenotype()`, `countRead()`, and `calcReadStats()` for only the SNPs and samples retained after the filtering. Importantly, we need to set `node = "filt` if we have apllied `setCallFiler()`. Otherwise, `countGenotype()` used the raw genotype calls.
```{r}
gds <- countGenotype(gds, node = "filt")
gds <- countRead(gds, node = "filt")
```
\ 

We can check which markers and samples are retained after the filtering using `validSnp()` and `validSam()`.
```{r}
head(validMar(gds))
head(validSam(gds))
```
\ 

The class methods of `GbsrGenotypeData` basically work with only the markers and samples retained after filtering. To use all markers and samples, please specify `valid = FALSE` to the `GbsrGenotypeData` class methods.
```{r}
nmar(gds)
nmar(gds, valid = FALSE)
```
\ 

## Reset Filtering
We can reset filtering as following.
```{r}
# Reset the filter on markers
gds <- resetMarFilter(gds) 

# Reset the filter on samples
gds <- resetSamFilter(gds) 

# Reset the filter on calls
gds <- resetCallFilter(gds) 

# Reset all filters
gds <- resetFilter(gds) 
```



# Error Correction
The error correction algorithm of `r Biocpkg("GBScleanR")` bases on the HMM assuming observed allele read counts for each SNP marker along a chromosome as the outputs from a sequence of latent true genotypes. Our model supposes that a population of $N^o \geq 1$ sampled offspring was originally derived form the crosses between $N^f \geq 2$ parent individuals. The parents can be inbred lines having homozygotes at all markers and outbred lines in which markers show heterozygous genotype.


## Set Replicates
The update on Mar 7, 2024 added a function to set sample replicate information 
to jointly evaluate read counts for replicates in the genotype estimation by 
the `estGeno()` function. The `estGeno()` function sums up the read counts of 
replicates specified by `setReplicates()` and estimates genotypes based on the 
summed-up read counts. The samples specified as replicates in `setReplicates()` 
will have the same genotypes at all markers in the estimated genotypes obtained 
via `estGeno()`. The `setReplicates()` function assumes that replicate 
information would be supplied for all samples in the data including parents via 
the `replicates` argument. In addition, the `setReplicates()` function assumes 
that the samples that are assigned same numbers or characters via the 
`replicates` argument are replicates. Therefore, the ordering of samples in the 
data and the identifiers in the vector specified to `replicates` should match. 
Replicates can be specified as follows. 
First, it is better to confirm the ordering of samples in the data with the 
setting "valid = FALSE" to obtain all sample IDs.
```{r}
sample_id <- getSamID(gds, valid = FALSE)
sample_id
```
Here, as an example, we assume that the samples at odd indices in the 
`sample_id` vector are the replicates of the next samples at even indices. For 
example, F2_1054 and F2_1055 are replicates for which DNA samples were extracted
 from the same F2 individual although they have different samples IDs. In this 
 case, you can set replicate information as shown below.
```{r}
gds <- setReplicates(gds, replicates = rep(1:51, each = 2))
```
As another example, if parents in the data are Founder1 and Founder2 and 
replicates are F2_1054 and F2_1022 for Founder1 and F2_1178 and F2_1637 for 
Founder2, you should give a vector to the `replicates` argument like the 
following.
```{r}
rep_of_parent1 <- sample_id %in% c("Founder1", "F2_1054", "F2_1022")
rep_of_parent2 <- sample_id %in% c("Founder2", "F2_1178", "F2_1637")
sample_id[rep_of_parent1] <- "Founder1"
sample_id[rep_of_parent2] <- "Founder2"
gds <- setReplicates(gds, replicates = sample_id)
```
If you set replicates for parents, you should give a sample id of the replicates 
 as an identifier for a parent in `setParents()` as described in the next 
 section. If you set replicates for parents after setting parents by 
 `setParents()`, the replicates for parents will be also set as parents with 
 assigning the same member ID for the replicates of each parent.



To reset the assigned replicate information, please use `setReplicates()` with 
specifying different values to the `replicates` argument.
```{r}
gds <- setReplicates(gds, replicates = seq_len(nsam(gds, valid = FALSE)))
```



## Set Parents
As the first step for genotype error correction, we have to specify which samples are the parents of the population via `setParents()`. In the case of genotype data in the biparental population, people usually filter out SNPs which are not monomorphic in each parental sample and not biallelic between parents. `setParents()` automatically do this filtering, if you set `mono = TRUE` and `bi = TRUE`.
```{r}
p1 <- grep("Founder1", getSamID(gds, valid = FALSE), value = TRUE)
p2 <- grep("Founder2", getSamID(gds, valid = FALSE), value = TRUE)
gds <- setParents(gds, parents = c(p1, p2), mono = TRUE, bi = TRUE)
```
If you set replicates for parents, you should give a sample id of the replicates 
 as an identifier for a parent in `setParents()`. In the last example in the 
 previous section, we set three replicates for each parent. To properly set 
 parents, we should specify either of "Founder1", "F2_1054", and "F2_1022" for 
 `p1` and either of "Founder2", "F2_1178", "F2_1637" for `p2`.

## Data QC and Filtering
The next step is to visualize statistical summaries of the data. Get genotype data summaries as mentioned in the previous section.
```{r}
gds <- countGenotype(gds)
```
\ 

Then, get histograms.
```{r}
histGBSR(gds, stats = "missing")
```
\ 
```{r}
histGBSR(gds, stats = "het")
```
\ 
```{r}
histGBSR(gds, stats = "raf")
```
\ 

As the histograms showed, the data contains a lot of missing genotype calls with unreasonable heterozygosity in a F2 population. Reference allele frequency shows a bias to reference allele. If you can say your population has no strong segregation distortion in any positions of the genome, you can filter out the markers having too high or too low reference allele frequency.
```{r}
# filter out markers with reference allele frequency
# less than 5% or more than 95%.
gds <- setMarFilter(gds, maf = 0.05) 
```
\ 

However, sometimes filtering based on allele frequency per marker removes all markers from regions truly showing segregation distortion. Although heterozygosity can be a criterion to filter out markers, this will removes too many markers which even contains useful information for genotyping. In addition, as we described in the previous section, the error correction by GBScleanR does not require any filtering for markers based on missing rate, heterozygosity, and allele frequency.
\ 

If we found poor quality samples in you dataset based on missing rate, heterozygosity, and reference allele frequency, we can omit those samples with `setSamFilter()`.
```{r}
# Filter out samples with more than 90% missing genotype calls,
# less than 5% heterozygosity, and less than 5% minor allele frequency.
gds <- setSamFilter(gds, missing = 0.9, het = c(0.05, 1), maf = 0.05)
```
\ 

Before filtering using `setMarFilter()` and `setSamFilter()`, we recomend that you conduct filtering on each genotype call based on read depth. The error correction via `r Biocpkg("GBScleanR")` is robust against low coverage calls, while genotype calls messed up by mismapping might lead less reliable error correction. Therefore, filtering for extremely high coverage calls are necessary rather than that for low coverage ones.
```{r}
# Filter out genotype calls supported by reads less than 2 reads.
gds <- setCallFilter(gds, dp_count = c(2, Inf))

# Filter out genotype calls supported by reads more than 100.
gds <- setCallFilter(gds, dp_count = c(0, 100))

# Filter out genotype calls based on quantile values 
# of read counts at markers in each sample.
gds <- setCallFilter(gds, ref_qtile = c(0, 0.9), alt_qtile = c(0, 0.9))
```
\ 

Since missing genotype calls left in the data basically give no negative effect on genotype error correction. Therefore, you can leave any missing genotype calls. We can, however, remove markers based on missing genotype calls.
```{r}
# Remove markers having more than 75% of missing genotype calls
gds <- setMarFilter(gds, missing = 0.2) 
nmar(gds)
```
\ 

To check statistical summaries of the filtered genotype data, we need to set `node = "filt`. Otherwise, `countGenotype()` used the raw genotype data.
```{r}
gds <- countGenotype(gds, node = "filt")
```
\ 
```{r}
histGBSR(gds, stats = "missing")
```
\ 
```{r}
histGBSR(gds, stats = "het")
```
\ 
```{r}
histGBSR(gds, stats = "raf")
```
\ 

We can still see the markers showing distortion in allele frequency, while the expected allele frequency is 0.5 in a F2 population. To investigate that those markers having distorted allele frequency were derived from truly distorted regions or just error prone markers, we must check if there are regions where the markers with distorted allele frequency are clustered.
```{r}
plotGBSR(gds, stats = "raf")
```
\ 

No region seem to have severe distortion. Based on the histogram of reference allele frequency, we can roughly cut off the markers with frequency more than 0.75 or less than 0.25, in other words, less than 0.25 of minor allele frequency. As we mentioned already in the previous section, the error correction by GBScleanR basically works finely without any filtering for markers based on missing rate, heterozygosity, and allele frequency.
```{r}
gds <- setMarFilter(gds, maf = 0.25)
nmar(gds)
```
\ 

Let's see the statistics again.
```{r}
gds <- countGenotype(gds)
histGBSR(gds, stats = "missing")
```
\ 
```{r}
histGBSR(gds, stats = "het")
```
\ 
```{r}
histGBSR(gds, stats = "raf")
```
\ 

At the end of filtering, check marker density and genotype ratio per marker along chromosomes.
```{r}
# Marker density
plotGBSR(gds, stats = "marker")
```
\ 
```{r}
plotGBSR(gds, stats = "geno")
```
\ 
The `coord` argument controls the number of rows and columns of the facets in the plot.


## Genotype Estimation
### Prepare Scheme Information
Before executing the function for true genotype estimation, we need to build a scheme object. The update on May 17, 2024 added a wrapper function for scheme information preparation. If your population has been developed in a relatively simple scheme, you can use the `makeScheme()` function.
```{r}
# For selfed F2 population
gds <- makeScheme(gds, generation = 2, crosstype = "self")

# For sibling-crossed F2 population
gds <- makeScheme(gds, generation = 2, crosstype = "sib")

# For selfed F5 population
gds <- makeScheme(gds, generation = 5, crosstype = "self")

# For F1 population
gds <- makeScheme(gds, generation = 1) # the crosstype argument will be ignored.
```
When your population has $2^n$ parents specified by `setParents()`, `makeScheme()` assumes those parents were crossed in the "funnel" design in which $2^n$ parents are crossed to obtain $2^n/2$ F1 hybrids followed by successive intercrossings (pairings) of the hybrids to combine the genomes of all parents in one family of siblings. The `makeScheme()` function assumes that the parents that were assigned an odd number member ID (N) in `setParents()` had been crossed with the parent that were assigned an even number (N+1). For example, if you set parents as shown below. The `makeScheme()` function prepare a scheme information that indicates the intercrossings of "p1 x p2", "p3 x p4", "p5 x p6", and "p7 x p8" followed by crossing of "p1xp2_F1 x p3xp4_F1" and "p5xp6_F1 x p7xp8_F1" and then crossing of the two 4-way crossed liens to produce 8-way crossed hybrid lines. If, for example, `generation = 5` indicating an F5 generation was specified to `makeScheme()`, the function adds 4 successive selfing or sibling crossings in the scheme.
```{r, eval=FALSE}
# Do not run.
gds <- setParents(gds, 
                  parents = c("p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"))
# Member IDs will be 1, 2, 3, 4, 5, 6, 7, and 8 
# for p1, p2, p3, p4, p5, p6, p7, and p8, respectively.
```



In many cases, `makeScheme()` is enough to prepare scheme information. However, if your population underwent more complicated crossings, please register scheme information step-by-step as shown below.

Our sample data is of a biparental F2 population derived from inbred parents. Therefore, we should run `initScheme()` and `addScheme()` as following.
```{r}
# As always the first step of breeding scheme would be "pairing" cross(es) of 
# parents, never be "selfing" and a "sibling" cross,
# the argument `crosstype` in initScheme() was deprecated on the update on April 6, 2023.
# gds <- initScheme(gds, crosstype = "pairing", mating = matrix(1:2, 2))
gds <- initScheme(gds, mating = rbind(1, 2))
gds <- addScheme(gds, crosstype = "selfing")
```
\ 

The function `initScheme()` initializes the scheme object with information about parents. You need to specify a matrix indicating combinations of `mating`, in which each column shows a pair of parental samples. For example, if you have only two parents, the `mating` matrix should be `mating = matrix(1:2, nrow = 2, ncol = 1)` or equivalents. The indices used in the matrix should match with the IDs labeled to parental samples by `setParents()`. To confirm the IDs for parental samples, run the following code.
```{r}
getParents(gds)
```
The created `GbsrScheme` object is set in the `scheme` slot of the `GbsrGenotypeData` object.
The function `addScheme()` adds the information about the next breeding step of your population. In the case of our example data, the second step was selfing to produce F2 individuals from the F1 obtained via the first parent crossing.
\  

The codes shown below in the rest of the section "Prepare scheme information" are sample codes assuming some specific situations that are not applicable for the sample data used in this vignette. Therefore, you will get error messages if you run the codes.



If your population was derived from a 4-way or 8-way cross, you need to add more `paring` steps. In the case of 8-way RILs developed by three pairing crosses followed by five selfing cycles, the scheme object should be built as following. First we need to initialize the scheme object with specifying the first mating scheme. The crosstype argument should be "pairing" and the mating argument should be given as a matrix in which each pairing combination of parents are shown in each column. The following case indicates the pairing of parent 1 and 2 as well as 3 and 4, 5 and 6, and 7 and 8.
```{r eval=FALSE}
# As always the first step of breeding scheme would be "pairing" cross(es) of 
# parents, never be "selfing" and a "sibling" cross,
# the argument `crosstype` in initScheme() was deprecated on the update on April 6, 2023.
# gds <- initScheme(gds, crosstype = "pair", mating = cbind(c(1:2), c(3:4), c(5:6), c(7:8)))

# Do not run.
gds <- initScheme(gds, mating = cbind(c(1:2), c(3:4), c(5:6), c(7:8)))
```
\  

Now the progenies of the crosses above have member ID 9, 10, 11,
and 12 for each combination of mating. You can check IDs with showScheme().

Then, add the step to make 4-way crosses.
```{r eval=FALSE}
# Do not run.
gds <- addScheme(gds, crosstype = "pair", mating = cbind(c(9:10), c(11:12)))

# Check IDs.
showScheme(gds)
```
\  
Add the last generation of the paring step .
```{r eval=FALSE}
# Do not run.
gds <- addScheme(gds, crosstype = "pair", mating = cbind(c(13:14)))

#' # Check IDs.
showScheme(gds)
```
\  
Now we have the scheme information of a 8-way cross.
The follwoing steps add the selfing cycles.
```{r eval=FALSE}
# Inbreeding by five times selfing.
# Do not run.
gds <- addScheme(gds, crosstype = "self")
gds <- addScheme(gds, crosstype = "self")
gds <- addScheme(gds, crosstype = "self")
gds <- addScheme(gds, crosstype = "self")
gds <- addScheme(gds, crosstype = "self")
```
You can set `crosstype = "sibling"` or `crosstype = "random"`, if your population was developed through sibling crosses or random crosses, respectively.

The update on April 4, 2023 introduced new function to GBScleanR.
The genotype estimation algorithm in estGeno() supports populations that consist
of samples belonging to different pedigree. For example, if you have a population
 of F1 samples that derived from three different crosses of four parents: 
Founder1 x Founder2, Founder1 x Founder3, Founder1 x Founder4. You can build a 
scheme info as following.
```{r eval=FALSE}
# Do not run.
gds <- setParents(object = gds,
                  parents = c("Founder1", "Founder2", "Founder3", "Founder4"))
gds <- initScheme(object = gds, 
                  mating = cbind(c(1, 2), c(1, 3), c(1,4)))
# The initScheme() function here automatically set 5, 6, and 7 as member ID to
# the progenies of the above maiting (pairing) combinations, respectively.

# Then you have to assign member IDs to your samples to indicate which sample 
# belongs to which pedigree.
gds <- assignScheme(object = gds, 
                    id = c(rep(5, 10), rep(6, 15), rep(7, 20)))
```

The assignScheme() assign member IDs `id` to the samples in order.
Please confirm the order of the member IDs in `id` and the order of the
sample IDs shown by getSamID(gds).
```{r eval=FALSE}
# Do not run.
# Get sample ID
sample_id <- getSamID(object = gds)

# Initialize the id vector
id <- integer(nsam(gds))

# Assume your samples were named with prefixes that indicate which 
# sample was derived from which combination of parents.
id[grepl("P1xP2", sample_id)] <- 5
id[grepl("P1xP3", sample_id)] <- 6
id[grepl("P1xP4", sample_id)] <- 7
gds <- assignScheme(object = gds, id = id)
```


### Execute Genotype Estimation
The following codes suppose that you built the scheme object for the example data that is a biparental F2 population derived from a cross between inbred parents, not for the 8-way RILs explained above.
Now we can execute genotype estimation for error correction. `r Biocpkg("GBScleanR")` estimates error pattern via iterative optimization of parameters for genotype estimation. We could not guess the best number of iterations, but our simulation tests showed `iter = 4` usually saturates the improvement of estimation accuracy.
```{r message=FALSE, results=FALSE}
gds <- estGeno(gds, node = "filt", iter = 4)
```
\  
*As an important change in estGeno() since version 2, the function now requires the node argument to explicitly specify whether raw or filtered read counts should be used for genotype estimation. If setCallFilter() has not been executed, the user-specified node argument will be ignored, and the "raw" node will be used by default.*

\  

If your population derived from outbred parents, please set `het_parents = TRUE`.
```{r eval=FALSE}
# Do nut run
# This is an example to show the case to use "het_parents = TRUE".
gds <- estGeno(gds, het_parent = TRUE, iter = 4)
```
The larger number of iterations makes running time longer. If you would like to execute no optimization, set `optim = FALSE` or `iter = 1`.
\ 

```{r eval=FALSE}
# Following codes do the same.

# Do nut run
# These are examples to show the case to set "iter = 1" or "optim = FALSE".
gds <- estGeno(gds, iter = 1)
gds <- estGeno(gds, optim = FALSE)
```
\ 

### Get the Results
All of the results of estimation are stored in the gds file linked to the `GbsrGenotypeData` object. You can obtain the estimated genotype data via the `getGenotype()` function with `node = "cor"`.
```{r}
est_geno <- getGenotype(gds, node = "cor")
```
\ 

`r Biocpkg("GBScleanR")` also estimates phased parent genotypes and you can access it.
```{r}
parent_geno <- getGenotype(gds, node = "parents")
```




`r Biocpkg("GBScleanR")` first internally estimate phased haplotype and then convert them to genotype calls. If you need the estimated haplotype data, run `getHaplotype()`.
```{r}
est_hap <- getHaplotype(gds)
```
\ 

The function `gbsrGDS2VCF()` generate a VCF file containing the estimated genotype data and phased haplotype information. The estimated haplotypes are indicated in the FORMAT field with the HAP tag. The parent genotypes correspond to each haplotype are indicated in the INFO field with the PGT tag. HAP shows the pair of haplotype for each marker of each sample, while PGT shows the allele of each haplotype. HAP is indicated by two numbers separated by a pipe symbol "|". Each of the numbers takes one of the numbers from 1 to 2*N, where N is the number of parents.
PGT is indicated by 2*N numbers separated by a pipe symbol "|". The first number in PGT represents the allele of haplotype 1 at the marker, and the rest numbers also show the alleles of rest haplotypes 2, 3 and 4, if your population is of biparental diploid samples. As a default, `gbsrGDS2VCF()` outputs the estimated genotype data as entries of the CGT tag in the FORMAT field. With `node = "cor"`, you can output a VCF file in which GT field was replaced with the estimated genotype data obtained via `estGeno()`.

```{r}
out_fn <- tempfile("sample_est", fileext = ".vcf.gz")
gbsrGDS2VCF(gds, out_fn)
gbsrGDS2VCF(gds, out_fn, node = "cor")
```
The output vcf file contains data like the one shown below.
```{}
##fileformat=VCFv4.2
##fileDate=20240909
##source=SeqArray_Format_v1.0
##INFO=<ID=PGT,Number=1,Type=String,Description="Estimated allele of parental haplotype by GBScleanR">
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the reference and alternate alleles in the order listed">
##FORMAT=<ID=FAD,Number=2,Type=Integer,Description="Call-filtered read counts generated by GBScleanR">
##FORMAT=<ID=FGT,Number=2,Type=Float,Description="Estimated mismapping rate by GBScleanR">
##FORMAT=<ID=HAP,Number=1,Type=String,Description="Estimated haplotype data by GBScleanR">
#CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  F2_1054  F2_1055  F2_1059  F2_1170  F2_1075  F2_1070  F2_1074  F2_1007  F2_1009  F2_1489  F2_1010  F2_1014  F2_1490  F2_1022  F2_1267  F2_1382  F2_1038  Founder1  F2_1216  F2_1571  F2_1695  F2_1575  F2_1584  F2_1355  F2_1236  F2_1357  F2_1470  F2_1762  F2_1884  F2_1763  F2_1400  F2_1885  F2_1643  F2_1547  F2_1784  F2_1785  F2_1314  F2_1317  F2_1600  F2_1848  F2_1849  F2_1719  F2_1850  F2_1502  F2_1868  F2_1740  F2_1741  F2_1755  F2_1756  F2_1874  Founder2  F2_1811  F2_1812  F2_1700  F2_1824  F2_1827  F2_1710  F2_1714  F2_1166  F2_1282  F2_1178  F2_1192  F2_1195  F2_1361  F2_1122  F2_1486  F2_1132  F2_1378  F2_1257  F2_1258  F2_1262  F2_1384  F2_1152  F2_1274  F2_1150  F2_1688  F2_1681  F2_1682  F2_1684  F2_1686  F2_1565  F2_1339  F2_1333  F2_1222  F2_1592  F2_1472  F2_1476  F2_1404  F2_1527  F2_1641  F2_1666  F2_1541  F2_1677  F2_1671  F2_1602  F2_1618  F2_1635  F2_1637
1  522289  S01_522289  A  G  100  PASS  PGT=0|0|1|1;ADB=0.520833;MR=0,0  GT:AD:FAD:FGT:HAP:EDS  0/0:10,0:10,0:0/0:1|1:0  1/0:6,6:6,6:0/1:2|1:1  0/0:6,0:6,0:0/0:1|1:0  1/1:0,12:0,0:./.:2|2:2  1/0:12,9:0,0:./.:2|1:1  1/0:3,1:3,1:0/1:2|1:1  0/1:5,5:5,5:0/1:1|2:1  1/0:5,2:5,2:0/1:2|1:1  0/1:4,1:4,1:0/1:1|2:1  1/0:1,4:1,4:0/1:2|1:1  1/1:0,14:0,0:./.:2|2:2  1/0:3,6:3,6:0/1:2|1:1  1/0:6,5:6,5:0/1:2|1:1  1/1:9,6:0,0:./.:2|2:2  1/1:0,14:0,14:1/1:2|2:2  0/0:12,0:12,0:0/0:1|1:0  0/0:1,0:1,0:0/0:1|1:0  0/0:146,0:146,0:0/0:1|1:0  0/0:8,0:8,0:0/0:1|1:0  1/0:4,6:4,6:0/1:2|1:1  0/1:11,3:11,3:0/1:1|2:1  0/1:2,2:2,2:0/1:1|2:1  0/1:11,8:11,8:0/1:1|2:1  0/1:6,6:6,6:0/1:1|2:1  0/1:3,2:3,2:0/1:1|2:1  0/0:16,0:16,0:0/0:1|1:0  0/0:24,0:0,0:./.:1|1:0  1/0:6,10:6,10:0/1:2|1:1  0/0:19,0:19,0:0/0:1|1:0  0/0:7,0:7,0:0/0:1|1:0  0/0:8,0:8,0:0/0:1|1:0  1/1:0,9:0,9:1/1:2|2:2  ./.:3,6:0,0:./.:.|.:0  0/0:19,0:19,0:0/0:1|1:0  0/0:8,0:8,0:0/0:1|1:0  0/0:6,0:6,0:0/0:1|1:0  1/0:7,4:7,4:0/1:2|1:1  0/1:10,7:10,7:0/1:1|2:1  0/1:4,8:4,8:0/1:1|2:1  0/1:6,4:6,4:0/1:1|2:1  1/0:7,1:7,1:0/0:2|1:1  0/1:4,8:4,8:0/1:1|2:1  0/1:7,3:7,3:0/1:1|2:1  0/0:5,0:0,0:./.:1|1:0  0/1:4,18:4,18:0/1:1|2:1  0/0:8,0:8,0:0/0:1|1:0  0/1:4,6:0,0:./.:1|2:1  0/1:5,4:5,4:0/1:1|2:1  0/0:8,0:8,0:0/0:1|1:0  1/1:0,8:0,0:./.:2|2:2  1/1:0,285:0,285:1/1:2|2:2  0/1:3,2:3,2:0/1:1|2:1  1/0:6,3:6,3:0/1:2|1:1  1/0:4,5:4,5:0/1:2|1:1  1/0:10,4:0,0:./.:2|1:1  0/0:12,0:12,0:0/0:1|1:0  0/0:10,0:10,0:0/0:1|1:0  0/0:12,0:0,0:./.:1|1:0  0/0:13,0:0,0:./.:1|1:0  0/1:5,3:5,3:0/1:1|2:1  0/0:8,0:8,0:0/0:1|1:0  0/1:5,1:5,1:0/1:1|2:1  ./.:0,6:0,6:1/1:.|.:0  1/0:2,7:2,7:0/1:2|1:1  1/1:0,7:0,7:1/1:2|2:2  1/0:3,1:3,1:0/1:2|1:1  0/0:8,0:0,0:./.:1|1:0  1/0:4,2:4,2:0/1:2|1:1  0/1:2,1:2,1:0/1:1|2:1  1/1:0,10:0,10:1/1:2|2:2  1/0:5,2:5,2:0/1:2|1:1  0/0:3,0:3,0:0/0:1|1:0  0/0:10,0:10,0:0/0:1|1:0  1/1:0,4:0,4:1/1:2|2:2  0/0:4,0:4,0:0/0:1|1:0  0/0:9,0:9,0:0/0:1|1:0  1/1:0,11:0,11:1/1:2|2:2  0/1:3,2:3,2:0/1:1|2:1  0/1:5,2:5,2:0/1:1|2:1  0/0:4,0:4,0:0/0:1|1:0  1/0:3,3:3,3:0/1:2|1:1  1/1:0,8:0,0:./.:2|2:2  0/0:9,0:9,0:0/0:1|1:0  0/1:4,5:4,5:0/1:1|2:1  0/0:10,0:10,0:0/0:1|1:0  0/0:7,0:7,0:0/0:1|1:0  0/1:1,3:1,3:0/1:1|2:1  0/0:7,0:7,0:0/0:1|1:0  1/1:0,11:0,11:1/1:2|2:2  1/0:0,3:0,3:1/1:2|1:1  1/1:0,11:0,0:./.:2|2:2  0/1:6,5:6,5:0/1:1|2:1  1/0:1,4:1,4:0/1:2|1:1  1/0:4,2:4,2:0/1:2|1:1  1/1:0,4:0,4:1/1:2|2:2  0/0:9,0:9,0:0/0:1|1:0  1/0:4,4:4,4:0/1:2|1:1  1/0:6,6:0,0:./.:2|1:1 
```
In the output vcf, "PGT=0|0|1|1" in the INFO field indicates the parental 
samples alleles for haplotype 1, 2, 3, and 4 separated by |. The numbers 0 and 1 represent reference and alternative alleles that are A and G at the marker shown above, resectively.
FAD and FGT in the FORMAT field show the allele read counts and genotype call after filtering by `setCallFiler()`.
HAP and EDS indicate descendent haplotypes and haplotype dosage at the given marker. EDS will be provided only if you specified two parents in `setParents()` and specified `het_parent = FALSE` for `estGeno()`.



Alternatively, you can also output the genotype data into a CSV file using `gbsrGDS2CSV()`
```{r}
out_fn <- tempfile("sample_est", fileext = ".csv")
gbsrGDS2CSV(gds, out_fn)
gbsrGDS2CSV(gds, out_fn, node = "cor")
```
You can format the output for the `r CRANpkg("qtl")` package. 
```{r}
out_fn <- tempfile("sample_est", fileext = ".csv")
gbsrGDS2CSV(gds, out_fn, format = "qtl")
gbsrGDS2CSV(gds, out_fn, node = "cor", format = "qtl")
```
When you set `format = "qtl"`, the marker positions will be automatically converted from physical positions (bp) to genetic distances (cM). The conversion is performed by multiplying the physical positions by the value set to `bp2cm`. The default is `bp2cm = 4e-06` if `format = "qtl"`.

\ 

Please use `reopenGDS()` to open the connection again if you need.
```{r}
gds <- reopenGDS(gds)
```

# Plot Read Ratio and Dosage {#plot_dosage}
The ratio of reference and alternative allele reads can be visualized in dot plots using the `plotReadRatio()` function.
Every single call of `plotReadRatio()` generates dot plots for all chromosomes but only for a single sample. 
```{r}
plotReadRatio(x = gds, coord = c(3, 4), ind = 3, node = "raw", size = 2)
```
Since the sample data contains only one chromosome, `coord = c(3, 4)` does not change any coordinate of the plot. If, for example, you sample has 12 chromosomes, the dot plots for the chromosomes will be shown in 3 rows and 4 columns.
\  

If you need to draw plots for all samples in your dataset, please loop the function over your samples.
```{r eval = FALSE}
for(i in seq_len(nsam(gds))){
plotReadRatio(x = gds, coord = c(3, 4), ind = i, node = "raw")
}
```

If you set `node = "filt"`, you can visualize the read ratio obtained by applying the `setCallFilter()` function.
```{r}
plotReadRatio(x = gds, coord = c(3, 4), ind = 3, node = "filt", size = 2)
```
\  


The `plotDosage()` function enables you to visualize the estimated dosage overlaying on the read ratio plot.
Since this function visualize dosages estimated by the `estGeno()` function, you will get an error message if you call this function before running `estGeno()`.
```{r}
plotDosage(x = gds, coord = c(3, 4), ind = 3, node = "raw", size = 2)
```
The magenta line indicates the estimated alternative allele dosage par marker while colored dots shows alternative allele read ratio par marker. The `plotDosage()` function also draw plots for a single sample in a single call. Please loop the function over samples if needed.

You can also use the filtered read count data for the read ratio visualization in the dosage plot.
```{r}
plotDosage(x = gds, coord = c(3, 4), ind = 3, node = "filt", size = 2)
```

# Closing the Connection
To safely close the connection to the GDS file, use `closeGDS()`.
```{r}
closeGDS(gds)
```



# Session Info{-}
```{r}
sessionInfo()
```