---
title: "Using TENET"
output:
    BiocStyle::html_document:
        toc: true
        toc_depth: 2
package: TENET
author: Rhie Lab at the University of Southern California
date: "`r Sys.Date()`"
abstract: >
    TENET identifies key transcription factors (TFs) and regulatory elements
    (REs) linked to a specific cell type by finding significantly correlated
    differences in gene expression and RE DNA methylation between case and
    control input datasets, and identifying the top genes by number of
    significant RE DNA methylation site links. It also includes many tools for
    visualization and analysis of the results, including plots displaying
    and comparing methylation and expression data and methylation site
    link counts, survival analysis, TF motif searching in the vicinity of
    linked RE DNA methylation sites, custom TAD and peak overlap analysis, and
    UCSC Genome Browser track file generation. A utility function is also
    provided to download methylation, expression, and patient survival data
    from The Cancer Genome Atlas (TCGA) for use in TENET or other analyses.
vignette: >
    %\VignetteIndexEntry{Using TENET}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteDepends{MotifDb}
    \usepackage[utf8]{inputenc}
---

\RaggedRight

```{r echo = FALSE, message = FALSE}
## `crop = NULL` resolves "The magick package is required to crop ..." error;
## see: https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", crop = NULL)
options(tibble.print_min = 4, tibble.print_max = 4)
```

# Introduction

There is a lack of publicly available bioinformatic tools to identify
transcription factors (TFs) that regulate cell type-specific regulatory
elements (REs). To address this, we developed the Tracing regulatory Element
Networks using Epigenetic Traits (TENET) R package. TENET uses histone mark and
open chromatin datasets, along with matched DNA methylation and gene expression
data, to identify dysregulated REs and the TFs bound to them in a particular
cell or tissue type in comparison with another. To assist in identifying these
TFs and REs, we collected hundreds of epigenomic datasets from a variety of cell
lines, primary cells, and tissues, and developed methods to interrogate findings
using motif databases, clinical information, and other genomic datasets from 10
cancer types. Additionally, many downstream analysis functions are included to
aid in the analysis and visualization of the results generated by the TENET
workflow.

This vignette provides an overview of how to run the TENET workflow to identify
key TFs and linked RE DNA methylation sites. It describes how to install TENET,
the necessary input data, and the use of the TENET step 1-7 functions and the
`easyTENET` and `TCGADownloader` utility functions.

# Acquiring and installing TENET and associated packages

To use TENET, users must install the base package, as well as its associated
example experiment data package,
[TENET.ExperimentHub](https://github.com/rhielab/TENET.ExperimentHub).
**Note:** TENET.ExperimentHub will install automatically when TENET is
installed from Bioconductor.

TENET also uses annotation datasets hosted in the Bioconductor AnnotationHub
database. These datasets will be automatically loaded from AnnotationHub when
necessary. They are also available separately via the
[TENET.AnnotationHub](https://github.com/rhielab/TENET.AnnotationHub) package.
It is not necessary to install this package to use TENET.

R 4.5 or a newer version is required.

On Ubuntu 24.04, successful installation required several additional packages.
They can be installed by running the following command in a terminal:

`sudo apt-get install r-base-dev libcurl4-openssl-dev libfreetype-dev libfribidi-dev libfontconfig-dev libharfbuzz-dev libtiff-dev libxml2-dev libssl-dev`

No dependencies other than R are required on macOS or Windows.

Two versions of this package are available.

To install the stable version from Bioconductor, start R and run:

```{r eval = FALSE}
## Install BiocManager, which is required to install packages from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("TENET")
```

The development version containing the most recent updates is available from
our GitHub repository (<https://github.com/rhielab/TENET>).

To install the development version from GitHub, start R and run:

```{r eval = FALSE}
## Install prerequisite packages to install the development version from GitHub
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}

BiocManager::install(version = "devel")
BiocManager::install("rhielab/TENET.ExperimentHub")
BiocManager::install("rhielab/TENET")
```

# Loading TENET

To load the TENET package, start R and run:

```{r message = FALSE}
library(TENET)
```

To load the TENET.ExperimentHub package, start R and run:

```{r message = FALSE}
library(TENET.ExperimentHub)
```

To load the TENET.AnnotationHub package if it has been installed, start R and
run:

```{r message = FALSE}
library(TENET.AnnotationHub)
```

# Running TENET without internet access

Some TENET features and examples download datasets from the internet if they
have not already been cached. You must run `TENETCacheAllData()` once while
connected to the internet before using these TENET features or examples without
internet access (for example, on HPC cluster nodes). See the documentation for
`TENETCacheAllData` for more information.

# Input data

TENET primarily makes use of a MultiAssayExperiment object containing the
following data:

## Expression SummarizedExperiment object

A SummarizedExperiment object named "expression" containing gene expression
data for genes in the human genome. Although gene expression values may be
given in a variety of forms, TENET has been primarily tested using
log2-transformed, upper-quartile normalized fragments per kilobase of
transcript per million mapped reads (FPKM-UQ) values. Gene expression values
for each gene in each sample must be included in the assay object within the
"expression" SummarizedExperiment object. Its rownames must contain gene IDs,
and its column names must contain sample IDs. Sample IDs must match those in the
"methylation" SummarizedExperiment object described below. The object does not
need to include any colData values.

```{r message = FALSE}
## Load the MultiAssayExperiment package. This is not required, but allows the
## user to avoid typing MultiAssayExperiment:: before its functions.
library(MultiAssayExperiment)
```

```{r}
## Load the example MultiAssayExperiment dataset from the TENET.ExperimentHub
## package
exampleTENETMultiAssayExperiment <-
    TENET.ExperimentHub::exampleTENETMultiAssayExperiment()

## Examine the SummarizedExperiments that must be contained in a
## MultiAssayExperiment object appropriate for use in TENET analyses
experiments(exampleTENETMultiAssayExperiment)

## Examine the "expression" object
class(
    experiments(
        exampleTENETMultiAssayExperiment
    )[["expression"]]
)

## Examine data for the first 6 genes and samples in the assay object of the
## "expression" SummarizedExperiment object
assays(
    experiments(
        exampleTENETMultiAssayExperiment
    )[["expression"]]
)[[1]][
    seq_len(6), seq_len(6)
]
```
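
As a rough illustration of the expression values described above, the following
sketch log2-transforms a small, made-up FPKM-UQ matrix. The gene and sample IDs
are hypothetical, and the pseudocount of 1 added before the transformation is an
assumption for this example rather than a TENET requirement.

```{r eval = FALSE}
## Toy FPKM-UQ matrix: genes in rows, samples in columns (values are made up)
fpkmUQ <- matrix(
    c(0, 15, 1023, 3, 7, 255),
    nrow = 3,
    dimnames = list(
        c("geneA", "geneB", "geneC"), ## Hypothetical gene IDs
        c("sample1", "sample2") ## Hypothetical sample IDs
    )
)

## log2-transform the values. The pseudocount of 1 (an assumption made for
## this sketch) avoids taking the logarithm of zero.
log2Expression <- log2(fpkmUQ + 1)

log2Expression
```

The row and column names of the matrix are preserved by the transformation, so
the result keeps gene IDs as rownames and sample IDs as column names.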

Genomic coordinates for genes can be provided as a GRanges object via the
rowRanges of the "expression" SummarizedExperiment object, or provided
separately via the `geneAnnotationDataset` argument to TENET functions as a
GRanges object (such as one imported by `rtracklayer::import`) or a path to a
GFF3 or GTF file. The provided dataset must contain information on both genes
and transcripts, including gene and transcript IDs (see below), which must match
the IDs in the rownames of the expression assay object. Coordinates for all
genes included in the "expression" SummarizedExperiment object must be present.
It must also include the chromosome, 1-indexed coordinates, and strand of each
gene, and a metadata column named "gene_name" which maps gene IDs to gene names
(used for data summary and plots). Additional columns may be included, but are
not used by TENET. As an exception, rowRanges must be used if the
`easyTENET` function is used to run the step 1 through step 6 functions all at
once.

**Note:** TENET has only been tested with GENCODE and Ensembl gene
annotation datasets. If another dataset is being used, it must use the values
"gene" and "transcript" for feature types, which must be stored in a column
named "type". In GFF3 files, feature types may alternatively be stored in the
first colon-separated field of the "ID" column, the second field of which must
be the ID itself. Types stored there will only be used if the "type" column does
not contain the required types. Gene names must be stored in a column named
"geneName" or "Name". GTF files must contain a "geneId" column, and GFF3 files
must contain an "ID" column. A GRanges object will be assumed to be derived from
a GFF3 file if it contains an "ID" column, and from a GTF file otherwise.
Ensembl GTF files older than release 75 are not supported.

```{r}
## Examine the rowRanges of the "expression" SummarizedExperiment object for the
## first genes.
## Note: Additional columns are included here, but only the chromosome
## (seqnames), coordinates (ranges), strand, and geneName columns are used.
head(
    rowRanges(
        experiments(
            exampleTENETMultiAssayExperiment
        )[["expression"]]
    )
)

## The names of this object are the gene IDs
head(
    names(
        rowRanges(
            experiments(
                exampleTENETMultiAssayExperiment
            )[["expression"]]
        )
    )
)
```

## Methylation SummarizedExperiment object

A SummarizedExperiment object named "methylation" containing DNA methylation
data for methylation sites. Methylation values should be given in the form of
beta ($\beta$) values ranging from 0 (low methylation) to 1 (high methylation).
Methylation values for each RE DNA methylation site in each sample must be
included in the assay object within the "methylation" SummarizedExperiment
object. Its rownames must contain methylation site IDs (often the IDs of the
corresponding probes in a methylation array) and its column names must contain
sample IDs. Sample IDs must match those in the "expression" SummarizedExperiment
object described above. The object does not need to include any colData values.

```{r}
## Examine the "methylation" SummarizedExperiment object
class(
    experiments(
        exampleTENETMultiAssayExperiment
    )[["methylation"]]
)

## Examine data for the first six RE DNA methylation sites and samples in the
## assay object of the "methylation" SummarizedExperiment object
assays(
    experiments(
        exampleTENETMultiAssayExperiment
    )[["methylation"]]
)[[1]][
    seq_len(6), seq_len(6)
]
```
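
Before building a "methylation" object from your own data, it can help to
confirm that the values really are beta values. The sketch below runs that
check on a small, made-up matrix using base R only. The site and sample IDs are
hypothetical, and how TENET itself treats missing values is not covered here.

```{r eval = FALSE}
## Toy beta-value matrix: sites in rows, samples in columns (values are made up)
betaValues <- matrix(
    c(0.02, 0.85, NA, 0.10, 0.91, 0.47),
    nrow = 3,
    dimnames = list(
        c("site1", "site2", "site3"), ## Hypothetical methylation site IDs
        c("sample1", "sample2") ## Hypothetical sample IDs
    )
)

## All non-missing values should lie between 0 and 1
betaInRange <- all(betaValues >= 0 & betaValues <= 1, na.rm = TRUE)
betaInRange

## Count the missing values per site so they can be reviewed before analysis
missingPerSite <- rowSums(is.na(betaValues))
missingPerSite
```

Values outside the 0 to 1 range would suggest that the matrix holds something
other than beta values, such as M-values.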

Genomic coordinates for RE DNA methylation sites can be provided as a GRanges
object via the rowRanges of the "methylation" SummarizedExperiment object, or
retrieved for probes in one of the Illumina methylation arrays supported by the
sesameData package (see `?sesameData::sesameData_getManifestGRanges`) via the
`DNAMethylationArray` argument to TENET functions. As before, rowRanges must be
used with the `easyTENET` function.

If coordinates are provided via a rowRanges object, it must include methylation
site IDs as names, which must match the IDs in the rownames of the methylation
assay object. It must also include the chromosome and 1-indexed coordinates of
all RE DNA methylation sites included in the "methylation" SummarizedExperiment
object. Unlike in the "expression" SummarizedExperiment object, strand
information and a name column are not used. Additional columns may be included,
but are not used by TENET.

```{r}
## Examine the rowRanges of the "methylation" SummarizedExperiment object for
## the first six RE DNA methylation sites.
## Note: Additional columns are included here, but only the chromosome
## (seqnames) and coordinates (ranges) are used.
head(
    rowRanges(
        experiments(
            exampleTENETMultiAssayExperiment
        )[["methylation"]]
    )[, seq_len(6)]
)

## The names of this object are the RE DNA methylation site IDs (usually probe
## IDs)
head(
    names(
        rowRanges(
            experiments(
                exampleTENETMultiAssayExperiment
            )[["methylation"]]
        )
    )
)
```
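
The following minimal sketch shows one way a SummarizedExperiment with
rowRanges of the kind described above could be assembled from a beta-value
matrix, using the GenomicRanges and SummarizedExperiment packages. All IDs and
coordinates are made up, and the assay name "methylation" is an assumption made
for this example.

```{r eval = FALSE}
library(GenomicRanges)
library(SummarizedExperiment)

## Toy beta values: sites in rows, samples in columns (values are made up)
toyBetaValues <- matrix(
    c(0.02, 0.85, 0.10, 0.91),
    nrow = 2,
    dimnames = list(c("site1", "site2"), c("sample1", "sample2"))
)

## Single-base, 1-indexed coordinates for each site (hypothetical positions)
toySiteRanges <- GRanges(
    seqnames = c("chr1", "chr2"),
    ranges = IRanges(start = c(10500, 30000), width = 1)
)

## The names of the GRanges must match the rownames of the assay matrix
names(toySiteRanges) <- rownames(toyBetaValues)

toyMethylationSE <- SummarizedExperiment(
    assays = list(methylation = toyBetaValues),
    rowRanges = toySiteRanges
)

## Confirm that the site IDs agree between the rowRanges and the assay
identical(
    names(rowRanges(toyMethylationSE)),
    rownames(assay(toyMethylationSE))
)
```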

## MultiAssayExperiment colData data frame (optional)

The colData data frame of the MultiAssayExperiment stores optional data used by
some downstream step 7 TENET functions; most TENET functions do not require it.
If present, it must contain information for each of the
patients from which samples in the "expression" and "methylation"
SummarizedExperiment objects have been derived. The rownames of the colData must
contain patient IDs, which will be matched to samples using the
MultiAssayExperiment object's sampleMap, discussed below. The data may
include "vital_status" and "time" columns with information on each patient's
survival status and survival time used by the `step7TopGenesSurvival` function,
as well as a "purity" column and columns with gene copy number ("...\_CNV") and
somatic mutation ("...\_SM") status used by the
`step7ExpressionVsDNAMethylationScatterplots` function. See documentation
for these functions for more information on how these data must be formatted.
Additional columns may be included, but are not used by TENET.

```{r}
## Examine the number of rows in the colData of the MultiAssayExperiment object
## compared to the number of samples (columns) in the "expression" and
## "methylation" SummarizedExperiment objects. The number of patient entries
## does not need to match the number of samples included in the "expression" or
## "methylation" objects, since multiple samples may be derived from the same
## patient (though ideally, they should have at most one control and case sample
## each).
nrow(colData(exampleTENETMultiAssayExperiment))

experiments(exampleTENETMultiAssayExperiment)

## Examine some of the rownames, which must contain a unique identifier for
## each patient. These will be used in the MultiAssayExperiment's sampleMap
## data frame to match them to the samples included in the "expression" and
## "methylation" objects.
rownames(colData(exampleTENETMultiAssayExperiment))[seq_len(6)]
```

## MultiAssayExperiment sampleMap data frame

The final required component of the MultiAssayExperiment object is the sampleMap
data frame. It is used to match the samples in the "expression" and
"methylation" objects to each other and to the optional data contained in the
MultiAssayExperiment colData, and to categorize samples as case or control. It
must have the following format:

```{r}
## It must contain a row for each sample in the "expression" and "methylation"
## objects
nrow(sampleMap(exampleTENETMultiAssayExperiment))

experiments(exampleTENETMultiAssayExperiment)

## 4 columns must be included: "assay", "colname", "primary", and "sampleType"
colnames(sampleMap(exampleTENETMultiAssayExperiment))

## The "assay" column must be a factor describing the data type of each sample
## ("expression" or "methylation")
sampleMap(exampleTENETMultiAssayExperiment)$assay[seq_len(6)]

levels(sampleMap(exampleTENETMultiAssayExperiment)$assay)

table(sampleMap(exampleTENETMultiAssayExperiment)$assay)

## The "primary" column must contain the patient IDs to which each sample maps.
## They must match those in the MultiAssayExperiment's colData object, if
## present.
sampleMap(exampleTENETMultiAssayExperiment)$primary[seq_len(6)]

table(
    sampleMap(exampleTENETMultiAssayExperiment)$primary %in%
        rownames(colData(exampleTENETMultiAssayExperiment))
)

## The "colname" column must contain the names of each of the samples as listed
## in the colnames of the "expression" or "methylation" SummarizedExperiment's
## assay objects
sampleMap(exampleTENETMultiAssayExperiment)$colname[seq_len(6)]

table(
    sampleMap(exampleTENETMultiAssayExperiment)$colname %in% c(
        colnames(
            assays(
                experiments(
                    exampleTENETMultiAssayExperiment
                )[["expression"]]
            )[[1]]
        ),
        colnames(
            assays(
                experiments(
                    exampleTENETMultiAssayExperiment
                )[["methylation"]]
            )[[1]]
        )
    )
)

## The "sampleType" column must contain "Case" or "Control"
sampleMap(exampleTENETMultiAssayExperiment)$sampleType[seq_len(6)]

table(sampleMap(exampleTENETMultiAssayExperiment)$sampleType)
```
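
The requirements listed above can be checked programmatically before running
TENET on your own data. The helper below, `checkSampleMap`, is a hypothetical
function written for this sketch (it is not part of TENET) and covers only the
column names and allowed values described in this section.

```{r eval = FALSE}
## Hypothetical helper: returns a character vector describing any problems
## found, or an empty vector if the basic requirements are met
checkSampleMap <- function(sampleMapDF) {
    requiredColumns <- c("assay", "colname", "primary", "sampleType")
    missingColumns <- setdiff(requiredColumns, colnames(sampleMapDF))
    if (length(missingColumns) > 0) {
        return(
            paste("Missing columns:", paste(missingColumns, collapse = ", "))
        )
    }

    problems <- character()
    assayValues <- as.character(sampleMapDF$assay)
    if (!is.factor(sampleMapDF$assay)) {
        problems <- c(problems, "The assay column is not a factor")
    }
    if (!all(assayValues %in% c("expression", "methylation"))) {
        problems <- c(problems, "Unexpected values in the assay column")
    }
    if (!all(sampleMapDF$sampleType %in% c("Case", "Control"))) {
        problems <- c(problems, "Unexpected values in the sampleType column")
    }
    problems
}

## Toy sampleMap with made-up patient and sample IDs
toySampleMap <- data.frame(
    assay = factor(
        c("expression", "methylation", "expression", "methylation")
    ),
    primary = c("patient1", "patient1", "patient2", "patient2"),
    colname = c("p1_expr", "p1_meth", "p2_expr", "p2_meth"),
    sampleType = c("Case", "Case", "Control", "Control")
)

## An empty result means no problems were found
checkSampleMap(toySampleMap)
```

The same helper could be applied to a real object by passing it
`as.data.frame(sampleMap(exampleTENETMultiAssayExperiment))`.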

# Overview of main TENET functions

* `easyTENET`: Run the step 1 through step 6 functions with default arguments
* `step1MakeExternalDatasets`: Create a GRanges object representing putative regulatory element regions, based on the data sources selected for inclusion, to be used in later TENET steps
* `step2GetDifferentiallyMethylatedSites`: Identify differentially methylated RE DNA methylation sites
* `step3GetAnalysisZScores`: Calculate Z-scores comparing the mean expression of each gene in the case samples that are hyper- and/or hypomethylated for each RE DNA methylation site identified in step 2
* `step4SelectMostSignificantLinksPerDNAMethylationSite`: Select the most significant RE DNA methylation site-gene links to each RE DNA methylation site
* `step5OptimizeLinks`: Find final RE DNA methylation site-gene links using various optimization metrics
* `step6DNAMethylationSitesPerGeneTabulation`: Tabulate the total number of RE DNA methylation sites linked to each gene
* `TCGADownloader`: Download TCGA gene expression, DNA methylation, and clinical datasets and compile them into a MultiAssayExperiment object
* `TENETCacheAllData`: Cache all online datasets required by TENET examples and optional features

## `easyTENET`: Run the step 1 through step 6 functions with default arguments

This function runs the main six TENET functions, `step1MakeExternalDatasets`,
`step2GetDifferentiallyMethylatedSites`, `step3GetAnalysisZScores`,
`step4SelectMostSignificantLinksPerDNAMethylationSite`, `step5OptimizeLinks`,
and `step6DNAMethylationSitesPerGeneTabulation`, in sequence on the specified
TENETMultiAssayExperiment object. The operation and arguments of this
function reflect those of its component functions. For additional details on
them, refer to their respective sections in this vignette.

`easyTENET` includes only the most important arguments to its component
functions, thus maintaining core TENET functionality while simplifying its use
and allowing users to run all six key TENET steps with one function call. The
majority of its arguments are those from `step1MakeExternalDatasets`, which
select the types of regulatory elements the user wishes to investigate, as well
as those from `step2GetDifferentiallyMethylatedSites`. If an argument to one of
the component functions is not specified, it will be set to its default value
for that function.

As stated earlier, using `easyTENET` requires the user to include rowRanges
objects in the input MultiAssayExperiment's "expression" and "methylation"
SummarizedExperiments containing the locations of transcription start sites
of genes and DNA methylation sites, respectively.

**Note:** Since this function runs the `step3GetAnalysisZScores` function,
it may take a while to run. It is highly recommended to use multiple cores if
possible, especially when using large datasets.

```{r eval = FALSE}
## This example first creates a dataset of putative enhancer regulatory elements
## from consensus datasets and breast invasive carcinoma-relevant sources
## collected in the TENET.AnnotationHub package, then runs the step 2 through
## step 6 functions analyzing RE DNA methylation sites in potential
## enhancer elements located over 1500 bp from transcription start sites
## listed for genes and transcripts in the GENCODE v36 human genome
## annotations (already contained in the exampleTENETMultiAssayExperiment
## object loaded earlier), using a minimum case sample count of 5 and one
## CPU core to perform the analysis.
exampleObject <- easyTENET(
    TENETMultiAssayExperiment = exampleTENETMultiAssayExperiment,
    extHM = NA,
    extNDR = NA,
    publicEnhancer = TRUE,
    publicNDR = TRUE,
    cancerType = "BRCA",
    ENCODEdELS = TRUE,
    minCaseCount = 5
)

## The exampleObject will have data from the step 2 through step 6 functions,
## as the dataset created by the step1MakeExternalDatasets function is used by
## and saved in the output of the step2GetDifferentiallyMethylatedSites
## function.

## See the types of data that were saved by the step 2 function
names(metadata(exampleObject)$step2GetDifferentiallyMethylatedSites)

## See the GRanges object created by the step 1 function
metadata(
    exampleObject
)$step2GetDifferentiallyMethylatedSites$regulatoryElementGRanges

## See the types of data that were saved by the step 3 function
names(metadata(exampleObject)$step3GetAnalysisZScores)

## See the types of data that were saved by the step 4 function
names(
    metadata(exampleObject)$step4SelectMostSignificantLinksPerDNAMethylationSite
)

## See the types of data that were saved by the step 5 function
names(metadata(exampleObject)$step5OptimizeLinks)

## See the types of data that were saved by the step 6 function
names(
    metadata(
        exampleObject
    )$step6DNAMethylationSitesPerGeneTabulation
)
```

## `step1MakeExternalDatasets`: Create a GRanges object representing putative regulatory element regions, based on the data sources selected for inclusion, to be used in later TENET steps

This function creates a GRanges object containing regions representing
putative regulatory elements, either enhancers or promoters, of
interest to the user, based on the presence of specific histone marks
and open chromatin/nucleosome-depleted regions. This function can take input
from user-specified BED-like files (see
<https://genome.ucsc.edu/FAQ/FAQformat.html#format1>) containing regions with
histone modification (via the `extHM` argument) and/or open
chromatin/nucleosome-depleted regions (via the `extNDR` argument), as well
as preprocessed enhancer, promoter, and open chromatin datasets from many
cell/tissue types included in the TENET.AnnotationHub repository. The
resulting GRanges object will be returned. GRanges objects created by this
function can be used by the `step2GetDifferentiallyMethylatedSites` function
or other downstream functions. **Note:** Using datasets from
TENET.AnnotationHub requires an internet connection, as those datasets are
hosted in the Bioconductor AnnotationHub Data Lake.

Regulatory element regions of interest identified by this function represent
those with overlapping histone mark peaks as well as open chromatin regions,
combined with any regions identified in the selected ENCODE SCREEN datasets (as
these regions already represent the overlap of regions with relevant histone
marks as well as with open chromatin).
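
The combination described above can be pictured with a small GenomicRanges
sketch. This is not TENET's internal implementation: the coordinates are made
up, and using `intersect` followed by `reduce` is an assumption about one
reasonable way to express "regions with both a histone mark peak and open
chromatin, plus ENCODE SCREEN regions".

```{r eval = FALSE}
library(GenomicRanges)

## Toy input regions (hypothetical coordinates)
histoneMarkPeaks <- GRanges(
    "chr1", IRanges(start = c(100, 1000), end = c(500, 1500))
)
openChromatin <- GRanges(
    "chr1", IRanges(start = c(400, 2000), end = c(600, 2100))
)
screenRegions <- GRanges(
    "chr1", IRanges(start = 3000, end = 3200)
)

## Keep only the portions covered by both a histone mark peak and an open
## chromatin region
overlapRegions <- intersect(histoneMarkPeaks, openChromatin)

## Add the ENCODE SCREEN-style regions and merge any overlapping ranges
candidateREs <- reduce(c(overlapRegions, screenRegions))

candidateREs
```

In this toy example, only the 400-500 interval is shared by a histone mark peak
and an open chromatin region, so the result contains that interval plus the
3000-3200 region.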

```{r}
## Create an example GRanges object representing putative enhancer regions for
## BRCA using all available enhancer-relevant BRCA datasets present in the
## TENET.AnnotationHub package. These datasets include consensus enhancer
## histone mark and open chromatin datasets from a wide variety of tissue and
## cell types, enhancer histone mark and open chromatin datasets from a
## variety of BRCA-relevant samples from the Cistrome database and TCGA, as well
## as identified distal enhancer regions from the ENCODE SCREEN project.
step1Output <- step1MakeExternalDatasets(
    consensusEnhancer = TRUE,
    consensusNDR = TRUE,
    publicEnhancer = TRUE,
    publicNDR = TRUE,
    cancerType = "BRCA",
    ENCODEdELS = TRUE
)

## View the first regions in the created GRanges object
head(step1Output)
```

## `step2GetDifferentiallyMethylatedSites`: Identify differentially methylated RE DNA methylation sites

This function identifies DNA methylation sites that mark putative regulatory
elements (REs), including enhancer and promoter regions. These are sites
that lie within regions from a user-supplied GRanges object, such as one
created by the `step1MakeExternalDatasets` function, and which are located at
a user-specified distance relative to the transcription start sites (TSS)
listed in either the rowRanges of the elementMetadata of the "expression"
SummarizedExperiment in the TENETMultiAssayExperiment object, or the
selected `geneAnnotationDataset` (which will be filtered to only genes and
transcripts). After identifying DNA methylation sites representing the
specified REs, the function classifies the RE DNA methylation sites as
methylated, unmethylated, hypermethylated, or hypomethylated based on their
differential methylation between the control and case samples supplied by
the user, defined by cutoff values which are either set automatically based
on the mean methylation densities of the identified RE DNA methylation
sites, or manually set by the user. **Note:** Using the algorithm to set
cutoffs is recommended for use with DNA methylation array data, and may not
work for whole-genome DNA methylation data.
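
To make the site selection described above more concrete, the sketch below
picks out toy methylation sites that lie inside RE regions and are more than
1500 bp from the nearest TSS. It uses standard GenomicRanges functions
(`resize`, `overlapsAny`, and `distanceToNearest`) and made-up coordinates. It
is an illustration only and does not reproduce TENET's own implementation.

```{r eval = FALSE}
library(GenomicRanges)

## Toy gene annotation with one gene on each strand (hypothetical coordinates)
toyGenes <- GRanges(
    seqnames = c("chr1", "chr1"),
    ranges = IRanges(start = c(10000, 50000), end = c(20000, 60000)),
    strand = c("+", "-")
)

## The TSS is the start of a "+" strand gene and the end of a "-" strand gene;
## resize() with fix = "start" is strand-aware and handles both cases
toyTSS <- resize(toyGenes, width = 1, fix = "start")

## Toy methylation sites and RE regions
toySites <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(10500, 30000, 59000, 80000), width = 1)
)
names(toySites) <- c("site1", "site2", "site3", "site4")
toyRERegions <- GRanges(
    "chr1",
    IRanges(start = c(10000, 29000, 79000), end = c(11000, 31000, 81000))
)

## Which sites fall within an RE region?
inRERegion <- overlapsAny(toySites, toyRERegions)

## Distance from each site to its nearest TSS
nearestTSS <- distanceToNearest(toySites, toyTSS, ignore.strand = TRUE)
distanceToTSS <- rep(NA_integer_, length(toySites))
distanceToTSS[queryHits(nearestTSS)] <- mcols(nearestTSS)$distance

## Keep sites in RE regions that are more than 1500 bp from any TSS
selectedSites <- names(toySites)[inRERegion & distanceToTSS > 1500]
selectedSites
```

Here site1 and site3 are excluded because they lie within 1500 bp of a TSS
(site3 also falls outside every RE region), leaving site2 and site4.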

To run step 2, the user will need to provide a MultiAssayExperiment object
constructed in the manner described previously, as well as a GRanges object,
such as one created by the `step1MakeExternalDatasets` function. Additionally,
the minimum number of case samples that must exhibit a difference in
methylation for a given RE DNA methylation site to be considered hyper- or
hypomethylated will need to be set by the user.
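
The role of the minimum case sample count can be illustrated with a few lines
of base R. The sketch below covers only that one criterion, using made-up beta
values and an arbitrary hypomethylation cutoff; TENET's actual classification
also takes the control samples and its other cutoffs into account.

```{r eval = FALSE}
## Toy beta values for case samples only: sites in rows, samples in columns
caseBetaValues <- matrix(
    c(
        0.10, 0.15, 0.12, 0.80, 0.20, 0.11, ## site1
        0.05, 0.08, 0.07, 0.06, 0.09, 0.10, ## site2
        0.90, 0.85, 0.20, 0.88, 0.15, 0.18 ## site3
    ),
    nrow = 3,
    byrow = TRUE,
    dimnames = list(paste0("site", 1:3), paste0("case", 1:6))
)

## Arbitrary cutoff chosen for this sketch, not a TENET default
toyHypomethCutoff <- 0.3
toyMinCaseCount <- 5

## Count the case samples below the cutoff for each site
casesBelowCutoff <- rowSums(caseBetaValues < toyHypomethCutoff)
casesBelowCutoff

## Sites with enough case samples below the cutoff pass this criterion
meetsMinCaseCount <- casesBelowCutoff >= toyMinCaseCount
meetsMinCaseCount
```

In this toy matrix, site1 and site2 have at least five case samples below the
cutoff, while site3 has only three and would not pass.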

The output of the `step2GetDifferentiallyMethylatedSites` function, as well as
later TENET functions, is saved in the metadata of the returned
MultiAssayExperiment object.

```{r}
## Identify differentially methylated RE DNA methylation sites using the
## step2GetDifferentiallyMethylatedSites function, using the
## exampleTENETMultiAssayExperiment object loaded previously from the
## TENET.ExperimentHub package as a reference, and the GRanges object that was
## just created using the step1MakeExternalDatasets function. At least 5 case
## samples in the dataset are required to show methylation levels above/below
## the hyper/hypomethylation cutoff for a given RE DNA methylation site to be
## potentially considered differentially methylated.
## All transcription start sites (TSS) included in the rowRanges of the
## elementMetadata of the TENETMultiAssayExperiment object will be considered
## when selecting enhancer DNA methylation sites (which must be at least 1500
## bp from these TSS).
exampleObject <- step2GetDifferentiallyMethylatedSites(
    TENETMultiAssayExperiment = exampleTENETMultiAssayExperiment,
    regulatoryElementGRanges = step1Output,
    minCaseCount = 5
)

## See the data that were saved by the step 2 function
names(metadata(exampleObject)$step2GetDifferentiallyMethylatedSites)

## Since cutoffs were set automatically by the step 2 function in this case,
## we can see what they are set to, using the hypomethylation cutoff as an
## example.
metadata(
    exampleObject
)$step2GetDifferentiallyMethylatedSites$hypomethCutoff

## The identities of all identified RE DNA methylation sites, as well as the
## methylated, unmethylated, and most importantly, hyper- and hypomethylated
## RE DNA methylation sites are also recorded in the siteIdentitiesList. To
## demonstrate this, view the first hypomethylated RE DNA methylation sites
## that were identified.
head(
    metadata(
        exampleObject
    )$step2GetDifferentiallyMethylatedSites$siteIdentitiesList$
        hypomethylatedSites
)
```

## `step3GetAnalysisZScores`: Calculate Z-scores comparing the mean expression of each gene in the case samples that are hyper- and/or hypomethylated for each RE DNA methylation site identified in step 2

This function calculates Z-scores comparing the mean expression of each gene in
the case samples that are hyper- and/or hypomethylated for each RE DNA
methylation site identified in step 2 to the mean expression of the remaining
non-hyper- or hypomethylated case samples. By identifying significant Z-scores,
initial RE DNA methylation site-gene links are identified, in the form of case
samples with hyper- or hypomethylation of a particular RE DNA methylation site
also displaying particularly high or low expression of specific genes.

TENET supports the use of two different formulas for calculating Z-scores in
this step. By setting the zScoreCalculation argument to "oneSample", a
one-sample Z-score calculation will be used (similar to previous versions of
the TENET package), while a two-sample Z-score calculation will be used if the
zScoreCalculation argument is set to "twoSample".
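
For readers unfamiliar with the distinction, the sketch below contrasts generic
textbook forms of a one-sample and a two-sample Z-score for a single gene and
a single RE DNA methylation site. These formulas are not taken from the TENET
source code, so TENET's exact calculations may differ; the expression values
are made up.

```{r eval = FALSE}
## Expression of one gene in case samples (values are made up)
exprAlteredCases <- c(5, 6, 7) ## Cases hyper- or hypomethylated at the site
exprOtherCases <- c(1, 2, 3, 4) ## Remaining case samples

meanDifference <- mean(exprAlteredCases) - mean(exprOtherCases)

## Textbook one-sample form: the remaining cases act as the reference
## distribution against which the mean of the altered cases is compared
oneSampleZ <- meanDifference /
    (sd(exprOtherCases) / sqrt(length(exprAlteredCases)))

## Textbook two-sample form: the variances of both groups contribute to the
## standard error
twoSampleZ <- meanDifference /
    sqrt(
        var(exprAlteredCases) / length(exprAlteredCases) +
            var(exprOtherCases) / length(exprOtherCases)
    )

## Two-sided p-values from the standard normal distribution
2 * pnorm(-abs(c(oneSample = oneSampleZ, twoSample = twoSampleZ)))
```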

The sparseResults argument has been included in order to reduce the size
of the MultiAssayExperiment object with TENET results. By setting this to TRUE,
only links with significant Z-scores (as determined by the value of the pValue
argument) are saved in the MultiAssayExperiment object returned by this
function. However, setting this to TRUE affects the function of the subsequent
`step4SelectMostSignificantLinksPerDNAMethylationSite` function if the user
wishes to perform multiple testing correction to select the most significant
links per RE DNA methylation site. Therefore, if you want to use multiple
testing correction instead of just selecting the top n most significant links
per RE DNA methylation site in the
`step4SelectMostSignificantLinksPerDNAMethylationSite` function, the
sparseResults argument should be set to FALSE so that the multiple testing
correction is applied to all results, not just the significant ones.
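
The effect of discarding non-significant results before multiple testing
correction can be seen with base R's `p.adjust`. The p-values below are made
up, and the Benjamini-Hochberg method is used purely as an example; this sketch
makes no claim about which correction method TENET applies.

```{r eval = FALSE}
## Made-up p-values for all genes tested against one RE DNA methylation site
allPValues <- c(0.001, 0.01, 0.03, 0.04, 0.2, 0.5, 0.8, 0.9)

## Correction applied to every test, as when all results are kept
adjustedFromAll <- p.adjust(allPValues, method = "BH")
sum(adjustedFromAll < 0.05)

## Correction applied only to the results that were already significant, as
## when non-significant results have been discarded beforehand
sparsePValues <- allPValues[allPValues < 0.05]
adjustedFromSparse <- p.adjust(sparsePValues, method = "BH")
sum(adjustedFromSparse < 0.05)
```

With all eight tests included, two links remain significant after correction.
When only the four pre-filtered p-values are corrected, all four remain
significant, because the correction accounts for fewer tests.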

**Note:** This function takes the longest of all TENET functions to run. It is
highly recommended to use multiple cores if possible, especially when using
large datasets.

```{r}
## Identify significant Z-scores and initial RE DNA methylation site-gene links
## using the exampleTENETMultiAssayExperiment with results from the
## step2GetDifferentiallyMethylatedSites function. For this analysis, we will
## use the one-sample Z-score function, consider only TFs, rather than all
## genes, and save only significant Z-scores, to cut down on computational time
## and reduce the size of the returned MultiAssayExperiment object. Two CPU
## cores will be used if they exist (except on Windows, where the
## parallel::mclapply function is not supported).
exampleObject <- step3GetAnalysisZScores(
    TENETMultiAssayExperiment = exampleObject,
    pValue = 0.05,
    TFOnly = TRUE,
    zScoreCalculation = "oneSample",
    hypermethAnalysis = TRUE,
    hypomethAnalysis = TRUE,
    includeControl = FALSE,
    sparseResults = TRUE,
    coreCount = min(
        parallel::detectCores(), ifelse(.Platform$OS.type != "windows", 2, 1)
    )
)

## See the data that were saved by the step 3 function. They are subdivided into
## hypermeth and/or hypometh results based on function options.
names(
    metadata(
        exampleObject
    )$step3GetAnalysisZScores
)

## Since the sparseResults argument was set to TRUE, only
## significant Z-scores are saved, and since the TFOnly argument was also set
## to TRUE, only TF genes were analyzed.
## View the significant Z scores for the first TF genes with links to
## hypomethylated RE DNA methylation sites.
head(
    metadata(
        exampleObject
    )$step3GetAnalysisZScores$hypomethResults
)
```

## `step4SelectMostSignificantLinksPerDNAMethylationSite`: Select the most significant RE DNA methylation site-gene links to each RE DNA methylation site

This function takes the calculated Z-scores for the hyper- and/or
hypomethylated G+ RE DNA methylation site-gene links and selects the most
significant links to each RE DNA methylation site, either up to a number
specified by the user, or based on a significant p-value level set by the
user after multiple testing correction is performed on the Z-scores output
by the `step3GetAnalysisZScores` function per RE DNA methylation site in the
RE DNA methylation site-gene pairs. This helps prioritize individual RE DNA
methylation site-gene links where there are many genes linked to a single
site.

As described previously, if you wish to use multiple testing correction, the
`sparseResults` argument in the previous `step3GetAnalysisZScores` function
should have been set to FALSE. With `sparseResults` set to TRUE, only
significant results are saved from step 3, so the multiple testing correction
is performed on that subset alone, which changes both the number of tests
accounted for and the adjusted values. TENET will display a warning message if
this is the case.

This warning will also occur if multiple testing correction is done using the
example MultiAssayExperiment object, as the step 3 results in the object were
created with `sparseResults` set to TRUE. It is only a warning, however;
results will still be generated by the function and can be used in downstream
functions.
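
To illustrate why this matters, the following toy sketch compares adjusted
p-values computed from a full set of tests with those computed from only the
nominally significant subset. The p-values are invented, and the
Benjamini-Hochberg method is used purely for illustration; it is an assumption
here and is not necessarily the correction TENET applies.

```r
## Invented p-values for eight genes tested against one site
pValues <- c(0.001, 0.01, 0.03, 0.04, 0.2, 0.5, 0.8, 0.9)

## Correction using all eight tests (analogous to sparseResults = FALSE)
adjustedFull <- p.adjust(pValues, method = "BH")

## Correction using only the nominally significant tests
## (analogous to sparseResults = TRUE)
significantOnly <- pValues[pValues < 0.05]
adjustedSparse <- p.adjust(significantOnly, method = "BH")

## Number of links passing a 0.05 cutoff after each correction
sum(adjustedFull < 0.05)
sum(adjustedSparse < 0.05)
```

Because the sparse subset contains fewer tests, the correction is less
stringent, and more links pass the same cutoff than when all tests are
accounted for.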

```{r}
## Get the 25 (if present) most significant links per RE DNA methylation site
## identified by the step3GetAnalysisZScores function
exampleObject <- step4SelectMostSignificantLinksPerDNAMethylationSite(
    TENETMultiAssayExperiment = exampleObject,
    hypermethGplusAnalysis = TRUE,
    hypomethGplusAnalysis = TRUE,
    linksPerREDNAMethylationSiteMaximum = 25
)

## See the data that were saved by the step 4 function. They are subdivided into
## hypermeth and/or hypometh results based on function options.
names(
    metadata(exampleObject)$step4SelectMostSignificantLinksPerDNAMethylationSite
)

## View the results for the most significant links to the hypomethylated RE
## DNA methylation sites
head(
    metadata(
        exampleObject
    )$step4SelectMostSignificantLinksPerDNAMethylationSite$hypomethGplusResults
)
```

## `step5OptimizeLinks`: Find final RE DNA methylation site-gene links using various optimization metrics

This function takes the most significant hyper- and/or hypomethylated G+ RE
DNA methylation site-gene links selected in step 4 and selects optimized
links based on the relative expression of the linked gene in hyper- or
hypomethylated case samples compared to control samples. An unpaired
two-sided Wilcoxon rank-sum test is used to check that the hyper- or
hypomethylated samples for a given RE DNA methylation site-gene link also show
appropriately higher/lower expression of the linked gene. This must hold in a
number of case samples greater than or equal to the `minCaseCount` value
specified in the `step2GetDifferentiallyMethylatedSites` function, where those
case samples have maximum/minimum methylation above/below the selected
`hyperStringency`/`hypoStringency` cutoff values.

This identifies the final RE DNA methylation site-gene links by prioritizing
those that meet the above criteria. The output of this function is used in many
of the downstream TENET functions, and helps users examine the individual RE
DNA methylation sites linked to each gene.
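
The expression comparison described above can be sketched with base R's
`wilcox.test` function. This is a simplified illustration using simulated
values for a single hypomethylated link. The variable names and the comparison
of group means to determine direction are assumptions, not TENET's
implementation.

```r
set.seed(1)

## Simulated expression of one gene in control samples and in case samples
## that are hypomethylated at the linked RE DNA methylation site
controlExpression <- rnorm(20, mean = 5, sd = 1)
hypomethCaseExpression <- rnorm(15, mean = 8, sd = 1)

## Unpaired two-sided Wilcoxon rank-sum test
testResult <- wilcox.test(
    hypomethCaseExpression,
    controlExpression,
    alternative = "two.sided",
    paired = FALSE
)

## For a hypomethylated G+ link, expression is expected to be higher in the
## hypomethylated case samples (direction check is an assumption)
expressionPvalCutoff <- 0.05
linkPasses <- testResult$p.value < expressionPvalCutoff &&
    mean(hypomethCaseExpression) > mean(controlExpression)
linkPasses
```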

```{r}
## Identify final optimized RE DNA methylation site-gene links
exampleObject <- step5OptimizeLinks(
    TENETMultiAssayExperiment = exampleObject,
    hypermethGplusAnalysis = TRUE,
    hypomethGplusAnalysis = TRUE,
    expressionPvalCutoff = 0.05
)

## See the data that were saved by the step 5 function. They are again
## subdivided into hypermeth and/or hypometh results based on function options.
names(metadata(exampleObject)$step5OptimizeLinks)

## Check the results, which include various metrics used to prioritize the
## optimized final RE DNA methylation site-gene links.
## This is a subsection of the data frame detailing all the hypomethylated RE
## DNA methylation site-gene links as an example.
head(
    metadata(
        exampleObject
    )$step5OptimizeLinks$hypomethGplusResults
)
```

## `step6DNAMethylationSitesPerGeneTabulation`: Tabulate the total number of RE DNA methylation sites linked to each gene

This function takes the final optimized RE DNA methylation site-gene links
identified in step 5 and tabulates the number of links per gene separately
for the hyper- and/or hypomethylated G+ analysis quadrants.

This aids in prioritizing genes for downstream analyses, as the genes
with the most linked RE DNA methylation sites are the most likely to represent
those with widespread genomic impact.
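
Conceptually, this tabulation amounts to counting rows per gene in a table of
links. The sketch below uses an invented data frame and base R only. The
`geneID` and `DNAMethylationSiteID` column names follow the step 5 output used
later in this vignette, while the `count` column name and the toy values are
assumptions.

```r
## Invented table of final links: one row per site-gene link
toyLinks <- data.frame(
    geneID = c("geneA", "geneA", "geneB", "geneA", "geneC", "geneB"),
    DNAMethylationSiteID = c("cg01", "cg02", "cg02", "cg03", "cg01", "cg04")
)

## Count the sites linked to each gene, ordered by decreasing count
siteCounts <- sort(table(toyLinks$geneID), decreasing = TRUE)
countTable <- data.frame(
    geneID = names(siteCounts),
    count = as.integer(siteCounts)
)
countTable
```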

```{r}
## Prioritize the top genes by adding up the number of RE DNA methylation sites
## linked to each of the genes
exampleObject <- step6DNAMethylationSitesPerGeneTabulation(
    TENETMultiAssayExperiment = exampleObject
)

## See the data that were saved by the step 6 function. They are again
## subdivided into hypermeth and/or hypometh results based on function options.
names(
    metadata(
        exampleObject
    )$step6DNAMethylationSitesPerGeneTabulation
)

## Check the results, which include a count of the RE DNA methylation sites per
## gene and are organized by decreasing RE DNA methylation site count.
## This is a subsection of the data frame detailing the number of hypomethylated
## RE DNA methylation site links to the top TFs.
head(
    metadata(
        exampleObject
    )$step6DNAMethylationSitesPerGeneTabulation$hypomethGplusResults
)
```

## `TCGADownloader`: Download TCGA gene expression, DNA methylation, and clinical datasets and compile them into a MultiAssayExperiment object

This function downloads and compiles TCGA gene expression, DNA methylation,
and clinical datasets, primarily for use with the TENET package. It simplifies
the use of the TCGAbiolinks download functions,
identifies samples with matching gene expression and DNA methylation data,
and can also remove duplicate tumor samples taken from the same patient
donor. Data are compiled into a MultiAssayExperiment object, which is
returned and optionally saved in an `.rda` file at the path specified by the
`outputFile` argument.

```{r eval = FALSE}
## Download a TCGA BRCA dataset with log2-normalized
## FPKM-UQ expression values from tumor and adjacent normal tissue samples
## with matching expression and methylation data and keeping only one tumor
## sample from each patient. Additionally, survival data will be combined
## from three clinical datasets downloaded by TCGAbiolinks. Raw data files
## will be saved to the working directory, and the processed dataset will
## be returned as a variable.
TCGADataset <- TCGADownloader(
    rawDataDownloadDirectory = ".",
    TCGAStudyAbbreviation = "BRCA",
    RNASeqWorkflow = "STAR - FPKM-UQ",
    RNASeqLog2Normalization = TRUE,
    matchingExpAndMetSamples = TRUE,
    clinicalSurvivalData = "combined",
    outputFile = NA
)
```

## `TENETCacheAllData`: Cache all online datasets required by TENET examples and optional features

This function locally caches all online TENET and SeSAMe datasets required
by TENET examples and optional features (TENET.ExperimentHub objects used in
examples, TENET.AnnotationHub datasets used in step 1, and SeSAMe datasets
loaded via the `DNAMethylationArray` argument). The main purpose of this
function is to enable the use of TENET in an environment without internet
access, such as the compute nodes of an HPC cluster. In this case, you must
run `TENETCacheAllData()` once while connected to the internet before using
TENET examples or these optional features.

```{r eval = FALSE}
## Cache all online datasets required by optional TENET features.
## This function takes no arguments and returns NULL.
TENETCacheAllData()
```

# Overview of downstream step 7 functions

The step 7 functions aim to perform downstream analyses based on the identified
RE DNA methylation site-gene links, integrating multi-omic datasets such as
Hi-C, copy number variation, somatic mutation, and patient survival information.

* `step7ExpressionVsDNAMethylationScatterplots`: Create scatterplots displaying the expression of the top genes and the methylation levels of each of their linked RE DNA methylation sites, along with copy number variation, somatic mutation, and purity data, if provided by the user
* `step7LinkedDNAMethylationSiteCountHistograms`: Create histograms displaying the number of total genes and transcription factor genes linked to a given number of RE DNA methylation sites
* `step7LinkedDNAMethylationSitesMotifSearching`: Search for transcription factor motifs in the vicinity of DNA methylation sites and/or within custom regions defined by the user
* `step7SelectedDNAMethylationSitesCaseVsControlBoxplots`: Generate boxplots or violin plots comparing the methylation level of the specified RE DNA methylation sites in case and control samples
* `step7StatesForLinks`: Identify which of the case samples harbor each of the identified regulatory element DNA methylation site-gene links
* `step7TopGenesCaseVsControlExpressionBoxplots`: Generate boxplots or violin plots comparing the expression level of the top genes and transcription factors in case and control samples
* `step7TopGenesCircosPlots`: Generate Circos plots displaying the links between the top identified genes and each of the RE DNA methylation sites linked to them
* `step7TopGenesDNAMethylationHeatmaps`: Generate heatmaps displaying the methylation level of all RE DNA methylation sites linked to the top genes and transcription factors, along with the expression of those genes in the column headers, in the case samples within the supplied MultiAssayExperiment object
* `step7TopGenesExpressionCorrelationHeatmaps`: Generate mirrored heatmaps displaying the correlation of the expression values of the top genes/TFs
* `step7TopGenesOverlappingLinkedDNAMethylationSitesHeatmaps`: Generate binary heatmaps displaying which of the top genes and transcription factors share links with each of the unique regulatory element DNA methylation sites linked to at least one top gene/TF
* `step7TopGenesSurvival`: Perform Kaplan-Meier and Cox regression analyses to assess the association of patient survival with the expression of top genes and transcription factors and methylation of their linked RE DNA methylation sites
* `step7TopGenesTADTables`: Create tables using user-supplied topologically associating domain (TAD) information which identify the TADs containing each RE DNA methylation site linked to the top genes and transcription factors, as well as other genes in the same TAD as potential downstream targets
* `step7TopGenesUCSCBedFiles`: Create BED-formatted interact files which can be loaded on the UCSC Genome Browser to display links between top genes and transcription factors and their linked RE DNA methylation sites
* `step7TopGenesUserPeakOverlap`: Identify if RE DNA methylation sites linked to top genes and transcription factors are located within a specific distance of specified genomic regions

Here we will demonstrate the usage of some step 7 functions.

## `step7ExpressionVsDNAMethylationScatterplots`: Create scatterplots displaying the expression of the top genes and the methylation levels of each of their linked RE DNA methylation sites, optionally incorporating copy number variation, somatic mutation, and purity data

These scatterplots show the relationship between genes and RE DNA methylation
sites, displaying the expression of the genes on the X-axis and the methylation
of the sites on the Y-axis. The sample type (case or control) is also displayed
in these plots. In complex scatterplots, the shape of the points represents
copy number variation (CNV) and somatic mutation (SM) status, and the size of
the points represents purity for the samples.

First, we load the example CNV, SM, and purity data from the
`exampleTENETClinicalDataFrame` object.

```{r}
## Load the exampleTENETClinicalDataFrame object from the TENET.ExperimentHub
## package. It contains copy number variation (CNV), somatic mutation (SM),
## and purity data for the top 10 TFs by linked hypomethylated RE
## DNA methylation sites in the exampleTENETMultiAssayExperiment object.
exampleTENETClinicalDataFrame <-
    TENET.ExperimentHub::exampleTENETClinicalDataFrame()
CNVData <- subset(exampleTENETClinicalDataFrame,
    select = grepl("_CNV$", colnames(exampleTENETClinicalDataFrame))
)
SMData <- subset(exampleTENETClinicalDataFrame,
    select = grepl("_SM$", colnames(exampleTENETClinicalDataFrame))
)
purityData <- subset(exampleTENETClinicalDataFrame, select = "purity")
```

The CNV dataset is a numeric data frame with rownames representing sample names
and colnames representing gene IDs followed by "\_CNV", with -2 representing a
loss of both copies, -1 a single copy loss, 0 no copy number change, and
positive integer values representing a gain of that many copies (though changes
of +2 or greater are grouped together in the scatterplots).

```{r}
## Show the CNV data for the first 4 TFs
head(CNVData[, 1:4])
```

The SM dataset is a numeric data frame with rownames representing sample names
and colnames representing gene IDs followed by "\_SM", with 0 representing no
mutation and 1 representing mutation.

```{r}
## Show the SM data for the first 4 TFs
head(SMData[, 1:4])
```

The purity dataset in this example is a numeric data frame with rownames
representing sample names and the first column representing the purity, with
the values ranging from 0 (low purity) to 1 (high purity). It can also be a
numeric vector with names representing the sample names.

```{r}
## Show the first few rows of the purity data
head(purityData)
```
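
For users assembling their own inputs, the three formats described above can be
sketched with small invented data frames. The sample names and values below are
made up for illustration; only the layout (sample names as rownames, gene IDs
with "\_CNV" or "\_SM" suffixes as colnames) follows the descriptions in this
section.

```r
## Invented sample names
sampleNames <- c("sample1", "sample2", "sample3")

## CNV data: -2/-1 for copy losses, 0 for no change, positive for gains
toyCNVData <- data.frame(
    ENSG00000129514_CNV = c(-1, 0, 2),
    ENSG00000107485_CNV = c(0, 1, -2),
    row.names = sampleNames
)

## SM data: 0 for no mutation, 1 for mutation
toySMData <- data.frame(
    ENSG00000129514_SM = c(0, 1, 0),
    ENSG00000107485_SM = c(0, 0, 1),
    row.names = sampleNames
)

## Purity data as a data frame with values between 0 and 1
toyPurityData <- data.frame(
    purity = c(0.35, 0.80, 0.62),
    row.names = sampleNames
)

## Equivalent purity data as a named numeric vector
toyPurityVector <- setNames(toyPurityData$purity, rownames(toyPurityData))
toyPurityVector
```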

The options `CNVData`, `SMData`, and `purityData` are not required. If they are
supplied and `simpleOrComplex` is set to "complex", complex scatterplots will
be created displaying this information. Otherwise, simple scatterplots will be
created. At this time, either all or none of `CNVData`, `SMData`, and
`purityData` must be specified.

```{r}
## Create complex scatterplots using the previously acquired data.
## Since we performed analyses only using TFs in the step 3 function, the
## top genes are all TFs, so a message that separate output for TFs will be
## skipped is displayed.
exampleObject <- step7ExpressionVsDNAMethylationScatterplots(
    TENETMultiAssayExperiment = exampleObject,
    hypermethGplusAnalysis = FALSE,
    hypomethGplusAnalysis = TRUE,
    purityData = purityData,
    SMData = SMData,
    CNVData = CNVData,
    simpleOrComplex = "complex",
    topGeneNumber = 10
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7ExpressionVsDNAMethylationScatterplots list.
## For each analysis type, results are included in sub-lists, each of which
## contains results for topGenes and topTFs, unless the top genes are
## all TFs, in which case the separate top TFs output is skipped.
names(
    metadata(
        exampleObject
    )$step7ExpressionVsDNAMethylationScatterplots$hypomethGplusResults$topGenes
)

## For each gene, scatterplots are generated showing the expression of that
## gene and the methylation of each RE DNA methylation site linked to it for
## the given analysis type.
head(
    names(
        metadata(
            exampleObject
        )$step7ExpressionVsDNAMethylationScatterplots$hypomethGplusResults$
            topGenes$ENSG00000124664
    )
)
```

As an example, we examine the scatterplot with the expression of the TF SPDEF
(ENSG00000124664) and the methylation of its linked hypomethylated RE DNA
methylation site with the ID cg25731211. Gene expression is given on the X-axis
and methylation is given on the Y-axis. Samples are colored based on whether
they are cases (red) or controls (blue). The shape and size of the points
are determined by each sample's CNV/SM status and purity, respectively, since
complex scatterplots were selected.

```{r fig.width=10, fig.height=7}
metadata(
    exampleObject
)$step7ExpressionVsDNAMethylationScatterplots$hypomethGplusResults$
    topGenes$ENSG00000124664$cg25731211
```

## `step7LinkedDNAMethylationSiteCountHistograms`: Create histograms displaying the number of total genes and transcription factor genes linked to a given number of RE DNA methylation sites

This function generates histograms displaying the number of total genes and
transcription factor genes linked to a given number of RE DNA methylation sites.
These are designed to highlight the top overall genes and TF genes, which likely
have a disproportionately large number of linked RE DNA methylation sites
compared to most genes.

```{r fig.width=10, fig.height=7}
## Run the step7LinkedDNAMethylationSiteCountHistograms function.
## Since we performed analyses using only TFs in the step 3 function, the
## top genes are all TFs, so a message that separate output for
## TFs will be skipped is displayed.
exampleObject <- step7LinkedDNAMethylationSiteCountHistograms(
    TENETMultiAssayExperiment = exampleObject,
    hypomethGplusAnalysis = TRUE,
    hypermethGplusAnalysis = FALSE
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7LinkedDNAMethylationSiteCountHistograms list.
## For each analysis type, histograms are included in sub-lists, each of which
## contains results for topGenes and topTFs, unless the top genes are all TFs,
## in which case the separate top TFs output is skipped.

## Display the histogram. Note the relatively small number of top TF genes with
## larger numbers of linked RE DNA methylation sites.
metadata(
    exampleObject
)$step7LinkedDNAMethylationSiteCountHistograms$hypomethGplusResults$topGenes
```

## `step7LinkedDNAMethylationSitesMotifSearching`: Search for transcription factor motifs in the vicinity of DNA methylation sites and/or within custom regions defined by the user

To run the motif searching function, DNA binding motifs for the TFs of interest
are required. These can be automatically retrieved via the MotifDb package by
providing `andStrings`, `orStrings`, and/or `notStrings` arguments, which will
be passed to the MotifDb package's `query` function. To include or exclude the
motifs of all TFs in TENET's dataset, the string "humanTranscriptionFactors" can
be included in the `orStrings` or `notStrings` list. Motifs may also be directly
supplied via the `TFMotifList` argument as position weight matrices (PWMs) in a
named list, with each PWM named after the TF it represents.

**Note:** If the `TFMotifList` argument is specified and
`useOnlyDNAMethylationSitesLinkedToTFs` is not explicitly set to FALSE, only RE
DNA methylation sites found by TENET to be linked to the TFs specified in the
`TFMotifList` argument are considered. Using many input motifs or RE DNA
methylation sites may cause the search to take a significant amount of time, so
in this case, using multiple CPU cores is highly recommended.

By default, the specified motifs are only searched for in the vicinity of RE
DNA methylation sites present in the rowRanges of the "methylation"
SummarizedExperiment object. Additional regions within which to search may be
specified via the `GRangesToSearch` argument.

If a DNA methylation array is specified via the `DNAMethylationArray` argument,
only RE DNA methylation sites present in both the array's manifest and the
rownames of the methylation dataset provided in the "methylation"
SummarizedExperiment object within the TENETMultiAssayExperiment object will be
considered for analysis. If an array is not specified, all RE DNA methylation
sites present in the rowRanges of the "methylation" SummarizedExperiment object
will be used.

By default, only RE DNA methylation sites found to be linked to TFs by previous
TENET steps are considered, in which case the analysis quadrants to consider
are selected by the `hypermethGplusAnalysis` and `hypomethGplusAnalysis`
arguments. The `useOnlyDNAMethylationSitesLinkedToTFs` argument can be set to
FALSE to disable this filtering. Additional DNA methylation sites to consider
may be specified via the `DNAMethylationSites` argument.

The distance upstream and downstream from DNA methylation sites within which to
search can be specified via the `distanceFromREDNAMethylationSites` argument.
The default distance is 100bp.

The stringency of the motif matching can be adjusted via the `matchPWMMinScore`
argument, which can be given as a minimum match score (a single number) or as a
percentage of the highest possible score (e.g., "80%"). It defaults to 75%. See
the documentation for the
`min.score` argument in `?Biostrings::matchPWM` for more information. Note that
longer motifs tend to have fewer matches than shorter motifs.
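
To give a feel for how the minimum score affects matching, the sketch below
calls `Biostrings::matchPWM` directly on an invented motif matrix and sequence.
It does not use TENET itself, the toy matrix is not a real TF motif, and only
the forward strand is searched here.

```r
library(Biostrings)

## Invented 7-bp motif: 0.85 for the consensus base, 0.05 for the others
consensus <- c("T", "G", "T", "T", "T", "A", "C")
toyPWM <- matrix(
    0.05,
    nrow = 4, ncol = length(consensus),
    dimnames = list(c("A", "C", "G", "T"), NULL)
)
toyPWM[
    cbind(match(consensus, rownames(toyPWM)), seq_along(consensus))
] <- 0.85

## Invented sequence containing one exact copy of the consensus at position 6
toySequence <- DNAString("AACCGTGTTTACGGCCAA")

## A looser threshold tolerates a mismatch; a stricter one does not
looseHits <- matchPWM(toyPWM, toySequence, min.score = "80%")
strictHits <- matchPWM(toyPWM, toySequence, min.score = "95%")

start(looseHits)
start(strictHits)
```

With this toy matrix, a window with one mismatch still scores above 80% of the
maximum possible score but falls below 95%, so the stricter setting keeps only
the exact match.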

Although MotifDb search results may be directly integrated via the `andStrings`,
`orStrings`, and/or `notStrings` arguments, for the purposes of demonstrating
the format of the `TFMotifList` argument, we will manually retrieve PWMs of
interest from MotifDb.

```{r}
## View the first few available motifs for the FOXA1 TF
head(names(MotifDb::query(MotifDb::MotifDb, "FOXA1")))

## View the first few available motifs for the ESR1 TF
head(names(MotifDb::query(MotifDb::MotifDb, "ESR1")))
```

Next, we create a named list using the human HOCOMOCO v11 core motif PWMs
available for both TFs, which will be used when running the
`step7LinkedDNAMethylationSitesMotifSearching` function.

Note that TF genes can be specified via their names (taken from the
input `TENETMultiAssayExperiment` object or the `geneAnnotationDataset`
argument) or their IDs.

```{r}
## The human HOCOMOCO v11 core motif is the 3rd listed for FOXA1, and 4th for
## ESR1
TFMotifList <- list(
    "FOXA1" = MotifDb::query(MotifDb::MotifDb, "FOXA1")[[3]],
    "ESR1" = MotifDb::query(MotifDb::MotifDb, "ESR1")[[4]]
)

TFMotifList
```

Finally, we run the `step7LinkedDNAMethylationSitesMotifSearching` function to
search for the selected FOXA1 and ESR1 motifs. We use the default settings to
search within 100bp of RE DNA methylation sites found to be linked to these TFs
in both the hypermethylated and hypomethylated G+ analysis quadrants, but
specify a custom minimum score threshold of 80%.

```{r eval = FALSE}
exampleObject <- step7LinkedDNAMethylationSitesMotifSearching(
    TENETMultiAssayExperiment = exampleObject,
    TFMotifList = TFMotifList,
    matchPWMMinScore = "80%"
)

## For each analysis type and TF, a seqLogo diagram of the chosen PWM and two
## data frames with information on the found motifs in the vicinity of RE
## DNA methylation sites, and total motifs found per RE DNA methylation site
## linked to those TFs, respectively, are saved in the metadata of the returned
## MultiAssayExperiment object under the
## step7LinkedDNAMethylationSitesMotifSearching list
names(
    metadata(
        exampleObject
    )$step7LinkedDNAMethylationSitesMotifSearching$hypomethGplusResults$FOXA1
)

## View the motif occurrences near hypomethylated RE DNA methylation sites
## linked to the FOXA1 TF
head(
    metadata(
        exampleObject
    )$step7LinkedDNAMethylationSitesMotifSearching$hypomethGplusResults$
        FOXA1$DNAMethylationSiteMotifOccurrences
)

## View the total number of motifs found in the vicinity of each RE DNA
## methylation site
head(
    metadata(
        exampleObject
    )$step7LinkedDNAMethylationSitesMotifSearching$hypomethGplusResults$
        FOXA1$totalMotifOccurrencesPerREDNAMethylationSite
)
```

## `step7SelectedDNAMethylationSitesCaseVsControlBoxplots`: Generate boxplots or violin plots comparing the methylation level of the specified RE DNA methylation sites in case and control samples

We begin this example by identifying some RE DNA methylation sites for which to
generate methylation boxplots and violin plots.

First, we look at the top genes by number of linked hypomethylated RE DNA
methylation sites.

```{r}
head(
    metadata(
        exampleObject
    )$step6DNAMethylationSitesPerGeneTabulation$hypomethGplusResults
)
```

Next, we retrieve hypomethylated RE DNA methylation sites linked to the FOXA1
(ENSG00000129514) TF. They can be acquired from the output of the
`step5OptimizeLinks` function.

```{r}
DNAMethylationSites <- subset(
    metadata(
        exampleObject
    )$step5OptimizeLinks$hypomethGplusResults,
    geneID == "ENSG00000129514"
)$DNAMethylationSiteID
head(DNAMethylationSites)
```

Finally, we generate boxplots for the selected RE DNA methylation sites.

```{r}
exampleObject <- step7SelectedDNAMethylationSitesCaseVsControlBoxplots(
    TENETMultiAssayExperiment = exampleObject,
    DNAMethylationSites = DNAMethylationSites
)

## Each plot is saved under the ID of the RE DNA methylation site and included
## in the metadata of the returned MultiAssayExperiment object under the
## step7SelectedDNAMethylationSitesCaseVsControlBoxplots list
head(names(
    metadata(
        exampleObject
    )$step7SelectedDNAMethylationSitesCaseVsControlBoxplots
))
```

As an example, we examine the boxplot for the RE DNA methylation site with the
ID cg13776095.

```{r fig.width=10, fig.height=7}
## Note: There may be a warning for "rows containing non-finite values" if there
## are any samples lacking methylation data for the RE DNA methylation site.
metadata(
    exampleObject
)$step7SelectedDNAMethylationSitesCaseVsControlBoxplots$cg13776095
```

We next demonstrate the use of this function to generate violin plots.

```{r}
exampleObject <- step7SelectedDNAMethylationSitesCaseVsControlBoxplots(
    TENETMultiAssayExperiment = exampleObject,
    DNAMethylationSites = DNAMethylationSites,
    violinPlots = TRUE
)

## Each plot is saved under the ID of the RE DNA methylation site and included
## in the metadata of the returned MultiAssayExperiment object under the
## step7SelectedDNAMethylationSitesCaseVsControlBoxplots list
head(names(
    metadata(
        exampleObject
    )$step7SelectedDNAMethylationSitesCaseVsControlBoxplots
))
```

As an example, we examine the violin plot for the RE DNA methylation site with
the ID cg13776095, as above.

```{r fig.width=10, fig.height=7}
## Note: There may be a warning for "rows containing non-finite values" if there
## are any samples lacking methylation data for the RE DNA methylation site.
metadata(
    exampleObject
)$step7SelectedDNAMethylationSitesCaseVsControlBoxplots$cg13776095
```

## `step7StatesForLinks`: Identify which of the case samples harbor each of the identified regulatory element DNA methylation site-gene links

This function identifies which of the samples provided in the dataset likely
harbor each of the RE DNA methylation site-gene links. This is accomplished by
examining if the RE DNA methylation site in a given link is hyper- or
hypomethylated in a given case sample, and if expression of the gene in the
link for that case sample is significantly less than or greater than,
respectively, the mean expression of the gene in the control samples. This may
aid in identifying case samples for further analyses.

```{r}
## Calculate potential link status for case samples for hypomethylated RE DNA
## methylation site-gene links
exampleObject <- step7StatesForLinks(
    TENETMultiAssayExperiment = exampleObject,
    hypomethGplusAnalysis = TRUE,
    hypermethGplusAnalysis = FALSE
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7StatesForLinks list.
## A single data frame is returned for each analysis type, with the combined RE
## DNA methylation site ID and gene ID from each link in the rows, and each case
## sample in the columns.
dim(
    metadata(
        exampleObject
    )$step7StatesForLinks$hypomethGplusResults
)

## Show the results for the first 6 case samples. 1 indicates that a given case
## sample might harbor a given RE DNA methylation site-gene link, and 0
## indicates that it does not. NA values are shown for samples that lack
## methylation data for the site or expression data for the gene.
head(
    metadata(
        exampleObject
    )$step7StatesForLinks$hypomethGplusResults[
        , seq_len(6)
    ]
)
```
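
A data frame in this format can be summarized with base R to count how many
case samples may harbor each link, or how many links each sample may harbor.
The sketch below uses an invented data frame. The row name format and sample
names are hypothetical and only mimic the 1/0/NA layout described above.

```r
## Invented link states: rows are links, columns are case samples
toyStates <- data.frame(
    caseSample1 = c(1, 0, NA),
    caseSample2 = c(1, 1, 0),
    caseSample3 = c(0, 1, 0),
    row.names = c("cg00000001_geneA", "cg00000002_geneA", "cg00000003_geneB")
)

## Number of case samples that may harbor each link (NA values are ignored)
samplesPerLink <- rowSums(toyStates, na.rm = TRUE)
samplesPerLink

## Number of links that each case sample may harbor
linksPerSample <- colSums(toyStates, na.rm = TRUE)
linksPerSample
```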

## `step7TopGenesCaseVsControlExpressionBoxplots`: Generate boxplots or violin plots comparing the expression level of the top genes and transcription factors in case and control samples

This function generates boxplots or violin plots comparing the expression of the
top genes/TFs by number of linked RE DNA methylation sites in the case versus
control samples.

To demonstrate this function, we first generate boxplots for the top 10 TFs by
number of linked hypomethylated G+ RE DNA methylation sites.

```{r}
## Run the step7TopGenesCaseVsControlExpressionBoxplots function.
## Since we performed analyses only using TFs in the step 3 function, the
## top genes are all TFs, so a message that separate output for
## TFs will be skipped is displayed.
exampleObject <- step7TopGenesCaseVsControlExpressionBoxplots(
    TENETMultiAssayExperiment = exampleObject,
    hypomethGplusAnalysis = TRUE,
    hypermethGplusAnalysis = FALSE,
    topGeneNumber = 10
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7TopGenesCaseVsControlExpressionBoxplots list.
## For each analysis type, results are included in sub-lists, each of which
## contains lists with results for topGenes and topTFs, unless the
## top genes are all TFs, in which case the separate top TFs output is skipped.
## Each boxplot is saved under its gene ID.
names(
    metadata(
        exampleObject
    )$step7TopGenesCaseVsControlExpressionBoxplots$
        hypomethGplusResults$topGenes
)
```

As an example, we examine the boxplot for gene ENSG00000107485 (GATA3).

```{r fig.width=10, fig.height=7}
## Note: There may be a warning for "rows containing non-finite values" if there
## are any samples lacking expression data for the given gene.
metadata(
    exampleObject
)$step7TopGenesCaseVsControlExpressionBoxplots$hypomethGplusResults$
    topGenes$ENSG00000107485
```

We next demonstrate the use of this function to generate violin plots.

```{r}
## Run the step7TopGenesCaseVsControlExpressionBoxplots function, this time
## using the option to generate violin plots.
## Since we performed analyses only using TFs in the step 3 function, the
## top genes are all TFs, so a message that separate output for
## TFs will be skipped is displayed.
exampleObject <- step7TopGenesCaseVsControlExpressionBoxplots(
    TENETMultiAssayExperiment = exampleObject,
    hypomethGplusAnalysis = TRUE,
    hypermethGplusAnalysis = FALSE,
    violinPlots = TRUE,
    topGeneNumber = 10
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7TopGenesCaseVsControlExpressionBoxplots list.
## For each analysis type, results are included in sub-lists, each of which
## contains lists with results for topGenes and topTFs, unless the
## top genes are all TFs, in which case the separate top TFs output is skipped.
## Each violin plot is saved under its gene ID.
names(
    metadata(
        exampleObject
    )$step7TopGenesCaseVsControlExpressionBoxplots$
        hypomethGplusResults$topGenes
)
```

As an example, we examine the violin plot for gene ENSG00000107485 (GATA3), as
above.

```{r fig.width=10, fig.height=7}
## Note: There may be a warning for "rows containing non-finite values" if there
## are any samples lacking expression data for the given gene.
metadata(
    exampleObject
)$step7TopGenesCaseVsControlExpressionBoxplots$hypomethGplusResults$
    topGenes$ENSG00000107485
```

## `step7TopGenesCircosPlots`: Generate Circos plots displaying the links between the top identified genes and each of the RE DNA methylation sites linked to them

This function generates Circos plots for each of the top genes and TFs by number
of linked RE DNA methylation sites showing the links between the gene and each
of its linked RE DNA methylation sites.

```{r fig.width=8, fig.height=8, out.width="80%", out.height="80%"}
## Run the step7TopGenesCircosPlots function
exampleObject <- step7TopGenesCircosPlots(
    TENETMultiAssayExperiment = exampleObject,
    hypermethGplusAnalysis = FALSE,
    hypomethGplusAnalysis = TRUE,
    topGeneNumber = 10
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7TopGenesCircosPlots list.
## For each analysis type, results are included in sub-lists, each
## of which contains lists with results for topGenes and topTFs, unless the
## top genes are all TFs, in which case the separate top TFs output is skipped.
## Each Circos plot is saved under its gene ID.

## Note: Since we performed analyses only using TFs in the step 3 function,
## the top genes are all TFs, so topTFs will be NA here.
names(
    metadata(
        exampleObject
    )$step7TopGenesCircosPlots$hypomethGplusResults$topGenes
)

## Display an example Circos plot for ENSG00000091831 (ESR1).
## Note: Plots may take some time to display.
metadata(
    exampleObject
)$step7TopGenesCircosPlots$hypomethGplusResults$topGenes$ENSG00000091831
```

## `step7TopGenesExpressionCorrelationHeatmaps`: Generate mirrored heatmaps displaying the correlation of the expression values of the top genes and TFs

This function generates heatmaps displaying the correlation of the expression
of each of the top genes and TFs in the case samples. Each of the top genes is
displayed in both the rows and columns, so the heatmaps are mirrored, with
correlation values of each gene to itself displayed in a diagonal line in the
center of the heatmaps. Red values represent positive correlation and blue
values represent negative correlation, with darker colors representing a
stronger correlation. Dendrograms are included to identify genes which are
closely related in expression correlation.
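
As a rough illustration of the layout described above, the following sketch
builds a mirrored correlation matrix from simulated expression values and
clusters the genes. It is not TENET's internal code: the gene names, the
simulated data, the choice of Pearson correlation, and the use of
`1 - correlation` as a clustering distance are all assumptions made for this
example.

```{r}
## Illustrative sketch only: simulated data and hypothetical gene names.
set.seed(1)
toyExpression <- matrix(
    rnorm(5 * 20),
    nrow = 5,
    dimnames = list(paste0("gene", 1:5), paste0("sample", 1:20))
)
## Make gene2 track gene1 so one strong positive correlation is present
toyExpression["gene2", ] <- toyExpression["gene1", ] + rnorm(20, sd = 0.1)

## Correlate genes with each other; cor() works on columns, so transpose
toyCorrelation <- cor(t(toyExpression))

## The matrix is symmetric ("mirrored"), with 1s on the diagonal
isSymmetric(toyCorrelation)
diag(toyCorrelation)

## Cluster genes so that closely correlated genes sit next to each other
## (assumption: distance is defined as 1 - correlation)
toyClustering <- hclust(as.dist(1 - toyCorrelation))
toyClustering$labels[toyClustering$order]
```

Because every gene appears in both the rows and the columns, the upper and
lower triangles of such a matrix carry the same information.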

```{r fig.width=10, fig.height=10.5, out.width="80%", out.height="80%"}
## Run the step7TopGenesExpressionCorrelationHeatmaps function
exampleObject <- step7TopGenesExpressionCorrelationHeatmaps(
    TENETMultiAssayExperiment = exampleObject,
    hypermethGplusAnalysis = FALSE,
    hypomethGplusAnalysis = TRUE,
    topGeneNumber = 10
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7TopGenesExpressionCorrelationHeatmaps list.
## For each analysis type, results are included in sub-lists, each
## of which contains lists with results for the topGenes and topTFs, unless
## the top genes are all TFs, in which case the separate top TFs output is
## skipped. For each of these, the heatmap is generated along with a data frame
## with the correlation values displayed in the heatmap.

## Display the mirrored heatmap.
## Note: Since we performed analyses only using TFs in the step 3 function,
## the top genes are all TFs, so topTFs will be NA here.
metadata(
    exampleObject
)$step7TopGenesExpressionCorrelationHeatmaps$hypomethGplusResults$
    topGenes$heatmap

## Display the data frame with correlation values
head(
    metadata(
        exampleObject
    )$step7TopGenesExpressionCorrelationHeatmaps$hypomethGplusResults$
        topGenes$correlationMatrix
)
```

## `step7TopGenesDNAMethylationHeatmaps`: Generate heatmaps displaying the methylation level of all RE DNA methylation sites linked to the top genes and transcription factors, along with the expression of those genes in the column headers, in the case samples within the supplied MultiAssayExperiment object

This function creates heatmaps displaying the methylation of unique RE DNA
methylation sites linked to the top genes in the main body of the heatmaps, as
well as a smaller heatmap showing expression of the top genes labeling the
columns. Expression/methylation for each case sample is shown per column, while
expression of each of the top genes or methylation of their linked RE DNA
methylation sites is shown in the rows. Warm colors represent relatively higher
expression/methylation levels, while cold colors represent relatively lower
expression/methylation levels. These are determined per gene/RE DNA methylation
site, and are not comparable between genes/RE DNA methylation sites, only
between samples. Column dendrograms are included to identify subsets of the
case samples which display particular expression or methylation patterns in the
top genes and their linked RE DNA methylation sites.
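
To see why colors determined per row are comparable only between samples, the
sketch below scales two simulated RE DNA methylation sites separately. It
assumes a z-score style row scaling, which may differ from what TENET does
internally; the site IDs and values are hypothetical.

```{r}
## Illustrative sketch only: simulated beta values and hypothetical site IDs.
set.seed(2)
toyMethylation <- rbind(
    siteA = runif(6, min = 0.0, max = 0.2), ## a lowly methylated site
    siteB = runif(6, min = 0.7, max = 0.9) ## a highly methylated site
)
colnames(toyMethylation) <- paste0("sample", 1:6)

## Scale each row (site) separately: subtract its mean, divide by its SD.
## scale() works on columns, so transpose before and after.
toyScaled <- t(scale(t(toyMethylation)))

## After scaling, each site has mean 0 and SD 1, so a "warm" cell only means
## "high relative to the other samples at this site"
round(rowMeans(toyScaled), 10)
apply(toyScaled, 1, sd)
```

After scaling, siteA and siteB occupy the same range even though their raw
methylation levels differ greatly, so their colors cannot be compared with one
another.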

```{r}
## Run the step7TopGenesDNAMethylationHeatmaps function
exampleObject <- step7TopGenesDNAMethylationHeatmaps(
    TENETMultiAssayExperiment = exampleObject,
    hypermethGplusAnalysis = FALSE,
    hypomethGplusAnalysis = TRUE,
    topGeneNumber = 10
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7TopGenesDNAMethylationHeatmaps list.
## For each analysis type, results are included in sub-lists, each
## of which contains heatmaps for the topGenes and topTFs, unless the
## top genes are all TFs, in which case the separate top TFs output is skipped.

## Note: Since we performed analyses only using TFs in the step 3 function,
## the top genes are all TFs, so topTFs will be NA here.
names(metadata(
    exampleObject
)$step7TopGenesDNAMethylationHeatmaps$hypomethGplusResults)
```

## `step7TopGenesOverlappingLinkedDNAMethylationSitesHeatmaps`: Generate binary heatmaps displaying which of the top genes and transcription factors share links with each of the unique regulatory element DNA methylation sites linked to at least one top gene/TF

These binary heatmaps provide a visual representation of which of the top genes
each RE DNA methylation site is linked to, as RE DNA methylation sites can be
linked to multiple genes. RE DNA methylation sites are displayed in the
columns, while the top genes are displayed in the rows. Black indicates that a
given RE DNA methylation site is linked to that gene, and white indicates it is
not. Dendrograms are included to identify blocks of RE DNA methylation sites
that are linked to similar genes.
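
The binary table behind such a heatmap can be pictured with the following
sketch, which cross-tabulates a small, hypothetical set of gene to RE DNA
methylation site links. The IDs are invented, and this is not necessarily how
TENET builds its own table.

```{r}
## Illustrative sketch only: hypothetical gene and site IDs.
toyLinks <- data.frame(
    geneID = c("geneA", "geneA", "geneB", "geneB", "geneC"),
    siteID = c("site1", "site2", "site2", "site3", "site3")
)

## Cross-tabulate genes (rows) against sites (columns), then reduce the counts
## to 1 (linked) or 0 (not linked)
toyLinkTable <- unclass(table(toyLinks$geneID, toyLinks$siteID))
toyLinkTable[toyLinkTable > 0] <- 1
toyLinkTable

## Number of top genes linked to each site
colSums(toyLinkTable)
```

Sites with a column sum greater than 1 are shared between several top genes.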

```{r fig.width=20, fig.height=14, out.width="80%", out.height="80%"}
## Run the step7TopGenesOverlappingLinkedDNAMethylationSitesHeatmaps function
exampleObject <- step7TopGenesOverlappingLinkedDNAMethylationSitesHeatmaps(
    TENETMultiAssayExperiment = exampleObject,
    hypermethGplusAnalysis = FALSE,
    hypomethGplusAnalysis = TRUE,
    topGeneNumber = 10
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7TopGenesOverlappingLinkedDNAMethylationSitesHeatmaps
## list. For each analysis type, results are included in sub-lists, each of
## which contains lists with results for the topGenes and topTFs, unless
## the top genes are all TFs, in which case the separate top TFs output is
## skipped. For each of these, the heatmap is generated along with a data frame
## noting the presence/absence of the links displayed in the heatmap.

## Display the binary heatmap.
## Note: Since we performed analyses only using TFs in the step 3 function,
## the top genes are all TFs, so topTFs will be NA here.
metadata(
    exampleObject
)$step7TopGenesOverlappingLinkedDNAMethylationSitesHeatmaps$
    hypomethGplusResults$topGenes$heatmap

## Display a subset of the data frame noting the presence/absence of links.
## 1 indicates a link, while 0 indicates no link.
head(
    metadata(
        exampleObject
    )$step7TopGenesOverlappingLinkedDNAMethylationSitesHeatmaps$
        hypomethGplusResults$topGenes$linkTable[
        , seq_len(6)
    ]
)
```

## `step7TopGenesSurvival`: Perform Kaplan-Meier and Cox regression analyses to assess the association of patient survival with the expression of top genes and transcription factors and methylation of their linked RE DNA methylation sites

This function takes the top genes and transcription factors (TFs) by number
of linked RE DNA methylation sites identified by the
`step6DNAMethylationSitesPerGeneTabulation` function, up to the number
specified by the user, along with patient survival data. It generates plots
and tables with statistics assessing the association of patient survival with
the expression of the top genes and TFs and the methylation of their linked RE
DNA methylation sites. For Kaplan-Meier analyses, samples are grouped using
percentile cutoffs or Jenks natural breaks.

First, we load the example survival status and survival time data from the
`exampleTENETClinicalDataFrame` object.

```{r}
## Load the exampleTENETClinicalDataFrame object from the TENET.ExperimentHub
## package. It contains the vital_status (survival status) and time (survival
## time) data for each sample in the exampleTENETMultiAssayExperiment
exampleTENETClinicalDataFrame <-
    TENET.ExperimentHub::exampleTENETClinicalDataFrame()
vitalStatusData <- subset(
    exampleTENETClinicalDataFrame,
    select = "vital_status"
)
survivalTimeData <- subset(exampleTENETClinicalDataFrame, select = "time")
```

The vital status dataset is a data frame with rownames representing sample
names and the first column representing the vital status. Sample values are
either "alive" or "dead" (case-insensitive) or 1 or 2, indicating that samples
were collected from a patient who was alive/censored or dead/reached the
outcome of interest, respectively.

Similarly, the survival time dataset is a data frame with rownames representing
sample names and the first column representing the survival time of the patient
the sample was derived from.

```{r}
## Show the vital status data
head(vitalStatusData)

## Show the survival time data
head(survivalTimeData)
```

Next, we perform the survival analysis using the vital status and survival time
data.

In our first example, we use the default sample grouping settings, which assign
samples in the top 50% of expression or methylation values to the high group
and samples in the bottom 50% to the low group.
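
The idea behind a 50%/50% grouping can be sketched with simulated data as
follows. This is not TENET's implementation: the data and variable names are
hypothetical, the split at the median is an assumption about how ties and
boundaries are handled, and the `survival` package is used here only if it
happens to be installed.

```{r}
## Illustrative sketch only: simulated data and hypothetical variable names.
set.seed(3)
toySurvival <- data.frame(
    expression = rexp(40),
    time = round(rexp(40, rate = 1 / 1000)),
    vital_status = sample(c("alive", "dead"), 40, replace = TRUE)
)

## Top 50% of expression values -> "High"; bottom 50% -> "Low"
## (assumption: samples are split at the median)
toySurvival$group <- ifelse(
    toySurvival$expression > median(toySurvival$expression), "High", "Low"
)
table(toySurvival$group)

## Recode vital status as an event indicator (TRUE = dead), then compare the
## two groups with Kaplan-Meier estimates and a log-rank test
toySurvival$event <- tolower(toySurvival$vital_status) == "dead"
if (requireNamespace("survival", quietly = TRUE)) {
    print(survival::survfit(
        survival::Surv(time, event) ~ group,
        data = toySurvival
    ))
    print(survival::survdiff(
        survival::Surv(time, event) ~ group,
        data = toySurvival
    ))
}
```

Since the simulated expression values are unrelated to survival, no real
difference between the two groups is expected in this toy example.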

```{r fig.width=10, fig.height=7}
## Since we performed analyses only using TFs in the step 3 function, the
## top genes are all TFs, so a message that separate output for
## TFs will be skipped is displayed.
exampleObject <- step7TopGenesSurvival(
    TENETMultiAssayExperiment = exampleObject,
    hypermethGplusAnalysis = FALSE,
    hypomethGplusAnalysis = TRUE,
    vitalStatusData = vitalStatusData,
    survivalTimeData = survivalTimeData,
    topGeneNumber = 10,
    generatePlots = TRUE
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7TopGenesSurvival list.
## For each analysis type, results are included in sub-lists, each
## of which contains lists with results for topGenes and topTFs, unless the
## top genes are all TFs, in which case the separate top TFs output is skipped.
## Each includes two data frames with the survival statistics for Kaplan-Meier
## and Cox regression survival analyses, and if the generatePlots
## argument is TRUE, topGenesSurvivalPlots and topMethylationSitesSurvivalPlots
## lists are included which contain the Kaplan-Meier survival plots for the top
## genes and each of their unique linked RE DNA methylation sites, respectively.
names(
    metadata(
        exampleObject
    )$step7TopGenesSurvival$hypomethGplusResults$topGenes
)

## The topGenesSurvivalStats and topMethylationSitesSurvivalStats variables are
## data frames containing survival statistics.
## Note: A significant amount of data is output, so selected values are shown
## here.
head(
    metadata(
        exampleObject
    )$step7TopGenesSurvival$hypomethGplusResults$
        topGenes$topGenesSurvivalStats[
        , c(1:2, 15, 17, 22:24, 26)
    ]
)

head(
    metadata(
        exampleObject
    )$step7TopGenesSurvival$hypomethGplusResults$
        topGenes$topMethylationSitesSurvivalStats[
        , c(1, 24, 26, 31:33, 35)
    ]
)

## Show the names of the gene survival plots
names(
    metadata(
        exampleObject
    )$step7TopGenesSurvival$hypomethGplusResults$
        topGenes$topGenesSurvivalPlots
)

## Plot the Kaplan-Meier survival plot for GATA3 (ENSG00000107485) as an example
metadata(
    exampleObject
)$step7TopGenesSurvival$hypomethGplusResults$
    topGenes$topGenesSurvivalPlots$ENSG00000107485
```

Sample groups can also be automatically calculated using Jenks natural breaks.
To demonstrate, we perform the same analysis with the samples divided into
three groups by natural breaks in the data.
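
Natural breaks place group boundaries so that values within each group are as
similar as possible. The sketch below shows the idea with a brute-force search
over a tiny, made-up vector; it is a simplified stand-in rather than the
algorithm TENET uses, and the function name and group labels are hypothetical.

```{r}
## Illustrative sketch only: brute-force search for the split into three
## groups that minimizes the within-group sum of squared deviations.
toyValues <- c(1.0, 1.2, 1.1, 5.0, 5.3, 4.9, 9.8, 10.1, 10.0)

toyThreeGroupBreaks <- function(values) {
    sorted <- sort(values)
    n <- length(sorted)
    withinSS <- function(x) sum((x - mean(x))^2)
    best <- c(cost = Inf, firstEnd = NA, secondEnd = NA)
    for (i in seq_len(n - 2)) {
        for (j in seq(i + 1, n - 1)) {
            cost <- withinSS(sorted[1:i]) +
                withinSS(sorted[(i + 1):j]) +
                withinSS(sorted[(j + 1):n])
            if (cost < best["cost"]) {
                best <- c(cost = cost, firstEnd = i, secondEnd = j)
            }
        }
    }
    ## Return the largest value in each of the three groups
    sorted[c(best["firstEnd"], best["secondEnd"], n)]
}

toyUpperBounds <- toyThreeGroupBreaks(toyValues)
toyUpperBounds

## Assign every value to a group using those upper bounds
toyJenksGroups <- cut(
    toyValues,
    breaks = c(-Inf, toyUpperBounds),
    labels = c("Low", "Intermediate", "High")
)
table(toyJenksGroups)
```

Unlike percentile cutoffs, groups found this way need not contain equal
numbers of samples; they are equal here only because the toy values form three
clusters of the same size.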

```{r fig.width=10, fig.height=7}
## Use the example dataset to perform the survival analysis
exampleObject <- step7TopGenesSurvival(
    TENETMultiAssayExperiment = exampleObject,
    jenksBreaksGroupCount = 3
)

## Plot the Kaplan-Meier survival plot for GATA3 (ENSG00000107485) as an example
metadata(
    exampleObject
)$step7TopGenesSurvival$hypomethGplusResults$
    topGenes$topGenesSurvivalPlots$ENSG00000107485
```

Custom sample groups used for the survival analysis can also be specified. This
allows the use of any number of groups with arbitrary names and multiple cutoff
ranges. To demonstrate, we perform the same analysis with four custom groups.

```{r fig.width=10, fig.height=7}
## Create a custom cutoff matrix which will split the samples into quartiles
## for the purposes of the survival analyses and will also define custom
## names for these groups
cutoffMatrix <- data.frame(
    "Low" = c(0, (1 / 4), (1 / 2), (3 / 4)),
    "High" = c((1 / 4), (1 / 2), (3 / 4), 1)
)
rownames(cutoffMatrix) <- c(
    "GroupOne",
    "GroupTwo",
    "GroupThree",
    "GroupFour"
)

## Use the example dataset and cutoffMatrix to perform the survival analysis
exampleObject <- step7TopGenesSurvival(
    TENETMultiAssayExperiment = exampleObject,
    survivalGroupingCutoffs = cutoffMatrix
)

## Plot the Kaplan-Meier survival plot for GATA3 (ENSG00000107485) as an example
metadata(
    exampleObject
)$step7TopGenesSurvival$hypomethGplusResults$
    topGenes$topGenesSurvivalPlots$ENSG00000107485
```
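
To make the meaning of the Low and High columns concrete, the following sketch
assigns simulated samples to groups from a table of percentile ranges. The data
are hypothetical, and treating each range as open on the left and closed on the
right is an assumption; TENET may handle boundary values differently.

```{r}
## Illustrative sketch only: simulated values; boundary handling is assumed.
set.seed(4)
toyExpressionValues <- setNames(rnorm(12), paste0("sample", 1:12))
toyCutoffs <- data.frame(
    Low = c(0, 1 / 4, 1 / 2, 3 / 4),
    High = c(1 / 4, 1 / 2, 3 / 4, 1),
    row.names = c("GroupOne", "GroupTwo", "GroupThree", "GroupFour")
)

## Convert each value to its percentile rank in (0, 1]
toyPercentiles <- rank(toyExpressionValues) / length(toyExpressionValues)

## Assign each sample to the group whose (Low, High] range holds its percentile
toyCustomGroups <- vapply(
    toyPercentiles,
    function(p) {
        rownames(toyCutoffs)[toyCutoffs$Low < p & p <= toyCutoffs$High]
    },
    character(1)
)
table(toyCustomGroups)
```

With contiguous quartile ranges, every sample falls into exactly one group.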

## `step7TopGenesTADTables`: Create tables using user-supplied topologically associating domain (TAD) information which identify the TADs containing each RE DNA methylation site linked to the top genes and transcription factors, as well as other genes in the same TAD as potential downstream targets

This function requires the user to supply either a path to a directory
containing BED-like files with TAD regions of interest, or a single TAD object
given as a GRanges, data frame, or matrix object, as seen in the example below.
To illustrate the use of this function, we will use an example TAD dataset from
the TENET.ExperimentHub package.

```{r}
## Load the example TAD dataset from the TENET.ExperimentHub package
exampleTADRegions <- TENET.ExperimentHub::exampleTENETTADRegions()

## TAD files for this function must include the chromosome of each TAD region
## in the first column, and the start and end positions of each in the second
## and third columns respectively. Additional columns can be included but
## are not considered in this function.
class(exampleTADRegions)
head(exampleTADRegions)
```

The unique RE DNA methylation sites linked to the top genes, as selected by the
user, will be overlapped with the TAD regions. Genes located within the same
TAD as each RE DNA methylation site will be recorded as possible downstream
target genes of the regulatory elements represented by those RE DNA methylation
sites, for use in further analyses.
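
The overlap logic can be pictured with the base R sketch below, which finds the
TAD containing each site and then lists the genes in that same TAD. All
coordinates, IDs, and the helper function are hypothetical, genes are
represented by a single position, and TADs are assumed not to overlap one
another; the function itself may work differently.

```{r}
## Illustrative sketch only: hypothetical coordinates and IDs.
toyTADs <- data.frame(
    chr = c("chr1", "chr1", "chr2"),
    start = c(1, 500001, 1),
    end = c(500000, 1200000, 800000)
)
toySites <- data.frame(
    siteID = c("site1", "site2"),
    chr = c("chr1", "chr2"),
    position = c(600000, 100000)
)
toyGenes <- data.frame(
    geneID = c("geneA", "geneB", "geneC"),
    chr = c("chr1", "chr1", "chr2"),
    tss = c(100000, 900000, 750000)
)

## Return the row index of the TAD containing a position (NA if there is none)
toyFindTAD <- function(chr, position) {
    hit <- which(
        toyTADs$chr == chr & toyTADs$start <= position &
            toyTADs$end >= position
    )
    if (length(hit) == 0) NA_integer_ else hit[1]
}

toySites$tad <- mapply(toyFindTAD, toySites$chr, toySites$position)
toyGenes$tad <- mapply(toyFindTAD, toyGenes$chr, toyGenes$tss)

## For each site, list the genes that fall in the same TAD
toySites$genesInSameTAD <- vapply(
    toySites$tad,
    function(tad) {
        paste(toyGenes$geneID[which(toyGenes$tad == tad)], collapse = ",")
    },
    character(1)
)
toySites
```

Here geneA shares a chromosome with site1 but lies in a different TAD, so it is
not reported for that site.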

```{r}
## Use the example TAD object to perform TAD overlapping.
## Since we performed analyses only using TFs in the step 3 function, the
## top genes are all TFs, so a message that separate output for
## TFs will be skipped is displayed.
exampleObject <- step7TopGenesTADTables(
    TENETMultiAssayExperiment = exampleObject,
    TADFiles = exampleTADRegions,
    hypomethGplusAnalysis = TRUE,
    hypermethGplusAnalysis = FALSE,
    topGeneNumber = 10
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7TopGenesTADTables list.
## For each analysis type, results are included in sub-lists, each
## of which contains results in the form of a data frame for topGenes and
## topTFs, unless the top genes are all TFs, in which case
## the separate top TFs output is skipped.
class(
    metadata(
        exampleObject
    )$step7TopGenesTADTables$hypomethGplusResults$topGenes
)

## Display results for selected hypomethylated RE DNA methylation sites. A
## variety of data are included for each RE DNA methylation site, including its
## location, the top genes it is linked to, and information on the count and
## identities of other genes found within the same TAD as the RE DNA methylation
## site. Note: A significant amount of data is output, so selected values are
## shown here.
head(
    metadata(
        exampleObject
    )$step7TopGenesTADTables$hypomethGplusResults$topGenes[
        c(1:6, 16:17)
    ]
)
```

## `step7TopGenesUCSCBedFiles`: Create BED-formatted interact files which can be loaded on the UCSC Genome Browser to display links between top genes and transcription factors and their linked RE DNA methylation sites

This function will output interact files in the specified folder. For the
purposes of this example, we will save the output file in a temporary folder.
This interact file can be uploaded to the UCSC Genome Browser to visualize the
links between each of the top genes/TFs and their linked RE DNA methylation
sites.
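
For orientation, the sketch below writes one hypothetical link in the UCSC
interact layout (a track line followed by records of 18 tab-separated fields)
and reads it back. The coordinates, names, color, and track line contents are
invented for illustration and will differ from what the function writes.

```{r}
## Illustrative sketch only: one hypothetical gene-to-site link.
toyInteract <- data.frame(
    chrom = "chr10", chromStart = 8045000L, chromEnd = 8100000L,
    name = "geneA_site1", score = 0L, value = 0, exp = ".", color = "red",
    sourceChrom = "chr10", sourceStart = 8045000L, sourceEnd = 8045001L,
    sourceName = "geneA", sourceStrand = ".",
    targetChrom = "chr10", targetStart = 8099999L, targetEnd = 8100000L,
    targetName = "site1", targetStrand = "."
)

## Write the track line first, then append the record without a header
toyInteractPath <- tempfile(fileext = ".bed")
writeLines('track type=interact name="toyLinks"', toyInteractPath)
write.table(
    toyInteract, toyInteractPath,
    append = TRUE, sep = "\t", quote = FALSE,
    row.names = FALSE, col.names = FALSE
)
cat(readLines(toyInteractPath), sep = "\n")

## When reading such a file back into R, skip the track line
toyRead <- read.delim(toyInteractPath, header = FALSE, skip = 1)
ncol(toyRead)

## Remove the temporary example file
invisible(file.remove(toyInteractPath))
```

The track line is what identifies the file as an interact track; the remaining
lines describe one link each.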

```{r}
## Get the path to a temporary directory in which to save the output interact
## file
tempDirectory <- tempdir()

## Run the step7TopGenesUCSCBedFiles function.
## Since we performed analyses only using TFs in the step 3 function, the
## top genes are all TFs, so a message that separate output for
## TFs will be skipped is displayed.
filePaths <- step7TopGenesUCSCBedFiles(
    TENETMultiAssayExperiment = exampleObject,
    outputDirectory = tempDirectory,
    hypomethGplusAnalysis = TRUE,
    hypermethGplusAnalysis = FALSE,
    topGeneNumber = 10
)

## Unlike other functions, this function does not return the
## given TENETMultiAssayExperiment with additional information generated by the
## function in its metadata, but rather returns an object with information on
## where to find the created interact file.

## Get the path of the output file for top genes with hypomethylated G+ links
bedPath <- filePaths$hypoGplus$topGenes

## Read the first few lines of the file.
## The file largely contains information about each RE DNA methylation site-gene
## link, with additional information in the first line which allows it to be
## loaded by the UCSC Genome Browser.
cat(head(readLines(bedPath)), sep = "\n")

## Delete the output file, since this is just an example.
## Do not use this line if running on real data, as it will delete your created
## file.
invisible(file.remove(unlist(bedPath)))
```

## `step7TopGenesUserPeakOverlap`: Identify if RE DNA methylation sites linked to top genes and transcription factors are located within a specific distance of specified genomic regions

The `step7TopGenesUserPeakOverlap` function requires the user to supply one or
more paths to BED-like files with peaks of interest, directories containing
these files, or peak datasets given as GRanges, data frame, or matrix objects.
If multiple inputs are provided, they should be passed in the form of a named
list. To illustrate the use of this function, we will use an example peak
dataset from the TENET.ExperimentHub package.

```{r}
## Load the example peak dataset from the TENET.ExperimentHub package
examplePeakDataset <- TENET.ExperimentHub::exampleTENETPeakRegions()

## Peak datasets must include the chromosome of each peak region
## in the first column, and the start and end positions of each peak in the
## second and third columns respectively. Peak names are taken from the fourth
## column of the input if it exists, or, if the input is a GRanges object, the
## names of the ranges. If no names are present, they are generated from peak
## coordinates and take the form
## '<chromosome>_<start>_<end>[.<optionalDuplicateNumber>]'. Input files may
## optionally be compressed (.gz/.bz2/.xz). Additional columns can be included,
## but are not used by this function.
class(examplePeakDataset)
head(examplePeakDataset)
```

The unique RE DNA methylation sites linked to the top genes will be overlapped
with the peak datasets. A buffer region of a specified size is added around
each RE DNA methylation site, so that sites located in the vicinity of a peak
are identified, not only those located directly inside one.
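
The effect of the buffer can be sketched in base R as an interval intersection
test between each site's search window and each peak. The coordinates and IDs
below are hypothetical, and the exact boundary conventions used by the function
may differ.

```{r}
## Illustrative sketch only: hypothetical coordinates and IDs.
toyPeaks <- data.frame(
    chr = c("chr1", "chr1", "chr2"),
    start = c(1000, 5000, 1000),
    end = c(1200, 5300, 1500),
    name = c("peak1", "peak2", "peak3")
)
toyPeakSites <- data.frame(
    siteID = c("site1", "site2", "site3"),
    chr = c("chr1", "chr1", "chr2"),
    position = c(1250, 4000, 1100)
)
toyDistance <- 100

## A site's search window is [position - distance, position + distance]; it
## overlaps a peak on the same chromosome if the two intervals intersect
toyOverlap <- outer(
    seq_len(nrow(toyPeakSites)),
    seq_len(nrow(toyPeaks)),
    function(i, j) {
        toyPeakSites$chr[i] == toyPeaks$chr[j] &
            toyPeakSites$position[i] - toyDistance <= toyPeaks$end[j] &
            toyPeakSites$position[i] + toyDistance >= toyPeaks$start[j]
    }
)
dimnames(toyOverlap) <- list(toyPeakSites$siteID, toyPeaks$name)
toyOverlap
```

In this toy example, site1 lies 50 bp past the end of peak1, so it is reported
as overlapping only because of the 100 bp buffer.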

```{r}
## Run the step7TopGenesUserPeakOverlap function.
## Since we performed analyses only using TFs in the step 3 function, the
## top genes are all TFs, so a message that separate output for
## TFs will be skipped is displayed.
exampleObject <- step7TopGenesUserPeakOverlap(
    TENETMultiAssayExperiment = exampleObject,
    peakData = examplePeakDataset,
    hypermethGplusAnalysis = FALSE,
    hypomethGplusAnalysis = TRUE,
    topGeneNumber = 10,
    distanceFromREDNAMethylationSites = 100
)

## Results are included in the metadata of the returned MultiAssayExperiment
## object under the step7TopGenesUserPeakOverlap list.
## For each analysis type, lists of data frames with peak overlap information
## are included in sub-lists, with each list saved under the names topGenes and
## topTFs, unless the top genes are all TFs, in which case the separate top TFs
## output is skipped.
names(
    metadata(
        exampleObject
    )$step7TopGenesUserPeakOverlap$hypomethGplusResults$topGenes
)

## Display the peak overlap information for RE DNA methylation sites linked to
## the top genes. Since a single R object was provided as input, the output list
## contains a single data frame named 'peakData'. This data frame contains peak
## names in the column names and RE DNA methylation site IDs in the row names.
## The Boolean values indicate whether each RE DNA methylation site overlaps
## with each peak.
metadata(
    exampleObject
)$step7TopGenesUserPeakOverlap$hypomethGplusResults$topGenes$
    peakFileOverlapInfo$peakData[seq_len(5), seq_len(5)]

## Display the data frame of information for RE DNA methylation sites linked to
## the top genes. A variety of data are included for each RE DNA methylation
## site, including its location, the coordinates of its search window, and the
## top genes it is linked to.
head(
    metadata(
        exampleObject
    )$step7TopGenesUserPeakOverlap$hypomethGplusResults$topGenes$
        linkedDNAMethylationSiteInfo
)
```

# Datasets included in the TENET package

The following objects are contained in the TENET package. Since LazyData is not
enabled, objects will need to be accessed using the `data()` function, as
demonstrated below.

## `humanTranscriptionFactorList`: Human transcription factor list

A character vector of the Ensembl IDs of genes identified as human TFs by
Lambert SA et al. (PMID: 29425488). Candidate proteins were manually examined by
a panel of experts based on available data. Proteins with experimentally
demonstrated DNA binding specificity were considered TFs. Other proteins, such
as co-factors and RNA binding proteins, were classified as non-TFs.
**Citation:** Lambert SA, Jolma A, Campitelli LF, et al. The Human
Transcription Factors. Cell. 2018 Feb 8;172(4):650-665. doi:
10.1016/j.cell.2018.01.029. Erratum in: Cell. 2018 Oct 4;175(2):598-599. PMID:
29425488.

```{r}
## Load the humanTranscriptionFactorList dataset
data("humanTranscriptionFactorList", package = "TENET")
## Display the names of the first few TFs on the list
head(humanTranscriptionFactorList)
```

## `humanTranscriptionFactorDb`: Human transcription factor database

A data frame with information on genes identified as human TFs by
Lambert SA et al. (PMID: 29425488). Candidate proteins were manually examined by
a panel of experts based on available data. Proteins with experimentally
demonstrated DNA binding specificity were considered TFs. Other proteins, such
as co-factors and RNA binding proteins, were classified as non-TFs.
**Citation:** Lambert SA, Jolma A, Campitelli LF, et al. The Human
Transcription Factors. Cell. 2018 Feb 8;172(4):650-665. doi:
10.1016/j.cell.2018.01.029. Erratum in: Cell. 2018 Oct 4;175(2):598-599. PMID:
29425488.

```{r}
## Load the humanTranscriptionFactorDb dataset
data("humanTranscriptionFactorDb", package = "TENET")
## Display some information for the first few TFs in the dataset
humanTranscriptionFactorDb[seq_len(5), seq_len(5)]
```

# Acknowledgements

This work was supported in part by grants K01CA229995, R21CA260082,
R21CA264637, R21HG011506, and P30CA014089 from the National Institutes of
Health, grant W81XWH-21-1-0805 from the Department of Defense, pilot grants from
the USC Keck School of Medicine, the USC Center for Genetic Epidemiology, the
USC Norris Comprehensive Cancer Center [to S.K.R.], and the John H. Richardson
Endowed Postdoctoral Fellowship in Oncology Research [to D.J.M.].

# Session info

```{r}
sessionInfo()
```