---
title: "Introduction to *In Vitro-In Vivo* Extrapolation (IVIVE) with R Package httk"
author: "John Wambaugh and Elaina Kenyon"
date: "November, 2022"
output: rmarkdown::html_vignette
vignette: >
 %\VignetteIndexEntry{2) Introduction to IVIVE}
 %\VignetteEngine{knitr::rmarkdown}
 %\VignetteEncoding{UTF-8}
---
*Please send questions to wambaugh.john@epa.gov*

# Introduction
Chemical risk assessment depends on knowledge of inherent chemical hazard,
the dose-response relationship, and the extent of exposure that occurs 
([NASEM 2017](<https://doi.org/10.17226/24635>)).
High throughput screening (HTS) for *in vitro* bioactivity allows 
characterization of potential hazard for thousands of chemicals for which no 
other testing has occurred
([Judson et al., 2009](<https://doi.org/10.1289/ehp.0800168>)).

Toxicokinetics (TK) describes the Absorption, Distribution, Metabolism, and 
Excretion (ADME) of a chemical by the body. TK relates external exposures to 
internal tissue concentrations of chemical and
therefore informs the chemical dose-response relationship 
([Coecke et al., 2013](<https://doi.org/10.1016/j.tiv.2012.06.012>)). TK 
requires chemical-specific information that is traditionally obtained *in vivo*.
However, most chemicals do not have TK data, so a new approach methodology 
(NAM) uses *in vitro* TK methods adapted from the pharmaceutical industry to 
provide the necessary chemical-specific information
([Wambaugh et al., 2019](https://doi.org/10.1016/j.cotox.2019.07.001), 
[Chang et al., 2022](<https://doi.org/10.3390/toxics10050232>)).
High(er) throughput toxicokinetics (HTTK) involves the combination of 
chemical-specific *in vitro* TK data  and chemical-independent ("generic") TK 
models to predict TK 
([Breen et al., 2021](<https://doi.org/10.1080/17425255.2021.1935867>)).

The primary goal of HTTK is to provide a human dose context for bioactive 
*in vitro* concentrations from HTS (that is, *in vitro-in vivo* extrapolation, 
or IVIVE) ([Wetmore, 2015](<https://doi.org/10.1016/j.tox.2014.05.012>)).
IVIVE is Utilization of *in vitro* experimental data to predict phenomena 
*in vivo*.
In pharmaceutical development, HTTK methods allow IVIVE to estimate therapeutic 
doses for clinical studies – predicted concentrations are typically on the 
order of values measured in clinical trials 
([Wang, 2010](<https://doi.org/10.1124/dmd.110.032177>)).
High throughput screening + IVIVE can predict a daily chemical dose rate 
(mg/kg bw/day) that might be adverse 
([Paul Friedman et al., 2019](<https://doi.org/10.1093/toxsci/kfz201>)).

A secondary goal is to provide open source data and models for evaluation and 
use by the scientific community 
([Pearce et al., 2017](<https://doi.org/10.18637%2Fjss.v079.i04>)).

*The following material is from a lecture taught as part of the course "Computational Methods in Chemical Risk Assessment" at Rutgers University in September, 2019.*

## Prepare R to run the vignette
R package **knitr** generates html and PDF documents from this RMarkdown file,
Each bit of code that follows is known as a "chunk". We start by telling 
**knitr** how we want our chunks to look.
```{r configure_knitr, eval = TRUE}
knitr::opts_chunk$set(collapse = TRUE, comment = '#>')
options(rmarkdown.html_vignette.check_title = FALSE)
```

### Clear the memory

It is a bad idea to let variables and other information from previous R
sessions float around, so we first remove everything in the R memory.
```{r clear_memory, eval = TRUE}
rm(list=ls()) 
```

## HTTK Version

This vignette was created using **httk** v2.2.2 in 2022. Although we attempt to 
maintain backward compatibility, if you encounter issues with the latest 
release of **httk** and cannot easily address the changes, historical versions 
of **httk** are available from: 
https://cran.r-project.org/src/contrib/Archive/httk/


# Background

If you are familiar with computer files and R you might want to skip directly 
to "Step 1: Loading In Vitro Bioactivity Data".

## Files and Folders

R (or any computer program) needs to know where to look to find particular 
information (such as data generated by a panel of in vitro assays). 
We describe a chunk of information as a "file". Files range in size and can be 
small (even 0 bytes) or very large (several GB).
A file my physically be located on a disk drive inside your computing device 
or, increasingly, it might be located somewhere in a cloud storage network.
Regardless, most computers will treat a file as if it has a specific location.
Each file has a name, although multiple files can have the same name as long as 
those files are stored in different "folders".

Files for all the operating systems with R are organized using a folder (also 
called a "directory") and sub-folder structure. 
A sub-folder is a folder within another folder. They can be well organized:

/animals/cats/tabbies/

Or not:

/animals/gradschoolfiles/photosfromchildhood

The sequence of folders and sub-folders leading to a particular file is 
referred to as a "path". 
For information on how folder structures work, please check out these sites:

* <https://en.wikipedia.org/wiki/Directory_structure>
* <https://www.geeksforgeeks.org/structures-of-directory-in-operating-system/>

Within the command shell you use the "cd" command to "change directories".

One complication to watch out for is the direction of the symbol that separates 
folder names in the path.
On Windows the path is separated by "back-slashes": \\

\\animals\\cats\\black

On Unix-based operating systems (including Linux and MacOS) the path is 
separated by "forward slashes": /

/animals/cats/black

R always uses Unix forward slashes, even on a Windows machine, so if you know 
that the path is:

c:\\users\\USERNAME\\Downloads\\

You need to tell R that you want:

c:/users/USERNAME/Downloads/

Finally, the bit at the front of the path (such as "C:\\" or "L:\\" tells the 
computer which "drive" to look in to find your folders.
Often "C drive" (c:\\) is physically inside your machine while other letters 
might indicate network (or cloud) drives.

### Home Directory
Whether you are on Linux or Windows you have a home directory. 
Often you will only be able to write and edit files within your home directory 
and its sub-directories. However, you may be able to read and copy from most 
other folders.

On Windows your home directory will be a sub-directory of "C:\\users\\". For 
example, "C:\\users\\jwambaug".

### Common and Important Directories
You will want to create a directory for your R packages locally on your machine:

C:\\users\\USERNAME\\AppData\\Loca\\R

### Download Folders
Many web browsers put their files in a default directory such as

C:\\users\\USERNAME\\Downloads

### Temporary (or "Swap") Space
On Windows the path "c:\\" refers to the physical hard drive inside your 
computer. Therefore the only copy of "c:\\users\\USERNAME\\" is on your 
computer and if that drive or the whole computer is damaged or destroyed you 
have lost those files. For this reason we call this "temporary" space. You 
don't want to keep the only copies of your files there.

## Installing Necessary Software

* Users will need the freely available R statistical computing language: <https://www.r-project.org/>
* Users will likely want a development environment like RStudio: <https://posit.co/download/rstudio-desktop/>
* If you get the message "Error in library(X) : there is no package called 'X'" then you will need to install that package: 
	From the R command prompt: install.packages("X") 
  Or, if using RStudio, look for 'Install Packages' under the 'Tools' tab.
* Note that R does not recognize fancy versions of quotation marks ‘,$~$’,$~$“, or$~$”. 
If you are cutting and pasting from software like Word or Outlook you may need 
to replace the quotation marks that curve toward each other with ones typed by 
the keyboard.

### Installing R package "httk"
```{r install_httk, eval = FALSE}
install.packages("httk")
```

### Installing R package "readxl"
```{r install_readxl, eval = FALSE}
install.packages("readxl")
```

### Installing R package "ggplot2"
```{r install_ggplot2, eval = FALSE}
install.packages("ggplot2")
```

### eval = execute.vignette
If you are using the RMarkdown version of this vignette (extension, .RMD) you
will be able to see that several chunks of code in this vignette
have the statement "eval = execute.vignette". The next chunk of code, by default,
sets execute.vignette = FALSE.  This means that the code is included (and necessary) but was
not run when the vignette was built. We do this because some steps require 
extensive computing time and the checks on CRAN limit how long we can spend
building the package. If you want this vignette to work, you must run all code,
either by cutting and pasting it into R. Or, if viewing the .RMD file, you can
either change execute.vignette to TRUE or press "play" (the green arrow) on
each chunk in RStudio.
```{r runchunks, eval = TRUE}
# Set whether or not the following chunks will be executed (run):
execute.vignette <- FALSE
```

## Load the necessary packages:
```{r load_packages, eval = execute.vignette}
library(httk)
library(readxl)
library(ggplot2)
```

## Control run speed
Portions of this vignette use Monte Carlo sampling to simulate variability and
propagate uncertainty. The more samples that are used, the more stable the 
results are (that is, the less likely they are to change if a different random
sequence is used). However, the more samples that are used, the longer it takes
to run. So, to speed up how fast these examples run, we specify here that we
only want to use 5 samples, even though the actual **httk** default is 1000.
Increase this number to get more stable (and accurate) results:
```{r MC_samples, eval = execute.vignette}
NSAMP <- 5
```

# Step One: Loading *In Vitro* Screening Data

The [ToxCast](https://doi.org/10.1021/tx3000939) and 
[Tox21](https://doi.org/10.1289/ehp.1205784) research programs
employ batteries of high-throughput assays to assess
chemical bioactivity *in vitro*.
Not every chemical is tested through every assay 
([Richard et al., 2016](https://doi.org/10.1289/ehp.1205784)). 
Most assays are conducted in concentration response,
and each corresponding assay endpoint is analyzed statistically
to determine if there is a concentration-dependent response or
“hit” using the ToxCast Pipeline 
([Filer et al., 2017](https://doi.org/10.1093/bioinformatics/btw680)). 
Most assay
endpoint-chemical combinations are nonresponsive. Here, only
the hits are treated as potential indicators of bioactivity.

*In vitro* bioactivity does not necessarily indicate adversity or hazard.
Biological models can be used to make predictions of
toxicity based on *in vitro* data (for example, 
[Sipes et al., 2011](https://doi.org/10.1093/toxsci/kfr220), [Browne et al., 2015](https://doi.org/10.1021/acs.est.5b02641) and [Kleinstreuer et al., 2017](https://doi.org/10.1021/acs.chemrestox.6b00347)).
However, here we make a more 
conservative, precautionary assumption that concentrations too low to cause 
*in vitro* bioactivity are more likely to be safe
([Paul Friedman et al., 2019](<https://doi.org/10.1093/toxsci/kfz201>)).
Among all of the assay hits for each chemical, we choose to use the lower 10th 
percentile
of the $\mu M$ potencies (50% active concentration or $AC_{50}$). We are 
assuming that the
10th percentile represents a low concentration for activating
multiple assays (assuming 10 or more bioactive assays were observed
for a given chemical). However, this bioactivity does not necessarily
have a direct toxicological interpretation. 

## Downloading ToxCast Data

The main page for the data is here:
https://www.epa.gov/comptox-tools/exploring-toxcast-data

Most useful to us is a single file containing all the hits across all chemicals
and assays:
https://clowder.edap-cluster.com/datasets/6364026ee4b04f6bb1409eda?space=62bb560ee4b07abf29f88fef

As of November, 2022 the most recent version was 3.5 and was available as an
.Rdata file (invitrodb_3_5_mc5.Rdata), which we can download and place in our 
working directory (FOLDERPATH).

## Loading the Data into R

Don't forget to use "setwd(FOLDERPATH)" to change to your R working folder:
```{r change_directory, eval = FALSE}
setwd("FOLDERPATH")
```

Then load the file:
```{r load_toxcast, eval = FALSE}
load("invitrodb_3_5_mc5.Rdata")
```
We now have a file "mc5" in our woking environment. It's absolutely huge, so
lets shrink it down to contain only the chemicals we care about.

This list could be chemicals with HTTK data (note, this step
cuts things down to chemicals with human HTTK data, you could change species to
rat and get something different).

```{r subset_toxcast1, eval = FALSE}
toxcast.httk <- subset(mc5, dsstox_substance_id %in% get_cheminfo(
         info="DTXSID",
         suppress.messages=TRUE))
```

# Critical Step: Select your chemicals of interest
We might have chemicals we are interested in for one reason or another --
you could type any chemical ID's you want into the following my.chems vector,
or even load it from a file. 

Here we'll pick 50 chemicals at random from among 
the ToxCast chemicals:
```{r subset_toxcast2, eval = FALSE}
set.seed(1234)
my.chems <- sample(mc5$dsstox_substance_id,50)
example.toxcast <- as.data.frame(mc5[mc5$dsstox_substance_id %in% my.chems,])
```
Unfortunately for this vignette there are too many ToxCast data to fit into a
5mb R package. So we will subset to just the selected chemicals and distribute
only those data. In addition, out of 78 columns in the data, we will keep only 
eight. Download the full data following the instructions above.
```{r subset_toxcast3, eval = FALSE}
example.toxcast <- example.toxcast[, c("chnm",
              "dsstox_substance_id",
              "spid",
              "hitc",
              "modl",
              "aeid",
              "modl_ga",
              "modl_ac10",
              "modl_acc")]
# reduce precision to decrease space:
cols <- c("modl_ga", "modl_ac10", "modl_acc")
for (this.col in cols)
  example.toxcast[, this.col] = signif(example.toxcast[, this.col], 3)
save(example.toxcast,file="introtoivive-toxcast.Rdata",version=2)
```

## Interpreting the Assay Information

In the mc5 table, "aenm" gives the assay name while "dsstox_substance_id" is the
chemical identity from the 
[CompTox Chemicals Dashboard](https://comptox.epa.gov/dashboard). 
Since both names and CAS-RN can be subject to variations, we track chemical 
identities with the DSSTox Substance ID’s or “DTXSIDs” 
([Grulke et al., 2019](https://doi.org/10.1016/j.comtox.2019.100096)).

The column "hitc" is 1 (that is, "TRUE") if there was a statistically-signficant
systematic concentration-response observed (this is a "hit"). "hitc" is 0 
(FALSE) if there was no hit. The column "mc6_flags" indicates features of the
hit that might be of interest for further examination but are not necessarily
disqualifying.

The column "modl" indicates the type of concentration-response modl that was 
selected (based on 
[Akaike Information Criterion](https://doi.org/10.1002/wics.1460))
to best describe the data. The possibilities are constant ("cnst" -- no 
concentation response), 
[Hill equation](https://en.wikipedia.org/wiki/Hill_equation_(biochemistry)) 
("hill"),
or gain-loss model ("gnls" -- the product of an increasing and decreasing Hill
equations). "gnls" is often interpreted as being caused
by cytotoxicity.

The column "modl_ga" indicates the log10 uM concentration at which the chemical 
caused 50% activity. This is also called the chemical-assay “potency” or 
“$AC_{50}$”.

Each chemical might have multiple assays for which there is a hit. Each chemical 
may have been run multiple times (experimental replicates), so that there may be 
multiple samples from the same chemical for the same assay. Different samples 
are indicated by the column “spid” (sample id).

```{r display_toxcast, eval = TRUE}
example.toxcast <- httk::example.toxcast
knitr::kable(head(example.toxcast), caption = "ToxCast In Vitro Bioactivity Data",
             floating.environment="sidewaystable")
```

## Find the concentrations of interest

Then we can build a table summarizing the *in vitro* data. For a "hit", the 
"AC50" indicates the 50%
activation concentration from the best curve-fit, "AC10" indicates the concentration
estimated to cause 10%, and "ACC" is the "activity concentration at cutoff" are 
which is the concentration that first causes statistically significant activity.
See [Filer et al. (2017)](https://doi.org/10.1093/bioinformatics/btw680) figure 
three for additional discussion.

To speed things up we only look at first ten chemials:
```{r toxcast_summary.table, eval = TRUE}
toxcast.table <- NULL
for (this.id in unique(example.toxcast$dsstox_substance_id)[1:10])
{
  this.subset <- subset(example.toxcast, dsstox_substance_id == this.id)
  these.hits <- subset(this.subset, hitc==1)
  if (dim(these.hits)[1]>0){
      this.row <- data.frame(Compound=as.character(this.subset[1,"chnm"]),
                         DTXSID=this.id,
                         Total.Assays = dim(this.subset)[1],
                         Unique.Assays = length(unique(this.subset$aeid)),
                         Total.Hits = dim(these.hits)[1],
                         Unique.Hits = length(unique(these.hits$aeid)),
                         Low.AC50 = signif(min(these.hits$modl_ga),3),
                         Low.AC10 = signif(min(these.hits$modl_ac10),3),
                         Low.ACC = signif(min(these.hits$modl_acc),3),
                         Q10.AC50 = signif(quantile(these.hits$modl_ga,probs=0.1),3),
                         Q10.AC10 = signif(quantile(these.hits$modl_ac10,probs=0.1),3),
                         Q10.ACC = signif(quantile(these.hits$modl_acc,probs=0.1),3),
                         Med.AC50 = signif(median(these.hits$modl_ga),3),
                         Med.AC10 = signif(median(these.hits$modl_ac10),3),
                         Med.ACC = signif(median(these.hits$modl_acc),3),
                         Q90.AC50 = signif(quantile(these.hits$modl_ga,probs=0.9),3),
                         Q90.AC10 = signif(quantile(these.hits$modl_ac10,probs=0.9),3),
                         Q90.ACC = signif(quantile(these.hits$modl_acc,probs=0.9),3)
                         )
    toxcast.table <- rbind(toxcast.table, this.row)
  }
}
rownames(toxcast.table) <- seq(1,dim(toxcast.table)[1])
knitr::kable(head(toxcast.table[,1:6]), caption = "Summarized ToxCast Data",
             floating.environment="sidewaystable")
```


# Step Two: High Throughput Toxicokinetics from *In Vitro* Data

For each chemical in your subset of ToxCast, add the HTTK plasma steady-state 
concentration if it is available. calc_mc_css() gives us the predicted 
steady-state plasma concentration resulting from an ongoing 1 mg/kg/day
exposure.

## Steady State Plasma Concentration $C_{ss}$
```{r calc_css1, eval = execute.vignette}
for (this.id in unique(toxcast.table$DTXSID))
{
# get_cheminfo() gives a list of all the CAS numbers for which HTTK will work:
  if (this.id %in% get_cheminfo(info="dtxsid", suppress.messages=TRUE))
  {
# Set a random number generator seed so that the Monte Carlo will always give
# the same random sequence:
    set.seed(12345)
    toxcast.table[toxcast.table$DTXSID==this.id,"Css"] <- 
      calc_mc_css(dtxsid=this.id,
                  output.units="uM",
                  samples=NSAMP,
                  suppress.messages=TRUE)
    toxcast.table[toxcast.table$DTXSID==this.id,"Css.Type"] <- "in vitro"
  }
}
```

Let's look at just the relevant columns from our table. Any Css values of "NA" 
(not a number) indicate that we don't currently have *in vitro* TK data for 
those chemicals:
```{r css_table1, eval = execute.vignette}
knitr::kable(toxcast.table[1:10,c("Compound","Q10.AC50","Css","Css.Type")], 
                                caption = "Summarized ToxCast Data",
             floating.environment="sidewaystable")
```


# Step Three: High Throughput Toxicokinetics with QSPR Predictions
For chemicals where no *in vitro* toxicokinetic data are available, we can 
sometimes load predictions from *in silico* models. We do not provide *in silico*
models for $f_{up}$ and $Cl_{int}$ themselves (in many cases those models are much larger
in terms of file size than the whole **httk** package). However we do have tables
of pre-made predictions form a variety of models. For example, you can
load the table of quantitative structure-property relationship (QSPR) model 
predictions from the 
[Sipes et al. (2017)](https://doi.org/10.1021/acs.est.7b00650) 
supplemental material:
```{r load_qspr, eval = execute.vignette}
load_sipes2017()
```
Now go through the list of chemicals again but only calculate if we didn't 
already have a value (NOTE: load_sipes2017 will not overwrite the *in vitro*
data by default, so you'll still get the same answer if you use the same random 
number seed, but you would also be wasting time):
```{r calc_css2, eval = execute.vignette}
for (this.id in unique(toxcast.table$DTXSID))
{
  if (this.id %in% get_cheminfo(info="dtxsid", suppress.messages=TRUE) &
    is.na(toxcast.table[toxcast.table$DTXSID==this.id,"Css"]))
  {
# Set a random number generator seed so that the Monte Carlo will always give
# the same random sequence:
    set.seed(12345)
    toxcast.table[toxcast.table$DTXSID==this.id,"Css"] <- 
      calc_mc_css(dtxsid=this.id,
                  output.units="uM",
                  samples=NSAMP,
                  suppress.messages=TRUE)
    toxcast.table[toxcast.table$DTXSID==this.id,"Css.Type"] <- "in silico"
  }
}
```
Take another look at our table -- many of the gaps are now filled in. 
Any Css values of "NA" 
(not a number) indicate that we don't currently have *in vitro* TK data or 
*in silico* predictions from 
[Sipes et al. (2017)](https://doi.org/10.1021/acs.est.7b00650)
for those chemicals (note that there are also predictions available from
[Pradeep et al. (2020)](https://doi.org/10.1016/j.comtox.2020.100136) 
(load_pradeep2020) and 
[Dawson et al. (2021)](https://doi.org/10.1021/acs.est.0c06117) 
(load_dawson2021):
```{r css_table2, eval = execute.vignette}
knitr::kable(toxcast.table[1:10,c("Compound","Q10.AC50","Css","Css.Type")], 
                                caption = "Summarized ToxCast Data",
             floating.environment="sidewaystable")
```


# Step Four: Reverse Dosimetry *In Vitro-In Vivo* Extrapolation
Most IVIVE calculations in **httk** take advantage of the fact that, to date,
the toxicokinetic models included in **httk** are linear in dose. What this means is
that your dose rate is $X~mg/kg/day$ and the $C_{ss}$ for $1~mg/kg/day$ is $Y~uM$, then 
the $C_{ss}$ for $X~mg/kg/day$ is $X*Y~uM$, that is:
$$C_{ss}\left(X~mg/kg/day\right) = X * C_{ss}\left(1~mg/kg/day\right)$$.
Using toxicokinetics to predict tissue concentrations resulting from a known
dose is known as "forward dosimetry". Using toxicokinetics to estimate the dose
that might have led to a known tissue cocentration is known as "reverse dosimetry"
[(Clewell, et al., 2008)](https://doi.org/10.1016/j.taap.2008.04.021).
**httk** allows the use of reverse dosimetry to calculate the equivalent 
steady-state dose to produce a plasma concentration 
equal to a bioactive concentration identified *in vitro* 
[(Breen et al., 2021)](https://doi.org/10.1080/17425255.2021.1935867). 
For example, we can start from the tenth percentile ToxCast $AC_{50}$ and find 
the administered equivalent
dose that would cause that concentration in the plasma. We do this by dividing 
the *in vitro* concentration by the $C_{ss}$ from 1 mg/kg/day,
ending up with a dose rate with units of mg/kd/day that will product the
*in vitro* concentration:
$$AED = \frac{AC_{50}}{C_{ss}\left(1~mg/kg/day\right)}~mg/kg/day$$
where b0oth $AC_{50}$ and $C_{ss}$ must have the same units. 
Do not forget that the ToxCast concentrations are given on the log-10 scale:
```{r calc_equivalent_dose16, eval = execute.vignette}
toxcast.table$EquivDose <- signif(10^toxcast.table$Q10.AC50 / toxcast.table$Css,
                                  3)
```
Look at the table again:
```{r display_table2, eval = execute.vignette}
knitr::kable(toxcast.table[1:10,c("Compound","Q10.AC50","Css","EquivDose")], 
                                 caption = "Summarized ToxCast Data",
              floating.environment="sidewaystable")          
```    

                  
# Step Five: Compare with Estimated Exposure Rates
The U.S. National Academy of Sciences, Engineering, and Medicine advises 
calculating chemical risk posed to the public health on the basis of hazard,
the dose-response relationship, and exposure 
[(NRC, 1983)](https://doi.org/10.17226/366). We can use *in vitro* bioactivity as
a surrogate for hazard and **httk** as a surrogate for the dose-reponse relationship.
However, to prioritize
chemicals on the basis of putative risk we need some estimate of human exposure.
Because most chemicals lack measured
data characterizing population intake rates (mg/kg body weight/
day) ([Egeghy, 2012](https://doi.org/10.1016/j.scitotenv.2011.10.046)), a 
consensus prediction from EPA's Systematic Empirical Evaluation of Models (SEEM)
is often the only option available for exposures. 
The SEEM consensus prediction is composed of multiple exposure predictors 
aggregated on the basis of likely exposure pathways 
([Ring, 2018](https://doi.org/10.1021/acs.est.8b04056)). The individual
exposure predictors are weighted based upon their ability to predict intake 
rates that could be inferred from biomonitoring data ([Ring,
2017](https://doi.org/10.1016/j.envint.2017.06.004); 
[Wambaugh, 2014](https://doi.org/10.1021/es503583j)).  
The SEEM exposure forecasts are uncertain, and the upper boundary of
the 95% credible interval is used here. Although a 95% credible
interval is calculated, this reflects uncertainty but not variability --
the [(Ring, 2018)](https://doi.org/10.1021/acs.est.8b04056) model predicts 
population median exposure rates for the U.S. population.

There are three ways to access the SEEM predictions. If you can access the journal article online, we can load the SEEM predictions
from [Supplemental Table 1](https://pubs.acs.org/doi/suppl/10.1021/acs.est.8b04056/suppl_file/es8b04056_si_002.zip)
of the Supporting Information for 
[Ring et al. (2018)](https://doi.org/10.1021/acs.est.8b04056). Don't forget that
you have to "unzip" a file that ends in ".zip" and place the ".txt" file in 
your working directory with your other files.
```{r load_seem1, eval = FALSE}
SEEM <- read.csv("SupTable-all.chem.preds-2018-11-28.txt",stringsAsFactors=F)
```

If you want, you can also download the predictions for specific chemicals from
the [Comptox Chemicals Dashboard](https://comptox.epa.gov/dashboard/). Be
sure to use Batch Search and select the option for "NHANES/Predicted Exposure".
You can download the file in a "comma separated values" or ".CSV" format, in 
which case you use "read.csv". Or you can download an Excel ".xls" file which
requires load the package *readXL*. 
```{r load_seem2, eval = FALSE}
SEEM <- read_excel("CCD-Batch-Search_DATE.xlsx",sheet=2)
```

Perhaps most easily we can grab SEEM predictions already in RData format from [GitHub](https://github.com/HumanExposure/SEEM3RPackage/tree/main/SEEM3Predictor/data)
Download the file Ring2018Preds.RData and load it:
```{r load_seem3, eval = FALSE}
load("Ring2018Preds.RData")
SEEM <- Ring2018.preds
```

Again we do not have the space to distribute all the SEEM predictions within
this R package, but we can give you our example chemicals:
```{r save_seem, eval = FALSE}
example.seem <- as.data.frame(subset(SEEM, 
                              dsstox_substance_id %in% toxcast.table$DTXSID))
for (this.col in 4:36) 
  example.seem[,this.col] <- signif(example.seem[,this.col],4)
save(example.seem,file="introtoivive-seem.Rdata",version=2)
```
Merge the consensus model predictions with your table:
```{r mergetoxexposure, eval = execute.vignette}
example.seem <- httk::example.seem
toxcast.table <- merge(toxcast.table,
                       example.seem[,c(
                         "dsstox_substance_id",
                         "seem3",
                         "seem3.l95",
                         "seem3.u95",
                         "Pathway",
                         "AD")],
                       by.x="DTXSID",
                       by.y="dsstox_substance_id",
                       all.x = TRUE)
```

Now we can calculate a bioactivity:exposure ratio (BER), which is a risk-like 
metric since it takes hazard, dose-response, and exposure into account
([Wetmore, 2015](https://doi.org/10.1093/toxsci/kfv171)):
```{r calc_ber, eval = execute.vignette}
toxcast.table$BER <- signif(toxcast.table$EquivDose/toxcast.table$seem3.u95, 3)
``` 

Reorder your table by BER:
```{r sort_by_ber, eval = execute.vignette}
toxcast.table <- toxcast.table[order(toxcast.table$BER),]
knitr::kable(toxcast.table[1:10,c("Compound","EquivDose","seem3.u95","Pathway","BER")], 
                                 caption = "Bioactivity:Exposure Ratios",
              floating.environment="sidewaystable")  
```

## Make a BER plot
Convert the chemical names into factors for plotting:
```{r chem_name_factors, eval = execute.vignette}
toxcast.table$Compound <- factor(toxcast.table$Compound,
                                    levels=toxcast.table$Compound)
```
Then we can create a plot where the BER for each chemical is shown by the margin 
between putative hazard (in red) and estimated exposure (in blue). The wider
the margin, the lesser the risk. Chemicals are arranged from left-to-right by
increasing margin (that is, decreasing risk).
```{r ber_plot, eval = execute.vignette}
BER.plot <- ggplot(data=toxcast.table) +
  geom_segment(aes(x=Compound,
                   xend=Compound,
                   y=(10^Q10.AC50)/Css,
                   yend=(10^Q90.AC50)/Css),
               size=1,
               color="red",
               alpha=0.5) + 
  geom_segment(aes(x=Compound,
                   xend=Compound,
                   y=seem3.l95,
                   yend=seem3.u95),
               size=1,
               color="blue",
               alpha=0.5) +   
  geom_point(aes(x=Compound,y=(10^Med.AC50)/Css),shape=3,color="red") +
  geom_point(aes(x=Compound,y=seem3),shape=3,color="blue") +
  theme_bw() +
  theme(axis.title.x = element_text(size=8)) + 
  theme(axis.title.y = element_text(size=8)) + 
  scale_y_log10() + 
  theme(axis.text.x = element_text(
    face = "bold", 
    hjust = 1, 
    vjust = 1, 
    size = 4, 
    angle = 45)) +
  ylab("Bioactive Dose & Exposure\nmg/kg BW/day") + 
  xlab("Chemicals Ranked By BER")
print(BER.plot)
```