---
title: "Troubleshooting"
author: 
  - name: Zachary Kurtz
    email: zdkurtz@gmail.com
output: 
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r format(Sys.time(), '%B %d, %Y')`"
package: "SpiecEasi"
version: "1.99.5"
vignette: >
  %\VignetteIndexEntry{Troubleshooting}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}  
---

```{r setup, echo = FALSE, eval=TRUE, message=FALSE}
library(BiocStyle)
knitr::opts_knit$set(
  upload.fun = NULL,
  base.url = NULL) # use local files for images
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)
# Override BiocStyle plot hook to avoid css_align issues
knitr::knit_hooks$set(plot = function(x, options) {
  paste0('![', basename(x), '](', x, ')')
})
runchunks = if (Sys.getenv("FORCE_VIGNETTE_REBUILD", "FALSE") == "TRUE") TRUE else FALSE

cache_file <- '../vignette_cache/troubleshooting.RData'
if (!runchunks && file.exists(cache_file)) {
  load(cache_file)
  # If we loaded trimmed objects, reassign them to original names
  # (se.trimmed holds the se3 fit; the name is kept for cache compatibility)
  if (exists("se.trimmed")) {
    se3 <- se.trimmed
    rm(se.trimmed)
  }
  if (exists("se4.trimmed")) {
    se4 <- se4.trimmed
    rm(se4.trimmed)
  }
  if (exists("se5.trimmed")) {
    se5 <- se5.trimmed
    rm(se5.trimmed)
  }
} else {
  if (!runchunks) {
    message("Cache file troubleshooting.RData not found - building from scratch")
  }
  runchunks <- TRUE
}
saveout   = runchunks

```

# Troubleshooting

A common issue when running `spiec.easi` is ending up with an empty network after StARS model selection.

For example:

```{r, echo=TRUE, message=FALSE, warning=TRUE, eval=runchunks}
library(SpiecEasi)
data(amgut1.filt)

pargs <- list(seed=10010)
se3 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=5e-1, nlambda=10, pulsar.params=pargs)
```
```{r, eval=TRUE}
getOptInd(se3)
sum(getRefit(se3))/2
```
As the warning indicates, StARS could not locate the target stability along this lambda path. Inspecting the stability summary along the path, `se3$select$stars$summary`, shows that the StARS instability statistic never reaches the default threshold (0.05), so the selected network is empty.
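
To see why this produces an empty network, the selection step can be sketched in base R. The helper `stars_pick()` below is hypothetical and simplified; it is not the code used by SpiecEasi or pulsar. It assumes the instability summary is ordered from the sparsest network (largest lambda) to the densest, monotonizes it with `cummax()` as in the original StARS proposal, and returns the last index whose instability is still at or below the threshold. It returns `NA` when the path never brackets the threshold, as happens for `se3`. The instability values are made up.

```{r}
# Hypothetical, simplified StARS selection rule (not SpiecEasi's internal code).
# `instability` must be ordered from sparsest (largest lambda) to densest.
stars_pick <- function(instability, thresh = 0.05) {
  mono <- cummax(instability)       # monotonized instability, non-decreasing
  stable <- which(mono <= thresh)   # indices still within the threshold
  if (length(stable) == 0 || length(stable) == length(mono)) {
    # Either every network is already too unstable, or the threshold is never
    # reached: in both cases the path does not bracket the target.
    return(NA_integer_)
  }
  max(stable)                       # densest network that is still stable
}

# Made-up instability curves, for illustration only
too_sparse <- c(0, 0, 0.001, 0.004, 0.010)   # never reaches 0.05
brackets   <- c(0, 0.01, 0.03, 0.048, 0.07, 0.12)
stars_pick(too_sparse)   # NA: extend the path with a smaller lambda.min.ratio
stars_pick(brackets)     # 4
```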

We can fix this by lowering `lambda.min.ratio`, which extends the path to smaller penalties and therefore denser networks:

```{r, message=FALSE, warning=FALSE, eval=runchunks}
se4 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-1, nlambda=10, pulsar.params=pargs)
```
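
`lambda.min.ratio` sets how far down the path goes: the smallest penalty is `lambda.min.ratio` times the largest. The sketch below assumes a log-spaced path between those endpoints, which we believe matches the `spiec.easi` default but have not taken from its source. `lambda_path()` is a hypothetical helper, and `lambda.max = 1` is a placeholder because the real value is derived from the data.

```{r}
# Hypothetical helper: log-spaced lambda path from lambda.max down to
# lambda.min.ratio * lambda.max (assumed to mirror spiec.easi's default).
lambda_path <- function(lambda.max, lambda.min.ratio, nlambda) {
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}

# lambda.max = 1 is a placeholder; spiec.easi derives it from the data
path_se3 <- lambda_path(1, lambda.min.ratio = 5e-1, nlambda = 10)
path_se4 <- lambda_path(1, lambda.min.ratio = 1e-1, nlambda = 10)
min(path_se3)   # 0.5: the path stops at half of lambda.max
min(path_se4)   # 0.1: smaller penalties, hence denser candidate networks
```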

We have now fit a network, but since we have only a rough, discrete sampling of networks along the lambda path, we should check how far we are from the target stability threshold (0.05):

```{r}
getStability(se4)
sum(getRefit(se4))/2
```

To get closer to the mark, we can increase `nlambda` to sample the lambda path more finely, which here yields a denser network:

```{r, message=FALSE, warning=FALSE, eval=runchunks}
se5 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-1, nlambda=100, pulsar.params=pargs)
```

```{r,  message=FALSE, warning=FALSE}
getStability(se5)
sum(getRefit(se5))/2
```
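
The gain from `nlambda = 100` can be quantified with a little arithmetic. Assuming a log-spaced lambda path, consecutive lambda values differ by the constant factor `lambda.min.ratio^(1/(nlambda - 1))`; the hypothetical `step_factor()` helper below computes it. Smaller steps let StARS stop much closer to the threshold.

```{r}
# Assuming a log-spaced lambda path, consecutive lambdas differ by the
# constant factor lambda.min.ratio^(1 / (nlambda - 1)).
# step_factor() is a hypothetical helper for this calculation.
step_factor <- function(lambda.min.ratio, nlambda) {
  lambda.min.ratio^(1 / (nlambda - 1))
}

step_factor(1e-1, nlambda = 10)    # about 0.774: each step cuts lambda by ~23%
step_factor(1e-1, nlambda = 100)   # about 0.977: each step cuts lambda by ~2%
```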

## Common issues and solutions

### 1. Empty networks

**Problem**: After running `spiec.easi`, you get an empty network (no edges).

**Solutions**:
- Lower `lambda.min.ratio` to explore denser networks
- Increase `nlambda` for finer sampling of the lambda path
- Check if your data has sufficient signal-to-noise ratio
- Try different methods ('mb' vs 'glasso')
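
The first remedy can be automated as a simple escalation loop. The sketch below is hypothetical: `lower_until_nonempty()` is not a SpiecEasi function, and `fit_fun` is any function that takes a `lambda.min.ratio` and returns a symmetric adjacency matrix with a zero diagonal. A made-up `toy_fit` stands in for a real fit so the chunk runs on its own. With SpiecEasi, `fit_fun` could wrap `spiec.easi()` followed by `getRefit()`, as in the commented lines.

```{r}
# Hypothetical helper: try progressively smaller lambda.min.ratio values
# until the selected network has at least one edge.
lower_until_nonempty <- function(fit_fun, ratios = c(5e-1, 1e-1, 1e-2, 1e-3)) {
  for (r in ratios) {
    adj <- as.matrix(fit_fun(r))
    n_edges <- sum(adj != 0) / 2   # symmetric matrix: each edge counted twice
    if (n_edges > 0) return(list(lambda.min.ratio = r, n_edges = n_edges))
  }
  warning("network is empty for every lambda.min.ratio tried")
  list(lambda.min.ratio = NA_real_, n_edges = 0)
}

# Toy stand-in for a real fit: a 3-node network that gains an edge once
# lambda.min.ratio <= 0.1 (made-up behaviour for illustration)
toy_fit <- function(r) {
  adj <- matrix(0, 3, 3)
  if (r <= 1e-1) adj[1, 2] <- adj[2, 1] <- 1
  adj
}
lower_until_nonempty(toy_fit)   # stops at lambda.min.ratio = 0.1 with 1 edge

# With SpiecEasi, something like (not run here):
# fit_fun <- function(r) getRefit(spiec.easi(amgut1.filt, method = 'mb',
#   lambda.min.ratio = r, nlambda = 20, pulsar.params = pargs))
```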

### 2. Very dense networks

**Problem**: The inferred network has too many edges.

**Solutions**:
- Increase `lambda.min.ratio` to explore sparser networks
- Lower the StARS threshold (`thresh` in `pulsar.params`, default 0.05); a smaller threshold selects a sparser, more stable network
- Use cross-validation instead of StARS
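
Whether a network is "too dense" is easier to judge as a fraction of all possible edges than as a raw edge count. `edge_density()` below is a hypothetical helper (not part of SpiecEasi) that computes this fraction from a symmetric adjacency matrix, such as the one returned by `getRefit()`; the toy matrix is made up.

```{r}
# Hypothetical helper: fraction of the p * (p - 1) / 2 possible undirected
# edges that are present in a symmetric adjacency matrix.
edge_density <- function(adj) {
  adj <- as.matrix(adj)
  p <- ncol(adj)
  n_edges <- sum(adj[upper.tri(adj)] != 0)   # read each edge once
  n_edges / (p * (p - 1) / 2)
}

# Made-up 4-node network with 3 of the 6 possible edges
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- 1
adj[2, 3] <- adj[3, 2] <- 1
adj[3, 4] <- adj[4, 3] <- 1
edge_density(adj)   # 0.5
# For a fitted model: edge_density(getRefit(se))
```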

### 3. Computational issues

**Problem**: The analysis takes too long or runs out of memory.

**Solutions**:
- Use parallel processing via the `ncores` entry of `pulsar.params` (Unix-like systems; see the Windows notes below)
- Use the B-StARS selection method (`sel.criterion='bstars'`) for large datasets
- Reduce `rep.num` in `pulsar.params`
- Use batch mode for HPC clusters
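
A back-of-the-envelope cost model helps decide which knob to turn. It rests on an assumption, not on SpiecEasi's own accounting: StARS refits the lambda path once per subsample, so total work scales with `rep.num * nlambda`, and parallel workers split the subsamples between them. `stars_work()` is a hypothetical helper.

```{r}
# Hypothetical cost model (assumption: one path fit per subsample, each path
# costing about nlambda single-lambda fits, subsamples split across cores).
stars_work <- function(rep.num, nlambda, ncores = 1) {
  c(total_fits    = rep.num * nlambda,
    fits_per_core = ceiling(rep.num / ncores) * nlambda)
}

stars_work(rep.num = 20, nlambda = 10)               # 200 fits in total
stars_work(rep.num = 50, nlambda = 100)              # 5000 fits, 25 times more
stars_work(rep.num = 50, nlambda = 100, ncores = 4)  # 1300 fits per core
```

Under this model, halving `rep.num` halves the total work, while adding cores only reduces wall-clock time.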

### 4. Windows parallel processing issues

**Problem**: Error "'mc.cores' > 1 is not supported on Windows"

**Solutions**:
- Use `ncores=1` for serial processing on Windows
- Use a snow (socket) cluster for parallel processing on Windows:
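```{r, eval=FALSE}
library(SpiecEasi)
library(parallel)
# PSOCK is the socket ("snow-style") cluster built into the parallel package;
# it works on Windows without installing snow
cl <- makeCluster(4, type = "PSOCK")
pargs.windows <- list(rep.num=50, seed=10010, cluster=cl)
se.windows <- tryCatch(
  spiec.easi(amgut1.filt, method='mb', pulsar.params=pargs.windows),
  finally = stopCluster(cl))  # always shut down the worker processes
```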
- Use batch mode, which works on all platforms
- Consider using WSL (Windows Subsystem for Linux) for a Unix-like environment

### 5. Convergence issues

**Problem**: The algorithm doesn't converge or gives warnings.

**Solutions**:
- Check data preprocessing and normalization
- Ensure data doesn't have constant columns
- Try different starting values
- Check for missing or infinite values
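
These checks can be run in one pass before fitting. `check_counts()` below is a hypothetical helper, not a SpiecEasi function; it assumes samples are in rows and taxa in columns, and it also flags negative entries because `spiec.easi` is designed for count data. The example table is made up.

```{r}
# Hypothetical helper: flag common input problems before calling spiec.easi().
# Rows are samples and columns are taxa.
check_counts <- function(x) {
  x <- as.matrix(x)
  is_constant <- apply(x, 2, function(col) length(unique(col[!is.na(col)])) <= 1)
  list(
    n_missing        = sum(is.na(x)),
    n_infinite       = sum(is.infinite(x)),
    n_negative       = sum(x < 0, na.rm = TRUE),  # counts should be >= 0
    constant_columns = which(is_constant)
  )
}

# Made-up example: taxon "b" is constant, taxon "c" has an NA and an Inf
x <- cbind(a = c(1, 2, 3), b = c(5, 5, 5), c = c(0, NA, Inf))
check_counts(x)
```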

### 6. Memory issues

**Problem**: R runs out of memory during analysis.

**Solutions**:
- Use sparse matrices where possible
- Reduce dataset size by filtering rare taxa
- Use batch processing for large datasets
- Increase system memory if available
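
Filtering rare taxa shrinks the number of columns p, and the p × p matrices estimated along the lambda path shrink roughly quadratically with it. A prevalence filter might look like the sketch below; `filter_prevalence()` is hypothetical, and the 30% cutoff is an arbitrary example value rather than a recommendation.

```{r}
# Hypothetical helper: keep taxa (columns) observed in at least `min_prev`
# of the samples (rows). The 0.3 default is an arbitrary example cutoff.
filter_prevalence <- function(counts, min_prev = 0.3) {
  counts <- as.matrix(counts)
  prevalence <- colMeans(counts > 0)   # fraction of samples with a nonzero count
  counts[, prevalence >= min_prev, drop = FALSE]
}

# Made-up 4-sample table: taxon "t3" appears in only 1 of 4 samples
counts <- cbind(t1 = c(3, 0, 5, 2), t2 = c(1, 1, 0, 4), t3 = c(0, 0, 7, 0))
dim(filter_prevalence(counts))   # 4 samples, 2 taxa kept
colnames(filter_prevalence(counts))
```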

## Platform-specific considerations

### Windows users:
- Default parallel processing (`mc.cores > 1`) is not supported
- Use `ncores=1` for serial processing
- Use a snow (socket) cluster for parallel processing
- Consider batch mode for large datasets

### Unix-like systems (Linux, macOS):
- Full support for parallel processing with `mc.cores`
- The `ncores` parameter can be used directly
- Both multicore (forked) and snow (socket) clusters are available
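
The platform rules above can be folded into a small helper that decides how to request parallelism. `parallel_plan()` is hypothetical; it only inspects `.Platform$OS.type`, which is `"windows"` on Windows and `"unix"` on Linux and macOS, and returns an `ncores` value plus a flag saying whether a socket cluster should be used instead.

```{r}
# Hypothetical helper: choose how to parallelize based on the OS type.
parallel_plan <- function(cores = 4, os = .Platform$OS.type) {
  if (identical(os, "windows")) {
    # forked workers (mc.cores > 1) are unavailable on Windows
    list(ncores = 1, use_socket_cluster = cores > 1)
  } else {
    # Linux and macOS report "unix" and can fork workers directly
    list(ncores = cores, use_socket_cluster = FALSE)
  }
}

str(parallel_plan(4, os = "windows"))
str(parallel_plan(4, os = "unix"))
parallel_plan()$ncores   # depends on the machine building this document
```

On Unix-like systems the returned `ncores` could then be placed in `pulsar.params`, for example `list(rep.num = 50, ncores = parallel_plan()$ncores)`.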

## Diagnostic functions

SpiecEasi provides several functions to help diagnose issues:

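```{r, eval=FALSE}
# `se` stands for any object returned by spiec.easi()

# Check stability along lambda path
getStability(se)

# Get optimal lambda index
getOptInd(se)

# StARS instability at each lambda on the path
se$select$stars$summary

# Count the edges in the selected network
sum(getRefit(se))/2

# Visualize the stability curve against the default threshold (0.05)
plot(se$select$stars$summary, type = "b",
     xlab = "lambda index", ylab = "StARS instability")
abline(h = 0.05, lty = 2)

# Check platform information
.Platform$OS.type
```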
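
The `sum(getRefit(se))/2` idiom divides by two because the refit network is a symmetric adjacency matrix, so every undirected edge appears once above and once below the diagonal. The toy matrix below is made up. On real output, `rowSums()` of the same matrix gives each node's degree, which is a quick way to spot highly connected taxa.

```{r}
# Made-up symmetric adjacency matrix standing in for getRefit(se)
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- 1
adj[1, 3] <- adj[3, 1] <- 1
sum(adj) / 2               # 2 edges: each undirected edge is stored twice
sum(adj[upper.tri(adj)])   # same count, reading one triangle only
rowSums(adj)               # node degrees: 2 1 1 0
```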

## Parameter tuning guidelines

### For small datasets (< 100 samples, < 50 taxa):
- `lambda.min.ratio = 1e-2`
- `nlambda = 20-50`
- `rep.num = 20-50`

### For medium datasets (100-1000 samples, 50-200 taxa):
- `lambda.min.ratio = 1e-3`
- `nlambda = 50-100`
- `rep.num = 50-100`
- Use parallel processing (`ncores` on Unix-like systems; a socket cluster on Windows)

### For large datasets (> 1000 samples, > 200 taxa):
- `lambda.min.ratio = 1e-4`
- `nlambda = 100+`
- `rep.num = 100+`
- Use B-StARS method
- Consider batch processing

### Windows-specific recommendations:
- Use `ncores=1` for serial processing
- Use a snow (socket) cluster for parallel processing
- Consider batch mode for large datasets
- Use B-StARS method to reduce computational time
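
The size tiers above can be encoded as a starting-point lookup. `suggest_params()` is a hypothetical helper, not part of SpiecEasi. The guidelines do not cover mixed cases (for example few samples but many taxa), so the sketch arbitrarily uses the larger of the two tiers and the low end of each suggested range.

```{r}
# Hypothetical helper encoding the rough guidelines above. Mixed cases are
# not covered by the guidelines; here we arbitrarily use the larger tier.
suggest_params <- function(n_samples, n_taxa) {
  tier_of <- function(x, lo, hi) if (x < lo) 1L else if (x <= hi) 2L else 3L
  tier <- max(tier_of(n_samples, 100, 1000), tier_of(n_taxa, 50, 200))
  list(
    list(lambda.min.ratio = 1e-2, nlambda = 20,  rep.num = 20),   # small
    list(lambda.min.ratio = 1e-3, nlambda = 50,  rep.num = 50),   # medium
    list(lambda.min.ratio = 1e-4, nlambda = 100, rep.num = 100)   # large
  )[[tier]]
}

str(suggest_params(n_samples = 250, n_taxa = 120))   # medium-tier settings
```

The values are only a starting point; the stability checks from the diagnostic functions should still decide whether `lambda.min.ratio` or `nlambda` needs adjusting.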

Session info:
```{r}
sessionInfo()
```

```{r, echo = FALSE, eval=TRUE, message=FALSE}
# Save objects if they exist
if (exists("se3") && exists("pargs")) {
  cache_file <- '../vignette_cache/troubleshooting.RData'
  tryCatch({
    # Load trimming function and trim objects to reduce size
    source('../vignette_cache/trim_spiec_easi.R')
    # se3 is stored as se.trimmed to stay compatible with existing caches
    se.trimmed <- trim_spiec_easi(se3)
    se4.trimmed <- trim_spiec_easi(se4)
    se5.trimmed <- trim_spiec_easi(se5)
    
    # Save trimmed objects
    save(se.trimmed, se4.trimmed, se5.trimmed, pargs, file=cache_file)
    message("Saved trimmed cache file: ", cache_file, " in directory: ", getwd())
  }, error = function(e) {
    message("Failed to save cache file: ", e$message)
  })
}
```