---
title: "Troubleshooting"
author:
- name: Zachary Kurtz
  email: zdkurtz@gmail.com
output:
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r format(Sys.time(), '%B %d, %Y')`"
package: "SpiecEasi"
version: "1.99.5"
vignette: >
  %\VignetteIndexEntry{Troubleshooting}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, echo = FALSE, eval=TRUE, message=FALSE}
library(BiocStyle)
knitr::opts_knit$set(
  upload.fun = NULL,
  base.url = NULL) # use local files for images
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)
# Override BiocStyle plot hook to avoid css_align issues
knitr::knit_hooks$set(plot = function(x, options) {
  paste0('![', basename(x), '](', x, ')')
})

runchunks = if (Sys.getenv("FORCE_VIGNETTE_REBUILD", "FALSE") == "TRUE") TRUE else FALSE

cache_file <- '../vignette_cache/troubleshooting.RData'
if (!runchunks && file.exists(cache_file)) {
  load(cache_file)
  # If we loaded trimmed objects, reassign them to the names used in the chunks below
  if (exists("se.trimmed")) {
    # The first example below refers to this fit as `se3`
    se3 <- se.trimmed
    rm(se.trimmed)
  }
  if (exists("se4.trimmed")) {
    se4 <- se4.trimmed
    rm(se4.trimmed)
  }
  if (exists("se5.trimmed")) {
    se5 <- se5.trimmed
    rm(se5.trimmed)
  }
} else {
  if (!runchunks) {
    message("Cache file troubleshooting.RData not found - building from scratch")
  }
  runchunks <- TRUE
}
saveout = runchunks
```

# Troubleshooting

A common issue when running `spiec.easi` is ending up with an empty network after StARS selection.

For example:

```{r, echo=TRUE, message=FALSE, warning=TRUE, eval=runchunks}
library(SpiecEasi)
data(amgut1.filt)
pargs <- list(seed=10010)
se3 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=5e-1, nlambda=10, pulsar.params=pargs)
```

```{r, eval=TRUE}
getOptInd(se3)
sum(getRefit(se3))/2
```

As the warning indicates, the network stability could not be determined from the lambda path.
Looking at the stability along the lambda path, `se3$select$stars$summary`, we can see that the maximum value of the StARS summary statistic never crosses the default threshold (0.05).

We can fix this problem by lowering `lambda.min.ratio` to explore denser networks:

```{r, message=FALSE, warning=FALSE, eval=runchunks}
se4 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-1, nlambda=10, pulsar.params=pargs)
```

We have now fit a network, but since we have only a rough, discrete sampling of networks along the lambda path, we should check how far we are from the target stability threshold (0.05):

```{r}
getStability(se4)
sum(getRefit(se4))/2
```

To get closer to the mark, we should bump up `nlambda` to sample the lambda path more finely, which gives a denser network:

```{r, message=FALSE, warning=FALSE, eval=runchunks}
se5 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-1, nlambda=100, pulsar.params=pargs)
```

```{r, message=FALSE, warning=FALSE}
getStability(se5)
sum(getRefit(se5))/2
```

## Common issues and solutions

### 1. Empty networks

**Problem**: After running `spiec.easi`, you get an empty network (no edges).

**Solutions**:

- Lower `lambda.min.ratio` to explore denser networks
- Increase `nlambda` for finer sampling of the lambda path
- Check whether your data has a sufficient signal-to-noise ratio
- Try a different method ('mb' vs 'glasso')

### 2. Very dense networks

**Problem**: The inferred network has too many edges.

**Solutions**:

- Increase `lambda.min.ratio` to explore sparser networks
- Adjust the StARS threshold (the `thresh` entry) in `pulsar.params`
- Use cross-validation instead of StARS

### 3. Computational issues

**Problem**: The analysis takes too long or runs out of memory.

**Solutions**:

- Use parallel processing with the `ncores` entry in `pulsar.params` (Unix-like systems only)
- Use the B-StARS method (`sel.criterion='bstars'`) for large datasets
- Reduce `rep.num` in `pulsar.params`
- Use batch mode for HPC clusters

### 4. Windows parallel processing issues

**Problem**: Error "'mc.cores' > 1 is not supported on Windows"

**Solutions**:

- Use `ncores=1` for serial processing on Windows
- Use a snow cluster for parallel processing on Windows:

```{r, eval=FALSE}
library(parallel)
# "PSOCK" is the socket cluster type built into the parallel package;
# type = "SOCK" additionally requires the snow package to be installed
cl <- makeCluster(4, type = "PSOCK")
pargs.windows <- list(rep.num=50, seed=10010, cluster=cl)
# `data` is a placeholder for your own samples-by-taxa count matrix
se.windows <- spiec.easi(data, method='mb', pulsar.params=pargs.windows)
stopCluster(cl)
```

- Use batch mode, which works on all platforms
- Consider using WSL (Windows Subsystem for Linux) for a Unix-like environment

### 5. Convergence issues

**Problem**: The algorithm doesn't converge or gives warnings.

**Solutions**:

- Check data preprocessing and normalization
- Ensure the data doesn't have constant columns
- Try different starting values
- Check for missing or infinite values

### 6. Memory issues

**Problem**: R runs out of memory during analysis.

**Solutions**:

- Use sparse matrices where possible
- Reduce dataset size by filtering rare taxa
- Use batch processing for large datasets
- Increase system memory if available

## Platform-specific considerations

### Windows users:

- Default parallel processing (`mc.cores > 1`) is not supported
- Use `ncores=1` for serial processing
- Use a snow cluster for parallel processing
- Consider batch mode for large datasets

### Unix-like systems (Linux, macOS):

- Full support for parallel processing with `mc.cores`
- Can use the `ncores` parameter directly
- Both multicore and snow clusters are available

## Diagnostic functions

SpiecEasi provides several functions to help diagnose issues:

```{r, eval=FALSE}
# Check stability along lambda path
getStability(se)

# Get optimal lambda index
getOptInd(se)

# Get summary statistics
se$select$stars$summary

# Check network density
sum(getRefit(se))/2

# Visualize stability curve
plot(se$select$stars$summary)

# Check platform information
.Platform$OS.type
```

## Parameter tuning guidelines

### For small datasets (< 100 samples, < 50 taxa):

- `lambda.min.ratio = 1e-2`
- `nlambda = 20-50`
- `rep.num = 20-50`

### For medium datasets (100-1000 samples, 50-200 taxa):

- `lambda.min.ratio = 1e-3`
- `nlambda = 50-100`
- `rep.num = 50-100`
- Use parallel processing (Unix-like systems only)

### For large datasets (> 1000 samples, > 200 taxa):

- `lambda.min.ratio = 1e-4`
- `nlambda = 100+`
- `rep.num = 100+`
- Use the B-StARS method
- Consider batch processing

### Windows-specific recommendations:

- Use `ncores=1` for serial processing
- Use a snow cluster for parallel processing
- Consider batch mode for large datasets
- Use the B-StARS method to reduce computational time

Session info:

```{r}
sessionInfo()
```

```{r, echo = FALSE, eval=TRUE, message=FALSE}
# Save objects if they exist
if (exists("se3") && exists("pargs")) {
  cache_file <- '../vignette_cache/troubleshooting.RData'
  tryCatch({
    # Load trimming function and trim objects to reduce size
    source('../vignette_cache/trim_spiec_easi.R')
    # The first fit is called `se3` in the chunks above; keep the cached name `se.trimmed`
    se.trimmed <- trim_spiec_easi(se3)
    se4.trimmed <- trim_spiec_easi(se4)
    se5.trimmed <- trim_spiec_easi(se5)

    # Save trimmed objects
    save(se.trimmed, se4.trimmed, se5.trimmed, pargs, file=cache_file)
    message("Saved trimmed cache file: ", cache_file, " in directory: ", getwd())
  }, error = function(e) {
    message("Failed to save cache file: ", e$message)
  })
}
```
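
## Example: pre-flight data checks

The data checks listed under "Convergence issues" (constant columns, missing or infinite values) can be run before calling `spiec.easi`. The sketch below is one possible way to do that in base R. The helper `check_counts` and the toy matrix are hypothetical and not part of SpiecEasi; it assumes a samples-by-taxa numeric matrix and should be adapted to your own preprocessing.

```{r, eval=FALSE}
# Hypothetical helper (not part of SpiecEasi): flag common data problems
# in a samples-by-taxa count matrix before network inference.
check_counts <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x))
  finite <- is.finite(x)
  # A column is treated as constant when all of its finite entries are identical
  constant <- apply(x, 2, function(col) {
    vals <- col[is.finite(col)]
    length(unique(vals)) <= 1
  })
  list(
    n_nonfinite   = sum(!finite),        # number of NA, NaN, Inf or -Inf entries
    n_negative    = sum(x[finite] < 0),  # counts are expected to be non-negative
    constant_cols = which(constant)      # indices of columns with no variation
  )
}

# Toy data: column "b" is constant and column "c" contains a missing value
toy <- cbind(a = c(1, 5, 3), b = c(0, 0, 0), c = c(2, NA, 7))
res <- check_counts(toy)
res$n_nonfinite
res$constant_cols
```

If `constant_cols` is non-empty or `n_nonfinite` is greater than zero, consider filtering those taxa or samples before fitting, in line with the solutions listed above.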