---
title: "Quick Start: dump R docs and vignettes to text files for LLMs"
vignette: >
  %\VignetteIndexEntry{Quick Start: dump R docs and vignettes to text files for LLMs}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
execute:
  eval: false
format:
  html:
    toc: true
    toc-depth: 2
    code-overflow: wrap
---

# Introduction

`{rdocdump}` is an R package that combines an R package’s documentation and vignettes into a single plain text file. This is particularly useful when you want to feed complete package documentation to large language models (LLMs), or for archival purposes. `{rdocdump}` works with installed packages, source directories, tar.gz archives, and package sources available on CRAN.

# Installation

Install the latest stable release of `rdocdump` from CRAN with:

```r
install.packages("rdocdump")
```

You can install the development version of `rdocdump` from R Universe with:

```r
install.packages(
  'rdocdump',
  repos = c('https://e-kotov.r-universe.dev', 'https://cloud.r-project.org')
)
```

or from GitHub with:

```r
# install.packages("pak")
pak::pak("e-kotov/rdocdump")
```

# Setting the Cache Path

By default, `{rdocdump}` stores temporary files in a directory within R’s temporary directory. You can override this by setting a custom cache path with the helper function `rdd_set_cache_path()`. For example:

```r
library(rdocdump)

# Set a custom cache directory
cache_dir <- file.path(tempdir(), "my_rdocdump_cache")
rdd_set_cache_path(cache_dir)
```

```
rdocdump.cache_path set to: /private/var/folders/gb/t5zr5rn15sldqybrmqbyh6y80000gn/T/Rtmpp5wRxV/my_rdocdump_cache
```

This ensures that temporary tar.gz archives and extracted files are stored in the directory you specified.

# Extracting Documentation

The main function in `{rdocdump}` is `rdd_to_txt()`, which accepts several types of input for the `pkg` argument:

- An installed package name (e.g., `"stats"`).
- A full path to a package source directory.
- A full path to a package archive (tar.gz).
- A package name available on CRAN (downloaded automatically if not installed).

Here is an example that downloads the source for the package `{rJavaEnv}` from CRAN, extracts its documentation, and saves it to a file:

```r
# Extract documentation for 'rJavaEnv' and save to a text file.
rdd_to_txt(
  pkg = "rJavaEnv",
  file = tempfile("rJavaEnv_docs_", fileext = ".txt"),
  force_fetch = TRUE, # Force download even if the package is installed.
  keep_files = "none" # Delete temporary files after extraction.
)
```

```
Fetching package source from CRAN...
trying URL 'https://cloud.r-project.org/src/contrib/rJavaEnv_0.3.0.tar.gz'
Content type 'application/x-gzip' length 104016 bytes (101 KB)
==================================================
downloaded 101 KB

[1] "/private/var/folders/gb/t5zr5rn15sldqybrmqbyh6y80000gn/T/Rtmpp5wRxV/rJavaEnv_docs_c0736421bbc2.txt"
```

If you prefer to get the combined documentation as a character string, call the function without the `file` argument:

```r
# Extract and capture the combined documentation in a variable.
docs <- rdd_to_txt(pkg = "splines")
cat(substr(docs, 1, 1000)) # Print the first 1000 characters for a preview.
```

```
DESCRIPTION:
Package: splines
Version: 4.5.1
Priority: base
Imports: graphics, stats
Title: Regression Spline Functions and Classes
Author: Douglas M. Bates and William N. Venables
Maintainer: R Core Team
Contact: R-help mailing list
Description: Regression spline functions and classes.
License: Part of R 4.5.1
Suggests: Matrix, methods
NeedsCompilation: yes
Built: R 4.5.1; aarch64-apple-darwin20; 2025-06-14 01:29:30 UTC; unix

--------------------------------------------------------------------------------
Function: asVector()

Coerce an Object to a Vector

Description:

     This is a generic function. Methods for this function coerce
     objects of given classes to vectors.

Usage:

     asVector(object)

Arguments:

  object: An object.

Details:

     Methods for vector coercion in new classes must be created for the
     ‘asVector’ generic instead of ‘as.vector’.
```

# Choosing what to dump to text

Use the `content` argument to choose whether to combine only the package documentation, only the vignettes, or both:

```r
docs <- rdd_to_txt(
  pkg = "utils",
  content = "vignettes"
)
cat(substr(docs, 1, 1000)) # Print the first 1000 characters for a preview.
```

As you can see below, only the vignettes were combined:

```
--------------------------------------------------------------------------------
Vignette: Sweave.Rnw

% File src/library/utils/vignettes/Sweave.Rnw
% Part of the R package, https://www.R-project.org
% Copyright 2002-2022 Friedrich Leisch and the R Core Team
% Distributed under GPL 2 or later

\documentclass[a4paper]{article}

%\VignetteIndexEntry{Sweave User Manual}
%\VignettePackage{utils}
%\VignetteDepends{tools, datasets, stats, graphics}

\title{Sweave User Manual}
\author{Friedrich Leisch and R Core Team}

\usepackage[round]{natbib}
\usepackage{graphicx, Rd}
\usepackage{listings}

\lstset{frame=trbl,basicstyle=\small\tt}
\usepackage{hyperref}
\usepackage{color}
\definecolor{Blue}{rgb}{0,0,0.8}
\hypersetup{%
colorlinks,%
plainpages=true,%
linkcolor=black,%
citecolor=black,%
urlcolor=Blue,%
%pdfstartview=FitH,% or Fit
pdfstartview={XYZ null null 1},%
pdfview={XYZ null null null},%
pdfpagemode=UseNone,% for no outline
pdfauthor={Friedrich Leisch and R Core Team},%
pdftitle={Sweave
```

# Handling Temporary Files

The `keep_files` argument controls whether temporary files (the downloaded archive and/or the extracted directory) are retained:

- `"none"` (default): Delete both the tar.gz archive and the extracted files.
- `"tgz"`: Keep only the tar.gz archive.
- `"extracted"`: Keep only the extracted files.
- `"both"`: Keep both the tar.gz archive and the extracted files.

Choose the option that best fits your workflow.
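
When dumping several packages in one go, it can help to apply the same `keep_files` choice to every call. The sketch below is one possible way to do that. `dump_packages()` and its `dump_fun` argument are hypothetical names introduced here for illustration and are not part of `{rdocdump}`; the helper only relies on the `pkg`, `file`, and `keep_files` arguments of `rdd_to_txt()` described above.

```r
# Hypothetical helper (not part of rdocdump): write one text file per package.
# `dump_fun` defaults to rdocdump::rdd_to_txt; the default is evaluated lazily,
# so it is only looked up when no replacement function is supplied.
dump_packages <- function(pkgs,
                          out_dir,
                          keep_files = c("none", "tgz", "extracted", "both"),
                          dump_fun = rdocdump::rdd_to_txt) {
  # Reject anything other than the four options listed above.
  keep_files <- match.arg(keep_files)

  # Create the output directory if it does not exist yet.
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  # One output path per package, named by package.
  out_files <- file.path(out_dir, paste0(pkgs, "_docs.txt"))
  names(out_files) <- pkgs

  for (pkg in pkgs) {
    dump_fun(pkg = pkg, file = out_files[[pkg]], keep_files = keep_files)
  }

  # Return the output paths invisibly so they can be captured if needed.
  invisible(out_files)
}
```

For example, `dump_packages(c("stats", "splines"), file.path(tempdir(), "docs"))` would be expected to write `stats_docs.txt` and `splines_docs.txt` into that directory, assuming `{rdocdump}` is installed. Passing the dumping function as an argument is a design choice that lets you try the loop with a stand-in function before running it against real packages.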