{rdocdump} is an R package that combines an R package’s documentation and vignettes into a single plain text file. This is particularly useful when you want to feed complete package documentation to large language models (LLMs) or keep it for archival purposes. {rdocdump} works with installed packages, source directories, tar.gz archives, and package sources available on CRAN.
Install the latest stable release of rdocdump from CRAN with:
install.packages("rdocdump")
You can install the development version of rdocdump from R Universe with:
install.packages('rdocdump',
  repos = c('https://e-kotov.r-universe.dev', 'https://cloud.r-project.org')
)
or from GitHub with:
# install.packages("pak")
pak::pak("e-kotov/rdocdump")
By default, {rdocdump} stores temporary files in a directory within R’s temporary directory. You can override this by setting a custom cache path using the helper function rdd_set_cache_path(). For example:
# Set a custom cache directory
cache_dir <- file.path(tempdir(), "my_rdocdump_cache")
rdd_set_cache_path(cache_dir)
rdocdump.cache_path set to: /private/var/folders/gb/t5zr5rn15sldqybrmqbyh6y80000gn/T/Rtmpp5wRxV/my_rdocdump_cache
This ensures that temporary tar.gz archives and extracted files are stored in your specified directory.
The main function in rdocdump is rdd_to_txt(), which accepts several types of input for the pkg argument:
An installed package name (e.g., "stats").
A full path to a package source directory.
A full path to a package archive (tar.gz).
A package name available on CRAN (downloaded automatically if not installed).
Here is an example that downloads the source for the package {rJavaEnv} from CRAN, extracts its documentation, and saves it to a file:
# Extract documentation for 'rJavaEnv' and save to a text file.
rdd_to_txt(
  pkg = "rJavaEnv",
  file = tempfile("rJavaEnv_docs_", fileext = ".txt"),
  force_fetch = TRUE, # Force download even if the package is installed.
  keep_files = "none" # Delete temporary files after extraction.
)
Fetching package source from CRAN...
trying URL 'https://cloud.r-project.org/src/contrib/rJavaEnv_0.3.0.tar.gz'
Content type 'application/x-gzip' length 104016 bytes (101 KB)
==================================================
downloaded 101 KB
[1] "/private/var/folders/gb/t5zr5rn15sldqybrmqbyh6y80000gn/T/Rtmpp5wRxV/rJavaEnv_docs_c0736421bbc2.txt"
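As a hedged sketch of the archive input type listed earlier, the snippet below first downloads a source tarball with base R's download.packages() and then passes the resulting path to rdd_to_txt(). The repository URL and the variable names dl and archive_path are illustrative assumptions, and the example needs network access.

```r
library(rdocdump)

# Download the source archive into R's temporary directory.
# download.packages() returns a two-column matrix: package name and file path.
dl <- download.packages(
  "rJavaEnv",
  destdir = tempdir(),
  repos = "https://cloud.r-project.org",
  type = "source"
)
archive_path <- dl[1, 2]

# Pass the full path to the tar.gz archive as the pkg argument.
docs <- rdd_to_txt(pkg = archive_path)
```

This route may be convenient when the archive comes from somewhere other than CRAN, since rdd_to_txt() only needs the path to the file.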
If you prefer to simply get the combined documentation as a character string, call the function without the file argument:
# Extract and capture the combined documentation in a variable.
docs <- rdd_to_txt(pkg = "splines")
cat(substr(docs, 1, 1000)) # Print the first 1000 characters for a preview.
DESCRIPTION:
Package: splines
Version: 4.5.1
Priority: base
Imports: graphics, stats
Title: Regression Spline Functions and Classes
Author: Douglas M. Bates <bates@stat.wisc.edu> and
William N. Venables <Bill.Venables@csiro.au>
Maintainer: R Core Team <do-use-Contact-address@r-project.org>
Contact: R-help mailing list <r-help@r-project.org>
Description: Regression spline functions and classes.
License: Part of R 4.5.1
Suggests: Matrix, methods
NeedsCompilation: yes
Built: R 4.5.1; aarch64-apple-darwin20; 2025-06-14 01:29:30 UTC; unix
--------------------------------------------------------------------------------
Function: asVector()
Coerce an Object to a Vector
Description:
This is a generic function. Methods for this function coerce
objects of given classes to vectors.
Usage:
asVector(object)
Arguments:
object: An object.
Details:
Methods for vector coercion in new classes must be created for the
‘asVector’ generic instead of ‘as.vector’.
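Because the combined text marks each entry with a header line such as "Function: asVector()", a quick way to see what a dump contains is to pull out those header lines. The helper below is a minimal base R sketch, not part of rdocdump: list_sections() is a hypothetical name, and the pattern assumes headers start with "Function: " or "Vignette: " as in the previews in this document. It runs on a stand-in string so it does not need the package.

```r
# Hypothetical helper (not part of rdocdump): return the header lines
# that introduce each function or vignette in a combined text dump.
list_sections <- function(txt) {
  lines <- unlist(strsplit(txt, "\n", fixed = TRUE))
  grep("^(Function|Vignette): ", lines, value = TRUE)
}

# Stand-in text that mimics the layout of the preview above.
sample_txt <- paste(
  "DESCRIPTION:",
  "Package: splines",
  strrep("-", 80),
  "Function: asVector()",
  "Coerce an Object to a Vector",
  strrep("-", 80),
  "Function: bs()",
  sep = "\n"
)

list_sections(sample_txt)
```

With real output, you would pass the string returned by rdd_to_txt() instead of sample_txt.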
You can also use the content argument to choose whether to combine only the package documentation, only the vignettes, or both:
docs <- rdd_to_txt(
  pkg = "utils",
  content = "vignettes"
)
cat(substr(docs, 1, 1000)) # Print the first 1000 characters for a preview.
As you can see below, only the vignettes were combined:
--------------------------------------------------------------------------------
Vignette: Sweave.Rnw
% File src/library/utils/vignettes/Sweave.Rnw
% Part of the R package, https://www.R-project.org
% Copyright 2002-2022 Friedrich Leisch and the R Core Team
% Distributed under GPL 2 or later
\documentclass[a4paper]{article}
%\VignetteIndexEntry{Sweave User Manual}
%\VignettePackage{utils}
%\VignetteDepends{tools, datasets, stats, graphics}
\title{Sweave User Manual}
\author{Friedrich Leisch and R Core Team}
\usepackage[round]{natbib}
\usepackage{graphicx, Rd}
\usepackage{listings}
\lstset{frame=trbl,basicstyle=\small\tt}
\usepackage{hyperref}
\usepackage{color}
\definecolor{Blue}{rgb}{0,0,0.8}
\hypersetup{%
colorlinks,%
plainpages=true,%
linkcolor=black,%
citecolor=black,%
urlcolor=Blue,%
%pdfstartview=FitH,% or Fit
pdfstartview={XYZ null null 1},%
pdfview={XYZ null null null},%
pdfpagemode=UseNone,% for no outline
pdfauthor={Friedrich Leisch and R Core Team},%
pdftitle={Sweave
The argument keep_files controls whether temporary files (the downloaded archive and/or extracted directory) are retained:
"none" (default): Delete both the tar.gz archive and the extracted files.
"tgz": Keep only the tar.gz archive.
"extracted": Keep only the extracted files.
"both": Keep both the tar.gz archive and the extracted files.
Choose the option that best fits your workflow.
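As an illustration, the sketch below combines a custom cache path with keep_files = "tgz" and then lists what remains in the cache directory. It assumes that retained archives end up under the directory passed to rdd_set_cache_path(), as described earlier. The exact file layout inside that directory is not specified here, so the listing is only a way to inspect it. Network access is required.

```r
library(rdocdump)

# Use a dedicated cache directory so retained files are easy to find.
cache_dir <- file.path(tempdir(), "my_rdocdump_cache")
rdd_set_cache_path(cache_dir)

# Fetch the source from CRAN and keep only the downloaded tar.gz archive.
docs <- rdd_to_txt(
  pkg = "rJavaEnv",
  force_fetch = TRUE,
  keep_files = "tgz"
)

# Inspect what was left behind in the cache directory.
list.files(cache_dir, recursive = TRUE)
```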