Contents

1 FLAMES

The overhauled FLAMES 2.3.1 pipeline provides convenient pipelines for performing single-cell and bulk long-read analysis of mutations and isoforms. The pipeline is designed to take various type of experiments, e.g. with or without known cell barcodes and custome cell barcode designs, to reduce the need of constantly re-inventing the wheel for every new sequencing protocol.

Figure 1: FLAMES pipeline


1.1 Creating a pipeline

To start your long-read RNA-seq analysis, simply create a pipeline via either BulkPipeline(), SingleCellPipeline() or MultiSampleSCPipeline(). Let’s try SingleCellPipeline() first:

outdir <- tempfile()
dir.create(outdir)
# some example data
# known cell barcodes, e.g. from coupled short-read sequencing
bc_allow <- file.path(outdir, "bc_allow.tsv")
R.utils::gunzip(
  filename = system.file("extdata", "bc_allow.tsv.gz", package = "FLAMES"),
  destname = bc_allow, remove = FALSE
)
# reference genome
genome_fa <- file.path(outdir, "rps24.fa")
R.utils::gunzip(
  filename = system.file("extdata", "rps24.fa.gz", package = "FLAMES"),
  destname = genome_fa, remove = FALSE
)

pipeline <- SingleCellPipeline(
  # use the default configs
  config_file = create_config(outdir),
  outdir = outdir,
  # the input fastq file
  fastq = system.file("extdata", "fastq", "musc_rps24.fastq.gz", package = "FLAMES"),
  # reference annotation file
  annotation = system.file("extdata", "rps24.gtf.gz", package = "FLAMES"),
  genome_fa = genome_fa,
  barcodes_file = bc_allow
)
#> Writing configuration parameters to:  /tmp/RtmpXPeAYp/file16a7494daed072/config_file_1484617.json
#> Configured steps: 
#>  barcode_demultiplex: TRUE
#>  genome_alignment: TRUE
#>  gene_quantification: TRUE
#>  isoform_identification: TRUE
#>  read_realignment: TRUE
#>  transcript_quantification: TRUE
#> samtools not found, will use Rsamtools package instead
pipeline
#> → A FLAMES.SingleCellPipeline outputting to '/tmp/RtmpXPeAYp/file16a7494daed072'
#> 
#> ── Inputs
#> ✔ fastq: '...445136/FLAMES/extdata/fastq/musc_rps24.fastq.gz'
#> ✔ annotation: '...Rinst16369d65445136/FLAMES/extdata/rps24.gtf.gz'
#> ✔ genome_fa: '/tmp/RtmpXPeAYp/file16a7494daed072/rps24.fa'
#> ✔ barcodes_file: '/tmp/RtmpXPeAYp/file16a7494daed072/bc_allow.tsv'
#> 
#> ── Outputs
#> ℹ demultiplexed_fastq: 'matched_reads.fastq.gz'
#> ℹ deduped_fastq: 'matched_reads_dedup.fastq.gz'
#> ℹ genome_bam: 'align2genome.bam'
#> ℹ transcriptome_assembly: 'transcript_assembly.fa'
#> ℹ transcriptome_bam: 'realign2transcript.bam'
#> 
#> ── Pipeline Steps
#> ℹ barcode_demultiplex (pending)
#> ℹ genome_alignment (pending)
#> ℹ gene_quantification (pending)
#> ℹ isoform_identification (pending)
#> ℹ read_realignment (pending)
#> ℹ transcript_quantification (pending)

1.2 Running the pipeline

To execute the pipeline, simply call run_FLAMES(pipeline). This will run through all the steps in the pipeline, returning a updated pipeline object:

pipeline <- run_FLAMES(pipeline)
#> ── Running step: barcode_demultiplex @ Wed Jun 18 17:34:39 2025 ────────────────
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be  replaced
#> Setting number of threads to 8
#> Search pattern: 
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpXPeAYp/file16a7494daed072/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmppR5wzP/Rinst16369d65445136/FLAMES/extdata/fastq/musc_rps24.fastq.gz
#> Searching for barcodes...
#> Number of reads processed: 393
#> Number of reads where at least one barcode was found: 368
#> Number of reads with exactly one barcode match: 364
#> Number of chimera reads: 1
#> All done!
#> Using Python: /home/biocbuild/.pyenv/versions/3.11.9/bin/python3.11
#> Creating virtual environment '/home/biocbuild/.cache/R/basilisk/1.21.5/FLAMES/2.3.3/flames_env' ...
#> + /home/biocbuild/.pyenv/versions/3.11.9/bin/python3.11 -m venv /home/biocbuild/.cache/R/basilisk/1.21.5/FLAMES/2.3.3/flames_env
#> Done!
#> Installing packages: pip, wheel, setuptools
#> + /home/biocbuild/.cache/R/basilisk/1.21.5/FLAMES/2.3.3/flames_env/bin/python -m pip install --upgrade pip wheel setuptools
#> Installing packages: 'numpy==2.1.1', 'scipy==1.14.1', 'pysam==0.22.1', 'cutadapt==4.9', 'tqdm==4.66.5', 'pandas==2.2.3', 'fast-edit-distance==1.2.2', 'blaze2==2.5.1', 'matplotlib==3.9.1'
#> + /home/biocbuild/.cache/R/basilisk/1.21.5/FLAMES/2.3.3/flames_env/bin/python -m pip install --upgrade --no-user 'numpy==2.1.1' 'scipy==1.14.1' 'pysam==0.22.1' 'cutadapt==4.9' 'tqdm==4.66.5' 'pandas==2.2.3' 'fast-edit-distance==1.2.2' 'blaze2==2.5.1' 'matplotlib==3.9.1'
#> Virtual environment '/home/biocbuild/.cache/R/basilisk/1.21.5/FLAMES/2.3.3/flames_env' successfully created.
#> ── Running step: genome_alignment @ Wed Jun 18 17:35:08 2025 ───────────────────
#> Creating junction bed file from GFF3 annotation.
#> Aligning sample /tmp/RtmpXPeAYp/file16a7494daed072/matched_reads.fastq.gz -> /tmp/RtmpXPeAYp/file16a7494daed072/align2genome.bam
#> Your fastq file appears to have tags, but you did not provide the -y option to minimap2 to include the tags in the output.
#> Warning in FLAMES:::minimap2_align(fq_in = fastqs[i], fa_file = genome, :
#> samtools not found, using Rsamtools instead, this could be slower and might
#> fail for large BAM files.
#> Sorting BAM files by genome coordinates with 8 threads...
#> Indexing bam files
#> ── Running step: gene_quantification @ Wed Jun 18 17:35:09 2025 ────────────────
#> 17:35:09 Wed Jun 18 2025 quantify genes
#> Using BAM(s): '/tmp/RtmpXPeAYp/file16a7494daed072/align2genome.bam'
#> Assigning reads to genes...
#> Writing the gene count matrix ...
#> Plotting the saturation curve ...
#> Generating deduplicated fastq file ...
#> ── Running step: isoform_identification @ Wed Jun 18 17:35:10 2025 ─────────────
#> #### Read gene annotations
#>  Removed similar transcripts in gene annotation: Counter()
#> #### find isoforms
#> chr14
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ... OK
#> Make the TxDb object ...
#> Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version information
#> is not available for this TxDb object
#> OK
#> ── Running step: read_realignment @ Wed Jun 18 17:35:12 2025 ───────────────────
#> Checking for fastq file(s) /tmp/RtmppR5wzP/Rinst16369d65445136/FLAMES/extdata/fastq/musc_rps24.fastq.gz
#>  files found
#> Checking for fastq file(s) /tmp/RtmpXPeAYp/file16a7494daed072/matched_reads.fastq.gz
#>  files found
#> Checking for fastq file(s) /tmp/RtmpXPeAYp/file16a7494daed072/matched_reads_dedup.fastq.gz
#>  files found
#> Realigning sample /tmp/RtmpXPeAYp/file16a7494daed072/matched_reads_dedup.fastq.gz -> /tmp/RtmpXPeAYp/file16a7494daed072/realign2transcript.bam
#> Warning in minimap2_align(fq_in = fastqs[i], fa_file =
#> pipeline@transcriptome_assembly, : samtools not found, using Rsamtools instead,
#> this could be slower and might fail for large BAM files.
#> Sorting BAM files by 8 with CB threads...
#> 
#> ── Running step: transcript_quantification @ Wed Jun 18 17:35:12 2025 ──────────
#> oarfish not found. Downloading it now.
#> oarfish binary downloaded to /home/biocbuild/.cache/R/FLAMES/2.3.3/oarfish
#> Import genomic features from the file as a GRanges object ... OK
#> Prepare the 'metadata' data frame ... OK
#> Make the TxDb object ...
#> Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version information
#> is not available for this TxDb object
#> OK
#> Import genomic features from the file as a GRanges object ... OK
#> Prepare the 'metadata' data frame ... OK
#> Make the TxDb object ...
#> Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version information
#> is not available for this TxDb object
#> OK
pipeline
#> ✔ A FLAMES.SingleCellPipeline outputting to '/tmp/RtmpXPeAYp/file16a7494daed072'
#> 
#> ── Inputs 
#> ✔ fastq: '...445136/FLAMES/extdata/fastq/musc_rps24.fastq.gz'
#> ✔ annotation: '...Rinst16369d65445136/FLAMES/extdata/rps24.gtf.gz'
#> ✔ genome_fa: '/tmp/RtmpXPeAYp/file16a7494daed072/rps24.fa'
#> ✔ barcodes_file: '/tmp/RtmpXPeAYp/file16a7494daed072/bc_allow.tsv'
#> 
#> ── Outputs 
#> ✔ demultiplexed_fastq: 'matched_reads.fastq.gz' [217.0 KB]
#> ✔ deduped_fastq: 'matched_reads_dedup.fastq.gz' [203.9 KB]
#> ✔ genome_bam: 'align2genome.bam' [267.5 KB]
#> ✔ novel_isoform_annotation: 'isoform_annotated.gff3' [7.4 KB]
#> ✔ transcriptome_assembly: 'transcript_assembly.fa' [8.4 KB]
#> ✔ transcriptome_bam: 'realign2transcript.bam' [392.3 KB]
#> 
#> ── Pipeline Steps 
#> ✔ barcode_demultiplex (completed in 28.88 sec)
#> ✔ genome_alignment (completed in 0.42 sec)
#> ✔ gene_quantification (completed in 1.55 sec)
#> ✔ isoform_identification (completed in 1.57 sec)
#> ✔ read_realignment (completed in 0.23 sec)
#> ✔ transcript_quantification (completed in 3.89 sec)

If you run into any error, run_FLAMES() will stop and return the pipeline object with the error message. After resolving the error, you can run resume_FLAMES(pipeline) to continue the pipeline from the last step. There is also run_step(pipeline, step_name) to run a specific step in the pipeline. Let’s show this by deliberately causing an error via deleting the input files:

# set up a new pipeline
outdir2 <- tempfile()
pipeline2 <- SingleCellPipeline(
  config_file = create_config(outdir),
  outdir = outdir2,
  fastq = system.file("extdata", "fastq", "musc_rps24.fastq.gz", package = "FLAMES"),
  annotation = system.file("extdata", "rps24.gtf.gz", package = "FLAMES"),
  genome_fa = genome_fa,
  barcodes_file = bc_allow
)
#> Output directory does not exists: one is being created
#> [1] "/tmp/RtmpXPeAYp/file16a7493d24057b"
#> Writing configuration parameters to:  /tmp/RtmpXPeAYp/file16a7494daed072/config_file_1484617.json
#> Configured steps: 
#>  barcode_demultiplex: TRUE
#>  genome_alignment: TRUE
#>  gene_quantification: TRUE
#>  isoform_identification: TRUE
#>  read_realignment: TRUE
#>  transcript_quantification: TRUE
#> samtools not found, will use Rsamtools package instead

# delete the reference genome
unlink(genome_fa)
pipeline2 <- run_FLAMES(pipeline2)
#> ── Running step: barcode_demultiplex @ Wed Jun 18 17:35:16 2025 ────────────────
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be  replaced
#> Setting number of threads to 8
#> Search pattern: 
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpXPeAYp/file16a7494daed072/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmppR5wzP/Rinst16369d65445136/FLAMES/extdata/fastq/musc_rps24.fastq.gz
#> Searching for barcodes...
#> Number of reads processed: 393
#> Number of reads where at least one barcode was found: 368
#> Number of reads with exactly one barcode match: 364
#> Number of chimera reads: 1
#> All done!
#> ── Running step: genome_alignment @ Wed Jun 18 17:35:16 2025 ───────────────────
#> Creating junction bed file from GFF3 annotation.
#> Aligning sample /tmp/RtmpXPeAYp/file16a7493d24057b/matched_reads.fastq.gz -> /tmp/RtmpXPeAYp/file16a7493d24057b/align2genome.bam
#> Your fastq file appears to have tags, but you did not provide the -y option to minimap2 to include the tags in the output.
#> Warning in FLAMES:::minimap2_align(fq_in = fastqs[i], fa_file = genome, :
#> samtools not found, using Rsamtools instead, this could be slower and might
#> fail for large BAM files.
#> Warning in value[[3L]](cond): Error in step genome_alignment: error running minimap2:
#> 1, pipeline stopped.
pipeline2
#> ! A FLAMES.SingleCellPipeline outputting to '/tmp/RtmpXPeAYp/file16a7493d24057b'
#> 
#> ── Inputs
#> ✔ fastq: '...445136/FLAMES/extdata/fastq/musc_rps24.fastq.gz'
#> ✔ annotation: '...Rinst16369d65445136/FLAMES/extdata/rps24.gtf.gz'
#> ! genome_fa: '/tmp/RtmpXPeAYp/file16a7494daed072/rps24.fa' [missing]
#> ✔ barcodes_file: '/tmp/RtmpXPeAYp/file16a7494daed072/bc_allow.tsv'
#> 
#> ── Outputs
#> ✔ demultiplexed_fastq: 'matched_reads.fastq.gz' [217.0 KB]
#> ℹ deduped_fastq: 'matched_reads_dedup.fastq.gz'
#> ℹ genome_bam: 'align2genome.bam'
#> ℹ transcriptome_assembly: 'transcript_assembly.fa'
#> ℹ transcriptome_bam: 'realign2transcript.bam'
#> 
#> ── Pipeline Steps
#> ✔ barcode_demultiplex (completed in 0.26 sec)
#> ✖ genome_alignment (failed: Error in FLAMES:::minimap2_align(fq_in = fastqs[i], fa_file = genome, : error running minimap2:
#> 1
#> )
#> ℹ gene_quantification (pending)
#> ℹ isoform_identification (pending)
#> ℹ read_realignment (pending)
#> ℹ transcript_quantification (pending)

Let’s then fix the error by re-creating the reference genome file and resume the pipeline:

R.utils::gunzip(
  filename = system.file("extdata", "rps24.fa.gz", package = "FLAMES"),
  destname = genome_fa, remove = FALSE
)
pipeline2 <- resume_FLAMES(pipeline2)
#> Resuming pipeline from step: genome_alignment
#> ── Running step: genome_alignment @ Wed Jun 18 17:35:17 2025 ───────────────────
#> Aligning sample /tmp/RtmpXPeAYp/file16a7493d24057b/matched_reads.fastq.gz -> /tmp/RtmpXPeAYp/file16a7493d24057b/align2genome.bam
#> Your fastq file appears to have tags, but you did not provide the -y option to minimap2 to include the tags in the output.
#> Warning in FLAMES:::minimap2_align(fq_in = fastqs[i], fa_file = genome, :
#> samtools not found, using Rsamtools instead, this could be slower and might
#> fail for large BAM files.
#> Sorting BAM files by genome coordinates with 8 threads...
#> Indexing bam files
#> ── Running step: gene_quantification @ Wed Jun 18 17:35:17 2025 ────────────────
#> 17:35:17 Wed Jun 18 2025 quantify genes
#> Using BAM(s): '/tmp/RtmpXPeAYp/file16a7493d24057b/align2genome.bam'
#> Assigning reads to genes...
#> Writing the gene count matrix ...
#> Plotting the saturation curve ...
#> Generating deduplicated fastq file ...
#> ── Running step: isoform_identification @ Wed Jun 18 17:35:18 2025 ─────────────
#> #### Read gene annotations
#>  Removed similar transcripts in gene annotation: Counter()
#> #### find isoforms
#> chr14
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ... OK
#> Make the TxDb object ...
#> Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version information
#> is not available for this TxDb object
#> OK
#> ── Running step: read_realignment @ Wed Jun 18 17:35:19 2025 ───────────────────
#> Checking for fastq file(s) /tmp/RtmppR5wzP/Rinst16369d65445136/FLAMES/extdata/fastq/musc_rps24.fastq.gz
#>  files found
#> Checking for fastq file(s) /tmp/RtmpXPeAYp/file16a7493d24057b/matched_reads.fastq.gz
#>  files found
#> Checking for fastq file(s) /tmp/RtmpXPeAYp/file16a7493d24057b/matched_reads_dedup.fastq.gz
#>  files found
#> Realigning sample /tmp/RtmpXPeAYp/file16a7493d24057b/matched_reads_dedup.fastq.gz -> /tmp/RtmpXPeAYp/file16a7493d24057b/realign2transcript.bam
#> Warning in minimap2_align(fq_in = fastqs[i], fa_file =
#> pipeline@transcriptome_assembly, : samtools not found, using Rsamtools instead,
#> this could be slower and might fail for large BAM files.
#> Sorting BAM files by 8 with CB threads...
#> 
#> ── Running step: transcript_quantification @ Wed Jun 18 17:35:19 2025 ──────────
#> Import genomic features from the file as a GRanges object ... OK
#> Prepare the 'metadata' data frame ... OK
#> Make the TxDb object ...
#> Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version information
#> is not available for this TxDb object
#> OK
#> Import genomic features from the file as a GRanges object ... OK
#> Prepare the 'metadata' data frame ... OK
#> Make the TxDb object ...
#> Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version information
#> is not available for this TxDb object
#> OK
pipeline2
#> ✔ A FLAMES.SingleCellPipeline outputting to '/tmp/RtmpXPeAYp/file16a7493d24057b'
#> 
#> ── Inputs 
#> ✔ fastq: '...445136/FLAMES/extdata/fastq/musc_rps24.fastq.gz'
#> ✔ annotation: '...Rinst16369d65445136/FLAMES/extdata/rps24.gtf.gz'
#> ✔ genome_fa: '/tmp/RtmpXPeAYp/file16a7494daed072/rps24.fa'
#> ✔ barcodes_file: '/tmp/RtmpXPeAYp/file16a7494daed072/bc_allow.tsv'
#> 
#> ── Outputs 
#> ✔ demultiplexed_fastq: 'matched_reads.fastq.gz' [217.0 KB]
#> ✔ deduped_fastq: 'matched_reads_dedup.fastq.gz' [203.9 KB]
#> ✔ genome_bam: 'align2genome.bam' [267.5 KB]
#> ✔ novel_isoform_annotation: 'isoform_annotated.gff3' [7.4 KB]
#> ✔ transcriptome_assembly: 'transcript_assembly.fa' [8.4 KB]
#> ✔ transcriptome_bam: 'realign2transcript.bam' [392.3 KB]
#> 
#> ── Pipeline Steps 
#> ✔ barcode_demultiplex (completed in 0.26 sec)
#> ✔ genome_alignment (completed in 0.17 sec)
#> ✔ gene_quantification (completed in 0.76 sec)
#> ✔ isoform_identification (completed in 0.87 sec)
#> ✔ read_realignment (completed in 0.21 sec)
#> ✔ transcript_quantification (completed in 1.55 sec)

After completing the pipeline, a SingleCellExperiment object is created (or SummarizedExperiment for bulk pipeline and list of SingleCellExperiment for multi-sample pipeline). You can access the results via experiment(pipeline):

experiment(pipeline)
#> class: SingleCellExperiment 
#> dim: 10 137 
#> metadata(0):
#> assays(1): counts
#> rownames(10): ENSMUST00000169826.2 ENSMUSG00000025290.17_19_5159_1 ...
#>   ENSMUSG00000025290.17_19_5159_8 ENSMUST00000225023.1
#> rowData names(2): transcript_id gene_id
#> colnames(137): AACTCTTGTCACCTAA AACCATGAGTCGTTTG ... TTGTAGGTCAGTGTTG
#>   TTTATGCAGACTAGAT
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

1.2.1 HPC support

Individual steps can be submitted as HPC jobs via crew and crew.cluster packages, simply supply a list of crew controllers (named by step name) to the controllers argument of the pipeline constructors. For example, we could run the alignment steps through controllers, while keeping the rest in the main R session.

# example_pipeline provides an example pipeline for each of the three types
# of pipelines: BulkPipeline, SingleCellPipeline and MultiSampleSCPipeline
mspipeline <- example_pipeline("MultiSampleSCPipeline")
#> Writing configuration parameters to:  /tmp/RtmpXPeAYp/file16a7491954a0ff/config_file_1484617.json
#> Configured steps: 
#>  barcode_demultiplex: TRUE
#>  genome_alignment: TRUE
#>  gene_quantification: TRUE
#>  isoform_identification: TRUE
#>  read_realignment: TRUE
#>  transcript_quantification: TRUE
#> samtools not found, will use Rsamtools package instead
# Providing a single controller will run all steps in it:
controllers(mspipeline) <- crew::crew_controller_local()
# Setting controllers to an empty list will run all steps in the main R session:
controllers(mspipeline) <- list()
# Alternatively, we can run only the alignment steps in the crew controller:
controllers(mspipeline)[["genome_alignment"]] <- crew::crew_controller_local(workers = 4)
# Or `controllers(mspipeline) <- list(genome_alignment = crew::crew_cluster())`
# to remove controllers for all other steps.
# Replace `crew_controller_local()` with `crew.cluster::crew_controller_slurm()` or other
# crew controllers according to your HPC job scheduler.

run_FLAMES() will then submit the alignment step to the crew controller. By default, run_step() will ignore the crew controllers and run the step in the main R session, since it is easier to debug. You can use run_step(pipeline, step_name, disable_controller = FALSE) to prevent this behavior and run the step in crew controllers if available.

You can tailor the resources for each step by specifying different arguments to the controllers. The alignment step typically benifits from more cores (e.g. 64 cores and 20GB memory), whereas other steps might not need as much cores but more memory hungry.

# An example helper function to create a Slurm controller with specific resources
create_slurm_controller <- function(
    cpus, memory_gb, workers = 10, seconds_idle = 10,
    script_lines = "module load R/flexiblas/4.5.0") {
  name <- sprintf("slurm_%dc%dg", cpus, memory_gb)
  crew.cluster::crew_controller_slurm(
    name = name,
    workers = workers,
    seconds_idle = seconds_idle,
    retry_tasks = FALSE,
    options_cluster = crew.cluster::crew_options_slurm(
      script_lines = script_lines,
      memory_gigabytes_required = memory_gb,
      cpus_per_task = cpus,
      log_output = file.path("logs", "crew_log_%A.txt"),
      log_error = file.path("logs", "crew_log_%A.txt")
    )
  )
}
controllers(mspipeline)[["genome_alignment"]] <-
  create_slurm_controller(cpus = 64, memory_gb = 20)

See also the crew.cluster website for more information on the supported job schedulers.

1.3 Visualizations

1.3.1 QC plots

Quality metrics are collected throughout the pipeline, and FLAMES provide visiualization functions to plot the metrics. For the first demultiplexing step, we can use the plot_demultiplex function to see how well many reads are retained after demultiplexing:

# don't have to run the entire pipeline for this
# let's just run the demultiplexing step
mspipeline <- run_step(mspipeline, "barcode_demultiplex")
#> ── Running step: barcode_demultiplex @ Wed Jun 18 17:35:21 2025 ────────────────
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be  replaced
#> Setting number of threads to 1
#> Search pattern: 
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpXPeAYp/file16a7491954a0ff/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmpXPeAYp/file16a7491954a0ff/fastq/sample1.fq.gz
#> Searching for barcodes...
#> Processing file: /tmp/RtmpXPeAYp/file16a7491954a0ff/fastq/sample2.fq.gz
#> Searching for barcodes...
#> Processing file: /tmp/RtmpXPeAYp/file16a7491954a0ff/fastq/sample3.fq.gz
#> Searching for barcodes...
#> Number of reads processed: 393
#> Number of reads where at least one barcode was found: 368
#> Number of reads with exactly one barcode match: 364
#> Number of chimera reads: 1
#> All done!
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be  replaced
#> Setting number of threads to 1
#> Search pattern: 
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpXPeAYp/file16a7491954a0ff/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmpXPeAYp/file16a7491954a0ff/fastq/sample1.fq.gz
#> Searching for barcodes...
#> Number of reads processed: 100
#> Number of reads where at least one barcode was found: 92
#> Number of reads with exactly one barcode match: 91
#> Number of chimera reads: 1
#> All done!
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be  replaced
#> Setting number of threads to 1
#> Search pattern: 
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpXPeAYp/file16a7491954a0ff/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmpXPeAYp/file16a7491954a0ff/fastq/sample2.fq.gz
#> Searching for barcodes...
#> Number of reads processed: 100
#> Number of reads where at least one barcode was found: 95
#> Number of reads with exactly one barcode match: 94
#> Number of chimera reads: 0
#> All done!
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be  replaced
#> Setting number of threads to 1
#> Search pattern: 
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpXPeAYp/file16a7491954a0ff/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmpXPeAYp/file16a7491954a0ff/fastq/sample3.fq.gz
#> Searching for barcodes...
#> Number of reads processed: 193
#> Number of reads where at least one barcode was found: 181
#> Number of reads with exactly one barcode match: 179
#> Number of chimera reads: 0
#> All done!
plot_demultiplex(mspipeline)
#> $reads_count_plot

#> 
#> $knee_plot
#> `geom_smooth()` using formula = 'y ~ x'

#> 
#> $flank_editdistance_plot

#> 
#> $barcode_editdistance_plot

#> 
#> $cutadapt_plot

1.3.2 Work in progress

More examples coming soon.

1.4 FLAMES on Windows

Due to FLAMES requiring minimap2 and pysam, FLAMES is currently unavaliable on Windows.

1.5 Citation

Please cite the flames’s paper (Tian et al. 2020) if you use flames in your research. As FLAMES used incorporates BLAZE (You et al. 2023), flexiplex (Davidson et al. 2023) and minimap2 (Li 2018), samtools, bambu (Chen et al. 2023). Please make sure to cite when using these tools.

2 Session Info

#> R version 4.5.0 (2025-04-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] FLAMES_2.3.3     BiocStyle_2.37.0
#> 
#> loaded via a namespace (and not attached):
#>   [1] splines_4.5.0               later_1.4.2                
#>   [3] BiocIO_1.19.0               bitops_1.0-9               
#>   [5] filelock_1.0.3              tibble_3.3.0               
#>   [7] R.oo_1.27.1                 polyclip_1.10-7            
#>   [9] bambu_3.11.1                XML_3.99-0.18              
#>  [11] httr2_1.1.2                 lifecycle_1.0.4            
#>  [13] pwalign_1.5.0               edgeR_4.7.2                
#>  [15] doParallel_1.0.17           processx_3.8.6             
#>  [17] vroom_1.6.5                 MASS_7.3-65                
#>  [19] globals_0.18.0              lattice_0.22-7             
#>  [21] magrittr_2.0.3              limma_3.65.1               
#>  [23] sass_0.4.10                 rmarkdown_2.29             
#>  [25] jquerylib_0.1.4             yaml_2.3.10                
#>  [27] metapod_1.17.0              reticulate_1.42.0          
#>  [29] cowplot_1.1.3               DBI_1.2.3                  
#>  [31] RColorBrewer_1.1-3          abind_1.4-8                
#>  [33] ShortRead_1.67.0            GenomicRanges_1.61.0       
#>  [35] purrr_1.0.4                 R.utils_2.13.0             
#>  [37] BiocGenerics_0.55.0         RCurl_1.98-1.17            
#>  [39] yulab.utils_0.2.0           rappdirs_0.3.3             
#>  [41] tweenr_2.0.3                circlize_0.4.16            
#>  [43] IRanges_2.43.0              S4Vectors_0.47.0           
#>  [45] ggrepel_0.9.6               irlba_2.3.5.1              
#>  [47] listenv_0.9.1               dqrng_0.4.1                
#>  [49] parallelly_1.45.0           DelayedMatrixStats_1.31.0  
#>  [51] codetools_0.2-20            DropletUtils_1.29.3        
#>  [53] DelayedArray_0.35.2         xml2_1.3.8                 
#>  [55] scuttle_1.19.0              ggforce_0.5.0              
#>  [57] tidyselect_1.2.1            shape_1.4.6.1              
#>  [59] UCSC.utils_1.5.0            farver_2.1.2               
#>  [61] ScaledMatrix_1.17.0         viridis_0.6.5              
#>  [63] BiocFileCache_2.99.5        matrixStats_1.5.0          
#>  [65] stats4_4.5.0                GenomicAlignments_1.45.0   
#>  [67] jsonlite_2.0.0              GetoptLong_1.0.5           
#>  [69] BiocNeighbors_2.3.1         scater_1.37.0              
#>  [71] iterators_1.0.14            foreach_1.5.2              
#>  [73] progress_1.2.3              tools_4.5.0                
#>  [75] Rcpp_1.0.14                 glue_1.8.0                 
#>  [77] gridExtra_2.3               SparseArray_1.9.0          
#>  [79] mgcv_1.9-3                  xfun_0.52                  
#>  [81] MatrixGenerics_1.21.0       GenomeInfoDb_1.45.4        
#>  [83] dplyr_1.1.4                 HDF5Array_1.37.0           
#>  [85] withr_3.0.2                 BiocManager_1.30.26        
#>  [87] fastmap_1.2.0               basilisk_1.21.5            
#>  [89] bluster_1.19.0              latticeExtra_0.6-30        
#>  [91] rhdf5filters_1.21.0         digest_0.6.37              
#>  [93] rsvd_1.0.5                  R6_2.6.1                   
#>  [95] colorspace_2.1-1            biomaRt_2.65.0             
#>  [97] jpeg_0.1-11                 dichromat_2.0-0.1          
#>  [99] RSQLite_2.4.1               R.methodsS3_1.8.2          
#> [101] h5mread_1.1.1               tidyr_1.3.1                
#> [103] generics_0.1.4              data.table_1.17.6          
#> [105] rtracklayer_1.69.0          prettyunits_1.2.0          
#> [107] httr_1.4.7                  S4Arrays_1.9.1             
#> [109] scatterpie_0.2.4            pkgconfig_2.0.3            
#> [111] gtable_0.3.6                blob_1.2.4                 
#> [113] ComplexHeatmap_2.25.0       hwriter_1.3.2.1            
#> [115] SingleCellExperiment_1.31.0 XVector_0.49.0             
#> [117] htmltools_0.5.8.1           bookdown_0.43              
#> [119] clue_0.3-66                 scales_1.4.0               
#> [121] Biobase_2.69.0              png_0.1-8                  
#> [123] nanonext_1.6.0              SpatialExperiment_1.19.1   
#> [125] scran_1.37.0                ggfun_0.1.8                
#> [127] knitr_1.50                  tzdb_0.5.0                 
#> [129] rjson_0.2.23                nlme_3.1-168               
#> [131] curl_6.3.0                  crew_1.2.1                 
#> [133] cachem_1.1.0                rhdf5_2.53.1               
#> [135] GlobalOptions_0.1.2         stringr_1.5.1              
#> [137] parallel_4.5.0              vipor_0.4.7                
#> [139] AnnotationDbi_1.71.0        restfulr_0.0.15            
#> [141] pillar_1.10.2               grid_4.5.0                 
#> [143] vctrs_0.6.5                 promises_1.3.3             
#> [145] BiocSingular_1.25.0         dbplyr_2.5.0               
#> [147] beachmat_2.25.1             cluster_2.1.8.1            
#> [149] archive_1.1.12              beeswarm_0.4.0             
#> [151] evaluate_1.0.4              tinytex_0.57               
#> [153] readr_2.1.5                 GenomicFeatures_1.61.3     
#> [155] magick_2.8.7                cli_3.6.5                  
#> [157] locfit_1.5-9.12             compiler_4.5.0             
#> [159] Rsamtools_2.25.0            rlang_1.1.6                
#> [161] crayon_1.5.3                labeling_0.4.3             
#> [163] interp_1.1-6                ps_1.9.1                   
#> [165] fs_1.6.6                    ggbeeswarm_0.7.2           
#> [167] stringi_1.8.7               viridisLite_0.4.2          
#> [169] deldir_2.0-4                BiocParallel_1.43.4        
#> [171] txdbmaker_1.5.4             Biostrings_2.77.1          
#> [173] Matrix_1.7-3                dir.expiry_1.17.0          
#> [175] BSgenome_1.77.0             hms_1.1.3                  
#> [177] sparseMatrixStats_1.21.0    bit64_4.6.0-1              
#> [179] future_1.58.0               ggplot2_3.5.2              
#> [181] Rhdf5lib_1.31.0             KEGGREST_1.49.1            
#> [183] statmod_1.5.0               SummarizedExperiment_1.39.0
#> [185] mirai_2.3.0                 igraph_2.1.4               
#> [187] memoise_2.0.1               bslib_0.9.0                
#> [189] bit_4.6.0                   xgboost_1.7.11.1

References

Chen, Ying, Andre Sim, Yuk Kei Wan, Keith Yeo, Joseph Jing Xian Lee, Min Hao Ling, Michael I Love, and Jonathan Göke. 2023. “Context-Aware Transcript Quantification from Long-Read Rna-Seq Data with Bambu.” Nature Methods, 1–9.

Davidson, Nadia M, Noorul Amin, Ling Min Hao, Changqing Wang, Oliver Cheng, Jonathan M Goeke, Matthew E Ritchie, and Shuyi Wu. 2023. “Flexiplex: A Versatile Demultiplexer and Search Tool for Omics Data.” bioRxiv, 2023–08.

Li, Heng. 2018. “Minimap2: pairwise alignment for nucleotide sequences.” Bioinformatics 34 (18): 3094–3100. https://doi.org/10.1093/bioinformatics/bty191.

Tian, Luyi, Jafar S. Jabbari, Rachel Thijssen, Quentin Gouil, Shanika L. Amarasinghe, Hasaru Kariyawasam, Shian Su, et al. 2020. “Comprehensive Characterization of Single Cell Full-Length Isoforms in Human and Mouse with Long-Read Sequencing.” bioRxiv. https://doi.org/10.1101/2020.08.10.243543.

You, Yupei, Yair DJ Prawer, De Paoli-Iseppi, Cameron PJ Hunt, Clare L Parish, Heejung Shim, Michael B Clark, and others. 2023. “Identification of Cell Barcodes from Long-Read Single-Cell Rna-Seq with Blaze.” Genome Biology 24 (1): 1–23.