Changes in version 2.2.0 ------------------------ NEW FEATURES o Added chunks parameter to Coverage.R and calculateBamCoverageByInterval to reduce memory usage (#218) SIGNIFICANT USER-VISIBLE CHANGES o When base quality scores are found in the VCF, they are now used to calculate the minimum number of supporting reads (instead of assuming a default BQ of 30). By default BQ is capped at 50 and variants below 25 are ignored. Set min.supporting.reads to 0 to turn this off (#206). o More robust annotation of intervals with gene symbols o Remove chromosomes not present in the centromeres GRanges object; useful to remove altcontigs somehow present (should not happen with intervals generated by IntervalFile.R) BUGFIXES o Fixed an issue with old R versions where factors were not converted to strings, resulting in numbers instead of gene symbols o Fix for a crash when there are no off-target reads in off-target regions (#209). o Fixed parsing of base quality scores in Mutect 2.2 o Fixed crash in GenomicsDB parsing when there were no variants in contig (#225) Changes in version 2.0.0 ------------------------ NEW FEATURES o Report median absolute pairwise difference (MAPD) of tumor vs normal log2 ratios in runAbsoluteCN o Improved mapping bias estimates: variants with insufficient information for position-specific fits (default 3-6 heterozygous variants) are clustered and assigned to the most similar fit o Make Cosmic.CNT INFO field name customizable SIGNIFICANT USER-VISIBLE CHANGES o Cleanup of naming of command line arguments (will throw lots of deprecated warnings, but was long overdue) o More robust alignment of on- and off-target tumor vs normal log2 ratios. Ratios are shifted so that median difference of neighboring on/off-target pairs is 0. This should fix spurious segments consisting of only on- or off-target regions in high quality samples where those minor off-sets sometimes exceeded the noise. o Added min.variants argument to runAbsoluteCN o Added PureCN version to runAbsoluteCN results object (ret$version) o Addressed observed over-segmentations in very clean data: - Do not attempt two-step segmentation in PSCBS when off-target noise is still very small (< 0.15, min.logr.sdev in runAbsoluteCN) - Increase automatically determined undo.SD in all segmentation functions when noise is very small (< min.logr.sdev) - min.logr.sdev is now accessible in PureCN.R via --min-logr-sdev o Added pairwise sample distances to normalDB output object helpful for finding noisy samples or batches in normal databases o Do not error out readCurationFile when CSV is missing and directory is not writable when re-generating it (#196) o Add segmentation parameters as attributes to segmentation data.frame o Added min.betafit.rho and max.betafit.rho to calculateMappingBias* o Made --normal_panel in PureCN.R defunct o Added GATK/Picard header with sequence lengths to interval file, added readIntervalFile function to parse it BUGFIXES o Fix for crash when --normal_panel in NormalDB.R contained no variants (#180). o Fix for crash when rtracklayer failed to parse --infile in FilterCallableLoci.R (#182) o More robust parsing of VCF with missing GT field (#184) o Fix for bug and crash when mapping bias RDS file contains variants with multiple alt alleles (#184) o Added missing dependency 'markdown' o Fix for crash when only a small number of off-target intervals pass filters (#190) o Fix for crash when PSCBS segmentation was selected without VCF file (#190) o Fix for crash when Hclust segmentation was selected without segmentation file (#190) o Fix for crashes when not many variant pass filters (#192, #195) o Fix for crash when provided segmentation does not have chromosomes in common with VCF (#192) or does not provide all chromosomes present in the coverage file (#192) Changes in version 1.22.0 ------------------------- NEW FEATURES o calculateNormalDatabase now suggests an off-target interval width that minimizes noise while keeping the resolution as high as possible o Added support for GATK4 CollectAllelicCounts output as alternative to Mutect o Added segmentationGATK4 to use GATK4's segmentation function ModelSegments SIGNIFICANT USER-VISIBLE CHANGES o Added min.total.counts filter to filterIntervals to remove intervals with low number of read counts in combined tumor and normal. Useful especially for off-target filtering in highly efficient assays where standard filters keep too many high variance regions. o Changed default of min.mappability in preprocessIntervals for on-target intervals to 0.6 (from 0.5) o Added min.mappability also to filterIntervals so that more conservative cutoffs can be tested after normalDB generation o PSCBS: 1.20.0 two-step segmentation slightly tweaked in that only high quality on-target intervals (high mappability and low PoN noise) are used in the first segmentation o Added --skipgcnorm flag to Coverage.R to skip GC-normalization o Added AF.info.field option to calculateMappingBiasGatk4 for non-standard GenomicsDB imports o If segmentation functions add breakpoints within baits, these breakpoints are now moved to the beginning or end of that bait to avoid that a single bait is assigned to two segments o Dx.R now always generates a _signatures.csv file with --signatures, even if insufficient number of mutations o Removed defunct calculateIntervalWeights function BUGFIXES o Fix for nonsensical error message when VCF does not contain germline variants (#166). o Fix for various issues related to the seqlevelsStyle function (e.g. #171) o Fix for crash in calculateMappingBiasGatk4 when not all samples had a single variant call on a particular chromosome (chrY) o Fix related to annotating mapping bias with triallelic sites and GenomicsDB o Fixed an issue in Mutect 1.1.7 data in which good SNPs were ignored (#174) Changes in version 1.20.0 ------------------------- NEW FEATURES o Support for GATK4 GenomicsDB import for mapping bias calculation o Added --additionaltumors to PureCN.R to provide coverage files from additional biopsies from the same patient when available o PSCBS segmentation now identifies on-target breakpoints first when off-target is noisy, thus boosting sensitivity in on-target regions o Beta-binomial model in runAbsoluteCN now uses the fits in mapping bias database. We plan to set this as default in upcoming versions and appreciate feedback. SIGNIFICANT USER-VISIBLE CHANGES o We now check if POP_AF or POPAF is -log10 scaled as new Mutect2 versions do. o Added support for GERMQ info field containing Phred-scaled germline probabilities. o Detect Mutect2 VCF more reliably o Updated Mutect2 failure flags: "strand_bias", "slippage", "weak_evidence", "orientation", "haplotype" o Removed defunct normal.panel.vcf.file from setMappingBiasVcf o Removed defunct interval.weight.file from segmentationPSCBS, segmentationCBS and processMultipleSamples o Made calculateIntervalWeights defunct o Changed default of min.normals in calculateMappingBiasVcf/Gatk4 to 1 from 2 o Changed default of --signature_databases to "signatures.exome.cosmic.v3.may2019" (v3 instead of v2) o Now warn if recommended -funsegmentation is not used o Added parallel option for callAmplificationsInLowPurity o callMutationBurden now uses all non-filtered targets as callable region when callable is not provided o plotAbs in chromosome mode now displays wider range of log2 ratios (makes it possible to examine outliers) o Moved vcf.field.prefix from predictSomatic to runAbsoluteCN since it now adds more fields like prior somatic and mapping bias to the VCF o Changed default of runAbsoluteCN min.ploidy to 1.4 BUGFIXES o Fix for crash with CNVkit input when log-ratio contained highly negative outliers o Fixed a bug in preprocessIntervals/IntervalFile.R when input contained overlapping and stranded intervals o Fix for crash when GC-correction is attempted on empty coverage (for example off-target region without any off-target reads) o Fix for crash when VCF FA field contained missing values o Fix for a bug in callAmplificationsInLowPurity that can cause a wrong chromosome percentile Changes in version 1.18.0 ------------------------- SIGNIFICANT USER-VISIBLE CHANGES o callAlterations: columns C and seg.mean now provide the values of the segment listed in seg.id. This changes the behaviour in cases where the gene contains breakpoints and thus multiple segments overlap (#112) BUGFIXES o Fix for bug that can result in crash when candidates were provided in runAbsoluteCN and test.purity, max.ploidy and/or min.ploidy were set to non-default values Changes in version 1.16.0 ------------------------- NEW FEATURES o Flag segments in poor quality regions o predictSomatic now provides log-likelihood of allelic balance (ALLELIC.IMBALANCE column) for each variant o Added readLogRatioFile function to read GATK4 DenoiseReadCounts output files containing log2 tumor/normal ratios o Added readSegmentationFile function to read GATK4 ModelSegment output files containing segmented log2 tumor/normal ratios o Added callAmplificationsInLowPurity to call gene-level amplifications in samples < 10% purity o Dx.R now reports chromosomal instability scores (available also via callCIN function) o Dx.R supports deconstructSigs 1.9.0 and COSMIC signatures v3. To run both v2 and v3, simply add --signature_databases signatures.exome.cosmic.v3.may2019:signatures.cosmic to Dx.R SIGNIFICANT USER-VISIBLE CHANGES o Made filterTargets and createTargetWeights defunct o setMappingBiasVcf now returns a data.frame o Best practices vignette now HTML-based o Renamed normal.panel.vcf.file in setMappingBiasVcf to mapping.bias.file; in 1.18, setMappingBiasVcf will not accept a VCF anymore but requires a precomputed mapping bias RDS file. o calculateIntervalWeights now directly called by createNormalDatabase and information included in the normalDB RDS object. This function is thus deprecated. o Column gene.mean in callAlterations output now weighted by interval weights when available o Changed default of min.target.width in preprocessIntervals from 10 to 100 (#73) o replaced write.table with data.table::fwrite to automatically support producing gzipped output (requires data.table 1.12.4, #106) o Coverage.R now gzips BAM file coverage (requires data.table 1.12.4, #106) o Output coverage files now code FALSE as 0 and TRUE as 1 o PureCN.R now bgzips and tabix indexes VCFs when --vcf is provided BUGFIXES o Fix for bug in CCF calculation resulting in NAs (happens in high coverage samples, early mutations with > 1 allele copy number) o Fix for a bug in preprocessIntervals when small targets (< min.target.width) were present o Fix for a bug in callMutationBurden when VCF contained indels (#82) o Die with helpful error message when snp.blacklist import failed o Check input segmentation files for missing values resulting in crash o Fixed a crash in Varscan2 produced VCFs when ALT field missed ref counts (#109) Changes in version 1.14.0 ------------------------- NEW FEATURES o support for copynumber package and its multisample segmentation o beta support for PSCBS weighting o support for gene symbol filtering in FilterCallableLoci.R (e.g. --exclude "^HLA") o added segmentationHclust function that clusters provided segmentation using log2-ratio and B-allele frequencies o min.target.width and small.targets in preprocessIntervals to automatically deal with too small targets o calculate confidence intervals for cellular fractions o throw additional warning when sample is flagged as NON-ABERRANT and pick the diploid solution with lowest purity as best SIGNIFICANT USER-VISIBLE CHANGES o significant runtime improvements o callLOH now reports all segments, even if there are no informative SNPs since some users were not aware that segments are missing from this output. Use keep.no.snp.segments = FALSE to restore old behaviour. o more detailed output of callLOH o renamed num.snps.segment to num.snps in callAlterations output BUGFIXES o fixed crash in PureCN.R when gene symbols are missing from interval file o fixed crash in runAbsoluteCN with matched normals and high test.purity minimum (#74) Changes in version 1.12.0 ------------------------- NEW FEATURES o normalDB does not need input normal coverage files anymore after creation (so the resulting normalDB.rds file can be moved) o base quality filtering can be turned off by setting min.base.quality to 0 or NULL o possible to change the POP_AF info field name o possible to change POP_AF cutoff to set a high germline prior o possible to change min.cosmic.cnt and max.homozygous.loss in PureCN.R o set number of cores in PureCN.R (thanks Brad) SIGNIFICANT USER-VISIBLE CHANGES o renamed reptimingbinsize to reptimingwidth in IntervalFile.R, added this feature to preprocessIntervals o clarified "targets" vs. "intervals"; whenever something affects both on-target and off-target, it is now called "intervals". When only targets, e.g. in annotateTargets, "targets" was kept. o made gc.gene.file defunct o new default for min.cosmic.cnt = 6 (instead of 4) BUGFIXES o catch various input problems and provide better error messages instead of crashing o stranded input BED files do not cause problems anymore o fixed a bug when only a single local optimum was tested (happens only when users copy the examples that restrict the search speach to avoid long runtimes) o added missing QC flag to predictSomatic VCF annotation Changes in version 1.10.0 ------------------------- SIGNIFICANT USER-VISIBLE CHANGES o New normal database format o Runtime performance improvements (skip unlikely local optima, support for BiocParallel in runAbsoluteCN, pre-calculation of mapping bias) o Support for replication timing scores in coverage normalization o More accurate confidence intervals in callMutationBurden o More accurate copy numbers for high-level amplifications o Very low or high coverage samples are now by default dropped in normal database creation (less than 25% or more than 4 times the median sample coverage) o Improved support for third-party upstream tools like GATK4 (experimental) o More checks for wrong or sub-optimal input and providing suggestions for fixing those issues o Gibbs sampling of log tumor/normal coverage error rate o Better imputation of mapping bias (instead of smoothing over neighboring variants in the sample, smooth over neighboring SNPs in the pool of normals - only available when pre-calculated) o Experimental support for indels o Code cleanups (switch to testthat, removed several obsolete and minor features) API CHANGES o renamed gc.gene.file to interval.file since it now provides more than GC-content and gene symbols o plotAbs ids changed to id (this function now only plots a single purity/ploidy solution) o changed default of runAbsoluteCN max.logr.sdev to 0.6 (from 0.75) o createTargetWeights does not require tumor coverages anymore o calculateGCContentByInterval was renamed to preprocessIntervals o renamed plot.gc.bias to plot.bias in correctCoverageBias since it now also includes replication timing o added calculateMappingBiasVcf to pre-compute mapping bias from a panel of normal VCF, thus avoiding time loading and parsing of huge VCFs o max.homozygous.loss now defines the maximum fraction of a chromosome lost, not the whole genome, to avoid wrong maximum likelihood solutions with completely deleted chromosome arms Changes in version 1.8.0 ------------------------ NEW FEATURES o Support for off-target reads in copy number normalization and segmentation o Added mutation burden calculation o More robust mapping bias estimation o Added support for CNVkit coverage files (*.cnn, *.cnr) o IntervalFile.R can annotate targets with gene symbols and automatically convert chromosome naming styles o Better artifact filtering by using normalDB more efficiently o Support for mappability scores o Coverage calculation can now include duplicates o calculateBamCoverageByInterval now provides fragment counts and duplication rates o findBestNormal pooling now fragment count based, not coverage based o Experimental support for GATK4 o predictSomatic now reports posterior probabilites of minor segment copy numbers, flags segments if copy numbers are unreliable o Targets can be annotated with multiple gene symbols (comma separated) o Code cleanups (switch to GRanges where possible, switch to optparse in command line tools) API CHANGES o Due to novel optimizations of provided bait intervals, we highly recommend to regenerate the interval files and normal databases and recalculate all coverages from BAM files o New functions: annotateTargets, callMutationBurden o Defunct functions: createSNPBlacklist, getDiploid, autoCurateResults, readCoverageGatk o min.normals defaults to 2 (changed from 4) in setMappingBiasVcf o normalDB.min.coverage defaults to 0.25 (changed from 0.2) in filterTargets o log.ratio.calibration defaults to 0.1 (from 0.25) in runAbsoluteCN; now relative to purity, not log-ratio noise o Removed gc.data from filterTargets since gc_bias is now added to tumor coverage o dropped purecn.output from correctCoverageBias (no two-pass anymore) o Coverage.R argument --gatkcoverage renamed to --coverage o Dropped GC-normalization functionality in NormalDB, since this is now conveniently done in Coverage.R o Renamed PureCN.R --outdir argument to --out. Can now specify a file prefix as in GATK. Filenames are thus not forced to sample id anymore. If --out is a directory, it will behave like before and will use out/sampleid_suffix as filename.