CHANGES IN VERSION 1.38.0
-------------------------

UTILITIES

    o new option 'ext_nbyte' in `seqGet2bGeno()`

    o `seqAlleleCount()` and `seqGetAF_AC_Missing()` return NA instead of zero when all genotypes are missing at a site

    o `seqGDS2VCF()` does not output the FORMAT column if there is no selected sample (e.g., site-only VCF files)

    o `seqGetData(, "$chrom_pos2")` is similar to `seqGetData(, "$chrom_pos")` except that duplicates are given a suffix ("_1", "_2", and so on)

NEW FEATURES

    o `seqGDS2BED()` can convert to PLINK BED files with the best-guess genotypes when there are only numeric dosages in the GDS file

    o `seqEmptyFile()` outputs an empty GDS file


CHANGES IN VERSION 1.36.2
-------------------------

BUG FIXES

    o fix the bug at multi-allelic sites with more than 15 different alleles, see https://github.com/zhengxwen/SeqArray/issues/78


CHANGES IN VERSION 1.36.1
-------------------------

BUG FIXES

    o `seqExport()` failed when there is no variant

    o fix `seqSetFilter(, ret.idx=TRUE)`, see https://github.com/zhengxwen/SeqArray/issues/80


CHANGES IN VERSION 1.36.0
-------------------------

NEW FEATURES

    o new functions `seqUnitCreate()`, `seqUnitSubset()` and `seqUnitMerge()`

    o new functions `seqFilterPush()` and `seqFilterPop()`

    o new functions `seqGet2bGeno()` and `seqGetAF_AC_Missing()`

    o new variable "$dosage_sp" in `seqGetData()` for a sparse matrix of dosages

    o the first argument 'gdsfile' can be a file name in `seqAlleleFreq()`, `seqAlleleCount()`, `seqMissing()`

    o new function `seqMulticoreSetup()` for setting a multicore cluster according to a numeric value assigned to the argument 'parallel'

UTILITIES

    o allow opening a duplicated GDS file ('allow.duplicate=TRUE') when the input is a file name instead of a GDS object in `seqGDS2VCF()`, `seqGDS2SNP()`, `seqGDS2BED()`, `seqVCF2GDS()`, `seqSummary()`, `seqCheck()` and `seqMerge()`

    o remove the deprecated '.progress' in `seqMissing()`, `seqAlleleCount()` and `seqAlleleFreq()`

    o add `summary.SeqUnitListClass()`

    o no genotype and
      phase data nodes from `seqSNP2GDS()` if SNP dosage GDS is the input

BUG FIXES

    o `seqUnitApply()` works correctly with selected samples if 'parallel' is a non-fork cluster

    o `seqVCF2GDS()` and `seqVCF_Header()` work correctly if the VCF header has white space

    o `seqGDS2BED()` with selected samples for sex and phenotype information

    o `seqGDS2VCF()` failed if there is no 'genotype/data' in the GDS file


CHANGES IN VERSION 1.32.0
-------------------------

NEW FEATURES

    o new option 'ret.idx' in `seqSetFilter()` for unsorted sample and variant indices

    o new option 'ret.idx' in `seqSetFilterAnnotID()` for unsorted variant index

    o rewrite the function `seqSetFilterPos()`: new options 'ref' and 'alt', 'multi.pos=TRUE' by default

    o new option 'packed.idx' in `seqAddValue()` for packing an indexing variable

    o new option 'warn' in `seqSetFilter()` to enable or disable the warning

    o new functions `seqNewVarData()` and `seqListVarData()` for variable-length data

UTILITIES

    o allow no variant in `seqApply()` and `seqBlockApply()`

    o the list object returned from `seqGetData()` always has names if there is more than one input variable name

BUG FIXES

    o `seqGDS2VCF()` should output "."
      instead of NA in the FILTER column

    o `seqGetData()` should support factor when '.padNA=TRUE' or '.tolist=TRUE'

    o fix `seqGDS2VCF()` with factor variables

    o `seqSummary(gds, "$filter")` should return a data frame with zero rows if 'annotation/filter' is not a factor


CHANGES IN VERSION 1.30.0
-------------------------

UTILITIES

    o show a warning when an unsorted index is used in `seqSetFilter()`

    o show a message if `seqVCF_Header()` fails

    o a new option 'chr_prefix' in `seqGDS2VCF()`

BUG FIXES

    o `seqVCF_Header()` fixes 'contig' in the header of VCF if there are different fields


CHANGES IN VERSION 1.28.1
-------------------------

BUG FIXES

    o `seqRecompress(, verbose=FALSE)` works correctly

    o `seqSetFilter(, action="push+set")` should not reset the filter before setting a new filter


CHANGES IN VERSION 1.28.0
-------------------------

NEW FEATURES

    o new functions `seqUnitSlidingWindows()`, `seqUnitApply()`, `seqUnitFilterCond()`

    o new variables "$variant_index" and "$sample_index" in `seqGetData()`, `seqBlockApply()` and `seqUnitApply()` to get the indices of the selected variants and samples

    o new arguments '.padNA' and '.envir' in `seqGetData()`

    o new functions `seqSetFilterAnnotID()` and `seqGDS2BED()`

    o multicore function in `seqBED2GDS(, parallel=)`

    o new package-wide option `options(seqarray.nofork=TRUE)` to disable forking

    o new option 'minor' in `seqAlleleFreq()` and `seqAlleleCount()`

    o new option 'verbose' in `seqMissing()`, `seqAlleleFreq()` and `seqAlleleCount()`; '.progress' is deprecated, but can still be used for compatibility

    o `seqAlleleFreq()`, `seqAlleleCount()`, `seqMissing()`, `seqSetFilterCond()` work on 'annotation/format/DS', if 'genotype/data' is not available

UTILITIES

    o `seqAddValue()` adds vectors, matrices and data frames to "annotation/info"

    o `seqBED2GDS()` allows a single file name without the extended file names (.bed, .fam, .bim)

    o allele flip in `seqBED2GDS()` to allow major allele to be reference

    o rewrite `seqGetData()` for faster loading

    o significantly
      improve `seqBlockApply()` on 'annotation/info/VARIABLE' (https://github.com/zhengxwen/SeqArray/issues/59)

    o add an S3 method `print.SeqVCFHeaderClass()` for `seqVCF_Header()`

    o new option '.tolist' in `seqGetData()`, `seqBlockApply()` and `seqUnitApply()`

    o the string "." in a VCF file is converted to a blank string (missing value) in `seqVCF2GDS()`

    o add a class name 'SeqVarDataList' to the returned 'list(length, data)' from `seqGetData()`

    o new option `seqMissing(, per.variant=NA)`

    o add `comment.char=""` to `seqBED2GDS()`


CHANGES IN VERSION 1.26.2
-------------------------

NEW FEATURES

    o multiple variable names are allowed in `seqGetData(, var.name=)`

BUG FIXES

    o fix `seqGetData(, "genotype", .useraw=NA)` (https://github.com/zhengxwen/SeqArray/issues/58)


CHANGES IN VERSION 1.26.1
-------------------------

BUG FIXES

    o fails to correctly select duplicate indices in `seqSetFilter(f, variant.sel=)`


CHANGES IN VERSION 1.26.0
-------------------------

NEW FEATURES

    o new function `seqAddValue()`

UTILITIES

    o RLE chromosome coding in `seqBED2GDS()`

    o change the file name "vignettes/R_Integration.Rmd" to "vignettes/SeqArray.Rmd", so `vignette("SeqArray")` can work directly

    o correct Estimated remaining Time to Complete (ETC) for load balancing in `seqParallel()`

BUG FIXES

    o `seqBED2GDS(, verbose=FALSE)` should have no display

CHANGES

    o use an svg file instead of png in vignettes


CHANGES IN VERSION 1.24.2
-------------------------

NEW FEATURES

    o add the compiler information in `seqSystem()`

    o new arguments '.balancing', '.bl_size' and '.bl_progress' in `seqParallel()` for load balancing

UTILITIES

    o improve unix forking processes for load balancing in `seqParallel()`

BUG FIXES

    o fix `seqSummary()` when no phase data


CHANGES IN VERSION 1.24.0
-------------------------

NEW FEATURES

    o a new function `seqResetVariantID()`

    o a new option in `seqRecompress(, compress="none")` to uncompress all data

    o `seqGetData()` allows a GDS file name in the first argument


CHANGES IN VERSION 1.22.6
-------------------------

BUG FIXES

    o `seqSetFilter(, sample.id=)` fails to correctly select samples in a few cases (since SeqArray>=v1.22.0 uses the distribution of selected samples to optimize the data access of genotypes, see https://github.com/zhengxwen/SeqArray/issues/48)

    o the bgzf VCF file is truncated in `seqGDS2VCF()` since the file is not closed appropriately

    o invalid chromosomes and position in the output of `seqMerge()` when merging different samples but same variants


CHANGES IN VERSION 1.22.3
-------------------------

NEW FEATURES

    o a new option 'scenario' in `seqVCF2GDS()` and `seqBCF2GDS()`

UTILITIES

    o more information in `seqDelete()`

BUG FIXES

    o export a haploid VCF file using `seqGDS2VCF()`

    o export VCF without any FORMAT data in `seqGDS2VCF()`

    o export GDS without genotypes in `seqExport()`

    o fix parallel file writing in `seqVCF2GDS()`, when no genotype


CHANGES IN VERSION 1.22.0
-------------------------

NEW FEATURES

    o `seqSNP2GDS()` imports dosage GDS files

    o `seqVCF_Header()` allows a BCF file as an input

    o a new function `seqRecompress()`

    o a new function `seqCheck()` for checking the data integrity of a SeqArray GDS file

    o `seqGDS2SNP()` exports dosage GDS files

UTILITIES

    o avoid duplicated meta-information lines in `seqVCF2GDS()` and `seqVCF_Header()`

    o require >= R_v3.5.0, since reading from connections in text mode is buffered

    o `seqDigest()` requires the digest package

    o optimization in reading genotypes from a subset of samples (according to gdsfmt_1.17.5)

BUG FIXES

    o `seqVCF2GDS()` and `seqVCF_Header()` are able to import site-only VCF files (i.e., VCF with no sample)

    o fix `seqVCF2GDS()` and `seqBCF2GDS()` since reading from connections in text mode is buffered in R >= v3.5.0


CHANGES IN VERSION 1.20.1
-------------------------

BUG FIXES

    o `seqExport()` fails to export haploid data (e.g., Y chromosome)

    o `seqVCF2GDS()` fails to convert INFO variables when Number="R"


CHANGES IN VERSION 1.20.0
-------------------------

NEW FEATURES

    o
      `seqGDS2VCF()` outputs a bgzip vcf file for tabix indexing

    o two more options "Ultra" and "UltraMax" in `seqStorageOption()`

    o '@chrom_rle_val' and '@chrom_rle_len' are added to a GDS file for faster chromosome indexing

    o new function `seqBCF2GDS()` (requiring the software bcftools)

    o new function `seqSetFilterPos()`

    o new variable "$dosage_alt" in `seqGetData()` and `seqApply()`

    o import VCF files with no GT in `seqVCF2GDS()`

UTILITIES

    o `seqDigest(f, "annotation/filter")` works on a factor variable

    o improve the computational efficiency of `seqMerge()` to avoid genotype recompression by padding the 2-bit genotype array in bytes

    o significantly improve `seqBlockApply()` (its speed is close to `seqApply()`)

    o reduce the overhead in `seqSetFilter(, variant.sel=...)`


CHANGES IN VERSION 1.18.2
-------------------------

BUG FIXES

    o fix an issue: `seqSetFilterChrom()` extends a genomic range upstream and downstream 1bp

    o use `.onLoad()` instead of `.onAttach()` to fix https://support.bioconductor.org/p/104405/#104443


CHANGES IN VERSION 1.18.0
-------------------------

NEW FEATURES

    o progress information: showing overall running time when completed

    o new variable names "$ref" and "$alt" can be used in `seqGetData()` and `seqBlockApply()`

    o new argument '.progress' in `seqDigest()`

    o new argument 'ref.allele' in `seqAlleleCount()`

    o new variable name "$chrom_pos_allele" can be used in `seqGetData()` and `seqBlockApply()`

UTILITIES

    o move VariantAnnotation to the suggest field from the import field

    o remove an unused argument '.list_dup' in `seqBlockApply()`

    o slightly improve the computational efficiency of `seqAlleleFreq()` and `seqAlleleCount()` when 'ref.allele=0'

    o `seqGetData(f, "$chrom_pos")` outputs characters with the format 'chromosome:position' instead of 'chromosome_position'

BUG FIXES

    o fix the unexpected behaviors in `seqSetFilter(, action="push")` and `seqSetFilter(, action="push+intersect")`

    o fix a bug in `seqGetData(f, "$dosage")` when the number of
      unique alleles at a site is greater than 3 (https://github.com/zhengxwen/SeqArray/issues/21)

    o fix a bug in `seqSNP2GDS()` for inverted genotypes during importing data from SNP GDS files (https://github.com/zhengxwen/SeqArray/issues/22)

    o fix an issue of no phase data in `seqExport()`


CHANGES IN VERSION 1.16.0
-------------------------

    o a new argument 'intersect' in `seqSetFilter()` and `seqSetFilterChrom()`

    o a new function `seqSetFilterCond()`

    o `seqVCF2GDS()` allows arbitrary numbers of different alleles if REF and ALT in VCF are missing

    o optimize internal indexing for FORMAT annotations to avoid reloading the indexing from the GDS file

    o a new CITATION file

    o 'LZMA_RA' is the default compression method in `seqBED2GDS()` and `seqSNP2GDS()`

    o `seqVCF_Header()` correctly calculates ploidy with missing genotypes


CHANGES IN VERSION 1.14.1
-------------------------

    o the default compression setting in `seqVCF2GDS()` and `seqMerge()` is changed from "ZIP_RA" to "LZMA_RA"

    o `seqVCF2GDS()`: a variable-length encoding method is used to store integers in the FORMAT field of VCF files to reduce the file size and compression time


CHANGES IN VERSION 1.12.9
-------------------------

    o the version number was bumped for the Bioconductor release version 3.3

    o `seqVCF_SampID()`, `seqVCF_Header()` and `seqVCF2GDS()` allow a connection object instead of a file name

    o "$num_allele" is allowed in `seqGetData()` and `seqApply()` (the numbers of distinct alleles)

    o a new option '.progress' in `seqAlleleFreq()`, `seqMissing()` and `seqAlleleCount()`

    o 'as.is' can be a `gdsn.class` object in `seqApply()`

    o v1.12.7: a new argument 'parallel' in `seqApply()`, BiocParallel integration in `seqParallel()` and a new function `seqBlockApply()`

    o v1.12.8: a new function `seqGetParallel()`


CHANGES IN VERSION 1.12.0
-------------------------

    o utilizes the official C API `R_GetConnection()` to accelerate text import and export, requiring R (>=v3.3.0); an alternative version (backward compatible with
      R_v2.15.0) is also available on github (https://github.com/zhengxwen/SeqArray/releases/tag/v1.11.18)

    o ~4x speedup in the sequential version of `seqVCF2GDS()`, and `seqVCF2GDS()` can run in parallel

    o variables in "annotation/format/" should be two-dimensional, as mentioned in the vignette

    o rewrite `seqSummary()`

    o a new vignette file with Rmarkdown format (replacing SeqArray-JSM2013.pdf)

    o bug fix in `seqBED2GDS()` if the total number of genotypes > 2^31 (integer overflow)

    o bug fixes in `seqMerge()` if chromosome and positions are not unique

    o `seqStorage.Option()` is renamed to `seqStorageOption()`

    o new function `seqDigest()`

    o `seqVCF.Header()` is renamed to `seqVCF_Header()`, `seqVCF.SampID()` is renamed to `seqVCF_SampID()`

    o `seqSetFilter()`: 'samp.sel' is deprecated since v1.11.12, please use 'sample.sel' instead

    o accelerate reading genotypes with SSE2 (+13%) and AVX2 (+23%)

    o new function `seqSystem()`

    o allow "$dosage" in `seqGetData()` and `seqApply()` for the dosages of reference allele

    o accelerate `seqSetFilterChrom()` and allow a selection with multiple regions

    o new methods `\S4method{seqSetFilter}{SeqVarGDSClass, GRanges}()` and `\S4method{seqSetFilter}{SeqVarGDSClass, GRangesList}()`

    o 'as.is' in `seqApply()` allows a 'connection' object (created by file, gzfile, etc)

    o `seqSummary(f, "genotype")$seldim` returns a vector with 3 integers (ploidy, # of selected samples, # of selected variants) instead of 2 integers


CHANGES IN VERSION 1.10.6
-------------------------

    o fix a memory issue in `seqAlleleFreq()` when 'ref.allele' is a vector

    o `seqSetFilter()` allows numeric vectors in 'samp.sel' and 'variant.sel'

    o `seqSummary()` returns ploidy and reference

    o `seqStorage.Option()` controls the compression level of FORMAT/DATA

    o `seqVCF2GDS()` allows extracting part of a VCF file via 'start' and 'count'

    o `seqMerge()` combines multiple GDS files with the same samples

    o export methods for compatibility with VariantAnnotation

    o a new argument '.useraw' in
      `seqGetFilter()`

    o a new argument 'allow.duplicate' in `seqOpen()`

    o fix a bug in `seqParallel()` (https://github.com/zhengxwen/SeqArray/issues/11) and optimize its performance

    o 'gdsfile' could be NULL in `seqParallel()`


CHANGES IN VERSION 1.10.0
-------------------------

    o a new function `seqGDS2SNP()`

    o supported by the SNPRelate package

    o support `seqApply(..., margin="by.sample")`

    o new functions `seqOptimize()`, `seqMissing()`, `seqAlleleFreq()`, `seqNumAllele()` and `seqSetFilterChrom()`

    o "intersection" and "push+intersection" in `seqSetFilter()`

    o parallel implementation in `seqNumAllele()`, `seqMissing()` and `seqAlleleFreq()`

    o a new function `seqExport()`

    o new argument ".useraw" in `seqApply()`

    o fix a bug for duplicated "variant.id", https://github.com/zhengxwen/SeqArray/issues/7

    o fix an issue of `seqVCF2GDS()` when there are duplicated format or info ID

    o improve access speed (+50%, benchmark on calling `seqApply(..., FUN=function(x) {})`)

    o new functions `seqSNP2GDS()`, `seqBED2GDS()`, `seqAlleleCount()` and `seqResetFilter()`

    o `seqCompress.Option()` is renamed to `seqStorage.Option()`

    o "ZIP_RA" is the default value in `seqStorageOption()` and other functions instead of "ZIP_RA.max"

    o `seqSetFilter()` becomes an S4 method


CHANGES IN VERSION 1.8.0
-------------------------

    o bug fix in getting genotypes if position > 2^31

    o add an option 'ignore.chr.prefix' to the function `seqVCF2GDS()`

    o `seqVCF2GDS()` ignores the INFO or FORMAT variables if they are not defined ahead

    o a new action 'push+set' in the function `seqSetFilter()`

    o bug fix if 'requireNamespace("SeqArray")' is called from other packages


CHANGES IN VERSION 1.6.0
-------------------------

    o fix a bug in `seqVCF2GDS()` when the values in the FILTER column are all missing

    o enhance `seqVCF.Header()`

    o support the LinkingTo mechanism

    o fix the error in haploid genotypes (Y chromosome)


CHANGES IN VERSION 1.4.0
-------------------------

    o update according to the new version of
      VariantAnnotation

    o update test codes to avoid the conflict

    o bumped version as all packages that depend on Rcpp must be rebuilt

    o update to the new biocViews in the DESCRIPTION file


CHANGES IN VERSION 1.2.0
-------------------------

    o add a new argument "action" to the function `seqSetFilter()`

    o add a new function 'seqInfoNewVar' which allows adding new variables to the INFO fields

    o minor bug fix in asVCF

    o update man page "SeqVarGDSClass-class.Rd" with new methods

    o in DESCRIPTION, BiocGenerics listed in "Suggests" instead of "Imports" as suggested by R CMD check

    o bug fix in seqDelete

    o revise the function 'seqTranspose' according to the update of gdsfmt (v1.0.0)

    o revise the argument 'var.index' in the function `seqApply()`

    o basic support for 'GRanges' and 'DNAStringSetList'

    o added methods 'qual', 'filt', 'asVCF'

    o 'granges' method uses length of reference allele to set width

    o minor bug fix to avoid `seqGetData()` crashing when no value returned from a variable-length variable

    o update documents


CHANGES IN VERSION 1.0.0
-------------------------

    o the version number was bumped for the Bioconductor release version


CHANGES IN VERSION 0.99.0
-------------------------

    o initial Bioconductor package submission