This vignette shows how to process long-read PacBio HiFi variant calls from a validated trio (HG002–HG003–HG004) and prepare them for UPDhmm analysis.
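The data come from the Genome in a Bottle (GIAB, NIST) Ashkenazi trio: PacBio HiFi Revio reads with DeepVariant variant calls on GRCh38. Download each trio member's phased VCF together with its tabix index (`.tbi`):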
```shell
# Proband (HG002)
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG002.GRCh38.deepvariant.phased.vcf.gz
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG002.GRCh38.deepvariant.phased.vcf.gz.tbi
# Father (HG003)
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG003.GRCh38.deepvariant.phased.vcf.gz
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG003.GRCh38.deepvariant.phased.vcf.gz.tbi
# Mother (HG004)
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG004.GRCh38.deepvariant.phased.vcf.gz
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG004.GRCh38.deepvariant.phased.vcf.gz.tbi
```

The following filtering steps are then applied to the trio's calls:
- Keep only biallelic variants.
- Remove sites where all trio members are homozygous reference (`0/0` or `0|0`).
- Remove sites where all trio members are missing (`./.` or `.|.`).
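The three filters above can be sketched in plain shell and awk to make each rule explicit. This is a minimal illustration, not necessarily the preprocessing used by the UPDhmm authors: it assumes the three per-sample VCFs have already been merged into one uncompressed VCF with one genotype column per trio member, and the function name `filter_trio_vcf` is hypothetical. It relies on GT being the first FORMAT key, which the VCF specification requires whenever GT is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UPDhmm): apply the three trio filters to an
# uncompressed, already-merged VCF read from stdin; filtered VCF goes to stdout.
# Assumption: sample columns start at field 10 and GT is the first FORMAT key.
filter_trio_vcf() {
  awk -F'\t' '
    # Pass meta-information lines and the #CHROM header through unchanged.
    /^#/ { print; next }
    {
      # Filter 1: keep only biallelic sites (exactly one ALT allele).
      if ($5 == "." || index($5, ",") > 0) next

      n_samples = NF - 9
      n_ref = 0
      n_miss = 0
      for (i = 10; i <= NF; i++) {
        split($i, fields, ":")   # fields[1] is the GT sub-field
        gt = fields[1]
        if (gt == "0/0" || gt == "0|0") n_ref++
        else if (gt == "./." || gt == ".|.") n_miss++
      }

      # Filter 2: drop sites where every trio member is homozygous reference.
      if (n_ref == n_samples) next
      # Filter 3: drop sites where every trio member is missing.
      if (n_miss == n_samples) next

      print
    }'
}
```

For example, with a merged file named `trio.vcf.gz` (a hypothetical name, produced for instance with `bcftools merge -Oz -o trio.vcf.gz HG002.GRCh38.deepvariant.phased.vcf.gz HG003.GRCh38.deepvariant.phased.vcf.gz HG004.GRCh38.deepvariant.phased.vcf.gz`), one could run `bcftools view trio.vcf.gz | filter_trio_vcf | bgzip > trio.filtered.vcf.gz`. On whole-genome data, bcftools' built-in options (such as `-m2 -M2` to restrict to biallelic sites) are likely faster; the awk version trades speed for readability. Sites where some members are reference and the rest are missing are kept, because the filters as stated only remove sites where *all* members share one of those states.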
Session information:

```
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BiocStyle_2.40.0
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0
## [4] xfun_0.57 maketools_1.3.2 cachem_1.1.0
## [7] knitr_1.51 htmltools_0.5.9 rmarkdown_2.31
## [10] buildtools_1.0.0 lifecycle_1.0.5 cli_3.6.6
## [13] sass_0.4.10 jquerylib_0.1.4 compiler_4.6.0
## [16] sys_3.4.3 tools_4.6.0 evaluate_1.0.5
## [19] bslib_0.10.0 yaml_2.3.12 BiocManager_1.30.27
## [22] jsonlite_2.0.0 rlang_1.2.0
```