AnVIL Hosts Release 1 of the Human Pangenome Reference Consortium Data
Posted: March 11, 2021
The AnVIL is proud to host release 1 from the Human Pangenome Reference Consortium (HPRC). The HPRC is an international effort focused on developing an inclusive collection of human reference genomes that represent human haplotype diversity and the tool ecosystem needed to fully utilize this resource.
The human pan-genome reference will be a broad, unbiased, and highly accurate representation constructed from more than 350 individuals representing highly diverse ethnic backgrounds. The success of this work demands accurate haplotype phasing and comprehensive variant discovery, which require a new generation of scalable genome sequencing production and innovative assembly technologies.
HPRC members selected samples for this study for their potential to add diversity to the current human reference. Samples were selected from the 1000 Genomes Project collection at Coriell and are broadly consented, allowing for maximum data sharing and enabling scientists to work together.
Production teams generated the data as part of NHGRI’s Human Genome Reference Program, a multi-faceted program to create a complete representation of haplotype variation and an ecosystem of tools.
HPRC seeks to work with other international teams focused on building the human pangenome reference. In keeping with the spirit of the HPRC, data is being made available through three primary data repositories – NCBI, EMBL-EBI, and DDBJ, as well as the NHGRI AnVIL.
Data for the first ~30 samples are openly available and include:
- PacBio HiFi Data
- 1000G Reads (Illumina) For Parents/Children
- Hi-C Sequencing Data (coming soon!)
- Bionano Data
- Strand-Seq Data (7 samples)
- Oxford Nanopore Data
- Hifiasm Assemblies (internal release only, not fully QC'd or corrected)
Viewing HPRC Sequencing Data in AnVIL
To view currently available sequencing data, navigate to the DATA tab of the AnVIL_HPRC workspace and view data stored in the "sample" data table. In-progress assemblies can also be found in the "sample_assembly" table. These assemblies were generated with Hifiasm v0.14.
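For readers who prefer to work with the table contents programmatically, here is a minimal sketch of reading a tab-separated export of the "sample" table in Python. It assumes the table has been downloaded as a TSV file whose first column is named "entity:sample_id". The other column names ("hifi", "nanopore"), the sample IDs, and the bucket paths are hypothetical placeholders, not the actual columns or contents of the AnVIL_HPRC workspace.

```python
import csv
import io


def load_data_table(tsv_text):
    """Parse a TSV export of a data table into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        # Assumption: the ID column is named "entity:<table>_id";
        # drop the "entity:" prefix so the key is just "<table>_id".
        rows.append({key.split(":", 1)[-1]: value for key, value in row.items()})
    return rows


# Hypothetical two-row export; every column name and value is a placeholder.
EXAMPLE_TSV = (
    "entity:sample_id\thifi\tnanopore\n"
    "SAMPLE_A\tgs://example-bucket/SAMPLE_A/hifi.bam\t\n"
    "SAMPLE_B\tgs://example-bucket/SAMPLE_B/hifi.bam\tgs://example-bucket/SAMPLE_B/ont.fastq.gz\n"
)

rows = load_data_table(EXAMPLE_TSV)

# List the samples that have a non-empty value in the (hypothetical) nanopore column.
samples_with_nanopore = [row["sample_id"] for row in rows if row["nanopore"]]
print(samples_with_nanopore)
```

To use this with a real export, replace EXAMPLE_TSV with the contents of the downloaded file and adjust the column names to match the table.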
This data release represents the consortium’s commitment to its mission despite some of the most challenging research conditions, brought on by the COVID-19 pandemic. All work was completed within the safety guidelines and mandates of each entity involved in the project.
Note that the HPRC Year 1 data is pending publication, currently scheduled for mid-2021. If you would like to publish using this data, please contact the HPRC Coordinating Center at email@example.com.
Additional efforts by the HPRC this year will focus on the creation and release of assemblies and annotation for the Year 1 genomes and data production for Year 2 samples.
About the AnVIL
The AnVIL is the National Human Genome Research Institute’s Genomic Data Science Analysis, Visualization, and Informatics Lab-Space. The AnVIL is a cloud computing environment for managing data, genomic computation, and data sharing. AnVIL is being developed through a collaboration between Johns Hopkins University and the Broad Institute.