Kids First Data Resource Center
Gabriella Miller Kids First Vision: Alleviate suffering from childhood cancer and structural birth defects by fostering collaborative research to uncover the etiology of these diseases and supporting data sharing within the pediatric research community.
NIH Common Fund’s Gabriella Miller Kids First Pediatric Research Program, administered by NHLBI (PO: Charlene Schramm) and overseen by the NIH Kids First Working Group.
Adam Resnick, Brandi Davis-Dusenbery, Vincent Ferretti, Robert Grossman, Allison Heath, Deanne Taylor, Sam Volchenboum.
Children’s Hospital of Philadelphia, CHU Sainte-Justine, University of Chicago, Seven Bridges Genomics.
Kids First Released Datasets (22K+ Participants):
- 21 Kids First datasets released as of 9/14 (more in coming weeks)
- 14 Structural Birth Defects datasets spanning congenital diaphragmatic hernia, congenital heart defects, orofacial cleft, other craniofacial anomalies, disorders of sex development etc.
- 7 Childhood Cancer datasets spanning neuroblastoma, Ewing sarcoma, leukemia, enchondromatoses, etc.
- 1 cohort of children with both structural birth defects and cancer
- 5 fully interoperable datasets (e.g., TARGET, CBTN)
- 44 datasets
- 48,000 genomes
- 20,000 cases approved for sequencing as part of Kids First
Interoperability Use Cases
A number of researchers will be working across datasets on multiple platforms. Because pediatric and developmental biology research spans organs, diseases, and data types, effective and sustained interoperability is a critical need.
Examples of overlapping cohorts:
- INCLUDE: KFDRC and INCLUDE Data Hub
- PCGC: KFDRC and BDC
- CMG: KFDRC and AnVIL
- CSER: KFDRC and AnVIL
- NBL: KFDRC and NCI CRDC
- AML: KFDRC and NCI CRDC
- Brain Tumors/CCDI: KFDRC and NCI CRDC
Early on, we realized we needed “intra”-operability across our own releases and developed a release coordination protocol. Release coordination may be an area for interoperability as well.
Clinical/Phenotypic Data Interop Pilot Opportunities
Clinical and phenotypic data have been identified as a limiting factor across supported studies. Their wide range poses challenges but also creates opportunities for interoperability. Areas under investigation include:
- FHIR as a core standard
- Terminologies for harmonized and computable data values
- Identifying virtual cohorts across the platforms based on clinical and phenotypic data
- Workflows for clinical/phenotypic data harmonization
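To illustrate how FHIR and a shared terminology could support identifying virtual cohorts, the sketch below represents phenotypes as minimal FHIR `Condition` resources coded with Human Phenotype Ontology (HPO) terms and collects matching participants from two hypothetical platform exports. The platform names, participant IDs, and the HPO code system URI are assumptions for illustration only; this is not a Kids First API or dataset.

```python
# Assumed code system URI for HPO terms; confirm against the FHIR implementation guide in use.
HPO_SYSTEM = "http://purl.obolibrary.org/obo/hp.owl"


def condition(patient_id, hpo_code, display):
    """Build a minimal FHIR R4 Condition resource as a plain dict."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {"coding": [{"system": HPO_SYSTEM, "code": hpo_code, "display": display}]},
    }


def patients_with_code(resources, system, code):
    """Return the patient references whose Condition carries the given coding."""
    matches = set()
    for res in resources:
        if res.get("resourceType") != "Condition":
            continue
        for coding in res.get("code", {}).get("coding", []):
            if coding.get("system") == system and coding.get("code") == code:
                matches.add(res["subject"]["reference"])
    return matches


def virtual_cohort(platform_exports, system, code):
    """Map each platform to its participants matching one harmonized code."""
    return {name: patients_with_code(res, system, code)
            for name, res in platform_exports.items()}


# Hypothetical exports from two platforms (made-up participants).
exports = {
    "platform_a": [
        condition("A1", "HP:0000175", "Cleft palate"),
        condition("A2", "HP:0001631", "Atrial septal defect"),
    ],
    "platform_b": [
        condition("B7", "HP:0000175", "Cleft palate"),
    ],
}

cohort = virtual_cohort(exports, HPO_SYSTEM, "HP:0000175")
```

Because both platforms use the same code system and code, a single query definition finds matching participants on each platform without per-platform value mapping; that shared coding is what harmonized terminologies would provide.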
Kids First DRC Workflows
- Goal - functional equivalency to other large datasets/resources
- Germline - Trio/family-based GATK germline best practices
- Somatic - Strelka2, Mutect2, Lancet, VarDict, Manta, Control-FREEC
- RNA-Seq - STAR 2-pass, RSEM, Kallisto, STAR-fusion, Arriba
- Data analysis and sharing platform
- Over 450 public apps
- Interactive analysis with Jupyter Notebooks and RStudio
Variant Search
- Explore annotated Kids First variants across datasets in your browser
Variant Workbench (Coming Soon!)
- Query Kids First data within a Zeppelin Notebook environment
- Identify rare variants using SQL and PySpark
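As a rough illustration of identifying rare variants with SQL, the sketch below uses Python's built-in `sqlite3` as a stand-in for Spark, with a hypothetical `variants` table and an assumed population allele-frequency column (named `gnomad_af` here for illustration). The schema, column names, and the 1% rarity threshold are assumptions, not the Variant Workbench's actual data model. In PySpark, a comparable query string could be passed to `spark.sql(...)`.

```python
import sqlite3

# In-memory database standing in for the Workbench's variant tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE variants (
        chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
        participant_id TEXT, gnomad_af REAL
    )
""")
rows = [
    ("chr1", 1000, "A", "G", "P1", 0.25),    # common variant
    ("chr1", 2000, "C", "T", "P1", 0.0004),  # rare variant, two carriers
    ("chr1", 2000, "C", "T", "P2", 0.0004),
    ("chr2", 3000, "G", "A", "P3", None),    # absent from the population database
]
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)", rows)

RARE_AF = 0.01  # assumed rarity cutoff

# Keep variants below the cutoff (or with no population frequency) and count carriers.
query = """
    SELECT chrom, pos, ref, alt, COUNT(DISTINCT participant_id) AS carriers
    FROM variants
    WHERE gnomad_af IS NULL OR gnomad_af < ?
    GROUP BY chrom, pos, ref, alt
    ORDER BY chrom, pos
"""
rare = conn.execute(query, (RARE_AF,)).fetchall()
```

Treating a missing frequency as rare is a design choice; some analyses would instead exclude those variants or flag them separately.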
Integration with PedCBioPortal
- Open access visualization and analysis of somatic mutations and expression
- Controlled access - eRA Commons
- Registered access - Google, Facebook and ORCiD (OAuth2)
- NIH Datasets - Telemetry reports from dbGaP either to Kids First Framework Services or NCI CRDC Framework Services
- Consortium Datasets - Whitelist via Gen3 (also eRA Commons AuthN)
The Kids First Data Service integrates with Gen3 indexd to index files and associate them with the relevant data for querying. Cavatica also recently released GA4GH DRS endpoints for all files.
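Because GA4GH DRS endpoints follow a common specification, a client can translate a hostname-based `drs://` URI into the spec's `/ga4gh/drs/v1/objects/{object_id}` HTTPS path and then choose a download URL from the returned object's `access_methods`. The sketch below covers only that URI translation and response parsing, using a made-up hostname, object ID, and response body. It makes no network calls, does not handle compact-identifier DRS URIs, and is not Cavatica's client code.

```python
import json
from urllib.parse import urlparse


def drs_uri_to_url(drs_uri):
    """Translate a hostname-based drs:// URI into its DRS v1 HTTPS object URL."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri}")
    object_id = parsed.path.lstrip("/")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"


def pick_access_url(drs_object, preferred=("https", "s3", "gs")):
    """Return (type, url) for the first preferred access method with an inline access_url.

    Methods that only provide an access_id would need a further request to
    .../objects/{object_id}/access/{access_id}; this sketch skips them.
    """
    methods = drs_object.get("access_methods", [])
    for wanted in preferred:
        for method in methods:
            if method.get("type") == wanted and "access_url" in method:
                return wanted, method["access_url"]["url"]
    return None


# Made-up DrsObject response body (hypothetical host and object ID).
sample = json.loads("""
{
  "id": "abc-123",
  "self_uri": "drs://drs.example.org/abc-123",
  "size": 1048576,
  "created_time": "2021-01-01T00:00:00Z",
  "checksums": [{"type": "md5", "checksum": "d41d8cd98f00b204e9800998ecf8427e"}],
  "access_methods": [
    {"type": "s3", "access_id": "s3-us-east-1"},
    {"type": "https", "access_url": {"url": "https://files.example.org/abc-123.cram"}}
  ]
}
""")

url = drs_uri_to_url(sample["self_uri"])
method = pick_access_url(sample)
```

In practice the object request would also carry whatever authorization the platform requires for controlled-access files.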