Pimplaskar A et al., The American Journal of Human Genetics - In UCLA ATLAS EHR-linked biobank analyses, inverse-probability weighting based on random forest-derived enrollment probabilities increased replication of known GWAS variants and altered PGS associations. Key terms: inclusion bias, UCLA ATLAS, inverse-probability weighting, random forest, polygenic scores.
Study Highlights:
Using the UCLA ATLAS EHR-linked biobank, the authors trained random forest classifiers on demographics, healthcare utilization, and ICD-10 features to distinguish enrolled from background patients. They converted predicted enrollment probabilities into inverse-probability weights and applied these to GWAS replication tests and PGS-PheWAS scans. The classifier achieved AUROC ≈ 0.85, and weighting increased replication of known GWAS variants by 54% while changing phenome-wide PGS association patterns. These results indicate that enrollment-driven inclusion bias can materially affect variant discovery and downstream PGS-based phenotypic associations in health-system biobanks.
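To make the weighting step above concrete, here is a minimal sketch of how predicted enrollment probabilities could be turned into inverse-probability weights. It is an illustration under stated assumptions, not the authors' pipeline: the clipping threshold, the mean-1 normalization, the function names, and the toy numbers are all hypothetical.

```python
# Illustrative sketch only: names, clipping threshold, and toy data are assumptions.

def inverse_probability_weights(probs, clip=0.01):
    """Convert predicted enrollment probabilities into weights with mean 1.

    Probabilities are clipped away from 0 and 1 so that no single
    participant receives an extreme weight (an assumed safeguard).
    """
    clipped = [min(max(p, clip), 1.0 - clip) for p in probs]
    raw = [1.0 / p for p in clipped]      # unlikely enrollees count for more
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]    # normalize so the weights average to 1


def weighted_mean(values, weights):
    """Weighted average of a per-participant quantity."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy example: four enrolled participants with hypothetical predicted
# enrollment probabilities and a binary trait.
probs = [0.9, 0.9, 0.5, 0.1]
trait = [1, 1, 0, 0]

weights = inverse_probability_weights(probs)
unweighted = sum(trait) / len(trait)
reweighted = weighted_mean(trait, weights)
```

In this toy case the participant with a 0.1 enrollment probability receives the largest weight, so the reweighted trait frequency shifts toward what under-represented patients look like. The same weights could in principle be passed to a weighted regression in an association test.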
Conclusion:
Inclusion bias in EHR-linked biobanks like UCLA ATLAS measurably affects common-variant discovery and PGS associations. Enrollment-aware inverse-probability weighting can improve replication, at the cost of a smaller effective sample size.
Music:
Enjoy the music based on this article at the end of the episode.
Article title:
Inclusion bias affects common variant discovery and replication in a health-system linked biobank
First author:
Pimplaskar A
Journal:
The American Journal of Human Genetics
DOI:
10.1016/j.ajhg.2026.02.011
Reference:
Pimplaskar A, Qiu J, Lapinska S, Tozzo V, Chiang JN, Pasaniuc B, Olde Loohuis LM. Inclusion bias affects common variant discovery and replication in a health-system linked biobank. The American Journal of Human Genetics. 2026;113:1–13. https://doi.org/10.1016/j.ajhg.2026.02.011
License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00
Official website: https://basebybase.com
On PaperCast Base by Base you’ll discover the latest in genomics, functional genomics, structural genomics, and proteomics.
Episode link: https://basebybase.com/episodes/inclusion-bias-ucla-atlas
QC:
This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-03-14.
QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Audited the transcript sections describing enrollment-bias methodology (random forest classifier, inverse-probability weighting), key numeric results (AUROC/AUPRC, enrollment counts, ORs), GWAS replication improvements, and PGS-PheWAS outcomes, plus implications and limitations.
- transcript topics: Enrollment bias in UCLA ATLAS biobank; Random forest classifier for enrollment prediction; Inverse-probability weighting and normalization; Effective sample size and trade-offs; GWAS variant replication under weighting; Variant-level associations and ancestry effects
QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 8
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0
Metadata Audited:
- article_doi
- article_title
- article_journal
- license
Factual Items Audited:
- Enrollment in ATLAS: background population ~1.57 million; enrolled ~104,516
- Primary care at UCLA strongly predicts enrollment: ~70.2% enrolled vs ~21.8% unenrolled; OR ≈ 8.44
- Enrolled individuals have higher healthcare utilization: ~12.8 visits/year vs ~6.7
- RF model discriminates enrollment with AUROC ≈ 0.85 and AUPRC ≈ 0.82
- Inverse-probability weighting reduces effective sample size to ≈11,319.9 (4.3× reduction; from ≈48k)
- Weighting increases replication of known GWAS variants by ≈54%
QC result: Pass.
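Two of the audited quantities above can be illustrated with short calculations. The sketch below recomputes an odds ratio from the rounded primary-care percentages and shows Kish's formula for effective sample size on toy weights. Whether the article uses exactly this effective-sample-size definition is an assumption, and the function names and toy weights are hypothetical.

```python
# Illustrative sketch only: helper names and toy weights are assumptions.

def odds_ratio(p_group_a, p_group_b):
    """Odds ratio comparing the odds of an exposure in two groups."""
    odds_a = p_group_a / (1.0 - p_group_a)
    odds_b = p_group_b / (1.0 - p_group_b)
    return odds_a / odds_b


def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Primary care at UCLA: ~70.2% of enrolled vs ~21.8% of unenrolled patients.
# The inputs are rounded, so the result only approximates the reported OR.
or_primary_care = odds_ratio(0.702, 0.218)

# Equal weights keep the full sample size; unequal weights shrink it.
ess_equal = kish_effective_sample_size([1.0, 1.0, 1.0, 1.0])
ess_skewed = kish_effective_sample_size([1.0, 1.0, 1.0, 5.0])
```

With the rounded percentages the odds ratio comes out near 8.45, in line with the reported value of about 8.44. The toy weights show why weighting trades sample size for representativeness: the more unequal the weights, the smaller the effective sample.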
