Zabad S et al., The American Journal of Human Genetics - This episode covers Zabad et al.'s methods to scale summary-statistics-based polygenic risk score (PRS) inference to millions of variants. The authors introduce compressed LD storage, memory-efficient coordinate-ascent variational algorithms, and multi-level parallelism to cut storage, runtime, and RAM by orders of magnitude while retaining competitive prediction accuracy. Key terms: polygenic risk scores, linkage disequilibrium, variational inference, LD compression, VIPRS.
Study Highlights:
The authors design a compact LD-matrix format (compressed sparse row, or CSR, arrays stored in Zarr with quantization) that, together with algorithmic optimizations, reduces LD storage by over 50-fold. They reimplement the coordinate-ascent variational updates in C/C++ using single-precision floats, triangular-LD updates, dequantize-on-the-fly, and two layers of parallelism to cut runtime and memory use by orders of magnitude. VIPRS v0.1 can run variational Bayesian regression on 1.1M HapMap3 variants in under a minute and converges genome-wide on up to 18M variants in tens of minutes using <15 GB of RAM. The paper also analyzes spectral causes of numerical instability in LD matrices and gives practical recommendations to improve stability and prediction accuracy.
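To make the storage idea concrete, here is a minimal sketch in plain Python of scale quantization combined with upper-triangular CSR storage. It is an illustration under stated assumptions, not the VIPRS implementation: the function names, the symmetric int8 scale of 127, and the thresholding rule are all hypothetical choices made for this example.

```python
from array import array

INT8_MAX = 127  # assumed symmetric scale for signed 8-bit storage


def quantize(values, int_max=INT8_MAX):
    """Map correlations in [-1, 1] to signed 8-bit integers."""
    return array("b", (round(v * int_max) for v in values))


def dequantize(q, int_max=INT8_MAX):
    """Recover approximate correlations from their quantized form."""
    return [x / int_max for x in q]


def to_upper_csr(matrix, threshold=0.0):
    """Keep strictly-upper-triangular entries with |r| > threshold, in CSR form.

    Returns (data, indices, indptr): quantized values, their column indices,
    and the offsets at which each row starts in `data`.
    """
    data, indices, indptr = [], [], [0]
    for i, row in enumerate(matrix):
        for j in range(i + 1, len(row)):
            if abs(row[j]) > threshold:
                data.append(row[j])
                indices.append(j)
        indptr.append(len(data))
    return quantize(data), array("i", indices), array("q", indptr)


# Toy 3-variant LD matrix (hypothetical values, for illustration only)
ld = [[1.0, 0.5, 0.0],
      [0.5, 1.0, -0.25],
      [0.0, -0.25, 1.0]]
data, indices, indptr = to_upper_csr(ld)
```

In this sketch each stored correlation takes one byte instead of eight, and the diagonal and lower triangle are not stored at all because the matrix is symmetric with a unit diagonal. The rounding error per entry is at most 0.5/127.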
Conclusion:
The updated VIPRS toolkit enables fast, memory-efficient whole-genome PRS inference at biobank scale with competitive accuracy and provides storage formats and numerical safeguards to improve reproducibility and portability.
Music:
Enjoy the music based on this article at the end of the episode.
Article title:
Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms
First author:
Zabad S
Journal:
The American Journal of Human Genetics
DOI:
10.1016/j.ajhg.2025.05.002
Reference:
Zabad S., Haryan C.A., Gravel S., Misra S., Li Y. (2025). Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms. The American Journal of Human Genetics 112, 1–19. https://doi.org/10.1016/j.ajhg.2025.05.002
License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00
Official website: https://basebybase.com
On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics.
Episode link: https://basebybase.com/episodes/viprs-whole-genome-prs
QC:
This episode was checked against the original article PDF and publication metadata for the episode release published on 2025-06-07.
QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Audited the transcript's presentation of VIPRS architecture (LD storage, quantization, DQF, triangular LD), memory/performance benchmarks, parallelism, numerical stability guards, and cross-ancestry/cross-biobank findings against the original article.
- transcript topics: Polygenic risk scores and LD challenges; LD matrix compression via upper-triangular storage; CSR storage and Zarr cloud-native format; Quantization to int8/int16 and scale quantization; Dequantize-on-the-Fly (DQF) memory management; Coordinate ascent updates and OpenMP parallelism
QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 8
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0
Metadata Audited:
- article_doi
- article_title
- article_journal
- license
Factual Items Audited:
- LD matrix compression reduces storage by >50-fold; 1.4M HapMap3 variants stored in ~300 MB
- LD matrices stored in CSR format with quantization to int8/int16 (scale quantization)
- Dequantize-on-the-Fly (DQF) streams data and avoids full in-memory decompression, reducing memory usage
- Triangular LD mode reduces memory usage by about 40% compared with symmetric LD mode
- Two layers of parallelism: across chromosomes and within coordinate-ascent; ~30% total runtime reduction with 4 threads
- VIPRS v0.1 can infer 1.1M HapMap3 variants in under a minute; converges on up to 18M variants in tens of minutes using <15 GB RAM
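The triangular-LD and dequantize-on-the-fly items above can be illustrated with a small matrix-vector product. This is a hedged sketch in plain Python, not the C/C++ coordinate-ascent code from the paper: the function name, the int8 scale of 127, and the unit-diagonal assumption are choices made for this example.

```python
from array import array


def sym_matvec_from_upper(data, indices, indptr, beta, int_max=127):
    """Compute R @ beta for a symmetric LD matrix R with unit diagonal,
    given only its strictly-upper-triangular entries in quantized CSR form."""
    n = len(indptr) - 1
    out = list(beta)  # the unit diagonal contributes beta[i] to out[i]
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            r = data[k] / int_max  # dequantize this single entry on the fly
            out[i] += r * beta[j]  # entry (i, j) of the upper triangle
            out[j] += r * beta[i]  # mirrored entry (j, i), never stored
    return out


# Hypothetical quantized upper triangle for a 3-variant LD matrix
data = array("b", [64, -32])
indices = array("i", [1, 2])
indptr = array("q", [0, 1, 2, 2])
beta = [1.0, 2.0, -1.0]
result = sym_matvec_from_upper(data, indices, indptr, beta)
```

Each stored value is used twice, once for its own position and once for the mirrored one, so the full symmetric matrix is never materialized. Values are converted back to floating point one at a time, so no dequantized copy of the matrix is held in memory.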
QC result: Pass.
Base by Base with Gustavo Barra is available on multiple platforms.
