TY - JOUR
T1 - Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet
AU - Jia, Peng
AU - Dong, Lianhua
AU - Yang, Xiaofei
AU - Wang, Bo
AU - Bush, Stephen J.
AU - Wang, Tingjie
AU - Lin, Jiadong
AU - Wang, Songbo
AU - Zhao, Xixi
AU - Xu, Tun
AU - Che, Yizhuo
AU - Dang, Ningxin
AU - Ren, Luyao
AU - Zhang, Yujing
AU - Wang, Xia
AU - Liang, Fan
AU - Wang, Yang
AU - Ruan, Jue
AU - Xia, Han
AU - Zheng, Yuanting
AU - Shi, Leming
AU - Lv, Yi
AU - Wang, Jing
AU - Ye, Kai
N1 - Publisher Copyright:
© 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
AB - Background: Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short- and long-read sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technologies). Results: The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent–child genetic map. For each haplotype, we also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding those of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9,726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations that are shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity, including those located at long repeat regions, complex structural variants, and de novo mutations, are systematically examined in this study. Conclusions: In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples which, relative to existing benchmarks, offers expanded genomic coverage and insight into complex variant categories.
UR - https://www.scopus.com/pages/publications/85178462502
U2 - 10.1186/s13059-023-03116-3
DO - 10.1186/s13059-023-03116-3
M3 - Article
C2 - 38049885
AN - SCOPUS:85178462502
SN - 1474-7596
VL - 24
JO - Genome Biology
JF - Genome Biology
IS - 1
M1 - 277
ER -