TY - JOUR
T1 - Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants
AU - The Human Genome Structural Variation Consortium
AU - Lin, Jiadong
AU - Yang, Xiaofei
AU - Kosters, Walter
AU - Xu, Tun
AU - Jia, Yanyan
AU - Wang, Songbo
AU - Zhu, Qihui
AU - Ryan, Mallory
AU - Guo, Li
AU - Zhang, Chengsheng
AU - Gerstein, Mark B.
AU - Sanders, Ashley D.
AU - Zody, Michael C.
AU - Talkowski, Michael E.
AU - Mills, Ryan E.
AU - Korbel, Jan O.
AU - Marschall, Tobias
AU - Ebert, Peter
AU - Audano, Peter A.
AU - Rodriguez-Martin, Bernardo
AU - Porubsky, David
AU - Bonder, Marc Jan
AU - Sulovari, Arvis
AU - Ebler, Jana
AU - Zhou, Weichen
AU - Serra Mari, Rebecca
AU - Yilmaz, Feyza
AU - Zhao, Xuefang
AU - Hsieh, Ping Hsun
AU - Lee, Joyce
AU - Kumar, Sushant
AU - Rausch, Tobias
AU - Chen, Yu
AU - Chong, Zechen
AU - Munson, Katherine M.
AU - Chaisson, Mark J.P.
AU - Chen, Junjie
AU - Shi, Xinghua
AU - Wenger, Aaron M.
AU - Harvey, William T.
AU - Hasenfeld, Patrick
AU - Regier, Allison
AU - Hall, Ira M.
AU - Flicek, Paul
AU - Hastie, Alex R.
AU - Fairley, Susan
AU - Lee, Charles
AU - Devine, Scott E.
AU - Eichler, Evan E.
AU - Ye, Kai
N1 - Publisher Copyright:
© 2022 The Authors
PY - 2022/2
Y1 - 2022/2
AB - Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs, and proposed Mako, utilizing a bottom-up guided model-free strategy, to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections, and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data based on experimental and computational validations as well as manual inspections are around 70%, where the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types of adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
KW - Complex structural variant
KW - Formation mechanism
KW - Graph mining
KW - Next-generation sequencing
KW - Pattern growth
UR - https://www.scopus.com/pages/publications/85129631502
U2 - 10.1016/j.gpb.2021.03.007
DO - 10.1016/j.gpb.2021.03.007
M3 - Article
C2 - 34224879
AN - SCOPUS:85129631502
SN - 1672-0229
VL - 20
SP - 205
EP - 218
JO - Genomics, proteomics & bioinformatics / Beijing Genomics Institute
JF - Genomics, proteomics & bioinformatics / Beijing Genomics Institute
IS - 1
ER -