
Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants

  • The Human Genome Structural Variation Consortium
  • Xi'an Jiaotong University
  • The First Affiliated Hospital of Xi’an Jiaotong University
  • Leiden University
  • Jackson Laboratory
  • Yale University
  • European Molecular Biology Laboratory
  • New York Genome Center
  • Harvard University
  • University of Michigan, Ann Arbor
  • Heinrich Heine University Düsseldorf
  • University of Washington
  • German Cancer Research Center
  • Bionano Genomics Inc.
  • University of Alabama at Birmingham
  • University of Southern California
  • Temple University
  • Washington University St. Louis
  • University of Maryland, Baltimore

Research output: Contribution to journal, Article, peer-review

6 Scopus citations

Abstract

Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered the simultaneous occurrence of simple structural variants. However, the compounded mutational signals of CSVs are difficult to detect with the commonly used model-match strategy. As a result, CSV discovery has made limited progress compared with the discovery of simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided, model-free strategy to detect CSVs from paired-end short-read sequencing data. Specifically, we implemented a graph-based pattern growth approach, in which the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, the validation rate of CSVs detected in real data, based on experimental and computational validation as well as manual inspection, is around 70%, and the median experimental and computational breakpoint shifts are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of each CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on CSV formation. Mako is publicly available at https://github.com/xjtu-omics/Mako.
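To make the idea of "a graph of potential breakpoint connections plus pattern growth" concrete, here is a minimal sketch. It is not Mako's implementation (see the GitHub repository for that). It assumes that each node is a candidate breakpoint, that each edge is a potential connection weighted by a hypothetical count of supporting read pairs, and that a pattern grows greedily from a seed by adding the best-supported neighbouring breakpoint until no supported edge remains or a size cap is reached. All function names, the `min_support` threshold, and the `max_size` cap are illustrative assumptions. Patterns with more than two breakpoints are reported as CSV candidates, following the definition above.

```python
from collections import defaultdict


def build_graph(links, min_support):
    """Build an undirected breakpoint-connection graph (hypothetical model).

    links: iterable of (breakpoint_a, breakpoint_b, support) tuples, where
    support is an assumed count of reads linking the two breakpoints.
    Edges with support below min_support are discarded.
    """
    graph = defaultdict(dict)
    for a, b, support in links:
        if support >= min_support:
            graph[a][b] = support
            graph[b][a] = support
    return graph


def grow_pattern(graph, seed, max_size):
    """Greedily grow a pattern from a seed breakpoint.

    At each step, add the neighbouring breakpoint (not yet in the pattern)
    joined by the best-supported edge; stop when no edge extends the pattern
    or when the pattern reaches max_size breakpoints.
    """
    pattern = {seed}
    while len(pattern) < max_size:
        frontier = [
            (support, nbr)
            for node in pattern
            for nbr, support in graph[node].items()
            if nbr not in pattern
        ]
        if not frontier:
            break
        _, best = max(frontier)  # highest support; ties broken by node name
        pattern.add(best)
    return frozenset(pattern)


def detect_csv_candidates(links, min_support=3, max_size=10):
    """Return deduplicated patterns with more than two breakpoints."""
    graph = build_graph(links, min_support)
    patterns = {grow_pattern(graph, seed, max_size) for seed in list(graph)}
    return sorted(tuple(sorted(p)) for p in patterns if len(p) > 2)


if __name__ == "__main__":
    # Toy input: bp3-bp4 has too little support, and bp5-bp6 is a simple
    # two-breakpoint event, so only bp1-bp2-bp3 qualifies as a CSV candidate.
    links = [("bp1", "bp2", 8), ("bp2", "bp3", 6), ("bp3", "bp4", 1), ("bp5", "bp6", 5)]
    print(detect_csv_candidates(links))
```

Running the toy example prints `[('bp1', 'bp2', 'bp3')]`. Greedy growth with a size cap keeps the sketch short; a real detector would also need to score patterns and resolve overlapping ones, which this sketch does not attempt.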

Original language: English
Pages (from-to): 205-218
Number of pages: 14
Journal: Genomics, proteomics & bioinformatics / Beijing Genomics Institute
Volume: 20
Issue number: 1
State: Published - Feb 2022

Keywords

  • Complex structural variant
  • Formation mechanism
  • Graph mining
  • Next-generation sequencing
  • Pattern growth
