GSDcreator: An Efficient and Comprehensive Simulator for Genarating NGS Data with Population Genetic Information

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

In recent decades, next-generation sequencing (NGS) data analysis has become a major research field in bioinformatics, with great advantages in many application scenarios. Many algorithms and software tools have been designed for analyzing NGS data, and simulated datasets are urgently needed for testing this software and optimizing its parameter configurations. A series of NGS data simulators has therefore been published. However, the existing simulators cannot satisfy the requirements of many specific scenarios. First, they do not support many newly discovered variations. Second, complex structural variations are difficult to generate. Third, as population data grows, support for simulating population information is urgently needed. In this paper, we propose GSDcreator, a comprehensive NGS simulator that overcomes the three weaknesses mentioned above. It can produce all known types of variation, with complex variations also supported. Furthermore, it can capture many important features of real data, including population polymorphism, insert size distribution, adjacent site depth distribution, overall depth distribution, quality score distribution, amplification bias, and sequencing errors. Notably, GSDcreator takes the 1000 Genomes Project database as a reference and integrates population genetic information to simulate population polymorphism. To test its performance, we conducted extensive experiments and found that the simulated data produced by GSDcreator closely mimic real sequencing data.

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Editors: Illhoi Yoo, Jinbo Bi, Xiaohua Tony Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1868-1875
Number of pages: 8
ISBN (Electronic): 9781728118673
State: Published - Nov 2019
Event: 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 - San Diego, United States
Duration: 18 Nov 2019 - 21 Nov 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019

Conference

Conference: 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Country/Territory: United States
City: San Diego
Period: 18/11/19 - 21/11/19

Keywords

  • data simulator
  • next-generation sequencing data analysis
  • population genomics
  • population information
