TY - GEN
T1 - GSDcreator
T2 - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
AU - Wang, Shenjie
AU - Wang, Jiayin
AU - Xiao, Xiao
AU - Zhang, Xuanping
AU - Wang, Xuwen
AU - Zhu, Xiaoyan
AU - Lai, Xin
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
AB - In recent decades, next-generation sequencing (NGS) data analysis has become a major research field in bioinformatics, with clear advantages in many application scenarios. Many algorithms and software tools have been designed for analyzing NGS data, and simulated datasets are urgently needed for testing this software and optimizing its parameter configurations. A series of NGS data simulators have therefore been published. However, the existing simulators cannot satisfy the requirements of many specific scenarios. First, they do not support many newly discovered variations. Second, complex structural variations are difficult to generate. Third, as population data grow, simulation of population information is urgently needed. In this paper, we propose GSDcreator, a comprehensive NGS simulator that overcomes the three weaknesses mentioned above. It can produce all known types of variation, including complex combinations of variations. Furthermore, it can capture many important features of real data, including population polymorphism, insert size distribution, adjacent site depth distribution, overall depth distribution, quality score distribution, amplification bias, and sequencing errors, among others. Notably, GSDcreator takes the 1000 Genomes Project database as a reference and integrates population genetic information to simulate population polymorphism. To test its performance, we conducted extensive experiments and found that the simulated data produced by GSDcreator closely mimic real sequencing data.
KW - data simulator
KW - next-generation sequencing data analysis
KW - population genomics
KW - population information
UR - https://www.scopus.com/pages/publications/85084332203
U2 - 10.1109/BIBM47256.2019.8983192
DO - 10.1109/BIBM47256.2019.8983192
M3 - Conference contribution
AN - SCOPUS:85084332203
T3 - Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
SP - 1868
EP - 1875
BT - Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
A2 - Yoo, Illhoi
A2 - Bi, Jinbo
A2 - Hu, Xiaohua Tony
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 November 2019 through 21 November 2019
ER -