A crowdsourcing method for correcting sequencing errors for the third-generation sequencing data

  • Yu Geng
  • Zhongmeng Zhao
  • Zhaofang Du
  • Yixuan Wang
  • Tian Zheng
  • Siyu He
  • Xuanping Zhang
  • Jiayin Wang
  • Xi'an Jiaotong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Third-generation sequencing data offer a major advantage in read length, which greatly benefits genomic analyses. However, their error models differ from those of second-generation data. Correcting sequencing errors is recommended, since it can significantly reduce false positives in downstream analyses. Existing error correction approaches often lose accuracy when the hybrid reads are diverse or the coverage varies. In this paper, we propose a novel method based on a crowdsourcing strategy, implemented as CLTC. CLTC is likewise a hybrid correction algorithm, and it consists of four steps. First, the second-generation reads are collected and mapped to the third-generation reads. Then, a base difficulty level is defined to describe the diversity at a base among the group of second-generation reads that cover it. Next, a capability is evaluated for each second-generation read, taking into account the base difficulty levels across the read, the consistency among overlapping reads, and the mapping quality between the second- and third-generation reads; a heuristic algorithm is designed to calculate these capabilities. Finally, an expectation-maximization algorithm computes the corrected result for each base pair. We test CLTC on different datasets and compare it with existing approaches. The results demonstrate that CLTC achieves higher accuracy and runs faster than the existing approaches.
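
The crowdsourcing idea in the abstract can be illustrated with a generic expectation-maximization consensus, in which each short read acts as a "worker" voting on the true base at each long-read position. The sketch below is not CLTC: it is a simplified one-parameter-per-read model that omits base difficulty levels and mapping quality, and the function name `em_consensus`, its parameters, and the `(position, read_id, base)` input layout are all assumptions made for illustration.

```python
from collections import defaultdict

BASES = "ACGT"


def em_consensus(votes, n_iter=20, init_acc=0.8):
    """Illustrative EM consensus (hypothetical helper, not the CLTC algorithm).

    votes: iterable of (position, read_id, base) tuples, one per short-read
           base aligned to a long-read position; base must be in "ACGT".
    Returns (calls, acc): calls maps position -> most probable base,
    acc maps read_id -> estimated probability that the read votes correctly.
    """
    votes = list(votes)
    by_pos = defaultdict(list)
    for pos, read, base in votes:
        by_pos[pos].append((read, base))

    # Every read starts with the same assumed accuracy ("capability").
    acc = {read: init_acc for _, read, _ in votes}
    post = {}

    for _ in range(n_iter):
        # E-step: posterior over the true base at each position,
        # assuming errors are spread evenly over the three other bases.
        for pos, obs in by_pos.items():
            weights = {}
            for candidate in BASES:
                w = 1.0
                for read, base in obs:
                    a = acc[read]
                    w *= a if base == candidate else (1.0 - a) / 3.0
                weights[candidate] = w
            total = sum(weights.values())
            post[pos] = {b: w / total for b, w in weights.items()}

        # M-step: a read's accuracy is its expected fraction of correct votes.
        hit = defaultdict(float)
        count = defaultdict(int)
        for pos, obs in by_pos.items():
            for read, base in obs:
                hit[read] += post[pos][base]
                count[read] += 1
        for read in acc:
            # Clamp so that no read is treated as perfect or useless.
            acc[read] = min(max(hit[read] / count[read], 0.01), 0.99)

    calls = {pos: max(p, key=p.get) for pos, p in post.items()}
    return calls, acc


# Toy input: reads r1 and r2 agree everywhere; r3 disagrees at positions 0 and 2.
votes = [
    (0, "r1", "A"), (0, "r2", "A"), (0, "r3", "C"),
    (1, "r1", "G"), (1, "r2", "G"), (1, "r3", "G"),
    (2, "r1", "T"), (2, "r2", "T"), (2, "r3", "A"),
]
calls, acc = em_consensus(votes)
print(calls)
print(acc)
```

In this toy model a read that frequently disagrees with the others receives a lower estimated accuracy, so its votes carry less weight in later iterations. The method described in the abstract additionally weighs base difficulty, overlap consistency, and mapping quality, none of which are modeled here.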

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Editors: Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1626-1633
Number of pages: 8
ISBN (electronic): 9781509030491
DOI
Publication status: Published - 15 Dec 2017
Event: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States
Duration: 13 Nov 2017 to 16 Nov 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Volume: 2017-January

Conference

Conference: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Country/Territory: United States
City: Kansas City
Period: 13/11/17 to 16/11/17
