TY - GEN
T1 - Modeling exome sequencing data with generalized Gaussian distribution with application to copy number variation detection
AU - Duan, Junbo
AU - Wan, Mingxi
AU - Deng, Hong Wen
AU - Wang, Yu Ping
PY - 2013
Y1 - 2013
N2 - Exome sequencing provides an effective way to discover genetic factors that might be associated with phenotypes for complex diseases. Compared with whole-genome sequencing, exome sequencing can satisfy the high sequencing coverage requirement under a limited budget. However, because exons are distributed sparsely along the genome and technical variability exists between samples, the analysis of exome sequencing data is complicated, and directly applying methods designed for whole-genome sequencing yields wrong results. In this paper, we propose a novel model to represent exome sequencing data. Under this model, we show that the technical variability as well as the random sequencing error follow the generalized Gaussian distribution. Based on this observation, we propose a method to detect copy number variation. Studies on real data from the 1000 Genomes Project validate the proposed algorithm.
AB - Exome sequencing provides an effective way to discover genetic factors that might be associated with phenotypes for complex diseases. Compared with whole-genome sequencing, exome sequencing can satisfy the high sequencing coverage requirement under a limited budget. However, because exons are distributed sparsely along the genome and technical variability exists between samples, the analysis of exome sequencing data is complicated, and directly applying methods designed for whole-genome sequencing yields wrong results. In this paper, we propose a novel model to represent exome sequencing data. Under this model, we show that the technical variability as well as the random sequencing error follow the generalized Gaussian distribution. Based on this observation, we propose a method to detect copy number variation. Studies on real data from the 1000 Genomes Project validate the proposed algorithm.
KW - 1000 Genomes Project
KW - Next generation sequencing
KW - copy number variation
KW - exome sequencing
KW - generalized Gaussian distribution
KW - iteratively reweighted least squares
UR - https://www.scopus.com/pages/publications/84894519918
U2 - 10.1109/BIBM.2013.6732619
DO - 10.1109/BIBM.2013.6732619
M3 - Conference contribution
AN - SCOPUS:84894519918
SN - 9781479913091
T3 - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
SP - 1
EP - 7
BT - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
T2 - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Y2 - 18 December 2013 through 21 December 2013
ER -