TY - JOUR
T1 - A novel method for in silico identification of regulatory SNPs in human genome
AU - Li, Rong
AU - Zhong, Dexing
AU - Liu, Ruiling
AU - Lv, Hongqiang
AU - Zhang, Xinman
AU - Liu, Jun
AU - Han, Jiuqiang
N1 - Publisher Copyright: © 2016 Elsevier Ltd
PY - 2017/2/21
Y1 - 2017/2/21
N2 - Regulatory single nucleotide polymorphisms (rSNPs), a kind of functional noncoding genetic variant, can affect gene expression in a regulatory way, and they are thought to be associated with increased susceptibility to complex diseases. Here a novel computational approach to identify potential rSNPs is presented. Unlike most other rSNP-finding methods, which are based on the hypothesis that SNPs causing large allele-specific changes in transcription factor binding affinities are more likely to play regulatory roles, we use a set of documented, experimentally verified rSNPs and nonfunctional background SNPs to train classifiers, so that the discriminating features are found. To characterize variants, an extensive range of characteristics, such as sequence context, DNA structure, and evolutionary conservation, is analyzed. A support vector machine is adopted to build the classifier model, together with an ensemble method to deal with unbalanced data. Ten-fold cross-validation shows that our method achieves a sensitivity of ~78% and a specificity of ~82%. Furthermore, our method performs better in handling false positives than some other algorithms based on the aforementioned hypothesis. The original data and the MATLAB source code are available at https://sourceforge.net/projects/rsnppredict/.
AB - Regulatory single nucleotide polymorphisms (rSNPs), a kind of functional noncoding genetic variant, can affect gene expression in a regulatory way, and they are thought to be associated with increased susceptibility to complex diseases. Here a novel computational approach to identify potential rSNPs is presented. Unlike most other rSNP-finding methods, which are based on the hypothesis that SNPs causing large allele-specific changes in transcription factor binding affinities are more likely to play regulatory roles, we use a set of documented, experimentally verified rSNPs and nonfunctional background SNPs to train classifiers, so that the discriminating features are found. To characterize variants, an extensive range of characteristics, such as sequence context, DNA structure, and evolutionary conservation, is analyzed. A support vector machine is adopted to build the classifier model, together with an ensemble method to deal with unbalanced data. Ten-fold cross-validation shows that our method achieves a sensitivity of ~78% and a specificity of ~82%. Furthermore, our method performs better in handling false positives than some other algorithms based on the aforementioned hypothesis. The original data and the MATLAB source code are available at https://sourceforge.net/projects/rsnppredict/.
KW - Hydroxyl radical cleavage patterns
KW - Imbalanced data
KW - Position weight matrix
KW - Support vector machine
UR - https://www.scopus.com/pages/publications/85006823529
U2 - 10.1016/j.jtbi.2016.11.022
DO - 10.1016/j.jtbi.2016.11.022
M3 - Article
C2 - 27908705
AN - SCOPUS:85006823529
SN - 0022-5193
VL - 415
SP - 84
EP - 89
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
ER -