TY - JOUR
T1 - Heritability estimation and differential analysis of count data with generalized linear mixed models in genomic sequencing studies
AU - Sun, Shiquan
AU - Zhu, Jiaqiang
AU - Mozaffari, Sahar
AU - Ober, Carole
AU - Chen, Mengjie
AU - Zhou, Xiang
N1 - Publisher Copyright:
© The Author(s) 2018. Published by Oxford University Press. All rights reserved.
PY - 2019/2/1
Y1 - 2019/2/1
N2 - Motivation Genomic sequencing studies, including RNA sequencing and bisulfite sequencing studies, are becoming increasingly common and increasingly large. Large genomic sequencing studies open doors for accurate molecular trait heritability estimation and powerful differential analysis. Heritability estimation and differential analysis in sequencing studies requires the development of statistical methods that can properly account for the count nature of the sequencing data and that are computationally efficient for large datasets. Results Here, we develop such a method, PQLseq (Penalized Quasi-Likelihood for sequencing count data), to enable effective and efficient heritability estimation and differential analysis using the generalized linear mixed model framework. With extensive simulations and comparisons to previous methods, we show that PQLseq is the only method currently available that can produce unbiased heritability estimates for sequencing count data. In addition, we show that PQLseq is well suited for differential analysis in large sequencing studies, providing calibrated type I error control and more power compared to the standard linear mixed model methods. Finally, we apply PQLseq to perform gene expression heritability estimation and differential expression analysis in a large RNA sequencing study in the Hutterites. Availability and implementation PQLseq is implemented as an R package with source code freely available at www.xzlab.org/software.html and https://cran.r-project.org/web/packages/PQLseq/index.html. Supplementary informationSupplementary dataare available at Bioinformatics online.
AB - Motivation Genomic sequencing studies, including RNA sequencing and bisulfite sequencing studies, are becoming increasingly common and increasingly large. Large genomic sequencing studies open doors for accurate molecular trait heritability estimation and powerful differential analysis. Heritability estimation and differential analysis in sequencing studies requires the development of statistical methods that can properly account for the count nature of the sequencing data and that are computationally efficient for large datasets. Results Here, we develop such a method, PQLseq (Penalized Quasi-Likelihood for sequencing count data), to enable effective and efficient heritability estimation and differential analysis using the generalized linear mixed model framework. With extensive simulations and comparisons to previous methods, we show that PQLseq is the only method currently available that can produce unbiased heritability estimates for sequencing count data. In addition, we show that PQLseq is well suited for differential analysis in large sequencing studies, providing calibrated type I error control and more power compared to the standard linear mixed model methods. Finally, we apply PQLseq to perform gene expression heritability estimation and differential expression analysis in a large RNA sequencing study in the Hutterites. Availability and implementation PQLseq is implemented as an R package with source code freely available at www.xzlab.org/software.html and https://cran.r-project.org/web/packages/PQLseq/index.html. Supplementary informationSupplementary dataare available at Bioinformatics online.
UR - https://www.scopus.com/pages/publications/85061121807
U2 - 10.1093/bioinformatics/bty644
DO - 10.1093/bioinformatics/bty644
M3 - 文章
C2 - 30020412
AN - SCOPUS:85061121807
SN - 1367-4803
VL - 35
SP - 487
EP - 496
JO - Bioinformatics
JF - Bioinformatics
IS - 3
ER -