TY - JOUR
T1 - PVTree
T2 - A sequential pattern mining method for alignment independent phylogeny reconstruction
AU - Kang, Yongyong
AU - Yang, Xiaofei
AU - Lin, Jiadong
AU - Ye, Kai
N1 - Publisher Copyright: © 2019 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2019/1/1
Y1 - 2019/1/1
AB - Phylogenetic trees are essential to understanding evolution and are usually constructed through multiple sequence alignment, which carries a heavy computational burden and requires sophisticated parameter tuning. Recently, alignment-free methods based on k-mer profiles or common substrings have provided alternative ways to construct phylogenetic trees. However, most of these methods ignore the global similarities between sequences or specific valuable features, e.g., patterns that are frequent across the whole dataset. To improve on them, we propose an alignment-free algorithm based on sequential pattern mining, in which each sequence is converted into a binary representation of the sequential patterns found among the sequences. The phylogenetic tree is then constructed by clustering a distance matrix calculated from the pattern vectors. To increase accuracy for highly divergent sequences, we weight the patterns and filter out redundant sub-patterns. Results on both simulated and real data demonstrate that our method outperforms other alignment-free methods, especially for large sequence sets with low similarity.
KW - Alignment free
KW - Multiple sequence alignment
KW - Phylogenetic tree
KW - Sequential pattern mining
UR - https://www.scopus.com/pages/publications/85061905492
U2 - 10.3390/genes10020073
DO - 10.3390/genes10020073
M3 - Article
AN - SCOPUS:85061905492
SN - 2073-4425
VL - 10
JO - Genes
JF - Genes
IS - 2
M1 - 73
ER -