
Mining unique-m substrings from genomes

  • Kai Ye
  • Zhenyu Jia
  • Yipeng Wang
  • Paul Flicek
  • Rolf Apweiler
  • Leiden University
  • University of California at Irvine
  • Vaccine Research Institute of San Diego
  • European Molecular Biology Laboratory

Research output: Contribution to journal, Article, Peer-reviewed

2 citations (Scopus)

Abstract

Unique substrings in genomes may indicate a high level of specificity, which is crucial and fundamental to many genetics studies, such as PCR, microarray hybridization, Southern and Northern blotting, RNA interference (RNAi), and genome (re)sequencing. However, being a unique sequence in the genome is not, by itself, adequate to guarantee high specificity. For example, nucleotide mismatches within a certain tolerance may impair specificity even if a substring of interest occurs only once in the genome. In this study we propose the concept of unique-m substrings of genomes for controlling specificity in genome-wide assays. A substring is defined as unique-m if it has only a single perfect match on one strand of the entire genome and all other approximate matches have more than m mismatches. We developed a pattern growth approach to systematically mine such unique-m substrings from a given genome. Our algorithm does not need the pre-processing step for extracting sequential information that most rival methods require. The search for unique-m substrings from genomes is performed as a single, regular data mining task, so that the similarities among queries are exploited to achieve tremendous speedup. The runtime of our algorithm is linear in the sizes of the input genomes and the length of the unique-m substrings. In addition, the unique-m mining algorithm has been parallelized to facilitate genome-wide computation on a cluster or on a single machine with multiple CPUs and shared memory.
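
The unique-m definition in the abstract can be illustrated with a small brute-force check. This is only a sketch of the definition, not the pattern growth algorithm described by the authors: it compares the query against every window on both strands, so it runs in quadratic time and is suitable for toy inputs only. The function names (`reverse_complement`, `hamming`, `is_unique_m`) and the restriction to the alphabet A, C, G, T are assumptions made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_unique_m(genome: str, start: int, length: int, m: int) -> bool:
    """Naive check of the unique-m property (hypothetical helper).

    The substring genome[start:start + length] is treated as unique-m if
    every other window of the same length, on the forward strand and on
    the reverse-complement strand, differs from it by more than m
    mismatches.
    """
    query = genome[start:start + length]
    if len(query) != length:
        raise ValueError("substring runs past the end of the genome")

    # Forward strand: skip the query's own position.
    for i in range(len(genome) - length + 1):
        if i != start and hamming(query, genome[i:i + length]) <= m:
            return False

    # Reverse strand: every window counts as a potential off-target match.
    reverse = reverse_complement(genome)
    for i in range(len(reverse) - length + 1):
        if hamming(query, reverse[i:i + length]) <= m:
            return False

    return True
```

For the toy genome "AAAACCCC", the substring "AAAA" at position 0 passes the check with m = 0 but fails with m = 1, because the overlapping window "AAAC" is only one mismatch away. A reverse-palindromic sequence such as "ACGT" fails even with m = 0, since it matches perfectly on both strands.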

Original language: English
Pages (from-to): 99-100
Number of pages: 2
Journal: Journal of Proteomics and Bioinformatics
Volume: 3
Issue number: 3
DOI
Publication status: Published - Mar 2010
Externally published
