Invited Talk

Special Topics in Computer Science
Department of Computer Science and Information Engineering
National Chung Cheng University
January 6, 2003

Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis

林耀鈴教授 (Prof. Yaw-Ling Lin)
靜宜大學資管系 (Dept CSIM, Providence University)

Abstract

We study two fundamental problems concerning the search for interesting regions in sequences: (i) given a sequence of real numbers of length $n$ and an upper bound $U$, find a consecutive subsequence of length at most $U$ with the maximum sum and (ii) given a sequence of real numbers of length $n$ and a lower bound $L$, find a consecutive subsequence of length at least $L$ with the maximum average. We present an $O(n)$-time algorithm for the first problem and an $O(n \log L)$-time algorithm for the second. The algorithms have potential applications in several areas of biomolecular sequence analysis including locating GC-rich regions in a genomic DNA sequence, post-processing sequence alignments, annotating multiple sequence alignments, and computing length-constrained ungapped local alignment. Our preliminary tests on both simulated and real data demonstrate that the algorithms are very efficient and able to locate useful (such as GC-rich) regions.

appeared in the Journal of Computer and System Sciences, JCSS.
Click Here for the Slides