
PALMA: mRNA to Genome Alignments using Large Margin Algorithms




Motivation: Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task.

Results: We present a novel approach based on large margin learning that combines accurate splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm, called PALMA, tunes the parameters of the model such that true alignments score higher than other alignments. We study the accuracy of alignments of mRNAs containing artificially generated micro-exons to genomic DNA. In a carefully designed experiment, we show that our algorithm accurately identifies the intron boundaries as well as the boundaries of the optimal local alignment. It outperforms all other methods: for 5702 artificially shortened EST sequences from C. elegans and human, it correctly identifies the intron boundaries in all except two cases. The best other method is a recently proposed method called exalin, which misaligns 37 of the sequences. Our method also demonstrates robustness to mutations, insertions and deletions, retaining accuracy even at high noise levels.

Availability: Datasets for training, evaluation and testing, additional results and a stand-alone alignment tool implemented in C++ and Python are available at http://www.fml.mpg.de/raetsch/projects/palma.

Author(s): Schulze, U. and Hepp, B. and Ong, CS. and Rätsch, G.
Journal: Bioinformatics
Volume: 23
Number (issue): 15
Pages: 1892-1900
Year: 2007
Month: May

Department(s): Empirical Inference
Bibtex Type: Article (article)

DOI: 10.1093/bioinformatics/btm275
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik



  title = {PALMA: mRNA to Genome Alignments using Large Margin Algorithms},
  author = {Schulze, U. and Hepp, B. and Ong, CS. and R{\"a}tsch, G.},
  journal = {Bioinformatics},
  volume = {23},
  number = {15},
  pages = {1892-1900},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = may,
  year = {2007},
  month_numeric = {5}