2006


Machine Learning Algorithms for Polymorphism Detection

Schweikert, G., Zeller, G., Clark, R., Ossowski, S., Warthmann, N., Shinn, P., Frazer, K., Ecker, J., Huson, D., Weigel, D., Schölkopf, B., Rätsch, G.

2nd ISCB Student Council Symposium, August 2006 (talk)

Abstract
Analyzing resequencing array data using machine learning, we obtain a genome-wide inventory of polymorphisms in 20 wild strains of Arabidopsis thaliana, including 750,000 single nucleotide polymorphisms (SNPs) and thousands of highly polymorphic regions and deletions. We thus provide an unprecedented resource for the study of natural variation in plants.

Web [BibTex]


Inferential structure determination: Overview and new developments

Habeck, M.

Sixth CCPN Annual Conference: Efficient and Rapid Structure Determination by NMR, July 2006 (talk)

Web [BibTex]


MCMC inference in (Conditionally) Conjugate Dirichlet Process Gaussian Mixture Models

Rasmussen, C., Görür, D.

ICML Workshop on Learning with Nonparametric Bayesian Methods, June 2006 (talk)

Abstract
We compare the predictive accuracy of Dirichlet Process Gaussian mixture models using conjugate and conditionally conjugate priors and show that better density models result from using the wider class of priors. We explore several MCMC schemes exploiting conditional conjugacy and show their computational merits on several multidimensional density estimation problems.

Web [BibTex]


Sampling for non-conjugate infinite latent feature models

Görür, D., Rasmussen, C.

(Editors: Bernardo, J. M.), 8th Valencia International Meeting on Bayesian Statistics (ISBA), June 2006 (talk)

Abstract
Latent variable models are powerful tools to model the underlying structure in data. Infinite latent variable models can be defined using Bayesian nonparametrics. Dirichlet process (DP) models constitute an example of infinite latent class models in which each object is assumed to belong to one of infinitely many mutually exclusive classes. Recently, the Indian buffet process (IBP) has been defined as an extension of the DP. The IBP is a distribution over sparse binary matrices with infinitely many columns, which can be used as a distribution for non-exclusive features. Inference using Markov chain Monte Carlo (MCMC) in conjugate IBP models has been previously described; however, requiring conjugacy restricts the use of the IBP. We describe an MCMC algorithm for non-conjugate IBP models. Modelling choice behaviour is an important topic in psychology, economics and related fields. Elimination by Aspects (EBA) is a choice model that assumes each alternative has latent features with associated weights that lead to the observed choice outcomes. We formulate a non-parametric version of EBA by using the IBP as the prior over the latent binary features. We infer the features of objects that lead to the choice data by using our sampling scheme for inference.

PDF [BibTex]


Object Classification using Local Image Features

Nowozin, S.

Biologische Kybernetik, Technical University of Berlin, Berlin, Germany, May 2006 (diplomathesis)

Abstract
Object classification in digital images remains one of the most challenging tasks in computer vision. Advances in the last decade have produced methods to repeatably extract and describe characteristic local features in natural images. In order to apply machine learning techniques in computer vision systems, a representation based on these features is needed. A set of local features is the most popular representation and often used in conjunction with Support Vector Machines for classification problems. In this work, we examine current approaches based on set representations and identify their shortcomings. To overcome these shortcomings, we argue for extending the set representation into a graph representation, encoding more relevant information. Attributes associated with the edges of the graph encode the geometric relationships between individual features by making use of the meta data of each feature, such as the position, scale, orientation and shape of the feature region. At the same time all invariances provided by the original feature extraction method are retained. To validate the novel approach, we use a standard subset of the ETH-80 classification benchmark.

PDF [BibTex]


Kernel PCA for Image Compression

Huhle, B.

Biologische Kybernetik, Eberhard-Karls-Universität, Tübingen, Germany, April 2006 (diplomathesis)

PDF [BibTex]


An Inventory of Sequence Polymorphisms For Arabidopsis

Clark, R., Ossowski, S., Schweikert, G., Rätsch, G., Shinn, P., Zeller, G., Warthmann, N., Fu, G., Hinds, D., Chen, H., Frazer, K., Huson, D., Schölkopf, B., Nordborg, M., Ecker, J., Weigel, D.

17th International Conference on Arabidopsis Research, April 2006 (talk)

Abstract
We have used high-density oligonucleotide arrays to characterize common sequence variation in 20 wild strains of Arabidopsis thaliana that were chosen for maximal genetic diversity. Both strands of each possible SNP of the 119 Mb reference genome were represented on the arrays, which were hybridized with whole-genome, isothermally amplified DNA to minimize ascertainment biases. Using two complementary approaches, a model-based algorithm and a newly developed machine learning method, we identified over 550,000 SNPs with a false discovery rate of ~0.03 (an average of 1 SNP for every 216 bp of the genome). In addition, a heuristic algorithm predicted ~700 highly polymorphic or deleted regions per accession. Over 700 predicted polymorphisms with major functional effects (e.g., premature stop codons or deletions of coding sequence) were validated by dideoxy sequencing. Using this data set, we provide the first systematic description of the types of genes that harbor major-effect polymorphisms in natural populations at moderate allele frequencies. The data also provide an unprecedented resource for the study of genetic variation in an experimentally tractable, multicellular model organism.

[BibTex]


Machine Learning and Applications in Biology

Shin, H.

6th Course in Bioinformatics for Molecular Biologist, March 2006 (talk)

Abstract
The emergence of the fields of computational biology and bioinformatics has alleviated the burden of solving many biological problems, saving the time and cost required for experiments and also providing predictions that guide new experiments. Within computational biology, machine learning algorithms have played a central role in dealing with the flood of biological data. The goal of this tutorial is to raise awareness and comprehension of machine learning so that biologists can properly match the task at hand to the corresponding analytical approach. We start by categorizing biological problem settings and introduce the general machine learning schemes that fit best to each of these categories. We then explore representative models in further detail, from traditional statistical models to recent kernel models, presenting several up-to-date research projects in bioinformatics to exemplify how biological questions can benefit from a machine learning approach. Finally, we discuss how cooperation between biologists and machine learners might be made smoother.

PDF [BibTex]


Gaussian Process Models for Robust Regression, Classification, and Reinforcement Learning

Kuss, M.

Biologische Kybernetik, Technische Universität Darmstadt, Darmstadt, Germany, March 2006, passed with distinction, published online (phdthesis)

PDF [BibTex]


Semigroups applied to transport and queueing processes

Radl, A.

Biologische Kybernetik, Eberhard Karls Universität, Tübingen, 2006 (phdthesis)

PDF [BibTex]


Local Alignment Kernels for Protein Homology Detection

Saigo, H.

Biologische Kybernetik, Kyoto University, Kyoto, Japan, 2006 (phdthesis)

[BibTex]


Discrete vs. Continuous: Two Sides of Machine Learning

Zhou, D.

October 2004 (talk)

Abstract
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data. This talk is mainly based on the following contributions: (1) D. Zhou and B. Schölkopf. Transductive Inference with Graphs. MPI Technical Report, August 2004; (2) D. Zhou, B. Schölkopf and T. Hofmann. Semi-supervised Learning on Directed Graphs. NIPS 2004; (3) D. Zhou, O. Bousquet, T.N. Lal, J. Weston and B. Schölkopf. Learning with Local and Global Consistency. NIPS 2003.

PDF [BibTex]


Independent component analysis and beyond

Harmeling, S.

Biologische Kybernetik, Universität Potsdam, Potsdam, October 2004 (phdthesis)

PDF [BibTex]


Grundlagen von Support Vector Maschinen und Anwendungen in der Bildverarbeitung

Eichhorn, J.

September 2004 (talk)

Abstract
Invited talk at the workshop "Numerical, Statistical and Discrete Methods in Image Processing" at the TU München (in German)

PDF [BibTex]


The benefit of liquid Helium cooling for Cryo-Electron Tomography: A quantitative comparative study

Schweikert, G., Luecken, U., Pfeifer, G., Baumeister, W., Plitzko, J.

The thirteenth European Microscopy Congress, August 2004 (talk)

[BibTex]


Riemannian Geometry on Graphs and its Application to Ranking and Classification

Zhou, D.

June 2004 (talk)

Abstract
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data.

PDF [BibTex]


Exploration of combining Echo-State Network Learning with Recurrent Neural Network Learning techniques

Erhan, D.

Biologische Kybernetik, International University Bremen, Bremen, Germany, May 2004 (diplomathesis)

PDF [BibTex]


Computational Analysis of Gene Expression Data

Zien, A.

(4), Biologische Kybernetik, March 2004 (phdthesis)

[BibTex]


Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking

Zhou, D.

January 2004 (talk)

Abstract
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.

PDF [BibTex]


Introduction to Category Theory

Bousquet, O.

Internal Seminar, January 2004 (talk)

Abstract
A brief introduction to the general idea behind category theory with some basic definitions and examples. A perspective on higher dimensional categories is given.

PDF [BibTex]


The p53 Oligomerization Domain: Sequence-Structure Relationships and the Design and Characterization of Altered Oligomeric States

Davison, TS.

University of Toronto, Canada, 2004 (phdthesis)

[BibTex]


Statistical Learning with Similarity and Dissimilarity Functions

von Luxburg, U.

pages: 1-166, Technische Universität Berlin, Germany, 2004 (phdthesis)

PDF PostScript [BibTex]


Classification and Feature Extraction in Man and Machine

Graf, AAB.

Biologische Kybernetik, University of Tübingen, Germany, 2004, online publication (phdthesis)

[BibTex]


Advanced Statistical Learning Theory

Bousquet, O.

Machine Learning Summer School, 2004 (talk)

PDF [BibTex]

2002


Nonlinear Multivariate Analysis with Geodesic Kernels

Kuss, M.

Biologische Kybernetik, Technische Universität Berlin, February 2002 (diplomathesis)

GZIP [BibTex]


Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms

Bousquet, O.

Biologische Kybernetik, Ecole Polytechnique, 2002 (phdthesis) Accepted

Abstract
New classification algorithms based on the notion of 'margin' (e.g. Support Vector Machines, Boosting) have recently been developed. The goal of this thesis is to better understand how they work, via a study of their theoretical performance. In order to do this, a general framework for real-valued classification is proposed. In this framework, it appears that the natural tools to use are Concentration Inequalities and Empirical Processes Theory. Thanks to an adaptation of these tools, a new measure of the size of a class of functions is introduced, which can be computed from the data. This makes it possible, on the one hand, to better understand the role of the eigenvalues of the kernel matrix in Support Vector Machines and, on the other hand, to obtain empirical model selection criteria.

PostScript [BibTex]


Support Vector Machines: Induction Principle, Adaptive Tuning and Prior Knowledge

Chapelle, O.

Biologische Kybernetik, 2002 (phdthesis)

Abstract
This thesis presents a theoretical and practical study of Support Vector Machines (SVM) and related learning algorithms. In the first part, we introduce a new induction principle from which SVMs can be derived; some new algorithms are also presented in this framework. In the second part, after studying how to estimate the generalization error of an SVM, we suggest choosing the kernel parameters of an SVM by minimizing this estimate. Several applications such as feature selection are presented. Finally, the third part deals with the incorporation of prior knowledge in a learning algorithm; more specifically, we study the case of known invariant transformations and the use of unlabeled data.

GZIP [BibTex]

2001


Variationsverfahren zur Untersuchung von Grundzustandseigenschaften des Ein-Band Hubbard-Modells

Eichhorn, J.

Biologische Kybernetik, Technische Universität Dresden, Dresden/Germany, May 2001 (diplomathesis)

Abstract
Using different modifications of a new variational approach, static ground-state properties of the one-band Hubbard model, such as energy and staggered magnetisation, are calculated. By taking into account additional fluctuations, the method is gradually improved so that a very good description of the energy in one and two dimensions can be achieved. After a detailed discussion of the application in one dimension, extensions for two dimensions are introduced. By use of a modified version of the variational ansatz, in particular a description of the quantum phase transition for the magnetisation should be possible.

PostScript [BibTex]


Cerebellar Control of Robot Arms

Peters, J.

Biologische Kybernetik, Technische Universität München, München, Germany, 2001 (diplomathesis)

[BibTex]


On Unsupervised Learning of Mixtures of Markov Sources

Seldin, Y.

Biologische Kybernetik, The Hebrew University of Jerusalem, Israel, 2001 (diplomathesis)

PDF [BibTex]


Support Vector Machines: Theorie und Anwendung auf Prädiktion epileptischer Anfälle auf der Basis von EEG-Daten

Lal, TN.

Biologische Kybernetik, Institut für Angewandte Mathematik, Universität Bonn, 2001, Advised by Prof. Dr. S. Albeverio (diplomathesis)

ZIP [BibTex]
