ICML 2008, Proceedings of the 25th International Conference on Machine Learning (ICML 2008):704-711, Biologische Kybernetik, Max-Planck-Gesellschaft, July, 2008
A recent trend in exemplar based unsupervised
learning is to formulate the learning
problem as a convex optimization problem.
Convexity is achieved by restricting the set
of possible prototypes to training exemplars.
In particular, this has been done for clustering,
vector quantization and mixture model
density estimation. In this paper we propose
a novel algorithm that is theoretically and
practically superior to these convex formulations.
This is possible by posing the unsupervised
learning problem as a single convex
master problem" with non-convex subproblems.
We show that for the above learning
tasks the subproblems are extremely wellbehaved
and can be solved efficiently.
Kernel Methods in Bioengineering, Signal and Image Processing:284-302, Biologische Kybernetik, Max-Planck-Gesellschaft, January, 2007
In this chapter we are concerned with the problem of reconstructing patterns from their representation in feature space, known as the pre-image problem. We review existing algorithms and propose a learning based approach. All algorithms are discussed regarding their usability and complexity and evaluated on an image denoising application.
ICCV 2007, Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV 2007):1919-1923, Biologische Kybernetik, Max-Planck-Gesellschaft, October, 2007
Recent approaches to action classification in videos have
used sparse spatio-temporal words encoding local appearance
around interesting movements. Most of these approaches
use a histogram representation, discarding the
temporal order among features. But this ordering information
can contain important information about the action
itself, e.g. consider the sport disciplines of hurdle race
and long jump, where the global temporal order of motions
(running, jumping) is important to discriminate between
the two. In this work we propose to use a sequential
representation which retains this temporal order. Further,
we introduce Discriminative Subsequence Mining to find
optimal discriminative subsequence patterns. In combination
with the LPBoost classifier, this amounts to simultaneously
learning a classification function and performing feature
selection in the space of all possible feature sequences.
The resulting classifier linearly combines a small number
of interpretable decision functions, each checking for the
presence of a single discriminative pattern. The classifier is
benchmarked on the KTH action classification data set and
outperforms the best known results in the literature.
CVPR 2007, Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007):1-8, Biologische Kybernetik, Max-Planck-Gesellschaft, June, 2007
In web-related applications of image categorization, it is
desirable to derive an interpretable classification rule with
high accuracy. Using the bag-of-words representation and
the linear support vector machine, one can partly fulfill the
goal, but the accuracy of linear classifiers is not high and
the obtained features are not informative for users. We propose
to combine item set mining and large margin classifiers
to select features from the power set of all visual words.
Our resulting classification rule is easier to browse and simpler
to understand, because each feature has richer information.
As a next step, each image is represented as a graph
where nodes correspond to local image features and edges
encode geometric relations between features. Combining
graph mining and boosting, we can obtain a classification
rule based on subgraph features that contain more information
than the set features. We evaluate our algorithm
in a web-retrieval ranking task where the goal is to reject
outliers from a set of images returned for a keyword
query. Furthermore, it is evaluated on the supervised classification
tasks with the challenging VOC2005 data set.
Our approach yields excellent accuracy in the unsupervised
ranking task compared to a recently proposed probabilistic
model and competitive results in the supervised classification task.
Journal of Machine Learning Research, 7:603-624, Biologische Kybernetik, Max-Planck-Gesellschaft, April, 2006
Many Kernel Learning Algorithms(KLA), including Support Vector Machine (SVM), result in a Kernel Machine (KM), such as a kernel classifier, whose key component is a weight vector in a feature space implicitly introduced by a positive definite kernel function. This weight vector is usually obtained by solving a convex optimization problem. Based on this fact we present a direct method to build Sparse Kernel Learning Algorithms (SKLA) by adding one more constraint to the original convex optimization problem, such that the sparseness of the resulting KM is explicitly controlled while at the same time the performance of the resulting KM can be kept as high as possible. A gradient based approach is provided to solve this modified optimization problem. Applying this method to the SVM results in a concrete algorithm for building Sparse Large Margin Classifiers (SLMC). Further analysis of the SLMC algorithm indicates that it essentially finds a discriminating subspace that can be spanned by
a small number of vectors, and in this subspace, the different classes of data are linearly well separated. Experimental results over several classification benchmarks demonstrate the effectiveness of our approach.
Advances in Neural Information Processing Systems 17, Advances in Neural Information Processing Systems:673-680, Biologische Kybernetik, Max-Planck-Gesellschaft, July, 2005
This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method that finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning an entire image, this decreases the computational complexity of a scan by a significant amount. We present experimental results on a standard face detection database.
Proceedings of the 22nd International Conference on Machine Learning, Proceedings of the 22nd International Conference on Machine Learning (ICML 2005):996-1003, Biologische Kybernetik, Max-Planck-Gesellschaft, August, 2005
This paper presents an approach to build Sparse Large Margin Classifiers (SLMC) by adding one more constraint to the standard Support Vector Machine (SVM) training problem. The added constraint explicitly controls the sparseness of the classifier and an approach is provided to solve the formulated problem. When considering the dual of this problem, it can be seen that building an SLMC is equivalent to constructing an SVM with a modified kernel function. Further analysis of this kernel function indicates that the proposed approach essentially finds a discriminating subspace that can be spanned by a small number of vectors, and in this subspace different classes of data are linearly well separated. Experimental results over several classification benchmarks show that in most cases the proposed approach outperforms the state-of-art sparse learning algorithms.
Max Planck Institute for Biological Cybernetics, Tübingen, Germany, Biologische Kybernetik, Max-Planck-Gesellschaft, 2005
In this paper we are concerned with the optimal combination of features of possibly different types for detection and estimation tasks in machine vision. We propose to combine features such that the resulting classifier maximizes the margin between classes. In
contrast to existing approaches which are non-convex and/or generative we propose to use a discriminative model leading to convex problem formulation and complexity control.
Furthermore we assert that decision functions should not compare apples and oranges by comparing features of different types directly. Instead we propose to combine different similarity measures for each different
feature type. Furthermore we argue that the question: Which feature type is more discriminative for task X? is ill-posed and show empirically that the answer to this question might depend on the complexity of the decision function.
Technische Universität Berlin, Berlin, Biologische Kybernetik, Max-Planck-Gesellschaft, November, 2005
Kernel Dependency Estimation(KDE) is a novel technique which was designed to learn mappings between sets without making assumptions on the type of the involved input and output data. It learns the mapping in two stages. In a first step, it tries to estimate coordinates of a feature space representation of elements of the set by solving a high dimensional multivariate regression problem in feature space. Following this, it tries to reconstruct the original representation given the estimated coordinates.
This thesis introduces various algorithmic extensions to both stages in KDE. One of the contributions of this thesis is to propose a novel linear regression algorithm that explores low-dimensional subspaces during learning. Furthermore various existing strategies for reconstructing patterns from feature maps involved in KDE are discussed and novel pre-image techniques are introduced. In particular, pre-image techniques for data-types that are of discrete nature such as graphs and strings are investigated.
KDE is then explored in the context of robot pose imitation where the input is a an image with a human operator and the output is the robot articulated variables. Thus, using KDE, robot pose imitation is formulated as a regression problem.
DAGM 2004, Pattern Recognition, Proceedings of the 26th DAGM Symposium:54-61, Biologische Kybernetik, Max-Planck-Gesellschaft, 2004
We present a new approximation scheme for support vector decision functions in object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller so-called reduced set of synthetic points. Instead of finding the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic vectors such that the resulting approximation can be evaluated via separable filters. Applications that require scanning an entire
image can benefit from this representation: when using separable filters, the average computational complexity for evaluating a reduced set vector on a test patch of size (h x w) drops from O(hw) to O(h+w). We show experimental results on handwritten digits and face detection.
26th DAGM Symposium, Pattern Recognition: 26th DAGM Symposium, Deutsche Arbeitsgemeinschaft für Mustererkennung e.V.:245-252, LNCS 3175, Biologische Kybernetik, Max-Planck-Gesellschaft, September, 2004
We compare two approaches to the problem of estimating the depth
of a point in space from observing its image position in two
different cameras: 1.~The classical photogrammetric approach
explicitly models the two cameras and estimates their intrinsic
and extrinsic parameters using a tedious calibration procedure;
2.~A generic machine learning approach where the mapping from
image to spatial coordinates is directly approximated by a Gaussian Process regression. Our results show that the generic
learning approach, in addition to simplifying the procedure of
calibration, can lead to higher depth accuracies than classical
calibration although no specific domain knowledge is used.
Lecture Notes in Computer Science, Vol. 3175, Pattern Recognition, Proceedings of the 26th DAGM Symposium:262-269, Biologische Kybernetik, Max-Planck-Gesellschaft, 2004
We introduce a learning technique for regression
between high-dimensional spaces. Standard methods typically reduce
this task to many one-dimensional problems, with each output
dimension considered independently. By contrast, in our approach
the feature construction and the regression estimation are
performed jointly, directly minimizing a loss function that we
specify, subject to a rank constraint. A major advantage of this
approach is that the loss is no longer chosen according to the
algorithmic requirements, but can be tailored to the
characteristics of the task at hand; the features will then be
optimal with respect to this objective, and dependence between the
outputs can be exploited.
Pattern Recognition, Pattern Recognition: Proceedings of the 26th DAGM Symposium:253-261, Biologische Kybernetik, Max-Planck-Gesellschaft, August, 2004
The recent development of graph kernel functions
has made it possible to apply well-established
machine learning methods to graphs.
However, to allow for analyses that yield a graph as a result, it is necessary to solve the so-called pre-image problem: to reconstruct a graph from its feature space representation induced by the kernel. Here, we suggest a practical solution to this problem.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems