Header logo is ei


2011


no image
Kernel Bayes’ Rule

Fukumizu, K., Song, L., Gretton, A.

In Advances in Neural Information Processing Systems 24, pages: 1737-1745, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

PDF [BibTex]

2011

PDF [BibTex]


no image
Optimal Reinforcement Learning for Gaussian Systems

Hennig, P.

In Advances in Neural Information Processing Systems 24, pages: 325-333, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finitedimensional projection gives an impression for how this result may be helpful.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Efficient inference in matrix-variate Gaussian models with iid observation noise

Stegle, O., Lippert, C., Mooij, J., Lawrence, N., Borgwardt, K.

In Advances in Neural Information Processing Systems 24, pages: 630-638, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
Inference in matrix-variate Gaussian models has major applications for multioutput prediction and joint learning of row and column covariances from matrixvariate data. Here, we discuss an approach for efficient inference in such models that explicitly account for iid observation noise. Computational tractability can be retained by exploiting the Kronecker product between row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We show practical utility on applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Expectation Propagation for the Estimation of Conditional Bivariate Copulas

Hernandez-Lobato, J., Lopez-Paz, D., Gharhamani, Z.

In pages: 2, NIPS, Workshop on Copulas in Machine Learning, 2011 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence

Cherian, A., Sra, S., Banerjee, A., Papanikolopoulos, N.

In IEEE International Conference on Computer Vision, ICCV 2011, pages: 2399-2406, (Editors: DN Metaxas and L Quan and A Sanfeliu and LJ Van Gool), IEEE, 13th International Conference on Computer Vision (ICCV), 2011 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
Introducing the detection of auditory error responses based on BCI technology for passive interaction

Zander, TO., Klippel, DM., Scherer, R.

In Proceedings of the 5th International Brain–Computer Interface Conference, pages: 252-255, (Editors: GR Müller-Putz and R Scherer and M Billinger and A Kreilinger and V Kaiser and C Neuper), Graz: Verlag der Technischen Universität, 2011 (inproceedings)

[BibTex]

[BibTex]


no image
Phase transition in the family of p-resistances

Alamgir, M., von Luxburg, U.

In Advances in Neural Information Processing Systems 24, pages: 379-387, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We study the family of p-resistances on graphs for p ≥ 1. This family generalizes the standard resistance distance. We prove that for any fixed graph, for p=1, the p-resistance coincides with the shortest path distance, for p=2 it coincides with the standard resistance distance, and for p → ∞ it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase-transition takes place. There exist two critical thresholds p^* and p^** such that if p < p^*, then the p-resistance depends on meaningful global properties of the graph, whereas if p > p^**, it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: p^* = 1 + 1/(d-1) and p^** = 1 + 1/(d-2) where d is the dimension of the underlying space (we believe that the fact that there is a small gap between p^* and p^** is an artifact of our proofs. We also relate our findings to Laplacian regularization and suggest to use q-Laplacians as regularizers, where q satisfies 1/p^* + 1/q = 1.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Generalized Dictionary Learning for Symmetric Positive Definite Matrices with Application to Nearest Neighbor Retrieval

Sra, S., Cherian, A.

In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, LNCS vol 6913, Part III, pages: 318-332, (Editors: D Gunopulos and T Hofmann and D Malerba and M Vazirgiannis), Springer, 22th European Conference on Machine Learning (ECML), 2011 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
Restricted boltzmann machines as useful tool for detecting oscillatory eeg components

Balderas, D., Zander, TO., Bachl, F., Neuper, C., Scherer, R.

In Proceedings of the 5th International Brain–Computer Interface Conference, pages: 68-71, (Editors: GR Müller-Putz and R Scherer and M Billinger and A Kkreilinger and V Kaiser and C Neuper), Graz: Verlag der Technischen Universität, 2011 (inproceedings)

[BibTex]

[BibTex]


no image
Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation

Görnitz, N., Widmer, C., Zeller, G., Kahles, A., Sonnenburg, S., Rätsch, G.

In Advances in Neural Information Processing Systems 24, pages: 2690-2698, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and FCN Pereira and KQ Weinberger), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
On Fast Approximate Submodular Minimization

Jegelka, S., Lin, H., Bilmes, J.

In Advances in Neural Information Processing Systems 24, pages: 460-468, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We are motivated by an application to extract a representative subset of machine learning training data and by the poor empirical performance we observe of the popular minimum norm algorithm. In fact, for our application, minimum norm can have a running time of about O(n7) (O(n5) oracle calls). We therefore propose a fast approximate method to minimize arbitrary submodular functions. For a large sub-class of submodular functions, the algorithm is exact. Other submodular functions are iteratively approximated by tight submodular upper bounds, and then repeatedly optimized. We show theoretical properties, and empirical results suggest significant speedups over minimum norm while retaining higher accuracies.

PDF Web [BibTex]

PDF Web [BibTex]


no image
PAC-Bayesian Analysis of Contextual Bandits

Seldin, Y., Auer, P., Laviolette, F., Shawe-Taylor, J., Ortner, R.

In Advances in Neural Information Processing Systems 24, pages: 1683-1691, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) $N$ goes as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$. If the algorithm uses all the side information, the regret bound scales as $\sqrt{N \ln K}$, where $K$ is the number of actions (arms). However, if the side information $I_{\rho_t}(S;A)$ is not fully used, the regret bound is significantly tighter. In the extreme case, when $I_{\rho_t}(S;A) = 0$, the dependence on the number of states reduces from linear to logarithmic. Our analysis allows to provide the algorithm large amount of side information, let the algorithm to decide which side information is relevant for the task, and penalize the algorithm only for the side information that it is using de facto. We also present an algorithm for multiarmed bandits with side information with computational complexity that is a linear in the number of actions.

PDF PDF Web [BibTex]

PDF PDF Web [BibTex]


no image
Fast projections onto L1,q-norm balls for grouped feature selection

Sra, S.

In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, LNCS vol 6913, Part III, pages: 305-317, (Editors: D Gunopulos and T Hofmann and D Malerba and M Vazirgiannis), Springer, 22th European Conference on Machine Learning (ECML), 2011 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
Kernel Belief Propagation

Song, L., Gretton, A., Bickson, D., Low, Y., Guestrin, C.

In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Vol. 15, pages: 707-715, (Editors: G Gordon and D Dunson and M Dudík), JMLR, AISTATS, 2011 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
On Causal Discovery with Cyclic Additive Noise Models

Mooij, J., Janzing, D., Schölkopf, B., Heskes, T.

In Advances in Neural Information Processing Systems 24, pages: 639-647, (Editors: J Shawe-Taylor and RS Zemel and PL Bartlett and FCN Pereira and KQ Weinberger), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We study a particular class of cyclic causal models, where each variable is a (possibly nonlinear) function of its parents and additive noise. We prove that the causal graph of such models is generically identifiable in the bivariate, Gaussian-noise case. We also propose a method to learn such models from observational data. In the acyclic case, the method reduces to ordinary regression, but in the more challenging cyclic case, an additional term arises in the loss function, which makes it a special case of nonlinear independent component analysis. We illustrate the proposed method on synthetic data.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Additive Gaussian Processes

Duvenaud, D., Nickisch, H., Rasmussen, C.

In Advances in Neural Information Processing Systems 24, pages: 226-234, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We introduce a Gaussian process model of functions which are additive. An additive function is one which decomposes into a sum of low-dimensional functions, each depending on only a subset of the input variables. Additive GPs generalize both Generalized Additive Models, and the standard GP models which use squared-exponential kernels. Hyperparameter learning in this model can be seen as Bayesian Hierarchical Kernel Learning (HKL). We introduce an expressive but tractable parameterization of the kernel function, which allows efficient evaluation of all input interaction terms, whose number is exponential in the input dimension. The additional structure discoverable by this model results in increased interpretability, as well as state-of-the-art predictive power in regression tasks.

PDF Web [BibTex]

PDF Web [BibTex]


no image
k-NN Regression Adapts to Local Intrinsic Dimension

Kpotufe, S.

In Advances in Neural Information Processing Systems 24, pages: 729-737, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
Many nonparametric regressors were recently shown to converge at rates that depend only on the intrinsic dimension of data. These regressors thus escape the curse of dimension when high-dimensional data has low intrinsic dimension (e.g. a manifold). We show that k-NN regression is also adaptive to intrinsic dimension. In particular our rates are local to a query x and depend only on the way masses of balls centered at x vary with radius. Furthermore, we show a simple way to choose k = k(x) locally at any x so as to nearly achieve the minimax rate at x in terms of the unknown intrinsic dimension in the vicinity of x. We also establish that the minimax rate does not depend on a particular choice of metric space or distribution, but rather that this minimax rate holds for any metric space and doubling measure.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Fast Newton-type Methods for Total-Variation with Applications

Barbero, A., Sra, S.

In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pages: 313-320, (Editors: L Getoor and T Scheffer), Omnipress, 28th International Conference on Machine Learning (ICML), 2011 (inproceedings)

[BibTex]

[BibTex]


no image
Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees

Gonzalez, J., Low, Y., Gretton, A., Guestrin, C.

In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Vol. 15, pages: 324-332, (Editors: G Gordon and D Dunson and M Dudík), JMLR, AISTATS, 2011 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Access to Unlabeled Data can Speed up Prediction Time

Urner, R., Shalev-Shwartz, S., Ben-David, S.

In Proceedings of the 28th International Conference on Machine Learning, pages: 641-648, ICML, 2011 (inproceedings)

link (url) [BibTex]

link (url) [BibTex]


Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

Gehler, P., Rother, C., Kiefel, M., Zhang, L., Schölkopf, B.

In Advances in Neural Information Processing Systems 24, pages: 765-773, (Editors: Shawe-Taylor, John and Zemel, Richard S. and Bartlett, Peter L. and Pereira, Fernando C. N. and Weinberger, Kilian Q.), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance, that models reflectance values as being drawn from a sparse set of basis colors. This results in a Random Field model with global, latent variables (basis colors) and pixel-accurate output reflectance values. We show that without edge information high-quality results can be achieved, that are on par with methods exploiting this source of information. Finally, we are able to improve on state-of-the-art results by integrating edge information into our model. We believe that our new approach is an excellent starting point for future developments in this field.

website + code pdf poster Project Page Project Page [BibTex]

website + code pdf poster Project Page Project Page [BibTex]

2001


no image
Pattern Selection Using the Bias and Variance of Ensemble

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 56-67, Korean Data Mining Conference, December 2001 (inproceedings)

[BibTex]

2001

[BibTex]


no image
Separation of post-nonlinear mixtures using ACE and temporal decorrelation

Ziehe, A., Kawanabe, M., Harmeling, S., Müller, K.

In ICA 2001, pages: 433-438, (Editors: Lee, T.-W. , T.P. Jung, S. Makeig, T. J. Sejnowski), Third International Workshop on Independent Component Analysis and Blind Signal Separation, December 2001 (inproceedings)

Abstract
We propose an efficient method based on the concept of maximal correlation that reduces the post-nonlinear blind source separation problem (PNL BSS) to a linear BSS problem. For this we apply the Alternating Conditional Expectation (ACE) algorithm – a powerful technique from nonparametric statistics – to approximately invert the (post-)nonlinear functions. Interestingly, in the framework of the ACE method convergence can be proven and in the PNL BSS scenario the optimal transformation found by ACE will coincide with the desired inverse functions. After the nonlinearities have been removed by ACE, temporal decorrelation (TD) allows us to recover the source signals. An excellent performance underlines the validity of our approach and demonstrates the ACE-TD method on realistic examples.

PDF [BibTex]

PDF [BibTex]


no image
Anabolic and Catabolic Gene Expression Pattern Analysis in Normal Versus Osteoarthritic Cartilage Using Complementary DNA-Array Technology

Aigner, T., Zien, A., Gehrsitz, A., Gebhard, P., McKenna, L.

Arthritis and Rheumatism, 44(12):2777-2789, December 2001 (article)

Web [BibTex]

Web [BibTex]


no image
Nonlinear blind source separation using kernel feature spaces

Harmeling, S., Ziehe, A., Kawanabe, M., Blankertz, B., Müller, K.

In ICA 2001, pages: 102-107, (Editors: Lee, T.-W. , T.P. Jung, S. Makeig, T. J. Sejnowski), Third International Workshop on Independent Component Analysis and Blind Signal Separation, December 2001 (inproceedings)

Abstract
In this work we propose a kernel-based blind source separation (BSS) algorithm that can perform nonlinear BSS for general invertible nonlinearities. For our kTDSEP algorithm we have to go through four steps: (i) adapting to the intrinsic dimension of the data mapped to feature space F, (ii) finding an orthonormal basis of this submanifold, (iii) mapping the data into the subspace of F spanned by this orthonormal basis, and (iv) applying temporal decorrelation BSS (TDSEP) to the mapped data. After demixing we get a number of irrelevant components and the original sources. To find out which ones are the components of interest, we propose a criterion that allows to identify the original sources. The excellent performance of kTDSEP is demonstrated in experiments on nonlinearly mixed speech data.

PDF [BibTex]

PDF [BibTex]


no image
Pattern Selection for ‘Regression’ using the Bias and Variance of Ensemble Network

Shin, H., Cho, S.

In Proc. of the Korean Institute of Industrial Engineers Conference, pages: 10-19, Korean Industrial Engineers Conference, November 2001 (inproceedings)

[BibTex]

[BibTex]


no image
Pattern Selection for ‘Classification’ using the Bias and Variance of Ensemble Neural Network

Shin, H., Cho, S.

In Proc. of the Korea Information Science Conference, pages: 307-309, Korea Information Science Conference, October 2001, Best Paper Award (inproceedings)

[BibTex]

[BibTex]


no image
Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators

Williamson, R., Smola, A., Schölkopf, B.

IEEE Transactions on Information Theory, 47(6):2516-2532, September 2001 (article)

Abstract
We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.

DOI [BibTex]

DOI [BibTex]


no image
Hybrid IDM/Impedance learning in human movements

Burdet, E., Teng, K., Chew, C., Peters, J., , B.

In ISHF 2001, 1, pages: 1-9, 1st International Symposium on Measurement, Analysis and Modeling of Human Functions (ISHF2001), September 2001 (inproceedings)

Abstract
In spite of motor output variability and the delay in the sensori-motor, humans routinely perform intrinsically un- stable tasks. The hybrid IDM/impedance learning con- troller presented in this paper enables skilful performance in strong stable and unstable environments. It consid- ers motor output variability identified from experimen- tal data, and contains two modules concurrently learning the endpoint force and impedance adapted to the envi- ronment. The simulations suggest how humans learn to skillfully perform intrinsically unstable tasks. Testable predictions are proposed.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Combining Off- and On-line Calibration of a Digital Camera

Urbanek, M., Horaud, R., Sturm, P.

In In Proceedings of Third International Conference on 3-D Digital Imaging and Modeling, pages: 99-106, In Proceedings of Third International Conference on 3-D Digital Imaging and Modeling, June 2001 (inproceedings)

Abstract
We introduce a novel outlook on the self­calibration task, by considering images taken by a camera in motion, allowing for zooming and focusing. Apart from the complex relationship between the lens control settings and the intrinsic camera parameters, a prior off­line calibration allows to neglect the setting of focus, and to fix the principal point and aspect ratio throughout distinct views. Thus, the calibration matrix is dependent only on the zoom position. Given a fully calibrated reference view, one has only one parameter to estimate for any other view of the same scene, in order to calibrate it and to be able to perform metric reconstructions. We provide a close­form solution, and validate the reliability of the algorithm with experiments on real images. An important advantage of our method is a reduced ­ to one ­ number of critical camera configurations, associated with it. Moreover, we propose a method for computing the epipolar geometry of two views, taken from different positions and with different (spatial) resolutions; the idea is to take an appropriate third view, that is "easy" to match with the other two.

ZIP [BibTex]

ZIP [BibTex]


no image
Regularized principal manifolds

Smola, A., Mika, S., Schölkopf, B., Williamson, R.

Journal of Machine Learning Research, 1, pages: 179-209, June 2001 (article)

Abstract
Many settings of unsupervised learning can be viewed as quantization problems - the minimization of the expected quantization error subject to some restrictions. This allows the use of tools such as regularization from the theory of (supervised) risk minimization for unsupervised learning. This setting turns out to be closely related to principal curves, the generative topographic map, and robust coding. We explore this connection in two ways: (1) we propose an algorithm for finding principal manifolds that can be regularized in a variety of ways; and (2) we derive uniform convergence bounds and hence bounds on the learning rates of the algorithm. In particular, we give bounds on the covering numbers which allows us to obtain nearly optimal learning rates for certain types of regularization operators. Experimental results demonstrate the feasibility of the approach.

PDF [BibTex]

PDF [BibTex]


no image
Centralization: A new method for the normalization of gene expression data

Zien, A., Aigner, T., Zimmer, R., Lengauer, T.

Bioinformatics, 17, pages: S323-S331, June 2001, Mathematical supplement available at http://citeseer.ist.psu.edu/574280.html (article)

Abstract
Microarrays measure values that are approximately proportional to the numbers of copies of different mRNA molecules in samples. Due to technical difficulties, the constant of proportionality between the measured intensities and the numbers of mRNA copies per cell is unknown and may vary for different arrays. Usually, the data are normalized (i.e., array-wise multiplied by appropriate factors) in order to compensate for this effect and to enable informative comparisons between different experiments. Centralization is a new two-step method for the computation of such normalization factors that is both biologically better motivated and more robust than standard approaches. First, for each pair of arrays the quotient of the constants of proportionality is estimated. Second, from the resulting matrix of pairwise quotients an optimally consistent scaling of the samples is computed.

PDF PostScript Web [BibTex]

PDF PostScript Web [BibTex]


no image
Failure Diagnosis of Discrete Event Systems

Son, HI., Kim, KW., Lee, S.

Journal of Control, Automation and Systems Engineering, 7(5):375-383, May 2001, In Korean (article)

[BibTex]

[BibTex]


no image
Support vector novelty detection applied to jet engine vibration spectra

Hayton, P., Schölkopf, B., Tarassenko, L., Anuzis, P.

In Advances in Neural Information Processing Systems 13, pages: 946-952, (Editors: TK Leen and TG Dietterich and V Tresp), MIT Press, Cambridge, MA, USA, 14th Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
A system has been developed to extract diagnostic information from jet engine carcass vibration data. Support Vector Machines applied to novelty detection provide a measure of how unusual the shape of a vibration signature is, by learning a representation of normality. We describe a novel method for Support Vector Machines of including information from a second class for novelty detection and give results from the application to Jet Engine vibration analysis.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Four-legged Walking Gait Control Using a Neuromorphic Chip Interfaced to a Support Vector Learning Algorithm

Still, S., Schölkopf, B., Hepp, K., Douglas, R.

In Advances in Neural Information Processing Systems 13, pages: 741-747, (Editors: TK Leen and TG Dietterich and V Tresp), MIT Press, Cambridge, MA, USA, 14th Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
To control the walking gaits of a four-legged robot we present a novel neuromorphic VLSI chip that coordinates the relative phasing of the robot's legs similar to how spinal Central Pattern Generators are believed to control vertebrate locomotion [3]. The chip controls the leg movements by driving motors with time varying voltages which are the outputs of a small network of coupled oscillators. The characteristics of the chip's output voltages depend on a set of input parameters. The relationship between input parameters and output voltages can be computed analytically for an idealized system. In practice, however, this ideal relationship is only approximately true due to transistor mismatch and offsets.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Algorithmic Stability and Generalization Performance

Bousquet, O., Elisseeff, A.

In Advances in Neural Information Processing Systems 13, pages: 196-202, (Editors: Leen, T.K. , T.G. Dietterich, V. Tresp), MIT Press, Cambridge, MA, USA, Fourteenth Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
We present a novel way of obtaining PAC-style bounds on the generalization error of learning algorithms, explicitly using their stability properties. A {\em stable} learner being one for which the learned solution does not change much for small changes in the training set. The bounds we obtain do not depend on any measure of the complexity of the hypothesis space (e.g. VC dimension) but rather depend on how the learning algorithm searches this space, and can thus be applied even when the VC dimension in infinite. We demonstrate that regularization networks possess the required stability property and apply our method to obtain new bounds on their generalization performance.

PDF Web [BibTex]

PDF Web [BibTex]


no image
The Kernel Trick for Distances

Schölkopf, B.

In Advances in Neural Information Processing Systems 13, pages: 301-307, (Editors: TK Leen and TG Dietterich and V Tresp), MIT Press, Cambridge, MA, USA, 14th Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
A method is described which, like the kernel trick in support vector machines (SVMs), lets us generalize distance-based algorithms to operate in feature spaces, usually nonlinearly related to the input space. This is done by identifying a class of kernels which can be represented as norm-based distances in Hilbert spaces. It turns out that the common kernel algorithms, such as SVMs and kernel PCA, are actually really distance based algorithms and can be run with that class of kernels, too. As well as providing a useful new insight into how these algorithms work, the present work can form the basis for conceiving new algorithms.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Vicinal Risk Minimization

Chapelle, O., Weston, J., Bottou, L., Vapnik, V.

In Advances in Neural Information Processing Systems 13, pages: 416-422, (Editors: Leen, T.K. , T.G. Dietterich, V. Tresp), MIT Press, Cambridge, MA, USA, Fourteenth Annual Neural Information Processing Systems Conference (NIPS) , April 2001 (inproceedings)

Abstract
The Vicinal Risk Minimization principle establishes a bridge between generative models and methods derived from the Structural Risk Minimization Principle such as Support Vector Machines or Statistical Regularization. We explain how VRM provides a framework which integrates a number of existing algorithms, such as Parzen windows, Support Vector Machines, Ridge Regression, Constrained Logistic Classifiers and Tangent-Prop. We then show how the approach implies new algorithms for solving problems usually associated with generative models. New algorithms are described for dealing with pattern recognition problems with very different pattern distributions and dealing with unlabeled data. Preliminary empirical results are presented.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Feature Selection for SVMs

Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., Vapnik, V.

In Advances in Neural Information Processing Systems 13, pages: 668-674, (Editors: Leen, T.K. , T.G. Dietterich, V. Tresp), MIT Press, Cambridge, MA, USA, Fourteenth Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
We introduce a method of feature selection for Support Vector Machines. The method is based upon finding those features which minimize bounds on the leave-one-out error. This search can be efficiently performed via gradient descent. The resulting algorithms are shown to be superior to some standard feature selection algorithms on both toy data and real-life problems of face recognition, pedestrian detection and analyzing DNA microarray data.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Occam’s Razor

Rasmussen, CE., Ghahramani, Z.

In Advances in Neural Information Processing Systems 13, pages: 294-300, (Editors: Leen, T.K. , T.G. Dietterich, V. Tresp), MIT Press, Cambridge, MA, USA, Fourteenth Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
The Bayesian paradigm apparently only sometimes gives rise to Occam's Razor; at other times very large models perform well. We give simple examples of both kinds of behaviour. The two views are reconciled when measuring complexity of functions, rather than of the machinery used to implement them. We analyze the complexity of functions for some linear in the parameter models that are equivalent to Gaussian Processes, and always find Occam's Razor at work.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Pattern Selection Using the Bias and Variance of Ensemble

Shin, H., Cho, S.

Journal of the Korean Institute of Industrial Engineers, 28(1):112-127, March 2001 (article)

Abstract
[Abstract]: A useful pattern is a pattern that contributes much to learning. For a classification problem those patterns near the class boundary surfaces carry more information to the classifier. For a regression problem the ones near the estimated surface carry more information. In both cases, the usefulness is defined only for those patterns either without error or with negligible error. Using only the useful patterns gives several benefits. First, computational complexity in memory and time for learning is decreased. Second, overfitting is avoided even when the learner is over-sized. Third, learning results in more stable learners. In this paper, we propose a pattern “utility index” that measures the utility of an individual pattern. The utility index is based on the bias and variance of a pattern trained by a network ensemble. In classification, the pattern with a low bias and a high variance gets a high score. In regression, on the other hand, the one with a low bias and a low variance gets a high score. Based on the distribution of the utility index, the original training set is divided into a high-score group and a low-score group. Only the high-score group is then used for training. The proposed method is tested on synthetic and real-world benchmark datasets. The proposed approach gives a better or at least similar performance.

[BibTex]

[BibTex]


no image
Structure and Functionality of a Designed p53 Dimer.

Davison, TS., Nie, X., Ma, W., Lin, Y., Kay, C., Benchimol, S., Arrowsmith, C.

Journal of Molecular Biology, 307(2):605-617, March 2001 (article)

Abstract
P53 is a homotetrameric tumor suppressor protein involved in transcriptional control of genes that regulate cell proliferation and death. In order to probe the role that oligomerization plays in this capacity, we have previously designed and characterized a series of p53 proteins with altered oligomeric states through hydrophilc substitution of residues Met340 or Leu344 in the normally tetrameric oligomerization domain. Although such mutations have little effect on the overall secondary structural content of the oligomerization domain, both solubility and the resistance to thermal denaturation are substantially reduced relative to that of the wild-type domain. Here, we report the design and characterization of a double-mutant p53 with alterations of residues at positions Met340 and Leu344. The double-mutations Met340Glu/Leu344Lys and Met340Gln/Leu344Arg resulted in distinct dimeric forms of the protein. Furthermore, we have verified by NMR structure determination that the double-mutant Met340Gln/Leu344Arg is essentially a "half-tetramer". Analysis of the in vivo activities of full-length p53 oligomeric mutants reveals that while cell-cycle arrest requires tetrameric p53, transcriptional transactivation activity of monomers and dimers retain roughly background and half of the wild-type activity, respectively.

Web [BibTex]

Web [BibTex]


no image
An Introduction to Kernel-Based Learning Algorithms

Müller, K., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.

IEEE Transactions on Neural Networks, 12(2):181-201, March 2001 (article)

Abstract
This paper provides an introduction to support vector machines, kernel Fisher discriminant analysis, and kernel principal component analysis, as examples for successful kernel-based learning methods. We first give a short background about Vapnik-Chervonenkis theory and kernel feature spaces and then proceed to kernel based learning in supervised and unsupervised scenarios including practical and algorithmic considerations. We illustrate the usefulness of kernel algorithms by discussing applications such as optical character recognition and DNA analysis

DOI [BibTex]

DOI [BibTex]


no image
Estimating the support of a high-dimensional distribution.

Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A., Williamson, R.

Neural Computation, 13(7):1443-1471, March 2001 (article)

Abstract
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.

Web DOI [BibTex]

Web DOI [BibTex]


no image
An Improved Training Algorithm for Kernel Fisher Discriminants

Mika, S., Schölkopf, B., Smola, A.

In Proceedings AISTATS, pages: 98-104, (Editors: T Jaakkola and T Richardson), Morgan Kaufman, San Francisco, CA, Artificial Intelligence and Statistics (AISTATS), January 2001 (inproceedings)

Web [BibTex]

Web [BibTex]


no image
The psychometric function: II. Bootstrap-based confidence intervals and sampling

Wichmann, F., Hill, N.

Perception and Psychophysics, 63 (8), pages: 1314-1329, 2001 (article)

PDF [BibTex]

PDF [BibTex]


no image
Nonstationary Signal Classification using Support Vector Machines

Gretton, A., Davy, M., Doucet, A., Rayner, P.

In 11th IEEE Workshop on Statistical Signal Processing, pages: 305-305, 11th IEEE Workshop on Statistical Signal Processing, 2001 (inproceedings)

Abstract
In this paper, we demonstrate the use of support vector (SV) techniques for the binary classification of nonstationary sinusoidal signals with quadratic phase. We briefly describe the theory underpinning SV classification, and introduce the Cohen's group time-frequency representation, which is used to process the non-stationary signals so as to define the classifier input space. We show that the SV classifier outperforms alternative classification methods on this processed data.

PostScript [BibTex]

PostScript [BibTex]


no image
Enhanced User Authentication through Typing Biometrics with Artificial Neural Networks and K-Nearest Neighbor Algorithm

Wong, FWMH., Supian, ASM., Ismail, AF., Lai, WK., Ong, CS.

In 2001 (inproceedings)

[BibTex]

[BibTex]


no image
Predicting the Nonlinear Dynamics of Biological Neurons using Support Vector Machines with Different Kernels

Frontzek, T., Lal, TN., Eckmiller, R.

In Proceedings of the International Joint Conference on Neural Networks (IJCNN'2001) Washington DC, 2, pages: 1492-1497, Proceedings of the International Joint Conference on Neural Networks (IJCNN'2001) Washington DC, 2001 (inproceedings)

Abstract
Based on biological data we examine the ability of Support Vector Machines (SVMs) with gaussian, polynomial and tanh-kernels to learn and predict the nonlinear dynamics of single biological neurons. We show that SVMs for regression learn the dynamics of the pyloric dilator neuron of the australian crayfish, and we determine the optimal SVM parameters with regard to the test error. Compared to conventional RBF networks and MLPs, SVMs with gaussian kernels learned faster and performed a better iterated one-step-ahead prediction with regard to training and test error. From a biological point of view SVMs are especially better in predicting the most important part of the dynamics, where the membranpotential is driven by superimposed synaptic inputs to the threshold for the oscillatory peak.

PDF [BibTex]

PDF [BibTex]


no image
Computationally Efficient Face Detection

Romdhani, S., Torr, P., Schölkopf, B., Blake, A.

In Computer Vision, ICCV 2001, vol. 2, (73):695-700, IEEE, 8th International Conference on Computer Vision, 2001 (inproceedings)

DOI [BibTex]

DOI [BibTex]