2012


Evaluation of marginal likelihoods via the density of states

Habeck, M.

In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2012), 22, pages: 486-494, (Editors: N Lawrence and M Girolami), JMLR: W&CP 22, AISTATS, 2012 (inproceedings)

Abstract
Bayesian model comparison involves the evaluation of the marginal likelihood, the expectation of the likelihood under the prior distribution. Typically, this high-dimensional integral over all model parameters is approximated using Markov chain Monte Carlo methods. Thermodynamic integration is a popular method to estimate the marginal likelihood by using samples from annealed posteriors. Here we show that there exists a robust and flexible alternative. The new method estimates the density of states, which counts the number of states associated with a particular value of the likelihood. If the density of states is known, computation of the marginal likelihood reduces to a one-dimensional integral. We outline a maximum likelihood procedure to estimate the density of states from annealed posterior samples. We apply our method to various likelihoods and show that it is superior to thermodynamic integration in that it is more flexible with regard to the annealing schedule and the family of bridging distributions. Finally, we discuss the relation of our method to Skilling's nested sampling.
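The reduction to a one-dimensional integral can be checked on a toy model, sketched below in Python. It assumes a case where the density of states is known in closed form: prior θ ~ N(0, I_d) and likelihood L(θ) = exp(-‖θ‖²/2). Writing the energy as E = -log L, the marginal likelihood becomes Z = ∫ g(E) exp(-E) dE, which is compared against brute-force prior Monte Carlo and the exact value 2^(-d/2). This is not the paper's maximum likelihood estimate of g from annealed posterior samples; the model and all function names are hypothetical.

```python
import math
import numpy as np

def log_density_of_states(E, d):
    """Log prior density of the energy E = -log L for the assumed toy model:
    prior theta ~ N(0, I_d), likelihood L(theta) = exp(-||theta||^2 / 2).
    Under the prior, E = ||theta||^2 / 2 ~ Gamma(shape=d/2, scale=1)."""
    a = d / 2.0
    return (a - 1.0) * np.log(E) - E - math.lgamma(a)

def marginal_likelihood_from_dos(d, E_max=200.0, n=200_001):
    """One-dimensional integral Z = int g(E) exp(-E) dE by the trapezoidal rule."""
    E = np.linspace(1e-12, E_max, n)
    integrand = np.exp(log_density_of_states(E, d) - E)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(E)))

def marginal_likelihood_mc(d, n_samples=200_000, seed=0):
    """Brute-force check: Z = E_prior[L(theta)] by simple Monte Carlo."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_samples, d))
    return float(np.mean(np.exp(-0.5 * np.sum(theta ** 2, axis=1))))

d = 10
print(marginal_likelihood_from_dos(d), marginal_likelihood_mc(d), 2.0 ** (-d / 2))
```

In a real application g(E) is unknown and has to be estimated from samples, which is the step the abstract describes; once an estimate is available, the final quadrature has this same one-dimensional form.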

PDF [BibTex]

Distributed multisensory signals acquisition and analysis in dyadic interactions

Tawari, A., Tran, C., Doshi, A., Zander, TO.

In Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems Extended Abstracts, pages: 2261-2266, (Editors: JA Konstan and EH Chi and K Höök), ACM, New York, NY, USA, CHI, 2012 (inproceedings)

DOI [BibTex]

Measuring Cognitive Load by means of EEG-data - how detailed is the picture we can get?

Scharinger, C., Cierniak, G., Walter, C., Zander, TO., Gerjets, P.

In Meeting of the EARLI SIG 22 Neuroscience and Education, 2012 (inproceedings)

[BibTex]

Optimal kernel choice for large-scale two-sample tests

Gretton, A., Sriperumbudur, B., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., Fukumizu, K.

In Advances in Neural Information Processing Systems 25, pages: 1214-1222, (Editors: P Bartlett and FCN Pereira and CJC Burges and L Bottou and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)

PDF [BibTex]

Measurement and calibration of noise bias in weak lensing galaxy shape estimation

Kacprzak, T., Zuntz, J., Rowe, B., Bridle, S., Refregier, A., Amara, A., Voigt, L., Hirsch, M.

Monthly Notices of the Royal Astronomical Society, 427(4):2711-2722, Oxford University Press, 2012 (article)

DOI [BibTex]

Image analysis for cosmology: results from the GREAT10 Galaxy Challenge

Kitching, T. D., Balan, S. T., Bridle, S., Cantale, N., Courbin, F., Eifler, T., Gentile, M., Gill, M. S. S., Harmeling, S., Heymans, C., and others

Monthly Notices of the Royal Astronomical Society, 423(4):3163-3208, Oxford University Press, 2012 (article)

DOI [BibTex]

On the Hardness of Domain Adaptation and the Utility of Unlabeled Target Samples

Ben-David, S., Urner, R.

In Algorithmic Learning Theory - 23rd International Conference, 7568, pages: 139-153, Lecture Notes in Computer Science, (Editors: Bshouty, NH. and Stoltz, G and Vayatis, N and Zeugmann, T), Springer Berlin Heidelberg, ALT, 2012 (inproceedings)

link (url) DOI [BibTex]

Domain Adaptation–Can Quantity compensate for Quality?

Ben-David, S., Shalev-Shwartz, S., Urner, R.

In International Symposium on Artificial Intelligence and Mathematics, ISAIM, 2012 (inproceedings)

link (url) [BibTex]

Learning from Weak Teachers

Urner, R., Ben-David, S., Shamir, O.

In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 22, pages: 1252-1260, (Editors: Lawrence, N. and Girolami, M.), JMLR, AISTATS, 2012 (inproceedings)

link (url) [BibTex]

First SN Discoveries from the Dark Energy Survey

Abbott, T., Abdalla, F., Achitouv, I., Ahn, E., Aldering, G., Allam, S., Alonso, D., Amara, A., Annis, J., Antonik, M., and others

The Astronomer's Telegram, 4668, pages: 1, 2012 (article)

[BibTex]

A sensorimotor paradigm for Bayesian model selection

Genewein, T, Braun, DA

Frontiers in Human Neuroscience, 6(291):1-16, October 2012 (article)

Abstract
Sensorimotor control is thought to rely on predictive internal models in order to cope efficiently with uncertain environments. Recently, it has been shown that humans not only learn different internal models for different tasks, but that they also extract common structure between tasks. This raises the question of how the motor system selects between different structures or models, when each model can be associated with a range of different task-specific parameters. Here we design a sensorimotor task that requires subjects to compensate for visuomotor shifts in a three-dimensional virtual reality setup, where one of the dimensions can be mapped to a model variable and the other dimension to the parameter variable. By introducing probe trials that are neutral in the parameter dimension, we can directly test for model selection. We found that model selection procedures based on Bayesian statistics provided a better explanation for subjects' choice behavior than simple non-probabilistic heuristics. Our experimental design lends itself to the general study of model selection in a sensorimotor context, as it allows model and parameter variables to be queried separately from subjects.

DOI [BibTex]

Adaptive Coding of Actions and Observations

Ortega, PA, Braun, DA

pages: 1-4, NIPS Workshop on Information in Perception and Action, December 2012 (conference)

Abstract
The application of expected utility theory to construct adaptive agents is both computationally intractable and statistically questionable. To overcome these difficulties, agents need the ability to delay the choice of the optimal policy to a later stage when they have learned more about the environment. How should agents do this optimally? An information-theoretic answer to this question is given by the Bayesian control rule: the solution to the adaptive coding problem when there are not only observations but also actions. This paper reviews the central ideas behind the Bayesian control rule.
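For the special case of a Bernoulli multi-armed bandit, the "sample a hypothesis from the posterior, then act optimally for it" structure of the Bayesian control rule coincides with what is commonly called Thompson sampling. The Python sketch below illustrates only that special case, under assumed Beta(1, 1) priors on each arm; it is not the general construction reviewed in the paper, and the function name is hypothetical.

```python
import numpy as np

def bayesian_control_rule_bandit(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli bandit agent: at each step, sample one hypothesis (a
    vector of arm success probabilities) from the posterior and play the arm
    that is optimal under that sampled hypothesis. Returns pull counts."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha = np.ones(k)  # Beta(1, 1) prior on each arm's success probability
    beta = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        hypothesis = rng.beta(alpha, beta)     # sample an environment from the posterior
        arm = int(np.argmax(hypothesis))       # act optimally for the sampled environment
        reward = int(rng.random() < true_probs[arm])  # observation from the real environment
        # Only the observed reward updates the belief; choosing the arm is the
        # agent's own intervention and carries no evidence by itself.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

print(bayesian_control_rule_bandit([0.2, 0.5, 0.8]))
```

Because the policy is randomized through the posterior sample, exploration fades on its own as the posterior concentrates, which is one way to read "delaying the choice of the optimal policy" in this simple setting.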

link (url) [BibTex]

Risk-Sensitivity in Bayesian Sensorimotor Integration

Grau-Moya, J, Ortega, PA, Braun, DA

PLoS Computational Biology, 8(9):1-7, September 2012 (article)

Abstract
Information processing in the nervous system during sensorimotor tasks with inherent uncertainty has been shown to be consistent with Bayesian integration. Bayes optimal decision-makers are, however, risk-neutral in the sense that they weigh all possibilities based on prior expectation and sensory evidence when they choose the action with highest expected value. In contrast, risk-sensitive decision-makers are sensitive to model uncertainty and bias their decision-making processes when they do inference over unobserved variables. In particular, they allow deviations from their probabilistic model in cases where this model makes imprecise predictions. Here we test for risk-sensitivity in a sensorimotor integration task where subjects exhibit Bayesian information integration when they infer the position of a target from noisy sensory feedback. When introducing a cost associated with subjects' response, we found that subjects exhibited a characteristic bias towards low cost responses when their uncertainty was high. This result is in accordance with risk-sensitive decision-making processes that allow for deviations from Bayes optimal decision-making in the face of uncertainty. Our results suggest that both Bayesian integration and risk-sensitivity are important factors to understand sensorimotor integration in a quantitative fashion.

DOI [BibTex]

Free Energy and the Generalized Optimality Equations for Sequential Decision Making

Ortega, PA, Braun, DA

pages: 1-10, 10th European Workshop on Reinforcement Learning (EWRL), July 2012 (conference)

Abstract
The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized sequential optimality equations that not only include the Bellman optimality equations as a limit case, but also lead to well-known decision rules such as Expectimax, Minimax and Expectiminimax. We show how these decision rules can be derived from a single free energy principle that assigns a resource parameter to each node in the decision tree. These resource parameters express a concrete computational cost that can be measured as the number of samples needed from the distribution that belongs to each node. The free energy principle therefore provides the normative basis for generalized optimality equations that account for both adversarial and stochastic environments.
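How a single free energy recursion can interpolate between max, min and expectation nodes is sketched below in Python. It assumes each internal node is evaluated as (1/β) log Σ_x p(x) exp(β V(x)), with the inverse temperature β standing in for the node's resource parameter: β → +∞ approaches a max node, β → -∞ a min node, and β → 0 an expectation (chance) node. The dictionary-based tree format and the function name are hypothetical.

```python
import math

def free_energy_value(node):
    """Recursively evaluate a decision tree. A leaf is a number (its utility);
    an internal node is a dict {"beta": b, "children": [(p, child), ...]} with
    prior probabilities p > 0. Each internal node returns the free energy
        (1 / beta) * log sum_x p(x) * exp(beta * V(x)),
    which tends to max (beta -> +inf), min (beta -> -inf) or the expectation (beta -> 0)."""
    if not isinstance(node, dict):
        return float(node)
    beta = node["beta"]
    probs = [p for p, _ in node["children"]]
    values = [free_energy_value(child) for _, child in node["children"]]
    if beta == 0.0:
        # The beta -> 0 limit is the plain expectation (a chance node).
        return sum(p * v for p, v in zip(probs, values))
    # Log-sum-exp with a shift for numerical stability.
    scaled = [beta * v + math.log(p) for p, v in zip(probs, values)]
    m = max(scaled)
    return (m + math.log(sum(math.exp(s - m) for s in scaled))) / beta

# A large positive beta at the root and beta = 0 at the chance nodes
# approximates Expectimax; a large negative beta at the lower level gives Minimax.
tree = {"beta": 200.0, "children": [
    (0.5, {"beta": 0.0, "children": [(0.5, 1.0), (0.5, 3.0)]}),
    (0.5, {"beta": 0.0, "children": [(0.5, 0.0), (0.5, 5.0)]}),
]}
print(free_energy_value(tree))
```

Finite values of β yield intermediate, "bounded rational" node values, which is the regime the abstract refers to when it attaches a computational cost to each node.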

link (url) [BibTex]


2002


Optimized Support Vector Machines for Nonstationary Signal Classification

Davy, M., Gretton, A., Doucet, A., Rayner, P.

IEEE Signal Processing Letters, 9(12):442-445, December 2002 (article)

Abstract
This letter describes an efficient method to perform nonstationary signal classification. A support vector machine (SVM) algorithm is introduced and its parameters optimised in a principled way. Simulations demonstrate that our low complexity method outperforms state-of-the-art nonstationary signal classification techniques.

PostScript Web DOI [BibTex]

Gender Classification of Human Faces

Graf, A., Wichmann, F.

In Biologically Motivated Computer Vision, pages: 1-18, (Editors: Bülthoff, H. H., S.W. Lee, T. A. Poggio and C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV), November 2002 (inproceedings)

Abstract
This paper addresses the issue of combining pre-processing methods for dimensionality reduction, namely Principal Component Analysis (PCA) and Locally Linear Embedding (LLE), with Support Vector Machine (SVM) classification for a behaviorally important task in humans: gender classification. A processed version of the MPI head database is used as the stimulus set. First, summary statistics of the head database are studied. Subsequently, the optimal parameters for LLE and the SVM are sought heuristically. These values are then used to compare the original face database with its processed counterpart and to assess the behavior of an SVM with respect to changes in illumination and perspective of the face images. Overall, PCA was superior in classification performance and allowed linear separability.

PDF PDF DOI [BibTex]

Insect-Inspired Estimation of Self-Motion

Franz, MO., Chahl, JS.

In Biologically Motivated Computer Vision, (2525):171-180, LNCS, (Editors: Bülthoff, H.H. , S.W. Lee, T.A. Poggio, C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV), November 2002 (inproceedings)

Abstract
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an optimal linear estimator incorporating prior knowledge about the environment. The optimal estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.

PDF PDF DOI [BibTex]

A New Discriminative Kernel from Probabilistic Models

Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., Müller, K.

Neural Computation, 14(10):2397-2414, October 2002 (article)

PDF [BibTex]

Combining sensory Information to Improve Visualization

Ernst, M., Banks, M., Wichmann, F., Maloney, L., Bülthoff, H.

In Proceedings of the Conference on Visualization '02 (VIS '02), pages: 571-574, (Editors: Moorhead, R., M. Joy), IEEE, Piscataway, NJ, USA, IEEE Conference on Visualization (VIS '02), October 2002 (inproceedings)

Abstract
Seemingly effortlessly, the human brain reconstructs the three-dimensional environment surrounding us from the light pattern striking the eyes. This seems to be true across almost all viewing and lighting conditions. One important factor for this apparent ease is the redundancy of information provided by the sensory organs. For example, perspective distortions, shading, motion parallax, or the disparity between the two eyes' images are all, at least partly, redundant signals which provide us with information about the three-dimensional layout of the visual scene. Our brain uses all these different sensory signals and combines the available information into a coherent percept. In displays visualizing data, however, the information is often highly reduced and abstracted, which may lead to an altered perception and therefore a misinterpretation of the visualized data. In this panel we will discuss mechanisms involved in the combination of sensory information and their implications for simulations using computer displays, as well as problems resulting from current display technology such as cathode-ray tubes.

PDF Web [BibTex]

Sampling Techniques for Kernel Methods

Achlioptas, D., McSherry, F., Schölkopf, B.

In Advances in Neural Information Processing Systems 14, pages: 335-342, (Editors: TG Dietterich and S Becker and Z Ghahramani), MIT Press, Cambridge, MA, USA, 15th Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations.
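The first of the three levels, sampling of the Gram matrix, can be illustrated with the Python sketch below. It assumes a simple element-wise scheme: each upper-triangle entry is kept with probability p and rescaled by 1/p (so the sparse matrix equals the original in expectation), then mirrored to stay symmetric. The RBF kernel, the value of p and the function names are assumptions for illustration; quantization, randomized rounding and random projections are not shown, and this is not presented as the paper's exact procedure.

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Dense RBF Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def sparsify_gram(K, p, rng):
    """Keep each upper-triangle entry with probability p, rescale kept entries
    by 1/p so that E[K_hat] = K, and mirror the mask to keep K_hat symmetric."""
    n = K.shape[0]
    mask = np.triu(rng.random((n, n)) < p)
    mask = mask | mask.T
    return np.where(mask, K / p, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
K = rbf_gram(X)
K_hat = sparsify_gram(K, p=0.3, rng=rng)
# Kernel PCA on K_hat would now work with a matrix holding roughly 30% of the entries.
print(np.linalg.eigvalsh(K)[-1], np.linalg.eigvalsh(K_hat)[-1])
```

The leading eigenvalues, which are what kernel PCA needs, tend to be far less affected by this unbiased noise than individual entries are.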

PDF Web [BibTex]

The Infinite Hidden Markov Model

Beal, MJ., Ghahramani, Z., Rasmussen, CE.

In Advances in Neural Information Processing Systems 14, pages: 577-584, (Editors: Dietterich, T.G., S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite: consider, for example, symbols being possible words appearing in English text.

PDF Web [BibTex]

A new discriminative kernel from probabilistic models

Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., Müller, K.

In Advances in Neural Information Processing Systems 14, pages: 977-984, (Editors: Dietterich, T.G., S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
Recently, Jaakkola and Haussler proposed a method for constructing kernel functions from probabilistic models. Their so-called "Fisher kernel" has been combined with discriminative classifiers such as SVMs and applied successfully in, e.g., DNA and protein analysis. Whereas the Fisher kernel (FK) is calculated from the marginal log-likelihood, we propose the TOP kernel derived from Tangent vectors Of Posterior log-odds. Furthermore, we develop a theoretical framework on feature extractors from probabilistic models and use it for analyzing the TOP kernel. In experiments, our new discriminative TOP kernel compares favorably to the Fisher kernel.
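The idea of "tangent vectors of posterior log-odds" can be made concrete with the Python sketch below. It assumes a deliberately simple probabilistic model (two unit-covariance Gaussian classes with means μ+ and μ-, class priors held fixed) in place of the models used in the paper, builds the feature vector f(x) = (v(x), ∂v/∂μ+, ∂v/∂μ-) from the posterior log-odds v, and takes the kernel as the inner product of these features without any further normalization. All names are hypothetical.

```python
import numpy as np

def log_odds(x, mu_pos, mu_neg, log_prior_ratio=0.0):
    """Posterior log-odds v(x) = log P(y=+1|x) - log P(y=-1|x) for two
    unit-covariance Gaussian classes (an illustrative model choice)."""
    return (log_prior_ratio
            - 0.5 * np.sum((x - mu_pos) ** 2)
            + 0.5 * np.sum((x - mu_neg) ** 2))

def top_features(x, mu_pos, mu_neg, log_prior_ratio=0.0):
    """TOP-style feature vector: v(x) followed by its gradient with respect
    to the model parameters theta = (mu_pos, mu_neg)."""
    v = log_odds(x, mu_pos, mu_neg, log_prior_ratio)
    grad_mu_pos = x - mu_pos        # d v / d mu_pos
    grad_mu_neg = -(x - mu_neg)     # d v / d mu_neg
    return np.concatenate(([v], grad_mu_pos, grad_mu_neg))

def top_kernel(x1, x2, mu_pos, mu_neg, log_prior_ratio=0.0):
    """Kernel value as the inner product of the two TOP feature vectors."""
    f1 = top_features(x1, mu_pos, mu_neg, log_prior_ratio)
    f2 = top_features(x2, mu_pos, mu_neg, log_prior_ratio)
    return float(f1 @ f2)

mu_pos, mu_neg = np.array([1.0, 0.0]), np.array([-1.0, 0.5])
print(top_kernel(np.array([0.3, -0.2]), np.array([0.5, 1.0]), mu_pos, mu_neg))
```

In practice μ+ and μ- would be estimated from data first; the resulting kernel can then be plugged into any SVM implementation that accepts a precomputed Gram matrix.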

PDF Web [BibTex]

Incorporating Invariances in Non-Linear Support Vector Machines

Chapelle, O., Schölkopf, B.

In Advances in Neural Information Processing Systems 14, pages: 609-616, (Editors: TG Dietterich and S Becker and Z Ghahramani), MIT Press, Cambridge, MA, USA, 15th Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
The choice of an SVM kernel corresponds to the choice of a representation of the data in a feature space and, to improve performance, it should therefore incorporate prior knowledge such as known transformation invariances. We propose a technique which extends earlier work and aims at incorporating invariances in nonlinear kernels. We show on a digit recognition task that the proposed approach is superior to the Virtual Support Vector method, which previously had been the method of choice.

PDF Web [BibTex]

Functional Genomics of Osteoarthritis

Aigner, T., Bartnik, E., Zien, A., Zimmer, R.

Pharmacogenomics, 3(5):635-650, September 2002 (article)

Web [BibTex]

Kernel feature spaces and nonlinear blind source separation

Harmeling, S., Ziehe, A., Kawanabe, M., Müller, K.

In Advances in Neural Information Processing Systems 14, pages: 761-768, (Editors: Dietterich, T. G., S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
In kernel-based learning, the data are mapped to a kernel feature space whose dimension corresponds to the number of training data points. In practice, however, the data form a smaller submanifold in feature space, a fact that has been used, e.g., by reduced set techniques for SVMs. We propose a new mathematical construction that makes it possible to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. In doing so, computations become much simpler and, more importantly, our theoretical framework allows us to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.
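One way to obtain coordinates in an orthonormal basis of the span of a few mapped points is sketched below in Python. It assumes a set of basis points v_1, ..., v_d (here simply the first few data points) and maps each x to Ψ(x) = K_VV^(-1/2) k_V(x), where K_VV is the kernel matrix of the basis points and k_V(x) the vector of kernel values between the basis points and x. The RBF kernel, the choice of basis points and the function names are assumptions; the subsequent linear BSS step (TDSEP) is not shown.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def orthonormal_feature_coordinates(X, basis_points, gamma=1.0, eps=1e-10):
    """Coordinates of phi(x) in an orthonormal basis of span{phi(v_1), ..., phi(v_d)}:
        Psi(x) = K_VV^{-1/2} k_V(x).
    Returns an array of shape (d, n), one column per row of X."""
    K_vv = rbf_kernel(basis_points, basis_points, gamma)
    w, U = np.linalg.eigh(K_vv)
    w = np.maximum(w, eps)                    # guard against round-off in tiny eigenvalues
    K_vv_inv_sqrt = (U / np.sqrt(w)) @ U.T    # U diag(w^{-1/2}) U^T
    return K_vv_inv_sqrt @ rbf_kernel(basis_points, X, gamma)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
Psi = orthonormal_feature_coordinates(X, X[:5])   # 5-dimensional real-valued signals
print(Psi.shape)
```

Because the basis is orthonormal, inner products between the columns of Psi reproduce the kernel for points inside the span, so any linear algorithm can operate on these low-dimensional coordinates directly.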

PDF Web [BibTex]

Constructing Boosting algorithms from SVMs: an application to one-class classification.

Rätsch, G., Mika, S., Schölkopf, B., Müller, K.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1184-1199, September 2002 (article)

Abstract
We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm—one-class leveraging—starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.

DOI [BibTex]

Algorithms for Learning Function Distinguishable Regular Languages

Fernau, H., Radl, A.

In Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, pages: 64-73, (Editors: Caelli, T., A. Amin, R. P.W. Duin, M. Kamel, D. de Ridder), Springer, Berlin, Germany, Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, August 2002 (inproceedings)

Abstract
Function distinguishable languages were introduced as a new methodology of defining characterizable subclasses of the regular languages which are learnable from text. Here, we give details on the implementation and the analysis of the corresponding learning algorithms. We also discuss problems which might occur in practical applications.

PDF DOI [BibTex]

Co-Clustering of Biological Networks and Gene Expression Data

Hanisch, D., Zien, A., Zimmer, R., Lengauer, T.

Bioinformatics, 18(Suppl 1):145S-154S, July 2002 (article)

Abstract
Motivation: Large-scale gene expression data are often analysed by clustering genes based on gene expression data alone, although a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably. Results: We propose constructing a distance function which combines information from expression data and biological networks. Based on this function, we compute a joint clustering of genes and vertices of the network. This general approach is elaborated for metabolic networks. We define a graph distance function on such networks and combine it with a correlation-based distance function for gene expression measurements. A hierarchical clustering and an associated statistical measure are computed to arrive at a reasonable number of clusters. Our method is validated using expression data of the yeast diauxic shift. The resulting clusters are easily interpretable in terms of the biochemical network and the gene expression data and suggest that our method is able to automatically identify processes that are relevant under the measured conditions.

Web [BibTex]

Confidence measures for protein fold recognition

Sommer, I., Zien, A., von Ohsen, N., Zimmer, R., Lengauer, T.

Bioinformatics, 18(6):802-812, June 2002 (article)

[BibTex]

Decision Boundary Pattern Selection for Support Vector Machines

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 33-41, Korean Data Mining Conference, May 2002 (inproceedings)

[BibTex]

The contributions of color to recognition memory for natural scenes

Wichmann, F., Sharpe, L., Gegenfurtner, K.

Journal of Experimental Psychology: Learning, Memory and Cognition, 28(3):509-520, May 2002 (article)

Abstract
The authors used a recognition memory paradigm to assess the influence of color information on visual memory for images of natural scenes. Subjects performed 5-10% better for colored than for black-and-white images independent of exposure duration. Experiment 2 indicated little influence of contrast once the images were suprathreshold, and Experiment 3 revealed that performance worsened when images were presented in color and tested in black and white, or vice versa, leading to the conclusion that the surface property color is part of the memory representation. Experiments 4 and 5 exclude the possibility that the superior recognition memory for colored images results solely from attentional factors or saliency. Finally, the recognition memory advantage disappears for falsely colored images of natural scenes: The improvement in recognition memory depends on the color congruence of presented images with learned knowledge about the color gamut found within natural scenes. The results can be accounted for within a multiple memory systems framework.

PDF Web DOI [BibTex]

k-NN based Pattern Selection for Support Vector Classifiers

Shin, H., Cho, S.

In Proc. of the Korean Industrial Engineers Conference, pages: 645-651, Korean Industrial Engineers Conference, May 2002 (inproceedings)

[BibTex]

Microarrays: How Many Do You Need?

Zien, A., Fluck, J., Zimmer, R., Lengauer, T.

In RECOMB 2002, pages: 321-330, ACM Press, New York, NY, USA, Sixth Annual International Conference on Research in Computational Molecular Biology, April 2002 (inproceedings)

Abstract
We estimate the number of microarrays required to obtain reliable results from a common type of study: the pairwise comparison of different classes of samples. Current knowledge seems to suffice for the construction of models that are realistic with respect to searches for individual differentially expressed genes. Such models allow us to investigate the dependence of the required number of samples on the relevant parameters: the biological variability of the samples within each class; the fold changes in expression; the detection sensitivity of the microarrays; and the acceptable error rates of the results. We supply experimentalists with general conclusions as well as a freely accessible Java applet at http://cartan.gmd.de/~zien/classsize/ for fine-tuning simulations to their particular circumstances. Since the situation can be assumed to be very similar for large-scale proteomics and metabolomics studies, our methods and results might also apply there.
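How the required number of arrays depends on within-class variability, fold change and error rates can be illustrated with the textbook normal-approximation sample size formula for a two-sided two-sample comparison, sketched below in Python. This is a generic calculation under stated assumptions (normally distributed log2 expression, equal variances, Bonferroni correction over the genes tested), not the authors' simulation model, which additionally accounts for detection sensitivity; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def arrays_per_class(fold_change, sd_log2, alpha=0.05, power=0.8, n_genes=1):
    """Normal-approximation sample size per class for detecting a given fold
    change in a two-sided two-sample test on log2 expression values.
    alpha is Bonferroni-corrected over n_genes tests (an assumed, deliberately
    simple form of error-rate control)."""
    delta = abs(math.log2(fold_change))          # effect size on the log2 scale
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / (2.0 * n_genes))
    z_power = z.inv_cdf(power)
    n = 2.0 * (z_alpha + z_power) ** 2 * sd_log2 ** 2 / delta ** 2
    return math.ceil(n)

# Twofold change, within-class SD of 1 on the log2 scale, a single gene:
print(arrays_per_class(2.0, 1.0))
```

Raising n_genes (stricter multiple-testing control) or the biological variability increases the required number of arrays, while larger fold changes reduce it, matching the qualitative dependencies listed in the abstract.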

Web DOI [BibTex]

Pattern Selection for Support Vector Classifiers

Shin, H., Cho, S.

In Ideal 2002, pages: 97-103, (Editors: Yin, H., N. Allinson, R. Freeman, J. Keane, S. Hubbard), Springer, Berlin, Germany, Third International Conference on Intelligent Data Engineering and Automated Learning, January 2002 (inproceedings)

Abstract
SVMs tend to take a very long time to train on a large data set. If "redundant" patterns are identified and deleted in pre-processing, the training time could be reduced significantly. We propose a k-nearest neighbors (k-NN) based pattern selection method. The method tries to select the patterns that are near the decision boundary and that are correctly labeled. The simulations over synthetic data sets showed promising results: (1) by converting a non-separable problem to a separable one, the search for an optimal error tolerance parameter became unnecessary; (2) SVM training time decreased by two orders of magnitude without any loss of accuracy; (3) the redundant SVs were substantially reduced.
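The two selection criteria named in the abstract can be sketched in Python as follows. A pattern is treated as "near the decision boundary" if its k nearest neighbours do not all share one label, and as "correctly labeled" if its own label wins the neighbour vote. These concrete rules, the binary {0, 1} labels, the brute-force distance computation and the function name are assumptions for illustration rather than the paper's exact procedure.

```python
import numpy as np

def knn_pattern_selection(X, y, k=5):
    """Return indices of patterns that (a) lie near the class boundary, i.e.
    their k nearest neighbours do not all share one label, and (b) appear
    correctly labeled, i.e. their own label wins the neighbour majority vote.
    Binary labels in {0, 1} and an odd k (no ties) are assumed."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]           # indices of the k nearest neighbours
    frac_pos = y[nn].mean(axis=1)                # share of neighbours labeled 1
    near_boundary = (frac_pos > 0) & (frac_pos < 1)
    majority = (frac_pos > 0.5).astype(y.dtype)
    correctly_labeled = majority == y
    return np.flatnonzero(near_boundary & correctly_labeled)

# Two 1-D classes meeting at 0, plus one mislabeled point at -0.15.
X = np.concatenate([np.linspace(-5, -0.1, 50), np.linspace(0.1, 5, 50), [-0.15]])[:, None]
y = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int), [1]])
selected = knn_pattern_selection(X, y)
print(X[selected, 0])
```

Only the SVM is then trained on X[selected], y[selected]; interior points and the likely mislabeled point are dropped, which is where the reported savings in training time would come from.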

PDF Web DOI [BibTex]

Training invariant support vector machines

DeCoste, D., Schölkopf, B.

Machine Learning, 46(1-3):161-190, January 2002 (article)

Abstract
Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.

PDF DOI [BibTex]

Model Selection for Small Sample Regression

Chapelle, O., Vapnik, V., Bengio, Y.

Machine Learning, 48(1-3):9-23, 2002 (article)

Abstract
Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, in order to strike the right trade-off between overfitting and underfitting. Previous classical results for linear regression are based on an asymptotic analysis. We present a new penalization method for performing model selection for regression that is appropriate even for small samples. Our penalization is based on an accurate estimator of the ratio of the expected training error and the expected generalization error, in terms of the expected eigenvalues of the input covariance matrix.

PostScript [BibTex]

The leave-one-out kernel

Tsuda, K., Kawanabe, M.

In Artificial Neural Networks -- ICANN 2002, 2415, pages: 727-732, LNCS, (Editors: Dorronsoro, J. R.), Artificial Neural Networks -- ICANN, 2002 (inproceedings)

PDF [BibTex]

Contrast discrimination with sinusoidal gratings of different spatial frequency

Bird, C., Henning, G., Wichmann, F.

Journal of the Optical Society of America A, 19(7), pages: 1267-1273, 2002 (article)

Abstract
The detectability of contrast increments was measured as a function of the contrast of a masking or “pedestal” grating at a number of different spatial frequencies ranging from 2 to 16 cycles per degree of visual angle. The pedestal grating always had the same orientation, spatial frequency and phase as the signal. The shape of the contrast increment threshold versus pedestal contrast (TvC) functions depends on the performance level used to define the “threshold,” but when both axes are normalized by the contrast corresponding to 75% correct detection at each frequency, the TvC functions at a given performance level are identical. Confidence intervals on the slope of the rising part of the TvC functions are so wide that it is not possible with our data to reject Weber’s Law.

PDF [BibTex]

A Bennett Concentration Inequality and Its Application to Suprema of Empirical Processes

Bousquet, O.

C. R. Acad. Sci. Paris, Ser. I, 334, pages: 495-500, 2002 (article)

Abstract
We introduce new concentration inequalities for functions on product spaces. They yield a Bennett-type deviation bound for suprema of empirical processes indexed by upper bounded functions. The result is an improvement on Rio's version [Rio01b] of Talagrand's inequality [Talagrand96] for equidistributed variables.

PDF PostScript [BibTex]


Numerical evolution of axisymmetric, isolated systems in general relativity

Frauendiener, J., Hein, M.

Physical Review D, 66, pages: 124004-124004, 2002 (article)

Abstract
We describe in this article a new code for evolving axisymmetric isolated systems in general relativity. Such systems are described by asymptotically flat space-times, which have the property that they admit a conformal extension. We work directly in the extended conformal manifold and numerically solve Friedrich's conformal field equations, which state that Einstein's equations hold in the physical space-time. Because of the compactness of the conformal space-time, the entire space-time can be calculated on a finite numerical grid. We describe the numerical scheme in detail, especially the treatment of the axisymmetry and the boundary.

GZIP [BibTex]

Marginalized kernels for biological sequences

Tsuda, K., Kin, T., Asai, K.

Bioinformatics, 18(Suppl 1):268-275, 2002 (article)

PDF [BibTex]

Localized Rademacher Complexities

Bartlett, P., Bousquet, O., Mendelson, S.

In Proceedings of the 15th annual conference on Computational Learning Theory, pages: 44-58, Proceedings of the 15th annual conference on Computational Learning Theory, 2002 (inproceedings)

Abstract
We investigate the behaviour of global and local Rademacher averages. We present new error bounds which are based on the local averages and indicate how data-dependent local averages can be estimated without a priori knowledge of the class at hand.

PDF PostScript [BibTex]
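To make "global and local Rademacher averages" concrete, here is a minimal Monte Carlo sketch for a finite function class given by its values on the sample. It uses one common definition, E_sigma sup_f (1/n) sum_i sigma_i f(x_i) without absolute values, and localizes by keeping only functions whose empirical second moment is at most r. This is an illustrative assumption, not the paper's estimator. The names `rademacher_average` and `local_rademacher_average` are hypothetical.

```python
import numpy as np

def rademacher_average(values, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma sup_f (1/n) sum_i sigma_i f(x_i).

    values: array of shape (m, n); row j holds f_j(x_1), ..., f_j(x_n).
    """
    rng = np.random.default_rng(seed)
    _, n = values.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher sign vectors
    corr = sigma @ values.T / n                         # shape (n_draws, m)
    return corr.max(axis=1).mean()                      # average of per-draw suprema

def local_rademacher_average(values, r, n_draws=2000, seed=0):
    """Same estimate, restricted to functions with (1/n) sum_i f(x_i)^2 <= r."""
    second_moment = (values**2).mean(axis=1)
    subset = values[second_moment <= r]
    if subset.shape[0] == 0:
        raise ValueError("no function in the class satisfies the radius constraint")
    return rademacher_average(subset, n_draws=n_draws, seed=seed)

# Hypothetical class: 50 random functions of varying amplitude on 200 points.
rng = np.random.default_rng(42)
scales = rng.uniform(0.05, 1.0, size=50)
scales[0] = 0.05  # guarantee at least one low-variance function
values = scales[:, None] * rng.uniform(-1.0, 1.0, size=(50, 200))
global_avg = rademacher_average(values)
local_avgs = [local_rademacher_average(values, r) for r in (0.05, 0.1, 0.2)]
```

Reusing the same seed gives every call the same sign vectors. The local averages are then suprema over nested subclasses, so they grow with r and never exceed the global average.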


Film Cooling: A Comparative Study of Different Heaterfoil Configurations for Liquid Crystals Experiments

Vogel, G., Graf, ABA., Weigand, B.

In ASME TURBO EXPO 2002, Amsterdam, GT-2002-30552, 2002 (inproceedings)

PDF [BibTex]


Stability and Generalization

Bousquet, O., Elisseeff, A.

Journal of Machine Learning Research, 2, pages: 499-526, 2002 (article)

Abstract
We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error. The methods we use can be applied in the regression framework as well as in the classification framework when the classifier is obtained by thresholding a real-valued function. We study the stability properties of large classes of learning algorithms such as regularization-based algorithms. In particular, we focus on Hilbert space regularization and Kullback-Leibler regularization. We demonstrate how to apply the results to SVMs for regression and classification.

PDF PostScript [BibTex]
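The sketch below evaluates a uniform-stability generalization bound of the kind this abstract refers to. For an algorithm with uniform stability beta and a loss bounded in [0, M], it computes R <= R_emp + 2*beta + (4*n*beta + M) * sqrt(ln(1/delta) / (2n)), which holds with probability at least 1 - delta. The constants are stated as we recall them from the paper and should be checked against it before use. The function name and the example values (empirical risk 0.1, beta = 1/n) are hypothetical.

```python
import math

def stability_bound(emp_risk, beta, M, n, delta):
    """Risk bound for a beta-uniformly-stable algorithm with loss in [0, M].

    Holds with probability at least 1 - delta over a sample of size n
    (constants as recalled from the paper; verify before relying on them).
    """
    slack = (4 * n * beta + M) * math.sqrt(math.log(1 / delta) / (2 * n))
    return emp_risk + 2 * beta + slack

# Hypothetical algorithm whose stability improves like 1/n, as for many
# regularized learners with a fixed regularization parameter.
ns = [100, 1_000, 10_000, 100_000]
bounds = [stability_bound(0.1, beta=1.0 / n, M=1.0, n=n, delta=0.05) for n in ns]
```

The term 4*n*beta*sqrt(1/n) shows why the bound is only informative when beta decays faster than 1/sqrt(n). With beta = 1/n, the gap to the empirical risk shrinks like 1/sqrt(n).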


Subspace information criterion for non-quadratic regularizers – model selection for sparse regressors

Tsuda, K., Sugiyama, M., Müller, K.

IEEE Trans Neural Networks, 13(1):70-80, 2002 (article)

PDF [BibTex]


Modeling splicing sites with pairwise correlations

Arita, M., Tsuda, K., Asai, K.

Bioinformatics, 18(Suppl 2):27-34, 2002 (article)

PDF [BibTex]


Perfusion Quantification using Gaussian Process Deconvolution

Andersen, IK., Szymkowiak, A., Rasmussen, CE., Hanson, LG., Marstrand, JR., Larsson, HBW., Hansen, LK.

Magnetic Resonance in Medicine, (48):351-361, 2002 (article)

Abstract
The quantification of perfusion using dynamic susceptibility contrast MR imaging requires deconvolution to obtain the residual impulse-response function (IRF). Here, a method using a Gaussian process for deconvolution, GPD, is proposed. The fact that the IRF is smooth is incorporated as a constraint in the method. The GPD method, which automatically estimates the noise level in each voxel, has the advantage that model parameters are optimized automatically. The GPD is compared to singular value decomposition (SVD) using a common threshold for the singular values and to SVD using a threshold optimized according to the noise level in each voxel. The comparison is carried out using artificial data as well as data from healthy volunteers. It is shown that GPD is comparable to SVD with a variable, optimized threshold when determining the maximum of the IRF, which is directly related to the perfusion. GPD provides a better estimate of the entire IRF. As the signal-to-noise ratio increases or the time resolution of the measurements increases, GPD is shown to be superior to SVD. This is also found for large distribution volumes.

PDF PostScript [BibTex]
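A minimal sketch of Gaussian-process deconvolution in the spirit of this abstract follows. It assumes a discretized model c = A r + noise, where A is the lower-triangular convolution matrix built from an arterial input function, and a squared-exponential GP prior on the IRF r to encode smoothness. The estimate is the GP posterior mean r = K A^T (A K A^T + sigma^2 I)^(-1) c. Unlike the method described, the hyperparameters (length scale, signal variance, noise variance) are fixed by hand here rather than optimized per voxel. All names and the synthetic data are hypothetical.

```python
import numpy as np

def convolution_matrix(aif, dt):
    """Lower-triangular A with (A r)[i] = dt * sum_{j <= i} aif[i - j] * r[j]."""
    aif = np.asarray(aif, dtype=float)
    n = aif.size
    lag = np.arange(n)[:, None] - np.arange(n)[None, :]
    return np.where(lag >= 0, dt * aif[np.clip(lag, 0, None)], 0.0)

def gp_deconvolve(c, A, t, length_scale, signal_var, noise_var):
    """Posterior mean of the IRF under a GP prior, given c = A r + noise.

    Prior: r ~ N(0, K) with K_ij = signal_var * exp(-(t_i - t_j)^2 / (2 length_scale^2)).
    Noise: i.i.d. Gaussian with variance noise_var.
    """
    K = signal_var * np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * length_scale**2))
    S = A @ K @ A.T + noise_var * np.eye(len(c))  # marginal covariance of c
    return K @ A.T @ np.linalg.solve(S, c)

# Hypothetical synthetic voxel: gamma-variate input and exponentially decaying IRF.
dt = 1.0
t = np.arange(40) * dt
aif = t**3 * np.exp(-t / 1.5)
aif /= aif.max()
r_true = np.exp(-t / 4.0)
A = convolution_matrix(aif, dt)
rng = np.random.default_rng(1)
c = A @ r_true + 0.01 * rng.standard_normal(t.size)
r_hat = gp_deconvolve(c, A, t, length_scale=3.0, signal_var=1.0, noise_var=1e-4)
```

The smoothness prior plays the role that singular-value truncation plays in SVD deconvolution: it damps the components of r that the convolution barely transmits.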


Tracking a Small Set of Experts by Mixing Past Posteriors

Bousquet, O., Warmuth, M.

Journal of Machine Learning Research, 3, pages: 363-396, (Editors: Long, P.), 2002 (article)

Abstract
In this paper, we examine on-line learning problems in which the target concept is allowed to change over time. In each trial a master algorithm receives predictions from a large set of n experts. Its goal is to predict almost as well as the best sequence of such experts chosen off-line by partitioning the training sequence into k+1 sections and then choosing the best expert for each section. We build on methods developed by Herbster and Warmuth and consider an open problem posed by Freund where the experts in the best partition are from a small pool of size m. Since k >> m, the best expert shifts back and forth between the experts of the small pool. We propose algorithms that solve this open problem by mixing the past posteriors maintained by the master algorithm. We relate the number of bits needed for encoding the best partition to the loss bounds of the algorithms. Instead of paying log n for choosing the best expert in each section, we first pay log(n choose m) bits in the bounds for identifying the pool of m experts and then log m bits per new section. In the bounds we also pay twice for encoding the boundaries of the sections.

PDF PostScript [BibTex]
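The idea of mixing past posteriors can be sketched as an exponential-weights loss update followed by a mixing update. The mixing update puts weight 1 - alpha on the newest posterior and spreads alpha uniformly over all earlier ones, including the uniform start vector. As we understand the paper, this is one of several mixing schemes it considers (a "fixed share to uniform past"). The learning rate eta, the share parameter alpha, the function name and the loss data are illustrative assumptions. Remembering old posteriors is what lets the weights return quickly to an expert from the small pool that was good earlier.

```python
import numpy as np

def mix_past_posteriors(losses, eta=1.0, alpha=0.01):
    """Expert weights under a 'fixed share to uniform past' mixing scheme.

    losses: array (T, n) of expert losses per trial.
    Returns the (T, n) array of weight vectors used to predict at each trial.
    """
    losses = np.asarray(losses, dtype=float)
    T, n = losses.shape
    start = np.full(n, 1.0 / n)              # uniform start vector
    past_sum, past_count = start.copy(), 1   # running sum of earlier posteriors
    v = start.copy()                         # weights for the first trial
    used = np.empty((T, n))
    for t in range(T):
        used[t] = v
        # Loss update: exponential weights, then normalize to a distribution.
        vm = v * np.exp(-eta * losses[t])
        vm /= vm.sum()
        # Mixing update: 1 - alpha on the newest posterior, alpha shared
        # uniformly among all earlier posteriors (start vector included).
        v = (1.0 - alpha) * vm + alpha * past_sum / past_count
        past_sum += vm
        past_count += 1
    return used

# Hypothetical losses: expert 0 is best, then expert 1, then expert 0 again;
# expert 2 is never good, so the best expert stays within the pool {0, 1}.
losses = np.ones((60, 3))
losses[:20, 0] = 0.0
losses[20:40, 1] = 0.0
losses[40:, 0] = 0.0
w = mix_past_posteriors(losses, eta=2.0, alpha=0.01)
leaders = w[[19, 39, 59]].argmax(axis=1)  # leading expert at the end of each section
```

Keeping a running sum instead of the full list of past posteriors reduces memory to O(n) for this particular scheme. Schemes that weight the past non-uniformly would need to store more history.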