Header logo is ei


2006


no image
Some observations on the pedestal effect or dipper function

Henning, B., Wichmann, F.

Journal of Vision, 6(13):50, 2006 Fall Vision Meeting of the Optical Society of America, December 2006 (poster)

Abstract
The pedestal effect is the large improvement in the detectabilty of a sinusoidal “signal” grating observed when the signal is added to a masking or “pedestal” grating of the same spatial frequency, orientation, and phase. We measured the pedestal effect in both broadband and notched noise - noise from which a 1.5-octave band centred on the signal frequency had been removed. Although the pedestal effect persists in broadband noise, it almost disappears in the notched noise. Furthermore, the pedestal effect is substantial when either high- or low-pass masking noise is used. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies different from that of the signal and pedestal. The spatial-frequency components of the notched noise above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. Thus the pedestal or dipper effect measured without notched noise is not a characteristic of individual spatial-frequency tuned channels.

Web DOI [BibTex]

2006

Web DOI [BibTex]


no image
A New Projected Quasi-Newton Approach for the Nonnegative Least Squares Problem

Kim, D., Sra, S., Dhillon, I.

(TR-06-54), Univ. of Texas, Austin, December 2006 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Probabilistic inference for solving (PO)MDPs

Toussaint, M., Harmeling, S., Storkey, A.

(934), School of Informatics, University of Edinburgh, December 2006 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Minimal Logical Constraint Covering Sets

Sinz, F., Schölkopf, B.

(155), Max Planck Institute for Biological Cybernetics, Tübingen, December 2006 (techreport)

Abstract
We propose a general framework for computing minimal set covers under class of certain logical constraints. The underlying idea is to transform the problem into a mathematical programm under linear constraints. In this sense it can be seen as a natural extension of the vector quantization algorithm proposed by Tipping and Schoelkopf. We show which class of logical constraints can be cast and relaxed into linear constraints and give an algorithm for the transformation.

PDF [BibTex]

PDF [BibTex]


no image
New Methods for the P300 Visual Speller

Biessmann, F.

(1), (Editors: Hill, J. ), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2006 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Optimizing Spatial Filters for BCI: Margin- and Evidence-Maximization Approaches

Farquhar, J., Hill, N., Schölkopf, B.

Challenging Brain-Computer Interfaces: MAIA Workshop 2006, pages: 1, November 2006 (poster)

Abstract
We present easy-to-use alternatives to the often-used two-stage Common Spatial Pattern + classifier approach for spatial filtering and classification of Event-Related Desychnronization signals in BCI. We report two algorithms that aim to optimize the spatial filters according to a criterion more directly related to the ability of the algorithms to generalize to unseen data. Both are based upon the idea of treating the spatial filter coefficients as hyperparameters of a kernel or covariance function. We then optimize these hyper-parameters directly along side the normal classifier parameters with respect to our chosen learning objective function. The two objectives considered are margin maximization as used in Support-Vector Machines and the evidence maximization framework used in Gaussian Processes. Our experiments assessed generalization error as a function of the number of training points used, on 9 BCI competition data sets and 5 offline motor imagery data sets measured in Tubingen. Both our approaches sho w consistent improvements relative to the commonly used CSP+linear classifier combination. Strikingly, the improvement is most significant in the higher noise cases, when either few trails are used for training, or with the most poorly performing subjects. This a reversal of the usual "rich get richer" effect in the development of CSP extensions, which tend to perform best when the signal is strong enough to accurately find their additional parameters. This makes our approach particularly suitable for clinical application where high levels of noise are to be expected.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Geometric Analysis of Hilbert Schmidt Independence criterion based ICA contrast function

Shen, H., Jegelka, S., Gretton, A.

(PA006080), National ICT Australia, Canberra, Australia, October 2006 (techreport)

Web [BibTex]

Web [BibTex]


no image
Learning Eye Movements

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

Sensory Coding And The Natural Environment, 2006, pages: 1, September 2006 (poster)

Abstract
The human visual system samples images through saccadic eye movements which rapidly change the point of fixation. Although the selection of eye movement targets depends on numerous top-down mechanisms, a number of recent studies have shown that low-level image features such as local contrast or edges play an important role. These studies typically used predefined image features which were afterwards experimentally verified. Here, we follow a complementary approach: instead of testing a set of candidate image features, we infer these hypotheses from the data, using methods from statistical learning. To this end, we train a non-linear classifier on fixated vs. randomly selected image patches without making any physiological assumptions. The resulting classifier can be essentially characterized by a nonlinear combination of two center-surround receptive fields. We find that the prediction performance of this simple model on our eye movement data is indistinguishable from the physiologically motivated model of Itti & Koch (2000) which is far more complex. In particular, we obtain a comparable performance without using any multi-scale representations, long-range interactions or oriented image features.

Web [BibTex]

Web [BibTex]


no image
A tutorial on spectral clustering

von Luxburg, U.

(149), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. Nevertheless, on the first glance spectral clustering looks a bit mysterious, and it is not obvious to see why it works at all and what it really does. This article is a tutorial introduction to spectral clustering. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.

PDF [BibTex]

PDF [BibTex]


no image
Towards the Inference of Graphs on Ordered Vertexes

Zien, A., Raetsch, G., Ong, C.

(150), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
We propose novel methods for machine learning of structured output spaces. Specifically, we consider outputs which are graphs with vertices that have a natural order. We consider the usual adjacency matrix representation of graphs, as well as two other representations for such a graph: (a) decomposing the graph into a set of paths, (b) converting the graph into a single sequence of nodes with labeled edges. For each of the three representations, we propose an encoding and decoding scheme. We also propose an evaluation measure for comparing two graphs.

PDF [BibTex]

PDF [BibTex]


no image
Classification of natural scenes: Critical features revisited

Drewes, J., Wichmann, F., Gegenfurtner, K.

Journal of Vision, 6(6):561, 6th Annual Meeting of the Vision Sciences Society (VSS), June 2006 (poster)

Abstract
Human observers are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. Despite the seeming complexity of such decisions it has been hypothesized that a simple global image feature, the relative abundance of high spatial frequencies at certain orientations, could underly such fast image classification (A. Torralba & A. Oliva, Network: Comput. Neural Syst., 2003). We successfully used linear discriminant analysis to classify a set of 11.000 images into “animal” and “non-animal” images based on their individual amplitude spectra only (Drewes, Wichmann, Gegenfurtner VSS 2005). We proceeded to sort the images based on the performance of our classifier, retaining only the best and worst classified 400 images (“best animals”, “best distractors” and “worst animals”, “worst distractors”). We used a Go/No-go paradigm to evaluate human performance on this subset of our images. Both reaction time and proportion of correctly classified images showed a significant effect of classification difficulty. Images more easily classified by our algorithm were also classified faster and better by humans, as predicted by the Torralba & Oliva hypothesis. We then equated the amplitude spectra of the 400 images, which, by design, reduced algorithmic performance to chance whereas human performance was only slightly reduced (cf. Wichmann, Rosas, Gegenfurtner, VSS 2005). Most importantly, the same images as before were still classified better and faster, suggesting that even in the original condition features other than specifics of the amplitude spectrum made particular images easy to classify, clearly at odds with the Torralba & Oliva hypothesis.

Web DOI [BibTex]

Web DOI [BibTex]


no image
The pedestal effect is caused by off-frequency looking, not nonlinear transduction or contrast gain-control

Wichmann, F., Henning, B.

Journal of Vision, 6(6):194, 6th Annual Meeting of the Vision Sciences Society (VSS), June 2006 (poster)

Abstract
The pedestal or dipper effect is the large improvement in the detectabilty of a sinusoidal grating observed when the signal is added to a pedestal or masking grating having the signal‘s spatial frequency, orientation, and phase. The effect is largest with pedestal contrasts just above the ‘threshold‘ in the absence of a pedestal. We measured the pedestal effect in both broadband and notched masking noise---noise from which a 1.5- octave band centered on the signal and pedestal frequency had been removed. The pedestal effect persists in broadband noise, but almost disappears with notched noise. The spatial-frequency components of the notched noise that lie above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies that are different from that of the signal and pedestal. Thus the pedestal or dipper effect is not a characteristic of individual spatial-frequency tuned channels.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Nonnegative Matrix Approximation: Algorithms and Applications

Sra, S., Dhillon, I.

Univ. of Texas, Austin, May 2006 (techreport)

[BibTex]

[BibTex]


no image
An Automated Combination of Sequence Motif Kernels for Predicting Protein Subcellular Localization

Zien, A., Ong, C.

(146), Max Planck Institute for Biological Cybernetics, Tübingen, April 2006 (techreport)

Abstract
Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. We propose an elegant and fully automated approach to building a prediction system for protein subcellular localization. We propose a new class of protein sequence kernels which considers all motifs including motifs with gaps. This class of kernels allows the inclusion of pairwise amino acid distances into their computation. We further propose a multiclass support vector machine method which directly solves protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. To automatically search over families of possible amino acid motifs, we generalize our method to optimize over multiple kernels at the same time. We compare our automated approach to four other predictors on three different datasets.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Training a Support Vector Machine in the Primal

Chapelle, O.

(147), Max Planck Institute for Biological Cybernetics, Tübingen, April 2006, The version in the "Large Scale Kernel Machines" book is more up to date. (techreport)

Abstract
Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and there is no reason for ignoring it. Moreover, from the primal point of view, new families of algorithms for large scale SVM training can be investigated.

PDF [BibTex]

PDF [BibTex]


no image
The Pedestal Effect is Caused by Off-Frequency Looking, not Nonlinear Transduction or Contrast Gain-Control

Wichmann, F., Henning, G.

9, pages: 174, 9th T{\"u}bingen Perception Conference (TWK), March 2006 (poster)

Abstract
The pedestal or dipper effect is the large improvement in the detectability of a sinusoidal grating observed when the signal is added to a pedestal or masking grating having the signal‘s spatial frequency, orientation, and phase. The effect is largest with pedestal contrasts just above the ‘threshold’ in the absence of a pedestal. We measured the pedestal effect in both broadband and notched masking noise---noise from which a 1.5-octave band centered on the signal and pedestal frequency had been removed. The pedestal effect persists in broadband noise, but almost disappears with notched noise. The spatial-frequency components of the notched noise that lie above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies that are different from that of the signal and pedestal. Thus the pedestal or dipper effect is not a characteristic of individual spatial-frequency tuned channels.

Web [BibTex]

Web [BibTex]


no image
Efficient tests for the deconvolution hypothesis

Langovoy, M.

Workshop on Statistical Inverse Problems, March 2006 (poster)

Web [BibTex]

Web [BibTex]


no image
Classification of Natural Scenes: Critical Features Revisited

Drewes, J., Wichmann, F., Gegenfurtner, K.

9, pages: 92, 9th T{\"u}bingen Perception Conference (TWK), March 2006 (poster)

Abstract
Human observers are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. Despite the seeming complexity of such decisions it has been hypothesized that a simple global image feature, the relative abundance of high spatial frequencies at certain orientations, could underly such fast image classification [1]. We successfully used linear discriminant analysis to classify a set of 11.000 images into “animal” and “non-animal” images based on their individual amplitude spectra only [2]. We proceeded to sort the images based on the performance of our classifier, retaining only the best and worst classified 400 images ("best animals", "best distractors" and "worst animals", "worst distractors"). We used a Go/No-go paradigm to evaluate human performance on this subset of our images. Both reaction time and proportion of correctly classified images showed a significant effect of classification difficulty. Images more easily classified by our algorithm were also classified faster and better by humans, as predicted by the Torralba & Oliva hypothesis. We then equated the amplitude spectra of the 400 images, which, by design, reduced algorithmic performance to chance whereas human performance was only slightly reduced [3]. Most importantly, the same images as before were still classified better and faster, suggesting that even in the original condition features other than specifics of the amplitude spectrum made particular images easy to classify, clearly at odds with the Torralba & Oliva hypothesis.

Web [BibTex]

Web [BibTex]


no image
Factorial Coding of Natural Images: How Effective are Linear Models in Removing Higher-Order Dependencies?

Bethge, M.

9, pages: 90, 9th T{\"u}bingen Perception Conference (TWK), March 2006 (poster)

Abstract
The performance of unsupervised learning models for natural images is evaluated quantitatively by means of information theory. We estimate the gain in statistical independence (the multi-information reduction) achieved with independent component analysis (ICA), principal component analysis (PCA), zero-phase whitening, and predictive coding. Predictive coding is translated into the transform coding framework, where it can be characterized by the constraint of a triangular filter matrix. A randomly sampled whitening basis and the Haar wavelet are included into the comparison as well. The comparison of all these methods is carried out for different patch sizes, ranging from 2x2 to 16x16 pixels. In spite of large differences in the shape of the basis functions, we find only small differences in the multi-information between all decorrelation transforms (5% or less) for all patch sizes. Among the second-order methods, PCA is optimal for small patch sizes and predictive coding performs best for large patch sizes. The extra gain achieved with ICA is always less than 2%. In conclusion, the `edge filters‘ found with ICA lead only to a surprisingly small improvement in terms of its actual objective.

Web [BibTex]


no image
Cross-Validation Optimization for Structured Hessian Kernel Methods

Seeger, M., Chapelle, O.

Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, February 2006 (techreport)

Abstract
We address the problem of learning hyperparameters in kernel methods for which the Hessian of the objective is structured. We propose an approximation to the cross-validation log likelihood whose gradient can be computed analytically, solving the hyperparameter learning problem efficiently through nonlinear optimization. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels or to large datasets. When applied to the problem of multi-way classification, our method scales linearly in the number of classes and gives rise to state-of-the-art results on a remote imaging task.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Classification of natural scenes: critical features revisited

Drewes, J., Wichmann, F., Gegenfurtner, K.

Experimentelle Psychologie: Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen, 48, pages: 251, 2006 (poster)

[BibTex]

[BibTex]


no image
Texture and haptic cues in slant discrimination: combination is sensitive to reliability but not statistically optimal

Rosas, P., Wagemans, J., Ernst, M., Wichmann, F.

Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen (TeaP 2006), 48, pages: 80, 2006 (poster)

[BibTex]

[BibTex]


no image
Ähnlichkeitsmasse in Modellen zur Kategorienbildung

Jäkel, F., Wichmann, F.

Experimentelle Psychologie: Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen, 48, pages: 223, 2006 (poster)

[BibTex]

[BibTex]


no image
The pedestal effect is caused by off-frequency looking, not nonlinear transduction or contrast gain-control

Wichmann, F., Henning, B.

Experimentelle Psychologie: Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen, 48, pages: 205, 2006 (poster)

[BibTex]

[BibTex]


Thumb xl screen shot 2012 06 06 at 11.31.38 am
Implicit Wiener Series, Part II: Regularised estimation

Gehler, P., Franz, M.

(148), Max Planck Institute, 2006 (techreport)

pdf [BibTex]

2005


no image
Popper, Falsification and the VC-dimension

Corfield, D., Schölkopf, B., Vapnik, V.

(145), Max Planck Institute for Biological Cybernetics, November 2005 (techreport)

PDF [BibTex]

2005

PDF [BibTex]


no image
Kernel methods for dependence testing in LFP-MUA

Gretton, A., Belitski, A., Murayama, Y., Schölkopf, B., Logothetis, N.

35(689.17), 35th Annual Meeting of the Society for Neuroscience (Neuroscience), November 2005 (poster)

Abstract
A fundamental problem in neuroscience is determining whether or not particular neural signals are dependent. The correlation is the most straightforward basis for such tests, but considerable work also focuses on the mutual information (MI), which is capable of revealing dependence of higher orders that the correlation cannot detect. That said, there are other measures of dependence that share with the MI an ability to detect dependence of any order, but which can be easier to compute in practice. We focus in particular on tests based on the functional covariance, which derive from work originally accomplished in 1959 by Renyi. Conceptually, our dependence tests work by computing the covariance between (infinite dimensional) vectors of nonlinear mappings of the observations being tested, and then determining whether this covariance is zero - we call this measure the constrained covariance (COCO). When these vectors are members of universal reproducing kernel Hilbert spaces, we can prove this covariance to be zero only when the variables being tested are independent. The greatest advantage of these tests, compared with the mutual information, is their simplicity – when comparing two signals, we need only take the largest eigenvalue (or the trace) of a product of two matrices of nonlinearities, where these matrices are generally much smaller than the number of observations (and are very simple to construct). We compare the mutual information, the COCO, and the correlation in the context of finding changes in dependence between the LFP and MUA signals in the primary visual cortex of the anaesthetized macaque, during the presentation of dynamic natural stimuli. We demonstrate that the MI and COCO reveal dependence which is not detected by the correlation alone (which we prove by artificially removing all correlation between the signals, and then testing their dependence with COCO and the MI); and that COCO and the MI give results consistent with each other on our data.

Web [BibTex]

Web [BibTex]


no image
Rapid animal detection in natural scenes: Critical features are local

Wichmann, F., Rosas, P., Gegenfurtner, K.

Journal of Vision, 5(8):376, Fifth Annual Meeting of the Vision Sciences Society (VSS), September 2005 (poster)

Abstract
Thorpe et al (Nature 381, 1996) first showed how rapidly human observers are able to classify natural images as to whether they contain an animal or not. Whilst the basic result has been replicated using different response paradigms (yes-no versus forced-choice), modalities (eye movements versus button presses) as well as while measuring neurophysiological correlates (ERPs), it is still unclear which image features support this rapid categorisation. Recently Torralba and Oliva (Network: Computation in Neural Systems, 14, 2003) suggested that simple global image statistics can be used to predict seemingly complex decisions about the absence and/or presence of objects in natural scences. They show that the information contained in a small number (N=16) of spectral principal components (SPC)—principal component analysis (PCA) applied to the normalised power spectra of the images—is sufficient to achieve approximately 80% correct animal detection in natural scenes. Our goal was to test whether human observers make use of the power spectrum when rapidly classifying natural scenes. We measured our subjects' ability to detect animals in natural scenes as a function of presentation time (13 to 167 msec); images were immediately followed by a noise mask. In one condition we used the original images, in the other images whose power spectra were equalised (each power spectrum was set to the mean power spectrum over our ensemble of 1476 images). Thresholds for 75% correct animal detection were in the region of 20–30 msec for all observers, independent of the power spectrum of the images: this result makes it very unlikely that human observers make use of the global power spectrum. Taken together with the results of Gegenfurtner, Braun & Wichmann (Journal of Vision [abstract], 2003), showing the robustness of animal detection to global phase noise, we conclude that humans use local features, like edges and contours, in rapid animal detection.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Learning an Interest Operator from Eye Movements

Kienzle, W., Franz, M., Wichmann, F., Schölkopf, B.

International Workshop on Bioinspired Information Processing (BIP 2005), 2005, pages: 1, September 2005 (poster)

PDF Web [BibTex]

PDF Web [BibTex]


no image
Classification of natural scenes using global image statistics

Drewes, J., Wichmann, F., Gegenfurtner, K.

Journal of Vision, 5(8):602, Fifth Annual Meeting of the Vision Sciences Society (VSS), September 2005 (poster)

Abstract
The algorithmic classification of complex, natural scenes is generally considered a difficult task due to the large amount of information conveyed by natural images. Work by Simon Thorpe and colleagues showed that humans are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. This suggests that the relevant information for classification can be extracted at comparatively limited computational cost. One hypothesis is that global image statistics such as the amplitude spectrum could underly fast image classification (Johnson & Olshausen, Journal of Vision, 2003; Torralba & Oliva, Network: Comput. Neural Syst., 2003). We used linear discriminant analysis to classify a set of 11.000 images into animal and non-animal images. After applying a DFT to the image, we put the Fourier spectrum into bins (8 orientations with 6 frequency bands each). Using all bins, classification performance on the Fourier spectrum reached 70%. However, performance was similar (67%) when only the high spatial frequency information was used and decreased steadily at lower spatial frequencies, reaching a minimum (50%) for the low spatial frequency information. Similar results were obtained when all bins were used on spatially filtered images. A detailed analysis of the classification weights showed that a relatively high level of performance (67%) could also be obtained when only 2 bins were used, namely the vertical and horizontal orientation at the highest spatial frequency band. Our results show that in the absence of sophisticated machine learning techniques, animal detection in natural scenes is limited to rather modest levels of performance, far below those of human observers. If limiting oneself to global image statistics such as the DFT then mostly information at the highest spatial frequencies is useful for the task. This is analogous to the results obtained with human observers on filtered images (Kirchner et al, VSS 2004).

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Combinatorial View of Graph Laplacians

Huang, J.

(144), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, August 2005 (techreport)

Abstract
Discussions about different graph Laplacian, mainly normalized and unnormalized versions of graph Laplacian, have been ardent with respect to various methods in clustering and graph based semi-supervised learning. Previous research on graph Laplacians investigated their convergence properties to Laplacian operators on continuous manifolds. There is still no strong proof on convergence for the normalized Laplacian. In this paper, we analyze different variants of graph Laplacians directly from the ways solving the original graph partitioning problem. The graph partitioning problem is a well-known combinatorial NP hard optimization problem. The spectral solutions provide evidence that normalized Laplacian encodes more reasonable considerations for graph partitioning. We also provide some examples to show their differences.

[BibTex]

[BibTex]


no image
Beyond Pairwise Classification and Clustering Using Hypergraphs

Zhou, D., Huang, J., Schölkopf, B.

(143), Max Planck Institute for Biological Cybernetics, August 2005 (techreport)

Abstract
In many applications, relationships among objects of interest are more complex than pairwise. Simply approximating complex relationships as pairwise ones can lead to loss of information. An alternative for these applications is to analyze complex relationships among data directly, without the need to first represent the complex relationships into pairwise ones. A natural way to describe complex relationships is to use hypergraphs. A hypergraph is a graph in which edges can connect more than two vertices. Thus we consider learning from a hypergraph, and develop a general framework which is applicable to classification and clustering for complex relational data. We have applied our framework to real-world web classification problems and obtained encouraging results.

PDF [BibTex]

PDF [BibTex]


no image
Comparative evaluation of Independent Components Analysis algorithms for isolating target-relevant information in brain-signal classification

Hill, N., Schröder, M., Lal, T., Schölkopf, B.

Brain-Computer Interface Technology, 3, pages: 95, June 2005 (poster)

PDF [BibTex]


no image
Generalized Nonnegative Matrix Approximations using Bregman Divergences

Sra, S., Dhillon, I.

Univ. of Texas at Austin, June 2005 (techreport)

[BibTex]

[BibTex]


no image
Measuring Statistical Dependence with Hilbert-Schmidt Norms

Gretton, A., Bousquet, O., Smola, A., Schölkopf, B.

(140), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2005 (techreport)

Abstract
We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.

PDF [BibTex]

PDF [BibTex]


no image
Consistency of Kernel Canonical Correlation Analysis

Fukumizu, K., Bach, F., Gretton, A.

(942), Institute of Statistical Mathematics, 4-6-7 Minami-azabu, Minato-ku, Tokyo 106-8569 Japan, June 2005 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Classification of natural scenes using global image statistics

Drewes, J., Wichmann, F., Gegenfurtner, K.

47, pages: 88, 47. Tagung Experimentell Arbeitender Psychologen, April 2005 (poster)

[BibTex]

[BibTex]


no image
Classification of Natural Scenes using Global Image Statistics

Drewes, J., Wichmann, F., Gegenfurtner, K.

8, pages: 88, 8th T{\"u}bingen Perception Conference (TWK), February 2005 (poster)

Abstract
The algorithmic classification of complex, natural scenes is generally considered a difficult task due to the large amount of information conveyed by natural images. Work by Simon Thorpe and colleagues showed that humans are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. This suggests that the relevant information for classification can be extracted at comparatively limited computational cost. One hypothesis is that global image statistics such as the amplitude spectrum could underly fast image classification (Johnson & Olshausen, Journal of Vision, 2003; Torralba & Oliva, Network: Comput. Neural Syst., 2003). We used linear discriminant analysis to classify a set of 11.000 images into animal and nonanimal images. After applying a DFT to the image, we put the Fourier spectrum of each image into 48 bins (8 orientations with 6 frequency bands). Using all of these bins, classification performance on the Fourier spectrum reached 70%. In an iterative procedure, we then removed the bins whose absence caused the smallest damage to the classification performance (one bin per iteration). Notably, performance stayed at about 70% until less then 6 bins were left. A detailed analysis of the classification weights showed that a comparatively high level of performance (67%) could also be obtained when only 2 bins were used, namely the vertical orientations at the highest spatial frequency band. When using only a single frequency band (8 bins) we found that 67% classification performance could be reached when only the high spatial frequency information was used, which decreased steadily at lower spatial frequencies, reaching a minimum (50%) for the low spatial frequency information. Similar results were obtained when all bins were used on spatially pre-filtered images. Our results show that in the absence of sophisticated machine learning techniques, animal detection in natural scenes is limited to rather modest levels of performance, far below those of human observers. If limiting oneself to global image statistics such as the DFT then mostly information at the highest spatial frequencies is useful for the task. This is analogous to the results obtained with human observers on filtered images (Kirchner et al, VSS 2004).

Web [BibTex]

Web [BibTex]


no image
Efficient Adaptive Sampling of the Psychometric Function by Maximizing Information Gain

Tanner, T., Hill, N., Rasmussen, C., Wichmann, F.

8, pages: 109, (Editors: Bülthoff, H. H., H. A. Mallot, R. Ulrich and F. A. Wichmann), 8th T{\"u}bingen Perception Conference (TWK), February 2005 (poster)

Abstract
A psychometric function can be described by its shape and four parameters: position or threshold, slope or width, false alarm rate or chance level, and miss or lapse rate. Depending on the parameters of interest some points on the psychometric function may be more informative than others. Adaptive methods attempt to place trials on the most informative points based on the data collected in previous trials. We introduce a new adaptive bayesian psychometric method which collects data for any set of parameters with high efficency. It places trials by minimizing the expected entropy [1] of the posterior pdf over a set of possible stimuli. In contrast to most other adaptive methods it is neither limited to threshold measurement nor to forced-choice designs. Nuisance parameters can be included in the estimation and lead to less biased estimates. The method supports block designs which do not harm the performance when a sufficient number of trials are performed. Block designs are useful for control of response bias and short term performance shifts such as adaptation. We present the results of evaluations of the method by computer simulations and experiments with human observers. In the simulations we investigated the role of parametric assumptions, the quality of different point estimates, the effect of dynamic termination criteria and many other settings. [1] Kontsevich, L.L. and Tyler, C.W. (1999): Bayesian adaptive estimation of psychometric slope and threshold. Vis. Res. 39 (16), 2729-2737.

Web [BibTex]

Web [BibTex]


no image
Automatic Classification of Plankton from Digital Images

Sieracki, M., Riseman, E., Balch, W., Benfield, M., Hanson, A., Pilskaln, C., Schultz, H., Sieracki, C., Utgoff, P., Blaschko, M., Holness, G., Mattar, M., Lisin, D., Tupper, B.

ASLO Aquatic Sciences Meeting, 1, pages: 1, February 2005 (poster)

[BibTex]

[BibTex]


no image
Bayesian Inference for Psychometric Functions

Kuss, M., Jäkel, F., Wichmann, F.

8, pages: 106, (Editors: Bülthoff, H. H., H. A. Mallot, R. Ulrich and F. A. Wichmann), 8th T{\"u}bingen Perception Conference (TWK), February 2005 (poster)

Abstract
In psychophysical studies of perception the psychometric function is used to model the relation between the physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. We propose the use of Bayesian inference to extract the information contained in experimental data to learn about the parameters of psychometric functions. Since Bayesian inference cannot be performed analytically we use a Markov chain Monte Carlo method to generate samples from the posterior distribution over parameters. These samples can be used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. We compare our approach with traditional methods based on maximum-likelihood parameter estimation combined with parametric bootstrap techniques for confidence interval estimation. Experiments indicate that Bayesian inference methods are superior to bootstrap-based methods and are thus the method of choice for estimating the psychometric function and its confidence-intervals.

Web [BibTex]

Web [BibTex]


no image
Approximate Inference for Robust Gaussian Process Regression

Kuss, M., Pfingsten, T., Csato, L., Rasmussen, C.

(136), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
Gaussian process (GP) priors have been successfully used in non-parametric Bayesian regression and classification models. Inference can be performed analytically only for the regression model with Gaussian noise. For all other likelihood models inference is intractable and various approximation techniques have been proposed. In recent years expectation-propagation (EP) has been developed as a general method for approximate inference. This article provides a general summary of how expectation-propagation can be used for approximate inference in Gaussian process models. Furthermore we present a case study describing its implementation for a new robust variant of Gaussian process regression. To gain further insights into the quality of the EP approximation we present experiments in which we compare to results obtained by Markov chain Monte Carlo (MCMC) sampling.

PDF [BibTex]

PDF [BibTex]


no image
Global image statistics of natural scenes

Drewes, J., Wichmann, F., Gegenfurtner, K.

Bioinspired Information Processing, 08, pages: 1, 2005 (poster)

[BibTex]

[BibTex]


no image
Maximum-Margin Feature Combination for Detection and Categorization

BakIr, G., Wu, M., Eichhorn, J.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
In this paper we are concerned with the optimal combination of features of possibly different types for detection and estimation tasks in machine vision. We propose to combine features such that the resulting classifier maximizes the margin between classes. In contrast to existing approaches which are non-convex and/or generative we propose to use a discriminative model leading to convex problem formulation and complexity control. Furthermore we assert that decision functions should not compare apples and oranges by comparing features of different types directly. Instead we propose to combine different similarity measures for each different feature type. Furthermore we argue that the question: ”Which feature type is more discriminative for task X?” is ill-posed and show empirically that the answer to this question might depend on the complexity of the decision function.

PDF [BibTex]

PDF [BibTex]


no image
Kernel-Methods, Similarity, and Exemplar Theories of Categorization

Jäkel, F., Wichmann, F.

ASIC, 4, 2005 (poster)

Abstract
Kernel-methods are popular tools in machine learning and statistics that can be implemented in a simple feed-forward neural network. They have strong connections to several psychological theories. For example, Shepard‘s universal law of generalization can be given a kernel interpretation. This leads to an inner product and a metric on the psychological space that is different from the usual Minkowski norm. The metric has psychologically interesting properties: It is bounded from above and does not have additive segments. As categorization models often rely on Shepard‘s law as a model for psychological similarity some of them can be recast as kernel-methods. In particular, ALCOVE is shown to be closely related to kernel logistic regression. The relationship to the Generalized Context Model is also discussed. It is argued that functional analysis which is routinely used in machine learning provides valuable insights also for psychology.

Web [BibTex]


no image
Rapid animal detection in natural scenes: critical features are local

Wichmann, F., Rosas, P., Gegenfurtner, K.

Experimentelle Psychologie. Beitr{\"a}ge zur 47. Tagung experimentell arbeitender Psychologen, 47, pages: 225, 2005 (poster)

[BibTex]

[BibTex]


no image
Towards a Statistical Theory of Clustering. Presented at the PASCAL workshop on clustering, London

von Luxburg, U., Ben-David, S.

Presented at the PASCAL workshop on clustering, London, 2005 (techreport)

Abstract
The goal of this paper is to discuss statistical aspects of clustering in a framework where the data to be clustered has been sampled from some unknown probability distribution. Firstly, the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process. Secondly, the more sample points we have, the more reliable the clustering should be. We discuss which methods can and cannot be used to tackle those problems. In particular we argue that generalization bounds as they are used in statistical learning theory of classification are unsuitable in a general clustering framework. We suggest that the main replacements of generalization bounds should be convergence proofs and stability considerations. This paper should be considered as a road map paper which identifies important questions and potentially fruitful directions for future research about statistical clustering. We do not attempt to present a complete statistical theory of clustering.

PDF [BibTex]

PDF [BibTex]


no image
The human brain as large margin classifier

Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B.

Proceedings of the Computational & Systems Neuroscience Meeting (COSYNE), 2, pages: 1, 2005 (poster)

[BibTex]

[BibTex]


no image
Approximate Bayesian Inference for Psychometric Functions using MCMC Sampling

Kuss, M., Jäkel, F., Wichmann, F.

(135), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
In psychophysical studies the psychometric function is used to model the relation between the physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. In this report we propose the use of Bayesian inference to extract the information contained in experimental data estimate the parameters of psychometric functions. Since Bayesian inference cannot be performed analytically we describe how a Markov chain Monte Carlo method can be used to generate samples from the posterior distribution over parameters. These samples are used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. In addition we discuss the parameterisation of psychometric functions and the role of prior distributions in the analysis. The proposed approach is exemplified using artificially generate d data and in a case study for real experimental data. Furthermore, we compare our approach with traditional methods based on maximum-likelihood parameter estimation combined with bootstrap techniques for confidence interval estimation. The appendix provides a description of an implementation for the R environment for statistical computing and provides the code for reproducing the results discussed in the experiment section.

PDF [BibTex]

PDF [BibTex]