2006


A Direct Method for Building Sparse Kernel Learning Algorithms

Wu, M., Schölkopf, B., Bakır, G.

Journal of Machine Learning Research, 7, pages: 603-624, April 2006 (article)

Abstract
Many Kernel Learning Algorithms (KLA), including the Support Vector Machine (SVM), result in a Kernel Machine (KM), such as a kernel classifier, whose key component is a weight vector in a feature space implicitly introduced by a positive definite kernel function. This weight vector is usually obtained by solving a convex optimization problem. Based on this fact, we present a direct method to build Sparse Kernel Learning Algorithms (SKLA) by adding one more constraint to the original convex optimization problem, such that the sparseness of the resulting KM is explicitly controlled while at the same time the performance of the resulting KM can be kept as high as possible. A gradient-based approach is provided to solve this modified optimization problem. Applying this method to the SVM results in a concrete algorithm for building Sparse Large Margin Classifiers (SLMC). Further analysis of the SLMC algorithm indicates that it essentially finds a discriminating subspace that can be spanned by a small number of vectors, and in this subspace, the different classes of data are linearly well separated. Experimental results over several classification benchmarks demonstrate the effectiveness of our approach.

PDF PDF [BibTex]

An Inventory of Sequence Polymorphisms For Arabidopsis

Clark, R., Ossowski, S., Schweikert, G., Rätsch, G., Shinn, P., Zeller, G., Warthmann, N., Fu, G., Hinds, D., Chen, H., Frazer, K., Huson, D., Schölkopf, B., Nordborg, M., Ecker, J., Weigel, D.

17th International Conference on Arabidopsis Research, April 2006 (talk)

Abstract
We have used high-density oligonucleotide arrays to characterize common sequence variation in 20 wild strains of Arabidopsis thaliana that were chosen for maximal genetic diversity. Both strands of each possible SNP of the 119 Mb reference genome were represented on the arrays, which were hybridized with whole genome, isothermally amplified DNA to minimize ascertainment biases. Using two complementary approaches, a model-based algorithm and a newly developed machine learning method, we identified over 550,000 SNPs with a false discovery rate of ~0.03 (average of 1 SNP for every 216 bp of the genome). A heuristic algorithm predicted in addition ~700 highly polymorphic or deleted regions per accession. Over 700 predicted polymorphisms with major functional effects (e.g., premature stop codons, or deletions of coding sequence) were validated by dideoxy sequencing. Using this data set, we provide the first systematic description of the types of genes that harbor major effect polymorphisms in natural populations at moderate allele frequencies. The data also provide an unprecedented resource for the study of genetic variation in an experimentally tractable, multicellular model organism.

[BibTex]

Machine Learning and Applications in Biology

Shin, H.

6th Course in Bioinformatics for Molecular Biologist, March 2006 (talk)

Abstract
The emergence of the fields of computational biology and bioinformatics has alleviated the burden of solving many biological problems, saving the time and cost required for experiments and also providing predictions that guide new experiments. Within computational biology, machine learning algorithms have played a central role in dealing with the flood of biological data. The goal of this tutorial is to raise awareness and comprehension of machine learning so that biologists can properly match the task at hand to the corresponding analytical approach. We start by categorizing biological problem settings and introduce the general machine learning schemes that fit best to each of these categories. We then explore representative models in further detail, from traditional statistical models to recent kernel models, presenting several up-to-date research projects in bioinformatics to exemplify how biological questions can benefit from a machine learning approach. Finally, we discuss how cooperation between biologists and machine learners might be made smoother.

PDF [BibTex]

The Pedestal Effect is Caused by Off-Frequency Looking, not Nonlinear Transduction or Contrast Gain-Control

Wichmann, F., Henning, G.

9, pages: 174, 9th Tübingen Perception Conference (TWK), March 2006 (poster)

Abstract
The pedestal or dipper effect is the large improvement in the detectability of a sinusoidal grating observed when the signal is added to a pedestal or masking grating having the signal's spatial frequency, orientation, and phase. The effect is largest with pedestal contrasts just above the 'threshold' in the absence of a pedestal. We measured the pedestal effect in both broadband and notched masking noise (noise from which a 1.5-octave band centered on the signal and pedestal frequency had been removed). The pedestal effect persists in broadband noise, but almost disappears with notched noise. The spatial-frequency components of the notched noise that lie above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies that are different from that of the signal and pedestal. Thus the pedestal or dipper effect is not a characteristic of individual spatial-frequency tuned channels.

Web [BibTex]

Efficient tests for the deconvolution hypothesis

Langovoy, M.

Workshop on Statistical Inverse Problems, March 2006 (poster)

Web [BibTex]

Kernel extrapolation

Vishwanathan, SVN., Borgwardt, KM., Guttman, O., Smola, AJ.

Neurocomputing, 69(7-9):721-729, March 2006 (article)

Abstract
We present a framework for efficient extrapolation of reduced rank approximations, graph kernels, and locally linear embeddings (LLE) to unseen data. We also present a principled method to combine many of these kernels and then extrapolate them. Central to our method is a theorem for matrix approximation, and an extension of the representer theorem to handle multiple joint regularization constraints. Experiments in protein classification demonstrate the feasibility of our approach.

Web DOI [BibTex]

Statistical Properties of Kernel Principal Component Analysis

Blanchard, G., Bousquet, O., Zwald, L.

Machine Learning, 66(2-3):259-294, March 2006 (article)

Abstract
We study the properties of the eigenvalues of Gram matrices in a non-asymptotic setting. Using local Rademacher averages, we provide data-dependent and tight bounds for their convergence towards eigenvalues of the corresponding kernel operator. We perform these computations in a functional analytic framework which allows us to deal implicitly with reproducing kernel Hilbert spaces of infinite dimension. This can have applications to various kernel algorithms, such as Support Vector Machines (SVM). We focus on Kernel Principal Component Analysis (KPCA) and, using such techniques, we obtain sharp excess risk bounds for the reconstruction error. In these bounds, the dependence on the decay of the spectrum and on the closeness of successive eigenvalues is made explicit.
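
The link between Gram-matrix eigenvalues and the KPCA reconstruction error can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes an RBF kernel, uncentered KPCA, and synthetic Gaussian data, and the function names `rbf_gram` and `kpca_reconstruction_errors` are hypothetical. Under these assumptions the empirical reconstruction error with d components equals the sum of the eigenvalues of K/n beyond the d largest.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix of the RBF kernel exp(-gamma * ||x - y||^2) (assumed kernel)."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kpca_reconstruction_errors(K):
    """Eigenvalues of K/n (descending) and the empirical reconstruction
    error of uncentered KPCA for d = 0, 1, ..., n retained components."""
    n = K.shape[0]
    eigs = np.linalg.eigvalsh(K / n)[::-1]   # eigvalsh returns ascending order
    eigs = np.clip(eigs, 0.0, None)          # remove tiny negative round-off
    captured = np.concatenate(([0.0], np.cumsum(eigs)))
    return eigs, eigs.sum() - captured       # tail sums of the spectrum

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
eigs, errors = kpca_reconstruction_errors(rbf_gram(X, gamma=0.5))
```

Because the RBF kernel has k(x, x) = 1, the eigenvalues of K/n sum to one here, so `errors[d]` can be read as the fraction of feature-space variance left unexplained by d components.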

PDF PDF DOI [BibTex]

Network-based de-noising improves prediction from microarray data

Kato, T., Murata, Y., Miura, K., Asai, K., Horton, P., Tsuda, K., Fujibuchi, W.

BMC Bioinformatics, 7(Suppl. 1):S4-S4, March 2006 (article)

Abstract
Prediction of human cell response to anti-cancer drugs (compounds) from microarray data is a challenging problem, due to the noise properties of microarrays as well as the high variance of living cell responses to drugs. Hence there is a strong need for more practical and robust methods than standard methods for real-value prediction. We devised an extended version of the off-subspace noise-reduction (de-noising) method to incorporate heterogeneous network data such as sequence similarity or protein-protein interactions into a single framework. Using that method, we first de-noise the gene expression data for training and test data and also the drug-response data for training data. Then we predict the unknown responses of each drug from the de-noised input data. To ascertain whether de-noising improves prediction, we carry out 12-fold cross-validation to assess the prediction performance. We use Pearson's correlation coefficient between the true and predicted response values as the measure of prediction performance. De-noising improves the prediction performance for 65% of drugs. Furthermore, we found that this noise reduction method is robust and effective even when a large amount of artificial noise is added to the input data. We found that our extended off-subspace noise-reduction method combining heterogeneous biological data is successful and quite useful to improve prediction of human cell cancer drug responses from microarray data.
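
The evaluation protocol described above (12-fold cross-validation scored by Pearson's correlation between true and predicted responses) can be sketched as follows. This is only an illustration of the scoring scheme: the ridge regressor, the synthetic data, and the names `ridge_fit_predict` and `cv_pearson` are assumptions, and the de-noising step itself is not reproduced.

```python
import numpy as np

def ridge_fit_predict(X_tr, y_tr, X_te, lam=1.0):
    """Stand-in predictor (assumption): ridge regression in closed form."""
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    return X_te @ w

def cv_pearson(X, y, k=12, seed=0):
    """k-fold cross-validation; returns Pearson's r between the true
    responses and the out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False              # hold out the current fold
        pred[test_idx] = ridge_fit_predict(X[train], y[train], X[test_idx])
    return np.corrcoef(y, pred)[0, 1]

# Synthetic example data, purely for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=120)
r = cv_pearson(X, y, k=12)
```

Every sample is predicted exactly once by a model that never saw it, so `r` summarizes out-of-sample agreement in a single number per drug.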

PDF PDF DOI [BibTex]

Classification of Natural Scenes: Critical Features Revisited

Drewes, J., Wichmann, F., Gegenfurtner, K.

9, pages: 92, 9th Tübingen Perception Conference (TWK), March 2006 (poster)

Abstract
Human observers are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. Despite the seeming complexity of such decisions it has been hypothesized that a simple global image feature, the relative abundance of high spatial frequencies at certain orientations, could underlie such fast image classification [1]. We successfully used linear discriminant analysis to classify a set of 11,000 images into “animal” and “non-animal” images based on their individual amplitude spectra only [2]. We proceeded to sort the images based on the performance of our classifier, retaining only the best and worst classified 400 images ("best animals", "best distractors" and "worst animals", "worst distractors"). We used a Go/No-go paradigm to evaluate human performance on this subset of our images. Both reaction time and proportion of correctly classified images showed a significant effect of classification difficulty. Images more easily classified by our algorithm were also classified faster and better by humans, as predicted by the Torralba & Oliva hypothesis. We then equated the amplitude spectra of the 400 images, which, by design, reduced algorithmic performance to chance whereas human performance was only slightly reduced [3]. Most importantly, the same images as before were still classified better and faster, suggesting that even in the original condition features other than specifics of the amplitude spectrum made particular images easy to classify, clearly at odds with the Torralba & Oliva hypothesis.

Web [BibTex]

Data mining problems and solutions for response modeling in CRM

Cho, S., Shin, H., Yu, E., Ha, K., MacLachlan, D.

Entrue Journal of Information Technology, 5(1):55-64, March 2006 (article)

Abstract
We present three data mining problems that are often encountered in building a response model: robust modeling, variable selection and data selection. The respective algorithmic solutions are a bagging-based ensemble, a genetic-algorithm-based wrapper approach and nearest-neighbor-based data selection. A real-world data set from the Direct Marketing Educational Foundation, DMEF4, is used to show their effectiveness. The proposed methods were found to solve the problems in a practical way.

PDF [BibTex]

Factorial Coding of Natural Images: How Effective are Linear Models in Removing Higher-Order Dependencies?

Bethge, M.

9, pages: 90, 9th Tübingen Perception Conference (TWK), March 2006 (poster)

Abstract
The performance of unsupervised learning models for natural images is evaluated quantitatively by means of information theory. We estimate the gain in statistical independence (the multi-information reduction) achieved with independent component analysis (ICA), principal component analysis (PCA), zero-phase whitening, and predictive coding. Predictive coding is translated into the transform coding framework, where it can be characterized by the constraint of a triangular filter matrix. A randomly sampled whitening basis and the Haar wavelet are included in the comparison as well. The comparison of all these methods is carried out for different patch sizes, ranging from 2x2 to 16x16 pixels. In spite of large differences in the shape of the basis functions, we find only small differences in the multi-information between all decorrelation transforms (5% or less) for all patch sizes. Among the second-order methods, PCA is optimal for small patch sizes and predictive coding performs best for large patch sizes. The extra gain achieved with ICA is always less than 2%. In conclusion, the 'edge filters' found with ICA lead only to a surprisingly small improvement in terms of its actual objective.

Web [BibTex]


Model-based Design Analysis and Yield Optimization

Pfingsten, T., Herrmann, D., Rasmussen, C.

IEEE Transactions on Semiconductor Manufacturing, 19(4):475-486, February 2006 (article)

Abstract
Fluctuations are inherent to any fabrication process. Integrated circuits and micro-electro-mechanical systems are particularly affected by these variations, and due to high quality requirements the effect on the devices’ performance has to be understood quantitatively. In recent years it has become possible to model the performance of such complex systems on the basis of design specifications, and model-based Sensitivity Analysis has made its way into industrial engineering. We show how an efficient Bayesian approach, using a Gaussian process prior, can replace the commonly used brute-force Monte Carlo scheme, making it possible to apply the analysis to computationally costly models. We introduce a number of global, statistically justified sensitivity measures for design analysis and optimization. Two models of integrated systems serve us as case studies to introduce the analysis and to assess its convergence properties. We show that the Bayesian Monte Carlo scheme can save costly simulation runs and can ensure a reliable accuracy of the analysis.

PDF Web DOI [BibTex]

Prenatal development of ocular dominance and orientation maps in a self-organizing model of V1

Jegelka, S., Bednar, J., Miikkulainen, R.

Neurocomputing, 69(10-12):1291-1296, February 2006 (article)

Abstract
How orientation (OR) and ocular-dominance (OD) maps develop before visual experience begins is controversial. Possible influences include molecular signals and spontaneous activity, but their contributions remain unclear. This paper presents LISSOM simulations suggesting that previsual spontaneous activity alone is sufficient for realistic OR and OD maps to develop. Individual maps develop robustly with various previsual patterns, and are aided by background noise. However, joint OR/OD maps depend crucially on how correlated the patterns are between eyes, even over brief initial periods. Therefore, future biological experiments should account for multiple activity sources, and should measure map interactions rather than maps of single features.

PDF DOI [BibTex]

Weighting of experimental evidence in macromolecular structure determination

Habeck, M., Rieping, W., Nilges, M.

Proceedings of the National Academy of Sciences of the United States of America, 103(6):1756-1761, February 2006 (article)

Abstract
The determination of macromolecular structures requires weighting of experimental evidence relative to prior physical information. Although it can critically affect the quality of the calculated structures, experimental data are routinely weighted on an empirical basis. At present, cross-validation is the most rigorous method to determine the best weight. We describe a general method to adaptively weight experimental data in the course of structure calculation. It is further shown that the necessity to define weights for the data can be completely alleviated. We demonstrate the method on a structure calculation from NMR data and find that the resulting structures are optimal in terms of accuracy and structural quality. Our method is devoid of the bias imposed by an empirical choice of the weight and has some advantages over estimating the weight by cross-validation.

Web DOI [BibTex]

Subspace identification through blind source separation

Grosse-Wentrup, M., Buss, M.

IEEE Signal Processing Letters, 13(2):100-103, February 2006 (article)

Abstract
Given a linear and instantaneous mixture model, we prove that for blind source separation (BSS) algorithms based on mutual information, only sources with non-Gaussian distribution are consistently reconstructed independent of initial conditions. This allows the identification of non-Gaussian sources and consequently the identification of signal and noise subspaces through BSS. The results are illustrated with a simple example, and the implications for a variety of signal processing applications, such as denoising and model identification, are discussed.

PDF Web DOI [BibTex]

Classification of Faces in Man and Machine

Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B.

Neural Computation, 18(1):143-165, January 2006 (article)

PDF Web [BibTex]

Gaussian Processes for Machine Learning

Rasmussen, CE., Williams, CKI.

pages: 248, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, USA, January 2006 (book)

Abstract
Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.

Web [BibTex]

Dimension Reduction as a Deflation Method in ICA

Zhang, K., Chan, L.

IEEE Signal Processing Letters, 13(1):45-48, 2006 (article)

Web [BibTex]

Classification of natural scenes: critical features revisited

Drewes, J., Wichmann, F., Gegenfurtner, K.

Experimentelle Psychologie: Beiträge zur 48. Tagung experimentell arbeitender Psychologen, 48, pages: 251, 2006 (poster)

[BibTex]

Texture and haptic cues in slant discrimination: combination is sensitive to reliability but not statistically optimal

Rosas, P., Wagemans, J., Ernst, M., Wichmann, F.

Beiträge zur 48. Tagung experimentell arbeitender Psychologen (TeaP 2006), 48, pages: 80, 2006 (poster)

[BibTex]

Symbol Recognition with Kernel Density Matching

Zhang, W., Wenyin, L., Zhang, K.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):2020-2024, 2006 (article)

Abstract
We propose a novel approach to similarity assessment for graphic symbols. Symbols are represented as 2D kernel densities and their similarity is measured by the Kullback-Leibler divergence. Symbol orientation is found by gradient-based angle searching or independent component analysis. Experimental results show the outstanding performance of this approach in various situations.
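
The core idea of the abstract, comparing two symbols through the Kullback-Leibler divergence of their 2D kernel densities, can be sketched as below. This is a hedged illustration rather than the paper's algorithm: the Gaussian kernel, the bandwidth, the evaluation grid, the discretized divergence, and the function names are all assumptions, and the orientation search is omitted.

```python
import numpy as np

def kernel_density_on_grid(points, grid, bandwidth=0.1):
    """Gaussian kernel density of 2D points, evaluated on grid nodes and
    normalized to sum to one (a discretized density, by assumption)."""
    diff = grid[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    density = np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)
    return density / density.sum()

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL(p || q); eps guards against log(0) in empty cells."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Evaluation grid on the unit square.
axis = np.linspace(0.0, 1.0, 32)
grid = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)

# Two toy "symbols": a horizontal and a vertical stroke.
t = np.linspace(0.0, 1.0, 40)
horizontal = np.column_stack([t, np.full_like(t, 0.5)])
vertical = np.column_stack([np.full_like(t, 0.5), t])

p = kernel_density_on_grid(horizontal, grid)
q = kernel_density_on_grid(vertical, grid)
same = kl_divergence(p, p)        # identical symbols
different = kl_divergence(p, q)   # dissimilar symbols
```

A smaller divergence indicates more similar symbols. Note that the KL divergence is asymmetric, so a practical matcher would have to fix a direction or symmetrize it.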

Web [BibTex]

An adaptive method for subband decomposition ICA

Zhang, K., Chan, L.

Neural Computation, 18(1):191-223, 2006 (article)

Abstract
Subband decomposition ICA (SDICA), an extension of ICA, assumes that each source is represented as the sum of some independent subcomponents and dependent subcomponents, which have different frequency bands. In this article, we first investigate the feasibility of separating the SDICA mixture in an adaptive manner. Second, we develop an adaptive method for SDICA, namely band-selective ICA (BS-ICA), which finds the mixing matrix and the estimate of the source independent subcomponents. This method is based on the minimization of the mutual information between outputs. Some practical issues are discussed. For better applicability, a scheme to avoid the high-dimensional score function difference is given. Third, we investigate one form of the overcomplete ICA problems with sources having specific frequency characteristics, which BS-ICA can also be used to solve. Experimental results illustrate the success of the proposed method for solving both SDICA and the over-complete ICA problems.

Web DOI [BibTex]

Ähnlichkeitsmasse in Modellen zur Kategorienbildung

Jäkel, F., Wichmann, F.

Experimentelle Psychologie: Beiträge zur 48. Tagung experimentell arbeitender Psychologen, 48, pages: 223, 2006 (poster)

[BibTex]

The pedestal effect is caused by off-frequency looking, not nonlinear transduction or contrast gain-control

Wichmann, F., Henning, B.

Experimentelle Psychologie: Beiträge zur 48. Tagung experimentell arbeitender Psychologen, 48, pages: 205, 2006 (poster)

[BibTex]


2001


Perception of Planar Shapes in Depth

Wichmann, F., Willems, B., Rosas, P., Wagemans, J.

Journal of Vision, 1(3):176, First Annual Meeting of the Vision Sciences Society (VSS), December 2001 (poster)

Abstract
We investigated the influence of the perceived 3D-orientation of planar elliptical shapes on the perception of the shapes themselves. Ellipses were projected onto the surface of a sphere and subjects were asked to indicate if the projected shapes looked as if they were a circle on the surface of the sphere. The image of the sphere was obtained from a real, (near) perfect sphere using a highly accurate digital camera (real sphere diameter 40 cm; camera-to-sphere distance 320 cm; for details see Willems et al., Perception 29, S96, 2000; Photometrics SenSys 400 digital camera with Rodenstock lens, 12-bit linear luminance resolution). Stimuli were presented monocularly on a carefully linearized Sony GDM-F500 monitor keeping the scene geometry as in the real case (sphere diameter on screen 8.2 cm; viewing distance 66 cm). Experiments were run in a darkened room using a viewing tube to minimize, as far as possible, extraneous monocular cues to depth. Three different methods were used to obtain subjects' estimates of 3D-shape: the method of adjustment, temporal 2-alternative forced choice (2AFC) and yes/no. Several results are noteworthy. First, mismatch between perceived and objective slant tended to decrease with increasing objective slant. Second, the variability of the settings, too, decreased with increasing objective slant. Finally, we comment on the results obtained using different psychophysical methods and compare our results to those obtained using a real sphere and binocular vision (Willems et al.).

Web DOI [BibTex]

Anabolic and Catabolic Gene Expression Pattern Analysis in Normal Versus Osteoarthritic Cartilage Using Complementary DNA-Array Technology

Aigner, T., Zien, A., Gehrsitz, A., Gebhard, P., McKenna, L.

Arthritis and Rheumatism, 44(12):2777-2789, December 2001 (article)

Web [BibTex]

Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators

Williamson, R., Smola, A., Schölkopf, B.

IEEE Transactions on Information Theory, 47(6):2516-2532, September 2001 (article)

Abstract
We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.

DOI [BibTex]

Centralization: A new method for the normalization of gene expression data

Zien, A., Aigner, T., Zimmer, R., Lengauer, T.

Bioinformatics, 17, pages: S323-S331, June 2001, Mathematical supplement available at http://citeseer.ist.psu.edu/574280.html (article)

Abstract
Microarrays measure values that are approximately proportional to the numbers of copies of different mRNA molecules in samples. Due to technical difficulties, the constant of proportionality between the measured intensities and the numbers of mRNA copies per cell is unknown and may vary for different arrays. Usually, the data are normalized (i.e., array-wise multiplied by appropriate factors) in order to compensate for this effect and to enable informative comparisons between different experiments. Centralization is a new two-step method for the computation of such normalization factors that is both biologically better motivated and more robust than standard approaches. First, for each pair of arrays the quotient of the constants of proportionality is estimated. Second, from the resulting matrix of pairwise quotients an optimally consistent scaling of the samples is computed.
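
The two steps named in the abstract (pairwise quotient estimation, then a consistent scaling derived from the quotient matrix) can be sketched as follows. The concrete estimators are assumptions made for illustration, since the abstract does not specify them: the median of per-gene log-ratios for step one and a least-squares fit in log space for step two. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def pairwise_log_quotients(X):
    """Step 1: estimate log(c_i / c_j) for every pair of arrays.

    X has shape (n_arrays, n_genes) and holds positive intensities.
    The median of per-gene log-ratios is an assumed robust estimator.
    """
    logX = np.log(X)
    n = X.shape[0]
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            L[i, j] = np.median(logX[i] - logX[j])
    return L

def consistent_scaling(L):
    """Step 2: least-squares fit of L[i, j] ~ s_i - s_j with sum(s) = 0.

    For an antisymmetric L the minimizer is simply the row mean.
    """
    return np.exp(L.mean(axis=1))

# Synthetic arrays: shared abundances times an unknown per-array constant.
rng = np.random.default_rng(0)
abundance = rng.lognormal(mean=5.0, sigma=1.0, size=200)
true_factors = np.array([1.0, 2.0, 0.5])
X = true_factors[:, None] * abundance[None, :]

factors = consistent_scaling(pairwise_log_quotients(X))
normalized = X / factors[:, None]   # array-wise division by the factors
```

The scaling is only identifiable up to a common constant; fixing the log-factors to sum to zero, as done here, makes the geometric mean of the factors equal to one.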

PDF PostScript Web [BibTex]

Regularized principal manifolds

Smola, A., Mika, S., Schölkopf, B., Williamson, R.

Journal of Machine Learning Research, 1, pages: 179-209, June 2001 (article)

Abstract
Many settings of unsupervised learning can be viewed as quantization problems - the minimization of the expected quantization error subject to some restrictions. This allows the use of tools such as regularization from the theory of (supervised) risk minimization for unsupervised learning. This setting turns out to be closely related to principal curves, the generative topographic map, and robust coding. We explore this connection in two ways: (1) we propose an algorithm for finding principal manifolds that can be regularized in a variety of ways; and (2) we derive uniform convergence bounds and hence bounds on the learning rates of the algorithm. In particular, we give bounds on the covering numbers which allow us to obtain nearly optimal learning rates for certain types of regularization operators. Experimental results demonstrate the feasibility of the approach.

PDF [BibTex]

Failure Diagnosis of Discrete Event Systems

Son, HI., Kim, KW., Lee, S.

Journal of Control, Automation and Systems Engineering, 7(5):375-383, May 2001, In Korean (article)

[BibTex]

Plaid maskers revisited: asymmetric plaids

Wichmann, F.

pages: 57, 4. Tübinger Wahrnehmungskonferenz (TWK), March 2001 (poster)

Abstract
A large number of psychophysical and physiological experiments suggest that luminance patterns are independently analysed in channels responding to different bands of spatial frequency. There are, however, interactions among stimuli falling well outside the usual estimates of channels' bandwidths. Derrington & Henning (1989) first reported that, in 2-AFC sinusoidal-grating detection, plaid maskers, whose components are oriented symmetrically about the signal orientation, cause a substantially larger threshold elevation than would be predicted from their sinusoidal constituents alone. Wichmann & Tollin (1997a,b) and Wichmann & Henning (1998) confirmed and extended the original findings, measuring masking as a function of presentation time and plaid mask contrast. Here I investigate masking using plaid patterns whose components are asymmetrically positioned about the signal orientation. Standard temporal 2-AFC pattern discrimination experiments were conducted using plaid patterns and oblique sinusoidal gratings as maskers, and horizontally orientated sinusoidal gratings as signals. Signal and maskers were always interleaved on the display (refresh rate 152 Hz). As in the case of the symmetrical plaid maskers, substantial masking was observed for many of the asymmetrical plaids. Masking is neither a straightforward function of the plaid's constituent sinusoidal components nor of the periodicity of the luminance beats between components. These results cause problems for the notion that, even for simple stimuli, detection and discrimination are based on the outputs of channels tuned to limited ranges of spatial frequency and orientation, even if a limited set of nonlinear interactions between these channels is allowed.

Web [BibTex]

Pattern Selection Using the Bias and Variance of Ensemble

Shin, H., Cho, S.

Journal of the Korean Institute of Industrial Engineers, 28(1):112-127, March 2001 (article)

Abstract
A useful pattern is a pattern that contributes much to learning. For a classification problem those patterns near the class boundary surfaces carry more information to the classifier. For a regression problem the ones near the estimated surface carry more information. In both cases, the usefulness is defined only for those patterns either without error or with negligible error. Using only the useful patterns gives several benefits. First, computational complexity in memory and time for learning is decreased. Second, overfitting is avoided even when the learner is over-sized. Third, learning results in more stable learners. In this paper, we propose a pattern “utility index” that measures the utility of an individual pattern. The utility index is based on the bias and variance of a pattern trained by a network ensemble. In classification, the pattern with a low bias and a high variance gets a high score. In regression, on the other hand, the one with a low bias and a low variance gets a high score. Based on the distribution of the utility index, the original training set is divided into a high-score group and a low-score group. Only the high-score group is then used for training. The proposed method is tested on synthetic and real-world benchmark datasets. The proposed approach gives a better or at least similar performance.

[BibTex]

Structure and Functionality of a Designed p53 Dimer.

Davison, TS., Nie, X., Ma, W., Lin, Y., Kay, C., Benchimol, S., Arrowsmith, C.

Journal of Molecular Biology, 307(2):605-617, March 2001 (article)

Abstract
P53 is a homotetrameric tumor suppressor protein involved in transcriptional control of genes that regulate cell proliferation and death. In order to probe the role that oligomerization plays in this capacity, we have previously designed and characterized a series of p53 proteins with altered oligomeric states through hydrophilic substitution of residues Met340 or Leu344 in the normally tetrameric oligomerization domain. Although such mutations have little effect on the overall secondary structural content of the oligomerization domain, both solubility and the resistance to thermal denaturation are substantially reduced relative to those of the wild-type domain. Here, we report the design and characterization of a double-mutant p53 with alterations of residues at positions Met340 and Leu344. The double mutations Met340Glu/Leu344Lys and Met340Gln/Leu344Arg resulted in distinct dimeric forms of the protein. Furthermore, we have verified by NMR structure determination that the double mutant Met340Gln/Leu344Arg is essentially a "half-tetramer". Analysis of the in vivo activities of full-length p53 oligomeric mutants reveals that, while cell-cycle arrest requires tetrameric p53, monomers and dimers retain roughly background and half of the wild-type transcriptional transactivation activity, respectively.

Web [BibTex]

An Introduction to Kernel-Based Learning Algorithms

Müller, K., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.

IEEE Transactions on Neural Networks, 12(2):181-201, March 2001 (article)

Abstract
This paper provides an introduction to support vector machines, kernel Fisher discriminant analysis, and kernel principal component analysis, as examples of successful kernel-based learning methods. We first give a short background on Vapnik-Chervonenkis theory and kernel feature spaces and then proceed to kernel-based learning in supervised and unsupervised scenarios, including practical and algorithmic considerations. We illustrate the usefulness of kernel algorithms by discussing applications such as optical character recognition and DNA analysis.

DOI [BibTex]

Estimating the support of a high-dimensional distribution.

Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A., Williamson, R.

Neural Computation, 13(7):1443-1471, March 2001 (article)

Abstract
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
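
The functional form described in this abstract, a kernel expansion over a small subset of the training data that is positive on S and negative outside it, can be sketched as follows. The sketch only evaluates such a function; it does not solve the quadratic program that yields the coefficients. The Gaussian kernel, its width, the expansion points, the coefficients, and the offset are all assumed values chosen for illustration.

```python
import math


def gaussian_kernel(x, y, width):
    """k(x, y) = exp(-||x - y||^2 / width); the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


def decision_value(x, expansion_points, coefficients, offset, width):
    """f(x) = sum_i alpha_i * k(x_i, x) - rho.

    Positive values mean x is estimated to lie inside S, negative outside.
    """
    total = sum(
        alpha * gaussian_kernel(x_i, x, width)
        for alpha, x_i in zip(coefficients, expansion_points)
    )
    return total - offset


# Hypothetical expansion; in practice these values come from the optimizer.
expansion_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
coefficients = [0.4, 0.3, 0.3]
offset = 0.35
width = 1.0

inside = decision_value((0.3, 0.3), expansion_points, coefficients, offset, width)
outside = decision_value((4.0, 4.0), expansion_points, coefficients, offset, width)
print(inside > 0, outside < 0)
```

A test point close to the expansion points gets a positive value, while a distant point falls below the offset and is flagged as lying outside the estimated region.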

Web DOI [BibTex]

The psychometric function: II. Bootstrap-based confidence intervals and sampling

Wichmann, F., Hill, N.

Perception and Psychophysics, 63 (8), pages: 1314-1329, 2001 (article)

PDF [BibTex]

The psychometric function: I. Fitting, sampling and goodness-of-fit

Wichmann, F., Hill, N.

Perception and Psychophysics, 63 (8), pages: 1293-1313, 2001 (article)

Abstract
The psychometric function relates an observer's performance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. This paper, together with its companion paper (Wichmann & Hill, 2001), describes an integrated approach to (1) fitting psychometric functions, (2) assessing the goodness of fit, and (3) providing confidence intervals for the function's parameters and other estimates derived from them, for the purposes of hypothesis testing. The present paper deals with the first two topics, describing a constrained maximum-likelihood method of parameter estimation and developing several goodness-of-fit tests. Using Monte Carlo simulations, we deal with two specific difficulties that arise when fitting functions to psychophysical data. First, we note that human observers are prone to stimulus-independent errors (or lapses). We show that failure to account for this can lead to serious biases in estimates of the psychometric function's parameters and illustrate how the problem may be overcome. Second, we note that psychophysical data sets are usually rather small by the standards required by most of the commonly applied statistical tests. We demonstrate the potential errors of applying traditional χ² methods to psychophysical data and advocate use of Monte Carlo resampling techniques that do not rely on asymptotic theory. We have made available the software to implement our methods.
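
To make the point about lapses concrete, the sketch below uses the common parameterization ψ(x) = γ + (1 − γ − λ) F(x; α, β), where γ is the guess rate and λ the lapse rate, and compares the binomial log-likelihood of a small data set with and without a lapse term. The logistic form of F, the parameter values, and the data are assumptions for illustration and are not taken from the paper; the authors' own software is not reproduced here.

```python
import math


def logistic(x, alpha, beta):
    """Sigmoid F(x; alpha, beta); the logistic shape is an assumption."""
    return 1.0 / (1.0 + math.exp(-(x - alpha) / beta))


def psychometric(x, alpha, beta, gamma, lam):
    """psi(x) = gamma + (1 - gamma - lam) * F(x; alpha, beta)."""
    return gamma + (1.0 - gamma - lam) * logistic(x, alpha, beta)


def log_likelihood(data, alpha, beta, gamma, lam):
    """Binomial log-likelihood, omitting the constant binomial coefficients."""
    total = 0.0
    for x, n_trials, n_correct in data:
        p = psychometric(x, alpha, beta, gamma, lam)
        total += n_correct * math.log(p)
        total += (n_trials - n_correct) * math.log(1.0 - p)
    return total


# Made-up 2AFC data: (stimulus level, trials, correct responses).
# The single error at the highest level stands in for a lapse.
data = [(1.0, 50, 27), (2.0, 50, 38), (3.0, 50, 47), (6.0, 50, 49)]

ll_without_lapse = log_likelihood(data, alpha=2.0, beta=0.5, gamma=0.5, lam=0.0)
ll_with_lapse = log_likelihood(data, alpha=2.0, beta=0.5, gamma=0.5, lam=0.02)
print(ll_with_lapse > ll_without_lapse)
```

With λ fixed at zero, the one error at the highest stimulus level is treated as an extremely unlikely event and dominates the likelihood. A fit would then distort the threshold and slope to accommodate it, which is the kind of bias the abstract warns about.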

PDF [BibTex]

The pedestal effect with a pulse train and its constituent sinusoids

Henning, G., Wichmann, F., Bird, C.

Twenty-Sixth Annual Interdisciplinary Conference, 2001 (poster)

Abstract
Curves showing "threshold" contrast for detecting a signal grating as a function of the contrast of a masking grating of the same orientation, spatial frequency, and phase show a characteristic improvement in performance at masker contrasts near the contrast threshold of the unmasked signal. Depending on the percentage of correct responses used to define the threshold, the best performance can be as much as a factor of three better than the unmasked threshold obtained in the absence of any masking grating. The result is called the pedestal effect (sometimes, the dipper function). We used a 2AFC procedure to measure the effect with harmonically related sinusoids ranging from 2 to 16 c/deg - all with maskers of the same orientation, spatial frequency and phase - and with masker contrasts ranging from 0 to 50%. The curves for different spatial frequencies are identical if both the vertical axis (showing the threshold signal contrast) and the horizontal axis (showing the masker contrast) are scaled by the threshold contrast of the signal obtained with no masker. Further, a pulse train with a fundamental frequency of 2 c/deg produces a curve that is indistinguishable from that of a 2-c/deg sinusoid despite the fact that at higher masker contrasts, the pulse train contains at least 8 components all of them equally detectable. The effect of adding 1-D spatial noise is also discussed.

[BibTex]

The control structure of artificial creatures

Zhou, D., Dai, R.

Artificial Life and Robotics, 5(3), 2001, invited article (article)

Web [BibTex]

Markovian domain fingerprinting: statistical segmentation of protein sequences

Bejerano, G., Seldin, Y., Margalit, H., Tishby, N.

Bioinformatics, 17(10):927-934, 2001 (article)

PDF Web [BibTex]

Modeling the Dynamics of Individual Neurons of the Stomatogastric Networks with Support Vector Machines

Frontzek, T., Gutzen, C., Lal, TN., Heinzel, H-G., Eckmiller, R., Böhm, H.

Abstract Proceedings of the 6th International Congress of Neuroethology (ICN'2001) Bonn, abstract 404, 2001 (poster)

Abstract
In small rhythmically active networks, the timing of individual neurons is crucial for generating different spatio-temporal motor patterns. Switching of one neuron between different rhythms can cause transitions between behavioral modes. In order to understand the dynamics of rhythmically active neurons, we analyzed the oscillatory membrane potential of a pacemaker neuron and used different neural network models to predict the dynamics of its time series. In a first step, we trained conventional RBF networks and Support Vector Machines (SVMs) using Gaussian kernels with intracellular recordings of the pyloric dilator neuron in the Australian crayfish, Cherax destructor albidus. As a rule, SVMs were able to learn the nonlinear dynamics of pyloric neurons faster (e.g. 15 s) than RBF networks (e.g. 309 s) under the same hardware conditions. After training, SVMs performed a better iterated one-step-ahead prediction of time series in the pyloric dilator neuron with regard to test error and error sum. The test error decreased with increasing number of support vectors. The best SVM used 196 support vectors and produced a test error of 0.04622, as opposed to the best RBF network with 0.07295 using 26 RBF neurons. In the pacemaker neuron PD, the time point at which the membrane potential crosses the threshold for generation of its oscillatory peak is most important for determination of the test error. Interestingly, SVMs are especially good at predicting this important part of the membrane potential, which is superimposed by various synaptic inputs that drive the membrane potential to its threshold.

[BibTex]

2000


Knowledge Discovery in Databases: An Information Retrieval Perspective

Ong, CS.

Malaysian Journal of Computer Science, 13(2):54-63, December 2000 (article)

Abstract
The current trend of increasing capabilities in data generation and collection has resulted in an urgent need for data mining applications, also called knowledge discovery in databases. This paper identifies and examines the issues involved in extracting useful grains of knowledge from large amounts of data. It describes a framework to categorise data mining systems. The author also gives an overview of the issues pertaining to data preprocessing, as well as various information gathering methodologies and techniques. The paper covers some popular tools such as classification, clustering, and generalisation. A summary of statistical and machine learning techniques currently in use is also provided.

PDF [BibTex]

A Simple Iterative Approach to Parameter Optimization

Zien, A., Zimmer, R., Lengauer, T.

Journal of Computational Biology, 7(3,4):483-501, November 2000 (article)

Abstract
Various bioinformatics problems require optimizing several different properties simultaneously. For example, in the protein threading problem, a scoring function combines the values for different parameters of possible sequence-to-structure alignments into a single score to allow for unambiguous optimization. In this context, an essential question is how each property should be weighted. As the native structures are known for some sequences, a partial ordering on optimal alignments to other structures, e.g., derived from structural comparisons, may be used to adjust the weights. To resolve the arising interdependence of weights and computed solutions, we propose a heuristic approach: iterating the computation of solutions (here, threading alignments) given the weights and the estimation of optimal weights of the scoring function given these solutions via systematic calibration methods. For our application (i.e., threading), this iterative approach results in structurally meaningful weights that significantly improve performance on both the training and the test data sets. In addition, the optimized parameters show significant improvements on the recognition rate for a grossly enlarged comprehensive benchmark, a modified recognition protocol as well as modified alignment types (local instead of global and profiles instead of single sequences). These results show the general validity of the optimized weights for the given threading program and the associated scoring contributions.

Web [BibTex]

Identification of Drug Target Proteins

Zien, A., Küffner, R., Mevissen, T., Zimmer, R., Lengauer, T.

ERCIM News, 43, pages: 16-17, October 2000 (article)

Web [BibTex]

Advances in Large Margin Classifiers

Smola, A., Bartlett, P., Schölkopf, B., Schuurmans, D.

pages: 422, Neural Information Processing, MIT Press, Cambridge, MA, USA, October 2000 (book)

Abstract
The concept of large margins is a unifying principle for the analysis of many different approaches to the classification of data from examples, including boosting, mathematical programming, neural networks, and support vector machines. The fact that it is the margin, or confidence level, of a classification--that is, a scale parameter--rather than a raw training error that matters has become a key tool for dealing with classifiers. This book shows how this idea applies to both the theoretical analysis and the design of algorithms. The book provides an overview of recent developments in large margin classifiers, examines connections with other methods (e.g., Bayesian inference), and identifies strengths and weaknesses of the method, as well as directions for future research. Among the contributors are Manfred Opper, Vladimir Vapnik, and Grace Wahba.

Web [BibTex]

Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites

Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lengauer, T., Müller, K.

Bioinformatics, 16(9):799-807, September 2000 (article)

Abstract
Motivation: In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points at which regions start that code for proteins. These points are called translation initiation sites (TIS). Results: The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines for this task, and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques the recognition performance can be improved by 26% over leading existing approaches. We provide evidence that existing related methods (e.g. ESTScan) could profit from advanced TIS recognition.

Web DOI [BibTex]

A Meanfield Approach to the Thermodynamics of a Protein-Solvent System with Application to the Oligomerization of the Tumour Suppressor p53.

Noolandi, J., Davison, TS., Vokel, A., Nie, F., Kay, C., Arrowsmith, C.

Proceedings of the National Academy of Sciences of the United States of America, 97(18):9955-9960, August 2000 (article)

Web [BibTex]

New Support Vector Algorithms

Schölkopf, B., Smola, A., Williamson, R., Bartlett, P.

Neural Computation, 12(5):1207-1245, May 2000 (article)

Abstract
We propose a new class of support vector algorithms for regression and classification. In these algorithms, a parameter ν lets one effectively control the number of support vectors. While this can be useful in its own right, the parameterization has the additional benefit of enabling us to eliminate one of the other free parameters of the algorithm: the accuracy parameter ε in the regression case, and the regularization constant C in the classification case. We describe the algorithms, give some theoretical results concerning the meaning and the choice of ν, and report experimental results.
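
For orientation, the two optimization problems are usually stated as below. This is recalled from the standard formulation of ν-SV classification and ν-SV regression rather than quoted from the abstract, so the notation is an assumption: ℓ training examples, slack variables ξ, margin parameter ρ.

```latex
% nu-SV classification: nu replaces the regularization constant C
\min_{w,\,b,\,\xi,\,\rho} \quad
  \tfrac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{\ell}\sum_{i=1}^{\ell}\xi_i
\qquad \text{s.t.} \quad
  y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge \rho - \xi_i, \quad
  \xi_i \ge 0, \quad \rho \ge 0

% nu-SV regression: epsilon becomes a variable, traded off through nu
\min_{w,\,b,\,\xi,\,\xi^*,\,\varepsilon} \quad
  \tfrac{1}{2}\|w\|^2
  + C\Bigl(\nu\varepsilon + \frac{1}{\ell}\sum_{i=1}^{\ell}(\xi_i + \xi_i^*)\Bigr)
\qquad \text{s.t.} \quad
  \bigl(\langle w, x_i\rangle + b\bigr) - y_i \le \varepsilon + \xi_i, \quad
  y_i - \bigl(\langle w, x_i\rangle + b\bigr) \le \varepsilon + \xi_i^*, \quad
  \xi_i,\ \xi_i^* \ge 0, \quad \varepsilon \ge 0
```

In both cases ν is an upper bound on the fraction of margin errors (or, in regression, of points outside the ε-tube) and a lower bound on the fraction of support vectors. This is the sense in which it controls the number of support vectors.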

Web DOI [BibTex]
