OPERA-lib (under development)

OPERA-lib: Operator-valued kernel methods library for Python

Kernel methods are well-established machine learning approaches that learn complex relationships in data by working in high-dimensional Hilbert spaces. Operator-valued kernel (OVK) methods extend these approaches to multi-dimensional output domains, where several target functions are predicted jointly.

OPERA-lib will consist of various structured-output machine learning methods utilising OVKs. The library is a Python module built on the standard open-source libraries NumPy, SciPy and Matplotlib, and is designed for compatibility with the scikit-learn machine learning library.



Operator-valued kernels

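Let $\{(x_i, y_i)\}_{i=1}^N$ be a dataset of N samples with outputs $y_i \in \mathbb{R}^p$. We wish to estimate a p-dimensional target function

$$f : \mathcal{X} \to \mathbb{R}^p$$

which is in vector notation

$$f(x) = \big(f_1(x), \ldots, f_p(x)\big)^T$$

The full operator-valued kernel over the dataset is

$$\mathbf{K} = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & \ddots & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{pmatrix} \in \mathbb{R}^{Np \times Np}$$

where each block consists of p times p kernel values, for instance over the "components" i,j

$$K(x_i, x_j) = \begin{pmatrix} K(x_i, x_j)_{11} & \cdots & K(x_i, x_j)_{1p} \\ \vdots & \ddots & \vdots \\ K(x_i, x_j)_{p1} & \cdots & K(x_i, x_j)_{pp} \end{pmatrix} \in \mathbb{R}^{p \times p}$$

where p is the number of targets. A common scalar kernel is the Gaussian kernel

$$k(x, x') = \exp\big(-\gamma \, \|x - x'\|^2\big)$$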
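The block structure described above can be made concrete with a small NumPy sketch. It assumes a separable (decomposable) operator-valued kernel, K(x, x') = k(x, x') B, where k is a scalar Gaussian kernel with width parameter gamma and B is a p times p positive semi-definite matrix coupling the outputs. The function names and the matrix B are hypothetical illustrations, not part of the OPERA-lib API.

```python
import numpy as np

def gaussian_gram(X1, X2, gamma=1.0):
    """Scalar Gaussian Gram matrix with entries exp(-gamma * ||x - x'||^2)."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

def decomposable_ovk_gram(X, B, gamma=1.0):
    """Full (N*p, N*p) Gram matrix of the separable kernel K(x, x') = k(x, x') * B.

    Block (i, j) of the result is the p x p matrix k(x_i, x_j) * B.
    """
    return np.kron(gaussian_gram(X, X, gamma), B)

X = np.array([[0.0], [1.0], [2.0], [3.0]])   # N = 4 inputs
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])                   # p = 2 outputs, assumed coupling
K = decomposable_ovk_gram(X, B, gamma=1.3)

p = B.shape[0]
block_01 = K[0:p, p:2 * p]                   # block for the data points x_0 and x_1
print(K.shape)                               # (8, 8)
```

The separable form is only one family of OVKs; it is used here because the Kronecker product makes the N times N grid of p times p blocks explicit.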

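OVK methods extend kernel ridge regression and classification in a straightforward way: the scalar kernel matrix is replaced by the full block kernel matrix, and the outputs are stacked into a single vector.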
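As a rough illustration of how ridge regression carries over to vector-valued outputs, the sketch below solves the regularised linear system (K + lam I) c = vec(Y) for a separable kernel K(x, x') = k(x, x') B. The helper names, the coupling matrix B and the regularisation weight lam are assumptions made for this example; this is not the OVKR estimator of OPERA-lib.

```python
import numpy as np

def gaussian_gram(X1, X2, gamma=1.0):
    """Scalar Gaussian Gram matrix with entries exp(-gamma * ||x - x'||^2)."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

def fit_ovk_ridge(X, Y, B, gamma=1.0, lam=0.1):
    """Solve (K + lam * I) c = vec(Y) and return the coefficients as an (N, p) array."""
    n, p = Y.shape
    K = np.kron(gaussian_gram(X, X, gamma), B)        # (N*p, N*p) block kernel matrix
    c = np.linalg.solve(K + lam * np.eye(n * p), Y.reshape(-1))
    return c.reshape(n, p)

def predict_ovk_ridge(X_train, C, B, X_new, gamma=1.0):
    """Evaluate f(x) = sum_i k(x, x_i) * B @ c_i for every row of X_new."""
    return gaussian_gram(X_new, X_train, gamma) @ C @ B.T

X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.column_stack([X[:, 0], 2.0 * X[:, 0]])         # two correlated targets
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])                            # assumed output coupling

C = fit_ovk_ridge(X, Y, B, gamma=1.3, lam=0.1)
Y_hat = predict_ovk_ridge(X, C, B, np.array([[0.5]]), gamma=1.3)
print(Y_hat.shape)                                    # (1, 2)
```

With B set to the identity matrix the system decouples into p independent scalar kernel ridge regressions, which is a convenient sanity check.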

OPERA-lib example

    >>> from operalib import ovk
    >>> X = [[0], [1], [2], [3]]
    >>> Y = [0, 1, 2, 3]
    >>> Xp = [0.5]
    >>> clf = ovk.OVKR(gamma=1.3, kernel='rbf', ovkernel='cov', C=5, penalty='l1')
    >>> clf.fit(X, Y)
    OVKR(C=5.0, cache_size=200, class_weight=None, penalty='l1', gamma=1.3,
         kernel='rbf', ovkernel='cov', max_iter=-1, tol=0.001, verbose=False)
    >>> clf.predict(Xp)
    array([ 0.66939972])

TODO take syntax highlighting from scikit-learn webpage

Installation

To install OPERA-lib, use the Python package manager pip:

    pip install operalib

To update OPERA-lib to the most recent version, use:

    pip install -U operalib

To install pip, see https://pip.pypa.io/en/latest/installing.html

Input-Output kernel Regression (IOKR)

Link prediction is addressed as an output kernel learning task through semi-supervised Output Kernel Regression. Working in the framework of RKHS theory with vector-valued functions, we establish a new representer theorem devoted to semi-supervised least-squares regression. We then apply it to obtain a new model (POKR: Penalized Output Kernel Regression) and show its relevance through numerical experiments on artificial networks and on two real applications, using a very low percentage of labeled data in a transductive setting.

OKVAR

Reverse engineering of gene regulatory networks remains a central challenge in computational systems biology, despite recent advances facilitated by benchmark in silico challenges that have helped calibrate the performance of inference methods. Nonlinear dynamical models are particularly appropriate for this inference task, given the mechanism that generates the time-series data. We have introduced a novel nonlinear autoregressive model based on operator-valued kernels. A flexible boosting algorithm (OKVAR-Boost), which shares features of L2-boosting and randomization-based algorithms, is developed to perform parameter learning and network inference for the proposed model.
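To give a feel for a nonlinear autoregressive model built on an operator-valued kernel, the sketch below fits a one-step-ahead predictor x_{t+1} = h(x_t) on a simulated time series. It deliberately swaps the boosting procedure for a single ridge solve, so it is not OKVAR-Boost and performs no network inference. The simulated dynamics, the identity output matrix B, and the values of gamma and lam are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_gram(X1, X2, gamma=1.0):
    """Scalar Gaussian Gram matrix with entries exp(-gamma * ||x - x'||^2)."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

# Simulate a short nonlinear time series with p = 3 state variables.
rng = np.random.default_rng(0)
T, p = 60, 3
A_true = np.array([[0.8, -0.3, 0.0],
                   [0.3,  0.8, 0.0],
                   [0.2,  0.0, 0.5]])
series = np.empty((T, p))
series[0] = [1.0, 0.0, 0.5]
for t in range(T - 1):
    series[t + 1] = np.tanh(A_true @ series[t]) + 0.01 * rng.standard_normal(p)

# Autoregressive training pairs: the state at time t predicts the state at t + 1.
X_past, X_next = series[:-1], series[1:]

B = np.eye(p)            # assumed output coupling; a learned B would tie variables together
gamma, lam = 0.5, 1e-2   # assumed kernel width and ridge penalty

G = gaussian_gram(X_past, X_past, gamma)
K = np.kron(G, B)        # block kernel matrix of the separable OVK k(x, x') * B
C = np.linalg.solve(K + lam * np.eye(K.shape[0]), X_next.reshape(-1)).reshape(-1, p)

def step(x):
    """One-step-ahead prediction h(x) = sum_t k(x, x_t) * B @ c_t."""
    return (gaussian_gram(x[None, :], X_past, gamma) @ C @ B.T)[0]

x_pred = step(series[-1])
print(x_pred.shape)      # (3,)
```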

Large-scale OVK learning

Randomized OVK

OVK ODE models


Literature

Vazquez and Walter (2003): Multi-output support vector regression. In IFAC Symposium on System Identification
Early paper connecting Gaussian processes, kriging and kernel methods over multi-dimensional outputs.
Caponnetto, Micchelli, Pontil and Ying (2008): Universal multi-task kernels. JMLR 9:1615-1646
Analysis and examples of classes of positive semi-definite and universal operator-valued kernels.
Micchelli and Pontil (2005): On learning vector-valued functions. Neural Computation 17
Theory and practical considerations for OVKs.
Kadri, Ghavamzadeh and Preux (2013): A generalized kernel approach to structured output learning. ICML
Structured output learning with empirical covariance OVKs with various image recognition applications and experiments.
Lim, Senbabaoglu, Michailidis and d'Alché-Buc (2013): OKVAR-Boost: a novel boosting algorithm to infer nonlinear dynamics and interactions in gene regulatory networks. Bioinformatics 29(11):1416-1423
Auto-regressive time-series models using operator-valued kernels and boosting.
Brouard, d'Alché-Buc and Szafranski (2011): Semi-supervised Penalized Output Kernel Regression for Link Prediction. In ICML 2011
Operator-valued output kernels, with semi-supervised extensions and a network inference application.