Support Vector Method for Function 
Approximation, Regression Estimation, 
and Signal Processing' 
Vladimir Vapnik 
AT&T Research 
101 Crawfords Corner 
Holmdel, NJ 07733 
vlad@research.att.com 
Steven E. Golowich 
Bell Laboratories 
700 Mountain Ave. 
Murray Hill, NJ 07974 
golowich@bell-labs.com 
Alex Smola* 
GMD First 
Rudower Shausee 5 
12489 Berlin 
asm@big.att.com 
Abstract 
The Support Vector (SV) method was recently proposed for es- 
timating regressions, constructing multidimensional splines, and 
solving linear operator equations [Vapnik, 1995]. In this presenta- 
tion we report results of applying the SV method to these problems. 
I Introduction 
The Support Vector method is a universal tool for solving multidimensional function 
estimation problems. Initially it was designed to solve pattern recognition problems, 
where in order to find a decision rule with good generalization ability one selects 
some (small) subset of the training data, called the Support Vectors (SVs). Optimal 
separation of the SVs is equivalent to optimal separation the entire data. 
This led to a new method of representing decision functions where the decision 
functions are a linear expansion on a basis whose elements are nonlinear functions 
parameterized by the SVs (we need one SV for each element of the basis). This 
type of function representation is especially useful for high dimensional input space: 
the number of free parameters in this representation is equal to the number of SVs 
but does not depend on the dimensionality of the space. 
Later the SV method was extended to real-valued functions. This allows us to 
expand high-dimensional functions using a small basis constructed from SVs. This 
*smola@prosun.first.gmd.de 
282 V. Vapnik, S. E. Golowich and A. Smola 
novel type of function representation opens new opportunities for solving various 
problems of function approximation and estimation. 
In this paper we demonstrate that using the $V technique one can solve problems 
that in classical techniques would require estimating a large number of free param- 
eters. In particular we construct one and two dimensional splines with an arbitrary 
number of grid points. Using linear splines we approximate non-linear functions. 
We show that by reducing requirements on the accuracy of approximation, one de- 
creases the number of SVs which leads to data compression. We also show that the 
SV technique is a useful tool for regression estimation. Lastly we demonstrate that 
using the SV function representation for solving inverse ill-posed problems provides 
an additional opportunity for regularization. 
2 SV method for estimation of real functions 
Let x 6 R n and y 6 R . Consider the following set of real functions: a vector x is 
mapped into some a priori chosen Hilbert space, where we define functions that are 
linear in their parameters 
y - f(x, w) = E wiqJi(x), w = (w, ..., WN, ...) e f (1) 
i--1 
In [Vapnik, 1995] the following method for estimating functions in the set (1) based 
on training data (Xl, y), ..., (xt, Yt) was suggested: find the function that minimizes 
the following functional: 
where 
R(w) =   lY - f(zci, w)l, + 7(w, w), (2) 
i--1 
ly- f(x w)l, = { 0 if lY- f(x, w)l < , 
' ly - f(x, w)l-  otherwise, 
(w, w) is the inner product of two vectors, and 7 is some constant. 
that the function minimizing this functional has a form: 
where ai,ai >_ 
elements of Hilbert space. 
(3) 
It was shown 
f(x,a,a*) = -,(a? - ai)((I)(xi), (I)(x)) + b (4) 
i=1 
0 with aci - 0 and ((I)(xi),(I)(x)) is the inner product of two 
- = o, o < 
i'-1 
a 7_<C, i=1,...,. (6) 
W(a*, a) = -e E(a; +ai)+ E y(a;--ai)- E (a;-ai)(a]-aj)(q)(xi), (I)(xj)), 
i--1 i----1 i,j--1 
(5) 
subject to constraints 
To find the coefficients a? and ai one has to solve the following quadratic optimiza- 
tion problem' maximize the functional 
SV Method for Function Approximation and Regression Estimation 283 
The important feature of the solution (4) of this optimization problem is that only 
some of the coefficients ((' - (i) differ from zero. The corresponding vectors xi are 
called Support Vectors (SVs). Therefore (4) describes an expansion on SVs. 
It was shown in [Vapnik, 1995] that to evaluate the inner products ((xi), (x)) 
both in expansion (4) and in the objective function (5) one can use the general 
form of the inner product in Hilbert space. According to Hilbert space theory, to 
guarantee that a symmetric function K(u, v) has an expansion 
K(u, v) = y akqJk(u)qJk(v) 
k=l 
with positive coefficients a > 0, i.e. to guarantee that K(u, v) is an inner product 
in some feature space , it is necessary and sufficient that the conditions 
K(u,v)g(u)g(v)dudv > 0 (7) 
be valid for any non-zero function g on the Hilbert space (Mercer's theorem). 
Therefore, in the SV method, one can replace (4) with 
= - + (8) 
i=1 
where the inner product ((xi), (x)) is defined through a kernel K(xi, x). To find 
coefficients a and ai one has to maximize the function 
1 
W(a*,a) = -e (a+ai)+y(a-ai)-  (a:-ai)(aj-aj)K(xi,xj) (9) 
i=1 i=1 i,j=l 
subject to constraints (6). 
3 Constructing kernels for inner products 
To define a set of approximating functions one has to define a kernel K(xi, x) that 
generates the inner product in some feature space and solve the corresponding 
quadratic optimization problem. 
3.1 Kernels generating splines 
We start with the spline functions. According to their definition, splines are piece- 
wise polynomial functions, which we will consider on the set [0, 1]. Splines of order 
n have the following representation 
n N 
fn(x): Y arx r + y, wj(x -- ts) (10) 
r--O s--1 
where (x-t)+ = max{(x - t), 0}, tl,...,t N  [0,1] are the nodes, and ar,wj are 
real values. One can consider the spline function (10) as a linear function in the 
n + N + 1 dimensional feature space spanned by 
1,X, ...,Xn, (X -- tl)., ..., (X-- tN).. 
284 V. Vapnik, $. E. Golowich and A. Smola 
Therefore the inner product that generates splines of order n in one dimension is 
n N 
K(xi, xj)= y,x:x q- y,(xi-t).(xj-t).. (11) 
r----O 3=1 
Two dimensional splines are linear functions in the (N + n + 1) 2 dimensional space 
1, x,...,x,y,...,y,...,(x-t).(y-tl)+,...,(x-tv).(y-tv)+. (12) 
Let us denote by ui: (xi, Yi), uj = (xi, yj) two two-dimensional vectors. Then the 
generating kernel for two dimensional spline functions of order n is 
K(ui,uj) = K(xi,xj)K(yi,yj) 
It is easy to check that the generating kernel for the m-dimensional splines is the 
product of m one-dimensional generating kernels. 
In applications of the SV method the number of nodes does not play an impor- 
tant role. Therefore, we introduce splines of order d with an infinite number of 
nodes S? ). To do this in the R 1 case, we map any real value xi to the element 
1, xi x." (xi- t). of the Hilbert space. The inner product becomes 
,''', $ , 
t /0 
K(xi,xj)= x:x + (xi-t).(xj-t).dt (13) 
rmO 
For linear splines S ) we therefore have the following generating kernel: 
K(xi,xj) = 1 + xixj + xixj min(xi,xj) (xi + xj) (min(xi,xj))  + (min(xi,xj))  
2 3 
(14) 
In many applications expansions in B-splines tUnset & Aldroubi, 1992] are used, 
where 
 x+ r . 
B(x) = r 2 
r=0 + 
One mw use B-splines to perform a construction similar to the above, yielding 
the kernel 
K(xi,xi) = B(xi-t)B,(xj -t)dt = B2,+l(Xi- xj). 
3.2 Kernels generating Fourier expansions 
Lastly, Fourier expansion can be considered as a hyperplane in following 2N + 1 
dimensional feature space 
1 
, cos x, sinx, ..., cos Nx, sin Nx. 
The inner product in this space is defined by the Dirichlet formula: 
l+(cosrxicosrxj +sinrxisinrxj) = sin(N + 1/2)(xi- xj) (15) 
K(xi,xj) =  sin ,_y 
r=l 2 
SV Method for Function Approximation and Regression Estimation 285 
4 Function estimation and data compression 
In this section we approximate functions on the basis of observations at  points 
(xl, Yl), ..., (xt, yt). (16) 
We demonstrate that to construct an approximation within an accuracy of q-e at 
the data points, one can use only the subsequence of the data containing the SVs. 
We consider approximating the one and two dimensional functions 
sinlxl 
f(x) -- sinclrl- 
(17) 
on the basis of a sequence of measurements (without noise) on the uniform lattice 
(100 for the one dimensional case and 2,500 for the two-dimensional case). 
For different e we approximate this function by linear splines from 
/\ 
Figure 1: Approximations with different levels of accuracy require different numbers 
of SV: 31 SV for e = 0.02 (left) and 9 SV for e = 0.1. Large dots indicate SVs. 
05 
0 
-05 
Figure 
 . 01234 
2: Approximation of fix, y) 
'... '.: .. y'. ,',,-... 
..*  . -_ '.. %'... 
sincv/x 2 + y2 by two dimensional linear 
splines with accuracy e - 0.01 (left) required 157 $V (right) 
Figure 3: sincx function corrupted by different levels of noise (a = 0.2 left, 
right) and its regression. Black dots indicate SV, circles non-SV data. 
0.5 
286 V. Vapnik, S. E. Golowich and A. Smola 
5 Solution of the linear operator equations 
In this section we consider the problem of solving linear equations in the set of 
functions defined by SVs. Consider the problem of solving a linear operator equation 
Af(t) = F(x), fit) 6 E, F(x) 6 I,, (18) 
where we are given measurements of the right hand side 
(Xl,F1),...,(xt, Ft). (19) 
Consider the set of functions fit, w)  E linear in some feature space {(I)(t) = 
(0(t), ..., N(t), ...)}' 
f(t, w) =  wr(t) = (w, (t)). (20) 
r--0 
The operator A maps this set of functions into 
y(x, w) - Ai(t, w) -  wAO(t) -  w(x) - (W, ()) (2) 
r=O r=O 
where pr(x) = Ac)(t), (x) = (p(x),..., PN(X),...). Let us define the generating 
kernel in image space 
:(, ) =  ()() = ((x), ()) (22) 
r--O 
and the corresponding cross-kernel function 
(,,t) = y] (,)0r(t) = ((,), (t)). (23) 
r--0 
The problem of solving (18) in the set of functions f(t, w) E  (finding the vector 
W) is equivalent to the problem of regression estimation (21) using data (19). 
To estimate the regression on the basis of the kernel K(xi,xj) one can use the 
methods described in Section 1. The obtained parameters (( -(i, i = 1,...) 
define the approximation to the solution of equation (18) based on data (19): 
f(t,) = (? - )(x, t). 
i----1 
We have applied this method to solution of the Radon equation 
(m) f(m cos p 4- u sin p, rn sin p - u cos p)du - p(m, 
(-) 
-l _< m _< 1, O < p < r, a(m) = x/1- rn  (24) 
using noisy observations (ml,//1,Pl), ..., (mr, pt,pt), where Pi '- p(rni,pi) 4- i and 
{i} are independent with Ei = O, E  
SV Method for Function Approximation and Regression Estimation 287 
For two-dimensional linear splines S? ) we obtained analytical expressions for the 
'kernel (22) and cross-kernel (23). We have used these kernels for solving the corre- 
sponding regression problem and reconstructing images based on data that is similar 
to what one might get from a Positron Emission Tomography scan [Shepp, Vardi 
& Kaufman, 1985]. 
A remarkable feature of this solution is that it avoids a pixel representation of the 
function which would require the estimation of 10,000 to 60,000 parameters. The 
spline approximation shown here required only 172 SVs. 
Figure 4: Original image (dashed line) and its reconstruction (solid line) from 2,048 
observations (left). 172 SVs (support lines) were used in the reconstruction (right). 
6 Conclusion 
In this article we present a new method of function estimation that is especially 
useful for solving multi-dimensional problems. The complexity of the solution of 
the function estimation problem using the SV ;representation depends on the com- 
plexity of the desired solution (i.e. on the required number of SVs for a reasonable 
approximation of the desired function) rather than on the dimensionality of the 
space. Using the SV method one can solve various problems of function estimation 
both in statistics and in applied mathematics. 
Acknowledgments 
We would like to thank Chris Burges (Lucent Technologies) and Bernhard Sch51kopf 
(MPIK Tiibingen) for help with the code and useful discussions. 
This work was supported in part by NSF grant PHY 95-12729 (Steven Golowich) 
and by ARPA grant N00014-94-C-0186 and the German National Scholarship Foun- 
dation (Alex Smola). 
References 
1. Vladimir Vapnik, "The Nature of Statistical Learning Theory", 1995, Springer 
Verlag N.Y., 189 p. 
2. Michael Unser and Akram Aldroubi, "Polynomial Splines and Wevelets - A Signal 
Perspectives", In the book: "Wavelets -A tutorial in Theory and Applications", 
C.K. Chui (ed) pp. 91 - 122, 1992 Academic Press, Inc. 
3. L. Shepp, Y. Vardi, and L. Kaufman, "A statistical model for Positron Emission 
Tomography," J. Amer. Star. Assoc. 80:389 pp. 8-37 1985. 
