Coordinate Transformation Learning of 
Hand Position Feedback Controller by 
Using Change of Position Error Norm 
Eimei Oyama* 
Mechanical Eng. Lab. 
Namiki 1-2, Tsukuba Science City 
Ibaraki 305-8564 Japan 
Susurnu Tachi 
The University of Tokyo 
Hongo 7-3-1, Bunkyo-ku 
Tokyo 113-0033 Japan 
Abstract 
In order to grasp an object, we need to solve the inverse kine- 
matics problem, i.e., the coordinate transformation from the visual 
coordinates to the joint angle vector coordinates of the arm. Al- 
though several models of coordinate transformation learning have 
been proposed, they suffer from a number of drawbacks. In human 
motion control, the learning of the hand position error feedback 
controller in the inverse kinematics solver is important. This paper 
proposes a novel model of the coordinate transformation learning 
of the human visual feedback controller that uses the change of 
the joint angle vector and the corresponding change of the square 
of the hand position error norm. The feasibility of the proposed 
model is illustrated using numerical simulations. 
1 INTRODUCTION 
The task of calculating every joint angle that would result in a specific hand position 
is called the inverse kinematics problem. An important topic in neuroscience is the 
study of the learning mechanisms involved in the human inverse kinematics solver. 
We questioned five pediatricians about the motor function of infants suffering from 
serious upper limb disabilities. The doctors stated that the infants still were able 
to touch and stroke an object without hindrance. In one case, an infant without 
a thumb had a major kinematically influential surgical operation, transplanting an 
index finger as a thumb. After the operation, the child was able to learn how to 
use the index finger like a thumb [1]. In order to explain the human motor learning 
*Phone:+81-298-58-7298, Fax:+81-298-58-7201, e-mail:eimei@mel.go.jp 
Coordinate Transformation Learning of Feedback Controller 1039 
capability, we believe that the coordinate transformation learning of the feedback 
controller is a necessary component. 
Although a number of learning models of the inverse kinematics solver have been 
proposed, a definitive learning model has not yet been obtained. This is from the 
point of view of the structural complexity of the learning model and the biolog- 
ical plausibility of employed hypothesis. The Direct Inverse Modeling employed 
by many researchers [2] requires the complex switching of the input signal of the 
inverse model. When the hand position control is performed, the input of the in- 
verse model is the desired hand position, velocity, or acceleration. When the inverse 
model learning is performed, the input is the observed hand position, velocity, or 
acceleration. Although the desired signal and the observed signal could coincide, 
the characteristics of the two signals are very different. Currently, no research has 
succeesfully modeled the switching system. Furthermore, that learning model is not 
"goal-directed"; i.e., there is no direct way to find an action that corresponds to a 
particular desired result. The Forward and Inverse Modeling proposed by Jordan 
[3] requires the back-propagation signal, a technique does not have a biological ba- 
sis. That model also requires the complex switching of the desired output signal 
for the forward model. When the forward model learning is performed, the desired 
output is the observed hand position. When the inverse kinematics solver learn- 
ing is performed, the desired output is the desired hand position. The Feedback 
Error Learning proposed by Kawato [4] requires a pre-existing accurate feedback 
controller. 
It is necessary to obtain a learning model that possesses a number of characteristics: 
(1) it can explain the human learning function; (2) it has a simple structure; and 
(3) it is biologically plausible. This paper presents a learning model of coordinate 
transformation function of the hand position feedback controller. This model uses 
the joint angle vector change and the corresponding change of square of the hand 
position error norm. 
2 BACKGROUND 
2.1 Discrete Time First Order Model of Hand Position Controller 
Let 0  R m be the joint angle vector and x  R" be the hand position/orientation 
vector given by the vision system. The relationship between x and 0 is expressed 
as x - f(O) where f is a C  class function. The Jacobian of the hand position 
vector is expressed as J(O) = Of(O)/00. Let Xd be the desired hand position and 
e -- Xd -- x = Xd -- f(O) be the hand position error vector. In this paper, an 
inverse kinematics problem is assumed to be a least squares minimization problem 
that calculates 0 in order to minimize the square of the hand position error norm 
$(zd, 0) -lel2/2 -Izd- j(0)12/2. 
First, the feed-forward controller in the human inverse kinematics solver is disre- 
garded and the following first order control system, consisting of a learning feedback 
controller, is considered: 
0(k+ 1) = O(k) +AO(k) (1) 
Disturbance Joint Angle 
Position [Feedback ' Noise d(k) Vector 
Error + z__ 
 e(k) Controller  
Desired 
Hand [ '   
Position  T 
Hand 
Position 
x(? 
Figure 1: Configuration of 1-st Order Model of Hand Position Controller 
1040 E. Oyama and S. Tachi 
zx0(k) = fb(0(k),e()) + a() (2) 
e() = xd- f(0()) (a) 
where d(k) is assumed to be a disturbance noise from all components except the 
hand position control system. Figure 1 shows the configuration of the control sys- 
tem. In this figure, z -x is the operator that indicates a delay in the discrete time 
signal by a sampling interval of At. Although the human hand position control sys- 
tem includes higher order complex dynamics terms which are ignored in Equation 
(2), McRuer's experimental model of human compensation control suggests that the 
term that converts the hand position error to the hand velocity is a major term in 
the human control system [5]. We consider Equation (2) to be a good approximate 
model for the analysis of human coordinates transformation learning. 
The learner fb(0, e)  R ", which provides the hand position error feedback, is 
modeled using the artificial neural network. In this paper, the hand position error 
feedback controller learning by observing output x(k) is considered without any 
prior knowledge of the function f(O). 
2.2 Learning Model of the Neural Network 
Let Ib(O,e ) be the desired output of the learner Ifb(O,e). Ib(O,e ) functions as 
a teacher for I'fb(O, e). Let fb(O, e) be the updated output of I'ib(O, e) by the 
learning. Let E[t(O, e)[O, e] be the expected value of a scalar, a vector, or a matrix 
function t(O, e) when the input vector (0, e) is given. We assume that I'ib(O, e) is 
an ideal learner which is capable of realizing the mean of the desired output signal, 
completely. 'I'+ib(O, e) can be expressed as follows: 
 I'b(O,e )  E[I,(O,e)lO, e ] = I'.fb(O,e) + E[AI'.fb(O,e)lO, e ] (4) 
Ab(0, ) = (0, ) - (0, e) (5) 
When the expected value of Alb(0, e) is expressed as: 
E[,x,f(o,)lO,]  - a(0, ), () 
Ri b  R  x  is a positive definite matrix, d the inequality 
0(0, ) 0( -(n - )(0, )) 
10e) I = I 0fb(0, e) I < 1 (7) 
is satisfied, the fin learning result c be expressed : 
 f,(O,e) m RGive (8) 
by the iteration of the update of f, (0, e) expressed in Equation (4). 
3 USE OF CHANGE OF POSITION ERROR NORM 
3.1 A Novel Learning Model of Feedback Controller 
The change of the square of the hand position error norm S = S(xd, 0 + 0) -- 
S(xd, O) reflects whether or not the change of the joint angle vector A0 is in proper 
direction. The propose novel leaning model can be expressed  follows: 
where a is a small positive re number. We now consider a lge number of tris 
of Equation (2) with a large variety of initial status 0(0) with learnings conducted 
at the point of the input space of the feedback controller (0, e) = (O(k - 1), e(k - 1)) 
at time k. S d 0 can be cculated as follows. 
S = S(k) - S(k- 1) = ([e(k)l  -le(k- 1)1 ) (10) 
0 = 0(- 1) (xx) 
Coordinate Transformation Learning of Feedback Controller 1041 
Change of Square of 
[ S_.quat..of [ Hand Position Error Norm 
Hand 
Position 
Error 
e(k) 
Error Signal for '  1 
Feedback lnpul for  ] lnpm for 
Learning ', ] Control 
 AO(10 
Change of 
JointAngle 
Vector 
(15) 
where  is a 2-operand operator that indicates the Croneker's product. H(xd, O)  
R TM is the Hessian of S(xd, 0). O(AO 3) is the sum of third nd higher order 
terms of A0 in each equation. When A0 is small enough, the following approximate 
equations axe obtained: 
Therefore, AS can be approximated as follows: 
AS  -erJ(O)AO + 1 
10J(O) 
2 O0 
--  A0)A0 (16) 
(17) 
determined as: 
1 
AS(xa, O) - OS(xa, O) AO+ AoTH(xa, O)AO +O(AO 3) 
oo 
10J(O)  
= --eT(j(O) + 20  AO)AO + AOTjT(O)J(O)AO + O(AO a) 
Distutbalme 
Con.oiler d( k ) 
o(k- l ) o(k ) 
Figure 2: Configuration of Learning Model of Feedback Controller 
Figure 2 shows the conceptual diagram of the proposed learning model. 
Let p(q[O, e) be the probability density function of a vector q at at the point (0, e) 
in the input space of Ib(0, e). In order to simplify the analysis of the proposed 
learning model, d(k) is assumed to satisfy the following equation: 
p(dlO, e) = p(-dlO, e) (12) 
When A0 is small enough, the result of the learning using Equation (9) can be 
expressed as: 
 I, ib(O,e )  a(RoJr(O)J(O) + I)-XRoJr(O)e (13) 
Ro = E[AoAoTIO, e] (14) 
where Jr(O)e is a vector in the steepest descent direction of S(xa, 0). When d(k) is 
a non-zero vector, Ro is a positive definite symmetric matrix and (RoJrJ + I) -1 
is a positive definite matrix. When a is appropriate, l,(0, e) as expressed in 
Equation (13) can provide appropriate output error feedback control. The deriva- 
tion of the above result will be illustrated in Section 3.2. A partially modified 
steepest descent direction can be obtained without using the forward model or the 
back-propagation signal, as Jordan's forward modeling [3]. 
Let Ra be the covariance matrix of the disturbance noise d(k). When a is infinites- 
imal, Ro  -Id is established and an approximate solution I'fb(O, e)  OdZtdJT (o)e 
is obtained. 
3.2 Derivation of Learning Result 
The change of the square of the hand position error norm AS(xd, O) by A0 can be 
1042 E. Oyarna and S. Tachi 
Since eTJAAO = AOATjTe and [Ax12A0 = AOAOTjTJAO are deter- 
mined, ASAO can be approximated : 
SO  -OOrJr(O)e + OOrJr(O)J(O)O (18) 
Considering 0i, defined as 0i, = 0 - i,(0, e), the expected vue of the 
product of 0 and AS at the point (0, e) in the input space of i,(0, e) c be 
approximated as follows: 
E[SOIO, e ]  -oare + oara(O,) (0) 
When the arm is controlled according to Equation (2), 0i, is the disturbance 
noise d(k). Since d(k) satisfies Equation (12), the following equation is established. 
r a r a = o (20) 
Therefore, the expected vue of i,(0, e) can be expressed as; 
When a is sml enough, the condition described in Equation (7) is established. 
The learning result expressed  Equation (13) is obtned  described in Section 
2.2. 
It should be noted that the learning algorithm expressed in Equation (9) is ap- 
plicable not only to $(xt, 0), but also to general penalty functions of hand po- 
sition error norm lel. The proposed learning model synthesizes a direction that 
decreases S(xt, O) by summing after weighting A0 based on the increase or de- 
crease of S (x d, 0). 
The feedback controller defined in Equation (13) requires a number of iterations 
to find a correct inverse kinematics solution, as the coordinates transformation 
function of the controller is incomplete. However, by using Kawato's feedback 
error learning [4], the second feedback controller; the feed-forward controller; or the 
inverse kinematics model that has a complete coordinate transformation function 
can be obtained as shown in Section 4. 
4 TRACKING CONTROL SYSTEM LEARNING 
In this section, we will consider the case where xd changes as xa(k)(k = 
1, 2,...). The hybrid controller that includes the learning feed-forward controller 
I)yy(O(k), Axd(k))  R m that transforms the change of the desired hand position 
Axa(k) = x(k + 1) - x(k) to the joint angle vector space is considered: 
zx0(k) = ff(0(k), zxxd()) + (22) 
e(k) = xa() - x() (23) 
The configuration of the hybrid controller is illustrated in Figure 3. 
By using the modified change of the square of the error norm expressed as: 
1 
AS -- ([xa(k- 1) - x(k)l 2 -le(k- 1)12) (24) 
and AO(k) as defined in Equation (22), the feedback controller learning rule defined 
in Equation (9) is useful for the tracking control system. A sample holder for 
memorizing xa(k - 1) is necessary for the calculation of AS. When the distribution 
Coordinate Transformation Learning of Feedback Controller 1043 
r Sigdfr '  Feedforw-d 
Confrolic, Controller 
Zxd(k)' :  I IX I r 
J(O)J*(O)=i j(o),. 3o__ 
Hmm Arm 
x=f(O) 
Figure 3: Configuration of Hybrid Controller 
of Axt(k) satisfies Equation (20), Equation (13) still holds. When Axt(k) has no 
correlation with d(k) and Axt(k) satisfies p(Axd10, e)= p(-AxtlO, e), Equation 
(20) is approximately established after the feed-forward controller learning. 
Using AO(k) defined in Equation (2) and e(k) defined in Equation (23), AS de- 
fined in Equation (10) can be useful for the calculation of b(0, e). Although the 
learning calculation becomes simpler, the learning speed becomes much lower. 
Let '//(0(k), Axt(k)) be the desired output of .fi(O(k), Axil(k)). According to 
Kawato's feedback error learning [4], we use .f.f(O(k), Axt(k)) expressed as: 
 'l/(O(k),Axa(k)) = (1 - A),Ill(O(k),Axa(k)) + lb(O(k + 1),e(k + 1)) (25) 
where A is a small, positive, real nufnber for stabilizing the learning process and 
ensuring that equation /(0, 0) m 0 holds. If A is small enough, the learning 
feed-forward controller will fulfill the equation: 
Jii(O, Axt)  Axt 
5 NUMERICAL SIMULATION 
Numerical simulation experiments were performed in order to evaluate the perfor- 
mance of the proposed model. The inverse kinematics of a 3 DOF arm moving on 
a 2 DOF plane were considered. The relationship between the joint angle vector 
0 - (0x, 02,0s) T and the hand position vector x = (x, y)T was defined as: 
 = + cos(S) + cos(S + cos(S + + (27) 
Y = Y0 + L sin(0) + L2 sin(0 + 02) + Ls sin(0 + 02 + Os) (28) 
The range for 0 was (-30 , 120); the range for 02 was (0 , 120); and the range 
for 0a was (-75 , 75). L was 0.30 m, L2 was 0.25 m and La was 0.15 m. 
Random straight lines were generated as desired trajectories for the hand. The 
tracking control trials expressed as Equation (22) with the learning of the feedback 
controller and the feed-forward controller were performed. The standard deviation 
of each component of d was 0.01. Learnings based on Equations (9), (22), (24), 
and (25) were conducted 20 times in one tracking trial. 1,000 tracking trials were 
conducted to estimate the RMS(Root Mean Square) of e(k). 
In order to accelerate the learning, a in Equation (9) was modified as a = 
0.5/(IzXxl + 0.11zX0l). x in Equation (25) was set to 0.001. 
Two neural networks with 4 layers were used for the simulation. The first layer had 
5 neurons and the forth layer had 3 neurons. The other layers had 15 neurons each. 
The first layer and the forth layer consisted of linear neurons. The initial values of 
weights of the neural networks were generated by using uniform random numbers. 
The back-propagation method without optimized learning coefficients was utilized 
for the learning. 
1044 E. Oyama and S. Tachi 
Number of Trials 
Figure 4: Learning Process of Controller 
Y 
2q) 10 
0.5 X 
Figure 5: One Example of Tracking Control 
Figure 4 shows the progress of the proposed learning model. It can be seen that the 
RMS error decreases and the precision of the solver becomes higher as the number 
of trials increases. The RMS error became 9.31 x 10-am after 2 x 107 learning trials. 
Figure 5 illustrates the hand position control by the inverse kinematics solver after 
2 x 107 learning trials. The number near the end point of the arm indicates the 
value of k. The center of the small circle in Figure 5 indicates the desired hand 
position. The center of the large circle indicates the final desired hand position. 
Through learning, a precise inverse kinematics solver can be obtained. However, 
for RMS error to fall below 0.02, trials must be repeated more than 106 times. In 
such cases, more efficient learner or a learning rule is necessary. 
6 CONCLUSION 
A learning model of coordinate transformation of the hand position feedback con- 
troller was proposed in this paper. Although the proposed learning model may 
take a long time to learn, it is capable of learning a correct inverse kinematics 
solver without using a forward model, a back-propagation signal, or a pre-existing 
feedback controller. 
We believe that the slow learning speed can be improved by using neural networks 
that have a structure suitable for the coordinate transformation. A major limitation 
of the proposed model is the structure of the learning rule, since the learning rule 
requires the calculation of the product of the change of the error penalty function 
and the change of the joint angle vector. However, the existence of such structure in 
the nervous system is unknown. An advanced learning model which can be directly 
compared with the physiological and psychological experimental results is necessary. 
References 
[1] T. Ogino and S. Ishii,"Long-term Results after Pollicization for Congenital 
Hand Deformities," Hand Surgery, 2, 2,pp.79-85,1997 
[2] F. H. Guenther and D. M. Barreca," Neural models for flexible control of redun- 
dant systems," in P. Morasso and V. Sangnineti (Eds.), Self-organization, Com- 
putational Maps, and Motor Control. Amsterdam: Elsevier, pp.383-421,1997 
[3] M. I. Jordan, "Supervised Learning and Systems with Excess Degrees of Free- 
dom," COINS Technical Report,88-27,pp.1-41,1988 
[4] M. Kawato, K. Furukawa and R. Suzuki, "A Hierarchical Neural-network 
Model for Control and Learning of Voluntary Movement," Biological Cyber- 
netics, 57, pp.169-185, 1987 
[5] D.T. McRuer and H. R. Jex,"A Review of Quasi-Linear Pilot Models," IEEE 
Trans. on Human Factors in Electronics, HFE-8, 3, pp.38-51, 1963 
