# EE2703 Week 3: Fitting Data to Models solution

\$30.00

Original Work ?

## Description

Abstract
This week’s Python assignment will focus on the following topics:
• Reading data from files and parsing them
– Analysing the data to extract information
– Study the effect of noise on the fitting process
– Plotting graphs
Plotting graphs
You can use matplotlib to plot sophisticated graphs in Python.
Here is a simple program to plot J0(x) for 0 < x < 10. (type it in and see)
In [48]: from pylab import *; from scipy.special import *
In [49]: x=arange(0,10,.1)
In [50]: y=jv(0,x)
In [51]: plot(x,y)
In [52]: show()
The import keyword imports a module as already discussed. The “pylab” module is a super module that
imports everything needed to make python seem to be like Matlab.
The actual code is four lines. One defines the x values. The second computes the Bessel function. The
third plots the curve while the last line displays the graphic.
The plot command has the following syntax:
plot(x1,y1,style1,x2,y2,style2,…)
helping you to plot multiple curves in the same plot.
The Assignment
1. The site has a python script generate_data.py. Download the script and run it to generate a
set of data. The data is written out to a file whose name is fitting.dat.
1 h* 1i≡
# script to generate data files for the least squares assignment
from pylab import *
import scipy.special as sp
N=101 # no of data points
k=9 # no of sets of data with varying noise
# generate the data points and add noise
1
t=linspace(0,10,N) # t vector
y=1.05*sp.jn(2,t)-0.105*t # f(t) vector
Y=meshgrid(y,ones(k),indexing=’ij’)[0] # make k copies
scl=logspace(-1,-3,k) # noise stdev
n=dot(randn(N,k),diag(scl)) # generate k vectors
yy=Y+n # add noise to signal
plot(t,yy)
xlabel(r’\$t\$’,size=20)
ylabel(r’\$f(t)+n\$’,size=20)
title(r’Plot of the data to be fitted’)
grid(True)
savetxt(“fitting.dat”,c_[t,yy]) # write out matrix to file
show()
2
2. Load fitting.dat (look up loadtxt). The data consists of 10 columns. The first column is
time, while the remaining columns are data. Extract these.
3. The data columns correspond to the function
f(t) = 1.05J2 (t)−0.105t +n(t)
with different amounts of noise added. What is noise? For us here, it is random fluctuations in
the value due to many small random effects. The noise is given to be normally distributed, i.e., its
probability distribution is given by
Pr(n(t)|σ) = 1
σ

exp

n(t)
2
2σ2

with σ given by
sigma=logspace(-1,-3,9)
Plot the curves in Figure 0 and add labels to indicate the amount of noise in each. (Look up plot
and legend.)
4. We want to fit a function to this data. Which function? The function has the same general shape as
the data but with unknown coefficients:
g(t;A,B) = AJ2 (t) +Bt
Create a python function g(t,A,B) that computes g(t;A,B) for given A and B. Plot it in Figure 0
for A = 1.05, B = −0.105 (this should be labelled as the true value.)
5. Generate a plot of the first column of data with error bars. Plot every 5th data item to make the plot
readable. Also plot the exact curve to see how much the data diverges.
This is not difficult. Suppose you know the standard deviation of your data and you have the data
itself, you can plot the error bars with red dots using
errorbar(t,data,stdev,fmt=’ro’)
Here, ’t’ and ’data’ contain the data, while ’stdev’ contains σn for the noise. In order to show every
fifth data point, you can instead use
errorbar(t[::5],data[::5],stdev,fmt=’ro’)
After plotting the errorbars, plot the exact curve using the function written in part 4, and annotate the
graph.
6. For our problem, the values of t are discrete and known (from the datafile). Obtain g(t,A,B) as a
column vector by creating a matrix equation:
g(t,A,B) =

J2(t1) t1
… …
J2(tm) tm


A
B

≡ M · p
Construct M and then generate the vector M ·

A0
B0

and verify that it is equal to g(t,A0,B0).
How will you confirm that two vectors are equal?
Note: To construct a matrix out of column vectors, create the column vectors first and then use
c_[…]. For instance, if you have two vectors x and y, use M=c_[x,y]
3
7. For A = 0,0.1,…,2 and B = −0.2,−0.19,…,0, for the data given in columns 1 and 2 of the file,
compute
εi j =
1
101
101

k=0
(fk −g(tk
,Ai
,Bj))2
This is known as the “mean squared error” between the data (fk) and the assumed model. Use the
first column of data as fk for this part.
8. Plot a contour plot of εi j and see its structure. Does it have a minimum? Does it have several?
9. Use the Python function lstsq from scipy.linalg to obtain the best estimate of A and B. The
array you created in part 6 is what you need. This is sent to the least squares program.
10. Repeat this with the different columns (i.e., columns 1 and i). Each column has the same function
above, with a different amount of noise added as mentioned above. Plot the error in the estimate of
A and B for different data files versus the noise σ. Is the error in the estimate growing linearly with
the noise?
11. Replot the above curves using loglog. Is the error varying linearly? What does this mean?
Linear Fitting to Data
Perhaps the most common engineering use of a computer is the modelling of real data. That is to say, some
device, say a tachometer on an induction motor, or a temperature sensor of an oven or the current through
a photodiode provides us with real-time data. This data is usually digitised very early on in the acquisition
process to preserve the information, leaving us with time sequences,
(t,x) = {ti
, xi}
N
i=1
If the device has been well engineered, and if the sensor is to be useful in diagnosing and controlling the
device, we must also have a model for the acquired data:
f(t; p1,…, pN)
For example, our model could be
f(t; p1, p2) = p1 + p2 sin(πt
2
)
Our task is to accurately predict p1,…, pN given the real-time data. This would allow us to say things like,
“Third harmonic content in the load is reaching dangerous levels”.
The general problem of the estimation of model parameters is going to be tackled in many later courses.
Here we will tackle a very simple version of the problem. Suppose my model is “linear in the parameters”,
i.e.,
f(t; p1,…, pN) =
N

i=1
piFi(t)
where Fi(t) are arbitrary functions of time. Our example above fits this model:
p1, p2 : parameters to be fitted
F1 : 1
F2 : sin(πt
2
)
For each measurement, we obtain an equation:
1 · p1 +sin(πt
2
1
)p2 = x1
1 · p1 +sin(πt
2
2
)p2 = x2
… … …
1 · p1 +sin(πt
2
M)p2 = xM
4
Clearly the general problem reduces to the inversion of a matrix problem:


F1(t1) F2(t1) … FN(t1)
F1(t2) F2(t2) … FN(t2)
… … … …
F1(tM) F2(tM) … FN(tM)




p1
p2

pN

 =


x1
x2

xM


However, the number of parameters, N, is usually far less than the number of observations, M. In the
absence of measurement errors and noise, any non-singular N ×N submatrix can be used to determine the
coefficients pi
. When noise is present, what can we do?
The matrix equation now becomes
F ·~p =~x0 +~n =~x
where~n is the added noise. ~x0 is the (unknown) ideal measurement, while~x is the actual noisy measurement
we make. We have to assume something about the noise, and we assume that it is as likely to be positive
as negative (zero mean) and it has a standard deviation σn. Clearly the above equation cannot be exactly
satisfied in the presence of noise, since the rank of F is N wheras the number of observations is M  N.
We also make a very important assumption, namely that the noise added to each observation xi
, namely
ni
, is “independent” of the noise added to any other observation. Let us see where this gets us.
We wish to get the “best” guess for ~p. For us, this means that we need to minimize the L2 norm of the
error. The error is given by
ε = F ·~p−~x
The norm of the error is given by

T
·~ε = ∑
i
ε
2
i
This norm can we written in matrix form as
(F ·~p−~x)
T
·(F ·~p−~x)
which is what must be minimized. Writing out the terms
~p
T

F
TF

~p+~x
T
~x−~p
TF
T
~x−~x
TF~p
Suppose the minimum is reached at some ~p0. Then near it, the above norm should be greater than that
minimum. If we plotted the error, we expect the surface plot to look like a cup. The gradient of the error at
the minimum should therefore be zero. Let us take the gradient of the expression for the norm. We write
F
TF = M, and write out the error in “Einstein notation”:
error = piMi j pj +xjxj − piF
T
i j xj −xiFi j pj
Here we have suppressed the summation signs over i and j. If an index repeats in an expression, it is
assumed to be summed over. Differentiating with respect to pk
, and assuming that ∂ pi/∂ pk = δik, we get
∂ error
∂ pk
= δkiMi j pj + piMi jδjk −δkiF
T
i j xj −xiFi jδjk = 0
= Mk j pj + piMik −F
T
k jxj −xiFik
= ∑
j

Mk j + Mjk
pj −2∑
j
F
T
k jxj
Now the matrix M is symmetric (just take the transpose and see). So the equation finally becomes, written
as a vector expression
∇error(~p) = 2

F
TF

~p0 −2F
T
~x = 0
i.e.,
~p0 =

F
TF
−1
F
T
~x
This result is very famous. It is so commonly used that scientific packages have the operation as a built in
command. In Python, it is a library function called lstsq
from scipy.linalg import lstsq
p,resid,rank,sig=lstsq(F,x)
where p returns the best fit, and sig, resid and rank return information about the process.
In Scilab or Matlab, it take the form:
p0 = F\x;
Here F is the coefficient matrix and x is the vector of observations. When we write this, Scilab is actually
calculating
p0=inv(F’*F)*F’*x;
What would happen if the inverse did not exist? This can happen if the number of observations are too few,
or if the vectors fi(xj) (i.e., the columns of F) are linearly dependent. Both situations are the user’s fault.
He should have a model with linearly independent terms (else just merge them together). And he should
have enough measurements.
p0 obtained from the above formula is a prediction of the exact parameters. How accurate can we
expect p0 to be? If x were x0 we should recover the exact answer limited only by computer accuracy. But
when noise is present, the estimate is approximate. The more noise, the more approximate.
Where did we use all those properties of the noise? We assumed that the noise in different measurements was the same when we defined the error norm. Suppose some measurements are more noisy. Then
we should give less importance to those measurements, i.e., weight them less. That would change the
formula we just derived. If the noise samples were not independent, we would need equations to explain
just how they depended on each other. That too changes the formula. Ofcourse, if the model is more
complicated things get really difficult. For example, so simple a problem as estimating the frequency of the
following model
y = Asinωt +Bcosωt +n
is an extremely nontrivial problem. That is because it is no longer a “linear” estimation problem. Such
problems are discussed in advanced courses like Estimation Theory. The simpler problems of correlated,
non-uniform noise will be discussed in Digital Communication since that theory is needed for a cell phone
to estimate the signal sent by the tower.
In this assignment (and in this course), we assume independent, uniform error that is normally distributed. For that noise, as mentioned above, Python already provides the answer:
p=lstsq(F,x)
Even with all these simplifications, the problem can become ill posed. Let us look at the solution again:
~p0 =

F
TF
−1
F
T
~x
Note that we have to invert F
TF. This is the coefficient matrix. For instance, if we were trying to estimate
A and B of the following model:
y = Asinω0t +Bcosω0t
(not the frequency – that makes it a nonlinear problem), the matrix F becomes
F =


sinω0t1 cosω0t1
sinω0t2 cosω0t2
… …
sinω0tn cosω0tn


Hence, F
TF is a 2 by 2 matrix with the following elements
F
TF =

∑sin2ω0ti ∑sinω0ti cosω0ti
∑cosω0tisinω0ti ∑cos2ω0ti


Whether this matrix is invertible depends only on the functions sinω0t and cosω0t and the times at which
they are sampled. For instance, if the tk = 2kπ, the matrix becomes
F
TF =

0 0
0 n

since sinω0tk ≡ 0 for all the measurement times. Clearly this is not an invertible matrix, even though
the functions are independent. Sometimes the functions are “nearly” dependent. The inverse exists, but
it cannot be accurately calculated. To characterise this, when an inverse is calculated, the “condition
number” is also returned. A poor condition number means that the inversion is not to be depended on and
the estimation is going to be poor.
We will not use these ideas in this lab. Here we will do a very simple problem only, to study how the
amount of noise affects the quality of the estimate. We will also study what happens if we use the wrong
model.
7