## Description

### Abstract

This week's Python assignment will focus on the following topics:

- Reading data from files and parsing it
- Analysing the data to extract information
- Studying the effect of noise on the fitting process
- Plotting graphs

### Plotting graphs

You can use matplotlib to plot sophisticated graphs in Python.

Here is a simple program to plot $J_0(x)$ for $0 < x < 10$ (type it in and see):

```python
In [48]: from pylab import *; from scipy.special import *
In [49]: x = arange(0, 10, .1)
In [50]: y = jv(0, x)
In [51]: plot(x, y)
In [52]: show()
```

The import keyword imports a module, as already discussed. The pylab module is a super-module that imports everything needed to make Python seem like Matlab. The actual code is four lines: the first defines the x values, the second computes the Bessel function, the third plots the curve, and the last displays the graphic.

The plot command has the following syntax:

```python
plot(x1, y1, style1, x2, y2, style2, ...)
```

allowing you to plot multiple curves in the same figure.
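For instance, the following sketch plots two curves with different styles on the same axes (the style strings 'b-' and 'ro' are matplotlib format strings; savefig is used here in place of show() so the figure is written to a file):

```python
from pylab import *

x = arange(0, 10, 0.1)
# sin(x) as a solid blue line, cos(x) as red dots, on the same axes
plot(x, sin(x), 'b-', x, cos(x), 'ro')
xlabel('x'); ylabel('y')
legend(['sin(x)', 'cos(x)'])
savefig('two_curves.png')   # or show() in an interactive session
```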

### The Assignment

1. The site has a Python script, generate_data.py. Download the script and run it to generate a set of data. The data is written out to a file named fitting.dat.

```python
# script to generate data files for the least squares assignment
from pylab import *
import scipy.special as sp

N = 101  # no of data points
k = 9    # no of sets of data with varying noise

# generate the data points and add noise
t = linspace(0, 10, N)                       # t vector
y = 1.05*sp.jn(2, t) - 0.105*t               # f(t) vector
Y = meshgrid(y, ones(k), indexing='ij')[0]   # make k copies
scl = logspace(-1, -3, k)                    # noise stdev
n = dot(randn(N, k), diag(scl))              # generate k vectors
yy = Y + n                                   # add noise to signal

# shadow plot
plot(t, yy)
xlabel(r'$t$', size=20)
ylabel(r'$f(t)+n$', size=20)
title(r'Plot of the data to be fitted')
grid(True)
savetxt("fitting.dat", c_[t, yy])            # write out matrix to file
show()
```

2. Load fitting.dat (look up loadtxt). The data consists of 10 columns: the first column is time, while the remaining columns are data. Extract these.
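A minimal sketch of the extraction (the first lines regenerate a file with the same 10-column layout so the snippet runs on its own; in the assignment you would load the fitting.dat produced in part 1):

```python
from pylab import *
import scipy.special as sp

# regenerate a file with the same layout as fitting.dat,
# so this snippet is self-contained (noise level is illustrative)
t0 = linspace(0, 10, 101)
y0 = 1.05*sp.jn(2, t0) - 0.105*t0
savetxt("fitting.dat", c_[t0, y0[:, None] + 0.05*randn(101, 9)])

data = loadtxt("fitting.dat")   # shape (101, 10)
t = data[:, 0]                  # first column: time
f = data[:, 1:]                 # remaining nine columns: noisy data
print(t.shape, f.shape)         # (101,) (101, 9)
```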

3. The data columns correspond to the function

$$f(t) = 1.05\,J_2(t) - 0.105\,t + n(t)$$

with different amounts of noise added. What is noise? For us here, it is random fluctuations in the value due to many small random effects. The noise is given to be normally distributed, i.e., its probability distribution is given by

$$\Pr\bigl(n(t)\,\big|\,\sigma\bigr) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{n(t)^2}{2\sigma^2}\right)$$

with σ given by

```python
sigma = logspace(-1, -3, 9)
```

Plot the curves in Figure 0 and add labels to indicate the amount of noise in each. (Look up plot and legend.)
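A possible sketch (it regenerates the nine noisy curves rather than loading fitting.dat, and the label format is one choice among many):

```python
from pylab import *
import scipy.special as sp

# regenerate the noisy curves (normally these come from fitting.dat)
t = linspace(0, 10, 101)
y = 1.05*sp.jn(2, t) - 0.105*t
scl = logspace(-1, -3, 9)               # the nine noise stdevs

figure(0)
for s in scl:
    plot(t, y + s*randn(101), label=r'$\sigma_n$=%.4f' % s)
plot(t, y, 'k', lw=2, label='true value')
xlabel(r'$t$', size=20)
ylabel(r'$f(t)+n$', size=20)
title(r'Plot of the data to be fitted')
legend()
grid(True)
savefig('figure0.png')
```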

4. We want to fit a function to this data. Which function? The function has the same general shape as the data but with unknown coefficients:

$$g(t; A, B) = A\,J_2(t) + B\,t$$

Create a Python function g(t,A,B) that computes g(t;A,B) for given A and B. Plot it in Figure 0 for A = 1.05, B = −0.105 (this should be labelled as the true value).
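A minimal version of such a function might look like:

```python
import scipy.special as sp

def g(t, A, B):
    # model A*J2(t) + B*t with unknown coefficients A and B;
    # works for scalar t as well as numpy arrays
    return A * sp.jn(2, t) + B * t
```

Plotting the true value in Figure 0 then only needs something like plot(t, g(t, 1.05, -0.105), label='true value').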

5. Generate a plot of the first column of data with error bars. Plot every fifth data item to make the plot readable. Also plot the exact curve to see how much the data diverges.

This is not difficult. If you know the standard deviation of your data and you have the data itself, you can plot the error bars with red dots using

```python
errorbar(t, data, stdev, fmt='ro')
```

Here t and data contain the data, while stdev contains σn for the noise. To show every fifth data point, use instead

```python
errorbar(t[::5], data[::5], stdev, fmt='ro')
```

After plotting the error bars, plot the exact curve using the function written in part 4, and annotate the graph.
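Putting the pieces together, a self-contained sketch (the data column is simulated here with σ = 0.1, the largest noise level, rather than read from the file):

```python
from pylab import *
import scipy.special as sp

def g(t, A, B):
    # the model from part 4
    return A*sp.jn(2, t) + B*t

t = linspace(0, 10, 101)
stdev = 0.1                                   # sigma of the first data column
data = g(t, 1.05, -0.105) + stdev*randn(101)  # simulated noisy column

errorbar(t[::5], data[::5], stdev, fmt='ro', label='noisy data')
plot(t, g(t, 1.05, -0.105), 'k', label='true value')
xlabel(r'$t$', size=20)
title('Data points with error bars')
legend(); grid(True)
savefig('errorbars.png')
```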

6. For our problem, the values of t are discrete and known (from the data file). Obtain g(t,A,B) as a column vector by creating a matrix equation:

$$g(t, A, B) = \begin{pmatrix} J_2(t_1) & t_1 \\ \vdots & \vdots \\ J_2(t_m) & t_m \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} \equiv M \cdot p$$

Construct M and then generate the vector $M \cdot \begin{pmatrix} A_0 \\ B_0 \end{pmatrix}$ and verify that it is equal to $g(t, A_0, B_0)$. How will you confirm that two vectors are equal?

Note: To construct a matrix out of column vectors, create the column vectors first and then use c_[...]. For instance, if you have two vectors x and y, use M=c_[x,y]
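A sketch of the construction and the check (allclose is one reasonable way to compare two float vectors, since exact == comparison is unreliable in floating point):

```python
from pylab import *
import scipy.special as sp

t = linspace(0, 10, 101)
M = c_[sp.jn(2, t), t]              # columns J2(t_i) and t_i
p = array([1.05, -0.105])           # (A0, B0)

lhs = dot(M, p)                     # M . p
rhs = 1.05*sp.jn(2, t) - 0.105*t    # g(t, A0, B0) computed directly

# float vectors should be compared with a tolerance, not with ==
print(allclose(lhs, rhs))           # True
```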


7. For A = 0, 0.1, …, 2 and B = −0.2, −0.19, …, 0, and for the data given in columns 1 and 2 of the file, compute

$$\varepsilon_{ij} = \frac{1}{101}\sum_{k=0}^{101}\bigl(f_k - g(t_k, A_i, B_j)\bigr)^2$$

This is known as the "mean squared error" between the data ($f_k$) and the assumed model. Use the first column of data as $f_k$ for this part.

8. Make a contour plot of $\varepsilon_{ij}$ and examine its structure. Does it have a minimum? Does it have several?
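A sketch covering parts 7 and 8 together (the data column is simulated here; A and B are taken on the 21-point grids specified above):

```python
from pylab import *
import scipy.special as sp

def g(t, A, B):
    return A*sp.jn(2, t) + B*t

t = linspace(0, 10, 101)
f = g(t, 1.05, -0.105) + 0.1*randn(101)   # stand-in for the first data column

A = linspace(0, 2, 21)          # A = 0, 0.1, ..., 2
B = linspace(-0.2, 0, 21)       # B = -0.2, -0.19, ..., 0
eps = zeros((len(A), len(B)))
for i, a in enumerate(A):
    for j, b in enumerate(B):
        eps[i, j] = mean((f - g(t, a, b))**2)   # mean squared error

cp = contour(A, B, eps.T, 20)   # transpose: rows of eps index A
clabel(cp, inline=True)
xlabel(r'$A$'); ylabel(r'$B$')
title(r'Contours of the mean squared error')
savefig('mse_contour.png')
```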

9. Use the Python function lstsq from scipy.linalg to obtain the best estimate of A and B. The array you created in part 6 is what you need. This is sent to the least squares program.
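A sketch of the call (the matrix M and a simulated noisy column stand in for your part-6 array and the file data):

```python
from pylab import *
from scipy.linalg import lstsq
import scipy.special as sp

t = linspace(0, 10, 101)
b = 1.05*sp.jn(2, t) - 0.105*t + 0.1*randn(101)   # simulated noisy column
M = c_[sp.jn(2, t), t]                            # the matrix from part 6

p, resid, rank, sig = lstsq(M, b)
print(p)    # best estimates of A and B
```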

10. Repeat this with the different columns (i.e., columns 1 and i). Each column contains the same function above, with a different amount of noise added as mentioned above. Plot the error in the estimates of A and B for the different data columns versus the noise σ. Is the error in the estimate growing linearly with the noise?

11. Replot the above curves using loglog. Is the error varying linearly? What does this mean?
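Parts 10 and 11 can be sketched as follows (one simulated noisy realisation per σ; errors are measured against the true values A = 1.05, B = −0.105):

```python
from pylab import *
from scipy.linalg import lstsq
import scipy.special as sp

t = linspace(0, 10, 101)
y = 1.05*sp.jn(2, t) - 0.105*t      # true (noiseless) function
M = c_[sp.jn(2, t), t]
scl = logspace(-1, -3, 9)           # the nine noise levels

Aerr, Berr = [], []
for s in scl:
    p, *_ = lstsq(M, y + s*randn(101))   # fit one noisy column
    Aerr.append(abs(p[0] - 1.05))
    Berr.append(abs(p[1] + 0.105))

loglog(scl, Aerr, 'ro', label='A error')
loglog(scl, Berr, 'go', label='B error')
xlabel(r'$\sigma_n$'); ylabel('error in estimate')
legend(); grid(True)
savefig('error_vs_noise.png')
```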

### Linear Fitting to Data

Perhaps the most common engineering use of a computer is the modelling of real data. That is to say, some device, say a tachometer on an induction motor, a temperature sensor on an oven, or the current through a photodiode, provides us with real-time data. This data is usually digitised very early in the acquisition process to preserve the information, leaving us with time sequences

$$(t, x) = \{t_i, x_i\}_{i=1}^{N}$$

If the device has been well engineered, and if the sensor is to be useful in diagnosing and controlling the device, we must also have a model for the acquired data:

$$f(t; p_1, \ldots, p_N)$$

For example, our model could be

$$f(t; p_1, p_2) = p_1 + p_2\sin\bigl(\pi t^2\bigr)$$

Our task is to accurately predict $p_1, \ldots, p_N$ given the real-time data. This would allow us to say things like, "Third harmonic content in the load is reaching dangerous levels".

The general problem of the estimation of model parameters is going to be tackled in many later courses. Here we will tackle a very simple version of the problem. Suppose my model is "linear in the parameters", i.e.,

$$f(t; p_1, \ldots, p_N) = \sum_{i=1}^{N} p_i F_i(t)$$

where the $F_i(t)$ are arbitrary functions of time. Our example above fits this model:

$$\begin{aligned}
p_1, p_2 &: \text{parameters to be fitted} \\
F_1 &: 1 \\
F_2 &: \sin\bigl(\pi t^2\bigr)
\end{aligned}$$

For each measurement, we obtain an equation:

$$\begin{aligned}
1\cdot p_1 + \sin\bigl(\pi t_1^2\bigr)\,p_2 &= x_1 \\
1\cdot p_1 + \sin\bigl(\pi t_2^2\bigr)\,p_2 &= x_2 \\
&\;\;\vdots \\
1\cdot p_1 + \sin\bigl(\pi t_M^2\bigr)\,p_2 &= x_M
\end{aligned}$$


Clearly the general problem reduces to the inversion of a matrix problem:

$$\begin{pmatrix}
F_1(t_1) & F_2(t_1) & \cdots & F_N(t_1) \\
F_1(t_2) & F_2(t_2) & \cdots & F_N(t_2) \\
\vdots & \vdots & & \vdots \\
F_1(t_M) & F_2(t_M) & \cdots & F_N(t_M)
\end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_M \end{pmatrix}$$
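For the example model $f(t; p_1, p_2) = p_1 + p_2\sin(\pi t^2)$, the matrix F and the least squares solution can be sketched as follows (the sample times, noise level, and true parameter values 3.0 and 0.5 are illustrative assumptions):

```python
from pylab import *
from scipy.linalg import lstsq

t = linspace(0, 2, 50)                     # 50 observation times (M=50, N=2)
x = 3.0 + 0.5*sin(pi*t**2) + 0.01*randn(50)  # simulated noisy measurements

# columns are F1(t) = 1 and F2(t) = sin(pi t^2)
F = c_[ones(50), sin(pi*t**2)]
p, *_ = lstsq(F, x)
print(p)    # close to [3.0, 0.5]
```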

However, the number of parameters, N, is usually far less than the number of observations, M. In the absence of measurement errors and noise, any non-singular N × N submatrix can be used to determine the coefficients $p_i$. When noise is present, what can we do?

The matrix equation now becomes

$$F \cdot \vec{p} = \vec{x}_0 + \vec{n} = \vec{x}$$

where $\vec{n}$ is the added noise, $\vec{x}_0$ is the (unknown) ideal measurement, and $\vec{x}$ is the actual noisy measurement we make. We have to assume something about the noise, and we assume that it is as likely to be positive as negative (zero mean) and that it has a standard deviation $\sigma_n$. Clearly the above equation cannot be exactly satisfied in the presence of noise, since the rank of F is N whereas the number of observations is $M \gg N$. We also make a very important assumption, namely that the noise $n_i$ added to each observation $x_i$ is "independent" of the noise added to any other observation. Let us see where this gets us.

We wish to get the "best" guess for $\vec{p}$. For us, this means that we need to minimize the L2 norm of the error. The error is given by

$$\vec{\varepsilon} = F \cdot \vec{p} - \vec{x}$$

The norm of the error is given by

$$\vec{\varepsilon}^{\,T} \cdot \vec{\varepsilon} = \sum_i \varepsilon_i^2$$

This norm can be written in matrix form as

$$(F\cdot\vec{p} - \vec{x})^T \cdot (F\cdot\vec{p} - \vec{x})$$

which is what must be minimized. Writing out the terms,

$$\vec{p}^{\,T} F^T F \vec{p} + \vec{x}^{\,T}\vec{x} - \vec{p}^{\,T} F^T \vec{x} - \vec{x}^{\,T} F \vec{p}$$

Suppose the minimum is reached at some $\vec{p}_0$. Then near it, the above norm should be greater than that minimum. If we plotted the error, we would expect the surface plot to look like a cup. The gradient of the error at the minimum should therefore be zero. Let us take the gradient of the expression for the norm. We write $F^T F = M$, and write out the error in "Einstein notation":

$$\text{error} = p_i M_{ij} p_j + x_j x_j - p_i F^T_{ij} x_j - x_i F_{ij} p_j$$

Here we have suppressed the summation signs over i and j. If an index repeats in an expression, it is assumed to be summed over. Differentiating with respect to $p_k$, and assuming that $\partial p_i/\partial p_k = \delta_{ik}$, we get

$$\begin{aligned}
\frac{\partial\,\text{error}}{\partial p_k}
&= \delta_{ki} M_{ij} p_j + p_i M_{ij}\delta_{jk} - \delta_{ki} F^T_{ij} x_j - x_i F_{ij}\delta_{jk} = 0 \\
&= M_{kj} p_j + p_i M_{ik} - F^T_{kj} x_j - x_i F_{ik} \\
&= \sum_j \bigl(M_{kj} + M_{jk}\bigr) p_j - 2\sum_j F^T_{kj} x_j
\end{aligned}$$

Now the matrix M is symmetric (just take the transpose and see). So the equation finally becomes, written as a vector expression,

$$\nabla\,\text{error}(\vec{p}) = 2\bigl(F^T F\bigr)\vec{p}_0 - 2F^T\vec{x} = 0$$

i.e.,

$$\vec{p}_0 = \bigl(F^T F\bigr)^{-1} F^T \vec{x}$$

This result is very famous. It is so commonly used that scientific packages provide the operation as a built-in command. In Python, it is the library function lstsq:

```python
from scipy.linalg import lstsq
p, resid, rank, sig = lstsq(F, x)
```

where p returns the best fit, and sig, resid and rank return information about the process.
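As a quick check that lstsq really computes the formula derived above, the following sketch compares it with the explicit normal-equations solution (the model and data are illustrative; in practice you would not form the inverse explicitly):

```python
from pylab import *
from scipy.linalg import lstsq, inv

t = linspace(0, 2, 50)
F = c_[ones(50), sin(pi*t**2)]                # example model matrix
x = 3.0 + 0.5*sin(pi*t**2) + 0.01*randn(50)   # simulated measurements

p_lstsq, resid, rank, sig = lstsq(F, x)
p_normal = inv(F.T @ F) @ F.T @ x             # (F^T F)^{-1} F^T x
print(allclose(p_lstsq, p_normal))            # True
```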

In Scilab or Matlab, it takes the form:

```
p0 = F\x;
```

Here F is the coefficient matrix and x is the vector of observations. When we write this, Scilab is actually calculating

```
p0 = inv(F'*F)*F'*x;
```

What would happen if the inverse did not exist? This can happen if the number of observations is too few, or if the vectors $F_i(t_j)$ (i.e., the columns of F) are linearly dependent. Both situations are the user's fault: the model should have linearly independent terms (else just merge them together), and there should be enough measurements.

The p0 obtained from the above formula is a prediction of the exact parameters. How accurate can we expect it to be? If x were x0, we would recover the exact answer, limited only by computer accuracy. But when noise is present, the estimate is approximate: the more noise, the more approximate.

Where did we use all those properties of the noise? We assumed that the noise in different measurements was the same when we defined the error norm. Suppose some measurements are more noisy. Then we should give less importance to those measurements, i.e., weight them less. That would change the formula we just derived. If the noise samples were not independent, we would need equations to explain just how they depend on each other. That too changes the formula. Of course, if the model is more complicated, things get really difficult. For example, so simple a problem as estimating the frequency of the following model

$$y = A\sin\omega t + B\cos\omega t + n$$

is an extremely nontrivial problem. That is because it is no longer a "linear" estimation problem. Such problems are discussed in advanced courses like Estimation Theory. The simpler problems of correlated, non-uniform noise will be discussed in Digital Communication, since that theory is needed for a cell phone to estimate the signal sent by the tower.

In this assignment (and in this course), we assume independent, uniform error that is normally distributed. For that noise, as mentioned above, Python already provides the answer:

```python
p = lstsq(F, x)
```

Even with all these simplifications, the problem can become ill posed. Let us look at the solution again:

$$\vec{p}_0 = \bigl(F^T F\bigr)^{-1} F^T \vec{x}$$

Note that we have to invert $F^T F$, the coefficient matrix. For instance, if we were trying to estimate A and B of the following model:

$$y = A\sin\omega_0 t + B\cos\omega_0 t$$

(not the frequency, which would make it a nonlinear problem), the matrix F becomes

$$F = \begin{pmatrix}
\sin\omega_0 t_1 & \cos\omega_0 t_1 \\
\sin\omega_0 t_2 & \cos\omega_0 t_2 \\
\vdots & \vdots \\
\sin\omega_0 t_n & \cos\omega_0 t_n
\end{pmatrix}$$

Hence, $F^T F$ is a 2 by 2 matrix with the following elements:

$$F^T F = \begin{pmatrix}
\sum \sin^2\omega_0 t_i & \sum \sin\omega_0 t_i\cos\omega_0 t_i \\
\sum \cos\omega_0 t_i\sin\omega_0 t_i & \sum \cos^2\omega_0 t_i
\end{pmatrix}$$

Whether this matrix is invertible depends only on the functions $\sin\omega_0 t$ and $\cos\omega_0 t$ and the times at which they are sampled. For instance, if $\omega_0 t_k = 2k\pi$, the matrix becomes

$$F^T F = \begin{pmatrix} 0 & 0 \\ 0 & n \end{pmatrix}$$

since $\sin\omega_0 t_k \equiv 0$ for all the measurement times. Clearly this is not an invertible matrix, even though the functions are independent. Sometimes the functions are "nearly" dependent. The inverse exists, but it cannot be accurately calculated. To characterise this, when an inverse is calculated, the "condition number" is also returned. A poor condition number means that the inversion is not to be depended on and the estimation is going to be poor.
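The effect can be demonstrated with the sin/cos example above (the sample times and ω0 = 1 are illustrative assumptions; numpy.linalg.cond computes the condition number):

```python
from pylab import *
from numpy.linalg import cond

w0 = 1.0
t_good = linspace(0, 10, 50)     # well-spread sample times
t_bad = 2*pi*arange(1, 51)/w0    # w0*t_k = 2k*pi: sin(w0*t) is
                                 # (numerically) zero at every sample

def condition(t):
    F = c_[sin(w0*t), cos(w0*t)]
    return cond(F.T @ F)

# the badly sampled case gives an enormous condition number
print(condition(t_good), condition(t_bad))
```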

We will not use these ideas in this lab. Here we will do a very simple problem only, to study how the amount of noise affects the quality of the estimate. We will also study what happens if we use the wrong model.
