Description
1 The Assignment
1. Bessel functions are very often seen in cylindrical geometry. They come in several forms, and we
will look at the Bessel function of the first type, Jν (x).
0 5 10 15 20 25
x
0.3
0.2
0.1
0.0
0.1
0.2
0.3
0.4
0.5
Jº
(
x
)
For large x,
Jν (x) ≈
r
2
πx
cos
x−
νπ
2
−
π
4
(1)
(a) Generate a vector of 41 values from 0 to 20 and obtain a vector of J1 (x) values.
(b) for different x0 = 0.5,1,…,18 extract the subvectors of x and J1(x) that correspond to x ≥ x0.
For each x0, construct the matrix corresponding to
Acos(xi) +Bsin(xi) ≈ J1(xi)
and obtain the best fit A and B. Obtain the φ corresponding to the solution (divide by √
A2 +B2
and identify A/
√
A2 +B2
to be cosφ). Hence predict ν in Eq. (??).
1
(c) Repeat with a model that also includes the amplitude:
A
cos(xi)
√
xi
+B
sin(xi)
√
xi
≈ J1(xi)
How do the two fits compare?
(d) Convert the above to a function that can be called via
nu=calcnu(x,x0,’r’,eps,model)
where eps is the amount of noise added (next problem) and model is whether to run (b) or (c)
above.
(e) Now add noise using eps*randn(10) where eps is the amount of the noise. See the effect on the
fit. Plot the fit for eps of 0.01.
0 2 4 6 8 10 12 14 16 18
x0
0.90
0.92
0.94
0.96
0.98
1.00
1.02
1.04
º
² =0,model (b)
² =0, model (c)
² =1:0e¡02, model (c)
You should get something like the above. The blue dots are for problem (b), while the green
dots are for problem (c). Both work but green works better, since it is a better model. When
noise is added you get the red dots. Given that the correct answer is 1.0 you can see the problem
– adding noise makes things worse at large x0. But large x0 is where the noiseless fit is best.
(f) Try varying the number of measurements (keeping the range of x the same). What happens?
(g) Discuss the effect of model accuracy, number of measurements, and the effect of noise on the
quality of the fit.
2 Python Details
2.1 Creating Functions
This was discussed last time
def f(x):
return sin(x)
To return multiple return values, return a “tuple” containing the values:
def f(x):
return (sin(x),cos(x))
2
3 Linear Fitting to Data
Perhaps the most common engineering use of a computer is the modelling of real data. That is to say, some
device, say a tachometer on an induction motor, or a temperature sensor of an oven or the current through
a photodiode provides us with real-time data. This data is usually digitised very early on in the acquisition
process to preserve the information, leaving us with time sequences,
(t,x) = {ti
, xi}
N
i=1
If the device has been well engineered, and if the sensor is to be useful in diagnosing and controlling the
device, we must also have a model for the acquired data:
f(t; p1,…, pN)
For example, our model could be
f(t; p1, p2) = p1 + p2 sin(πt
2
)
Our task is to accurately predict p1,…, pN given the real-time data. This would allow us to say things like,
“Third harmonic content in the load is reaching dangerous levels”.
The general problem of the estimation of model parameters is going to be tackled in many later courses.
Here we will tackle a very simple version of the problem. Suppose my model is “linear in the parameters”,
i.e.,
f(t; p1,…, pN) =
N
∑
i=1
piFi(t)
where Fi(t) are arbitrary functions of time. Our example above fits this model:
p1, p2 : parameters to be fitted
F1 : 1
F2 : sin(πt
2
)
For each measurement, we obtain an equation:
1 · p1 +sin(πt
2
1
)p2 = x1
1 · p1 +sin(πt
2
2
)p2 = x2
… … …
1 · p1 +sin(πt
2
M)p2 = xM
Clearly the general problem reduces to the inversion of a matrix problem:
F1(t1) F2(t1) … FN(t1)
F1(t2) F2(t2) … FN(t2)
… … … …
F1(tM) F2(tM) … FN(tM)
p1
p2
…
pN
=
x1
x2
…
xM
However, the number of parameters, N, is usually far less than the number of observations, M. In the
absence of measurement errors and noise, any non-singular N ×N submatrix can be used to determine the
coefficients pi
. When noise is present, what can we do?
The matrix equation now becomes
F ·~p =~x0 +~n =~x
where~n is the added noise. ~x0 is the ideal measurement, while~x is the actual noisy measurement we make.
We have to assume something about the noise, and we assume that it is as likely to be positive as negative
(zero mean) and it has a standard deviation σn. We also make a very important assumption, namely that the
noise added to each observation xi
, namely ni
, is “independent” of the noise added to any other observation.
Let us see where this gets us.
3
We wish to get the “best” guess for ~p. For us, this means that we need to minimize the L2 norm of the
error. The error is given by
ε = F ·~p−~x
The norm of the error is given by
~ε
T
·~ε = ∑
i
ε
2
i
This norm can we written in matrix form as
(F ·~p−~x)
T
·(F ·~p−~x)
which is what must be minimized. Writing out the terms
~p
T
F
TF
~p+~x
T
~x−~p
TF
T
~x−~x
TF~p
Suppose the minimum is reached at some ~p0. Then near it, the above norm should be greater than that
minimum. If we plotted the error, we expect the surface plot to look like a cup. The gradient of the error at
the minimum should therefore be zero. Let us take the gradient of the expression for the norm. We write
F
TF = M, and write out the error in “Einstein notation”:
error = piMi j pj +x jx j − piF
T
i j xj −xiFi j pj
Here we have suppressed the summation signs over i and j. If an index repeats in an expression, it is
assumed to be summed over. Differentiating with respect to pk
, and assuming that ∂ pi/∂ pk = δik, we get
∂ error
∂ pk
= δkiMi j pj + piMi jδjk −δkiF
T
i j xj −xiFi jδjk = 0
= Mk j pj + piMik −F
T
k jxj −xiFik
= ∑
j
Mk j + Mjk
pj −2∑
j
F
T
k jxj
Now the matrix M is symmetric (just take the transpose and see). So the equation finally becomes, written
as a vector expression
2
F
TF
~p0 −2F
T
~x = 0
i.e.,
~p0 =
F
TF
−1
F
T
~x
This result is so commonly used that scientific packages have the operation as a built in command. In
Scilab it take the form:
p0 = F\x;
Here F is the coefficient matrix and x is the vector of observations. When we write this, Scilab is actually
calculating
p0=inv(F’*F)*F’*x;
What would happen if the inverse did not exist? This can happen if the number of observations are too few,
or if the vectors fi(xj) (i.e., the columns of F) are linearly dependent. Both situations are the user’s fault.
He should have a model with linearly independent terms (else just merge them together). And he should
have enough measurements.
Where did we use all those properties of the noise? We assumed that the noise in different measurements was the same when we defined the error norm. Suppose some measurements are more noisy. Then
we should give less importance to those measurements, i.e., weight them less. That would change the
formula we just derived. If the noise samples were not independent, we would need equations to explain
just how they depended on each other. That too changes the formula. Ofcourse, if the model is mor
complicated things get really difficult. For example, so simple a problem as estimating the frequency of the
following model
y = Asinωt +Bcosωt +n
is an extremely nontrivial problem. That is because it is no longer a “linear” estimation problem. Such
problems are discussed in advanced courses like Estimation Theory. The simpler problems of correlated,
non-uniform noise will probably be discussed in Digital Communication since that theory is needed for a
cell phone to estimate the signal sent by the tower.
In this assignment (and in this course and in Measurements), we assume independent, uniform error
that is normally distributed. For that noise, as mentioned above, Scilab already provides the answer:
p=F\v
Even with all these simplifications, the problem can become ill posed. Let us look at the solution again:
~p0 =
F
TF
−1
F
T
~x
Note that we have to invert F
TF. This is the coefficient matrix. For instance, if we were trying to estimate
A and B of the following model:
y = Asinω0t +Bcosω0t
(not the frequency – that makes it a nonlinear problem), the matrix F becomes
F =
sinω0t1 cosω0t1
sinω0t2 cosω0t2
… …
sinω0tn cosω0tn
Hence, F
TF is a 2 by 2 matrix with the following elements
F
TF =
∑sin2ω0ti ∑sinω0ti cosω0ti
∑cosω0tisinω0ti ∑cos2ω0ti
Whether this matrix is invertible depends only on the functions sinω0t and cosω0t and the times at which
they are sampled. For instance, if the tk = 2kπ, the matrix becomes
F
TF =
0 0
0 n
since sinω0tk ≡ 0 for all the measurement times. Clearly this is not an invertible matrix, even though
the functions are independent. Sometimes the functions are “nearly” dependent. The inverse exists, but
it cannot be accurately calculated. To characterise this, when an inverse is calculated, the “condition
number” is also returned. A poor condition number means that the inversion is not to be depended on and
the estimation is going to be poor.
We will not use these ideas in this lab. Here we will do a very simple problem only, to study how the
amount of noise affects the quality of the estimate. We will also study what happens if we use the wrong
model.
5 h* 5i≡
hq1 (never defined)i