Description

5/5 - (2 votes)

1. [4 marks] Complete the Python function randfp in the YOU_a1q1 notebook so that it randomly
generates normalized binary floating-point numbers from the number system F(β = 2, t, L, U).

Your function should work for values of t up to 52, and −1022 ≤ L < U ≤ 1023. You can read the function’s documentation for more information (type “? randfp”). Hint: To append strings in Python, simply ‘add’ strings.

For example, b = ‘hi’ + ‘ there ‘ + str(15) will construct the string ’hi there 15’.

2. [4 marks] Complete the function fp2dec (in the YOU_a1q2 notebook) so that it converts binary floating-point numbers in F to their decimal equivalents. An incomplete version of the function is supplied as starter code.

Its input is a string representing a binary floating-point number (as described in What you need to know above). It is sufficient to output an IEEE double-precision number as the decimal value.

Hints: For this question, you might find the Python functions find, and int useful. Also, you can extract substrings using indexing. For example, if b=’+0.1001b3’, then b[2] will return the string ‘.’, and b[6:] will return ‘1b3’. Furthermore, the Boolean expression b[3]==’1′ would return a value of True. You cannot, however, use any other function that does the conversion for you.

You must implement it yourself based on first principles.

3. [4 marks] Consider the normalized floating-point number system F(β = 7, t = 4, L = −8, U = 8), with elements of the form ±0.d1d2d3d4 × 7 p where −8 ≤ p ≤ 8 and di ∈ {0, . . . , 6}.

The number system is normalized, so d1 6= 0. The only exception is the zero element, in which all the mantissa digits are zero and the exponent is zero. For the following questions, state your answers in base-7. (a) What is the largest value in F?

(b) What is the value of 265307 × 100007 using this number system. (c) Derive machine epsilon for F from first principles (not using the general formula). In other words, what is the smallest value E ∈ F such that fl(1 + E) > 1.
(d) What fraction of the values in F are smaller in magnitude than 1?

4. [4 marks] Let F be a floating-point number system with machine epsilon E, and suppose that a and
b are numbers that may or may not be elements of F. Show that the relative error for the expression
fl(a) ⊕ fl(b) has the upper bound
|(fl(a) ⊕ fl(b)) − (a + b)|
|a + b|
≤
|a| + |b|
|a + b|
E(2 + E) .
Justify each inequality that you introduce.
5. [5 marks] Consider the function
F(x) = 1
1 − x
−
1
1 + x

for |x| <
1
2
. The notebook YOU_a1q5 contains the functions F_exact and F_fp. The function
F_exact simply computes F(x) for a given x using Python’s default IEEE double-precision.

We will refer to this version as “exact” because it is far more accurate than the alternatives that we will
compare to in this question.

The function F_fp computes the same formula, but using the FPNS F(β, t, L, U) = (10, 4, −100, 100).
Notice that it uses the function fl repeatedly, which is also included in the notebook. Notice also
that all intermediate values are in the number system.

(a) For what range of x-values is it difficult to compute this expression accurately in floating-point
arithmetic? Edit the notebook so that it creates a plot that compares F_exact to F_fp. Adjust
the range on the x-axis so that it illustrates the inaccuracy of using F compared to the “exact”
value. Plot x vs. F for both methods on the same axis (the plot should appear inline in the
notebook).

(b) Algebraically rearrange the formula for F(x) to get a new version so that the computation
using F is more accurate than in part (a). Your derivation should be typeset in LATEXin the
notebook.

(d) Add code to the notebook that creates a second plot comparing F_fp and F2_fp over the
same x-range as in (a). Be sure to add labels to the plot, as well as a legend, as in part (a).

What to submit

Rename each of your jupyter notebooks, replacing “YOU” with your WatIAM ID. For example, I would
rename YOU_a1q1.ipynb to jorchard_a1q1.ipynb. Export each jupyter notebook as a PDF, and
submit each PDF to Crowdmark. If you want, you can typeset your solutions to Q3 and Q4 in LATEXor a
Word document, write electronically (as on a tablet), or write it by hand and submit a high-quality photo
or scan. It is your responsibility to ensure that your handwritten solutions are legible.

CS 370 Assignment 1 Floating-Point Numbers solution

Description

What to submit

Related products

CS 370 Assignment 3 solution

CS 370 HW5: Programming Assignment SYNCHRONIZATION of PRODUCER and CONSUMER Threads solution

CS 370 Assignment 2 Linear Algebra solution