Description
I. Endianness
Consider the following program:
#include <iostream>
#include <iomanip>
#include <fstream>
using namespace std;
int main()
{
    int i = 0x87654321;
    char u[5] = "unix";
    ofstream myfile;
    myfile.open("fileout.txt", ios::binary);
    myfile.write((char *)&i, sizeof(i));   // raw bytes of i, in memory order
    myfile.write((char *)u, 4);            // the 4 characters, without the trailing '\0'
    myfile.close();
}
- Run this program on turing/hopper.
- a) List the bytes of the output file in hex, in order of increasing address. Here is one way to see the bytes in order; there are surely others:
emacs -nw fileout.txt // start emacs
<esc>x hexl-mode // after you type <esc>x, the cursor will drop to the bottom line for
// entry of the rest of the command
<cntl>x<cntl>c // exit emacs
These commands use the hex mode of the emacs editor. Remember that <esc>x means to press the escape key, then the letter ‘x’.
Emacs hex mode uses a three-column display. The top line of the display gives the header for each column. The first column contains the hex address of the beginning of the line. The header ‘87654321’ indicates that the address could contain up to 8 hex digits. The second column contains 16 bytes of data represented in hex. The heading ‘0011…’ indicates that each byte contains two hex digits and therefore requires two characters to print. The third column represents the same data in ASCII and therefore only requires one character per byte.
- b) What does this tell you about turing/hopper: are they big-endian or little-endian? How do you know?
- c) Run the following command: od -t cx1 fileout.txt
Explain the result, i.e., what do you see, and why? (Use man od if you need more information.)
- d) Same as previous question using the following command: od -t cx2 fileout.txt
- e) Same as previous question using the following command: od -t cx4 fileout.txt
- f) Run the following command: od -t cx fileout.txt
What does this tell you about the default size that od uses for the x type of its -t option?
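Besides emacs and od, a few lines of C++ can also show the bytes one at a time in file order. The following is only a minimal sketch and not part of the assignment; the function name `hex_bytes` is hypothetical. It reads the file in binary mode and formats every byte separately as two hex digits, so no grouping into 2-byte or 4-byte units takes place.

```cpp
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns the bytes of a file, in file order,
// as two-digit hex values separated by single spaces.
std::string hex_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    char c;
    bool first = true;
    while (in.get(c)) {
        if (!first) out << ' ';
        first = false;
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out << std::setw(2)
            << static_cast<unsigned>(static_cast<unsigned char>(c));
    }
    return out.str();
}
```

Calling `hex_bytes("fileout.txt")` from a small `main` and printing the result gives a listing you can compare against the emacs and od displays.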
II. Character representation
- Show the representation of each of the given strings in each of the formats listed.
| String | Coding system | Representation |
| ‘Bb2’ | Unicode (code points) | a) |
| ‘Bb2’ | Ascii (hex) | b) |
| ‘Bb2’ | UTF‑8 (hex) | c) |
| ‘óáé’ | Unicode (code points) | d) |
| ‘Ūūš’ | Unicode (code points) | e) |
- How many bytes would each of the given strings occupy in the given format?
| String | Ascii | UTF-8 | UTF-16 | UTF-32 |
| ‘Bb2’ | | | | |
| ‘óáé’ | n/a | | | |
| ‘Ūūš’ | n/a | | | |
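If you want to check your byte counts, the encoding rules can be written down in a few lines. This is a rough sketch under stated assumptions, not an official encoder: the names `utf8_encode` and `utf16_bytes` are hypothetical, and the input is assumed to be a valid code point (not a surrogate, and no larger than U+10FFFF). UTF-32 needs no function, since every code point takes 4 bytes.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
// Assumes cp is a valid scalar value (no surrogates, cp <= 0x10FFFF).
std::string utf8_encode(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Bytes needed for the same code point in UTF-16: one 16-bit unit
// inside the BMP, two units (a surrogate pair) above it.
int utf16_bytes(std::uint32_t cp) { return cp < 0x10000 ? 2 : 4; }
```

For example, `utf8_encode(0x20AC)` (the euro sign, which is not one of the characters in the charts) yields the three bytes E2 82 AC, and `utf8_encode(cp).size()` gives the UTF-8 byte count for any single code point.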
- Show the representation of each of the following values as a single hex byte.
| Value | hex byte |
| unsigned binary 3 | a) |
| binary +3 (two’s complement form) | b) |
| binary -3 (two’s complement form) | c) |
| Ascii ‘3’ | d) |
| UTF‑8 ‘3’ | e) |
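The difference between a number and the character for a digit is easy to see in code. The sketch below is only an illustration with hypothetical helper names (`hex_byte`, `twos_complement_byte`), and it deliberately uses the values 5, -5, and the character '7' rather than the values in the chart. It assumes the usual 8-bit byte and an ASCII-compatible character set.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one byte as two hex digits.
std::string hex_byte(std::uint8_t b)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0') << std::setw(2)
        << static_cast<unsigned>(b);
    return out.str();
}

// Two's complement byte for a small signed value: converting to an
// 8-bit unsigned type reduces the value modulo 256.
std::string twos_complement_byte(int v)
{
    return hex_byte(static_cast<std::uint8_t>(v));
}
```

Here `twos_complement_byte(5)` gives 05, `twos_complement_byte(-5)` gives fb (256 - 5 = 251), and `hex_byte('7')` gives 37, the code for the character rather than the number.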
- Character subsets
Fill in the following chart on the answer sheet. For each of these symbols, look up its Unicode name on the web. For example, the Unicode name for the character ‘e’ is “Latin Small Letter E” and ‘é’ is “Latin Small Letter E with Acute”. Then specify whether each of these characters is in the Ascii subset of Unicode, in ISO‑8859‑1, and/or in the BMP.
Note that in older Unicode terminology, “white” means an outline character, “black” means a filled-in graphic, and “heavy” means a character with a thicker outline. These terms are no longer used for new characters, and they have nothing to do with race.
| Symbol | Unicode code point | In Ascii? (yes/no) | In ext ASCII ISO-8859-1? (yes/no) | In BMP? (yes/no) | Unicode name |
| o | U+006F | | | | |
| ö | U+00F6 | | | | |
| ☺ | U+263A | | | | |
| 🦃 | U+1F983 | | | | |
Fill in the following charts (on the answer sheet). The first chart shows the number of Unicode code points in the basic multilingual plane (BMP, i.e., plane 0) that need 1 byte, 2 bytes, etc. for their representation in UTF-8. The second and third charts show the same thing for UTF-16 and UTF-32.
Express code points as U+hhhh, where hhhh is a 4-digit hex number, e.g., U+0000 is the first code point in the Unicode system.
Hints:
- a) The BMP extends from U+0000 to U+FFFF, which is 65536 entries. But U+D800 through U+DFFF don’t represent characters because they are reserved as surrogates for UTF-16, so the total size of the BMP is 65536 - 2048 or 63488 characters. Therefore the numbers in each chart should sum to 63488.
- b) Many of the entries in each table are 0. Study the charts in the slides and fill in the 0’s first.
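The total in hint a) can be confirmed by brute force. The following is a small sketch, with hypothetical function names, that walks the whole BMP and skips the excluded range U+D800 through U+DFFF (the surrogate code points that UTF-16 reserves for encoding characters above the BMP).

```cpp
#include <cstdint>

// True for U+D800..U+DFFF, the range excluded in hint a).
constexpr bool is_surrogate(std::uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// Counts the BMP code points that remain after the exclusion.
int count_bmp_scalar_values()
{
    int count = 0;
    for (std::uint32_t cp = 0x0000; cp <= 0xFFFF; ++cp) {
        if (!is_surrogate(cp)) ++count;
    }
    return count;
}
```

`count_bmp_scalar_values()` returns 63488, matching 65536 - 2048. The same loop could be extended with a per-encoding byte count to check your charts once you have filled them in by hand.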
- UTF‑8
| Category | Range in decimal | First code point | Last code point | 1 byte | 2 bytes | 3 bytes | 4 bytes |
| ASCII | 0-127 | | | | | | |
| ext. ASCII ISO 8859-1 | 128-255 | | | | | | |
| rest of BMP | 256-65535 | | | | | | |
- UTF-16
| Category | Range in decimal | First code point | Last code point | 1 byte | 2 bytes | 3 bytes | 4 bytes |
| ASCII | 0-127 | (same) | (same) | | | | |
| ext. ASCII ISO 8859-1 | 128-255 | (same) | (same) | | | | |
| rest of BMP | 256-65535 | (same) | (same) | | | | |
- UTF-32
| Category | Range in decimal | First code point | Last code point | 1 byte | 2 bytes | 3 bytes | 4 bytes |
| ASCII | 0-127 | (same) | (same) | | | | |
| ext. ASCII ISO 8859-1 | 128-255 | (same) | (same) | | | | |
| rest of BMP | 256-65535 | (same) | (same) | | | | |
- Relationship between normalized and denormalized numbers
- a) For a given floating-point format, the smallest normalized number is approximately equal to the largest denormalized number. Why?
- b) If you don’t round off, which of the two is always larger? Why?
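To experiment with the two values, you can ask the C++ standard library for them. This is a minimal sketch that assumes `float` is an IEEE 754 single-precision type with denormalized numbers enabled (the usual case, but not guaranteed everywhere); the function names are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Smallest positive normalized float.
float smallest_normal()
{
    return std::numeric_limits<float>::min();
}

// Largest denormalized float: the representable value that is the
// immediate neighbor of the smallest normalized one, toward zero.
float largest_denormal()
{
    return std::nextafter(smallest_normal(), 0.0f);
}
```

Printing both values with about nine significant digits (for example with `std::setprecision(9)`) makes the comparison concrete; explaining the result from the bit patterns is still up to you.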
- Consider the following piece of code. The answer sheet contains the assembler’s symbol table in the form of a chart. (Although you can see all the columns, note that in real life an entry would not be added to the symbol table until the first time the variable appears in the code.)
The rows show the value of the entries after the execution of each line of code. Leave the entry for a variable blank if the variable has not been added to the symbol table yet. Enter 000 (using an apostrophe if necessary to allow initial zeroes) if the entry has been added but has not been given a value yet. Once the assembler has given the variable a value, enter that value.
I have filled in the first two lines for you. I have also completed the first column so that you can see the benefit of a two-pass assembler. Your job is to fill in the rest of the chart.
Whenever the value of an entry in the symbol table changes, highlight the new value. In other words, whenever a value is different from the value above it in the same column, highlight the new value.
100          LOAD   Addr     Initialize Next with
101          STORE  Next     starting address
102          LOAD   Num      Initialize Ctr
103          ADD    Num1
104          STORE  Ctr
105  Loop    LOAD   Sum
106          SUBT   Num1
107          STORE  Sum
108          JUMP   Loop
109  Addr    HEX    114
110  Next    HEX    0
111  Num     DEC    5
112  Sum     DEC    16
113  Ctr     HEX    0
114  Num1    DEC    10
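For comparison, the first pass of a two-pass assembler can be sketched in a few lines. This is a simplified illustration, not the assembler used in class: the `SourceLine` type and `build_symbol_table` function are hypothetical, and the pass only records where each label is defined without translating any instructions.

```cpp
#include <map>
#include <string>
#include <vector>

// One source line: its address and an optional label ("" if none).
struct SourceLine {
    int address;
    std::string label;
};

// First pass of a hypothetical two-pass assembler: record the address
// of every labelled line. Operands are resolved later, in pass two.
std::map<std::string, int> build_symbol_table(const std::vector<SourceLine>& lines)
{
    std::map<std::string, int> table;
    for (const SourceLine& line : lines) {
        if (!line.label.empty()) table[line.label] = line.address;
    }
    return table;
}
```

Because every label already has its address when the second pass begins, a reference to a label that is defined further down the program needs no placeholder. In the chart, by contrast, you track the moments at which an entry is still waiting for its value.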


