## Description

I. INTRODUCTION A binary code is a representation of set of symbols that assigns a distinct bit string—a code word—to every symbol in the set. A binary preﬁx code is a binary code where no bit string of the code is a preﬁx of another. With a binary preﬁx code, we can unambiguously decode any string of bits without marks separating the code words. A binary preﬁx code can be constructed from any two-tree with the symbols positioned at the leaves of the tree. Associate a 0 to the left child and a 1 to the right child of every node with children. The code word of a symbol is the sequence of bits read along the unique path from the root to the leaf holding the symbol. Figure 1 shows a preﬁx code and the two-tree from which is was generated. Suppose that we are given a set of n symbols together with their relative frequencies of occurrence. Let fi and si denote the relative frequency and code word length, respectively, of symbol ai, for 1 ≤ i ≤ n. The symbols are encoded with a binary preﬁx code. The average length of the code is given by n ∑

i=1

fisi. (1)

An optimal binary preﬁx code is a preﬁx code of minimum average length. David Huffman, in 1951, invented a method to derive such codes. In his honor, optimal binary preﬁx codes bear his name. Today, Huffman’s codes are

symbol code word

a 00

b 010

c 011

d 10

e 110

f 111

a d

fe

Code Two-tree representa on

b c

Fig. 1. A preﬁx code for the set of symbols {a,b,c,d,e}.

2

symbol freq. code word

a 0,22 011

b 0,03 0000

c 0,14 001

d 0,14 010

e 0,41 1

f 0,06 0001

Code Two-tree representa on

b f

ad

e

c

Fig. 2. A Huffman’s code for the set of symbols {a,b,c,d,e}.

widely used to compress data; they are used, for example, in compressing MP3 ﬁles. Here is a brief description of Huffman’s method in prose. The method builds a two-tree with the symbols positioned at the leaves. It starts with a forest, where each symbol and its associated frequency is the root of a trivial tree with just one node. At each iteration, the algorithm ﬁnds the two trees whose roots have the smallest frequency over all trees in the current forest. A new node is created having the previous two roots as children, so that a new tree is formed from the created node and its two children. The frequency of the new node is the sum of the frequencies of its two children. The algorithm terminates after n−1 iterations. Figure 2 shows a Huffman’s code and the ﬁnal two-tree of Huffman’s method for the same set of symbols as in Figure 1. The average length of the Huffman’s code is 2,27 whereas, for the same relative frequencies, the average length of the preﬁx code of Figure 1 is 2,64.

II. YOUR ASSIGNMENT

What you have to do. • Implement the procedure GenerateCode(Root, Code) that receives as input a two-tree Root with the symbols at the leaves and produces as output the corresponding binary preﬁx code Code. Your code should be efﬁcient. • Implement the procedure Decode(Root, InString, OutString) that receives as input a two-tree Root representing a binary preﬁx code and a bit string InString and produces as output the decoded string of symbols OutString. • Write program PreﬁxCode that the reads a sequence of symbols from a ﬁle (symbols are separated by spaces or returns) and prints to screen any binary preﬁx code of your choice for the set of symbols. Then, the program reads binary strings from the keyboard and prints the decoded string of symbols to the screen. • Implement procedure HuffmanCode(Symbols, Freq, Code) that receives as input an array Symbols of symbols and an array Freq with their relative frequencies, and produces as output a Huffman’s code Code for the set of symbols. Your code should be efﬁcient.

3

• Write a program Huffman that reads a sequence of symbols and their relative frequencies from a ﬁle (a symbol is separated from its frequency by a space or returns; a frequency is separated from the next symbol by spaces or returns) and prints a Huffman’s code for the set of symbols. What do you have to deliver, how, and when. • You have to deliver your code and a report of no more than three pages containing a text explanation of your algorithms, their pseudo-codes, and a short discussion. • The code and the report should be sent in a .zip ﬁle to my email address with subject p1.