## Description

1. Write a C++ program that creates a binary search tree, empirically calculates the average search cost

for each node in such a tree, and removes a designated node. Your program should:

• Read integer data from the file designated by the user. Every line of the file contains one input

integer. Assume that each integer in the input data file is unique. (Note that there is no number

that gives the total number of integers in the file)

• Print the input data on the screen.

• Create a binary search tree based on the input data.

• Calculate the search cost for each node when it is inserted

– A binary node element has at least two data fields: Key and SearchCost

– For each node, count and store the number of comparisons required when searching for the

node (i.e., the number of comparisons = the search cost for the node = 1+ the depth of the

node)

• If the number of nodes is less than 24

, print on the screen the resulting tree along with the

information about the nodes (for each node, print both its key and search cost). Otherwise, print

this tree and information about its nodes into a file. See the example below.

• Remove a node designated by a user from the binary search tree and print the resulting tree on

the screen if the number of nodes is less than 24

. You should update the search costs of the

tree nodes if the costs have been changed upon node removal.

• Calculate the average search cost for all nodes in the tree

– Sum up the search cost over all the nodes in your tree as you traverse the tree in one of three

traversals

– divide the sum by the total number of nodes in the tree, and the result is the average search

cost for all the tree nodes

• Print total number of nodes and the average search cost

2. Example

Input data: 5, 3, 9, 7

Create a binary search tree:

Key = 5 SearchCost = 1

Key = 3 SearchCost = 2

Key = 9 SearchCost = 2

Key = 7 SearchCost = 3

Total number of nodes is 4

To generate output on the screen we use the in-order traversal. Here each node is represented as

Key[SearchCost]:

3[2] 5[1] 7[3] 9[2]

Sum of the search cost over all the nodes in the tree is: 2 + 1 + 3 +2 = 8. Average search cost: 8/4 = 2.

Average search cost is 2

2

3. Download files containing integer data from the course website.

(a) The files 1p.dat, 2p.dat, …, 12p.dat contain 21 − 1, 22 − 1,…, and 212 − 1 integers respectively.

The integers make 12 perfect binary trees where all leaves are at the same depth. Calculate

and record the average search cost for each perfect binary tree.

(b) The files 1r.dat, 2r.dat, …, 12r.dat contain same number of integers as above. The integers are

randomly ordered and make 12 random binary trees. Calculate and record the average search

cost for each tree.

(c) The files 1l.dat, 2l.dat, …, 12l.dat contain same number of integers as above. The integers are in

increasing order and make 12 linear binary trees. Calculate and record the average search cost

for each tree.

(d) Make a plot showing the average search cost (y-axis) versus the number of tree nodes (x-axis) of

all perfect binary trees, random binary trees and linear binary trees.

(e) Provide the output in the text format. The nodes will be printed according to their depth level.

Missing nodes will be represented by the symbol X. A possible solution is to fill the missing nodes

with fake nodes in the data structure when printing the tree.

Example:

5

3 9

X X 7 X

4. Extra credit.

(a) (10 points) Use the in-order traversal and any graphic programming library like FLTK for binary

tree drawing. See the textbook, page 301 for the algorithm.

(b) (10 points) Implement a balanced binary search tree using one of the balancing technique like

AVL, Red-Black or 2-4 tree.

Report (30 points)

Write a brief report that includes the following:

1. A description of the assignment objective, how to compile and run your programs, and an explanation

of your program structure (i.e. a description of the classes you use, the relationship between the classes,

and the functions or classes in addition to those in the lecture notes).

2. A brief description of the data structure you create (i.e., a theoretical definition of the data structure

and the actual data arrangement in the classes).

3. A description of how you implement the calculation of individual search cost and average search cost. Is

the implementation associated with the operations find, insert, remove? Analyze the time complexity

of updating the search cost of an individual node and summing up the search costs over all the

nodes.

4. Give individual search cost in terms of n using big-O notation. Analyze and give the average search

costs of a perfect binary tree and a linear binary tree using big-O notation, assuming that the following

formulas are true (n denotes the total number of integers).

Plog2

(n+1)−1

d=0 2

d

(d + 1) ≃ (n + 1) · log2

(n + 1) − n and Pn

d=1 d ≃ n(n + 1)/2

5. Include a table and a plot of average search costs you obtain. In your discussions of the experimental

results, compare the curves of search costs with your theoretical analysis results in item 4.