Description
1. Write a C++ program that creates a binary search tree, empirically calculates the average search cost
for each node in such a tree, and removes a designated node. Your program should:
• Read integer data from the file designated by the user. Every line of the file contains one input
integer. Assume that each integer in the input data file is unique. (Note that there is no number
that gives the total number of integers in the file)
• Print the input data on the screen.
• Create a binary search tree based on the input data.
• Calculate the search cost for each node when it is inserted
– A binary node element has at least two data fields: Key and SearchCost
– For each node, count and store the number of comparisons required when searching for the
node (i.e., the number of comparisons = the search cost for the node = 1+ the depth of the
node)
• If the number of nodes is less than 24
, print on the screen the resulting tree along with the
information about the nodes (for each node, print both its key and search cost). Otherwise, print
this tree and information about its nodes into a file. See the example below.
• Remove a node designated by a user from the binary search tree and print the resulting tree on
the screen if the number of nodes is less than 24
. You should update the search costs of the
tree nodes if the costs have been changed upon node removal.
• Calculate the average search cost for all nodes in the tree
– Sum up the search cost over all the nodes in your tree as you traverse the tree in one of three
traversals
– divide the sum by the total number of nodes in the tree, and the result is the average search
cost for all the tree nodes
• Print total number of nodes and the average search cost
2. Example
Input data: 5, 3, 9, 7
Create a binary search tree:
Key = 5 SearchCost = 1
Key = 3 SearchCost = 2
Key = 9 SearchCost = 2
Key = 7 SearchCost = 3
Total number of nodes is 4
To generate output on the screen we use the in-order traversal. Here each node is represented as
Key[SearchCost]:
3[2] 5[1] 7[3] 9[2]
Sum of the search cost over all the nodes in the tree is: 2 + 1 + 3 +2 = 8. Average search cost: 8/4 = 2.
Average search cost is 2
2
3. Download files containing integer data from the course website.
(a) The files 1p.dat, 2p.dat, …, 12p.dat contain 21 − 1, 22 − 1,…, and 212 − 1 integers respectively.
The integers make 12 perfect binary trees where all leaves are at the same depth. Calculate
and record the average search cost for each perfect binary tree.
(b) The files 1r.dat, 2r.dat, …, 12r.dat contain same number of integers as above. The integers are
randomly ordered and make 12 random binary trees. Calculate and record the average search
cost for each tree.
(c) The files 1l.dat, 2l.dat, …, 12l.dat contain same number of integers as above. The integers are in
increasing order and make 12 linear binary trees. Calculate and record the average search cost
for each tree.
(d) Make a plot showing the average search cost (y-axis) versus the number of tree nodes (x-axis) of
all perfect binary trees, random binary trees and linear binary trees.
(e) Provide the output in the text format. The nodes will be printed according to their depth level.
Missing nodes will be represented by the symbol X. A possible solution is to fill the missing nodes
with fake nodes in the data structure when printing the tree.
Example:
5
3 9
X X 7 X
4. Extra credit.
(a) (10 points) Use the in-order traversal and any graphic programming library like FLTK for binary
tree drawing. See the textbook, page 301 for the algorithm.
(b) (10 points) Implement a balanced binary search tree using one of the balancing technique like
AVL, Red-Black or 2-4 tree.
Report (30 points)
Write a brief report that includes the following:
1. A description of the assignment objective, how to compile and run your programs, and an explanation
of your program structure (i.e. a description of the classes you use, the relationship between the classes,
and the functions or classes in addition to those in the lecture notes).
2. A brief description of the data structure you create (i.e., a theoretical definition of the data structure
and the actual data arrangement in the classes).
3. A description of how you implement the calculation of individual search cost and average search cost. Is
the implementation associated with the operations find, insert, remove? Analyze the time complexity
of updating the search cost of an individual node and summing up the search costs over all the
nodes.
4. Give individual search cost in terms of n using big-O notation. Analyze and give the average search
costs of a perfect binary tree and a linear binary tree using big-O notation, assuming that the following
formulas are true (n denotes the total number of integers).
Plog2
(n+1)−1
d=0 2
d
(d + 1) ≃ (n + 1) · log2
(n + 1) − n and Pn
d=1 d ≃ n(n + 1)/2
5. Include a table and a plot of average search costs you obtain. In your discussions of the experimental
results, compare the curves of search costs with your theoretical analysis results in item 4.