Description
1 Overview This project involves three important concepts: sets, balanced binary search trees, and maps. • A set is a collection of distinct objects. • A balanced tree represents a set of n elements with a natural ordering so that the running time per operation is O(log n). Depending on the type of balanced tree, this time bound may be worst-case, expected-case, or amortized. • A map is an object that maps a finite set of keys to a collection of values. Each key can map to at most one value, and a map cannot contain duplicate keys. Maps correspond to the mathematical concept of a function. An example of a map is the function that maps the set of student ID numbers (integers) to student names (strings). In this project, you will gain a deeper understanding of these concepts by writing the following two classes. ABTreeSet: An implementation of sets based on α-balanced trees. Any access or update operation on an n-node α-balanced tree takes O(log n) amortized time. ABTreeMap: An implementation of maps that uses α-balanced trees. We will provide you with templates for ABTreeSet and ABTreeMap. You may add new instance variables and methods to these two classes, but you cannot rename or remove any existing variables or methods, or change any of these variables and methods from public to protected (or private), or vice versa. Note. Although the official due date is 11:59 pm, Friday, December 2, you may submit the assignment without penalty until 11:59 pm, Friday, December 9, 2016. 1 2 Introduction The time complexities of the basic operations on a binary search tree —contains(), add(), and remove()— are proportional to the height of the tree. In the ideal case, the height of an n-element tree is at most log2 n. If no precautions are taken, however, the height can be n − 1. There are a number of ways to guarantee that the height of a tree is O(log2 n); they all involve some sort of “rebalancing” after updates, thus these trees are sometimes called self-balancing trees. Self-balancing trees fall into roughly two categories. Height-balanced trees. Here, rebalancing is done to ensure that the heights of the left and right subtrees of any node do not differ by much. Weight-balanced trees. Here, rebalancing is done to ensure that the the sizes (numbers of elements) of the left and right subtrees of any node do not differ by much. Examples of height-balanced trees are AVL-trees,where the heights of the left and right trees at any node differ by at most one, and red-black trees, the heights of the left and right subtrees at any node can differ by a factor of at most two. Red-black trees are used in Java’s implementation of the TreeSet and TreeMap classes. AVL-trees and red-black trees are described in Wikipedia, where you can find links to additional information. We will not discuss height-balanced trees further here. This assignment deals with a special kind of weight-balanced tree, which we describe next. 3 α-Balanced Trees Let T be a binary search tree and let α be a constant, such that 1 2 < α < 1. Let x be any node in T, and size be the number of elements in the subtree rooted at x. We say x is α-balanced if (number of elements in x’s left subtree) ≤ α · size, (1) and (number of elements in x’s right subtree) ≤ α · size. (2) We say the tree T is α-balanced if every node is α-balanced. An α-balanced tree is shown in Figure 1. Simple math shows that the height of an n-node α-balanced tree is at most log1/α n (the idea is to first observe that the size of the subtree rooted at a node at depth k is at most α kn, and that the size of a non-empty subtree is at least 1). Since α is a constant, this means that contains, add, and remove take logarithmic time. However, adding or removing elements can lead to trees that no longer satisfy the balance conditions (1) and (2). α-Balanced trees maintain balance by periodically restructuring entire subtrees, rebuilding them so that they become 1 2 -balanced. The work required for rebalancing is O(n) in the worst case, but it can be shown that the amortized time for an add or remove is O(log n). Although a formal proof of this is beyond the scope of CS 228, the intuition is that rebalancing is relatively rare, in the same way that array doubling is rare in the FirstCollection class that we saw several weeks ago. Next, we explain the rebalancing method used by α-balanced trees, and how rebalancing is done after an update. 2 5 4 16 2 10 20 7 14 15 Figure 1: An α-balanced tree (which is a subtree of a larger tree) with α = 2/3. For example, consider the node containing 5. The total number of nodes in the subtree is 9. The left subtree has 2 nodes, so we have 2/9 ≤ 2/3. The right subtree has 6 nodes, so we have 6/9 ≤ 2/3. 3.1 The Rebalancing Operation Suppose x is some node in a BST, and that the subtree rooted at x has k nodes. The rebalancing operation rearranges the structure of a subtree rooted at x so that it has the same keys, but its height is at most log2 k. Rebalancing can be done using an inorder traversal of the subtree rooted at x. As we traverse the tree, we put the nodes, in order, into an array or ArrayList. The midpoint of the array will be the root of the new subtree, where as usual the midpoint is (first + last)/2. All the elements to the left of the midpoint will go into its left child, and all the elements to the right of the midpoint go into the right child. An example is shown in Figure 2. Perhaps the most natural way to construct the tree is to use recursion, as shown in Figure 3. Notes • Rebalancing a subtree is a purely structural operation that rearranges the links among existing nodes. You should not create any new nodes and you should not have to perform any key comparisons when rebalancing. • Rebalancing a subtree of size k should take O(k) time. • The operation may be performed on a subtree, so do not forget to update its parent if necessary. 3.2 Restoring Balance after Updates After we update a tree, we must check whether it remains α-balanced. If so, nothing more needs to be done. Otherwise, we must rebalance the tree. To be able to detect quickly whether the balance 3 2 4 5 7 10 12 14 15 16 20 20 10 4 15 2 5 7 12 14 16 5 4 16 2 10 20 7 14 12 15 Saturday, April 6, 2013 Figure 2: Rebalancing a subtree. conditions —inequalities (1) and (2)— are violated, we maintain for each node a count of the number of elements in that node’s subtree; note that this count includes the node itself. Whenever a node is added or removed, we need to iterate up the tree along the path to the root, starting with the node’s parent, updating the node counts. We also need to check whether any nodes along the path have become unbalanced, and identify the highest unbalanced node (if any) along that path. The rebalance operation should be performed on the node closest to the root. Figure 4 illustrates a tree with 31 elements prior to the addition of key 12. Using a value of α = 2/3, the tree is initially balanced. After 12 is added, two of the nodes along the path to the root become unbalanced: the nodes containing 5 and 16, respectively. We rebalance at the node containing 5, since it is the node closest to the root. 4 Task 1: ABTreeSet (120 Points) Your first task is to implement the class ABTreeSet, which extends Java’s AbstractSet abstract class, using α-balanced trees. The ABTreeSet class implements a set of elements with a natural ordering. Duplicate elements are not allowed. We also disallow null elements; any attempt to add a null element should result in a NullPointerException. The ABTreeSet class has the following signature. public class ABTreeSet < E extends Comparable <? super E > > extends AbstractSet The starting point for your implementation should be the sample code in ABTreeSet.java provided along with this assignment. ABTreeSet has methods in place to provide a public, read4 2 4 5 7 10 12 14 15 16 20 10 4 15 first mid last 2 4 5 7 12 14 15 16 20 5 7 5 7 7 12 14 12 14 14 20 16 16 20 20 2 2 (empty) (empty) (empty) Saturday, April 6, 2013 Figure 3: Recursive decomposition. 5 4 16 2 10 20 7 14 12 15 5 Unbalanced (7/10 > 2/3) Unbalanced (5/7 > 2/3) Added node -5 5 nodes 15 nodes 29 Figure 4: Adding key 12 to a balanced tree, using α = 2/3. The node containing 5 is the highest unbalanced node. 6 only view of the tree structure, and a public rebalance() method, which should implement the rebalancing operation described in Section 3.1. To avoid any problems with floating point arithmetic that could arise from using Inequalities (1) and (2), we represent α using two integer instance variables top and bottom that give its numerator and denominator; i.e., α = top/bottom. Then, Inequalities (1) and (2) are expressed as (number of elements in x’s left subtree) · bottom ≤ size · top, (3) and (number of elements in x’s right subtree) · bottom ≤ size · top. (4) The default value should be top = 2 and bottom = 3 (i.e., α = 2/3). The public interface BSTNode, provided with this assignment, defines the following readonly accessors for a node in a binary search tree (see the javadoc for details): BSTNode left(); BSTNode right(); BSTNode parent(); int count(); E data(); The left(), right(), parent(), and data() methods are self-explanatory. The count() method should return the total number of elements in the subtree rooted at that node. This method is needed to determine which, if any, nodes have become unbalanced as a result of an update, and is used to find the root of the subtree at which the rebalance operation must be applied (see Section 3.2). The method can be implemented by maintaining the size of the entire subtree, or by separately maintaining the sizes of the left and right subtrees. In any case, we require that the count() method run in constant time. ABTreeSet has an inner class Node that implements the BSTNode interface. You can make any modifications you wish to the inner class Node, provided that the class continues to conform to the BSTNode interface. The class ABTreeSet has two additional public methods: BSTNode root() Return the root of the tree. void rebalance(BSTNode bstNode) Perform a rebalance operation on the subtree rooted at the given node. There are three constructors. 7 public ABTreeSet() Default constructor. Builds a non-self-balancing tree. public ABTreeSet(boolean isSelfBalancing) If isSelfBalancing is true, builds a self-balancing tree with the default value α = 2/3. If isSelfBalancing is false, builds a non-self-balancing tree. public ABTreeSet(boolean isSelfBalancing, int top, int bottom) If isSelfBalancing is true, builds a self-balancing tree with α = top/bottom. If isSelfBalancing is false, builds a non-self-balancing tree (top and bottom are ignored). Throws an IllegalArgumentException if top/bottom is not strictly greater than 1/2 and strictly less than 1. ABTreeSet must override add(), contains(), remove(), size(), and iterator(). You must also override toString(), to display the current configuration of the underlying α-balanced for the set. This should be done according to the following rules: • Every node of the tree occupies a separate line with only its data on it. • The data stored at a node at depth d is printed with indentation 4d (i.e., preceded by 4d blanks). • Start at the root (at depth 0) and display the nodes in a preorder traversal. More specifically, suppose a node x is shown at line `. Then, starting at line ` + 1, – recursively print all nodes in the left subtree (if any) of x; – recursively print all nodes in the right subtree (if any) of x. • If a node has a left child but no right child, print its right child as null. • If a node has a right child but no left child, print its left child as null. • If a node is a leaf, print it with no further recursion. Figure 5 shows an α-balanced tree with 12 nodes, where α = 2/3. Figure 6 shows the output that should be generated by calling the toString() and System.out.println(). Summary. Your main tasks are as follows. 1. Implement a rebalance() operation for ABTreeSet. 2. Modify the Node class and the add(), remove(), and Iterator.remove() methods to maintain counts at each node. The count() method must be O(1). 3. Modify the add(), remove(), and Iterator.remove() methods so that, if the tree is constructed with the isSelfBalancing flag true, the tree is self-balancing. That is, if an operation causes any node to become unbalanced, a rebalance is automatically performed on the highest unbalanced node (which will always be somewhere along the path to the root). 8 30 55 53 60 62 35 31 37 25 20 10 50 Figure 5: An α-balanced tree, where α = 2/3. 50 30 25 10 null 20 null 35 31 37 55 53 60 null 62 Figure 6: The result of toString() for the tree of Figure 5. 9 Observe that items (1) and (2) can be done independently. Note the following. • The tree should maintain correct node counts whether or not it is self-balancing. • Any subtree can be explicitly rebalanced using the rebalance() method, whether or not the tree is self-balancing. 5 Task 2: ABTreeMap (50 Points) Your second task is to implement the ABTreeMap class, which uses an α-balanced tree to implement a mapping between a set of keys with a natural ordering and a collection of values. Duplicate key values are not allowed. We also disallow null keys or values; any attempt to add a null key or value should result in a NullPointerException. The ABTreeMap class has the following signature. public class ABTreeMap < K extends Comparable <? super K > , V > ABTreeMap has three constructors. public ABTreeMap() Default constructor. Builds a map that uses a non-self-balancing tree. public ABTreeMap(boolean isSelfBalancing) If isSelfBalancing is true, builds a map that uses self-balancing tree with the default value α = 2/3. If isSelfBalancing is false, builds a map that uses a non-self-balancing tree. public ABTreeMap(boolean isSelfBalancing, int top, int bottom) If isSelfBalancing is true, builds a map that uses a self-balancing tree with α = top/bottom. If isSelfBalancing is false, builds a map that uses a non-selfbalancing tree (top and bottom are ignored). Throws an IllegalArgumentException if top/bottom is not strictly greater than 1/2 and strictly less than 1. ABTreeMap has the following methods. public V put(K key, V value) Associates value with key in this map. Returns the previous value associated with key, or null if there was no mapping for key. public V get(K key) Returns the value to which key is mapped, or null if this map contains no mapping for key. 10 public V remove(K key) Removes the mapping for key from this map if it is present. Returns the previous value associated with key, or null if there was no mapping for key. public boolean containsKey(K key) Returns true if this map contains a mapping for key; otherwise, it returns false. public int size() Returns the number of key-value mappings in this map. public ABTreeSet keySet() Returns an ABTreeSet storing the keys (not the values) contained in this map. The tree structure of the ABTreeSet should be the same a the tree structure of this ABTreeMap. Example. Suppose this map consists of the following (key, value) pairs: (10, Carol), (21, Bill), (45, Carol),(81, Alice),(95, Bill). Then, the ABTreeSet returned should consist of 10, 21, 45, 81, 91. public ArrayList values() Returns an ArrayList storing the values contained in this map. Note that there may be duplicate values. The ArrayList should contain the values in ascending order of their corresponding keys. Example. Suppose this map consists of the following (key, value) pairs: (10, Carol), (21, Bill), (45, Carol),(81, Alice),(95, Bill). Then, the ArrayList returned should consist of the strings Carol, Bill, Carol, Alice, Bill, in that order. Note. Both keySet() and values() should be implemented by iterating through the ABTreeSet that represents the map. 6 Submission Write your classes in the edu.iastate.cs228.hw5 package. Turn in the zip file, not your class files. Please follow the guidelines posted under Documents & Links on Blackboard Learn. Also, follow project clarifications on Blackboard. Include the Javadoc tag @author in each class source file. Your zip file should be named Firstname_Lastname_HW5.zip. 11