CPS222 Lecture: Heaps; Priority Queues               Last revised 1/25/2015

Objectives

1. To show how a complete binary tree can be mapped straightforwardly to an
   array.
2. To define a heap, and show how a heap can be maintained.
3. To show how a heap can be used to implement a priority queue.

I. Heaps
-  -----

   A. In today's lecture, we're going to cover the same ground as the
      assigned section in the book, but in a different order.

      1. We will begin by talking about a special kind of binary tree known
         as a heap. We will then show a special use of heaps - to implement
         a data structure known as a priority queue.

      2. Your text starts by discussing priority queues, and then introduces
         heaps as a way of implementing them.

   B. Recall that in talking about binary trees we defined the notion of a
      complete binary tree.

      1. A complete binary tree (called "almost-complete" by some writers)
         is a binary tree having the following properties:

         a. If the height of the tree is h, then all leaves lie at level h
            or at level h - 1.

         b. If any node has a descendant at level h in its right subtree,
            then all of the leaves in its left subtree are at level h.

         Ex:       A                  A
                  / \                / \
                 B                  B   C
                                   / \  /
                                  D  E F

         Recall: a perfect binary tree can be converted to a complete, but
         not perfect, binary tree of the same height by removing nodes on
         the lowest level, starting from the right and working toward the
         left. If all the nodes on the lowest level are removed this way,
         one ends up with another perfect tree of height one less.

      2. We also showed that, in a complete binary tree of height h, there
         are at least 2^(h-1) nodes and at most 2^h - 1 nodes.

   C. There is a correspondence between an array and a COMPLETE binary tree.

      1. Consider what happens if we number the nodes in a complete binary
         tree, using level order - e.g.:

                        1
                      /   \
                     2     3
                    / \   / \
                   4   5 6   7
                  / \
                 8   9

      2. Observe that the following relationship holds between the number of
         a node and the number of its children: if m is the number of a
         node, then 2m is the number of its left child (unless 2m exceeds
         the number of nodes in the tree, in which case it has no left
         child). Likewise, 2m+1 is the number of its right child, unless
         2m+1 exceeds the number of nodes.

      3. Likewise, if m is the number of a node, then m / 2 (using integer
         division) is the number of its parent - unless m / 2 = 0 (m = 1) -
         in which case the node is the root of the tree and has no parent.

      4. A complete binary tree, then, can be represented by an array
         without using any pointers. Furthermore, in such a representation
         it is easy to go from a node to its children and also from a child
         back to its parent.

         (When implementing such an array in C/C++/Java, it is convenient
         not to use slot 0 in the array, storing the nodes in slots
         1 .. size of tree, which means the total space allocated for the
         tree is one more slot than actually used. There are ways to use
         slot 0 as a header slot for certain operations, or it can be used
         to store information about the total number of nodes in the tree.)

      5. Example: the tree

                     APPLE
                    /     \
               BANANA      CHERRY
                /
            DOGWOOD

         can be represented by the array:

            [1]      [2]      [3]      [4]
            APPLE    BANANA   CHERRY   DOGWOOD

         and the array

            [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]
             A    C    F    G    I    M    Q    Z

         represents the tree

                        A
                      /   \
                     C     F
                    / \   / \
                   G   I M   Q
                  /
                 Z

   D. One special kind of complete binary tree is known as a HEAP. A heap is
      a binary tree with the following properties:

      1. The STRUCTURE PROPERTY: it is complete.

      2. The HEAP ORDER PROPERTY: the key at each node is <= the key at each
         of its children (if it has any).

      3. Examples

         a. Both of the above trees are heaps.

         b. Example: the following is not a heap

                           CAT
                         /     \
                      EEL       AARDVARK
                     /   \       /    \
                 ZEBRA RACCOON FOX   SNAKE

            Why?

            ASK

            The heap order property is violated by AARDVARK, because it is
            not true that CAT <= AARDVARK.

         c. Example: the following is not a heap

                           CAT
                         /     \
                      EEL       FOX
                     /   \         \
                 ZEBRA RACCOON     SNAKE

            Why?
            ASK

            The heap structure property is violated by SNAKE: it is a right
            child of FOX, but FOX has no left child, so the tree is not
            complete.

      4. Note: nothing is said about the relative order of the keys of the
         children - only the relationship between the parent and the child.
         Thus, both of the following are heaps:

               CAT                CAT
              /   \      and     /   \
            DOG   FOX          FOX   DOG

      5. Note that this definition defines what is sometimes called a
         "minheap" because the key at the root is the minimum of all the
         keys in the tree. It is also possible to define a "maxheap" by
         changing the <= requirement in the heap order property to >=.

II. Maintaining a heap
--  ------------------

   A. We now consider the basic strategy for maintaining a heap. We need to
      support two basic operations:

      1. Construction: inserting new items into the heap either
         incrementally, or en masse (creating a heap from scratch from a
         mass of data).

         a. This can be done in O(log n) time per item incrementally.

         b. It can be done in amortized O(1) time per item en masse.

      2. Removing the item with smallest value from the heap. (Finding it is
         easy - it is always the top of the heap - what's a bit more
         complicated is replacing it with the next smallest value.) This can
         be done in O(log n) time.

      3. We do NOT consider an operation for removing a SPECIFIC item from
         the heap. As it turns out, such an operation is not needed for the
         uses of heaps we discuss, and would take O(n) time just to FIND the
         specific item, since a heap is not intended as a search structure.

      4. Three preliminary remarks:

         a. We represent the heap by a data structure consisting of a count
            of the number of items currently in the heap (n) and an array of
            actual items (in slots [1] .. [n]). We assume that the array has
            additional space available for adding new items - so to add an
            item we can increment n, which makes the slot just past the old
            end of the heap (old n, plus 1) part of the heap, and then
            adjust the information in the heap appropriately.

         b. Because a heap is a complete binary tree, we know that its
            height is <= floor(log n) + 1 (log base 2), since a complete
            binary tree of height h has at least 2^(h-1) nodes.
            Hence, any operation that performs at most one operation at each
            level in the tree takes time O(log n).

         c. The algorithms I'm presenting differ in some details from the
            ones in the book, but are essentially the same.

   B. Constructing a Heap

      1. The strategy for incremental construction is this: to add a new
         node to a heap:

         a. Declare slot n+1 to be part of the heap. Call this the vacant
            slot.

         b. Perform the following operation repeatedly:

            i. Consider the parent of the vacant slot (slot
               (vacant slot) / 2). If the parent does not exist (vacant slot
               is 1) or the current contents of the parent slot <= the new
               item, quit this loop.

            ii. Otherwise, move the contents of the parent slot into the
                vacant slot and declare the parent slot to be the vacant
                slot.

         c. When the loop is done, insert the new entry in the vacant slot.

         d. Example: Add 3 to the following heap

                           1
                     4           2
                  7     5     10    9
                8

            - Initially, vacant slot is right child of 7.

                           1
                     4           2
                  7     5     10    9
                8   _

            - Since 7 > 3, move 7 into the vacant slot and declare its slot
              the vacant slot.

                           1
                     4           2
                  _     5     10    9
                8   7

            - 4 is the parent of the new vacant slot. Since 4 > 3, move 4
              into the vacant slot and declare its slot vacant.

                           1
                     _           2
                  4     5     10    9
                8   7

            - 1 is the parent of the new vacant slot. Since 1 <= 3, stop.

            - Put 3 into the vacant slot

                           1
                     3           2
                  4     5     10    9
                8   7

         e. Clearly, this process is O(h) = O(log n)

      2. If we have all the entries available to us at the outset, we can
         build the heap more efficiently as follows:

         a. Initially just put the entries into the array representation in
            any order. The result, viewed as a binary tree, will satisfy the
            heap structure property, but not the heap order property.

         b. Convert this to a structure satisfying the heap order property -
            the algorithm for this is given in section 8.3.6 of the book
            (where it is called bottom-up heap construction).

         c. The book gives an analysis that shows that the cost of building
            the entire heap this way is O(n), which makes the amortized cost
            per entry O(1).

   C. Removing the minimum item from a heap (removeMin)

      1. The algorithm is similar to that for incremental construction.
         Since the minimum item is to be removed from the heap, we consider
         its slot (the root) to be vacant. Likewise, since the size of the
         heap is being reduced to n-1, we must find a new home for the item
         currently in slot n (the displaced item).

         a. Perform the following process repeatedly:

            i. Consider the child or children of the vacant slot (slots
               2 * (vacant slot) and 2 * (vacant slot) + 1).

               - If neither is part of the heap (2 * (vacant slot) > new
                 heap size), quit this loop.

               - If there are two children, consider the child item with
                 the smaller value - we call the slot where this occurs the
                 child slot. (If there is only one child, its slot is the
                 child slot.)

               - If the displaced item is <= this child, quit this loop.

            ii. Otherwise, move the child item into the vacant slot, and
                consider the child slot to be the new vacant slot.

         b. When the loop is done, put the displaced item in the vacant
            slot.

      2. Example: Remove the smallest item from the following heap:

                           1
                     3           2
                  4     5     10    9
                8   7

         - Initially, the displaced item is 7. The vacant slot is the one
           that contained 1

                           _                    Displaced item = 7
                     3           2
                  4     5     10    9
                8

           (Note that the slot that contained 7 is no longer considered part
           of the heap)

         - Since 2 is the smaller child of the vacant slot, and 7 > 2, move
           2 into the vacant slot and make its slot the new vacant slot.

                           2                    Displaced item = 7
                     3           _
                  4     5     10    9
                8

         - Since 9 is the smaller child of the vacant slot, and 7 <= 9,
           stop. Put the displaced item - 7 - into the vacant slot

                           2
                     3           7
                  4     5     10    9
                8

      3. Clearly, this process is O(h) = O(log n) - Why?

         ASK

III. Uses for Heaps
---  ---- --- -----

   A. One use discussed in the book is to represent a priority queue. (We
      assume here that smaller numbers mean higher priority - e.g.
      "priority 1" beats "priority 2").

      1. A priority queue is often used in conjunction with some kind of
         server that provides services on a priority basis - e.g.

         a. A priority CPU scheduler in an operating system assigns the CPU
            to the process with the smallest priority value.

         b. The scheduler associated with a print queue might print the
            shortest job (in terms of number of pages) first.

      2. The principal operation a priority queue needs to support is to
         find the entry with the smallest priority value and remove it from
         the queue. (The book calls this removeMin).

      3. Note that, with a heap based on priority values, the smallest value
         is always found "at the top of the heap". Because we can remove
         this entry and replace it with the one having the next smallest
         priority value easily (as we saw in section II), we can use a heap
         as a priority queue.

   B. Another use of heaps is in event-driven simulations of some system.

      1. Example: simulate the operation of a bank. Events are

         a. New customer arrives and gets in line

         b. Finish processing a customer transaction

      2. The heart of such a simulation is the "event list" which maintains
         a list of simulated events in the order in which they occur.

      3. The principal operation the event list needs to support is the
         ability to find the next event that is scheduled to occur and
         remove it from the event list. Again, a heap based on the scheduled
         time for events works well for this.
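
Sketch: the array representation and the two maintenance operations described
in these notes can be put together as a small priority queue. What follows is
only a minimal illustration under stated assumptions, not the book's code: the
class name MinHeap and its members (insert, removeMin, at, siftDown) are
hypothetical names chosen here, the keys are assumed to be plain ints, and a
std::vector stands in for the array, with slot 0 left unused.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration class (not from the lecture or the book).
// A minheap of ints stored in slots [1] .. [n]; slot 0 is unused.
class MinHeap {
public:
    MinHeap() : slots_(1) {}                    // only the unused slot 0

    // En masse (bottom-up) construction: copy the items in any order,
    // then restore heap order from the last parent back to the root.
    explicit MinHeap(const std::vector<int>& items) : slots_(1) {
        slots_.insert(slots_.end(), items.begin(), items.end());
        for (std::size_t m = size() / 2; m >= 1; --m)
            siftDown(m, slots_[m]);
    }

    std::size_t size() const { return slots_.size() - 1; }
    bool empty() const { return size() == 0; }
    int at(std::size_t m) const { return slots_[m]; }   // contents of slot m

    // Incremental construction: slot n+1 becomes the vacant slot, and
    // larger parents move down into it until the new item fits.
    void insert(int item) {
        slots_.push_back(item);                 // slot n+1 joins the heap
        std::size_t vacant = size();
        while (vacant > 1 && slots_[vacant / 2] > item) {
            slots_[vacant] = slots_[vacant / 2];
            vacant /= 2;                        // parent slot is now vacant
        }
        slots_[vacant] = item;
    }

    // Remove and return the smallest item; the item from slot n is the
    // displaced item that must find a new home.
    int removeMin() {
        assert(!empty());
        int smallest = slots_[1];
        int displaced = slots_.back();
        slots_.pop_back();                      // heap size is now n-1
        if (!empty()) siftDown(1, displaced);
        return smallest;
    }

private:
    // Treat slot `vacant` as empty and move smaller children up into it
    // until the displaced item can be placed without violating heap order.
    void siftDown(std::size_t vacant, int displaced) {
        const std::size_t n = size();
        while (2 * vacant <= n) {               // at least a left child
            std::size_t child = 2 * vacant;
            if (child + 1 <= n && slots_[child + 1] < slots_[child])
                ++child;                        // right child is smaller
            if (displaced <= slots_[child]) break;
            slots_[vacant] = slots_[child];
            vacant = child;
        }
        slots_[vacant] = displaced;
    }

    std::vector<int> slots_;
};
```

For a priority queue or an event list, the int key would be the priority value
or the scheduled event time; storing a (key, payload) pair instead of a bare
int is a natural extension that this sketch leaves out.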