CPS222 Lecture: Disk-Based Search Structures; B-Trees

Last revised 1/30/2013

Objectives:

1. To introduce hashtables on disk.
2. To introduce B-Trees and a key variant (the B+ tree)

I. Introduction
-  ------------

   A. All of the search structures we have considered thus far have one
      thing in common: they are stored in main memory, where access to any
      item is equally fast (true random access). However, it is often the
      case that we must build search structures on disk rather than in
      primary memory, for two reasons:

      1. Size. Large structures cannot be kept in their entirety in main
         memory.

      2. Permanency. Structures in main memory are volatile and need to be
         created (or read in from disk) whenever a program using them is
         run.

      One important use of disk-based search structures is to index tables
      in databases, to expedite operations like selection and natural join.
      (The use of an index avoids the need to read every row in the table
      to find a desired key value.)

   B. When we build structures on disk, we must deal with certain realities
      of access and transfer time:

      1. Random access to disk typically requires on the order of 10-20 ms
         access time to position the head and wait for the data to come up
         under it. This is equivalent to about 10 million CPU cycles!

      2. However, once the head position is right, data can be transferred
         at rates in excess of 100 million bytes/sec.

      3. Observe, then, how total transfer times behave for different size
         blocks (assuming a 10 ms access time and a 100 megabyte/sec
         transfer rate):

         Size of block     Access time    Transfer time    Total time

         1 byte            10 ms          .01 micro-sec    10.00001 ms
         10 bytes          10 ms          .1 micro-sec     10.0001 ms
         100 bytes         10 ms          1 micro-sec      10.001 ms
         1000 bytes        10 ms          10 micro-sec     10.01 ms
         10000 bytes       10 ms          100 micro-sec    10.1 ms
         100000 bytes      10 ms          1 ms             11 ms

         Clearly, then, transfers have very high overheads for access time
         (often in excess of 99%), so one would prefer to organize a search
         structure in such a way as to allow fairly large transfers to/from
         disk.

      4. For this reason, disk files are typically block-oriented.

         a. Data is stored in blocks of some fixed size (determined by disk
            geometry) - typically a power of 2 ranging from 512 on up.
            While access/transfer time considerations argue for fairly
            large blocks, too large a block can result in wasting space in
            the last block of a file when the file size is not a multiple
            of the block size - as it rarely is. (On the average, each file
            wastes about 1/2 block of storage.)

         b. Blocks in a given file are numbered 1, 2, 3 ... (there is no
            block 0.) Any given block can be accessed at any time by
            specifying its block number, which serves as a kind of on-disk
            "pointer" to the block. (In the discussion that follows, when
            we use the term "pointer" we mean such a block number.) A block
            number of 0 refers to a non-existent block - the disk
            equivalent of a null pointer.

         c. However, different blocks are usually stored at different
            places on the disk, so accessing data in two different blocks
            entails the costs associated with two disk accesses.
            (Defragmentation of a disk can bring consecutive blocks
            physically together, which reduces this cost somewhat, but the
            benefits are not permanent, and code cannot be written to
            assume the ability to access two successive blocks in one
            operation.)

      5. Most of the structures we have considered do not lend themselves
         well to on-disk implementation, because they require accesses to
         too many different parts of the structure. E.g. a binary search
         tree of 1000 nodes has minimum height 10, so a single search could
         require up to 10 disk accesses.
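      To make the "block number as on-disk pointer" idea concrete, here is
      a minimal C++ sketch (ours, not part of the lecture notes; readBlock
      and BLOCK_SIZE are illustrative names) of randomly accessing one
      block of a binary file. Each call costs one seek plus one block
      transfer - the 10 ms access time plus transfer time analyzed above:

        #include <fstream>
        #include <vector>
        #include <cstdint>

        const int BLOCK_SIZE = 4096;       // assumed block size

        // Read block number blockNum (1, 2, 3 ...) into buf.
        // Returns false for block number 0 - the on-disk "null pointer".
        bool readBlock(std::fstream & file, std::uint32_t blockNum,
                       std::vector<char> & buf)
        {
            if (blockNum == 0)             // null pointer - no such block
                return false;
            buf.resize(BLOCK_SIZE);
            // Blocks are numbered from 1, so block 1 sits at offset 0
            file.seekg(static_cast<std::streamoff>(blockNum - 1) * BLOCK_SIZE);
            file.read(buf.data(), BLOCK_SIZE);
            return bool(file);             // true if a full block was read
        }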
II. Hashtables on disk
--  ---------- -- ----

   A. While most of the search structures for main memory that we have
      looked at do not adapt well for use on disk, one does - the
      hashtable.

   B. Recall that a hashtable consists of a number of buckets, each of
      which in turn consists of a number of slots that can hold a key and
      its associated value, with the hash function determining which bucket
      holds a given key-value pair.

      1. When a hashtable is stored in main memory, normally each bucket
         has just a single slot.

      2. But when a hashtable is stored on disk, an entire block is
         normally used as a bucket, with as many slots as there is room in
         the block for key-value pairs.

         Example: If the block size is 4096, and a key-value pair requires
                  100 bytes, then a bucket will have 40 slots (with the
                  remaining 96 bytes available for overhead or wasted).

         a. In this case, the hash function determines which disk block
            should hold a given key and its associated value, with a search
            of the bucket being needed to find the correct slot.

         b. However, the time for this search is typically very small
            relative to the access time for the bucket in the first place.

   C. Another difference between in-memory and on-disk hashtables arises in
      conjunction with handling collisions.

      1. When a collision occurs with an in-memory hashtable, a strategy
         like linear probing is used to find an available slot for the item
         being added.

      2. Of course, collisions are less frequent when using a bucket with
         multiple slots; but if a bucket fills up, the normal strategy is
         to allocate an additional disk block, with the original block
         containing a pointer to this overflow block. This strategy is
         called chaining.

         Example: Suppose we had a hashtable on disk using a bucket size of
                  5, and six keys hashed to the same bucket. The following
                  situation would result:

                  ---------------------       ---------------------
                  | Primary bucket    |       | Overflow bucket   |
                  | [ first five keys |       | [ sixth key and   |
                  |   and associated  |       |   associated      |
                  |   values ]        |       |   value ]         |
                  |                o--|------>|                   |
                  ---------------------       ---------------------

         a. At this point, any new keys that hash to this chain would be
            added to the overflow bucket until it fills up - at which time
            another overflow bucket could be allocated, with the chain then
            consisting of three buckets.

         b. In the worst case - typically resulting from an overfilled
            table or a poor hash function - a chain could become long,
            resulting in performance tending toward O(n) rather than O(1).
            But this is improbable if the hash function and table size are
            chosen well.
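      Putting B (bucket = block) and C (chaining) together, a lookup might
      be sketched as follows in C++ (ours, not the lecture's; the Bucket
      layout, readBucket, and the constants are all illustrative, and the
      fixed-size fields are assumed to be NUL-terminated):

        #include <cstdint>
        #include <functional>
        #include <string>

        const int NUM_BUCKETS      = 1024;  // illustrative table size
        const int SLOTS_PER_BUCKET = 40;    // e.g. 4096-byte blocks,
                                            //      ~100-byte pairs

        struct Slot   { char key[12]; char value[88]; }; // one key-value pair
        struct Bucket {                     // in-memory image of one block
            int           count;            // slots in use
            std::uint32_t overflow;         // overflow block number; 0 = none
            Slot          slot[SLOTS_PER_BUCKET];
        };

        Bucket readBucket(std::uint32_t blockNum); // one-block disk read
                                                   // (not shown)

        // Look up key k, following the chain of overflow buckets if needed.
        bool lookup(const std::string & k, std::string & result)
        {
            // Hash selects the primary bucket; blocks are numbered from 1
            std::uint32_t blockNum =
                1 + std::hash<std::string>{}(k) % NUM_BUCKETS;

            while (blockNum != 0) {         // 0 = null pointer: end of chain
                Bucket b = readBucket(blockNum); // one disk access per bucket
                for (int i = 0; i < b.count; i++)
                    if (k == b.slot[i].key) {    // found: copy out the value
                        result = b.slot[i].value;
                        return true;
                    }
                blockNum = b.overflow;      // next bucket in the chain
            }
            return false;                   // chain exhausted - search fails
        }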
III. B-Trees
---  -------

   A. Now, we consider a search structure specifically designed for use
      with disk files: the B-Tree.

      1. A B-Tree is a form of search tree in which breadth is traded for
         depth. Each node contains multiple keys (instead of 1, as in a
         binary search tree), and so has multi-way branching rather than
         2-way branching.

      2. The result is a very "bushy" sort of tree, with typical heights in
         the range 2-4, thus requiring only 2-4 disk accesses per
         operation.

      3. Further performance improvements can be had by keeping a copy of
         the root block in main memory while the file is open, reducing the
         effective height of the tree by 1. It may also be possible to
         cache the blocks at the next level down. These two steps could
         reduce the number of disk accesses needed for most operations to
         1 or 2.

   B. Preliminary: an m-way search tree.

      1. An m-way search tree is a tree of degree m in which:

         a. Each node having s children (2 <= s <= m) contains s-1 keys.
            Let the children be called t[0] .. t[s-1] and the keys
            k[0] .. k[s-2].

         b. The keys and children obey the following properties:

            i.   k[0] < k[1] < k[2] < ... < k[s-2] (or <= if duplicate keys
                 are allowed; we will assume < and no duplicates for our
                 examples.)

            ii.  All the keys in child t[0] are < k[0]

            iii. All the keys in child t[i] (1 <= i < s-1) lie between
                 k[i-1] and k[i]

            iv.  All the keys in child t[s-1] are > k[s-2]

      2. Observe that a binary search tree is simply an m-way search tree
         with m = 2.

      3. Examples of 4-way search trees:

               C F K                        A
             / /   \ \                       \
           AB  DE  HIJ  L                     B
                                               \
                                                C
                                                 \
                                                  D
                                                   \
                                                    E
                                                     \
                                                      F
                                                       \
                                                       etc.

         (The empty children are called failure nodes, and are represented
         by "null pointers".)

         a. Clearly, the first is much more desirable than the second!

         b. Note that if the first example were implemented as a binary
            search tree instead, it would have height 4 instead of 2, a
            sizable cost increase at 10 ms per disk access. (The savings
            become even larger as m increases. For example, the first
            example could be implemented as a 12-way search tree with only
            1 level.)

      4. Observe that a 2-3-4 tree is simply a variant of a 4-way search
         tree. In fact, B-Trees generalize some of the ideas we used with
         2-3-4 trees, though the standard algorithms for maintaining
         B-Trees are slightly different from those used with 2-3-4 trees.

      5. When a search tree is stored on disk, each node is typically one
         block (or cluster), and the branching factor m is chosen so that a
         node with a maximal number of keys and children just fits in one
         block (cluster).

   C. Definition of a B-Tree

      1. As was true with binary search trees, we recognize that m-way
         search trees can be very efficient if well-balanced, but have
         undesirable degenerate cases. With binary search trees, we defined
         variants such as AVL trees and Red-Black trees that avoid the
         degenerate behavior. We do the same here.

      2. A B-Tree of order m is an m-way search tree in which:

         a. All the failure nodes are on the same level. (The term "failure
            node" refers to the empty subtree one ultimately encounters
            when searching for a nonexistent key. Of course, there isn't
            really any such node - it is represented by an impossible block
            number (corresponding to a null pointer in an in-memory tree.))

         b. Each internal (non-failure) node, except the root, has at least
            ceil(m/2) children.

         c. The root, unless it is a failure node (which happens only when
            the tree is totally empty), has at least 2 children.

         d. Of our two examples, only the first is a B-Tree.

      3. Examples: which of the following is/are B-Tree(s) of order 5?

         i.    E J O               no: node "F" has only 2 children
             / /   \ \            (an order-5 node needs at least 3)
          ABC  F  KLM  PQRS

         ii.   E J O               no: the tree is not a search tree,
             / /   \ \             since F > E
          ABF  GH  KL  PQRS

         iii.  E J O               yes
             / /   \ \
           AB  FI  KL  PQRS

         iv.   E J O               no: the failure nodes are not all on
             / /   \ \             the same level
           AB  HI  KL  PQRS
              /
            FG

      4. Note: because all the failure nodes of a B-Tree are on the same
         (bottom) level, we normally do not bother to draw them. Thus, we
         will draw the one good tree in the above example as follows from
         now on:

               E J O
             / /   \ \
           AB  FI  KL  PQRS
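      Before looking at the properties of B-Trees, it may help to see what
      a node of this kind looks like as a declaration. A minimal C++ sketch
      (ours, not part of the lecture; the field names and the ORDER value
      are illustrative) of a node image that would be sized, per B.5 above,
      to just fill one disk block:

        #include <cstdint>

        const int ORDER = 200;           // m: maximum number of children

        // One B-Tree node, stored as one disk block. A node with s
        // children uses child[0..s-1] and key[0..s-2], mirroring the
        // definition in B.1 above.
        struct BTreeNode {
            int           nKeys;         // s-1, where s = number of children
            std::uint32_t child[ORDER];  // block numbers; 0 = failure node
            long          key[ORDER - 1];// key[i] separates child[i] and
                                         //   child[i+1]
            // (values, or pointers to values, would also live here -
            //  see section E below)
        };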
   D. Some properties of a B-Tree

      1. What is the MAXIMUM number of KEYS in a B-Tree of order m of
         height h? (Measuring height in terms of the number of NODES.)

         a. In such a tree, each non-failure node would have the maximal
            number of children (m), and thus the maximal number of keys
            (m-1). Thus, we would have:

            1 node             m-1 keys                   at level 1
            m nodes            m * (m-1) keys             at level 2
            m**2 nodes         m**2 * (m-1) keys          at level 3
            ...
            m**(h-1) nodes     m**(h-1) * (m-1) keys      at level h
                               ---------------------
                               m**h - 1 keys total

         b. Compare our result for complete binary trees of height h:
            2**h - 1 nodes. (h measured in NODES)

      2. What is the MINIMUM number of KEYS in a B-Tree of order m of
         height h? (Measuring height in terms of the number of NODES.)

         a. In such a tree, the root would have only 2 children (1 key),
            since this is the minimum allowed for a root. All other nodes
            would have ceil(m/2) children, and ceil(m/2) - 1 keys.

         b. For convenience, let c = ceil(m/2).

            1 node             1 key                      at level 1
            2 nodes            2 * (c-1) keys             at level 2
            2*c nodes          2 * c * (c-1) keys         at level 3
            2*c**2 nodes       2 * c**2 * (c-1) keys      at level 4
            ...
            2*c**(h-2) nodes   2 * c**(h-2) * (c-1) keys  at level h
                               -------------------------
                               2 * [c**(h-1) - 1] + 1
                                 = 2 * c**(h-1) - 1 keys total

      3. To determine the height of a B-Tree of order m containing n keys,
         we solve each of the above for h, as follows:

         a. From the equation for the maximum number of keys, we know:

               n <= m**h - 1

            or, solving for h:

               n + 1 <= m**h
               log_m(n+1) <= h

            Now, since h must be an integer, we can take the ceiling of the
            log to obtain:

               ceil(log_m(n+1)) <= h

         b. From the equation for the minimum number of keys, we know:

               n >= 2 * c**(h-1) - 1

            or, solving for h:

               (n+1)/2 >= c**(h-1)
               log_c((n+1)/2) >= h - 1
               h <= 1 + log_c((n+1)/2)       (where c = ceil(m/2))

            Now, since h must be an integer, we can use the floor of the
            log to obtain:

               h <= 1 + floor(log_c((n+1)/2))

         c. Combining the above results from minimal and maximal trees, we
            obtain the following bounds for h:

               ceil(log_m(n+1)) <= h <= 1 + floor(log_c((n+1)/2)),
                                               where c = ceil(m/2)

      4. Some examples:

         a. 1 million keys - B-Tree of order 200: height is 3

            - Lower bound is ceil(log_200 1,000,001) = 3
              (Note that a maximal tree of height 2, order 200, contains
              39,999 keys - so the tree must have height at least 3.)

            - Upper bound is 1 + floor(log_100 500,001) = 3
              (Note that a minimal tree of height 4, order 200, contains
              1,999,999 keys, so the tree must have height no greater
              than 3.)

         b. 2 million keys - B-Tree of order 200:

            - Lower bound is still 3
            - Upper bound is now 4

            so the height could be 3 or 4.

   E. An important note: in our discussion, we have talked only about nodes
      containing KEYS. In practice, we build search structures to allow us
      to associate keys with VALUES (e.g. name = key; phone number and
      address = value). In the form of B-Tree we are discussing, then, a
      node actually contains s pointers to children, s-1 keys, and s-1
      values. These can be stored in one of two ways:

      1. The actual value can be stored in the node. This, however, can
         reduce m, and thus the branching factor of the tree, if the size
         of the value is large compared to that of the key (as it often
         is).

         Example: a node size of 8000, key length 12, and pointer size 4
                  bytes allows m = 500 if we don't store any value with the
                  key (4m + 12(m-1) <= 8000). If we also have to store a
                  value of size 36 with each key, however, we would reduce
                  m to 154 (4m + 48(m-1) <= 8000).

      2. The node can contain the number of another disk block that stores
         the actual value. (This additional pointer adds minimally to the
         size of the node.) However, this means that successful searches
         require an additional access to get the data.

      3. We can use a variant of the B-Tree called a B+ tree - to be
         discussed shortly.
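   The height bounds derived in D.3 above are easy to check numerically. A
   minimal C++ sketch (ours, not part of the lecture) that reproduces the
   order-200 examples in D.4:

        #include <cmath>
        #include <cstdio>

        // Bounds on the height of a B-Tree of order m holding n keys:
        //   ceil(log_m(n+1)) <= h <= 1 + floor(log_c((n+1)/2)),
        //   where c = ceil(m/2)  (see D.3 above)
        int minHeight(double n, double m) {
            return (int) std::ceil(std::log(n + 1) / std::log(m));
        }
        int maxHeight(double n, double m) {
            double c = std::ceil(m / 2);
            return 1 + (int) std::floor(std::log((n + 1) / 2) / std::log(c));
        }

        int main() {
            // The order-200 examples from D.4: prints "3 3", then "3 4"
            std::printf("%d %d\n", minHeight(1e6, 200), maxHeight(1e6, 200));
            std::printf("%d %d\n", minHeight(2e6, 200), maxHeight(2e6, 200));
        }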
IV. Operations on B-Trees
--  ---------- -- -------

   A. For our examples, we will use the following B-Tree of order 3
      (sometimes called a 2-3 tree, since each node has 2 or 3 children and
      1 or 2 keys). We use order 3 to keep the size of the examples down:

                         J T
              /           |           \
            C F          M P           Y
           / | \        / | \         / \
         AB  DE  GH   KL  NO  R     UW   Z

   B. Locate a given key k in a (sub)tree whose root is block t. (Assume
      blocks in the file are numbered 1 .., with 0 denoting a failure
      node.)

        InfoType locate(KeyType k, int t)
        {
            if (t == 0)
                search fails
            else
            {
                read block t from the disk
                determine how many keys it holds
                int i = 0;
                while (i < number of keys && key[i] < k)
                    i ++;
                if (i < number of keys && key[i] == k)
                    return associated information
                else
                    return locate(k, child[i]);
            }
        }

      Example: Locate J: succeeds immediately at the root.

               Locate L: at the root, we end the while loop with i = 1,
               since key[1] = 'T', so we go to the second child of the
               root. In that child, the while loop exits with i = 0 (since
               key[0] = 'M' > 'L'), so we go to the first child. In that
               node, we find what we are looking for.

               Locate Z: at the root, we end the while loop with i = 2,
               since i = number of keys, so we go to the third child of the
               root. In that child, the while loop exits with i = 1, for
               the same reason, so we go to the second child. In that node,
               we find Z.

               Locate X: at the root, we end the while loop with i = 2. In
               the third child, the while loop exits with i = 0, since
               key[0] = 'Y' > 'X', so we go to the first child. In that
               node, we exit the while loop with i = 2. Since the third
               child (and all children, in fact) of this node is a failure
               node, the search fails.

   C. Inserting a new key. (Assume that we disallow duplicate keys.)

      1. We first proceed as in locate, until we get to a leaf - that is, a
         node whose children are all failure nodes. (Of course, if we find
         the key we are looking for along the way, we declare an error and
         quit.)

      2. If the leaf node we have arrived at contains less than the maximum
         number of keys, then we simply insert the new key at the
         appropriate point, and add an extra failure-node child pointer
         (0).

         Example: Insert S: we work our way down to the node containing R.
                  Since it contains only one key, and can hold two, we add
                  S, and our tree becomes:

                         J T
              /           |           \
            C F          M P           Y
           / | \        / | \         / \
         AB  DE  GH   KL  NO  RS    UW   Z

                  Note that inserting a new key in a leaf may require
                  moving other keys over.

         Example: Insert Q in the original tree. Result:

                         J T
              /           |           \
            C F          M P           Y
           / | \        / | \         / \
         AB  DE  GH   KL  NO  QR    UW   Z
                               ^
                               |__ R has been moved over one place

      3. Life becomes more interesting if the leaf we reach is already
         full. (E.g. consider trying to insert "X" in the above.) In this
         case, we cannot add a new node on a lower level, since this would
         violate one of the B-Tree constraints. Instead, we proceed as
         follows:

         a. Allocate a new node from a free list, or extend the file by one
            node.

         b. Redistribute the keys in the original node, plus the new key,
            so that:

            - The first half remain in the original node
            - The middle key in the order of keys is held out for a use to
              be explained shortly
            - The second half go into the new node

            Note: the key we were inserting can go into either of the
            nodes, or it might be the middle key. (E.g. if we are inserting
            X, it will go into the new node; but if we were inserting V
            into the same node, it would be the middle key.)

         c. Insert the middle key we saved out, plus a pointer to the newly
            created node, into the parent at an appropriate point - just
            after the pointer that we followed to go down to the node we
            split. Of course, this may mean moving keys and pointers over
            in the parent to make room.

            Example: insert X into the original tree

                         J T
              /           |           \
            C F          M P          W Y  <-- W was promoted into
           / | \        / | \        / | \     this (parent) node
         AB  DE  GH   KL  NO  R     U  X  Z
                                    ^  ^
                                    |  |__ new node now contains X
                                    |_____ original node now contains only U
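            The redistribution in step 3.b is easy to express in code. A
            minimal C++ sketch (ours, not the lecture's; in-memory, keys
            only, ignoring child pointers and disk I/O; Leaf and splitLeaf
            are illustrative names):

              #include <algorithm>
              #include <cstddef>
              #include <vector>

              struct Leaf { std::vector<char> keys; };

              // Insert k into the full leaf 'node': the first half of the
              // keys stays put, the second half moves to 'newNode', and
              // the middle key is returned for insertion into the parent.
              char splitLeaf(Leaf & node, char k, Leaf & newNode)
              {
                  // All m keys (the m-1 old ones plus k), in order
                  std::vector<char> all = node.keys;
                  all.insert(std::upper_bound(all.begin(), all.end(), k), k);

                  std::size_t mid = all.size() / 2;   // index of middle key
                  node.keys.assign(all.begin(),       // first half stays
                                   all.begin() + mid);
                  newNode.keys.assign(all.begin() + mid + 1, // second half
                                      all.end());            //   moves
                  return all[mid];  // middle key - promoted to the parent
              }

            For the insert-X example above, all = U, W, X: U stays, W is
            promoted, and X goes to the new node. For insert V, all =
            U, V, W: V itself is the middle key, as noted in 3.b.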
      4. Observe that this strategy guarantees that the resulting tree will
         still meet all the tests for a B-Tree.

         a. Clearly, all the leaves are still on the same level.

         b. What about the number of children of the new nodes?

            - If we are forced to split a node, it is because it contained
              the maximum number of keys before insertion: m-1. With the
              new key, this gives m keys, to be divided two ways plus the
              promoted key. This leaves m-1 keys to be divided.

            - If m is odd, then each node gets (m-1)/2 keys, and so has
              (m+1)/2 children, which is exactly ceil(m/2), as required.

            - If m is even, then one node gets m/2 - 1 keys, and the other
              gets m/2 keys. The smaller node then has m/2 children, which
              is exactly ceil(m/2), as required. (The larger node has more
              than the minimum, which is fine.)

      5. Now what if there is no room for the promoted key in the parent?
         (Example: insert I into the original tree. Node GH splits, with H
         to be promoted into node CF. But this node has no room for another
         key and child.)

         a. Solution: split the parent as before, creating a new parent to
            hold half of the keys and pointers to half of the children.
            Again, promote one key and a pointer to the new node up one
            level.

         b. Note that, if carried to its ultimate, this can result in
            splitting the root, which is how the tree gains height. At this
            point, the single middle key resulting from the splitting of
            the root becomes the one key in the new root. (This is why we
            allow the root of a B-Tree to have as few as 2 children.)

            Example: insert I:

                            J
                        /       \
                      F           T
                    /   \       /   \
                   C     H    M P     Y
                  / \   / \  / | \   / \
                AB  DE G   I KL NO R UW  Z

      6. You will note that the approach we have taken to splitting nodes
         in a B-Tree is somewhat different from the one we used with 2-3-4
         trees. Here, we have postponed splitting a node until absolutely
         necessary.

         a. If one splits a full node when it is not necessary to do so,
            the result won't be a B-Tree if m is odd - e.g. when inserting
            S into the above example, if we split MP (which we don't have
            to), and promoted one of its keys to the node containing T,
            we'd end up with one empty node in the middle of the tree! If m
            were > 3, we would not end up with an empty node, but we would
            end up with one node having fewer than ceil(m/2) children.

         b. The price is more complex code; but given the time required for
            disk accesses, this policy makes some sense, since it postpones
            height increases until the last possible moment. (One could
            choose to compromise the B-Tree requirement and use the 2-3-4
            tree approach of anticipating the need for splits on the way
            down the tree, however - trading simpler code for earlier
            splitting of the root.)

   D. Deletion from a B-Tree

      1. As we have seen with other kinds of trees, deleting a key from a
         leaf is much simpler than deleting a key from an interior node. As
         before, then, we use the trick of converting a deletion from an
         interior node into a deletion from a leaf: we replace the key with
         its in-order successor - the smallest key in the leftmost leaf of
         the subtree rooted at the child just after the key - and then
         delete that key from its leaf.

         Example: to delete J from the root of our original tree, we would
                  promote K to take its place, and delete K from the leaf.

      2. Deleting a key from a leaf is basically trivial - we simply slide
         the other keys over as necessary to fill in the gap.

         Example: Delete N from our original tree:

                         J T
              /           |           \
            C F          M P           Y
           / | \        / | \         / \
         AB  DE  GH   KL  O   R     UW   Z
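      A sketch of steps 1 and 2 in C++ (ours, not the lecture's; keys-only
      node images, with readNode standing for a one-block disk read as in
      the earlier sketches):

        #include <cstddef>
        #include <vector>

        // child[] holds block numbers, with 0 for a failure node
        struct Node { std::vector<char> keys; std::vector<int> child; };

        Node readNode(int blockNum);   // one-block disk read (not shown)

        // Step 1: convert an interior deletion into a leaf deletion by
        // finding the in-order successor - the smallest key in the subtree
        // rooted at the child just after the key.
        char successor(const Node & n, std::size_t keyIndex)
        {
            Node cur = readNode(n.child[keyIndex + 1]); // child after key
            while (cur.child[0] != 0)     // descend leftmost children
                cur = readNode(cur.child[0]); //   until we reach a leaf
            return cur.keys[0];
        }

        // Step 2: delete a key from a leaf by sliding the remaining keys
        // over. (Underflow handling - step 3, next - applies if too few
        // keys remain.)
        void deleteFromLeaf(Node & leaf, char k)
        {
            for (std::size_t i = 0; i < leaf.keys.size(); i++)
                if (leaf.keys[i] == k) {
                    leaf.keys.erase(leaf.keys.begin() + i); // slides keys
                    return;
                }
        }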
      3. However, we can run into a problem if the leaf we are deleting
         from already contains the minimal number of keys.

         Example: Delete R from our original tree.

      4. In this case, we essentially reverse the process we used to deal
         with an over-full node on insert.

         a. We find one of the siblings of the node from which we are
            deleting. (We can use either side.)

         b. We rearrange the keys between the node we are working on and
            the sibling so as to give each at least the minimal number, if
            possible. This will mean changing the divider key between them
            in the parent.

            Example: When deleting R from the original tree, we can borrow
                     from R's sibling NO, rearranging keys through the
                     parent as follows:

                         J T
              /           |           \
            C F          M O           Y
           / | \        / | \         / \
         AB  DE  GH   KL  N   P     UW   Z

         c. If, as a result, we do not have enough keys to make two legal
            nodes (i.e. if the sibling we are using also contains the
            minimal number of keys), then we combine the two nodes into
            one, also removing a key and a child pointer from the parent.

            Example: working with the above (not the original tree), we can
                     now try to delete P. Since the only node we can
                     combine with is N, and it has the minimal number of
                     keys already, we must pull O down from the parent and
                     combine everything into one node, recycling the other:

                         J T
              /           |          \
            C F           M           Y
           / | \         / \         / \
         AB  DE  GH    KL   NO     UW   Z

         d. Of course, removing a key from the parent may get us into
            trouble as well. (E.g. suppose that, in succession, we removed
            L, N, and then O from the above.) In this case, the parent may
            have to "borrow" keys and children from a sibling. In an
            extreme case, we may even have to merge the parent with its
            sibling, and could ultimately even reduce the height of the
            tree if we had to merge two children of the root.

            Example: Recall the tree we got by splitting the root:

                            J
                        /       \
                      F           T
                    /   \       /   \
                   C     H    M P     Y
                  / \   / \  / | \   / \
                AB  DE G   I KL NO R UW  Z

            Suppose we now try to delete Z: we rearrange keys between the
            now-empty leaf and its sibling UW, rotating W up into the
            parent and Y down:

                            J
                        /       \
                      F           T
                    /   \       /   \
                   C     H    M P     W
                  / \   / \  / | \   / \
                AB  DE G   I KL NO R U   Y

            Suppose we now delete Y: we must merge with U, but this pulls W
            out of the parent, leaving it with too few keys. (Zero, in this
            case; but in a higher-degree tree we are in trouble whenever
            the number of keys drops below ceil(m/2) - 1.) We therefore
            rearrange keys and children with the sibling MP:

                            J
                        /       \
                      F           P
                    /   \       /   \
                   C     H     M      T
                  / \   / \   / \    / \
                AB  DE G   I KL  NO R   UW
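      The borrow-else-merge logic of step 4 can be sketched as follows
      (ours, not the lecture's; an in-memory, leaf-level simplification
      that always uses the right sibling - a real implementation would work
      on disk blocks and try either side):

        #include <cstddef>
        #include <vector>

        struct Node { std::vector<char> keys; std::vector<Node*> child; };
        const std::size_t MIN_KEYS = 1;   // ceil(m/2) - 1; here m = 3

        // 'i' is the index in 'parent' of the underfull child; we use its
        // right sibling (child i+1).
        void fixUnderflow(Node & parent, std::size_t i)
        {
            Node * left  = parent.child[i];
            Node * right = parent.child[i + 1];

            if (right->keys.size() > MIN_KEYS) {
                // Borrow: rotate the divider key down into 'left' and the
                // sibling's smallest key up into the parent (as when R was
                // deleted above)
                left->keys.push_back(parent.keys[i]);
                parent.keys[i] = right->keys.front();
                right->keys.erase(right->keys.begin());
            } else {
                // Merge: pull the divider down and combine both nodes into
                // one, removing a key and child pointer from the parent -
                // which may now underflow in turn, as when P was deleted
                left->keys.push_back(parent.keys[i]);
                left->keys.insert(left->keys.end(),
                                  right->keys.begin(), right->keys.end());
                parent.keys.erase(parent.keys.begin() + i);
                parent.child.erase(parent.child.begin() + i + 1);
                delete right;             // recycle the emptied node
            }
        }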
V. Variants of the B-Tree
-  -------- -- --- ------

   A. Achieving good performance with a disk-based search structure
      requires keeping the height of the tree down (2 is rarely practical,
      but 3 is often ideal) - which in turn entails using a large branching
      factor. Indeed, if we know the desired size of the tree, we can
      calculate the minimum branching factor needed to guarantee a certain
      height, using the formulas derived earlier. In particular, for a
      B-Tree, we can guarantee a maximum height of 3 as follows:

            n                  Minimum m

            1000               15
            10,000             35
            100,000            73
            1,000,000          159
            10,000,000         341
            100,000,000        737

      (Clearly, m grows rather slowly with n.)

   B. However, considerations of node size may make it difficult to achieve
      the necessary branching factor. Recall that, in a B-Tree, each node
      contains up to m pointers, m-1 keys, AND M-1 VALUES ASSOCIATED WITH
      THE M-1 KEYS.

      1. Suppose we need an m-value of 20 for some application in which the
         keys are 10 bytes long and the associated values are 100 bytes
         long. If a pointer is 4 bytes, then the minimum node size is

            4*20 + 19*(10+100) = 2170 bytes

      2. For efficient performance, we must guarantee that each node can be
         read or written with a single disk access - which requires that
         the entire node reside in a single contiguous block on disk. Since
         block size is dictated by disk geometry and system software, the
         designer of a B-Tree is usually faced with a fixed upper bound on
         node size.

         Example: If we were working with a disk that restricted the block
                  (or cluster) size to 2048 bytes, then we could not
                  achieve the desired branching factor with these
                  parameters.

   C. Two techniques can be used to hold the node size down while still
      achieving a desired branching factor. These techniques lead to two
      variants of the B-Tree.

   D. An often-used variant of the B-Tree is known as the B+ tree.

      1. We saw that the total size of a node is generally the limiting
         factor in terms of the value of "m" that can be used for a B-Tree.
         Here, of course, the main villain is usually the value that is
         stored in the node along with the key. If the value is - say - 10
         times bigger than the key, then its presence in the node reduces
         the potential branching factor by a ratio of almost 10:1!

      2. One way to address this would be to not store the values in the
         tree at all.

         a. Rather, each node would contain up to m child pointers, m-1
            keys, and m-1 POINTERS TO VALUES STORED ELSEWHERE.

         b. The difficulty with this scheme, though, is that once the tree
            has been searched to find the desired key, an additional disk
            access is needed to get the data. The effect on performance is
            the same as if the height of the tree were increased by one, so
            this may undo the gain obtained by using the higher branching
            factor.

      3. A B+ tree addresses this problem as follows:

         a. Values are stored only in the lowest level of the tree. Nodes
            at higher levels contain keys, but not values.

         b. This means that the branching factor in the upper levels is
            much greater than the branching factor at the lowest level
            (where the children are all failure nodes.)

            Example: assume nodes are 512 bytes, keys are 10 bytes, values
                     are 90 bytes, and pointers are 4 bytes.

                     Each node in the lowest level of a B+ tree could store
                     up to 5 key-value pairs, with 12 bytes to spare. (No
                     pointers need be stored, because the 6 children are
                     all failure nodes.)

                     Each node at the upper levels would have branching
                     factor 37. It would store up to 36 key-pointer pairs,
                     plus one extra pointer, with 4 bytes to spare.

            We assume that we can distinguish between a leaf and a non-leaf
            node in some way during our processing - perhaps by keeping
            track of the height of the tree, or by tagging the node itself
            or the pointer to it in some special way.

         c. Of course, this means that all keys must occur at the lowest
            level of the tree, so that a value can be stored with them. The
            keys in the upper levels, then, are copies of keys stored lower
            down; some keys are stored twice. In particular, each
            upper-level key is a copy of the least key in the subtree to
            its right. (Alternately, we could store the greatest key in the
            subtree to its left.)

            Example: given the above scenario, assume that we have a B+
                     tree that holds the 26 letters of the alphabet as
                     keys. Since the maximum branching factor of a leaf is
                     6, the minimum branching factor would be 3, and each
                     leaf would hold 2-5 keys. Thus, we would have 6-13
                     leaves, which could easily be accommodated as children
                     of a single root node. Thus, our tree might look like
                     this:

                     -----------------------------------
                     |  C   F    J   L    O   R   T   W |
                     -----------------------------------
                     /   |    |   |    |   |   |   |   \
                   AB  CDE FGHI  JK  LMN OPQ  RS TUV  WXYZ

                     Note that the separator keys in the root are copies of
                     the first key in the leaf to the right of the
                     separator. We could also have chosen to store the last
                     key in the leaf to the left.
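            Search in a B+ tree differs from B-Tree search in that it
            always descends to a leaf, even when a separator key matches.
            A C++ sketch under assumed node images (ours, not the
            lecture's; readNode again stands for a one-block disk read):

              #include <cstddef>
              #include <string>
              #include <vector>

              struct BPlusNode {
                  bool                     isLeaf; // nodes assumed tagged
                                                   //   (see 3.b above)
                  std::vector<char>        keys;
                  std::vector<int>         child;  // internal: child blocks
                  std::vector<std::string> values; // leaf: value per key
              };

              BPlusNode readNode(int blockNum);    // not shown

              // Keys equal to a separator lie in the RIGHT subtree, since
              // each separator is a copy of the least key of the subtree
              // to its right (see 3.c above).
              bool find(int root, char k, std::string & result)
              {
                  BPlusNode n = readNode(root);
                  while (!n.isLeaf) {
                      std::size_t i = 0;
                      while (i < n.keys.size() && n.keys[i] <= k)
                          i++;          // <= sends equal keys to the right
                      n = readNode(n.child[i]);
                  }
                  for (std::size_t i = 0; i < n.keys.size(); i++)
                      if (n.keys[i] == k) {    // values live only in leaves
                          result = n.values[i];
                          return true;
                      }
                  return false;
              }

            Tracing the example tree above: find('C') passes separator C
            (going right) and lands in leaf CDE; find('A') goes left of C
            and lands in leaf AB.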
         d. For this particular set of characteristics, we might contrast:

            i.  This B+ tree of height 2, with plenty of room to grow
                without gaining height.

            ii. An ordinary B-Tree of order 6 (where all levels hold
                values) - which would be of height 3 for this
                configuration.

         e. Just to illustrate the concept of a B+ tree further, consider
            what might happen with different assumptions about how many
            keys fit in a non-leaf node, so that we end up with a
            three-level tree, like the following:

                                  -----
                                  | L |
                                  -----
                                /       \
                 ---------------         -----------------
                 |  C   F   J  |         |  O   R   T   W |
                 ---------------         -----------------
                 /   |    |    \         /   |   |   |   \
               AB  CDE  FGHI   JK      LMN OPQ  RS  TUV  WXYZ

            (Note that the root key - L - is a copy of the smallest key in
            its right subtree; it doesn't actually occur in either of its
            children, but only at the bottom of the tree.)

      4. A common modification of the B+ tree is to add links across the
         bottom level, like this:

                                  -----
                                  | L |
                                  -----
                                /       \
                 ---------------         -----------------
                 |  C   F   J  |         |  O   R   T   W |
                 ---------------         -----------------
                 /   |    |    \         /   |   |   |   \
              AB->CDE->FGHI->JK-------->LMN->OPQ->RS->TUV->WXYZ

         a. There is a slight space overhead for this, of course - but
            typically the key/value size doesn't evenly divide the block
            size, so some extra bytes are available for the link.

         b. This arrangement facilitates range queries, where we want to
            find all entries whose keys lie in a certain range. (See the
            sketch at the end of these notes.)

   E. A B* Tree of order m is an m-way search tree in which each node (save
      the root) has a minimum of ceil((2m-1)/3) children and a maximum of m
      children.

      1. Nodes in a B* Tree of order m have the same size as those in a
         B-Tree of order m, but their minimum branching factor is greater.

      2. This is achieved by using the following strategy on insertion of a
         new key:

         a. If the leaf in which the key belongs has room for the new key,
            then it is put there (as with the basic B-Tree.)

         b. However, if the leaf is full, then instead of splitting the
            leaf we choose one of its siblings and attempt to redistribute
            keys between the two leaves. (This is sort of the reverse of
            what we did when deleting a key from a B-Tree.)

            Example: A B* Tree of order 5 has 3-5 children (2-4 keys) for
                     each node. Consider the following example of such a
                     tree:

                         E    K    P    V
                      /    |    |    |    \
                   ABCD  FGHI  LMN  RSTU  XYZ

                     If we go to insert Q, we find that leaf RSTU is full.
                     Instead of splitting it (which would force a split of
                     the root and a new level in the tree), we redistribute
                     keys between RSTU, one of its siblings - say LMN - and
                     the divider in the parent, to get:

                         E    K    Q    V
                      /    |    |    |    \
                   ABCD  FGHI  LMNP RSTU  XYZ

         c. If the chosen sibling is also full, then we combine the keys
            from the two nodes and split the result to give three nodes.
            This preserves the ceil((2m-1)/3) minimum branching factor.

      3. To see the advantage of the B* Tree, consider the following table
         showing the minimum number of keys in a B-Tree and a B* Tree of
         height 3, for different values of m:

            m       minimal B-Tree, height 3   minimal B* Tree, height 3

            5       17                         17
            10      49                         97
            20      199                        337
            50      1249                       2177
            100     4999                       8977
            200     19,999                     35,377

         (The advantage is even greater for taller trees.)
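   Finally, the range-query sketch promised in D.4.b above: returning to
   the linked-leaf B+ tree, a C++ sketch (ours; LeafNode, readLeaf, and
   findLeafContaining are illustrative names, the last standing for the B+
   descent sketched earlier) of a query that descends once and then walks
   the bottom level:

        #include <vector>

        // Assumed leaf image: sorted keys plus a link to the next leaf
        struct LeafNode {
            std::vector<char> keys;
            int               next;  // block number of right neighbor;
                                     //   0 = none
        };

        LeafNode readLeaf(int blockNum);       // one-block read (not shown)
        int      findLeafContaining(char lo);  // B+ descent (not shown)

        // Report all keys in [lo, hi]: one root-to-leaf descent, then a
        // walk across the linked bottom level - one disk access per leaf.
        std::vector<char> rangeQuery(char lo, char hi)
        {
            std::vector<char> result;
            int blockNum = findLeafContaining(lo);
            while (blockNum != 0) {
                LeafNode leaf = readLeaf(blockNum);
                for (char k : leaf.keys) {
                    if (k > hi) return result;   // past the range - done
                    if (k >= lo) result.push_back(k);
                }
                blockNum = leaf.next;  // follow the bottom-level link
            }
            return result;
        }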