- 论坛徽章:
- 0
|
Red-Black Trees
Contents:
Red-Black Tree Definition
Okasaki's Insertion Method
Rotations
The CLRS Insertion Method
Deletion
Exercise
The information on this page is a drawn in part from
Introduction to Algorithms, second
edition, by Cormen, Leiserson, Rivest and Stein (CLRS);
Purely Functional Data Structures, by
Chris Okasaki;
Franklyn
Turbak's lecture notes;
Chee Yap's lecture notes, which were on the web, but the link is now
broken; and comments from the students in my Algorithms class.
Red-Black Tree Definition
As defined in CLRS, a red-black tree is a binary search tree (BST) that
satisfies the following properties:
Every node is either red or black.The root is black.Every leaf (NIL) is black.If a node is red, then both its children are black. (Or in other
words, two red nodes may not be adjacent.)For each node, all paths from the node to descendant leaves
contain the same number of black nodes.
Data items are only stored at internal nodes. What we are calling a
"leaf" would probably be represented as a null pointer in the parent
node. It helps in describing the insertion and deletion algorithms to
think of these as actual nodes, colored black.
Theorem: The height h of a red-black tree with n
internal nodes is no greater than 2log(n+1).
Proof: By property 5, every root-to-leaf path in the tree has
the same number of black nodes; let this number be B. So there
are no leaves in this tree at depth less than B, which means
the tree has at least as many internal nodes as a complete binary tree
of height B. Therefore,
n≥2B-1. This implies
B≤log(n+1).
By property 4, at most every other node on a root-to-leaf path is
red. Therefore, h≤2B. Putting these together, we
have h≤2log(n+1).
Q.E.D.
The remainder of this page shows how insertion and
deletion can be done without violating the above properties, and in
time proportional to the height of the tree, or O(log n).
Parenthetical remarks relate this description to that in CLRS. If you
don't have CLRS you can ignore these.
Okasaki's Insertion Method
A node is inserted the same way as it would for a BST, and colored red.
(The children of the new node will be
leaves, which are by definition black.) At that point either property
2 (root is black) or property 4 (no two adjacent red nodes) might be
violated.
If the inserted node is the root (meaning the tree was previously
empty) then we simply change its color to black and we're done.
If property 4 is violated, it must be
because the new node's parent is also red. Since the root must be
black, the new node must also have a grandparent in the tree, and by
property 4 it
must be black. There are four possibilities for how this subtree can
be structured, vis a vis the new node, its parent and its grandparent,
as shown in the diagram below. In Okasaki's method, each possible
subtree is transformed into the subtree shown in the middle of the
diagram.
![]()
(A, B, C and D
refer to arbitrary subtrees. We have said that the children of the
new node must both be leaves, but we will see in a moment that the
diagram applies more generally).
First, notice that the ordering is
preserved by these transformations.
Second, notice that these transformations do not change the number of
black nodes on any path from the parent of this subtree (if one
exists) to the leaves. We are again in the position where the only
properties that can be violated are property 2 (if y is the root)
and property 4 (if y's parent is red). But the advantage
is that we are now two steps closer to the root. We can repeat this
operation until either y's parent is black, in which
case we're done, or y is the root, in which case we color
y black and we're done. (Coloring the root black
increases the number of black nodes on every path from the root to the
leaves by the same amount, so property 5 will not be violated after
that re-coloring if it wasn't violated before.)
This procedure preserves the red-black properties, and it takes time
proportional to the height of the tree, which is O(log n).
Rotations
Restructuring operations on red-black trees (and many other kinds of
trees) can often be expressed more clearly in terms of the
rotation operation, diagrammed below.
![]()
Clearly the order is preserved by the
rotation operation. Therefore, if we start with a BST, and only
restructure using rotations, then we will still have a BST. In the
remainder of this tutorial, we will only restructure with rotations,
and so we won't need to say anything more about the proper ordering of
the elements.
In the diagram below, the transformations in Okasaki's insertion
method are shown as one or two rotations. For comparison,
here
is the previous diagram. It makes a
nice visualization to put the two diagrams in about the same place on
your screen, and go back and forth between them using your browser's
forward and back buttons.
![]()
The CLRS Insertion Method
CLRS gives an insertion method that is more complicated than
Okasaki's, but slightly more efficient. It's time complexity is
still O(log n), but the constant embedded in the big-Oh is
smaller. You can safely skip this section and go to
deletion
.
As with Okasaki's method, this method starts with standard BST
insertion, and colors the new node red. It differs in how it handles
violations of property 4 (no two red nodes adjacent). We distinguish
two cases, based on the color of the uncle of the bottom red node.
(The bottom red node is the child node in the red-parent/red-child
pair. Its uncle is the red-parent's sibling.) First, let's consider
the case where the uncle is black. There are four subcases, depending
on whether each red node is a left or right child. The diagram below
shows how the tree should be restructured and re-colored. (This is
cases 2 and 3 in CLRS.)
![]()
It is interesting to see how this diagram compares with
Okasaki's insertion with rotations
.
(Again, it is interesting to go back and forth between these two
images using your browser's forward and back buttons.)
There are two differences. The first is in how the final subtree (in
the middle) is colored. In Okasaki's method, the root y
of the subtree is red and its children are black, while here
y is black and its children are red. Making
y black means that property 4, no two adjacent red nodes,
will not be violated at y, so the procedure does not need
to continue up the tree toward the root. The CLRS insertion method
never requires more than two rotations.
The second difference is that in the CLRS method this case has a
precondition that the uncle of the bottom red node is black. It can
clearly be seen in the diagram that if the uncle (which is the root of
either subtree A or D) were red, then there
would be two adjacent red nodes in the final tree, and so this method
could not be used.
It is straightforward to verify that the red-black properties are
satisfied in the final tree, if the only violation initially is the
adjacency of the two red nodes shown.
It remains to consider the case where the uncle of the bottom red node
is red. (This is case 1 in CLRS.) In that case the top red node and
its sibling (the uncle) are turned black, and their parent is turned
red. The tree is not restructured. There are four cases,
depending on whether the bottom red node is a left or right child, and
whether the top red node is a left or right child, but they are all
essentially the same. One case is pictured here:
![]()
It can easily be seen that the number of black nodes on a path from
the root of the tree to the leaves is not changed by this operation.
Afterwards, the only violation of the red-black properties would be if
the root of the subtree is the root of the tree, or if it has a
red parent. In other words, we might have to start over, but two
steps closer to the root. So, the procedure is repeated until
either (i) the parent of z is black, in which case we're done; (ii) z
is the root, in which case we color it black and we're done; or
(iii) we encounter the black-uncle case, in which case we do one or
two rotations and we're done. In the worst case we will have to
re-color nodes all the way up to the root, requiring
O(log n) operations.
Deletion
To delete an item from a red-black tree, we start by performing the
standard BST delete operation (see CLRS, chapter 12). To review,
standard BST deletion has three cases:
The node to be removed has no children. In this case we simply
remove it. If it is the root, then the tree becomes empty; otherwise
its parent's appropriate child pointer is set to null.
The node to be removed has one child. Again, we simply remove
it. If it is the root, then its child becomes the root; otherwise,
its parent's appropriate child pointer is set to point to the removed
node's child.
The node to be removed has two children. In this case we find
the node's successor, which is the minimum node in its right subtree.
The data elements in these two nodes are swapped, and then we remove
the successor node. Since the successor node cannot have a left
child, one of the first two cases will apply.
The node that is removed from the tree might not be the node that
originally contained the deleted data item. For the purposes of
re-establishing the red-black properties we are only concerned with
the node that is removed. Call this node v, and let its parent
be p(v).
At least one of v's children must be a leaf. If v
has a non-leaf child, its place in the structure of the
tree is taken over by that child; otherwise its place is taken over by
a leaf. Let u be the child that replaces v in the tree
structure after BST deletion. If u is a leaf, then we know it
is black.
If v is red, we are done - no violations of the
red-black properties will be introduced. So assume v is
black. All paths from the root to leaves that descended from
v will have fewer black nodes than other root-to-leaf
paths in the tree, which is a violation of property 5. If
p(v) and u are both red then property 4 will also
be violated, but it will turn out that our solution to the violation
of property 5 will also solve the property 4 violation without doing
any more work, so for now we will focus on the property 5 problem.
Let's imagine assigning a black token to
u. The token indicates that the simple paths to leaves
descending from the node holding the token are short a black node
(initially that is because v was removed). We will move the
token up the tree until we encounter a situation where property 5
may be restored.
The token is represented as a black square in the diagrams below.
If the node with the token is black, then we call it a
doubly black node.
Note that the token is a conceptual device, and is not physically
implemented in the tree data structure.
We distinguish four mutually exclusive cases.
If the node with the token is red or the root of the tree (or both),
simply set its color to black and we're done. Note that this removes
any violation of property 4 (no two red nodes adjacent). It also
removes a violation of property 5, for the following reason. The
token indicated that leaves descending from this node needed another
black node on their paths to the root in order to match other
root-to-leaf paths in the tree. By changing this red node to black,
we add one more black node on precisely those paths that were short a
black node.
If the token is at the root and the root is black, the token is simply
dropped. In this case, every root-to-leaf path in the tree has had
it's number of black nodes reduced by one, and property 5 still holds.
For the remaining cases, we can assume that the node with the token is
black, and is not the root.
If the sibling and both nephews of the doubly black node are also
black, then we change the sibling to red and move the token up one
step towards the root. (This is case 2 in CLRS.)
In the diagram below, which shows the two
possible subcases, the dashed line around
y indicates that we don't care at this point what color
y is, and the small circles above A,
B, C or D indicate that the
root of that subtree is black.
![]()
Changing the sibling to red removes one black node from the paths to
its descendant leaves, so that those paths now have the same number of
black nodes as those descending from the doubly black node. We move
the token up to the parent y to indicate that now
all paths descending from y are short a black
node. We haven't solved the problem, but we have pushed it up towards
the root one step.
Clearly this operation can only be done if both
nephews are black, since otherwise we would end up with adjacent red
nodes.
If the sibling of the doubly black node is red, then we perform a
rotation and change colors. (This is CLRS case 1.) The two
possibilities are shown in the following diagram:
![]()
This step doesn't change the number of black nodes on the path from
the root to any leaf, but it does guarantee that the doubly black
node's sibling is black, which enables either case B or case D in the
next step.
It might seem that we are going backwards, since the token is now
farther from the root than it was before. But the parent of
the doubly black node is now red, so if the next step is case B, the
token will be passed up to a red node, which will then turn black and
we will be done. Furthermore, as shown below, case D always consumes
the token and finishes the operation. So this apparent step backward
is actually a sign that we are almost done.
Finally, we have the case where the doubly black node has a black
sibling and at least one red nephew.
Define the near nephew of a node
x to be the left child of x's sibling if
x is a left child, and the right child of
x's sibling if x is a right child; define
x's far nephew to be its other
nephew. (In the diagram, x's near nephew is closer to x
than its far nephew.)
We now have two subcases: (i) the doubly black node's far nephew is
black, so its near nephew must be red; and (ii) the far nephew is red,
so the near nephew can be either color. (This is cases 3 and 4 in
CLRS.) As shown in the following diagram, subcase (i) is
transformed to subcase (ii) by a rotation and change of colors, and
subcase (ii) is solved by another rotation and change of colors. The
two rows are symmetrical, depending on whether the doubly black node
is a left or right child.
![]()
In this case we generate an extra black node, the token is dropped,
and we are done. It can easily be verified by looking at the diagram
that the number of black nodes on paths to leaves below the token is
increased by one, while the number of black nodes on other paths is
unchanged. And it is also clear that none of the other red-black
properties is violated.
Putting all of these cases together, we can see that in the worst
case we are following a path from leaf to root, and performing a
constant number of operations on each step, so the time complexity for
deletion is O(log n).
Exercise
Starting with an empty red-black tree, successively insert the
following values: 260, 240, 140, 320, 250, 170, 60, 100, 20, 40, 290,
280, 270, 30, 265, 275, 277, and 278. If you use Okasaki's insertion
method, and if I haven't made any mistakes, you should end up with
this tree:
![]()
本文来自ChinaUnix博客,如果查看原文请点:http://blog.chinaunix.net/u1/34937/showart_673433.html |
|