Here is the material from Wikipedia:
Overview
The radix tree is easiest to understand as a space-optimized trie where each node with only one child is merged with its child. The result is that every internal node has at least two children. Unlike in tries, edges can be labelled with sequences of characters as well as single characters. This makes them much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.
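To make the space saving concrete, here is a small Python sketch (my own illustration, not from the article) that counts the nodes of a plain trie versus a path-compressed radix tree for the strings used in the classic Wikipedia figure. The survival rule in `count_radix` is exactly the merging rule described above: a node survives compression only if it is the root, ends a stored string, or has at least two children; every other single-child node is merged into its child.

```python
def build_trie(words):
    """Build an uncompressed trie of nested dicts; '$' marks end of a word."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return trie

def count_trie(node):
    # every dict is one trie node; "$" is only a terminal marker, not a child
    return 1 + sum(count_trie(c) for k, c in node.items() if k != "$")

def count_radix(node, is_root=True):
    # a node survives path compression iff it is the root, ends a word,
    # or branches; single-child interior nodes merge into their child
    children = [c for k, c in node.items() if k != "$"]
    keep = is_root or "$" in node or len(children) >= 2
    return (1 if keep else 0) + sum(count_radix(c, False) for c in children)

words = ["romane", "romanus", "romulus",
         "rubens", "ruber", "rubicon", "rubicundus"]
```

For these seven strings the plain trie needs 28 nodes while the compressed form needs only 14, and every surviving internal node (other than the root) indeed has at least two children.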
It supports the following main operations, all of which are O(k), where k is the maximum length of all strings in the set:
* Lookup: Determines if a string is in the set. This operation is identical to tries except that some edges consume multiple characters.
* Insert: Add a string to the tree. We search the trie until we can make no further progress. At this point we either add a new outgoing edge labelled with all remaining characters in the input string, or if there is already an outgoing edge sharing a prefix with the remaining input string, we split it into two edges (the first labelled with the common prefix) and proceed. This splitting step ensures that no node has more children than there are possible string characters.
* Delete: Delete a string from the tree. We delete the corresponding leaf; if its former parent is then left with only one child (and does not itself mark the end of a stored string), we delete the parent as well, merging its two incident edges into one.
* Find predecessor: Locates the largest string less than a given string, by lexicographic order.
* Find successor: Locates the smallest string greater than a given string, by lexicographic order. In the case where the inputs are bitstrings and we've already performed a Lookup operation, both this and the previous operation are constant-time.
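The Lookup and Insert steps above can be sketched in Python as follows (a minimal illustration under my own representation, not production code): each node keeps a map from the first character of an edge label to the pair `(label, child)`, and the edge-splitting step in Insert creates the intermediate node described in the bullet.

```python
class RadixNode:
    def __init__(self, terminal=False):
        self.edges = {}           # first character -> (edge label, child node)
        self.terminal = terminal  # True if a stored string ends at this node

def lookup(root, s):
    node, i = root, 0
    while i < len(s):
        edge = node.edges.get(s[i])
        if edge is None:
            return False
        label, child = edge
        if not s.startswith(label, i):   # the edge must consume its whole label
            return False
        i += len(label)
        node = child
    return node.terminal

def insert(root, s):
    node, i = root, 0
    while i < len(s):
        edge = node.edges.get(s[i])
        if edge is None:
            # no outgoing edge: attach all remaining characters in one edge
            node.edges[s[i]] = (s[i:], RadixNode(terminal=True))
            return
        label, child = edge
        # length of the common prefix of the label and the remaining input
        p = 0
        while p < len(label) and i + p < len(s) and label[p] == s[i + p]:
            p += 1
        if p < len(label):
            # split the edge: the first half carries the common prefix
            mid = RadixNode()
            mid.edges[label[p]] = (label[p:], child)
            node.edges[s[i]] = (label[:p], mid)
            child = mid
        i += p
        node = child
    node.terminal = True
```

Inserting "test", then "team", then "toast" exercises both branches: "team" splits the "test" edge at the shared prefix "te", and "toast" splits the resulting "te" edge again at "t".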
A common extension of radix trees uses two colors of nodes, 'black' and 'white'. To check if a given string is stored in the tree, the search starts from the top and follows the edges of the input string until no further progress can be made. If the search-string is consumed and the final node is a black node, the search has failed; if it is white, the search has succeeded. This enables us to add a large range of strings with a common prefix to the tree, using white nodes, then remove a small set of "exceptions" in a space-efficient manner by inserting them using black nodes.
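One plausible reading of this two-color rule can be sketched as follows (a toy illustration of my own, using an uncompressed dict-based trie for brevity; the coloring logic carries over unchanged to a compressed radix tree): the deepest colored node reached before progress stops decides membership, so one white node admits an entire prefix range and a deeper black node carves out an exception.

```python
WHITE, BLACK = "white", "black"

class Node:
    def __init__(self):
        self.children = {}
        self.color = None            # None: nothing is marked at this node

def paint(root, s, color):
    """Mark the node reached by s with a color, creating nodes as needed."""
    node = root
    for ch in s:
        node = node.children.setdefault(ch, Node())
    node.color = color

def contains(root, s):
    """Walk as far as the input allows; the deepest colored node decides."""
    node, decision = root, None
    for ch in s:
        node = node.children.get(ch)
        if node is None:
            break                    # no further progress can be made
        if node.color is not None:
            decision = node.color
    return decision == WHITE

# Admit everything under the prefix "192.168." with one white node,
# then exclude a single exception with one black node.
net = Node()
paint(net, "192.168.", WHITE)
paint(net, "192.168.1.5", BLACK)
```

The prefix strings and addresses here are made up for illustration; the point is that a large range plus a small exception set costs only a handful of colored nodes.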
Applications
As mentioned, radix trees are useful for constructing associative arrays with keys that can be expressed as strings. They find particular application in the area of IP routing, where the ability to contain large ranges of values with a few exceptions is particularly suited to the hierarchical organization of IP addresses. They are also used for inverted indexes of text documents in information retrieval.
History
Donald R. Morrison first described what he called "Patricia tries" in 1968; the name comes from the acronym PATRICIA, which stands for "Practical Algorithm to Retrieve Information Coded in Alphanumeric". Gernot Gwehenberger independently invented and described the data structure at about the same time.
Comparison to other data structures
In the following comparisons, it is assumed that the keys are of length "k" and the data structure contains "n" elements.
Like self-balancing binary search trees, and unlike plain tries, radix trees support finding the lexicographic successor/predecessor in O(1) time (once the lookup is complete). Unlike balanced trees, radix trees permit lookup, insertion, and deletion in O(k) time rather than O(log n). This may not seem like an advantage, since normally k ≥ log n, but in a balanced tree every comparison is a string comparison requiring O(k) worst-case time, and many of these are slow in practice due to long common prefixes. In a trie all comparisons take constant time, but it takes m comparisons to look up a string of length m. Radix trees can perform these operations with fewer comparisons and require far fewer nodes.
Radix trees also share the disadvantages of tries, however: as they can only be applied to strings of elements or elements with a reversible mapping (bijection) to strings, they lack the full generality of balanced search trees, which apply efficiently to any datatype with a total ordering (for example, floating-point numbers).
Hash tables have expected O(k) insertion and deletion times, while radix trees offer worst-case O(k) insertion and deletion. Hash tables also provide no efficient way to perform the successor/predecessor operations that radix trees support.
If Wikipedia's statement that "every internal node has at least two children" is correct, then LKD's description contains yet another error……