"Xanadu" and the Flaming-X logo are registered trademarks of Project Xanadu.

Google
  Entire Web   sunless-sea.net   xanadu.net   udanax.com
 
  9 Visitor(s) on Site
  98 Wiki Pages

Recently Modified

Sections (edit)

How This Wiki Works

SearchWiki:
Recent Changes Printable View Page History Edit Page
Content Last Modified on December 30, 2007, at 12:05 PM CST

About Tree Data Structures

- trees versus hashes (we need range operations, which are lacking in hashes)

- most trees are organzized along one dimension of search

(TBD: add lots of links to background material on wikipedia et. al.)

Operations

Dynamic set operations, in time proportional to the height of the tree.

  • search
  • predecessor
  • successor
  • minimum
  • maximum
  • insert
  • delete

Balanced versus Unbalanced

A fully balanced tree has a height "logb n" where n is the number of nodes in the tree, and b is the number of children per node. The objective of balancing is to minimize the number of levels in the tree in order to obtain the best running time.

Types of Trees

binary-tree?
has at most two child nodes per node
red-black?

A binary tree balanced by rotating nodes to adjust the height of the tree.

A rotation

        A
      /    \
     X      B
            /  \
           Y    Z
Goes to:
        B
      /    \
      A     Z
    /  \
   X    Y

This will reduce the height of the tree, if Y or Z are more than 2 nodes taller than X.

Red/black trees have an invariant that all black nodes are perfectly balanced, and red nodes can fill the levels between black nodes. Alternatively, all paths in the tree have the same number of black nodes in them, and any red node has a black father. See Corman, Leiserson, and Rivest for good explanations.

AVL

All nodes have 2 or 3 children -- simpler code than Red/Black, but also non-uniform nodes, which complicates storage management.

b-tree
Has many more than two children per node, and variable children per node. Fixed size nodes (generally an integral number of disk blocks. Always perfectly height balanced, as when a node fills up, the key can be pushed into the parent. If the root is full, the tree grows by one level with the creation of a new root, and splitting of the old root. Key factors are average fullness of nodes, and the heuristics used to make sure that disk space usage is efficient. The basic algorithm is trivial, the refinements have been subject to years of academic and commercial work.
quad?

graphical query tree that where each node divides a 2-d index space into 4 quadrants based on the key value. These are very hard to balance incrementally -- I don't know that anyone does so.

Octrees are the 3-d variant.

K-dimensional?

Another multidimensional tree, these support search in N-dimensional spaces: at each level one dimension is used. the 2-variant are XY trees, splitting first on x-coord, then y-coord, then back to x, etc. again, incremental balancing not sensible.

Caching

Large trees often cannot all be held in primary storage (RAM) at one time. Instead a virtualization of storage, by combining an in-memory cache with secondary storage is used in these cases. Optimizing the number of nodes becomes more important with splitting storage across primary and secondary.

Locking

For external storage, B-trees rule the tree roost because they are easy to work with in multi-user scenarios -- if you split full nodes on the way down, at most 2 nodes need be locked at a time, and the locks are always taken from the top-down, preventing deadlock. High branching factors of 512-4096, typically mean that most b-trees are 3-4 levels even for enormous databases, with the root kept in memory. This limits disk accesses to 2 at most to get to any data -- caching can make the average much smaller.

Mutability versus Immutability

Concurrency Issues

Performance re the Big O

(insert table of factors)

Relevant Citations

  • Bayer, R., M. Schkolnick. Concurrency of Operations on B-Trees. In Readings in Database Systems (ed. Michael Stonebraker), pages 216-226, 1994.
  • Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, Introduction to Algorithms, MIT Press, Massachusetts: 1998.
  • Gray, J. N., R. A. Lorie, G. R. Putzolu, I. L. Traiger. Granularity of Locks and Degrees of Consistency in a Shared Data Base. In Readings in Database Systems (ed. Michael Stonebraker), pages 181-208, 1994.
  • Kung, H. T., John T. Robinson. On Optimistic Methods of Concurrency Control. In Readings in Database Systems (ed. Michael Stonebraker), pages 209-215, 1994.
Recent Changes Printable View Page History Edit Page
All trademarks and copyrights on this page are owned by their respective companies. Comments are owned by the Poster. The Rest (c) 2001-2007 Jeff Rush