• Static data structures are only for querying.
  • Dynamic data structures support updating.
  • Implicit data organization uses pointers.
  • Explicit data organization uses mathematical relationships.
  • Internal memory (first level memory) - RAM.
  • External memory (second level memory) - disk

 

Fundamental data structures

Fundamental data structures are used to build more complex data structures.

Sequence

Sequence stores elements in a linear order.

Supported operations:

  • insert an element in a given position
  • remove an element
  • insertAfter
  • insertBefore
  • size
  • head
  • tail
  • prev
  • next

Implementations of the sequence data structure:

  • array
  • single-linked list
  • doubly-linked list
  • stack
  • queue
  • concatenable queue - supports insert, delete, find, concatenate, and split operations that take O(logN) time [2]

Comparing the performance of an array and a single-linked list:

implementation access elements insert/delete elements
array O(1) O(N)
single-linked list O(N) O(1)

Comparing the performance of an array, a single-linked list, and a doubly-linked list. The doubly-linked list is the most efficient:

implementation  prev next insertAfter insertBefore insertHead insertTail
array  O(1) O(1) O(N) O(N) O(N) O(1)
single-linked list O(N) O(1) O(1) O(N) O(1) O(1)
doubly-linked list O(1) O(1) O(1) O(1) O(1) O(1)

 

Priority queue

  • Elements in a priority queue are ordered.
  • Operations of retrieving and removing the largest element are supported (removeMax).

Priority queues are used in sorting algorithms. A heap is one possible realization of the priority queue. A buffer tree is an example of an external memory realization of the priority queue.

Examples of the priority queue data structures:

  • min-max heap - a double-ended priority queue implemented as a modified version of a binary heap
  • pagoda - a priority queue implemented with a variant of a binary tree
  • binomial heap - an implementation of the mergeable heap abstract data type which is a priority queue supporting merge operation
  • Fibonacci heap - a heap data structure consisting of a collection of trees

Applications of the priority queue:

  • Scheduling systems
  • Sorting e.g. selection sort, insertion sort, heap sort. The idea is to insert elements to the queue one-by-one and then remove them from the queue in decreasing order using removeMax.

 

Dictionary and Trees

An implicit realization of the dictionary can be provided by a hash table

  • B-tree is a two-level memory data structure designed to search in large databases.
  • Fractional cascading technique speeds up searching for the same element in the dictionary.
  • Trie is an ordered tree that is used to store a dynamic set or associative array. In pattern matching and text compression algorithms a trie is a tree in which edges are labeled by letters or words.
  • Balanced search tree (e.g. AVL-tree, red-black tree, 2-3 tree, 2-3-4 tree)
  • A binary search tree is said to be weight balanced if half the nodes are on the left of the root, and a half on the right.
  • Splay trees are self-adjusting binary search trees used in caches and memory allocators. In a splay tree recently accessed elements have better access times than elements accessed less frequently [3].
  • AVL trees are balanced binary trees. AVL trees are often compared with red-black trees because they support the same set of operations and because both take O(log n) time for basic operations. AVL trees are more rigidly balanced than red-black trees, leading to slower insertion and removal but faster retrieval, so AVL trees perform better than red-black trees for lookup-intensive applications [3].
  • Scapegoat trees are self-balancing binary search trees, that provide worst-case O(log n) lookup time, and O(log n) amortized insertion and deletion time. Unlike other self-balancing binary search trees that provide worst case O(log n) lookup time, scapegoat trees have no additional per-node overhead compared to a regular binary search tree [3].
  • Treaps exhibit the properties of both binary search trees and heaps. A treap is a binary search tree that orders the nodes by a key but also by a priority attribute. The nodes are ordered so that the keys form a binary search tree and the priorities obey the max heap order property [3].

 

Intrusive containers

Intrusive containers link the object with other objects in the container rather than storing a copy of the object. Contrast this with non-intrusive containers that store a copy of an object. They can't store non-copyable and non-movable objects [3].

Intrusive containers vs. non-intrusive containers:

  • better performance: dynamic memory management is minimized, iterating elements uses less memory and it is faster, etc.
  • better exception guarantees 
  • the computation of an iterator to an element from a pointer or reference to that element is a constant time operation
  • localization of data (e.g. for a cache hit optimization) leads to measurable effects
  • intrusive containers offer external memory management while non-intrusive containers use allocators to manage memory internally
  • each object that is used as an element of an intrusive container needs some data members storing the information needed by the container e.g. to insert an object to an intrusive linked list, the class has to include the next and previous pointers
  • because objects are stored outside of the container, the user has to manage the lifetime of inserted objects independently from the containers
  • analyzing the thread safety of a program is harder because the container might be modified indirectly without an explicit call to a container member
  • in non-intrusive containers, an object can only belong to one container

The lifetime of a stored object is not bound to or managed by the container:

  • When the container gets destroyed before the object, the object is not destroyed. You have to be careful to avoid resource leaks.
  • When the object is destroyed before it is erased from the container, the container contains a pointer to a non-existing object.

Applications of intrusive containers:

  • multi-index containers
  • high-performance code like memory allocation algorithms

Semantically, intrusive containers are similar to non-intrusive containers holding pointers to objects.

 

Applications of data structures

  • Graphs and networks: adjacency matrix, adjacency list, dynamic expression tree, topology tree, parsers.
  • Text processing: string, suffix tree.
  • Geometry and graphics: binary space partition tree, trapezoid tree, range tree.

 

References

[1] Allen B. Tucker. "Computer Science Handbook, Second Edition". Chapman and Hall/CRC, 2 edition, 2004

[2] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. "The Design and Analysis of Computer Algorithms". Addison-Wesley, 1st Edition, 1974.

[3] Boost C++ Library documentation.