Advanced Data Structures and Algorithms

This is a book to complement the Data Structures book and the Algorithms book, and assumes these books as prerequisites.

There are two conflicting goals in online book writing:

  • to be an introduction to fundamental material
  • to be a complete reference work that includes discussions of tangential material

We've decided to do both. The Data Structures text and the Algorithms text focus on just the fundamentals. This book (Advanced Data Structures and Algorithms) is a place for reference material. The idea is that a student in the span of a year or less can cover those fundamentals and then move on the advanced topics in this book.

Possible topicsEdit

  • Parallel Programming (section; an elaboration of divide and conquer, or pipeline decoupling for streaming algorithms)
  • Memory Management and Garbage Collection (whole chapter)
  • NP-Completeness (whole chapter)
  • Proof of the master theorem (section)
  • Quadtrees and Octrees and R-trees (chapter)
  • B-trees (chapter)
  • (pseudo) random number generators
  • Introspective Sort (section; a hybrid sorting algorithm)
  • Closures ( What Is Closure? )
  • Continuations ( continuations )
  • Cache-Aware and Cache-Oblivious Algorithms
  • Approximation
  • Maximum Flow
  • Primality testing

While there is no content here yet, please feel free to start writing sections! Because this book is more of a reference, chapters can be more independent of each other. Further, you can assume the reader already knows the basic data-structures and algorithms.

FFT Multiplication*Edit

As covered in the Algorithms book, by breaking an integer into two parts and treating it as a polynomial, we were able to derive a better-than-grade-school multiplication algorithm. Even more gains in efficiency can be made if we break the integer into three parts instead. However, we don't need to stop there: we can generalize the technique of polynomial multiplication and interpolation by breaking an n digit binary number into a degree n polynomial, which breaks the number up into its individual digits (and is the furthest we can divide such a number).

We noticed that evaluating the polynomial representations at points 1, -1, and 0 made the expressions easy, but if we are to use degree-2n polynomials, how can we come up with enough points to use? The trick is to evaluate the polynomial using complex numbers, in particular, by using "roots of unity": that is, complex numbers that are all distance 1 from the origin.


When dealing with lists, we normally have two choices: either we get O(1) indexed access but O(N) insertion with vectors, or O(N) indexing and O(1) insertion with linked lists. It'd be nice if we could find some middle ground, say with O(\log N) (amortized) running time for all operations. Well, Soren Sandmann has implemented GSequence, which uses a splay tree to achieve exactly that. [TODO: This structure doesn't have a widely accepted name yet. If someone can think of a better name, suggest it.]

I think B-tree algorithms give worst-case O(ln(N)) insertion, deletion, and access by name. Is GSequence better in some way? I've been told that some operating systems stored files and directories on disk as B-trees. --DavidCary 23:39, 21 July 2005 (UTC)

There's a number of balanced trees that give O(ln N) performance. The idea here is just wrapping a list interface around a tree. You might as well just say that you can use a tree since a list interface is only for implementation convenience.

There are various heap data structures that give O(log(N)) insertion, deletion and access by name.

I believe these are called random-access lists. Also, it is possible to get O(1) insertion and O(log N) indexing; e.g. Okasaki's skew-binomial random access lists. If there is an interest, I can upload a demonstration-quality implementation in Common Lisp.

Last modified on 6 June 2013, at 00:49