Persistent Data Structures in C++ — Persistent Binary Search Tree

Overview

Persistent data structures preserve previous versions after updates: instead of mutating in place, updates create new versions that share unchanged structure with older versions. This enables time-travel queries, cheap snapshots, and safer concurrent reads.

Core Concepts

Immutability

Immutability means nodes are not modified after creation. When an update occurs, only the nodes on the path from the root to the changed element are copied; the rest of the tree is shared between versions. This is commonly called the node-copying technique.

Types of Persistence

There are several persistence models. Partial persistence allows queries on all versions but updates only on the newest. Full persistence allows updates on any version, producing branches. Confluent persistence supports merging versions.

Type	Read Old Versions	Update Old Versions	History Shape
Partial	Yes	No	Linear
Full	Yes	Yes	Tree
Confluent	Yes	Yes (merge)	DAG

Designing a Persistent BST in C++

In C++ you can implement persistence by making nodes immutable and using smart pointers to share structure. std::shared_ptr is a natural fit: multiple versions can hold shared ownership of unchanged subtrees while new nodes are allocated for changed paths. This approach keeps memory management safe and concise.

Key API Ideas

Node is immutable after construction.
Tree is a lightweight handle holding a std::shared_ptr<Node> root.
Insert/Erase return a new Tree (new root pointer) leaving the old Tree intact.

Minimal Example

The following example demonstrates the node-copying pattern with std::shared_ptr. It is intentionally compact to show the idea; production code should add balancing, iterators, and exception safety.

// persistent_bst.hpp
#include <memory>
#include <optional>

template<typename T>
struct Node {
  T key;
  std::shared_ptr<Node> left;
  std::shared_ptr<Node> right;
  Node(const T& k, std::shared_ptr<Node> l=nullptr, std::shared_ptr<Node> r=nullptr)
    : key(k), left(l), right(r) {}
};

template<typename T>
class PersistentBST {
  using NodePtr = std::shared_ptr<Node<T>>;
  NodePtr root;
public:
  PersistentBST() = default;
  explicit PersistentBST(NodePtr r): root(std::move(r)) {}

  PersistentBST insert(const T& key) const {
    return PersistentBST(insert_node(root, key));
  }

  bool contains(const T& key) const {
    auto cur = root;
    while(cur) {
      if(key == cur->key) return true;
      cur = key < cur->key ? cur->left : cur->right;
    }
    return false;
  }

private:
  static NodePtr insert_node(NodePtr node, const T& key) {
    if(!node) return std::make_shared<Node<T>>(key);
    if(key == node->key) return node; // no-op, share node
    if(key < node->key) {
      // copy current node with a new left child
      return std::make_shared<Node<T>>(node->key, insert_node(node->left, key), node->right);
    } else {
      return std::make_shared<Node<T>>(node->key, node->left, insert_node(node->right, key));
    }
  }
};

How it works: each insert walks the path and allocates new nodes only for that path; unchanged subtrees are shared via std::shared_ptr. Old versions keep owning their roots and remain valid.

Version Control and Practical API

Model each version as an immutable handle (for example, a small struct with a root pointer and a version id). Keep a version table or vector of roots to allow O(1) access to any version's root. Creating a new version is just storing the returned root pointer. This makes branching and time-travel queries trivial.

Example pattern:

// versioned usage
std::vector<PersistentBST<int>> history;
PersistentBST<int>> base;
history.push_back(base);
auto v1 = history.back().insert(42);
history.push_back(v1);
auto v2 = history.back().insert(7);
history.push_back(v2);

Use Cases in Concurrent Programming

Read-heavy concurrency benefits most: immutable versions allow lock-free reads because readers never observe in-place mutations. Writers create new versions and then publish the new root pointer with an atomic store or via a version table update. Readers can continue using older roots safely.

Common patterns:

Snapshotting for consistent views without global locks.
Lock-free readers with a single atomic pointer to the current version.
Multi-version concurrency control where transactions read a version and commit by producing a new version.

Trade-offs: persistence reduces in-place mutation bugs and simplifies reasoning, but it increases memory usage (shared_ptr overhead and extra allocations) and can add allocation pressure. For high-performance workloads, consider custom memory pools or intrusive reference counting to reduce overhead.

Quick Reference

When to use persistence

Audit logs and immutable histories
Concurrent read-heavy systems
Undo/redo and time-travel debugging

Pros

Safe concurrent reads
Cheap snapshots
Clear reasoning about state

Cons

Higher memory and allocation overhead
Potential performance cost without balancing

Implementation tips

Use std::shared_ptr for simplicity and safety.
Consider pooling or arena allocators for hot paths.
Add balancing (AVL, red-black, treap) to keep path lengths small.

Next article: Writing Readable C++ Code