Overview

Persistent data structures preserve previous versions after updates: instead of mutating in place, updates create new versions that share unchanged structure with older versions. This enables time-travel queries, cheap snapshots, and safer concurrent reads.


Core Concepts

Immutability

Immutability means nodes are not modified after creation. When an update occurs, only the nodes on the path from the root to the changed element are copied; the rest of the tree is shared between versions. This is commonly called the node-copying technique.

Types of Persistence

There are several persistence models. Partial persistence allows queries on all versions but updates only on the newest. Full persistence allows updates on any version, producing branches. Confluent persistence supports merging versions.

TypeRead Old VersionsUpdate Old VersionsHistory Shape
PartialYesNoLinear
FullYesYesTree
ConfluentYesYes (merge)DAG

Designing a Persistent BST in C++

In C++ you can implement persistence by making nodes immutable and using smart pointers to share structure. std::shared_ptr is a natural fit: multiple versions can hold shared ownership of unchanged subtrees while new nodes are allocated for changed paths. This approach keeps memory management safe and concise.

Key API Ideas

Minimal Example

The following example demonstrates the node-copying pattern with std::shared_ptr. It is intentionally compact to show the idea; production code should add balancing, iterators, and exception safety.

// persistent_bst.hpp
#include <memory>
#include <optional>

template<typename T>
struct Node {
  T key;
  std::shared_ptr<Node> left;
  std::shared_ptr<Node> right;
  Node(const T& k, std::shared_ptr<Node> l=nullptr, std::shared_ptr<Node> r=nullptr)
    : key(k), left(l), right(r) {}
};

template<typename T>
class PersistentBST {
  using NodePtr = std::shared_ptr<Node<T>>;
  NodePtr root;
public:
  PersistentBST() = default;
  explicit PersistentBST(NodePtr r): root(std::move(r)) {}

  PersistentBST insert(const T& key) const {
    return PersistentBST(insert_node(root, key));
  }

  bool contains(const T& key) const {
    auto cur = root;
    while(cur) {
      if(key == cur->key) return true;
      cur = key < cur->key ? cur->left : cur->right;
    }
    return false;
  }

private:
  static NodePtr insert_node(NodePtr node, const T& key) {
    if(!node) return std::make_shared<Node<T>>(key);
    if(key == node->key) return node; // no-op, share node
    if(key < node->key) {
      // copy current node with a new left child
      return std::make_shared<Node<T>>(node->key, insert_node(node->left, key), node->right);
    } else {
      return std::make_shared<Node<T>>(node->key, node->left, insert_node(node->right, key));
    }
  }
};

How it works: each insert walks the path and allocates new nodes only for that path; unchanged subtrees are shared via std::shared_ptr. Old versions keep owning their roots and remain valid.


Version Control and Practical API

Model each version as an immutable handle (for example, a small struct with a root pointer and a version id). Keep a version table or vector of roots to allow O(1) access to any version's root. Creating a new version is just storing the returned root pointer. This makes branching and time-travel queries trivial.

Example pattern:

// versioned usage
std::vector<PersistentBST<int>> history;
PersistentBST<int>> base;
history.push_back(base);
auto v1 = history.back().insert(42);
history.push_back(v1);
auto v2 = history.back().insert(7);
history.push_back(v2);

Use Cases in Concurrent Programming

Read-heavy concurrency benefits most: immutable versions allow lock-free reads because readers never observe in-place mutations. Writers create new versions and then publish the new root pointer with an atomic store or via a version table update. Readers can continue using older roots safely.

Common patterns:

Trade-offs: persistence reduces in-place mutation bugs and simplifies reasoning, but it increases memory usage (shared_ptr overhead and extra allocations) and can add allocation pressure. For high-performance workloads, consider custom memory pools or intrusive reference counting to reduce overhead.

Quick Reference

When to use persistence

Pros

Cons

Implementation tips

Next article: Writing Readable C++ Code