Overview
Persistent data structures preserve previous versions after updates: instead of mutating in place, updates create new versions that share unchanged structure with older versions. This enables time-travel queries, cheap snapshots, and safer concurrent reads.
Core Concepts
Immutability
Immutability means nodes are not modified after creation. When an update occurs, only the nodes on the path from the root to the changed element are copied; the rest of the tree is shared between versions. This is commonly called the node-copying technique.
Types of Persistence
There are several persistence models. Partial persistence allows queries on all versions but updates only on the newest. Full persistence allows updates on any version, producing branches. Confluent persistence supports merging versions.
| Type | Read Old Versions | Update Old Versions | History Shape |
|---|---|---|---|
| Partial | Yes | No | Linear |
| Full | Yes | Yes | Tree |
| Confluent | Yes | Yes (merge) | DAG |
Designing a Persistent BST in C++
In C++ you can implement persistence by making nodes immutable and using smart pointers to share structure. std::shared_ptr is a natural fit: multiple versions can hold shared ownership of unchanged subtrees while new nodes are allocated for changed paths. This approach keeps memory management safe and concise.
Key API Ideas
- Node is immutable after construction.
- Tree is a lightweight handle holding a
std::shared_ptr<Node>root. - Insert/Erase return a new Tree (new root pointer) leaving the old Tree intact.
Minimal Example
The following example demonstrates the node-copying pattern with std::shared_ptr. It is intentionally compact to show the idea; production code should add balancing, iterators, and exception safety.
// persistent_bst.hpp
#include <memory>
#include <optional>
template<typename T>
struct Node {
T key;
std::shared_ptr<Node> left;
std::shared_ptr<Node> right;
Node(const T& k, std::shared_ptr<Node> l=nullptr, std::shared_ptr<Node> r=nullptr)
: key(k), left(l), right(r) {}
};
template<typename T>
class PersistentBST {
using NodePtr = std::shared_ptr<Node<T>>;
NodePtr root;
public:
PersistentBST() = default;
explicit PersistentBST(NodePtr r): root(std::move(r)) {}
PersistentBST insert(const T& key) const {
return PersistentBST(insert_node(root, key));
}
bool contains(const T& key) const {
auto cur = root;
while(cur) {
if(key == cur->key) return true;
cur = key < cur->key ? cur->left : cur->right;
}
return false;
}
private:
static NodePtr insert_node(NodePtr node, const T& key) {
if(!node) return std::make_shared<Node<T>>(key);
if(key == node->key) return node; // no-op, share node
if(key < node->key) {
// copy current node with a new left child
return std::make_shared<Node<T>>(node->key, insert_node(node->left, key), node->right);
} else {
return std::make_shared<Node<T>>(node->key, node->left, insert_node(node->right, key));
}
}
};
How it works: each insert walks the path and allocates new nodes only for that path; unchanged subtrees are shared via std::shared_ptr. Old versions keep owning their roots and remain valid.
Version Control and Practical API
Model each version as an immutable handle (for example, a small struct with a root pointer and a version id). Keep a version table or vector of roots to allow O(1) access to any version's root. Creating a new version is just storing the returned root pointer. This makes branching and time-travel queries trivial.
Example pattern:
// versioned usage
std::vector<PersistentBST<int>> history;
PersistentBST<int>> base;
history.push_back(base);
auto v1 = history.back().insert(42);
history.push_back(v1);
auto v2 = history.back().insert(7);
history.push_back(v2);
Use Cases in Concurrent Programming
Read-heavy concurrency benefits most: immutable versions allow lock-free reads because readers never observe in-place mutations. Writers create new versions and then publish the new root pointer with an atomic store or via a version table update. Readers can continue using older roots safely.
Common patterns:
- Snapshotting for consistent views without global locks.
- Lock-free readers with a single atomic pointer to the current version.
- Multi-version concurrency control where transactions read a version and commit by producing a new version.
Trade-offs: persistence reduces in-place mutation bugs and simplifies reasoning, but it increases memory usage (shared_ptr overhead and extra allocations) and can add allocation pressure. For high-performance workloads, consider custom memory pools or intrusive reference counting to reduce overhead.
Quick Reference
When to use persistence
- Audit logs and immutable histories
- Concurrent read-heavy systems
- Undo/redo and time-travel debugging
Pros
- Safe concurrent reads
- Cheap snapshots
- Clear reasoning about state
Cons
- Higher memory and allocation overhead
- Potential performance cost without balancing
Implementation tips
- Use
std::shared_ptrfor simplicity and safety. - Consider pooling or arena allocators for hot paths.
- Add balancing (AVL, red-black, treap) to keep path lengths small.
Next article: Writing Readable C++ Code