Concepts

Candybox borrows the bucket/object model from S3 but gives the pieces its own names.

Vocabulary

| Term | Meaning |
| --- | --- |
| Box | a bucket: a named, flat namespace of keys |
| Candy | an object: the bytes plus metadata stored under a key |
| CandyKey | an object key |
| Syrup | a data ledger that holds object bytes |
| CandyLocator | the small pointer the index stores to find a Candy’s bytes inside a Syrup |

(Candy in a box — that’s the whole theme.)

The data model

A Box is a flat key space whose keys are stored in sorted order. That ordering is what makes range scans, prefix listing, and directory-walk workloads cheap sequential reads rather than scatter-gather.
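
To illustrate why sorted keys make prefix listing cheap, here is a minimal sketch in Python. It is not Candybox code: `list_prefix` and the sample keys are hypothetical, and an in-memory list stands in for the real index. The point is only that one binary search finds the start of the range and everything after that is a sequential read.

```python
import bisect

def list_prefix(sorted_keys, prefix):
    """Return every key that starts with `prefix` from an already-sorted list."""
    # One binary search locates the first key >= prefix.
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    # From there, matching keys are contiguous, so we read forward and stop
    # at the first key that no longer carries the prefix.
    for i in range(start, len(sorted_keys)):
        key = sorted_keys[i]
        if not key.startswith(prefix):
            break
        result.append(key)
    return result

# Hypothetical keys for illustration only.
keys = sorted(["logs/2024/a", "img/cat.png", "logs/2023/z", "logs/2024/b", "readme"])
print(list_prefix(keys, "logs/2024/"))
```

With an unsorted key space the same listing would have to inspect every key, which is the scatter-gather cost that the sorted layout avoids.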

Each Box is split into a fixed number of hash partitions. At any moment exactly one node owns a partition and serves its reads and writes; ownership is spread evenly across the cluster by an elected balancer, so a single Box’s traffic is served by many nodes.
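
The routing described above can be sketched as follows. Everything concrete here is an assumption made for illustration: the partition count of 16, the use of SHA-256 as the hash, hashing the full key, and the round-robin assignment are not taken from Candybox, whose balancer and hash function are not specified in this section.

```python
import hashlib

NUM_PARTITIONS = 16  # hypothetical; the text only says the count is fixed

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (assumed: SHA-256)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def assign_owners(num_partitions: int, nodes: list) -> dict:
    """Spread partitions evenly over nodes; round-robin stands in for the balancer."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
owners = assign_owners(NUM_PARTITIONS, nodes)

key = "photos/2024/cat.png"
p = partition_for(key)
print(f"{key} -> partition {p} -> owner {owners[p]}")
```

Because each partition has exactly one owner in the map, a client can route any request for a key to a single node, while the Box as a whole is served by all four.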

Durability and consistency in one paragraph

Object data and index both live in BookKeeper ledgers — append-only, replicated, and self-fencing. Candybox never mutates data in place: updates and deletes are new appends (with tombstones). Consistency rests on single, fenced ownership per partition: the owner holds a ZooKeeper lease with a fencing token, and every state-changing operation carries that token, so a stale former owner cannot corrupt a partition after ownership moves during a failure. Each write is stamped with a hybrid logical clock for last-writer-wins ordering across nodes.
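
The sketch below models the three ideas in that paragraph in a few lines of Python: append-only writes with tombstones, rejection of writes that carry a stale fencing token, and last-writer-wins reads ordered by a hybrid logical clock. It is a toy under stated assumptions, not the Candybox implementation. `PartitionLog`, `Record`, and `StaleOwnerError` are hypothetical names, an in-memory counter stands in for the ZooKeeper lease, a Python list stands in for a BookKeeper ledger, and the clock is reduced to a `(wall_ms, counter)` tuple without the usual HLC update rules.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Simplified HLC timestamp: (wall_ms, logical_counter). Tuples compare
# lexicographically, which gives the ordering used for last-writer-wins.
Hlc = Tuple[int, int]

class StaleOwnerError(Exception):
    """Raised when a write carries a token that is no longer current."""

@dataclass(frozen=True)
class Record:
    key: str
    value: Optional[bytes]  # None marks a tombstone (a delete)
    hlc: Hlc

@dataclass
class PartitionLog:
    current_token: int = 0
    records: list = field(default_factory=list)

    def grant_ownership(self) -> int:
        """Hand out a new, strictly larger fencing token to the next owner."""
        self.current_token += 1
        return self.current_token

    def append(self, token: int, record: Record) -> None:
        """Append only if the caller holds the current token; never overwrite."""
        if token != self.current_token:
            raise StaleOwnerError(f"token {token} is not current ({self.current_token})")
        self.records.append(record)

    def read(self, key: str) -> Optional[bytes]:
        """Return the value with the highest HLC, or None if absent or deleted."""
        latest = None
        for rec in self.records:
            if rec.key == key and (latest is None or rec.hlc > latest.hlc):
                latest = rec
        return None if latest is None else latest.value

log = PartitionLog()
old = log.grant_ownership()
log.append(old, Record("k", b"v1", (1000, 0)))

new = log.grant_ownership()  # ownership moves; the old token is now fenced
log.append(new, Record("k", b"v2", (1000, 1)))

try:
    log.append(old, Record("k", b"stale", (2000, 0)))
except StaleOwnerError as err:
    print("rejected:", err)

print(log.read("k"))                            # newest surviving write
log.append(new, Record("k", None, (1001, 0)))   # delete is a new append
print(log.read("k"))
```

Note that the stale write is rejected even though its timestamp is the largest: the fencing check runs before ordering is ever considered, which is what keeps a former owner from corrupting the partition.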

For the full picture — record formats, the read/merge path, and the handover protocol — see Concepts & architecture.