CRDTs: The Hard Parts
videoOn YT
Related
Key takeout
CRDTs provide a stable way to reference an element in a list
With this is simple to define new operations that use this identifiers
See
Notes
Base sample
Collaborative text editing
At the "beggining" all users have the same document
Concurrent actions make all users converge on the same state
Alternatives
OT
Operational Transformation
Google Docs, MS Office Online
CRDTs
Conflict-Free Replicated Data Types
Riak, TomTom GPS, Atom's Teletype
OTs
Requires
A central server (YT)
"How changes are made to a document"
Indexes of the positions where some operations happens have to be transformed when operations happen concurrently
CRDTs
Looks for convergence
Two nodes that see the same set of operations (but maybe in a different order) will converge in the same final state.
Interleaving
What happens if several people concurrently introduce text in the same position?
IDs
Gives each character in the document a (unique, stable) identifier by which is referred
The way to define this IDs is what makes the CRDTs differ on how they merge (so, how they interleave)
Some represent the text as a tree of characters (Treedoc, Logoot, RGA, Causal Trees, ...)
Maybe nodeID + counter
Moving/Reordering list items
For example, for TO-DO lists
Text CRDTs don't support move, only adding and deleting
Delete-and-reinsert in new position might duplicate the element in the new position
How to define a new operation?
CRDTs provide a stable way to reference an element in a list
Take it's unique identifiers to use in the new operation
We define the operation with Last-Writer-Wins semantic (any stable one is OK)
A new (state) register is constructed for the elements going in this operation.
Each node will add a new element (pair) to the set
Open problem
Concurrent move + Editing of the moving element
Concurrent moving of subtrees
Moving one subtree to (concurrently) two different ones at the same time
Instead of duplication, a Last-Writer-Wins behavior is expected
Apply the operations in timestamp order, and skip the operations that will produce and incorrect state
This means that some of the nodes will have to undo the skipped operations
Even more, operations will have to be undone and redone to thread received operations in timestamp-order
See
Data structures
Each move operation has
time
Timestamp (Globally unique, eg. Lamport timestamp)
parent
NodeId (Destination of move)
meta
Metadata (eg. filename within parent, etc)
child
NodeId (Subtree being moved)
Each log entry (for undo/redo) has
time
Timestamp
oldParent
NodeId? (null if not previously on tree, happened in remote nodes)
oldMeta
Metadata?
newParent
NodeId
newMeta
Metadata
child
NodeId
Reducing metadata overhead
If we give every character a unique identifier and some metadata, each char will go from 1 byte to ~100 bytes for metadata&ID