Table of contents

CRDTs: The Hard Parts
- References
- Notes
  - OTs
  - CRDTs

DONE
CRDTs: The Hard Parts
video

On YT
https://www.youtube.com/watch?v=x7drE24geUw

Slides
https://speakerdeck.com/ept/crdts-the-hard-parts

Related

Key takeout
CRDTs provide a stable way to reference an element in a list

With this is simple to define new operations that use this identifiers
See
How to define a new operation?

TODO
References

https://martin.kleppmann.com/2020/07/06/crdt-hard-parts-hydra.html

Notes

Base sample
Collaborative text editing
At the "beggining" all users have the same document
Concurrent actions make all users converge on the same state
Alternatives

OT
Operational Transformation

Google Docs, MS Office Online

CRDTs
Conflict-Free Replicated Data Types

Riak, TomTom GPS, Atom's Teletype

OTs

Requires
A central server (YT)
"How changes are made to a document"
Indexes of the positions where some operations happens have to be transformed when operations happen concurrently

https://youtu.be/x7drE24geUw?t=319

CRDTs

Looks for convergence
Two nodes that see the same set of operations (but maybe in a different order) will converge in the same final state.

Interleaving

What happens if several people concurrently introduce text in the same position?

IDs

Gives each character in the document a (unique, stable) identifier by which is referred
The way to define this IDs is what makes the CRDTs differ on how they merge (so, how they interleave)

Some represent the text as a tree of characters (Treedoc, Logoot, RGA, Causal Trees, ...)
Maybe nodeID + counter

Moving/Reordering list items

For example, for TO-DO lists
Text CRDTs don't support move, only adding and deleting
Delete-and-reinsert in new position might duplicate the element in the new position

How to define a new operation?

CRDTs provide a stable way to reference an element in a list

Take it's unique identifiers to use in the new operation

We define the operation with Last-Writer-Wins semantic (any stable one is OK)
A new (state) register is constructed for the elements going in this operation.

Each node will add a new element (pair) to the set

Open problem

Concurrent move + Editing of the moving element

https://youtu.be/x7drE24geUw?t=1970

Concurrent moving of subtrees

Moving one subtree to (concurrently) two different ones at the same time
Instead of duplication, a Last-Writer-Wins behavior is expected
Apply the operations in timestamp order, and skip the operations that will produce and incorrect state

This means that some of the nodes will have to undo the skipped operations
Even more, operations will have to be undone and redone to thread received operations in timestamp-order
See
https://youtu.be/x7drE24geUw?t=2704

Data structures

Each move operation has

time
Timestamp (Globally unique, eg. Lamport timestamp)
parent
NodeId (Destination of move)
meta
Metadata (eg. filename within parent, etc)
child
NodeId (Subtree being moved)

Each log entry (for undo/redo) has

time
Timestamp
oldParent
NodeId? (null if not previously on tree, happened in remote nodes)
oldMeta
Metadata?
newParent
NodeId
newMeta
Metadata
child
NodeId

Reducing metadata overhead

If we give every character a unique identifier and some metadata, each char will go from 1 byte to ~100 bytes for metadata&ID