On Cassandra Collections, Updates, and Tombstones

Update

I was chatting with a user today who referenced this old post. Most of it is still relevant, but sstable2json is no longer supported in modern C*. The new tool is sstabledump. The two tools are pretty much equivalent, so you can replace sstable2json with sstabledump everywhere you see it here. The output may be formatted slightly differently, but that should not matter in substance.

Cassandra collections create tombstones?

Many new Cassandra users learn this the hard way: they choose collections for the wrong reasons and the wrong use cases, and then experience what is known as death by tombstones.

Update - To hear Luke Tillman, Patrick McFadin, and Eric Stevens talking about this post check out this video on Planet Cassandra! https://t.co/n9a6RFP5mP

TL;DR

When folks ask me whether they should use collections, here are my recommendations.

Why do cassandra developers choose collections?

Relational mindset:

It feels more natural (warm and fuzzy) to model one-to-many relationships if you don't have to denormalize tables. This is a very common reason, but not a great one.

Convenient reads:

Need to get a nested Java structure directly out of the query:

SELECT entitlements from entitlements_by_user WHERE … ;

Access the whole collection or parts of it based on query patterns:

SELECT * FROM entitlements_by_user WHERE entitlements CONTAINS 'App ABC';

Convenient writes:

Ability to do incremental updates or deletes:

UPDATE entitlements_by_user ... entitlements = entitlements + 'App ABC'
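
The snippets above are abbreviated, so here is a fuller sketch of the same read and write patterns. The post does not show the real table definition, so the schema below (a `user_id` key and a `set<text>` column in the `test` keyspace) is an assumption for illustration only. Two details worth noting: a `CONTAINS` query on a collection column needs a secondary index (or `ALLOW FILTERING`), and an incremental add to a set takes a set literal on the right-hand side.

```cql
-- Hypothetical schema: column names and types are assumed, not taken from the post.
CREATE TABLE test.entitlements_by_user (
    user_id text PRIMARY KEY,
    entitlements set<text>
);

-- CONTAINS on a collection column requires a secondary index (or ALLOW FILTERING).
CREATE INDEX ON test.entitlements_by_user (entitlements);

-- Incremental add: the right-hand side is a set literal, not a bare string.
UPDATE test.entitlements_by_user
   SET entitlements = entitlements + {'App ABC'}
 WHERE user_id = 'user1';

-- Find users that hold a given entitlement.
SELECT * FROM test.entitlements_by_user WHERE entitlements CONTAINS 'App ABC';
```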

This convenience does not come free:

Serialization and deserialization take time with maps because of the complex Java objects involved.
Non-incremental inserts and updates on non-frozen maps generate tombstones. Insert-heavy and update-heavy workloads are not collection friendly, and excessive tombstones significantly affect compaction performance.
Collections are not designed to hold more than tens of fields. Compactions and repairs will be slow if you abuse collections.

**Therefore: ensure you have a good use case for collections and that you understand their limitations.**
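
If the use case turns out not to fit, one common alternative is the denormalized form mentioned earlier: one row per element, with the element as a clustering column. This is a minimal sketch, not a recommendation for every workload. The table and column names (`entitlements_by_user_rows`, `user_id`, `entitlement`) are hypothetical.

```cql
-- Hypothetical denormalized table: one row per (user, entitlement) pair.
CREATE TABLE test.entitlements_by_user_rows (
    user_id text,
    entitlement text,
    PRIMARY KEY (user_id, entitlement)
);

-- Adding an entitlement is a plain insert; no collection is overwritten.
INSERT INTO test.entitlements_by_user_rows (user_id, entitlement)
VALUES ('user1', 'App ABC');

-- Read every entitlement for a user (a single-partition query).
SELECT entitlement FROM test.entitlements_by_user_rows WHERE user_id = 'user1';

-- Check for one specific entitlement without an index.
SELECT entitlement FROM test.entitlements_by_user_rows
 WHERE user_id = 'user1' AND entitlement = 'App ABC';
```

The trade-off is that the client assembles the list itself, and removing an entitlement is still a delete that writes its own tombstone.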

Details:

Here are some code examples and results that summarize what kinds of collections generate tombstones and which don't.

Let's create a table with a map and a frozen map.

cqlsh> CREATE TABLE test.map_test (
    a text PRIMARY KEY,
    b map<text, text>,
    c frozen<map<text, text>>
);

and add some data to each:

cqlsh> insert into test.map_test (a, b, c) VALUES ('a', { '1':'a' }, { '2': 'b' }) ;

cqlsh> select * from test.map_test ;

 a | b          | c
---+------------+------------
 a | {'1': 'a'} | {'2': 'b'}

Let's see what happened under the hood using sstable2json after flushing:

$ sstable2json test-map_test-ka-1-Data.db
[
{"key": "a",
 "cells": [["","",1458266095727275],
           ["b:_","b:!",1458266095727274,"t",1458266095],
           ["b:31","61",1458266095727275],
           ["c","0000000100000001320000000162",1458266095727275]]}
]

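Notice the t (tombstone) marker on b, and that c has none. An insert replaces the whole collection, and because Cassandra does not read before it writes, it first writes a tombstone covering any elements b might already hold. A frozen collection is stored as a whole in a single Cassandra cell, so the new value simply overwrites the old one and no tombstone is necessary.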

Now let's try an update

cqlsh> update test.map_test SET b = { '3': 'c'}, c = {'3':'c'} where a='a' ;

cqlsh> select * from test.map_test ;

 a | b          | c
---+------------+------------
 a | {'3': 'c'} | {'3': 'c'}

(1 rows)

After flushing we get a new sstable, also with a tombstone in b:

$ sstable2json test-map_test-ka-2-Data.db 
[
{"key": "a",
 "cells": [["b:_","b:!",1458266473158221,"t",1458266473],
           ["b:33","63",1458266473158222],
           ["c","0000000100000001330000000163",1458266473158222]]}
]

Does a compaction get rid of the tombstone?

$ nodetool compact

$ sstable2json test-map_test-ka-3-Data.db

[
{"key": "a",
 "cells": [["","",1458266095727275],
           ["b:_","b:!",1458266473158221,"t",1458266473],
           ["b:33","63",1458266473158222],
           ["c","0000000100000001330000000163",1458266473158222]]}
]

No! Remember, tombstones must be older than gc_grace_seconds AND meet the criteria in your tombstone compaction subproperties before they can be removed. This helps avoid zombie data.
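
For reference, both settings live on the table. The sketch below assumes Cassandra 3.0 or later for the `system_schema.tables` lookup (older versions keep schema metadata elsewhere), and every value shown is illustrative rather than a recommendation. The default gc_grace_seconds is 864000 (ten days). Lowering it below your repair interval is exactly how zombie data comes back.

```cql
-- Inspect the current setting (assumes Cassandra 3.0+ schema tables).
SELECT gc_grace_seconds
  FROM system_schema.tables
 WHERE keyspace_name = 'test' AND table_name = 'map_test';

-- Illustrative value only: make sure repairs complete within this window.
ALTER TABLE test.map_test WITH gc_grace_seconds = 86400;

-- Tombstone compaction subproperties are set through the compaction map.
-- The class and values here are examples, not tuned settings.
ALTER TABLE test.map_test WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': '0.2',
    'tombstone_compaction_interval': '86400'
};
```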

Now let's try an incremental update:

cqlsh> update test.map_test SET b = b + { '4': 'd'}, c = c + {'4':'d'} where a='a' ;

InvalidRequest: code=2200 [Invalid query] message="Invalid operation (c = c + {'4':'d'}) for frozen collection column c"

cqlsh> update test.map_test SET b = b + { '4': 'd'} where a='a' ;

$ sstable2json test-map_test-ka-4-Data.db
[
{"key": "a",
 "cells": [["b:34","64",1458266948817380]]}
]

Only the non-frozen collection supports this fancy kind of update. Notice that it did not produce a tombstone. Tombstones only happen for inserts and non-incremental updates on non-frozen collections.
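
Besides appending with `b = b + {...}`, individual entries of a non-frozen map can be addressed by key. This is a minimal sketch against the test.map_test table used above. The keys and values are made up, and the comments describe what I would expect each statement to write, not verified sstable output.

```cql
-- Set a single entry by key: writes one cell and does not overwrite the collection.
UPDATE test.map_test SET b['5'] = 'e' WHERE a = 'a';

-- Remove a single entry: this is a delete, so expect a tombstone for that one cell.
DELETE b['4'] FROM test.map_test WHERE a = 'a';

-- A frozen map has no element-level operations; replace the whole value instead.
UPDATE test.map_test SET c = {'3': 'c', '4': 'd'} WHERE a = 'a';
```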