Lessons from >100 Startups

Cassandra Summit 2015


Lessons from >100 Startups

Contents

  • Program Overview
  • Getting started problems and how to solve them
  • Tuning Cassandra and DSE - what levers to pull when
  • Equip yourself for success

DataStax Startup Program Overview

Overview

  • Unlimited, FREE DataStax Enterprise.
  • No node limit. No hidden restrictions.
  • If you’re a startup with less than $3M in annual revenue and less than $30M in capital raised, this is for you!

DataStax Startup Program Overview

Perks

  • Discounts on support and services
  • Discounts with partners
  • Marketing opportunities
  • Free tech help!

Join over 500 startups worldwide. Apply now at datastax.com/startups

DataStax Startup Program Overview

Stats about the program

  • Over 600 startups signed up
  • From more than 20 market verticals
  • From more than 50 countries

Lessons from >100 Startups

Starting Cassandra/DSE

Cannot access cqlsh

Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})

Is the process running?

ps -ef|grep dse
ps -ef|grep cassandra

If you restart the process (sudo service dse restart or dse cassandra) and it still doesn’t appear in your process list, it did not come up successfully, so check your system.log. Common causes include:

  • Another process is using one of your ports
  • Bad yaml (snake error)
  • Bad cassandra-env.sh (incorrect jamm version?)
  • Bad upgrade, wrong libs (try apt-get purge)
  • Permissions issues on the data or commitlog dirs (usually on tarball or .run deployments)
  • Error reading the commitlog
  • Super old sstables - only sstables one version back are supported
  • Loading a search core times out - increase the timeout, increase cores per core; see the section on search tuning
  • Replacing a node? (system tables aren’t in the data directory but the node has already bootstrapped)
  • Can’t start without JNA? CASSANDRA-6575

There is one scenario where the process won’t start and you won’t see anything in your logs!

Is your disk full?

$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/xvda1     848321208 848321208         0 100% /
udev             3806788         8   3806780   1% /dev
tmpfs            1525892       212   1525680   1% /run
none                5120         0      5120   0% /run/lock
none             3814728       112   3814616   1% /run/shm
/dev/xvdb      433455904    203012 411234588   1% /mnt/ephemeral
cgroup           3814728         0   3814728   0% /sys/fs/cgroup
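The disk-full check is easy to script as well; a minimal sketch using Python's standard library (the path and threshold below are illustrative):

```python
import shutil

def disk_full(path="/", threshold=0.95):
    """Return True if the filesystem holding `path` is above `threshold` used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= threshold

if __name__ == "__main__":
    print("disk full!" if disk_full() else "disk ok")
```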

Why is my commitlog so big?

DSE is up but I still can’t connect!

1) Why all these addresses?

  • Listen Address - Node to node communication
  • RPC Address - Client to node communication
  • Broadcast Address - Address advertised by c*
  • Broadcast RPC Address (new)- RPC Address advertised by c*

2) Firewall (OS and Cloud security groups)?

Telnet is your friend.
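If telnet isn’t installed on the box, a few lines of Python do the same reachability check (the host and port in the usage comment are examples):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. check the CQL native transport port from a client machine:
# port_open("10.0.0.5", 9042)
```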

Pro tip - check which interfaces your listen and RPC addresses are actually bound to

3) Is your auth keyspace replicated?

4) SSL setup? Cassandra and DSE security can be tricky to set up. Use this shortcut (currently a work in progress).

Avoid cryptic errors by checking:

  • ulimit settings
  • device block size
  • zone_reclaim on NUMA systems
  • NTP (use it!)
  • swap - turn it off! dstat -s 10 will show if it’s not off
  • etc.

Use the preflight check in DSE to check for these things
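The spirit of that preflight check can be sketched in a few lines; this is a hypothetical helper, not DSE's actual tool, and the thresholds are illustrative:

```python
import resource

def preflight_warnings(min_nofile=100000):
    """Return a list of warnings for common misconfigurations (partial sketch)."""
    warnings = []
    # Open-file limit: Cassandra needs a high ulimit -n
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < min_nofile:
        warnings.append(f"ulimit -n is {soft}; raise it to at least {min_nofile}")
    # Swap: the file lists a header line plus one line per active swap device
    try:
        with open("/proc/swaps") as f:
            if len(f.readlines()) > 1:
                warnings.append("swap is enabled; turn it off for Cassandra")
    except FileNotFoundError:
        pass  # not Linux; skip the swap check
    return warnings
```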

Lessons from >100 Startups

DSE has a lot of configurables; which ones should you use, and when?

  • Compaction levers
  • Tombstone levers
  • Bootstrap levers
  • DSE Search levers
  • DSE Analytics levers

Compaction levers

Goal: Ensure you don’t fall behind on compactions (a growing pending count in nodetool compactionstats) while also minimizing the impact of compactions on reads and writes

Concurrent compactors:

export target_compactors=2
wget https://jmxsh.googlecode.com/files/jmxsh-R5.jar
wget https://jmxsh.googlecode.com/files/jmxsh
echo jmx_set -m org.apache.cassandra.db:type=CompactionManager CoreCompactorThreads $target_compactors > changeCoreCompactors.sh
echo jmx_set -m org.apache.cassandra.db:type=CompactionManager MaximumCompactorThreads $target_compactors > changeMaxCompactors.sh
java -jar jmxsh-R5.jar -h localhost -p 7199 -q changeCoreCompactors.sh
java -jar jmxsh-R5.jar -h localhost -p 7199 -q changeMaxCompactors.sh

Compaction throttling: nodetool getcompactionthroughput shows the current limit (default 16 MB/s).

Change it with nodetool setcompactionthroughput or compaction_throughput_mb_per_sec in cassandra.yaml
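To tell whether you are falling behind, watch the pending-task count over time rather than alerting on a single sample; a small sketch that parses nodetool compactionstats output (the "pending tasks:" line format is assumed from the 2.x tool):

```python
import re
import subprocess

def pending_compactions(stats_text=None):
    """Parse the 'pending tasks' count from `nodetool compactionstats` output."""
    if stats_text is None:
        # Shell out to nodetool on the local node
        stats_text = subprocess.check_output(
            ["nodetool", "compactionstats"]).decode()
    match = re.search(r"pending tasks:\s*(\d+)", stats_text)
    return int(match.group(1)) if match else 0

# Alert if the backlog keeps growing between samples, not on any one value.
```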

Compaction levers

Use the right compaction strategy for your use case:

  • Size tiered compaction
  • Leveled compaction
  • Date tiered compaction

Tombstone levers

Goal: Increase read performance and reclaim disk while avoiding zombie data. Reminder- tombstones come from deletes or from updating collections.

1) gc_grace_seconds - can be decreased to make tombstones eligible for deletion sooner. Remember to run repairs more often than gc_grace_seconds (configurable per table) or risk zombie data.

2) Compaction subproperties:

  • tombstone_compaction_interval - minimum number of days after an sstable is created before it becomes eligible for tombstone compaction
  • tombstone_threshold - ratio of tombstones to columns in an sstable for it to be considered for tombstone compaction
  • unchecked_tombstone_compaction - run single-sstable tombstone compactions without checking whether other sstables overlap the tombstone ranges

3) Confirm using nodetool cfstats and sstablemetadata <sstable filenames>
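As a worked example of the timing: a tombstone written at time t becomes purgeable only at t + gc_grace_seconds, and repairs must complete inside that window (values illustrative):

```python
GC_GRACE_SECONDS = 864000  # cassandra.yaml default: 10 days

def purgeable_at(deletion_time, gc_grace_seconds=GC_GRACE_SECONDS):
    """Unix time after which a tombstone written at deletion_time may be compacted away."""
    return deletion_time + gc_grace_seconds

def repair_interval_safe(repair_every_seconds, gc_grace_seconds=GC_GRACE_SECONDS):
    """Repairs must run more often than gc_grace_seconds or deleted data can resurrect."""
    return repair_every_seconds < gc_grace_seconds

# With the 10-day default, weekly repairs are safe; biweekly repairs are not.
```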

Emergency tactic: Reading a row with millions of tombstones can hurt cluster performance. Drop the row if you can find it.

1) Find the most frequently opened file:

wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/opensnoop
chmod +x opensnoop
echo 'ctrl-c after a while' && sudo ./opensnoop|grep db > files.txt
cat files.txt|awk '{ print $4 }'|sort|uniq -c|sort -n

2) Confirm that it’s the culprit:

sstablemetadata <filename>

3) Find the bad row (adapted from Russ Bradberry’s code; rows here is the parsed JSON output of sstable2json):

from collections import Counter

keys = Counter()
for row in rows:
    for cell in row.get('cells', []):
        # a cell whose fourth field is 't' is a tombstone
        if len(cell) > 3 and cell[3] == 't':
            keys[row['key']] += 1

4) Drop the row and watch your cluster get better. Tweet about it

Levers for bootstraps

  • streaming_socket_timeout_in_ms CASSANDRA-8611
  • streaming throttle - nodetool getstreamthroughput nodetool setstreamthroughput
  • deal with the compactions that result from streaming

Contingency strategy:

Restart with auto_bootstrap: false, then repair your way to consistency.

Levers for bootstraps

"Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true"

Consistent range movements edge case CASSANDRA-2434 and CASSANDRA-7069

I need to grow my cluster fast, what do I do?

  • You can safely bootstrap two nodes at a time if they’re on the same rack. This will not trigger the error above because no two nodes are moving for the same token ranges.
  • If you are willing to accept the possibility of stale reads, turn off the check in your env.sh:
JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"

Levers for DSE Search

1) Search monitoring

2) Indexing performance

3) Query time performance

This is also a sizing conversation, indexing performance scales with CPU cores and query time performance scales with RAM.

Levers for DSE Search

What to monitor?

Levers for DSE Search

What to tune for indexing perf?

  • The main lever is the soft autocommit: the minimum amount of time that passes before newly written data becomes visible to queries. Given the right hardware, it can probably be set to 250 ms to hit a 500 ms SLA.
  • The next most important lever is cores_per_core (which could have been named better): the number of CPU cores that can be used per Solr core. You can usually set this to the machine’s core count to maximize indexing throughput, since indexing throughput scales with cores.
  • The backpressure threshold will become more important once you start putting significant load on the cluster.
  • Don’t forget to set the ramBuffer to 2 GB per the docs when you turn on RT indexing.

Levers for DSE Search

What to tune for query perf?

  • Use fq not q, leverage the filter cache
  • Minimize your index to fit it into page cache

1) Turn off term vector information if you’re not using highlighting or other functionality that relies on it:

  • termVectors="false"
  • termPositions="false"
  • termOffsets="false"

2) Turn on omitNorms if you’re not using boosts:

  • omitNorms="true"

3) Only index fields you intend to search - you don’t have to index all your fields.

From what I’ve seen, term vectors and norms can be a substantial percentage of your index (~50%).

Levers for DSE Search

What about the JVM?

In DSE Search, solr and cassandra run in the same JVM. If you’re running RT, you should expect significant heap pressure from search indexing.

Use 20 GB heaps with G1GC configured; G1 is almost as good as a perfectly tuned CMS, and you don’t have to know the black magic of CMS.

G1GC levers:

  • Don’t set a new gen size (let G1 size it dynamically)
  • Max pause time (-XX:MaxGCPauseMillis) - set this to 1000 ms or so; setting it too low kills your throughput. The trade-off is throughput vs. outliers.

Levers for DSE Search

Common reason for OOM - don’t abuse dynamic fields

Want to find out if you are abusing dynamic fields? Check Luke.

Consider a Solr-side join for this scenario

Levers for DSE Analytics

Read Russ’s blog posts.

Read Russ’s blog posts.

Read Russ’s blog posts.

Levers for DSE Analytics

Levers for stability and for running tasks consistently:

  • spark.executor.memory
  • spark.cores.max

Levers for DSE Analytics

For reads:

  • input.split.size_in_mb
  • spark.cassandra.input.page.row.size

Node locality:

  • spark.locality.wait

Introspect partitions and preferred locations:

val rdd = sc.cassandraTable("test_ks","test_table")
rdd.partitions.foreach(part => println (rdd.getPreferredLocations(part)))

Levers for DSE Analytics

For writes:

  • spark.cassandra.output.batch.size.rows
  • spark.cassandra.output.concurrent.writes
  • spark.cassandra.output.throughput_mb_per_sec
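The output throttle is applied per core, so the effective cluster-wide ceiling is the per-core value times your total executor cores (per-core semantics as I understand the connector's throttle); a quick sanity-check calculation:

```python
def max_write_throughput_mb_s(throughput_mb_per_core, total_executor_cores):
    """Rough cluster-wide write ceiling when the connector throttle is per core."""
    return throughput_mb_per_core * total_executor_cores

# e.g. a 5 MB/s throttle across 16 executor cores caps writes near 80 MB/s
```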

Levers for DSE Analytics

Where operations should be in your chain of RDD operations, earliest to latest:

  • Cassandra RDD specific: where, select
  • Filters on the Spark side: filter, sample
  • Independent transforms: map, mapByPartition
  • Per-partition combinable transforms: keyBy, reduceByKey, aggregateByKey, groupByKey
  • Full shuffle operations: join, sort, shuffle
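The ordering rule boils down to "shrink the data before you shuffle it"; a plain-Python analogue (not Spark, purely illustrative) of why filtering before a group-by touches far fewer records:

```python
from itertools import groupby

# Toy dataset: (region, value) pairs standing in for rows in an RDD
events = [("us", 1), ("eu", 2), ("us", 3), ("ap", 4), ("eu", 5)] * 1000

# Filter first, then group: the expensive sort/group sees only matching rows.
filtered = [e for e in events if e[0] == "us"]
grouped = {k: [v for _, v in g]
           for k, g in groupby(sorted(filtered), key=lambda e: e[0])}

print(len(filtered), "rows reached the group-by instead of", len(events))
```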

Levers for DSE Analytics

Other performance wins:

  • Avoid shuffles by using the SpanBy methods in the connector
  • Use repartition and joinWithCassandraTable for predicate pushdown of many PK combinations
  • Post collect operations happen at the driver

Levers for DSE Analytics

Spark 1.4 (which ships with DSE 4.8) gives us:

  • Better monitoring capabilities in the Spark UI SPARK-6418 Kay’s original app
  • DataFrames - make Spark usable from Python
  • job server - "I don’t know how anybody uses Spark without Job Server" — Ilya

Equip yourself for success

Historical monitoring

Configure opscenter

Import / Export feature blog post: www.datastax.com/dev/blog/opscenter-5-2-dashboard-importexport-labs-feature

Real-time monitoring

  • dstat -rvn 10 - OS statistics at 10-second intervals
  • ttop and gc stats from sjk-plus:

wget https://bintray.com/artifact/download/aragozin/generic/sjk-plus-0.3.6.jar
java -jar sjk-plus-0.3.6.jar ttop -s localhost:7199 -n 30 -o CPU
java -jar sjk-plus-0.3.6.jar gc -s localhost:7199

  • nodetool cfhistograms, nodetool proxyhistograms, nodetool cfstats, and the sstablemetadata and sstable2json utilities
  • Brendan Gregg’s perf-tools
  • Al Tobey’s pcstat

Benchmark your datamodel

  • Jake’s blog on 2.1 stress: www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
  • My data modeler tool

Get your main table to scale and perform

Baseline and predictability for when to add nodes

Take your app out of the equation

Leverage your resources

  • Stack overflow
  • Sign up for the startup program
  • Go through our free trainings
  • Use the driver mailinglists
  • Participate in meetups and talk to each other

Lessons from >100 Startups

THANK YOU

  • Startup Customers
  • Startup Team
  • DataStax field, engineering, training, and docs teams
