Lessons from >100 Startups

Cassandra Summit 2015


Lessons from >100 Startups

Contents

  • Program Overview
  • Getting started problems and how to solve them
  • Tuning Cassandra and DSE - what levers to pull when
  • Equip yourself for success

DataStax Startup Program Overview

Overview

  • Unlimited, FREE DataStax Enterprise.
  • No node limit. No hidden restrictions.
  • If you’re a startup with less than $3M in annual revenue and less than $30M in capital raised, this is for you!

DataStax Startup Program Overview

Perks

  • Discounts on support and services
  • Discounts with partners
  • Marketing opportunities
  • Free tech help!

Join over 500 startups worldwide. Apply now at datastax.com/startups

DataStax Startup Program Overview

Stats about the program

  • Over 600 startups signed up
  • From more than 20 market verticals
  • From more than 50 countries

Lessons from >100 Startups

Starting Cassandra/DSE

Cannot access cqlsh

Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})

Is the process running?

ps -ef|grep dse
ps -ef|grep cassandra

If you restart the process (sudo service dse restart or dse cassandra) and it still doesn’t appear in your process list, it did not come up successfully, so check your system.log. Common causes include:

  • Another process is using one of your ports
  • Bad yaml (snake error)
  • Bad cassandra-env.sh (incorrect jamm version?)
  • Bad upgrade, wrong libs (try apt-get purge)
  • Permissions issues on the data or commitlog dirs (usually on tarball or .run deployments)
  • Error reading the commitlog
  • Super old sstables - only sstables one version back are supported
  • Loading a search core times out - increase the timeout, increase cores per core; see the section on search tuning
  • Replacing a node? (system tables aren’t in the data directory but the node has already bootstrapped)
  • Can’t start without JNA? CASSANDRA-6575

There is one scenario where the process won’t start and you won’t see anything in your logs!

Is your disk full?

$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/xvda1     848321208 848321208         0 100% /
udev             3806788         8   3806780   1% /dev
tmpfs            1525892       212   1525680   1% /run
none                5120         0      5120   0% /run/lock
none             3814728       112   3814616   1% /run/shm
/dev/xvdb      433455904    203012 411234588   1% /mnt/ephemeral
cgroup           3814728         0   3814728   0% /sys/fs/cgroup
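The disk-full check is easy to script as well; a minimal sketch using Python's standard library (the path and threshold below are illustrative):

```python
import shutil

def disk_full(path="/", threshold=0.95):
    """Return True if the filesystem holding `path` is above `threshold` used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= threshold

if __name__ == "__main__":
    print("disk full!" if disk_full() else "disk ok")
```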

Why is my commitlog so big?

DSE is up but I still can’t connect!

1) Why all these addresses?

  • Listen Address - Node to node communication
  • RPC Address - Client to node communication
  • Broadcast Address - Address advertised by c*
  • Broadcast RPC Address (new)- RPC Address advertised by c*

2) Firewall (OS and Cloud security groups)?

Telnet is your friend.
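If telnet isn’t installed on the box, a few lines of Python do the same reachability check (the host and port in the usage comment are examples):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. check the CQL native transport port from a client machine:
# port_open("10.0.0.5", 9042)
```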

Pro tip - check which interfaces your listen and RPC addresses are actually bound to

3) Is your auth keyspace replicated?

4) SSL setup? Cassandra and DSE security can be tricky to set up. Use this shortcut (currently a work in progress).

Avoid cryptic errors by checking:

  • ulimit settings
  • device block size
  • zone_reclaim on NUMA systems
  • NTP (use it!)
  • swap - turn it off! dstat -s 10 will show if it’s not off
  • etc.

Use the preflight check in DSE to check for these things
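The spirit of that preflight check can be sketched in a few lines; this is a hypothetical helper, not DSE's actual tool, and the thresholds are illustrative:

```python
import resource

def preflight_warnings(min_nofile=100000):
    """Return a list of warnings for common misconfigurations (partial sketch)."""
    warnings = []
    # Open-file limit: Cassandra needs a high ulimit -n
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < min_nofile:
        warnings.append(f"ulimit -n is {soft}; raise it to at least {min_nofile}")
    # Swap: the file lists a header line plus one line per active swap device
    try:
        with open("/proc/swaps") as f:
            if len(f.readlines()) > 1:
                warnings.append("swap is enabled; turn it off for Cassandra")
    except FileNotFoundError:
        pass  # not Linux; skip the swap check
    return warnings
```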

Lessons from >100 Startups

DSE has a lot of configurables; which ones should you use, and when?

  • Compaction levers
  • Tombstone levers
  • Bootstrap levers
  • DSE Search levers
  • DSE Analytics levers

Compaction levers

Goal: Ensure you don’t fall behind on compactions (a growing pending count in nodetool compactionstats) while also minimizing the impact of compactions on reads and writes

Concurrent compactors:

export target_compactors=2
wget https://jmxsh.googlecode.com/files/jmxsh-R5.jar
wget https://jmxsh.googlecode.com/files/jmxsh
echo jmx_set -m org.apache.cassandra.db:type=CompactionManager CoreCompactorThreads $target_compactors > changeCoreCompactors.sh
echo jmx_set -m org.apache.cassandra.db:type=CompactionManager MaximumCompactorThreads $target_compactors > changeMaxCompactors.sh
java -jar jmxsh-R5.jar -h localhost -p 7199 -q changeCoreCompactors.sh
java -jar jmxsh-R5.jar -h localhost -p 7199 -q changeMaxCompactors.sh

Compaction throttling: nodetool getcompactionthroughput shows the current limit (default 16 MB/s).

Change it with nodetool setcompactionthroughput or compaction_throughput_mb_per_sec in cassandra.yaml
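To tell whether you are falling behind, watch the pending-task count over time rather than alerting on a single sample; a small sketch that parses nodetool compactionstats output (the "pending tasks:" line format is assumed from the 2.x tool):

```python
import re
import subprocess

def pending_compactions(stats_text=None):
    """Parse the 'pending tasks' count from `nodetool compactionstats` output."""
    if stats_text is None:
        # Shell out to nodetool on the local node
        stats_text = subprocess.check_output(
            ["nodetool", "compactionstats"]).decode()
    match = re.search(r"pending tasks:\s*(\d+)", stats_text)
    return int(match.group(1)) if match else 0

# Alert if the backlog keeps growing between samples, not on any one value.
```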

Compaction levers

Use the right compaction strategy for your use case:

  • Size tiered compaction
  • Leveled compaction
  • Date tiered compaction

Tombstone levers

Goal: Increase read performance and reclaim disk while avoiding zombie data. Reminder- tombstones come from deletes or from updating collections.

1) gc_grace_seconds - can be decreased to make tombstones eligible for deletion sooner. Remember to run repairs more often than gc_grace_seconds (configurable per table) or risk zombie data.

2) Compaction subproperties:

  • tombstone_compaction_interval - minimum number of days after an sstable is created before it becomes eligible for tombstone compaction
  • tombstone_threshold - ratio of tombstones to columns in an sstable for it to be considered for tombstone compaction
  • unchecked_tombstone_compaction - run single-sstable tombstone compactions without checking whether other sstables overlap the tombstone ranges

3) Confirm using nodetool cfstats and sstablemetadata <sstable filenames>
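As a worked example of the timing: a tombstone written at time t becomes purgeable only at t + gc_grace_seconds, and repairs must complete inside that window (values illustrative):

```python
GC_GRACE_SECONDS = 864000  # cassandra.yaml default: 10 days

def purgeable_at(deletion_time, gc_grace_seconds=GC_GRACE_SECONDS):
    """Unix time after which a tombstone written at deletion_time may be compacted away."""
    return deletion_time + gc_grace_seconds

def repair_interval_safe(repair_every_seconds, gc_grace_seconds=GC_GRACE_SECONDS):
    """Repairs must run more often than gc_grace_seconds or deleted data can resurrect."""
    return repair_every_seconds < gc_grace_seconds

# With the 10-day default, weekly repairs are safe; biweekly repairs are not.
```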

Emergency tactic: Reading a row with millions of tombstones can hurt cluster performance. Drop the row if you can find it.

1) Find the most frequently opened file:

wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/opensnoop
chmod +x opensnoop
echo 'ctrl-c after a while' && sudo ./opensnoop|grep db > files.txt
cat files.txt|awk '{ print $4 }'|sort|uniq -c|sort -n

2) Confirm that it’s the culprit:

sstablemetadata <filename>

3) Find the bad row (adapted from Russ Bradberry’s code; rows here is the parsed JSON output of sstable2json):

from collections import Counter

keys = Counter()
for row in rows:
    for cell in row.get('cells', []):
        # a cell whose fourth field is 't' is a tombstone
        if len(cell) > 3 and cell[3] == 't':
            keys[row['key']] += 1

4) Drop the row and watch your cluster get better. Tweet about it

Levers for bootstraps

  • streaming_socket_timeout_in_ms CASSANDRA-8611
  • streaming throttle - nodetool getstreamthroughput nodetool setstreamthroughput
  • deal with the compactions that result from streaming

Contingency strategy:

Restart with auto_bootstrap: false, then repair your way to consistency.

Levers for bootstraps

"Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true"

Consistent range movements edge case CASSANDRA-2434 and CASSANDRA-7069

I need to grow my cluster fast, what do I do?

  • You can safely bootstrap two nodes at a time if they’re on the same rack. This will not trigger the error above because no two nodes are moving for the same token ranges.
  • If you are willing to accept the possibility of stale reads, turn off the check in your env.sh:
JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"

Levers for DSE Search

1) Search monitoring

2) Indexing performance

3) Query time performance

This is also a sizing conversation, indexing performance scales with CPU cores and query time performance scales with RAM.

Levers for DSE Search

What to monitor?

Levers for DSE Search

What to tune for indexing perf?

  • The main lever is the soft autocommit: the minimum amount of time that passes before newly written data becomes visible to queries. Given the right hardware, it can probably be set to 250 ms to hit a 500 ms SLA.
  • The next most important lever is cores_per_core (which could have been named better): the number of CPU cores that can be used per Solr core. You can usually set this to the machine’s core count to maximize indexing throughput, since indexing throughput scales with cores.
  • The backpressure threshold will become more important once you start putting significant load on the cluster.
  • Don’t forget to set the ramBuffer to 2 GB per the docs when you turn on RT indexing.

Levers for DSE Search

What to tune for query perf?

  • Use fq not q, leverage the filter cache
  • Minimize your index to fit it into page cache

1) Turn off term vector information if you’re not using highlighting or other functionality that relies on it:

  • termVectors="false"
  • termPositions="false"
  • termOffsets="false"

2) Turn on omitNorms if you’re not using boosts:

  • omitNorms="true"

3) Only index fields you intend to search - you don’t have to index all your fields.

From what I’ve seen, term vectors and norms can be a substantial percentage of your index (~50%).

Levers for DSE Search

What about the JVM?

In DSE Search, solr and cassandra run in the same JVM. If you’re running RT, you should expect significant heap pressure from search indexing.

Use 20 GB heaps with G1GC configured; G1 is almost as good as a perfectly tuned CMS, and you don’t have to know the black magic of CMS.

G1GC levers:

  • Don’t set a new gen size (let G1 size it dynamically)
  • Max pause time (-XX:MaxGCPauseMillis) - set this to 1000 ms or so; setting it too low kills your throughput. The trade-off is throughput vs. outliers.

Levers for DSE Search

Common reason for OOM - don’t abuse dynamic fields

Want to find out if you are abusing dynamic fields? Check Luke.

Consider a Solr-side join for this scenario

Levers for DSE Analytics

Read Russ’s blog posts.

Read Russ’s blog posts.

Read Russ’s blog posts.

Levers for DSE Analytics

Levers for stability and for running tasks consistently:

  • spark.executor.memory
  • spark.cores.max

Levers for DSE Analytics

For reads:

  • input.split.size_in_mb
  • spark.cassandra.input.page.row.size

Node locality:

  • spark.locality.wait

Introspect partitions and preferred locations:

val rdd = sc.cassandraTable("test_ks","test_table")
rdd.partitions.foreach(part => println (rdd.getPreferredLocations(part)))

Levers for DSE Analytics

For writes:

  • spark.cassandra.output.batch.size.rows
  • spark.cassandra.output.concurrent.writes
  • spark.cassandra.output.throughput_mb_per_sec
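The output throttle is applied per core, so the effective cluster-wide ceiling is the per-core value times your total executor cores (per-core semantics as I understand the connector's throttle); a quick sanity-check calculation:

```python
def max_write_throughput_mb_s(throughput_mb_per_core, total_executor_cores):
    """Rough cluster-wide write ceiling when the connector throttle is per core."""
    return throughput_mb_per_core * total_executor_cores

# e.g. a 5 MB/s throttle across 16 executor cores caps writes near 80 MB/s
```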

Levers for DSE Analytics

Where operations should be in your chain of RDD operations, earliest to latest:

  • Cassandra RDD specific: where, select
  • Filters on the Spark side: filter, sample
  • Independent transforms: map, mapByPartition
  • Per-partition combinable transforms: keyBy, reduceByKey, aggregateByKey, groupByKey
  • Full shuffle operations: join, sort, shuffle
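The ordering rule boils down to "shrink the data before you shuffle it"; a plain-Python analogue (not Spark, purely illustrative) of why filtering before a group-by touches far fewer records:

```python
from itertools import groupby

# Toy dataset: (region, value) pairs standing in for rows in an RDD
events = [("us", 1), ("eu", 2), ("us", 3), ("ap", 4), ("eu", 5)] * 1000

# Filter first, then group: the expensive sort/group sees only matching rows.
filtered = [e for e in events if e[0] == "us"]
grouped = {k: [v for _, v in g]
           for k, g in groupby(sorted(filtered), key=lambda e: e[0])}

print(len(filtered), "rows reached the group-by instead of", len(events))
```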

Levers for DSE Analytics

Other performance wins:

  • Avoid shuffles by using the SpanBy methods in the connector
  • Use repartition and joinWithCassandraTable for predicate pushdown of many PK combinations
  • Post collect operations happen at the driver

Levers for DSE Analytics

Spark 1.4 (which ships with DSE 4.8) gives us:

  • Better monitoring capabilities in the Spark UI SPARK-6418 Kay’s original app
  • DataFrames - make Spark usable from Python
  • job server - "I don’t know how anybody uses Spark without Job Server" — Ilya

Equip yourself for success

Historical monitoring

Configure opscenter

Import / Export feature blog post: www.datastax.com/dev/blog/opscenter-5-2-dashboard-importexport-labs-feature

Real-time monitoring

  • dstat -rvn 10 - OS statistics at 10-second intervals
  • ttop and gc stats from sjk-plus:

wget https://bintray.com/artifact/download/aragozin/generic/sjk-plus-0.3.6.jar
java -jar sjk-plus-0.3.6.jar ttop -s localhost:7199 -n 30 -o CPU
java -jar sjk-plus-0.3.6.jar gc -s localhost:7199

  • nodetool cfhistograms, nodetool proxyhistograms, nodetool cfstats, and the sstablemetadata and sstable2json utilities
  • Brendan Gregg’s perf-tools
  • Al Tobey’s pcstat

Benchmark your datamodel

  • Jake’s blog on 2.1 stress: www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
  • My data modeler tool

Get your main table to scale and perform

Baseline and predictability for when to add nodes

Take your app out of the equation

Leverage your resources

  • Stack overflow
  • Sign up for the startup program
  • Go through our free trainings
  • Use the driver mailinglists
  • Participate in meetups and talk to each other

Lessons from >100 Startups

THANK YOU

  • Startup Customers
  • Startup Team
  • DataStax field, engineering, training, and docs teams
