Tuning DSE Search - Indexing latency and query latency

Introduction

DSE offers out of the box search indexing for your Cassandra data. The days of double writes or ETL's between separate DBMS and Search clusters are gone.

I have my cql table, I execute the following API call, and (boom) my cassandra data is available for:

full text/fuzzy search
ad hoc lucene secondary index powered filtering, and
geospatial search

Here is my API call:

$ bin/dsetool create_core <keyspace>.<table> generateResources=true reindex=true

or if you prefer curl (or are using basic auth) use the following:

$ curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=<keyspace>.<table>&generateResources=true"

Rejoice! we are in inverted index, single cluster, operational simplicity bliss!

The remainder of this post will be focused on advanced tuning for DSE Search both for a) search indexing latency (the time it takes for data to be searchable after it has been inserted through cql), and b) search query latency (timings for your search requests).

Indexing latency

In this section I'll talk about the kinds of things we can do in order to

instrument and monitor DSE Search indexing and
tune indexing for lower latencies and increased performance

Note: DSE Search ships with Real Time (RT) indexing which will give you faster indexing with 4.7.3, especially when it comes to the tails of your latency distribution. Here's one of our performance tests. It shows you real time vs near-real time indexing as of 4.7.0:

indexing chart

Perhaps more importantly, as you get machines with more cores, you can continue to increase your indexing performance linearly:
rt vs nrt

Be aware, however, that you should only run one RT search core per cluster since it is significantly more resource hungry than near-real time (NRT).

Side note on GC: Because solr and cassandra run on the same JVM in DSE Search and the indexing process generates a lot of java objects, running Search requires a larger JVM Heap. When running traditional CMS, we recommend a 14gb heap with about 2gb new gen. Consider the Stump's CASSANDRA-8150 settings when running search with CMS. G1GC has been found to perform quite well with search workloads, I personally run with a 25gb heap (do not set new gen with G1, the whole point of G1 is that it sets it itself based on your workload!) and gc_pause_ms at about 1000 (go higher for higher throughput or lower to minimize latencies / p99's; don't go below 500). Update (thanks mc) you configure this setting in cassandra-env.sh.

1) Instrumentation

Index Pool Stats:

DSE Search parallelizes the indexing process and allocates work to a thread pool for indexing of your data.

Using JMX, you can see statistics on your indexing threadpool depth, completion, timings, and whether backpressure is active.

This is important because if your indexing queues get too deep, we risk having too much heap pressure => OOM's. Backpressure will throttle commits and eventually load shed if search can't keep up with an indexing workload. Backpressure gets triggered when the queues get too large.

The mbean is called:

com.datastax.bdp.search.<keyspace>.<table>.IndexPool

Indexing queues

Commit/Update Stats:

You can also see statistics on indexing performance (in microseconds) based on the particular stage of the indexing process for both commits and updates.

Commit:

The stages are:

FLUSH - Comprising the time spent by flushing the async indexing queue.

EXECUTE - Comprising the time spent by actually executing the commit on the index.

The mbean is called:

com.datastax.bdp.search.<keyspace>.<table>.CommitMetrics

Update:

The stages are:

WRITE - Comprising the time spent to convert the Solr document and write it into Cassandra (only available when indexing via the Solrj HTTP APIs). If you're using cql this will be 0.
QUEUE - Comprising the time spent by the index update task into the index pool.
PREPARE- Comprising the time spent preparing the actual index update.
EXECUTE - Comprising the time spent to actually executing the index update on Lucene.

The mbean is:

`com.datastax.bdp.search.<keyspace>.<table>.UpdateMetrics`

indexing stats

Here, the average latency for the QUEUE stage of the update is 767 micros. See our docs for more details on the metrics mbeans and their stages.

2) Tuning

Almost everything in c* and DSE is configurable. Here's the key levers to get you better search indexing performance. Based on what you see in your instrumentation you can tune accordingly.

The main lever is soft autocommit, that's the minimum amount of time that will go by before queries are available for search. With RT we can set it to 250 ms or even as low as 100ms--given the right hardware. Tune this based on your SLA's.

The next most important lever is concurrency per core (or max_solr_concurrency_per_core). You can usually set this to number of CPU cores available to maximize indexing throughput.

Backpressure threshold will become more important as your load increases. Larger boxes can handle higher bp thresholds.

Don't forget to set up the ramBuffer to 2gb per the docs when you turn on RT indexing.

Query Latency

Now, I'll go over how we can monitor query performance in DSE Search, identify issues, and some of the tips / tricks we can use to improve search query performance. I will cover how to:

instrument and monitor DSE Search indexing and
tune indexing for lower latencies and increased performance.

Simliar to how search indexing performance scales with CPU's, search query performance scales with RAM. Keeping your search indexes in OS page cache is the biggest thing you can do to minimize latencies; so scale deliberately!

1) Instrumentation

There are multiple tools available for monitoring search performance.

OpsCenter:

OpsCenter supports a few search metrics that can be configured per node, datacenter, and solr core:

search latencies
search requests
index size
search timeouts
search errors

opscenter

Metrics mbeans:

In the same way that indexing has performance metrics, DSE Search query performance metrics are available through JMX and can be useful for troubleshooting perofrmance issues. We can use the query.name parameter in your DSE Serch queries to capture metrics for specifically tagged queries.

Query:

The stages are:

COORDINATE - Comprises the total amount of time spent by the coordinator node to distribute the query and gather/process results from shards. This value is computed only on query coordinator nodes.

EXECUTE - Comprises the time spent by a single shard to execute the actual index query. This value is computed on the local node executing the shard query.

RETRIEVE - Comprises the time spent by a single shard to retrieve the actual data from Cassandra. This value will be computed on the local node hosting the requested data.

The mbean is:

com.datastax.bdp.search.<keyspace>.<table>.QueryMetrics

Query Tracing:

When using solr_query via cql, query tracing can provide useful information as to where a particular query spent time in the cluster.

Query tracing is available in cqlsh tracing on, in devcenter (in the tab at the bottom of the screen), and via probabilistic tracing which is configurable via nodetool.

DSE Search slow query log:

When users complain about a slow query and you need to find out what it is, the DSE Search slow query log is a good starting point.

dsetool perf solrslowlog enable

Stores to a table in cassandra in the dse_perf.solr_slow_sub_query_log table

2) Tuning

Now let's focus on some tips for how you can improve search query performance.

Index size

Index size is so important that, I wrote a separate post just on that subject:

Q vs. FQ

In order to take advantage of the solr filter cache, build your queries using fq not q. The filter cache is the only solr cache that persists across commits so don't spend time or valuable RAM trying to leverage the other caches.

Solr query routing

Partition routing is a great multi-tennancy feature in DSE Search that lets you limit the amount of fan out that a search query will take under the hood. Essentially, you're able to specify a Cassandra partition that you are interested in limiting your search to. This will limit the number of nodes that DSE Search requires to fullfil your request.

Use docvalues for Faceting and Sorting.

To get improved performance and to avoid OOMs from the field cache, always remember to turn on docvalues on fields that you will be sorting and faceting over. This may become mandatory in DSE at some point so plan ahead.

Other DSE Differentiators

If you're comparing DSE Search against other search offerings / technologies, the following two differentiators are unique to DSE Search.

Fault tolerant distributed queries

If a node dies during a query, we retry the query on another node.

Node health

Node health and shard router behavior.
DSE Search monitors node health and makes distributed query routing decisions based on the following:

Uptime: a node that just started may well be lacking the most up-to-date data (to be repaired via HH or AE).
Number of dropped mutations.
Number of hints the node is a target for.
"failed reindex" status.

All you need to take advantage of this is be on a modern DSE version.

Tuning DSE Search - Indexing latency and query latency

Introduction

Indexing latency

1) Instrumentation

2) Tuning

Query Latency

1) Instrumentation

OpsCenter:

Metrics mbeans:

Query Tracing:

DSE Search slow query log:

2) Tuning

Index size

Q vs. FQ

Solr query routing

Use docvalues for Faceting and Sorting.

Other DSE Differentiators

Fault tolerant distributed queries

Node health

On Cassandra Collections, Updates, and Tombstones

Things you didn't think you could do with DSE Search and CQL