Intro / why?

Search query performance depends on how effectively the OS page cache keeps your search indexes hot. The smaller your indexes, the easier it is for the OS to keep them in memory.

This article shows six tactics you can use to minimize the size of your DSE Search index.

Tactics

Here are the tactics you can employ to minimize your DSE Search index size:

  1. Turn off Term Vector information if you're not using highlighting or other functionality that relies on it:

termVectors="false"

termPositions="false"

termOffsets="false"

  2. Turn on omitNorms if you're not using boosts:

omitNorms="true"

Note: In my experience, term vectors and norms together can account for a substantial share of the index, around 50%.

  3. Only index the fields you intend to search. Most use cases don't require every field to be indexed for search.

  4. Make sure you're not indexing your _partitionKey (this may happen by default in modern DSE versions):

<field name="_partitionKey" type="uuid" indexed="false"/>

  5. Use StrField rather than TextField where you don't need tokenization. StrField indexes each value as a single term, with no tokenizers.

  6. Tune the TrieField precisionStep. A higher precision step decreases the index size but increases range query latency.
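To see how the tactics above might fit together, here is a minimal schema sketch. It is an illustration under assumptions, not a drop-in schema: the table columns (id, status, description, notes, created_at), the field type names, and the precisionStep value of 16 are all hypothetical, so adapt them to your own table and check the result against your DSE version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema for an assumed table with columns:
     id (uuid), status (text), description (text), notes (text), created_at (timestamp) -->
<schema name="index_size_sketch" version="1.5">
  <types>
    <fieldType name="uuid" class="solr.UUIDField"/>

    <!-- Tactic 5: StrField indexes the whole value as one term (no tokenizer) -->
    <fieldType name="string" class="solr.StrField" omitNorms="true"/>

    <!-- TextField only for the column that really needs full-text search -->
    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Tactic 6: a higher precisionStep indexes fewer terms per value,
         at the cost of slower range queries (16 is an illustrative value) -->
    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="16"/>
  </types>

  <fields>
    <field name="id" type="uuid" indexed="true" stored="true"/>

    <!-- Tactic 4: taken from the article as-is -->
    <field name="_partitionKey" type="uuid" indexed="false"/>

    <field name="status" type="string" indexed="true" stored="true"/>

    <!-- Tactics 1 and 2: no term vectors and no norms, assuming
         highlighting and boosts are not used on this field -->
    <field name="description" type="text" indexed="true" stored="true"
           termVectors="false" termPositions="false" termOffsets="false"
           omitNorms="true"/>

    <!-- Tactic 3: a column that is never searched is not indexed -->
    <field name="notes" type="string" indexed="false" stored="true"/>

    <field name="created_at" type="tdate" indexed="true" stored="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>
</schema>
```

Setting the term vector and norm attributes explicitly on the field, rather than relying on field type defaults, makes the intent visible to whoever edits the schema next.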

Learn more about your indexes

You can also introspect your indexes using Luke. Luke is bundled in DSE so you can access it from a browser by hitting:

http://:8983/solr/./admin/luke?&numTerms=0