Nidhi Gupta
Apr 17, 2022

Primary shards & Replica shards in Elasticsearch

Shards offer the most competitive balance between allocation speed, node balancing, and overall cluster management.

In this article, let's look at both the theory and the practice of sharding in Elasticsearch. We start with what sharding is and how it can be an advantage or a disadvantage in a database.

Let's start with the architecture: how data is written to an index and how it ends up in shards. Data is written to an index as JSON documents. Each index is divided into one or more primary shards, which are distributed across the nodes of the cluster, and each document is stored in exactly one primary shard. Each primary shard can have one or more replica shards, which are copies used to recover data if a node fails. A replica shard is never placed on the same node as its primary shard.
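As a rough illustration of how a document ends up in a particular shard: in simplified form, Elasticsearch picks the shard as hash(routing value) % number of primary shards, where the routing value defaults to the document ID. The sketch below mimics that rule in Python. The function name `pick_shard` is hypothetical, and CRC32 is only a stand-in for the Murmur3 hash that Elasticsearch uses internally, so the shard numbers it returns will not match a real cluster.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard number."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Stand-in hash for illustration only: Elasticsearch uses Murmur3, not CRC32.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ("order-1", "order-2", "order-3"):
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the result depends on the number of primary shards, the same document ID would map to a different shard if that number changed. This is one reason to settle the shard count before indexing data.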

Before version 7.0, Elasticsearch created each index with 5 primary shards and 1 replica per primary shard (5 replica shards in total) by default. Since version 7.0, the default is 1 primary shard and 1 replica shard. Before we start writing data to an index, we need to decide the number of primary shards and the number of replicas per shard. Horizontal scaling, or scale-out, is the main reason to shard a database. The number of shards depends heavily on the amount of data we have.

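Note: The number of replicas can be changed at any time without losing data. The number of primary shards, however, is fixed when the index is created. Changing it means rewriting the data into a new index (for example with the split, shrink, or reindex APIs). The split and shrink operations require the source index to be made read-only first, so plan for a brief write downtime.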

To understand shards and replicas, let's treat the sizing question as a problem statement. Selecting the right number of shards is complicated because we never know how many documents we will get before we start.

Problem Statement 1: How do we specify the number of shards and the number of replicas per shard in Elasticsearch?

Solution: As a rule of thumb:

From 0 to 3 million documents per index: 1 shard
From 3 to 5 million documents per index: 2 shards
More than 5 million documents per index: (number of documents / 5 million) + 1 shards
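The rule of thumb above can be written as a small Python function. This is only a sketch of that rule: the function name `recommended_shards` is hypothetical, the division is assumed to round down, and the boundaries are assumed to be inclusive (exactly 3 million documents gives 1 shard, exactly 5 million gives 2).

```python
def recommended_shards(doc_count: int) -> int:
    """Suggest a primary shard count for an index from its expected document count."""
    if doc_count < 0:
        raise ValueError("doc_count must not be negative")
    if doc_count <= 3_000_000:
        return 1
    if doc_count <= 5_000_000:
        return 2
    # Assumption: the division in the formula rounds down to a whole number.
    return doc_count // 5_000_000 + 1


if __name__ == "__main__":
    for docs in (1_000_000, 4_000_000, 12_000_000):
        print(f"{docs} documents -> {recommended_shards(docs)} shard(s)")
```

For example, 12 million documents gives 12 / 5 rounded down to 2, plus 1, so 3 shards.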

Replicas determine how data is recovered if a primary shard fails. In most cases we use 1 replica per shard.

Query 1: Cluster-level information

Query to check, for each node, the number of shards allocated to it, the disk space used by indexes, and the used, available, and total disk space.

GET /_cat/allocation?v

Query 2: Shard-level information

Query to check, for each shard, whether it is a primary or a replica, its state, the number of documents, the storage used, and the node it is allocated to.

GET /_cat/shards?v

For a specific index:

GET /_cat/shards/index-name
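The cat APIs can also return JSON when `format=json` is added to the request (for example `GET /_cat/shards/index-name?format=json`), which is easier to process in a script than the text table. The sketch below counts primary and replica shards per index from such a response. The `SAMPLE_RESPONSE` rows are made-up illustration data, not output from a real cluster, and the helper name `summarize_shards` is hypothetical.

```python
import json
from collections import Counter

# Made-up sample in the shape of a `_cat/shards?format=json` response.
SAMPLE_RESPONSE = """
[
  {"index": "index-name", "shard": "0", "prirep": "p", "state": "STARTED",
   "docs": "10", "store": "5kb", "ip": "127.0.0.1", "node": "node-1"},
  {"index": "index-name", "shard": "0", "prirep": "r", "state": "STARTED",
   "docs": "10", "store": "5kb", "ip": "127.0.0.2", "node": "node-2"},
  {"index": "index-name", "shard": "1", "prirep": "p", "state": "STARTED",
   "docs": "12", "store": "6kb", "ip": "127.0.0.2", "node": "node-2"},
  {"index": "index-name", "shard": "1", "prirep": "r", "state": "UNASSIGNED",
   "docs": null, "store": null, "ip": null, "node": null}
]
"""


def summarize_shards(cat_shards_json: str) -> dict:
    """Count primary, replica, and not-yet-started shards for each index."""
    summary = {}
    for row in json.loads(cat_shards_json):
        counts = summary.setdefault(row["index"], Counter())
        # "p" marks a primary shard and "r" a replica in the prirep column.
        counts["primary" if row["prirep"] == "p" else "replica"] += 1
        if row["state"] != "STARTED":
            counts["not_started"] += 1
    return {index: dict(counts) for index, counts in summary.items()}


if __name__ == "__main__":
    print(summarize_shards(SAMPLE_RESPONSE))
```

A non-zero `not_started` count is a hint to look at the state column in the full output, for example for replicas that could not be assigned to a node.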

Query 3: Set the number of shards and replicas for new indexes with an index template.

PUT _index_template/template_shard1
{
  "index_patterns": ["index-name"],
  "template": {
    "settings": {
      "index.number_of_shards": 2,
      "index.number_of_replicas": 1
    }
  }
}

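Note: An index template applies only to indexes created after the template exists; it does not change the shards of existing indexes. To give an existing index a different number of primary shards, re-index its data into a new index that picks up the template settings.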

Refer to my previous article for re-indexing index data

Always decide the number of shards and replicas before adding data to an index.

Example:

Step 1: Define the number of shards and replicas in an index template

PUT _index_template/template_shard_allc
{
  "index_patterns": ["order_data-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 5,
      "index.number_of_replicas": 1
    }
  }
}
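A template like this only takes effect for index names that match its pattern. As a quick sanity check before creating indexes, the sketch below builds the same request body in Python and tests a candidate index name against the pattern. The helper names are hypothetical, and `fnmatchcase` is only an approximation of Elasticsearch's wildcard matching: it agrees for a simple `*` pattern like this one, but it also gives `?` and `[...]` a special meaning.

```python
import json
from fnmatch import fnmatchcase


def build_shard_template(index_pattern: str, shards: int, replicas: int) -> dict:
    """Build the request body for PUT _index_template/<name>."""
    return {
        "index_patterns": [index_pattern],
        "template": {
            "settings": {
                "index.number_of_shards": shards,
                "index.number_of_replicas": replicas,
            }
        },
    }


def template_applies(template: dict, index_name: str) -> bool:
    """Approximate check: does any pattern in the template match the index name?"""
    return any(fnmatchcase(index_name, p) for p in template["index_patterns"])


template = build_shard_template("order_data-*", shards=5, replicas=1)

if __name__ == "__main__":
    # json.dumps always emits straight quotes, which the console requires.
    print(json.dumps(template, indent=2))
    print(template_applies(template, "order_data-2022-04-17"))
```

Generating the body with `json.dumps` also avoids the curly quotes that word processors and blog editors tend to insert, which Elasticsearch rejects as invalid JSON.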

Step 2: Create an index and add data

POST /order_data-2022-04-17/_doc
{
  "name": "place an order",
  "place": "Lucknow"
}

Step 3: Check the number of shards and replicas for the index

GET /_cat/shards/order_data-*

Here the index name is order_data-2022-04-17 and the index pattern is order_data-*. The index has 5 primary shards [P0, P1, P2, P3, P4], and each primary shard has 1 replica shard [R0, R1, R2, R3, R4].
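The arithmetic behind this layout can be checked with a short sketch. The function name `shard_totals` is hypothetical. The minimum node count follows from the rule that a replica is never placed on the same node as its primary, so every copy of a shard needs its own node.

```python
def shard_totals(primaries: int, replicas_per_primary: int) -> dict:
    """Return shard counts for one index and the nodes needed to assign every copy."""
    if primaries < 1 or replicas_per_primary < 0:
        raise ValueError("need at least 1 primary and 0 or more replicas")
    replica_shards = primaries * replicas_per_primary
    return {
        "primary_shards": primaries,
        "replica_shards": replica_shards,
        "total_shards": primaries + replica_shards,
        # Copies of the same shard never share a node.
        "min_nodes_for_all_copies": replicas_per_primary + 1,
    }


if __name__ == "__main__":
    print(shard_totals(primaries=5, replicas_per_primary=1))
```

With 5 primaries and 1 replica each, that is 10 shards in total, and at least 2 nodes are needed for all of them to be assigned. On a single-node cluster the 5 replica shards stay unassigned.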


“Keep learning and keep sharing knowledge”
