Primary shards & Replica shards in Elasticsearch
Shards offer a competitive balance between allocation speed, node balancing, and overall cluster management.
In this article, let's look at the theory and practice of sharding in Elasticsearch. First, let's understand what sharding is and how it can be an advantage or a disadvantage in a database.
To see where shards fit in the Elasticsearch architecture, follow how data is written. Data is stored as JSON documents in an index. Each index is split into shards, which are distributed across the nodes of the cluster, and each primary shard has a replica so that data can be recovered if a node fails. A primary shard and its replica are always placed on different nodes.
Earlier versions of Elasticsearch created 5 primary shards by default, each with 1 replica (5 replica shards in total). Since version 7.0 the default is 1 primary shard and 1 replica shard. Before we start writing data to an index, we need to know the number of shards and the number of replicas per shard. Horizontal scaling (scale-out) is the main reason to shard a database, and the number of shards depends heavily on the amount of data we have.
Note: We can change the number of primary shards without losing data, but not in place. The index data has to be rewritten into a new index (for example by re-indexing), and this requires a brief downtime while the data is rewritten.
To understand shards and replicas, let's treat the question as a problem statement. Selecting the right number of shards is complicated because we never know how many documents we will get before we start.
Problem Statement1: How to specify the number of shards and the number of replicas per shard in Elasticsearch?
Solution: A rule of thumb based on the number of documents per index:
From 0 to 3 million documents: 1 shard
From 3 to 5 million documents: 2 shards
More than 5 million documents: (number of documents / 5 million) + 1 shards
Replicas determine how data is recovered when a primary shard fails. In most cases we have 1 replica per shard.
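The rule of thumb above can be sketched as a small Python function. The function name is hypothetical, and two details are assumptions because the text does not spell them out: the boundaries (exactly 3 million maps to 1 shard, exactly 5 million maps to 2 shards) and the use of integer division in the formula.

```python
DOCS_PER_SHARD = 5_000_000  # divisor used in the formula from the text


def recommended_primary_shards(doc_count: int) -> int:
    """Return the shard count suggested by the rule of thumb.

    Assumptions (not stated in the text): boundaries are inclusive on the
    lower tier, and the division in the formula is integer division.
    """
    if doc_count < 0:
        raise ValueError("doc_count must not be negative")
    if doc_count <= 3_000_000:
        return 1
    if doc_count <= 5_000_000:
        return 2
    # More than 5 million documents: (number of documents / 5 million) + 1
    return doc_count // DOCS_PER_SHARD + 1


for docs in (1_000_000, 4_000_000, 12_000_000):
    print(docs, "documents ->", recommended_primary_shards(docs), "shard(s)")
```

With these assumptions, 12 million documents give 12 // 5 + 1 = 3 shards. Treat the result as a starting point, since document size matters as much as document count.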
Query1: Cluster-level information
Query to check cluster-level information: the allocated node, total disk space, available disk space, used disk space, number of shards, and space used by indexes.
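The query itself is not reproduced in the text. One request that returns this kind of per-node information is the `_cat/allocation` API (`GET _cat/allocation?v` in the Kibana console), so the sketch below assumes that is what was meant. The base URL and the helper names are hypothetical, and the cluster is assumed to be reachable without authentication.

```python
import json
import urllib.request


def cat_allocation_url(base_url):
    """Build the URL for the _cat/allocation API, asking for JSON output."""
    return base_url.rstrip("/") + "/_cat/allocation?format=json"


def fetch_allocation(base_url):
    """Return one dict per node. Needs a reachable cluster, so it is not called here."""
    with urllib.request.urlopen(cat_allocation_url(base_url)) as response:
        return json.load(response)


def describe_node(row):
    """Format one row of the _cat/allocation output as a readable line."""
    return (
        f"{row['node']}: {row['shards']} shards, "
        f"indices {row['disk.indices']}, used {row['disk.used']}, "
        f"available {row['disk.avail']}, total {row['disk.total']}"
    )


# Example call against a hypothetical local cluster:
#   for row in fetch_allocation("http://localhost:9200"):
#       print(describe_node(row))
```

Note that the `shards` column of this API counts every shard copy on the node, primaries and replicas together.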
Query2: Shard-level information
Query to check the number of primary shards, replica shards, number of documents, and storage.
For a specific index:
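The shard-level queries are also missing from the text. A likely candidate is the `_cat/shards` API, which lists each shard with its index, shard number, primary or replica flag, state, document count, store size, and node. The sketch below only builds the two request URLs (all indexes, or one index). The base URL and function name are hypothetical.

```python
from urllib.parse import quote


def cat_shards_url(base_url, index=None):
    """Build a _cat/shards URL for the whole cluster or for a single index.

    Console equivalents: GET _cat/shards?v  and  GET _cat/shards/<index>?v
    """
    url = base_url.rstrip("/") + "/_cat/shards"
    if index:
        # Keep wildcards and commas readable so patterns like order_data-* work.
        url += "/" + quote(index, safe="*,")
    return url + "?v=true"


print(cat_shards_url("http://localhost:9200"))
print(cat_shards_url("http://localhost:9200", "order_data-2022-04-17"))
```

Any HTTP client (curl, a browser, or `urllib.request`) can then fetch these URLs from a running cluster.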
Query3: To reallocate shards and replicas to an index.
Note: Changing the shard allocation of an existing index can lead to data loss. For an existing index, re-index the data first and then allocate shards and replicas.
Refer to my previous article, "Re-Indexing in elastic search", for re-indexing index data.
Always decide the number of shards and replicas before adding data to an index.
Step1: Allocate shards and replicas for the index
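The request for this step is not shown in the text. One common way to set shards and replicas before any data exists is an index template that matches the index pattern, so the sketch below assumes that approach. The template name `order_data_template` is hypothetical, while the pattern `order_data-*` and the values 5 and 1 come from the article.

```python
import json


def index_template_body(pattern, shards, replicas):
    """Build the body of a composable index template (PUT _index_template/<name>)."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the number of replicas must not be negative")
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "number_of_shards": shards,      # primary shards per index
                "number_of_replicas": replicas,  # replica copies per primary shard
            }
        },
    }


# Console equivalent (template name is hypothetical):
#   PUT _index_template/order_data_template
print(json.dumps(index_template_body("order_data-*", 5, 1), indent=2))
```

Composable templates are available from Elasticsearch 7.8. Older clusters use the legacy `_template` endpoint, where `settings` sits at the top level of the body.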
Step2: Now create an index and add data
"name": "place an order",
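Only the `name` field of the sample document survives in the text, so the sketch below indexes a document with just that field. It assumes the index name from this article and uses the document API (`POST <index>/_doc`). The helper name is hypothetical.

```python
import json


def index_document_request(index, document):
    """Return the method, path, and JSON body for indexing one document."""
    return "POST", f"/{index}/_doc", json.dumps(document)


method, path, body = index_document_request(
    "order_data-2022-04-17",
    {"name": "place an order"},  # remaining fields are not reproduced in the text
)
print(method, path)
print(body)
```

If the index does not exist yet, Elasticsearch creates it on the first write and applies the settings of any matching index template.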
Step3: Check the number of shards and replicas for the index
Here the index name is order_data-2022-04-17 and the index pattern is order_data-*. We have allocated 5 primary shards [P0, P1, P2, P3, P4], and each primary shard has 1 replica shard [R0, R1, R2, R3, R4].
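To check a layout like this programmatically, the rows returned by `GET _cat/shards/<index>?format=json` can be summarized. The sketch below counts primaries and replicas and flags any shard whose replica sits on the same node as its primary. The sample rows and node names are made up for illustration and are not output from a real cluster.

```python
from collections import defaultdict


def check_shard_layout(rows, index):
    """Summarize _cat/shards rows (format=json) for one index."""
    primary_nodes = {}                 # shard number -> node of the primary
    replica_nodes = defaultdict(list)  # shard number -> nodes of the replicas
    for row in rows:
        if row["index"] != index:
            continue
        if row["prirep"] == "p":
            primary_nodes[row["shard"]] = row["node"]
        else:
            replica_nodes[row["shard"]].append(row["node"])
    colocated = sorted(
        shard
        for shard, node in primary_nodes.items()
        if node is not None and node in replica_nodes.get(shard, [])
    )
    return {
        "primaries": len(primary_nodes),
        "replicas": sum(len(nodes) for nodes in replica_nodes.values()),
        "colocated": colocated,
    }


# Hypothetical two-node cluster: primaries and replicas alternate between nodes.
INDEX = "order_data-2022-04-17"
sample_rows = []
for n in range(5):
    primary, replica = ("node-1", "node-2") if n % 2 == 0 else ("node-2", "node-1")
    sample_rows.append({"index": INDEX, "shard": str(n), "prirep": "p", "node": primary})
    sample_rows.append({"index": INDEX, "shard": str(n), "prirep": "r", "node": replica})

print(check_shard_layout(sample_rows, INDEX))
```

An empty `colocated` list matches the rule that a primary shard and its replica are never placed on the same node.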
Thanks for reading. Do clap 👏 if you find it useful. 😊
“Keep learning and keep sharing knowledge”