Hi,
Today, I want to share a workaround for an Elasticsearch shard replication problem.
In my case, a shard was unassigned and my cluster state was yellow.
First, I had to determine which index was impacted:
root@server [ /root ] curl -XGET "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED
2158047191530222659_-823974979_201908_001-2 3 r UNASSIGNED
REPLICATION FAILED
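A plain grep matches the word UNASSIGNED anywhere in the line, including inside an index name. Here is a minimal sketch that tests the state column only, assuming the column order requested above (index, shard, prirep, state, unassigned.reason). The function name `filter_unassigned` is my own, not part of Elasticsearch.

```shell
# Hypothetical helper: keep only rows whose 4th column (state) is exactly UNASSIGNED.
# Assumes the column order index, shard, prirep, state, unassigned.reason.
filter_unassigned() {
  awk '$4 == "UNASSIGNED"'
}

# Usage against a live cluster (address as used in this post):
# curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | filter_unassigned
```

It reads the listing on standard input, so it can also be run on a saved copy of the output.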
...
try to recover [2158047191530222659_-823974979_201908_001-2][3] from primary shard with sync id but number of docs differ: 17986 (ElasticSearch Data2, primary) vs 20064(ElasticSearch Data1)];
...
The problem is caused by a corrupted recovery file on the node holding the replica.
In this case, simply decrease the number of replicas to 0, then restore the original value to force the replication (Elasticsearch will recreate the metadata on the replica nodes). If you don't know the current number of replicas, you can use this command:
root@server [ /root ] curl -X GET "localhost:9200/2158047191530222659_-823974979_201908_001-2/_settings?pretty"
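If you only need the replica count and not the whole settings document, the `_cat/indices` API can return it on its own through the `rep` column. This is a minimal sketch, not a definitive tool: the `get_replicas` function and the `ES_URL` variable are names I made up for illustration, and the address defaults to the one used in this post.

```shell
# ES_URL is an assumed variable; it defaults to the address used in this post.
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: print the current replica count of one index.
get_replicas() {
  # h=rep asks _cat/indices for the replica-count column only;
  # tr strips the surrounding whitespace and the trailing newline.
  curl -s -XGET "${ES_URL}/_cat/indices/$1?h=rep" | tr -d '[:space:]'
}

# Usage:
# get_replicas "2158047191530222659_-823974979_201908_001-2"
```

Note the value it prints before changing anything, so you can put it back afterwards.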
- Decrease the number of replicas
root@server [ /root ] curl -X PUT "localhost:9200/2158047191530222659_-823974979_201908_001-2/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "number_of_replicas" : 0
  }
}'
- Wait 5 seconds, then restore the original value (1 in my case).
root@server [ /root ] curl -X PUT "localhost:9200/2158047191530222659_-823974979_201908_001-2/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "number_of_replicas" : 1
  }
}'
- Check whether any UNASSIGNED shards remain.
root@server [ /root ] curl -XGET "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED
root@server
No more unassigned shards :)
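The steps above can be sketched as a small script, in case you have to repeat them on several indices. Treat it as an illustration under assumptions, not a finished tool: the function names, `ES_URL` and `WAIT_SECONDS` are all hypothetical names of mine, and the 5 second pause is just the value I used above.

```shell
# Assumed variables: cluster address and pause length, defaulting to the values in this post.
ES_URL="${ES_URL:-localhost:9200}"
WAIT_SECONDS="${WAIT_SECONDS:-5}"

# Build the JSON settings body for a given replica count.
replicas_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Apply a replica count ($2) to one index ($1) through the _settings endpoint.
set_replicas() {
  curl -s -X PUT "${ES_URL}/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replicas_body "$2")"
}

# Drop the replicas of index $1, pause, then restore the original count $2.
reset_replicas() {
  set_replicas "$1" 0 || return 1
  sleep "$WAIT_SECONDS"
  set_replicas "$1" "$2"
}

# Usage (original replica count of 1, as in my case):
# reset_replicas "2158047191530222659_-823974979_201908_001-2" 1
```

The original replica count is passed in explicitly, so the script never has to guess what to restore. Run the UNASSIGNED check again afterwards to confirm the result.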