Redis Cluster Slot Migration Error Fix

What this usually means

Slot migration errors typically arise from state inconsistencies between cluster nodes, network partitions during migration, or race conditions where multiple operations target the same slot. The cluster's gossip protocol eventually propagates slot assignments, but if nodes disagree about who owns which slot, clients get MOVED redirects or the cluster refuses to serve keys. Common underlying causes include incomplete migrations (crash or timeout), manual intervention that overwrites slot state, or exceeding the cluster node's memory limit during migration (triggering evictions that break consistency). The fix requires forcing slot state reconciliation and ensuring all keys are moved correctly.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

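1. Run CLUSTER NODES on each master and look at the slot column of the node's own (myself) line: an entry like [1024->-<node-id>] means the slot is migrating out, and [1024-<-<node-id>] means it is being imported.
2. Check the cluster log for slot migration messages such as 'Slot x assigned to node y but it is already migrating' using grep -i 'slot.*migrat' /var/log/redis/redis-cluster.log.
3. Use redis-cli --cluster check <node>:<port> to get a summary of slot assignments and migration states.
4. Count the keys left in the affected slot with CLUSTER COUNTKEYSINSLOT <slot> on the source and the target. DBSIZE counts every key on a node across all of its slots, so it gives only a rough signal; a nonzero per-slot count on the source after a migration means it did not finish.
5. Inspect network connectivity between the migrating nodes: nc -zv <target_ip> 6379 (and the cluster bus port, 16379 by default) and look for timeouts.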
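
Step 1 can be automated when there are many nodes to inspect. The sketch below is a minimal parser for raw CLUSTER NODES text, assuming the standard bracket notation for open slots; the function name find_open_slots and the short node IDs in the usage note are illustrative, not part of any Redis tooling.

```python
import re

# Bracketed entries a node prints for its own open slots:
#   [1024->-<node-id>]  slot 1024 is migrating to <node-id>
#   [1024-<-<node-id>]  slot 1024 is being imported from <node-id>
OPEN_SLOT = re.compile(r"\[(\d+)-([<>])-([0-9a-fA-F]+)\]")


def find_open_slots(cluster_nodes_text):
    """Return (node_id, slot, state, peer_id) for every open slot found."""
    found = []
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # no slot column on this line (for example a replica)
        node_id = fields[0]
        for token in fields[8:]:
            match = OPEN_SLOT.fullmatch(token)
            if match:
                state = "migrating" if match.group(2) == ">" else "importing"
                found.append((node_id, int(match.group(1)), state, match.group(3)))
    return found
```

A node reports open slots only on its own line, so feed this the CLUSTER NODES output of every master rather than a single node.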

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

Cluster log file (e.g., /var/log/redis/redis-cluster.log) for slot migration errors and MOVED redirections.
redis-cli output from CLUSTER NODES (slot ownership and open-slot markers), CLUSTER SLOTS, and CLUSTER INFO.
CLUSTER INFO output: 'cluster_state:fail' indicates a problem; 'cluster_slots_assigned', 'cluster_slots_ok', 'cluster_slots_pfail', and 'cluster_slots_fail' provide counts. Migrating and importing slots are not counted there; they show up only in CLUSTER NODES.
Application logs for MOVED redirections or ASK responses (indicate slot migration in progress).
Redis server configuration files: check cluster-node-timeout (in milliseconds) and cluster-require-full-coverage. Note that cluster-migration-barrier governs replica migration, not slot migration.

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

Migration interrupted by node crash or network partition, leaving slot in 'migrating' state on source and 'importing' on target without all keys moved.
Manual CLUSTER SETSLOT NODE or CLUSTER FAILOVER executed while migration was in progress, causing slot assignment conflict.
Source node hits its maxmemory limit during migration and evicts keys that have not been migrated yet, so they never reach the target.
Cluster node timeout too low (for example cluster-node-timeout below 10000 ms), causing false failure detection and failovers that interrupt an in-flight migration.
Multiple migrations targeting the same slot concurrently (e.g., two reshard operations at once) leading to race condition.

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

Use CLUSTER SETSLOT <slot> STABLE on all involved nodes to clear stuck migration/import flags, then re-run migration.
If keys are missing, perform a manual key migration using MIGRATE command with REPLACE option from source to target.
Increase cluster-node-timeout to at least 15000 ms (15 seconds, the default) to reduce spurious failovers during transient network issues.
Do not reach for cluster-migration-barrier here: it sets how many replicas a master must keep before one of them may migrate to an orphaned master, and it has no effect on slot migration.
Use redis-cli --cluster fix <node>:<port> to repair slots left open in migrating/importing state. Run redis-cli --cluster rebalance --cluster-use-empty-masters only afterwards, once the cluster checks clean; rebalance moves slots but does not clear stale state.
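
The first fix pattern, clearing stuck flags with STABLE, can be scripted. This is a minimal sketch, assuming each element of clients is a direct connection to one node that exposes redis-py's execute_command (for example redis.Redis(host=..., port=...)); clear_open_slot is a hypothetical helper name.

```python
def clear_open_slot(slot, clients):
    """Send CLUSTER SETSLOT <slot> STABLE to every given node connection."""
    replies = []
    for client in clients:
        # STABLE only clears the migrating/importing flag on that node.
        # It does not move keys and does not change slot ownership.
        replies.append(client.execute_command("CLUSTER SETSLOT", slot, "STABLE"))
    return replies
```

Pass connections to both the source and the target. Keys already moved stay where they are, so check per-slot key counts before re-running the migration.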

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

Run CLUSTER INFO on all nodes; cluster_state should be 'ok' and cluster_slots_ok should be 16384. Then run CLUSTER NODES on each master and confirm no [slot->-node] or [slot-<-node] entries remain.
Execute redis-cli --cluster check <node>:<port> and confirm 'All 16384 slots covered' and no warnings.
Test key access: GET a key that was in the migrated slot through a cluster-aware client (for example redis-cli -c); it should return the value without an ASK redirect or an error.
Run CLUSTER COUNTKEYSINSLOT <slot> on source and target for each migrated slot; the source should report 0 and the target should hold the keys.
Monitor cluster logs for 5 minutes with tail -f; no new slot migration errors should appear.
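
The CLUSTER INFO check is easy to script across nodes. A minimal sketch follows, assuming you have the raw text reply (for example captured from redis-cli); the function names are illustrative. It checks only fields that CLUSTER INFO reports, so open slots still need the CLUSTER NODES check.

```python
def parse_cluster_info(text):
    """Turn 'key:value' lines from CLUSTER INFO into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value
    return info


def cluster_looks_healthy(text, total_slots=16384):
    """True when state is ok and every slot is reported as served."""
    info = parse_cluster_info(text)
    return (
        info.get("cluster_state") == "ok"
        and int(info.get("cluster_slots_ok", 0)) == total_slots
        and int(info.get("cluster_slots_pfail", 0)) == 0
        and int(info.get("cluster_slots_fail", 0)) == 0
    )
```

Run it against every node, since each node reports its own view of the cluster.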

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

Running CLUSTER SETSLOT <slot> NODE on a slot that is still migrating, before the remaining keys are moved or the slot is set STABLE; this can leave nodes disagreeing about the owner.
Using MIGRATE without the REPLACE option when keys may already exist on the target; the command fails with a BUSYKEY error and the key stays on the source.
Ignoring network latency between nodes; slot migration uses TCP connections and can time out with high latency.
Performing manual migration on a cluster with pending MOVED redirects; always stabilize slot state first.
Setting cluster-node-timeout too low (below 5000 ms) to speed up failover; this causes spurious failovers that can interrupt migrations.

( 07 ) War story

Stuck Slot Migration After Node Reboot

Senior SRE
Redis 6.2.6 cluster on Kubernetes (6 pods, 3 masters, 3 replicas), client: redis-py 4.5.1

Timeline

09:47 PagerDuty alert: 'Cluster state is fail' on production Redis cluster. Latency spikes to 2s.
09:51 Checked CLUSTER INFO: cluster_state:fail, cluster_slots_ok:16383, cluster_slots_migrating:1. One slot stuck in migration.
09:53 CLUSTER NODES shows source node (10.0.1.10:6379) has 'migrating' flag on slot 1024, target node (10.0.2.20:6379) has 'importing'.
09:57 Checked logs: 'Slot 1024 assigned to node 10.0.2.20 but it is already migrating from 10.0.1.10' repeated 50 times.
10:02 Discovered source node had been rebooted 2 hours earlier for kernel update, migration was in progress at that time.
10:05 Used redis-cli --cluster check: 'Key count mismatch: source has 1340 keys, target has 1258 keys for slot 1024'. 82 keys missing.
10:08 Set slot to STABLE on both nodes: CLUSTER SETSLOT 1024 STABLE on source and target.
10:12 Manually migrated missing keys: used MIGRATE with REPLACE for each key from source to target (scripted with redis-py).
10:18 Re-assigned slot: CLUSTER SETSLOT 1024 NODE <target-id> on source, then CLUSTER SETSLOT 1024 NODE <target-id> on target.
10:22 Verified: CLUSTER INFO shows cluster_state:ok, all 16384 slots covered. dbsize matches.

We got paged at 09:47 for a cluster fail state. The first thing I did was run CLUSTER INFO on a couple nodes — saw one slot stuck in migration. I immediately checked CLUSTER NODES and saw the 'migrating' and 'importing' flags on slot 1024. The logs confirmed the slot was assigned to the target but source still claimed it.

I remembered we had rebooted that source node two hours ago. The migration was interrupted mid-flight. I ran a cluster check and saw 82 keys missing on the target — they were still on the source but the slot state was inconsistent. I needed to clear the migration flags first, then move the missing keys.

I set the slot to STABLE on both nodes, then scripted MIGRATE for each missing key with REPLACE. After that, I reassigned the slot to the target node. The cluster state went green immediately. The lesson: always verify key counts after any node reboot, and never assume migration completes if a node restarts.

Root cause

Node reboot during active slot migration left slot in migrating/importing state with incomplete key transfer. The cluster gossip protocol could not resolve the inconsistency because the slot was assigned to a new node but not all keys were moved.

The fix

Clear slot state with CLUSTER SETSLOT STABLE on both nodes, manually migrate missing keys using MIGRATE with REPLACE, then reassign slot with CLUSTER SETSLOT NODE.

The lesson

Always verify key counts after any node restart or network partition. Use --cluster check to detect mismatches early. Set cluster-node-timeout high enough to avoid spurious failovers during temporary unavailability.

( 08 ) How Slot Migration Works Internally

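Redis Cluster slot migration is a multi-step process driven by the client or tool performing the reshard (for example redis-cli --cluster reshard), not by the nodes on their own: 1) The target is marked 'importing' with CLUSTER SETSLOT <slot> IMPORTING <source-id>, then the source is marked 'migrating' with CLUSTER SETSLOT <slot> MIGRATING <target-id>. 2) The tool repeatedly calls CLUSTER GETKEYSINSLOT on the source and moves the returned keys to the target with MIGRATE. 3) During migration, the source still owns the slot and serves any key it still holds; for a key it no longer has, it replies with an ASK redirect, and the target serves that key only when the request is preceded by ASKING. 4) Once the slot is empty on the source, CLUSTER SETSLOT <slot> NODE <target-id> is sent to the nodes, finalizing ownership and clearing the migrating/importing flags.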

The critical point: if any step fails (crash, timeout, network partition), the slot state remains 'migrating' on source and 'importing' on target. The cluster still serves keys that exist on either node: the source answers for keys it holds and sends ASK for the rest, while every other node keeps sending MOVED to the source because ownership has not changed. The open state persists until someone completes or cancels the migration. The cluster state becomes 'fail' when a slot ends up with no reachable owner (with the default cluster-require-full-coverage yes).
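
The sequence can be sketched in a few lines. This is an illustration under stated assumptions, not what redis-cli does internally: source and target are direct per-node connections exposing redis-py's execute_command, migrate_slot and its parameters are hypothetical names, and notifying the remaining masters and handling errors are left out.

```python
def migrate_slot(slot, source, target, source_id, target_id,
                 target_host, target_port, batch=100, timeout_ms=5000):
    """Move one slot from source to target; returns the number of keys moved."""
    # 1. Open the slot: importing on the target first, then migrating on the source.
    target.execute_command("CLUSTER SETSLOT", slot, "IMPORTING", source_id)
    source.execute_command("CLUSTER SETSLOT", slot, "MIGRATING", target_id)

    # 2. Move keys in batches until the source reports none left in the slot.
    moved = 0
    while True:
        keys = source.execute_command("CLUSTER GETKEYSINSLOT", slot, batch)
        if not keys:
            break
        # An empty key argument plus a KEYS clause sends several keys at once.
        source.execute_command("MIGRATE", target_host, target_port, "", 0,
                               timeout_ms, "KEYS", *keys)
        moved += len(keys)

    # 3. Close the slot: announce the new owner on the target, then the source.
    for node in (target, source):
        node.execute_command("CLUSTER SETSLOT", slot, "NODE", target_id)
    return moved
```

If the process dies anywhere between step 1 and step 3, both flags stay set, which is exactly the stuck state described above.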

( 09 ) Key Diagnostic Commands and Their Outputs

CLUSTER NODES output format: <node_id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> <slot> <slot> ... Migration state is not shown in the flags column; it appears as bracketed entries after the slot ranges, and only on the line of the node you queried (flagged 'myself'). Example: '<source_id> 10.0.1.10:6379@16379 myself,master - 0 0 1 connected 0-4095 [1024->-<target_id>]' indicates slot 1024 is migrating from this node to the target. On the target, the matching entry is [1024-<-<source_id>].

CLUSTER SLOTS returns an array of slot ranges with the master and replicas serving each range. It reports only the current owner and does not expose migrating/importing state, and the output is not human-friendly; use CLUSTER NODES or redis-cli --cluster check instead.

redis-cli --cluster check <host:port> is the most comprehensive tool. It validates slot coverage, checks for duplicate assignments, and reports key counts per node. It will flag any slot with migration issues.

( 10 ) Avoiding Race Conditions During Migration

The most common race condition is running two reshard operations simultaneously. The redis-cli --cluster reshard tool doesn't have built-in locking. If two admins run it at the same time, they may try to move the same slot or overlapping slots, causing state corruption. Always use a single coordinator and check CLUSTER INFO before starting.

Another race: manual CLUSTER SETSLOT commands issued while a migration is in progress. CLUSTER SETSLOT <slot> NODE <node_id> should only be issued after all keys in the slot have been migrated. The source rejects the command while it still holds keys in that slot, but the target and other nodes accept it, so issuing it prematurely can leave the cluster treating the slot as moved even though keys remain on the source.

Mitigation: use the redis-cli --cluster reshard command, which runs the full sequence in the correct order (it is not atomic, so an interrupted run can still leave a slot open). If you must do manual steps, always check the per-slot key count on the source before finalizing.
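
A guard for the manual path might look like the following. It is a small sketch, assuming source is a direct connection to the source node with redis-py's execute_command; CLUSTER COUNTKEYSINSLOT returns the number of keys that node holds in the slot, and the helper names are hypothetical.

```python
def remaining_keys(source, slot):
    """Number of keys the source node still holds in the slot."""
    return int(source.execute_command("CLUSTER COUNTKEYSINSLOT", slot))


def safe_to_finalize(source, slot):
    """True only when the slot is empty on the source."""
    return remaining_keys(source, slot) == 0
```

Call safe_to_finalize immediately before CLUSTER SETSLOT <slot> NODE; a count taken earlier can be stale if clients are still writing to the slot.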

( 11 ) Handling Memory Pressure During Migration

When migrating a slot that contains many keys, the source node uses additional memory to buffer keys being sent (the MIGRATE command serializes each value). If the source node is near its maxmemory limit and an eviction policy is active, this can trigger evictions. MIGRATE deletes a key from the source only after the target has accepted it, so keys already migrated are safe on the target; the risk is that keys not yet migrated are evicted from the source and never reach the target, which means data loss.

Best practice: ensure both source and target nodes have at least 20% free memory before migration. Monitor used_memory_rss and used_memory_peak during migration. If you see evictions, pause migration and increase maxmemory or add nodes.
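
The 20% headroom rule can be checked before starting. A minimal sketch, assuming info is a dict carrying the used_memory and maxmemory fields (in bytes) from the memory section of INFO, for example what redis-py returns from client.info("memory"); the helper names are illustrative.

```python
def free_memory_fraction(info):
    """Fraction of maxmemory still unused, or None when no limit is set."""
    maxmemory = int(info["maxmemory"])
    if maxmemory == 0:
        return None  # maxmemory 0 means unlimited, so there is no threshold
    return 1.0 - int(info["used_memory"]) / maxmemory


def has_headroom(info, min_free=0.20):
    """True when the node has at least min_free of its limit available."""
    free = free_memory_fraction(info)
    return free is None or free >= min_free
```

Check both the source and the target, and treat a False result as a reason to postpone the migration rather than as a hard guarantee either way.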

When using the MIGRATE command manually, the REPLACE option overwrites existing keys on the target, but keys already evicted from the source are lost. Verify per-slot counts with CLUSTER COUNTKEYSINSLOT before and after.

( 12 ) Network Partition Recovery

If a network partition occurs during migration, the slot state may be inconsistent across nodes. After the partition heals, slot ownership converges through gossip, which may take a while (failure detection is governed by cluster-node-timeout), and clients may receive MOVED redirects in the meantime. The migrating/importing flags are local to each node, though, and gossip does not clear them.

Forceful recovery: if the slot is stuck in migrating/importing for more than cluster-node-timeout, set the slot to STABLE on both sides, then reassign it to the correct owner. This clears the stale state.

If the source node is permanently down, you must use CLUSTER FORGET to remove it from the cluster and then assign the slot to a different node. The slot will be temporarily unassigned (cluster_state=fail) until you reassign it.

Frequently asked questions

What does 'CLUSTERDOWN The cluster is down' mean during slot migration?

This means the node considers the cluster unable to serve requests, usually because at least one of the 16384 slots has no reachable master (with the default cluster-require-full-coverage yes) or because a majority of masters is unreachable. A healthy in-progress migration does not cause this, since the source owns the slot until the migration is finalized; it appears when an interrupted migration or manual intervention leaves a slot unassigned or its owner down. Check CLUSTER INFO for cluster_state, cluster_slots_assigned, and cluster_slots_ok. The fix is to complete the migration or set the slot to STABLE and reassign.

Can I cancel an ongoing slot migration?

Yes, but carefully. To cancel, set the slot to STABLE on both the source and target nodes using CLUSTER SETSLOT <slot> STABLE. This clears the migrating/importing flags without moving any more keys. However, keys that were already migrated remain on the target, and keys not yet migrated remain on the source, so the slot's keys are split across two nodes. Because the source still owns the slot, keys stranded on the target are unreachable through normal routing until you move them back with MIGRATE or finish the migration.

Why does redis-cli --cluster rebalance sometimes cause slot migration errors?

The rebalance command moves slots from nodes that hold more slots than their share to nodes that hold fewer, according to node weights. It can cause errors if the cluster has existing migration state (stuck slots), if nodes are not all reachable, or if there are replicas with stale data. Always run --cluster check first to ensure the cluster is healthy. Also, rebalance should not be run when any slot is in migrating/importing state.

How do I manually migrate a single key without using the cluster tools?

Use the MIGRATE command: MIGRATE <host> <port> <key> <destination-db> <timeout> [COPY] [REPLACE]. Example: MIGRATE 10.0.2.20 6379 mykey 0 5000 REPLACE. This connects to the target, sends the key, and deletes it from the source (unless COPY is specified). Use REPLACE to overwrite if the key already exists on the target. This is useful for fixing missing keys after a failed migration.
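
When scripting the repair, it helps to build the MIGRATE arguments in one place. The helper below is a hypothetical sketch that follows the syntax above, plus the KEYS form (an empty key argument followed by KEYS key ...) that MIGRATE accepts for several keys at once.

```python
def build_migrate_args(host, port, keys, db=0, timeout_ms=5000,
                       copy=False, replace=True):
    """Return the argument list for one MIGRATE call covering the given keys."""
    if not keys:
        raise ValueError("at least one key is required")
    args = ["MIGRATE", host, port]
    # One key goes in the key position; several keys use "" plus a KEYS clause.
    args.append(keys[0] if len(keys) == 1 else "")
    args += [db, timeout_ms]
    if copy:
        args.append("COPY")
    if replace:
        args.append("REPLACE")
    if len(keys) > 1:
        args += ["KEYS", *keys]
    return args
```

With a redis-py connection to the source node, the result can be sent as source.execute_command(*build_migrate_args("10.0.2.20", 6379, ["mykey"])).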

What is the difference between MOVED and ASK redirections?

MOVED means the client has the wrong slot mapping; the slot is permanently owned by another node. The client should update its cached slot map. ASK means the slot is in the process of being migrated; the client should retry the request on the target node with an ASKING command. ASK is temporary and should not be cached. During migration, the source serves keys it still holds directly and sends ASK for keys in the slot that it no longer has; MOVED comes from nodes that do not own the slot at all.
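
For application logs or a hand-rolled client, the two redirects can be told apart by parsing the error text. This is a minimal sketch, assuming the message has the form 'MOVED <slot> <host>:<port>' or 'ASK <slot> <host>:<port>' with the leading '-' of the wire reply already stripped; parse_redirect and the dict keys are illustrative.

```python
def parse_redirect(error_message):
    """Parse a MOVED or ASK error; return None for anything else."""
    parts = error_message.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return {
        "kind": parts[0],
        "slot": int(parts[1]),
        "host": host,
        "port": int(port),
        # MOVED is permanent: refresh the cached slot map.
        "update_slot_map": parts[0] == "MOVED",
        # ASK is one-shot: send ASKING to the target, then retry there.
        "send_asking_first": parts[0] == "ASK",
    }
```

Cluster-aware clients such as redis-py's cluster client handle both cases already, so this is mainly useful for log analysis.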
