Thousands of Events due to "connect to cold tier failed"

Overview

If the object tier is in the 'failed' state for any reason, thousands of events can be shown in the UI.

This happens because every data container (DC) created in the system tries to connect to the object tier and fails because of the tier's state.

The event is raised every minute, per DC, which can lead to a huge number of events.

Solution

    1. Connect to the EMS over SSH (replace the project, zone, and VM name with your own)

$ gcloud beta compute --project "gcp-project" ssh --zone "europe-west4-a" "cluster-vm"
$ sudo su -
root#

    2. List all the data containers in the system, and find the ones whose ilm_state is 'ilm_connecting' with an error

root# . elfs_admin
[root~(elfs_admin)]# elfs-cli data_container list 
[root~(elfs_admin)]# elfs-cli data_container list | grep ilm_connecting

    3. Disconnect each data container found in the previous step whose ilm_state is 'ilm_connecting' (ID 4 is used here as an example)

[root~(elfs_admin)]# elfs-cli data_container disconnect_ilm --id 4
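
When many data containers are affected, steps 2 and 3 could be combined in a loop. The following is a minimal sketch, not a verified procedure: it assumes the container ID is the first whitespace-separated column of the `elfs-cli data_container list` output, which this article does not confirm. The names `connecting_ids`, `disconnect_connecting`, and `ELFS_CLI` are hypothetical helpers introduced for illustration. Check the list output on your system before running it.

```shell
# Sketch only. ASSUMPTION: the container ID is the first whitespace-separated
# column of `elfs-cli data_container list` output. Verify before use.
ELFS_CLI="${ELFS_CLI:-elfs-cli}"   # hypothetical override, handy for dry tests

connecting_ids() {
    # Read list output on stdin and print the numeric first field of every
    # line that mentions ilm_connecting.
    awk '/ilm_connecting/ && $1 ~ /^[0-9]+$/ { print $1 }'
}

disconnect_connecting() {
    "$ELFS_CLI" data_container list | connecting_ids | while read -r id; do
        echo "Disconnecting ILM for data container $id"
        # </dev/null keeps the command from consuming the loop's input.
        "$ELFS_CLI" data_container disconnect_ilm --id "$id" </dev/null
    done
}
```

Run `disconnect_connecting` from the elfs_admin shell shown in step 2. To preview which IDs would be affected, pipe the list output through `connecting_ids` on its own first.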

    4. Clear the errors by acknowledging the events

[root~(elfs_admin)]# for i in $(elfs-cli event list | awk '{print $1}' | grep -o '^[0-9]*$'); do elfs-cli event ack --id "$i"; done
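
The command above acknowledges every event in the list. If only the cold tier events should be acknowledged, the IDs could be filtered by message first. This is a hedged sketch: it assumes each line of `elfs-cli event list` starts with the numeric event ID and contains the event message on the same line, which this article does not confirm. `cold_tier_event_ids` is a hypothetical helper name.

```shell
# Sketch only. ASSUMPTION: each line of `elfs-cli event list` begins with the
# numeric event ID and includes the message text on the same line.
cold_tier_event_ids() {
    # Read event list output on stdin and print the IDs of lines that
    # mention the cold tier connection failure (case-insensitive).
    awk 'tolower($0) ~ /connect to cold tier failed/ && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Usage, from the elfs_admin shell:
#   elfs-cli event list | cold_tier_event_ids | while read -r id; do
#       elfs-cli event ack --id "$id" </dev/null
#   done
```

Running `elfs-cli event list | cold_tier_event_ids` alone prints the IDs that would be acknowledged, which makes it easy to confirm the filter before acknowledging anything.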

 

 
