CephClusterCriticallyFull
2. Description
Storage cluster utilization has crossed 80%.
Detailed Description: Storage cluster utilization has crossed 80%, and the cluster will become read-only once utilization crosses 85%. Expand the storage cluster immediately. It is common to see alerts related to OSD devices being full or near full prior to this alert.
4. Prerequisites
To proceed with the prerequisites and resolution, you will need basic CLI tools, including:
- oc (OpenShift CLI)
- jq
- curl
4.1. Verify cluster access
oc whoami
Check the output to ensure you are in the correct context for the cluster mentioned in the alert. If not, please change context and proceed.
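If you are not in the correct context, a minimal sketch of switching to it (the context name is a placeholder; pick the one for your cluster from the first command):
oc config get-contexts
oc config use-context <context for the alerting cluster>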
4.2. Check Alerts
MYALERTMANAGER=$(oc -n openshift-monitoring get routes/alertmanager-main --no-headers | awk '{print $2}')
Quickly view all alerts to check if your alert is still active.
curl -k -H "Authorization: Bearer $(oc -n openshift-monitoring sa get-token prometheus-k8s)" https://${MYALERTMANAGER}/api/v1/alerts | jq '.data[] | select( .labels.alertname) | { ALERT: .labels.alertname, STATE: .status.state}'
Continue ONLY if you want to view your specific alert or need more details.
export MYALERTNAME="<alertname from alert>"
curl -k -H "Authorization: Bearer $(oc -n openshift-monitoring sa get-token prometheus-k8s)" https://${MYALERTMANAGER}/api/v1/alerts | jq '.data[] | select( .labels.alertname | test(env.MYALERTNAME)) | { ALERT: .labels.alertname, STATE: .status.state}'
No entries means the alert is no longer active.
Some alerts, such as version mismatch alerts, can occur during upgrades and resolve themselves. If this alert is not a version mismatch alert, there should be an investigation into what triggered it, even if the alert has since resolved. Look for other active alerts or alerts with similar timing.
curl -k -H "Authorization: Bearer $(oc -n openshift-monitoring sa get-token prometheus-k8s)" https://${MYALERTMANAGER}/api/v1/alerts | jq '.data[] | select( .labels.alertname | test(env.MYALERTNAME)) | { ALERTDETAILS: .}'
More about the Prometheus Alert endpoint can be found here: https://prometheus.io/docs/prometheus/latest/querying/api/#alerts
4.3. Check/Document OCS Ceph Cluster Health
You may directly check OCS Ceph Cluster health by using the rook-ceph toolbox.
The rook-ceph toolbox is not supported by Red Hat and is used here only to provide a quick health assessment. Do not use the toolbox to modify your Ceph cluster. Use the toolbox for querying health only.
oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
After the rook-ceph-tools Pod is Running, access the toolbox like this:
TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
oc rsh -n openshift-storage $TOOLS_POD
ceph status
ceph health
ceph osd status
ceph osd df
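Optionally, while still inside the toolbox, a per-pool breakdown of utilization is often useful for a cluster-full alert (this is an optional extra check, not part of the list above):
ceph df detail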
exit
Do not forget to exit.
5. Procedure for Resolution
5.1. Resolution Overview
Determine whether you can proceed with scaling UP (adding capacity to existing nodes) or scaling OUT (adding nodes) for expansion. Follow the appropriate section for your deployment below.
5.2. Preparation for Scaling UP
The procedure for scaling UP storage requires adding more storage capacity to existing nodes. In general, this process requires 3 steps:
- Check Ceph Cluster Status Before Recovery - check ceph status and ceph osd status, and check current alerts.
- Add Storage Capacity - determine whether LSO is in use or not, then add capacity accordingly. This will be deployment specific; you will need some idea of how your OCS cluster was deployed (e.g. AWS, Azure, bare metal) and how to add storage for your specific deployment.
- Check Ceph Cluster Status During Recovery - essentially the same as the first item.
5.3. Assess Your Cluster State
Check the current status of the ceph cluster and OSDs using the rook-ceph toolbox:
oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
After the rook-ceph-tools Pod is Running, access the toolbox like this:
TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
oc rsh -n openshift-storage $TOOLS_POD
Once inside the toolbox, run the following Ceph commands:
ceph status
In the output of ceph status, look for cluster: health and data: usage.
Typically, OSDs running out of space to replicate are causing the problem. View current OSD status by running:
ceph osd status
In the output of ceph osd status, look for the numbers in the used and avail columns. You will track these numbers after adding storage capacity.
Do not forget to exit the pod to return back to your command prompt:
exit
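If you want a record of the baseline OSD usage to compare against after expansion, one hedged sketch is to capture ceph osd df from outside the toolbox (the output filename is illustrative):
TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
oc rsh -n openshift-storage $TOOLS_POD ceph osd df | tee osd-df-before-expansion.txt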
In this situation, it is common to have MANY alerts firing or warning all at once. If you have not already checked firing alerts, you may do so now by running:
MYALERTMANAGER=$(oc -n openshift-monitoring get routes/alertmanager-main --no-headers | awk '{print $2}')
Quickly view all alerts to check if your alert is still active.
curl -k -H "Authorization: Bearer $(oc -n openshift-monitoring sa get-token prometheus-k8s)" https://${MYALERTMANAGER}/api/v1/alerts | jq '.data[] | select( .labels.alertname) | { ALERT: .labels.alertname, STATE: .status.state}'
5.4. Determine LSO or No LSO
To determine if LSO is installed and in use, run the following commands:
oc get csv -A | grep local-storage
openshift-local-storage local-storage-operator.4.6.0-202103130248.p0 Local Storage 4.6.0-202103130248.p0 Succeeded
If you see output like above ending with Succeeded, LSO is installed.
Determine whether LSO is in use by looking for PVs whose names begin with the "local-pv" prefix, bound to PVCs whose names begin with the "ocs-deviceset-" prefix followed by the name of the storageClass used in the storage cluster definition. Run the following command:
oc -n openshift-storage get pvc | grep local-pv
ocs-deviceset-localblksc-demo-0-data-0-jlp6g   Bound   local-pv-f4d1075c   50Gi   RWO   localblkscdemo   140m
ocs-deviceset-localblksc-demo-1-data-0-c8pmq   Bound   local-pv-bf5ca51e   50Gi   RWO   localblkscdemo   140m
ocs-deviceset-localblksc-demo-2-data-0-nzzn5   Bound   local-pv-fb64d972   50Gi   RWO   localblkscdemo   140m
Notice the storage class name embedded in the name of the PVC, right after "ocs-deviceset-". In our example demo case, this is "localblksc-demo" and can be verified to be in use by the storage cluster by running:
oc -n openshift-storage get storagecluster -o yaml | grep storageClassName
storageClassName: localblksc-demo
If PVs named local-pv* are present and are bound to PVCs that contain the name of the storageClass in use by the storagecluster definition, LSO is in use.
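If the grep output is ambiguous (storageClassName can appear in more than one template), a hedged jsonpath variant that reads only the data device set template, assuming the standard storageDeviceSets layout of the StorageCluster resource:
oc -n openshift-storage get storagecluster -o jsonpath='{.items[*].spec.storageDeviceSets[*].dataPVCTemplate.spec.storageClassName}{"\n"}'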
=== Adding Storage Capacity when Local Storage Operator is in USE
Follow the CLI instructions below, or the web UI instructions in OCS - Scaling up storage by adding capacity to your OpenShift Container Storage nodes using local storage devices.
To determine which nodes will need additional capacity, view the localvolumeset and look for the values: listed under key: kubernetes.io/hostname by running the following command:
oc -n openshift-local-storage get localvolumeset -o yaml
[...]
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - ip-10-0-149-49
        - ip-10-0-179-37
        - ip-10-0-209-228
  storageClassName: localblksc-demo
[...]
Add storage capacity (e.g. in AWS, attach EBS volumes) to each hostname listed.
Details of attaching additional storage capacity will depend on your particular environment and are beyond the scope of this document.
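As one illustrative sketch only, assuming an AWS deployment where the availability zone, size, volume ID, instance ID, and device name below are placeholders for your own values, attaching an additional EBS volume to one node could look like:
# Create a new volume in the same availability zone as the target node
aws ec2 create-volume --availability-zone us-east-2a --size 50 --volume-type gp3
# Attach the new volume to the node's EC2 instance
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf
Repeat for each hostname listed in the localvolumeset node selector.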
Once you have added capacity to each node, verify LSO has discovered the new storage by viewing the localvolumediscoveryresults details. First list the localvolumediscoveryresults, then choose a node for a spot check:
oc -n openshift-local-storage get localvolumediscoveryresults
If you do not see your new storage, rerun the command a few times to allow discovery to complete.
Choose one of the localvolumediscoveryresults from above to inspect details. When new storage has been "discovered" it will be reflected under status: discoveredDevices.
oc -n openshift-local-storage get localvolumediscoveryresults/discovery-result-ip-10-0-149-49.us-east-2.compute.internal -o yaml
[...]
  - deviceID: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0b86476e1d962d061
    fstype: ""
    model: 'Amazon Elastic Block Store '
    path: /dev/nvme2n1
    property: NonRotational
    serial: vol0b86476e1d962d061
    size: 53687091200
    status:
      state: Available
    type: disk
    vendor: ""
[...]
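To spot-check how many devices a node has discovered without reading the full YAML, a hedged one-liner using jq (the node name is the same example as above):
oc -n openshift-local-storage get localvolumediscoveryresults/discovery-result-ip-10-0-149-49.us-east-2.compute.internal -o json | jq '.status.discoveredDevices | length'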
Expand the OCS cluster by directly editing the ocs-storagecluster definition, finding count: under spec: storageDeviceSets: and increasing the count by 1:
oc -n openshift-storage edit storagecluster/ocs-storagecluster
spec:
  encryption: {}
  externalStorage: {}
  [...]
  monDataDirHostPath: /var/lib/rook
  storageDeviceSets:
  - config: {}
    count: 1
In the example above "count: 1" becomes "count: 2".
A successful edit of storagecluster/ocs-storagecluster will show the following:
storagecluster.ocs.openshift.io/ocs-storagecluster edited
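If you prefer a non-interactive change, a hedged sketch of the same edit using oc patch, assuming the device set you are expanding is the first entry in storageDeviceSets and the new count is 2:
oc -n openshift-storage patch storagecluster/ocs-storagecluster --type json --patch '[{ "op": "replace", "path": "/spec/storageDeviceSets/0/count", "value": 2 }]'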
=== Assess Your Cluster During Recovery
Watch the expansion progress by viewing ceph status and ceph osd status. This step is a repeat of Assess Your Cluster; however, this time you will be specifically looking for evidence of:
- New OSDs for the new storage
- Status of recovery in io:
- Progress of OSD rebalancing across all (existing and new) OSDs
- Finally, ceph health: and all alerts returning to normal
In case you did not do this from the previous Assess Your Cluster section:
oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
After the rook-ceph-tools Pod is Running, access the toolbox like this:
TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
oc rsh -n openshift-storage $TOOLS_POD
Once inside the toolbox, run the following Ceph commands:
ceph status
  cluster:
    id:     4e949bf7-7e24-4c0e-898e-f5985b514a99
    health: HEALTH_ERR
            3 backfillfull osd(s)
            3 pool(s) backfillfull
            Degraded data redundancy: 18634/33627 objects degraded (55.414%), 122 pgs degraded
            Full OSDs blocking recovery: 93 pgs recovery_toofull

  services:
    mon: 3 daemons, quorum a,b,c (age 59m)
    mgr: a(active, since 59m)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 6 osds: 6 up (since 43s), 6 in (since 43s)

  task status:
    scrub status:
        mds.ocs-storagecluster-cephfilesystem-a: idle
        mds.ocs-storagecluster-cephfilesystem-b: idle

  data:
    pools:   3 pools, 288 pgs
    objects: 11.21k objects, 43 GiB
    usage:   137 GiB used, 163 GiB / 300 GiB avail
    pgs:     18634/33627 objects degraded (55.414%)
             45/33627 objects misplaced (0.134%)
             165 active+clean
             93  active+recovery_toofull+degraded
             29  active+recovery_wait+degraded
             1   active+recovering

  io:
    client:   3.0 KiB/s rd, 75 MiB/s wr, 3 op/s rd, 38 op/s wr
    recovery: 71 MiB/s, 0 keys/s, 20 objects/s
Items to notice:
- The cluster: health: is still HEALTH_ERR, which is expected until the cluster fully recovers.
- services: osd: reflects the total number of desired OSDs and how many of them are currently up. This should eventually change so that the number up matches the number desired.
- io: recovery: being present means a recovery is taking place. This line will disappear when the recovery is complete.
To actively watch the OSD recovery, run the following:
ceph osd status
Watch the output of ceph osd status for detail on how the OSDs are rebalancing. You will see changes in the used and avail columns as Ceph moves data to achieve a healthy state.
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| id | host                                       | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | ip-10-0-179-37.us-east-2.compute.internal  | 33.9G | 16.0G |    0   |   830k  |    0   |     0   | exists,up |
| 1  | ip-10-0-209-228.us-east-2.compute.internal | 35.0G | 14.9G |    3   |  5652k  |    0   |     0   | exists,up |
| 2  | ip-10-0-149-49.us-east-2.compute.internal  | 32.4G | 17.5G |    0   |  1638k  |    0   |     0   | exists,up |
| 3  | ip-10-0-179-37.us-east-2.compute.internal  | 13.5G | 86.4G |    3   |  8532k  |    2   |   106   | exists,up |
| 4  | ip-10-0-209-228.us-east-2.compute.internal | 12.2G | 87.7G |    8   |  8183k  |    0   |     0   | exists,up |
| 5  | ip-10-0-149-49.us-east-2.compute.internal  | 15.2G | 84.7G |    3   |  4118k  |    0   |     0   | exists,up |
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
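To keep this view refreshing automatically, one hedged option is to run it in a loop from your workstation rather than inside the toolbox, assuming the watch utility is available locally:
watch -n 30 "oc rsh -n openshift-storage $TOOLS_POD ceph osd status"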
Once the recovery process has completed, ceph status will show:
- cluster: health: HEALTH_OK
- All desired OSDs present in services: osd:
- io: recovery: no longer present, since recovery has completed
In addition, ceph osd status will show:
- Fairly even distribution of used/avail across all OSDs
ceph osd status after recovery:
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| id | host                                       | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | ip-10-0-209-93.us-east-2.compute.internal  | 1104M | 48.9G |    0   |     0   |    0   |     0   | exists,up |
| 1  | ip-10-0-140-27.us-east-2.compute.internal  | 1116M | 48.9G |    0   |     0   |    0   |     0   | exists,up |
| 2  | ip-10-0-183-136.us-east-2.compute.internal | 1092M | 48.9G |    0   |  3276   |    0   |     0   | exists,up |
| 3  | ip-10-0-140-27.us-east-2.compute.internal  | 1110M | 48.9G |    0   |  7372   |    0   |     0   | exists,up |
| 4  | ip-10-0-183-136.us-east-2.compute.internal | 1134M | 48.8G |    0   |  2457   |    0   |     0   | exists,up |
| 5  | ip-10-0-209-93.us-east-2.compute.internal  | 1121M | 48.9G |    0   |     0   |    2   |   106   | exists,up |
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
Do not forget to exit the pod to return back to your command prompt:
exit
Alerts will resolve themselves as the cluster recovers.
MYALERTMANAGER=$(oc -n openshift-monitoring get routes/alertmanager-main --no-headers | awk '{print $2}')
curl -k -H "Authorization: Bearer $(oc -n openshift-monitoring sa get-token prometheus-k8s)" https://${MYALERTMANAGER}/api/v1/alerts | jq '.data[] | select( .labels.alertname) | { ALERT: .labels.alertname, STATE: .status.state}'
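To narrow the output to Ceph-related alerts only, a hedged variant of the same query (filtering on alert names that contain "Ceph"):
curl -k -H "Authorization: Bearer $(oc -n openshift-monitoring sa get-token prometheus-k8s)" https://${MYALERTMANAGER}/api/v1/alerts | jq '.data[] | select( .labels.alertname | test("Ceph")) | { ALERT: .labels.alertname, STATE: .status.state}'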
This procedure is complete.
=== Adding Storage Capacity - No Local Storage Operator (LSO)
Follow the CLI instructions below, or the web UI instructions in OCS - Scaling up storage by adding capacity to your OpenShift Container Storage.
Expand the OCS cluster by directly editing the ocs-storagecluster definition, finding count: under spec: storageDeviceSets: and increasing the count by 1:
oc -n openshift-storage edit storagecluster/ocs-storagecluster
spec:
  encryption: {}
  externalStorage: {}
  [...]
  monDataDirHostPath: /var/lib/rook
  storageDeviceSets:
  - config: {}
    count: 1
In the example above "count: 1" becomes "count: 2".
A successful edit of storagecluster/ocs-storagecluster will show the following:
storagecluster.ocs.openshift.io/ocs-storagecluster edited
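After increasing the count, new OSD pods should be created for the added capacity. A hedged way to watch for them, assuming the standard rook-ceph OSD pod label:
oc -n openshift-storage get pods -l app=rook-ceph-osd -w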
=== Assess Your Cluster During Recovery
Watch the expansion progress by viewing ceph status and ceph osd status. This step is a repeat of Assess Your Cluster; however, this time you will be specifically looking for evidence of:
- New OSDs for the new storage
- Status of recovery in io:
- Progress of OSD rebalancing across all (existing and new) OSDs
- Finally, ceph health: and all alerts returning to normal
In case you did not do this from the previous Assess Your Cluster section:
oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
After the rook-ceph-tools Pod is Running, access the toolbox like this:
TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
oc rsh -n openshift-storage $TOOLS_POD
Once inside the toolbox, run the following Ceph commands:
ceph status
  cluster:
    id:     4e949bf7-7e24-4c0e-898e-f5985b514a99
    health: HEALTH_ERR
            3 backfillfull osd(s)
            3 pool(s) backfillfull
            Degraded data redundancy: 18634/33627 objects degraded (55.414%), 122 pgs degraded
            Full OSDs blocking recovery: 93 pgs recovery_toofull

  services:
    mon: 3 daemons, quorum a,b,c (age 59m)
    mgr: a(active, since 59m)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 6 osds: 6 up (since 43s), 6 in (since 43s)

  task status:
    scrub status:
        mds.ocs-storagecluster-cephfilesystem-a: idle
        mds.ocs-storagecluster-cephfilesystem-b: idle

  data:
    pools:   3 pools, 288 pgs
    objects: 11.21k objects, 43 GiB
    usage:   137 GiB used, 163 GiB / 300 GiB avail
    pgs:     18634/33627 objects degraded (55.414%)
             45/33627 objects misplaced (0.134%)
             165 active+clean
             93  active+recovery_toofull+degraded
             29  active+recovery_wait+degraded
             1   active+recovering

  io:
    client:   3.0 KiB/s rd, 75 MiB/s wr, 3 op/s rd, 38 op/s wr
    recovery: 71 MiB/s, 0 keys/s, 20 objects/s
Items to notice:
- The cluster: health: is still HEALTH_ERR, which is expected until the cluster fully recovers.
- services: osd: reflects the total number of desired OSDs and how many of them are currently up. This should eventually change so that the number up matches the number desired.
- io: recovery: being present means a recovery is taking place. This line will disappear when the recovery is complete.
To actively watch the OSD recovery, run the following:
ceph osd status
Watch the output of ceph osd status for detail on how the OSDs are rebalancing. You will see changes in the used and avail columns as Ceph moves data to achieve a healthy state.
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| id | host                                       | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | ip-10-0-179-37.us-east-2.compute.internal  | 33.9G | 16.0G |    0   |   830k  |    0   |     0   | exists,up |
| 1  | ip-10-0-209-228.us-east-2.compute.internal | 35.0G | 14.9G |    3   |  5652k  |    0   |     0   | exists,up |
| 2  | ip-10-0-149-49.us-east-2.compute.internal  | 32.4G | 17.5G |    0   |  1638k  |    0   |     0   | exists,up |
| 3  | ip-10-0-179-37.us-east-2.compute.internal  | 13.5G | 86.4G |    3   |  8532k  |    2   |   106   | exists,up |
| 4  | ip-10-0-209-228.us-east-2.compute.internal | 12.2G | 87.7G |    8   |  8183k  |    0   |     0   | exists,up |
| 5  | ip-10-0-149-49.us-east-2.compute.internal  | 15.2G | 84.7G |    3   |  4118k  |    0   |     0   | exists,up |
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
Once the recovery process has completed, ceph status will show:
- cluster: health: HEALTH_OK
- All desired OSDs present in services: osd:
- io: recovery: no longer present, since recovery has completed
In addition, ceph osd status will show:
- Fairly even distribution of used/avail across all OSDs
ceph osd status after recovery:
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| id | host                                       | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | ip-10-0-209-93.us-east-2.compute.internal  | 1104M | 48.9G |    0   |     0   |    0   |     0   | exists,up |
| 1  | ip-10-0-140-27.us-east-2.compute.internal  | 1116M | 48.9G |    0   |     0   |    0   |     0   | exists,up |
| 2  | ip-10-0-183-136.us-east-2.compute.internal | 1092M | 48.9G |    0   |  3276   |    0   |     0   | exists,up |
| 3  | ip-10-0-140-27.us-east-2.compute.internal  | 1110M | 48.9G |    0   |  7372   |    0   |     0   | exists,up |
| 4  | ip-10-0-183-136.us-east-2.compute.internal | 1134M | 48.8G |    0   |  2457   |    0   |     0   | exists,up |
| 5  | ip-10-0-209-93.us-east-2.compute.internal  | 1121M | 48.9G |    0   |     0   |    2   |   106   | exists,up |
+----+--------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
Do not forget to exit the pod to return back to your command prompt:
exit
Alerts will resolve themselves as the cluster recovers.
MYALERTMANAGER=$(oc -n openshift-monitoring get routes/alertmanager-main --no-headers | awk '{print $2}')
curl -k -H "Authorization: Bearer $(oc -n openshift-monitoring sa get-token prometheus-k8s)" https://${MYALERTMANAGER}/api/v1/alerts | jq '.data[] | select( .labels.alertname) | { ALERT: .labels.alertname, STATE: .status.state}'
This procedure is complete.
The procedure for scaling OUT (adding nodes) is pending and not yet covered in this document.