OpenShift Regional Disaster Recovery with Advanced Cluster Management
- 1. Overview
- 2. Deploy and Configure ACM for Multisite connectivity
- 3. OpenShift Data Foundation Installation
- 4. Install ODF Multicluster Orchestrator Operator on Hub cluster
- 5. Configure SSL access between S3 endpoints
- 6. Enabling Multicluster Web Console
- 7. Create Data Policy on Hub cluster
- 8. Create Sample Application for DR testing
- 9. Application Failover between managed clusters
- 10. Application Failback between managed clusters
1. Overview
The intent of this guide is to detail the steps and commands necessary to fail over an application from one OpenShift Container Platform (OCP) cluster to another and then fail back the same application to the original primary cluster. In this case the OCP clusters are created or imported using Red Hat Advanced Cluster Management (RHACM).
This is a general overview of the steps required to configure and execute OpenShift Disaster Recovery (ODR) capabilities using OpenShift Data Foundation (ODF) v4.11 and RHACM v2.5 across two distinct OCP clusters separated by distance. In addition to these two clusters, called managed clusters, a third OCP cluster is currently required to act as the Advanced Cluster Management (ACM) hub cluster.
These steps are considered Tech Preview in ODF 4.11 and are provided for POC (Proof of Concept) purposes. OpenShift Regional Disaster Recovery will be supported for production usage in a future ODF release.
- Install the ACM operator on the hub cluster.
After creating the OCP hub cluster, install the ACM operator from OperatorHub. After the operator and associated pods are running, create the MultiClusterHub resource.
- Create or import managed OCP clusters into the ACM hub.
Import or create the two managed clusters with adequate resources for ODF (compute nodes, memory, CPU) using the RHACM console.
- Ensure clusters have unique private network address ranges.
Ensure the primary and secondary OCP clusters have unique private network address ranges.
- Connect the private networks using Submariner add-ons.
Connect the managed OCP private networks (cluster and service) using the RHACM Submariner add-ons.
- Install ODF 4.11 on managed clusters.
Install ODF 4.11 on the primary and secondary OCP managed clusters and validate the deployment.
- Install the ODF Multicluster Orchestrator on the ACM hub cluster.
Install the ODF Multicluster Orchestrator from OperatorHub on the ACM hub cluster. The OpenShift DR Hub operator is installed as a dependency.
- Configure SSL access between S3 endpoints.
If the managed OpenShift clusters are not using valid certificates, this step must be done by creating a new user-ca-bundle ConfigMap that contains the certificates.
- Enable the Multicluster Web Console.
This is a new Tech Preview capability that is required before creating a DRPolicy. It is only needed on the Hub cluster where ACM resides.
- Create one or more DRPolicy resources.
Use the All Clusters Data Services UI to create a DRPolicy by selecting the two managed clusters the policy will apply to.
- Validate that the OpenShift DR Cluster operators are installed.
Creating the first DRPolicy triggers the installation of the DR Cluster operators on the two managed clusters selected in the UI.
- Create the Sample Application using the ACM console.
Use the sample app example from github.com/RamenDR/ocm-ramen-samples to create a busybox deployment for failover and failback testing.
- Validate the Sample Application deployment.
Using CLI commands on both managed clusters, validate that the application is running.
- Apply the DRPolicy to the Sample Application.
Use the All Clusters Data Services UI to apply the new DRPolicy to the Sample Application. Once applied, a DRPlacementControl resource is created in the application namespace on the Hub cluster.
- Failover the Sample Application to the secondary managed cluster.
Modify the application DRPlacementControl resource on the Hub cluster, add the action Failover, and specify the failoverCluster to trigger the failover.
- Failback the Sample Application to the primary managed cluster.
Modify the application DRPlacementControl resource on the Hub cluster and change the action to Relocate to trigger a failback to the preferredCluster.
2. Deploy and Configure ACM for Multisite connectivity
This installation method requires three OpenShift clusters that have network reachability between them. For the purposes of this document the clusters are referred to as follows:
- Hub cluster: where ACM, the ODF Multicluster Orchestrator, and the ODR Hub controllers are installed.
- Primary managed cluster: where ODF, the ODR Cluster controller, and applications are installed.
- Secondary managed cluster: where ODF, the ODR Cluster controller, and applications are installed.
2.1. Install ACM and MultiClusterHub
Find ACM in OperatorHub on the Hub cluster and follow the instructions to install the operator.
Verify that the operator was successfully installed and that the MultiClusterHub is ready to be installed.
Select MultiClusterHub and use either the Form view or the YAML view to configure the deployment, then select Create.
Most MultiClusterHub deployments can use the default settings in the Form view.
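If you use the YAML view, a minimal MultiClusterHub resource looks similar to this sketch (assuming the ACM operator was installed into the open-cluster-management namespace):
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
  name: multiclusterhub
  namespace: open-cluster-management   # namespace the ACM operator was installed into
spec: {}                               # defaults are sufficient for most deployments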
Once the deployment is complete you can log on to the ACM console using your OpenShift credentials.
First, find the Route that has been created for the ACM console:
oc get route multicloud-console -n open-cluster-management -o jsonpath --template="https://{.spec.host}/multicloud/clusters{'\n'}"
This will return a route similar to this one.
https://multicloud-console.apps.bos3.example.com/multicloud/clusters
After logging in you should see your local cluster imported.
2.2. Import or Create Managed clusters
Now that ACM is installed on the Hub cluster it is time to either create or import the Primary managed cluster and the Secondary managed cluster. In the ACM console you should see selections for Create cluster and Import cluster. Choose the selection appropriate for your environment. After the managed clusters are successfully created or imported, they appear in the ACM cluster list.
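From the CLI on the Hub cluster you can also confirm that both managed clusters have joined and are available (a sketch; the cluster names will match what you chose during create or import):
# Run on the Hub cluster; JOINED and AVAILABLE should report True for both managed clusters
oc get managedclusters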
2.3. Verify Managed clusters have non-overlapping networks
In order to connect the OpenShift cluster and service networks using the Submariner add-ons, it is necessary to validate that the two clusters have non-overlapping networks. This can be done by running the following command for each of the managed clusters.
oc get networks.config.openshift.io cluster -o json | jq .spec
{
"clusterNetwork": [
{
"cidr": "10.5.0.0/16",
"hostPrefix": 23
}
],
"externalIP": {
"policy": {}
},
"networkType": "OpenShiftSDN",
"serviceNetwork": [
"10.15.0.0/16"
]
}
{
"clusterNetwork": [
{
"cidr": "10.6.0.0/16",
"hostPrefix": 23
}
],
"externalIP": {
"policy": {}
},
"networkType": "OpenShiftSDN",
"serviceNetwork": [
"10.16.0.0/16"
]
}
These outputs show that the two example managed clusters have non-overlapping clusterNetwork and serviceNetwork ranges, so it is safe to proceed.
2.4. Connect the Managed clusters using Submariner add-ons
Now that we know the cluster and service networks have non-overlapping ranges, it is time to install the Submariner add-ons for each managed cluster. This is done using the ACM console and Cluster sets.
Navigate to Cluster sets and, at the bottom of the page, select Create cluster set.
Once the new Cluster set is created, select Manage resource assignments.
Follow the instructions and add the two managed clusters to the new Cluster set. Select Save and then navigate to Submariner add-ons.
Select Install Submariner add-ons at the bottom of the page and add the two managed clusters. Click through the wizard selections and make changes as needed. After reviewing your selections, select Install.
Do not select Enable Globalnet. Globalnet is only needed when the managed clusters have overlapping cluster and service networks, and it is not currently supported with Regional Disaster Recovery. Ensure that the cluster and service networks are non-overlapping before proceeding.
It can take more than 5 minutes for the Submariner add-on installation to finish on both managed clusters. Resources are installed in the submariner-operator project.
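As a quick CLI check while waiting, you can watch the Submariner pods on each managed cluster (a sketch; pod names will vary):
# Run on each managed cluster; all pods in the submariner-operator project
# should eventually reach a Running state
oc get pods -n submariner-operator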
A successful deployment will show Connection status and Agent status as Healthy.
3. OpenShift Data Foundation Installation
In order to configure storage replication between the two OCP clusters, OpenShift Data Foundation (ODF) must first be installed on each managed cluster. ODF deployment guides and instructions are specific to your infrastructure (for example AWS, VMware, bare metal, or Azure). Install ODF version 4.11 on both OCP managed clusters.
You can validate the successful deployment of ODF on each managed OCP cluster with the following command:
oc get storagecluster -n openshift-storage ocs-storagecluster -o jsonpath='{.status.phase}{"\n"}'
And for the Multicloud Object Gateway (MCG):
oc get noobaa -n openshift-storage noobaa -o jsonpath='{.status.phase}{"\n"}'
If the result is Ready for both queries on the Primary managed cluster and the Secondary managed cluster, continue with the next section.
The successful installation of ODF can also be validated in the OCP Web Console by navigating to Storage and then Data Foundation.
4. Install ODF Multicluster Orchestrator Operator on Hub cluster
On the Hub cluster navigate to OperatorHub and filter for ODF Multicluster Orchestrator. Follow the instructions to install the operator into the openshift-operators project. The ODF Multicluster Orchestrator also installs the OpenShift DR Hub Operator on the ACM hub cluster as a dependency.
Check that the operator Pods are in a Running state. The OpenShift DR Hub operator is installed at the same time in openshift-operators.
oc get pods -n openshift-operators
NAME READY STATUS RESTARTS AGE
odfmo-controller-manager-f6fc95f7f-7wtjl 1/1 Running 0 4m14s
ramen-hub-operator-85465bd487-7sl2k 2/2 Running 0 3m40s
odf-multicluster-console-76b88b444c-vl9s4 1/1 Running 0 3m50s
5. Configure SSL access between S3 endpoints
These steps are necessary so that metadata can be stored on the alternate cluster in a Multicloud Object Gateway (MCG) object bucket using a secure transport protocol. In addition, the Hub cluster needs to verify access to the object buckets.
If all of your OpenShift clusters are deployed using a signed and valid set of certificates for your environment, this section can be skipped.
Extract the ingress certificate for the Primary managed cluster and save the output to primary.crt.
oc get cm default-ingress-cert -n openshift-config-managed -o jsonpath="{['data']['ca-bundle\.crt']}" > primary.crt
Extract the ingress certificate for the Secondary managed cluster and save the output to secondary.crt.
oc get cm default-ingress-cert -n openshift-config-managed -o jsonpath="{['data']['ca-bundle\.crt']}" > secondary.crt
Create a new YAML file cm-clusters-crt.yaml to hold the certificate bundle for both the Primary managed cluster and the Secondary managed cluster.
There could be more or fewer than three certificates for each cluster, as shown in this example file.
apiVersion: v1
data:
ca-bundle.crt: |
-----BEGIN CERTIFICATE-----
<copy contents of cert1 from primary.crt here>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<copy contents of cert2 from primary.crt here>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<copy contents of cert3 primary.crt here>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<copy contents of cert1 from secondary.crt here>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<copy contents of cert2 from secondary.crt here>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<copy contents of cert3 from secondary.crt here>
-----END CERTIFICATE-----
kind: ConfigMap
metadata:
name: user-ca-bundle
namespace: openshift-config
This ConfigMap needs to be created on the Primary managed cluster, Secondary managed cluster, and the Hub cluster.
oc create -f cm-clusters-crt.yaml
configmap/user-ca-bundle created
The Hub cluster needs to verify access to the object buckets using the DRPolicy resource. Therefore the same ConfigMap, cm-clusters-crt.yaml, needs to be created on the Hub cluster.
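As an alternative to pasting certificates by hand, the same ConfigMap manifest can be generated with a short shell sketch, assuming the primary.crt and secondary.crt files extracted above:
# Concatenate both ingress CA bundles and render the user-ca-bundle ConfigMap manifest
cat primary.crt secondary.crt > combined.crt
oc create configmap user-ca-bundle -n openshift-config \
  --from-file=ca-bundle.crt=combined.crt --dry-run=client -o yaml > cm-clusters-crt.yaml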
After the user-ca-bundle ConfigMaps are created on all three clusters, the default Proxy cluster resource needs to be modified.
Patch the default Proxy resource on the Primary managed cluster, Secondary managed cluster, and the Hub cluster.
oc patch proxy cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"user-ca-bundle"}}}'
proxy.config.openshift.io/cluster patched
6. Enabling Multicluster Web Console
This is a new capability that is required before creating a Data Policy or DRPolicy. It is only needed on the Hub cluster, and RHACM 2.5 must be installed.
Multicluster console is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
Enable the feature gate by navigating from Administration → Cluster Settings → Configuration → FeatureGate, and edit the YAML template as follows:
[...]
spec:
featureSet: TechPreviewNoUpgrade
Click Save to enable the multicluster console for all clusters in the RHACM console.
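The same change can be made from the CLI, as a sketch (note the warning that follows; the feature set cannot be removed once applied):
# Enables the TechPreviewNoUpgrade feature set on the Hub cluster
oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'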
Do not set this feature gate on production clusters. You will not be able to upgrade your cluster after applying the feature gate, and it cannot be undone.
7. Create Data Policy on Hub cluster
Regional Disaster Recovery uses DRPolicy resources on the Hub cluster to fail over and relocate workloads across managed clusters. A DRPolicy requires a set of two DRClusters, or peer clusters, with ODF version 4.11 installed. The ODF Multicluster Orchestrator Operator facilitates the creation of each DRPolicy and the corresponding DRClusters through the Multicluster Web console.
On the Hub cluster navigate to All Clusters, then navigate to Data policies under the Data services menu. If this is your first DRPolicy you will see Create DRPolicy at the bottom of the page.
Make sure to log in to all clusters from the Multicluster Web console. The clusters are listed directly below All Clusters.
Click Create DRPolicy. From the list of managed clusters presented, select the clusters you would like to participate in the DRPolicy and give the policy a unique name (for example, ocp4bos1-ocp4bos2-5m).
The greyed-out dropdown option for Replication policy is automatically set to async based on the OpenShift clusters selected, and a Sync schedule becomes available. Select the replication interval for this DRPolicy and then select Create.
For every desired replication interval a new DRPolicy needs to be created with a unique name (for example, ocp4bos1-ocp4bos2-10m). The same clusters can be selected, but the Sync schedule would be configured with a different replication interval in minutes. The minimum is one minute.
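For reference, the DRPolicy generated by the wizard looks roughly like the sketch below; the name, cluster names, and interval are the examples used in this guide, and the UI workflow above remains the intended way to create it.
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPolicy
metadata:
  name: ocp4bos1-ocp4bos2-5m          # example name from this guide
spec:
  drClusters:                         # the two managed clusters selected in the wizard
    - ocp4bos1
    - ocp4bos2
  schedulingInterval: 5m              # replication interval chosen in the Sync schedule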
This should create the two DRCluster resources and also the DRPolicy on the Hub cluster. In addition, when the initial DRPolicy is created the following will happen:
- A bootstrap token is created and exchanged between the managed clusters.
- Mirroring is enabled for the default CephBlockPool on each managed cluster.
- A VolumeReplicationClass is created on the Primary managed cluster and the Secondary managed cluster for the replication interval in the DRPolicy.
- An object bucket is created (using MCG) on each managed cluster for storing PVC and PV metadata.
- A Secret is created in the openshift-operators project on the Hub cluster for each new object bucket, containing the base64-encoded access keys.
- The ramen-hub-operator-config ConfigMap on the Hub cluster is modified with s3StoreProfiles entries.
- The OpenShift DR Cluster operator is deployed on each managed cluster in the openshift-dr-system project.
- The object bucket Secrets on the Hub cluster in the openshift-operators project are copied to the managed clusters in the openshift-dr-system project.
- The s3StoreProfiles entries are copied to the managed clusters and used to modify the ramen-dr-cluster-operator-config ConfigMap in the openshift-dr-system project.
To validate that the DRPolicy was created successfully, run this command on the Hub cluster for each DRPolicy resource created, replacing <drpolicy_name> with your unique name.
oc get drpolicy <drpolicy_name> -o jsonpath='{.status.conditions[].reason}{"\n"}'
Succeeded
To validate object bucket access from the Hub cluster to both the Primary managed cluster and the Secondary managed cluster, first get the names of the DRClusters on the Hub cluster.
oc get drclusters
NAME AGE
ocp4bos1 4m42s
ocp4bos2 4m42s
Now test S3 access to each bucket created on each managed cluster using this DRCluster validation command, replacing <drcluster_name> with your unique name.
oc get drcluster <drcluster_name> -o jsonpath='{.status.conditions[2].reason}{"\n"}'
Succeeded
Make sure to run this command for both DRClusters on the Hub cluster.
To validate that the OpenShift DR Cluster operator installation was successful on the Primary managed cluster and the Secondary managed cluster, run the following command:
oc get csv,pod -n openshift-dr-system
NAME DISPLAY VERSION REPLACES PHASE
clusterserviceversion.operators.coreos.com/odr-cluster-operator.v4.11.0 Openshift DR Cluster Operator 4.11.0 Succeeded
NAME READY STATUS RESTARTS AGE
pod/ramen-dr-cluster-operator-5564f9d669-f6lbc 2/2 Running 0 5m32s
You can also go to OperatorHub on each of the managed clusters and verify that the OpenShift DR Cluster Operator is installed.
Validate the status of the ODF mirroring daemon health on the Primary managed cluster and the Secondary managed cluster.
oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage -o jsonpath='{.status.mirroringStatus.summary}{"\n"}'
{"daemon_health":"OK","health":"OK","image_health":"OK","states":{}}
It could take up to 10 minutes for daemon_health and health to go from Warning to OK. If the status does not become OK eventually, use the ACM console to verify that the Submariner connection between the managed clusters is still in a healthy state. Do not proceed until all values are OK.
8. Create Sample Application for DR testing
In order to test failover from the Primary managed cluster to the Secondary managed cluster and back again, we need a simple application. The sample application used for this example will be busybox.
8.1. Creating Sample Application using ACM console
Start by logging in to the ACM console using your OpenShift credentials, if not already logged in.
oc get route multicloud-console -n open-cluster-management -o jsonpath --template="https://{.spec.host}/multicloud/applications{'\n'}"
This will return a route similar to this one.
https://multicloud-console.apps.bos3.example.com/multicloud/applications
After logging in select Create application in the top right and choose Subscription.
Fill out the top of the Create an application form and select repository type Git.
The next section to fill out is below the Git box: the repository URL for the sample application, the GitHub branch, and the path to the resources that will be created (the busybox Pod and PVC).
The sample application repository is github.com/RamenDR/ocm-ramen-samples. The branch is main and the path is busybox-odr.
|
Scroll down in the form until you see Deploy application resources only on clusters matching specified labels, then add a label for the Primary managed cluster name as it appears in the RHACM cluster list view.
After adding the Label to identify the cluster, select Save in the upper right hand corner.
On the follow-on screen go to the Topology tab. You should see all green checkmarks on the application topology.
To get more information, click on any of the topology elements and a window will appear to the right of the topology view.
8.2. Validating Sample Application deployment
Now that the busybox application has been deployed to your Primary managed cluster, the deployment can be validated.
Log on to the managed cluster where busybox was deployed by ACM. This is most likely your Primary managed cluster.
oc get pods,pvc -n busybox-sample
NAME READY STATUS RESTARTS AGE
pod/busybox 1/1 Running 0 6m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/busybox-pvc Bound pvc-a56c138a-a1a9-4465-927f-af02afbbff37 1Gi RWO ocs-storagecluster-ceph-rbd 6m
8.3. Apply DRPolicy to Sample Application
On the Hub cluster go back to the Multicluster Web console and select All Clusters in the top right hand corner.
Make sure to log in to all clusters from the Multicluster Web console. The clusters are listed directly below All Clusters.
Navigate to Data Services and then choose Data policies. You should see the DRPolicy created earlier in the section Create Data Policy on Hub cluster. At the far right of the DRPolicy entry, select the vertical dots menu.
When the Apply DRPolicy box appears, select busybox and then Apply.
Validate that a DRPlacementControl, or DRPC, was created in the busybox-sample namespace on the Hub cluster. This resource is used for both failover and failback actions for this application.
oc get drpc -n busybox-sample
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE
busybox-placement-1-drpc 6m59s ocp4bos1 Deployed
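For reference, the DRPC generated for the sample application typically resembles the following sketch. The drPolicyRef and placementRef names and the pvcSelector label are examples based on this guide's naming, not values to copy verbatim.
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  name: busybox-placement-1-drpc
  namespace: busybox-sample
spec:
  drPolicyRef:
    name: ocp4bos1-ocp4bos2-5m       # example DRPolicy name from the earlier section
  placementRef:
    kind: PlacementRule
    name: busybox-placement-1        # PlacementRule created by the ACM application wizard
  preferredCluster: ocp4bos1         # cluster the application normally runs on
  pvcSelector:
    matchLabels:
      appname: busybox               # example label; must match the labels on the application PVCs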
8.4. Deleting the Sample Application
Deleting the busybox application can be done using the ACM console. Navigate to Applications and then find the application to be deleted (busybox in this case).
The instructions to delete the sample application should not be executed until the failover and failback (relocate) testing is completed and you want to remove this application from RHACM and from the managed clusters.
When Delete application is selected, a new screen will appear asking if the application related resources should also be deleted. Make sure to check the box to delete the Subscription and PlacementRule.
Select Delete in this screen. This will delete the busybox application on the Primary managed cluster (or whatever cluster the application was running on).
In addition to the resources deleted using the ACM console, the DRPlacementControl must also be deleted immediately after deleting the busybox application. Log on to the OpenShift Web console for the Hub cluster. Navigate to Installed Operators for the project busybox-sample. Choose OpenShift DR Hub Operator and then DRPlacementControl. Select Delete DRPlacementControl.
If desired, the DRPlacementControl resource can also be deleted in the application namespace using the CLI.
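For that optional CLI path, a minimal sketch run against the Hub cluster (the DRPC name matches the resource shown earlier):
# Deletes the DRPlacementControl for the sample application
oc delete drpc busybox-placement-1-drpc -n busybox-sample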
This process can be used to delete any application with a DRPlacementControl resource.
9. Application Failover between managed clusters
This section details how to fail over the busybox sample application. The failover method for Regional Disaster Recovery is application based. Each application that is to be protected in this manner must have a corresponding DRPlacementControl resource in the application namespace, as shown in the Apply DRPolicy to Sample Application section.
9.1. Modify DRPlacementControl to failover
Failing over requires modifying the DRPlacementControl in the YAML view. On the Hub cluster navigate to Installed Operators and then to OpenShift DR Hub Operator. Select DRPlacementControl.
Make sure to be in the busybox-sample namespace.
Select busybox-placement-1-drpc and then the YAML view. Add the action and failoverCluster fields as shown in the sketch below. The failoverCluster should be the ACM cluster name for the Secondary managed cluster.
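A sketch of the relevant spec fields after the edit (ocp4bos2 is the example Secondary managed cluster name used in this guide):
spec:
  # existing fields such as drPolicyRef, placementRef, preferredCluster and
  # pvcSelector remain unchanged; add the two fields below
  action: Failover
  failoverCluster: ocp4bos2    # ACM cluster name of the Secondary managed cluster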
Select Save.
On the failoverCluster specified in the YAML file (for example, ocp4bos2), check that the busybox application is now running on the Secondary managed cluster using the following command:
oc get pods,pvc -n busybox-sample
NAME READY STATUS RESTARTS AGE
pod/busybox 1/1 Running 0 35s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/busybox-pvc Bound pvc-79f2a74d-6e2c-48fb-9ed9-666b74cfa1bb 5Gi RWO ocs-storagecluster-ceph-rbd 35s
Next, using the same command, check if busybox is running on the Primary managed cluster. The busybox application should no longer be running on this managed cluster.
oc get pods,pvc -n busybox-sample
No resources found in busybox-sample namespace.
10. Application Failback between managed clusters
A failback operation is very similar to a failover. The failback is application based and again uses the DRPlacementControl action value to trigger it. In this case the action is Relocate, back to the preferredCluster.
10.1. Modify DRPlacementControl to failback
Failing back requires modifying the DRPlacementControl in the YAML view. On the Hub cluster navigate to Installed Operators and then to OpenShift DR Hub Operator. Select DRPlacementControl.
Make sure to be in the busybox-sample namespace.
Select busybox-placement-1-drpc and then the YAML view. Modify the action to Relocate, as shown in the sketch below.
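A sketch of the spec after the edit (only the action changes from the failover step):
spec:
  # failoverCluster, preferredCluster and the other existing fields stay as they are
  action: Relocate             # relocates the application back to the preferredCluster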
Select Save.
Check that the busybox application is now running on the Primary managed cluster using the following command. The failback is to the preferredCluster, which should be where the application was running before the failover operation.
oc get pods,pvc -n busybox-sample
NAME READY STATUS RESTARTS AGE
pod/busybox 1/1 Running 0 60s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/busybox-pvc Bound pvc-79f2a74d-6e2c-48fb-9ed9-666b74cfa1bb 5Gi RWO ocs-storagecluster-ceph-rbd 61s
Next, using the same command, check if busybox is running on the Secondary managed cluster. The busybox application should no longer be running on this managed cluster.
oc get pods,pvc -n busybox-sample
No resources found in busybox-sample namespace.