
HA and DR notes

In-depth technical notes on HA and DR procedures

DR Notes

First, understand what a DR is and when one is performed by reading the DR intro article.
Then understand how to run DRs; there are different methods depending on whether you are running the DR on a new controller (which is set to replace another controller that might have died) or on an existing controller.

The key to a successful DR is making sure that the CCs are generating
masters properly and regularly.
If the original CC is still available, you can look at the files
- /mnt/tmp/validate_log
- /mnt/tmp/mastervalidateresults.txt
to see if the master creation had any issues.

We are most likely doing a DR because we could not import the
filesystem or because the CC needed to be RMA'ed. To figure out when
the last master was generated, go to any of the other members of the
cloudfs and run
zfs list -o name,pz_master_snap_ts zroot/fsname
This will give you a time value in seconds. You have to convert that
into human-readable time by running
date -r <seconds>
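A minimal worked example, using a hypothetical fsname (cloudfs01) and timestamp value:
zfs list -o name,pz_master_snap_ts zroot/cloudfs01
date -r 1614412800
Here date -r is the BSD form that accepts seconds since the epoch; for that example value it prints something like Sat Feb 27 08:00:00 UTC 2021.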
 
pz_fsnames is a tool with which you can get a lot of information about
the CCs in the cloudfs.

pz_fsnames -c — Gives the current ccid of the CC you are on.
pz_fsnames -C — Gives the current ccid of all the CCs in the cloudfs.

The DR process is different depending on whether you are doing it for a master or a subordinate.
It also depends largely on where we are doing the DR and why.
• If the DR is done because of software issues, the DR for a master and
a subordinate is the same. This is because all the configuration,
encryption certs and ssh-keys are already present on the CC.
• If the DR is done for a hardware failure, we have to make sure we
get all of these certs, keys and licenses onto the CC properly.
• For a subordinate DR, we have to import custom ssh-keys. It
will get the encryption cert and configuration from the
master. Detailed KB article later.
• For a master DR, we have to make one of the other CCs a master
temporarily and this CC a subordinate until the DR is
complete. Detailed KB article later.
 
The time it takes to do a DR is tough to predict. It depends on the
following things:
• CPU and I/O capabilities of the CC (we shipped some badly
performing disks before).
• Size of the filesystem.
• Number of user snapshots.
• How recently the master snapshot was taken. This is especially
true for NAS use cases, where a lot of modifications are made
to the data. For Archive or Backup use cases, it does not matter as much.
 
While the DR is progressing, one thing we can check is the filesystem
size by running
zfs list
The journal file meta-drive-journal-ccid-fsid gives you the details about the master
snapshots and system snapshots.
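For example, a simple way to keep an eye on the size while the DR runs (the dataset path is an assumption, following the zroot/fsname convention above):
while true; do zfs list -o name,used,refer zroot/cloudfs01; sleep 60; done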

To download a file from the cloud, use the tool “pz_cloud_tool”; there
is help for that command.

To download the journal file, we run
pz_cloud_tool GET meta-drive-journal-ccid-fsid
The file, with the extension .rd, will be placed in /cache.
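Once the download completes, a plain directory listing is enough to confirm the journal landed in /cache:
ls -lh /cache/*.rd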

We always apply the first master before the first system snapshot.

To get an idea of how big the master is, you can download the
master file using the above command. The master snapshot name is in the
format “master-meta-ccid-fsid-snapnumber”.
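For example, with placeholder values substituted into that format (the actual ccid, fsid and snapnumber must be looked up; these are illustrations only), then checking its size in /cache:
pz_cloud_tool GET master-meta-<ccid>-<fsid>-<snapnumber>
ls -lh /cache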

To look into a meta file, you can use the tool
pz_meta_file_debug
The -h option lists all of its options. By running
pz_meta_file_debug -d Drive master-meta…
we can see all the sub drives we have to apply to complete the
process.

To trigger a DR, the webUI runs the command
pz_recover_fs -n <fsname> -c <ccid> -f 1 -l <cloud-lic-name>
Note: this article's last section explains how to find the fsname, ccid & cloud-lic-name.
Note: -f 1 is used because to date we only support 1 "cluster" per cloudfs (in the future we might support more "clusters" per cloudfs).

This creates a configuration file /mnt/recovery_cfg and reboots the CC
into recovery mode.
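You can confirm the trigger took effect by checking that the configuration file was written:
ls -l /mnt/recovery_cfg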

Some of the files that are used to track the progress of recovery:
• /mnt/recovery_cfg
• /mnt/recovery_log — A progress log, displayed in the webUI.
• /mnt/recovery_state — We preserve the state, in case we have to
restart.
• /mnt/recovery_state.done — This indicates that the recovery
process is complete.
• /mnt/tmp/recovery_mode — As long as this file is present, the CC will
not go out of recovery mode.
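To watch the recovery from the shell, the progress log and completion marker listed above can be checked directly:
tail -f /mnt/recovery_log
ls -l /mnt/recovery_state.done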
If the recovery process dies, to restart it again you have to run
pz_recover_fs -r
Note: it's safest to run this with nohup (make sure you run "bash" first so you're in bash and not in "sh", since the &> redirection below is a bash feature):
nohup pz_recover_fs -r &> /mnt/pzrecovery.out &

DR Steps

Executing the DR reboots the CC into recovery mode.
In recovery mode:
• Create the root.
• Create the filesystem.
• If subordinate, receive the configuration and encryption cert from
the master. If master, we read the configuration from the cloud
and apply it.
• We download the journal file and start applying the master. At
this point the status says downloading the master, and until that
is complete there will be no updates in the UI. The way to check
the status here is to look at the “zfs list” output or check in
/cache to see if we are downloading sub drives.
• After the master, we start downloading the user snapshots.
• We start downloading the incrementals.
• Once we exhaust all the incrementals, we declare the local filesystem
DR complete and prompt the user to reboot.
• Here we display the last time we think this filesystem wrote data
to the cloud. If this is too old, the size looks small, or something
is not right, we should not reboot the CC; you have to investigate
further by looking at the other CCs in the cloudfs, or in the
cloud itself.
• After the reboot, the CC will start receiving the snapshots of
the other members of the CloudFS.

HA Notes

In 7.0, setting up the HA nodes is driven through setup wizards.
On an HA node (Global or Local), the process of receiving the snapshots
is very similar to what we do for a new CC added to the cloudfs.
A couple of relevant files are stored for HA nodes on the CC and in
the cloud:
• clone-<ccid>-1 — This is a file in the cloud, indicating the
state of that ccid. The state can be active/standby. This file has
two sections, one for active and one for standby. When parsing
the file, we are only interested in the state of the ccid in question; the
other section is ignored for now. You will occasionally see a 0
for the other CC, which is valid and of no concern.
• /opt/pixel8/data/switchover.txt — When a takeover is initiated, logs
are written to this file.
• /opt/pixel8/data/switchover_state.txt — Will tell you when the
takeover was complete.
• /opt/pixel8/data/switchover_state.bin — Preserves the state so the
process can be restarted if we reboot in the middle.
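During or after a takeover, these files can be inspected directly from the shell, for example:
tail -f /opt/pixel8/data/switchover.txt
cat /opt/pixel8/data/switchover_state.txt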

HA-Local takeover

Takeover should be initiated from the webUI; we have some checks in
the UI to make sure that we can initiate a takeover only when we are
ready.
To run the takeover from the shell:
pz_cc_takeover
Once we execute this:
• We check to make sure that the current Active is down.
• We update the clone files to indicate that the current Active should
become Standby if it reboots.
• The snapshot sync process will stop syncing the other CCs in the
cloudfs and will start to complete the sync of the Active CC
that we are trying to receive.
• We will wait until we receive all the snapshots for the Active
CC.
• We update the clone files to indicate that the current Standby
CC will become the Active.
• We download the configuration from the cloud for the data locality
rules.
• If we notice the old Active came up, we issue a reboot; this is
to avoid a split-brain case.
• At this point we will proceed to make the Standby an active
filesystem.

When the old Active is rebooted, it will come up and change its state
to Standby. If the Active went down with some dirty cache, then after
it changes its status it will not sync the new snapshots, but will
give us an opportunity to retrieve the data (this process will become
a little different in the future). In the logs you will see
“Dirty Cache not uploaded to the cloud when CC was active, To retrieve
the data, please contact support”
By touching the file “/tmp/rollback_standby_fs”, the CC will become a
full standby.
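In other words, once the dirty data has been dealt with, the rollback to full standby is just:
touch /tmp/rollback_standby_fs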

To retrieve the data, we have to go to the shell and run pfind or an
equivalent to get a list of all the files and copy them to the current
master.
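A minimal sketch of gathering that list, assuming the filesystem is mounted under /cloudfs/<fsname> (the mount point is an assumption; pfind or the equivalent tool may be preferable):
find /cloudfs/<fsname> -type f > /tmp/dirty_files.txt
# then copy the listed files to the current master, e.g. with rsync or scp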

In later releases, we will copy the data to the lost+found directory
and proceed without user input.

HA Local Share IP

We have to make sure that both CCs are on the same LAN.
If we add a Virtual Hostname to a master on an existing setup:
• We have to make sure the master configuration is changed on all
the CCs to reflect the virtual hostname.
• You have to rejoin the AD.

The user has to create a DNS entry for the Virtual Hostname and Virtual IP.
The Virtual Hostname and Virtual IP will move to the CC that is active.
You can reach the webUI or ssh into the active CC using the Virtual Hostname or Virtual IP.
This feature will not work on AMI/Azure for now.
Kostia: I properly tabbed and newlined the DR section from the DR & HA Notes.pdf that Vinay provided. However, that PDF was missing the HA section (even though it was called DR & HA Notes), so I tabbed and newlined the HA sections using my own logical intuition. If you have a copy of DR & HA Notes.pdf that includes the HA section, please edit the HA sections accordingly, adding tabs and newlines to match this article with the PDF.