Migrating instances between compute nodes

Modified: 03 Jan 2024 23:26 UTC

You can migrate instances between compute nodes (CNs) within the same data center using both internal APIs (for operators) and CloudAPI (for end users).

This document provides an overview of the instance migration feature and the tools provided for operators. For end user documentation, see CloudAPI docs for VM migration. You can learn more about instance migration by visiting the links in the Additional resources section of this page.

Note: Instance migration is not yet supported in the Triton portal for end users, but is supported via CLI with node-triton.
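
For reference, a user-driven migration from the node-triton CLI looks roughly like the following. This is a sketch only: it assumes a node-triton release that ships the instance migration subcommands (the exact subcommand names may vary between versions) and a data center where user migrations have been enabled.

# Run all three phases in one go, then follow progress (the instance name is a placeholder).
triton instance migration automatic my-instance
triton instance migration watch my-instance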

Prerequisites

Note: If your Triton installation does not currently meet the prerequisites, please visit Migrating with migrator.sh (deprecated).

To check the versions you are currently running from the head node:
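
For example, one quick way to list the versions of the relevant core services from the head node is with sdcadm (the service names below are illustrative; the feature prerequisites give the exact components and minimum versions):

# List core service instances and their versions, filtered to migration-related services.
sdcadm instances | grep -E 'vmapi|cnapi|workflow'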

You can check the version of node-triton you are currently using from your CLI with triton --version.

Instance migration overview

Instance migrations operate in three distinct phases:

  1. Begin: This phase starts the migration process, creates a new migration database entry, and provisions the target instance. It will use the existing Triton provisioning and allocation APIs to ensure proper placement of the migrated instance, but you can specify a target server UUID to override that behavior.
  2. Sync: This phase synchronizes the ZFS datasets of the source instance with the ZFS datasets in the target instance (without stopping the instance). It can be executed multiple times, with each subsequent sync operation only transferring the file system differences since the last sync.
  3. Switch: This phase stops the source instance from running, synchronizes the ZFS datasets of the source instance with the target instance, moves the NICs from the source to the target instance, moves control to the target instance, and then starts the target instance.

A migration can run all three phases back to back (an automatic migration) or one phase at a time (an on-demand migration).

For any migration operation (e.g., begin, sync, switch, or abort), you can use the migration watch endpoint to show progress information.
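
From the head node, the watch endpoint is exposed through sdc-migrate; assuming your build includes the watch subcommand, progress for an in-flight migration can be followed with:

# Stream progress events for an ongoing migration of this instance.
sdc-migrate watch VM_UUID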

After successfully completing the switch phase (whether automatically or on-demand) and confirming the integrity of the migrated instance, you can then use the migration finalize action to clean up, which removes the original (now hidden) source instance.

Alternatively, if the migrated instance is not working correctly, you can perform a migration rollback action, which will reinstate the original source instance and delete the target instance, reverting to the state of the instance before the migration switch operation.

End user migrations are disabled by default, but can be enabled globally in SAPI, or per-instance in VMAPI. For specific details, see Allowing user migrations.
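
As a rough sketch, the global switch is a flag in the VMAPI service metadata in SAPI. The metadata key shown below is an assumption; use the key documented in Allowing user migrations:

# Set the flag on the VMAPI service in SAPI (the key name is an assumption; verify it first).
vmapi_svc=$(sdc-sapi /services?name=vmapi | json -Ha uuid)
sapiadm update "$vmapi_svc" metadata.user_migration_allowed=true

Per-instance enablement is done against the instance record in VMAPI; again, see Allowing user migrations for the exact field.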

Performing an instance migration

There are two types of migration that you can perform as an operator: automatic and on-demand. We will review both in this section.

Automatic migration

To begin an automatic instance migration, use the migrate subcommand of sdc-migrate. Optionally, specify a target compute node (CN) with -n CN:

migrate [-n CN] VM_UUID      - full automatic migration for this instance

For example:

[root@headnode (triton0) ~]# sdc-migrate migrate d167bf4f-a98d-c278-b121-e5b28b6eb8ad
# Migration begin running in job 32afebae-de1e-4f25-8002-99286062f91b
 - reserving instance
 - syncing data
  - running: 100%  5.6MB/s
 - syncing data
  - running: 100%  1.1kB/s  (ETA 4s)
 - stopping the instance
 - syncing data
  - running: 100%  211.3kB/s
 - switching instances
 - filesystem sync finished, switching instances
 - reserving the IP addresses for the instance
 - setting up the target filesystem
 - hiding the original instance
 - promoting the migrated instance
 - removing sync snapshots
 - starting the migrated instance
OK - switch was successful

At this point, the instance is migrated to the new compute node (CN). If everything is working properly, proceed to finalize the migration. If not, use the rollback action to revert to the source instance.
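
To pin the migration to a specific target compute node instead of letting the allocator choose, pass -n with the target server UUID (the values below are placeholders):

# Fully automatic migration to an explicitly chosen compute node.
sdc-migrate migrate -n TARGET_CN_UUID VM_UUID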

On-demand migration

On-demand migrations are useful when the instance must be switched over to the new compute node (CN) at a specific time. For example, suppose the instance can only afford downtime between 02:00 and 03:00 UTC, its lowest-traffic window, and you estimate the migration will take 6 hours. You start the begin and sync phases ahead of time, finish syncing the datasets within 5 hours, run another sync at the start of the downtime window, and then switch the migration over to the new compute node (CN).

Another use case for on-demand migration is if you need to ensure that each member of a cluster is switched over one at a time. You can begin and sync them at the same time, but then switch each one individually after confirming it has come back online and rejoined the cluster on the new compute node (CN).

To run an on-demand migration, use the following sdc-migrate subcommands; note that the begin subcommand accepts -n CN to specify a target compute node:

begin [-n CN] VM_UUID        - begin a migration for this instance
sync VM_UUID                 - sync the filesystems of this migration
switch VM_UUID               - switch control over to the migrated instance

For example:

[root@headnode (triton0) ~]# sdc-migrate begin d167bf4f-a98d-c278-b121-e5b28b6eb8ad
# Migration begin running in job f144346d-08fe-4485-ad56-73cc0c9d5790
 - reserving instance
OK - ready for migration sync
[root@headnode (triton0) ~]# sdc-migrate sync d167bf4f-a98d-c278-b121-e5b28b6eb8ad
# Migration sync running in job d475e15c-dab5-4cf3-a6c6-c87d5de6b3d0
 - syncing data
  - running: 98%  9.5MB/s
OK - ready for migration sync or migration switch
[root@headnode (triton0) ~]# sdc-migrate switch d167bf4f-a98d-c278-b121-e5b28b6eb8ad
# Migration switch running in job 3fa0887b-c302-45b5-a654-9778ae8a85c1
 - stopping the instance
 - syncing data
  - running: 100%  276.4kB/s
 - filesystem sync finished, switching instances
 - switching instances
 - setting up the target filesystem
 - hiding the original instance
 - promoting the migrated instance
 - removing sync snapshots
 - starting the migrated instance
OK - switch was successful

At this point, the instance is migrated to the new compute node (CN). If everything is working properly, proceed to finalize the migration. If not, use the rollback action to revert to the source instance.
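
For the rolling cluster scenario described at the start of this section, the same subcommands can be scripted. A minimal sketch, assuming a shell variable that holds the cluster's instance UUIDs:

# Begin and sync every member ahead of the maintenance window.
for vm in $CLUSTER_VM_UUIDS; do
    sdc-migrate begin "$vm"
    sdc-migrate sync "$vm"
done

# Switch members one at a time, verifying each has rejoined the cluster before moving on.
for vm in $CLUSTER_VM_UUIDS; do
    sdc-migrate sync "$vm"       # final catch-up sync
    sdc-migrate switch "$vm"
    # ...confirm the member is healthy on the new CN before the next switch...
done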

Roll back the migration (as needed)

If for any reason you need to roll back the migration, use the rollback action to remove the newly migrated target instance and restore the original source instance as the primary (and visible) instance. Any file system changes made to the target instance since it became primary (i.e., since the migration switch) will be lost. After a successful rollback, the migration record is removed.

For example:

[root@headnode (triton0) ~]# sdc-migrate rollback d167bf4f-a98d-c278-b121-e5b28b6eb8ad
# Migration rollback running in job e4369be3-7a84-44ca-a1ef-c59af657e2a7
 - stopping the instance
OK - switch was successful

Finalize the migration

Whether the instance was migrated automatically or on-demand, you will need to run the finalize action to clean up and remove the original source instance.

For example:

[root@headnode (triton0) ~]# sdc-migrate finalize d167bf4f-a98d-c278-b121-e5b28b6eb8ad
Done - the migration is finished.

Additional resources