Triton maintenance and upgrades

Modified: 12 Mar 2024 22:09 UTC

Triton DataCenter requires periodic upgrades as MNX adds new features, improves performance, and corrects any issues found. To simplify this process, the sdcadm management tool was created. This tool provides the operator with one interface that manages all aspects of Triton upgrades.

If at any point you come across an error, please contact Support.

Software update procedure

Each new release of Triton software will include release notes that describe the steps to be performed for applying updates to existing installations. These steps may vary depending on the currently installed versions. Customers should always adhere to the steps defined in the release notes and should not assume that the steps are the same for every release.

Note: Customers should not apply updates without the express instruction of MNX support.

Understanding and setting the channel

sdcadm uses channels to determine which images are available to install. Unless otherwise informed by support, customers should always have their channel set to "support".

Check the current channel:

[root@headnode (swdemo01) ~]# sdcadm channel list
experimental  -        feature-branch builds (warning: 'latest' isn't meaningful)
dev           -        main development branch builds
staging       -        staging for release branch builds before a full release
release       -        release bits
support       true     MNX-supported release bits

Set the default channel with sdcadm channel set:

[root@headnode (swdemo01) ~]# sdcadm channel set support
Update channel has been successfully set to: 'support'
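
When scripting a pre-upgrade check, the default channel can be read from the channel listing. The following is a minimal sketch, assuming the three-column layout shown in the sample output above (name, default flag, description). The function name default_channel is illustrative and is not part of sdcadm.

```shell
#!/bin/sh
# Print the name of the channel whose second column reads "true",
# given `sdcadm channel list` output on stdin.
# Assumes the three-column layout shown in the sample output above.
default_channel() {
    awk '$2 == "true" { print $1 }'
}

# Illustrative usage on a headnode (not run here):
#   [ "$(sdcadm channel list | default_channel)" = "support" ] ||
#       echo "warning: default channel is not 'support'" >&2
```

Reading from stdin keeps the helper independent of sdcadm itself, so it can be rehearsed against saved output before being used on a headnode.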

Always self-update sdcadm before proceeding with updates

Always run sdcadm self-update --latest before performing any sdcadm upgrade operation, because critical bug fixes may have been published since sdcadm itself was last updated.

List available upgrades

List the upgrades available:

[root@headnode (swdemo01) ~]# sdcadm avail
SERVICE    IMAGE                                 VERSION
adminui    1e13fdde-8e11-11e7-bf1b-87366b54a459  adminui@release-20170831-20170831T054615Z-g321b088
amon       73f7324e-8e0b-11e7-b5cb-3394d548dc3b  amon@release-20170831-20170831T050825Z-gec495e5
amonredis  8b9de19e-8e11-11e7-98b0-bf74e1e8594d  amonredis@release-20170831-20170831T055639Z-gc27efa7
assets     55eb262c-8e0e-11e7-93b7-ab60606c5cd3  assets@release-20170831-20170831T053400Z-gb8b8887
binder     82efbc0e-8e1e-11e7-91b5-3b22281af224  manta-nameservice@release-20170831-20170831T072531Z-g9f7cbb4
cloudapi   7792772e-8e65-11e7-b67c-6b3461bac4b8  cloudapi@release-20170831-20170831T155112Z-g85b34ed
cmon       fca551ac-8e15-11e7-9d36-dfeafff49b13  cmon@release-20170831-20170831T062606Z-gc544560
cnapi      a4c7ff2e-8e1b-11e7-b857-9b41504fedeb  cnapi@release-20170831-20170831T070544Z-gb96c268
cns        668cc624-8e0f-11e7-8aad-c794e7b574b8  cns@release-20170831-20170831T053858Z-g6a34a5f
dhcpd      1eb82f30-8e11-11e7-a9a7-3becc1aa97ef  dhcpd@release-20170831-20170831T055132Z-g0e513df
docker     a54e9126-8e14-11e7-bbdd-37fad5806e08  docker@release-20170831-20170831T060953Z-g171148a
fwapi      0161d502-8e1c-11e7-8b3e-57ee2ff4b98d  fwapi@release-20170831-20170831T070850Z-g9e60015
imgapi     3fb3f5ea-8e19-11e7-a90c-5b0d509d5568  imgapi@release-20170831-20170831T064456Z-g76dda21
mahi       2dcab4f2-8e20-11e7-88f5-c39fd995f464  manta-authcache@release-20170831-20170831T073911Z-g10c97a0
manatee    b00e657c-8e1f-11e7-abc4-bb6d695d897c  sdc-postgres@release-20170831-20170831T073540Z-gc587e86
manta      5c541768-8e26-11e7-a791-1f37a259f408  manta-deployment@release-20170831-20170831T082236Z-g69cc703
moray      9eef0e62-8e20-11e7-a9ca-03741a985146  manta-moray@release-20170831-20170831T074212Z-g4808aed
napi       65fb0b64-8e1c-11e7-94b5-57f64090d905  napi@release-20170831-20170831T071300Z-g487e972
papi       c386cfe6-8e19-11e7-81d9-eb1cdaadaec7  papi@release-20170831-20170831T064746Z-g1aed1e2
portolan   a2daf3f8-8e14-11e7-aaa8-ef007ee08c53  portolan@release-20170831-20170831T061700Z-gcb0aad5
rabbitmq   e3861afc-8e11-11e7-8020-6b3df2585aae  rabbitmq@release-20170831-20170831T055802Z-gb1ad38d
sapi       e05401f4-8e1c-11e7-88f6-5fd7d2e426e0  sapi@release-20170831-20170831T071527Z-g645d525
sdc        d4f56f36-8e18-11e7-b4de-13627579108c  sdc@release-20170831-20170831T064139Z-g903d7e3
sdcadm     add203e2-8e09-11e7-bc1e-1fca7a15a9a5  sdcadm@1.16.0
ufds       24c426c8-8e15-11e7-aba9-73ed6fe3e502  ufds@release-20170831-20170831T061926Z-g3ff5a93
vmapi      7e5a7578-8e17-11e7-acbc-13723aa4a211  vmapi@release-20170831-20170831T063247Z-g280af75
workflow   a3e309ce-8e15-11e7-ab0e-6f07f1b734be  workflow@release-20170831-20170831T062225Z-g3c4a57d
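
For a quick summary of a long listing like the one above, the service names can be pulled out of the first column. This is a small sketch that assumes the SERVICE / IMAGE / VERSION layout shown in the sample; services_with_updates is a hypothetical helper name.

```shell
#!/bin/sh
# Print the service names, one per line, from `sdcadm avail` output on
# stdin. Skips the header row and any line with fewer than three columns.
# Assumes the SERVICE / IMAGE / VERSION layout shown in the sample above.
services_with_updates() {
    awk '$1 != "SERVICE" && NF >= 3 { print $1 }'
}

# Illustrative usage on a headnode (not run here):
#   sdcadm avail | services_with_updates | wc -l
```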

Proceed to make the necessary upgrades.

Upgrade GZ tools

Update GZ tools to the latest version:

[root@headnode (swdemo01) ~]# sdcadm experimental update-gz-tools --latest
Using channel support
UUID of latest installed gz-tools image is:

Downloading gz-tools
    image: 19552d2c-6b37-463f-a540-13da012b1a7a (3.0.0)
    to: /var/tmp/gz-tools-19552d2c-6b37-463f-a540-13da012b1a7a.tgz
Decompressing gz-tools tarball
Validating gz-tools tarball files
Updating "sdc" zone tools
Updating global zone scripts
Mounting USB key
Unmounting USB key
Finding servers to update
Starting cn_tools update on 2 servers
Update compute node tools                    [=================================================================================================>] 100%        2
Update USB key contents                      [=================================================================================================>] 100%        2
Cleaning up gz-tools tarball
Updated gz-tools successfully (elapsed 41s).

Upgrade agents tool

Update the agents tool:

[root@headnode (swdemo01) ~]# sdcadm experimental update-agents --latest --all
Finding latest "agentsshar" on updates server (channel "support")
Latest is agentsshar 10db3065-51de-4eea-b348-889c92ac15d6 (1.0.0-release-20170817-20170817T152940Z-g707200f)
Finding servers to update

This update will make the following changes:
    Ensure core agent SAPI services exist
    Download agentsshar 10db3065-51de-4eea-b348-889c92ac15d6
    Update GZ agents on 2 (of 2) servers using
        agentsshar 1.0.0-release-20170817-20170817T152940Z-g707200f

Would you like to continue? [y/N] y

Downloading agentsshar from updates server (channel "support")
    to /var/tmp/
Copy agentsshar to assets dir: /usbkey/extra/agents
Create /usbkey/extra/agents/latest symlink
Starting agentsshar update on 2 servers
Updating node.config                         [=================================================================================================>] 100%        2
Downloading agentsshar                       [=================================================================================================>] 100%        2
Installing agentsshar                        [=================================================================================================>] 100%        2
Deleting temporary /var/tmp/
Reloading sysinfo on updated servers
Sysinfo reloaded for all the running servers
Refreshing config-agent on all the updated servers
Config-agent refreshed on updated servers
Successfully updated agents (3m29s)

Note: If not all compute nodes are available, the --all flag cannot be used. In this case, you can manually provide a list of servers to upgrade.

Upgrade the core components

Running sdcadm update --all updates all of the core components, including CloudAPI, AdminUI, CNS, Manta, and more.

[root@headnode (swdemo01) ~]# sdcadm update --all
Finding candidate update images for 25 services (workflow, amonredis, docker, manatee,
amon, dhcpd, binder, manta, vmapi, ufds, sdc, papi, mahi, redis, moray, napi,
sapi, fwapi, adminui, imgapi, cnapi, cloudapi, cmon, cns, assets).
Using channel support

This update will make the following changes:
    download 8 images (865 MiB):
        image 85ce363a-7870-11e7-9332-67c43f746235
        image 73f7324e-8e0b-11e7-b5cb-3394d548dc3b
        image 4c5d24e2-7870-11e7-8f69-63cf84404f9c
        image 9cf650a6-7873-11e7-8b15-ab7b89e7d7a4
        image 5cb7face-8360-11e7-979d-5f22a90fec5a
        image b57eb610-7871-11e7-90a7-ff99e9c4cae8
        image 9dd2da36-6d00-11e7-981d-eb8d4624ef1b
        image bd9188a2-8369-11e7-b625-b77991ba0370
    update "docker" service to image 85ce363a-7870-11e7-9332-67c43f746235
        instance "ead82fbd-c7b7-4d07-9e56-cc2023b180e6" (docker0) on server 44454c4c-3900-1046-8059-c6c04f395231
    update "amon" service to image 73f7324e-8e0b-11e7-b5cb-3394d548dc3b
        instance "64123282-9a15-4e97-8db9-e20308be1555" (amon0) on server 44454c4c-3900-1046-8059-c6c04f395231
    update "cnapi" service to image 5cb7face-8360-11e7-979d-5f22a90fec5a
        instance "c8bd17af-6536-465f-8b8e-e786d9594327" (cnapi0) on server 44454c4c-3900-1046-8059-c6c04f395231
    update "cloudapi" service to image b57eb610-7871-11e7-90a7-ff99e9c4cae8
        instance "4185c80e-3e97-4f23-a202-55566394c897" (cloudapi0) on server 44454c4c-3900-1046-8059-c6c04f395231
    update "cmon" service to image 9dd2da36-6d00-11e7-981d-eb8d4624ef1b
        instance "ff327d74-1def-47cd-818f-b4c021ba8537" (cmon0) on server 44454c4c-3900-1046-8059-c6c04f395231
    update "cns" service to image bd9188a2-8369-11e7-b625-b77991ba0370
        instance "0f6d8ba8-d175-4d9f-ad11-7a5d862f7c70" (cns0) on server 44454c4c-3900-1046-8059-c6c04f395231
    update "imgapi" service to image 9cf650a6-7873-11e7-8b15-ab7b89e7d7a4
    update "sapi" service to image 4c5d24e2-7870-11e7-8f69-63cf84404f9c

Would you like to continue? [y/N] y

Create work dir: /var/sdcadm/updates/20170907T160105Z
Downloading image 85ce363a-7870-11e7-9332-67c43f746235
Downloading image 73f7324e-8e0b-11e7-b5cb-3394d548dc3b
Downloading image 4c5d24e2-7870-11e7-8f69-63cf84404f9c
Downloading image 9cf650a6-7873-11e7-8b15-ab7b89e7d7a4
Imported image 73f7324e-8e0b-11e7-b5cb-3394d548dc3b
Downloading image 5cb7face-8360-11e7-979d-5f22a90fec5a
Imported image 4c5d24e2-7870-11e7-8f69-63cf84404f9c
Downloading image b57eb610-7871-11e7-90a7-ff99e9c4cae8
Imported image 5cb7face-8360-11e7-979d-5f22a90fec5a
Downloading image 9dd2da36-6d00-11e7-981d-eb8d4624ef1b
Imported image 85ce363a-7870-11e7-9332-67c43f746235
Downloading image bd9188a2-8369-11e7-b625-b77991ba0370
Imported image 9cf650a6-7873-11e7-8b15-ab7b89e7d7a4
Imported image bd9188a2-8369-11e7-b625-b77991ba0370
Imported image b57eb610-7871-11e7-90a7-ff99e9c4cae8
Imported image 9dd2da36-6d00-11e7-981d-eb8d4624ef1b

--- Updating docker ...
"docker" VM already has a delegate dataset
Updating image for SAPI service "docker"
    service uuid: fe61f110-ae54-41b3-8f66-d23177354a6c
Installing image 85ce363a-7870-11e7-9332-67c43f746235
Reprovisioning VM ead82fbd-c7b7-4d07-9e56-cc2023b180e6 on server 44454c4c-3900-1046-8059-c6c04f395231
Waiting for docker instance ead82fbd-c7b7-4d07-9e56-cc2023b180e6 to come up

--- Updating amon ...
Updating image for SAPI service "amon"
    service uuid: 702f546c-776d-4f4d-a723-176974a2a260
Installing image 73f7324e-8e0b-11e7-b5cb-3394d548dc3b
Reprovisioning VM 64123282-9a15-4e97-8db9-e20308be1555 on server 44454c4c-3900-1046-8059-c6c04f395231
Waiting for amon instance 64123282-9a15-4e97-8db9-e20308be1555 to come up

--- Updating cnapi ...
Updating image for SAPI service "cnapi"
    service uuid: ba65f585-6f28-4fc8-8bd3-d84b0971eec6
Installing image 5cb7face-8360-11e7-979d-5f22a90fec5a
Reprovisioning VM c8bd17af-6536-465f-8b8e-e786d9594327 on server 44454c4c-3900-1046-8059-c6c04f395231
Waiting for cnapi instance c8bd17af-6536-465f-8b8e-e786d9594327 to come up

--- Updating cloudapi ...
"cloudapi" VM already has a delegate dataset
Updating image for SAPI service "cloudapi"
    service uuid: 00204e6d-5e37-4f19-bd45-b9e5567411f2
Installing image b57eb610-7871-11e7-90a7-ff99e9c4cae8
Reprovisioning VM 4185c80e-3e97-4f23-a202-55566394c897 on server 44454c4c-3900-1046-8059-c6c04f395231
Waiting for cloudapi instance 4185c80e-3e97-4f23-a202-55566394c897 to come up

--- Updating cmon ...
"cmon" VM already has a delegate dataset
Updating image for SAPI service "cmon"
    service uuid: 0193ab5e-ec26-470d-8f89-96daaaa19afe
Installing image 9dd2da36-6d00-11e7-981d-eb8d4624ef1b
Reprovisioning VM ff327d74-1def-47cd-818f-b4c021ba8537 on server 44454c4c-3900-1046-8059-c6c04f395231
Waiting for cmon instance ff327d74-1def-47cd-818f-b4c021ba8537 to come up

--- Updating cns ...
"cns" VM already has a delegate dataset
Updating image for SAPI service "cns"
    service uuid: 829cb35b-b155-4203-be3f-169368a52a4a
Installing image bd9188a2-8369-11e7-b625-b77991ba0370
Reprovisioning VM 0f6d8ba8-d175-4d9f-ad11-7a5d862f7c70 on server 44454c4c-3900-1046-8059-c6c04f395231
Waiting for cns instance 0f6d8ba8-d175-4d9f-ad11-7a5d862f7c70 to come up

--- Updating imgapi ...
Updating image for SAPI service "imgapi"
    service uuid: 1a218453-cd30-48d9-9a9c-399efb6b78c9
Installing image 9cf650a6-7873-11e7-8b15-ab7b89e7d7a4
Reprovisioning imgapi VM eb7d6185-60b6-4f01-9408-df798ce50dfe
Waiting for imgapi instance eb7d6185-60b6-4f01-9408-df798ce50dfe to come up
Disabling imgapi service
Running IMGAPI migration-008-new-storage-layout.js
Running IMGAPI migration-009-backfill-archive.js
Running IMGAPI migration-010-backfill-billing_tags.js
Running IMGAPI migration-011-backfill-published_at.js
Running IMGAPI migration-012-update-docker-image-uuids.js (if exists)
Enabling imgapi service

--- Updating sapi ...
Get SAPI current mode
Updating image for SAPI service "sapi"
    service uuid: 182762f8-03c1-4602-aeb5-5dcaa15cd50b
Installing image 4c5d24e2-7870-11e7-8f69-63cf84404f9c
Updating 'sapi-url' in SAPI
Updating 'sapi-url' in VM b01e12bd-01f9-459a-9ff6-364f0d63b935
Verifying if we are on an HA setup
Provisioning Temporary sapi VM sapi0tmp
Waiting for sapi instance b01e12bd-01f9-459a-9ff6-364f0d63b935 to come up
Running vmadm lookup to get tmp instance UUID
Checking for service errors in temporary instance 101f7327-f9cf-4389-9fbb-534c1132471e
Waiting until 101f7327-f9cf-4389-9fbb-534c1132471e instance is in DNS
Disabling registrar on VM b01e12bd-01f9-459a-9ff6-364f0d63b935
Wait until VM b01e12bd-01f9-459a-9ff6-364f0d63b935 is out of DNS
Reprovisioning sapi VM b01e12bd-01f9-459a-9ff6-364f0d63b935
Waiting for sapi instance b01e12bd-01f9-459a-9ff6-364f0d63b935 to come up
Waiting until b01e12bd-01f9-459a-9ff6-364f0d63b935 instance is in DNS
Disabling registrar on VM 101f7327-f9cf-4389-9fbb-534c1132471e
Wait until VM 101f7327-f9cf-4389-9fbb-534c1132471e is out of DNS
Stop tmp VM 101f7327-f9cf-4389-9fbb-534c1132471e
Destroying tmp VM 101f7327-f9cf-4389-9fbb-534c1132471e (sapi0tmp)
Updated successfully (elapsed 766s).
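
The plan printed under "This update will make the following changes" can be reduced to a list of affected services, which is convenient for change records. The sketch below assumes plan lines of the form shown in the transcript above; planned_services is a hypothetical helper name.

```shell
#!/bin/sh
# Print the services named in the update plan, one per line, given
# `sdcadm update` output on stdin.
# Assumes plan lines of the form:  update "<svc>" service to image <uuid>
planned_services() {
    sed -n 's/^[[:space:]]*update "\([^"]*\)" service to image .*/\1/p'
}

# Illustrative usage on a headnode (not run here). The --dry-run flag is
# listed in `sdcadm help update`; that it prints the same plan format is
# an assumption of this sketch.
#   sdcadm update --all --dry-run | planned_services
```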

Upgrade Other

[root@headnode (swdemo01) ~]# sdcadm experimental update-other
Running VMAPI migrations
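
The commands shown so far can be collected into a checklist. The sketch below prints them in the order this page presents them and only executes them when RUN=1 is set. The ordering is an assumption taken from this page; the release notes for a given release take precedence. The upgrade_sequence name and the RUN variable are illustrative.

```shell
#!/bin/sh
# Print (default) or run the update commands in the order this page
# presents them. Set RUN=1 to execute; any other value only prints.
# The list is read on file descriptor 3 so that stdin stays attached to
# the terminal for sdcadm's "Would you like to continue?" prompts.
upgrade_sequence() {
    while IFS= read -r cmd <&3; do
        if [ "${RUN:-0}" = "1" ]; then
            echo "==> $cmd"
            # Unquoted on purpose: the line is split into command and args.
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done 3<<'EOF'
sdcadm self-update --latest
sdcadm experimental update-gz-tools --latest
sdcadm experimental update-agents --latest --all
sdcadm update --all
sdcadm experimental update-other
EOF
}
```

Calling upgrade_sequence with RUN unset prints the five commands for review. Platform image installation is deliberately left out because it is covered separately.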

Download new platform images and assign one to the next boot

See our documentation for managing and upgrading platform images. Be sure to download the new platform images before assigning one to the next boot.

Failure Mode: Full USB Key

When installing the latest platform image, the installation may fail if the USB key is full.

[root@headnode (swdemo01) ~]# sdcadm platform install --latest
Using channel support
Checking latest Platform Image is already installed
Downloading platform 20170803T064535Z
    image a38069db-1611-4971-85ae-ae945943b02c
    to /var/tmp/platform-release-20170803-20170803T064535Z.tgz
Installing Platform Image onto USB key
==> Mounting USB key
==> Staging 20170803T064535Z
######################################################################## 100.0%
==> Unpacking 20170803T064535Z to /mnt/usbkey/os
==> This may take a while...

gzcat: stdout: Broken pipe
Error: unpacking image into /mnt/usbkey/os/20170803T064535Z
Error: script failed for platform /var/tmp/platform-release-20170803-20170803T064535Z.tgz
In order not to have to re-download image, /var/tmp/platform-release-20170803-20170803T064535Z.tgz has been left behind.
After correcting above problem, rerun `sdcadm platform install /var/tmp/platform-release-20170803-20170803T064535Z.tgz`.
sdcadm platform install: error: script failed for platform /var/tmp/platform-release-20170803-20170803T064535Z.tgz

Check the space available on the mounted USB key:

[root@headnode (swdemo01) ~]# df -h /mnt/usbkey
Filesystem             Size   Used  Available Capacity  Mounted on
/dev/dsk/c2t0d0p1      3.7G   3.5G       249M    94%    /mnt/usbkey
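
The capacity check can be scripted so that a nearly full key is noticed before the next platform install. This is a sketch that assumes df prints one header line and one data line with the capacity in the fifth column, as in the sample above. The helper names and the default threshold of 90 are arbitrary example choices.

```shell
#!/bin/sh
# Print the capacity percentage (digits only) from `df` output on stdin.
# Assumes one header line followed by one data line, with the capacity
# in column 5, as in the sample above.
usbkey_capacity() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Exit 0 when the capacity read from stdin is at or above the threshold
# passed as $1. The default of 90 is an arbitrary example value.
usbkey_nearly_full() {
    pct=$(usbkey_capacity)
    [ -n "$pct" ] && [ "$pct" -ge "${1:-90}" ]
}

# Illustrative usage while the key is mounted at /mnt/usbkey (not run here):
#   df -h /mnt/usbkey | usbkey_nearly_full 90 &&
#       echo "USB key is nearly full; remove old platform images first" >&2
```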

Read our docs on removing old platform images from the USB key.

Again, if you encounter errors, please contact Support.

Understanding data center maintenance mode

"Maintenance mode" for a Triton installation means that CloudAPI is in read-only mode. Requests that modify state will return "503 Service Unavailable", and the Workflow API will be drained on entry. This has no impact on jobs that are submitted directly through the APIs or via AdminUI.

Note: This does not currently wait for config changes to be made and cloudapi instances to be restarted. That means there is a window after maintenance mode starts in which new jobs could still come in.

To put a data center into maintenance mode:

headnode# sdcadm dc-maint --start
Getting SDC's sapi instances from SAPI
Putting cloudapi in read-only mode
Putting docker in read-only mode
Waiting up to 5 minutes for workflow jobs to drain
Workflow cleared of running and queued jobs

To check the maintenance status of a data center:

headnode# sdcadm dc-maint
DC maintenance: on (since 2015-08-17T13:03:28.727Z)
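
A script can read the state out of this status line. The sketch below extracts the word that follows "DC maintenance: ". Only the "on" form appears in the sample above; that other states follow the same pattern is an assumption. dc_maint_state is a hypothetical helper name.

```shell
#!/bin/sh
# Print the state word that follows "DC maintenance: " given
# `sdcadm dc-maint` output on stdin. Prints nothing if no such line exists.
dc_maint_state() {
    sed -n 's/^DC maintenance: \([a-z]*\).*/\1/p'
}

# Illustrative usage on a headnode (not run here):
#   [ "$(sdcadm dc-maint | dc_maint_state)" = "on" ] &&
#       echo "data center is in maintenance mode"
```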

To take a data center out of maintenance mode:

headnode# sdcadm dc-maint --stop
Getting SDC's sapi instances from SAPI
Taking cloudapi out of read-only mode
Taking docker out of read-only mode
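
Because it is easy to forget to leave maintenance mode after a failed step, the start and stop commands can be wrapped around the work itself. The following is a sketch, not an official sdcadm feature. The with_dc_maint function and the SDCADM override variable are illustrative names.

```shell
#!/bin/sh
# Run a command with the data center in maintenance mode, then take it
# out of maintenance mode again whether or not the command succeeded.
# SDCADM can be overridden (for example with "echo") to rehearse the flow
# without touching a real installation.
with_dc_maint() {
    "${SDCADM:-sdcadm}" dc-maint --start || return 1
    "$@"
    status=$?
    "${SDCADM:-sdcadm}" dc-maint --stop || return 1
    # Report the wrapped command's exit status to the caller.
    return "$status"
}

# Illustrative usage on a headnode (not run here):
#   with_dc_maint sdcadm update --all
```

Signals such as Ctrl-C are not handled in this sketch, so an interrupted run may still leave the data center in maintenance mode.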

More help with the sdcadm tool

You can list all of the available sdcadm subcommands with sdcadm help (or the --help flag):

[root@headnode (swdemo01) ~]# sdcadm help
Administer a Triton Data Center

    sdcadm [OPTIONS] COMMAND [ARGS...]
    sdcadm help COMMAND

    -h, --help             Print help and exit.
    --version              Print version and exit.
    -v, --verbose          Verbose/debug output.

    help (?)               Help on a specific sub-command.
    self-update            Update "sdcadm" itself.
    instances (insts)      List all (or a filtered subset of) SDC service instances.
    services (svcs)        List all SDC services.
    avail (available)      Display images available for update of SDC services and instances.
    update (up)            Update SDC services and instances.
    create                 Create one or more instances of an existing Triton VM service.
    rollback               Rollback SDC services and instances.
    check-config           Check sdc config in SAPI versus system reality.
    check-health (health)  Check that services or instances are up.
    dc-maint               DC maintenance related sdcadm commands.
    post-setup             Common post-setup procedures.
    platform               Platform related sdcadm commands.
    channel                sdcadm commands for operations with update channels.
    default-fabric         Initialize a default fabric for an account.
[root@headnode (swdemo01) ~]# sdcadm help update
Update SDC services and instances.

     ...update spec on stdin... | sdcadm update [<options>]
     sdcadm update [<options>] <svc> ...
     sdcadm update [<options>] <svc>@<image> ...
     sdcadm update [<options>] <svc>@<version> ...
     sdcadm update [<options>] <inst> ...
     sdcadm update [<options>] <inst>@<image> ...
     sdcadm update [<options>] <inst>@<version> ...

     # Update all instances of the cnapi service to the latest
     # available image.
     sdcadm update cnapi

     TODO: other calling forms

    -h, --help                Show this help.
    -n, --dry-run             Go through the motions without actually updating.
    -a, --all                 Update all instances.
    -y, --yes                 Answer yes to all confirmations.
    -I, --just-images         Just import images. Commonly this is used to
                              preload images before the full upgrade run.
    --force-data-path         Upgrade components in the customer data path
    --force-rabbitmq          Forcibly update rabbitmq (which is not updated by
    --force-same-image        Allow update of an instance(s) even if the target
                              image is the same as the current.
    --force-bypass-min-image  Allow update of an instance(s) even if the target
                              image is unknown or it does not fulfil the minimum
                              image requirements for updates.
    --ufds-backup-timeout=T   Timeout (in seconds) for the creation of the
                              backup of all the UFDS data during ufds updates.
                              Default: 600secs.
    -C ARG, --channel=ARG     Use the given channel to fetch the image(s), even
                              if it is not the default one.
    -x ARG, --exclude=ARG     Exclude the given services (only when -a|--all is
                              provided). Both multiple values (-x svc1 -x svc2)
                              or a single comma separated list (-x svc1,svc2) of
                              service names to be excluded are supported.
[root@headnode (swdemo01) ~]# sdcadm help experimental
Experimental, unsupported, temporary sdcadm commands.

These are unsupported and temporary commands to assist with
migration away from incr-upgrade scripts. The eventual
general upgrade process will not include any commands under
"sdcadm experimental".

    sdcadm experimental [OPTIONS] COMMAND [ARGS...]
    sdcadm experimental help COMMAND

    -h, --help             Show this help message and exit.

    help (?)               Help on a specific sub-command.
    info (config)          Get SDC info.
    update-agents          Update GZ agents on servers in the DC.
    dc-maint               DC maintenance related sdcadm commands.
    update-other           Temporary grabbag for small SDC update steps.
    update-gz-tools        Temporary grabbag for updating the SDC global zone tools.
    update-docker          This command has been replaced by `sdcadm post-setup docker`.
    install-docker-cert    Installs a custom TLS certificate to be used by sdc-docker.
    fix-core-vm-resolvers  Update resolvers for core VMs on the admin network.
    cns                    Create the "cns" service and a first instance.
    nfs-volumes            Enables/disables support for various NFS volumes features.
    remove-ca              Remove the Cloud Analytics services from Triton.
    avail                  Display images available for update of SDC services and instances.
    update                 Update SDC services and instances.

Other maintenance tasks

Triton has been designed to be largely maintenance-free. Outside of upgrades, all necessary maintenance tasks, such as management and maintenance of the Manatee PostgreSQL cluster, are handled internally by Triton.