Understanding disk space usage in instances
There are numerous tools that can be used to manage containers and hardware virtual machines in Triton. This page discusses those tools, as well as some of the key differences in how disk space is allocated, used, and managed between the two types of instances. Note: this page is primarily geared towards users of Linux, SmartOS, or FreeBSD operating systems. Windows users should consult their Windows documentation for troubleshooting and management tools to use inside their Windows server.
Disk usage in infrastructure containers running SmartOS or Container Native Linux
In containers, disks are allocated as virtual storage pools (known as zpools), which are constructed of virtual devices (vdevs). Virtual devices are constructed of physical drives.
During provisioning of an infrastructure container running SmartOS or Container Native Linux, the amount of disk space allocated is based on the predefined quota set in the package used to provision the instance.
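For example, you can confirm the space allotted to a container from inside the instance by checking the root filesystem with df. The dataset name and sizes below are illustrative (they match the sample output later on this page):
container# df -kh /
Filesystem Size Used Avail Use% Mounted on
zones/2ccc2818-0948-e401-a957-b1aa3b5a2228 17G 427M 16G 3% /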
Disk usage in hardware virtual machines
For hardware virtual machines, disks are allocated as zvols, and come with two mounted drives upon provisioning:
- /dev/vda (the OS disk)
- /dev/vdb (the data disk)
The OS disk is not intended for any data storage, and is allocated the same size across the board. The /dev/vdb disk, however, is there for storing data, and varies in capacity depending on the package size chosen for the instance.
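From inside a Linux hardware virtual machine, standard tools such as lsblk or df can be used to confirm this layout. The sizes and data-disk mount point below are hypothetical and will vary by package and image:
linuxvm# lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
NAME SIZE TYPE MOUNTPOINT
vda 10G disk
└─vda1 10G part /
vdb 200G disk /data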
Monitoring and analyzing disk usage in instances
There are many tools available to monitor and analyze disk usage for instances. The following outlines a few of the more common tools used to monitor disk usage and i/o:
- Useful storage monitoring tools:
  - df(1M): df -kh
  - iostat(1M): iostat -xnmz
- Tools to examine hard drive failures (troubleshooting):
  - iostat(1M): iostat -En
  - fmadm(1M): fmadm faulty
  - zpool(1M): zpool status
Working with df
The df command is used to view free disk space. This is useful to monitor the amount of space left on disk devices and filesystems.
To view a current snapshot of disk space on an instance (in human-readable format), use the -kh options as shown below:
container# df -kh
Filesystem Size Used Avail Use% Mounted on
zones/2ccc2818-0948-e401-a957-b1aa3b5a2228 17G 427M 16G 3% /
/lib 263M 235M 28M 90% /lib
/lib/svc/manifest 2.4T 761K 2.4T 1% /lib/svc/manifest
/lib/svc/manifest/site 17G 427M 16G 3% /lib/svc/manifest/site
/sbin 263M 235M 28M 90% /sbin
/usr 417M 374M 44M 90% /usr
/usr/ccs 17G 427M 16G 3% /usr/ccs
/usr/local 17G 427M 16G 3% /usr/local
swap 512M 35M 478M 7% /etc/svc/volatile
/usr/lib/libc/libc_hwcap1.so.1 417M 374M 44M 90% /lib/libc.so.1
swap 256M 8.0K 256M 1% /tmp
swap 512M 35M 478M 7% /var/run
If you see / (root) or /tmp (swap) getting close to 100%, then the instance is in jeopardy of running out of disk space, which can lead to other problems (including hard hangs).
It's important to keep filesystems cleaned up by deleting any unnecessary data or moving files over to an external backup solution.
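One common way to find what is consuming space is a quick du sweep of the top-level directories, sorted by size; a minimal sketch (the -x flag keeps du on the root filesystem so it does not descend into other mounts):
container# du -skx /* 2>/dev/null | sort -n | tail -10
The largest directories are printed last; from there you can drill down with further du runs and delete or archive files as appropriate.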
Working with iostat
The iostat utility reports i/o activity in specified interval iterations. You can use iostat to monitor disk activity and to help ensure that disk i/o performance stays healthy.
To view how busy disk i/o activity is, use the -xnmz options along with an interval to run (in seconds), as shown in the example below:
container# iostat -xnmz 10
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.1 0.0 0.0 0.0 0.2 0.6 0 0 lofi1
0.0 0.0 0.2 0.2 0.0 0.0 0.0 0.0 0 0 ramdisk1
38.7 262.7 100.7 20106.7 0.0 0.3 0.0 0.8 0 10 c0t0d0
38.7 262.7 100.7 20106.7 54.7 0.3 181.4 0.9 5 10 zones
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
246.3 514.9 230.9 33402.2 0.0 0.4 0.0 0.5 1 35 c0t0d0
246.3 514.9 230.9 33402.2 33.6 0.5 44.1 0.6 3 36 zones
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 476.7 0.0 38763.1 0.0 0.5 0.0 1.0 1 18 c0t0d0
0.0 476.7 0.0 38763.1 54.6 0.5 114.6 1.0 14 18 zones
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.2 134.6 0.7 5569.8 0.0 0.0 0.0 0.2 0 1 c0t0d0
0.2 134.6 0.7 5569.8 0.7 0.0 4.9 0.2 1 1 zones
extended device statistics
The above runs in 10 second iterations, providing an average of reads and writes per second, along with various other details. Please see the iostat(1M) man page for more information.
One of the key columns to focus on (specifically for monitoring disk i/o activity) is the %b column, which indicates how busy the disks are.
Seeing this column spike to 100% on occasion is less concerning than seeing it at, or close to, 100% on a consistent basis. In general, if a device is constantly busy, it's important to determine who or what is doing all of the disk i/o.
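If DTrace is available (for example, on the compute node, or in a container that has been granted DTrace privileges), a common one-liner for attributing disk i/o to processes is to count io provider events by executable name; a minimal sketch:
computenode# dtrace -n 'io:::start { @[execname] = count(); }'
Let it run while the disks are busy, then press Ctrl-C; the executables responsible for the most i/o requests are printed last.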
Troubleshooting hard drive failures on compute nodes
Hard drive failures on compute nodes affect all instances that live on the server. There are a couple of key tools that can be used to help troubleshoot and identify potential problems with the underlying storage:
- iostat -En
- fmadm (the fault management configuration tool)
- zpool status
Verifying failures using iostat
For a quick look at potential disk errors, you can run the iostat command with the -En option as shown below:
computenode# iostat -En
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: Generic Product: STORAGE DEVICE Revision: 9451 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c1t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: Kingston Product: DataTraveler 2.0 Revision: PMAP Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 82 Predictive Failure Analysis: 0
c2t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD10EZEX-00K Revision: 1H15 Serial No: WD-WCC1S5975038
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 116 Predictive Failure Analysis: 0
c2t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD5000AVVS-0 Revision: 1B01 Serial No: WD-WCASU7437291
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 121 Predictive Failure Analysis: 0
c2t2d0 Soft Errors: 0 Hard Errors: 26 Transport Errors: 0
Vendor: HL-DT-ST Product: DVDRAM GH24NS95 Revision: RN01 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 26 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
If you see any errors being reported (soft, hard, or transport), then it's definitely worth investigating further for potential hard drive failures or other storage-related issues.
Working with the fault management configuration tool fmadm
The fmadm utility is used to administer and service problems detected by the Solaris Fault Manager, fmd(1M). If a component has been diagnosed as faulty, fmadm will report what component has failed, and the response taken for the failed device.
To view a list of failed components, run fmadm with the faulty option:
computenode# fmadm faulty
-------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
-------------- ------------------------------------ -------------- ---------
May 02 20:00:34 abe52661-52aa-ec45-983e-f019e465db53 ZFS-8000-FD Major
Host : headnode
Platform : MS-7850 Chassis_id : To-be-filled-by-O.E.M.
Product_sn :
Fault class : fault.fs.zfs.vdev.io
Affects : zfs://pool=zones/vdev=5dbf266cd162b324
faulted and taken out of service
Problem in : zfs://pool=zones/vdev=5dbf266cd162b324
faulted and taken out of service
Description : The number of I/O errors associated with a ZFS device exceeded
acceptable levels. Refer to
http://illumos.org/msg/ZFS-8000-FD for more information.
Response : The device has been offlined and marked as faulted. An attempt
will be made to activate a hot spare if available.
Impact : Fault tolerance of the pool may be compromised.
Action : Run 'zpool status -x' and replace the bad device.
The above provides an example of what you may see if a device in a zpool was diagnosed as faulted. For more details on fmadm, please see the fmadm(1M) man page.
Verifying the status of zpools
The zpool command manages and configures ZFS storage pools, which are simply collections of virtual devices (generally physical drives) that are provided to ZFS datasets (zones).
You can obtain a quick snapshot of the health and status of storage pools by running the following zpool command:
computenode# zpool status
pool: zones
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
zones ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
cache
c2t1d0 ONLINE 0 0 0
errors: No known data errors
The above output indicates a healthy storage pool with no errors or disk maintenance activities going on.
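If zpool status (or zpool status -x) instead reports a FAULTED or DEGRADED device, the usual next step, per the Action line in the fmadm output above, is to replace it. A minimal sketch using hypothetical device names:
computenode# zpool status -x
computenode# zpool replace zones c2t0d0 c2t3d0
Running zpool status again afterwards lets you watch the resilver progress until the pool returns to an ONLINE state.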