Container Monitor with Prometheus
Container Monitor is a Prometheus-compatible interface to performance metrics for all your instances on Triton.
Container Monitor allows you to use the Prometheus-compatible ecosystem of monitoring solutions to visualize the status of your applications and alert on your performance thresholds. Learn more about Prometheus, an open source application that can read Container Monitor metrics, at prometheus.io.
Any solution that can read a Prometheus-compatible metrics exporter can use Container Monitor, but the following configuration documentation is written for Prometheus itself.
Authentication
If you haven't already, add an SSH key to your Triton account. You can upload an existing key or have one generated for you. This key authenticates you to all of your account's containers and Triton DataCenter APIs; Container Monitor uses it to identify you and authorize your access to the Container Monitor interfaces.
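For example, you can generate and upload a key with the triton CLI. This is a minimal sketch, assuming the CLI is installed and configured for your account; the key path and key type are examples only:
$ ssh-keygen -t rsa -b 4096 -f ~/.ssh/triton-monitor
$ triton key add ~/.ssh/triton-monitor.pub
$ triton key list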
Configuration
Container Monitor exposes a metrics endpoint for every instance in your account. Rather than manually configuring (and reconfiguring) Prometheus for every instance, you can use the Triton service discovery configuration in Prometheus to automate it.
The Triton configuration block in the prometheus.yml file looks like the following:
- job_name: 'triton'
  scheme: https
  triton_sd_configs:
    # The account username to use for discovering new target containers
    - account: <string>
      # The API is versioned, the current version is "1"
      version: 1
      # The DNS suffix which should be applied to target containers
      # For Triton Public Cloud, this is cmon.<data center name>.triton.zone (example: cmon.us-sw-1.triton.zone)
      dns_suffix: <string>
      # The Triton discovery endpoint
      # For Triton Public Cloud, this is cmon.<data center name>.triton.zone (the same value as dns_suffix)
      endpoint: <string>
      # TLS configuration.
      tls_config:
        ca_file: '<path to the CA file>'
        cert_file: '<path to the cert file>'
        key_file: '<path to the key file>'
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_triton_machine_alias]
      target_label: instance
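As a concrete illustration, here is the same job with example values filled in for a hypothetical account named exampleuser in the us-sw-1 data center. The account name and certificate paths are placeholders; substitute your own values:
- job_name: 'triton'
  scheme: https
  triton_sd_configs:
    - account: 'exampleuser'
      version: 1
      dns_suffix: 'cmon.us-sw-1.triton.zone'
      endpoint: 'cmon.us-sw-1.triton.zone'
      tls_config:
        cert_file: '/etc/prometheus/triton-cert.pem'
        key_file: '/etc/prometheus/triton-key.pem'
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_triton_machine_alias]
      target_label: instance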
You must also enable Triton CNS in order to use Container Monitor. Once CNS is enabled, every new container automatically gets a Container Monitor CNAME record, and proxy records (with their corresponding CNAME records) are added and removed as proxies come and go. Each CNAME record represents a virtual Prometheus endpoint backed by a proxy.
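If CNS is not already enabled on your account, you can turn it on with the triton CLI (again assuming the CLI is configured for your account):
$ triton account update triton_cns_enabled=true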
To view the details of a container that can be scraped by a Prometheus server for metrics:
$ triton inst get <container>
If you run a pre-existing Prometheus server, the suggested service discovery mechanism is file-based service discovery in conjunction with our Prometheus Autopilot Pattern. This gives those servers an experience equivalent to CloudAPI-based discovery without requiring an upgrade of the Prometheus installation.
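A rough sketch of what file-based discovery looks like on the Prometheus side follows; the targets file path, port 9163, and refresh interval are illustrative assumptions, and the Autopilot Pattern (or another process) is expected to keep the targets files up to date:
- job_name: 'triton-file-sd'
  scheme: https
  tls_config:
    cert_file: '<path to the cert file>'
    key_file: '<path to the key file>'
    insecure_skip_verify: true
  file_sd_configs:
    - files:
        - '/etc/prometheus/triton-targets/*.json'
      refresh_interval: 1m
Each JSON file then lists per-container endpoints in the standard file_sd format, for example [{"targets": ["<container>.cmon.us-sw-1.triton.zone:9163"]}].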
Available metrics
The following metrics are available from each container's single metrics endpoint to help you learn more about your containers:
cpu_user_usage
: User CPU utilization in nanoseconds

cpu_sys_usage
: System CPU usage in nanoseconds

cpu_wait_time
: CPU wait time in nanoseconds

load_average
: Load average

mem_agg_usage
: Aggregate memory usage in bytes

mem_limit
: Memory limit in bytes

mem_swap
: Swap in bytes

mem_swap_limit
: Swap limit in bytes

mem_anon_alloc_fail
: Anonymous allocation failure count

net_agg_packets_in
: Aggregate inbound packets

net_agg_packets_out
: Aggregate outbound packets

net_agg_bytes_in
: Aggregate inbound bytes

net_agg_bytes_out
: Aggregate outbound bytes

tcp_failed_connection_attempt_count
: Failed TCP connection attempts

tcp_retransmitted_segment_count
: Retransmitted TCP segments

tcp_duplicate_ack_count
: Duplicate TCP ACK count

tcp_listen_drop_count
: TCP listen drops. Connection refused because backlog full

tcp_listen_drop_Qzero_count
: Total # of connections refused due to half-open queue (q0) full

tcp_half_open_drop_count
: TCP connection dropped from a full half-open queue

tcp_retransmit_timeout_drop_count
: TCP connection dropped due to retransmit timeout

tcp_active_open_count
: TCP active open connections

tcp_passive_open_count
: TCP passive open connections

tcp_current_established_connections_total
: TCP total established connections

vfs_bytes_read_count
: VFS number of bytes read

vfs_bytes_written_count
: VFS number of bytes written

vfs_read_operation_count
: VFS number of read operations

vfs_write_operation_count
: VFS number of write operations

vfs_wait_time_count
: VFS cumulative wait (pre-service) time

vfs_wait_length_time_count
: VFS cumulative wait length*time product

vfs_run_time_count
: VFS cumulative run (service) time

vfs_run_length_time_count
: VFS cumulative run length*time product

vfs_elements_wait_state
: VFS number of elements in wait state

vfs_elements_run_state
: VFS number of elements in run state

zfs_used
: ZFS space used in bytes

zfs_available
: ZFS space available in bytes

time_of_day
: System time in seconds since epoch
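You can fetch these metrics from a container's endpoint directly with curl, using the same client certificate and key referenced in the Prometheus tls_config. This is a sketch: the hostname follows the dns_suffix described above, and port 9163 is an assumption about the Container Monitor listener.
$ curl --cert cert.pem --key key.pem --insecure https://<container uuid>.cmon.us-sw-1.triton.zone:9163/metrics
Most of these metrics are cumulative counters, so in Prometheus you will typically graph them with rate(); for example, rate(cpu_user_usage[5m]) gives user CPU time consumed per second.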