Container Monitor with Prometheus
Container Monitor is a Prometheus-compatible interface to performance metrics for all your instances on Triton.
Container Monitor allows you to use the Prometheus-compatible ecosystem of monitoring solutions to visualize the status of your applications and alert on your performance thresholds. Learn more about Prometheus, an open source application that can read Container Monitor metrics, at prometheus.io.
Any solution that can read a Prometheus-compatible metrics exporter can use Container Monitor, but the following configuration documentation is written for Prometheus itself.
Authentication
If you haven't already, add an SSH key to your Triton account. You can upload an existing key or have one generated for you. This key authenticates you to all of your account's containers and Triton DataCenter APIs; Container Monitor uses it to identify you and authorize your access to the Container Monitor interfaces.
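For example, you can generate and upload a key with the triton CLI. This is a minimal sketch, assuming the CLI is installed and configured for your account; the key path and key type are examples only:
$ ssh-keygen -t rsa -b 4096 -f ~/.ssh/triton-monitor
$ triton key add ~/.ssh/triton-monitor.pub
$ triton key list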
Configuration
Container Monitor exposes a metrics endpoint for every instance in your account. Rather than manually configuring (and reconfiguring) Prometheus for every instance, you can use the Triton service discovery configuration in Prometheus to automate it.
The Triton configuration block in the prometheus.yml file looks like the following:
- job_name: 'triton'
  scheme: https
  triton_sd_configs:
    # The account username to use for discovering new target containers
    - account: <string>
      # The API is versioned, the current version is "1"
      version: 1
      # The DNS suffix which should be applied to target containers
      # For Triton Public Cloud, this is cmon.<data center name>.triton.zone (example: cmon.us-sw-1.triton.zone)
      dns_suffix: <string>
      # The Triton discovery endpoint
      # For Triton Public Cloud, this is cmon.<data center name>.triton.zone (the same value as dns_suffix)
      endpoint: <string>
      # TLS configuration.
      tls_config:
        ca_file: '<path to the CA file>'
        cert_file: '<path to the cert file>'
        key_file: '<path to the key file>'
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_triton_machine_alias]
      target_label: instance
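As a concrete illustration, here is the same job with example values filled in for a hypothetical account named exampleuser in the us-sw-1 data center. The account name and certificate paths are placeholders; substitute your own values:
- job_name: 'triton'
  scheme: https
  triton_sd_configs:
    - account: 'exampleuser'
      version: 1
      dns_suffix: 'cmon.us-sw-1.triton.zone'
      endpoint: 'cmon.us-sw-1.triton.zone'
      tls_config:
        cert_file: '/etc/prometheus/triton-cert.pem'
        key_file: '/etc/prometheus/triton-key.pem'
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_triton_machine_alias]
      target_label: instance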
You must also enable Triton CNS in order to use Container Monitor. Once CNS is enabled, every new container automatically gets a Container Monitor CNAME record, and proxy records (with their corresponding CNAME records) are added and removed as proxies come and go. Each CNAME record represents a virtual Prometheus endpoint backed by a proxy.
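If CNS is not already enabled on your account, you can turn it on with the triton CLI (again assuming the CLI is configured for your account):
$ triton account update triton_cns_enabled=true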
To view the details of a container that can be scraped by a Prometheus server for metrics:
$ triton inst get <container>
If you run a pre-existing Prometheus server, the suggested service discovery mechanism is file-based service discovery in conjunction with our Prometheus Autopilot Pattern. This gives those servers an experience equivalent to CloudAPI-based discovery without requiring an upgrade of the Prometheus installation.
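A rough sketch of what file-based discovery looks like on the Prometheus side follows; the targets file path, port 9163, and refresh interval are illustrative assumptions, and the Autopilot Pattern (or another process) is expected to keep the targets files up to date:
- job_name: 'triton-file-sd'
  scheme: https
  tls_config:
    cert_file: '<path to the cert file>'
    key_file: '<path to the key file>'
    insecure_skip_verify: true
  file_sd_configs:
    - files:
        - '/etc/prometheus/triton-targets/*.json'
      refresh_interval: 1m
Each JSON file then lists per-container endpoints in the standard file_sd format, for example [{"targets": ["<container>.cmon.us-sw-1.triton.zone:9163"]}].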
Available metrics
The following metrics are available from each container's single metrics endpoint to help you learn more about your containers:
cpu_user_usage
: User CPU utilization in nanoseconds

cpu_sys_usage
: System CPU usage in nanoseconds

cpu_wait_time
: CPU wait time in nanoseconds

load_average
: Load average

mem_agg_usage
: Aggregate memory usage in bytes

mem_limit
: Memory limit in bytes

mem_swap
: Swap in bytes

mem_swap_limit
: Swap limit in bytes

mem_anon_alloc_fail
: Anonymous allocation failure count

net_agg_packets_in
: Aggregate inbound packets

net_agg_packets_out
: Aggregate outbound packets

net_agg_bytes_in
: Aggregate inbound bytes

net_agg_bytes_out
: Aggregate outbound bytes

tcp_failed_connection_attempt_count
: Failed TCP connection attempts

tcp_retransmitted_segment_count
: Retransmitted TCP segments

tcp_duplicate_ack_count
: Duplicate TCP ACK count

tcp_listen_drop_count
: TCP listen drops. Connection refused because backlog full

tcp_listen_drop_Qzero_count
: Total # of connections refused due to half-open queue (q0) full

tcp_half_open_drop_count
: TCP connection dropped from a full half-open queue

tcp_retransmit_timeout_drop_count
: TCP connection dropped due to retransmit timeout

tcp_active_open_count
: TCP active open connections

tcp_passive_open_count
: TCP passive open connections

tcp_current_established_connections_total
: TCP total established connections

vfs_bytes_read_count
: VFS number of bytes read

vfs_bytes_written_count
: VFS number of bytes written

vfs_read_operation_count
: VFS number of read operations

vfs_write_operation_count
: VFS number of write operations

vfs_wait_time_count
: VFS cumulative wait (pre-service) time

vfs_wait_length_time_count
: VFS cumulative wait length*time product

vfs_run_time_count
: VFS cumulative run (service) time

vfs_run_length_time_count
: VFS cumulative run length*time product

vfs_elements_wait_state
: VFS number of elements in wait state

vfs_elements_run_state
: VFS number of elements in run state

zfs_used
: ZFS space used in bytes

zfs_available
: ZFS space available in bytes

time_of_day
: System time in seconds since epoch
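You can fetch these metrics from a container's endpoint directly with curl, using the same client certificate and key referenced in the Prometheus tls_config. This is a sketch: the hostname follows the dns_suffix described above, and port 9163 is an assumption about the Container Monitor listener.
$ curl --cert cert.pem --key key.pem --insecure https://<container uuid>.cmon.us-sw-1.triton.zone:9163/metrics
Most of these metrics are cumulative counters, so in Prometheus you will typically graph them with rate(); for example, rate(cpu_user_usage[5m]) gives user CPU time consumed per second.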