Jobs

Modified: 08 Sep 2022 04:28 UTC

A ContainerPilot job is a user-defined process and rules for when to execute it, how to health check it, and how to advertise it to Consul. The rules are intended to allow for flexibility to cover nearly any type of process one might want to run. Some possible job configurations include:

Lifecycle Events

Every job will emit events associated with the lifecycle of its process. Any job can react to the events emitted by any other job (or even its own events) via the when configuration.

Note that although stopping and stopped events are emitted for each running job when ContainerPilot is shutting down, the receiving job will have a limited window in which to execute. This window is 5 seconds, in order to provide enough time for ContainerPilot to halt all jobs, gracefully shut down its own listeners, and exit within the default Docker shutdown timeout of 10 seconds. After this point all processes receive a SIGKILL and are forced to exit immediately.

Additionally, jobs may react to these events:

Finally, there are two special source values that can be used to trigger a job when ContainerPilot receives a UNIX signal.

Signal events come in handy when you need to kick off some type of special process (reloading configs, publishing debug info) inside a container. This type of external communication is only supported when running ContainerPilot within a Docker container running on a Docker host, or under the supervision of a scheduler like Nomad.

Note: Either two signals can be sent to ContainerPilot acting as a PID 1 supervisor or its standalone worker process.

Configuration

Job configurations include the following fields:

jobs: [
  {
    name: "app",
    exec: "/bin/app",
    logging: {
      raw: false
    },

    // 'when' defines the events that cause the job to run
    when: {
      source: "setup",
      once: "exitSuccess",
      timeout: "60s"
      // interval: "10s",     // can't be set at the same time as 'source'/'once'
      // each: "exitSuccess", // can't be set at the same time as 'once'
    },

    // these fields interact with 'when' behaviors (see below)
    timeout: "300s",
    stopTimeout: "10s",
    restarts: "unlimited",

    // 'health' defines how the job is health checked
    health: {
      exec: "/usr/bin/curl --fail -s -o /dev/null http://localhost/app",
      interval: 5,
      ttl: 10,
      timeout: "5s",
    },

    // 'port', 'tags', 'interfaces', and 'consul' define options for
    // service discovery with Consul
    port: 80,
    initial_status: "warning", // optional status to immediately register service with
    tags: [
      "app",
      "prod"
    ],
    interfaces: [
      "eth0",
      "eth1[1]",
      "192.168.0.0/16",
      "2001:db8::/64",
      "eth2:inet",
      "eth2:inet6",
      "inet",
      "inet6",
      "static:192.168.1.100", // a trailing comma isn't an error!
    ],
    consul: {
      enableTagOverride: true,
      deregisterCriticalServiceAfter: "10m"
    }
  }
]

The name field is the name of the job as it will appear in logs and events. It will also be the name of the service as it will appear in Consul (if the job is registered to Consul). Each instance of the service in Consul will have a unique ID made up from the name+hostname of the container. Names must match requirements for Consul; they start with a lower-case letter and contain upper- or lower-case letters, numerals, or - but no other characters. (Or in other words they must match the regex ^[a-z][a-zA-Z0-9\-]+$)

The exec field is the executable (and its arguments) that is called when the job runs. This field can contain a string or an array of strings (see below for details on the format). The command to be run will have a process group set and this entire process group will be reaped by ContainerPilot when the process exits. The process will be run concurrently to all other work, so the process won't block the processing of other ContainerPilot events.

Jobs and health checks have a logging configuration block with a single option: raw. When the rawfield is set to false (the default), ContainerPilot will wrap each line of output from an exec process's stdout/stderr in a log line. If set to true, ContainerPilot will attach the stdout/stderr of the process to the container's stdout/stderr and these streams will be unmodified by ContainerPilot. The latter option can be useful if the process emits structured logs in its own format.

Running and timing fields

The following fields define when a job starts, stops, restarts, and times out.

The when field defines a hook for an event that starts the job's exec. By default, a job's exec process starts as soon as ContainerPilot has finished startup. Many jobs will want to have a configuration that determines some specific event to wait for, using the when field.

If the interval field is set it is the only field permitted under when. Otherwise, the once and each fields are mutually exclusive -- you can set one or the other but not both.

The timeout field is optional and is the amount of time to wait after the job starts before it is killed. Processes killed this way are terminated immediately (SIGKILL) without an opportunity to clean up their state and a heartbeat will not be sent.

For long-running jobs like servers, you will generally want to omit this field. If this field is omitted and the job does not have a when.frequency field, then the job will never timeout. If the field is omitted and the job does have a when.frequency field, then the timeout will default to the frequency.

If set and not left as the default, the minimum timeout is 1ms (see the golang ParseDuration docs for this format) but in practice it takes 20-50ms for a process to be forked and executed so the timeout should be considerably longer.

stopTimeout is the maximum amount of time a stopping job will wait for another job that might be watching for the stopping event.

This can be useful for jobs which need to perform a task when they begin shutting down, but before they've done so. For example, a Consul agent might need to be removed from the list of available nodes via consul leave, which requires that the agent process is still running.

In the example below, the consul-agent job will need to leave time between the stopping and stopped events in order for consul leave to perform. The stopTimeout field is the time that the job will wait before exiting and killing its process. consul-agent job waits 5 seconds after it's stopping event has fired in order to allow the leave-consul job to execute.

jobs: [
  {
    name: "consul-agent",
    exec: "consul agent...",
    stopTimeout: "5s"
  },
  {
    name: "leave-consul",
    exec: "consul leave",
    when: {
      source: "consul-agent",
      once: "stopping"
    }
  }
]

The job that's watching for the stopping event can take however long it wants to do it's work. If you want to make sure the watching job is also going to finish, you need to add the timeout field to that job as well.

The restarts field is the number of times the process will be restarted if it exits. This field supports any non-negative numeric value (ex. 0 or 1) or the strings "unlimited" or "never". This value is optional and usually defaults to "never" (see the note below about the interval field for the exception).

It's important to understand how this field compares to the when field. A restart is run only when the job receives its own exitSuccess or exitFailed event. The when field is for triggering on other events. In the example below the app job is first started when the db job is healthy, but it will restart whenever it exits. Using restarts with the each option of when is not recommended because each time the each event triggers, it will spawn an exec that can restart after exit. In the case of unlimited restarts this would eventually use up all the resources in your container, so trying to use restarts: "unlimited" and each will return an error.

jobs: [
  {
    name: "app",
    restarts: "unlimited",
    when: {
      source: "db",
      once: "healthy"
    }
  }
]

The behavior of restarts is somewhat different if the when field is using the interval option. In this case, the restarts field indicates how many times the exec will be run on that interval. In the example configuration below, the app job will be run every 5 seconds for a maximum of 4 times (3 restarts). When the interval is set, the restarts field defaults to "unlimited", which means the job will run every interval period without stopping.

jobs: [
  {
    name: "app",
    restarts: 3,
    when: {
      interval: "5s"
    }
  }
]

Health checks

The health field defines how ContainerPilot determines if a job is healthy. This field is optional. Jobs without a health field set will not emit healthy and changed events.

Service discovery

The following fields define how a job is registered with Consul.

The port field is the port the service will advertise to Consul. Note that this assumes the job is listening on that port but does not change anything in the process. If you want to dynamically assign this port you might use an environment variable and template rendering for both the exec and port fields. For example:

jobs: [
  {
    name: "myjob",
    exec: "node /bin/server.js --port {{ .PORT }}",
    port: {{ .PORT }}
  }
]

The initial_status field is optional and specifies which status to immediately register the service with. If not specified, the service will not be registered in consul until after the first successful health check. Valid values are passing, warning or critical.

The tags field is an optional array of tags to be used when the job is registered as a service in Consul. Other containers can use these tags in watches to filter a service by tag.

The interfaces field is an optional single or array of interface specifications. If given, the IP of the service will be obtained from the first interface specification that matches. (Default value is ["eth0:inet"]). The value that ContainerPilot uses for the IP address of the interface will be set as an environment variable with the name CONTAINERPILOT_{JOB}_IP. See the environment variables section.

The consul field is an optional block of job-specific Consul configuration.

Exec arguments

All exec fields that configure a child process (jobs/exec and jobs/health/exec) accept both a string or an array. If a string is given, the command and its arguments are separated by spaces; otherwise, the first element of the array is the command path, and the rest are its arguments. This is sometimes useful for breaking up long command lines.

String command

health: {
  exec: "/usr/bin/curl --fail -s http://localhost/app"
}

Array command

health: {
  exec: [
    "/usr/bin/curl",
    "--fail",
    "-s",
    "http://localhost/app"
  ]
}