Post crashed Heroku dynos to Mattermost

A Heroku + Mattermost agent flow

A dyno that crash-loops at 2am usually surfaces as a customer complaint, not a chat ping. Heroku shows the state in its dashboard, but nobody is watching the dashboard at 2am. This recipe has an agent watch it for you. It calls ps_list for each app you care about, reads the state field on every dyno, and when one comes back crashed or restarting it posts a line into a Mattermost channel with post_message: the app, the process type, the dyno name, and the state. Heroku only answers when asked and does not push state changes, so the agent polls on a short loop, every minute or two, and compares each dyno's state to the previous read so that a dyno that is still down isn't reposted on every pass.

The flow

Heroku ps_list

List the running dynos for an app.

Mattermost post_message

Post a message to a channel.

Step by step

  1. List dynos per app

    The agent calls ps_list for each app name you give it. The response is the current process formation: every dyno with its process type (web, worker), name, and state. There is no event stream, so each run is a fresh snapshot.

  2. Diff against the last snapshot

    Keep the prior state of each dyno keyed by app plus dyno name. A dyno that was up and is now crashed or restarting is a new failure; one that was already crashed last pass is not. Only state transitions get posted.

  3. Post the failure to Mattermost

    For each newly failed dyno, the agent calls post_message with the channel and a short line: app, process type, dyno, and state. When a dyno recovers to up, the agent can post a follow-up so the channel shows the resolution.
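
The three steps above can be sketched as a single pass in Python. This is a minimal sketch under stated assumptions, not the recipe's actual implementation: list_dynos and post are hypothetical callables standing in for however your agent invokes ps_list and post_message, and the dyno field names (type, name, state) are assumed from the step descriptions rather than taken from a documented response shape.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: each dyno returned by ps_list is a dict with "type", "name", "state".
Dyno = Dict[str, str]
Key = Tuple[str, str]  # (app, dyno name)

FAILED = {"crashed", "restarting"}


def run_pass(
    apps: Iterable[str],
    previous: Dict[Key, str],
    list_dynos: Callable[[str], List[Dyno]],  # hypothetical wrapper around ps_list
    post: Callable[[str], None],              # hypothetical wrapper around post_message
) -> Dict[Key, str]:
    """Run one polling pass and return the snapshot to keep for the next one."""
    current: Dict[Key, str] = {}
    for app in apps:
        for dyno in list_dynos(app):
            key = (app, dyno["name"])
            state = dyno["state"]
            current[key] = state
            before = previous.get(key)
            line = f"{app} {dyno['type']} {dyno['name']}: {state}"
            if state in FAILED and state != before:
                # The dyno changed into a failure state since the last read.
                post(line)
            elif state == "up" and before in FAILED:
                # Optional follow-up when a failed dyno recovers.
                post(line)
    return current
```

The caller keeps the returned snapshot and passes it back in as previous on the next pass. With an empty previous snapshot, the first pass reports every dyno that is already crashed or restarting, which may or may not be what you want on startup.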

Tell your agent

Every two minutes, call Heroku ps_list for apps api and web-frontend, and for any dyno whose state changed to crashed or restarting since the last check, post the app, process type, dyno, and state to the Mattermost ops channel.

Setup

This flow needs both the Heroku and Mattermost servers connected to your agent. Follow the install guide for each:

Worth knowing

  • ps_list returns the dyno formation for one app at a time, so the agent loops over your app list; there is no account-wide dyno query in this tool.
  • State strings include up, starting, crashed, restarting, and idle. Decide which ones count as an alert, since a dyno briefly restarting after a deploy is normal.
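
One way to encode the "which states count" decision is a small persistence rule: alert on crashed at once, but on restarting only if it shows up on consecutive reads. The sketch below is illustrative only. The thresholds in ALERT_AFTER and the helper name should_alert are assumptions made for this example, not Heroku defaults or part of either tool.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (app, dyno name)

# Illustrative policy (an assumption, tune to taste): how many consecutive
# reads a state must persist before it counts as an alert.
ALERT_AFTER = {"crashed": 1, "restarting": 2}


def should_alert(key: Key, state: str, streaks: Dict[Key, Tuple[str, int]]) -> bool:
    """Update the per-dyno streak and report whether this read hits the threshold."""
    last_state, count = streaks.get(key, ("", 0))
    count = count + 1 if state == last_state else 1
    streaks[key] = (state, count)
    # True only on the read where the streak reaches the threshold,
    # so a dyno that stays in the same state alerts once.
    return ALERT_AFTER.get(state) == count
```

Under this policy a dyno seen restarting on a single read after a deploy stays quiet, while one still restarting on the next read is reported.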

Questions

Will a dyno that stays crashed get posted on every run?
No, as long as you diff against the last snapshot. The agent posts only when a dyno's state changes into a failure, then again when it recovers, so a dyno stuck down sits quiet between transitions.

Can I include the actual error from the logs?
ps_list only returns dyno state, not log lines. Pair it with get_app_logs if you want the agent to pull the recent crash output and include a snippet in the Mattermost message.
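
If you do pair the two, the message could be assembled roughly as follows. This is a sketch that assumes the log output reaches your agent as one plain-text string; the helper name with_log_snippet and the line and character limits are illustrative choices, not values required by either tool. The snippet is wrapped in a Markdown code fence, which Mattermost renders as a code block.

```python
FENCE = "`" * 3  # Markdown code fence marker


def with_log_snippet(alert: str, log_text: str, max_lines: int = 10, max_chars: int = 1500) -> str:
    """Append the tail of the log output to an alert line as a fenced block."""
    tail = "\n".join(log_text.strip().splitlines()[-max_lines:])
    if len(tail) > max_chars:
        # Keep the end of the output, on the assumption that the crash is logged last.
        tail = tail[-max_chars:]
    if not tail:
        # No log output available: post the plain alert line.
        return alert
    return f"{alert}\n{FENCE}\n{tail}\n{FENCE}"
```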