The Warboard
The Dogsbody Technology Warboard sits on the wall in our office and gives us a detailed overview of the infrastructure we monitor in real time. It has proved itself invaluable for spotting potential issues and remedying them before they become a problem.
We’re responsible for monitoring and maintaining hundreds of servers on a daily basis. Checking the status of this infrastructure manually would be virtually impossible. To make this job easier we use tools such as Pingdom and NewRelic; however, we still felt the need for a high-level overview of all servers.
When a service fails on a server, or the health of a server deteriorates, Pingdom and NewRelic alert us in real time via custom webhooks we have written. These alerts are great for reacting to an issue when it happens, but they don’t give us a clear overview of the infrastructure we monitor before an issue occurs. This is why we created the Warboard.
The Warboard is displayed in such a way that we only see the metrics we need to. Services in the Pingdom column are ordered by response time, with the highest at the top. Servers in the NewRelic column are ordered by the highest metric for each server (for example, if CPU utilisation is a higher percentage than memory, disk usage and disk IO, the CPU figure is the one used). We display the Warboard on a wall-mounted TV for the whole team to see.
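To make the ordering concrete, here is a minimal Python sketch of the idea. It is not the Warboard's actual code: the dictionary keys (`cpu`, `memory`, `disk`, `disk_io`, `response_ms`) and the function names are hypothetical, and it assumes every server metric is a percentage and that each Pingdom check carries a response figure in milliseconds.

```python
# Hypothetical metric names; the real Warboard may label these differently.
METRIC_KEYS = ("cpu", "memory", "disk", "disk_io")


def headline_metric(server):
    """Return (metric_name, value) for the highest percentage metric of one server."""
    name = max(METRIC_KEYS, key=lambda key: server[key])
    return name, server[name]


def order_servers(servers):
    """Sort servers so the one with the highest headline metric comes first."""
    return sorted(servers, key=lambda s: headline_metric(s)[1], reverse=True)


def order_checks(checks):
    """Sort Pingdom-style checks so the highest response figure comes first."""
    return sorted(checks, key=lambda c: c["response_ms"], reverse=True)
```

With this approach a server running at 20% CPU but 95% disk usage rises above one sitting at 60% across the board, which is the behaviour we want from a board that is read at a glance.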
In the Pingdom column, red checks are currently down, blue checks are paused and green checks are up. In the NewRelic column, red checks are servers that have hit the high threshold in their policy, amber checks have hit their warning threshold, blue checks are servers that are no longer reporting and green checks are servers that have not reached a threshold.
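The colour rules can be expressed as a couple of small functions. This is an illustrative Python sketch only: the status strings (`"up"`, `"down"`, `"paused"`), the function names and the threshold parameters are assumptions, not values taken from the Pingdom or NewRelic APIs or from the Warboard source.

```python
# Assumed status strings for a Pingdom-style check.
PINGDOM_COLOURS = {"down": "red", "paused": "blue", "up": "green"}


def pingdom_colour(status):
    """Map a check status to the colour shown on the board."""
    return PINGDOM_COLOURS[status]


def newrelic_colour(value, warning, critical, reporting=True):
    """Map a server's headline metric to a colour.

    value, warning and critical are percentages; reporting is False
    when the server has stopped sending data.
    """
    if not reporting:
        return "blue"
    if value >= critical:
        return "red"
    if value >= warning:
        return "amber"
    return "green"
```

Checking `reporting` first means a server that has gone quiet is always shown in blue, even if its last known metric was above a threshold.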
We also have a column for Sirportly, our ticketing system. This shows how many tickets each team member has. Below this is an overview of events in our Google Calendar where we can see upcoming events and scheduled maintenance.
The Warboard is written entirely in Python: the backend is plain Python and the frontend uses Flask with the Jinja2 templating engine. We’ve made the Warboard public on GitHub, so feel free to contribute, modify it and use it in your own environment.
If you’d like us to monitor your infrastructure, be sure to take a look at our maintenance packages and get in touch.