How to debug a fire?
Take a deep breath. The calmer you are, the better you can focus on the real problem. Oxygen is necessary for making good decisions.
I don't mean to say it's time to start meditating; I just mean "Keep calm and nerd on".
Before we begin, we need to understand:
How does POWR stay on?
- We registered our domain with Gandi.net, where we have configured our nameservers to point to Cloudflare.
- Cloudflare is our DNS provider, meaning all our A, CNAME, SPF, and MX records, and everything else that has to do with DNS, are configured within Cloudflare.
- We also use Cloudflare as our CDN and for some security features.
- Cloudflare is the first to receive HTTP requests coming to powr.io, and it forwards them to Heroku when needed (that is, when the response is not cached on the CDN).
- Heroku is where we host our Rails server (among other things).
- We have also configured Heroku to auto-scale.
- Amazon RDS Postgres is what we use for our database
- Compose.com is what we use for Redis
- We use Sidekiq for background jobs
- We use Heroku Scheduler for cron jobs.
- We use Sparkpost for our transactional emails (and some marketing emails)
- Most of our marketing emails are triggered within Hubspot.
- Braintree processes our internal payments (for Pro/Subscriptions)
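To see which layer of this chain actually answered a request, one option is to look at the response headers. The sketch below is only an illustration, not part of our tooling: `served_by` and `check` are hypothetical helper names, and it assumes Cloudflare's standard `cf-cache-status` and `cf-ray` response headers are present on proxied responses.

```python
from urllib.request import Request, urlopen

# cf-cache-status values that mean Cloudflare answered from its cache.
CACHE_SERVED = {"HIT", "STALE", "REVALIDATED", "UPDATING"}


def served_by(headers):
    """Guess which layer answered, given a dict of response headers."""
    h = {k.lower(): v for k, v in headers.items()}
    status = h.get("cf-cache-status", "").upper()
    if status in CACHE_SERVED:
        return "cloudflare-cache"
    if "cf-ray" in h or h.get("server", "").lower() == "cloudflare":
        # Went through Cloudflare, but the origin (Heroku) produced the body.
        return "origin-via-cloudflare"
    return "not-via-cloudflare"


def check(url="https://powr.io/"):
    """Send one HEAD request and report (HTTP status, serving layer)."""
    req = Request(url, method="HEAD", headers={"User-Agent": "fire-debug-sketch"})
    with urlopen(req, timeout=10) as resp:
        return resp.status, served_by(dict(resp.headers.items()))
```

If a page comes back as `origin-via-cloudflare` and is slow or failing, the problem is more likely on the Heroku side (or behind it) than in DNS or the CDN.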
Here are a few things you should do:
Identify the problem
- Is this happening on one app type or on several?
- Can you reproduce the issue on production?
- Was there a recent production deploy, or a change to the data, that may have caused this issue?
- Is this caused by one or a few apps (e.g. a few apps getting too many responses, a spammy user trying to steal bitcoins from random users, or something of that nature)?
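For the last question, a quick tally of traffic per app usually shows whether a handful of apps dominate. This is a minimal sketch under stated assumptions: the event records and their `app_id` field are hypothetical, so adapt it to whatever you can export from logs or the database.

```python
from collections import Counter


def top_apps(events, n=3):
    """Return (app_id, count, share_of_total) for the n busiest apps.

    `events` is any iterable of dicts with an "app_id" key
    (a hypothetical shape, used here only for illustration).
    """
    counts = Counter(e["app_id"] for e in events)
    total = sum(counts.values())
    return [(app_id, c, c / total) for app_id, c in counts.most_common(n)]
```

If one app accounts for most of the traffic, you are probably dealing with a single noisy or abusive app rather than a platform-wide problem.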
How to get started?
Note: YOU is highlighted for a reason. That's because YOU can do this, and it's easier than YOU think.
- Start with Heroku: take a look at the metrics there. Does anything look unusual?

- Move on to NewRelic -> Summary
- Toggle between the last 30 mins, 60 mins, 3 hours, etc.
- Take a look at web transaction time, throughput, and error rate
- The more you familiarize yourself with NewRelic, the easier it gets. It's built by developers for developers like YOU.

- If you notice transaction time > 200ms, throughput climbing, and/or error rates above 2.5%, we are about to get into trouble.
- If you see all, most, or some of the above happening, it's time for YOU to be a firefighter.
- Try to understand what's happening, what those graphs mean, and what is causing those errors. A few tabs/sidebar menus that come in handy for understanding this are Transactions and Errors (on the left-hand side).
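The warning signs above can be written down as a small check. This is just a sketch to make the thresholds concrete: the `Snapshot` type and `warning_signs` function are hypothetical, and the numbers are ones you read off the NewRelic Summary page by hand (it does not call any NewRelic API).

```python
from dataclasses import dataclass

# Thresholds taken from the checklist above.
MAX_TRANSACTION_MS = 200.0
MAX_ERROR_RATE_PCT = 2.5


@dataclass
class Snapshot:
    """Values read manually from the NewRelic Summary page."""
    transaction_ms: float
    throughput_rpm: float
    error_rate_pct: float


def warning_signs(current, previous):
    """List which of the three warning signs are present."""
    signs = []
    if current.transaction_ms > MAX_TRANSACTION_MS:
        signs.append("slow transactions")
    if current.throughput_rpm > previous.throughput_rpm:
        signs.append("throughput climbing")
    if current.error_rate_pct > MAX_ERROR_RATE_PCT:
        signs.append("high error rate")
    return signs


earlier = Snapshot(transaction_ms=120, throughput_rpm=900, error_rate_pct=0.4)
now = Snapshot(transaction_ms=350, throughput_rpm=1500, error_rate_pct=3.1)
print(warning_signs(now, earlier))
```

Any non-empty result means it is worth opening the Transactions and Errors tabs to find out what is behind it.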