How to Set Up Outage Alerts for Any Service
A practical guide to setting up outage alerts for the SaaS services your business depends on. Covers alert types, monitoring methods, routing, and how to avoid alert fatigue.
You find out Slack is down because a coworker texts you. You discover your payment processor is broken because a customer emails your personal address. You learn your analytics platform had a four-hour outage last night by reading about it on Twitter the next morning.
If this sounds familiar, you do not have outage alerts set up properly. And you are not alone. Most teams discover vendor outages through their symptoms rather than through proactive notification. That means lost time, lost revenue, and a scramble that did not need to happen.
This guide walks through how to set up outage alerts for any service your business depends on. We will cover the types of alerts available, what to monitor, how to route alerts to the right people, and how to keep alert fatigue from making the whole system useless.
Why Outage Alerts Matter
The gap between when a vendor goes down and when your team finds out is where damage happens. Customers hit errors and leave. Orders fail silently. Internal workflows stall while everyone assumes things are fine.
Automated outage alerts close that gap. Instead of discovering problems through their consequences, you get notified within minutes. That gives you time to activate workarounds, communicate with customers, and start your vendor outage response playbook before the impact compounds.
According to PagerDuty's incident management documentation, the single biggest factor in reducing mean time to resolution is reducing mean time to detection. Alerts are how you get detection time close to zero.
Types of Outage Alerts
Not all alerts are created equal. The right type depends on who needs to know, how urgently they need to know, and what action they need to take.
Email Alerts
Email is the most common alert channel and the easiest to set up. Most vendor status pages offer email subscriptions. You provide your email address, and you receive a message whenever the vendor posts a status update.
Best for: Non-urgent awareness, record-keeping, teams that need a paper trail of vendor incidents.
Limitations: Email is slow. Most people do not check email in real time. If your checkout flow is down at 2 AM, an email alert will sit unread until morning.
Slack and Teams Messages
Chat-based alerts deliver notifications directly into the channels where your team is already working. Many monitoring tools integrate with Slack and Microsoft Teams natively.
Best for: Teams that live in chat during working hours. Alerts that need quick acknowledgment but not necessarily an immediate phone call.
Limitations: If Slack itself is the service that goes down, your Slack-based alerts are useless. Always have a backup channel.
SMS and Phone Alerts
SMS and phone calls cut through the noise. They reach people who are away from their desk, asleep, or not actively watching a screen.
Best for: Critical services where downtime directly affects revenue or customers. On-call engineers who need to respond outside business hours.
Limitations: SMS alerts are interruptive by design. Overusing them leads to alert fatigue faster than any other channel.
Webhook Alerts
Webhooks send a structured HTTP request to a URL you control whenever an event occurs. This is the most flexible alert type because you can build any downstream behavior you want.
Best for: Triggering automated responses, feeding data into custom dashboards, integrating with incident management platforms like PagerDuty or Opsgenie.
Limitations: Requires technical setup. Someone needs to build and maintain the endpoint that receives the webhook.
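As a rough illustration of what "build and maintain the endpoint" involves, here is a minimal sketch of a webhook receiver using only Python's standard library. The payload fields (`vendor`, `status`, `url`) and the port are assumptions made for this example; real payloads differ by provider, so check the provider's documentation before relying on any field name.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_alert(body: bytes) -> dict:
    """Extract the fields we care about from a JSON webhook body.

    The field names are hypothetical; adapt them to your provider.
    """
    data = json.loads(body)
    return {
        "vendor": data.get("vendor", "unknown"),
        "status": data.get("status", "unknown"),
        "url": data.get("url", ""),
    }


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Body was not valid JSON, or not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        # Placeholder for downstream behavior: page someone, update a dashboard.
        print(f"{alert['vendor']} is {alert['status']} {alert['url']}")
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the receiver on localhost. Blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), AlertHandler).serve_forever()
```

Calling `serve()` starts the listener. A production endpoint would also need authentication of the sender and HTTPS, which this sketch leaves out.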
What to Monitor
Setting up the right alert channels is only half the equation. You also need to decide what you are monitoring.
Vendor Status Pages and RSS Feeds
Every major SaaS vendor publishes a status page. Most of these pages offer RSS feeds or email subscriptions for updates. Subscribing to these is the lowest-effort way to get outage alerts.
The downside is that vendor status pages are self-reported. Vendors control what appears on their status page and when. Some vendors are slow to acknowledge issues. Others define "outage" narrowly enough that degraded performance does not count. Our guide on how to check if a service is down covers the gap between vendor-reported status and actual status in more detail.
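If you would rather poll a status feed yourself than rely on email subscriptions, the following sketch shows one way to read it. It assumes the feed is RSS 2.0 with `<item>` elements containing `<title>`, `<link>`, and `<pubDate>`; some vendors publish Atom instead, which uses different element names. The feed URL is whatever the vendor's status page advertises.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_status_feed(xml_data) -> list[dict]:
    """Return title, link, and publication date for each RSS 2.0 item."""
    root = ET.fromstring(xml_data)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


def fetch_status_feed(feed_url: str, timeout: float = 10.0) -> list[dict]:
    """Download and parse a feed. The URL is supplied by the caller."""
    with urllib.request.urlopen(feed_url, timeout=timeout) as response:
        return parse_status_feed(response.read())
```

Running `fetch_status_feed` on a schedule and comparing the newest item against the last one you saw is enough to turn a feed into a basic alert source.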
Third-Party Monitoring Tools
Third-party monitoring gives you an independent view of vendor availability. Tools like Is That Down check vendor status from the outside and alert you based on what they observe, not what the vendor reports.
This catches situations where a vendor's service is clearly degraded but their status page still shows green. It also gives you a head start on alerts, since third-party tools often detect issues before the vendor acknowledges them.
API Endpoint Monitoring
If your application integrates with a vendor's API, you can monitor those specific endpoints directly. This tells you not just whether the vendor is up in general, but whether the specific functionality you depend on is working.
For example, you might use a payment processor for both charges and refunds. The vendor's status page might show "operational" because charges are working, but the refund endpoint could be returning errors. Monitoring the specific endpoints you use catches this.
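A per-endpoint check along these lines might look like the sketch below. It is deliberately simplified: it issues an unauthenticated GET and treats any HTTP error status as unhealthy, whereas a real payment API would need credentials and a safe, read-only request. The endpoint names and URLs are placeholders.

```python
import urllib.error
import urllib.request


def check_endpoint(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (healthy, detail) for a single GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx or 5xx status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers DNS failures, refused connections, and timeouts.
        return False, f"unreachable: {exc}"


def check_all(endpoints: dict[str, str], check=check_endpoint) -> dict[str, tuple[bool, str]]:
    """Check every named endpoint and report each result separately."""
    return {name: check(url) for name, url in endpoints.items()}
```

Reporting each endpoint by name is the point of the design: a failing refund endpoint shows up on its own line even when the charge endpoint is healthy.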
Synthetic Checks
Synthetic monitoring runs scripted transactions against a service at regular intervals. Instead of just pinging an endpoint, a synthetic check walks through an actual workflow: log in, perform an action, verify the result.
This is the most thorough form of monitoring because it catches issues that a simple health check might miss. A login page might load fine (health check passes) while the authentication API behind it is returning errors (synthetic check fails).
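The structure of a synthetic check can be sketched as an ordered list of steps that stops at the first failure. The step functions themselves (loading a page, authenticating, verifying a result) are left to the caller and are not shown; the names used in the usage example are illustrative only.

```python
from typing import Callable


def run_synthetic_check(steps: list[tuple[str, Callable[[], bool]]]) -> tuple[bool, str]:
    """Run named steps in order and stop at the first failure.

    Each step is a zero-argument callable that returns True on success.
    An exception inside a step counts as a failure of that step.
    """
    for name, step in steps:
        try:
            ok = step()
        except Exception as exc:
            return False, f"{name} raised {type(exc).__name__}"
        if not ok:
            return False, f"{name} failed"
    return True, "all steps passed"
```

With steps such as `("load login page", ...)` followed by `("authenticate", ...)`, a result of `(False, "authenticate failed")` tells you the page loaded but the API behind it did not respond correctly.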
Routing: Who Gets Which Alerts
Sending every alert to everyone is a fast path to alert fatigue. The person who manages your social media tool does not need a 3 AM phone call when your CI/CD pipeline goes down. The engineer on call does not need an email every time your marketing analytics tool has a minor blip.
Build an Alert Routing Matrix
Map each vendor to the team or individual who needs to know when it goes down, and which channel they should receive the alert on.
Example Alert Routing Matrix
| Vendor | Severity | Alert Channel | Primary Contact | Escalation |
|--------|----------|---------------|-----------------|------------|
| Stripe | Critical | SMS + Slack #incidents | On-call engineer | Engineering manager (15 min) |
| Slack | High | SMS + Email | IT lead | CTO (30 min) |
| SendGrid | Medium | Slack #ops | Marketing ops | Marketing director (1 hr) |
| Google Analytics | Low | Email | Marketing team | None |
The routing matrix makes alert setup systematic rather than ad hoc. It forces you to think about severity levels before an incident, not during one.
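If you want the matrix to drive alerting rather than sit in a document, one option is to keep it as data. The sketch below encodes the example rows above; the `Route` class, the channel string format (`"slack:#incidents"`), and the function name are assumptions for illustration, not a standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    severity: str
    channels: tuple[str, ...]
    primary: str
    escalation: Optional[str]  # None means no escalation
    escalate_after_min: Optional[int]


# Values copied from the example routing matrix.
ROUTING_MATRIX = {
    "Stripe": Route("Critical", ("sms", "slack:#incidents"), "On-call engineer", "Engineering manager", 15),
    "Slack": Route("High", ("sms", "email"), "IT lead", "CTO", 30),
    "SendGrid": Route("Medium", ("slack:#ops",), "Marketing ops", "Marketing director", 60),
    "Google Analytics": Route("Low", ("email",), "Marketing team", None, None),
}


def route_for(vendor: str) -> Route:
    """Look up the route for a vendor, failing loudly if it is unmapped."""
    try:
        return ROUTING_MATRIX[vendor]
    except KeyError:
        raise KeyError(f"No alert route defined for {vendor!r}") from None
```

Raising on an unmapped vendor is a deliberate choice here: a missing row is better discovered during setup than silently ignored during an incident.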
Severity-Based Routing
Not every outage deserves the same response. Define severity levels based on business impact, not vendor importance.
Critical: Core revenue or customer-facing functionality is broken. Customers cannot complete purchases, access their accounts, or use primary features. Route to SMS and phone.
High: Important functionality is degraded or a widely used internal tool is down. Route to Slack and SMS.
Medium: Non-critical functionality is affected or a secondary tool is down. Route to Slack.
Low: Minor degradation with minimal user impact. Route to email or a low-priority Slack channel.
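The four levels above translate naturally into a lookup table. The sketch below adds one assumption that is not part of the definitions: if the affected vendor is itself one of the alert channels (for example, Slack is down), that channel is dropped and email is used as the fallback. Pick whatever fallback suits your team.

```python
SEVERITY_CHANNELS = {
    "critical": ["sms", "phone"],
    "high": ["slack", "sms"],
    "medium": ["slack"],
    "low": ["email"],
}


def channels_for(severity: str, affected_vendor: str = "") -> list[str]:
    """Return alert channels for a severity level.

    If the affected vendor is itself one of the channels, drop it and
    fall back to email (an assumed default) so the alert still lands.
    """
    channels = list(SEVERITY_CHANNELS[severity.lower()])
    down = affected_vendor.lower()
    usable = [c for c in channels if c != down]
    if len(usable) < len(channels) and "email" not in usable:
        usable.append("email")
    return usable
```

For instance, `channels_for("high", "Slack")` yields SMS plus email rather than trying to post to a service that is unavailable.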
Preventing Alert Fatigue
Alert fatigue is real and it kills monitoring programs. When people receive too many alerts, or too many alerts that do not require action, they start ignoring all of them. The critical alert gets buried in a stream of noise.
Opsgenie's documentation on alert fatigue reports that teams receiving more than 100 alerts per week experience significantly slower response times. The alerts do not help if nobody is reading them.
Set Meaningful Thresholds
Do not alert on every minor status change. A vendor posting "investigating increased error rates" and then resolving it five minutes later does not need to wake anyone up. Set thresholds that match your business impact.
For API monitoring, alert on sustained errors rather than individual failures. A single timeout is noise. Five consecutive timeouts are a signal.
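The sustained-error rule can be sketched as a small counter that resets on any success. The class name is made up for this example, and the default threshold of five simply mirrors the figure above; tune it to your check interval.

```python
class ConsecutiveFailureAlert:
    """Fire once when failures in a row reach a threshold."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.streak = 0

    def record(self, success: bool) -> bool:
        """Record one check result; return True only when the alert should fire."""
        if success:
            self.streak = 0
            return False
        self.streak += 1
        # Fire exactly once, at the moment the streak hits the threshold.
        return self.streak == self.threshold
```

Because `record` returns True only at the exact threshold, a long outage produces one alert instead of one per failed check.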
Deduplicate and Group
If a vendor is experiencing a prolonged outage, you do not need a separate alert every time they post an update. Group related alerts into a single incident notification with updates, rather than firing a new alert for each status change.
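One simple way to group updates is to key them by an incident identifier and notify only the first time each identifier appears. The dictionary keys `incident_id` and `message` are hypothetical; use whatever identifier your status source provides.

```python
from collections import defaultdict


def group_updates(updates: list[dict]) -> dict[str, list[str]]:
    """Group status updates by incident so each incident has one thread.

    Each update is a dict with hypothetical keys 'incident_id' and 'message'.
    """
    incidents: dict[str, list[str]] = defaultdict(list)
    for update in updates:
        incidents[update["incident_id"]].append(update["message"])
    return dict(incidents)


def should_notify(incident_id: str, already_notified: set[str]) -> bool:
    """Return True the first time an incident is seen, then remember it."""
    if incident_id in already_notified:
        return False
    already_notified.add(incident_id)
    return True
```

Later updates for the same incident can then be appended to the existing notification thread instead of paging anyone again.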
Use Escalation Policies
Start with a low-urgency notification. If the alert is not acknowledged within a defined window, escalate to a more urgent channel or a different person.
For example: send a Slack message first. If nobody acknowledges it within 15 minutes, send an SMS. If nobody acknowledges the SMS within 10 minutes, call the engineering manager. This approach keeps routine alerts quiet while making sure critical issues get human attention.
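The example policy above can be written down as a list of time offsets. In this sketch the phone call starts at 25 minutes because it follows the SMS (sent at 15 minutes) by a further 10 minutes. The function name and action labels are illustrative.

```python
# (minutes since the alert fired, action) taken from the example policy.
ESCALATION_STEPS = [
    (0, "slack message"),
    (15, "sms"),
    (25, "call engineering manager"),  # 15 min + 10 min after the SMS
]


def current_step(minutes_unacknowledged: float, acknowledged: bool = False):
    """Return the most urgent action reached so far, or None once acknowledged."""
    if acknowledged:
        return None
    action = None
    for starts_at, step in ESCALATION_STEPS:
        if minutes_unacknowledged >= starts_at:
            action = step
    return action
```

Keeping the policy as data makes it easy to review and adjust the windows without touching the logic that applies them.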
Review and Tune Regularly
Every month, review your alert history. Which alerts led to action? Which were ignored? Which fired repeatedly for non-issues? Tune your thresholds and routing based on actual patterns, not assumptions.
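If your alert history can be exported, a short script can surface candidates for tuning. The sketch below assumes each history entry is a dict with hypothetical keys `alert` and `action_taken`; the cutoffs (at least five fires, at most 20% acted on) are arbitrary starting points rather than recommendations.

```python
from collections import Counter


def noisy_alerts(history: list[dict], max_action_rate: float = 0.2, min_fires: int = 5) -> list[str]:
    """Return alert names that fired often but rarely led to action."""
    fired = Counter(entry["alert"] for entry in history)
    acted = Counter(entry["alert"] for entry in history if entry["action_taken"])
    return sorted(
        name
        for name, count in fired.items()
        if count >= min_fires and acted[name] / count <= max_action_rate
    )
```

The output is a list to discuss in the monthly review, not an automatic delete list: an alert that rarely needs action may still be worth keeping at a lower severity.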
Testing Your Alerts
An alert system you have never tested is an alert system you cannot trust. Before you rely on your outage alerts in production, verify that they work.
Run a Fire Drill
Pick a vendor and simulate what happens when it goes down. Does the alert fire? Does it reach the right person? Does that person know what to do next? Walk through the entire chain from detection to response.
Verify Every Channel
Send a test alert through each channel: email, Slack, SMS, webhook. Confirm that delivery works, formatting is correct, and the message contains enough context to be useful. An SMS that says "Alert: incident detected" with no other context is not actionable.
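One way to guard against context-free messages is to refuse to format an alert that lacks the basics. This sketch assumes an alert is a dict and treats vendor, severity, and a details link as required; the key names and message layout are illustrative.

```python
REQUIRED_FIELDS = ("vendor", "severity", "details_url")


def format_alert(alert: dict) -> str:
    """Build a one-line alert, refusing to send one that lacks context."""
    missing = [field for field in REQUIRED_FIELDS if not alert.get(field)]
    if missing:
        raise ValueError(f"Alert is missing: {', '.join(missing)}")
    summary = alert.get("summary", "incident detected")
    return f"[{alert['severity'].upper()}] {alert['vendor']}: {summary} ({alert['details_url']})"
```

Running every test alert through the same formatter that production alerts use means a channel test also verifies the message content.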
Test During Off-Hours
If your alerts are supposed to reach people at 2 AM, test them at 2 AM. Do Not Disturb settings, phone silencing, and overnight email batching can all prevent alerts from reaching their target. Find out now, not during a real incident.
Document Your Alert Configuration
Keep a record of what is monitored, which channels are configured, who receives which alerts, and when the configuration was last tested. This documentation is invaluable when someone leaves the team or when you need to onboard a new service.
Alert Testing Checklist
- [ ] All critical vendors have at least two alert channels configured
- [ ] SMS alerts reach the on-call person during off-hours
- [ ] Slack alerts post to the correct channels
- [ ] Webhook endpoints are responding and processing payloads
- [ ] Escalation policies trigger correctly when alerts go unacknowledged
- [ ] Alert messages contain vendor name, severity, and a link to more details
- [ ] At least one alert channel works even if Slack/email is the service that is down
Putting It All Together
A solid outage alert setup has four layers: monitoring (what you watch), channels (how you get notified), routing (who gets notified), and testing (making sure it all works).
Start with the vendors that matter most. Identify your top five critical dependencies using a SaaS dependency map and set up alerts for those first. You can expand coverage over time, but getting the critical services covered immediately is what matters.
Pair your alerts with a vendor monitoring strategy and a documented response process. Alerts without a response plan just give you faster notification of problems you are not prepared to handle. For your own infrastructure, uptime monitoring gives you the same early-warning capability.
The goal is simple: when a service your business depends on goes down, the right person knows about it within minutes and knows exactly what to do.
References
- PagerDuty - Incident Management Guide - Covers alert routing, escalation policies, and reducing time to detection.
- Opsgenie - Understanding Alert Fatigue - Research and strategies for managing alert volume and preventing desensitization.
Testing your alerts once is not enough. Teams change, tools change, and vendor integrations change. Schedule a quarterly alert audit to make sure your notifications still reach the right people through the right channels.