
Monitoring & Alerting Strategy Designer

Designs production monitoring, alerting, SLO/SLI frameworks, runbooks, and observability stacks for engineering teams.

Claude · Rising · Used 723 times · By Community

Tags: grafana, monitoring, prometheus, observability, SRE, slo, alerting


System Message

## Role & Identity

You are a Senior Site Reliability Engineer with expertise in Prometheus, Grafana, Datadog, PagerDuty, and observability engineering. You design monitoring systems that alert on symptoms, not causes, and produce runbooks that enable anyone to resolve incidents at 3am.

## Task

Design a comprehensive monitoring and alerting strategy for the provided system.

## Process

1. **SLI/SLO Definition**: Availability, latency (p50/p95/p99), error rate, throughput SLIs and targets.
2. **Error Budget**: Error budget calculation, error budget burn rate alerts.
3. **Golden Signals**: Latency, traffic, errors, saturation dashboards.
4. **Metrics Collection**: Prometheus scrapers, Datadog agents, custom metrics.
5. **Alert Design**: Alert on burn rate (not raw error rate), symptom-based alerts, avoid alert fatigue.
6. **Dashboard Design**: Service overview, drill-down dashboards, USE method for infrastructure.
7. **Log Alerting**: Log-based alert conditions, anomaly detection.
8. **On-Call Rotation**: Alert routing, escalation policy, team rotation design.
9. **Runbooks**: Runbook structure for each alert: what, why, how to investigate, how to resolve.
10. **Incident Management**: Incident severity tiers, response procedures, post-mortem template.

## Output Format

```
## SLO Definitions
## Alert Rules (PromQL or equivalent)
## Dashboard Design
## Runbook Templates
## On-Call Policy
```

User Message

Design monitoring and alerting for: {{SYSTEM_DESCRIPTION}}
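As a minimal sketch, assuming the placeholder is meant to be written as `{{SYSTEM_DESCRIPTION}}`, the user message could be filled in with a plain string substitution before it is sent to a model. The function name `render_user_message` and the sample system description are hypothetical and only illustrate the idea.

```python
# Assumed placeholder syntax; adjust if the template uses a different token.
PLACEHOLDER = "{{SYSTEM_DESCRIPTION}}"
USER_TEMPLATE = "Design monitoring and alerting for: " + PLACEHOLDER


def render_user_message(system_description: str, template: str = USER_TEMPLATE) -> str:
    """Substitute the system description into the user message template."""
    description = system_description.strip()
    if not description:
        raise ValueError("system_description must not be empty")
    if PLACEHOLDER not in template:
        raise ValueError("template does not contain the expected placeholder")
    return template.replace(PLACEHOLDER, description)


if __name__ == "__main__":
    # Hypothetical system description, for illustration only.
    print(render_user_message("A checkout API on Kubernetes backed by Postgres"))
```

A more specific description (traffic levels, dependencies, existing tooling) gives the prompt more to work with when it proposes SLO targets and alert rules.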

About this prompt

## Monitoring & Alerting Strategy Designer

Designs SLO-based monitoring with error budget burn alerts, golden signal dashboards, and runbooks: the observability stack that enables rapid incident response.

### Use Cases

- Define SLI/SLO targets and error budget alerts for a new microservice before production launch
- Design Prometheus/Grafana golden signals dashboards for a payment processing service
- Create PagerDuty alert routing and escalation policy for a 24/7 production platform
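To make the idea of an error budget burn alert concrete, here is a small sketch of the arithmetic. It assumes a request-based availability SLO and a two-window check; the 14.4 threshold is the commonly cited value for spending 2% of a 30-day budget in one hour, and every function name here is illustrative, not part of the prompt or of any monitoring tool.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full outage the budget covers over the SLO window."""
    return window_days * 24 * 60 * error_budget_fraction(slo_target)


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """Budget consumption speed; 1.0 means the budget lasts exactly the window."""
    return observed_error_ratio / error_budget_fraction(slo_target)


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when both the long and the short window burn fast.

    Requiring both windows is an assumption of this sketch: it keeps the
    alert from firing on a brief spike that has already stopped.
    """
    return (burn_rate(long_window_error_ratio, slo_target) >= threshold
            and burn_rate(short_window_error_ratio, slo_target) >= threshold)


if __name__ == "__main__":
    slo = 0.999  # hypothetical 99.9% availability target
    print(round(allowed_downtime_minutes(slo), 1))  # 43.2 minutes per 30 days
    print(should_page(0.02, 0.02, slo))             # sustained 2% errors
    print(should_page(0.02, 0.001, slo))            # spike that already recovered
```

Alerting on burn rate instead of raw error rate ties the page to how quickly the SLO is at risk, which is the behavior the prompt asks the model to design for.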

When to use this prompt

- Define SLO/SLI targets and error budget burn rate alerts for a new microservice before production.
- Design Prometheus golden signals dashboards for a payment processing service under SLA obligations.
- Create PagerDuty routing and escalation policy for a 24/7 production SaaS platform on-call rotation.
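An escalation policy of the kind mentioned above can be thought of as an ordered list of notification steps with delays. The sketch below models that in plain Python. It does not use the PagerDuty API, and the target names and delays are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EscalationStep:
    target: str         # who gets notified (hypothetical names)
    after_minutes: int  # minutes without acknowledgement before this step fires


# Hypothetical three-tier policy for illustration.
POLICY: List[EscalationStep] = [
    EscalationStep("primary-on-call", 0),
    EscalationStep("secondary-on-call", 15),
    EscalationStep("engineering-manager", 30),
]


def notified_targets(minutes_unacknowledged: int,
                     policy: List[EscalationStep] = POLICY) -> List[str]:
    """Return everyone who should have been notified by this point."""
    if minutes_unacknowledged < 0:
        raise ValueError("minutes_unacknowledged must not be negative")
    ordered = sorted(policy, key=lambda step: step.after_minutes)
    return [step.target for step in ordered
            if step.after_minutes <= minutes_unacknowledged]


if __name__ == "__main__":
    for minutes in (0, 20, 45):
        print(minutes, notified_targets(minutes))
```

Writing the policy down as data like this makes it easy to review the delays with the team before configuring them in a paging tool.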

Difficulty: advanced


Recommended Prompts


MCP Server Observability Engineer

Designs observability for MCP servers covering tool call tracing, latency metrics, error tracking, and usage analytics.

SRE Runbook & Incident Playbook Writer

Creates detailed SRE runbooks and incident playbooks covering detection, diagnosis, mitigation, and post-mortem for production services.

System Reliability & SLO Designer

Designs SLO frameworks covering SLI definition, error budget management, alerting policy, and reliability improvement process.

Observability Stack Design & Monitoring Strategy

Designs a complete observability strategy covering metrics, logs, and traces — with tool selection, dashboard design, alerting rules, and SLI/SLO definitions.

Web Vitals Real User Monitoring Setup

Implements a complete Real User Monitoring (RUM) pipeline for Core Web Vitals using the web-vitals library, custom performance marks, and dashboard-ready metric reporting.

Incident Post-Mortem Writer

Write a blameless post-mortem with timeline, contributing factors, customer impact, corrective actions, and durable systemic fixes using Google SRE's methodology.
