PagerDuty Advance AI - Part I

In critical operations, the efficiency and accuracy of information gathering and analysis are paramount to ensure successful outcomes and timely decision-making. Currently, the process of collecting, analyzing, and disseminating information during these operations is cumbersome and time-consuming.

Enterprise AI, AI Agents

PagerDuty

2023-2026


The problem

Responders often need to review hundreds of recent changes and many pages of log files, which can take minutes to hours depending on the complexity of the incident.
Incident responders often get no guidance on what to do next to fix the issue, and no reminders that status updates need to go out to internal and external stakeholders at different levels of detail.
The Incident Management process is highly labor-intensive, yet our buyers and executives aim to reduce labor costs and the excessive percentage of tasks currently handled manually within Incident Management.


Context

When this initiative started, it was primarily an engineering-led effort, focused on prototyping ways to integrate GenAI into PagerDuty, assessing which LLMs to use, and exploring their ability to learn from our data sources and deliver relevant outputs.

Upon joining in June 2023, I collaborated with the team to identify where GenAI could add the greatest value, which interface would be most impactful (Slack vs. Web), and how the solution should function. Given that 60% of our clients used the Slack extension and that Slack offers strong tooling for bot integration, we agreed to prioritize Slack.
I initiated our shared Figma file, which helped structure ongoing work and set foundational design standards.


My initial focus was on shaping the bot’s messaging style, visual identity, and answer evaluation mechanisms. To build an end-to-end prototype, I connected PagerDuty capabilities using a script developed by the team, then enhanced it with design improvements and additional flows. Throughout, I partnered closely with our Principal PM for feedback, coordinated with Staff Engineers for technical alignment, and collaborated with our Frontend Engineer on the Slack implementation.

By mapping capabilities to the user journey, my design work provided critical context for team discussions: clarifying how and when the bot could serve as a “buddy” to responders. This journey-centric approach guided implementation and sparked conversations about optimal feature integration.


Solution

  1. Map out the jobs to be done (JTBD) of the different user types involved: Scribe, Communication Manager, Incident Commander, SREs and DevOps.
  2. Understand how they differ and how we could allow them to work as a team during critical incidents.


Challenges

  • Looking at existing analytics data to understand usage and identify trends.
  • Interviewing clients to understand what their main Incident Management tool was (chat vs web).
  • Concept testing to understand trade-offs and where users were willing to do more work (managers vs responders).
  • Choosing a platform/surface to start.


Process

User interviews to understand what companies needed to be able to start exploring solutions.

What info is most relevant for responders to start work on an incident?
What platforms do they use?
What type of interactions are they used to?
What tools do they use for diagnostics?
How do they communicate with the team?




Writing a script of an end-to-end incident resolution process with a team of 4
(Incident Commander, SRE, Customer Liaison and Scribe)

Goal: create an impactful story to sell this opportunity internally and to our customers.


In November 2023, we set out to produce a sizzle video for Copilot (PagerDuty Advance) to highlight PagerDuty’s GenAI investment at the AWS conference. Despite a tight four-week timeline and minimal initial specifications, I jumped in with early design explorations for integrating Copilot into the web platform—an extension of concepts I’d previously piloted for event-to-resolution card testing.

Collaborating closely with our Principal PM during the first week, I developed a preview featuring content inspired by the Southwest Airlines meltdown. Over the following three weeks, I worked iteratively with the Brand and Marketing team and an external agency, refining the designs based on their direction. I proposed both dark and light UI chat concepts, ultimately merging elements from both by combining dark conversation bubbles with PagerDuty’s signature light interface. Midway through the project, a meeting with the agency sparked a discussion on moving PagerDuty’s UI toward dark mode.


I delivered early dark mode concepts for the Incidents and Incident details pages, creating a new grayscale palette to define key page sections without altering existing structures. I spent the next two days refining these pages, enhancing details such as charts on the Incidents page to visualize event impact on team metrics. I also updated the prototype in real time to reflect script changes and adapted UI interactions, including preparing new states for pills as the narrative evolved.

Final Video


Launching PagerDuty Advance EA

We enabled PagerDuty Advance for EA customers in November 2023.
Internally, Advance was enabled for every team under Product Development and started being tested by our Major Incidents team.


Results

From 0 to 68 Enterprise customers using PD Advance
Continuous discovery underway to improve existing capabilities
Expanding the number of surfaces (MS Teams and Web)


More details about this project:

Building Agent Analytics Dashboard (Read Blog Article)

Worked with early adopters and PMs to understand how we could better communicate the value AI Agents could bring to their workflows.


Creating AI-UX Design Patterns (Read Blog Article)

As Design Lead for PagerDuty Advance, I was responsible for scaling AI in our products, mentoring other designers and teams, and helping to improve design guidelines and the design system component for AI in our web interface.