
Most companies talk about deploying AI. Spotify actually did it, at a scale most engineering teams cannot imagine, and the results changed how their best developers work every single day.
Spotify deployed multiple AI agent systems including their internal coding agent platform (Honk), a multi-agent advertising platform (Ads AI), and an agentic developer portal (Backstage). Together, these systems handle over 50% of all pull requests, reduce manual code work dramatically, and power complex business decisions across channels. The architecture uses specialized agents working in parallel, shared context layers, and LLM orchestration built on top of existing infrastructure.
Key Takeaways
Spotify's Honk system has processed more than 1,500 AI-generated pull requests, and automated systems have generated roughly half of all PRs merged since mid-2024
Their multi-agent advertising platform (Ads AI) uses Google's Agent Development Kit and Vertex AI to run specialized agents in parallel
Spotify replaced deterministic code migration scripts with LLM prompt-based agents inside their Fleet Management system
Senior engineers at Spotify stopped writing code manually in December 2024 and supervise AI output instead
The Backstage developer portal is evolving from a human-facing tool to an agent-first platform using MCP connections
Enterprise AI agent deployment at this scale requires role-specific agents, shared context, and strict guardrails, not one giant AI model
Any company with duplicated business logic across channels is a prime candidate for the same agentic approach
Why Spotify's AI Agent Story Matters for Enterprise Teams
Spotify is not a small startup experimenting with AI chatbots. They are a company with hundreds of millions of users, thousands of software repositories, and an engineering team that moves fast. When they say AI agents changed how they build software, that means something real.
The honest truth is: most enterprise AI deployments fail not because the AI is bad, but because companies bolt AI onto existing workflows without rethinking the architecture. Spotify did the opposite. They rebuilt the decision layer, the workflow layer, and the developer tooling layer. All three. That is the actual lesson here.
For developers and engineering leaders searching for practical guidance on enterprise AI agent deployment, this case study pulls directly from Spotify's own engineering blog and public statements. No speculation. Just what they actually built.
What Is Honk and How Does Spotify Use It for AI Coding Agent Deployment
Honk is Spotify's internal platform that connects AI coding agents directly to their engineering workflow through Slack. An engineer sends a message in Slack describing what they need. The agent reads it, writes or modifies code, opens a pull request, and sends the result back through Slack for human review. The engineer can approve and merge the update before they even get to the office.
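Spotify has not published Honk's internals, so the sketch below is only an illustration of that Slack-to-pull-request loop. The `run_coding_agent` helper is hypothetical, and the Slack Bolt handler stands in for whatever Spotify actually uses to listen for requests.

```python
# Minimal sketch of a Slack-triggered coding agent loop (illustrative only,
# not Spotify's actual Honk implementation).
from slack_bolt import App

app = App(token="xoxb-...", signing_secret="...")  # Slack app credentials

def run_coding_agent(instruction: str, repo: str) -> str:
    """Hypothetical helper: hand the instruction to a coding agent
    (e.g. Claude Code) scoped to one repository, return the PR URL."""
    raise NotImplementedError  # would wrap the agent, git, and PR creation

@app.message("honk:")
def handle_honk_request(message, say):
    # e.g. "honk: fix the flaky retry logic in playlist-service"
    instruction = message["text"].removeprefix("honk:").strip()
    say(f"On it. Working on: {instruction}")
    pr_url = run_coding_agent(instruction, repo="playlist-service")
    # The engineer reviews and merges the PR; the agent never merges on its own.
    say(f"Opened a pull request for review: {pr_url}")

if __name__ == "__main__":
    app.start(port=3000)
```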
This is not a prototype. Since mid-2024, roughly half of all pull requests at Spotify have been generated by automated systems. By late 2025, Honk had processed more than 1,500 merged AI-generated pull requests. Spotify's chief technology officer publicly stated that their most senior developers stopped writing code manually in December 2024. They now review and approve AI-generated code instead.
How Honk Connects to Fleet Management
Before Honk, Spotify used a system called Fleet Management to apply code transformations across all their repositories. Think of it as a way to run the same small code change across thousands of services at once. This worked well for simple tasks like updating dependency versions or swapping out deprecated methods.
The problem was complexity. Writing the transformation scripts required deep specialized knowledge. One automated Maven dependency updater script grew to over 20,000 lines of code just to handle edge cases. Fleet Management was powerful but hard to scale to complex changes.
Spotify solved this by replacing the deterministic transformation scripts with LLM-powered agents. Engineers now write a natural language prompt describing the change they want. The agent reads the prompt, understands the codebase context, and generates the transformation. The rest of Fleet Management including repository targeting, pull request creation, review, and merging stays exactly the same.
This is a clean example of LLM deployment strategy done right: augment an existing system's weakest point with AI, keep everything else stable, and measure results. Spotify did not rewrite Fleet Management. They replaced one step in the pipeline with an agent.
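Spotify has not published what these prompt-based migrations look like, but the shape of the change is straightforward: the transformation logic collapses into a natural language prompt while the surrounding pipeline stays put. A minimal sketch, with hypothetical names throughout:

```python
# Illustrative sketch of swapping a deterministic transformation script for a
# prompt-driven one inside an existing fleet pipeline. All names are made up.
from dataclasses import dataclass

@dataclass
class FleetMigration:
    name: str
    target_repos: str   # repository selector, e.g. a tag or search query
    prompt: str         # natural language description of the change

def run_migration(migration: FleetMigration, llm_transform, fleet) -> None:
    """The surrounding pipeline (targeting, PRs, review, merge) is untouched;
    only the transformation step is delegated to an LLM agent."""
    for repo in fleet.resolve(migration.target_repos):
        diff = llm_transform(repo=repo, instruction=migration.prompt)
        if diff:
            fleet.open_pull_request(repo, diff, title=migration.name)

# What used to be a 20,000-line dependency updater becomes a prompt.
maven_bump = FleetMigration(
    name="Bump internal Maven BOM to 2024.10",
    target_repos="language:java uses:internal-bom",
    prompt=(
        "Update the internal Maven BOM to version 2024.10. Resolve any "
        "dependency conflicts this introduces and keep the build green."
    ),
)
```

The point of the sketch is the boundary: `fleet.resolve` and `fleet.open_pull_request` stand for the existing, unchanged machinery, and only `llm_transform` is new.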
Fleet Management: Before and After AI Agents
| Aspect | Before AI Agents | After AI Agents (Honk) |
|---|---|---|
| Code transformation method | Hand-written scripts (AST, regex) | Natural language prompt to LLM agent |
| Complexity ceiling | Simple, repeatable tasks only | Complex multi-file code migrations |
| Expertise needed | High (specialized script writers) | Lower (engineers write prompts) |
| Pull requests automated | Limited scope | ~50% of all merged PRs |
| Speed | Days per migration | Runs in background automatically |

Spotify's Multi-Agent Architecture for Advertising: A Real-World AI Orchestration Case
The second major AI agent deployment at Spotify is less talked about but arguably more technically ambitious. Their Ads AI platform uses a multi-agent architecture to handle the full media planning workflow for advertisers.
Here is the problem they were solving: Spotify sells ads in multiple ways including direct sales, self-serve, and programmatic. Each of these channels had its own workflow, its own decision logic, and its own automation. On the surface the infrastructure was shared. In practice, the behavior was not. The same core decisions about budget allocation, inventory selection, and reach targets were being re-implemented separately in each channel. Over time they drifted apart and became inconsistent.
The standard engineering fix would be to build a new backend service with a clean state machine. Spotify decided that approach would not work because the decisions involved are too combinatorial. Planning, forecasting, audience selection, creative guidance, pacing, and optimization all depend on who the advertiser is, what inventory is available, and what the business priorities are. You cannot hard-code that into a state machine.
So they built an agentic platform instead.
How the Multi-Agent Ad System Actually Works
The system is called Ads AI. It uses Google's Agent Development Kit (ADK) and Vertex AI as the foundation. The architecture decomposes the media planning workflow into specialized AI agents that work in parallel.
An advertiser types their campaign requirements in natural language. Something like: maximize reach in Brazil, protect video inventory, and still hit a certain return on spend target. The system takes that goal and routes it across multiple specialized agents simultaneously.
Each agent handles one specific domain. A MediaPlannerAgent queries and ranks historical performance data. Another agent handles inventory availability. Another handles audience selection. They all share context and signals from the same underlying data layer. When they finish, the orchestration layer assembles the results into a unified media plan.
This is the real shift in large-scale AI architecture: instead of one big model trying to do everything, you have many small focused agents doing one thing well, running in parallel, and combining their outputs. The result is faster, more accurate, and much easier to debug and maintain.
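Spotify builds this on Google's ADK and Vertex AI; the sketch below deliberately uses plain asyncio instead of the ADK API so the fan-out-and-merge pattern itself is visible. The agent names echo the ones described above, but every function body and the shared context object are illustrative placeholders.

```python
# Illustrative fan-out/merge pattern behind a multi-agent media planner.
# Plain asyncio stands in for the real orchestration framework; all agent
# bodies are placeholders.
import asyncio
from dataclasses import dataclass, field

@dataclass
class CampaignContext:
    goal: str                                     # the advertiser's stated goal
    signals: dict = field(default_factory=dict)   # shared data layer

async def media_planner_agent(ctx: CampaignContext) -> dict:
    # Queries and ranks historical performance data for the goal.
    return {"historical_ranking": "..."}

async def inventory_agent(ctx: CampaignContext) -> dict:
    # Checks what inventory is actually available in the target market.
    return {"available_inventory": "..."}

async def audience_agent(ctx: CampaignContext) -> dict:
    # Picks audience segments that fit the advertiser's constraints.
    return {"audience_segments": "..."}

async def build_media_plan(goal: str) -> dict:
    ctx = CampaignContext(goal=goal)
    # Fan out: every specialist agent works from the same shared context.
    results = await asyncio.gather(
        media_planner_agent(ctx),
        inventory_agent(ctx),
        audience_agent(ctx),
    )
    # Merge: the orchestration layer assembles one unified plan.
    plan: dict = {"goal": goal}
    for partial in results:
        plan.update(partial)
    return plan

if __name__ == "__main__":
    print(asyncio.run(build_media_plan(
        "Maximize reach in Brazil, protect video inventory, hit ROAS target")))
```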
Why This Architecture Model Transfers to Other Industries
Spotify built this for ad tech but the pattern solves a much broader problem. Any company with duplicated decision logic across channels is a candidate for this approach. A bank with different risk scoring on mobile versus web versus branch. An e-commerce company with different recommendation engines for search, homepage, and email. A healthcare platform with different triage logic for different patient intake paths.
The question to ask is: are we answering the same business question in multiple places? If yes, an agentic platform can centralize that decision-making and project it everywhere consistently.
Backstage Becomes an Agent-First Platform: The Next Phase of Spotify AI Deployment
Backstage started as Spotify's internal developer portal. Engineers used it to discover services, check documentation, manage infrastructure, and understand ownership of different parts of the codebase. It was very much a human-facing tool.
That is changing fast. Spotify is now evolving Backstage into an agent-first platform. Instead of engineers manually navigating the portal to find information or trigger actions, agents connect to Backstage through the Model Context Protocol (MCP). An agent can query service ownership, check deployment status, and trigger infrastructure changes without a human clicking through a UI.
This matters a lot. It means the developer portal is becoming a tool not for humans to use but for agents to use on behalf of humans. The human sets the goal in natural language. The agent navigates Backstage, gathers context, takes action, and reports back.
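Spotify has not published what an agent-first Backstage interaction looks like, but the mechanics of an MCP tool call are generic enough to sketch. The endpoint URL and tool name below are assumptions; the client calls use the MCP Python SDK's `ClientSession` interface.

```python
# Illustrative sketch of an agent reading Backstage-style context over MCP.
# The endpoint URL and tool names are hypothetical.
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def who_owns(service: str) -> str:
    # Connect to a (hypothetical) developer-portal MCP endpoint over SSE.
    async with sse_client("https://backstage.internal/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover what the portal exposes to agents.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Ask a hypothetical ownership tool instead of clicking through a UI.
            result = await session.call_tool(
                "get_service_owner", arguments={"service": service}
            )
            return result.content[0].text

if __name__ == "__main__":
    print(asyncio.run(who_owns("playlist-service")))
```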
How Spotify Designs Its AI Agents: Architecture Principles Worth Copying
Spotify did not just pick an AI model and start using it everywhere. They were careful about how each agent was designed. Looking across their systems, there are clear patterns that any engineering team can learn from.
Principle 1: Each Agent Has One Job
Whether it is the coding agent in Honk or the MediaPlannerAgent in Ads AI, every agent has a specific well-defined role. The workflow agents handle ambiguity and gather context. The coding agent handles execution. The review agents handle verification. None of them try to do all three.
This is the most important design principle in autonomous agent use cases at tech companies. When an agent does too many things, it fails in ways that are hard to debug. When it does one thing well, failures are isolated and fixable.
Principle 2: Existing Infrastructure Stays, Agent Replaces One Step
In Fleet Management, the pull request infrastructure, repository targeting, review process, and merge logic all stayed the same. The agent replaced only the code transformation step. In Ads AI, existing APIs for inventory, forecasting, and booking all stayed the same. Agents use those APIs as tools rather than replacing them.
This is how enterprise AI agent deployment should work. Not a rip and replace. A targeted augmentation of the step that needs the most intelligence.
Principle 3: Observability Must Be Built In
Spotify built custom tooling for observability from the start. Their internal CLI for the coding agent captures traces in MLflow, uploads logs to Google Cloud Platform, and uses LLMs as judges to evaluate diffs before they are submitted. For the ads system, observability means tracking not just traditional metrics like latency and error rates, but also answering behavioral questions: what did the agent decide, and why?
If you cannot explain what an agent did and why, you cannot trust it in production. Spotify solved this before scaling.
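Spotify's exact tooling is internal, but the pattern is reproducible with off-the-shelf pieces: trace every agent step, score the output with a second model, and log the score before anything ships. A rough sketch assuming MLflow's tracing decorator (available in recent MLflow releases) and placeholder agent and judge functions:

```python
# Sketch: trace an agent's transformation and gate it with an LLM-as-judge.
# The judge scoring and threshold are illustrative assumptions.
import mlflow

mlflow.set_experiment("fleet-agent-migrations")

@mlflow.trace  # captures inputs/outputs of this step in the MLflow trace
def generate_diff(repo: str, instruction: str) -> str:
    """Placeholder for the coding agent producing a diff for one repo."""
    return "--- a/pom.xml\n+++ b/pom.xml\n..."

@mlflow.trace
def judge_diff(diff: str, instruction: str) -> float:
    """Placeholder LLM-as-judge: a second model scores the diff against
    the instruction, returning a value between 0.0 and 1.0."""
    return 0.92

def run_one_repo(repo: str, instruction: str) -> bool:
    with mlflow.start_run(run_name=repo):
        diff = generate_diff(repo, instruction)
        score = judge_diff(diff, instruction)
        mlflow.log_metric("judge_score", score)
        # Only diffs the judge likes become pull requests for human review.
        return score >= 0.8
```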
Principle 4: Guardrails on Semi-Autonomous Decisions
No agent at Spotify operates without human oversight. The coding agent opens pull requests. A human reviews and merges. The ads agents generate plans. A human or a higher-level automated check validates them. The Backstage agents act on behalf of engineers but within defined permission scopes.
This is what separates real-world autonomous agent deployment from demos. Guardrails are not a limitation. They are what makes autonomous agents safe enough to trust with real work.
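The mechanics of such a guardrail are simple; what matters is that the agent's output is a proposal rather than an action. A minimal sketch of an approval checkpoint, with made-up helper names:

```python
# Minimal sketch of a human-in-the-loop checkpoint: the agent proposes,
# a person (or a stricter automated check) disposes. Helper names are made up.
from dataclasses import dataclass

@dataclass
class AgentProposal:
    summary: str    # what the agent wants to do
    payload: dict   # the diff, media plan, or infra change itself
    risk: str       # "low", "medium", or "high"

def apply_with_guardrails(proposal: AgentProposal, request_approval, execute) -> str:
    # Low-risk proposals may pass an automated check; anything else waits
    # for an explicit human decision (a PR review, a Slack approval, etc.).
    if proposal.risk != "low":
        approved = request_approval(proposal)
        if not approved:
            return "rejected"
    execute(proposal.payload)
    return "applied"
```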
Spotify AI Agent Roles at a Glance
| Agent System | Agent Role | Technology Used | Human Oversight |
|---|---|---|---|
| Honk (Fleet Mgmt) | Code transformation via prompt | Claude Code, MCP, GCP, MLflow | PR review before merge |
| Honk (Slack) | Remote code fix / feature add | Claude Code via Slack | Engineer approves in Slack |
| Ads AI Orchestrator | Route campaign goals to agents | Google ADK, Vertex AI | Planner reviews media plan |
| Ads AI (Media Planner) | Query and rank historical data | Google ADK, Vertex AI | Part of plan review |
| Ads AI (Audience) | Audience selection logic | Google ADK, Vertex AI | Part of plan review |
| Backstage Agents | Navigate dev portal, trigger actions | MCP connections | Engineer sets goal and approves |
The Tech Stack Behind Spotify's Agent AI Deployment
Understanding the actual tools Spotify uses helps cut through the hype. Here is what is powering their enterprise AI agent deployment:
Claude Code (Anthropic) for the Honk coding agent
Slack as the conversational interface for triggering agents
Model Context Protocol (MCP) for connecting agents to internal tools and the Backstage portal
Google Agent Development Kit (ADK) for the Ads AI multi-agent platform
Vertex AI (Google Cloud) for hosting and running the advertising agents
MLflow for tracing and evaluating agent behavior
Google Cloud Platform (GCP) for log storage and infrastructure
Fleet Management system (Spotify internal) as the pull request orchestration backbone
What Results Did Spotify Actually Get From AI Agent Deployment
Numbers matter. Here is what Spotify has reported publicly:
| Metric | Result |
|---|---|
| Pull requests automated (since mid-2024) | Approximately 50% of all merged PRs |
| AI-generated PRs from Honk agent | Over 1,500 merged pull requests |
| New features shipped in 2025 | More than 50 features and changes |
| Senior engineers writing manual code | Stopped in December 2024 |
| Advertiser plan generation speed (Ads AI) | Optimized plans generated in seconds vs hours |
| Annual Wrapped engagement | 300 million+ users, 630 million social shares |
These are not estimates. They come from Spotify's earnings calls, engineering blog, and CTO statements. The numbers are real.
The Real Opinion: What Most Companies Get Wrong About AI Agent Deployment
Here is an uncomfortable take: most enterprise AI pilots fail not because the technology is not ready, but because companies treat AI agents like glorified chatbots. They deploy one model behind one endpoint. They add a prompt helper to an existing UI. They call it AI transformation.
That is not what Spotify did. Spotify asked a harder question: what is the actual structural problem in how this system works? For Fleet Management the problem was complexity of transformation scripts. For advertising the problem was duplicated decision logic drifting across channels. For Backstage the problem was that humans were the only ones who could navigate it.
Once they found the structural problem, they designed the agent to fix that specific thing. Not everything. One thing.
Most teams build AI features. Spotify built an AI platform. That is the difference between a company that uses AI and a company that is transformed by it. And the gap between those two states is not the quality of the AI model. It is the quality of the architecture decision.
Any engineering team that wants to replicate this needs to start by identifying where their biggest bottleneck is. Not where AI sounds impressive. Where the actual work is slow, inconsistent, or duplicated. That is your starting point.
How to Apply the Spotify Agent AI Deployment Model to Your Own Company
This is not just a story about Spotify. The architecture patterns here apply to any company with complex workflows, duplicated business logic, or large codebases. Here is a simplified path:
| Step | What To Do | Spotify's Equivalent |
|---|---|---|
| 1 | Identify duplicated decision logic across channels or teams | Ads logic re-implemented per channel |
| 2 | Find the bottleneck step in an existing automated pipeline | Complex code transformations in Fleet Management |
| 3 | Define specialized agent roles with single responsibilities | Workflow agent, Coding agent, Review agent |
| 4 | Use existing APIs as tools, do not replace infrastructure | Ads APIs used as tools by agent layer |
| 5 | Build observability and tracing before you scale | MLflow traces, LLM-as-judge evaluation |
| 6 | Keep humans in the loop with clear approval checkpoints | PR review, plan review, Slack approvals |
Final Thoughts on the Spotify Agent AI Deployment Case Study
Spotify's story is not really about music or podcasts. It is about what happens when an engineering organization stops asking what AI feature should we add and starts asking where does our system break down and how can an agent fix that specific thing.
The results are real. Half of all pull requests automated. Senior engineers freed from writing code. A unified advertising decision engine running across all buying channels. A developer portal that agents can use without human navigation. These are not future predictions. They are already running in production.
For any company thinking about enterprise AI agent deployment, the roadmap is here. Start with your biggest structural problem. Design an agent for one step. Keep humans in the loop. Measure everything. Then scale.
That is the blueprint. The question is whether your team is ready to follow it.
About Deliverables Agency
Deliverables Agency is an AI and software development company helping businesses build real AI systems that solve real problems. From custom AI agent development to RAG pipelines and automation workflows, the team at Deliverables Agency turns complex AI architecture into working products.
Stop Testing AI. Start Using It Like Spotify
Spotify moved beyond experiments and built AI agents that handle real production work across engineering and business systems. Deliverables Agency helps you do the same, designing practical agent systems that remove bottlenecks, reduce manual effort, and scale your operations without adding complexity.
Some Topic Insights:
What is the Spotify Honk system?
Honk is Spotify's internal AI coding agent platform. It connects Claude Code to Spotify's engineering workflow through Slack. Engineers send natural language instructions via Slack, the agent writes or edits code, opens a pull request, and the engineer reviews and merges it. It is a background coding agent that runs without interrupting normal engineering workflows.