Why We Chose Go Microservices Over a Monolith for Workflow Monitoring
Engineering

Go's concurrency model, single binary deployment, and fast compilation make it ideal for distributed systems. Learn why we built Monitrics with Go microservices instead of a monolith.

Monitrics Team
9 min read
Tags: Go, microservices, architecture, concurrency, distributed-systems

Published: February 2026

Should you start with a monolith or microservices? It's the architecture question that divides engineering teams. Amazon Prime Video famously cut infrastructure costs by 90% by moving from microservices to a monolith. Meanwhile, Monzo Bank runs over 1,500 microservices for real-time fraud detection.

The right choice depends entirely on context. For Monitrics—a distributed workflow monitoring platform—we chose microservices from the start. And we chose Go as the language to build them. Here's what drove those decisions.

Why Monitoring Platforms Suit Microservices

Not every application benefits from microservices. Many would be better served by a well-designed monolith. But monitoring platforms have characteristics that make service separation valuable.

The different concerns of a monitoring platform have fundamentally different scaling characteristics. Collecting metrics from thousands of endpoints requires horizontal compute capacity, but processing those metrics into alerts requires less compute and more state management. A monolith scales everything together; microservices scale each concern independently.

Storage requirements diverge as well. Workflow definitions are relational data—they fit naturally in PostgreSQL with foreign keys and transactions. Time-series metrics need specialized storage optimized for append-heavy workloads and time-range queries. Trying to serve both patterns from a single database architecture creates compromises in both directions.

The critical alerting path must remain available even when other system components fail. If the dashboard analytics service crashes, alerts still need to fire. Microservices provide this isolation naturally—a failing analytics service doesn't bring down the notification system. In a monolith, a memory leak in analytics code might crash the entire process, taking alerting down with it.

Multi-tenant platforms benefit from service boundaries as an additional isolation layer. Different tenants have different workloads and compliance requirements. Service separation provides natural points to enforce quotas, route traffic, and maintain audit boundaries.

Why Go Fits This Problem

With 85% of cloud-native companies prioritizing Go expertise according to recent surveys, we weren't alone in seeing Go's fit for distributed systems. But language choice should follow from requirements, not trends. Go happens to address our specific needs well.

Concurrency is a first-class concept. Go's goroutines and channels make concurrent programming approachable in ways that threads and locks don't. A worker pool processing thousands of health check results can be expressed in a few lines of clear, maintainable code. Goroutines start with tiny 2KB stacks and grow as needed, making it practical to run millions of them simultaneously. This matters when your system processes checks from many endpoints concurrently.
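As a minimal sketch of what that looks like in practice (the names `runChecks` and `checkResult` are illustrative, not Monitrics' actual code), a bounded worker pool that fans endpoints out across goroutines fits in a few dozen lines of standard-library Go:

```go
package main

import (
	"fmt"
	"sync"
)

// checkResult is a hypothetical result type for a single health check.
type checkResult struct {
	endpoint string
	healthy  bool
}

// runChecks fans endpoints out to nWorkers goroutines and collects
// one result per endpoint.
func runChecks(endpoints []string, nWorkers int) []checkResult {
	jobs := make(chan string)
	results := make(chan checkResult)

	var wg sync.WaitGroup
	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ep := range jobs {
				// A real worker would perform an HTTP/TCP/ICMP probe here.
				results <- checkResult{endpoint: ep, healthy: true}
			}
		}()
	}

	// Feed jobs, then close channels once all workers finish.
	go func() {
		for _, ep := range endpoints {
			jobs <- ep
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	var out []checkResult
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	results := runChecks([]string{"a.example", "b.example", "c.example"}, 2)
	fmt.Println(len(results)) // 3
}
```

Scaling the pool up is a matter of changing `nWorkers`; the channels handle all the coordination that thread pools and locks would otherwise require.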

Single binary deployment simplifies operations. Go compiles to a statically linked binary with no runtime dependencies. The resulting container image can be scratch-based—literally empty except for the binary and CA certificates. Image sizes under 20MB reduce pull times and storage costs. There's no "which Python version" or "did we install the right packages" debugging. The binary runs or it doesn't.
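A scratch-based image of this kind is typically built with a multi-stage Dockerfile along these lines (the module path and Go version here are placeholders, not Monitrics' actual build):

```dockerfile
# Build stage: compile a statically linked binary.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /app ./cmd/api

# Final stage: an empty base plus CA certificates and the binary.
FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

Disabling cgo keeps the binary fully static, and copying the CA bundle is what lets the service make outbound TLS calls from an otherwise empty image.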

Fast compilation enables rapid iteration. A hundred thousand lines of Go compiles in seconds. This keeps the feedback loop tight during development and CI pipelines fast. When you're iterating on complex distributed behavior, waiting minutes for builds kills productivity.

The standard library covers most needs. Go's built-in HTTP server is production-quality. JSON encoding is fast and correct. Context propagation for cancellation and timeouts is baked in. Database access, cryptography, compression—the standard library handles common infrastructure needs without requiring dependency selection and maintenance.

Error handling is explicit. Go's approach of returning errors rather than throwing exceptions makes error paths visible. You can't accidentally forget to handle an error; the compiler reminds you. For systems that must fail gracefully—like monitoring platforms—explicit error handling prevents the kind of silent failures that cause outages.

The Service Architecture

We organized Monitrics into six specialized services, each addressing a distinct concern.

The API service handles HTTP requests, authentication, and request routing. It's the gateway through which all external interaction flows. Separating it allows independent scaling for request volume without scaling compute-heavy workers.

The Scheduler service manages timing—parsing cron expressions, tracking when workflows should execute, and triggering them at the right moments. This service needs clock precision and distributed coordination to ensure workflows run exactly once, even when multiple scheduler instances exist for high availability.

The Executor service actually runs workflow steps. It processes HTTP checks, TCP connectivity tests, ICMP pings, DNS resolution, and browser-based checks. This is the most compute-intensive service and scales based on check volume and complexity.

The Executions service handles result processing and state management. Every check produces metrics that flow through this service into time-series storage. Separating write-heavy metrics ingestion from read-heavy API queries prevents them from competing for resources.

The Notifier service delivers alerts through various channels—email, Slack, PagerDuty, webhooks, and Telegram. Alert delivery has different reliability requirements than metrics processing; separating them ensures notification delivery remains robust even during metrics ingestion spikes.

Finally, an All-in-One service runs all workers together for simplified deployment. Not every deployment needs distributed infrastructure. Single-node deployments, development environments, and small-scale usage benefit from running everything in one process. The all-in-one option provides this flexibility without maintaining separate codebases.

Service Boundaries and Communication

Choosing where to draw service boundaries matters more than the number of services. Bad boundaries create distributed monoliths—all the complexity of microservices with none of the benefits.

We drew boundaries along domain lines following Domain-Driven Design principles. The workflow definition domain handles creating, modifying, and versioning workflows. The execution domain handles running workflows and managing their state. The notification domain handles alert routing and delivery. Each domain has clear responsibilities and minimal coupling to others.

Services communicate through a combination of synchronous REST APIs and asynchronous message queues. REST works for request-response patterns where the caller needs immediate feedback. Message queues work for fire-and-forget patterns where reliability and decoupling matter more than immediacy.

The Scheduler communicates with Executor through a message queue. When it's time to run a workflow, Scheduler puts a message on the queue. Executor workers pull from the queue and process checks. This decoupling means Scheduler doesn't care how many Executor instances exist or whether any are currently available—the queue buffers work until workers are ready.

Notification delivery uses the same pattern. When a check fails and meets alert conditions, a message goes to the notification queue. Notifier workers pull messages and deliver alerts through configured channels. If Notifier is temporarily overloaded, messages wait in the queue rather than being lost.

The Tradeoffs We Accepted

Microservices aren't free. We accepted specific tradeoffs in exchange for the benefits.

Operational complexity increased. Instead of deploying one thing, we deploy six. Each needs its own monitoring, alerting, and capacity planning. Service-to-service communication adds network calls that can fail. Distributed tracing became essential rather than optional—without it, debugging issues across service boundaries is nearly impossible.

Testing got harder. Unit tests work the same, but integration tests must account for multiple services. End-to-end tests require spinning up the full system. We invested heavily in test infrastructure to keep confidence high despite the complexity.

Data consistency requires explicit handling. A monolith with a single database gets transactions for free. Distributed services need saga patterns, eventual consistency, or careful API design to maintain data integrity. We chose eventual consistency where appropriate and explicit coordination where necessary.

The learning curve steepened. New engineers need to understand not just the code, but the service architecture, communication patterns, and failure modes. Documentation and onboarding materials became more important.

These tradeoffs were worth it for our specific situation. They wouldn't be worth it for everyone.

When to Choose Differently

Our choice fits our context. Your context might be different.

Stay with a monolith if your team has fewer than ten engineers. Microservices require operational investment that small teams can't afford. The coordination overhead exceeds the benefits. If your application is straightforward CRUD, microservices add complexity without providing value. If you're prototyping and need to ship fast, a monolith iterates more quickly.

Consider microservices if team size exceeds fifteen and Conway's Law is working against you—organization structure tends to mirror architecture, and fighting it creates friction. If different components have different scaling needs, if you need independent deployment cycles, or if you're using multiple database technologies for different purposes, service separation starts making sense.

The migration path matters. If you start monolithic, you can extract services later as pain points emerge. Starting with microservices and merging them is harder. When in doubt, start simpler and let complexity emerge from actual needs rather than anticipated ones.

What We Learned

A few lessons emerged from building this system.

Service isolation works as intended. When the analytics processing backed up during a traffic spike, alerting continued unaffected. The separation paid dividends during our first major incident.

Go's simplicity enabled rapid development. New team members become productive quickly. The lack of framework magic means understanding the code requires reading the code, not studying framework documentation.

Single binaries simplified deployment dramatically. Container images are tiny. Startup is fast. There's no runtime to configure or debug. The operational simplicity exceeded our expectations.

Distributed tracing became non-negotiable. We added tracing early, and it proved essential for debugging. Without it, understanding request flow across services would require correlating logs manually—a nightmare we avoided.

The Bottom Line

Go microservices aren't always the right choice—but for a distributed workflow monitoring platform, they provided the isolation, scalability, and operational characteristics we needed.

The combination of Go's performance, simplicity, and deployment model makes it well-suited for building services that need to be reliable, observable, and scalable. For Monitrics, this architecture enables processing many workflow executions while maintaining fast alerting response times.

Technology choices should follow from requirements, not fashion. We chose Go and microservices because they fit our specific problem. Different problems deserve different solutions.


Curious about Monitrics' architecture? Our platform provides distributed workflow monitoring with multi-step health checks across multiple regions. Learn more at Monitrics.
