System design philosophy: how I approach building production systems

Published: 2026-01-19 · Tags: System Design, Architecture, Engineering Practice

Over time, I have converged on a small set of principles that guide how I design systems. They are not about specific technologies. They are about managing complexity, failure, and change.

1) Make trade-offs explicit

There is no universally “good” architecture. There are only architectures that optimise for specific constraints: latency, throughput, cost, organisational structure, regulatory environment, and expected rate of change.

My first design task is always to surface these constraints and make trade-offs visible. When trade-offs are explicit, systems evolve coherently. When they are implicit, systems accrete contradictions.

2) Design contracts before components

Systems fail less often at the level of algorithms than at the level of assumptions between parts. I therefore prioritise:

  • clear data contracts and schemas,
  • explicit API semantics,
  • documented failure behaviour,
  • versioning strategies from day one.

Components can be rewritten. Contracts are much harder to change. Designing them carefully is high-leverage work.
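
As a minimal sketch of what contract-first can look like in code, the example below parses a hypothetical OrderCreated event. The event name, its fields, the two schema versions, and the USD default for version 1 are all assumptions made for illustration. The point is that the version check, the schema, and the failure behaviour (a single ContractError type) are written down and enforced at the boundary.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical contract: versions 1 and 2 of an "OrderCreated" event.
SUPPORTED_VERSIONS = {1, 2}
DEFAULT_CURRENCY_V1 = "USD"  # assumption: v1 payloads carried no currency


class ContractError(ValueError):
    """Documented failure behaviour: every contract violation raises this type."""


@dataclass(frozen=True)
class OrderCreated:
    schema_version: int
    order_id: str
    amount_cents: int
    currency: str


def parse_order_created(payload: Mapping[str, Any]) -> OrderCreated:
    """Validate a raw payload against the contract and return a typed event."""
    version = payload.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise ContractError(f"unsupported schema_version: {version!r}")

    order_id = payload.get("order_id")
    if not isinstance(order_id, str) or not order_id:
        raise ContractError("order_id must be a non-empty string")

    amount_cents = payload.get("amount_cents")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount_cents, bool) or not isinstance(amount_cents, int):
        raise ContractError("amount_cents must be an integer")
    if amount_cents < 0:
        raise ContractError("amount_cents must be non-negative")

    if version == 1:
        currency = DEFAULT_CURRENCY_V1
    else:
        currency = payload.get("currency")
        if not isinstance(currency, str) or len(currency) != 3:
            raise ContractError("currency must be a 3-letter code in schema_version 2")

    return OrderCreated(version, order_id, amount_cents, currency)
```

Because both versions map onto the same internal type, consumers behind this boundary do not need to know which version a producer sent.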

3) Treat failure as a design input

Production systems are defined more by how they fail than by how they work when everything is healthy. I design for:

  • partial availability,
  • degraded modes,
  • bounded blast radius,
  • fast detection and recovery.

This usually leads to simpler, more composable architectures than designs optimised only for the happy path.
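
One way to make a degraded mode concrete is a small circuit breaker that serves a fallback while a dependency is unhealthy. The sketch below is a simplified illustration rather than a production implementation: the class name, the thresholds, and the injectable clock are assumptions, and it is not thread-safe.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker(Generic[T]):
    """Wraps a call; after repeated failures, serves a fallback for a cool-down period."""

    def __init__(
        self,
        call: Callable[[], T],
        fallback: Callable[[], T],
        max_failures: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._call = call
        self._fallback = fallback
        self._max_failures = max_failures
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def __call__(self) -> T:
        # While open, skip the dependency entirely: this bounds the blast radius.
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                return self._fallback()
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = self._call()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            return self._fallback()  # degraded mode instead of an error
        # Success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller might wrap a recommendations lookup so that an outage yields a static default list instead of a failed page, which is the kind of partial availability described above.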

4) Observability is part of the product

If a system cannot be understood while it is running, it cannot be operated safely. I treat metrics, logs, traces, and domain-level signals as first-class deliverables.

Good observability shortens feedback loops, reduces incident duration, and allows teams to evolve systems with confidence.
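
As a rough sketch of treating signals as deliverables, the context manager below emits one structured log line per operation, with outcome, duration, and any domain-level fields the caller attaches. It uses only the Python standard library. The logger name "orders" and the field names are assumptions for illustration, and the logger must be configured to emit INFO records for the lines to appear.

```python
import json
import logging
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator

logger = logging.getLogger("orders")  # hypothetical logger name


@contextmanager
def observed(operation: str, **fields: Any) -> Iterator[Dict[str, Any]]:
    """Time an operation and log one JSON event describing what happened."""
    event: Dict[str, Any] = {"operation": operation, **fields}
    start = time.perf_counter()
    try:
        yield event  # the caller may add domain-level fields to the event
        event["outcome"] = "ok"
    except Exception as exc:
        event["outcome"] = "error"
        event["error_type"] = type(exc).__name__
        raise  # observability must not swallow failures
    finally:
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        logger.info(json.dumps(event, sort_keys=True))
```

Usage would look like `with observed("charge_order", order_id="o-1") as event: event["amount_cents"] = 500`, which produces a single queryable record whether the operation succeeds or raises.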

5) Prefer evolutionary architectures

I aim for designs that:

  • can be deployed incrementally,
  • allow coexistence of versions,
  • support safe backfills and reprocessing,
  • and tolerate organisational change.

This usually means slightly more discipline up front, and significantly less pain over the system’s lifetime.
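
Safe backfills and reprocessing generally depend on idempotency: applying the same event twice must not change the result. The sketch below illustrates that property with a hypothetical balance projection. The event shape (event_id, account, delta_cents) and the in-memory set of applied IDs are assumptions, and a real system would persist both the state and the applied IDs together.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, str, int]  # (event_id, account, delta_cents), hypothetical shape


@dataclass
class BalanceProjection:
    """Derived state that can be rebuilt or topped up by replaying events."""

    balances: Dict[str, int] = field(default_factory=dict)
    applied: Set[str] = field(default_factory=set)

    def apply(self, event_id: str, account: str, delta_cents: int) -> bool:
        """Apply an event once; return False if it was already applied."""
        if event_id in self.applied:
            return False  # duplicate delivery or replay: no effect
        self.balances[account] = self.balances.get(account, 0) + delta_cents
        self.applied.add(event_id)
        return True


def reprocess(projection: BalanceProjection, events: Iterable[Event]) -> int:
    """Replay events into the projection; return how many were newly applied."""
    return sum(1 for event in events if projection.apply(*event))
```

With this shape, a backfill can be re-run after a partial failure without first working out exactly where it stopped.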

6) Optimise for long-term engineering effectiveness

The output of an engineering organisation is not systems. It is its sustained ability to change systems.

I therefore care deeply about:

  • tooling and automation,
  • clear ownership and interfaces,
  • documentation that reflects reality,
  • and reducing cognitive load.

A practical summary

In practice, this philosophy leads me to build systems that are:

  • contract-driven,
  • failure-aware,
  • observable by default,
  • and structured for continuous evolution.

The technologies change. These principles do not.



If this resonates with how your team builds systems, you can reach me via Contact.