Work

My work focuses on designing and building production systems that sit at the intersection of software engineering, data platforms, and applied machine learning. I have worked on backend and distributed systems, event-driven pipelines, and ML-enabled services where the primary challenges are reliability, observability, and long-term evolvability rather than one-off delivery. Across both industry roles and personal projects, I tend to own systems end-to-end: from data ingestion and modelling, through service design and deployment, to operational tooling and iteration loops. This page highlights selected examples of that work.

Project portfolio

Below are concise summaries of selected projects.

Production ML systems — applied machine learning in industry

Context: Worked on ML-enabled production systems where the primary challenges were not model training, but operating reliable, observable, and evolvable ML components within larger backend platforms. The systems involved data pipelines, real-time and batch scoring, and multiple downstream consumers.

Approach: Focused on defining clear data contracts, ensuring training/serving parity, building deployment and rollback mechanisms for models, and adding observability for both system and model health. Designed components to tolerate partial failure, support replay, and evolve safely over time.
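
To make the data-contract and training/serving parity points concrete, here is a minimal sketch of the pattern (illustrative only, not the production code; FEATURE_SCHEMA, build_features, and the field names are hypothetical): a single schema and a single feature transform shared by both the training pipeline and the serving path, so mismatched inputs fail loudly instead of drifting silently.

```python
from typing import Any, Mapping

# Hypothetical data contract: one schema, enforced on both the
# training and serving paths so they cannot drift apart silently.
FEATURE_SCHEMA: Mapping[str, type] = {
    "user_id": str,
    "event_count_7d": int,
    "avg_session_seconds": float,
}

class ContractViolation(ValueError):
    """Raised when a record does not satisfy the data contract."""

def validate(record: Mapping[str, Any]) -> None:
    missing = FEATURE_SCHEMA.keys() - record.keys()
    if missing:
        raise ContractViolation(f"missing fields: {sorted(missing)}")
    for name, expected in FEATURE_SCHEMA.items():
        if not isinstance(record[name], expected):
            raise ContractViolation(
                f"{name}: expected {expected.__name__}, "
                f"got {type(record[name]).__name__}"
            )

def build_features(record: Mapping[str, Any]) -> list[float]:
    """The single transform imported by BOTH training and serving."""
    validate(record)
    return [
        float(record["event_count_7d"]),
        record["avg_session_seconds"],
    ]
```

Because both paths import the same build_features, a contract change breaks tests at build time rather than surfacing as silent training/serving skew in production.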

Impact: Improved production stability and diagnosability of ML services, reduced operational incidents, and enabled teams to iterate on models and features with significantly lower deployment risk.

Stack: Python, production data pipelines, ML frameworks, cloud infrastructure, monitoring and alerting tooling.

Coffee Diff — end-to-end data platform and discovery application

Context: Designed and built a production web application that continuously collects and normalises product data from 200+ specialty coffee roasters, transforming heterogeneous sources into a unified dataset with search and discovery capabilities.

Approach: Implemented automated data ingestion from multiple APIs and web sources, with normalisation pipelines to standardise attributes and classifications. Built the application as a cloud-hosted service with recovery mechanisms, structured logging, and historical snapshotting of data changes. The system was designed to tolerate partial failures, support reprocessing, and provide operational visibility.
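
The following is a simplified sketch of that ingestion shape (illustrative, not the deployed code; fetch, ingest_roaster, normalise, and the field names are hypothetical stand-ins): each source fetch is retried with backoff, the raw payload is snapshotted so runs can be replayed, and records are mapped onto a unified schema.

```python
import json
import logging
import time
from datetime import datetime, timezone
from pathlib import Path

log = logging.getLogger("ingest")

def _fetch_with_retry(fetch, roaster_id: str, retries: int = 3):
    """Call a source-specific fetcher, retrying transient failures."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(roaster_id)
        except Exception:
            log.warning(json.dumps({"event": "fetch_failed",
                                    "roaster": roaster_id,
                                    "attempt": attempt}))
            if attempt == retries:
                raise  # give up; the scheduler can retry the whole job
            time.sleep(2 ** attempt)  # exponential backoff

def normalise(item: dict) -> dict:
    """Map heterogeneous source fields onto a unified schema (sketch)."""
    return {
        "name": item.get("title") or item.get("name", ""),
        "origin": (item.get("origin") or "unknown").strip().lower(),
        "price_gbp": float(item.get("price", 0)),
    }

def ingest_roaster(fetch, roaster_id: str, snapshot_dir: Path) -> list[dict]:
    """Fetch, snapshot the raw payload for replay, then normalise."""
    raw = _fetch_with_retry(fetch, roaster_id)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    (snapshot_dir / f"{roaster_id}-{stamp}.json").write_text(json.dumps(raw))
    return [normalise(item) for item in raw]
```

Snapshotting raw payloads before normalisation is what makes reprocessing cheap: when a normalisation rule changes, historical snapshots can be replayed through the new pipeline without re-fetching every source.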

Impact: Created a continuously updated dataset and application that enables structured exploration of a fragmented market. Established a foundation for longitudinal analysis of trends in specialty coffee offerings. The platform also supported an internal research paper analysing statistical properties of the collected data.

Stack: Python, Java, API integrations, data normalisation pipelines, cloud infrastructure, Nginx as a reverse proxy with TLS termination, structured logging with archival to object storage (S3), and web application tooling.

Visit coffeediff.co.uk