Writing

Notes on software engineering, applied machine learning, and building reliable production systems. I write to document trade-offs, patterns, and lessons learned.

Latest

Production ML in practice: what matters after the model is trained

Published: 2026-01-19 · Tags: Production ML, Reliability, Systems

Training a model is usually the easy part. This article covers the production fundamentals that keep ML systems reliable after deployment: data contracts, deployment discipline, observability, failure handling, and iteration loops.

This list is still in flux. It may move to a more blog-like format later, with an RSS feed, pagination, and tag pages.

Topics

Distributed systems

Event-driven architectures, scaling considerations, failure modes, and pragmatic trade-offs.

Production ML

Model lifecycle, data quality, monitoring, evaluation loops, and operational patterns.

Engineering effectiveness

Tooling, automation, CI/CD, and practices that keep delivery fast and reliable.

External publications

Papers I authored or co-authored that were published elsewhere.

  • Towards Increased Expressiveness in Service Level Agreements | Concurrency and Computation: Practice and Experience, Special Issue: Middleware for Grid Computing: A ‘Possible Future’, Volume 19, Issue 14, pp. 1975–1990 · Sep 25, 2007 — link
  • Applying the Grid to 3D Capture Technology | Concurrency and Computation: Practice and Experience, Selected Papers from AHM 2004, Volume 19, Issue 2, pp. 235–249 · Feb 1, 2007 — link
  • Surface Director Sliding in LC Cell with Light-Controlled Chirality | Molecular Crystals and Liquid Crystals, Volume 453, pp. 263–274 · Sep 1, 2006 — link