The first conference on Operational Machine Learning: OpML ‘19

I attended OpML ’19, a conference on “Operational Machine Learning” held in Santa Clara on May 20th.


The scope of this conference is varied and does not seem to be fully settled yet, even having attended it. I’ll borrow the description from the OpML website.

The 2019 USENIX Conference on Operational Machine Learning (OpML ’19) provides a forum for both researchers and industry practitioners to develop and bring impactful research advances and cutting edge solutions to the pervasive challenges of ML production lifecycle management. ML production lifecycle is a necessity for wide-scale adoption and deployment of machine learning and deep learning across industries and for businesses to benefit from the core ML algorithms and research advances.

Overview of the conference

  • There were 210 attendees, coming from LinkedIn, Microsoft, Google, Airbnb, Facebook, etc.
  • The scope of “Operational Machine Learning” is diverse. I expected a focus on MLOps topics such as reproducibility, ML DSLs for productionization, visualization, and stakeholder management, but there were also many talks on ML for systems, system utilization optimization, SRE for ML, hardware accelerators, etc.
  • There is a contrast between the tech giants (e.g. Google, Uber, Facebook, Airbnb, Microsoft, and LinkedIn) and the followers. While the ML-leading companies talked about their OSS or ML infrastructure, the following companies tended to talk about specific use cases or their own solutions (those speakers seemed to be from small ML ventures).

Some interesting talks

Keynote: Ray: A Distributed Framework for Emerging AI Applications
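My notes kept only the title here, so as a hedged illustration: Ray’s core programming model is submitting Python functions as asynchronous tasks and gathering their results as futures. The minimal stdlib-only sketch below shows that submit-then-gather pattern only; the names `square` and `run_tasks` are my own, and real Ray code would use `@ray.remote` and `ray.get` backed by a distributed scheduler and object store.

```python
# Conceptual sketch of a futures-based task model like Ray's, approximated
# with the standard library. This only illustrates the pattern of submitting
# tasks and gathering their results, not Ray's distributed execution.
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def run_tasks(xs):
    # Submit all tasks up front (analogous to f.remote(i)), then gather
    # the results (analogous to ray.get(futures)).
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(square, x) for x in xs]
        return [f.result() for f in futures]

print(run_tasks(range(4)))  # [0, 1, 4, 9]
```

The point of the futures style is that task submission is decoupled from result retrieval, which lets independent tasks run in parallel.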

MLOp Lifecycle Scheme for Vision-based Inspection Process in Manufacturing

From the talk slides (\_slides\_lim.pdf).

AIOps: Challenges and Experiences in Azure

How the Experts Do It: Production ML at Scale

A panel discussion on ML infrastructure

Lead and moderator: Joel Young, LinkedIn


  • Sandhya Ramu, Director, AI SRE, LinkedIn
  • Andrew Hoh, Product Manager, ML Infra and Applied ML, AirBNB
  • Aditya Kalro, Engineering Manager, AI Infra Services and Platform, Facebook
  • Faisal Siddiqi, Engineering Manager, Personalization Infrastructure, Netflix
  • Pranav Khaitan, Engineering Manager, Personalization and Dialog ML Infra, Google

The important things to keep at the top level are:

  • the lead time from experiment to production
  • flows built for production, involving different teams
  • recognizing that not everything can be the highest priority; metrics and dashboards are important

Cost of running/training vs. agility

  • It’s hard to find downstream use cases. (Airbnb)
  • Monitor model resource usage
  • Keep the ML infrastructure extremely flexible
  • It’s hard to force everyone to use a single framework

What are the important things for your ML platform?


  • Reliability
  • Scalability
  • Developer productivity


  • Agility (Available libraries etc)
  • Enabling the latest technology
  • Cost and impact of Machine Learning


  • How quickly, and how many, A/B tests we can run
  • How rapidly a new researcher can get productive


  • Business impact
  • # of users for the infrastructures
  • How much inference/scoring is done?
  • Availability, scalability, cost, and long-term decision making


  • Innovation aspect
  • How will the ML infrastructure empower the next 5 years of products?

Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform
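Only the title survives in my notes, so as a hedged sketch of the general idea behind continuous training: retrain on each new batch of data, warm-starting from the previous model, and push a new model only when it passes a validation gate. The stage names and stubs below are my own simplifications, not TFX’s actual API.

```python
# Toy continuous-training loop in the spirit of the TFX talk's topic.
# train/evaluate are caller-supplied stubs; the validation gate ("push only
# if it beats the baseline") mirrors the idea of model blessing.
def continuous_training(batches, train, evaluate, baseline_score):
    pushed = []
    model = None
    for batch in batches:
        model = train(model, batch)   # warm-start from the previous model
        score = evaluate(model)
        if score > baseline_score:    # validation gate before pushing
            pushed.append(model)
            baseline_score = score
    return pushed, baseline_score
```

For example, with `train=lambda m, b: (m or 0) + b` and `evaluate=lambda m: m`, batches `[1, 2, 0, 3]` push models scoring 1, 3, and 6, while the batch that does not improve the score is skipped.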

Disdat: Bundle Data Management for Machine Learning Pipelines

Predictive Caching@Scale
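Again only the title made it into my notes; as a hedged sketch of the general idea of predictive caching (prefetching entries a model predicts will be requested next), here is a toy LRU cache with a first-order frequency predictor. The class, its names, and the predictor are my own illustration, not the speakers’ system.

```python
# Toy predictive cache: an LRU store that also prefetches the key most
# often observed to follow the one just requested.
from collections import OrderedDict, defaultdict

class PredictiveCache:
    def __init__(self, loader, capacity=4):
        self.loader = loader      # key -> value; the slow path being cached
        self.capacity = capacity
        self.store = OrderedDict()                            # LRU store
        self.follows = defaultdict(lambda: defaultdict(int))  # key -> next-key counts
        self.prev = None

    def _put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used

    def get(self, key):
        if self.prev is not None:
            self.follows[self.prev][key] += 1   # learn access transitions
        self.prev = key
        if key in self.store:
            self.store.move_to_end(key)
        else:
            self._put(key, self.loader(key))
        value = self.store[key]
        # Prefetch the most likely next key, if one has been learned.
        nexts = self.follows[key]
        if nexts:
            predicted = max(nexts, key=nexts.get)
            if predicted not in self.store:
                self._put(predicted, self.loader(predicted))
        return value
```

After the cache has seen “a” followed by “b” a few times, a request for “a” also warms “b”, so the follow-up request hits the cache instead of the slow loader.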

Aki Ariga
Staff Software Engineer

Interested in Machine Learning, MLOps, and data-driven business. If you like my blog post, I’d be glad if you could buy me a tea 😉
