Ahoy there 🚢,
Matt Squire here, CTO and co-founder of Fuzzy Labs, and this is the 8th edition of MLOps.WTF, a newsletter where I discuss topics in Machine Learning, AI, and MLOps.
Deploying spherical models to a frictionless vacuum
As the physicist Richard Feynman warned, there is a “serious disease” afflicting those who work with computers, one that hinders productivity: “The trouble with computers is that you ‘play’ with them!”
While mostly known for his contributions to quantum and particle physics, Feynman was also keenly interested in the young field of computer science. In the 1980s he gave a series of lectures at Caltech covering topics from Turing machines, logic gates, and compression, to quantum computers and programs that run on DNA molecules. These lectures were later compiled into a book, Feynman Lectures on Computation. My battered copy is among my most treasured possessions.
Through the lens of physics, Feynman reveals all kinds of insights into computer science. In one chapter, he masterfully reduces the entire field down to “simple” thermodynamics, and then proceeds to figure out a formula for the minimum amount of energy required to compute anything at all. And that means that for any kind of computer we might imagine, whether it runs on a silicon chip or a big lump of interconnected neurons, we have a fundamental lower bound on energy per computation step.
Don’t worry, I won’t share Feynman’s maths here — you’ll have to read the book for that. But this question of minimum energy is something we’ve been thinking about a lot recently, particularly when it comes to deploying machine learning models to power-constrained hardware.
One example came up earlier this year when we were working with a seaweed farming startup (no, seriously). They wanted to use AI to monitor their farms and provide early warning about threats: nearby boats, storms, pesky seagulls, and so on. So we trained a computer vision model and deployed it to a smart buoy our customer built, equipped with a camera and a Raspberry Pi.
The idea was to reduce site visits for inspection and maintenance, and make running the farms less expensive overall. Inspection entails getting in a boat and sailing out to the farm, so anything we could do to minimise trips was worthwhile. But since the smart buoy is battery-powered, if the model is too power-hungry then all we’d accomplish is replacing inspection trips with battery-change trips.
Our choice of hardware is certainly one factor in energy use. But models themselves also differ in how much energy they consume for inference, and decisions made when we design and train a model (the architecture, the number and size of its layers, the numeric precision) directly determine how much energy is used at inference time.
This isn’t a problem unique to seaweed farmers. Recently another customer asked whether, prior to training a model, we can accurately characterise its inference-time energy use. And if so, can we incorporate that information into an MLOps pipeline?
We didn’t have an immediate answer, but it piqued our curiosity. It turns out the first question was explored in a paper titled NeuralPower: Predict and Deploy Energy-Efficient Neural Networks, presented at the 2017 Asian Conference on Machine Learning.
The authors looked at convolutional neural networks, or CNNs, a popular architecture for computer vision models (particularly at the time; this is a 7-year-old paper, after all!), and they posed three questions:
How much energy is consumed for an inference made by a CNN?
Is it possible to predict this energy consumption before a model is even trained?
If yes, how should somebody select an energy-efficient CNN for deployment?
Their approach was to train a set of regression models that predict both power consumption and latency per model layer. A CNN is made up of different kinds of layers (convolutional, dense, pooling, and so on), each with its own performance characteristics. To collect training data, the authors ran off-the-shelf models like AlexNet on a particular GPU and measured the power drawn during inference for each model layer.
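To make that concrete, here’s a minimal sketch of the idea in Python. Everything below is illustrative: the layer features and measurements are invented, and the paper’s actual feature sets are richer than this, but the shape of the approach (polynomial regression per layer type, with energy estimated as power times latency) is the same.

```python
# A sketch of the NeuralPower idea: fit per-layer regressions for power and
# latency, then multiply to estimate energy. All numbers here are invented.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical measurements for convolutional layers. Each row is one layer
# configuration: [batch size, input size, kernel size, channels in, channels out]
X = np.array([
    [16, 224, 3, 3, 64],
    [16, 112, 3, 64, 128],
    [16, 56, 3, 128, 256],
    [32, 224, 5, 3, 64],
])
power_watts = np.array([18.2, 21.7, 24.1, 26.5])  # measured during inference
latency_ms = np.array([4.1, 6.3, 9.8, 12.4])      # measured during inference

# Polynomial regression with Lasso keeps only the terms that matter; you'd
# fit one pair of models like this per layer type (conv, dense, pooling).
power_model = make_pipeline(PolynomialFeatures(degree=2), Lasso(alpha=0.1))
latency_model = make_pipeline(PolynomialFeatures(degree=2), Lasso(alpha=0.1))
power_model.fit(X, power_watts)
latency_model.fit(X, latency_ms)

# Energy = power x time, so watts x milliseconds gives millijoules. Summing
# predicted energy over every layer gives a whole-network estimate, and none
# of it requires the candidate network to have been trained first.
layer = np.array([[16, 224, 3, 3, 64]])
energy_mj = power_model.predict(layer) * latency_model.predict(layer)
print(f"Predicted energy for this layer: {energy_mj[0]:.1f} mJ")
```

The appealing property is that the features come from the architecture alone, so you can compare candidate networks for energy efficiency before spending anything on training.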
So far, so good. But what about the MLOps question? We wanted a tangible tool that could be incorporated into a continuous train-test-deploy pipeline. The authors did release some code on GitHub, but it’s written in MATLAB, and I don’t think it’s unkind for me to describe it as not quite production-ready.
For the past few weeks, we’ve been collecting our own training dataset using Nvidia’s Jetson Orin platform, which is designed for edge-deployed machine learning applications, and re-implementing this paper in Python. We want to see whether we can reproduce the authors’ results, and at the same time understand how to engineer a meaningful tool out of it. Ultimately, we’d like to build an open source tool to help MLOps engineers better understand the energy impact of their models.
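For a flavour of the data collection side, here’s roughly how you can sample power draw on a Jetson while a model is running, by parsing the output of NVIDIA’s tegrastats utility. Treat this as a sketch rather than our actual tooling: rail names like VDD_GPU_SOC and the exact output format vary between Jetson boards and JetPack versions, so check what your own board prints first.

```python
# Sample per-rail power draw on a Jetson by parsing tegrastats output.
# Illustrative only: adjust the rail name and regex to your board's format.
import re
import subprocess

def sample_power_mw(n_samples=50, rail="VDD_GPU_SOC"):
    """Collect instantaneous power readings (in mW) for a named power rail."""
    proc = subprocess.Popen(
        ["tegrastats", "--interval", "100"],  # one stats line every 100 ms
        stdout=subprocess.PIPE,
        text=True,
    )
    readings = []
    try:
        for line in proc.stdout:
            # tegrastats prints entries like "VDD_GPU_SOC 3184mW/3102mW"
            # (instantaneous/average) amongst lots of other stats.
            match = re.search(rf"{rail} (\d+)mW", line)
            if match:
                readings.append(int(match.group(1)))
            if len(readings) >= n_samples:
                break
    finally:
        proc.terminate()
    return readings

# Kick off your inference loop in another process, then:
readings = sample_power_mw()
print(f"Mean power: {sum(readings) / len(readings):.0f} mW over {len(readings)} samples")
```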
MLOps is still a relatively new field. A lot of the time we’ll encounter a problem, just like this one, that we don’t immediately know how to solve. Often that’s simply because nobody has figured it out yet, and whenever that’s the case, there’s an opportunity for some fundamental research and genuine innovation.
The quadratic power
In fact, we think that the opportunity for innovation in MLOps (and in AI/ML tooling more broadly) is big enough that we’re building a research lab within Fuzzy Labs, aptly named Fuzzy Labs², for innovative tech projects, fundamental research into MLOps, and collaboration across industry and academia to advance the state of the art in production AI.
As you’ve no doubt inferred (pun intended), our project on model energy use is our starting point. As well as building a tool, we want to open source all of the things it takes to get there, so that means having an open dataset (with version control), open experiment tracking, and the code to train the predictive models that sit behind any tool.
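To give a flavour of what we mean by open experiment tracking, here’s a sketch using MLflow, one common open source option (the parameter names and metric values below are placeholders, not real results from our project).

```python
# Illustrative only: logging one training run of a power-prediction model
# with MLflow. Every name and number here is a placeholder.
import mlflow

with mlflow.start_run(run_name="conv-layer-power-model"):
    mlflow.log_param("device", "jetson-orin")
    mlflow.log_param("regression", "lasso-poly2")
    mlflow.log_metric("power_rmse_mw", 142.0)   # placeholder, not a real result
    mlflow.log_metric("latency_rmse_ms", 0.8)   # placeholder, not a real result
```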
That pattern reflects how we think an MLOps research lab should look more broadly too: working in the open, sharing our methods, our successes and our failures.
You can read more about Fuzzy Labs² here. We’d love your thoughts, feedback, and suggestions too.
MLOps.WTF Manchester - 28 November
We’ve got one more important announcement to make. We’re running our very first in-person MLOps.WTF event in Manchester, UK next week, on Thursday the 28th of November. There’ll be speakers from Peak AI, Naimuri, and elsewhere discussing real-life experiences in production machine learning.
We’d love to see you there. You can sign up by clicking here.
And finally
The internet has been abuzz this week with the story of world-renowned mathematician Terence Tao’s unconventional approach to proving a particular theorem, which resulted in his aunt walking in on him writhing around on the floor with his eyes closed as he visualised a transformation.
When you’re the best mathematician in the world, you can admit to these eccentricities. While I’m a big advocate of visualisation and finding new perspectives on problems, I’ve never quite taken it to these extremes… Though if it works, it’s something I’d be very happy to see adopted by the engineers at Fuzzy Labs!