Amplemarket's mission is to help companies grow all around the world. At the core of this mission is making sales teams' lives easier, and Machine Learning is a big part of that.
Our Machine Learning team is still small, but we're able to achieve a great amount of scale. Our secret sauce? Keeping things simple until we can't anymore. One of the big reasons why we are able to maintain this dynamic is because we leverage the Public Cloud.
One Cloud, too many options.
Most Cloud providers offer multiple options for ML teams to deploy their software. These options fall into three big buckets: FaaS (Functions as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service). To simplify things, on Google Cloud this means: Cloud Functions (FaaS), Cloud Run (PaaS), and Compute Engine (IaaS). On a different Cloud? There's probably an equivalent service.
These three services differ in complexity, pricing structure, cost, and "elasticity". At Amplemarket, we usually start with the simplest: Cloud Functions (FaaS). Unfortunately, Machine Learning services usually require some sort of storage (e.g., for model weights) or access to a large amount of memory (e.g., to load a transformer model).
This is when Cloud Run comes into play. Cloud Run allows us to deploy our Machine Learning models as a Docker Container. This has several benefits:
- Cloud independence: Containers can run in any Public Cloud, and even in a private Data Center.
- Elasticity: If we get many requests, Google Cloud can spin up more containers to meet demand - without us having to touch the service.
- Cost-effectiveness: We only pay for the time the service is actually handling requests (whereas if you deploy to a Virtual Machine, you pay for the instance's entire life cycle).
But how exactly do we deploy our models to Cloud Run?
From model to microservice.
In a previous post we talked about how we serve our Machine Learning models with FastAPI. To ensure we can run our model in a Docker Container, we need to package it.
Through several iterations, we've found that the following structure tends to work well:
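The structure itself is missing from this version of the post; a layout along these lines (all file names illustrative) is typical for a containerized FastAPI service:

```
ml-microservice/
├── app/
│   ├── main.py        # FastAPI application and routes
│   └── model.py       # model loading and inference code
├── models/            # serialized model artifacts
├── requirements.txt
└── Dockerfile
```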
This structure is by no means fixed, but it has served us well: the API code, the model artifacts, and the deployment files each live in a predictable place, which keeps both local development and the Docker build straightforward.
Dockerizing the microservice
In the example above, we are packaging our application as a FastAPI microservice. This will expose the API to other teams at Amplemarket. But in order to deploy to Cloud Run, we need to package our app into a Docker Container.
Here's the contents of the Dockerfile:
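The original Dockerfile isn't shown here; a plausible reconstruction, assuming a FastAPI app served by uvicorn and the CPU-only PyTorch wheel mentioned below, would be:

```dockerfile
# Hypothetical Dockerfile: base image, module path, and port are illustrative.
FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
# CPU-only PyTorch wheel: much smaller than the default CUDA build.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu \
    && pip install --no-cache-dir -r requirements.txt

COPY . .

# Cloud Run tells the container which port to listen on via $PORT.
CMD exec uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8080}
```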
Bonus: If you're using PyTorch and don't need GPU access, the instruction above installs a much smaller version. It's the difference between a 5GB and a 2GB container.
To run the microservice locally, we start by building it:
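The build and run commands are missing from this version of the post; assuming the image name `ml-microservice` (illustrative), they would look like:

```shell
# Build the image from the Dockerfile in the current directory.
docker build -t ml-microservice .

# Run it locally, mapping the container port to localhost.
docker run -p 8080:8080 -e PORT=8080 ml-microservice
```

The API should then be reachable at http://localhost:8080.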
Deploying to Google Cloud Run
In order to deploy our container to Cloud Run, we first need to make sure we have the Google Cloud CLI (gcloud) installed. To do so, follow Google's official installation instructions.
Google Cloud Run launches services based on Containers. If we want to launch a new service, we first need to build and push a container to Google Cloud:
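The push command isn't shown here; one common approach is to let Cloud Build do the build and push in one step (`PROJECT_ID` and the image name are placeholders):

```shell
# Build the container remotely and push it to the project's registry.
gcloud builds submit --tag gcr.io/PROJECT_ID/ml-microservice
```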
Now that our container has been pushed, we need to tell Cloud Run to launch a service based on it:
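The deploy command is also missing; a typical invocation, with illustrative service name, region, and memory size, looks like this:

```shell
gcloud run deploy ml-microservice \
  --image gcr.io/PROJECT_ID/ml-microservice \
  --region us-central1 \
  --memory 2Gi \
  --allow-unauthenticated
```

Here `--image` points at the container we just pushed, `--region` picks where it runs, `--memory` sizes each instance (transformer models usually need more than the default), and `--allow-unauthenticated` makes the endpoint publicly reachable.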
This single command creates the service the first time and updates it on subsequent runs. There are many more options you can specify if needed (CPU, concurrency, min/max instances, and so on); `gcloud run deploy --help` lists them all.
In the end, this command also outputs the URL where your API is now running.
Continuously delivering our service.
Running the above commands by hand every time we want to deploy or update our API is not sustainable. Any reliable service needs some form of continuous delivery.
Using GitHub Actions, we can include a file in our repo that automatically deploys the service every time we update the code.
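The workflow file isn't shown in this version of the post; a sketch of such a workflow, using Google's official GitHub Actions (secret names, project ID, and service name are placeholders), might look like:

```yaml
# Hypothetical .github/workflows/deploy.yml
name: Deploy to Cloud Run
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Authenticate to Google Cloud with a service account key stored as a secret.
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - uses: google-github-actions/setup-gcloud@v2

      - name: Build and push container
        run: gcloud builds submit --tag gcr.io/${{ secrets.GCP_PROJECT }}/ml-microservice

      - name: Deploy to Cloud Run
        run: |
          gcloud run deploy ml-microservice \
            --image gcr.io/${{ secrets.GCP_PROJECT }}/ml-microservice \
            --region us-central1
```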
With the above file, every time you push to your repository, GitHub Actions will automatically build a new version of the container, push it, and update the Cloud Run service, so whatever is running in the Cloud is an exact replica of your repo.
If you've read this far, congrats! You've just learned how to deploy a Machine Learning microservice to Google Cloud Run. Google Cloud will now spin up as many instances of your container as required, depending on traffic. It also scales your app down when it's not being used, making it a great option for serving ML models.
But it's not all rainbows and unicorns. The service can take some time to wake up from a cold start; you can configure it to be "always on", but that obviously carries an increased cost. I recommend reviewing Cloud Run's configuration options to ensure everything is set up according to your needs.