Using AWS Lambda

How we deploy Machine Learning models at RavenPack

At RavenPack, our team of developers integrates all the NLP and Machine Learning innovations we create to help our customers capture the insights they need to extract value in real time. As Machine Learning engineers, however, there are occasions when we want to test our models in the real world, with real data, and receive real feedback from the market without releasing a new product version. We therefore need an efficient, cost-effective and scalable way to deploy these models so they can be tested by the rest of the teams at the company, and by our clients. This is where AWS Lambda comes in.

AWS Lambda

AWS Lambda is best-in-class when it comes to serverless computing on AWS. Fast, reliable and traceable, it offers everything we need and is one of the building blocks used by other teams at RavenPack.

Unfortunately for us, its deployment package size limits meant we were unable to deploy our machine learning libraries and models. At least, that was the case until the last re:Invent.

In November 2020, AWS announced Container Image support for Lambda, allowing images of up to 10 GB, and we enthusiastically seized the opportunity.

Our use case

Once this technical limitation was overcome we wanted to be able to:

  • Quickly deploy models on the AWS Cloud to receive market feedback.
  • Test the behavior locally for debugging.
  • Create a scalable, cost-effective and efficient framework.

In this article, we will share how we did it and greatly reduced our Time to Innovate window.

Our ML model

In this use case, our machine learning model was a classifier for a Computer Vision task. We used transfer learning, with a classifier head on top of VGG19, and TensorFlow 2 as the framework.
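
As a rough illustration of this setup, here is a minimal transfer learning sketch in standard Keras; the input size, head layers and number of classes below are placeholder assumptions, not our production configuration.

import tensorflow as tf

num_classes = 10  # placeholder; the real number of classes is task-specific

# Pre-trained VGG19 convolutional base, frozen so that only the new head is trained
base = tf.keras.applications.VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# Small classifier head on top of the convolutional features
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])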

This is not crucial, but to give an idea of the latencies: on our machine (i7-10700K @ 3.8 GHz, 32 GB DDR4 @ 2400 MHz), CPU inference times were around 500 ms/image.

Automating the workflow

We created a simple bash script to orchestrate the pipeline, from building the Docker image to uploading it to ECR.

All good things must come… to a docker container

When we launch the bash script, the following steps are done:

  • The previous deployment is removed, ensuring that old models are not accidentally reused.
  • The lambda folder is created.
  • The requirements.txt, Dockerfile and machine learning models are copied to their respective paths.

Once this is done, the next step in the bash script builds the Docker image. This simply downloads, or reuses, the AWS Lambda Python 3.8 base image and installs the required packages into the image.

Finally, it sets the handler function as the entry point, which AWS Lambda requires in order to invoke our code.
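
For reference, the handler Lambda ends up calling is just an ordinary Python function with the usual (event, context) signature. Below is a minimal sketch; the module name (app.py), the base64-encoded "image" field and the returned payload are illustrative assumptions, not our exact interface.

# app.py - the module and function that the image's CMD points to (e.g. "app.handler")
import base64
import json

def handler(event, context):
    # Assumed payload format: a base64-encoded image under an "image" key
    image_bytes = base64.b64decode(event["image"])
    # ... preprocessing and model inference would go here ...
    result = {"label": "placeholder", "score": 0.0}
    return {"statusCode": 200, "body": json.dumps(result)}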

Test the docker image in the local environment

Uploading the image, setting it up and testing it on AWS would have been time-consuming, especially with bugs still lurking in our code, so it was preferable to test the image locally before submitting it to ECR.

For this, the Lambda base image ships with the Runtime Interface Emulator, which reproduces Lambda's behavior locally. We only had to run the Docker image and send our requests to the local endpoint.

Let’s run it by mapping the container's port 8080 to our local port 9000, running in a terminal:

sudo docker run -p 9000:8080 --env MY_ENVIRONMENT_VARIABLES lambda_deployment:latest

This started a server in much the same way that running a Flask app locally would.

Now we could start debugging from Python, or any other language, and iterate.

Unfortunately, we had to regenerate the image for every bug fix we made. For that reason, it was more convenient to use docker-compose for our Lambda deployments: we would build our own base image, containing the AWS Lambda base image plus all the packages in the requirements.txt file, and then use it as the base image so that only the handler function or the models needed to change.

Regenerating the image usually took around 3 minutes, but with this approach we brought it down to less than 30 seconds.

Below is a sample request sent to our model using Python.
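
This is a minimal sketch using the requests library; the invocation path is the one exposed by the Runtime Interface Emulator on the port mapped above, while the base64-encoded "image" field and the test file name are assumptions that must match whatever the handler expects.

import base64
import requests

# Invocation endpoint exposed by the Lambda Runtime Interface Emulator,
# reachable on port 9000 as mapped in the docker run command above
url = "http://localhost:9000/2015-03-31/functions/function/invocations"

with open("test_image.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("utf-8")}

response = requests.post(url, json=payload)
print(response.json())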

On every call the container returns the result, and the terminal also reports the total duration, the billed duration and the resources used. This is useful for sizing our Lambda accordingly: otherwise we would either over-provision resources at additional expense, or risk long inference times and errors.

In our case, latency initially rose to 1200 ms, but after refactoring the code and loading the model outside the handler body it came down to 800 ms, a good tradeoff for going serverless!
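
The refactor itself is the usual Lambda pattern of paying the expensive work once per container rather than once per request: load the model at module import time, not inside the handler. Here is a sketch under the same assumptions as above; the model path is illustrative and depends on where the Dockerfile copies the model.

import base64
import json
import numpy as np
import tensorflow as tf

# Loading the model at import time means it happens once per container (at cold start),
# not on every invocation inside the handler
model = tf.keras.models.load_model("/var/task/model")

def handler(event, context):
    image_bytes = base64.b64decode(event["image"])
    image = tf.image.resize(tf.io.decode_image(image_bytes, channels=3), (224, 224)) / 255.0
    probs = model.predict(tf.expand_dims(image, 0))[0]
    return {"statusCode": 200,
            "body": json.dumps({"class": int(np.argmax(probs)), "score": float(np.max(probs))})}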

Once everything is debugged, we can submit our image to ECR.

Submitting the image to ECR

We first needed to create an ECR repository, for example via the AWS console; in this case, ravenpack-test.

Once everything had been debugged and we knew for sure the image worked, it was time to upload it to ECR. In the bash file shown above, we just needed to uncomment the last three lines to retag the image and upload it to AWS ECR. To avoid an upload error, it is very important to authenticate your Docker client before pushing the image, like this:

# Authenticate the local Docker client against the ECR registry before pushing
aws ecr get-login-password --region <region> | sudo docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com

Then, be sure to tag the image correctly and upload it to the repository. Although the cost of keeping several images on ECR is not very high, we advise checking beforehand that the image behaves as expected locally, and only uploading when necessary.

image_name=$aws_account_id.dkr.ecr.$aws_region.amazonaws.com/ravenpack-test # full name of the image in the ECR repository
sudo docker tag lambda_deployment $image_name # retag the local image with the ECR repository name
sudo docker push $image_name # push the image to ECR

Load the image in AWS Lambda

Now that the image is in ECR, we just need to create a new Lambda function and select the Container Image option.
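
This can be done entirely from the console; as a sketch, the same step with boto3 would look roughly like the following, where the function name, role, memory and timeout are illustrative placeholders rather than our production values.

import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_function(
    FunctionName="ravenpack-test-classifier",  # illustrative name
    PackageType="Image",  # deploy from a container image instead of a zip package
    Code={"ImageUri": "<aws_account_id>.dkr.ecr.<region>.amazonaws.com/ravenpack-test:latest"},
    Role="arn:aws:iam::<aws_account_id>:role/<lambda-execution-role>",
    MemorySize=2048,  # MB; size it based on the reports seen during local testing
    Timeout=30,  # seconds
)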

Conclusions

In this article we have shown the steps to serve a Machine Learning model using AWS Lambda and its new container image feature.

This proves to be a cost-effective solution: at 100 inferences/day (roughly 3,000 per month), the total cost would be less than US$1/month, around US$0.50 in fixed hosting costs plus US$0.15 per 1,000 inferences (about US$0.45 in variable costs).