Deploy ML Docker

# USE CASE

We've been able to deploy container images to AWS Lambda for a while, and earlier this year we also gained the ability to quickly attach a URL to a function. So let's deploy a containerized ML model and set up an endpoint for inference. We'll build out a project template and create a virtual environment to test locally, build a Docker image compatible with AWS Lambda and test it locally before deployment, and finally deploy to Lambda with Serverless and configure our URL endpoint.

# REQUIREMENTS

We'll need to have a few things installed: the AWS CLI, the Docker CLI, and Serverless. I'll use Pyenv and Poetry to scaffold the project and create a virtualenv for local testing.
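
Before we start, a quick round of version checks confirms everything is in place:

# terminal
aws --version
docker --version
serverless --version
pyenv --version
poetry --version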

# SETUP

Let's set up our folder structure:

# terminal
pyenv local 3.9.14
poetry new src
cd src
mkdir models
touch handler.py ./src/main.py
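
The scaffolded project should now look roughly like this (Poetry also generates a pyproject.toml, a README, and a tests folder; exact files vary a little by Poetry version):

# ./src (approximate layout)
├── pyproject.toml
├── handler.py
├── models/
├── src/
│   ├── __init__.py
│   └── main.py
└── tests/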

Let's add a few dependencies we'll need for our inference. Poetry will create a virtual environment for us and we'll see a .venv folder created.

# terminal
poetry add torch transformers sentencepiece sacremoses

# FUNCTIONAL CODE

For this example we'll use a Hugging Face model to translate some text from English to Spanish. In our main.py let's create a function to save that model locally and another function to run inference on some input text:

# ./src/src/main.py
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

def save_model_local():
  tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-es")
  model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-es")
  # save model locally
  tokenizer.save_pretrained("./models/Helsinki-NLP")
  model.save_pretrained("./models/Helsinki-NLP")

def inference(text = "Can you translate this for me?") -> str:
  # load the locally saved model into a translation pipeline
  translator = pipeline("translation", model="./models/Helsinki-NLP")
  output = translator(text)
  return output[0]['translation_text']

We'll want to try this out locally so let's create a couple scripts for Poetry to execute:

# ./src/pyproject.toml
[tool.poetry.scripts]
main = "src.main:inference"
save = "src.main:save_model_local"

We've pointed Poetry to the folder, file, and function to run. Then we can run these scripts by first starting up a shell with Poetry and using the run command:

# terminal
poetry shell
poetry run save
poetry run main
➜ ¿Puedes traducir esto para mí?

Beautiful, we know the model is correctly returning our translation. Let's quickly export our dependencies to a requirements.txt file:

# terminal
poetry export --without-hashes --format=requirements.txt > requirements.txt

# DOCKER CONFIG

And now we can setup our Dockerfile for deployment on AWS Lambda:

# ./Dockerfile
FROM public.ecr.aws/lambda/python:3.9.2022.11.30.08
# The AWS base images provide the following environment variables:
# LAMBDA_TASK_ROOT=/var/task
# LAMBDA_RUNTIME_DIR=/var/runtime

COPY ./src/requirements.txt .
RUN  pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

COPY src/src ${LAMBDA_TASK_ROOT}/src/
COPY src/models ${LAMBDA_TASK_ROOT}/models/
COPY src/handler.py ${LAMBDA_TASK_ROOT}

ENV PYTHONUNBUFFERED=1

CMD ["handler.translate"]

Notice we're using an AWS Lambda base image to ensure compatibility and to let us test the container locally with the built-in Runtime Interface Emulator (RIE). By default AWS provides the /var/task/ folder (exposed as LAMBDA_TASK_ROOT); this is where we need to install our dependencies, and it's also where we've copied our main code and model. For ease of use we'll create a handler file in the root folder, which will be the entry point for Lambda.

# ./src/handler.py
from src.main import inference
import json

def translate(event, context) -> str:
  if "body" in event:
    # function url
    body = json.loads(event['body'])
    req = body['text']
  else:
    # direct invocation
    req = event["text"]
  
  output = inference(req)
  return output

We want to handle a couple of situations: when we invoke via a function URL the text will arrive with the request in a body attribute; otherwise (local Docker testing or testing in the AWS console) we'll receive the text directly in the event. In either case we simply pass that text along to our inference call.
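
Since we'll be testing the container locally in a moment, it's convenient to keep that direct-invocation payload in a file. A minimal version of the src/tests/event.json file referenced below could be:

# ./src/tests/event.json
{
  "text": "I need to translate some text to spanish"
}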

After we build our image we'll need to push it somewhere so our Lambda function can reference it. So first we'll use the AWS CLI to create a repo in ECR:

# terminal
container=containerdeploy-dev-inference && \
aws ecr create-repository --repository-name $container --image-scanning-configuration scanOnPush=true
# {
#     "repository": {
#         "repositoryArn": "arn:aws:ecr:us-east-1:999999999999:repository/containerdeploy-dev-inference",
#         "registryId": "67xxxxxxxx11",
#         "repositoryName": "containerdeploy-dev-inference",
#         "repositoryUri": "999999999999.dkr.ecr.us-east-1.amazonaws.com/containerdeploy-dev-inference",
#         "createdAt": "2022-12-04T22:29:30-07:00",
#         "imageTagMutability": "MUTABLE",
#         "imageScanningConfiguration": {
#             "scanOnPush": true
#         },
#         "encryptionConfiguration": {
#             "encryptionType": "AES256"
#         }
#     }
# }

The repository has been created and it's given us a URI we can use. Before pushing the image we'll need to make sure Docker has access to this repo:

# terminal
uri=999999999999.dkr.ecr.us-east-1.amazonaws.com
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $uri
-> Login Succeeded

Now we can build our image, tag it, and push to our newly created ECR repo.

# terminal
# build the image, link it to the AWS ECR repository, and push it
tag="1.0" && \
uri="999999999999.dkr.ecr.us-east-1.amazonaws.com" && \
container="containerdeploy-dev-inference" && \
docker build . -t $container:$tag && \
docker tag $container:$tag $uri/$container:$tag && \
docker push $uri/$container:$tag

-> sha256:95xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx55751d

Things look good so far. We've successfully pushed this image to our ECR repository and it's provided us with a sha256 digest; hang on to this, we'll need it for our Serverless configuration. If we log into our AWS console and navigate to Elastic Container Registry we'll see our newly created repo with our custom image. Before we deploy this to Lambda it would be great to test the Docker image to make sure it's working. Since the AWS base image ships with RIE by default, we can run the container and invoke the function using a curl command with a specific URL format:

# terminal
# starting the container
docker run -p 9000:8080 $container:$tag
# you can invoke the function with data as below, this will be passed to the event object
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"text": "I need to translate some text to spanish"}'
# or conveniently you can point to a file
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d @src/tests/event.json
-> "Necesito traducir un texto al espa\u00f1ol."%
# and we'll see Lambda-like output from our container
-> from container
START RequestId: fcd54696-0009-47fc-b455-9297f5c3d8e5 Version: $LATEST
Necesito traducir un texto al español.
END RequestId: fcd54696
REPORT RequestId: fcd54696
	Init Duration: 0.17 ms
	Duration: 3100.61 ms
	Billed Duration: 3101 ms
	Memory Size: 3008 MB
	Max Memory Used: 3008 MB

# SERVERLESS DEPLOY

As a final step we'll put together a Serverless configuration to deploy a Lambda function with our container:

# ./serverless.yml
org: myorg
app: lambda
service: containerdeploy
frameworkVersion: '3'

custom:
  dockersha: sha256:95xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx55751d
  uri: 999999999999.dkr.ecr.us-east-1.amazonaws.com
  container: containerdeploy-dev-inference

provider:
  name: aws
  deploymentMethod: direct
  runtime: python3.9
  stage: dev
  region: us-east-1
  memorySize: 4096
  logRetentionInDays: 3
  iam:
    role:
      managedPolicies:
        - 'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess'

functions:
  inference:
    image: ${self:custom.uri}/${self:custom.container}@${self:custom.dockersha}
    timeout: 240
    # expose an HTTP endpoint by attaching a Lambda function URL
    url: true

# terminal
sls deploy --verbose

We've set a few custom variables to reference our image digest, the URI of our ECR repository, and the name of our container. Other than a few standard config values, we want to make sure our function has ECR privileges. In the functions section the critical part is to provide the path to our image for invocation, and url: true attaches the function URL endpoint we planned for. We can quickly deploy with sls deploy, and we can monitor our function metrics straight from the Serverless console.
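
Once the deploy finishes, Serverless prints the function URL (the address below is just a placeholder); we can hit it with the same payload we used locally and expect the same translation back:

# terminal
curl -XPOST "https://xxxxxxxxxxxxxxxxxxxxxxxxxx.lambda-url.us-east-1.on.aws/" \
  -H "Content-Type: application/json" \
  -d '{"text": "I need to translate some text to spanish"}'
-> "Necesito traducir un texto al español."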
