Deploying in Cluster

The library can be used inside containers and deployed on Kubernetes clusters. This page describes the prerequisites and configuration details that help you deploy the library in your cluster.

Health checks

The library uses the Triton Inference Server to handle HTTP/gRPC requests. Triton Server provides endpoints to validate that the server is ready and in a healthy state. The following API endpoints can be used by your orchestrator to check the application's readiness and liveness:

  • Ready: /v2/health/ready
  • Live: /v2/health/live
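
For example, in a Kubernetes deployment these endpoints can back the container's readiness and liveness probes. A minimal sketch, assuming the default HTTP port 8000 described in the next section:

containers:
  - name: pytriton
    ...
    readinessProbe:
      httpGet:
        path: /v2/health/ready
        port: 8000
    livenessProbe:
      httpGet:
        path: /v2/health/live
        port: 8000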

Exposing ports

The library uses the Triton Inference Server, which exposes the HTTP, gRPC, and metrics ports for communication. In the default configuration, the following ports have to be exposed:

  • 8000 for HTTP
  • 8001 for gRPC
  • 8002 for metrics

If the library runs inside a Docker container, the ports can be exposed by passing extra arguments to the docker run command. For example, to publish the default ports:

docker run -p 8000:8000 -p 8001:8001 -p 8002:8002 {image}
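
Once the container is running, the health endpoint can be queried from the host to confirm that the server is reachable; a ready server responds with HTTP 200:

curl -v localhost:8000/v2/health/ready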

To deploy a container in Kubernetes, add a ports definition for the container to the YAML deployment configuration:

containers:
  - name: pytriton
    ...
    ports:
      - containerPort: 8000
        name: http
      - containerPort: 8001
        name: grpc
      - containerPort: 8002
        name: metrics
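
To make these ports reachable from other pods, a Service can reference the named container ports. A minimal sketch, assuming a hypothetical app: pytriton pod label:

apiVersion: v1
kind: Service
metadata:
  name: pytriton
spec:
  selector:
    app: pytriton   # hypothetical pod label
  ports:
    - name: http
      port: 8000
      targetPort: http
    - name: grpc
      port: 8001
      targetPort: grpc
    - name: metrics
      port: 8002
      targetPort: metrics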

Configuring shared memory

The connection between the Python callbacks and the Triton Inference Server uses shared memory to pass data between the processes. In a Docker container, the default amount of shared memory is 64 MB, which may not be enough to pass the model's input and output data. PyTriton initializes 16 MB of shared memory for the Proxy Backend at startup to pass input/output tensors between processes; additional memory is allocated dynamically. If allocation fails, the size of the available shared memory might need to be increased.

To increase the available shared memory size, pass an additional flag to the docker run command. An example of increasing the shared memory size to 8GB:

docker run --shm-size 8GB {image}

To increase the shared memory size for Kubernetes, the following configuration can be used:

spec:
  volumes:
    - name: shared-memory
      emptyDir:
        medium: Memory
  containers:
    - name: pytriton
      ...
      volumeMounts:
        - mountPath: /dev/shm
          name: shared-memory
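
Note that data written to a medium: Memory emptyDir volume counts against the container's memory limits. Kubernetes also accepts an optional sizeLimit field to cap the volume size; a minimal sketch (the 8Gi value is illustrative):

spec:
  volumes:
    - name: shared-memory
      emptyDir:
        medium: Memory
        sizeLimit: 8Gi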

Specifying the container init process

You can use the --init flag of the docker run command to run an init process as PID 1 in the container. An init process ensures that zombie processes are reaped inside the container, which is important when an unexpected error occurs in the scripts hosting PyTriton.
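
For example, to run the container with an init process:

docker run --init {image}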