Deploying in a Cluster
The library can be used inside containers and deployed on Kubernetes clusters. This page covers the prerequisites and configuration details that help you deploy the library in your cluster.
Health checks
The library uses the Triton Inference Server to handle HTTP/gRPC requests. Triton Server provides endpoints for validating that the server is ready and in a healthy state. The following API endpoints can be used by your orchestrator to monitor the application's readiness and liveness:
- Ready: `/v2/health/ready`
- Live: `/v2/health/live`
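For example, these endpoints can back Kubernetes readiness and liveness probes. The following is a minimal sketch, assuming the default Triton HTTP port 8000 (see the next section) and the container layout used elsewhere on this page:

```yaml
containers:
  - name: pytriton
    ...
    readinessProbe:
      httpGet:
        path: /v2/health/ready
        port: 8000   # default Triton HTTP port
    livenessProbe:
      httpGet:
        path: /v2/health/live
        port: 8000
```

You can also verify a running server directly, for example with `curl -v localhost:8000/v2/health/ready`; the endpoint returns HTTP status 200 when the server is ready.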
Exposing ports
The library uses the Triton Inference Server, which exposes the HTTP, gRPC, and metrics ports for communication. In the default configuration, the following ports have to be exposed:
- 8000 for HTTP
- 8001 for gRPC
- 8002 for metrics
If the library runs inside a Docker container, the ports can be exposed by passing extra arguments to the `docker run` command. An example of passing the ports configuration:
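A minimal sketch, where the image name `my-pytriton-image` is a placeholder:

```bash
# Map the default HTTP, gRPC, and metrics ports to the host
docker run -p 8000:8000 -p 8001:8001 -p 8002:8002 my-pytriton-image
```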
To deploy a container on Kubernetes, add a ports definition for the container in the YAML deployment configuration:
```yaml
containers:
  - name: pytriton
    ...
    ports:
      - containerPort: 8000
        name: http
      - containerPort: 8001
        name: grpc
      - containerPort: 8002
        name: metrics
```
Configuring shared memory
The connection between the Python callbacks and the Triton Inference Server uses shared memory to pass data between the processes. In a Docker container, the default amount of shared memory is 64MB, which may not be enough to pass the model's input and output data. PyTriton initializes 16MB of shared memory for the Proxy Backend at startup to pass input/output tensors between processes; additional memory is allocated dynamically. If allocation fails, the size of the available shared memory might need to be increased.
To increase the available shared memory size, pass an additional flag to the `docker run` command. An example of increasing the shared memory size to 8GB:
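A minimal sketch using Docker's `--shm-size` flag; the image name is a placeholder:

```bash
# Grow /dev/shm inside the container from the 64MB default to 8GB
docker run --shm-size 8GB my-pytriton-image
```

On Kubernetes, the shared memory size can be increased by mounting an in-memory `emptyDir` volume at `/dev/shm`: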
```yaml
spec:
  volumes:
    - name: shared-memory
      emptyDir:
        medium: Memory
  containers:
    - name: pytriton
      ...
      volumeMounts:
        - mountPath: /dev/shm
          name: shared-memory
```
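Note that the contents of a memory-backed `emptyDir` volume count against the container's memory limit; an optional `sizeLimit` field on the volume can cap how much shared memory the container may consume.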
Specifying the container init process
You can use the `--init` flag of the `docker run` command to run an init process as PID 1 in the container. Specifying an init process ensures that zombie processes are reaped inside the container, which is important when an unexpected error occurs in the scripts hosting PyTriton.
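A minimal sketch; the image name is a placeholder:

```bash
# Run a lightweight init process as PID 1 so zombie processes are reaped
docker run --init my-pytriton-image
```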