# Deployment on PyTriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. In general, PyTriton can serve any Python function. The Triton Model Navigator provides a runner - an abstraction that connects the model checkpoint with its runtime, making the inference process more accessible and straightforward. The runner is a Python API through which an optimized model can serve inference.
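To illustrate the Flask/FastAPI-like style, here is a minimal sketch of serving a plain Python function with PyTriton, independent of the Model Navigator; the model name `add_one`, the tensor names, and the batch size are illustrative choices, not fixed by the library:

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def add_one(**inputs):
    # A plain Python function served as a model; inputs arrive as
    # batched NumPy arrays keyed by input name.
    return {"output": inputs["input"] + 1}


with Triton() as triton:
    triton.bind(
        model_name="add_one",
        infer_func=add_one,
        inputs=[Tensor(name="input", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()
```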
## Obtaining a runner from a Package

The Navigator Package provides an API for obtaining the model for serving inference. One of the options is to obtain the runner directly from the package.
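A minimal sketch, assuming `package` is a Navigator Package saved during an earlier optimize step and loaded from disk:

```python
import model_navigator as nav

# Load a previously saved Navigator Package; the file name is illustrative.
package = nav.package.load("model.nav")

# Obtain the runner; a selection strategy such as
# nav.MaxThroughputStrategy() can also be passed explicitly.
runner = package.get_runner()
```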
The default behavior is to select the model and runner that obtained the smallest latency and the largest throughput during profiling. This runner is considered the most suitable for serving inference queries. Learn more about the `get_runner` method in the Navigator Package API.
To use the runner in PyTriton, additional information about the served model is required. For that purpose, we provide a `PyTritonAdapter` that contains the minimal information required to successfully deploy a model using PyTriton.
## Using PyTritonAdapter

The Triton Model Navigator provides a dedicated `PyTritonAdapter` to retrieve the runner and other information required to bind a model for serving inference. Following that, you can initialize the PyTriton server using the adapter information:
```python
import model_navigator as nav

from pytriton.decorators import batch
from pytriton.triton import Triton

# `package` is a Navigator Package obtained from an earlier optimize or load step.
pytriton_adapter = nav.pytriton.PyTritonAdapter(package=package, strategy=nav.MaxThroughputStrategy())
runner = pytriton_adapter.runner

runner.activate()


@batch
def infer_func(**inputs):
    return runner.infer(inputs)


with Triton() as triton:
    triton.bind(
        model_name="resnet50",
        infer_func=infer_func,
        inputs=pytriton_adapter.inputs,
        outputs=pytriton_adapter.outputs,
        config=pytriton_adapter.config,
    )
    triton.serve()
```
Once the Python script is executed, the model inference is served through HTTP/gRPC endpoints.
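For example, the deployed model can then be queried with the PyTriton client; this is a minimal sketch in which the input name `image`, its shape, and the default HTTP endpoint `localhost:8000` are assumptions that depend on your model and server configuration:

```python
import numpy as np

from pytriton.client import ModelClient

# Query the model served above; the input name and shape are illustrative.
with ModelClient("localhost:8000", "resnet50") as client:
    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
    result = client.infer_batch(image=batch)
    print(result)
```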
Read more about the adapter API and deployment configuration.