Deploying a model on the Triton Inference Server

Triton Model Navigator provides an API for working with the Triton model repository. Currently, we support adding your own model or a pre-selected model from a Navigator Package.

The API exposes only the functionality applicable to the given model type and performs offline validation of the provided configuration. In the end, the model and its configuration are created inside the provided model repository path.

Adding your own model to the Triton model repository

When you work with an already exported model, you can provide the path to where the model is located. Then you can use one of the specialized APIs that guide you through the options available for deploying the selected model type.

Example of deploying a TensorRT model:

import model_navigator as nav

nav.triton.model_repository.add_model(
    model_repository_path="/path/to/triton/model/repository",
    model_path="/path/to/model/plan/file",
    model_name="NameOfModel",
    config=nav.triton.TensorRTModelConfig(
        max_batch_size=256,
        optimization=nav.triton.CUDAGraphOptimization(),
        response_cache=True,
    )
)
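Other model types follow the same pattern through their dedicated configuration classes. Below is a minimal sketch of deploying an ONNX model, assuming your Model Navigator version exposes nav.triton.ONNXModelConfig; consult the API reference for the exact class names and options available for your model type.

import model_navigator as nav

# Illustrative paths and model name; adjust to your environment.
nav.triton.model_repository.add_model(
    model_repository_path="/path/to/triton/model/repository",
    model_path="/path/to/model/onnx/file",
    model_name="NameOfModel",
    config=nav.triton.ONNXModelConfig(
        max_batch_size=256,  # assumed option, mirroring the TensorRT example above
    ),
)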

The model directory with the model file and configuration is created inside model_repository_path.
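For the TensorRT example above, the resulting layout follows the standard Triton model repository convention (model name, numeric version directory, config.pbtxt); the file names shown below are illustrative:

/path/to/triton/model/repository/
└── NameOfModel/
    ├── config.pbtxt        # generated Triton model configuration
    └── 1/                  # default model version directory
        └── model.plan      # the provided TensorRT plan file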

Adding a model from a package to the Triton model repository

When you want to deploy a model from a package created during the optimize process, you can use:

import model_navigator as nav

nav.triton.model_repository.add_model_from_package(
    model_repository_path="/path/to/triton/model/repository",
    model_name="NameOfModel",
    package=package,
)

The model is automatically selected based on profiling results. The default selection options can be adjusted by changing the strategy argument.
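As a sketch, assuming your Model Navigator version exposes selection strategy classes such as nav.MaxThroughputStrategy and nav.MinLatencyStrategy, the default selection can be overridden like this:

import model_navigator as nav

# Prefer the lowest-latency model from the profiling results
# instead of the default selection strategy.
nav.triton.model_repository.add_model_from_package(
    model_repository_path="/path/to/triton/model/repository",
    model_name="NameOfModel",
    package=package,
    strategy=nav.MinLatencyStrategy(),  # assumed strategy class; consult the API reference
)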

Using Triton Model Analyzer

A model added to the Triton Inference Server can be further optimized in the target environment using Triton Model Analyzer.

Please follow the documentation to learn more about how to use the Triton Model Analyzer.