Quick Start

This Quick Start covers optimizing a model, deploying it for inference on PyTriton or Triton Inference Server, and using the Navigator Package. Each section links to further documentation on the relevant Model Navigator features.

Optimize Model

Optimizing a model with Model Navigator is as simple as calling the optimize function. The optimization process requires at least:

model - a Python object, callable, or file path pointing to the model to optimize.
dataloader - a method or class that generates input data. The data is used to determine the minimum and maximum shapes of the model inputs and to create the output samples used during optimization.
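
To make the role of the dataloader more concrete, the sketch below shows one way minimum and maximum input shapes could be derived from a set of samples. It illustrates the idea only and is not Model Navigator's internal code; the `shape_range` helper and the sample shapes are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Shape = Tuple[int, ...]


def shape_range(shapes: Iterable[Sequence[int]]) -> Tuple[Shape, Shape]:
    """Return the per-axis (min_shape, max_shape) over all sample shapes."""
    shapes = [tuple(s) for s in shapes]
    if not shapes:
        raise ValueError("dataloader produced no samples")
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        raise ValueError("all samples must have the same number of dimensions")
    # zip(*shapes) groups the sizes axis by axis.
    min_shape = tuple(min(axis) for axis in zip(*shapes))
    max_shape = tuple(max(axis) for axis in zip(*shapes))
    return min_shape, max_shape


# Hypothetical sample shapes: only the batch dimension varies.
sample_shapes = [(1, 3, 256, 256), (4, 3, 256, 256), (8, 3, 256, 256)]
min_shape, max_shape = shape_range(sample_shapes)
print(min_shape, max_shape)
```

A dataloader whose samples all share one shape, like the example in this section, would yield identical minimum and maximum shapes.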

Here is an example of running optimize on the Torch Hub ResNet50 model:

import torch
import model_navigator as nav

package = nav.torch.optimize(
    model=torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True).eval(),
    dataloader=[torch.randn(1, 3, 256, 256) for _ in range(10)],
)

Once the model has been optimized, the created artifacts are stored in navigator_workspace and the function returns a Package object. Read more about optimize in the documentation.
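
If you want to see what the optimization left on disk, a small helper like the one below can list the files under the workspace directory. This is a sketch using only the standard library; `list_artifacts` is a hypothetical helper, and it makes no assumption about which files Model Navigator actually writes.

```python
import pathlib
from typing import List


def list_artifacts(workspace: str = "navigator_workspace") -> List[str]:
    """Return workspace-relative paths of all files under the workspace."""
    root = pathlib.Path(workspace)
    if not root.is_dir():
        # Nothing has been optimized in this location yet.
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


for artifact in list_artifacts():
    print(artifact)
```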

Deploy model in PyTriton

PyTriton can serve inference for any optimized format. Model Navigator provides a dedicated PyTritonAdapter to retrieve the runner and other information required to bind a model for serving inference. The runner is an abstraction that connects the model checkpoint with its runtime, making the inference process more accessible and straightforward.

Following that, you can initialize the PyTriton server using the adapter information:

from pytriton.decorators import batch
from pytriton.triton import Triton

# `nav` and `package` come from the optimize step above.
pytriton_adapter = nav.pytriton.PyTritonAdapter(package=package, strategy=nav.MaxThroughputStrategy())
runner = pytriton_adapter.runner

# Activate the runner before serving requests.
runner.activate()


@batch
def infer_func(**inputs):
    return runner.infer(inputs)


with Triton() as triton:
    triton.bind(
        model_name="resnet50",
        infer_func=infer_func,
        inputs=pytriton_adapter.inputs,
        outputs=pytriton_adapter.outputs,
        config=pytriton_adapter.config,
    )
    triton.serve()
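
The runner idea can be pictured with a toy class: something that loads a model into a runtime on activation and then answers inference calls. The sketch below is a hypothetical stand-in, not Model Navigator's runner class; `ToyRunner`, `load_doubling_model`, and the `deactivate` method are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

Inputs = Dict[str, List[float]]
Model = Callable[[Inputs], Inputs]


class ToyRunner:
    """Hypothetical runner: binds a model loader to a simple runtime."""

    def __init__(self, load_model: Callable[[], Model]) -> None:
        self._load_model = load_model
        self._model: Optional[Model] = None

    def activate(self) -> None:
        # Load the model once, before any inference request arrives.
        self._model = self._load_model()

    def deactivate(self) -> None:
        # Drop the model so its resources can be released.
        self._model = None

    def infer(self, inputs: Inputs) -> Inputs:
        if self._model is None:
            raise RuntimeError("call activate() before infer()")
        return self._model(inputs)


def load_doubling_model() -> Model:
    # Hypothetical "checkpoint": a model that doubles every input value.
    return lambda inputs: {name: [2 * v for v in values] for name, values in inputs.items()}


toy_runner = ToyRunner(load_doubling_model)
toy_runner.activate()
result = toy_runner.infer({"x": [1, 2, 3]})
toy_runner.deactivate()
print(result)
```

Keeping loading separate from inference lets the serving code activate the model once and reuse it across requests, which mirrors the order of calls in the PyTriton example.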

Read more about deploying a model on PyTriton in the documentation.

Deploy model in Triton Inference Server

The optimized model can also be served on Triton Inference Server once a serialized format has been created. Model Navigator provides functionality to generate a model deployment configuration directly inside the Triton model_repository. The following command selects the model format with the highest throughput and creates the Triton deployment in the given model repository path:

import pathlib

nav.triton.model_repository.add_model_from_package(
    model_repository_path=pathlib.Path("model_repository"),
    model_name="resnet50",
    package=package,
    strategy=nav.MaxThroughputStrategy(),
)
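
Conceptually, a maximum-throughput strategy picks the format whose measured throughput is largest. The sketch below shows that selection on made-up numbers. It is not the implementation of nav.MaxThroughputStrategy; the `pick_max_throughput` helper, the format names, and the throughput values are all hypothetical.

```python
from typing import Dict


def pick_max_throughput(results: Dict[str, float]) -> str:
    """Return the format name with the highest measured throughput."""
    if not results:
        raise ValueError("no profiling results to choose from")
    return max(results, key=results.get)


# Hypothetical profiling results in inferences per second (made-up values).
profiled = {"torchscript": 410.0, "onnx": 520.0, "tensorrt": 1340.0}
best = pick_max_throughput(profiled)
print(best)
```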

Once the entry is created, you can start Triton Inference Server with the defined model_repository_path mounted.
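
Before starting the server, it can help to confirm that the repository contains the expected entry. Triton model repositories place each model in its own directory with numbered version subdirectories, so a quick check might look like the sketch below. The `find_models` helper is hypothetical and only inspects directory names.

```python
import pathlib
from typing import List


def find_models(model_repository_path: str) -> List[str]:
    """List model directories that contain at least one numeric version folder."""
    root = pathlib.Path(model_repository_path)
    if not root.is_dir():
        return []
    models = []
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # A version folder is a subdirectory whose name is a number, e.g. "1".
        has_version = any(v.is_dir() and v.name.isdigit() for v in model_dir.iterdir())
        if has_version:
            models.append(model_dir.name)
    return models


print(find_models("model_repository"))
```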

Read more about deploying a model on Triton Inference Server in the documentation.

Using Navigator Package

The Navigator Package is an artifact that can be produced at the end of the optimization process. It is a Zip file containing the optimization details, model metadata, and serialized formats, and can be saved using:

nav.package.save(
    package=package,
    path="/path/to/package.nav"
)
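
Because the package is a Zip file, its contents can be listed with Python's standard zipfile module. The sketch below assumes nothing about the file names inside the archive; `list_package_contents` is a hypothetical helper, and the path is the placeholder from the example above.

```python
import zipfile
from typing import List


def list_package_contents(path: str) -> List[str]:
    """Return the sorted names of all entries in a Zip-based package."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a Zip archive")
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())


# Example call, using the placeholder path from the save example:
# print(list_package_contents("/path/to/package.nav"))
```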

The package can be loaded on other machines and used to re-run the optimization process or profile the model. Read more about using the package in the documentation.