Triton Model Navigator
Overview
Welcome to the Triton Model Navigator, an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. The Triton Model Navigator streamlines the process of moving models and pipelines implemented in PyTorch, TensorFlow, and/or ONNX to TensorRT.
The Triton Model Navigator automates several critical steps, including model export, conversion, correctness testing, and profiling. By providing a single entry point for various supported frameworks, users can efficiently search for the best deployment option using the per-framework optimize function. The resulting optimized models are ready for deployment on either PyTriton or Triton Inference Server.
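As a minimal sketch of that entry point for PyTorch (the toy model, input shape, and keyword arguments below are illustrative and may differ slightly between versions; analogous optimize functions exist for the other supported frameworks):

```python
import torch
import model_navigator as nav

# Toy model and dataloader purely for illustration; in practice, pass your
# own nn.Module and a small list of representative input samples.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 10))
dataloader = [torch.randn(1, 3, 224, 224) for _ in range(10)]

# Single call that runs export, conversion, correctness testing, and
# profiling, returning a package with the optimized model formats.
package = nav.torch.optimize(model=model, dataloader=dataloader)
```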
Features at a Glance
The distinct capabilities of the Triton Model Navigator are summarized in the feature matrix:
Feature | Description |
---|---|
Ease-of-use | A single line of code to run all possible optimization paths directly from your source code |
Wide Framework Support | Compatible with various machine learning frameworks including PyTorch, TensorFlow, and ONNX |
Model Optimization | Enhance the performance of models such as ResNet and BERT for efficient inference deployment |
Pipeline Optimization | Streamline Python code pipelines for models such as Stable Diffusion and Whisper using Inplace Optimization, available exclusively for PyTorch |
Model Export and Conversion | Automate the process of exporting and converting models between various formats, with a focus on TensorRT and Torch-TensorRT |
Correctness Testing | Ensure the converted model produces correct outputs by validating them against the original model |
Performance Profiling | Profile models to select the optimal format based on performance metrics such as latency and throughput, maximizing target hardware utilization |
Model Deployment | Automate model and pipeline deployment on PyTriton and the Triton Inference Server through a dedicated API |
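For the deployment step, one possible flow (a sketch assuming the `nav.package` helpers and the `package` returned by an optimize call such as the one above; exact names may differ between versions) is to serialize the optimization results and reload them in the serving environment:

```python
import model_navigator as nav

# Persist the optimization results (path is illustrative) so the serving
# environment does not need to re-run the optimization.
nav.package.save(package, "model.nav")

# In the serving environment, load the package and hand it to the
# dedicated PyTriton / Triton Inference Server deployment API.
package = nav.package.load("model.nav")
```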
Support Matrix for Frameworks
The Triton Model Navigator produces a range of optimized models ready for deployment. The table below shows the model formats that can be generated from each supported source framework.
Table: Supported conversion target formats per supported Python framework or file.
PyTorch | TensorFlow 2 | JAX | ONNX |
---|---|---|---|
Torch Compile | SavedModel | SavedModel | TensorRT |
TorchScript Trace | TensorRT in TensorFlow | TensorRT in TensorFlow | |
TorchScript Script | ONNX | ONNX | |
Torch-TensorRT | TensorRT | TensorRT | |
ONNX | | | |
TensorRT | | | |
Note: The Triton Model Navigator can also accept any Python function as input. In that case, it only profiles the function and does not generate any serialized models.
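For a serialized ONNX model, the entry point follows the same pattern; in this sketch the file name, input name, and dataloader layout are assumptions:

```python
import numpy as np
import model_navigator as nav

# Samples are provided as dictionaries keyed by the ONNX model's input
# names (here a single input called "input"; adjust for your model).
dataloader = [
    {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
    for _ in range(10)
]

package = nav.onnx.optimize(model="model.onnx", dataloader=dataloader)
```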
The Inplace Optimize feature is dedicated to PyTorch: it optimizes pipelines by patching their nn.Modules and converting them to optimized formats such as TensorRT. The table below highlights the possible optimization paths for Inplace Optimize:
Table: Supported conversion target formats for Inplace Optimize.
PyTorch |
---|
Torch Compile |
TorchScript Trace |
TorchScript Script |
Torch-TensorRT |
ONNX |
TensorRT |
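A rough sketch of Inplace Optimize is shown below; the toy pipeline and the dataloader layout are assumptions, while the wrapping pattern follows the nav.Module / nav.optimize API:

```python
import torch
import model_navigator as nav

# Toy two-stage pipeline standing in for e.g. a Stable Diffusion or
# Whisper pipeline (purely for illustration).
class Pipeline(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = torch.nn.Linear(16, 16)
        self.decoder = torch.nn.Linear(16, 4)

    def forward(self, x):
        return self.decoder(self.encoder(x))

pipe = Pipeline()

# Wrap the nn.Modules that should be optimized in place; the surrounding
# Python code of the pipeline stays untouched.
pipe.encoder = nav.Module(pipe.encoder, name="encoder")
pipe.decoder = nav.Module(pipe.decoder, name="decoder")

# Run the pipeline through optimize with representative inputs so each
# wrapped module can be exported, converted, and profiled.
# (The dataloader layout here is an assumption; consult the docs for the
# exact expected format.)
nav.optimize(pipe, dataloader=[torch.randn(1, 16) for _ in range(10)])
```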
What next?
To learn more about using the Triton Model Navigator, see the Quick Start guide, which covers optimizing models and serving inference.