Triton Model Navigator
The Triton Model Navigator automates the process of moving a model from source to deployment on Triton Inference Server. The tool validates possible export and conversion paths to serializable formats like TensorRT and selects the most promising format for production deployment.
How it works?
The Triton Model Navigator is designed to provide a single entry point for each supported framework. Usage is as simple as calling a dedicated `optimize` function, which starts the search for the best possible deployment by going through a broad spectrum of model conversions. Internally, `optimize` performs model export, conversion, correctness testing, and performance profiling, and saves all generated artifacts in the `navigator_workspace`, which is represented by a returned `package` object.
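As a minimal sketch of the PyTorch path (the exact entry point and arguments may differ between versions of the library; the toy model and dataloader below are placeholders):

```python
import torch
import model_navigator as nav

# Toy model and sample dataloader used to drive export, conversion,
# correctness testing, and profiling.
model = torch.nn.Linear(10, 10).eval()
dataloader = [torch.randn(1, 10) for _ in range(10)]

# Run the optimize pipeline; the returned package object describes
# the artifacts stored in the navigator workspace.
package = nav.torch.optimize(model=model, dataloader=dataloader)
```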
The result of the `optimize` process can be saved as a portable Navigator Package with the `save` function. Saved packages contain only the base model formats along with the best format selected based on latency and throughput. The package can be reused to recreate the process on the same or different hardware. The configuration and execution status are saved in the `status.yaml` file located inside the workspace and the Navigator Package.
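A short sketch of saving the package produced above (the `nav.package.save` call and the output file name are assumptions and may vary by version):

```python
# Persist the package (base formats plus the best selected format)
# as a portable archive that can be restored on other hardware.
nav.package.save(package=package, path="linear.nav")
```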
Finally, the Navigator Package can be used for model deployment on Triton Inference Server. A dedicated API helps obtain all necessary parameters and create the `model_repository`, or retrieve the optimized model for inference in a Python environment.
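For example, a deployment sketch under the assumption that the Triton helper is exposed as `nav.triton.model_repository.add_model_from_package` (the helper name, paths, and model name here are illustrative, not confirmed):

```python
import pathlib

# Generate a model_repository entry on disk from the optimized package,
# ready to be served by Triton Inference Server.
nav.triton.model_repository.add_model_from_package(
    model_repository_path=pathlib.Path("model_repository"),
    model_name="linear",
    package=package,
)
```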