Changelog

0.4.0

new: optimize method that replaces export and performs a max batch size search with improved profiling during the process
new: Introduced custom configs in optimize for better parametrization of export/conversion commands
new: Support for adding user runners for model correctness and profiling
new: Search for max possible batch size per format during conversion and profiling
new: API for creating Triton model store from Navigator Package and user provided models
change: Improved status structure for Navigator Package
deprecated: Optimize for Triton Inference Server support
deprecated: HuggingFace contrib module
Bug fixes and other improvements
Version of external components used during testing:
- PyTorch 1.14.0a0+410ce96
- TensorFlow 2.11.0
- TensorRT 8.5.2.2
- ONNX Runtime 1.13.1
- Polygraphy: 0.43.1
- GraphSurgeon: 0.4.6
- tf2onnx v1.13.0
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
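
The changelog does not describe how the max batch size search is implemented. As a rough illustration only, one common approach is to double the batch size until a run fails and then bisect between the last success and the first failure. In the sketch below, `find_max_batch_size` and the `try_batch` callback are hypothetical names and are not part of the Model Navigator API.

```python
from typing import Callable


def find_max_batch_size(try_batch: Callable[[int], bool], limit: int = 65536) -> int:
    """Return the largest batch size in [1, limit] for which try_batch succeeds.

    Assumes success is monotonic: once a batch size fails, every larger
    batch size fails too. Returns 0 when even batch size 1 fails.
    """
    if not try_batch(1):
        return 0

    low = 1   # largest batch size known to succeed
    high = 2  # next candidate to probe

    # Phase 1: double the candidate until it fails or exceeds the limit.
    while high <= limit and try_batch(high):
        low = high
        high *= 2

    # If the limit was exceeded without a failure, treat limit + 1 as failing.
    if high > limit:
        high = limit + 1

    # Phase 2: bisect between the last success and the first failure.
    while high - low > 1:
        mid = (low + high) // 2
        if try_batch(mid):
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    # Hypothetical stand-in for a real inference run that fails above 48 samples.
    print(find_max_batch_size(lambda batch_size: batch_size <= 48))
```

In a real pipeline the callback would run the converted model on a batch of the given size and report whether it completed, for example without running out of device memory.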

0.3.8

Updated NVIDIA containers defaults to 22.11
Version of external components used during testing:
- Polygraphy: 0.42.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.20.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.

0.3.7

Updated NVIDIA containers defaults to 22.10
Version of external components used during testing:
- Polygraphy: 0.42.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.20.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.

0.3.6

Updated NVIDIA containers defaults to 22.09
Model Navigator Export API:
- new: cast int64 input data to int32 in runner for Torch-TensorRT
- new: cast 64-bit data samples to 32-bit values for TensorRT
- new: verbose flag for logging export and conversion commands to console
- new: debug flag to enable debug mode for export and conversion commands
- change: logs from commands are streamed to console during command run
- change: package load omits the log files and autogenerated scripts
Version of external components used during testing:
- Polygraphy: 0.42.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.20.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.

0.3.5

Updated NVIDIA containers defaults to 22.08
Model Navigator Export API:
- new: TRTExec runner uses use_cuda_graph=True by default
- new: log a warning instead of raising an error when the dataloader dumps inputs with nan or inf values
- new: enabled logging for command input parameters
- fix: invalid use of Polygraphy TRT profile when trt_dynamic_axes is passed to export function
Version of external components used during testing:
- Polygraphy: 0.38.0
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.19.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
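
The nan/inf handling mentioned in this release can be pictured with a short sketch: rather than aborting, the check logs a warning for every input that contains non-finite values. The function name `find_non_finite_inputs` and the sample layout (input name mapped to a flat list of floats) are assumptions made for illustration, not the actual Model Navigator implementation.

```python
import logging
import math
from typing import Dict, List

LOGGER = logging.getLogger("sample_check")


def find_non_finite_inputs(sample: Dict[str, List[float]]) -> List[str]:
    """Return the names of inputs that contain NaN or inf values.

    A warning is logged for each offending input; no exception is raised.
    """
    bad_inputs = []
    for name, values in sample.items():
        if any(math.isnan(v) or math.isinf(v) for v in values):
            LOGGER.warning("Input %r contains NaN or inf values", name)
            bad_inputs.append(name)
    return bad_inputs


if __name__ == "__main__":
    sample = {"input_0": [0.5, float("nan")], "input_1": [1.0, 2.0]}
    print(find_non_finite_inputs(sample))
```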

0.3.4

Updated NVIDIA containers defaults to 22.07
Model Navigator OTIS:
- deprecated: TF32 precision for TensorRT from CLI options - will be removed in future versions
- fix: Tensorflow module was imported when obtaining model signature during conversion
Model Navigator Export API:
- new: Support for building framework containers with Model Navigator installed
- new: Example for loading Navigator Package for reproducing the results
- new: Create reproducing script for correctness and performance steps
- new: TrtexecRunner for correctness and performance tests with trtexec tool
- new: Use TF32 support by default for models with FP32 precision
- new: Reset conversion parameters to defaults when using load for package
- new: Testing all options for JAX export enable_xla and jit_compile parameters
- change: Profiling stability improvements
- change: Rename of onnx_runtimes export function parameters to runtimes
- deprecated: TF32 precision for TensorRT from available options - will be removed in future versions
- fix: Do not save TF-TRT models to the .nav package
- fix: Do not load TF-TRT models from the .nav package
- fix: Correctly load .nav packages when _input_names or _output_names specified
- fix: Adjust TF and TF-TRT model signatures to match input_names
- fix: Save ONNX opset for CLI configuration inside package
- fix: Reproduction scripts were missing for failing paths
Version of external components used during testing:
- Polygraphy: 0.38.0
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.17.0
- tf2onnx: v1.11.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.

0.3.3

Model Navigator Export API:
- new: Improved handling inputs and outputs metadata
- new: Navigator Package version updated to 0.1.3
- new: Backward compatibility with previous versions of Navigator Package
- fix: Dynamic shapes for output shapes were read incorrectly
Version of external components used during testing:
- Polygraphy: 0.36.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.17.0
- tf2onnx: v1.11.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.

0.3.2

Updated NVIDIA containers defaults to 22.06
Model Navigator OTIS:
- new: Perf Analyzer profiling data uses base64 format for content
- fix: Signature for TensorRT model when it has uint64 or int64 inputs and/or outputs defined
Model Navigator Export API:
- new: Updated navigator package format to 0.1.1
- new: Added Model Navigator version to status file
- new: Add atol and rtol configuration to CLI config for model
- new: Added experimental support for JAX models
- new: In case of export or conversion failures prepare minimal scripts to reproduce errors
- fix: Conversion parameters are not stored in Navigator Package for CLI execution
Version of external components used during testing:
- Polygraphy: 0.36.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.17.0
- tf2onnx: v1.11.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
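
The atol and rtol settings added in this release are absolute and relative tolerances for comparing outputs. The changelog does not state the exact comparison rule, so the sketch below assumes the convention used by NumPy's `allclose`, where a value passes if `|candidate - reference| <= atol + rtol * |reference|`. The function name `outputs_match` is hypothetical.

```python
from typing import Sequence


def outputs_match(
    reference: Sequence[float],
    candidate: Sequence[float],
    atol: float,
    rtol: float,
) -> bool:
    """Check that every candidate value is within tolerance of the reference.

    Assumed rule (NumPy allclose convention):
        |candidate - reference| <= atol + rtol * |reference|
    """
    if len(reference) != len(candidate):
        return False
    return all(
        abs(c - r) <= atol + rtol * abs(r)
        for r, c in zip(reference, candidate)
    )


if __name__ == "__main__":
    framework_output = [1.0, 100.0]
    converted_output = [1.00001, 100.05]
    print(outputs_match(framework_output, converted_output, atol=1e-4, rtol=1e-3))
```

The relative term lets large values tolerate proportionally larger differences, while the absolute term keeps the check meaningful for values near zero.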

0.3.1

Updated NVIDIA containers defaults to 22.05
Model Navigator OTIS:
- fix: Saving paths inside the Triton package status file
- fix: Empty list of GPUs causes the process to run on CPU only
- fix: Reading content from zipped Navigator Package
- fix: When no GPU is available or the target device is set to CPU, optimize avoids running unsupported conversions in CLI
- new: Converter accepts a target device kind to select CPU- or GPU-supported conversions
- new: Added support for OpenVINO accelerator for ONNXRuntime
- new: Added option --config-search-early-exit-enable for Model Analyzer early exit support in manual profiling mode
- new: Added option --model-config-name to the select command. It allows picking a particular model configuration for deployment from the set of all configurations generated by Triton Model Analyzer, even if it's not the best-performing one.
- removed: The --tensorrt-strict-types option has been removed due to deprecation of the functionality in upstream libraries.
Model Navigator Export API:
- new: Added dynamic shapes support and trt dynamic shapes support for TensorFlow2 export
- new: Improved per format logging
- new: PyTorch to Torch-TRT precision selection added
- new: Advanced profiling (measurement windows, configurable batch sizes)
Version of external components used during testing:
- Polygraphy: 0.36.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.16.0
- tf2onnx: v1.10.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.

0.3.0

Updated NVIDIA containers defaults to 22.04
Model Navigator Export API:
- Support for exporting models from TensorFlow2 and PyTorch source code to supported target formats
- Support for conversion from ONNX to supported target formats
- Support for exporting HuggingFace models
- Conversion, correctness, and performance tests for exported models
- Definition of package structure for storing all exported models and additional metadata
Model Navigator OTIS:
- change: run command has been deprecated and may be removed in a future release
- new: optimize command replaces run and produces an output *.triton.nav package
- new: select command picks the best-performing configuration from a *.triton.nav package and creates a Triton Inference Server model repository
- new: Added support for using shared memory option for Perf Analyzer
Remove wkhtmltopdf package dependency
Version of external components used during testing:
- Polygraphy: 0.35.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.14.0
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.

0.2.7

Updated NVIDIA containers defaults to 22.02
Removed support for Python 3.7
Triton Model configuration related:
- Support dynamic batching without setting preferred batch size value
Profiling related:
- Deprecated the --config-search-max-preferred-batch-size flag, as it is no longer supported in Triton Model Analyzer
Version of external components used during testing:
- Polygraphy: 0.35.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.

0.2.6

Updated NVIDIA containers defaults to 22.01
Removed support for Python 3.6 due to EOL
Conversion related:
- Added support for Torch-TensorRT conversion
Fixes and improvements:
- Processes inside containers started by Model Navigator now run without root privileges
- Fix for volume mounts while running Triton Inference Server in container from other container
- Fix for conversion of models without file extension on input and output paths
- Fix using --model-format argument when input and output files have no extension
Version of external components used during testing:
- Polygraphy: 0.35.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations:
- missing support for stateful models (e.g., time-series models)
- no verification of conversion results for conversions: TF -> ONNX, TF -> TF-TRT, TorchScript -> ONNX
- only a single profile can be defined for TensorRT
- no custom ops support
- Triton Inference Server stays in the background when the profile process is interrupted by the user
- TF-TRT conversion loses output shape info

0.2.5

Updated NVIDIA containers defaults to 21.12
Conversion related:
- [Experimental] TF-TRT - fixed default dataset profile generation
Configuration Model on Triton related:
- Fixed name for onnxruntime backend in Triton model deployment configuration
Version of external components used during testing:
- Polygraphy: 0.33.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations:
- missing support for stateful models (e.g., time-series models)
- no verification of conversion results for conversions: TF -> ONNX, TF -> TF-TRT, TorchScript -> ONNX
- only a single profile can be defined for TensorRT
- no custom ops support
- Triton Inference Server stays in the background when the profile process is interrupted by the user
- TF-TRT conversion loses output shape info

0.2.4 (2021-12-07)

Updated NVIDIA containers defaults to 21.10
Fixed generating profiling data when dtypes are not passed
Conversion related:
- [Experimental] Added support for TF-TRT conversion
Configuration Model on Triton related:
- Added possibility to select batching mode - default, dynamic and disabled options supported
Install dependencies from pip packages instead of wheels for Polygraphy and Triton Model Analyzer
Fixes and improvements
Version of external components used during testing:
- Polygraphy: 0.33.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations:
- missing support for stateful models (e.g., time-series models)
- no verification of conversion results for conversions: TF -> ONNX, TF -> TF-TRT, TorchScript -> ONNX
- only a single profile can be defined for TensorRT
- no custom ops support
- Triton Inference Server stays in the background when the profile process is interrupted by the user
- TF-TRT conversion loses output shape info

0.2.3 (2021-11-10)

Updated NVIDIA containers defaults to 21.09
Improved naming of arguments specific for TensorRT conversion and acceleration with backward compatibility
Use pip package for Triton Model Analyzer installation with minimal version 1.8.0
Fixed model_repository path so that it is no longer relative to the <navigator_workspace> directory
Handle exit codes correctly from CLI commands
Support for using device IDs in the --gpus argument
Conversion related:
- Added support for precision modes to support multiple precisions during conversion to TensorRT
- Added --tensorrt-sparse-weights flag for sparse weight optimization for TensorRT
- Added --tensorrt-strict-types flag forcing TensorRT to choose tactics based on the layer precision
- Added --tensorrt-explicit-precision flag enabling explicit precision mode
- Fixed nan values appearing in relative tolerance during conversion to TensorRT
Configuration Model on Triton related:
- Removed default value for engine_count_per_device
- Added possibility to define Triton Custom Backend parameters with triton_backend_parameters command
- Added possibility to define max workspace size for TensorRT backend accelerator using argument tensorrt_max_workspace_size
Profiling related:
- Added config_search prefix to all profiling parameters (BREAKING CHANGE)
- Added config_search_max_preferred_batch_size parameter
- Added config_search_backend_parameters parameter
Fixes and improvements
Versions of used external components:
- Polygraphy: 0.32.0
- GraphSurgeon: 0.3.13
- tf2onnx: v1.9.2 (support for ONNX opset 14, tf 1.15 and 2.6)
- Triton Model Analyzer 1.8.2
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations:
- missing support for stateful models (e.g., time-series models)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- only a single profile can be defined for TensorRT

0.2.2 (2021-09-06)

Updated NVIDIA containers defaults to 21.08
Versions of used external components:
- Triton Model Analyzer: 1.7.0
- Triton Inference Server Client: 2.13.0
- Polygraphy: 0.31.1
- GraphSurgeon: 0.3.11
- tf2onnx: v1.9.1 (support for ONNX opset 14, tf 1.15 and 2.5)
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations:
- missing support for stateful models (e.g., time-series models)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- only a single profile can be defined for TensorRT

0.2.1 (2021-08-17)

Fixed triton-model-config error when tensorrt_capture_cuda_graph flag is not passed
Dump Conversion Comparator inputs and outputs into JSON files
Added information in logs on the tolerance parameters values to pass the conversion verification
Use count_windows mode as default option for Perf Analyzer
Added possibility to define custom docker images
Bugfixes
Versions of used external components:
- Triton Model Analyzer: 1.6.0
- Triton Inference Server Client: 2.12.0
- Polygraphy: 0.31.1
- GraphSurgeon: 0.3.11
- tf2onnx: v1.9.1 (support for ONNX opset 14, tf 1.15 and 2.5)
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations:
- missing support for stateful models (e.g., time-series models)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- only a single profile can be defined for TensorRT
- TensorRT backend acceleration is not supported for ONNX Runtime in Triton Inference Server version 21.07

0.2.0 (2021-07-05)

Comprehensive refactor of the command-line API to provide more gradual execution of pipeline steps
Versions of used external components:
- Triton Model Analyzer: 21.05
- tf2onnx: v1.8.5 (support for ONNX opset 13, tf 1.15 and 2.5)
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations:
- missing support for stateful models (e.g., time-series models)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
  - affected NVIDIA PyTorch containers: 20.12, 21.02, 21.03
  - workaround: use PyTorch containers newer than 21.03
- only a single profile can be defined for TensorRT

0.1.1 (2021-04-12)

Documentation update

0.1.0 (2021-04-09)

Release of main components:
- Model Converter - converts the model to a set of variants optimized for inference or to be later optimized by Triton Inference Server backend.
- Model Repo Builder - sets up the Triton Inference Server Model Repository, including its configuration.
- Model Analyzer - selects the optimal Triton Inference Server configuration based on the model's compute and memory requirements, available computation infrastructure, and model application constraints.
- Helm Chart Generator - deploys Triton Inference Server and the model with the optimal configuration to the cloud.
Versions of used external components:
- Triton Model Analyzer: 21.03+616e8a30
- tf2onnx: v1.8.4 (support for ONNX opset 13, tf 1.15 and 2.4)
- Other component versions depend on the used framework and Triton Inference Server containers versions. Refer to its support matrix for a detailed summary.
Known issues:
- missing support for stateful models (e.g., time-series models)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
  - affected NVIDIA PyTorch containers: 20.12, 21.03
  - workaround: use containers other than those listed above
- Triton Inference Server stays in the background when the profile process is interrupted by the user