Changelog
0.4.0
- new: `optimize` method that replaces `export` and performs max batch size search and improved profiling during the process (a usage sketch follows at the end of this entry)
- new: Introduced custom configs in `optimize` for better parametrization of export/conversion commands
- new: Support for adding user runners for model correctness and profiling
- new: Search for max possible batch size per format during conversion and profiling
- new: API for creating Triton model store from Navigator Package and user provided models
- change: Improved status structure for Navigator Package
- deprecated: Optimize for Triton Inference Server support
- deprecated: HuggingFace contrib module
- Bug fixes and other improvements
- Version of external components used during testing:
- PyTorch 1.14.0a0+410ce96
- TensorFlow 2.11.0
- TensorRT 8.5.2.2
- ONNX Runtime 1.13.1
- Polygraphy: 0.43.1
- GraphSurgeon: 0.4.6
- tf2onnx: v1.13.0
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
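A minimal sketch of the `optimize`-based workflow introduced in this release, assuming the Python API shape `nav.torch.optimize(model=..., dataloader=...)` and a package-saving helper under `nav.package`; the Triton model store call is commented out because its exact name is an assumption - treat the whole snippet as illustrative rather than canonical.

```python
import torch
import model_navigator as nav

# Toy PyTorch model and dataloader; in a real project these come from your code.
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())
dataloader = [torch.randn(2, 16) for _ in range(10)]

# `optimize` replaces the earlier `export` entrypoint: it exports/converts the model,
# searches for the maximum feasible batch size per format, and profiles the results.
package = nav.torch.optimize(model=model, dataloader=dataloader)

# Persist the resulting Navigator Package (helper name assumed from this release's docs).
nav.package.save(package, "linear.nav")

# Creating a Triton model store from the package is exposed through a companion API;
# the function below is an assumption - check the release documentation for the exact name:
# nav.triton.model_repository.add_model_from_package(
#     model_repository_path="model_repository", model_name="linear", package=package
# )
```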
0.3.8
- Updated NVIDIA containers defaults to 22.11
- Version of external components used during testing:
- Polygraphy: 0.42.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.20.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.7
- Updated NVIDIA containers defaults to 22.10
- Version of external components used during testing:
- Polygraphy: 0.42.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.20.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.6
- Updated NVIDIA containers defaults to 22.09
- Model Navigator Export API:
- new: cast int64 input data to int32 in runner for Torch-TensorRT
- new: cast 64-bit data samples to 32-bit values for TensorRT
- new: verbose flag for logging export and conversion commands to console
- new: debug flag to enable debug mode for export and conversion commands
- change: logs from commands are streamed to console during command run
- change: package load omit the log files and autogenerated scripts
- Version of external components used during testing:
- Polygraphy: 0.42.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.20.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.5
- Updated NVIDIA containers defaults to 22.08
- Model Navigator Export API:
- new: TRTExec runner uses `use_cuda_graph=True` by default
- new: log a warning instead of raising an error when the dataloader dumps inputs with `nan` or `inf` values
- new: enabled logging for command input parameters
- fix: invalid use of Polygraphy TRT profile when `trt_dynamic_axes` is passed to export function
- Version of external components used during testing:
- Polygraphy: 0.38.0
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.19.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.4
- Updated NVIDIA containers defaults to 22.07
- Model Navigator OTIS:
- deprecated: `TF32` precision for TensorRT from CLI options - will be removed in future versions
- fix: TensorFlow module was imported when obtaining model signature during conversion
- Model Navigator Export API:
- new: Support for building framework containers with Model Navigator installed
- new: Example for loading Navigator Package for reproducing the results
- new: Create reproducing script for correctness and performance steps
- new: TrtexecRunner for correctness and performance tests with trtexec tool
- new: Use TF32 support by default for models with FP32 precision
- new: Reset conversion parameters to defaults when using `load` for package
- new: Testing all options for JAX export `enable_xla` and `jit_compile` parameters
- change: Profiling stability improvements
- change: Renamed `onnx_runtimes` export function parameter to `runtimes`
- deprecated: `TF32` precision for TensorRT from available options - will be removed in future versions
- fix: Do not save TF-TRT models to the .nav package
- fix: Do not load TF-TRT models from the .nav package
- fix: Correctly load .nav packages when `_input_names` or `_output_names` specified
- fix: Adjust TF and TF-TRT model signatures to match `input_names`
- fix: Save ONNX opset for CLI configuration inside package
- fix: Reproduction scripts were missing for failing paths
- Version of external components used during testing:
- Polygraphy: 0.38.0
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.17.0
- tf2onnx: v1.11.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.3
- Model Navigator Export API:
- new: Improved handling of inputs and outputs metadata
- new: Navigator Package version updated to 0.1.3
- new: Backward compatibility with previous versions of Navigator Package
- fix: Dynamic shapes for output shapes were read incorrectly
- Version of external components used during testing:
- Polygraphy: 0.36.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.17.0
- tf2onnx: v1.11.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.2
- Updated NVIDIA containers defaults to 22.06
- Model Navigator OTIS:
- new: Perf Analyzer profiling data uses base64 format for content
- fix: Signature for TensorRT model when it has `uint64` or `int64` inputs and/or outputs defined
- Model Navigator Export API:
- new: Updated navigator package format to 0.1.1
- new: Added Model Navigator version to status file
- new: Added `atol` and `rtol` configuration to CLI config for model
- new: Added experimental support for JAX models
- new: In case of export or conversion failures prepare minimal scripts to reproduce errors
- fix: Conversion parameters are not stored in Navigator Package for CLI execution
- Version of external components used during testing:
- Polygraphy: 0.36.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.17.0
- tf2onnx: v1.11.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.1
- Updated NVIDIA containers defaults to 22.05
- Model Navigator OTIS:
- fix: Saving paths inside the Triton package status file
- fix: Empty list of GPUs caused the process to run on CPU only
- fix: Reading content from zipped Navigator Package
- fix: When no GPU is available or the target device is set to CPU, `optimize` avoids running unsupported conversions in CLI
- new: Converter accepts passing the target device kind to select CPU- or GPU-supported conversions
- new: Added support for OpenVINO accelerator for ONNX Runtime
- new: Added option `--config-search-early-exit-enable` for Model Analyzer early exit support in manual profiling mode
- new: Added option `--model-config-name` to the `select` command. It allows picking a particular model configuration for deployment from the set of all configurations generated by Triton Model Analyzer, even if it's not the best performing one.
- removed: The `--tensorrt-strict-types` option has been removed due to deprecation of the functionality in upstream libraries.
- Model Navigator Export API:
- new: Added dynamic shapes support and trt dynamic shapes support for TensorFlow2 export
- new: Improved per format logging
- new: PyTorch to Torch-TRT precision selection added
- new: Advanced profiling (measurement windows, configurable batch sizes)
- Version of external components used during testing:
- Polygraphy: 0.36.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.16.0
- tf2onnx: v1.10.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.0
- Updated NVIDIA containers defaults to 22.04
- Model Navigator Export API:
- Support for exporting models from TensorFlow2 and PyTorch source code to supported target formats (see the sketch at the end of this entry)
- Support for conversion from ONNX to supported target formats
- Support for exporting HuggingFace models
- Conversion, Correctness and performance tests for exported models
- Definition of package structure for storing all exported models and additional metadata
- Model Navigator OTIS:
- change: `run` command has been deprecated and may be removed in a future release
- new: `optimize` command replaces `run` and produces an output `*.triton.nav` package
- new: `select` selects the best-performing configuration from a `*.triton.nav` package and creates a Triton Inference Server model repository
- new: Added support for using the shared memory option for Perf Analyzer
- Removed wkhtmltopdf package dependency
- Version of external components used during testing:
- Polygraphy: 0.35.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.14.0
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
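A minimal sketch of the Export API introduced in this release, assuming the `nav.torch.export(model=..., dataloader=...)` entrypoint from the 0.3.x-era documentation (later superseded by `optimize` in 0.4.0); the saving helper name is an assumption, so treat this as illustrative only.

```python
import torch
import model_navigator as nav

# Toy model and synthetic dataloader standing in for real project code.
model = torch.nn.Linear(16, 8)
dataloader = [torch.randn(2, 16) for _ in range(10)]

# Export from PyTorch source code to the supported target formats; the call also runs
# conversion, correctness and performance tests and collects results for the package.
pkg_desc = nav.torch.export(model=model, dataloader=dataloader)

# Save the package structure defined in this release (helper name assumed).
nav.save(pkg_desc, "linear.nav")
```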
0.2.7
- Updated NVIDIA containers defaults to 22.02
- Removed support for Python 3.7
- Triton Model configuration related:
- Support dynamic batching without setting preferred batch size value
- Profiling related:
- Deprecated `--config-search-max-preferred-batch-size` flag as it is no longer supported in Triton Model Analyzer
- Version of external components used during testing:
- Polygraphy: 0.35.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.2.6
- Updated NVIDIA containers defaults to 22.01
- Removed support for Python 3.6 due to EOL
- Conversion related:
- Added support for Torch-TensorRT conversion
- Fixes and improvements
- Processes inside containers started by Model Navigator now run without root privileges
- Fix for volume mounts while running Triton Inference Server in container from other container
- Fix for conversion of models without file extension on input and output paths
- Fix using `--model-format` argument when input and output files have no extension
- Version of external components used during testing:
- Polygraphy: 0.35.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- no verification of conversion results for conversions: TF -> ONNX, TF->TF-TRT, TorchScript -> ONNX
- possible to define a single profile for TensorRT
- no custom ops support
- Triton Inference Server stays in the background when the profile process is interrupted by the user
- TF-TRT conversion lost outputs shapes info
0.2.5
- Updated NVIDIA containers defaults to 21.12
- Conversion related:
- [Experimental] TF-TRT - fixed default dataset profile generation
- Configuration Model on Triton related
- Fixed name for onnxruntime backend in Triton model deployment configuration
- Version of external components used during testing:
- Polygraphy: 0.33.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- no verification of conversion results for conversions: TF -> ONNX, TF->TF-TRT, TorchScript -> ONNX
- possible to define a single profile for TensorRT
- no custom ops support
- Triton Inference Server stays in the background when the profile process is interrupted by the user
- TF-TRT conversion lost outputs shapes info
0.2.4 (2021-12-07)
- Updated NVIDIA containers defaults to 21.10
- Fixed generating profiling data when `dtypes` are not passed
- Conversion related:
- [Experimental] Added support for TF-TRT conversion
- Configuration Model on Triton related
- Added possibility to select batching mode - default, dynamic and disabled options supported
- Install dependencies from pip packages instead of wheels for Polygraphy and Triton Model Analyzer
- Fixes and improvements
- Version of external components used during testing:
- Polygraphy: 0.33.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- no verification of conversion results for conversions: TF -> ONNX, TF->TF-TRT, TorchScript -> ONNX
- possible to define a single profile for TensorRT
- no custom ops support
- Triton Inference Server stays in the background when the profile process is interrupted by the user
- TF-TRT conversion lost outputs shapes info
0.2.3 (2021-11-10)
- Updated NVIDIA containers defaults to 21.09
- Improved naming of arguments specific for TensorRT conversion and acceleration with backward compatibility
- Use pip package for Triton Model Analyzer installation with minimal version 1.8.0
- Fixed `model_repository` path to not be relative to `<navigator_workspace>` dir
- Handle exit codes correctly from CLI commands
- Support for using device IDs for `--gpus` argument
- Conversion related
- Added support for precision modes to support multiple precisions during conversion to TensorRT
- Added `--tensorrt-sparse-weights` flag for sparse weight optimization for TensorRT
- Added `--tensorrt-strict-types` flag forcing it to choose tactics based on the layer precision for TensorRT
- Added `--tensorrt-explicit-precision` flag enabling explicit precision mode
- Fixed NaN values appearing in relative tolerance during conversion to TensorRT
- Configuration Model on Triton related
- Removed default value for `engine_count_per_device`
- Added possibility to define Triton Custom Backend parameters with `triton_backend_parameters` command
- Added possibility to define max workspace size for TensorRT backend accelerator using argument `tensorrt_max_workspace_size`
- Profiling related
- Added `config_search` prefix to all profiling parameters (BREAKING CHANGE)
- Added `config_search_max_preferred_batch_size` parameter
- Added `config_search_backend_parameters` parameter
- Fixes and improvements
- Versions of used external components:
- Polygraphy: 0.32.0
- GraphSurgeon: 0.3.13
- tf2onnx: v1.9.2 (support for ONNX opset 14, tf 1.15 and 2.6)
- Triton Model Analyzer 1.8.2
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- possible to define a single profile for TensorRT
0.2.2 (2021-09-06)
- Updated NVIDIA containers defaults to 21.08
- Versions of used external components:
- Triton Model Analyzer: 1.7.0
- Triton Inference Server Client: 2.13.0
- Polygraphy: 0.31.1
- GraphSurgeon: 0.3.11
- tf2onnx: v1.9.1 (support for ONNX opset 14, tf 1.15 and 2.5)
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- possible to define a single profile for TensorRT
0.2.1 (2021-08-17)
- Fixed triton-model-config error when tensorrt_capture_cuda_graph flag is not passed
- Dump Conversion Comparator inputs and outputs into JSON files
- Added information in logs about the tolerance parameter values required to pass conversion verification
- Use `count_windows` mode as default option for Perf Analyzer
- Added possibility to define custom docker images
- Bugfixes
- Versions of used external components:
- Triton Model Analyzer: 1.6.0
- Triton Inference Server Client: 2.12.0
- Polygraphy: 0.31.1
- GraphSurgeon: 0.3.11
- tf2onnx: v1.9.1 (support for ONNX opset 14, tf 1.15 and 2.5)
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- possible to define a single profile for TensorRT
- TensorRT backend acceleration not supported for ONNX Runtime in Triton Inference Server ver. 21.07
0.2.0 (2021-07-05)
- Comprehensive refactor of the command-line API to allow more gradual execution of pipeline steps
- Versions of used external components:
- Triton Model Analyzer: 21.05
- tf2onnx: v1.8.5 (support for ONNX opset 13, tf 1.15 and 2.5)
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
- affected NVIDIA PyTorch containers: 20.12, 21.02, 21.03
- workaround: use PyTorch containers newer than 21.03
- possible to define a single profile for TensorRT
0.1.1 (2021-04-12)
- documentation update
0.1.0 (2021-04-09)
- Release of main components:
- Model Converter - converts the model to a set of variants optimized for inference or to be later optimized by Triton Inference Server backend.
- Model Repo Builder - sets up the Triton Inference Server Model Repository, including its configuration.
- Model Analyzer - selects the optimal Triton Inference Server configuration based on the model's compute and memory requirements, available computation infrastructure, and model application constraints.
- Helm Chart Generator - deploys Triton Inference Server and the model with optimal configuration to the cloud.
- Versions of used external components:
- Triton Model Analyzer: 21.03+616e8a30
- tf2onnx: v1.8.4 (support for ONNX opset 13, tf 1.15 and 2.4)
- Other component versions depend on the used framework and Triton Inference Server containers versions. Refer to its support matrix for a detailed summary.
- Known issues
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
- affected NVIDIA PyTorch containers: 20.12, 21.03
- workaround: use containers different from above
- Triton Inference Server stays in the background when the profile process is interrupted by the user