Changelog

0.5.4

  • new: Custom implementation for ONNX and TensorRT runners
  • new: Use CUDA 12 for JAX in unit tests and functional tests
  • new: Step-by-step examples
  • new: Updated documentation
  • new: TensorRTCUDAGraph runner introduced with support for CUDA graphs
  • fix: Optimal shape not set correctly during adaptive conversion
  • fix: Find max batch size command for JAX
  • fix: Save stdout to logfiles in debug mode

  • Version of external components used during testing:

  • PyTorch 2.1.0a0+fe05266f
  • TensorFlow 2.12.0
  • TensorRT 8.6.1
  • ONNX Runtime 1.14.1
  • Polygraphy: 0.47.1
  • GraphSurgeon: 0.3.26
  • tf2onnx v1.14.0
  • Other component versions depend on the versions of the framework containers used. See the containers' support matrix for a detailed summary.

0.5.3

0.5.2

0.5.1

0.5.0

  • new: Support for PyTriton deployment
  • new: Support for Python models with python.optimize API
  • new: PyTorch 2 compile CPU and CUDA runners
  • new: Collect conversion max batch size in status
  • new: PyTorch runners with compile support
  • change: Improved handling of CUDA and CPU runners
  • change: Reduced the time needed to find the device max batch size by running the search once as a separate pipeline
  • change: Stored the find max batch size result in a separate field in status

  • Version of external components used during testing:

  • PyTorch 1.14.0a0+410ce96
  • TensorFlow 2.11.0
  • TensorRT 8.5.3
  • ONNX Runtime 1.13.1
  • Polygraphy: 0.44.2
  • GraphSurgeon: 0.4.6
  • tf2onnx v1.13.0
  • Other component versions depend on the versions of the framework containers used. See the containers' support matrix for a detailed summary.

0.4.4

0.4.3

0.4.2

0.4.1

0.4.0

  • new: optimize method that replaces export, performs the max batch size search, and improves profiling during the process
  • new: Introduced custom configs in optimize for better parametrization of export/conversion commands
  • new: Support for adding user runners for model correctness and profiling
  • new: Search for max possible batch size per format during conversion and profiling
  • new: API for creating a Triton model store from a Navigator Package and user-provided models
  • change: Improved status structure for Navigator Package
  • deprecated: Optimize for Triton Inference Server support
  • deprecated: HuggingFace contrib module
  • Bug fixes and other improvements

  • Version of external components used during testing:

  • PyTorch 1.14.0a0+410ce96
  • TensorFlow 2.11.0
  • TensorRT 8.5.2.2
  • ONNX Runtime 1.13.1
  • Polygraphy: 0.43.1
  • GraphSurgeon: 0.4.6
  • tf2onnx v1.13.0
  • Other component versions depend on the versions of the framework containers used. See the containers' support matrix for a detailed summary.
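
The changelog does not describe how the max batch size search introduced in 0.4.0 works. As a rough illustration only, the sketch below finds the largest batch size that succeeds by doubling and then bisecting. The function name `find_max_batch_size`, the `fits` callback, and the `limit` parameter are hypothetical and are not part of the Model Navigator API; the sketch also assumes that if a batch size fails, every larger one fails too.

```python
def find_max_batch_size(fits, limit=4096):
    """Return the largest batch size in [1, limit] for which fits() is True.

    `fits` is a hypothetical callback that tries to run the model with the
    given batch size and returns True on success. Returns 0 if even batch
    size 1 fails. Assumes success is monotonic in the batch size.
    """
    if not fits(1):
        return 0

    low, high = 1, 2
    # Phase 1: double the batch size until it fails or exceeds the limit.
    while high <= limit and fits(high):
        low = high
        high *= 2
    # `high` is now either a failing size or just past the allowed limit.
    high = min(high, limit + 1)

    # Phase 2: bisect between the last success and the first failure.
    while high - low > 1:
        mid = (low + high) // 2
        if fits(mid):
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    # Pretend the device runs out of memory above batch size 100.
    print(find_max_batch_size(lambda batch_size: batch_size <= 100))
```

Doubling first keeps the number of trial runs logarithmic in the result, which matters when each trial means loading and running a model.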

0.3.8

0.3.7

0.3.6

  • Updated NVIDIA containers defaults to 22.09
  • Model Navigator Export API:

    • new: cast int64 input data to int32 in runner for Torch-TensorRT
    • new: cast 64-bit data samples to 32-bit values for TensorRT
    • new: verbose flag for logging export and conversion commands to console
    • new: debug flag to enable debug mode for export and conversion commands
    • change: logs from commands are streamed to console during command run
    • change: package load omits the log files and autogenerated scripts
  • Version of external components used during testing:
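
For the 64-bit to 32-bit casting noted in the 0.3.6 Export API entries, the idea can be sketched in plain Python as a range check followed by a narrowing conversion. This is an illustration under assumptions, not Model Navigator's implementation: the function name `cast_int64_to_int32` is hypothetical, and the changelog does not say how out-of-range values are handled, so this sketch simply raises an error.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def cast_int64_to_int32(values):
    """Return the values as a list after checking each fits in signed 32 bits.

    Hypothetical helper: raises OverflowError for values that cannot be
    represented as int32 instead of silently wrapping them.
    """
    result = []
    for value in values:
        if not INT32_MIN <= value <= INT32_MAX:
            raise OverflowError(f"{value} does not fit in a 32-bit integer")
        result.append(int(value))
    return result


if __name__ == "__main__":
    print(cast_int64_to_int32([0, 7, -12, 2**31 - 1]))
```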

0.3.5

  • Updated NVIDIA containers defaults to 22.08
  • Model Navigator Export API:

    • new: TRTExec runner uses use_cuda_graph=True by default
    • new: log a warning instead of raising an error when the dataloader dumps inputs with nan or inf values
    • new: enabled logging for command input parameters
    • fix: invalid use of Polygraphy TRT profile when trt_dynamic_axes is passed to export function
  • Version of external components used during testing:
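
The 0.3.5 change from raising an error to logging a warning for nan or inf inputs can be pictured with the sketch below. It is a minimal illustration, assuming a sample is a mapping from input name to a flat list of floats; the names `check_sample` and `LOGGER` are hypothetical and do not come from Model Navigator.

```python
import logging
import math

LOGGER = logging.getLogger("dataloader_check")


def check_sample(sample):
    """Log a warning for inputs containing nan or inf and return their names.

    `sample` is assumed to map an input name to a flat list of floats.
    Nothing is raised, so processing can continue after the warning.
    """
    bad_inputs = [
        name
        for name, values in sample.items()
        if any(not math.isfinite(value) for value in values)
    ]
    if bad_inputs:
        LOGGER.warning("Inputs with nan or inf values: %s", ", ".join(bad_inputs))
    return bad_inputs


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    check_sample({"input__0": [1.0, float("nan")], "input__1": [0.5, 2.0]})
```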

0.3.4

  • Updated NVIDIA containers defaults to 22.07
  • Model Navigator OTIS:
    • deprecated: TF32 precision for TensorRT from CLI options - will be removed in future versions
    • fix: TensorFlow module was imported when obtaining model signature during conversion
  • Model Navigator Export API:

    • new: Support for building framework containers with Model Navigator installed
    • new: Example for loading Navigator Package for reproducing the results
    • new: Create reproducing script for correctness and performance steps
    • new: TrtexecRunner for correctness and performance tests with trtexec tool
    • new: Use TF32 support by default for models with FP32 precision
    • new: Reset conversion parameters to defaults when using load for package
    • new: Testing all options for JAX export enable_xla and jit_compile parameters
    • change: Profiling stability improvements
    • change: Rename of onnx_runtimes export function parameters to runtimes
    • deprecated: TF32 precision for TensorRT from available options - will be removed in future versions
    • fix: Do not save TF-TRT models to the .nav package
    • fix: Do not load TF-TRT models from the .nav package
    • fix: Correctly load .nav packages when _input_names or _output_names specified
    • fix: Adjust TF and TF-TRT model signatures to match input_names
    • fix: Save ONNX opset for CLI configuration inside package
    • fix: Reproduction scripts were missing for failing paths
  • Version of external components used during testing:

0.3.3

  • Model Navigator Export API:

    • new: Improved handling of input and output metadata
    • new: Navigator Package version updated to 0.1.3
    • new: Backward compatibility with previous versions of Navigator Package
    • fix: Dynamic shapes for output shapes were read incorrectly
  • Version of external components used during testing:

0.3.2

  • Updated NVIDIA containers defaults to 22.06
  • Model Navigator OTIS:
    • new: Perf Analyzer profiling data uses base64 format for content
    • fix: Signature for a TensorRT model when it has uint64 or int64 inputs and/or outputs defined
  • Model Navigator Export API:

    • new: Updated navigator package format to 0.1.1
    • new: Added Model Navigator version to status file
    • new: Add atol and rtol configuration to CLI config for model
    • new: Added experimental support for JAX models
    • new: In case of export or conversion failures prepare minimal scripts to reproduce errors
    • fix: Conversion parameters are not stored in Navigator Package for CLI execution
  • Version of external components used during testing:
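
The atol and rtol settings added in 0.3.2 are absolute and relative tolerances for comparing model outputs. The sketch below shows one common way such a pair is applied, the element-wise rule `|actual - expected| <= atol + rtol * |expected|`, which is the rule NumPy's `allclose` uses. Whether Model Navigator applies exactly this rule is an assumption, and the function name and default values here are hypothetical.

```python
def within_tolerance(actual, expected, atol=1e-5, rtol=1e-5):
    """Check two flat sequences of floats element by element.

    An element passes when |actual - expected| <= atol + rtol * |expected|.
    Sequences of different lengths never match.
    """
    if len(actual) != len(expected):
        return False
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


if __name__ == "__main__":
    reference = [1.0, 200.0]
    converted = [1.000001, 200.001]
    print(within_tolerance(converted, reference, atol=1e-5, rtol=1e-5))
```

The absolute term dominates for values near zero, while the relative term scales the allowed error for large values.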

0.3.1

  • Updated NVIDIA containers defaults to 22.05
  • Model Navigator OTIS:
    • fix: Saving paths inside the Triton package status file
    • fix: Empty list of GPUs causes the process to run on CPU only
    • fix: Reading content from zipped Navigator Package
    • fix: When no GPU is available or the target device is set to CPU, optimize avoids running unsupported conversions in CLI
    • new: Converter accepts a target device kind to select CPU- or GPU-supported conversions
    • new: Added support for OpenVINO accelerator for ONNXRuntime
    • new: Added option --config-search-early-exit-enable for Model Analyzer early exit support in manual profiling mode
    • new: Added option --model-config-name to the select command. It allows picking a particular model configuration for deployment from the set of all configurations generated by Triton Model Analyzer, even if it is not the best-performing one.
    • removed: The --tensorrt-strict-types option has been removed due to deprecation of the functionality in upstream libraries.
  • Model Navigator Export API:

    • new: Added dynamic shapes support and trt dynamic shapes support for TensorFlow2 export
    • new: Improved per format logging
    • new: PyTorch to Torch-TRT precision selection added
    • new: Advanced profiling (measurement windows, configurable batch sizes)
  • Version of external components used during testing:

0.3.0

  • Updated NVIDIA containers defaults to 22.04
  • Model Navigator Export API
    • Support for exporting models from TensorFlow2 and PyTorch source code to supported target formats
    • Support for conversion from ONNX to supported target formats
    • Support for exporting HuggingFace models
    • Conversion, Correctness and performance tests for exported models
    • Definition of package structure for storing all exported models and additional metadata
  • Model Navigator OTIS:
    • change: run command has been deprecated and may be removed in a future release
    • new: optimize command replaces run and produces an output *.triton.nav package
    • new: select command selects the best-performing configuration from a *.triton.nav package and creates a Triton Inference Server model repository
    • new: Added support for using shared memory option for Perf Analyzer
  • Remove wkhtmltopdf package dependency

  • Version of external components used during testing:

0.2.7

  • Updated NVIDIA containers defaults to 22.02
  • Removed support for Python 3.7
  • Triton Model configuration related:
    • Support dynamic batching without setting preferred batch size value
  • Profiling related:

    • Deprecated the --config-search-max-preferred-batch-size flag, as it is no longer supported in Triton Model Analyzer
  • Version of external components used during testing:

0.2.6

  • Updated NVIDIA containers defaults to 22.01
  • Removed support for Python 3.6 due to EOL
  • Conversion related:
    • Added support for Torch-TensorRT conversion
  • Fixes and improvements

    • Processes inside containers started by Model Navigator now run without root privileges
    • Fix for volume mounts while running Triton Inference Server in container from other container
    • Fix for conversion of models without file extension on input and output paths
    • Fix using --model-format argument when input and output files have no extension
  • Version of external components used during testing:

  • Known issues and limitations

    • missing support for stateful models (e.g. time-series models)
    • no verification of conversion results for conversions: TF -> ONNX, TF -> TF-TRT, TorchScript -> ONNX
    • only a single profile can be defined for TensorRT
    • no custom ops support
    • Triton Inference Server stays in the background when the profile process is interrupted by the user
    • TF-TRT conversion loses output shape information

0.2.5

  • Updated NVIDIA containers defaults to 21.12
  • Conversion related:
    • [Experimental] TF-TRT - fixed default dataset profile generation
  • Configuration Model on Triton related

    • Fixed name for onnxruntime backend in Triton model deployment configuration
  • Version of external components used during testing:

  • Known issues and limitations

    • missing support for stateful models (e.g. time-series models)
    • no verification of conversion results for conversions: TF -> ONNX, TF -> TF-TRT, TorchScript -> ONNX
    • only a single profile can be defined for TensorRT
    • no custom ops support
    • Triton Inference Server stays in the background when the profile process is interrupted by the user
    • TF-TRT conversion loses output shape information

0.2.4 (2021-12-07)

  • Updated NVIDIA containers defaults to 21.10
  • Fixed generating profiling data when dtypes are not passed
  • Conversion related:
    • [Experimental] Added support for TF-TRT conversion
  • Configuration Model on Triton related
    • Added possibility to select batching mode - default, dynamic and disabled options supported
  • Install dependencies from pip packages instead of wheels for Polygraphy and Triton Model Analyzer
  • fixes and improvements

  • Version of external components used during testing:

  • Known issues and limitations

    • missing support for stateful models (e.g. time-series models)
    • no verification of conversion results for conversions: TF -> ONNX, TF -> TF-TRT, TorchScript -> ONNX
    • only a single profile can be defined for TensorRT
    • no custom ops support
    • Triton Inference Server stays in the background when the profile process is interrupted by the user
    • TF-TRT conversion loses output shape information

0.2.3 (2021-11-10)

  • Updated NVIDIA containers defaults to 21.09
  • Improved naming of arguments specific for TensorRT conversion and acceleration with backward compatibility
  • Use pip package for Triton Model Analyzer installation with minimal version 1.8.0
  • Fixed model_repository path so that it is not relative to the <navigator_workspace> dir
  • Handle exit codes correctly from CLI commands
  • Support for using device IDs in the --gpus argument
  • Conversion related
    • Added support for precision modes to support multiple precisions during conversion to TensorRT
    • Added --tensorrt-sparse-weights flag for sparse weight optimization for TensorRT
    • Added --tensorrt-strict-types flag forcing TensorRT to choose tactics based on the layer precision
    • Added --tensorrt-explicit-precision flag enabling explicit precision mode
    • Fixed nan values appearing in relative tolerance during conversion to TensorRT
  • Configuration Model on Triton related
    • Removed default value for engine_count_per_device
    • Added possibility to define Triton Custom Backend parameters with triton_backend_parameters command
    • Added possibility to define max workspace size for TensorRT backend accelerator using argument tensorrt_max_workspace_size
  • Profiling related
    • Added config_search prefix to all profiling parameters (BREAKING CHANGE)
    • Added config_search_max_preferred_batch_size parameter
    • Added config_search_backend_parameters parameter
  • fixes and improvements

  • Versions of used external components:

  • Known issues and limitations

    • missing support for stateful models (e.g. time-series models)
    • missing support for models without batching support
    • no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
    • only a single profile can be defined for TensorRT

0.2.2 (2021-09-06)

  • Updated NVIDIA containers defaults to 21.08

  • Versions of used external components:

  • Known issues and limitations

    • missing support for stateful models (e.g. time-series models)
    • missing support for models without batching support
    • no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
    • only a single profile can be defined for TensorRT

0.2.1 (2021-08-17)

  • Fixed triton-model-config error when tensorrt_capture_cuda_graph flag is not passed
  • Dump Conversion Comparator inputs and outputs into JSON files
  • Added information in logs on the tolerance parameters values to pass the conversion verification
  • Use count_windows mode as default option for Perf Analyzer
  • Added possibility to define custom docker images
  • Bugfixes

  • Versions of used external components:

  • Known issues and limitations

    • missing support for stateful models (e.g. time-series models)
    • missing support for models without batching support
    • no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
    • only a single profile can be defined for TensorRT
    • TensorRT backend acceleration not supported for ONNX Runtime in Triton Inference Server version 21.07

0.2.0 (2021-07-05)

  • Comprehensive refactor of the command-line API to provide more gradual execution of pipeline steps

  • Versions of used external components:

    • Triton Model Analyzer: 21.05
    • tf2onnx: v1.8.5 (support for ONNX opset 13, tf 1.15 and 2.5)
    • Other component versions depend on the versions of the framework and Triton Inference Server containers used. See the containers' support matrix for a detailed summary.
  • Known issues and limitations

    • missing support for stateful models (e.g. time-series models)
    • missing support for models without batching support
    • no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
    • issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
      • affected NVIDIA PyTorch containers: 20.12, 21.02, 21.03
      • workaround: use PyTorch containers newer than 21.03
    • only a single profile can be defined for TensorRT

0.1.1 (2021-04-12)

  • documentation update

0.1.0 (2021-04-09)

  • Release of main components:

    • Model Converter - converts the model to a set of variants optimized for inference or to be later optimized by Triton Inference Server backend.
    • Model Repo Builder - sets up the Triton Inference Server Model Repository, including its configuration.
    • Model Analyzer - selects the optimal Triton Inference Server configuration based on the model's compute and memory requirements, available computation infrastructure, and model application constraints.
    • Helm Chart Generator - deploys Triton Inference Server and the model with the optimal configuration to the cloud.
  • Versions of used external components:

    • Triton Model Analyzer: 21.03+616e8a30
    • tf2onnx: v1.8.4 (support for ONNX opset 13, tf 1.15 and 2.4)
    • Other component versions depend on the versions of the framework and Triton Inference Server containers used. Refer to the containers' support matrix for a detailed summary.
  • Known issues

    • missing support for stateful models (e.g. time-series models)
    • missing support for models without batching support
    • no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
    • issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
      • affected NVIDIA PyTorch containers: 20.12, 21.03
      • workaround: use containers other than those listed above
    • Triton Inference Server stays in the background when the profile process is interrupted by the user