
Accelerators

model_navigator.api.triton.AutoMixedPrecisionAccelerator dataclass

Auto-mixed-precision accelerator for TensorFlow. Enables automatic FP16 precision.

Currently empty - no arguments required.

model_navigator.api.triton.GPUIOAccelerator dataclass

GPU IO accelerator for TensorFlow.

Currently empty - no arguments required.

model_navigator.api.triton.OpenVINOAccelerator dataclass

OpenVINO optimization.

Currently empty - no arguments required.
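The accelerators above take no arguments; they act as simple markers that select an optimization backend. As an illustration only (not the library source), an equivalent argument-free marker dataclass could be declared and used like this:

```python
from dataclasses import dataclass, asdict

# Illustrative sketch of an empty marker dataclass, mirroring the
# documented accelerators above (e.g. GPUIOAccelerator). The real
# model_navigator classes may carry additional internals.
@dataclass
class GPUIOAccelerator:
    """GPU IO accelerator for TensorFlow."""

accelerator = GPUIOAccelerator()
print(asdict(accelerator))  # {} - no arguments required
```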

model_navigator.api.triton.TensorRTAccelerator dataclass

TensorRT accelerator configuration.

Read more in the Triton Inference Server model configuration documentation.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| precision | TensorRTOptPrecision | The precision used for optimization | TensorRTOptPrecision.FP32 |
| max_workspace_size | Optional[int] | The maximum GPU memory the model can use temporarily during execution | None |
| max_cached_engines | Optional[int] | The maximum number of cached TensorRT engines in dynamic TensorRT ops | None |
| minimum_segment_size | Optional[int] | The smallest model subgraph that will be considered for optimization by TensorRT | None |
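To make the fields concrete, here is a minimal self-contained sketch that mirrors the documented parameters and defaults; it is an illustration, not the library's actual implementation (the enum values chosen here are assumptions):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Sketch of the documented precision enum; member values are assumptions.
class TensorRTOptPrecision(Enum):
    FP16 = "fp16"
    FP32 = "fp32"

# Sketch of TensorRTAccelerator with the fields and defaults from the
# parameter table above; not the model_navigator source.
@dataclass
class TensorRTAccelerator:
    precision: TensorRTOptPrecision = TensorRTOptPrecision.FP32
    max_workspace_size: Optional[int] = None    # temporary GPU memory, in bytes
    max_cached_engines: Optional[int] = None    # cached engines in dynamic TensorRT ops
    minimum_segment_size: Optional[int] = None  # smallest subgraph considered for optimization

# Only non-default values need to be passed.
accelerator = TensorRTAccelerator(
    precision=TensorRTOptPrecision.FP16,
    max_workspace_size=2 << 30,  # 2 GiB
)
print(accelerator.precision.name)  # FP16
```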

model_navigator.api.triton.TensorRTOptPrecision

Bases: enum.Enum

TensorRT optimization allowed precision.

Members:

| Name | Description |
|------|-------------|
| FP16 | fp16 precision |
| FP32 | fp32 precision |
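Since this is a standard enum.Enum subclass, members can be referenced directly or looked up by name, for instance when the precision comes from a configuration string. A small sketch (the member values shown are assumptions, not taken from the library):

```python
from enum import Enum

# Sketch of the two documented members; the values are assumptions.
class TensorRTOptPrecision(Enum):
    FP16 = "fp16"
    FP32 = "fp32"

# Standard Enum lookup by member name, e.g. from a config file.
chosen = TensorRTOptPrecision["FP16"]
print(chosen is TensorRTOptPrecision.FP16)  # True
```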