Accelerators

model_navigator.api.triton.AutoMixedPrecisionAccelerator dataclass

Auto-mixed-precision accelerator for TensorFlow. Enables automatic FP16 precision.

Currently empty - no arguments required.

model_navigator.api.triton.GPUIOAccelerator dataclass

GPU IO accelerator for TensorFlow.

Currently empty - no arguments required.

model_navigator.api.triton.OpenVINOAccelerator dataclass

OpenVINO optimization.

Currently empty - no arguments required.
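The three accelerators above carry no configuration of their own; instantiating one simply selects the corresponding optimization path. As a minimal stdlib sketch of that shape (illustrative only; in real code import these classes from model_navigator.api.triton):

```python
from dataclasses import dataclass


# Sketch of the argument-free accelerator markers described above.
# The names follow the entries in this reference; the dataclass bodies
# are assumptions, not the library's actual definitions.
@dataclass
class AutoMixedPrecisionAccelerator:
    """Enables automatic FP16 mixed precision for TensorFlow models."""


@dataclass
class GPUIOAccelerator:
    """Enables GPU IO acceleration for TensorFlow models."""


@dataclass
class OpenVINOAccelerator:
    """Enables OpenVINO optimization."""


# No arguments are accepted or required.
accelerator = OpenVINOAccelerator()
```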

model_navigator.api.triton.TensorRTAccelerator dataclass

TensorRT accelerator configuration.

Read more in the Triton Inference Server model configuration documentation.

Parameters:

  • precision (TensorRTOptPrecision, default: FP32 ) –

    The precision used for optimization

  • max_workspace_size (Optional[int], default: None ) –

    The maximum GPU memory the model can use temporarily during execution

  • max_cached_engines (Optional[int], default: None ) –

    The maximum number of cached TensorRT engines in dynamic TensorRT ops

  • minimum_segment_size (Optional[int], default: None ) –

    The smallest model subgraph that will be considered for optimization by TensorRT
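The parameters above can be sketched as a plain dataclass. This is an illustrative stdlib sketch of the documented fields and defaults (the enum values are assumptions; in real code import TensorRTAccelerator and TensorRTOptPrecision from model_navigator.api.triton):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TensorRTOptPrecision(Enum):
    # Allowed optimization precisions; the string values are assumed.
    FP16 = "fp16"
    FP32 = "fp32"


@dataclass
class TensorRTAccelerator:
    # The precision used for optimization; FP32 by default.
    precision: TensorRTOptPrecision = TensorRTOptPrecision.FP32
    # Maximum GPU memory the model can use temporarily during execution.
    max_workspace_size: Optional[int] = None
    # Maximum number of cached TensorRT engines in dynamic TensorRT ops.
    max_cached_engines: Optional[int] = None
    # Smallest model subgraph considered for optimization by TensorRT.
    minimum_segment_size: Optional[int] = None


# Example: request FP16 optimization with a 4 GiB workspace.
accelerator = TensorRTAccelerator(
    precision=TensorRTOptPrecision.FP16,
    max_workspace_size=4 * 1024**3,
)
```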

model_navigator.api.triton.TensorRTOptPrecision

Bases: Enum

TensorRT optimization allowed precision.

Members:

  • FP16

    fp16 precision

  • FP32

    fp32 precision
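As a sketch of the enum shape (the lowercase string values are an assumption based on the member descriptions above; in real code import TensorRTOptPrecision from model_navigator.api.triton):

```python
from enum import Enum


class TensorRTOptPrecision(Enum):
    # Assumed values mirroring the "fp16 precision" / "fp32 precision"
    # descriptions in this reference.
    FP16 = "fp16"
    FP32 = "fp32"


# Members can be looked up by name or compared by identity.
chosen = TensorRTOptPrecision["FP16"]
```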