Accelerators

model_navigator.triton.AutoMixedPrecisionAccelerator dataclass

AutoMixedPrecisionAccelerator()

Auto-mixed-precision accelerator for TensorFlow. Enables automatic FP16 precision.

Currently empty - no arguments required.

model_navigator.triton.GPUIOAccelerator dataclass

GPUIOAccelerator()

GPU IO accelerator for TensorFlow.

Currently empty - no arguments required.

model_navigator.triton.OpenVINOAccelerator dataclass

OpenVINOAccelerator()

OpenVINO optimization.

Currently empty - no arguments required.

model_navigator.triton.TensorRTAccelerator dataclass

TensorRTAccelerator(precision=TensorRTOptPrecision.FP32, max_workspace_size=None, max_cached_engines=None, minimum_segment_size=None)

TensorRT accelerator configuration.

Read more in the Triton Inference Server model configuration documentation.

Parameters:

  • precision (TensorRTOptPrecision, default: FP32 ) –

    The precision used for optimization

  • max_workspace_size (Optional[int], default: None ) –

    The maximum GPU memory the model can use temporarily during execution

  • max_cached_engines (Optional[int], default: None ) –

    The maximum number of cached TensorRT engines in dynamic TensorRT ops

  • minimum_segment_size (Optional[int], default: None ) –

    The smallest model subgraph that will be considered for optimization by TensorRT
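As a quick illustration of the fields and defaults above, here is a stdlib-only sketch that mirrors the documented dataclass (the real classes ship in `model_navigator.triton` and should be imported from there; the enum's underlying string values here are an assumption, as the documentation only names the members):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TensorRTOptPrecision(Enum):
    """Mirrors the documented TensorRT optimization precisions.

    The underlying values are assumptions; only the member names
    FP16 and FP32 appear in the documentation.
    """
    FP16 = "fp16"
    FP32 = "fp32"


@dataclass
class TensorRTAccelerator:
    """Stdlib mirror of the documented fields and their defaults."""
    precision: TensorRTOptPrecision = TensorRTOptPrecision.FP32
    max_workspace_size: Optional[int] = None    # max temporary GPU memory during execution
    max_cached_engines: Optional[int] = None    # cached TensorRT engines in dynamic ops
    minimum_segment_size: Optional[int] = None  # smallest subgraph considered for TensorRT


# Example: request FP16 optimization with a 2 GiB workspace cap.
accelerator = TensorRTAccelerator(
    precision=TensorRTOptPrecision.FP16,
    max_workspace_size=2 << 30,
)
```

Leaving the optional fields at `None` defers those limits to Triton's own defaults; only `precision` always has an explicit value.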

model_navigator.triton.TensorRTOptPrecision

Bases: Enum

TensorRT optimization allowed precision.

Members:

  • FP16

    fp16 precision

  • FP32

    fp32 precision