Accelerators
model_navigator.triton.AutoMixedPrecisionAccelerator
dataclass
Auto-mixed-precision accelerator for TensorFlow. Enables automatic FP16 precision.
Currently empty - no arguments required.
model_navigator.triton.GPUIOAccelerator
dataclass
GPU IO accelerator for TensorFlow.
Currently empty - no arguments required.
model_navigator.triton.OpenVINOAccelerator
dataclass
OpenVINO optimization.
Currently empty - no arguments required.
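The accelerators above carry no configuration, so they are constructed without arguments. A minimal sketch using stand-in dataclasses that mirror the documented (empty) definitions; in real code these names would be imported from `model_navigator.triton`:

```python
from dataclasses import dataclass


# Stand-in definitions for illustration only; the actual classes
# live in model_navigator.triton.
@dataclass
class AutoMixedPrecisionAccelerator:
    """Enables automatic FP16 precision for TensorFlow."""


@dataclass
class GPUIOAccelerator:
    """GPU IO accelerator for TensorFlow."""


@dataclass
class OpenVINOAccelerator:
    """OpenVINO optimization."""


# Each accelerator is instantiated with no parameters.
accelerators = [
    AutoMixedPrecisionAccelerator(),
    GPUIOAccelerator(),
    OpenVINOAccelerator(),
]
```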
model_navigator.triton.TensorRTAccelerator
dataclass
TensorRTAccelerator(precision=TensorRTOptPrecision.FP32, max_workspace_size=None, max_cached_engines=None, minimum_segment_size=None)
TensorRT accelerator configuration.
Read more in the Triton Inference Server model configuration documentation.
Parameters:
- precision (TensorRTOptPrecision, default: FP32) – The precision used for optimization.
- max_workspace_size (Optional[int], default: None) – The maximum GPU memory the model can use temporarily during execution.
- max_cached_engines (Optional[int], default: None) – The maximum number of cached TensorRT engines in dynamic TensorRT ops.
- minimum_segment_size (Optional[int], default: None) – The smallest model subgraph that will be considered for optimization by TensorRT.
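The parameters above map directly onto the dataclass fields. A minimal sketch showing how a TensorRT accelerator might be configured; the dataclass and enum below are stand-ins mirroring the documented signature, and in real code they would be imported from `model_navigator.triton`:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


# Stand-in mirroring model_navigator.triton.TensorRTOptPrecision.
class TensorRTOptPrecision(Enum):
    FP16 = "fp16"
    FP32 = "fp32"


# Stand-in mirroring the documented TensorRTAccelerator signature:
# TensorRTAccelerator(precision=TensorRTOptPrecision.FP32,
#                     max_workspace_size=None,
#                     max_cached_engines=None,
#                     minimum_segment_size=None)
@dataclass
class TensorRTAccelerator:
    precision: TensorRTOptPrecision = TensorRTOptPrecision.FP32
    max_workspace_size: Optional[int] = None
    max_cached_engines: Optional[int] = None
    minimum_segment_size: Optional[int] = None


# Request FP16 optimization with a 4 GiB workspace limit and skip
# subgraphs smaller than 3 nodes; unset fields keep their defaults.
accelerator = TensorRTAccelerator(
    precision=TensorRTOptPrecision.FP16,
    max_workspace_size=4 * 1024**3,
    minimum_segment_size=3,
)
```

All fields have defaults, so `TensorRTAccelerator()` alone yields an FP32 configuration with no explicit limits.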
model_navigator.triton.TensorRTOptPrecision
Bases: Enum
TensorRT optimization allowed precision.
Parameters:
- FP16 – fp16 precision
- FP32 – fp32 precision