Specialized Configs for Triton Backends
The Python API provides specialized configuration classes that expose only the options available for a given type of model.
model_navigator.api.triton.BaseSpecializedModelConfig
dataclass
Bases: ABC
Common fields for specialized model configs.
Read more in the Triton Inference Server documentation.
Parameters:
- max_batch_size (int, default: 4) – The maximal batch size that will be handled by the model.
- batching (bool, default: True) – Flag to enable/disable batching for the model.
- batcher (Union[DynamicBatcher, SequenceBatcher], default: field(default_factory=DynamicBatcher)) – Configuration of dynamic or sequence batching for the model.
- instance_groups (List[InstanceGroup], default: field(default_factory=lambda: [])) – Instance group configuration for running multiple instances of the model.
- parameters (Dict[str, str], default: field(default_factory=lambda: {})) – Custom parameters for the model or backend.
- response_cache (bool, default: False) – Flag to enable/disable the response cache for the model.
- warmup (Dict[str, ModelWarmup], default: field(default_factory=lambda: {})) – Warmup configuration for the model.
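BaseSpecializedModelConfig is abstract, so the common fields are set on one of the concrete configs documented below. A minimal sketch, assuming DynamicBatcher and InstanceGroup can be constructed with default arguments (their exact signatures are not listed in this section):

```python
from model_navigator.api import triton

# Hypothetical illustration of the common fields; the no-argument construction of
# DynamicBatcher and InstanceGroup is an assumption, not a confirmed signature.
config = triton.ONNXModelConfig(
    max_batch_size=16,
    batching=True,
    batcher=triton.DynamicBatcher(),           # default dynamic batching
    instance_groups=[triton.InstanceGroup()],  # single instance group with defaults
    parameters={"custom_key": "custom_value"},
    response_cache=False,
)
```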
model_navigator.api.triton.ONNXModelConfig
dataclass
Bases: BaseSpecializedModelConfig
Specialized model config for a model supported by the ONNX backend.
Parameters:
- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.ONNXRuntimeONNX
- optimization (Optional[ONNXOptimization], default: None) – Possible optimizations for ONNX models
model_navigator.api.triton.ONNXOptimization
dataclass
Possible optimizations for ONNX models.
Parameters:
- accelerator (Union[OpenVINOAccelerator, TensorRTAccelerator]) – Execution accelerator for the model
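A minimal sketch combining ONNXModelConfig and ONNXOptimization; TensorRTAccelerator is assumed to be constructible with default arguments, since its parameters are not documented in this section:

```python
from model_navigator.api import triton

# Hypothetical sketch: attach a TensorRT execution accelerator to an ONNX model config.
onnx_config = triton.ONNXModelConfig(
    max_batch_size=8,
    optimization=triton.ONNXOptimization(
        accelerator=triton.TensorRTAccelerator(),
    ),
)
```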
model_navigator.api.triton.PythonModelConfig
dataclass
Bases: BaseSpecializedModelConfig
Specialized model config for a model supported by the Python backend.
Parameters:
- inputs (Sequence[InputTensorSpec], default: field(default_factory=lambda: [])) – Required definition of model inputs
- outputs (Sequence[OutputTensorSpec], default: field(default_factory=lambda: [])) – Required definition of model outputs
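A minimal sketch, assuming InputTensorSpec and OutputTensorSpec take a name, a shape, and a NumPy dtype (their exact fields are documented elsewhere):

```python
import numpy as np

from model_navigator.api import triton

# Hypothetical sketch: the (name, shape, dtype) arguments of InputTensorSpec and
# OutputTensorSpec are assumptions based on typical tensor spec definitions.
python_config = triton.PythonModelConfig(
    inputs=[triton.InputTensorSpec(name="INPUT_1", shape=(-1, 128), dtype=np.dtype("float32"))],
    outputs=[triton.OutputTensorSpec(name="OUTPUT_1", shape=(-1, 10), dtype=np.dtype("float32"))],
)
```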
model_navigator.api.triton.PyTorchModelConfig
dataclass
Bases: BaseSpecializedModelConfig
Specialized model config for a model supported by the PyTorch backend.
Parameters:
- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.PyTorchLibtorch
- inputs (Sequence[InputTensorSpec], default: field(default_factory=lambda: [])) – Required definition of model inputs
- outputs (Sequence[OutputTensorSpec], default: field(default_factory=lambda: [])) – Required definition of model outputs
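A minimal sketch, assuming the same tensor spec arguments as in the Python backend example above and that Platform is importable from the same module:

```python
import numpy as np

from model_navigator.api import triton

# Hypothetical sketch: tensor spec arguments are assumptions; the platform override
# uses the documented Platform.PyTorchLibtorch option.
pytorch_config = triton.PyTorchModelConfig(
    platform=triton.Platform.PyTorchLibtorch,
    inputs=[triton.InputTensorSpec(name="input__0", shape=(-1, 3, 224, 224), dtype=np.dtype("float32"))],
    outputs=[triton.OutputTensorSpec(name="output__0", shape=(-1, 1000), dtype=np.dtype("float32"))],
)
```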
model_navigator.api.triton.TensorFlowModelConfig
dataclass
Bases: BaseSpecializedModelConfig
Specialized model config for a model supported by the TensorFlow backend.
Parameters:
- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.TensorFlowSavedModel, Platform.TensorFlowGraphDef
- optimization (Optional[TensorFlowOptimization], default: None) – Possible optimizations for TensorFlow models
model_navigator.api.triton.TensorFlowOptimization
dataclass
Possible optimizations for TensorFlow models.
Parameters:
- accelerator (Union[AutoMixedPrecisionAccelerator, GPUIOAccelerator, TensorRTAccelerator]) – Execution accelerator for the model
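A minimal sketch combining TensorFlowModelConfig and TensorFlowOptimization; AutoMixedPrecisionAccelerator is assumed to be constructible with default arguments:

```python
from model_navigator.api import triton

# Hypothetical sketch: serve a SavedModel with automatic mixed precision enabled.
tensorflow_config = triton.TensorFlowModelConfig(
    platform=triton.Platform.TensorFlowSavedModel,
    optimization=triton.TensorFlowOptimization(
        accelerator=triton.AutoMixedPrecisionAccelerator(),
    ),
)
```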
model_navigator.api.triton.TensorRTModelConfig
dataclass
Bases: BaseSpecializedModelConfig
Specialized model config for a model supported by the TensorRT platform.
Parameters:
- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.TensorRTPlan
- optimization (Optional[TensorRTOptimization], default: None) – Possible optimizations for TensorRT models
model_navigator.api.triton.TensorRTOptimization
dataclass
Possible optimizations for TensorRT models.
Parameters:
- cuda_graphs (bool, default: False) – Use the CUDA Graphs API to capture model operations and execute them more efficiently.
- gather_kernel_buffer_threshold (Optional[int], default: None) – The backend may use a gather kernel to gather input data if the device has direct access to the source buffer and the destination buffer.
- eager_batching (bool, default: False) – Start preparing the next batch before the model instance is ready for the next inference.
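A minimal sketch combining TensorRTModelConfig and TensorRTOptimization, using only the parameters documented above:

```python
from model_navigator.api import triton

# Sketch of a TensorRT plan model with CUDA Graphs and eager batching enabled.
tensorrt_config = triton.TensorRTModelConfig(
    optimization=triton.TensorRTOptimization(
        cuda_graphs=True,      # capture and replay model operations with CUDA Graphs
        eager_batching=True,   # start assembling the next batch before the instance is free
        gather_kernel_buffer_threshold=None,
    ),
)
```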