Specialized Configs for Triton Backends
The Python API provides specialized configuration classes that expose only the options available for a given type of model.
model_navigator.triton.BaseSpecializedModelConfig
dataclass
BaseSpecializedModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[])
Bases: ABC
Common fields for specialized model configs.
Read more in the Triton Inference Server documentation.
Parameters:
- max_batch_size (int, default: 4) – The maximal batch size that will be handled by the model.
- batching (bool, default: True) – Flag to enable/disable batching for the model.
- default_model_filename (Optional[str], default: None) – Optional filename of the model file to use.
- batcher (Union[DynamicBatcher, SequenceBatcher], default: DynamicBatcher()) – Configuration of dynamic batching for the model.
- instance_groups (List[InstanceGroup], default: []) – Instance groups configuration for multiple instances of the model.
- parameters (Dict[str, str], default: {}) – Custom parameters for the model or backend.
- response_cache (bool, default: False) – Flag to enable/disable the response cache for the model.
- warmup (Dict[str, ModelWarmup], default: {}) – Warmup configuration for the model.
backend (abstract property)
Backend property that has to be overridden by specialized configs.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/base_model_config.py
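For illustration, a minimal sketch of how the common fields can be set on a concrete subclass (the base class itself is abstract); the values shown are placeholders:

```python
from model_navigator.triton import DynamicBatcher, ONNXModelConfig

# Common fields shared by every specialized config, set on a concrete
# subclass (the base class itself is abstract). Values are placeholders.
config = ONNXModelConfig(
    max_batch_size=16,                           # maximal batch size handled by the model
    batching=True,                               # keep batching enabled
    batcher=DynamicBatcher(),                    # dynamic batching with default settings
    parameters={"custom_key": "custom_value"},   # custom model/backend parameters
    response_cache=False,                        # response cache disabled
)
```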
model_navigator.triton.ONNXModelConfig
dataclass
ONNXModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[], platform=None, optimization=None)
Bases: BaseSpecializedModelConfig
Specialized model config for models served by the ONNX backend.
Parameters:
- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.ONNXRuntimeONNX.
- optimization (Optional[ONNXOptimization], default: None) – Possible optimization for ONNX models.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/onnx_model_config.py
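A minimal sketch of an ONNX backend config, assuming the class is imported from model_navigator.triton as the qualified name above suggests:

```python
from model_navigator.triton import ONNXModelConfig, Platform

# ONNX backend config with an explicit platform override.
config = ONNXModelConfig(
    max_batch_size=32,
    platform=Platform.ONNXRuntimeONNX,  # the only platform listed for this config
)
```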
model_navigator.triton.ONNXOptimization
dataclass
Possible optimizations for ONNX models.
Parameters:
- accelerator (Union[OpenVINOAccelerator, TensorRTAccelerator]) – Execution accelerator for the model.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/onnx_model_config.py
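A sketch of attaching an execution accelerator to an ONNX model config; it assumes TensorRTAccelerator can be constructed with its defaults, which are documented elsewhere:

```python
from model_navigator.triton import ONNXModelConfig, ONNXOptimization, TensorRTAccelerator

# Run the ONNX model through the TensorRT execution accelerator.
config = ONNXModelConfig(
    max_batch_size=16,
    optimization=ONNXOptimization(
        accelerator=TensorRTAccelerator(),  # assumed: default construction is valid
    ),
)
```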
model_navigator.triton.PythonModelConfig
dataclass
PythonModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[])
Bases: BaseSpecializedModelConfig
Specialized model config for models served by the Python backend.
Parameters:
- inputs (Sequence[InputTensorSpec], default: []) – Required definition of model inputs.
- outputs (Sequence[OutputTensorSpec], default: []) – Required definition of model outputs.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/python_model_config.py
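A sketch of a Python backend config; the name, shape, and dtype fields used for InputTensorSpec and OutputTensorSpec are assumptions, as those classes are documented elsewhere:

```python
import numpy as np

from model_navigator.triton import InputTensorSpec, OutputTensorSpec, PythonModelConfig

# The Python backend cannot read tensor metadata from a model file,
# so inputs and outputs have to be declared explicitly.
config = PythonModelConfig(
    max_batch_size=8,
    inputs=[
        # assumed fields: name, shape (-1 for a dynamic axis), dtype
        InputTensorSpec(name="INPUT__0", shape=(-1,), dtype=np.dtype("float32")),
    ],
    outputs=[
        OutputTensorSpec(name="OUTPUT__0", shape=(-1,), dtype=np.dtype("float32")),
    ],
)
```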
model_navigator.triton.PyTorchModelConfig
dataclass
PyTorchModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[], platform=None)
Bases: BaseSpecializedModelConfig
Specialized model config for models served by the PyTorch backend.
Parameters:
- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.PyTorchLibtorch.
- inputs (Sequence[InputTensorSpec], default: []) – Required definition of model inputs.
- outputs (Sequence[OutputTensorSpec], default: []) – Required definition of model outputs.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/pytorch_model_config.py
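A sketch of a PyTorch backend config; as above, the InputTensorSpec/OutputTensorSpec fields are assumptions documented elsewhere:

```python
import numpy as np

from model_navigator.triton import (
    InputTensorSpec,
    OutputTensorSpec,
    Platform,
    PyTorchModelConfig,
)

# TorchScript models carry no tensor names or dtypes,
# so inputs and outputs must be declared explicitly.
config = PyTorchModelConfig(
    max_batch_size=8,
    platform=Platform.PyTorchLibtorch,
    inputs=[
        # assumed fields: name, shape (-1 for a dynamic axis), dtype
        InputTensorSpec(name="input__0", shape=(-1, 3, 224, 224), dtype=np.dtype("float32")),
    ],
    outputs=[
        OutputTensorSpec(name="output__0", shape=(-1, 1000), dtype=np.dtype("float32")),
    ],
)
```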
model_navigator.triton.TensorFlowModelConfig
dataclass
TensorFlowModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[], platform=None, optimization=None)
Bases: BaseSpecializedModelConfig
Specialized model config for models served by the TensorFlow backend.
Parameters:
- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.TensorFlowSavedModel, Platform.TensorFlowGraphDef.
- optimization (Optional[TensorFlowOptimization], default: None) – Possible optimization for TensorFlow models.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/tensorflow_model_config.py
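A minimal sketch of a TensorFlow backend config using the SavedModel platform:

```python
from model_navigator.triton import Platform, TensorFlowModelConfig

# TensorFlow backend config for a SavedModel.
config = TensorFlowModelConfig(
    max_batch_size=16,
    platform=Platform.TensorFlowSavedModel,
)
```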
model_navigator.triton.TensorFlowOptimization
dataclass
Possible optimizations for TensorFlow models.
Parameters:
- accelerator (Union[AutoMixedPrecisionAccelerator, GPUIOAccelerator, TensorRTAccelerator]) – Execution accelerator for the model.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/tensorflow_model_config.py
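A sketch of enabling an accelerator for a TensorFlow model; it assumes AutoMixedPrecisionAccelerator can be constructed with its defaults:

```python
from model_navigator.triton import (
    AutoMixedPrecisionAccelerator,
    TensorFlowModelConfig,
    TensorFlowOptimization,
)

# Enable automatic mixed precision for a TensorFlow model.
config = TensorFlowModelConfig(
    optimization=TensorFlowOptimization(
        accelerator=AutoMixedPrecisionAccelerator(),  # assumed: default construction is valid
    ),
)
```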
model_navigator.triton.TensorRTModelConfig
dataclass
TensorRTModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[], platform=None, optimization=None)
Bases: BaseSpecializedModelConfig
Specialized model config for models served on the TensorRT platform.
Parameters:
- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.TensorRTPlan.
- optimization (Optional[TensorRTOptimization], default: None) – Possible optimization for TensorRT models.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/tensorrt_model_config.py
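A minimal sketch of a TensorRT backend config:

```python
from model_navigator.triton import Platform, TensorRTModelConfig

# TensorRT plan served on the TensorRT platform.
config = TensorRTModelConfig(
    max_batch_size=64,
    platform=Platform.TensorRTPlan,
)
```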
model_navigator.triton.TensorRTOptimization
dataclass
Possible optimizations for TensorRT models.
Parameters:
- cuda_graphs (bool, default: False) – Use the CUDA graphs API to capture model operations and execute them more efficiently.
- gather_kernel_buffer_threshold (Optional[int], default: None) – The backend may use a gather kernel to gather input data if the device has direct access to the source buffer and the destination buffer.
- eager_batching (bool, default: False) – Start preparing the next batch before the model instance is ready for the next inference.
__post_init__
Validate the configuration for early error handling.
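A sketch combining the TensorRT config with the optimization options documented above:

```python
from model_navigator.triton import TensorRTModelConfig, TensorRTOptimization

# Combine the TensorRT config with the optimization options above.
config = TensorRTModelConfig(
    max_batch_size=64,
    optimization=TensorRTOptimization(
        cuda_graphs=True,      # capture and replay model execution with CUDA graphs
        eager_batching=True,   # start preparing the next batch early
    ),
)
```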