Specialized Configs for Triton Backends

The Python API provides specialized configuration classes that expose only the options available for a given type of model.

model_navigator.api.triton.BaseSpecializedModelConfig dataclass

Bases: ABC

Common fields for specialized model configs.

Read more in the Triton Inference Server documentation.

Parameters:

  • max_batch_size (int, default: 4 ) –

    The maximal batch size handled by the model.

  • batching (bool, default: True ) –

    Flag to enable/disable batching for the model.

  • batcher (Union[DynamicBatcher, SequenceBatcher], default: dataclasses.field(default_factory=DynamicBatcher) ) –

    Batching configuration for the model: either a dynamic or a sequence batcher.

  • instance_groups (List[InstanceGroup], default: dataclasses.field(default_factory=lambda : []) ) –

    Instance group configuration for running multiple instances of the model.

  • parameters (Dict[str, str], default: dataclasses.field(default_factory=lambda : {}) ) –

    Custom parameters for the model or backend.

  • response_cache (bool, default: False ) –

    Flag to enable/disable the response cache for the model.

  • warmup (Dict[str, ModelWarmup], default: dataclasses.field(default_factory=lambda : {}) ) –

    Warmup configuration for the model.

backend abstractmethod property

backend: Backend

Backend property that has to be overridden by specialized configs.
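
A minimal sketch of how these common fields fit together, shown on a concrete subclass since the base class is abstract. The import path follows the headings above; the DeviceKind enum and the field names used for DynamicBatcher and InstanceGroup are assumptions that may differ in your installation:

```python
# Sketch: common fields shared by every specialized config, shown on a
# concrete subclass. The DeviceKind enum and the DynamicBatcher/InstanceGroup
# field names are assumptions, not verified against your version.
from model_navigator.api.triton import (
    DeviceKind,
    DynamicBatcher,
    InstanceGroup,
    ONNXModelConfig,
)

config = ONNXModelConfig(
    max_batch_size=16,                     # maximal batch size handled by the model
    batching=True,                         # enable batching for the model
    batcher=DynamicBatcher(
        max_queue_delay_microseconds=100,  # assumed field: wait briefly to form larger batches
    ),
    instance_groups=[
        InstanceGroup(kind=DeviceKind.KIND_GPU, count=2),  # two GPU instances
    ],
    parameters={"example_key": "example_value"},  # custom model/backend parameters
    response_cache=False,                  # response cache disabled
)

print(config.backend)  # each specialized config resolves its own Backend value
```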

model_navigator.api.triton.ONNXModelConfig dataclass

Bases: BaseSpecializedModelConfig

Specialized model config for models served with the ONNX backend.

Parameters:

  • platform (Optional[Platform], default: None ) –

    Override the backend parameter with a platform. Possible option: Platform.ONNXRuntimeONNX.

  • optimization (Optional[ONNXOptimization], default: None ) –

    Optional optimization settings for ONNX models.

backend property

backend: Backend

Define backend value for config.
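
A minimal sketch of an ONNX backend config with the platform override. The model_repository.add_model call and its signature are assumptions about the surrounding Model Navigator API and may differ between versions:

```python
# Sketch: ONNX backend config with a platform override. The
# model_repository.add_model call below is an assumption about the
# surrounding API; check your version for the exact signature.
import pathlib

from model_navigator.api.triton import ONNXModelConfig, Platform, model_repository

config = ONNXModelConfig(
    max_batch_size=8,
    platform=Platform.ONNXRuntimeONNX,  # override the backend with the ONNX Runtime platform
)

model_repository.add_model(
    model_repository_path=pathlib.Path("model_repository"),
    model_name="example_onnx_model",       # hypothetical model name
    model_path=pathlib.Path("model.onnx"),
    config=config,
)
```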

model_navigator.api.triton.ONNXOptimization dataclass

Possible optimizations for ONNX models.

Parameters:

model_navigator.api.triton.PythonModelConfig dataclass

Bases: BaseSpecializedModelConfig

Specialized model config for models served with the Python backend.

Parameters:

backend property

backend: Backend

Define backend value for config.
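
A minimal sketch for the Python backend using only the common fields. Depending on the Model Navigator version, explicit input/output tensor specifications may also be required (omitted here), and the parameter key shown is purely illustrative:

```python
# Sketch: Python backend config using only the common fields. Some versions
# may also require explicit input/output tensor specifications (omitted here).
# The parameter key is illustrative only.
from model_navigator.api.triton import DynamicBatcher, PythonModelConfig

config = PythonModelConfig(
    max_batch_size=4,
    batcher=DynamicBatcher(),                     # dynamic batching with default settings
    parameters={"example_key": "example_value"},  # passed through to the Python backend
)
```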

model_navigator.api.triton.PyTorchModelConfig dataclass

Bases: BaseSpecializedModelConfig

Specialized model config for models served with the PyTorch backend.

Parameters:

backend property

backend: Backend

Define backend value for config.
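
A minimal sketch for the PyTorch (LibTorch) backend. As with the Python backend, some versions may additionally require explicit input/output tensor specifications, which are omitted here:

```python
# Sketch: PyTorch backend config using only the common fields. Some versions
# may also require explicit input/output tensor specifications (omitted here).
from model_navigator.api.triton import PyTorchModelConfig

config = PyTorchModelConfig(
    max_batch_size=32,
    response_cache=True,  # reuse cached responses for identical requests
)
```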

model_navigator.api.triton.TensorFlowModelConfig dataclass

Bases: BaseSpecializedModelConfig

Specialized model config for models served with the TensorFlow backend.

Parameters:

  • platform (Optional[Platform], default: None ) –

    Override the backend parameter with a platform. Possible options: Platform.TensorFlowSavedModel, Platform.TensorFlowGraphDef.

  • optimization (Optional[TensorFlowOptimization], default: None ) –

    Optional optimization settings for TensorFlow models.

backend property

backend: Backend

Define backend value for config.
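
A minimal sketch for the TensorFlow backend, overriding the backend with the SavedModel platform; the values are illustrative only:

```python
# Sketch: TensorFlow backend config, overriding the backend with the
# SavedModel platform.
from model_navigator.api.triton import Platform, TensorFlowModelConfig

config = TensorFlowModelConfig(
    max_batch_size=16,
    platform=Platform.TensorFlowSavedModel,  # or Platform.TensorFlowGraphDef
)
```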

model_navigator.api.triton.TensorFlowOptimization dataclass

Possible optimizations for TensorFlow models.

Parameters:

model_navigator.api.triton.TensorRTModelConfig dataclass

Bases: BaseSpecializedModelConfig

Specialized model config for models served on the TensorRT platform.

Parameters:

  • platform (Optional[Platform], default: None ) –

    Override the backend parameter with a platform. Possible option: Platform.TensorRTPlan.

  • optimization (Optional[TensorRTOptimization], default: None ) –

    Optional optimization settings for TensorRT models.

backend property

backend: Backend

Define backend value for config.
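
A minimal sketch for the TensorRT backend with the plan platform override; the values are illustrative only:

```python
# Sketch: TensorRT backend config, overriding the backend with the
# TensorRT plan platform.
from model_navigator.api.triton import Platform, TensorRTModelConfig

config = TensorRTModelConfig(
    max_batch_size=64,
    platform=Platform.TensorRTPlan,
)
```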

model_navigator.api.triton.TensorRTOptimization dataclass

Possible optimizations for TensorRT models.

Parameters:

  • cuda_graphs (bool, default: False ) –

    Use the CUDA Graphs API to capture model operations and execute them more efficiently.

  • gather_kernel_buffer_threshold (Optional[int], default: None ) –

    The backend may use a gather kernel to gather input data if the device has direct access to the source buffer and the destination buffer; the gather kernel is used only when the number of buffers is greater than or equal to this value.

  • eager_batching (bool, default: False ) –

    Start preparing the next batch before the model instance is ready for the next inference.
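
A minimal sketch attaching the optimization options documented above to a TensorRT model config; the values are illustrative only:

```python
# Sketch: TensorRTOptimization attached to a TensorRT model config.
# Values are illustrative only.
from model_navigator.api.triton import TensorRTModelConfig, TensorRTOptimization

optimization = TensorRTOptimization(
    cuda_graphs=True,                   # capture model operations with the CUDA Graphs API
    gather_kernel_buffer_threshold=64,  # minimum buffer count before the gather kernel is used
    eager_batching=True,                # start preparing the next batch early
)

config = TensorRTModelConfig(
    max_batch_size=64,
    optimization=optimization,
)
```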