Specialized Configs for Triton Backends
The Python API provides specialized configuration classes that expose only the options available for a given type of model.
model_navigator.api.triton.BaseSpecializedModelConfig (dataclass)

BaseSpecializedModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[])

Bases: ABC

Common fields for specialized model configs.

Read more in the Triton Inference Server documentation.
Parameters:

- max_batch_size (int, default: 4) – The maximal batch size handled by the model.
- batching (bool, default: True) – Flag to enable/disable batching for the model.
- default_model_filename (Optional[str], default: None) – Optional filename of the model file to use.
- batcher (Union[DynamicBatcher, SequenceBatcher], default: DynamicBatcher()) – Configuration of dynamic batching for the model.
- instance_groups (List[InstanceGroup], default: []) – Instance groups configuration for multiple instances of the model.
- parameters (Dict[str, str], default: {}) – Custom parameters for the model or backend.
- response_cache (bool, default: False) – Flag to enable/disable the response cache for the model.
- warmup (Dict[str, ModelWarmup], default: {}) – Warmup configuration for the model.
backend (abstractmethod, property)

Backend property that has to be overridden by specialized configs.

__post_init__

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/base_model_config.py
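The common fields above apply to every specialized config documented below. The following is a minimal sketch of setting them on one concrete class (ONNXModelConfig); the InstanceGroup field names (kind, count), the DeviceKind enum, and the DynamicBatcher field max_queue_delay_microseconds are assumptions about the helper types referenced on this page, not signatures confirmed here.

```python
from model_navigator.api.triton import (
    DeviceKind,       # assumed export, used only for GPU placement
    DynamicBatcher,
    InstanceGroup,
    ONNXModelConfig,
)

# Sketch: a config built only from the common fields above, using
# ONNXModelConfig (documented below) as the concrete class.
config = ONNXModelConfig(
    max_batch_size=16,                # largest batch the model will accept
    batching=True,                    # keep batching enabled
    batcher=DynamicBatcher(
        # `max_queue_delay_microseconds` is an assumed field name that mirrors
        # Triton's dynamic_batching option.
        max_queue_delay_microseconds=100,
    ),
    instance_groups=[
        # `kind` and `count` are assumed InstanceGroup fields.
        InstanceGroup(kind=DeviceKind.KIND_GPU, count=2),
    ],
    parameters={"custom_key": "custom_value"},  # free-form model/backend parameters
    response_cache=False,
)
```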
            
model_navigator.api.triton.ONNXModelConfig (dataclass)

ONNXModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[], platform=None, optimization=None)

Bases: BaseSpecializedModelConfig

Specialized model config for a model supported by the ONNX backend.

Parameters:

- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.ONNXRuntimeONNX
- optimization (Optional[ONNXOptimization], default: None) – Possible optimization for ONNX models.

__post_init__

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/onnx_model_config.py
            
model_navigator.api.triton.ONNXOptimization (dataclass)

ONNX possible optimizations.

Parameters:

- accelerator (Union[OpenVINOAccelerator, TensorRTAccelerator]) – Execution accelerator for the model.

__post_init__

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/onnx_model_config.py
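A sketch of an ONNX config that delegates execution to TensorRT through ONNXOptimization; it assumes TensorRTAccelerator is importable from model_navigator.api.triton and can be constructed without required arguments.

```python
from model_navigator.api.triton import (
    ONNXModelConfig,
    ONNXOptimization,
    TensorRTAccelerator,  # assumed export and default constructor
)

# Sketch: ONNX model accelerated with TensorRT inside the ONNX Runtime backend.
config = ONNXModelConfig(
    max_batch_size=32,
    optimization=ONNXOptimization(
        accelerator=TensorRTAccelerator(),  # OpenVINOAccelerator() is the other documented option
    ),
)
```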
            
          
model_navigator.api.triton.PythonModelConfig (dataclass)

PythonModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[])

Bases: BaseSpecializedModelConfig

Specialized model config for a model supported by the Python backend.

Parameters:

- inputs (Sequence[InputTensorSpec], default: []) – Required definition of model inputs.
- outputs (Sequence[OutputTensorSpec], default: []) – Required definition of model outputs.

__post_init__

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/python_model_config.py
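The Python backend cannot infer tensor shapes or dtypes from a model file, so inputs and outputs must be declared explicitly. The sketch below illustrates this; the InputTensorSpec/OutputTensorSpec field names (name, shape, dtype) and the use of numpy dtypes are assumptions and may differ from the actual signatures.

```python
import numpy as np

from model_navigator.api.triton import (
    InputTensorSpec,
    OutputTensorSpec,
    PythonModelConfig,
)

# Sketch: Python backend model with explicitly declared I/O tensors.
# Field names `name`, `shape`, and `dtype` are assumed; -1 marks a dynamic dimension.
config = PythonModelConfig(
    max_batch_size=8,
    inputs=[
        InputTensorSpec(name="INPUT__0", shape=(-1, 3, 224, 224), dtype=np.dtype("float32")),
    ],
    outputs=[
        OutputTensorSpec(name="OUTPUT__0", shape=(-1, 1000), dtype=np.dtype("float32")),
    ],
)
```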
            
model_navigator.api.triton.PyTorchModelConfig (dataclass)

PyTorchModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[], platform=None)

Bases: BaseSpecializedModelConfig

Specialized model config for a model supported by the PyTorch backend.

Parameters:

- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.PyTorchLibtorch
- inputs (Sequence[InputTensorSpec], default: []) – Required definition of model inputs.
- outputs (Sequence[OutputTensorSpec], default: []) – Required definition of model outputs.

__post_init__

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/pytorch_model_config.py
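Like the Python backend, the PyTorch (LibTorch) backend requires explicit I/O definitions. A sketch follows, again treating the InputTensorSpec/OutputTensorSpec field names as assumptions.

```python
import numpy as np

from model_navigator.api.triton import (
    InputTensorSpec,
    OutputTensorSpec,
    Platform,
    PyTorchModelConfig,
)

# Sketch: TorchScript model with explicit I/O; `platform` overrides the backend parameter.
config = PyTorchModelConfig(
    max_batch_size=8,
    platform=Platform.PyTorchLibtorch,  # optional override, per the documented options
    inputs=[
        InputTensorSpec(name="input__0", shape=(-1, 128), dtype=np.dtype("int64")),
    ],
    outputs=[
        OutputTensorSpec(name="output__0", shape=(-1, 2), dtype=np.dtype("float32")),
    ],
)
```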
            
model_navigator.api.triton.TensorFlowModelConfig (dataclass)

TensorFlowModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[], platform=None, optimization=None)

Bases: BaseSpecializedModelConfig

Specialized model config for a model supported by the TensorFlow backend.

Parameters:

- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.TensorFlowSavedModel, Platform.TensorFlowGraphDef
- optimization (Optional[TensorFlowOptimization], default: None) – Possible optimization for TensorFlow models.

__post_init__

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/tensorflow_model_config.py
            
model_navigator.api.triton.TensorFlowOptimization (dataclass)

TensorFlow possible optimizations.

Parameters:

- accelerator (Union[AutoMixedPrecisionAccelerator, GPUIOAccelerator, TensorRTAccelerator]) – Execution accelerator for the model.

__post_init__

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/tensorflow_model_config.py
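A sketch of a TensorFlow SavedModel config with automatic mixed precision enabled through TensorFlowOptimization; it assumes AutoMixedPrecisionAccelerator is importable from the same module and takes no required arguments.

```python
from model_navigator.api.triton import (
    AutoMixedPrecisionAccelerator,  # assumed export and default constructor
    Platform,
    TensorFlowModelConfig,
    TensorFlowOptimization,
)

# Sketch: SavedModel served by the TensorFlow backend with mixed-precision acceleration.
config = TensorFlowModelConfig(
    max_batch_size=64,
    platform=Platform.TensorFlowSavedModel,  # optional override, per the documented options
    optimization=TensorFlowOptimization(
        accelerator=AutoMixedPrecisionAccelerator(),
    ),
)
```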
            
model_navigator.api.triton.TensorRTModelConfig (dataclass)

TensorRTModelConfig(max_batch_size=4, batching=True, default_model_filename=None, batcher=DynamicBatcher(), instance_groups=[], parameters={}, response_cache=False, warmup={}, inputs=[], outputs=[], platform=None, optimization=None)

Bases: BaseSpecializedModelConfig

Specialized model config for a model supported by the TensorRT platform.

Parameters:

- platform (Optional[Platform], default: None) – Override the backend parameter with a platform. Possible options: Platform.TensorRTPlan
- optimization (Optional[TensorRTOptimization], default: None) – Possible optimization for TensorRT models.

__post_init__

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/tensorrt_model_config.py
            
model_navigator.api.triton.TensorRTOptimization (dataclass)

TensorRT possible optimizations.

Parameters:

- cuda_graphs (bool, default: False) – Use the CUDA Graphs API to capture model operations and execute them more efficiently.
- gather_kernel_buffer_threshold (Optional[int], default: None) – The backend may use a gather kernel to gather input data if the device has direct access to the source buffer and the destination buffer.
- eager_batching (bool, default: False) – Start preparing the next batch before the model instance is ready for the next inference.

__post_init__

Validate the configuration for early error handling.
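A sketch of a TensorRT plan config with CUDA graph capture and eager batching turned on; only fields documented on this page are used.

```python
from model_navigator.api.triton import (
    TensorRTModelConfig,
    TensorRTOptimization,
)

# Sketch: TensorRT engine (plan) with CUDA graph capture and eager batching.
config = TensorRTModelConfig(
    max_batch_size=128,
    optimization=TensorRTOptimization(
        cuda_graphs=True,      # capture model operations with the CUDA Graphs API
        eager_batching=True,   # start assembling the next batch before the instance is free
    ),
)
```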