Model Config
pytriton.model_config.ModelConfig
dataclass

ModelConfig(batching: bool = True, max_batch_size: int = 4, batcher: DynamicBatcher = DynamicBatcher(), response_cache: bool = False, decoupled: bool = False, model_warmup: Optional[List[ModelWarmup]] = None)
Additional model configuration for running a model through Triton Inference Server.
Parameters:

- batching (bool, default: True) – Flag to enable/disable batching for the model.
- max_batch_size (int, default: 4) – The maximum batch size the model can handle.
- batcher (DynamicBatcher, default: DynamicBatcher()) – Dynamic batching configuration for the model.
- response_cache (bool, default: False) – Flag to enable/disable the response cache for the model.
- decoupled (bool, default: False) – Flag to enable/disable decoupled execution, in which responses are decoupled from requests (a request may produce zero, one, or many responses).
- model_warmup (Optional[List[ModelWarmup]], default: None) – Model warmup configuration.
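For context, a minimal sketch of how a ModelConfig is typically passed to Triton.bind; the model name, tensor names, and the toy inference function are illustrative assumptions, not part of this reference.

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(input):
    # Toy inference callable: doubles the batched input array.
    return {"output": input * 2}


with Triton() as triton:
    triton.bind(
        model_name="doubler",  # hypothetical model name
        infer_func=infer_fn,
        inputs=[Tensor(name="input", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(batching=True, max_batch_size=16),
    )
    triton.serve()
```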
pytriton.model_config.Tensor
dataclass

Tensor(shape: tuple, dtype: Union[dtype, Type[dtype], Type[object]], name: Optional[str] = None, optional: Optional[bool] = False)
Model input and output definition for Triton deployment.
Parameters:

- shape (tuple) – Shape of the input/output tensor.
- dtype (Union[dtype, Type[dtype], Type[object]]) – Data type of the input/output tensor.
- name (Optional[str], default: None) – Name of the model input/output.
- optional (Optional[bool], default: False) – Flag to mark the input as optional.
__post_init__
Override object values on post init or field override.
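A short sketch of declaring inputs and outputs with Tensor; the names and shapes below are illustrative assumptions. When batching is enabled, shapes omit the batch dimension, and -1 marks a dynamic axis.

```python
import numpy as np

from pytriton.model_config import Tensor

# Shapes omit the batch dimension when batching is enabled; -1 marks a dynamic axis.
inputs = [
    Tensor(name="image", dtype=np.uint8, shape=(-1, -1, 3)),          # variable-size HxWx3 image
    Tensor(name="top_k", dtype=np.int64, shape=(1,), optional=True),  # optional scalar parameter
]
outputs = [
    Tensor(name="scores", dtype=np.float32, shape=(-1,)),
]
```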
pytriton.model_config.DeviceKind
Bases: Enum

Device kind for model deployment.
Members:

- KIND_AUTO – Automatically select the device for model deployment.
- KIND_CPU – Model is deployed on CPU.
- KIND_GPU – Model is deployed on GPU.
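A minimal illustration of selecting a member of the enum; the pick_device helper is hypothetical and only demonstrates member access.

```python
from pytriton.model_config import DeviceKind


def pick_device(gpu_available: bool) -> DeviceKind:
    # Hypothetical helper: prefer GPU deployment when a GPU is present,
    # otherwise fall back to CPU.
    return DeviceKind.KIND_GPU if gpu_available else DeviceKind.KIND_CPU


print(pick_device(True))  # DeviceKind.KIND_GPU
```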
pytriton.model_config.DynamicBatcher
dataclass

DynamicBatcher(max_queue_delay_microseconds: int = 0, preferred_batch_size: Optional[list] = None, preserve_ordering: bool = False, priority_levels: int = 0, default_priority_level: int = 0, default_queue_policy: Optional[QueuePolicy] = None, priority_queue_policy: Optional[Dict[int, QueuePolicy]] = None)
Dynamic batcher configuration.
More details in the Triton Inference Server documentation.
Parameters:

- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a request may be delayed in the scheduling queue while waiting for additional requests to batch with.
- preferred_batch_size (Optional[list], default: None) – Preferred batch sizes for dynamic batching.
- preserve_ordering (bool, default: False) – Whether the dynamic batcher should preserve the ordering of responses to match the order in which requests were received by the scheduler.
- priority_levels (int, default: 0) – The number of priority levels to be enabled for the model.
- default_priority_level (int, default: 0) – The priority level used for requests that don't specify a priority.
- default_queue_policy (Optional[QueuePolicy], default: None) – The default queue policy applied to requests.
- priority_queue_policy (Optional[Dict[int, QueuePolicy]], default: None) – Queue policies keyed by priority level.
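A sketch of a batcher that waits briefly for requests to accumulate; the delay and batch sizes are illustrative assumptions, and the batcher takes effect only once attached to a ModelConfig.

```python
from pytriton.model_config import DynamicBatcher, ModelConfig

batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,  # hold requests up to 100 us to form larger batches
    preferred_batch_size=[4, 8],       # prefer batches of 4 or 8 when enough requests queue up
    preserve_ordering=True,            # return responses in request arrival order
)

config = ModelConfig(batching=True, max_batch_size=8, batcher=batcher)
```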
pytriton.model_config.QueuePolicy
dataclass

QueuePolicy(timeout_action: TimeoutAction = REJECT, default_timeout_microseconds: int = 0, allow_timeout_override: bool = False, max_queue_size: int = 0)
Model queue policy configuration.
More details in the Triton Inference Server documentation.
Parameters:

- timeout_action (TimeoutAction, default: REJECT) – The action applied to a timed-out request.
- default_timeout_microseconds (int, default: 0) – The default timeout for every request, in microseconds.
- allow_timeout_override (bool, default: False) – Whether an individual request can override the default timeout value.
- max_queue_size (int, default: 0) – The maximum queue size for holding requests.
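A sketch of a policy that rejects requests stuck in the queue for more than 50 ms; the numbers are illustrative assumptions.

```python
from pytriton.model_config import QueuePolicy, TimeoutAction

policy = QueuePolicy(
    timeout_action=TimeoutAction.REJECT,  # return an error for timed-out requests
    default_timeout_microseconds=50_000,  # 50 ms queueing budget per request
    allow_timeout_override=True,          # individual requests may set their own timeout
    max_queue_size=128,                   # bound on how many requests may wait in the queue
)
```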
pytriton.model_config.TimeoutAction
Bases: Enum

Timeout action definition for the timeout_action QueuePolicy field.
Members:

- REJECT – Reject the request and return an error accordingly.
- DELAY – Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed.
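Putting the pieces together, a sketch of a two-level priority setup in which timed-out requests are delayed by default while priority level 1 uses a stricter reject policy; all values are illustrative assumptions.

```python
from pytriton.model_config import (
    DynamicBatcher,
    ModelConfig,
    QueuePolicy,
    TimeoutAction,
)

batcher = DynamicBatcher(
    priority_levels=2,
    default_priority_level=2,  # requests without an explicit priority land on level 2
    default_queue_policy=QueuePolicy(
        timeout_action=TimeoutAction.DELAY,    # defer late requests instead of failing them
        default_timeout_microseconds=100_000,  # 100 ms default queueing budget
    ),
    priority_queue_policy={
        1: QueuePolicy(timeout_action=TimeoutAction.REJECT),  # level 1 rejects on timeout
    },
)

config = ModelConfig(batching=True, max_batch_size=32, batcher=batcher)
```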