Model Config

pytriton.model_config.ModelConfig dataclass

ModelConfig(batching: bool = True, max_batch_size: int = 4, batcher: DynamicBatcher = DynamicBatcher(), response_cache: bool = False, decoupled: bool = False)

Additional model configuration for running a model through Triton Inference Server.


  • batching (bool, default: True ) –

    Flag to enable/disable batching for the model.

  • max_batch_size (int, default: 4 ) –

    The maximum batch size that the model can handle.

  • batcher (DynamicBatcher, default: DynamicBatcher() ) –

    Configuration of Dynamic Batching for the model.

  • response_cache (bool, default: False ) –

    Flag to enable/disable the response cache for the model.

  • decoupled (bool, default: False ) –

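    Flag to enable/disable decoupled mode, in which responses are decoupled from request execution and a single request may receive zero, one, or many responses.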
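Triton's own model configuration treats a max_batch_size of 0 as "this model does not batch". The sketch below shows one plausible way the batching and max_batch_size fields could collapse into that single Triton value. ModelConfigSketch and triton_max_batch_size are hypothetical stand-ins, not PyTriton API; they mirror the signature above so the example runs without PyTriton installed, and the mapping itself is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfigSketch:
    """Hypothetical stand-in mirroring the ModelConfig fields documented above."""

    batching: bool = True
    max_batch_size: int = 4
    response_cache: bool = False
    decoupled: bool = False


def triton_max_batch_size(config: ModelConfigSketch) -> int:
    """Return the value Triton's max_batch_size field would need (assumed mapping).

    Triton reads max_batch_size == 0 as "this model does not batch", so a
    model with batching disabled maps to 0 regardless of max_batch_size.
    """
    if not config.batching:
        return 0
    if config.max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1 when batching is enabled")
    return config.max_batch_size


print(triton_max_batch_size(ModelConfigSketch()))                # 4
print(triton_max_batch_size(ModelConfigSketch(batching=False)))  # 0
```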

pytriton.model_config.Tensor dataclass

Tensor(shape: tuple, dtype: Union[dtype, Type[dtype], Type[object]], name: Optional[str] = None, optional: Optional[bool] = False)

Model input and output definition for Triton deployment.


  • shape (tuple) –

    Shape of the input/output tensor.

  • dtype (Union[dtype, Type[dtype], Type[object]]) –

    Data type of the input/output tensor.

  • name (Optional[str], default: None ) –

    Name of the model input/output.

  • optional (Optional[bool], default: False ) –

    Flag to mark the input as optional.
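The following sketch checks a concrete array shape against a declared Tensor shape. It assumes PyTriton follows Triton's conventions: -1 marks a variable-size dimension, and with batching enabled the declared shape omits the leading batch dimension. matches_declared_shape is a hypothetical helper, not part of PyTriton.

```python
from typing import Sequence, Tuple


def matches_declared_shape(
    declared: Tuple[int, ...], actual: Sequence[int], batching: bool = True
) -> bool:
    """Check an array shape against a declared Tensor shape (illustrative only).

    Assumes Triton's conventions: -1 marks a variable-size dimension, and when
    batching is enabled the declared shape omits the leading batch dimension.
    """
    if batching:
        if len(actual) == 0:
            return False  # a batched array needs at least the batch dimension
        dims = tuple(actual[1:])  # drop the leading batch dimension
    else:
        dims = tuple(actual)
    if len(dims) != len(declared):
        return False
    # -1 accepts any size; every other dimension must match exactly.
    return all(d == -1 or d == a for d, a in zip(declared, dims))


# Declared per-sample shape (-1, 3): any number of rows, exactly 3 columns.
print(matches_declared_shape((-1, 3), (8, 10, 3)))               # True
print(matches_declared_shape((-1, 3), (8, 10, 4)))               # False
print(matches_declared_shape((-1, 3), (10, 3), batching=False))  # True
```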

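__post_init__()

Override object values after initialization: if dtype is given as a numpy.dtype instance (for example np.dtype("float32")), it is replaced with the corresponding scalar type (np.float32).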

Source code in pytriton/model_config/
def __post_init__(self):
    """Override object values on post init or field override."""
    if isinstance(self.dtype, np.dtype):
        object.__setattr__(self, "dtype", self.dtype.type)  # pytype: disable=attribute-error
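Because __post_init__ writes through object.__setattr__, Tensor is presumably a frozen dataclass, where ordinary attribute assignment raises FrozenInstanceError. TensorSketch below is a hypothetical stand-in that reproduces the method to show its effect: a numpy.dtype instance and the matching scalar type end up as the same stored value, so the two tensors compare equal.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Type, Union

import numpy as np


@dataclass(frozen=True)
class TensorSketch:
    """Hypothetical stand-in reproducing the __post_init__ shown above."""

    shape: Tuple[int, ...]
    dtype: Union[np.dtype, Type[np.dtype], Type[object]]
    name: Optional[str] = None
    optional: Optional[bool] = False

    def __post_init__(self):
        # The dataclass is frozen, so `self.dtype = ...` would raise
        # FrozenInstanceError; object.__setattr__ bypasses that check.
        if isinstance(self.dtype, np.dtype):
            object.__setattr__(self, "dtype", self.dtype.type)


a = TensorSketch(shape=(-1,), dtype=np.dtype("float32"))
b = TensorSketch(shape=(-1,), dtype=np.float32)
print(a.dtype is np.float32, a == b)  # True True
```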


Bases: Enum

Device kind for model deployment.



    Automatically select the device for model deployment.


    Model is deployed on CPU.


    Model is deployed on GPU.

pytriton.model_config.DynamicBatcher dataclass

DynamicBatcher(max_queue_delay_microseconds: int = 0, preferred_batch_size: Optional[list] = None, preserve_ordering: bool = False, priority_levels: int = 0, default_priority_level: int = 0, default_queue_policy: Optional[QueuePolicy] = None, priority_queue_policy: Optional[Dict[int, QueuePolicy]] = None)

Dynamic batcher configuration.

More details are available in the Triton Inference Server documentation.


  • max_queue_delay_microseconds (int, default: 0 ) –

    The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching.

  • preferred_batch_size (Optional[list], default: None ) –

    Preferred batch sizes for dynamic batching.

  • preserve_ordering (bool, default: False ) –

    Whether the dynamic batcher should preserve the ordering of responses to match the order of requests received by the scheduler.

  • priority_levels (int, default: 0 ) –

    The number of priority levels to be enabled for the model.

  • default_priority_level (int, default: 0 ) –

    The priority level used for requests that don't specify their priority.

  • default_queue_policy (Optional[QueuePolicy], default: None ) –

    The default queue policy used for requests.

  • priority_queue_policy (Optional[Dict[int, QueuePolicy]], default: None ) –

    Queue policies for individual priority levels, mapping a priority level to its QueuePolicy. Levels without an entry use default_queue_policy.
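To make max_queue_delay_microseconds concrete, here is a deliberately simplified model of dynamic batching, not Triton's actual scheduler: every request has batch size 1, the model executes instantly, and preferred_batch_size, priorities, and preserve_ordering are ignored. A batch is dispatched as soon as it holds max_batch_size requests (the ModelConfig field); otherwise it closes once its first request has waited max_queue_delay_microseconds, and later arrivals start a new batch. form_batches is a hypothetical helper.

```python
from typing import List


def form_batches(
    arrivals_us: List[int], max_batch_size: int, max_queue_delay_us: int
) -> List[List[int]]:
    """Group request arrival times (in microseconds) into batches.

    Simplified model: each request counts as batch size 1 and execution is
    instantaneous, so a new batch can open immediately after one is sent.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for t in sorted(arrivals_us):
        if current and t - current[0] > max_queue_delay_us:
            batches.append(current)  # first request's delay expired: dispatch
            current = []
        current.append(t)
        if len(current) == max_batch_size:
            batches.append(current)  # batch is full: dispatch immediately
            current = []
    if current:
        batches.append(current)  # flush the last batch at its deadline
    return batches


print(form_batches([0, 50, 120, 130, 140, 400], max_batch_size=4, max_queue_delay_us=100))
# [[0, 50], [120, 130, 140], [400]]
```

With max_queue_delay_us=0 the same arrivals are sent one by one, which shows the trade-off: a longer delay allows fuller batches at the cost of added latency for the first request in each batch.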

pytriton.model_config.QueuePolicy dataclass

QueuePolicy(timeout_action: TimeoutAction = REJECT, default_timeout_microseconds: int = 0, allow_timeout_override: bool = False, max_queue_size: int = 0)

Model queue policy configuration.

More details are available in the Triton Inference Server documentation.


  • timeout_action (TimeoutAction, default: REJECT ) –

    The action applied to timed-out requests.

  • default_timeout_microseconds (int, default: 0 ) –

    The default timeout for every request, in microseconds. A value of 0 means no timeout is set.

  • allow_timeout_override (bool, default: False ) –

    Whether an individual request can override the default timeout value.

  • max_queue_size (int, default: 0 ) –

    The maximum queue size for holding requests. A value of 0 means the queue size is not limited.
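The sketch below illustrates how a request's queue policy could be chosen and how max_queue_size limits admission, assuming Triton's semantics: a request without a priority uses default_priority_level, a level with no entry in priority_queue_policy falls back to default_queue_policy, and max_queue_size = 0 means no limit. QueuePolicySketch, resolve_policy, and admits are hypothetical names, not PyTriton API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class QueuePolicySketch:
    """Hypothetical stand-in for QueuePolicy (timeout_action omitted here)."""

    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0  # 0 means the queue size is not limited


def resolve_policy(
    request_priority: Optional[int],
    default_priority_level: int,
    default_queue_policy: QueuePolicySketch,
    priority_queue_policy: Optional[Dict[int, QueuePolicySketch]],
) -> QueuePolicySketch:
    """Pick the policy for a request: its level's policy, else the default one."""
    level = default_priority_level if request_priority is None else request_priority
    return (priority_queue_policy or {}).get(level, default_queue_policy)


def admits(policy: QueuePolicySketch, queued_requests: int) -> bool:
    """Return True if a new request fits in a queue already holding queued_requests."""
    return policy.max_queue_size == 0 or queued_requests < policy.max_queue_size


default = QueuePolicySketch()
strict = QueuePolicySketch(max_queue_size=2)
per_level = {1: strict}

print(admits(resolve_policy(1, 2, default, per_level), queued_requests=2))     # False
print(admits(resolve_policy(None, 2, default, per_level), queued_requests=2))  # True
```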

pytriton.model_config.TimeoutAction

Bases: Enum

Timeout action used by the timeout_action field of QueuePolicy.

  • REJECT –

    Reject the request and return an error message.

  • DELAY –

    Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed.
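The last sketch combines a QueuePolicy's timeout fields with the timeout action. It assumes Triton's rules that a default_timeout_microseconds of 0 means no timeout and that, with allow_timeout_override enabled, a request may shorten the default timeout but not extend it. TimeoutActionSketch, QueuePolicySketch, effective_timeout_us, and on_expired are hypothetical names, not PyTriton API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TimeoutActionSketch(Enum):
    """Hypothetical stand-in for the timeout action enum."""

    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass(frozen=True)
class QueuePolicySketch:
    """Hypothetical stand-in for the timeout-related QueuePolicy fields."""

    timeout_action: TimeoutActionSketch = TimeoutActionSketch.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False


def effective_timeout_us(policy: QueuePolicySketch, requested_us: Optional[int]) -> int:
    """Timeout that applies to one request; 0 means no timeout.

    Assumed rule: an override may shorten, but not extend, a non-zero default.
    """
    if not policy.allow_timeout_override or not requested_us:
        return policy.default_timeout_microseconds
    if policy.default_timeout_microseconds == 0:
        return requested_us  # no default timeout, so any finite value shortens it
    return min(requested_us, policy.default_timeout_microseconds)


def on_expired(policy: QueuePolicySketch) -> str:
    """Describe what happens to a request whose timeout elapsed in the queue."""
    if policy.timeout_action is TimeoutActionSketch.REJECT:
        return "rejected with an error"
    return "deferred behind unexpired requests of the same or higher priority"


p = QueuePolicySketch(default_timeout_microseconds=1000, allow_timeout_override=True)
print(effective_timeout_us(p, 500), effective_timeout_us(p, 5000))  # 500 1000
print(on_expired(p))  # rejected with an error
```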