Model Config

pytriton.model_config.ModelConfig dataclass

ModelConfig(batching: bool = True, max_batch_size: int = 4, batcher: DynamicBatcher = DynamicBatcher(), response_cache: bool = False, decoupled: bool = False)

Additional configuration for running a model through Triton Inference Server.

Parameters:

  • batching (bool, default: True ) –

    Flag to enable/disable batching for the model.

  • max_batch_size (int, default: 4 ) –

    The maximum batch size that the model can handle.

  • batcher (DynamicBatcher, default: DynamicBatcher() ) –

    Configuration of Dynamic Batching for the model.

  • response_cache (bool, default: False ) –

    Flag to enable/disable the response cache for the model.

  • decoupled (bool, default: False ) –

    Flag to enable/disable execution that is decoupled from requests.

pytriton.model_config.Tensor dataclass

Tensor(shape: tuple, dtype: Union[dtype, Type[dtype], Type[object]], name: Optional[str] = None, optional: Optional[bool] = False)

Model input and output definition for Triton deployment.

Parameters:

  • shape (tuple) –

    Shape of the input/output tensor.

  • dtype (Union[dtype, Type[dtype], Type[object]]) –

    Data type of the input/output tensor.

  • name (Optional[str], default: None ) –

    Name of the input/output of the model.

  • optional (Optional[bool], default: False ) –

    Flag to mark whether the input is optional.

__post_init__

__post_init__()

Override object values on post init or field override.

Source code in pytriton/model_config/tensor.py
def __post_init__(self):
    """Override object values on post init or field override."""
    if isinstance(self.dtype, np.dtype):
        object.__setattr__(self, "dtype", self.dtype.type)  # pytype: disable=attribute-error
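
The `object.__setattr__` call in the source above is the usual way to write to a field of a frozen dataclass from inside `__post_init__`. The sketch below reproduces that pattern with the standard library only. It assumes that `Tensor` is declared with `frozen=True` (the decorator is not part of the source shown here). `DTypeLike` and `TensorSketch` are hypothetical stand-ins, not PyTriton or NumPy classes.

```python
import dataclasses


class DTypeLike:
    """Hypothetical stand-in for np.dtype: an instance wrapping a scalar type."""

    def __init__(self, scalar_type: type):
        self.type = scalar_type


@dataclasses.dataclass(frozen=True)
class TensorSketch:
    """Hypothetical stand-in for Tensor, reduced to two fields."""

    shape: tuple
    dtype: object

    def __post_init__(self):
        # A plain `self.dtype = ...` would raise FrozenInstanceError here,
        # so the write goes through object.__setattr__ instead.
        if isinstance(self.dtype, DTypeLike):
            object.__setattr__(self, "dtype", self.dtype.type)


from_instance = TensorSketch(shape=(-1,), dtype=DTypeLike(float))
from_type = TensorSketch(shape=(-1,), dtype=float)

# Both spellings end up storing the bare type, so the objects compare equal.
same_after_init = from_instance == from_type

# After construction the instance stays read-only.
try:
    from_type.dtype = int
    is_frozen = False
except dataclasses.FrozenInstanceError:
    is_frozen = True
```

With NumPy, the same normalization means that passing `np.dtype("float32")` or `np.float32` as `dtype` leaves the same value, `np.float32`, stored on the object.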

pytriton.model_config.DeviceKind

Bases: Enum

Device kind for model deployment.

Parameters:

  • KIND_AUTO

    Automatically select the device for model deployment.

  • KIND_CPU

    Model is deployed on CPU.

  • KIND_GPU

    Model is deployed on GPU.

pytriton.model_config.DynamicBatcher dataclass

DynamicBatcher(max_queue_delay_microseconds: int = 0, preferred_batch_size: Optional[list] = None, preserve_ordering: bool = False, priority_levels: int = 0, default_priority_level: int = 0, default_queue_policy: Optional[QueuePolicy] = None, priority_queue_policy: Optional[Dict[int, QueuePolicy]] = None)

Dynamic batcher configuration.

More in Triton Inference Server documentation

Parameters:

  • max_queue_delay_microseconds (int, default: 0 ) –

    The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching.

  • preferred_batch_size (Optional[list], default: None ) –

    Preferred batch sizes for dynamic batching.

  • preserve_ordering (bool, default: False ) –

    Whether the dynamic batcher should preserve the ordering of responses so that it matches the order of requests received by the scheduler.

  • priority_levels (int, default: 0 ) –

    The number of priority levels to be enabled for the model.

  • default_priority_level (int, default: 0 ) –

    The priority level used for requests that don't specify their priority.

  • default_queue_policy (Optional[QueuePolicy], default: None ) –

    The default queue policy used for requests.

  • priority_queue_policy (Optional[Dict[int, QueuePolicy]], default: None ) –

    The queue policy for each priority level, keyed by priority level.
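
The `preferred_batch_size` values interact with `max_batch_size` from `ModelConfig`. The following is a minimal sketch of a pre-flight check, assuming (as an assumption about Triton's validation, not something stated on this page) that every preferred size must be positive and must not exceed `max_batch_size`. The helper `check_preferred_batch_sizes` is hypothetical and is not part of PyTriton.

```python
from typing import List, Optional


def check_preferred_batch_sizes(
    max_batch_size: int, preferred: Optional[List[int]]
) -> List[str]:
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    # None (the documented default) means no preferred sizes were given.
    for size in preferred or []:
        if size <= 0:
            problems.append(f"preferred batch size {size} must be positive")
        elif size > max_batch_size:
            problems.append(
                f"preferred batch size {size} exceeds max_batch_size {max_batch_size}"
            )
    return problems


ok = check_preferred_batch_sizes(max_batch_size=4, preferred=[2, 4])
bad = check_preferred_batch_sizes(max_batch_size=4, preferred=[8, 0])
```

Running a check like this before building the configuration surfaces a mismatch in Python rather than at model load time.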

pytriton.model_config.QueuePolicy dataclass

QueuePolicy(timeout_action: TimeoutAction = REJECT, default_timeout_microseconds: int = 0, allow_timeout_override: bool = False, max_queue_size: int = 0)

Model queue policy configuration.

More in Triton Inference Server documentation

Parameters:

  • timeout_action (TimeoutAction, default: REJECT ) –

    The action applied to a timed-out request.

  • default_timeout_microseconds (int, default: 0 ) –

    The default timeout for every request, in microseconds.

  • allow_timeout_override (bool, default: False ) –

    Whether an individual request can override the default timeout value.

  • max_queue_size (int, default: 0 ) –

    The maximum queue size for holding requests.

pytriton.model_config.TimeoutAction

Bases: Enum

Timeout action definition for the timeout_action field of QueuePolicy.

Parameters:

  • REJECT

    Reject the request and return an error message accordingly.

  • DELAY

    Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed.
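
The difference between the two actions can be illustrated with a toy model of a single priority level. This is only a sketch of the semantics described above, not Triton's scheduler. The `TimeoutAction` enum below is a local stand-in with illustrative values, `QueuedRequest` and `schedule` are hypothetical names, and treating a timeout of 0 as "no timeout" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class TimeoutAction(Enum):
    # Local stand-in for pytriton.model_config.TimeoutAction (values illustrative).
    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass
class QueuedRequest:
    # Hypothetical record used only by this sketch.
    name: str
    waited_us: int   # time already spent in the queue, in microseconds
    timeout_us: int  # 0 is treated as "no timeout" (assumption)


def schedule(
    queue: List[QueuedRequest], action: TimeoutAction
) -> Tuple[List[str], List[str]]:
    """Return (execution_order, rejected) for one priority level."""
    on_time, timed_out = [], []
    for request in queue:
        expired = request.timeout_us > 0 and request.waited_us > request.timeout_us
        (timed_out if expired else on_time).append(request.name)
    if action is TimeoutAction.REJECT:
        # Timed-out requests are dropped and would receive an error.
        return on_time, timed_out
    # DELAY: timed-out requests run after the requests that are still on time.
    return on_time + timed_out, []


queue = [
    QueuedRequest("a", waited_us=500, timeout_us=100),  # timed out
    QueuedRequest("b", waited_us=50, timeout_us=100),   # still on time
    QueuedRequest("c", waited_us=900, timeout_us=0),    # no timeout set
]
rejected_run = schedule(queue, TimeoutAction.REJECT)
delayed_run = schedule(queue, TimeoutAction.DELAY)
```

In this toy run, `REJECT` executes `b` and `c` and rejects `a`, while `DELAY` executes all three with `a` moved to the end.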