Model Config

pytriton.model_config.ModelConfig `dataclass`

ModelConfig(batching: bool = True, max_batch_size: int = 4, batcher: DynamicBatcher = DynamicBatcher(), response_cache: bool = False, decoupled: bool = False)

Additional model configuration for running a model through Triton Inference Server.

Parameters:

batching (bool, default: True ) –

Flag to enable/disable batching for the model.
max_batch_size (int, default: 4 ) –

The maximum batch size handled by the model.
batcher (DynamicBatcher, default: DynamicBatcher() ) –

Configuration of dynamic batching for the model.
response_cache (bool, default: False ) –

Flag to enable/disable the response cache for the model.
decoupled (bool, default: False ) –

Flag to enable/disable decoupled mode, in which responses are decoupled from request execution (the model may return zero, one, or multiple responses for a single request).
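The fields above correspond to settings in a Triton model configuration. The sketch below is a hypothetical illustration of that correspondence, not PyTriton's own serializer: `ModelConfig` and `DynamicBatcher` are trimmed local stand-ins so the code runs without PyTriton installed, `to_triton_fields` is an invented helper, and it assumes (following Triton's convention) that disabled batching is expressed as `max_batch_size` 0. The Triton field names used (`dynamic_batching`, `response_cache.enable`, `model_transaction_policy.decoupled`) come from Triton's model configuration schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DynamicBatcher:
    # Trimmed local stand-in: only two of the documented DynamicBatcher fields.
    max_queue_delay_microseconds: int = 0
    preferred_batch_size: Optional[list] = None


@dataclass
class ModelConfig:
    # Local stand-in with the documented fields and defaults. default_factory is used
    # because recent Python versions reject unhashable dataclass instances as defaults.
    batching: bool = True
    max_batch_size: int = 4
    batcher: DynamicBatcher = field(default_factory=DynamicBatcher)
    response_cache: bool = False
    decoupled: bool = False


def to_triton_fields(cfg: ModelConfig) -> dict:
    """Hypothetical helper: express a ModelConfig with Triton model-config field names."""
    # Assumption: Triton treats max_batch_size == 0 as "model does not batch".
    out = {"max_batch_size": cfg.max_batch_size if cfg.batching else 0}
    if cfg.batching:
        batcher = {"max_queue_delay_microseconds": cfg.batcher.max_queue_delay_microseconds}
        if cfg.batcher.preferred_batch_size:
            batcher["preferred_batch_size"] = list(cfg.batcher.preferred_batch_size)
        out["dynamic_batching"] = batcher
    if cfg.response_cache:
        out["response_cache"] = {"enable": True}
    if cfg.decoupled:
        out["model_transaction_policy"] = {"decoupled": True}
    return out


print(to_triton_fields(ModelConfig(max_batch_size=16, batcher=DynamicBatcher(max_queue_delay_microseconds=100))))
# {'max_batch_size': 16, 'dynamic_batching': {'max_queue_delay_microseconds': 100}}
```

With PyTriton installed, the same values would be passed directly as `ModelConfig(max_batch_size=16, batcher=DynamicBatcher(max_queue_delay_microseconds=100))`, using the classes from `pytriton.model_config`.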

pytriton.model_config.Tensor `dataclass`

Tensor(shape: tuple, dtype: Union[dtype, Type[dtype], Type[object]], name: Optional[str] = None, optional: Optional[bool] = False)

Model input and output definition for Triton deployment.

Parameters:

shape (tuple) –

Shape of the input/output tensor.
dtype (Union[dtype, Type[dtype], Type[object]]) –

Data type of the input/output tensor.
name (Optional[str], default: None ) –

Name of the model input/output.
optional (Optional[bool], default: False ) –

Flag to mark whether the input is optional.

__post_init__

__post_init__()

Override object values on post init or field override.

Source code in pytriton/model_config/tensor.py

def __post_init__(self):
    """Override object values on post init or field override."""
    if isinstance(self.dtype, np.dtype):
        object.__setattr__(self, "dtype", self.dtype.type)  # pytype: disable=attribute-error
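The `object.__setattr__` call suggests that `Tensor` is a frozen dataclass, where ordinary attribute assignment raises an error. The sketch below assumes that and reproduces the normalization on a local stand-in class (not the library's own): a `np.dtype` instance is replaced by its scalar type, so `np.dtype("float32")` and `np.float32` produce equal `Tensor` objects.

```python
from dataclasses import dataclass
from typing import Optional, Type, Union

import numpy as np


@dataclass(frozen=True)  # assumption: the real Tensor is frozen, hence object.__setattr__
class Tensor:
    # Local stand-in for pytriton.model_config.Tensor with the documented fields.
    shape: tuple
    dtype: Union[np.dtype, Type[np.dtype], Type[object]]
    name: Optional[str] = None
    optional: Optional[bool] = False

    def __post_init__(self):
        # Frozen dataclasses forbid self.dtype = ..., so bypass the guard explicitly.
        if isinstance(self.dtype, np.dtype):
            object.__setattr__(self, "dtype", self.dtype.type)


t1 = Tensor(shape=(-1,), dtype=np.dtype("float32"), name="x")
t2 = Tensor(shape=(-1,), dtype=np.float32, name="x")
print(t1.dtype, t1 == t2)  # <class 'numpy.float32'> True
```

Normalizing at construction time means later code can compare or dispatch on `tensor.dtype` without handling both spellings.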

pytriton.model_config.DeviceKind

Bases: Enum

Device kind for model deployment.

Parameters:

KIND_AUTO –

Automatically select the device for model deployment.
KIND_CPU –

Model is deployed on CPU.
KIND_GPU –

Model is deployed on GPU.
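The member names match the `kind` values of Triton's `instance_group` model-configuration setting. The sketch below is a hypothetical illustration of that link: `DeviceKind` is a local stand-in whose string values are assumed, and `instance_group_entry` is an invented helper; this page does not document where PyTriton consumes `DeviceKind`.

```python
from enum import Enum


class DeviceKind(Enum):
    # Local stand-in for pytriton.model_config.DeviceKind; the string values are assumed.
    KIND_AUTO = "KIND_AUTO"
    KIND_CPU = "KIND_CPU"
    KIND_GPU = "KIND_GPU"


def instance_group_entry(kind: DeviceKind, count: int = 1) -> dict:
    """Hypothetical helper: one Triton instance_group entry for the given device kind."""
    if count < 1:
        raise ValueError("count must be >= 1")
    return {"kind": kind.name, "count": count}


# Enum members can be looked up by name, e.g. from a command-line flag.
print(instance_group_entry(DeviceKind["KIND_GPU"], count=2))  # {'kind': 'KIND_GPU', 'count': 2}
```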

pytriton.model_config.DynamicBatcher `dataclass`

DynamicBatcher(max_queue_delay_microseconds: int = 0, preferred_batch_size: Optional[list] = None, preserve_ordering: bool = False, priority_levels: int = 0, default_priority_level: int = 0, default_queue_policy: Optional[QueuePolicy] = None, priority_queue_policy: Optional[Dict[int, QueuePolicy]] = None)

Dynamic batcher configuration.

More details are available in the Triton Inference Server documentation.

Parameters:

max_queue_delay_microseconds (int, default: 0 ) –

The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching.
preferred_batch_size (Optional[list], default: None ) –

Preferred batch sizes for dynamic batching.
preserve_ordering (bool, default: False ) –

Whether the dynamic batcher should preserve the ordering of responses to match the order in which the scheduler received the requests.
priority_levels (int, default: 0 ) –

The number of priority levels to be enabled for the model.
default_priority_level (int, default: 0 ) –

The priority level used for requests that don't specify their priority.
default_queue_policy (Optional[QueuePolicy], default: None ) –

The default queue policy used for requests.
priority_queue_policy (Optional[Dict[int, QueuePolicy]], default: None ) –

The queue policy for each priority level, keyed by priority level.
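The trade-off controlled by `max_queue_delay_microseconds` can be illustrated with a deliberately simplified model. This is not Triton's scheduler: the function name is hypothetical, and the model ignores `preferred_batch_size`, priorities, ordering, and the time the model spends executing (in Triton, requests that queue up while the model is busy can still be batched with a zero delay). A batch leaves as soon as it holds `max_batch_size` requests, or when its oldest request has waited the full delay.

```python
from typing import List, Tuple


def simulate_dynamic_batching(
    arrivals_us: List[int], max_batch_size: int, max_queue_delay_microseconds: int
) -> List[Tuple[int, List[int]]]:
    """Group sorted request arrival times into (dispatch_time_us, batch) pairs."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    if max_queue_delay_microseconds < 0:
        raise ValueError("max_queue_delay_microseconds must be >= 0")
    batches = []
    i = 0
    while i < len(arrivals_us):
        deadline = arrivals_us[i] + max_queue_delay_microseconds
        j = i
        # Pull in requests arriving before the oldest one's deadline, up to the cap.
        while j < len(arrivals_us) and arrivals_us[j] <= deadline and j - i < max_batch_size:
            j += 1
        batch = arrivals_us[i:j]
        # A full batch is dispatched immediately; otherwise it waits for the deadline.
        dispatch = batch[-1] if len(batch) == max_batch_size else deadline
        batches.append((dispatch, batch))
        i = j
    return batches


print(simulate_dynamic_batching([0, 50, 120, 130, 400], max_batch_size=4, max_queue_delay_microseconds=100))
# [(100, [0, 50]), (220, [120, 130]), (500, [400])]
```

Raising the delay lets more requests share a batch at the cost of extra latency for the earliest request in each batch.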

pytriton.model_config.QueuePolicy `dataclass`

QueuePolicy(timeout_action: TimeoutAction = TimeoutAction.REJECT, default_timeout_microseconds: int = 0, allow_timeout_override: bool = False, max_queue_size: int = 0)

Model queue policy configuration.

More details are available in the Triton Inference Server documentation.

Parameters:

timeout_action (TimeoutAction, default: REJECT ) –

The action applied to timed-out requests.
default_timeout_microseconds (int, default: 0 ) –

The default timeout for every request, in microseconds.
allow_timeout_override (bool, default: False ) –

Whether an individual request can override the default timeout value.
max_queue_size (int, default: 0 ) –

The maximum queue size for holding requests.
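How these four fields interact can be sketched as follows. Everything here is a local, hypothetical stand-in rather than PyTriton or Triton code, and it rests on stated assumptions: as in Triton's model configuration, a value of 0 for `default_timeout_microseconds` or `max_queue_size` means "no limit", and a per-request override may shorten the default timeout but never extend it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TimeoutAction(Enum):  # local stand-in for pytriton.model_config.TimeoutAction
    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass
class QueuePolicy:  # local stand-in with the documented fields and defaults
    timeout_action: TimeoutAction = TimeoutAction.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0


def can_enqueue(policy: QueuePolicy, queue_len: int) -> bool:
    """Accept a new request unless a bounded queue is already full (0 = unbounded)."""
    return policy.max_queue_size == 0 or queue_len < policy.max_queue_size


def effective_timeout_us(policy: QueuePolicy, requested_us: Optional[int] = None) -> int:
    """Timeout applied to one request; 0 means no timeout."""
    default = policy.default_timeout_microseconds
    if not policy.allow_timeout_override or requested_us is None or requested_us <= 0:
        return default
    # Assumed rule: an override may shorten the default timeout but never extend it.
    return requested_us if default == 0 else min(requested_us, default)


def status_after_wait(policy: QueuePolicy, waited_us: int, requested_us: Optional[int] = None) -> str:
    """Classify a queued request after it has waited waited_us microseconds."""
    timeout = effective_timeout_us(policy, requested_us)
    if timeout == 0 or waited_us <= timeout:
        return "queued"
    return "rejected" if policy.timeout_action is TimeoutAction.REJECT else "delayed"


print(status_after_wait(QueuePolicy(default_timeout_microseconds=100), waited_us=150))  # rejected
```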

pytriton.model_config.TimeoutAction

Bases: Enum

Timeout action definition for the timeout_action field of QueuePolicy.

Parameters:

REJECT –

Reject the request and return an error message accordingly.
DELAY –

Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed.
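The `DELAY` rule can be pictured as a sort order. The sketch below shows one ordering consistent with the description above, not Triton's actual queue implementation: `PendingRequest` and `dispatch_order` are hypothetical names, requests without a priority fall back to `default_priority_level`, and it assumes (as in Triton) that a lower level number means a higher priority. Within each level, requests that have not timed out go first, so a delayed request waits behind every non-timed-out request at its own or a higher priority level.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingRequest:  # hypothetical record for one queued request
    name: str
    arrival_us: int
    priority: Optional[int] = None  # None: the request did not specify a priority
    timed_out: bool = False  # True: its timeout expired under TimeoutAction.DELAY


def dispatch_order(requests: List[PendingRequest], default_priority_level: int) -> List[str]:
    def key(r: PendingRequest):
        level = r.priority if r.priority is not None else default_priority_level
        # Sort by level (lower = higher priority), then non-timed-out before delayed
        # (False < True), then by arrival time.
        return (level, r.timed_out, r.arrival_us)

    return [r.name for r in sorted(requests, key=key)]


reqs = [
    PendingRequest("a", 0, 1),
    PendingRequest("b", 1, 2),
    PendingRequest("c", 2, 1, timed_out=True),
    PendingRequest("d", 3),
    PendingRequest("e", 4, 2, timed_out=True),
]
print(dispatch_order(reqs, default_priority_level=1))  # ['a', 'd', 'c', 'b', 'e']
```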
