Dynamic Batcher

model_navigator.api.triton.DynamicBatcher dataclass

Dynamic batching configuration.

Read more in the Triton Inference Server model configuration documentation.

Parameters:

Name Type Description Default
max_queue_delay_microseconds int

The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching.

0
preferred_batch_size Optional[list]

Preferred batch sizes for dynamic batching.

None
preserve_ordering bool

Whether the dynamic batcher preserves the ordering of responses so that it matches the order in which the scheduler received the requests.

False
priority_levels int

The number of priority levels to be enabled for the model.

0
default_priority_level int

The priority level used for requests that don't specify their priority.

0
default_queue_policy Optional[QueuePolicy]

The default queue policy used for requests.

None
priority_queue_policy Optional[Dict[int, QueuePolicy]]

The queue policy to use for each priority level, keyed by priority level.

None
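As a minimal sketch of how these fields fit together, the snippet below describes a batcher that waits up to 100 microseconds to form batches of 4 or 8 requests. The values are arbitrary illustrations, `BASIC_BATCHING` and `build_basic_batcher` are hypothetical helper names, and the import path is assumed from the class name documented above. The import is deferred to call time, so the helper needs the model_navigator package only when it is actually invoked.

```python
# Illustrative values only; tune them for the model being served.
BASIC_BATCHING = {
    "max_queue_delay_microseconds": 100,  # wait up to 100 us for more requests
    "preferred_batch_size": [4, 8],       # batch sizes the scheduler should aim for
    "preserve_ordering": True,            # return responses in request order
}


def build_basic_batcher():
    """Build a DynamicBatcher from BASIC_BATCHING (hypothetical helper)."""
    # Deferred import: requires the model_navigator package at call time.
    from model_navigator.api.triton import DynamicBatcher

    return DynamicBatcher(**BASIC_BATCHING)
```

Fields left out of the dictionary keep the defaults listed in the table above.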

__post_init__()

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/common.py
def __post_init__(self):
    """Validate the configuration for early error handling."""
    if self.default_priority_level > self.priority_levels:
        raise ModelNavigatorWrongParameterError(
            "The `default_priority_level` must be between 1 and " f"{self.priority_levels}."
        )

    if self.priority_queue_policy:
        if not self.priority_levels:
            raise ModelNavigatorWrongParameterError(
                "Provide the `priority_levels` if you want to define `priority_queue_policy` "
                "for Dynamic Batching."
            )

        for priority in self.priority_queue_policy.keys():
            if priority < 0 or priority > self.priority_levels:
                raise ModelNavigatorWrongParameterError(
                    f"Invalid `priority`={priority} provided. The value must be between "
                    f"1 and {self.priority_levels}."
                )
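To see which configurations these checks accept, the conditions can be copied into a small stand-in and exercised without the library. This is only a sketch: `SketchBatcher` and `is_accepted` are hypothetical names, `ValueError` stands in for `ModelNavigatorWrongParameterError`, and the policy values are placeholder strings rather than `QueuePolicy` objects.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SketchBatcher:
    """Stand-in that copies only the validation conditions quoted above."""

    priority_levels: int = 0
    default_priority_level: int = 0
    priority_queue_policy: Optional[Dict[int, object]] = None

    def __post_init__(self):
        # ValueError stands in for ModelNavigatorWrongParameterError.
        if self.default_priority_level > self.priority_levels:
            raise ValueError("default_priority_level exceeds priority_levels")
        if self.priority_queue_policy:
            if not self.priority_levels:
                raise ValueError("priority_queue_policy needs priority_levels")
            for priority in self.priority_queue_policy:
                if priority < 0 or priority > self.priority_levels:
                    raise ValueError(f"invalid priority {priority}")


def is_accepted(**kwargs) -> bool:
    """Return True if the stand-in accepts the given configuration."""
    try:
        SketchBatcher(**kwargs)
    except ValueError:
        return False
    return True


# All defaults (0 levels, default level 0) pass the checks.
print(is_accepted())  # True
# A default level above the number of levels is rejected.
print(is_accepted(priority_levels=2, default_priority_level=3))  # False
# Per-priority policies without priority_levels are rejected.
print(is_accepted(priority_queue_policy={1: "policy"}))  # False
# Key 0 passes the range check as written, although the message says 1..N.
print(is_accepted(priority_levels=2, priority_queue_policy={0: "policy"}))  # True
```

The last case shows that the conditions as quoted are slightly more permissive than the error messages suggest: a priority of 0 is not rejected even though the messages describe the valid range as starting at 1.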

model_navigator.api.triton.QueuePolicy dataclass

Model queue policy configuration.

Used for the default_queue_policy and priority_queue_policy fields of the DynamicBatcher configuration.

Read more in the Triton Inference Server model configuration documentation.

Parameters:

Name Type Description Default
timeout_action TimeoutAction

The action applied to a timed-out request.

TimeoutAction.REJECT
default_timeout_microseconds int

The default timeout for every request, in microseconds.

0
allow_timeout_override bool

Whether an individual request can override the default timeout value.

False
max_queue_size int

The maximum queue size for holding requests.

0
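The following sketch shows one way the queue policy fields might be combined with priority levels on a DynamicBatcher. All numeric values are arbitrary, `build_prioritized_batcher` is a hypothetical helper name, and the import path is assumed from the class names documented on this page. The import is deferred, so the model_navigator package is needed only when the helper is called.

```python
def build_prioritized_batcher():
    """Build a DynamicBatcher with queue policies (hypothetical helper)."""
    # Deferred import: requires the model_navigator package at call time.
    from model_navigator.api.triton import DynamicBatcher, QueuePolicy, TimeoutAction

    # Policy passed as default_queue_policy.
    default_policy = QueuePolicy(
        timeout_action=TimeoutAction.REJECT,
        default_timeout_microseconds=5_000,
        allow_timeout_override=True,
        max_queue_size=64,
    )
    # Policy attached to priority level 2 only; other fields keep their defaults.
    level_two_policy = QueuePolicy(
        timeout_action=TimeoutAction.DELAY,
        default_timeout_microseconds=20_000,
    )
    return DynamicBatcher(
        priority_levels=2,
        default_priority_level=1,
        default_queue_policy=default_policy,
        priority_queue_policy={2: level_two_policy},
    )
```

The key used in `priority_queue_policy` stays within `priority_levels`, which is what the `__post_init__` validation of DynamicBatcher requires.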

model_navigator.api.triton.TimeoutAction

Bases: enum.Enum

Timeout action definition for the timeout_action field of QueuePolicy.

Read more in the Triton Inference Server model configuration documentation.

Members:

Name Value
REJECT "REJECT"
DELAY "DELAY"
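Because the class derives from enum.Enum, its members behave like any standard Python enumeration. The sketch below uses a stand-in, `TimeoutActionSketch` (a hypothetical name, not the library class), that carries the two documented members and their string values, to show lookup by value and iteration.

```python
import enum


class TimeoutActionSketch(enum.Enum):
    """Stand-in with the two documented members and their string values."""

    REJECT = "REJECT"
    DELAY = "DELAY"


# Look a member up by its string value, e.g. when reading it from a config file.
action = TimeoutActionSketch("DELAY")
print(action.name, action.value)  # DELAY DELAY

# Iteration follows definition order.
print([member.value for member in TimeoutActionSketch])  # ['REJECT', 'DELAY']
```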