Dynamic Batcher
model_navigator.api.triton.DynamicBatcher
dataclass
Dynamic batching configuration.
Read more in the Triton Inference Server model configuration documentation.
Parameters:

- `max_queue_delay_microseconds` (`int`) – The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching.
- `preferred_batch_size` (`Optional[list]`) – Preferred batch sizes for dynamic batching.
- `preserve_ordering` (`bool`) – Whether the dynamic batcher should preserve the ordering of responses to match the order in which requests were received by the scheduler.
- `priority_levels` (`int`) – The number of priority levels to be enabled for the model.
- `default_priority_level` (`int`) – The priority level used for requests that don't specify their priority.
- `default_queue_policy` (`Optional[QueuePolicy]`) – The default queue policy used for requests.
- `priority_queue_policy` (`Optional[Dict[int, QueuePolicy]]`) – The queue policy to use for each priority level.
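To make the fields above concrete, here is a minimal, self-contained sketch that mirrors the `DynamicBatcher` dataclass and constructs one. Field names come from the reference above; the default values are assumptions, and the two queue-policy fields are omitted for brevity. In real code you would instead import the class with `from model_navigator.api.triton import DynamicBatcher`.

```python
from dataclasses import dataclass
from typing import List, Optional


# Illustrative mirror of model_navigator.api.triton.DynamicBatcher
# (field names from the reference above; defaults are assumptions).
@dataclass
class DynamicBatcher:
    max_queue_delay_microseconds: int = 0
    preferred_batch_size: Optional[List[int]] = None
    preserve_ordering: bool = False
    priority_levels: int = 0
    default_priority_level: int = 0


# Allow up to 100 microseconds of queue delay while the scheduler
# waits to assemble a preferred batch of 4 or 8 requests.
batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,
    preferred_batch_size=[4, 8],
)
print(batcher.preferred_batch_size)  # [4, 8]
```

The trade-off captured by `max_queue_delay_microseconds` is latency versus batch efficiency: a larger delay gives the scheduler more time to reach a preferred batch size, at the cost of holding early requests longer.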
model_navigator.api.triton.QueuePolicy
dataclass
Model queue policy configuration.
Used for the `default_queue_policy` and `priority_queue_policy` fields in the `DynamicBatcher` configuration.
Read more in the Triton Inference Server model configuration documentation.
Parameters:

- `timeout_action` (`TimeoutAction`) – The action applied to timed-out requests.
- `default_timeout_microseconds` (`int`) – The default timeout for every request, in microseconds.
- `allow_timeout_override` (`bool`) – Whether an individual request can override the default timeout value.
- `max_queue_size` (`int`) – The maximum queue size for holding requests.
model_navigator.api.triton.TimeoutAction
Timeout action definition for the `timeout_action` field of `QueuePolicy`.
Read more in the Triton Inference Server model configuration documentation.
Members:

- `REJECT` – `"REJECT"`
- `DELAY` – `"DELAY"`
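Putting the two types together, the sketch below mirrors `QueuePolicy` and `TimeoutAction` as self-contained Python and builds a per-priority-level policy map of the shape `Dict[int, QueuePolicy]` expected by `priority_queue_policy`. Names follow the reference above; the defaults and the specific policy values are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


# Illustrative mirrors of the types documented above; the real classes
# live in model_navigator.api.triton (defaults are assumptions).
class TimeoutAction(Enum):
    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass
class QueuePolicy:
    timeout_action: TimeoutAction = TimeoutAction.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0


# A policy per priority level: level 1 delays timed-out requests,
# level 2 rejects them after 1 ms and caps its queue at 32 requests.
policies: Dict[int, QueuePolicy] = {
    1: QueuePolicy(timeout_action=TimeoutAction.DELAY),
    2: QueuePolicy(
        timeout_action=TimeoutAction.REJECT,
        default_timeout_microseconds=1_000,
        max_queue_size=32,
    ),
}
print(policies[2].timeout_action.value)  # REJECT
```

A map like `policies` would be passed as the `priority_queue_policy` of a `DynamicBatcher`, while a single `QueuePolicy` instance would serve as its `default_queue_policy` for requests whose priority level has no dedicated entry.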