Dynamic Batcher
model_navigator.api.triton.DynamicBatcher
dataclass
Dynamic batching configuration.
Read more in the Triton Inference Server model configuration documentation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
max_queue_delay_microseconds | int | The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching. | 0 |
preferred_batch_size | Optional[list] | Preferred batch sizes for dynamic batching. | None |
preserve_ordering | bool | Whether the dynamic batcher preserves the ordering of responses to match the order of requests received by the scheduler. | False |
priority_levels | int | The number of priority levels to be enabled for the model. | 0 |
default_priority_level | int | The priority level used for requests that don't specify their priority. | 0 |
default_queue_policy | Optional[QueuePolicy] | The default queue policy used for requests. | None |
priority_queue_policy | Optional[Dict[int, QueuePolicy]] | The queue policy for each priority level, keyed by level. | None |
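The parameters above can be combined as in the following minimal sketch. To keep it runnable without the library installed, it defines a local stand-in dataclass that mirrors only a subset of the documented fields and defaults. That stand-in is an assumption for illustration; with model_navigator installed, the class would be imported from model_navigator.api.triton instead. The chosen values (100 microseconds, batch sizes 4 and 8) are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicBatcher:
    """Local stand-in mirroring documented fields (assumption, not the library class)."""

    max_queue_delay_microseconds: int = 0
    preferred_batch_size: Optional[list] = None
    preserve_ordering: bool = False
    priority_levels: int = 0
    default_priority_level: int = 0


# Wait up to 100 microseconds for more requests and prefer batches of 4 or 8.
batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,
    preferred_batch_size=[4, 8],
    preserve_ordering=True,
)

print(batcher)
```

Fields that are not passed keep their documented defaults, so priority handling stays disabled here (priority_levels remains 0).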
__post_init__()
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
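As an illustration of early validation in a dataclass, the sketch below runs its checks in __post_init__ so that an inconsistent configuration fails at construction time rather than when the model is deployed. The class name, the specific rules, and the error type are hypothetical and are not taken from the library's source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BatcherSketch:
    """Hypothetical stand-in; the real checks in DynamicBatcher may differ."""

    priority_levels: int = 0
    default_priority_level: int = 0
    priority_queue_policy: Optional[Dict[int, object]] = None

    def __post_init__(self):
        # Assumed rule: the default level cannot exceed the number of levels.
        if self.default_priority_level > self.priority_levels:
            raise ValueError("default_priority_level must not exceed priority_levels")
        # Assumed rule: per-level policies need levels in 1..priority_levels.
        for level in self.priority_queue_policy or {}:
            if not 1 <= level <= self.priority_levels:
                raise ValueError(f"priority level {level} is out of range")


# A consistent configuration constructs without error.
ok = BatcherSketch(priority_levels=2, default_priority_level=1)

# An inconsistent one is rejected as soon as the object is created.
try:
    BatcherSketch(priority_levels=2, default_priority_level=3)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```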
model_navigator.api.triton.QueuePolicy
dataclass
Model queue policy configuration.
Used for the default_queue_policy and priority_queue_policy fields in the DynamicBatcher configuration.
Read more in the Triton Inference Server model configuration documentation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout_action | TimeoutAction | The action applied to a timed-out request. | TimeoutAction.REJECT |
default_timeout_microseconds | int | The default timeout for every request, in microseconds. | 0 |
allow_timeout_override | bool | Whether an individual request can override the default timeout value. | False |
max_queue_size | int | The maximum queue size for holding requests. | 0 |
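A queue policy might be configured as in the sketch below. The classes are local stand-ins that mirror the fields and defaults documented here, an assumption made so the example runs without the library; with model_navigator installed they would come from model_navigator.api.triton. The timeout and queue size values are arbitrary.

```python
import enum
from dataclasses import dataclass


class TimeoutAction(enum.Enum):
    """Stand-in for the documented enum (assumption, not the library class)."""

    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass
class QueuePolicy:
    """Stand-in mirroring the documented fields and defaults."""

    timeout_action: TimeoutAction = TimeoutAction.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0


# Default timeout of 5000 microseconds (5 ms), overridable per request,
# with the queue size set to 16.
policy = QueuePolicy(
    timeout_action=TimeoutAction.DELAY,
    default_timeout_microseconds=5000,
    allow_timeout_override=True,
    max_queue_size=16,
)

# One policy per priority level, in the shape expected by priority_queue_policy.
per_level = {1: policy, 2: QueuePolicy()}

print(per_level[2].timeout_action.value)
```

The dictionary keyed by integer level matches the documented Optional[Dict[int, QueuePolicy]] type of priority_queue_policy, while a single QueuePolicy instance would be passed as default_queue_policy.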
model_navigator.api.triton.TimeoutAction
Timeout action definition for the timeout_action field of QueuePolicy.
Read more in the Triton Inference Server model configuration documentation.
Members:
Name | Value |
---|---|
REJECT | "REJECT" |
DELAY | "DELAY" |