Sequence Batcher
model_navigator.api.triton.SequenceBatcher
dataclass
Sequence batching configuration.
Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `strategy` | `Optional[Union[SequenceBatcherStrategyDirect, SequenceBatcherStrategyOldest]]` | The strategy used by the sequence batcher. | `None` |
| `max_sequence_idle_microseconds` | `Optional[int]` | The maximum time, in microseconds, that a sequence may be idle before it is aborted. | `None` |
| `control_inputs` | `List[SequenceBatcherControlInput]` | The model input(s) that the server uses to communicate sequence start, stop, ready, and similar control values to the model. | `dataclasses.field(default_factory=lambda: [])` |
| `states` | `List[SequenceBatcherState]` | Optional state that can be stored in Triton for performing inference requests on a sequence. | `dataclasses.field(default_factory=lambda: [])` |
__post_init__()
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
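The early-validation pattern behind `__post_init__()` can be sketched as follows. This is a minimal, self-contained stand-in that mirrors the documented fields, not the actual `model_navigator` implementation; the specific check on `max_sequence_idle_microseconds` is an illustrative assumption:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SequenceBatcherSketch:
    """Stand-in mirroring the documented SequenceBatcher fields."""

    strategy: Optional[Any] = None
    max_sequence_idle_microseconds: Optional[int] = None
    control_inputs: List[Any] = field(default_factory=list)
    states: List[Any] = field(default_factory=list)

    def __post_init__(self):
        # Fail at construction time rather than at model-load time.
        if (self.max_sequence_idle_microseconds is not None
                and self.max_sequence_idle_microseconds <= 0):
            raise ValueError("max_sequence_idle_microseconds must be positive.")


# Abort sequences that stay idle for more than 5 seconds.
batcher = SequenceBatcherSketch(max_sequence_idle_microseconds=5_000_000)
```

Validating in `__post_init__` means a misconfigured batcher raises immediately at construction, long before the configuration is serialized for Triton.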
model_navigator.api.triton.SequenceBatcherControl
dataclass
Sequence Batching control configuration.
Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `kind` | `SequenceBatcherControlKind` | The kind of this control. | required |
| `dtype` | `Optional[Union[np.dtype, Type[np.dtype]]]` | The control's datatype. | `None` |
| `int32_false_true` | `List[int]` | The control's true and false setting is indicated by a value in an `int32` tensor. | `dataclasses.field(default_factory=lambda: [])` |
| `fp32_false_true` | `List[float]` | The control's true and false setting is indicated by a value in an `fp32` tensor. | `dataclasses.field(default_factory=lambda: [])` |
| `bool_false_true` | `List[bool]` | The control's true and false setting is indicated by a value in a `bool` tensor. | `dataclasses.field(default_factory=lambda: [])` |
__post_init__()
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
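The relationship between the three `*_false_true` lists can be sketched with a self-contained stand-in (the `dtype` field is omitted for brevity). The rule enforced here, exactly one `[false_value, true_value]` pair across the three lists, is an illustrative assumption, not a statement of the library's actual validation:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceBatcherControlSketch:
    """Stand-in mirroring the documented SequenceBatcherControl fields."""

    kind: str
    int32_false_true: List[int] = field(default_factory=list)
    fp32_false_true: List[float] = field(default_factory=list)
    bool_false_true: List[bool] = field(default_factory=list)

    def __post_init__(self):
        # Assumed rule: exactly one [false_value, true_value] pair is given.
        populated = [v for v in (self.int32_false_true,
                                 self.fp32_false_true,
                                 self.bool_false_true) if v]
        if len(populated) != 1 or len(populated[0]) != 2:
            raise ValueError("Provide exactly one [false, true] value pair.")


# A start control signalled through an int32 tensor: 0 = false, 1 = true.
start = SequenceBatcherControlSketch(kind="CONTROL_SEQUENCE_START",
                                     int32_false_true=[0, 1])
```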
model_navigator.api.triton.SequenceBatcherControlInput
dataclass
Sequence Batching control input configuration.
Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_name` | `str` | The name of the model input. | required |
| `controls` | `List[SequenceBatcherControl]` | The control value(s) that should be communicated to the model using this model input. | required |
model_navigator.api.triton.SequenceBatcherControlKind
Sequence Batching control options.
Read more in the Triton Inference Server model configuration.

Members:

| Name | Value |
|---|---|
| `CONTROL_SEQUENCE_START` | `"CONTROL_SEQUENCE_START"` |
| `CONTROL_SEQUENCE_READY` | `"CONTROL_SEQUENCE_READY"` |
| `CONTROL_SEQUENCE_END` | `"CONTROL_SEQUENCE_END"` |
| `CONTROL_SEQUENCE_CORRID` | `"CONTROL_SEQUENCE_CORRID"` |
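The member set above can be sketched as a plain Python `Enum`. This stand-in mirrors the documented names and string values and is not the actual `model_navigator` class:

```python
from enum import Enum


class SequenceBatcherControlKindSketch(Enum):
    """Stand-in mirroring the documented control kinds."""

    CONTROL_SEQUENCE_START = "CONTROL_SEQUENCE_START"
    CONTROL_SEQUENCE_READY = "CONTROL_SEQUENCE_READY"
    CONTROL_SEQUENCE_END = "CONTROL_SEQUENCE_END"
    CONTROL_SEQUENCE_CORRID = "CONTROL_SEQUENCE_CORRID"


# Each member's value is the literal string Triton expects in the config.
kind = SequenceBatcherControlKindSketch.CONTROL_SEQUENCE_START
```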
model_navigator.api.triton.SequenceBatcherInitialState
dataclass
Sequence Batching initial state configuration.
Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the initial state. | required |
| `shape` | `Tuple[int, ...]` | The shape of the state tensor, not including the batch dimension. | required |
| `dtype` | `Optional[Union[np.dtype, Type[np.dtype]]]` | The data type of the state. | `None` |
| `zero_data` | `Optional[bool]` | Whether to use zeros as the initial state data. | `None` |
| `data_file` | `Optional[str]` | The file whose content will be used as the initial state data, in row-major order. | `None` |
__post_init__()
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
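Since `zero_data` and `data_file` describe two alternative sources of the same initial data, a natural validation is that exactly one of them is set. The sketch below is a self-contained stand-in mirroring the documented fields; treating the two sources as mutually exclusive is an assumption about the validation, not a quote of the library's code:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InitialStateSketch:
    """Stand-in mirroring the documented SequenceBatcherInitialState fields."""

    name: str
    shape: Tuple[int, ...]
    dtype: Optional[object] = None
    zero_data: Optional[bool] = None
    data_file: Optional[str] = None

    def __post_init__(self):
        # Assumed rule: the initial data comes from exactly one source.
        if (self.zero_data is None) == (self.data_file is None):
            raise ValueError("Set exactly one of zero_data or data_file.")


# A hidden state of shape (1, 128), initialized with zeros.
state = InitialStateSketch(name="hidden", shape=(1, 128), zero_data=True)
```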
model_navigator.api.triton.SequenceBatcherState
dataclass
Sequence Batching state configuration.
Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_name` | `str` | The name of the model state input. | required |
| `output_name` | `str` | The name of the model state output. | required |
| `dtype` | `Union[np.dtype, Type[np.dtype]]` | The data type of the state. | required |
| `shape` | `Tuple[int, ...]` | The shape of the state tensor. | required |
| `initial_states` | `List[SequenceBatcherInitialState]` | Optional list of initial states for the model. | `dataclasses.field(default_factory=lambda: [])` |
__post_init__()
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
model_navigator.api.triton.SequenceBatcherStrategyDirect
dataclass
Sequence Batching strategy direct configuration.
Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `max_queue_delay_microseconds` | `int` | The maximum time, in microseconds, that a candidate request is delayed in the sequence batch scheduling queue while waiting for additional requests to batch. | `0` |
| `minimum_slot_utilization` | `float` | The minimum slot utilization that must be satisfied to execute the batch before `max_queue_delay_microseconds` expires. | `0.0` |
model_navigator.api.triton.SequenceBatcherStrategyOldest
dataclass
Sequence Batching strategy oldest configuration.
Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `max_candidate_sequences` | `int` | The maximum number of candidate sequences that the batcher maintains. | required |
| `preferred_batch_size` | `List[int]` | Preferred batch sizes for dynamic batching of candidate sequences. | `dataclasses.field(default_factory=lambda: [])` |
| `max_queue_delay_microseconds` | `int` | The maximum time, in microseconds, that a candidate request is delayed in the dynamic batch scheduling queue while waiting for additional requests to batch. | `0` |
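For orientation, these dataclasses correspond to the `sequence_batching` section of a Triton `config.pbtxt`. A hand-written equivalent of an Oldest-strategy configuration with one start control might look like the fragment below; all values are illustrative, not defaults:

```proto
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  oldest {
    max_candidate_sequences: 4
    preferred_batch_size: [ 2, 4 ]
    max_queue_delay_microseconds: 100
  }
  control_input [
    {
      name: "START"
      control [
        {
          kind: CONTROL_SEQUENCE_START
          int32_false_true: [ 0, 1 ]
        }
      ]
    }
  ]
}
```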