Sequence Batcher
model_navigator.triton.SequenceBatcher
dataclass
SequenceBatcher(strategy=None, max_sequence_idle_microseconds=None, control_inputs=[], states=[])
Sequence batching configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- strategy (Optional[Union[SequenceBatcherStrategyDirect, SequenceBatcherStrategyOldest]], default: None) – The strategy used by the sequence batcher.
- max_sequence_idle_microseconds (Optional[int], default: None) – The maximum time, in microseconds, that a sequence is allowed to be idle before it is aborted.
- control_inputs (List[SequenceBatcherControlInput], default: []) – The model input(s) that the server should use to communicate sequence start, stop, ready and similar control values to the model.
- states (List[SequenceBatcherState], default: []) – The optional state that can be stored in Triton for performing inference requests on a sequence.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
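As a minimal sketch (assuming the model-navigator package is installed), a sequence batcher using the Direct strategy with an idle timeout might be configured like this; the specific values are illustrative:

```python
# Configuration sketch; assumes model-navigator is installed.
from model_navigator.triton import (
    SequenceBatcher,
    SequenceBatcherStrategyDirect,
)

# Abort sequences idle for more than one second; wait up to 100 us
# for additional requests before executing a partially filled batch.
batcher = SequenceBatcher(
    strategy=SequenceBatcherStrategyDirect(max_queue_delay_microseconds=100),
    max_sequence_idle_microseconds=1_000_000,
)
```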
model_navigator.triton.SequenceBatcherControl
dataclass
SequenceBatcherControl(kind, dtype=None, int32_false_true=[], fp32_false_true=[], bool_false_true=[])
Sequence Batching control configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- kind (SequenceBatcherControlKind) – The kind of this control.
- dtype (Optional[Union[dtype, Type[dtype]]], default: None) – The control's datatype.
- int32_false_true (List[int], default: []) – The control's true and false settings are indicated by values in an int32 tensor.
- fp32_false_true (List[float], default: []) – The control's true and false settings are indicated by values in an fp32 tensor.
- bool_false_true (List[bool], default: []) – The control's true and false settings are indicated by values in a bool tensor.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
model_navigator.triton.SequenceBatcherControlInput
dataclass
SequenceBatcherControlInput(input_name, controls)
Sequence Batching control input configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- input_name (str) – The name of the model input.
- controls (List[SequenceBatcherControl]) – List of control value(s) that should be communicated to the model using this model input.
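Putting the two classes together, a sketch of a start-of-sequence control delivered through an int32 input tensor could look as follows; the input name "START" is illustrative:

```python
# Configuration sketch; assumes model-navigator is installed.
from model_navigator.triton import (
    SequenceBatcherControl,
    SequenceBatcherControlInput,
    SequenceBatcherControlKind,
)

# Communicate sequence start to the model via an int32 tensor:
# 0 = not a sequence start, 1 = first request of a sequence.
start_input = SequenceBatcherControlInput(
    input_name="START",
    controls=[
        SequenceBatcherControl(
            kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
            int32_false_true=[0, 1],
        )
    ],
)
```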
model_navigator.triton.SequenceBatcherControlKind
Bases: Enum
Sequence Batching control options.
Read more in the Triton Inference Server model configuration.
Members:
- CONTROL_SEQUENCE_START – "CONTROL_SEQUENCE_START"
- CONTROL_SEQUENCE_READY – "CONTROL_SEQUENCE_READY"
- CONTROL_SEQUENCE_END – "CONTROL_SEQUENCE_END"
- CONTROL_SEQUENCE_CORRID – "CONTROL_SEQUENCE_CORRID"
model_navigator.triton.SequenceBatcherInitialState
dataclass
SequenceBatcherInitialState(name, shape, dtype=None, zero_data=None, data_file=None)
Sequence Batching initial state configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- name (str) – The name of the initial state.
- shape (Tuple[int, ...]) – The shape of the state tensor, not including the batch dimension.
- dtype (Optional[Union[dtype, Type[dtype]]], default: None) – The data-type of the state.
- zero_data (Optional[bool], default: None) – The identifier for using zeros as initial state data.
- data_file (Optional[Path], default: None) – The file whose content will be used as the initial data for the state in row-major order.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
model_navigator.triton.SequenceBatcherState
dataclass
SequenceBatcherState(input_name, output_name, dtype, shape, initial_states=[])
Sequence Batching state configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- input_name (str) – The name of the model state input.
- output_name (str) – The name of the model state output.
- dtype (Union[dtype, Type[dtype]]) – The data-type of the state.
- shape (Tuple[int, ...]) – The shape of the state tensor.
- initial_states (List[SequenceBatcherInitialState], default: []) – The optional field to specify the list of initial states for the model.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
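A sketch of a zero-initialized hidden state carried between requests of a sequence might look like this; the tensor names and shape are illustrative, and numpy dtypes are assumed to be accepted based on the Union[dtype, Type[dtype]] annotation:

```python
# Configuration sketch; assumes model-navigator and numpy are installed.
import numpy as np

from model_navigator.triton import (
    SequenceBatcherInitialState,
    SequenceBatcherState,
)

# A 128-element fp32 state read from HIDDEN_IN and written to
# HIDDEN_OUT on each request, starting from zeros for new sequences.
hidden_state = SequenceBatcherState(
    input_name="HIDDEN_IN",
    output_name="HIDDEN_OUT",
    dtype=np.float32,
    shape=(128,),
    initial_states=[
        SequenceBatcherInitialState(
            name="zeroed",
            shape=(128,),
            dtype=np.float32,
            zero_data=True,
        )
    ],
)
```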
model_navigator.triton.SequenceBatcherStrategyDirect
dataclass
SequenceBatcherStrategyDirect(max_queue_delay_microseconds=0, minimum_slot_utilization=0.0)
Sequence Batching strategy direct configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a candidate request will be delayed in the sequence batch scheduling queue to wait for additional requests for batching.
- minimum_slot_utilization (float, default: 0.0) – The minimum slot utilization that must be satisfied to execute the batch before 'max_queue_delay_microseconds' expires.
model_navigator.triton.SequenceBatcherStrategyOldest
dataclass
SequenceBatcherStrategyOldest(max_candidate_sequences, preferred_batch_size=[], max_queue_delay_microseconds=0)
Sequence Batching strategy oldest configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- max_candidate_sequences (int) – Maximum number of candidate sequences that the batcher maintains.
- preferred_batch_size (List[int], default: []) – Preferred batch sizes for dynamic batching of candidate sequences.
- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a candidate request will be delayed in the dynamic batch scheduling queue to wait for additional requests for batching.
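As a closing sketch, the Oldest strategy can be plugged into a SequenceBatcher in the same way as the Direct strategy; the candidate and batch-size values below are illustrative:

```python
# Configuration sketch; assumes model-navigator is installed.
from model_navigator.triton import (
    SequenceBatcher,
    SequenceBatcherStrategyOldest,
)

# Maintain up to 16 candidate sequences and dynamically batch the
# oldest-waiting ones, preferring batches of 4 or 8 requests.
batcher = SequenceBatcher(
    strategy=SequenceBatcherStrategyOldest(
        max_candidate_sequences=16,
        preferred_batch_size=[4, 8],
        max_queue_delay_microseconds=100,
    ),
)
```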