Sequence Batcher
model_navigator.triton.SequenceBatcher
dataclass
SequenceBatcher(strategy=None, max_sequence_idle_microseconds=None, control_inputs=[], states=[])
Sequence batching configuration.
Read more in Triton Inference Server model configuration
Parameters:
- strategy(Optional[Union[SequenceBatcherStrategyDirect, SequenceBatcherStrategyOldest]], default: None) – The strategy used by the sequence batcher.
- max_sequence_idle_microseconds(Optional[int], default: None) – The maximum time, in microseconds, that a sequence is allowed to be idle before it is aborted.
- control_inputs(List[SequenceBatcherControlInput], default: []) – The model input(s) that the server should use to communicate sequence start, stop, ready, and similar control values to the model.
- states(List[SequenceBatcherState], default: []) – The optional state that can be stored in Triton for performing inference requests on a sequence.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
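The list-valued defaults above are produced by factory functions (which is why a raw docstring render may show `lambda: []`), so each instance gets its own fresh lists. A minimal standard-library sketch of this pattern, illustrative only and not model_navigator's actual implementation:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SequenceBatcherSketch:
    """Illustrative stand-in for model_navigator.triton.SequenceBatcher."""

    strategy: Optional[object] = None
    max_sequence_idle_microseconds: Optional[int] = None
    # default_factory gives every instance its own list.
    control_inputs: List[object] = field(default_factory=list)
    states: List[object] = field(default_factory=list)

    def __post_init__(self):
        # Early validation, analogous to the __post_init__ documented above
        # (the exact checks performed by model_navigator are an assumption).
        if (
            self.max_sequence_idle_microseconds is not None
            and self.max_sequence_idle_microseconds <= 0
        ):
            raise ValueError("max_sequence_idle_microseconds must be positive")


batcher = SequenceBatcherSketch(max_sequence_idle_microseconds=5_000_000)
```

Using a factory rather than a shared `[]` default matters: mutating `control_inputs` on one instance must not leak into other instances.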
model_navigator.triton.SequenceBatcherControl
dataclass
SequenceBatcherControl(kind, dtype=None, int32_false_true=[], fp32_false_true=[], bool_false_true=[])
Sequence Batching control configuration.
Read more in Triton Inference Server model configuration
Parameters:
- kind(SequenceBatcherControlKind) – The kind of this control.
- dtype(Optional[Union[dtype, Type[dtype]]], default: None) – The control's data type.
- int32_false_true(List[int], default: []) – The control's false and true settings, indicated by values in an int32 tensor.
- fp32_false_true(List[float], default: []) – The control's false and true settings, indicated by values in an fp32 tensor.
- bool_false_true(List[bool], default: []) – The control's false and true settings, indicated by values in a bool tensor.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
model_navigator.triton.SequenceBatcherControlInput
dataclass
Sequence Batching control input configuration.
Read more in Triton Inference Server model configuration
Parameters:
- input_name(str) – The name of the model input.
- controls(List[SequenceBatcherControl]) – List of control value(s) that should be communicated to the model using this model input.
model_navigator.triton.SequenceBatcherControlKind
Bases: Enum
Sequence Batching control options.
Read more in Triton Inference Server model configuration
Members:
- CONTROL_SEQUENCE_START – "CONTROL_SEQUENCE_START"
- CONTROL_SEQUENCE_READY – "CONTROL_SEQUENCE_READY"
- CONTROL_SEQUENCE_END – "CONTROL_SEQUENCE_END"
- CONTROL_SEQUENCE_CORRID – "CONTROL_SEQUENCE_CORRID"
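A minimal sketch of how an enum with these members looks, mirroring the names and string values listed above (illustrative only, not model_navigator's actual definition):

```python
from enum import Enum


class SequenceBatcherControlKind(Enum):
    """Sketch of the control-kind enum; values mirror the member names."""

    CONTROL_SEQUENCE_START = "CONTROL_SEQUENCE_START"
    CONTROL_SEQUENCE_READY = "CONTROL_SEQUENCE_READY"
    CONTROL_SEQUENCE_END = "CONTROL_SEQUENCE_END"
    CONTROL_SEQUENCE_CORRID = "CONTROL_SEQUENCE_CORRID"


# A control kind can be selected by name or by value.
kind = SequenceBatcherControlKind.CONTROL_SEQUENCE_START
```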
model_navigator.triton.SequenceBatcherInitialState
dataclass
Sequence Batching initial state configuration.
Read more in Triton Inference Server model configuration
Parameters:
- name(str) – The name of the initial state.
- shape(Tuple[int, ...]) – The shape of the state tensor, not including the batch dimension.
- dtype(Optional[Union[dtype, Type[dtype]]], default: None) – The data type of the state.
- zero_data(Optional[bool], default: None) – The identifier for using zeros as initial state data.
- data_file(Optional[Path], default: None) – The file whose content will be used as the initial data for the state, in row-major order.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
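Since the initial state data comes either from zeros or from a file, a plausible early-validation check is that exactly one of `zero_data` and `data_file` is set. A standard-library sketch of such a check (an assumption about the validation, not model_navigator's actual code):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


@dataclass
class InitialStateSketch:
    """Illustrative stand-in for SequenceBatcherInitialState."""

    name: str
    shape: Tuple[int, ...]
    zero_data: Optional[bool] = None
    data_file: Optional[Path] = None

    def __post_init__(self):
        # Assumed rule: the state is initialized from zeros OR a file,
        # so exactly one of the two fields must be provided.
        if (self.zero_data is None) == (self.data_file is None):
            raise ValueError("Specify exactly one of zero_data or data_file")


state = InitialStateSketch(name="hidden", shape=(1, 128), zero_data=True)
```

Raising in `__post_init__` surfaces the configuration error at construction time, before the config ever reaches the server.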
model_navigator.triton.SequenceBatcherState
dataclass
Sequence Batching state configuration.
Read more in Triton Inference Server model configuration
Parameters:
- input_name(str) – The name of the model state input.
- output_name(str) – The name of the model state output.
- dtype(Union[dtype, Type[dtype]]) – The data type of the state.
- shape(Tuple[int, ...]) – The shape of the state tensor.
- initial_states(List[SequenceBatcherInitialState], default: []) – Optional list of initial states for the model.
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
model_navigator.triton.SequenceBatcherStrategyDirect
dataclass
Sequence Batching strategy direct configuration.
Read more in Triton Inference Server model configuration
Parameters:
- max_queue_delay_microseconds(int, default: 0) – The maximum time, in microseconds, that a candidate request will be delayed in the sequence batch scheduling queue while waiting for additional requests for batching.
- minimum_slot_utilization(float, default: 0.0) – The minimum slot utilization that must be satisfied to execute the batch before max_queue_delay_microseconds expires.
model_navigator.triton.SequenceBatcherStrategyOldest
dataclass
SequenceBatcherStrategyOldest(max_candidate_sequences, preferred_batch_size=[], max_queue_delay_microseconds=0)
Sequence Batching strategy oldest configuration.
Read more in Triton Inference Server model configuration
Parameters:
- max_candidate_sequences(int) – Maximum number of candidate sequences that the batcher maintains.
- preferred_batch_size(List[int], default: []) – Preferred batch sizes for dynamic batching of candidate sequences.
- max_queue_delay_microseconds(int, default: 0) – The maximum time, in microseconds, that a candidate request will be delayed in the dynamic batch scheduling queue while waiting for additional requests for batching.
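The Oldest strategy's fields can be sketched as a plain dataclass, showing how a required field combines with list and integer defaults (illustrative only; field names mirror the documentation above, the validation is an assumption):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OldestStrategySketch:
    """Illustrative stand-in for SequenceBatcherStrategyOldest."""

    max_candidate_sequences: int
    preferred_batch_size: List[int] = field(default_factory=list)
    max_queue_delay_microseconds: int = 0

    def __post_init__(self):
        # Assumed sanity checks on the numeric fields.
        if self.max_candidate_sequences <= 0:
            raise ValueError("max_candidate_sequences must be positive")
        if self.max_queue_delay_microseconds < 0:
            raise ValueError("max_queue_delay_microseconds must be non-negative")


# Maintain up to 4 candidate sequences, preferring batches of 2 or 4.
strategy = OldestStrategySketch(max_candidate_sequences=4, preferred_batch_size=[2, 4])
```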