Sequence Batcher
model_navigator.api.triton.SequenceBatcher
dataclass
Sequence batching configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- strategy (Optional[Union[SequenceBatcherStrategyDirect, SequenceBatcherStrategyOldest]]) – The strategy used by the sequence batcher.
- max_sequence_idle_microseconds (Optional[int]) – The maximum time, in microseconds, that a sequence is allowed to be idle before it is aborted.
- control_inputs (List[SequenceBatcherControlInput]) – The model input(s) that the server should use to communicate sequence start, stop, ready, and similar control values to the model.
- states (List[SequenceBatcherState]) – The optional state that can be stored in Triton for performing inference requests on a sequence.
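For illustration, the fields above compose like this. The dataclasses below are self-contained stand-ins that only mirror the parameter list in this section (the real classes live in model_navigator.api.triton), and all field values are made up for the example:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stand-in dataclasses mirroring only the documented parameter lists;
# the real classes live in model_navigator.api.triton.
@dataclass
class SequenceBatcherStrategyOldest:
    max_candidate_sequences: int
    preferred_batch_size: List[int] = field(default_factory=list)
    max_queue_delay_microseconds: int = 0

@dataclass
class SequenceBatcher:
    strategy: Optional[SequenceBatcherStrategyOldest] = None
    max_sequence_idle_microseconds: Optional[int] = None
    control_inputs: List[object] = field(default_factory=list)
    states: List[object] = field(default_factory=list)

# Abort sequences idle for more than one second; schedule with the
# Oldest strategy over at most 4 candidate sequences.
batcher = SequenceBatcher(
    strategy=SequenceBatcherStrategyOldest(
        max_candidate_sequences=4,
        preferred_batch_size=[2, 4],
        max_queue_delay_microseconds=100,
    ),
    max_sequence_idle_microseconds=1_000_000,
)
```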
model_navigator.api.triton.SequenceBatcherControl
dataclass
Sequence Batching control configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- kind (SequenceBatcherControlKind) – The kind of this control.
- dtype (Optional[Union[np.dtype, Type[np.dtype]]]) – The control's data type.
- int32_false_true (List[int]) – The control's false and true settings, indicated by values in an int32 tensor.
- fp32_false_true (List[float]) – The control's false and true settings, indicated by values in an fp32 tensor.
- bool_false_true (List[bool]) – The control's false and true settings, indicated by values in a bool tensor.
model_navigator.api.triton.SequenceBatcherControlInput
dataclass
Sequence Batching control input configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- input_name (str) – The name of the model input.
- controls (List[SequenceBatcherControl]) – List of control value(s) that should be communicated to the model using this model input.
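As a sketch of how a control is attached to a model input, the stand-in dataclasses below mirror only the fields documented for SequenceBatcherControl and SequenceBatcherControlInput (the real classes live in model_navigator.api.triton); the input name "START" and the [0, 1] encoding are illustrative values, not defaults:

```python
import numpy as np
from dataclasses import dataclass, field
from typing import List, Optional

# Stand-ins mirroring the documented fields; not the real
# model_navigator.api.triton classes.
@dataclass
class SequenceBatcherControl:
    kind: str
    dtype: Optional[object] = None
    int32_false_true: List[int] = field(default_factory=list)
    fp32_false_true: List[float] = field(default_factory=list)
    bool_false_true: List[bool] = field(default_factory=list)

@dataclass
class SequenceBatcherControlInput:
    input_name: str
    controls: List[SequenceBatcherControl] = field(default_factory=list)

# Deliver the sequence-start signal to the model through an int32 input
# tensor named "START": 0 indicates false, 1 indicates true.
control_input = SequenceBatcherControlInput(
    input_name="START",
    controls=[
        SequenceBatcherControl(
            kind="CONTROL_SEQUENCE_START",
            dtype=np.int32,
            int32_false_true=[0, 1],
        ),
    ],
)
```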
model_navigator.api.triton.SequenceBatcherControlKind
Sequence Batching control options.
Read more in the Triton Inference Server model configuration.
Members:
- CONTROL_SEQUENCE_START – "CONTROL_SEQUENCE_START"
- CONTROL_SEQUENCE_READY – "CONTROL_SEQUENCE_READY"
- CONTROL_SEQUENCE_END – "CONTROL_SEQUENCE_END"
- CONTROL_SEQUENCE_CORRID – "CONTROL_SEQUENCE_CORRID"
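The members above map names to their string values; a minimal stand-in enum with the same values (the real class is model_navigator.api.triton.SequenceBatcherControlKind) behaves like this:

```python
from enum import Enum

# Stand-in with the member values listed above; the real enum lives in
# model_navigator.api.triton.
class SequenceBatcherControlKind(Enum):
    CONTROL_SEQUENCE_START = "CONTROL_SEQUENCE_START"
    CONTROL_SEQUENCE_READY = "CONTROL_SEQUENCE_READY"
    CONTROL_SEQUENCE_END = "CONTROL_SEQUENCE_END"
    CONTROL_SEQUENCE_CORRID = "CONTROL_SEQUENCE_CORRID"

# Select the end-of-sequence control kind.
kind = SequenceBatcherControlKind.CONTROL_SEQUENCE_END
```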
model_navigator.api.triton.SequenceBatcherInitialState
dataclass
Sequence Batching initial state configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- name (str) – The name of the initial state.
- shape (Tuple[int, ...]) – The shape of the state tensor, not including the batch dimension.
- dtype (Optional[Union[np.dtype, Type[np.dtype]]]) – The data type of the state.
- zero_data (Optional[bool]) – Whether to use zeros as the initial state data.
- data_file (Optional[str]) – The file whose contents will be used as the initial data for the state, in row-major order.
model_navigator.api.triton.SequenceBatcherState
dataclass
Sequence Batching state configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- input_name (str) – The name of the model state input.
- output_name (str) – The name of the model state output.
- dtype (Union[np.dtype, Type[np.dtype]]) – The data type of the state.
- shape (Tuple[int, ...]) – The shape of the state tensor.
- initial_states (List[SequenceBatcherInitialState]) – The optional field to specify the list of initial states for the model.
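A sketch of a stateful configuration combining the two classes above: the stand-in dataclasses mirror only the documented fields (real classes: model_navigator.api.triton), and the tensor names, shape, and dtype are invented for the example:

```python
import numpy as np
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Stand-ins mirroring the documented fields of SequenceBatcherInitialState
# and SequenceBatcherState.
@dataclass
class SequenceBatcherInitialState:
    name: str
    shape: Tuple[int, ...]
    dtype: Optional[object] = None
    zero_data: Optional[bool] = None
    data_file: Optional[str] = None

@dataclass
class SequenceBatcherState:
    input_name: str
    output_name: str
    dtype: object
    shape: Tuple[int, ...]
    initial_states: List[SequenceBatcherInitialState] = field(default_factory=list)

# Carry a (1, 128) fp32 hidden state between requests of a sequence,
# initialized to zeros on the first request of each sequence.
state = SequenceBatcherState(
    input_name="input_state",
    output_name="output_state",
    dtype=np.float32,
    shape=(1, 128),
    initial_states=[
        SequenceBatcherInitialState(
            name="hidden", shape=(1, 128), dtype=np.float32, zero_data=True,
        ),
    ],
)
```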
model_navigator.api.triton.SequenceBatcherStrategyDirect
dataclass
Sequence Batching strategy direct configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- max_queue_delay_microseconds (int) – The maximum time, in microseconds, a candidate request will be delayed in the sequence batch scheduling queue to wait for additional requests for batching.
- minimum_slot_utilization (float) – The minimum slot utilization that must be satisfied to execute the batch before 'max_queue_delay_microseconds' expires.
model_navigator.api.triton.SequenceBatcherStrategyOldest
dataclass
Sequence Batching strategy oldest configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- max_candidate_sequences (int) – Maximum number of candidate sequences that the batcher maintains.
- preferred_batch_size (List[int]) – Preferred batch sizes for dynamic batching of candidate sequences.
- max_queue_delay_microseconds (int) – The maximum time, in microseconds, a candidate request will be delayed in the dynamic batch scheduling queue to wait for additional requests for batching.
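To contrast the two strategies, the stand-in dataclasses below mirror only the fields documented in the two sections above (real classes: model_navigator.api.triton); the numeric settings are illustrative, not recommendations:

```python
from dataclasses import dataclass, field
from typing import List

# Stand-ins mirroring the two documented strategy dataclasses.
@dataclass
class SequenceBatcherStrategyDirect:
    max_queue_delay_microseconds: int = 0
    minimum_slot_utilization: float = 0.0

@dataclass
class SequenceBatcherStrategyOldest:
    max_candidate_sequences: int = 0
    preferred_batch_size: List[int] = field(default_factory=list)
    max_queue_delay_microseconds: int = 0

# Direct: wait up to 100 microseconds for a batch that fills at least
# half of the batch slots before executing.
direct = SequenceBatcherStrategyDirect(
    max_queue_delay_microseconds=100,
    minimum_slot_utilization=0.5,
)

# Oldest: dynamically batch the oldest requests across up to 4 candidate
# sequences, preferring batches of 2 or 4.
oldest = SequenceBatcherStrategyOldest(
    max_candidate_sequences=4,
    preferred_batch_size=[2, 4],
)
```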