Sequence Batcher
model_navigator.api.triton.SequenceBatcher
dataclass
Sequence batching configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- strategy (Optional[Union[SequenceBatcherStrategyDirect, SequenceBatcherStrategyOldest]], default: None) – The strategy used by the sequence batcher.
- max_sequence_idle_microseconds (Optional[int], default: None) – The maximum time, in microseconds, that a sequence is allowed to be idle before it is aborted.
- control_inputs (List[SequenceBatcherControlInput], default: []) – The model input(s) that the server should use to communicate sequence start, stop, ready and similar control values to the model.
- states (List[SequenceBatcherState], default: []) – The optional state that can be stored in Triton for performing inference requests on a sequence.
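A minimal sketch of building this configuration, assuming only the dataclass fields documented above; the strategy choice and timeout value are illustrative:

```python
from model_navigator.api.triton import (
    SequenceBatcher,
    SequenceBatcherStrategyDirect,
)

# Illustrative values: wait up to 100 us for additional requests per slot,
# and abort sequences that stay idle for more than 5 seconds.
batcher = SequenceBatcher(
    strategy=SequenceBatcherStrategyDirect(max_queue_delay_microseconds=100),
    max_sequence_idle_microseconds=5_000_000,
)
```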
model_navigator.api.triton.SequenceBatcherControl
dataclass
Sequence Batching control configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- kind (SequenceBatcherControlKind) – The kind of this control.
- dtype (Optional[Union[dtype, Type[numpy.dtype]]], default: None) – The control's datatype.
- int32_false_true (List[int], default: []) – The control's false and true settings, indicated by values in an int32 tensor.
- fp32_false_true (List[float], default: []) – The control's false and true settings, indicated by values in an fp32 tensor.
- bool_false_true (List[bool], default: []) – The control's false and true settings, indicated by values in a bool tensor.
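A minimal sketch of a single control, assuming the fields above; the choice of a start control with int32 false/true values of 0 and 1 is illustrative:

```python
from model_navigator.api.triton import (
    SequenceBatcherControl,
    SequenceBatcherControlKind,
)

# Illustrative start control: the model receives 0 for requests that do not
# begin a sequence and 1 for requests that do, as an int32 tensor.
start_control = SequenceBatcherControl(
    kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
    int32_false_true=[0, 1],
)
```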
model_navigator.api.triton.SequenceBatcherControlInput
dataclass
Sequence Batching control input configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- input_name (str) – The name of the model input.
- controls (List[SequenceBatcherControl]) – List of control value(s) that should be communicated to the model using this model input.
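A minimal sketch tying a control to a model input; the input name "START" is a hypothetical example:

```python
from model_navigator.api.triton import (
    SequenceBatcherControl,
    SequenceBatcherControlInput,
    SequenceBatcherControlKind,
)

# Hypothetical model input "START" that carries the sequence-start control.
control_input = SequenceBatcherControlInput(
    input_name="START",
    controls=[
        SequenceBatcherControl(
            kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
            int32_false_true=[0, 1],
        )
    ],
)
```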
model_navigator.api.triton.SequenceBatcherControlKind
Bases: Enum
Sequence Batching control options.
Read more in the Triton Inference Server model configuration.
Parameters:
- CONTROL_SEQUENCE_START – "CONTROL_SEQUENCE_START"
- CONTROL_SEQUENCE_READY – "CONTROL_SEQUENCE_READY"
- CONTROL_SEQUENCE_END – "CONTROL_SEQUENCE_END"
- CONTROL_SEQUENCE_CORRID – "CONTROL_SEQUENCE_CORRID"
model_navigator.api.triton.SequenceBatcherInitialState
dataclass
Sequence Batching initial state configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- name (str) – The name of the initial state.
- shape (Tuple[int, ...]) – The shape of the state tensor, not including the batch dimension.
- dtype (Optional[Union[dtype, Type[numpy.dtype]]], default: None) – The data-type of the state.
- zero_data (Optional[bool], default: None) – Whether to use zeros as the initial state data.
- data_file (Optional[pathlib.Path], default: None) – The file whose content will be used as the initial data for the state, in row-major order.
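A minimal sketch of a zero-initialized state, assuming the fields above; the name, shape, and dtype are illustrative:

```python
import numpy as np

from model_navigator.api.triton import SequenceBatcherInitialState

# Illustrative zero-initialized state: a variable-length float32 vector
# (batch dimension excluded from the shape).
initial_state = SequenceBatcherInitialState(
    name="initial_hidden",
    shape=(-1,),
    dtype=np.dtype("float32"),
    zero_data=True,
)
```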
model_navigator.api.triton.SequenceBatcherState
dataclass
Sequence Batching state configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- input_name (str) – The name of the model state input.
- output_name (str) – The name of the model state output.
- dtype (Union[dtype, Type[numpy.dtype]]) – The data-type of the state.
- shape (Tuple[int, ...]) – The shape of the state tensor.
- initial_states (List[SequenceBatcherInitialState], default: []) – The optional list of initial states for the model.
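A minimal sketch wiring a state between a model input and output; the tensor names "HIDDEN_IN"/"HIDDEN_OUT" and the shape and dtype are hypothetical:

```python
import numpy as np

from model_navigator.api.triton import (
    SequenceBatcherInitialState,
    SequenceBatcherState,
)

# Hypothetical state: Triton feeds "HIDDEN_IN" from the previous step's
# "HIDDEN_OUT", starting the sequence from zeros.
state = SequenceBatcherState(
    input_name="HIDDEN_IN",
    output_name="HIDDEN_OUT",
    dtype=np.dtype("float32"),
    shape=(-1,),
    initial_states=[
        SequenceBatcherInitialState(
            name="initial_hidden",
            shape=(-1,),
            dtype=np.dtype("float32"),
            zero_data=True,
        )
    ],
)
```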
model_navigator.api.triton.SequenceBatcherStrategyDirect
dataclass
Sequence Batching strategy direct configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a candidate request will be delayed in the sequence batch scheduling queue to wait for additional requests for batching.
- minimum_slot_utilization (float, default: 0.0) – The minimum slot utilization that must be satisfied to execute the batch before 'max_queue_delay_microseconds' expires.
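A minimal sketch with illustrative tuning values, assuming only the two fields above:

```python
from model_navigator.api.triton import SequenceBatcherStrategyDirect

# Illustrative tuning: execute once half the batch slots are filled, or
# after a 100 us wait, whichever comes first.
direct_strategy = SequenceBatcherStrategyDirect(
    max_queue_delay_microseconds=100,
    minimum_slot_utilization=0.5,
)
```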
model_navigator.api.triton.SequenceBatcherStrategyOldest
dataclass
Sequence Batching strategy oldest configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- max_candidate_sequences (int) – Maximum number of candidate sequences that the batcher maintains.
- preferred_batch_size (List[int], default: []) – Preferred batch sizes for dynamic batching of candidate sequences.
- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a candidate request will be delayed in the dynamic batch scheduling queue to wait for additional requests for batching.
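A minimal sketch with illustrative tuning values, assuming only the fields above:

```python
from model_navigator.api.triton import SequenceBatcherStrategyOldest

# Illustrative tuning: track up to 4 candidate sequences, prefer batches of
# 2 or 4, and wait at most 100 us for additional requests.
oldest_strategy = SequenceBatcherStrategyOldest(
    max_candidate_sequences=4,
    preferred_batch_size=[2, 4],
    max_queue_delay_microseconds=100,
)
```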