Sequence Batcher
model_navigator.api.triton.SequenceBatcher
dataclass
Sequence batching configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- strategy (Optional[Union[SequenceBatcherStrategyDirect, SequenceBatcherStrategyOldest]], default: None) – The strategy used by the sequence batcher.
- max_sequence_idle_microseconds (Optional[int], default: None) – The maximum time, in microseconds, that a sequence is allowed to be idle before it is aborted.
- control_inputs (List[SequenceBatcherControlInput], default: []) – The model input(s) that the server should use to communicate sequence start, stop, ready and similar control values to the model.
- states (List[SequenceBatcherState], default: []) – The optional state that can be stored in Triton for performing inference requests on a sequence.
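A minimal sketch of building this configuration, assuming only the dataclass fields documented above; the strategy choice and timeout value are illustrative:

```python
from model_navigator.api.triton import (
    SequenceBatcher,
    SequenceBatcherStrategyDirect,
)

# Illustrative values: wait up to 100 us for additional requests per slot,
# and abort sequences that stay idle for more than 5 seconds.
batcher = SequenceBatcher(
    strategy=SequenceBatcherStrategyDirect(max_queue_delay_microseconds=100),
    max_sequence_idle_microseconds=5_000_000,
)
```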
model_navigator.api.triton.SequenceBatcherControl
dataclass
Sequence Batching control configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- kind (SequenceBatcherControlKind) – The kind of this control.
- dtype (Optional[Union[dtype, Type[numpy.dtype]]], default: None) – The control's datatype.
- int32_false_true (List[int], default: []) – The control's false and true settings, indicated by values in an int32 tensor.
- fp32_false_true (List[float], default: []) – The control's false and true settings, indicated by values in an fp32 tensor.
- bool_false_true (List[bool], default: []) – The control's false and true settings, indicated by values in a bool tensor.
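A minimal sketch of a single control, assuming the fields above; the choice of a start control with int32 false/true values of 0 and 1 is illustrative:

```python
from model_navigator.api.triton import (
    SequenceBatcherControl,
    SequenceBatcherControlKind,
)

# Illustrative start control: the model receives 0 for requests that do not
# begin a sequence and 1 for requests that do, as an int32 tensor.
start_control = SequenceBatcherControl(
    kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
    int32_false_true=[0, 1],
)
```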
model_navigator.api.triton.SequenceBatcherControlInput
dataclass
Sequence Batching control input configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- input_name (str) – The name of the model input.
- controls (List[SequenceBatcherControl]) – List of control value(s) that should be communicated to the model using this model input.
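A minimal sketch tying a control to a model input; the input name "START" is a hypothetical example:

```python
from model_navigator.api.triton import (
    SequenceBatcherControl,
    SequenceBatcherControlInput,
    SequenceBatcherControlKind,
)

# Hypothetical model input "START" that carries the sequence-start control.
control_input = SequenceBatcherControlInput(
    input_name="START",
    controls=[
        SequenceBatcherControl(
            kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
            int32_false_true=[0, 1],
        )
    ],
)
```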
model_navigator.api.triton.SequenceBatcherControlKind
Bases: Enum
Sequence Batching control options.
Read more in the Triton Inference Server model configuration.
Parameters:
- CONTROL_SEQUENCE_START – "CONTROL_SEQUENCE_START"
- CONTROL_SEQUENCE_READY – "CONTROL_SEQUENCE_READY"
- CONTROL_SEQUENCE_END – "CONTROL_SEQUENCE_END"
- CONTROL_SEQUENCE_CORRID – "CONTROL_SEQUENCE_CORRID"
model_navigator.api.triton.SequenceBatcherInitialState
dataclass
Sequence Batching initial state configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- name (str) – The name of the initial state.
- shape (Tuple[int, ...]) – The shape of the state tensor, not including the batch dimension.
- dtype (Optional[Union[dtype, Type[numpy.dtype]]], default: None) – The data-type of the state.
- zero_data (Optional[bool], default: None) – Whether to use zeros as the initial state data.
- data_file (Optional[pathlib.Path], default: None) – The file whose content will be used as the initial data for the state, in row-major order.
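A minimal sketch of a zero-initialized state, assuming the fields above; the name, shape, and dtype are illustrative:

```python
import numpy as np

from model_navigator.api.triton import SequenceBatcherInitialState

# Illustrative zero-initialized state: a variable-length float32 vector
# (batch dimension excluded from the shape).
initial_state = SequenceBatcherInitialState(
    name="initial_hidden",
    shape=(-1,),
    dtype=np.dtype("float32"),
    zero_data=True,
)
```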
model_navigator.api.triton.SequenceBatcherState
dataclass
Sequence Batching state configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- input_name (str) – The name of the model state input.
- output_name (str) – The name of the model state output.
- dtype (Union[dtype, Type[numpy.dtype]]) – The data-type of the state.
- shape (Tuple[int, ...]) – The shape of the state tensor.
- initial_states (List[SequenceBatcherInitialState], default: []) – The optional list of initial states for the model.
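A minimal sketch wiring a state between a model input and output; the tensor names "HIDDEN_IN"/"HIDDEN_OUT" and the shape and dtype are hypothetical:

```python
import numpy as np

from model_navigator.api.triton import (
    SequenceBatcherInitialState,
    SequenceBatcherState,
)

# Hypothetical state: Triton feeds "HIDDEN_IN" from the previous step's
# "HIDDEN_OUT", starting the sequence from zeros.
state = SequenceBatcherState(
    input_name="HIDDEN_IN",
    output_name="HIDDEN_OUT",
    dtype=np.dtype("float32"),
    shape=(-1,),
    initial_states=[
        SequenceBatcherInitialState(
            name="initial_hidden",
            shape=(-1,),
            dtype=np.dtype("float32"),
            zero_data=True,
        )
    ],
)
```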
model_navigator.api.triton.SequenceBatcherStrategyDirect
dataclass
Sequence Batching strategy direct configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a candidate request will be delayed in the sequence batch scheduling queue to wait for additional requests for batching.
- minimum_slot_utilization (float, default: 0.0) – The minimum slot utilization that must be satisfied to execute the batch before 'max_queue_delay_microseconds' expires.
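A minimal sketch with illustrative tuning values, assuming only the two fields above:

```python
from model_navigator.api.triton import SequenceBatcherStrategyDirect

# Illustrative tuning: execute once half the batch slots are filled, or
# after a 100 us wait, whichever comes first.
direct_strategy = SequenceBatcherStrategyDirect(
    max_queue_delay_microseconds=100,
    minimum_slot_utilization=0.5,
)
```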
model_navigator.api.triton.SequenceBatcherStrategyOldest
dataclass
Sequence Batching strategy oldest configuration.
Read more in the Triton Inference Server model configuration.
Parameters:
- max_candidate_sequences (int) – Maximum number of candidate sequences that the batcher maintains.
- preferred_batch_size (List[int], default: []) – Preferred batch sizes for dynamic batching of candidate sequences.
- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a candidate request will be delayed in the dynamic batch scheduling queue to wait for additional requests for batching.
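A minimal sketch with illustrative tuning values, assuming only the fields above:

```python
from model_navigator.api.triton import SequenceBatcherStrategyOldest

# Illustrative tuning: track up to 4 candidate sequences, prefer batches of
# 2 or 4, and wait at most 100 us for additional requests.
oldest_strategy = SequenceBatcherStrategyOldest(
    max_candidate_sequences=4,
    preferred_batch_size=[2, 4],
    max_queue_delay_microseconds=100,
)
```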