Sequence Batcher

model_navigator.api.triton.SequenceBatcher

dataclass

SequenceBatcher(strategy=None, max_sequence_idle_microseconds=None, control_inputs=[], states=[])

Sequence batching configuration.

Read more in the Triton Inference Server model configuration.
Parameters:

- strategy (Optional[Union[SequenceBatcherStrategyDirect, SequenceBatcherStrategyOldest]], default: None) – The strategy used by the sequence batcher.
- max_sequence_idle_microseconds (Optional[int], default: None) – The maximum time, in microseconds, that a sequence is allowed to be idle before it is aborted.
- control_inputs (List[SequenceBatcherControlInput], default: []) – The model input(s) that the server should use to communicate sequence start, stop, ready, and similar control values to the model.
- states (List[SequenceBatcherState], default: []) – The optional state that can be stored in Triton for performing inference requests on a sequence.
 
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
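For orientation, a minimal sketch of constructing this configuration. The import path mirrors the class path above; the timeout values are illustrative only.

```python
from model_navigator.api.triton import (
    SequenceBatcher,
    SequenceBatcherStrategyDirect,
)

# Direct scheduling, waiting at most 100 ms to fill a batch slot;
# sequences idle for more than 5 s are aborted.
batcher = SequenceBatcher(
    strategy=SequenceBatcherStrategyDirect(max_queue_delay_microseconds=100_000),
    max_sequence_idle_microseconds=5_000_000,
)
```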
            
model_navigator.api.triton.SequenceBatcherControl

dataclass

SequenceBatcherControl(kind, dtype=None, int32_false_true=[], fp32_false_true=[], bool_false_true=[])

Sequence Batching control configuration.

Read more in the Triton Inference Server model configuration.
Parameters:

- kind (SequenceBatcherControlKind) – The kind of this control.
- dtype (Optional[Union[dtype, Type[dtype]]], default: None) – The control's datatype.
- int32_false_true (List[int], default: []) – The control's true and false setting is indicated by setting a value in an int32 tensor.
- fp32_false_true (List[float], default: []) – The control's true and false setting is indicated by setting a value in a fp32 tensor.
- bool_false_true (List[bool], default: []) – The control's true and false setting is indicated by setting a value in a bool tensor.
 
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
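A sketch of two typical controls. The two-element false/true convention and the CORRID datatype requirement follow the Triton model configuration linked above; the exact rules enforced by __post_init__ are an assumption here.

```python
import numpy as np

from model_navigator.api.triton import (
    SequenceBatcherControl,
    SequenceBatcherControlKind,
)

# Sequence-start flag delivered as an int32 tensor: 0 means false, 1 means true.
start = SequenceBatcherControl(
    kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
    int32_false_true=[0, 1],
)

# Correlation-ID control: only the datatype needs to be configured.
corrid = SequenceBatcherControl(
    kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_CORRID,
    dtype=np.dtype("uint64"),
)
```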
            
model_navigator.api.triton.SequenceBatcherControlInput

dataclass

SequenceBatcherControlInput(input_name, controls)

Sequence Batching control input configuration.

Read more in the Triton Inference Server model configuration.
Parameters:

- input_name (str) – The name of the model input.
- controls (List[SequenceBatcherControl]) – List of control value(s) that should be communicated to the model using this model input.
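For example, a control input binds one model input name to the controls Triton should feed through it. The tensor name "READY" is hypothetical:

```python
from model_navigator.api.triton import (
    SequenceBatcherControl,
    SequenceBatcherControlInput,
    SequenceBatcherControlKind,
)

# Deliver the ready flag through the model input "READY" as an fp32 tensor.
ready_input = SequenceBatcherControlInput(
    input_name="READY",  # hypothetical model input name
    controls=[
        SequenceBatcherControl(
            kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_READY,
            fp32_false_true=[0.0, 1.0],
        )
    ],
)
```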
 
model_navigator.api.triton.SequenceBatcherControlKind

Bases: Enum

Sequence Batching control options.

Read more in the Triton Inference Server model configuration.

Members:

- CONTROL_SEQUENCE_START – "CONTROL_SEQUENCE_START"
- CONTROL_SEQUENCE_READY – "CONTROL_SEQUENCE_READY"
- CONTROL_SEQUENCE_END – "CONTROL_SEQUENCE_END"
- CONTROL_SEQUENCE_CORRID – "CONTROL_SEQUENCE_CORRID"
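Each member's value is the corresponding string from the Triton model configuration, so standard Enum round-tripping by value works:

```python
from model_navigator.api.triton import SequenceBatcherControlKind

# Look up a member from its configuration string and back again.
kind = SequenceBatcherControlKind("CONTROL_SEQUENCE_END")
assert kind is SequenceBatcherControlKind.CONTROL_SEQUENCE_END
assert kind.value == "CONTROL_SEQUENCE_END"
```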
 
model_navigator.api.triton.SequenceBatcherInitialState

dataclass

SequenceBatcherInitialState(name, shape, dtype=None, zero_data=None, data_file=None)

Sequence Batching initial state configuration.

Read more in the Triton Inference Server model configuration.
Parameters:

- name (str) – The name of the state.
- shape (Tuple[int, ...]) – The shape of the state tensor, not including the batch dimension.
- dtype (Optional[Union[dtype, Type[dtype]]], default: None) – The data-type of the state.
- zero_data (Optional[bool], default: None) – The identifier for using zeros as initial state data.
- data_file (Optional[Path], default: None) – The file whose content will be used as the initial data for the state in row-major order.
 
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
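A sketch of a zero-initialized state. Per the Triton docs, the initial data comes either from zero_data or from data_file, so setting exactly one of them is assumed here; the tensor name and shape are illustrative.

```python
import numpy as np

from model_navigator.api.triton import SequenceBatcherInitialState

# A 1 x 128 state tensor (batch dimension excluded) initialized to zeros.
initial_state = SequenceBatcherInitialState(
    name="hidden_init",  # illustrative name
    shape=(1, 128),
    dtype=np.dtype("float32"),
    zero_data=True,
)
```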
            
model_navigator.api.triton.SequenceBatcherState

dataclass

SequenceBatcherState(input_name, output_name, dtype, shape, initial_states=[])

Sequence Batching state configuration.

Read more in the Triton Inference Server model configuration.
Parameters:

- input_name (str) – The name of the model state input.
- output_name (str) – The name of the model state output.
- dtype (Union[dtype, Type[dtype]]) – The data-type of the state.
- shape (Tuple[int, ...]) – The shape of the state tensor.
- initial_states (List[SequenceBatcherInitialState], default: []) – The optional field to specify the list of initial states for the model.
 
__post_init__
Validate the configuration for early error handling.
Source code in model_navigator/triton/specialized_configs/common.py
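Putting it together, a sketch of an implicit state that Triton carries from one request of a sequence to the next (tensor names and shapes are illustrative):

```python
import numpy as np

from model_navigator.api.triton import (
    SequenceBatcherInitialState,
    SequenceBatcherState,
)

# Triton reads HIDDEN_OUT after each request and feeds it back as HIDDEN_IN.
state = SequenceBatcherState(
    input_name="HIDDEN_IN",
    output_name="HIDDEN_OUT",
    dtype=np.dtype("float32"),
    shape=(1, 128),
    initial_states=[
        SequenceBatcherInitialState(
            name="hidden_init",
            shape=(1, 128),
            dtype=np.dtype("float32"),
            zero_data=True,
        )
    ],
)
```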
            
          
model_navigator.api.triton.SequenceBatcherStrategyDirect

dataclass

SequenceBatcherStrategyDirect(max_queue_delay_microseconds=0, minimum_slot_utilization=0.0)

Sequence Batching strategy direct configuration.

Read more in the Triton Inference Server model configuration.
Parameters:

- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a candidate request will be delayed in the sequence batch scheduling queue to wait for additional requests for batching.
- minimum_slot_utilization (float, default: 0.0) – The minimum slot utilization that must be satisfied to execute the batch before 'max_queue_delay_microseconds' expires.
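For instance, to wait at most 1 ms for a batch but dispatch earlier once half of the batch slots are occupied (values illustrative):

```python
from model_navigator.api.triton import SequenceBatcherStrategyDirect

strategy = SequenceBatcherStrategyDirect(
    max_queue_delay_microseconds=1_000,  # 1 ms upper bound on queueing delay
    minimum_slot_utilization=0.5,        # fire once 50% of slots are filled
)
```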
 
model_navigator.api.triton.SequenceBatcherStrategyOldest

dataclass

SequenceBatcherStrategyOldest(max_candidate_sequences, preferred_batch_size=[], max_queue_delay_microseconds=0)

Sequence Batching strategy oldest configuration.

Read more in the Triton Inference Server model configuration.
Parameters:

- max_candidate_sequences (int) – Maximum number of candidate sequences that the batcher maintains.
- preferred_batch_size (List[int], default: []) – Preferred batch sizes for dynamic batching of candidate sequences.
- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a candidate request will be delayed in the dynamic batch scheduling queue to wait for additional requests for batching.
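A closing sketch that plugs the oldest-first strategy into a SequenceBatcher (all values illustrative):

```python
from model_navigator.api.triton import (
    SequenceBatcher,
    SequenceBatcherStrategyOldest,
)

# Keep up to 16 candidate sequences; prefer batches of 4 or 8, delaying a
# request at most 100 us to reach a preferred size.
batcher = SequenceBatcher(
    strategy=SequenceBatcherStrategyOldest(
        max_candidate_sequences=16,
        preferred_batch_size=[4, 8],
        max_queue_delay_microseconds=100,
    ),
)
```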