
Sequence Batcher

model_navigator.api.triton.SequenceBatcher dataclass

Sequence batching configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| strategy | Optional[Union[SequenceBatcherStrategyDirect, SequenceBatcherStrategyOldest]] | The strategy used by the sequence batcher. | None |
| max_sequence_idle_microseconds | Optional[int] | The maximum time, in microseconds, that a sequence is allowed to be idle before it is aborted. | None |
| control_inputs | List[SequenceBatcherControlInput] | The model input(s) that the server should use to communicate sequence start, stop, ready and similar control values to the model. | dataclasses.field(default_factory=lambda: []) |
| states | List[SequenceBatcherState] | The optional state that can be stored in Triton for performing inference requests on a sequence. | dataclasses.field(default_factory=lambda: []) |

__post_init__()

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/common.py
def __post_init__(self):
    """Validate the configuration for early error handling."""
    if self.strategy and (
        not isinstance(self.strategy, SequenceBatcherStrategyDirect)
        and not isinstance(self.strategy, SequenceBatcherStrategyOldest)
    ):
        raise ModelNavigatorWrongParameterError("Unsupported strategy type provided.")
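
The sketch below shows how a SequenceBatcher could be constructed from the parameters documented above. It is a minimal, illustrative example: it assumes the classes are importable from model_navigator.api.triton (as the qualified names in this reference suggest), and the chosen values are placeholders rather than recommendations.

```python
from model_navigator.api.triton import (
    SequenceBatcher,
    SequenceBatcherStrategyOldest,
)

batcher = SequenceBatcher(
    # Batch requests from the oldest active candidate sequences first.
    strategy=SequenceBatcherStrategyOldest(max_candidate_sequences=4),
    # Abort a sequence after 5 seconds without a new request.
    max_sequence_idle_microseconds=5_000_000,
)
```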

model_navigator.api.triton.SequenceBatcherControl dataclass

Sequence Batching control configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| kind | SequenceBatcherControlKind | The kind of this control. | required |
| dtype | Optional[Union[np.dtype, Type[np.dtype]]] | The control's datatype. | None |
| int32_false_true | List[int] | The control's true and false setting is indicated by setting a value in an int32 tensor. | dataclasses.field(default_factory=lambda: []) |
| fp32_false_true | List[float] | The control's true and false setting is indicated by setting a value in a fp32 tensor. | dataclasses.field(default_factory=lambda: []) |
| bool_false_true | List[bool] | The control's true and false setting is indicated by setting a value in a bool tensor. | dataclasses.field(default_factory=lambda: []) |

__post_init__()

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/common.py
def __post_init__(self):
    """Validate the configuration for early error handling."""
    if self.kind == SequenceBatcherControlKind.CONTROL_SEQUENCE_CORRID and self.dtype is None:
        raise ModelNavigatorWrongParameterError(f"The {self.kind} control type requires `dtype` to be specified.")

    if self.kind == SequenceBatcherControlKind.CONTROL_SEQUENCE_CORRID and any(
        [self.int32_false_true, self.fp32_false_true, self.bool_false_true]
    ):
        raise ModelNavigatorWrongParameterError(
            f"The {self.kind} control type requires `dtype` to be specified only."
        )

    controls = [
        SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
        SequenceBatcherControlKind.CONTROL_SEQUENCE_END,
        SequenceBatcherControlKind.CONTROL_SEQUENCE_READY,
    ]

    if self.kind in controls and self.dtype:
        raise ModelNavigatorWrongParameterError(f"The {self.kind} control does not support `dtype` parameter.")

    if self.kind in controls and not (self.int32_false_true or self.fp32_false_true or self.bool_false_true):
        raise ModelNavigatorWrongParameterError(
            f"The {self.kind} control type requires one of: "
            "`int32_false_true`, `fp32_false_true`, `bool_false_true` to be specified."
        )

    if self.int32_false_true and len(self.int32_false_true) != 2:
        raise ModelNavigatorWrongParameterError(
            "The `int32_false_true` field should be two element list with false and true values. Example: [0 , 1]"
        )

    if self.fp32_false_true and len(self.fp32_false_true) != 2:
        raise ModelNavigatorWrongParameterError(
            "The `fp32_false_true` field should be two element list with false and true values. Example: [0 , 1]"
        )

    if self.bool_false_true and len(self.bool_false_true) != 2:
        raise ModelNavigatorWrongParameterError(
            "The `bool_false_true` field should be two element list with false and true values. "
            "Example: [False, True]"
        )

    if self.dtype:
        self.dtype = cast_dtype(dtype=self.dtype)

    expect_type("dtype", self.dtype, np.dtype, optional=True)

model_navigator.api.triton.SequenceBatcherControlInput dataclass

Sequence Batching control input configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| input_name | str | The name of the model input. | required |
| controls | List[SequenceBatcherControl] | List of control value(s) that should be communicated to the model using this model input. | required |
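
Continuing the sketch, a control input binds controls such as the ones above to a named model input. The input name "START" below is a placeholder; it has to match an input defined for your model.

```python
from model_navigator.api.triton import (
    SequenceBatcherControl,
    SequenceBatcherControlInput,
    SequenceBatcherControlKind,
)

# Deliver the sequence-start flag to the model through an input named "START".
start_input = SequenceBatcherControlInput(
    input_name="START",
    controls=[
        SequenceBatcherControl(
            kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
            fp32_false_true=[0.0, 1.0],
        )
    ],
)
```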

model_navigator.api.triton.SequenceBatcherControlKind

Bases: enum.Enum

Sequence Batching control options.

Read more in the Triton Inference Server model configuration.

Members:

| Name | Value |
|------|-------|
| CONTROL_SEQUENCE_START | "CONTROL_SEQUENCE_START" |
| CONTROL_SEQUENCE_READY | "CONTROL_SEQUENCE_READY" |
| CONTROL_SEQUENCE_END | "CONTROL_SEQUENCE_END" |
| CONTROL_SEQUENCE_CORRID | "CONTROL_SEQUENCE_CORRID" |

model_navigator.api.triton.SequenceBatcherInitialState dataclass

Sequence Batching initial state configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| name | str | The name of the initial state. | required |
| shape | Tuple[int, ...] | The shape of the state tensor, not including the batch dimension. | required |
| dtype | Optional[Union[np.dtype, Type[np.dtype]]] | The data-type of the state. | None |
| zero_data | Optional[bool] | The identifier for using zeros as initial state data. | None |
| data_file | Optional[str] | The file whose content will be used as the initial data for the state in row-major order. | None |

__post_init__()

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/common.py
def __post_init__(self):
    """Validate the configuration for early error handling."""
    if not self.zero_data and not self.data_file:
        raise ModelNavigatorWrongParameterError("zero_data or data_file has to be defined. None was provided.")

    if self.zero_data and self.data_file:
        raise ModelNavigatorWrongParameterError("zero_data or data_file has to be defined. Both were provided.")

    if self.dtype:
        self.dtype = cast_dtype(dtype=self.dtype)

    expect_type("name", self.name, str)
    expect_type("shape", self.shape, tuple)
    expect_type("dtype", self.dtype, np.dtype, optional=True)
    is_shape_correct("shape", self.shape)
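
As a sketch of the rules enforced above, exactly one of zero_data and data_file has to be provided. The example below zero-initializes a hypothetical state tensor; the name, shape, and dtype are placeholders.

```python
import numpy as np

from model_navigator.api.triton import SequenceBatcherInitialState

initial_state = SequenceBatcherInitialState(
    name="zeroed_hidden",  # illustrative name
    shape=(1, 128),        # state shape, without the batch dimension
    dtype=np.float32,
    zero_data=True,        # use zeros as the initial data (instead of data_file)
)
```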

model_navigator.api.triton.SequenceBatcherState dataclass

Sequence Batching state configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| input_name | str | The name of the model state input. | required |
| output_name | str | The name of the model state output. | required |
| dtype | Union[np.dtype, Type[np.dtype]] | The data-type of the state. | required |
| shape | Tuple[int, ...] | The shape of the state tensor. | required |
| initial_states | List[SequenceBatcherInitialState] | The optional field to specify the list of initial states for the model. | dataclasses.field(default_factory=lambda: []) |

__post_init__()

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/common.py
def __post_init__(self):
    """Validate the configuration for early error handling."""
    self.dtype = cast_dtype(dtype=self.dtype)

    expect_type("shape", self.shape, tuple)
    expect_type("dtype", self.dtype, np.dtype, optional=True)
    is_shape_correct("shape", self.shape)
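
A hedged example of a state definition that reuses the initial state sketch above; the input/output names, shape, and dtype are illustrative only.

```python
import numpy as np

from model_navigator.api.triton import (
    SequenceBatcherInitialState,
    SequenceBatcherState,
)

hidden_state = SequenceBatcherState(
    input_name="HIDDEN_IN",    # model input that receives the previous state
    output_name="HIDDEN_OUT",  # model output that produces the next state
    dtype=np.float32,
    shape=(1, 128),
    initial_states=[
        SequenceBatcherInitialState(
            name="zeroed_hidden",
            shape=(1, 128),
            dtype=np.float32,
            zero_data=True,
        )
    ],
)
```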

model_navigator.api.triton.SequenceBatcherStrategyDirect dataclass

Sequence Batching strategy direct configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| max_queue_delay_microseconds | int | The maximum time, in microseconds, a candidate request will be delayed in the sequence batch scheduling queue to wait for additional requests for batching. | 0 |
| minimum_slot_utilization | float | The minimum slot utilization that must be satisfied to execute the batch before 'max_queue_delay_microseconds' expires. | 0.0 |
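
For example, a Direct strategy could be configured as below; the values are placeholders, not recommendations.

```python
from model_navigator.api.triton import SequenceBatcherStrategyDirect

direct_strategy = SequenceBatcherStrategyDirect(
    # Hold a candidate request for at most 100 microseconds...
    max_queue_delay_microseconds=100,
    # ...but allow the batch to execute before the delay expires once
    # at least half of the batch slots are occupied.
    minimum_slot_utilization=0.5,
)
```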

model_navigator.api.triton.SequenceBatcherStrategyOldest dataclass

Sequence Batching strategy oldest configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| max_candidate_sequences | int | Maximum number of candidate sequences that the batcher maintains. | required |
| preferred_batch_size | List[int] | Preferred batch sizes for dynamic batching of candidate sequences. | dataclasses.field(default_factory=lambda: []) |
| max_queue_delay_microseconds | int | The maximum time, in microseconds, a candidate request will be delayed in the dynamic batch scheduling queue to wait for additional requests for batching. | 0 |
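
A hedged sketch of the Oldest strategy, attached to a SequenceBatcher as in the first example above; the values are illustrative.

```python
from model_navigator.api.triton import (
    SequenceBatcher,
    SequenceBatcherStrategyOldest,
)

oldest_strategy = SequenceBatcherStrategyOldest(
    max_candidate_sequences=8,   # track at most 8 in-flight candidate sequences
    preferred_batch_size=[4],    # prefer batches of 4 candidate requests
    max_queue_delay_microseconds=100,
)

batcher = SequenceBatcher(strategy=oldest_strategy)
```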