
Sequence Batcher

model_navigator.triton.SequenceBatcher dataclass

SequenceBatcher(strategy=None, max_sequence_idle_microseconds=None, control_inputs=[], states=[])

Sequence batching configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

  • strategy (Optional[Union[SequenceBatcherStrategyDirect, SequenceBatcherStrategyOldest]], default: None ) –

    The strategy used by the sequence batcher.

  • max_sequence_idle_microseconds (Optional[int], default: None ) –

    The maximum time, in microseconds, that a sequence is allowed to be idle before it is aborted.

  • control_inputs (List[SequenceBatcherControlInput], default: [] ) –

    The model input(s) that the server should use to communicate sequence start, stop, ready and similar control values to the model.

  • states (List[SequenceBatcherState], default: [] ) –

    The optional state that can be stored in Triton for performing inference requests on a sequence.
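Putting the fields together, a minimal configuration sketch (assuming `model_navigator` is installed; the import path follows the class names documented on this page):

```python
from model_navigator.triton import (
    SequenceBatcher,
    SequenceBatcherStrategyOldest,
)

# Batch the oldest candidate sequences; abort sequences idle for > 1 s.
batcher = SequenceBatcher(
    strategy=SequenceBatcherStrategyOldest(
        max_candidate_sequences=4,
        preferred_batch_size=[2, 4],
    ),
    max_sequence_idle_microseconds=1_000_000,
)
```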

__post_init__

__post_init__()

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/common.py
def __post_init__(self):
    """Validate the configuration for early error handling."""
    if self.strategy and (
        not isinstance(self.strategy, SequenceBatcherStrategyDirect)
        and not isinstance(self.strategy, SequenceBatcherStrategyOldest)
    ):
        raise ModelNavigatorWrongParameterError("Unsupported strategy type provided.")

model_navigator.triton.SequenceBatcherControl dataclass

SequenceBatcherControl(kind, dtype=None, int32_false_true=[], fp32_false_true=[], bool_false_true=[])

Sequence Batching control configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

  • kind (SequenceBatcherControlKind) –

    The kind of this control.

  • dtype (Optional[Union[dtype, Type[dtype]]], default: None ) –

    The control's datatype.

  • int32_false_true (List[int], default: [] ) –

    The control's false and true settings, indicated by two values in an int32 tensor.

  • fp32_false_true (List[float], default: [] ) –

    The control's false and true settings, indicated by two values in a fp32 tensor.

  • bool_false_true (List[bool], default: [] ) –

    The control's false and true settings, indicated by two values in a bool tensor.

__post_init__

__post_init__()

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/common.py
def __post_init__(self):
    """Validate the configuration for early error handling."""
    if self.kind == SequenceBatcherControlKind.CONTROL_SEQUENCE_CORRID and self.dtype is None:
        raise ModelNavigatorWrongParameterError(f"The {self.kind} control type requires `dtype` to be specified.")

    if self.kind == SequenceBatcherControlKind.CONTROL_SEQUENCE_CORRID and any([
        self.int32_false_true,
        self.fp32_false_true,
        self.bool_false_true,
    ]):
        raise ModelNavigatorWrongParameterError(
            f"The {self.kind} control type requires `dtype` to be specified only."
        )

    controls = [
        SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
        SequenceBatcherControlKind.CONTROL_SEQUENCE_END,
        SequenceBatcherControlKind.CONTROL_SEQUENCE_READY,
    ]

    if self.kind in controls and self.dtype:
        raise ModelNavigatorWrongParameterError(f"The {self.kind} control does not support `dtype` parameter.")

    if self.kind in controls and not (self.int32_false_true or self.fp32_false_true or self.bool_false_true):
        raise ModelNavigatorWrongParameterError(
            f"The {self.kind} control type requires one of: "
            "`int32_false_true`, `fp32_false_true`, `bool_false_true` to be specified."
        )

    if self.int32_false_true and len(self.int32_false_true) != 2:
        raise ModelNavigatorWrongParameterError(
            "The `int32_false_true` field should be two element list with false and true values. Example: [0 , 1]"
        )

    if self.fp32_false_true and len(self.fp32_false_true) != 2:
        raise ModelNavigatorWrongParameterError(
            "The `fp32_false_true` field should be two element list with false and true values. Example: [0 , 1]"
        )

    if self.bool_false_true and len(self.bool_false_true) != 2:
        raise ModelNavigatorWrongParameterError(
            "The `bool_false_true` field should be two element list with false and true values. "
            "Example: [False, True]"
        )

    if self.dtype:
        self.dtype = cast_dtype(dtype=self.dtype)

    expect_type("dtype", self.dtype, np.dtype, optional=True)
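The checks above form a small decision table: a CORRID control carries `dtype` only, while the START/END/READY controls carry no `dtype` and exactly one two-element false/true list. A stand-alone sketch of that table (plain Python with string kinds; `validate_control` is a hypothetical helper, not part of the library):

```python
# Minimal stand-in for the documented validation rules; the real checks
# live in SequenceBatcherControl.__post_init__.
def validate_control(kind, dtype=None, int32=None, fp32=None, boolean=None):
    """Return an error message for an invalid combination, or None."""
    value_lists = [v for v in (int32, fp32, boolean) if v]
    if kind == "CONTROL_SEQUENCE_CORRID":
        if dtype is None:
            return "CORRID requires `dtype`"
        if value_lists:
            return "CORRID takes `dtype` only"
    else:  # START, END, READY
        if dtype is not None:
            return "control does not support `dtype`"
        if not value_lists:
            return "one false/true list is required"
    for values in value_lists:
        if len(values) != 2:
            return "false/true list must have exactly two elements"
    return None
```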

model_navigator.triton.SequenceBatcherControlInput dataclass

SequenceBatcherControlInput(input_name, controls)

Sequence Batching control input configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

  • input_name (str) –

    The name of the model input.

  • controls (List[SequenceBatcherControl]) –

    List of control value(s) that should be communicated to the model using this model input.
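A single model input can carry several controls; a sketch wiring the start and ready flags to one int32 input (assuming `model_navigator` is installed):

```python
from model_navigator.triton import (
    SequenceBatcherControl,
    SequenceBatcherControlInput,
    SequenceBatcherControlKind,
)

# One int32 input tensor signals both "sequence started" and "slot ready".
control_input = SequenceBatcherControlInput(
    input_name="CONTROL",
    controls=[
        SequenceBatcherControl(
            kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
            int32_false_true=[0, 1],
        ),
        SequenceBatcherControl(
            kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_READY,
            int32_false_true=[0, 1],
        ),
    ],
)
```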

model_navigator.triton.SequenceBatcherControlKind

Bases: Enum

Sequence Batching control options.

Read more in the Triton Inference Server model configuration.

Members:

  • CONTROL_SEQUENCE_START

    "CONTROL_SEQUENCE_START"

  • CONTROL_SEQUENCE_READY

    "CONTROL_SEQUENCE_READY"

  • CONTROL_SEQUENCE_END

    "CONTROL_SEQUENCE_END"

  • CONTROL_SEQUENCE_CORRID

    "CONTROL_SEQUENCE_CORRID"

model_navigator.triton.SequenceBatcherInitialState dataclass

SequenceBatcherInitialState(name, shape, dtype=None, zero_data=None, data_file=None)

Sequence Batching initial state configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

  • name (str) –

    The name of the initial state.

  • shape (Tuple[int, ...]) –

    The shape of the state tensor, not including the batch dimension.

  • dtype (Optional[Union[dtype, Type[dtype]]], default: None ) –

    The data-type of the state.

  • zero_data (Optional[bool], default: None ) –

    The identifier for using zeros as initial state data.

  • data_file (Optional[Path], default: None ) –

    The file whose content will be used as the initial data for the state in row-major order.

__post_init__

__post_init__()

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/common.py
def __post_init__(self):
    """Validate the configuration for early error handling."""
    if not self.zero_data and not self.data_file:
        raise ModelNavigatorWrongParameterError("zero_data or data_file has to be defined. None was provided.")

    if self.zero_data and self.data_file:
        raise ModelNavigatorWrongParameterError("zero_data or data_file has to be defined. Both were provided.")

    if self.dtype:
        self.dtype = cast_dtype(dtype=self.dtype)

    expect_type("name", self.name, str)
    expect_type("shape", self.shape, tuple)
    expect_type("dtype", self.dtype, np.dtype, optional=True)
    is_shape_correct("shape", self.shape)
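The data-source rule above is an exclusive-or: exactly one of `zero_data` and `data_file` must be set. A stand-alone sketch of that check (a hypothetical helper, not the library's API):

```python
# Minimal stand-in for the documented exclusive-or rule; the real check
# lives in SequenceBatcherInitialState.__post_init__.
def validate_initial_state_source(zero_data=None, data_file=None):
    """Exactly one of `zero_data` and `data_file` must be provided."""
    if not zero_data and not data_file:
        raise ValueError("zero_data or data_file has to be defined. None was provided.")
    if zero_data and data_file:
        raise ValueError("zero_data or data_file has to be defined. Both were provided.")
```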

model_navigator.triton.SequenceBatcherState dataclass

SequenceBatcherState(input_name, output_name, dtype, shape, initial_states=[])

Sequence Batching state configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

  • input_name (str) –

    The name of the model state input.

  • output_name (str) –

    The name of the model state output.

  • dtype (Union[dtype, Type[dtype]]) –

    The data-type of the state.

  • shape (Tuple[int, ...]) –

    The shape of the state tensor.

  • initial_states (List[SequenceBatcherInitialState], default: [] ) –

    The optional field to specify the list of initial states for the model.
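A sketch of an implicit-state entry that is fed back from the state output to the state input between steps and starts from zeroed data (assuming `model_navigator` and `numpy` are installed; the tensor names are illustrative):

```python
import numpy as np

from model_navigator.triton import (
    SequenceBatcherInitialState,
    SequenceBatcherState,
)

# State tensor carried from OUTPUT_STATE back to INPUT_STATE between
# inference steps, initialized to zeros at the start of each sequence.
state = SequenceBatcherState(
    input_name="INPUT_STATE",
    output_name="OUTPUT_STATE",
    dtype=np.float32,
    shape=(128,),
    initial_states=[
        SequenceBatcherInitialState(
            name="INPUT_STATE",
            shape=(128,),
            dtype=np.float32,
            zero_data=True,
        ),
    ],
)
```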

__post_init__

__post_init__()

Validate the configuration for early error handling.

Source code in model_navigator/triton/specialized_configs/common.py
def __post_init__(self):
    """Validate the configuration for early error handling."""
    self.dtype = cast_dtype(dtype=self.dtype)

    expect_type("shape", self.shape, tuple)
    expect_type("dtype", self.dtype, np.dtype, optional=True)
    is_shape_correct("shape", self.shape)

model_navigator.triton.SequenceBatcherStrategyDirect dataclass

SequenceBatcherStrategyDirect(max_queue_delay_microseconds=0, minimum_slot_utilization=0.0)

Sequence Batching strategy direct configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

  • max_queue_delay_microseconds (int, default: 0 ) –

    The maximum time, in microseconds, a candidate request will be delayed in the sequence batch scheduling queue to wait for additional requests for batching.

  • minimum_slot_utilization (float, default: 0.0 ) –

    The minimum slot utilization that must be satisfied to execute the batch before 'max_queue_delay_microseconds' expires.

model_navigator.triton.SequenceBatcherStrategyOldest dataclass

SequenceBatcherStrategyOldest(max_candidate_sequences, preferred_batch_size=[], max_queue_delay_microseconds=0)

Sequence Batching strategy oldest configuration.

Read more in the Triton Inference Server model configuration.

Parameters:

  • max_candidate_sequences (int) –

    Maximum number of candidate sequences that the batcher maintains.

  • preferred_batch_size (List[int], default: [] ) –

    Preferred batch sizes for dynamic batching of candidate sequences.

  • max_queue_delay_microseconds (int, default: 0 ) –

    The maximum time, in microseconds, a candidate request will be delayed in the dynamic batch scheduling queue to wait for additional requests for batching.
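The two strategies trade latency for utilization: Direct dedicates a batch slot to each sequence, while Oldest dynamically batches across the oldest candidate sequences. A sketch constructing each (assuming `model_navigator` is installed; the numbers are illustrative):

```python
from model_navigator.triton import (
    SequenceBatcherStrategyDirect,
    SequenceBatcherStrategyOldest,
)

# Direct: wait up to 100 us for more requests, but allow execution
# before the delay expires once half the batch slots are occupied.
direct = SequenceBatcherStrategyDirect(
    max_queue_delay_microseconds=100,
    minimum_slot_utilization=0.5,
)

# Oldest: track up to 8 candidate sequences and prefer batches of 4.
oldest = SequenceBatcherStrategyOldest(
    max_candidate_sequences=8,
    preferred_batch_size=[4],
    max_queue_delay_microseconds=100,
)
```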