
Sequence Batcher

model_navigator.api.triton.SequenceBatcher dataclass

Sequence batching configuration.

Read more in Triton Inference Server model configuration

Parameters:

model_navigator.api.triton.SequenceBatcherControl dataclass

Sequence Batching control configuration.

Read more in Triton Inference Server model configuration

Parameters:

  • kind (SequenceBatcherControlKind) –

    The kind of this control.

  • dtype (Optional[Union[dtype, Type[numpy.dtype]]], default: None) –

    The control's datatype.

  • int32_false_true (List[int], default: []) –

    The control's false and true values, communicated to the model in an int32 tensor.

  • fp32_false_true (List[float], default: []) –

    The control's false and true values, communicated to the model in a fp32 tensor.

  • bool_false_true (List[bool], default: []) –

    The control's false and true values, communicated to the model in a bool tensor.

model_navigator.api.triton.SequenceBatcherControlInput dataclass

Sequence Batching control input configuration.

Read more in Triton Inference Server model configuration

Parameters:

  • input_name (str) –

    The name of the model input.

  • controls (List[SequenceBatcherControl]) –

    List of control value(s) that should be communicated to the model using this model input.
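
The control classes above compose into a control input: each model input named in input_name carries one or more control signals. The following sketch uses local stand-ins that mirror the documented signatures (with Model Navigator installed, the same names would be imported from model_navigator.api.triton); the "START" input name and the [0, 1] false/true values are illustrative choices, not defaults.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SequenceBatcherControlKind(Enum):
    # Stand-in mirroring the documented enum members.
    CONTROL_SEQUENCE_START = "CONTROL_SEQUENCE_START"
    CONTROL_SEQUENCE_READY = "CONTROL_SEQUENCE_READY"
    CONTROL_SEQUENCE_END = "CONTROL_SEQUENCE_END"
    CONTROL_SEQUENCE_CORRID = "CONTROL_SEQUENCE_CORRID"


@dataclass
class SequenceBatcherControl:
    # Stand-in mirroring the documented dataclass fields.
    kind: SequenceBatcherControlKind
    int32_false_true: List[int] = field(default_factory=list)
    fp32_false_true: List[float] = field(default_factory=list)
    bool_false_true: List[bool] = field(default_factory=list)


@dataclass
class SequenceBatcherControlInput:
    input_name: str
    controls: List[SequenceBatcherControl]


# A "START" model input that signals sequence start:
# 0 (false) / 1 (true), delivered in an int32 tensor.
start_input = SequenceBatcherControlInput(
    input_name="START",
    controls=[
        SequenceBatcherControl(
            kind=SequenceBatcherControlKind.CONTROL_SEQUENCE_START,
            int32_false_true=[0, 1],
        )
    ],
)
print(start_input.controls[0].kind.value)  # CONTROL_SEQUENCE_START
```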

model_navigator.api.triton.SequenceBatcherControlKind

Bases: Enum

Sequence Batching control options.

Read more in Triton Inference Server model configuration

Members:

  • CONTROL_SEQUENCE_START

    "CONTROL_SEQUENCE_START"

  • CONTROL_SEQUENCE_READY

    "CONTROL_SEQUENCE_READY"

  • CONTROL_SEQUENCE_END

    "CONTROL_SEQUENCE_END"

  • CONTROL_SEQUENCE_CORRID

    "CONTROL_SEQUENCE_CORRID"

model_navigator.api.triton.SequenceBatcherInitialState dataclass

Sequence Batching initial state configuration.

Read more in Triton Inference Server model configuration

Parameters:

  • name (str) –

    The name of the initial state.

  • shape (Tuple[int, ...]) –

    The shape of the state tensor, not including the batch dimension.

  • dtype (Optional[Union[dtype, Type[numpy.dtype]]], default: None) –

    The data-type of the state.

  • zero_data (Optional[bool], default: None) –

    If True, all-zero data is used as the initial state.

  • data_file (Optional[pathlib.Path], default: None) –

    The file whose content will be used as the initial data for the state, in row-major order.
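
A minimal construction sketch, again using a local stand-in that mirrors the documented fields (the field names and types come from the signature above; the "hidden_init" name and (1, 128) shape are hypothetical). Note that zero_data and data_file are alternatives: set one or the other.

```python
import pathlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SequenceBatcherInitialState:
    # Stand-in mirroring the documented fields; dtype is typed loosely
    # here (the real class accepts numpy dtypes).
    name: str
    shape: Tuple[int, ...]
    dtype: Optional[type] = None
    zero_data: Optional[bool] = None
    data_file: Optional[pathlib.Path] = None


# An all-zero initial state of shape (1, 128); the batch dimension
# is not included in the shape.
zero_state = SequenceBatcherInitialState(
    name="hidden_init",
    shape=(1, 128),
    zero_data=True,
)
```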

model_navigator.api.triton.SequenceBatcherState dataclass

Sequence Batching state configuration.

Read more in Triton Inference Server model configuration

Parameters:

  • input_name (str) –

    The name of the model state input.

  • output_name (str) –

    The name of the model state output.

  • dtype (Union[dtype, Type[numpy.dtype]]) –

    The data-type of the state.

  • shape (Tuple[int, ...]) –

    The shape of the state tensor.

  • initial_states (List[SequenceBatcherInitialState], default: []) –

    Optional list of initial states for the model.
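
Putting state and initial state together: a state ties a model input to a model output so the server can carry the tensor between requests of a sequence. The sketch below uses local stand-ins mirroring the documented fields; the "state_in"/"state_out" names, the (1, 256) shape, and the float dtype placeholder are hypothetical.

```python
import pathlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SequenceBatcherInitialState:
    name: str
    shape: Tuple[int, ...]
    zero_data: Optional[bool] = None
    data_file: Optional[pathlib.Path] = None


@dataclass
class SequenceBatcherState:
    # Stand-in mirroring the documented fields; dtype is typed loosely
    # here (the real class accepts numpy dtypes, e.g. np.float32).
    input_name: str
    output_name: str
    dtype: type
    shape: Tuple[int, ...]
    initial_states: List[SequenceBatcherInitialState] = field(default_factory=list)


# The model reads the state from "state_in" and writes the updated
# state to "state_out"; the first request of a sequence starts from zeros.
state = SequenceBatcherState(
    input_name="state_in",
    output_name="state_out",
    dtype=float,  # placeholder for a numpy dtype
    shape=(1, 256),
    initial_states=[
        SequenceBatcherInitialState(name="zeroed", shape=(1, 256), zero_data=True)
    ],
)
```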

model_navigator.api.triton.SequenceBatcherStrategyDirect dataclass

Sequence Batching strategy direct configuration.

Read more in Triton Inference Server model configuration

Parameters:

  • max_queue_delay_microseconds (int, default: 0 ) –

    The maximum time, in microseconds, a candidate request will be delayed in the sequence batch scheduling queue to wait for additional requests for batching.

  • minimum_slot_utilization (float, default: 0.0 ) –

    The minimum slot utilization that must be satisfied to execute the batch before 'max_queue_delay_microseconds' expires.
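
These two fields map onto the direct block of Triton's config.pbtxt. The helper below is illustrative only (it is not part of Model Navigator); it just shows the shape of the resulting configuration fragment, assuming the usual protobuf-text rendering.

```python
def render_direct(max_queue_delay_microseconds: int = 0,
                  minimum_slot_utilization: float = 0.0) -> str:
    # Render the "direct" strategy fragment of a Triton config.pbtxt.
    return (
        "direct {\n"
        f"  max_queue_delay_microseconds: {max_queue_delay_microseconds}\n"
        f"  minimum_slot_utilization: {minimum_slot_utilization}\n"
        "}"
    )


# Delay batches up to 100 ms, or execute earlier once half the slots are full.
print(render_direct(100000, 0.5))
```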

model_navigator.api.triton.SequenceBatcherStrategyOldest dataclass

Sequence Batching strategy oldest configuration.

Read more in Triton Inference Server model configuration

Parameters:

  • max_candidate_sequences (int) –

    Maximum number of candidate sequences that the batcher maintains.

  • preferred_batch_size (List[int], default: []) –

    Preferred batch sizes for dynamic batching of candidate sequences.

  • max_queue_delay_microseconds (int, default: 0 ) –

    The maximum time, in microseconds, a candidate request will be delayed in the dynamic batch scheduling queue to wait for additional requests for batching.
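
Analogously, these fields map onto the oldest block of Triton's config.pbtxt, with preferred_batch_size as a repeated field. The helper below is an illustrative sketch, not part of Model Navigator; the example values (4 candidate sequences, preferred batch sizes 2 and 4, 100 µs delay) are hypothetical.

```python
from typing import List, Optional


def render_oldest(max_candidate_sequences: int,
                  preferred_batch_size: Optional[List[int]] = None,
                  max_queue_delay_microseconds: int = 0) -> str:
    # Render the "oldest" strategy fragment of a Triton config.pbtxt;
    # repeated fields are emitted one line per value.
    lines = [f"  max_candidate_sequences: {max_candidate_sequences}"]
    for size in preferred_batch_size or []:
        lines.append(f"  preferred_batch_size: {size}")
    lines.append(f"  max_queue_delay_microseconds: {max_queue_delay_microseconds}")
    return "oldest {\n" + "\n".join(lines) + "\n}"


print(render_oldest(4, [2, 4], 100))
```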