
PyTritonAdapter API

model_navigator.api.pytriton.PyTritonAdapter

model_navigator.api.pytriton.PyTritonAdapter(package, strategy=None, runner_return_type=TensorType.NUMPY)

Provides the model and configuration for PyTriton deployment.

Initialize PyTritonAdapter.

Parameters:

  • package (Package) –

A package object to be searched for the best possible model.

  • strategy (Optional[RuntimeSearchStrategy], default: None ) –

Strategy for finding the best model. Defaults to MaxThroughputAndMinLatencyStrategy.

  • runner_return_type (TensorType, default: NUMPY ) –

The type of the output tensor. Defaults to TensorType.NUMPY. If the selected type supports CUDA tensors (e.g. TensorType.TORCH) and the input tensors reside on CUDA, no additional data transfer between CPU and GPU is performed.

Source code in model_navigator/api/pytriton.py
def __init__(
    self,
    package: Package,
    strategy: Optional[RuntimeSearchStrategy] = None,
    runner_return_type: TensorType = TensorType.NUMPY,
):
    """Initialize PyTritonAdapter.

    Args:
        package: A package object to be searched for best possible model.
        strategy: Strategy for finding the best model. Defaults to `MaxThroughputAndMinLatencyStrategy`
        runner_return_type: The type of the output tensor. Defaults to `TensorType.NUMPY`.
            If the return_type supports CUDA tensors (e.g. TensorType.TORCH) and the input tensors are on CUDA,
            there will be no additional data transfer between CPU and GPU.
    """
    self._package = package
    self._strategy = MaxThroughputAndMinLatencyStrategy() if strategy is None else strategy
    self._runner = self._package.get_runner(strategy=self._strategy, return_type=runner_return_type)
    self._batching = self._package.status.config.get("batch_dim", None) == 0
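
As a minimal usage sketch (not part of the generated reference): load a previously optimized package and construct the adapter. The "model.nav" path is illustrative, and nav.package.load is assumed to be the package-loading entry point exposed at the top level of model_navigator.

import model_navigator as nav

# Load a package produced by an earlier optimize run;
# the "model.nav" path is purely illustrative.
package = nav.package.load("model.nav")

# Build the adapter; with strategy=None the default
# MaxThroughputAndMinLatencyStrategy is used.
adapter = nav.pytriton.PyTritonAdapter(package=package)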

batching property

batching: bool

Returns whether the runner supports batching.

Returns:

  • bool

True if the runner supports batching, False otherwise.

config property

config: ModelConfig

Returns the configuration for PyTriton.

Returns:

  • ModelConfig

ModelConfig with the configuration for the PyTriton bind method.

inputs property

inputs: List[Tensor]

Returns the inputs configuration.

Returns:

  • List[Tensor]

List of Tensor objects describing the inputs configuration of the runner.

outputs property

outputs: List[Tensor]

Returns the outputs configuration.

Returns:

  • List[Tensor]

List of Tensor objects describing the outputs configuration of the runner.

runner property

runner: NavigatorRunner

Returns the runner.

The runner must be activated with the activate() method before use.

Returns:

  • NavigatorRunner

    Model Navigator runner.
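
The properties above map directly onto the arguments of PyTriton's Triton.bind. A hedged end-to-end sketch, continuing from the adapter built earlier; the model name is illustrative, and the @batch decorator and bind signature follow PyTriton's public API:

from pytriton.decorators import batch
from pytriton.triton import Triton

runner = adapter.runner
runner.activate()  # the runner must be activated before inference

@batch
def infer_func(**inputs):
    # Delegate inference to the Model Navigator runner.
    return runner.infer(inputs)

with Triton() as triton:
    triton.bind(
        model_name="model",  # illustrative name
        infer_func=infer_func,
        inputs=adapter.inputs,
        outputs=adapter.outputs,
        config=adapter.config,
    )
    triton.serve()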

model_navigator.api.pytriton.ModelConfig dataclass

Additional model configuration for running the model through Triton Inference Server.

Parameters:

  • batching (bool, default: True ) –

Flag to enable/disable batching for the model.

  • max_batch_size (int, default: 4 ) –

The maximal batch size handled by the model.

  • batcher (DynamicBatcher, default: dataclasses.field(default_factory=DynamicBatcher) ) –

    Configuration of Dynamic Batching for the model.

  • response_cache (bool, default: False ) –

Flag to enable/disable the response cache for the model.
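
In typical use the config property of PyTritonAdapter builds this object for you; the following sketch constructs it by hand with illustrative values:

from model_navigator.api.pytriton import DynamicBatcher, ModelConfig

config = ModelConfig(
    batching=True,
    max_batch_size=16,
    batcher=DynamicBatcher(),  # default dynamic batching settings
    response_cache=False,
)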

model_navigator.api.pytriton.DynamicBatcher dataclass

Dynamic batcher configuration.

More details can be found in the Triton Inference Server documentation.

Parameters:

  • max_queue_delay_microseconds (int, default: 0 ) –

    The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching.

  • preferred_batch_size (Optional[list], default: None ) –

    Preferred batch sizes for dynamic batching.

  • preserve_ordering (bool, default: False ) –

Whether the dynamic batcher should preserve the ordering of responses to match the order of requests received by the scheduler.

  • priority_levels (int, default: 0 ) –

    The number of priority levels to be enabled for the model.

  • default_priority_level (int, default: 0 ) –

    The priority level used for requests that don't specify their priority.

  • default_queue_policy (Optional[QueuePolicy], default: None ) –

    The default queue policy used for requests.

  • priority_queue_policy (Optional[Dict[int, QueuePolicy]], default: None ) –

Specifies the queue policy for each priority level.
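
An illustrative configuration based on the fields above (all values are examples, not recommendations):

from model_navigator.api.pytriton import DynamicBatcher

batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,  # wait up to 100 us to fill a batch
    preferred_batch_size=[4, 8],       # batch sizes the scheduler aims for
    preserve_ordering=True,            # responses follow request order
)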

model_navigator.api.pytriton.Tensor dataclass

Model input and output definition for Triton deployment.

Parameters:

  • shape (tuple) –

    Shape of the input/output tensor.

  • dtype (Union[dtype, Type[numpy.dtype], Type[object]]) –

    Data type of the input/output tensor.

  • name (Optional[str], default: None ) –

Name of the model input/output.

  • optional (Optional[bool], default: False ) –

Flag to mark if the input is optional.
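
A sketch of input/output definitions; the tensor names are illustrative, and the -1 convention for dynamic dimensions follows PyTriton's tensor definitions:

import numpy as np
from model_navigator.api.pytriton import Tensor

# Shapes exclude the batch dimension; -1 marks a dynamic axis.
inputs = [Tensor(name="input__0", shape=(-1,), dtype=np.float32)]
outputs = [Tensor(name="output__0", shape=(-1,), dtype=np.float32)]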

model_navigator.api.pytriton.QueuePolicy dataclass

Model queue policy configuration.

More details can be found in the Triton Inference Server documentation.

Parameters:

  • timeout_action (TimeoutAction, default: REJECT ) –

The action applied to a timed-out request.

  • default_timeout_microseconds (int, default: 0 ) –

    The default timeout for every request, in microseconds.

  • allow_timeout_override (bool, default: False ) –

Whether an individual request can override the default timeout value.

  • max_queue_size (int, default: 0 ) –

    The maximum queue size for holding requests.
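
An illustrative policy that delays timed-out requests instead of rejecting them (values are examples only):

from model_navigator.api.pytriton import QueuePolicy, TimeoutAction

policy = QueuePolicy(
    timeout_action=TimeoutAction.DELAY,
    default_timeout_microseconds=1_000_000,  # 1 second
    allow_timeout_override=True,
    max_queue_size=128,
)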

model_navigator.api.pytriton.TimeoutAction

Bases: Enum

Timeout action definition for the timeout_action field of QueuePolicy.

Parameters:

  • REJECT (str) –

Reject the request and return an error message accordingly.

  • DELAY (str) –

    Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed.
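
Tying TimeoutAction back to the batcher: a sketch that enables two priority levels and overrides the queue policy for one of them, using only the fields documented above (values are illustrative):

from model_navigator.api.pytriton import DynamicBatcher, QueuePolicy, TimeoutAction

batcher = DynamicBatcher(
    priority_levels=2,         # enable priority levels 1 and 2
    default_priority_level=2,  # level for requests that specify no priority
    default_queue_policy=QueuePolicy(timeout_action=TimeoutAction.REJECT),
    # Override the policy for level 1 only; other levels use the default.
    priority_queue_policy={1: QueuePolicy(timeout_action=TimeoutAction.DELAY)},
)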