PyTriton

model_navigator.api.pytriton

Public API definition for PyTriton related functionality.

DynamicBatcher dataclass

Dynamic batcher configuration.

More in Triton Inference Server documentation

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `max_queue_delay_microseconds` | `int` | The maximum time, in microseconds, a request is delayed in the scheduling queue to wait for additional requests for batching. | `0` |
| `preferred_batch_size` | `Optional[list]` | Preferred batch sizes for dynamic batching. | `None` |
| `preserve_ordering` | `bool` | Whether the dynamic batcher should preserve the ordering of responses to match the order of requests received by the scheduler. | `False` |
| `priority_levels` | `int` | The number of priority levels to be enabled for the model. | `0` |
| `default_priority_level` | `int` | The priority level used for requests that do not specify their priority. | `0` |
| `default_queue_policy` | `Optional[QueuePolicy]` | The default queue policy used for requests. | `None` |
| `priority_queue_policy` | `Optional[Dict[int, QueuePolicy]]` | The queue policy for each priority level. | `None` |
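
To make the fields and defaults above concrete, here is a minimal standalone sketch of an equivalent dataclass. This is an illustration only, not an import of the real `model_navigator` class; `object` stands in for `QueuePolicy` to keep the sketch self-contained:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DynamicBatcher:
    """Standalone sketch mirroring the documented fields and defaults."""
    max_queue_delay_microseconds: int = 0
    preferred_batch_size: Optional[list] = None
    preserve_ordering: bool = False
    priority_levels: int = 0
    default_priority_level: int = 0
    default_queue_policy: Optional[object] = None  # QueuePolicy in the real API
    priority_queue_policy: Optional[Dict[int, object]] = None


# Wait up to 100 us for additional requests, preferring batches of 2 or 4.
batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,
    preferred_batch_size=[2, 4],
)
```

With all defaults left in place, requests are scheduled immediately (`max_queue_delay_microseconds=0`) and no priority levels are enabled.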

ModelConfig dataclass

Additional model configuration for running a model through Triton Inference Server.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `batching` | `bool` | Flag to enable/disable batching for the model. | `True` |
| `max_batch_size` | `int` | The maximum batch size that the model will handle. | `4` |
| `batcher` | `DynamicBatcher` | Configuration of dynamic batching for the model. | `DynamicBatcher()` |
| `response_cache` | `bool` | Flag to enable/disable the response cache for the model. | `False` |
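
A hedged sketch of how these pieces compose, again using standalone dataclasses that stand in for the real `model_navigator` classes (only `max_queue_delay_microseconds` is modeled on the batcher side):

```python
from dataclasses import dataclass, field


@dataclass
class DynamicBatcher:
    """Stand-in for the real DynamicBatcher (one field shown)."""
    max_queue_delay_microseconds: int = 0


@dataclass
class ModelConfig:
    """Standalone sketch mirroring the documented ModelConfig fields and defaults."""
    batching: bool = True
    max_batch_size: int = 4
    batcher: DynamicBatcher = field(default_factory=DynamicBatcher)
    response_cache: bool = False


# Allow batches of up to 16 requests, waiting up to 1 ms to fill them.
config = ModelConfig(
    max_batch_size=16,
    batcher=DynamicBatcher(max_queue_delay_microseconds=1000),
)
```

Note the `default_factory` for `batcher`: a mutable default like `DynamicBatcher()` must be created per instance, which matches the `DynamicBatcher()` default shown in the table above.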

PyTritonAdapter(package, strategy=None)

Provides the model and configuration for PyTriton deployment.

Initialize PyTritonAdapter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `package` | `Package` | A package object to be searched for the best possible model. | required |
| `strategy` | `Optional[RuntimeSearchStrategy]` | Strategy for finding the best model. Defaults to `MaxThroughputAndMinLatencyStrategy`. | `None` |
Source code in model_navigator/api/pytriton.py
```python
def __init__(self, package: Package, strategy: Optional[RuntimeSearchStrategy] = None):
    """Initialize PyTritonAdapter.

    Args:
        package: A package object to be searched for best possible model.
        strategy: Strategy for finding the best model. Defaults to `MaxThroughputAndMinLatencyStrategy`
    """
    self._package = package
    self._strategy = MaxThroughputAndMinLatencyStrategy() if strategy is None else strategy
    self._runner = self._package.get_runner(strategy=self._strategy)
    self._batching = self._package.status.config.get("batch_dim", None) == 0
```
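
Two lines of this constructor carry the logic: the strategy falls back to `MaxThroughputAndMinLatencyStrategy` when none is given, and batching is reported as supported only when the package config's `batch_dim` is exactly `0` (batching on the first axis). A standalone sketch of that logic, with a stub standing in for the real strategy class:

```python
class MaxThroughputAndMinLatencyStrategy:
    """Stub standing in for the real runtime search strategy."""


def resolve_strategy(strategy=None):
    """Fall back to MaxThroughputAndMinLatencyStrategy when none is given."""
    return MaxThroughputAndMinLatencyStrategy() if strategy is None else strategy


def batching_supported(config: dict) -> bool:
    """Batching is supported only when the batch dimension is axis 0."""
    return config.get("batch_dim", None) == 0


print(batching_supported({"batch_dim": 0}))  # batch on the first axis -> True
print(batching_supported({}))                # no batch_dim at all -> False
```

The `== 0` comparison means a missing `batch_dim` (or one on any other axis) disables batching rather than raising an error.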

batching: bool property

Returns status of batching support by the runner.

Returns:

| Type | Description |
| --- | --- |
| `bool` | `True` if the runner supports batching, `False` otherwise. |

config: ModelConfig property

Returns the config for PyTriton.

Returns:

| Type | Description |
| --- | --- |
| `ModelConfig` | `ModelConfig` with the configuration for the PyTriton `bind` method. |

inputs: List[Tensor] property

Returns inputs configuration.

Returns:

| Type | Description |
| --- | --- |
| `List[Tensor]` | List of `Tensor` objects describing the inputs configuration of the runner. |

outputs: List[Tensor] property

Returns outputs configuration.

Returns:

| Type | Description |
| --- | --- |
| `List[Tensor]` | List of `Tensor` objects describing the outputs configuration of the runner. |

runner: NavigatorRunner property

Returns runner.

The runner must be activated before use with the `activate()` method.

Returns:

| Type | Description |
| --- | --- |
| `NavigatorRunner` | Model Navigator runner. |

QueuePolicy dataclass

Model queue policy configuration.

More in Triton Inference Server documentation

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `timeout_action` | `TimeoutAction` | The action applied to a timed-out request. | `TimeoutAction.REJECT` |
| `default_timeout_microseconds` | `int` | The default timeout for every request, in microseconds. | `0` |
| `allow_timeout_override` | `bool` | Whether an individual request can override the default timeout value. | `False` |
| `max_queue_size` | `int` | The maximum queue size for holding requests. | `0` |
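
The defaults above amount to "reject on timeout, no per-request override, unbounded queue". A standalone sketch mirroring the documented fields (stand-in classes, not the real `model_navigator` ones):

```python
import enum
from dataclasses import dataclass


class TimeoutAction(enum.Enum):
    """Stand-in mirroring the documented enum members."""
    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass
class QueuePolicy:
    """Standalone sketch mirroring the documented fields and defaults."""
    timeout_action: TimeoutAction = TimeoutAction.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0


# Delay (rather than reject) requests that exceed a 5 ms timeout.
policy = QueuePolicy(
    timeout_action=TimeoutAction.DELAY,
    default_timeout_microseconds=5000,
)
```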

Tensor dataclass

Model input and output definition for Triton deployment.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `shape` | `tuple` | Shape of the input/output tensor. | required |
| `dtype` | `Union[np.dtype, Type[np.dtype], Type[object]]` | Data type of the input/output tensor. | required |
| `name` | `Optional[str]` | Name of the model input/output. | `None` |
| `optional` | `Optional[bool]` | Flag to mark if the input is optional. | `False` |
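
A standalone sketch of the dataclass and a typical definition. The string dtype and the `-1` marking a dynamic axis are assumptions made to keep the sketch dependency-free and illustrative; the real class takes NumPy dtypes:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tensor:
    """Standalone sketch mirroring the documented Tensor fields."""
    shape: tuple
    dtype: object  # np.dtype in the real API
    name: Optional[str] = None
    optional: Optional[bool] = False


# A hypothetical image input; -1 here illustrates a dynamic axis.
image = Tensor(shape=(-1, 3, 224, 224), dtype="float32", name="image")
```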

TimeoutAction

Bases: enum.Enum

Timeout action definition for the `timeout_action` field of `QueuePolicy`.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `REJECT` | `str` | Reject the request and return an error message accordingly. | required |
| `DELAY` | `str` | Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed. | required |
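
To illustrate the two behaviors, here is a standalone sketch of the enum and a hypothetical dispatch a scheduler might perform on a timed-out request (the `on_timeout` helper is illustrative, not part of the real API):

```python
import enum


class TimeoutAction(enum.Enum):
    """Stand-in mirroring the documented enum members."""
    REJECT = "REJECT"
    DELAY = "DELAY"


def on_timeout(action: TimeoutAction) -> str:
    """Hypothetical dispatch on a timed-out request."""
    if action is TimeoutAction.REJECT:
        return "rejected: timeout exceeded"
    # DELAY: keep the request queued behind same-or-higher priority work.
    return "delayed: re-queued behind same-or-higher priority requests"


print(on_timeout(TimeoutAction.REJECT))  # rejected: timeout exceeded
```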