PyTriton
model_navigator.api.pytriton
Public API definition for PyTriton-related functionality.
DynamicBatcher
dataclass
Dynamic batcher configuration.
More details in the Triton Inference Server documentation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| max_queue_delay_microseconds | int | The maximum time, in microseconds, a request is delayed in the scheduling queue to wait for additional requests for batching. | 0 |
| preferred_batch_size | Optional[list] | Preferred batch sizes for dynamic batching. | None |
| preserve_ordering | bool | Whether the dynamic batcher should preserve the ordering of responses to match the order of requests received by the scheduler. | False |
| priority_levels | int | The number of priority levels to be enabled for the model. | 0 |
| default_priority_level | int | The priority level used for requests that don't specify their priority. | 0 |
| default_queue_policy | Optional[QueuePolicy] | The default queue policy used for requests. | None |
| priority_queue_policy | Optional[Dict[int, QueuePolicy]] | The queue policy per priority level. | None |

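As a sketch of how these fields combine, the stand-in dataclasses below mirror the documented field names and defaults (they are illustrative stand-ins, not the library source; in real code, import DynamicBatcher, QueuePolicy, and TimeoutAction from model_navigator.api.pytriton):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class TimeoutAction(Enum):
    # Stand-in mirroring the documented TimeoutAction members.
    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass
class QueuePolicy:
    # Stand-in mirroring the documented QueuePolicy defaults.
    timeout_action: TimeoutAction = TimeoutAction.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0


@dataclass
class DynamicBatcher:
    # Stand-in mirroring the documented DynamicBatcher defaults.
    max_queue_delay_microseconds: int = 0
    preferred_batch_size: Optional[List[int]] = None
    preserve_ordering: bool = False
    priority_levels: int = 0
    default_priority_level: int = 0
    default_queue_policy: Optional[QueuePolicy] = None
    priority_queue_policy: Optional[Dict[int, QueuePolicy]] = None


# Two priority levels; level 1 requests time out after 100 ms and are
# rejected (TimeoutAction.REJECT is the documented default action).
batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,
    preferred_batch_size=[4, 8],
    priority_levels=2,
    default_priority_level=2,
    priority_queue_policy={1: QueuePolicy(default_timeout_microseconds=100_000)},
)
```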
ModelConfig
dataclass
Additional model configuration for running model through Triton Inference Server.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batching | bool | Flag to enable/disable batching for the model. | True |
| max_batch_size | int | The maximal batch size handled by the model. | 4 |
| batcher | DynamicBatcher | Configuration of dynamic batching for the model. | DynamicBatcher() |
| response_cache | bool | Flag to enable/disable the response cache for the model. | False |

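A minimal sketch of configuring these fields, using stand-in dataclasses that mirror the documented names and defaults (not the library source; the real classes live in model_navigator.api.pytriton):

```python
from dataclasses import dataclass, field


@dataclass
class DynamicBatcher:
    # Minimal stand-in; see the DynamicBatcher section for the full field list.
    max_queue_delay_microseconds: int = 0


@dataclass
class ModelConfig:
    # Stand-in mirroring the documented ModelConfig defaults.
    batching: bool = True
    max_batch_size: int = 4
    batcher: DynamicBatcher = field(default_factory=DynamicBatcher)
    response_cache: bool = False


# Allow batches of up to 32 requests and let requests wait up to 1 ms
# in the queue to form a larger batch.
config = ModelConfig(
    max_batch_size=32,
    batcher=DynamicBatcher(max_queue_delay_microseconds=1000),
)
```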
PyTritonAdapter(package, strategy=None)
Provides the model and configuration for PyTriton deployment.
Initialize PyTritonAdapter.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| package | Package | A package object to be searched for the best possible model. | required |
| strategy | Optional[RuntimeSearchStrategy] | Strategy for finding the best model. | None |

Source code in model_navigator/api/pytriton.py
batching: bool
property
Returns status of batching support by the runner.
Returns:

| Type | Description |
|---|---|
| bool | True if the runner supports batching, False otherwise. |

config: ModelConfig
property
Returns the configuration for PyTriton.
Returns:

| Type | Description |
|---|---|
| ModelConfig | ModelConfig with configuration for the PyTriton bind method. |

inputs: List[Tensor]
property
Returns the inputs definition for the model.
outputs: List[Tensor]
property
Returns the outputs definition for the model.
runner: NavigatorRunner
property
Returns the runner.
The runner must be activated before use with the activate() method.
Returns:

| Type | Description |
|---|---|
| NavigatorRunner | Model Navigator runner. |

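The adapter's contract (a runner that must be activated before inference) can be illustrated with hypothetical stand-in classes; everything below is a sketch, not the real PyTritonAdapter or NavigatorRunner from model_navigator.api.pytriton:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RunnerSketch:
    # Hypothetical stand-in for NavigatorRunner: inference is only valid
    # after activate() has been called.
    active: bool = False

    def activate(self) -> None:
        self.active = True

    def infer(self, feed: Dict[str, Any]) -> Dict[str, Any]:
        if not self.active:
            raise RuntimeError("Runner must be activated before use.")
        return feed  # echoes inputs back; a real runner executes the model


class AdapterSketch:
    # Hypothetical stand-in mirroring the documented adapter properties.
    def __init__(self) -> None:
        self._runner = RunnerSketch()

    @property
    def batching(self) -> bool:
        return True  # True when the selected runner supports batching

    @property
    def runner(self) -> RunnerSketch:
        return self._runner


adapter = AdapterSketch()
runner = adapter.runner
runner.activate()  # required before calling infer()
result = runner.infer({"input__0": [1, 2, 3]})
```

With the real API the flow is analogous: load a package, construct a PyTritonAdapter, take its runner property, call activate(), and pass the adapter's inputs, outputs, and config to PyTriton's bind call.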
QueuePolicy
dataclass
Model queue policy configuration.
More details in the Triton Inference Server documentation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| timeout_action | TimeoutAction | The action applied to a timed-out request. | TimeoutAction.REJECT |
| default_timeout_microseconds | int | The default timeout for every request, in microseconds. | 0 |
| allow_timeout_override | bool | Whether an individual request can override the default timeout value. | False |
| max_queue_size | int | The maximum queue size for holding requests. | 0 |

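A short sketch of a non-default policy, using a stand-in dataclass that mirrors the documented fields (illustrative only; the real QueuePolicy and TimeoutAction come from model_navigator.api.pytriton):

```python
from dataclasses import dataclass
from enum import Enum


class TimeoutAction(Enum):
    # Stand-in for the documented TimeoutAction enum.
    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass
class QueuePolicy:
    # Stand-in mirroring the documented QueuePolicy defaults.
    timeout_action: TimeoutAction = TimeoutAction.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0


# Delay rather than reject timed-out requests; cap the queue at 64 entries
# and give every request a 50 ms default timeout that clients may override.
policy = QueuePolicy(
    timeout_action=TimeoutAction.DELAY,
    default_timeout_microseconds=50_000,
    allow_timeout_override=True,
    max_queue_size=64,
)
```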
Tensor
dataclass
Model input and output definition for Triton deployment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| shape | tuple | Shape of the input/output tensor. | required |
| dtype | Union[np.dtype, Type[np.dtype], Type[object]] | Data type of the input/output tensor. | required |
| name | Optional[str] | Name of the model input/output. | None |
| optional | Optional[bool] | Flag to mark the input as optional. | False |

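A sketch of defining an input tensor, using a stand-in dataclass that mirrors the documented fields (the real class is model_navigator.api.pytriton.Tensor; the tensor name and shape below are made-up examples):

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Type, Union

import numpy as np


@dataclass
class Tensor:
    # Stand-in mirroring the documented Tensor fields.
    shape: Tuple[int, ...]
    dtype: Union[np.dtype, Type[np.dtype], Type[object]]
    name: Optional[str] = None
    optional: Optional[bool] = False


# A float32 image input; -1 commonly denotes a dynamic dimension
# in Triton I/O definitions.
image = Tensor(shape=(-1, 3, 224, 224), dtype=np.float32, name="image")
```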
TimeoutAction
Timeout action definition for the QueuePolicy timeout_action field.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| REJECT | str | Reject the request and return an error message accordingly. | required |
| DELAY | str | Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed. | required |
