PyTriton
model_navigator.api.pytriton
Public API definition for PyTriton-related functionality.
DynamicBatcher
dataclass
Dynamic batcher configuration.
More details in the Triton Inference Server documentation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| max_queue_delay_microseconds | int | The maximum time, in microseconds, a request is delayed in the scheduling queue to wait for additional requests for batching. | 0 |
| preferred_batch_size | Optional[list] | Preferred batch sizes for dynamic batching. | None |
| preserve_ordering | bool | Whether the dynamic batcher should preserve the ordering of responses to match the order of requests received by the scheduler. | False |
| priority_levels | int | The number of priority levels to be enabled for the model. | 0 |
| default_priority_level | int | The priority level used for requests that don't specify their priority. | 0 |
| default_queue_policy | Optional[QueuePolicy] | The default queue policy used for requests. | None |
| priority_queue_policy | Optional[Dict[int, QueuePolicy]] | The queue policy per priority level. | None |

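As a sketch of how these fields combine, the stand-in dataclasses below mirror the documented field names and defaults (they are illustrative stand-ins, not the library source; in real code, import DynamicBatcher, QueuePolicy, and TimeoutAction from model_navigator.api.pytriton):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class TimeoutAction(Enum):
    # Stand-in mirroring the documented TimeoutAction members.
    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass
class QueuePolicy:
    # Stand-in mirroring the documented QueuePolicy defaults.
    timeout_action: TimeoutAction = TimeoutAction.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0


@dataclass
class DynamicBatcher:
    # Stand-in mirroring the documented DynamicBatcher defaults.
    max_queue_delay_microseconds: int = 0
    preferred_batch_size: Optional[List[int]] = None
    preserve_ordering: bool = False
    priority_levels: int = 0
    default_priority_level: int = 0
    default_queue_policy: Optional[QueuePolicy] = None
    priority_queue_policy: Optional[Dict[int, QueuePolicy]] = None


# Two priority levels; level 1 requests time out after 100 ms and are
# rejected (TimeoutAction.REJECT is the documented default action).
batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,
    preferred_batch_size=[4, 8],
    priority_levels=2,
    default_priority_level=2,
    priority_queue_policy={1: QueuePolicy(default_timeout_microseconds=100_000)},
)
```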
ModelConfig
dataclass
Additional model configuration for running model through Triton Inference Server.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batching | bool | Flag to enable/disable batching for the model. | True |
| max_batch_size | int | The maximal batch size handled by the model. | 4 |
| batcher | DynamicBatcher | Configuration of dynamic batching for the model. | DynamicBatcher() |
| response_cache | bool | Flag to enable/disable the response cache for the model. | False |

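A minimal sketch of configuring these fields, using stand-in dataclasses that mirror the documented names and defaults (not the library source; the real classes live in model_navigator.api.pytriton):

```python
from dataclasses import dataclass, field


@dataclass
class DynamicBatcher:
    # Minimal stand-in; see the DynamicBatcher section for the full field list.
    max_queue_delay_microseconds: int = 0


@dataclass
class ModelConfig:
    # Stand-in mirroring the documented ModelConfig defaults.
    batching: bool = True
    max_batch_size: int = 4
    batcher: DynamicBatcher = field(default_factory=DynamicBatcher)
    response_cache: bool = False


# Allow batches of up to 32 requests and let requests wait up to 1 ms
# in the queue to form a larger batch.
config = ModelConfig(
    max_batch_size=32,
    batcher=DynamicBatcher(max_queue_delay_microseconds=1000),
)
```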
PyTritonAdapter(package, strategy=None)
Provides the model and configuration for PyTriton deployment.
Initialize PyTritonAdapter.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| package | Package | A package object to be searched for the best possible model. | required |
| strategy | Optional[RuntimeSearchStrategy] | Strategy for finding the best model. | None |

Source code in model_navigator/api/pytriton.py
batching: bool
property
Returns status of batching support by the runner.
Returns:

| Type | Description |
|---|---|
| bool | True if the runner supports batching, False otherwise. |

config: ModelConfig
property
Returns the configuration for PyTriton.
Returns:

| Type | Description |
|---|---|
| ModelConfig | ModelConfig with configuration for the PyTriton bind method. |

inputs: List[Tensor]
property
Returns the inputs definition for the model.
outputs: List[Tensor]
property
Returns the outputs definition for the model.
runner: NavigatorRunner
property
Returns the runner.
The runner must be activated before use with the activate() method.
Returns:

| Type | Description |
|---|---|
| NavigatorRunner | Model Navigator runner. |

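The adapter's contract (a runner that must be activated before inference) can be illustrated with hypothetical stand-in classes; everything below is a sketch, not the real PyTritonAdapter or NavigatorRunner from model_navigator.api.pytriton:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RunnerSketch:
    # Hypothetical stand-in for NavigatorRunner: inference is only valid
    # after activate() has been called.
    active: bool = False

    def activate(self) -> None:
        self.active = True

    def infer(self, feed: Dict[str, Any]) -> Dict[str, Any]:
        if not self.active:
            raise RuntimeError("Runner must be activated before use.")
        return feed  # echoes inputs back; a real runner executes the model


class AdapterSketch:
    # Hypothetical stand-in mirroring the documented adapter properties.
    def __init__(self) -> None:
        self._runner = RunnerSketch()

    @property
    def batching(self) -> bool:
        return True  # True when the selected runner supports batching

    @property
    def runner(self) -> RunnerSketch:
        return self._runner


adapter = AdapterSketch()
runner = adapter.runner
runner.activate()  # required before calling infer()
result = runner.infer({"input__0": [1, 2, 3]})
```

With the real API the flow is analogous: load a package, construct a PyTritonAdapter, take its runner property, call activate(), and pass the adapter's inputs, outputs, and config to PyTriton's bind call.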
QueuePolicy
dataclass
Model queue policy configuration.
More details in the Triton Inference Server documentation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| timeout_action | TimeoutAction | The action applied to a timed-out request. | TimeoutAction.REJECT |
| default_timeout_microseconds | int | The default timeout for every request, in microseconds. | 0 |
| allow_timeout_override | bool | Whether an individual request can override the default timeout value. | False |
| max_queue_size | int | The maximum queue size for holding requests. | 0 |

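A short sketch of a non-default policy, using a stand-in dataclass that mirrors the documented fields (illustrative only; the real QueuePolicy and TimeoutAction come from model_navigator.api.pytriton):

```python
from dataclasses import dataclass
from enum import Enum


class TimeoutAction(Enum):
    # Stand-in for the documented TimeoutAction enum.
    REJECT = "REJECT"
    DELAY = "DELAY"


@dataclass
class QueuePolicy:
    # Stand-in mirroring the documented QueuePolicy defaults.
    timeout_action: TimeoutAction = TimeoutAction.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0


# Delay rather than reject timed-out requests; cap the queue at 64 entries
# and give every request a 50 ms default timeout that clients may override.
policy = QueuePolicy(
    timeout_action=TimeoutAction.DELAY,
    default_timeout_microseconds=50_000,
    allow_timeout_override=True,
    max_queue_size=64,
)
```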
Tensor
dataclass
Model input and output definition for Triton deployment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| shape | tuple | Shape of the input/output tensor. | required |
| dtype | Union[np.dtype, Type[np.dtype], Type[object]] | Data type of the input/output tensor. | required |
| name | Optional[str] | Name of the model input/output. | None |
| optional | Optional[bool] | Flag to mark the input as optional. | False |

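A sketch of defining an input tensor, using a stand-in dataclass that mirrors the documented fields (the real class is model_navigator.api.pytriton.Tensor; the tensor name and shape below are made-up examples):

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Type, Union

import numpy as np


@dataclass
class Tensor:
    # Stand-in mirroring the documented Tensor fields.
    shape: Tuple[int, ...]
    dtype: Union[np.dtype, Type[np.dtype], Type[object]]
    name: Optional[str] = None
    optional: Optional[bool] = False


# A float32 image input; -1 commonly denotes a dynamic dimension
# in Triton I/O definitions.
image = Tensor(shape=(-1, 3, 224, 224), dtype=np.float32, name="image")
```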
TimeoutAction
Timeout action definition for the QueuePolicy timeout_action field.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| REJECT | str | Reject the request and return an error message accordingly. | required |
| DELAY | str | Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed. | required |
