PyTritonAdapter API
model_navigator.api.pytriton.PyTritonAdapter
Provides model and configuration for PyTriton deployment.
Initialize PyTritonAdapter.
Parameters:
- package (Package) – A package object to be searched for the best possible model.
- strategy (Optional[RuntimeSearchStrategy]) – Strategy for finding the best model. Defaults to MaxThroughputAndMinLatencyStrategy.
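For orientation, here is a sketch of how the adapter's properties can feed PyTriton's bind call, following the usage pattern from the Model Navigator examples. The package file name and model name are placeholders, and the adapter's runner property comes from the wider API rather than this reference:

```python
import model_navigator as nav
from pytriton.decorators import batch
from pytriton.triton import Triton

# Load a previously optimized package; the file name is a placeholder.
package = nav.package.load("model.nav")
adapter = nav.pytriton.PyTritonAdapter(package=package)

# Activate the runner selected for the best model.
runner = adapter.runner
runner.activate()

@batch
def infer_func(**inputs):
    # Delegate inference to the runner chosen by the adapter.
    return runner.infer(inputs)

with Triton() as triton:
    # Bind the model using the tensor specs and config exposed by the adapter.
    triton.bind(
        model_name="model",
        infer_func=infer_func,
        inputs=adapter.inputs,
        outputs=adapter.outputs,
        config=adapter.config,
    )
    triton.serve()
```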
batching
property
Returns whether the runner supports batching.
Returns:
- bool – True if the runner supports batching, False otherwise.
config
property
Returns the configuration for PyTriton.
Returns:
- ModelConfig – ModelConfig with configuration for the PyTriton bind method.
inputs
property
Returns the inputs specification for the PyTriton bind method.
outputs
property
Returns the outputs specification for the PyTriton bind method.
model_navigator.api.pytriton.ModelConfig
dataclass
Additional model configuration for running a model through Triton Inference Server.
Parameters:
- batching (bool) – Flag to enable/disable batching for the model.
- max_batch_size (int) – The maximal batch size that will be handled by the model.
- batcher (DynamicBatcher) – Configuration of dynamic batching for the model.
- response_cache (bool) – Flag to enable/disable the response cache for the model.
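A minimal sketch of constructing this configuration by hand; the values are illustrative, not recommended defaults:

```python
from model_navigator.api.pytriton import DynamicBatcher, ModelConfig

# Illustrative values only.
model_config = ModelConfig(
    batching=True,     # enable batched requests
    max_batch_size=16, # largest batch the model will receive
    batcher=DynamicBatcher(max_queue_delay_microseconds=100),
    response_cache=False,
)
```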
model_navigator.api.pytriton.DynamicBatcher
dataclass
Dynamic batcher configuration.
See the Triton Inference Server documentation for more details.
Parameters:
- max_queue_delay_microseconds (int) – The maximum time, in microseconds, a request is delayed in the scheduling queue to wait for additional requests for batching.
- preferred_batch_size (Optional[list]) – Preferred batch sizes for dynamic batching.
- preserve_ordering (bool) – Whether the dynamic batcher should preserve the ordering of responses to match the order of requests received by the scheduler.
- priority_levels (int) – The number of priority levels to be enabled for the model.
- default_priority_level (int) – The priority level used for requests that don't specify their priority.
- default_queue_policy (Optional[QueuePolicy]) – The default queue policy used for requests.
- priority_queue_policy (Optional[Dict[int, QueuePolicy]]) – The queue policy per priority level.
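A sketch of a batcher configured with a default queue policy; the values are illustrative, and the TimeoutAction import is assumed to be exposed alongside QueuePolicy:

```python
from model_navigator.api.pytriton import DynamicBatcher, QueuePolicy, TimeoutAction

# Illustrative: wait up to 100 us to assemble batches of 4 or 8,
# keep response order, and reject requests that time out in the queue.
batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,
    preferred_batch_size=[4, 8],
    preserve_ordering=True,
    priority_levels=2,
    default_priority_level=1,
    default_queue_policy=QueuePolicy(
        timeout_action=TimeoutAction.REJECT,
        default_timeout_microseconds=1_000_000,
    ),
)
```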
model_navigator.api.pytriton.Tensor
dataclass
Model input and output definition for Triton deployment.
Parameters:
- shape (tuple) – Shape of the input/output tensor.
- dtype (Union[np.dtype, Type[np.dtype], Type[object]]) – Data type of the input/output tensor.
- name (Optional[str]) – Name of the model input/output.
- optional (Optional[bool]) – Flag to mark the input as optional.
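A sketch of defining input and output tensors; the names, shapes, and dtypes are placeholders, and -1 follows the PyTriton convention for a dynamic dimension:

```python
import numpy as np
from model_navigator.api.pytriton import Tensor

# Placeholder tensor specs; -1 marks a dynamic dimension.
inputs = [
    Tensor(name="INPUT_1", shape=(-1,), dtype=np.float32),
    Tensor(name="MASK", shape=(-1,), dtype=np.bool_, optional=True),
]
outputs = [Tensor(name="OUTPUT_1", shape=(-1,), dtype=np.float32)]
```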
model_navigator.api.pytriton.QueuePolicy
dataclass
Model queue policy configuration.
See the Triton Inference Server documentation for more details.
Parameters:
- timeout_action (TimeoutAction) – The action applied to a timed-out request.
- default_timeout_microseconds (int) – The default timeout for every request, in microseconds.
- allow_timeout_override (bool) – Whether an individual request can override the default timeout value.
- max_queue_size (int) – The maximum queue size for holding requests.
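A sketch of pairing queue policies with priority levels via DynamicBatcher's priority_queue_policy; the keys and timeouts are illustrative, and the TimeoutAction import is again assumed:

```python
from model_navigator.api.pytriton import QueuePolicy, TimeoutAction

# Illustrative per-priority policies: level 1 requests may delay past their
# timeout and override it; level 2 requests are rejected when they time out.
priority_queue_policy = {
    1: QueuePolicy(
        timeout_action=TimeoutAction.DELAY,
        default_timeout_microseconds=5_000_000,
        allow_timeout_override=True,
        max_queue_size=128,
    ),
    2: QueuePolicy(
        timeout_action=TimeoutAction.REJECT,
        default_timeout_microseconds=1_000_000,
        max_queue_size=32,
    ),
}
```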