# PyTriton

`model_navigator.api.pytriton`

Public API definition for PyTriton-related functionality.
## DynamicBatcher (dataclass)

Dynamic batcher configuration.

More in the Triton Inference Server documentation.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| max_queue_delay_microseconds | int | The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching. | 0 | 
| preferred_batch_size | Optional[list] | Preferred batch sizes for dynamic batching. | None | 
| preserve_ordering | bool | Whether the dynamic batcher should preserve the ordering of responses to match the order of requests received by the scheduler. | False | 
| priority_levels | int | The number of priority levels to be enabled for the model. | 0 | 
| default_priority_level | int | The priority level used for requests that don't specify their priority. | 0 | 
| default_queue_policy | Optional[QueuePolicy] | The default queue policy used for requests. | None | 
| priority_queue_policy | Optional[Dict[int, QueuePolicy]] | The queue policy for each priority level. | None | 
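The interplay of `max_queue_delay_microseconds` and `preferred_batch_size` can be illustrated with a deliberately simplified scheduler sketch. This is not Triton's implementation, only the documented semantics: a request waits in the queue until a preferred batch size can be filled, or is flushed once the oldest request has waited past the delay.

```python
from collections import deque

def form_batch(queue, preferred_batch_sizes, max_queue_delay_us, now_us):
    """Simplified dynamic-batching decision: return a batch if a preferred
    batch size can be filled, or if the oldest request has waited longer
    than max_queue_delay_us; otherwise keep waiting (return None)."""
    if not queue:
        return None
    # Prefer the largest preferred size we can currently fill.
    for size in sorted(preferred_batch_sizes, reverse=True):
        if len(queue) >= size:
            return [queue.popleft() for _ in range(size)]
    # Oldest request exceeded the allowed queue delay: flush what we have.
    arrival_us = queue[0][1]
    if now_us - arrival_us >= max_queue_delay_us:
        return [queue.popleft() for _ in range(len(queue))]
    return None

# Requests are (id, arrival_time_us) pairs.
q = deque([("r0", 0), ("r1", 10), ("r2", 20)])
assert form_batch(q, [2, 4], max_queue_delay_us=100, now_us=30) == [("r0", 0), ("r1", 10)]
# One request left; its delay is not yet exceeded, so the batcher keeps waiting.
assert form_batch(q, [2, 4], 100, now_us=50) is None
# Delay exceeded: the remaining request is sent alone.
assert form_batch(q, [2, 4], 100, now_us=150) == [("r2", 20)]
```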
## ModelConfig (dataclass)

Additional model configuration for running the model through Triton Inference Server.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| batching | bool | Flag to enable/disable batching for the model. | True | 
| max_batch_size | int | The maximal batch size handled by the model. | 4 | 
| batcher | DynamicBatcher | Configuration of dynamic batching for the model. | DynamicBatcher() | 
| response_cache | bool | Flag to enable/disable the response cache for the model. | False | 
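Taken together, these fields form a small configuration tree. Below is a self-contained sketch using stand-in dataclasses with the documented defaults; in real use, `ModelConfig` and `DynamicBatcher` would be imported from `model_navigator.api.pytriton`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DynamicBatcher:
    # Stand-in covering a subset of the fields, with the documented defaults.
    max_queue_delay_microseconds: int = 0
    preferred_batch_size: Optional[List[int]] = None
    preserve_ordering: bool = False
    priority_levels: int = 0
    default_priority_level: int = 0

@dataclass
class ModelConfig:
    batching: bool = True
    max_batch_size: int = 4
    batcher: DynamicBatcher = field(default_factory=DynamicBatcher)
    response_cache: bool = False

# Allow batches of up to 16 items and let the scheduler wait up to 100 µs
# for preferred batches of 8 or 16 to form.
config = ModelConfig(
    max_batch_size=16,
    batcher=DynamicBatcher(
        max_queue_delay_microseconds=100,
        preferred_batch_size=[8, 16],
    ),
)
assert config.batching and config.batcher.preferred_batch_size == [8, 16]
```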
## PyTritonAdapter(package, strategy=None)

Provides the model and configuration for PyTriton deployment.

Initialize PyTritonAdapter.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| package | Package | A package object to be searched for best possible model. | required | 
| strategy | Optional[RuntimeSearchStrategy] | Strategy for finding the best model; when None, a default strategy is used. | None | 
Source code in `model_navigator/api/pytriton.py`
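The adapter's properties (documented below) are designed to feed PyTriton's `Triton.bind` call, whose parameters carry matching names. The following is a self-contained sketch with stand-in objects; with `model_navigator` and `pytriton` installed, the adapter would come from `PyTritonAdapter(package)` and `Triton` from `pytriton.triton`, and the wiring would be the same.

```python
from dataclasses import dataclass, field
from typing import Dict, List

class StandInTriton:
    """Stand-in for pytriton.triton.Triton that just records bind() calls."""
    def __init__(self):
        self.models: Dict[str, dict] = {}

    def bind(self, model_name, infer_func, inputs, outputs, config):
        self.models[model_name] = {
            "infer_func": infer_func,
            "inputs": inputs,
            "outputs": outputs,
            "config": config,
        }

@dataclass
class StandInAdapter:
    """Stand-in exposing the adapter surface documented below."""
    inputs: List[str] = field(default_factory=lambda: ["input__0"])
    outputs: List[str] = field(default_factory=lambda: ["output__0"])
    config: Dict[str, object] = field(default_factory=lambda: {"max_batch_size": 4})

adapter = StandInAdapter()

def infer_func(**inputs):
    # With the real adapter this would call adapter.runner.infer(inputs)
    # after adapter.runner.activate().
    return inputs

triton = StandInTriton()
triton.bind(
    model_name="model",
    infer_func=infer_func,
    inputs=adapter.inputs,      # -> PyTritonAdapter.inputs
    outputs=adapter.outputs,    # -> PyTritonAdapter.outputs
    config=adapter.config,      # -> PyTritonAdapter.config
)
assert "model" in triton.models
```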
            
### batching: bool (property)

Returns the status of batching support by the runner.

Returns:
| Type | Description | 
|---|---|
| bool | True if the runner supports batching, False otherwise. | 
### config: ModelConfig (property)

Returns the config for PyTriton.

Returns:
| Type | Description | 
|---|---|
| ModelConfig | ModelConfig with configuration for the PyTriton bind method. | 
### inputs: List[Tensor] (property)
### outputs: List[Tensor] (property)
### runner: NavigatorRunner (property)

Returns the runner.

The runner must be activated before use with the activate() method.

Returns:
| Type | Description | 
|---|---|
| NavigatorRunner | Model Navigator runner. | 
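The activation requirement can be sketched with a stand-in runner (the real NavigatorRunner comes from Model Navigator; this snippet only illustrates the activate-before-infer contract described above).

```python
class StandInRunner:
    """Minimal stand-in for a NavigatorRunner, illustrating only the
    activate-before-use contract."""

    def __init__(self):
        self._active = False

    def activate(self):
        self._active = True

    def infer(self, inputs):
        if not self._active:
            raise RuntimeError("Runner must be activated before use.")
        return {"output": inputs["input"]}  # identity "model" for illustration

runner = StandInRunner()
try:
    runner.infer({"input": [1, 2, 3]})
except RuntimeError:
    pass  # inference before activate() fails, as documented
runner.activate()
assert runner.infer({"input": [1, 2, 3]}) == {"output": [1, 2, 3]}
```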
## QueuePolicy (dataclass)

Model queue policy configuration.

More in the Triton Inference Server documentation.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| timeout_action | TimeoutAction | The action applied to timed-out requests. | TimeoutAction.REJECT | 
| default_timeout_microseconds | int | The default timeout for every request, in microseconds. | 0 | 
| allow_timeout_override | bool | Whether individual request can override the default timeout value. | False | 
| max_queue_size | int | The maximum queue size for holding requests. | 0 | 
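The timeout fields interact as follows: a request is given default_timeout_microseconds unless allow_timeout_override is set and the request carries its own timeout. Below is a sketch of that resolution logic with stand-in types (the real QueuePolicy and TimeoutAction live in `model_navigator.api.pytriton`).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TimeoutAction(Enum):
    REJECT = "REJECT"
    DELAY = "DELAY"

@dataclass
class QueuePolicy:
    # Documented defaults.
    timeout_action: TimeoutAction = TimeoutAction.REJECT
    default_timeout_microseconds: int = 0
    allow_timeout_override: bool = False
    max_queue_size: int = 0

def effective_timeout_us(policy: QueuePolicy, request_timeout_us: Optional[int]) -> int:
    """Per-request timeout: an override applies only when the policy allows it."""
    if policy.allow_timeout_override and request_timeout_us is not None:
        return request_timeout_us
    return policy.default_timeout_microseconds

strict = QueuePolicy(default_timeout_microseconds=500)
lenient = QueuePolicy(default_timeout_microseconds=500, allow_timeout_override=True)
assert effective_timeout_us(strict, 1000) == 500    # override ignored
assert effective_timeout_us(lenient, 1000) == 1000  # override honored
```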
## Tensor (dataclass)

Model input and output definition for Triton deployment.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| shape | tuple | Shape of the input/output tensor. | required | 
| dtype | Union[np.dtype, Type[np.dtype], Type[object]] | Data type of the input/output tensor. | required | 
| name | Optional[str] | Name of the model input/output. | None | 
| optional | Optional[bool] | Flag marking the input as optional. | False | 
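For example, a named float32 input with one variable-length axis might be declared as follows (shown with a stand-in dataclass mirroring the fields above; in real use, Tensor would be imported from `model_navigator.api.pytriton`).

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

@dataclass
class Tensor:
    # Stand-in mirroring the documented fields.
    shape: Tuple[int, ...]
    dtype: np.dtype
    name: Optional[str] = None
    optional: Optional[bool] = False

# A named float32 tensor with one variable-length axis (-1 is a common
# convention for a variable dimension in Triton model configurations).
image = Tensor(shape=(-1, 3, 224, 224), dtype=np.dtype("float32"), name="image")
assert image.dtype == np.float32 and image.optional is False
```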
## TimeoutAction

Timeout action definition for the timeout_action QueuePolicy field.
Members:
| Name | Type | Description | 
|---|---|---|
| REJECT | str | Reject the request and return an error message accordingly. | 
| DELAY | str | Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed. | 