Deployment Config
model_navigator.pytriton.ModelConfig
dataclass
ModelConfig(batching=True, max_batch_size=4, batcher=DynamicBatcher(), response_cache=False, decoupled=False)
Additional model configuration for running model through Triton Inference Server.
Parameters:
- batching (bool, default: True) – Flag to enable/disable batching for the model.
- max_batch_size (int, default: 4) – The maximal batch size that would be handled by the model.
- batcher (DynamicBatcher, default: DynamicBatcher()) – Configuration of dynamic batching for the model.
- response_cache (bool, default: False) – Flag to enable/disable the response cache for the model.
- decoupled (bool, default: False) – Flag to enable/disable the decoupled transaction policy.
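For example, a minimal sketch that enables batching with a larger batch size and the response cache, assuming ModelConfig and DynamicBatcher are importable from model_navigator.pytriton as the qualified names on this page suggest:

```python
from model_navigator.pytriton import DynamicBatcher, ModelConfig

# Batch up to 16 requests together and cache responses;
# the remaining fields keep their documented defaults.
config = ModelConfig(
    batching=True,
    max_batch_size=16,
    batcher=DynamicBatcher(),
    response_cache=True,
)
```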
model_navigator.pytriton.DynamicBatcher
dataclass
DynamicBatcher(max_queue_delay_microseconds=0, preferred_batch_size=None, preserve_ordering=False, priority_levels=0, default_priority_level=0, default_queue_policy=None, priority_queue_policy=None)
Dynamic batcher configuration.
More in the Triton Inference Server documentation.
Parameters:
- max_queue_delay_microseconds (int, default: 0) – The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching.
- preferred_batch_size (Optional[list], default: None) – Preferred batch sizes for dynamic batching.
- preserve_ordering (bool, default: False) – Whether the dynamic batcher should preserve the ordering of responses to match the order of requests received by the scheduler.
- priority_levels (int, default: 0) – The number of priority levels to be enabled for the model.
- default_priority_level (int, default: 0) – The priority level used for requests that don't specify their priority.
- default_queue_policy (Optional[QueuePolicy], default: None) – The default queue policy used for requests.
- priority_queue_policy (Optional[Dict[int, QueuePolicy]], default: None) – The queue policy to use for each priority level.
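A short sketch of a batcher tuned to wait briefly for extra work, with the import path assumed from the qualified name above:

```python
from model_navigator.pytriton import DynamicBatcher

# Hold a request for up to 100 microseconds to gather more work,
# and prefer dispatching batches of size 4 or 8.
batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,
    preferred_batch_size=[4, 8],
)
```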
model_navigator.pytriton.Tensor
dataclass
Tensor(shape, dtype, name=None, optional=False)
Model input and output definition for Triton deployment.
Parameters:
- shape (tuple) – Shape of the input/output tensor.
- dtype (Union[dtype, Type[dtype], Type[object]]) – Data type of the input/output tensor.
- name (Optional[str], default: None) – Name of the model input/output.
- optional (Optional[bool], default: False) – Flag to mark whether the input is optional.
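A sketch of typical input/output definitions; the names input__0 and output__0 are illustrative, and -1 follows the Triton convention for a dynamic dimension:

```python
import numpy as np

from model_navigator.pytriton import Tensor

# One variable-length float input and one single-value float output.
inputs = (Tensor(name="input__0", dtype=np.float32, shape=(-1,)),)
outputs = (Tensor(name="output__0", dtype=np.float32, shape=(1,)),)
```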
model_navigator.pytriton.QueuePolicy
dataclass
QueuePolicy(timeout_action=TimeoutAction.REJECT, default_timeout_microseconds=0, allow_timeout_override=False, max_queue_size=0)
Model queue policy configuration.
More in the Triton Inference Server documentation.
Parameters:
- timeout_action (TimeoutAction, default: TimeoutAction.REJECT) – The action applied to a timed-out request.
- default_timeout_microseconds (int, default: 0) – The default timeout for every request, in microseconds.
- allow_timeout_override (bool, default: False) – Whether an individual request can override the default timeout value.
- max_queue_size (int, default: 0) – The maximum queue size for holding requests.
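A sketch of a policy that rejects requests after a 1 ms wait while still letting individual requests override the timeout, with the import path assumed as above:

```python
from model_navigator.pytriton import QueuePolicy, TimeoutAction

# Reject requests that wait longer than 1 ms; allow per-request
# timeout overrides; hold at most 64 queued requests.
policy = QueuePolicy(
    timeout_action=TimeoutAction.REJECT,
    default_timeout_microseconds=1_000,
    allow_timeout_override=True,
    max_queue_size=64,
)
```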
model_navigator.pytriton.TimeoutAction
Bases: Enum
Timeout action definition for the timeout_action field of QueuePolicy.
Members:
- REJECT – Reject the request and return an error message accordingly.
- DELAY – Delay the request until all other requests at the same (or higher) priority levels that have not reached their timeouts are processed.
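Putting the pieces together, the following sketch (all values illustrative, import path assumed as above) wires per-priority queue policies into a dynamic batcher and attaches it to a ModelConfig:

```python
from model_navigator.pytriton import (
    DynamicBatcher,
    ModelConfig,
    QueuePolicy,
    TimeoutAction,
)

# Two priority levels with different timeout handling: requests at
# level 1 are delayed on timeout, while requests at level 2 are
# rejected after waiting 5 ms.
batcher = DynamicBatcher(
    priority_levels=2,
    default_priority_level=2,
    priority_queue_policy={
        1: QueuePolicy(timeout_action=TimeoutAction.DELAY),
        2: QueuePolicy(
            timeout_action=TimeoutAction.REJECT,
            default_timeout_microseconds=5_000,
        ),
    },
)
config = ModelConfig(max_batch_size=8, batcher=batcher)
```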