Triton
pytriton.triton.Triton
Triton(*, config: Optional[TritonConfig] = None, workspace: Union[Workspace, str, Path, None] = None, triton_lifecycle_policy: Optional[TritonLifecyclePolicy] = None)
Bases: TritonBase
Triton Inference Server for Python models.
Initialize the Triton Inference Server context for starting the server and loading models.
Parameters:
- config (Optional[TritonConfig], default: None) – TritonConfig object with optional customizations for Triton Inference Server. Configuration can also be passed through environment variables; see the TritonConfig.from_env() class method for details.
  Order of precedence:
  - config defined through the config parameter of the init method
  - config defined in environment variables
  - default TritonConfig values
- workspace (Union[Workspace, str, Path, None], default: None) – workspace or path where the Triton Model Store and the files used by pytriton will be created. If workspace is None, a random workspace is created. The workspace is deleted in Triton.stop().
- triton_lifecycle_policy (Optional[TritonLifecyclePolicy], default: None) – policy indicating when the Triton server is launched and where the model store is located (locally, or remotely managed by the Triton server). If triton_lifecycle_policy is None, DefaultTritonLifecyclePolicy is used (the Triton server is launched on startup and the model store is not local). Only if triton_lifecycle_policy is None and config.allow_vertex_ai is True is VertexAILifecyclePolicy used instead.
Source code in pytriton/triton.py
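For illustration, a minimal sketch of constructing Triton with an explicit configuration and workspace. The port numbers and the workspace path are placeholders; only TritonConfig fields that mirror tritonserver options (http_port, grpc_port, metrics_port) are assumed here.

```python
from pytriton.triton import Triton, TritonConfig

# Illustrative values only; any field left unset falls back to environment
# variables and then to the defaults (the precedence order described above).
config = TritonConfig(http_port=8000, grpc_port=8001, metrics_port=8002)

with Triton(config=config, workspace="/tmp/pytriton-workspace") as triton:
    # bind models here, then serve or run
    ...
```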
__enter__
__enter__() -> Triton
__exit__
Exit the context, stopping the process and cleaning the workspace.
Parameters:
- *_ – unused arguments
bind
bind(model_name: str, infer_func: Union[Callable, Sequence[Callable]], inputs: Sequence[Tensor], outputs: Sequence[Tensor], model_version: int = 1, config: Optional[ModelConfig] = None, strict: bool = False) -> None
Create a model with the given name and bind the inference callable to Triton Inference Server.
More information about model configuration: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md
Parameters:
- infer_func (Union[Callable, Sequence[Callable]]) – Inference callable to handle requests/responses from Triton Inference Server
- inputs (Sequence[Tensor]) – Definition of model inputs
- outputs (Sequence[Tensor]) – Definition of model outputs
- model_name (str) – Name under which the model is available in Triton Inference Server; the name must be valid as a Triton model name
- model_version (int, default: 1) – Version of the model
- config (Optional[ModelConfig], default: None) – Model configuration for Triton Inference Server deployment
- strict (bool, default: False) – Enable strict validation between the model config outputs and the inference function result
Source code in pytriton/triton.py
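For illustration, a sketch of binding a simple inference callable. The model name, tensor names, shapes, and max batch size are arbitrary; the @batch decorator from pytriton.decorators is used so the callable receives inputs as batched numpy arrays keyed by input name.

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def _infer_fn(data):
    # "data" matches the input tensor name below; the decorator passes it as a
    # batched numpy array and expects a dict keyed by output names in return.
    return {"result": data * 2}


with Triton() as triton:
    triton.bind(
        model_name="Doubler",
        infer_func=_infer_fn,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=128),
        strict=True,
    )
    triton.serve()
```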
connect
Connect to Triton Inference Server.
Raises:
- TimeoutError – if Triton Inference Server is not ready after the timeout
Source code in pytriton/triton.py
is_connected
is_connected() -> bool
Check whether the connection to Triton Inference Server is established.
run
Run Triton Inference Server.
serve
serve(monitoring_period_s: float = MONITORING_PERIOD_S) -> None
Run Triton Inference Server and block the calling thread to serve requests/responses.
Parameters:
- monitoring_period_s (float, default: MONITORING_PERIOD_S) – interval at which the availability of Triton and the models is checked. Every monitoring_period_s seconds the main thread wakes up, checks whether the Triton server and the proxy backend are still alive, and goes back to sleep. If either is not alive, the method returns.
Source code in pytriton/triton.py
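A small usage sketch of serve(): the call blocks the calling thread, and the 10-second period below is an arbitrary value overriding MONITORING_PERIOD_S.

```python
# Blocks until the Triton server or the proxy backend stops being alive;
# the liveness check runs every 10 seconds here (arbitrary value).
triton.serve(monitoring_period_s=10.0)
```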
stop
stop() -> bool
Stop Triton Inference Server and clean workspace.
Source code in pytriton/triton.py
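When blocking the main thread is undesirable, the lifecycle can be managed explicitly instead of through the context manager. The sketch below assumes run() starts the server without blocking (see run above) and pairs it with an explicit stop().

```python
from pytriton.triton import Triton

triton = Triton()
triton.run()          # start the server without blocking the main thread
try:
    ...               # application code; bound models keep serving requests
finally:
    triton.stop()     # shut the server down and clean up the workspace
```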
pytriton.triton.RemoteTriton
Bases: TritonBase
RemoteTriton connects to a Triton Inference Server running on a remote host.
Initialize RemoteTriton.
Parameters:
- url (str) – Triton Inference Server URL in the form scheme://hostname:port. If the scheme is not provided, http is used by default. If the port is not provided, 8000 is used by default for http and 8001 for grpc.
- workspace (Union[Workspace, str, Path, None], default: None) – path to be created where the files used by pytriton will be stored (e.g. socket files for communication). If workspace is None, a temporary workspace is created. The workspace should be created on a filesystem shared between RemoteTriton and the Triton Inference Server so that the socket files are accessible (if you use containers, the folder must be shared between the containers).
Source code in pytriton/triton.py
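For illustration, a sketch of connecting to an already-running Triton server; the URL and the shared workspace path are placeholders.

```python
from pytriton.triton import RemoteTriton

# The server at this URL must already be running; the workspace directory must
# be visible both to this process and to the Triton server (shared filesystem).
with RemoteTriton(url="grpc://localhost:8001", workspace="/shared/pytriton") as triton:
    # bind inference callables exactly as with Triton, then serve
    ...
```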
__enter__
__enter__() -> RemoteTriton
Entering the context connects to remote Triton server.
Returns:
- RemoteTriton – A RemoteTriton object
__exit__
Exit the context, stopping the process and cleaning the workspace.
Parameters:
- *_ – unused arguments
bind
bind(model_name: str, infer_func: Union[Callable, Sequence[Callable]], inputs: Sequence[Tensor], outputs: Sequence[Tensor], model_version: int = 1, config: Optional[ModelConfig] = None, strict: bool = False) -> None
Create a model with the given name and bind the inference callable to Triton Inference Server.
More information about model configuration: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md
Parameters:
- infer_func (Union[Callable, Sequence[Callable]]) – Inference callable to handle requests/responses from Triton Inference Server
- inputs (Sequence[Tensor]) – Definition of model inputs
- outputs (Sequence[Tensor]) – Definition of model outputs
- model_name (str) – Name under which the model is available in Triton Inference Server; the name must be valid as a Triton model name
- model_version (int, default: 1) – Version of the model
- config (Optional[ModelConfig], default: None) – Model configuration for Triton Inference Server deployment
- strict (bool, default: False) – Enable strict validation between the model config outputs and the inference function result
Source code in pytriton/triton.py
connect
Connect to Triton Inference Server.
Raises:
- TimeoutError – if Triton Inference Server is not ready after the timeout
Source code in pytriton/triton.py
is_connected
is_connected() -> bool
Check whether the connection to Triton Inference Server is established.
serve
serve(monitoring_period_s: float = MONITORING_PERIOD_S) -> None
Run Triton Inference Server and block the calling thread to serve requests/responses.
Parameters:
- monitoring_period_s (float, default: MONITORING_PERIOD_S) – interval at which the availability of Triton and the models is checked. Every monitoring_period_s seconds the main thread wakes up, checks whether the Triton server and the proxy backend are still alive, and goes back to sleep. If either is not alive, the method returns.
Source code in pytriton/triton.py
stop
stop() -> bool
Stop Triton Inference Server and clean workspace.
Source code in pytriton/triton.py
pytriton.proxy.types.Request
dataclass
Data class for request data including numpy array inputs.
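For illustration, a sketch of an undecorated inference callable that consumes Request objects directly (instead of using the @batch decorator). It assumes each request can be indexed by input name to obtain the corresponding numpy array, and that a list of per-request output dictionaries is returned; the input and output names are placeholders.

```python
import numpy as np


def infer_fn(requests):
    """Handle a batch of pytriton Request objects (sketch, assumed access pattern)."""
    responses = []
    for request in requests:
        data = request["data"]              # numpy array for the input named "data"
        responses.append({"result": data * 2})
    return responses
```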