Triton
pytriton.triton.Triton
Triton(*, config: Optional[TritonConfig] = None, workspace: Union[Workspace, str, Path, None] = None)
Bases: TritonBase
Triton Inference Server for Python models.
Initialize Triton Inference Server context for starting server and loading models.
Parameters:
- config (Optional[TritonConfig], default: None) – TritonConfig object with optional customizations for Triton Inference Server. Configuration can also be passed through environment variables; see the TritonConfig.from_env() class method for details.
  Order of precedence:
  - config passed through the config parameter of the init method
  - config defined in environment variables
  - default TritonConfig values
- workspace (Union[Workspace, str, Path, None], default: None) – workspace or path where the Triton Model Store and the files used by pytriton will be created. If workspace is None, a random workspace will be created. The workspace is deleted in Triton.stop().
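For illustration, a minimal construction sketch (the TritonConfig field names shown and the workspace path are assumptions, not taken from the documentation above):

```python
from pathlib import Path

from pytriton.triton import Triton, TritonConfig

# Explicit config takes precedence over environment variables and built-in defaults.
config = TritonConfig(http_port=8000, grpc_port=8001)  # field names are assumed

# A fixed workspace path; passing workspace=None (the default) creates a random
# workspace that is removed in Triton.stop().
with Triton(config=config, workspace=Path("/tmp/pytriton_workspace")) as triton:
    ...  # bind models here (see bind() below), then call triton.serve() or triton.run()
```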
__enter__
__enter__() -> Triton
__exit__
Exit the context stopping the process and cleaning the workspace.
Parameters:
- *_ – unused arguments
bind
bind(model_name: str, infer_func: Union[Callable, Sequence[Callable]], inputs: Sequence[Tensor], outputs: Sequence[Tensor], model_version: int = 1, config: Optional[ModelConfig] = None, strict: bool = False) -> None
Create a model with the given name and bind the inference callable into Triton Inference Server.
More information about model configuration: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md
Parameters:
- infer_func (Union[Callable, Sequence[Callable]]) – Inference callable (or sequence of callables) handling requests/responses from Triton Inference Server
- inputs (Sequence[Tensor]) – Definition of model inputs
- outputs (Sequence[Tensor]) – Definition of model outputs
- model_name (str) – Name under which the model is available in Triton Inference Server. It can only contain alphanumeric characters, dots, underscores and dashes.
- model_version (int, default: 1) – Version of the model
- config (Optional[ModelConfig], default: None) – Model configuration for Triton Inference Server deployment
- strict (bool, default: False) – Enable strict validation between model config outputs and inference function results
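A hedged end-to-end sketch of bind(), modeled on the common pytriton add/sub example; the model name, tensor names, batch size, and the @batch decorator are illustrative choices rather than requirements:

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def _add_sub(a, b):
    # Inference callable: receives batched numpy arrays keyed by input name,
    # returns a dict keyed by output name.
    return {"add": a + b, "sub": a - b}


with Triton() as triton:
    triton.bind(
        model_name="AddSub",
        infer_func=_add_sub,
        inputs=[
            Tensor(name="a", dtype=np.float32, shape=(-1,)),
            Tensor(name="b", dtype=np.float32, shape=(-1,)),
        ],
        outputs=[
            Tensor(name="add", dtype=np.float32, shape=(-1,)),
            Tensor(name="sub", dtype=np.float32, shape=(-1,)),
        ],
        config=ModelConfig(max_batch_size=128),
        strict=True,
    )
    triton.serve()
```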
connect
Connect to Triton Inference Server.
Raises:
- TimeoutError – if Triton Inference Server is not ready after the timeout
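A small sketch of an explicit readiness check; run() and serve() normally establish the connection themselves, so calling connect() directly is shown here only as an assumption-laden illustration:

```python
from pytriton.triton import Triton

triton = Triton()
triton.run()  # start the server without blocking the current thread
try:
    triton.connect()  # raises TimeoutError if the server is not ready in time
except TimeoutError:
    triton.stop()  # clean up the workspace before propagating the error
    raise
```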
is_connected
is_connected() -> bool
run
Run Triton Inference Server.
serve
serve(monitoring_period_s: float = MONITORING_PERIOD_S) -> None
Run Triton Inference Server and block the calling thread to serve requests/responses.
Parameters:
- monitoring_period_s (float, default: MONITORING_PERIOD_S) – interval, in seconds, at which Triton and the models are monitored. Every monitoring_period_s seconds the main thread wakes up, checks whether the Triton server and the proxy backend are still alive, and goes back to sleep. If either is not alive, the method returns.
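A short sketch contrasting the blocking serve() with the non-blocking run(); the 10-second monitoring period is an arbitrary illustrative value:

```python
from pytriton.triton import Triton

# Non-blocking: run() returns after startup, leaving the server running in the
# background until stop() is called.
triton = Triton()
triton.run()
print(triton.is_connected())
triton.stop()

# Blocking: serve() keeps the main thread alive, waking every monitoring_period_s
# seconds to verify that the Triton server and proxy backend are still healthy.
with Triton() as triton:
    triton.serve(monitoring_period_s=10.0)
```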
stop
stop() -> bool
Stop Triton Inference Server and clean workspace.
RemoteTriton
pytriton.triton.RemoteTriton
RemoteTriton(url: str, workspace: Union[Workspace, str, Path, None] = None)
Bases: TritonBase
RemoteTriton connects to Triton Inference Server running on remote host.
Initialize RemoteTriton.
Parameters:
- url (str) – Triton Inference Server URL in the form scheme://host:port. If the scheme is not provided, http is used as the default. If the port is not provided, 8000 is used as the default for http and 8001 for grpc.
- workspace (Union[Workspace, str, Path, None], default: None) – path to be created where the files used by pytriton will be stored (e.g. socket files for communication). If workspace is None, a temporary workspace will be created. The workspace should be located on a filesystem shared between RemoteTriton and Triton Inference Server to allow access to the socket files (if you use containers, the folder must be shared between them).
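A hedged sketch of attaching a Python-side model to an already-running Triton server; the URL, model, and shared workspace path are placeholders and assume the remote server can access the workspace directory (e.g. via a shared volume):

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import Tensor
from pytriton.triton import RemoteTriton


@batch
def _identity(data):
    return {"data": data}


# grpc scheme with the default gRPC port; binding works exactly as in Triton.bind() above.
with RemoteTriton(url="grpc://triton-host:8001", workspace="/shared/pytriton") as triton:
    triton.bind(
        model_name="Identity",
        infer_func=_identity,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
    )
    triton.serve()
```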
__enter__
__enter__() -> RemoteTriton
Entering the context connects to remote Triton server.
Returns:
- RemoteTriton – A RemoteTriton object
__exit__
Exit the context stopping the process and cleaning the workspace.
Parameters:
- *_ – unused arguments
bind
bind(model_name: str, infer_func: Union[Callable, Sequence[Callable]], inputs: Sequence[Tensor], outputs: Sequence[Tensor], model_version: int = 1, config: Optional[ModelConfig] = None, strict: bool = False) -> None
Create a model with the given name and bind the inference callable into Triton Inference Server.
More information about model configuration: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md
Parameters:
- infer_func (Union[Callable, Sequence[Callable]]) – Inference callable (or sequence of callables) handling requests/responses from Triton Inference Server
- inputs (Sequence[Tensor]) – Definition of model inputs
- outputs (Sequence[Tensor]) – Definition of model outputs
- model_name (str) – Name under which the model is available in Triton Inference Server. It can only contain alphanumeric characters, dots, underscores and dashes.
- model_version (int, default: 1) – Version of the model
- config (Optional[ModelConfig], default: None) – Model configuration for Triton Inference Server deployment
- strict (bool, default: False) – Enable strict validation between model config outputs and inference function results
connect
Connect to Triton Inference Server.
Raises:
- TimeoutError – if Triton Inference Server is not ready after the timeout
is_connected
is_connected() -> bool
serve
serve(monitoring_period_s: float = MONITORING_PERIOD_S) -> None
Run Triton Inference Server and block the calling thread to serve requests/responses.
Parameters:
- monitoring_period_s (float, default: MONITORING_PERIOD_S) – interval, in seconds, at which Triton and the models are monitored. Every monitoring_period_s seconds the main thread wakes up, checks whether the Triton server and the proxy backend are still alive, and goes back to sleep. If either is not alive, the method returns.
stop
stop() -> bool
Stop Triton Inference Server and clean workspace.
Request
pytriton.proxy.types.Request
dataclass
Data class for request data, including numpy array inputs.
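A sketch of how an inference callable without a batching decorator might consume Request objects; it assumes Request supports mapping-style access to its numpy inputs, an assumption to verify against the fields of this dataclass:

```python
def infer_fn(requests):
    # Each element is a Request carrying numpy array inputs keyed by input name.
    responses = []
    for request in requests:
        a = request["a"]  # assumed mapping-style access to the request data
        b = request["b"]
        responses.append({"add": a + b, "sub": a - b})
    return responses
```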