Remote Mode
Remote mode is a way to use PyTriton with a Triton Inference Server running remotely (at the moment the server must be deployed on the same machine, but it may be launched in a different container).
To bind a model in remote mode, use the RemoteTriton class instead of Triton. The only difference when using RemoteTriton is that its constructor requires the Triton url argument.
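The difference is visible only at construction time; a minimal sketch (with localhost used as an assumed server address):

from pytriton.triton import RemoteTriton, Triton

triton = Triton()  # local mode: PyTriton starts and manages its own Triton server
remote_triton = RemoteTriton(url="localhost")  # remote mode: attaches to an already running server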
Example of binding a model in remote mode
The example below assumes that the Triton Inference Server is running on the same machine (launched with PyTriton in a separate Python script; a minimal sketch of such a server-side script follows the example).
RemoteTriton binds the model to an existing, already running Triton Inference Server. When the RemoteTriton instance is closed (here, when the with block exits), the model is unloaded from the server.
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import RemoteTriton, TritonConfig

# Response cache configuration (1MB local cache). The cache is configured when the
# Triton Inference Server is started, so in remote mode this config is applied by the
# server-side script that launches Triton (see the sketch below), not by RemoteTriton.
triton_config = TritonConfig(
    cache_config=[f"local,size={1024 * 1024}"],  # 1MB
)


@batch
def _add_sub(**inputs):
    a_batch, b_batch = inputs.values()
    add_batch = a_batch + b_batch
    sub_batch = a_batch - b_batch
    return {"add": add_batch, "sub": sub_batch}


# Connect to the already running Triton Inference Server and bind the model.
with RemoteTriton(url="localhost") as triton:
    triton.bind(
        model_name="AddSub",
        infer_func=_add_sub,
        inputs=[Tensor(shape=(1,), dtype=np.float32), Tensor(shape=(1,), dtype=np.float32)],
        outputs=[Tensor(shape=(1,), dtype=np.float32), Tensor(shape=(1,), dtype=np.float32)],
        config=ModelConfig(max_batch_size=8, response_cache=True),
    )
    # serve() blocks and keeps the model bound; leaving the context unloads the model.
    triton.serve()
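The server mentioned above has to be started beforehand. Below is a minimal sketch of such a server-side script; it is an illustrative assumption rather than part of the original example, and it reuses the same TritonConfig so that the response cache requested by ModelConfig(response_cache=True) is actually available on the server.

# Server-side script (sketch): launches the Triton Inference Server that the
# RemoteTriton example above connects to. No models are bound here; remote
# scripts attach their own models via RemoteTriton.
from pytriton.triton import Triton, TritonConfig

triton_config = TritonConfig(
    cache_config=[f"local,size={1024 * 1024}"],  # 1MB response cache
)

with Triton(config=triton_config) as triton:
    triton.serve()  # blocks and keeps the server running for remote bindings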
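Once the model is bound, it can be queried like any other model served by Triton. The following client sketch assumes the server exposes the default HTTP endpoint on localhost; inputs are passed positionally in the order of the bound input tensors.

import numpy as np

from pytriton.client import ModelClient

a_batch = np.array([[1.0], [2.0]], dtype=np.float32)
b_batch = np.array([[3.0], [4.0]], dtype=np.float32)

with ModelClient("localhost", "AddSub") as client:
    # infer_batch sends a full batch; outputs come back as a dict of numpy arrays.
    results = client.infer_batch(a_batch, b_batch)

for name, value in results.items():
    print(name, value)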