Remote Mode
Remote mode is a way to use PyTriton with a Triton Inference Server running remotely (at the moment the server must be deployed on the same machine, but it may be launched in a different container).
To bind a model in remote mode, use the RemoteTriton class instead of Triton. The only difference when using RemoteTriton is that its constructor requires the Triton url argument.
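The difference is visible only at construction time; a minimal sketch (with localhost used as an assumed server address):

from pytriton.triton import RemoteTriton, Triton

triton = Triton()  # local mode: PyTriton starts and manages its own Triton server
remote_triton = RemoteTriton(url="localhost")  # remote mode: attaches to an already running server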
Example of binding a model in remote mode
The example below assumes that the Triton Inference Server is running on the same machine (launched with PyTriton in a separate Python script; a minimal sketch of such a server-side script follows the example).
RemoteTriton binds the model to an existing, already running Triton Inference Server. When the RemoteTriton instance is closed (here, when the with block exits), the model is unloaded from the server.
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import RemoteTriton, TritonConfig

# Response cache configuration (1MB local cache). The cache is configured when the
# Triton Inference Server is started, so in remote mode this config is applied by the
# server-side script that launches Triton (see the sketch below), not by RemoteTriton.
triton_config = TritonConfig(
    cache_config=[f"local,size={1024 * 1024}"],  # 1MB
)


@batch
def _add_sub(**inputs):
    a_batch, b_batch = inputs.values()
    add_batch = a_batch + b_batch
    sub_batch = a_batch - b_batch
    return {"add": add_batch, "sub": sub_batch}


# Connect to the already running Triton Inference Server and bind the model.
with RemoteTriton(url="localhost") as triton:
    triton.bind(
        model_name="AddSub",
        infer_func=_add_sub,
        inputs=[Tensor(shape=(1,), dtype=np.float32), Tensor(shape=(1,), dtype=np.float32)],
        outputs=[Tensor(shape=(1,), dtype=np.float32), Tensor(shape=(1,), dtype=np.float32)],
        config=ModelConfig(max_batch_size=8, response_cache=True),
    )
    # serve() blocks and keeps the model bound; leaving the context unloads the model.
    triton.serve()
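The server mentioned above has to be started beforehand. Below is a minimal sketch of such a server-side script; it is an illustrative assumption rather than part of the original example, and it reuses the same TritonConfig so that the response cache requested by ModelConfig(response_cache=True) is actually available on the server.

# Server-side script (sketch): launches the Triton Inference Server that the
# RemoteTriton example above connects to. No models are bound here; remote
# scripts attach their own models via RemoteTriton.
from pytriton.triton import Triton, TritonConfig

triton_config = TritonConfig(
    cache_config=[f"local,size={1024 * 1024}"],  # 1MB response cache
)

with Triton(config=triton_config) as triton:
    triton.serve()  # blocks and keeps the server running for remote bindings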
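Once the model is bound, it can be queried like any other model served by Triton. The following client sketch assumes the server exposes the default HTTP endpoint on localhost; inputs are passed positionally in the order of the bound input tensors.

import numpy as np

from pytriton.client import ModelClient

a_batch = np.array([[1.0], [2.0]], dtype=np.float32)
b_batch = np.array([[3.0], [4.0]], dtype=np.float32)

with ModelClient("localhost", "AddSub") as client:
    # infer_batch sends a full batch; outputs come back as a dict of numpy arrays.
    results = client.infer_batch(a_batch, b_batch)

for name, value in results.items():
    print(name, value)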