There is no one-to-one match between our solution and Triton Inference Server features; in particular, a user-provided model store is not supported.
Support is currently limited to the x86-64 instruction set architecture.
Running multiple scripts that host PyTriton on the same machine or container is not supported; only a single hosting script can run at a time (see the first sketch below for serving several models from one script).
Deadlocks may occur in some models that use the NCCL communication library when multiple Inference Callables are triggered concurrently. This can happen when multiple instances of the same model, or multiple models, are deployed within a single server script (see the second sketch below). Additional information about this issue can be found here.
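Because only one PyTriton-hosting script can run per machine or container, a common pattern is to bind every model to the single Triton instance started by that script. A minimal sketch, assuming two illustrative models named `Double` and `Negate` (the model names, callables, and tensor shapes are placeholders, not part of the limitation above):

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def double_fn(**inputs):
    # Single unnamed input tensor; double it elementwise.
    (data,) = inputs.values()
    return [data * 2]


@batch
def negate_fn(**inputs):
    # Single unnamed input tensor; negate it elementwise.
    (data,) = inputs.values()
    return [-data]


with Triton() as triton:
    # Both models share the one Triton instance started by this script,
    # avoiding the need for a second PyTriton-hosting process.
    triton.bind(
        model_name="Double",
        infer_func=double_fn,
        inputs=[Tensor(dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=64),
    )
    triton.bind(
        model_name="Negate",
        infer_func=negate_fn,
        inputs=[Tensor(dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=64),
    )
    triton.serve()
```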
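The deadlock scenario above requires Inference Callables that can run at the same time. A hedged sketch of one such setup, assuming an illustrative model named `AddOne`: passing a list of callables to `bind` creates multiple instances of the same model, so requests may be executed by both callables concurrently, which is the situation where NCCL-based models can deadlock.

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(**inputs):
    # Single unnamed input tensor; add one elementwise.
    (data,) = inputs.values()
    return [data + 1]


with Triton() as triton:
    # Two instances of the same model; concurrent requests can be handled
    # by both Inference Callables at once.
    triton.bind(
        model_name="AddOne",
        infer_func=[infer_fn, infer_fn],
        inputs=[Tensor(dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=32),
    )
    triton.serve()
```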