TritonConfig
pytriton.triton.TritonConfig
dataclass
TritonConfig(
    model_repository: Optional[Path] = None,
    id: Optional[str] = None,
    log_verbose: Optional[int] = None,
    log_file: Optional[Path] = None,
    exit_timeout_secs: Optional[int] = None,
    exit_on_error: Optional[bool] = None,
    strict_readiness: Optional[bool] = None,
    allow_http: Optional[bool] = None,
    http_address: Optional[str] = None,
    http_port: Optional[int] = None,
    http_header_forward_pattern: Optional[str] = None,
    http_thread_count: Optional[int] = None,
    allow_grpc: Optional[bool] = None,
    grpc_address: Optional[str] = None,
    grpc_port: Optional[int] = None,
    grpc_header_forward_pattern: Optional[str] = None,
    grpc_infer_allocation_pool_size: Optional[int] = None,
    grpc_use_ssl: Optional[bool] = None,
    grpc_use_ssl_mutual: Optional[bool] = None,
    grpc_server_cert: Optional[Path] = None,
    grpc_server_key: Optional[Path] = None,
    grpc_root_cert: Optional[Path] = None,
    grpc_infer_response_compression_level: Optional[str] = None,
    grpc_keepalive_time: Optional[int] = None,
    grpc_keepalive_timeout: Optional[int] = None,
    grpc_keepalive_permit_without_calls: Optional[bool] = None,
    grpc_http2_max_pings_without_data: Optional[int] = None,
    grpc_http2_min_recv_ping_interval_without_data: Optional[int] = None,
    grpc_http2_max_ping_strikes: Optional[int] = None,
    allow_metrics: Optional[bool] = None,
    allow_gpu_metrics: Optional[bool] = None,
    allow_cpu_metrics: Optional[bool] = None,
    metrics_interval_ms: Optional[int] = None,
    metrics_port: Optional[int] = None,
    metrics_address: Optional[str] = None,
    allow_sagemaker: Optional[bool] = None,
    sagemaker_port: Optional[int] = None,
    sagemaker_safe_port_range: Optional[str] = None,
    sagemaker_thread_count: Optional[int] = None,
    allow_vertex_ai: Optional[bool] = None,
    vertex_ai_port: Optional[int] = None,
    vertex_ai_thread_count: Optional[int] = None,
    vertex_ai_default_model: Optional[str] = None,
    metrics_config: Optional[List[str]] = None,
    trace_config: Optional[List[str]] = None,
    cache_config: Optional[List[str]] = None,
    cache_directory: Optional[str] = None,
    buffer_manager_thread_count: Optional[int] = None,
)
Triton Inference Server configuration class for customizing server execution.
All arguments are optional. If a value is not provided, the Triton Inference Server default is used. Refer to https://github.com/triton-inference-server/server/ for more details.
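For instance, a minimal sketch of overriding the default ports and log verbosity and passing the result to a Triton instance (the model name, inference function, and tensor specs below are illustrative placeholders):

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig

# Customize the server before starting it; unset fields fall back to Triton defaults.
config = TritonConfig(http_port=8000, grpc_port=8001, log_verbose=1)


@batch
def infer_fn(input_1):
    # Placeholder inference function: doubles the input batch.
    return {"output_1": input_1 * 2}


with Triton(config=config) as triton:
    triton.bind(
        model_name="example_model",  # placeholder name
        infer_func=infer_fn,
        inputs=[Tensor(name="input_1", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output_1", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()  # blocks until the server is stopped
```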
Parameters:
- model_repository (Optional[Path], default: None) – Path to the model repository.
- id (Optional[str], default: None) – Identifier for this server.
- log_verbose (Optional[int], default: None) – Set verbose logging level. Zero (0) disables verbose logging and values >= 1 enable verbose logging.
- log_file (Optional[Path], default: None) – Set the name of the log output file.
- exit_timeout_secs (Optional[int], default: None) – Timeout (in seconds) when exiting to wait for in-flight inferences to finish.
- exit_on_error (Optional[bool], default: None) – Exit the inference server if an error occurs during initialization.
- strict_readiness (Optional[bool], default: None) – If true, the /v2/health/ready endpoint indicates ready if the server is responsive and all models are available.
- allow_http (Optional[bool], default: None) – Allow the server to listen for HTTP requests.
- http_address (Optional[str], default: None) – The address for the HTTP server to bind to. Default is 0.0.0.0.
- http_port (Optional[int], default: None) – The port for the server to listen on for HTTP requests. Default is 8000.
- http_header_forward_pattern (Optional[str], default: None) – The regular expression pattern used for forwarding HTTP headers as inference request parameters.
- http_thread_count (Optional[int], default: None) – Number of threads handling HTTP requests.
- allow_grpc (Optional[bool], default: None) – Allow the server to listen for GRPC requests.
- grpc_address (Optional[str], default: None) – The address for the GRPC server to bind to. Default is 0.0.0.0.
- grpc_port (Optional[int], default: None) – The port for the server to listen on for GRPC requests. Default is 8001.
- grpc_header_forward_pattern (Optional[str], default: None) – The regular expression pattern used for forwarding GRPC headers as inference request parameters.
- grpc_infer_allocation_pool_size (Optional[int], default: None) – The maximum number of inference request/response objects that remain allocated for reuse. As long as the number of in-flight requests doesn't exceed this value, there will be no allocation/deallocation of request/response objects.
- grpc_use_ssl (Optional[bool], default: None) – Use SSL authentication for GRPC requests. Default is false (see the first example after this list).
- grpc_use_ssl_mutual (Optional[bool], default: None) – Use mutual SSL authentication for GRPC requests. This option will preempt grpc_use_ssl if it is also specified. Default is false.
- grpc_server_cert (Optional[Path], default: None) – Path to file holding the PEM-encoded server certificate. Ignored unless grpc_use_ssl is true.
- grpc_server_key (Optional[Path], default: None) – Path to file holding the PEM-encoded server key. Ignored unless grpc_use_ssl is true.
- grpc_root_cert (Optional[Path], default: None) – Path to file holding the PEM-encoded root certificate. Ignored unless grpc_use_ssl is true.
- grpc_infer_response_compression_level (Optional[str], default: None) – The compression level to be used while returning the inference response to the peer. Allowed values are none, low, medium and high. Default is none.
- grpc_keepalive_time (Optional[int], default: None) – The period (in milliseconds) after which a keepalive ping is sent on the transport.
- grpc_keepalive_timeout (Optional[int], default: None) – The period (in milliseconds) the sender of the keepalive ping waits for an acknowledgement.
- grpc_keepalive_permit_without_calls (Optional[bool], default: None) – Allows keepalive pings to be sent even if there are no calls in flight.
- grpc_http2_max_pings_without_data (Optional[int], default: None) – The maximum number of pings that can be sent when there is no data/header frame to be sent.
- grpc_http2_min_recv_ping_interval_without_data (Optional[int], default: None) – If there are no data/header frames being sent on the transport, this channel argument on the server side controls the minimum time (in milliseconds) that gRPC Core would expect between receiving successive pings.
- grpc_http2_max_ping_strikes (Optional[int], default: None) – Maximum number of bad pings that the server will tolerate before sending an HTTP2 GOAWAY frame and closing the transport.
- grpc_restricted_protocol – Specify restricted GRPC protocol setting. The format of this flag is `<protocols>,<key>=<value>`, where `<protocols>` is a comma-separated list of protocols to be restricted, `<key>` is an additional header key to be checked when a GRPC request is received, and `<value>` is the value expected to be matched.
- allow_metrics (Optional[bool], default: None) – Allow the server to provide Prometheus metrics.
- allow_gpu_metrics (Optional[bool], default: None) – Allow the server to provide GPU metrics.
- allow_cpu_metrics (Optional[bool], default: None) – Allow the server to provide CPU metrics.
- metrics_interval_ms (Optional[int], default: None) – Metrics will be collected once every `<metrics-interval-ms>` milliseconds.
- metrics_port (Optional[int], default: None) – The port reporting Prometheus metrics.
- metrics_address (Optional[str], default: None) – The address for the metrics server to bind to. Default is the same as http_address.
- allow_sagemaker (Optional[bool], default: None) – Allow the server to listen for SageMaker requests.
- sagemaker_port (Optional[int], default: None) – The port for the server to listen on for SageMaker requests.
- sagemaker_safe_port_range (Optional[str], default: None) – Set the allowed port range for endpoints other than the SageMaker endpoints.
- sagemaker_thread_count (Optional[int], default: None) – Number of threads handling SageMaker requests.
- allow_vertex_ai (Optional[bool], default: None) – Allow the server to listen for Vertex AI requests.
- vertex_ai_port (Optional[int], default: None) – The port for the server to listen on for Vertex AI requests.
- vertex_ai_thread_count (Optional[int], default: None) – Number of threads handling Vertex AI requests.
- vertex_ai_default_model (Optional[str], default: None) – The name of the model to use for single-model inference requests.
- metrics_config (Optional[List[str]], default: None) – Specify a metrics-specific configuration setting. The format of this flag is `<setting>=<value>`. It can be specified multiple times.
- trace_config (Optional[List[str]], default: None) – Specify a global or trace-mode-specific configuration setting. The format of this flag is `<mode>,<setting>=<value>`, where `<mode>` is either 'triton' or 'opentelemetry'; the default is 'triton'. To specify global trace settings (level, rate, count, or mode), the format is `<setting>=<value>`. For 'triton' mode, the server will use Triton's Trace APIs. For 'opentelemetry' mode, the server will use OpenTelemetry's APIs to generate, collect and export traces for individual inference requests. More details, including supported settings, can be found in the Triton trace guide.
- cache_config (Optional[List[str]], default: None) – Specify a cache-specific configuration setting. The format of this flag is `<cache_name>,<setting>=<value>`, where `<cache_name>` is the name of the cache, such as 'local' or 'redis'. For example, `local,size=1048576` configures a 'local' cache implementation with a fixed buffer pool of size 1048576 bytes (see the second example after this list).
- cache_directory (Optional[str], default: None) – The global directory searched for cache shared libraries. Default is '/opt/tritonserver/caches'. This directory is expected to contain a cache implementation as a shared library with the name 'libtritoncache.so'.
- buffer_manager_thread_count (Optional[int], default: None) – The number of threads used to accelerate copies and other operations required to manage input and output tensor contents.
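The gRPC SSL options above work together. A sketch of enabling SSL on the GRPC endpoint (the certificate and key paths are placeholders):

```python
from pathlib import Path

from pytriton.triton import TritonConfig

# grpc_server_cert and grpc_server_key are ignored unless grpc_use_ssl is true.
ssl_config = TritonConfig(
    grpc_use_ssl=True,
    grpc_server_cert=Path("/path/to/server.crt"),  # placeholder path
    grpc_server_key=Path("/path/to/server.key"),   # placeholder path
)
```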
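The three list-valued settings (metrics_config, trace_config, cache_config) follow the flag formats described above and may contain multiple entries. A sketch reusing the cache example from the list; the individual setting keys shown for trace and metrics are illustrative and should be checked against the Triton server documentation:

```python
from pytriton.triton import TritonConfig

config = TritonConfig(
    # <cache_name>,<setting>=<value>: 'local' cache with a 1048576-byte buffer pool
    cache_config=["local,size=1048576"],
    # global trace settings use <setting>=<value> (level, rate, count, or mode)
    trace_config=["mode=triton", "rate=100"],
    # metrics settings use <setting>=<value>; the key shown is illustrative
    metrics_config=["summary_latencies=true"],
)
```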
__post_init__
Validate configuration for early error handling.
from_dict
classmethod
from_dict(config: Dict[str, Any]) -> TritonConfig
Creates a TritonConfig instance from an input dictionary. Values are converted into the correct types.
Parameters:
- config (Dict[str, Any]) – dictionary with configuration fields and values.
Returns:
- TritonConfig – a TritonConfig instance.
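A short sketch of building a configuration from a dictionary; the assertion assumes the string-to-int coercion described above:

```python
from pytriton.triton import TritonConfig

config = TritonConfig.from_dict({"http_port": "8000", "log_verbose": 1})
assert config.http_port == 8000  # "8000" converted to int per the field type
```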
from_env
classmethod
from_env() -> TritonConfig
Creates TritonConfig from environment variables.
Environment variables should start with the PYTRITON_TRITON_CONFIG_ prefix. For example:
PYTRITON_TRITON_CONFIG_GRPC_PORT=45436
PYTRITON_TRITON_CONFIG_LOG_VERBOSE=4
Typical use:
triton_config = TritonConfig.from_env()
Returns:
- TritonConfig – TritonConfig class instantiated from environment variables.
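A sketch of the environment-variable flow end to end (the variables are set from Python here only for illustration; in practice they would come from the process environment):

```python
import os

from pytriton.triton import TritonConfig

os.environ["PYTRITON_TRITON_CONFIG_GRPC_PORT"] = "45436"
os.environ["PYTRITON_TRITON_CONFIG_LOG_VERBOSE"] = "4"

triton_config = TritonConfig.from_env()
assert triton_config.grpc_port == 45436  # parsed from the environment as int
```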