Changelog

0.2.4 (2023-08-10)

  • new: Introduced a strict flag in Triton.bind that enables validation of the data types and shapes of inference callable outputs against the model config
  • new: Added AsyncioModelClient, which works with FastAPI and other async frameworks
  • fix: FuturesModelClient no longer raises gevent.exceptions.InvalidThreadUseError
  • fix: TimeoutError is no longer thrown when the server cannot be reached during model verification

  • Version of Triton Inference Server embedded in wheel: 2.33.0
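
The kind of check the strict flag in 0.2.4 refers to can be sketched in plain Python. This is only an illustration of validating output data types and shapes against a declared specification, not PyTriton's implementation: OutputSpec, FakeArray, and validate_outputs are hypothetical names, and treating -1 as "any size" and ignoring a batch dimension are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class OutputSpec:
    """Hypothetical description of one declared model output."""

    dtype: str
    shape: Tuple[int, ...]  # -1 marks a dimension of any size (assumption of this sketch)


@dataclass(frozen=True)
class FakeArray:
    """Stand-in for an array object that exposes dtype and shape."""

    dtype: str
    shape: Tuple[int, ...]


def validate_outputs(outputs: Dict[str, FakeArray], specs: Dict[str, OutputSpec]) -> None:
    """Raise ValueError if an output is undeclared or disagrees with its spec."""
    for name, array in outputs.items():
        if name not in specs:
            raise ValueError(f"unexpected output {name!r}")
        spec = specs[name]
        if array.dtype != spec.dtype:
            raise ValueError(f"{name}: dtype {array.dtype} != {spec.dtype}")
        same_rank = len(array.shape) == len(spec.shape)
        # Each declared dimension must be -1 (wildcard) or equal to the actual size.
        dims_match = same_rank and all(
            expected in (-1, actual) for actual, expected in zip(array.shape, spec.shape)
        )
        if not dims_match:
            raise ValueError(f"{name}: shape {array.shape} does not match {spec.shape}")


specs = {"probabilities": OutputSpec(dtype="float32", shape=(-1, 10))}
validate_outputs({"probabilities": FakeArray("float32", (4, 10))}, specs)  # passes silently

try:
    validate_outputs({"probabilities": FakeArray("int64", (4, 10))}, specs)
except ValueError as error:
    print(f"rejected: {error}")
```

Failing fast on a mismatched output, as in this sketch, surfaces a misconfigured model at the point where the inference callable returns instead of later on the client side.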

0.2.3 (2023-07-21)

  • Improved verification of the Proxy Backend environment when running under the same Python interpreter
  • Fixed pytriton.version to report the currently installed version

  • Version of Triton Inference Server embedded in wheel: 2.33.0

0.2.2 (2023-07-19)

  • Added the inference_timeout_s parameter to the client classes
  • Renamed PyTritonClientUrlParseError to PyTritonClientInvalidUrlError
  • ModelClient and FuturesModelClient methods raise PyTritonClientClosedError when used after the client is closed
  • Pinned the tritonclient dependency due to issues with tritonclient >= 2.34 on systems with a glibc version lower than 2.34
  • Added a warning, shown after Triton Server setup and teardown, when the logging level is too verbose, as this may cause a significant performance drop in model inference

  • Version of Triton Inference Server embedded in wheel: 2.33.0

0.2.1 (2023-06-28)

  • Fixed handling of the TritonConfig.cache_directory option; the directory was always overwritten with the default value.
  • Fixed the tritonclient dependency: PyTriton needs a tritonclient version that supports HTTP headers and parameters.
  • Improved shared memory usage to fit within the 64 MB limit (the default for Docker and Kubernetes) by reducing the initial size for the PyTriton Proxy Backend.

  • Version of Triton Inference Server embedded in wheel: 2.33.0

0.2.0 (2023-05-30)

  • Added support for using custom HTTP/gRPC request headers and parameters.

    This change breaks backward compatibility of the inference function signature. The undecorated inference function now accepts a list of Request instances instead of a list of dictionaries. The Request class has two fields: data, which holds the inputs, and parameters, which holds the combined parameters and headers.

    See docs/custom_params.md for further information.

  • Added FuturesModelClient, which enables sending inference requests in parallel.
  • Added display of a documentation link after models are loaded.

  • Version of Triton Inference Server embedded in wheel: 2.33.0
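
The signature change in 0.2.0 can be illustrated with a minimal sketch. The Request class below is a stand-in dataclass written for this example, not the PyTriton class itself; it mirrors only the two fields the entry names (data and parameters). The input name values, the scale parameter, the function name infer_fn, and the choice to return one result dictionary per request are all assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Request:
    """Stand-in for illustration only; not the PyTriton Request class."""

    data: Dict[str, Any]  # input name -> input value
    parameters: Dict[str, Any] = field(default_factory=dict)  # parameters and headers combined


def infer_fn(requests: List[Request]) -> List[Dict[str, Any]]:
    """Undecorated inference function: takes Request objects, not plain dictionaries."""
    results = []
    for request in requests:
        values = request.data["values"]  # hypothetical input name
        scale = request.parameters.get("scale", 1)  # hypothetical custom parameter
        results.append({"scaled": [v * scale for v in values]})
    return results


responses = infer_fn(
    [
        Request(data={"values": [1, 2, 3]}, parameters={"scale": 2}),
        Request(data={"values": [4, 5]}),  # no parameters sent; falls back to scale=1
    ]
)
print(responses)
```

Code written for the earlier signature would index each request directly as a dictionary; under the new signature the same lookup goes through request.data, while per-request options arrive separately in request.parameters.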

0.1.5 (2023-05-12)

  • Improved the pytriton.decorators.group_by_values function:
      • Modified the function to avoid calling the inference callable on each individual sample when grouping by string/bytes input
      • Added a pad_fn argument for easy padding and combining of the inference results
  • Fixed Triton binaries search
  • Improved Workspace management (the workspace is removed on shutdown)

  • Version of external components used during testing:
      • Triton Inference Server: 2.29.0
      • Other component versions depend on the framework used and on the Triton Inference Server container versions. Refer to the Triton Inference Server support matrix for a detailed summary.

0.1.4 (2023-03-16)

  • Added validation of the model name passed to the Triton.bind method.
  • Added monkey patching of the InferenceServerClient.__del__ method to prevent unhandled exceptions.

  • Version of external components used during testing:
      • Triton Inference Server: 2.29.0
      • Other component versions depend on the framework used and on the Triton Inference Server container versions. Refer to the Triton Inference Server support matrix for a detailed summary.

0.1.3 (2023-02-20)

  • Fixed retrieval of the model config in the fill_optionals decorator.

  • Version of external components used during testing:
      • Triton Inference Server: 2.29.0
      • Other component versions depend on the framework used and on the Triton Inference Server container versions. Refer to the Triton Inference Server support matrix for a detailed summary.

0.1.2 (2023-02-14)

  • Fixed wheel build to support installations on operating systems with glibc version 2.31 or higher.
  • Updated the documentation on custom builds of the package.
  • Change: The TritonContext instance is shared across bound models and contains a model_configs dictionary.
  • Fixed support for binding multiple models that use methods of the same class.

  • Version of external components used during testing:
      • Triton Inference Server: 2.29.0
      • Other component versions depend on the framework used and on the Triton Inference Server container versions. Refer to the Triton Inference Server support matrix for a detailed summary.

0.1.1 (2023-01-31)

  • Change: The @first_value decorator has been updated with new features:
      • Renamed from @first_values to @first_value
      • Added a strict flag to toggle the checking of equality of values on a single selected input of the request. Default is True.
      • Added a squeeze_single_values flag to toggle the squeezing of single-value ND arrays to scalars. Default is True.
  • Fix: @fill_optionals now supports non-batching models
  • Fix: @first_value now works with optional inputs
  • Fix: @group_by_values now works with string inputs
  • Fix: @group_by_values now works sample-wise

  • Version of external components used during testing:
      • Triton Inference Server: 2.29.0
      • Other component versions depend on the framework used and on the Triton Inference Server container versions. Refer to the Triton Inference Server support matrix for a detailed summary.

0.1.0 (2023-01-12)

  • Initial release of PyTriton

  • Version of external components used during testing:
      • Triton Inference Server: 2.29.0
      • Other component versions depend on the framework used and on the Triton Inference Server container versions. Refer to the Triton Inference Server support matrix for a detailed summary.