Profile

model_navigator.package.profile

profile(package, dataloader=None, target_formats=None, target_device=DeviceKind.CUDA, runners=None, max_batch_size=None, batch_sizes=None, window_size=DEFAULT_WINDOW_SIZE, stability_percentage=DEFAULT_STABILITY_PERCENTAGE, stabilization_windows=DEFAULT_STABILIZATION_WINDOWS, min_trials=DEFAULT_MIN_TRIALS, max_trials=DEFAULT_MAX_TRIALS, throughput_cutoff_threshold=DEFAULT_THROUGHPUT_CUTOFF_THRESHOLD, throughput_backoff_limit=DEFAULT_THROUGHPUT_BACKOFF_LIMIT, verbose=False)

Profile provided package.

When a dataloader is provided, all samples obtained from the dataloader are used for each batch size during profiling. The profiling results report the min, max and average values per batch size across all samples.

When no dataloader is provided, the profiling sample stored in the package is used.

Parameters:

  • package (Package) –

    Package to profile.

  • dataloader (Optional[SizedDataLoader], default: None ) –

    Sized iterable with data that will be fed to the model.

  • target_formats (Optional[Tuple[Union[str, Format], ...]], default: None ) –

    Formats to profile. Defaults to target formats from the package.

  • target_device (Optional[DeviceKind], default: CUDA ) –

    Target device to run profiling on.

  • runners (Optional[Tuple[Union[str, Type[NavigatorRunner]], ...]], default: None ) –

    Runners to run profiling on. Defaults to runners from the package.

  • max_batch_size (Optional[int], default: None ) –

    Maximum batch size used for profiling.

  • batch_sizes (Optional[List[int]], default: None ) –

    List of batch sizes to profile.

  • window_size (int, default: DEFAULT_WINDOW_SIZE ) –

    Number of inference queries performed in a single measurement window.

  • stability_percentage (float, default: DEFAULT_STABILITY_PERCENTAGE ) –

    Allowed percentage of variation from the mean across the consecutive stabilization windows; an illustrative sketch follows the source listing below.

  • stabilization_windows (int, default: DEFAULT_STABILIZATION_WINDOWS ) –

    Number of consecutive windows selected for stabilization.

  • min_trials (int, default: DEFAULT_MIN_TRIALS ) –

    Minimum number of window trials.

  • max_trials (int, default: DEFAULT_MAX_TRIALS ) –

    Maximum number of window trials.

  • throughput_cutoff_threshold (float, default: DEFAULT_THROUGHPUT_CUTOFF_THRESHOLD ) –

    Minimum throughput increase to continue profiling.

  • throughput_backoff_limit (int, default: DEFAULT_THROUGHPUT_BACKOFF_LIMIT ) –

    Back-off limit allowing additional profiling steps, so profiling does not stop at a local optimum when throughput saturates according to throughput_cutoff_threshold.

  • verbose (bool, default: False ) –

    If True, enables verbose logging. Defaults to False.

Returns:

  • ProfilingResults –

    Profiling results.

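For orientation, a minimal usage sketch follows. It assumes a package previously saved to disk and loaded with nav.package.load; the path, the synthetic dataloader, and the input shapes are placeholders, not values prescribed by this API.

import numpy as np
import model_navigator as nav

# Load a previously saved package; "linear.nav" is a placeholder path.
package = nav.package.load("linear.nav")

# Hypothetical dataloader: a sized iterable of samples fed to the model.
# Shapes and dtype are illustrative and must match the profiled model.
dataloader = [np.random.rand(1, 8).astype(np.float32) for _ in range(10)]

profiling_results = nav.package.profile(
    package,
    dataloader=dataloader,
    batch_sizes=[1, 8, 32],
    verbose=True,
)
profiling_results.to_file("profiling_results.yaml")
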
Source code in model_navigator/package/__init__.py
def profile(
    package: Package,
    dataloader: Optional[SizedDataLoader] = None,
    target_formats: Optional[Tuple[Union[str, Format], ...]] = None,
    target_device: Optional[DeviceKind] = DeviceKind.CUDA,
    runners: Optional[Tuple[Union[str, Type[NavigatorRunner]], ...]] = None,
    max_batch_size: Optional[int] = None,
    batch_sizes: Optional[List[int]] = None,
    window_size: int = DEFAULT_WINDOW_SIZE,
    stability_percentage: float = DEFAULT_STABILITY_PERCENTAGE,
    stabilization_windows: int = DEFAULT_STABILIZATION_WINDOWS,
    min_trials: int = DEFAULT_MIN_TRIALS,
    max_trials: int = DEFAULT_MAX_TRIALS,
    throughput_cutoff_threshold: float = DEFAULT_THROUGHPUT_CUTOFF_THRESHOLD,
    throughput_backoff_limit: int = DEFAULT_THROUGHPUT_BACKOFF_LIMIT,
    verbose: bool = False,
) -> ProfilingResults:
    """Profile provided package.

    When `dataloader` is provided, all samples obtained from the dataloader are used for each batch size during profiling.
    The profiling results report the min, max and average values per batch size across all samples.

    When no `dataloader` is provided, the profiling sample from the package is used.

    Args:
        package: Package to profile.
        dataloader: Sized iterable with data that will be fed to the model
        target_formats: Formats to profile. Defaults to target formats from the package.
        target_device: Target device to run profiling on.
        runners: Runners to run profiling on. Defaults to runners from the package.
        max_batch_size: Maximal batch size used for profiling. Default: None
        batch_sizes: List of batch sizes to profile. Default: None
        window_size: Number of inference queries performed in measurement window
        stability_percentage: Allowed percentage of variation from the mean in three consecutive windows.
        stabilization_windows: Number of consecutive windows selected for stabilization.
        min_trials: Minimal number of window trials.
        max_trials: Maximum number of window trials.
        throughput_cutoff_threshold: Minimum throughput increase to continue profiling.
        throughput_backoff_limit: Back-off limit allowing additional profiling steps, so profiling does not stop at
                                  a local optimum when throughput saturates according to `throughput_cutoff_threshold`.
        verbose: If True enable verbose logging. Defaults to False.

    Returns:
        Profiling results
    """
    if package.is_empty() and package.model is None:
        raise ModelNavigatorEmptyPackageError(
            "Package is empty and source model is not loaded. Unable to run optimize."
        )

    config = package.config
    is_source_available = package.model is not None

    if target_formats is None:
        target_formats = get_target_formats(framework=package.framework, is_source_available=is_source_available)

    if runners is None:
        runners = default_runners(device_kind=target_device)

    if dataloader is None:
        dataloader = []

    optimization_profile = OptimizationProfile(
        max_batch_size=max_batch_size,
        batch_sizes=batch_sizes,
        window_size=window_size,
        stability_percentage=stability_percentage,
        stabilization_windows=stabilization_windows,
        min_trials=min_trials,
        max_trials=max_trials,
        throughput_cutoff_threshold=throughput_cutoff_threshold,
        throughput_backoff_limit=throughput_backoff_limit,
    )

    _update_config(
        config=config,
        dataloader=dataloader,
        is_source_available=is_source_available,
        target_formats=target_formats,
        runners=runners,
        optimization_profile=optimization_profile,
        verbose=verbose,
        target_device=target_device,
    )

    builders = [
        preprocessing_builder,
        profiling_builder,
    ]

    model_configs = _get_model_configs(
        config=config,
        custom_configs=[],
    )
    profiling_results = profile_pipeline(
        package=package,
        config=config,
        builders=builders,
        models_config=model_configs,
    )

    return profiling_results
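
For intuition, the window-related parameters above can be read together as follows. The sketch below is illustrative only and does not reproduce the library's actual implementation; it merely restates how window_size, stability_percentage and stabilization_windows are described in the parameter list.

# Illustrative sketch (not the library's implementation): a measurement is
# treated as stable once `stabilization_windows` consecutive windows (each of
# `window_size` queries) all stay within `stability_percentage` of their mean;
# windows keep being added between `min_trials` and `max_trials` until then.
def is_stable(window_latencies, stability_percentage, stabilization_windows):
    if len(window_latencies) < stabilization_windows:
        return False
    recent = window_latencies[-stabilization_windows:]
    mean = sum(recent) / len(recent)
    return all(abs(value - mean) / mean * 100.0 <= stability_percentage for value in recent)

Similarly, throughput_cutoff_threshold and throughput_backoff_limit govern when profiling over increasing batch sizes stops: once throughput no longer improves by at least the cutoff threshold, up to throughput_backoff_limit additional steps are still tried before stopping, so a temporary dip is not mistaken for saturation.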

model_navigator.package.ProfilingResults dataclass

ProfilingResults(models, samples_data)

Profiling results for models.

Parameters:

to_dict

to_dict()

Return results in the form of a dictionary.

Source code in model_navigator/package/profiling_results.py
def to_dict(self):
    """Return results in form of dictionary."""
    return dataclass2dict(self)

to_file

to_file(path)

Save results to file.

Parameters:

  • path (Union[str, Path]) –

    A path to the YAML file.

Source code in model_navigator/package/profiling_results.py
def to_file(self, path: Union[str, pathlib.Path]):
    """Save results to file.

    Args:
        path: A path to yaml files
    """
    path = pathlib.Path(path)
    data = self.to_dict()
    with path.open("w") as f:
        yaml.safe_dump(data, f, sort_keys=False)

model_navigator.package.profiling_results.RunnerResults dataclass

RunnerResults(runners)

Results for runners.

Parameters:

model_navigator.package.profiling_results.RunnerProfilingResults dataclass

RunnerProfilingResults(status, detailed)

Profiling results for a runner.

Parameters:

  • status (CommandStatus) –

    Status of profiling execution

  • detailed (Dict[int, List[ProfilingResult]]) –

    Mapping of profiling results per sample id.

model_navigator.package.profiling_results.ProfilingResult dataclass

ProfilingResult(batch_size, avg_latency, std_latency, p50_latency, p90_latency, p95_latency, p99_latency, throughput, avg_gpu_clock, request_count)

Result of a single profiling run for a sample.

Parameters:

  • batch_size (int) –

    Size of batch used for profiling

  • avg_latency (float) –

    Average latency of profiling

  • std_latency (float) –

    Standard deviation of profiled latency

  • p50_latency (float) –

    50th percentile of measured latency

  • p90_latency (float) –

    90th percentile of measured latency

  • p95_latency (float) –

    95th percentile of measured latency

  • p99_latency (float) –

    99th percentile of measured latency

  • throughput (float) –

    Inferences per second

  • avg_gpu_clock (float) –

    Average GPU clock during profiling

  • request_count (int) –

    Number of inference requests
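
Putting the dataclasses together, results can be inspected by walking the nested structure. The sketch below assumes that models and runners behave like dict-style mappings keyed by model and runner name, which is not stated explicitly in this reference; only fields documented above are accessed.

# Sketch: walk ProfilingResults -> RunnerResults -> RunnerProfilingResults -> ProfilingResult.
# Assumption: `models` and `runners` are dict-like mappings keyed by model/runner name.
for model_key, runner_results in profiling_results.models.items():
    for runner_name, runner_profiling in runner_results.runners.items():
        for sample_id, sample_results in runner_profiling.detailed.items():
            for result in sample_results:
                print(
                    model_key,
                    runner_name,
                    sample_id,
                    result.batch_size,
                    f"{result.throughput:.1f} infer/s",
                    f"p99={result.p99_latency:.2f}",
                )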