Profile

model_navigator.package.profile

profile(package, dataloader=None, target_formats=None, target_device=DeviceKind.CUDA, runners=None, max_batch_size=None, batch_sizes=None, window_size=DEFAULT_WINDOW_SIZE, stability_percentage=DEFAULT_STABILITY_PERCENTAGE, stabilization_windows=DEFAULT_STABILIZATION_WINDOWS, min_trials=DEFAULT_MIN_TRIALS, max_trials=DEFAULT_MAX_TRIALS, throughput_cutoff_threshold=DEFAULT_THROUGHPUT_CUTOFF_THRESHOLD, throughput_backoff_limit=DEFAULT_THROUGHPUT_BACKOFF_LIMIT, verbose=False)

Profile provided package.

When a dataloader is provided, all samples obtained from it are used to profile each batch size. The profiling result reports the minimum, maximum, and average values per batch size across all samples.

When no dataloader is provided, the profiling sample stored in the package is used.
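The per-batch-size aggregation described above can be pictured with a small sketch. This is not the library's implementation: `summarize_by_batch_size` is a hypothetical helper and the latency values are made-up placeholders, used only to show how min, max, and average could be reported per batch size when several samples are profiled.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_batch_size(measurements):
    """Group (batch_size, latency) pairs and report min/max/avg per batch size.

    Hypothetical helper for illustration; not part of model_navigator.
    """
    grouped = defaultdict(list)
    for batch_size, latency in measurements:
        grouped[batch_size].append(latency)
    return {
        batch_size: {"min": min(values), "max": max(values), "avg": mean(values)}
        for batch_size, values in sorted(grouped.items())
    }


# Placeholder latencies for three samples profiled at two batch sizes.
measurements = [(1, 2.0), (1, 3.0), (1, 4.0), (8, 10.0), (8, 14.0), (8, 12.0)]
summary = summarize_by_batch_size(measurements)
```

With one sample only (the no-dataloader case), min, max, and average collapse to the same value for each batch size.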

Parameters:

package (Package) –

Package to profile.
dataloader (Optional[SizedDataLoader], default: None ) –

Sized iterable with data that will be fed to the model.
target_formats (Optional[Tuple[Union[str, Format], ...]], default: None ) –

Formats to profile. Defaults to target formats from the package.
target_device (Optional[DeviceKind], default: CUDA ) –

Target device to run profiling on.
runners (Optional[Tuple[Union[str, Type[NavigatorRunner]], ...]], default: None ) –

Runners to run profiling on. Defaults to runners from the package.
max_batch_size (Optional[int], default: None ) –

Maximal batch size used for profiling. Default: None
batch_sizes (Optional[List[int]], default: None ) –

List of batch sizes to profile. Default: None
window_size (int, default: DEFAULT_WINDOW_SIZE ) –

Number of inference queries performed in a measurement window.
stability_percentage (float, default: DEFAULT_STABILITY_PERCENTAGE ) –

Allowed percentage of variation from the mean in three consecutive windows.
stabilization_windows (int, default: DEFAULT_STABILIZATION_WINDOWS ) –

Number of consecutive windows selected for stabilization.
min_trials (int, default: DEFAULT_MIN_TRIALS ) –

Minimal number of window trials.
max_trials (int, default: DEFAULT_MAX_TRIALS ) –

Maximum number of window trials.
throughput_cutoff_threshold (float, default: DEFAULT_THROUGHPUT_CUTOFF_THRESHOLD ) –

Minimum throughput increase to continue profiling.
throughput_backoff_limit (int, default: DEFAULT_THROUGHPUT_BACKOFF_LIMIT ) –

Back-off limit: the number of additional profiling steps to run after throughput saturates (as determined by throughput_cutoff_threshold), to avoid stopping at a local minimum.
verbose (bool, default: False ) –

If True, enable verbose logging. Defaults to False.

Returns:

ProfilingResults –

Profiling results
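To make the interplay of `stability_percentage`, `stabilization_windows`, `min_trials`, and `max_trials` more concrete, here is a minimal sketch of a window-based stability check. It is an assumption about how such a criterion can be computed, not the library's code: `is_stable` and `run_until_stable` are hypothetical names, and each window is reduced to a single average latency for simplicity.

```python
from statistics import mean


def is_stable(window_latencies, stability_percentage, stabilization_windows):
    """Check whether the most recent windows all lie close to their mean.

    Assumed rule: every one of the last `stabilization_windows` values must be
    within `stability_percentage` percent of the mean of those values.
    """
    if len(window_latencies) < stabilization_windows:
        return False
    recent = window_latencies[-stabilization_windows:]
    center = mean(recent)
    if center == 0:
        return False
    tolerance = center * stability_percentage / 100.0
    return all(abs(value - center) <= tolerance for value in recent)


def run_until_stable(measure_window, stability_percentage, stabilization_windows,
                     min_trials, max_trials):
    """Measure windows until the results are stable or `max_trials` is reached."""
    history = []
    for trial in range(1, max_trials + 1):
        history.append(measure_window())
        if trial >= min_trials and is_stable(
            history, stability_percentage, stabilization_windows
        ):
            return history, True
    return history, False


# Placeholder per-window average latencies; the first window is a warm-up outlier.
fake_windows = iter([5.0, 10.0, 10.2, 9.9, 10.0])
history, stable = run_until_stable(
    lambda: next(fake_windows),
    stability_percentage=10.0,
    stabilization_windows=3,
    min_trials=3,
    max_trials=5,
)
```

In this sketch the third trial is rejected because the warm-up outlier is still inside the last three windows; the fourth trial passes once it has dropped out.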

Source code in model_navigator/package/__init__.py

def profile(
    package: Package,
    dataloader: Optional[SizedDataLoader] = None,
    target_formats: Optional[Tuple[Union[str, Format], ...]] = None,
    target_device: Optional[DeviceKind] = DeviceKind.CUDA,
    runners: Optional[Tuple[Union[str, Type[NavigatorRunner]], ...]] = None,
    max_batch_size: Optional[int] = None,
    batch_sizes: Optional[List[int]] = None,
    window_size: int = DEFAULT_WINDOW_SIZE,
    stability_percentage: float = DEFAULT_STABILITY_PERCENTAGE,
    stabilization_windows: int = DEFAULT_STABILIZATION_WINDOWS,
    min_trials: int = DEFAULT_MIN_TRIALS,
    max_trials: int = DEFAULT_MAX_TRIALS,
    throughput_cutoff_threshold: float = DEFAULT_THROUGHPUT_CUTOFF_THRESHOLD,
    throughput_backoff_limit: int = DEFAULT_THROUGHPUT_BACKOFF_LIMIT,
    verbose: bool = False,
) -> ProfilingResults:
    """Profile provided package.

    When `dataloader` is provided, all samples obtained from it are used to profile each batch size.
    The profiling result reports the min, max and average values per batch size across all samples.

    When no `dataloader` is provided, the profiling sample from the package is used.

    Args:
        package: Package to profile.
        dataloader: Sized iterable with data that will be fed to the model
        target_formats: Formats to profile. Defaults to target formats from the package.
        target_device: Target device to run profiling on.
        runners: Runners to run profiling on. Defaults to runners from the package.
        max_batch_size: Maximal batch size used for profiling. Default: None
        batch_sizes: List of batch sizes to profile. Default: None
        window_size: Number of inference queries performed in measurement window
        stability_percentage: Allowed percentage of variation from the mean in three consecutive windows.
        stabilization_windows: Number of consecutive windows selected for stabilization.
        min_trials: Minimal number of window trials.
        max_trials: Maximum number of window trials.
        throughput_cutoff_threshold: Minimum throughput increase to continue profiling.
        throughput_backoff_limit: Back-off limit: number of additional profiling steps to run after throughput
                                  saturates (per `throughput_cutoff_threshold`) to avoid stopping at a local minimum.
        verbose: If True enable verbose logging. Defaults to False.

    Returns:
        Profiling results
    """
    if package.is_empty() and package.model is None:
        raise ModelNavigatorEmptyPackageError(
            "Package is empty and source model is not loaded. Unable to run optimize."
        )

    config = package.config
    is_source_available = package.model is not None

    if target_formats is None:
        target_formats = get_target_formats(framework=package.framework, is_source_available=is_source_available)

    if runners is None:
        runners = default_runners(device_kind=target_device)

    if dataloader is None:
        dataloader = []

    optimization_profile = OptimizationProfile(
        max_batch_size=max_batch_size,
        batch_sizes=batch_sizes,
        window_size=window_size,
        stability_percentage=stability_percentage,
        stabilization_windows=stabilization_windows,
        min_trials=min_trials,
        max_trials=max_trials,
        throughput_cutoff_threshold=throughput_cutoff_threshold,
        throughput_backoff_limit=throughput_backoff_limit,
    )

    _update_config(
        config=config,
        dataloader=dataloader,
        is_source_available=is_source_available,
        target_formats=target_formats,
        runners=runners,
        optimization_profile=optimization_profile,
        verbose=verbose,
        target_device=target_device,
    )

    builders = [
        preprocessing_builder,
        profiling_builder,
    ]

    model_configs = _get_model_configs(
        config=config,
        custom_configs=[],
    )
    profiling_results = profile_pipeline(
        package=package,
        config=config,
        builders=builders,
        models_config=model_configs,
    )

    return profiling_results
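The `throughput_cutoff_threshold` and `throughput_backoff_limit` parameters passed into `OptimizationProfile` above describe an early-stopping rule for the batch-size sweep. The sketch below shows one plausible reading of that rule and is not taken from the library: `sweep_until_saturated` is a hypothetical function, the threshold is assumed to be a fractional gain over the best throughput seen so far, and the back-off limit is assumed to count consecutive steps without sufficient gain.

```python
def sweep_until_saturated(measure_throughput, batch_sizes, cutoff_threshold, backoff_limit):
    """Profile increasing batch sizes and stop once throughput has saturated.

    Assumed rule: a step "stalls" when throughput is less than
    best * (1 + cutoff_threshold). The sweep stops when more than
    `backoff_limit` consecutive steps stall.
    """
    results = []
    best = None
    stalled_steps = 0
    for batch_size in batch_sizes:
        throughput = measure_throughput(batch_size)
        results.append((batch_size, throughput))
        if best is not None and throughput < best * (1.0 + cutoff_threshold):
            stalled_steps += 1
            if stalled_steps > backoff_limit:
                break
        else:
            stalled_steps = 0
        best = throughput if best is None else max(best, throughput)
    return results


# Placeholder throughput curve that flattens after batch size 4.
fake_curve = {1: 100.0, 2: 190.0, 4: 350.0, 8: 355.0, 16: 356.0, 32: 357.0, 64: 358.0}
profiled = sweep_until_saturated(
    fake_curve.__getitem__,
    batch_sizes=[1, 2, 4, 8, 16, 32, 64],
    cutoff_threshold=0.05,
    backoff_limit=2,
)
profiled_batch_sizes = [batch_size for batch_size, _ in profiled]
```

The back-off is what lets the sweep continue past a short plateau: a single stalled step does not end profiling, so a later jump in throughput can still be observed.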

model_navigator.package.ProfilingResults `dataclass`

ProfilingResults(models, samples_data)

Profiling results for models.

Parameters:

models (Dict[str, RunnerResults]) –

Mapping of models to their runner results

to_dict

to_dict()

Return results in form of dictionary.

Source code in model_navigator/package/profiling_results.py

def to_dict(self):
    """Return results in form of dictionary."""
    return dataclass2dict(self)

to_file

to_file(path)

Save results to file.

Parameters:

path (Union[str, Path]) –

Path to the YAML file

Source code in model_navigator/package/profiling_results.py

def to_file(self, path: Union[str, pathlib.Path]):
    """Save results to file.

    Args:
        path: Path to the YAML file
    """
    path = pathlib.Path(path)
    data = self.to_dict()
    with path.open("w") as f:
        yaml.safe_dump(data, f, sort_keys=False)

model_navigator.package.profiling_results.RunnerResults `dataclass`

RunnerResults(runners)

Result for runners.

Parameters:

runners (Dict[str, RunnerProfilingResults]) –

Mapping of runners to their profiling results

model_navigator.package.profiling_results.RunnerProfilingResults `dataclass`

RunnerProfilingResults(status, detailed)

Profiling results for runner.

Parameters:

status (CommandStatus) –

Status of profiling execution
detailed (Dict[int, List[ProfilingResult]]) –

Mapping of sample ID to the list of profiling results for that sample

model_navigator.package.profiling_results.ProfilingResult `dataclass`

ProfilingResult(batch_size, avg_latency, std_latency, p50_latency, p90_latency, p95_latency, p99_latency, throughput, avg_gpu_clock, request_count)

Result of a single profiling run for a sample.

Parameters:

batch_size (int) –

Size of batch used for profiling
avg_latency (float) –

Average latency of profiling
std_latency (float) –

Standard deviation of profiled latency
p50_latency (float) –

50th percentile of measured latency
p90_latency (float) –

90th percentile of measured latency
p95_latency (float) –

95th percentile of measured latency
p99_latency (float) –

99th percentile of measured latency
throughput (float) –

Inferences per second
request_count (int) –

Number of inference requests
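The result types above nest as models, then runners, then sample IDs, then a list of per-batch-size results. The following sketch walks that nesting to find the entry with the highest throughput. It relies only on the attribute names documented here (`models`, `runners`, `detailed`, `batch_size`, `throughput`); `iter_results` and `best_throughput` are hypothetical helpers, and the `SimpleNamespace` objects are stand-ins with placeholder names and values so the example runs without the library.

```python
from types import SimpleNamespace


def iter_results(profiling_results):
    """Yield (model_name, runner_name, sample_id, result) for every measurement."""
    for model_name, runner_results in profiling_results.models.items():
        for runner_name, runner in runner_results.runners.items():
            for sample_id, results in runner.detailed.items():
                for result in results:
                    yield model_name, runner_name, sample_id, result


def best_throughput(profiling_results):
    """Return the row with the highest throughput across all models and runners."""
    return max(iter_results(profiling_results), key=lambda row: row[3].throughput)


# Stand-in objects mimicking the documented shape; all names and numbers are placeholders.
fake_results = SimpleNamespace(
    models={
        "model-a": SimpleNamespace(
            runners={
                "runner-a": SimpleNamespace(
                    status="OK",
                    detailed={
                        0: [
                            SimpleNamespace(batch_size=1, throughput=100.0),
                            SimpleNamespace(batch_size=8, throughput=350.0),
                        ]
                    },
                )
            }
        )
    }
)

model_name, runner_name, sample_id, best = best_throughput(fake_results)
```

Because the helpers use attribute access only, the same traversal should work on a real `ProfilingResults` object as long as the attributes match the documentation above.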
