Profile
model_navigator.package.profile
profile(package, dataloader=None, target_formats=None, target_device=DeviceKind.CUDA, runners=None, max_batch_size=None, batch_sizes=None, window_size=DEFAULT_WINDOW_SIZE, stability_percentage=DEFAULT_STABILITY_PERCENTAGE, stabilization_windows=DEFAULT_STABILIZATION_WINDOWS, min_trials=DEFAULT_MIN_TRIALS, max_trials=DEFAULT_MAX_TRIALS, throughput_cutoff_threshold=DEFAULT_THROUGHPUT_CUTOFF_THRESHOLD, throughput_backoff_limit=DEFAULT_THROUGHPUT_BACKOFF_LIMIT, verbose=False)
Profile the provided package.
When dataloader is provided, all samples obtained from the dataloader are used to perform profiling for each batch size.
The profiling results contain the minimum, maximum, and average results per batch size across all samples.
When no dataloader is provided, the profiling sample from the package is used.
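The per-batch-size aggregation described above can be pictured with a small, self-contained sketch. It does not call Model Navigator. The Measurement alias and the aggregate_per_batch_size helper are hypothetical, and the sketch assumes each sample contributes one average latency per batch size.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical input: (batch_size, avg_latency) pairs, one per sample and
# batch size. This is NOT the Model Navigator data model.
Measurement = Tuple[int, float]


def aggregate_per_batch_size(
    measurements: List[Measurement],
) -> Dict[int, Dict[str, float]]:
    """Return min, max and average latency per batch size across all samples."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for batch_size, latency in measurements:
        grouped[batch_size].append(latency)
    # Sort by batch size so the result reads in ascending order.
    return {
        batch_size: {"min": min(values), "max": max(values), "avg": mean(values)}
        for batch_size, values in sorted(grouped.items())
    }


if __name__ == "__main__":
    data = [(1, 2.0), (1, 4.0), (8, 10.0), (8, 14.0)]
    print(aggregate_per_batch_size(data))
```

With two samples per batch size, each batch size gets one min/max/avg triple computed over both samples.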
Parameters:
- package (Package) – Package to profile.
- dataloader (Optional[SizedDataLoader], default: None) – Sized iterable with data that will be fed to the model.
- target_formats (Optional[Tuple[Union[str, Format], ...]], default: None) – Formats to profile. Defaults to the target formats from the package.
- target_device (Optional[DeviceKind], default: CUDA) – Target device to run profiling on.
- runners (Optional[Tuple[Union[str, Type[NavigatorRunner]], ...]], default: None) – Runners to run profiling on. Defaults to the runners from the package.
- max_batch_size (Optional[int], default: None) – Maximum batch size used for profiling.
- batch_sizes (Optional[List[int]], default: None) – List of batch sizes to profile.
- window_size (int, default: DEFAULT_WINDOW_SIZE) – Number of inference queries performed in a measurement window.
- stability_percentage (float, default: DEFAULT_STABILITY_PERCENTAGE) – Allowed percentage of variation from the mean in three consecutive windows.
- stabilization_windows (int, default: DEFAULT_STABILIZATION_WINDOWS) – Number of consecutive windows used for stabilization.
- min_trials (int, default: DEFAULT_MIN_TRIALS) – Minimum number of window trials.
- max_trials (int, default: DEFAULT_MAX_TRIALS) – Maximum number of window trials.
- throughput_cutoff_threshold (float, default: DEFAULT_THROUGHPUT_CUTOFF_THRESHOLD) – Minimum throughput increase required to continue profiling.
- throughput_backoff_limit (int, default: DEFAULT_THROUGHPUT_BACKOFF_LIMIT) – Back-off limit for running additional profiling steps after throughput saturates according to throughput_cutoff_threshold, to avoid stopping at a local minimum.
- verbose (bool, default: False) – If True, enable verbose logging.
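The window parameters above can be read as a stop-when-stable loop. The sketch below is one plausible interpretation, not Model Navigator's implementation. It assumes that each window runs window_size queries and yields one average latency, that the last stabilization_windows averages must each lie within stability_percentage percent of their mean, that stability is only checked after min_trials windows, and that measurement gives up after max_trials. The run_window callback and both helper names are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Optional


def is_stable(
    window_avgs: List[float], stabilization_windows: int, stability_percentage: float
) -> bool:
    """True if the last `stabilization_windows` averages all lie within
    `stability_percentage` percent of their common mean."""
    if len(window_avgs) < stabilization_windows:
        return False
    last = window_avgs[-stabilization_windows:]
    center = mean(last)
    allowed = center * stability_percentage / 100.0
    return all(abs(value - center) <= allowed for value in last)


def measure_until_stable(
    run_window: Callable[[int], float],  # hypothetical: runs N queries, returns avg latency
    window_size: int,
    stability_percentage: float,
    stabilization_windows: int,
    min_trials: int,
    max_trials: int,
) -> Optional[float]:
    """Run windows until stable; return the mean of the stable windows,
    or None if max_trials windows pass without stabilizing."""
    window_avgs: List[float] = []
    for trial in range(1, max_trials + 1):
        window_avgs.append(run_window(window_size))
        if trial >= min_trials and is_stable(
            window_avgs, stabilization_windows, stability_percentage
        ):
            return mean(window_avgs[-stabilization_windows:])
    return None
```

For example, window averages of 5.0, 3.0, 2.0, 2.02 and 1.98 stabilize at the fifth window with stability_percentage=5 and stabilization_windows=3, giving 2.0.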
Returns:
- ProfilingResults – Profiling results.
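The interplay between throughput_cutoff_threshold and throughput_backoff_limit can be illustrated with a batch-size sweep. This is a sketch under stated assumptions, not the library's algorithm. It treats the threshold as a relative gain (0.05 means 5%) over the best throughput seen so far, and it stops once more than throughput_backoff_limit consecutive steps miss that gain. The measure_throughput callback and sweep_batch_sizes are hypothetical, and the batch size list stands in for something like powers of two up to max_batch_size.

```python
from typing import Callable, Dict, List


def sweep_batch_sizes(
    measure_throughput: Callable[[int], float],  # hypothetical measurement callback
    batch_sizes: List[int],
    throughput_cutoff_threshold: float,  # assumed relative gain, e.g. 0.05 = 5%
    throughput_backoff_limit: int,
) -> Dict[int, float]:
    """Profile batch sizes in order; stop once more than
    `throughput_backoff_limit` consecutive steps fail to beat the best
    throughput so far by the relative `throughput_cutoff_threshold`."""
    results: Dict[int, float] = {}
    best = 0.0
    misses = 0
    for batch_size in batch_sizes:
        throughput = measure_throughput(batch_size)
        results[batch_size] = throughput
        if best > 0 and throughput < best * (1 + throughput_cutoff_threshold):
            misses += 1  # saturated step; keep going until the back-off limit
            if misses > throughput_backoff_limit:
                break
        else:
            misses = 0  # a real improvement resets the back-off counter
        best = max(best, throughput)
    return results
```

The back-off counter lets the sweep tolerate a few flat steps, so a brief plateau does not end profiling before a larger batch size shows a real gain.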
Source code in model_navigator/package/__init__.py
model_navigator.package.ProfilingResults (dataclass)
Profiling results for models.
Parameters:
- models (Dict[str, RunnerResults]) – Mapping of models to their runner results.
to_dict
to_file
Save results to file.
Parameters:
Source code in model_navigator/package/profiling_results.py
model_navigator.package.profiling_results.RunnerResults (dataclass)
Results for runners.
Parameters:
- runners (Dict[str, RunnerProfilingResults]) – Mapping of runners to their profiling results.
model_navigator.package.profiling_results.RunnerProfilingResults (dataclass)
Profiling results for a runner.
Parameters:
- status (CommandStatus) – Status of the profiling execution.
- detailed (Dict[int, List[ProfilingResult]]) – Mapping of sample ID to the list of profiling results for that sample.
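The nesting above (models, then runners, then per-sample result lists) can be walked with four loops. So that the sketch runs without model_navigator installed, it defines simplified stand-in dataclasses that mirror only the documented field names. status is a plain string in place of CommandStatus, ProfilingResult keeps only batch_size and throughput, and best_throughput is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Simplified stand-ins mirroring the documented fields; not imported from model_navigator.
@dataclass
class ProfilingResult:
    batch_size: int
    throughput: float


@dataclass
class RunnerProfilingResults:
    status: str  # stand-in for CommandStatus
    detailed: Dict[int, List[ProfilingResult]] = field(default_factory=dict)


@dataclass
class RunnerResults:
    runners: Dict[str, RunnerProfilingResults] = field(default_factory=dict)


@dataclass
class ProfilingResults:
    models: Dict[str, RunnerResults] = field(default_factory=dict)


def best_throughput(
    results: ProfilingResults,
) -> Optional[Tuple[str, str, int, int, float]]:
    """Return (model, runner, sample_id, batch_size, throughput) with the
    highest throughput, or None if there are no results."""
    best: Optional[Tuple[str, str, int, int, float]] = None
    for model_key, runner_results in results.models.items():
        for runner_name, runner_profiling in runner_results.runners.items():
            for sample_id, samples in runner_profiling.detailed.items():
                for result in samples:
                    if best is None or result.throughput > best[-1]:
                        best = (model_key, runner_name, sample_id,
                                result.batch_size, result.throughput)
    return best
```

The model and runner keys in any usage (for example "model-a" and "runner-x") are placeholders, not names produced by the library.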
model_navigator.package.profiling_results.ProfilingResult (dataclass)
ProfilingResult(batch_size, avg_latency, std_latency, p50_latency, p90_latency, p95_latency, p99_latency, throughput, avg_gpu_clock, request_count)
Result of a single profiling run for a sample.
Parameters:
- batch_size (int) – Batch size used for profiling.
- avg_latency (float) – Average measured latency.
- std_latency (float) – Standard deviation of the measured latency.
- p50_latency (float) – 50th percentile of the measured latency.
- p90_latency (float) – 90th percentile of the measured latency.
- p95_latency (float) – 95th percentile of the measured latency.
- p99_latency (float) – 99th percentile of the measured latency.
- throughput (float) – Inferences per second.
- request_count (int) – Number of inference requests.
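To make these fields concrete, the sketch below derives them from a list of per-request latencies. Several choices here are assumptions, and Model Navigator may compute things differently. Latencies are taken to be in milliseconds, percentiles use the nearest-rank method, the standard deviation is the population one, and throughput counts one inference per sample in the batch. ProfilingSummary and summarize are hypothetical names, and avg_gpu_clock is omitted because the sketch has no GPU data.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class ProfilingSummary:
    """Hypothetical stand-in mirroring ProfilingResult's fields (avg_gpu_clock omitted)."""
    batch_size: int
    avg_latency: float
    std_latency: float
    p50_latency: float
    p90_latency: float
    p95_latency: float
    p99_latency: float
    throughput: float
    request_count: int


def percentile(sorted_values: List[float], q: int) -> float:
    """Nearest-rank percentile of an ascending, non-empty list; q in 1..100."""
    rank = math.ceil(q * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms: List[float], batch_size: int) -> ProfilingSummary:
    """Summarize non-empty per-request latencies (assumed milliseconds)."""
    values = sorted(latencies_ms)
    avg = mean(values)
    return ProfilingSummary(
        batch_size=batch_size,
        avg_latency=avg,
        std_latency=pstdev(values),  # population standard deviation
        p50_latency=percentile(values, 50),
        p90_latency=percentile(values, 90),
        p95_latency=percentile(values, 95),
        p99_latency=percentile(values, 99),
        # Assumption: each sample in a batch counts as one inference per second.
        throughput=batch_size * 1000.0 / avg,
        request_count=len(values),  # one latency measurement per request
    )
```

For latencies of 1 to 10 ms at batch size 4, this gives avg_latency 5.5, p50_latency 5.0, p90_latency 9.0, and about 727 inferences per second.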