7.5. Benchmarking
Tip
This notebook benchmarks JAX on a single CPU core. Compare with Julia results as reported in ComPWA/polarimetry#27. See also the Extended benchmark #68 discussion.
Note
Each %timeit call in this notebook uses a single run with a single loop (-n1 -r1), because JAX seems to cache its return values: with more loops, the reported mean would mostly reflect the cached path.
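As a rough illustration of what a one-run, one-loop measurement captures, the sketch below reproduces it outside IPython with the standard-library `timeit` module. The function `slow_square` and its `functools.lru_cache` decorator are hypothetical stand-ins for an expensive call with a warm cache; they are not part of this notebook and do not model JAX internals.

```python
import functools
import time
import timeit

calls = {"count": 0}  # counts how often the expensive body really runs


@functools.lru_cache(maxsize=None)
def slow_square(x: int) -> int:
    """Hypothetical stand-in for an expensive, cacheable computation."""
    calls["count"] += 1
    time.sleep(0.05)  # pretend this is compilation plus evaluation
    return x * x


def time_once(func, *args) -> float:
    """Rough equivalent of `%timeit -n1 -r1`: one run, one loop, in seconds."""
    return timeit.repeat(lambda: func(*args), repeat=1, number=1)[0]


first = time_once(slow_square, 12)   # the body executes
second = time_once(slow_square, 12)  # served from the cache
print(f"first call:  {first * 1e3:.2f} ms")
print(f"second call: {second * 1e3:.2f} ms")
```

With more than one loop per measurement, only the first loop would pay the full cost, so the average would hide it.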
Physical cores: 2
Total cores: 4
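The core counts above come from a cell that is hidden in the rendered page. The sketch below shows one way to report the logical core count and to ask XLA to stay on a single CPU thread. The `XLA_FLAGS` value is an assumption based on a commonly used recipe, not something shown in this notebook. It has to be set before `jax` is imported, and its effect may differ between JAX versions.

```python
import os

# Assumption: these XLA flags restrict CPU execution to one thread.
# They must be set before the first `import jax` to have any effect.
SINGLE_THREAD_FLAGS = (
    "--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1"
)
os.environ["XLA_FLAGS"] = SINGLE_THREAD_FLAGS

# os.cpu_count() reports logical cores and may return None on some platforms.
logical_cores = os.cpu_count() or 1
print(f"Total (logical) cores: {logical_cores}")
```

Counting physical cores is not possible with the standard library alone; a third-party package such as `psutil` (`psutil.cpu_count(logical=False)`) would be needed for that.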
%%time
polarimetry_exprs = formulate_polarimetry(amplitude_builder, reference_subsystem)
unfolded_polarimetry_exprs = [
cached.unfold(expr, model.amplitudes) for expr in polarimetry_exprs
]
unfolded_intensity_expr = cached.unfold(model)
CPU times: user 4.99 s, sys: 12.8 ms, total: 5 s
Wall time: 5 s
7.5.1. DataTransformer performance
n_events = 100_000
phsp_sample = generate_phasespace_sample(model.decay, n_events, seed=0)
transformer = create_data_transformer(model)
%timeit -n1 -r1 transformer(phsp_sample)  # first run: no cache, includes JIT compilation
%timeit -n1 -r1 transformer(phsp_sample) # second run with cache
%timeit -n1 -r1 transformer(phsp_sample) # third run with cache
phsp_sample = transformer(phsp_sample)
random_point = {k: v[0] if len(v.shape) > 0 else v for k, v in phsp_sample.items()}
313 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
3.29 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
3.69 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
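The `random_point` expression in the cell above picks the first event from every array that has an event axis and passes zero-dimensional entries through unchanged. A small sketch with a hypothetical three-event sample (the keys `sigma1`, `sigma2`, and `m0` and their values are made up for illustration) shows the effect:

```python
import numpy as np

# Hypothetical miniature of a transformed sample: one array per variable,
# plus a 0-dimensional entry that has no event axis.
sample = {
    "sigma1": np.array([0.8, 1.1, 1.5]),
    "sigma2": np.array([2.0, 2.4, 3.1]),
    "m0": np.array(2.286),  # scalar-like, shape == ()
}

# Same expression as in the notebook: take event 0 where an event axis exists.
random_point = {k: v[0] if len(v.shape) > 0 else v for k, v in sample.items()}

for key, value in random_point.items():
    print(key, float(value))
```

The `len(v.shape) > 0` guard matters because indexing a zero-dimensional array with `[0]` raises an `IndexError`.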
res = 54
grid_sample = generate_meshgrid_sample(model.decay, res)
%timeit -n1 -r1 transformer(grid_sample) # first run, without cache, but already compiled
%timeit -n1 -r1 transformer(grid_sample) # second run with cache
%timeit -n1 -r1 transformer(grid_sample) # third run with cache
grid_sample = transformer(grid_sample)
328 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
459 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
208 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
7.5.2. Parametrized function
Total number of mathematical operations:
\(\alpha_x\): 86,766
\(\alpha_y\): 86,770
\(\alpha_z\): 86,766
\(I_\mathrm{tot}\): 28,430
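Operation counts of this kind can be obtained with SymPy's `count_ops`. Whether the notebook uses exactly this call is an assumption, since the counting cell is hidden in the rendered page, and the toy expression below is hypothetical.

```python
import sympy as sp

x, y, a, b = sp.symbols("x y a b")

# Hypothetical toy expression standing in for an unfolded amplitude model.
expr = a * sp.exp(-x**2) + b * sp.sqrt(x + y)

n_ops = int(sp.count_ops(expr))             # total number of operations
per_type = sp.count_ops(expr, visual=True)  # symbolic breakdown by operation type
print(f"{n_ops:,} operations: {per_type}")
```

The count is a property of the expression tree, so it is only a proxy for run time: the compiler may fuse or eliminate operations.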
%%time
parametrized_polarimetry_funcs = [
cached.lambdify(expr, model.parameter_defaults)
for expr in unfolded_polarimetry_exprs
]
parametrized_intensity_func = cached.lambdify(
unfolded_intensity_expr, model.parameter_defaults
)
CPU times: user 5.62 s, sys: 43.8 ms, total: 5.66 s
Wall time: 5.66 s
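To make the idea of a parametrized function concrete, here is a minimal sketch built on `sympy.lambdify`. The class `TinyParametrizedFunction` is a hypothetical stand-in written for this example; it is not the class that `cached.lambdify` returns, and it uses the `math` backend rather than JAX.

```python
import sympy as sp


class TinyParametrizedFunction:
    """Hypothetical stand-in for a parametrized function (not the real class)."""

    def __init__(self, expr: sp.Expr, parameters: dict):
        self.parameters = dict(parameters)
        par_symbols = sorted(self.parameters, key=str)
        data_symbols = sorted(expr.free_symbols - set(par_symbols), key=str)
        self._par_symbols = par_symbols
        self._data_names = [s.name for s in data_symbols]
        # Parameters stay arguments of the generated function, so their
        # values can change without lambdifying the expression again.
        self._func = sp.lambdify(
            [*data_symbols, *par_symbols], expr, modules="math"
        )

    def update_parameters(self, new_parameters: dict) -> None:
        self.parameters.update(new_parameters)

    def __call__(self, data: dict) -> float:
        data_args = [data[name] for name in self._data_names]
        par_args = [self.parameters[s] for s in self._par_symbols]
        return self._func(*data_args, *par_args)


x, a, b = sp.symbols("x a b")
func = TinyParametrizedFunction(a * x**2 + b, {a: 2.0, b: 1.0})
before = func({"x": 3.0})
func.update_parameters({a: 3.0})
after = func({"x": 3.0})
print(before, after)
```

Keeping parameters as function arguments is what allows them to be varied later, at the price of a larger expression than one in which they are replaced by numbers.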
rng = np.random.default_rng(seed=0)
original_parameters = dict(parametrized_intensity_func.parameters)
modified_parameters = {
k: rng.uniform(0.9, 1.1) * v
for k, v in parametrized_intensity_func.parameters.items()
}
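The dictionary comprehension above scales every parameter by an independent random factor between 0.9 and 1.1. The sketch below repeats it on a few hypothetical parameter values (the names and numbers are made up) and prints the factor that was applied:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical parameter values; a complex coupling is scaled in modulus only.
original = {"mass": 1.5, "width": 0.1, "coupling": 2.0 - 1.0j}
modified = {k: rng.uniform(0.9, 1.1) * v for k, v in original.items()}

for name, value in original.items():
    ratio = modified[name] / value
    print(f"{name}: scaled by {ratio.real:.3f}")
```

Because the generator is seeded, the same factors are drawn on every run, which keeps the benchmark inputs reproducible.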
7.5.2.1. One data point
7.5.2.1.1. JIT-compilation
%%timeit -n1 -r1 -q -o
array = parametrized_intensity_func(random_point)
<TimeitResult : 977 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = parametrized_polarimetry_funcs[0](random_point)
array = parametrized_polarimetry_funcs[1](random_point)
array = parametrized_polarimetry_funcs[2](random_point)
<TimeitResult : 4.52 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.2.1.2. Compiled performance
%%timeit -n1 -r1 -q -o
array = parametrized_intensity_func(random_point)
<TimeitResult : 932 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = parametrized_polarimetry_funcs[0](random_point)
array = parametrized_polarimetry_funcs[1](random_point)
array = parametrized_polarimetry_funcs[2](random_point)
<TimeitResult : 3.76 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.2.2. 54x54 grid sample
7.5.2.2.1. Compiled but uncached
%%timeit -n1 -r1 -q -o
array = parametrized_intensity_func(grid_sample)
<TimeitResult : 9.14 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = parametrized_polarimetry_funcs[0](grid_sample)
array = parametrized_polarimetry_funcs[1](grid_sample)
array = parametrized_polarimetry_funcs[2](grid_sample)
<TimeitResult : 1min 3s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.2.2.2. Second run with cache
%%timeit -n1 -r1 -q -o
array = parametrized_intensity_func(grid_sample)
<TimeitResult : 718 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = parametrized_polarimetry_funcs[0](grid_sample)
array = parametrized_polarimetry_funcs[1](grid_sample)
array = parametrized_polarimetry_funcs[2](grid_sample)
<TimeitResult : 1.38 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.2.3. 100,000-event phase space sample
7.5.2.3.1. Compiled but uncached
%%timeit -n1 -r1 -q -o
array = parametrized_intensity_func(phsp_sample)
<TimeitResult : 2.84 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = parametrized_polarimetry_funcs[0](phsp_sample)
array = parametrized_polarimetry_funcs[1](phsp_sample)
array = parametrized_polarimetry_funcs[2](phsp_sample)
<TimeitResult : 17.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.2.3.2. Second run with cache
%%timeit -n1 -r1 -q -o
array = parametrized_intensity_func(phsp_sample)
<TimeitResult : 1.46 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = parametrized_polarimetry_funcs[0](phsp_sample)
array = parametrized_polarimetry_funcs[1](phsp_sample)
array = parametrized_polarimetry_funcs[2](phsp_sample)
<TimeitResult : 1.47 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.2.4. Recompilation after parameter modification
parametrized_intensity_func.update_parameters(modified_parameters)
for func in parametrized_polarimetry_funcs:
func.update_parameters(modified_parameters)
7.5.2.4.1. Compiled but uncached
%%timeit -n1 -r1 -q -o
array = parametrized_intensity_func(phsp_sample)
<TimeitResult : 2.79 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = parametrized_polarimetry_funcs[0](phsp_sample)
array = parametrized_polarimetry_funcs[1](phsp_sample)
array = parametrized_polarimetry_funcs[2](phsp_sample)
<TimeitResult : 17.6 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.2.4.2. Second run with cache
%%timeit -n1 -r1 -q -o
array = parametrized_intensity_func(phsp_sample)
<TimeitResult : 3.57 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = parametrized_polarimetry_funcs[0](phsp_sample)
array = parametrized_polarimetry_funcs[1](phsp_sample)
array = parametrized_polarimetry_funcs[2](phsp_sample)
<TimeitResult : 1.03 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
parametrized_intensity_func.update_parameters(original_parameters)
for func in parametrized_polarimetry_funcs:
func.update_parameters(original_parameters)
7.5.3. All parameters substituted
subs_polarimetry_exprs = [
cached.xreplace(expr, model.parameter_defaults)
for expr in unfolded_polarimetry_exprs
]
subs_intensity_expr = cached.xreplace(unfolded_intensity_expr, model.parameter_defaults)
Number of mathematical operations after substituting all parameters:
\(\alpha_x\): 31,488
\(\alpha_y\): 31,492
\(\alpha_z\): 31,488
\(I_\mathrm{tot}\): 10,360
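The drop in operation count comes from SymPy evaluating constant sub-expressions once the parameter symbols are replaced by numbers. A minimal sketch on a hypothetical toy expression, assuming `cached.xreplace` behaves like SymPy's own `xreplace`:

```python
import sympy as sp

x, a, b, c = sp.symbols("x a b c")

# Hypothetical toy model with three parameters and one data variable.
expr = a * x + b * x + c * sp.sqrt(a + b)
defaults = {a: sp.Integer(2), b: sp.Integer(7), c: sp.Rational(1, 2)}

# Replacing the parameter symbols lets SymPy fold the constant parts.
substituted = expr.xreplace(defaults)
ops_before = int(sp.count_ops(expr))
ops_after = int(sp.count_ops(substituted))
print(f"{ops_before} operations before, {ops_after} after: {substituted}")
```

The substituted expression depends only on the data variable, so it can no longer be refitted without repeating the substitution and the lambdification.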
%%time
polarimetry_funcs = [cached.lambdify(expr) for expr in subs_polarimetry_exprs]
intensity_func = cached.lambdify(subs_intensity_expr)
CPU times: user 2.75 s, sys: 16 ms, total: 2.77 s
Wall time: 2.77 s
7.5.3.1. One data point
7.5.3.1.1. JIT-compilation
%%timeit -n1 -r1 -q -o
array = intensity_func(random_point)
<TimeitResult : 536 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = polarimetry_funcs[0](random_point)
array = polarimetry_funcs[1](random_point)
array = polarimetry_funcs[2](random_point)
<TimeitResult : 2.48 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.3.1.2. Compiled performance
%%timeit -n1 -r1 -q -o
array = intensity_func(random_point)
<TimeitResult : 180 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = polarimetry_funcs[0](random_point)
array = polarimetry_funcs[1](random_point)
array = polarimetry_funcs[2](random_point)
<TimeitResult : 382 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.3.2. 54x54 grid sample
7.5.3.2.1. Compiled but uncached
%%timeit -n1 -r1 -q -o
array = intensity_func(grid_sample)
<TimeitResult : 6.55 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = polarimetry_funcs[0](grid_sample)
array = polarimetry_funcs[1](grid_sample)
array = polarimetry_funcs[2](grid_sample)
<TimeitResult : 46.8 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.3.2.2. Second run with cache
%%timeit -n1 -r1 -q -o
array = intensity_func(grid_sample)
<TimeitResult : 116 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = polarimetry_funcs[0](grid_sample)
array = polarimetry_funcs[1](grid_sample)
array = polarimetry_funcs[2](grid_sample)
<TimeitResult : 190 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.3.3. 100,000-event phase space sample
7.5.3.3.1. Compiled but uncached
%%timeit -n1 -r1 -q -o
array = intensity_func(phsp_sample)
<TimeitResult : 1.91 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = polarimetry_funcs[0](phsp_sample)
array = polarimetry_funcs[1](phsp_sample)
array = polarimetry_funcs[2](phsp_sample)
<TimeitResult : 13.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.3.3.2. Second run with cache
%%timeit -n1 -r1 -q -o
array = intensity_func(phsp_sample)
<TimeitResult : 160 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
%%timeit -n1 -r1 -q -o
array = polarimetry_funcs[0](phsp_sample)
array = polarimetry_funcs[1](phsp_sample)
array = polarimetry_funcs[2](phsp_sample)
<TimeitResult : 170 μs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>
7.5.4. Summary
| | parametrized I | parametrized α | substituted I | substituted α |
|---|---|---|---|---|
| random point (compilation) | 977 ms | 4.52 s | 536 ms | 2.48 s |
| random point (cached) | 932 μs | 3.76 ms | 180 μs | 382 μs |
| 54x54 grid | 9.14 s | 1min 3s | 6.55 s | 46.8 s |
| 54x54 grid (cached) | 718 μs | 1.38 ms | 116 μs | 190 μs |
| 100,000 phsp | 2.84 s | 17.7 s | 1.91 s | 13.1 s |
| 100,000 phsp (cached) | 1.46 ms | 1.47 ms | 160 μs | 170 μs |
| modified 100,000 phsp | 2.79 s | 17.6 s | | |
| modified 100,000 phsp (cached) | 3.57 ms | 1.03 ms | | |
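The cached rows of the table can be turned into speed-up factors of the substituted functions over the parametrized ones. The sketch below only does arithmetic on the numbers in the table; since every timing comes from a single run, the factors are indicative rather than precise.

```python
# Cached timings copied from the summary table, converted to seconds.
# Each tuple is (parametrized, substituted).
cached_timings = {
    "random point": {"I": (932e-6, 180e-6), "alpha": (3.76e-3, 382e-6)},
    "54x54 grid": {"I": (718e-6, 116e-6), "alpha": (1.38e-3, 190e-6)},
    "100,000 phsp": {"I": (1.46e-3, 160e-6), "alpha": (1.47e-3, 170e-6)},
}

speedups = {
    (sample, quantity): parametrized / substituted
    for sample, row in cached_timings.items()
    for quantity, (parametrized, substituted) in row.items()
}

for (sample, quantity), factor in speedups.items():
    print(f"{sample:>13} | {quantity:<5} | {factor:4.1f}x faster when substituted")
```

For these six entries the factors lie roughly between 5 and 10.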