Generate data samples¶
Warning
The pycompwa is no longer maintained. Use the ComPWA packages QRules, AmpForm, and TensorWaves instead!
There are two different types of data samples: (1) phase space samples and (2) intensity-based samples (which mimic the intensity distributions of the reaction). In addition, you may want to generate more data points in specific regions of phase space (e.g. in the case of a sharp resonance). This is called importance sampling.
Usually, you need a phase space sample to generate an intensity-based sample, so we start with that.
Phase space samples¶
With the pycompwa.ui
module you can easily create various types of Monte Carlo phase space samples. You do this with the function generate_phsp()
.
One of the required arguments is a PhaseSpaceEventGenerator
instance. There are currently two options: EvtGenGenerator
and RootGenerator
. Using the EvtGenGenerator
is recommended for numerical precision:
- class EvtGenGenerator
Bases:
pycompwa.ui.PhaseSpaceEventGenerator
- __init__(self: EvtGenGenerator, arg0: ParticleStateTransitionKinematicsInfo) None
As you can see, such a PhaseSpaceEventGenerator
requires state transition info from a Kinematics
instance, such as the HelicityKinematics
class.
The second argument for the generate_phsp()
function is a random number generator. Here, there are again two options: StdUniformRealGenerator
and RootUniformRealGenerator
.
To illustrate this procedure, we first need to create some kinematics model:
import logging
logger = logging.getLogger()
logger.setLevel(logging.ERROR) # not interested in warnings now
from pycompwa.expertsystem.amplitude.helicitydecay import (
HelicityAmplitudeGeneratorXML,
)
from pycompwa.expertsystem.ui.system_control import (
InteractionTypes,
StateTransitionManager,
)
initial_state = [("J/psi", [-1, 1])]
final_state = [("gamma"), ("pi0"), ("pi0")]
tbd_manager = StateTransitionManager(
initial_state,
final_state,
formalism_type="helicity",
topology_building="isobar",
)
tbd_manager.set_allowed_interaction_types(
[InteractionTypes.Strong, InteractionTypes.EM]
)
tbd_manager.allowed_intermediate_particles = ["f"]
graph_interaction_settings_groups = tbd_manager.prepare_graphs()
solutions, _ = tbd_manager.find_solutions(graph_interaction_settings_groups)
print("found " + str(len(solutions)) + " solutions!")
xml_generator = HelicityAmplitudeGeneratorXML()
xml_generator.generate(solutions)
xml_generator.write_to_file("model.xml")
2021-10-27 10:25:35,200 INFO [default] Logging to file disabled!
2021-10-27 10:25:35,200 [INFO] Log level: INFO
2021-10-27 10:25:35,200 [INFO] Current date and time: Wed Oct 27 10:25:35 2021
found 32 solutions!
Now we can go through the procedure explained above:
import pycompwa.ui as pwa
# specify kinematics
kinematics = pwa.create_helicity_kinematics("model.xml")
# specify generators
generator = pwa.EvtGenGenerator(
kinematics.get_particle_state_transition_kinematics_info()
)
random_generator = pwa.StdUniformRealGenerator(12345)
# generate phase space sample
phsp_sample = pwa.generate_phsp(1000, generator, random_generator)
2021-10-27 10:25:41,034 [INFO] HelicityKinematics::HelicityKinematics() | Initialized kinematics for reaction ( J/psi )->( gamma[ID=2] pi0[ID=3] pi0[ID=4] )
Event position to final state ID mapping:
0: 2
1: 3
2: 4
2021-10-27 10:25:41,035 [INFO] Generating phase-space MC: [1000 events]
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h?25l
0.00min [ ********** 100.00% ] 0.00min , end time: 10:25AM ?25h
Intensity-based samples¶
Data samples are more complicated than phase space samples in that they represent the intensity profile resulting from a reaction. You therefore need an Intensity
object and a phase space over which to generate that intensity distribution. We therefore call such a data sample an intensity-based sample.
An intensity-based sample is generated with the function generate()
. Its usage is similar to generate_phsp()
, but now you have to give an Intensity
and a Kinematics
instance as additional arguments. Sampling is performed through hit & miss, which is reflected by the efficiency printed at the end.
# create intensity profile
particle_list = pwa.read_particles("model.xml")
intensity_builder = pwa.IntensityBuilderXML(
"model.xml", particle_list, kinematics, phsp_sample
)
intensity = intensity_builder.create_intensity()
# generate data
data_sample = pwa.generate(
100, kinematics, generator, intensity, random_generator
)
2021-10-27 10:25:41,057 [INFO] Setting phase space sample weights...
2021-10-27 10:25:41,067 [INFO] Updating data container content...
2021-10-27 10:25:41,093 [INFO] Generating hit-and-miss sample: [100 events]
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h
?25l
0.00min [ ********** 100.00% ] 0.00min , end time: 10:25AM ?25h
2021-10-27 10:25:41,347 [INFO] Successfully generated 100 with an efficiency of 0.00666667
Importance sampling¶
In the example above, we used the decay \(J/\psi \to \gamma \pi^0 \pi^0\) with f resonances only. Imagine, however, that we want to investigate the resonance \(\omega \to \gamma\pi^0\). This resonance has such a narrow peak that you will run into problems when normalizing the intensity model.
To normalize a distribution, you need know the integral over that distribution. The integral over an intensity distribution distribution can only be computed numerically. We simply do this by computing the sum of the intensities evaluated for each point in a phase space (Monte Carlo integration), and averaging over the number of phase space points. This means that the higher the number of phase space points, the more accurate your integral.
With a narrow peak, however, many of the points in an evenly distributed phase space sample will lie outside the peak region, where the intensity (and therefore their contribution to the integral) is highest. So in order to have a good estimate for the integral over this narrow peak structure, you want to have a larger number of phase space points under that peak. Now, you could just use a bigger phase space sample (increasing number of points within the peak region), but this will be resource intensive as it will result in many hits in regions that hardly contributes to the integral.
A smarter strategy is generate a sample that has a higher density of phase space points within the peak region. This strategy is called importance sampling.
We first again create an amplitude model, this time including a ‘widened’ \(\omega\) resonance (see also Particle database). Widening \(\omega\) is only done so that the resonance is still visible―the peak is still extremely narrow.
from pycompwa.expertsystem.amplitude.helicitydecay import (
HelicityAmplitudeGeneratorXML,
)
from pycompwa.expertsystem.state.particle import particle_list
from pycompwa.expertsystem.ui.system_control import (
InteractionTypes,
StateTransitionManager,
)
initial_state = [("J/psi", [-1, 1])]
final_state = [("gamma"), ("pi0"), ("pi0")]
tbd_manager = StateTransitionManager(
initial_state,
final_state,
formalism_type="helicity",
topology_building="isobar",
)
# Widen omega resonance
omega = particle_list["omega(782)"]
parameters = omega["DecayInfo"]["Parameter"]
for par in parameters:
if par["@Type"] == "Width":
par["Value"] = 0.001
tbd_manager.set_allowed_interaction_types(
[InteractionTypes.Strong, InteractionTypes.EM]
)
tbd_manager.allowed_intermediate_particles = ["f2(1270)", "omega"]
graph_interaction_settings_groups = tbd_manager.prepare_graphs()
solutions, _ = tbd_manager.find_solutions(graph_interaction_settings_groups)
print("found " + str(len(solutions)) + " solutions!")
xml_generator = HelicityAmplitudeGeneratorXML()
xml_generator.generate(solutions)
xml_generator.write_to_file("model.xml")
found 28 solutions!
Now we generate a phase space sample that is importance sampled by the
intensity. This can be done with the
generate_importance_sampled_phsp()
function.
Note
Note that you still need a ‘normal’ phase space sample for the IntensityBuilderXML
. In theory, however, this shouldn’t be necessary at this stage, but in the current implementation, this sample is required for computing the outermost intensity. Luckily, during data generation the parameters of the model stay fixed, so this normalization is irrelevant and just causes a constant scaling factor.
import pycompwa.ui as pwa
# specify kinematics
particle_list = pwa.read_particles("model.xml")
kinematics = pwa.create_helicity_kinematics("model.xml", particle_list)
# specify generators
generator = pwa.EvtGenGenerator(
kinematics.get_particle_state_transition_kinematics_info()
)
random_generator = pwa.StdUniformRealGenerator(12345)
# generate evenly distributed phase space sample
phsp_sample = pwa.generate_phsp(100000, generator, random_generator)
# create intensity
intensity_builder = pwa.IntensityBuilderXML(
"model.xml", particle_list, kinematics, phsp_sample
)
intensity = intensity_builder.create_intensity()
# generate importance-sampled phase space sample
phsp_sample_importance = pwa.generate_importance_sampled_phsp(
2000, kinematics, generator, intensity, random_generator
)
2021-10-27 10:25:44,604 [INFO] HelicityKinematics::HelicityKinematics() | Initialized kinematics for reaction ( J/psi )->( gamma[ID=2] pi0[ID=3] pi0[ID=4] )
Event position to final state ID mapping:
0: 2
1: 3
2: 4
2021-10-27 10:25:44,606 [INFO] Generating phase-space MC: [100000 events]
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h
?25l
0.00min [ ********** 100.00% ] 0.00min , end time: 10:25AM ?25h
2021-10-27 10:25:44,690 [INFO] Setting phase space sample weights...
2021-10-27 10:25:44,692 [ERROR] IntensityBuilderXML::addFunctionTreeComponent(): FunctionTree with name J/psi_-1_to_omega(782)_1+pi0_0;omega(782)_1_to_gamma_-1+pi0_0; already exists!
2021-10-27 10:25:44,692 [ERROR] IntensityBuilderXML::addFunctionTreeComponent(): FunctionTree with name J/psi_-1_to_omega(782)_-1+pi0_0;omega(782)_-1_to_gamma_-1+pi0_0; already exists!
2021-10-27 10:25:44,694 [ERROR] IntensityBuilderXML::addFunctionTreeComponent(): FunctionTree with name J/psi_1_to_omega(782)_1+pi0_0;omega(782)_1_to_gamma_-1+pi0_0; already exists!
2021-10-27 10:25:44,694 [ERROR] IntensityBuilderXML::addFunctionTreeComponent(): FunctionTree with name J/psi_1_to_omega(782)_-1+pi0_0;omega(782)_-1_to_gamma_-1+pi0_0; already exists!
2021-10-27 10:25:44,696 [ERROR] IntensityBuilderXML::addFunctionTreeComponent(): FunctionTree with name J/psi_1_to_omega(782)_1+pi0_0;omega(782)_1_to_gamma_1+pi0_0; already exists!
2021-10-27 10:25:44,696 [ERROR] IntensityBuilderXML::addFunctionTreeComponent(): FunctionTree with name J/psi_1_to_omega(782)_-1+pi0_0;omega(782)_-1_to_gamma_1+pi0_0; already exists!
2021-10-27 10:25:44,698 [ERROR] IntensityBuilderXML::addFunctionTreeComponent(): FunctionTree with name J/psi_-1_to_omega(782)_1+pi0_0;omega(782)_1_to_gamma_1+pi0_0; already exists!
2021-10-27 10:25:44,699 [ERROR] IntensityBuilderXML::addFunctionTreeComponent(): FunctionTree with name J/psi_-1_to_omega(782)_-1+pi0_0;omega(782)_-1_to_gamma_1+pi0_0; already exists!
2021-10-27 10:25:44,703 [INFO] Updating data container content...
2021-10-27 10:25:46,673 [INFO] Generating phase space sample (hit-and-miss importance sampled): [2000 events]
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h
2021-10-27 10:25:47,100 [INFO] Tools::generateImportanceSampledPhsp() | Error in HitMiss procedure: Maximum value of random number generation smaller then amplitude maximum! We raise the maximum to 46.4232 value and restart generation!
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h
2021-10-27 10:25:47,313 [INFO] Tools::generateImportanceSampledPhsp() | Error in HitMiss procedure: Maximum value of random number generation smaller then amplitude maximum! We raise the maximum to 51.6626 value and restart generation!
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h
2021-10-27 10:25:47,755 [INFO] Tools::generateImportanceSampledPhsp() | Error in HitMiss procedure: Maximum value of random number generation smaller then amplitude maximum! We raise the maximum to 69.7599 value and restart generation!
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h
?25l
0.03min [ ****------ 40.50% ] 0.05min , end time: 10:25AM ?25h?25l
0.05min [ ********** 100.00% ] 0.00min , end time: 10:25AM ?25h
Warning
The current implementation is simple and uniformly generates events in the phase space domain. Depending on the shape of your Intensity, this can be quite computation intensive. However, ideally you would only run this once for a specific reaction.
The effect of this importance sampling can be best seen in a Dalitz plot. There, one can see that there are more events in the regions where the intensity is large (the resonances), but one still sees a uniform distribution overall.
To create a Dalitz plot, we first have to create a DataSet
, as we saw in this section:
kinematics.create_all_subsystems()
phsp_set = kinematics.convert(phsp_sample)
phsp_set_importance = kinematics.convert(phsp_sample_importance)
2021-10-27 10:25:50,750 [INFO] creating all Subsystems!
We can then use matplotlib.pyplot
to create the Dalitz plot. Note that we use replace_ids()
to transform the final state IDs into readable particles names.
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from pycompwa.data import naming
def dalitz_plot(data_set, mass_x, mass_y, bins=50, **kwargs):
"""Helper function to create a Dalitz plot with useful axis titles."""
plt.hist2d(
data_set.data[mass_x],
data_set.data[mass_y],
weights=data_set.weights,
bins=bins,
**kwargs,
)
plt.xlabel(naming.replace_ids(mass_x, kinematics))
plt.ylabel(naming.replace_ids(mass_y, kinematics))
dalitz_plot(
phsp_set_importance, "mSq_(3,4)", "mSq_(2,4)", bins=80, norm=LogNorm()
)
plt.gca().set_title("importance-sampled phase space");
Note how this compares to the evenly distributed phase space sample and the intensity-based sample:
dalitz_plot(phsp_set, "mSq_(3,4)", "mSq_(2,4)", bins=80, norm=LogNorm())
plt.gca().set_title("evenly distributed phase space");
pwa.Logging("error")
data_sample = pwa.generate(
2000, kinematics, generator, intensity, random_generator
)
data_set = kinematics.convert(data_sample)
dalitz_plot(data_set, "mSq_(3,4)", "mSq_(2,4)", bins=80, norm=LogNorm())
plt.gca().set_title("intensity-based sample");
2021-10-27 10:25:51,961 [INFO] Logging to file disabled!
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h
?25l
0.00min [ ---------- 0.00% ] 0.00min , end time: 10:25AM ?25h
?25l
0.03min [ *******--- 72.00% ] 0.01min , end time: 10:25AM ?25h?25l
0.03min [ ********** 100.00% ] 0.00min , end time: 10:25AM ?25h