# Data module

:::{warning}

The {doc}`pycompwa </index>` is no longer maintained. Use the [ComPWA](https://compwa-org.rtfd.io) packages [QRules](https://qrules.rtfd.io), [AmpForm](https://ampform.rtfd.io), and [TensorWaves](https://tensorwaves.rtfd.io) instead!

:::

The {mod}`pycompwa.data` module provides several tools for importing, exporting, and visualizing data. By data, we mean *event-wise collections of four-momentum tuples*, possibly organized by particle name. We work with {mod}`pandas` as a back-end because it allows fast manipulation and visualization of data sets and can import from and export to several standard data formats.

This notebook shows how to conveniently manipulate momentum tuple collections in such a way that they can be imported into ComPWA. It also shows how to import data from other frameworks and how to convert them to kinematic variables.

## PWA data frame

Input data for PWA frameworks mainly consists of event-wise four-momentum tuples, grouped by particle. The core of the {mod}`.data` module is therefore a specially formatted {class}`pandas.DataFrame`. Such a specific format not only allows us to import from and export to different file formats, but also to convert the data to ComPWA objects, such as an {class}`.EventCollection`.

The format is enforced through an accessor that is registered with the {func}`~pandas.api.extensions.register_dataframe_accessor` decorator. Such an accessor extends a {class}`~pandas.DataFrame` with several properties, provided that the {class}`~pandas.DataFrame` validates according to that accessor. In the {mod}`.data` module, this accessor is the {class}`.PwaAccessor`. We call a {class}`~pandas.DataFrame` that is formatted according to this accessor a {class}`PWA DataFrame <.PwaAccessor>`.
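
The accessor mechanism itself is plain {mod}`pandas` functionality. The following is a minimal sketch of how such an accessor can validate a frame and expose extra properties. The accessor name `fourmom`, the class `FourMomentumAccessor`, and the property `energy_sum` are hypothetical and only serve as illustration; this is not the implementation of the {class}`.PwaAccessor`.

```python
import pandas as pd


# Hypothetical accessor, for illustration only (not the pycompwa PwaAccessor)
@pd.api.extensions.register_dataframe_accessor("fourmom")
class FourMomentumAccessor:
    REQUIRED = ("p_x", "p_y", "p_z", "E")

    def __init__(self, frame: pd.DataFrame) -> None:
        # Validation runs when `frame.fourmom` is accessed
        missing = [col for col in self.REQUIRED if col not in frame.columns]
        if missing:
            raise AttributeError(f"Missing momentum columns: {missing}")
        self._frame = frame

    @property
    def energy_sum(self) -> float:
        """Sum of the energy column over all events."""
        return float(self._frame["E"].sum())


good = pd.DataFrame(
    {"p_x": [0.0, 0.0], "p_y": [0.0, 0.0], "p_z": [1.0, 2.0], "E": [1.0, 2.0]}
)
total_energy = good.fourmom.energy_sum

# A frame without the required columns does not validate
bad = pd.DataFrame({"x": [1.0]})
try:
    bad.fourmom
    validation_error = None
except AttributeError as exc:
    validation_error = str(exc)

print(total_energy, validation_error)
```

The properties only become available under the registered namespace (here `fourmom`), which keeps them separate from the regular {class}`~pandas.DataFrame` attributes.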

All this may sound a bit abstract. To illustrate the usage of this accessor, we first create a skeleton {class}`PWA DataFrame <.PwaAccessor>`. This can be done through the {mod}`.create` module.

In [None]:
from pycompwa.data import create

In [None]:
frame = create.pwa_frame(
    particle_names=["gamma", "pi0", "pi0"], number_of_rows=3
)
frame

Note that this {class}`~pandas.DataFrame` has hierarchical column names (see [multi-indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html)): the first column layer is the particle name, the second contains the four-momentum labels. In addition, duplicate particle names have been made unique by adding an index. Values are undefined by default, but you can set them later on. Here, we do this manually, but you can use this procedure for importing large data sets in your own Python scripts.

In [None]:
frame["gamma", "p_x"] = [-0.520903, -0.285015, 0.632325]
frame["gamma", "p_y"] = [0.885259, 0.520381, -0.779928]
frame["gamma", "p_z"] = [0.655934, -0.996574, -0.892786]
frame["gamma", "E"] = [1.21872, 1.15982, 1.34357]

frame["pi0-1", "p_x"] = [0.653672, 0.452265, 0.113717]
frame["pi0-1", "p_y"] = [-0.813022, -0.76188, 0.605441]
frame["pi0-1", "p_z"] = [-1.01763, 0.00723327, 0.718613]
frame["pi0-1", "E"] = [1.46359, 0.896256, 0.956093]

frame["pi0-2", "p_x"] = [-0.132769, -0.16725, -0.746043]
frame["pi0-2", "p_y"] = [-0.0722372, 0.241499, 0.174487]
frame["pi0-2", "p_z"] = [0.361697, 0.989341, 0.174172]
frame["pi0-2", "E"] = [0.414596, 1.04082, 0.797233]

We can now already have a glance at some of the properties that the {class}`.PwaAccessor` offers. You can access these properties through the `pwa` namespace and perform some standard {mod}`pandas` computations on them:

In [None]:
print("Particles:", frame.pwa.particles)
print("Momentum labels:", frame.pwa.momentum_labels)
print("Weights:", frame.pwa.weights)
print("Average pi0 mass:\n", frame[["pi0-1", "pi0-2"]].pwa.mass.stack().mean())
print(
    "Average gamma 3-momentum:\n",
    frame["gamma"].pwa.rho.mean(),
    "+/-",
    frame["gamma"].pwa.rho.std(),
)
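
To make these two properties less opaque, here is a sketch of the underlying kinematics in plain Python. It assumes natural units and that `rho` denotes the magnitude of the three-momentum, as the print label above suggests. The helper functions are hypothetical and are not part of {mod}`pycompwa.data`.

```python
import math


def rho(p_x: float, p_y: float, p_z: float) -> float:
    """Magnitude of the three-momentum."""
    return math.sqrt(p_x**2 + p_y**2 + p_z**2)


def invariant_mass(p_x: float, p_y: float, p_z: float, energy: float) -> float:
    """Invariant mass from m^2 = E^2 - |p|^2.

    The sign of m^2 is carried over to the result, so that rounding in
    (nearly) massless cases does not cause a math domain error.
    """
    mass_squared = energy**2 - (p_x**2 + p_y**2 + p_z**2)
    return math.copysign(math.sqrt(abs(mass_squared)), mass_squared)


# First event of "pi0-1" and of "gamma", as filled in above
pi0_mass = invariant_mass(0.653672, -0.813022, -1.01763, 1.46359)
gamma_rho = rho(-0.520903, 0.885259, 0.655934)
print(f"pi0 mass: {pi0_mass:.3f}, gamma |p|: {gamma_rho:.3f}")
```

The first value should lie close to the neutral pion mass of about 0.135 GeV, and the photon momentum should be nearly equal to its energy, as expected for a massless particle.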

We'll see more of these properties after we import some real data.

## Import and export data

The module {mod}`.data.io` allows one to import from and export to data formats of other PWA frameworks. Here's an example, importing a `pawianHists.root` file. Such a file not only contains [ROOT histograms](https://root.cern.ch/root/htmldoc/guides/users-guide/Histograms.html) of the kinematic distributions, but also two [TTrees](https://root.cern.ch/doc/master/classTTree.html) of four-momentum tuples: one for data and one for fit intensities.

In [None]:
from pycompwa.data import io

In [None]:
frame_data = io.pawian.read_hists_file("jpsi_f0_gammapipi.root", "data")
frame_fit = io.pawian.read_hists_file("jpsi_f0_gammapipi.root", "fit")
frame_fit

Note how, here too, the {class}`~pandas.DataFrame` is formatted in such a way that it can be handled by the {class}`.PwaAccessor`. Also note that the {class}`~pandas.DataFrame` for the fit result contains weights. As discussed in {doc}`/usage/workflow/step4`, these are the fit intensities for each data point in the phase space. This already allows us to make a quick visualization of the invariant mass distribution of the resonance:

In [None]:
import matplotlib.pyplot as plt

In [None]:
f0_data = frame_data["pi0_1"] + frame_data["pi0_2"]
f0_fit = frame_fit["pi0_1"] + frame_fit["pi0_2"]

f0_data.pwa.mass.hist(label="data", bins=60, density=True, alpha=0.5)
f0_fit.pwa.mass.hist(
    label="fit",
    bins=60,
    density=True,
    histtype="step",
    color="red",
    weights=frame_fit.pwa.intensities,
)

plt.xlabel(r"$M(\pi^0,\pi^0)$ [GeV]")  # raw string, so that \p is not treated as an escape sequence
plt.legend()
plt.gca().set_title("f0 resonance")
plt.show()
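
The combination of `weights` and `density=True` is what makes the two histograms comparable: each phase space point contributes its intensity instead of a count of one, and the result is normalized to unit area. The sketch below shows this with `numpy.histogram`, which accepts the same two options. The sample and the intensity function are made up for illustration and have nothing to do with the Pawian file.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical stand-ins: flat "phase space" masses and a made-up intensity
masses = rng.uniform(0.3, 1.5, size=10_000)
intensities = 1.0 / ((masses - 0.99) ** 2 + 0.05**2)

# Each event enters its bin with its intensity as weight
density, edges = np.histogram(
    masses, bins=60, weights=intensities, density=True
)

# With density=True, the histogram integrates to one
area = float(np.sum(density * np.diff(edges)))
bin_centers = 0.5 * (edges[:-1] + edges[1:])
peak_bin_center = float(bin_centers[np.argmax(density)])
print(area, peak_bin_center)
```

Although the input sample is flat, the weighted histogram peaks near the maximum of the intensity function.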

You can also easily export the data again after you've made some adjustments, like selecting certain events. Just to illustrate the benefits of {mod}`pandas`, we apply a filter on the $\pi^0\pi^0$ invariant mass, export the frame to an ASCII file, and import it again:

In [None]:
selection = frame_data[abs(f0_data.pwa.mass - 0.990) < 0.05]
io.pawian.write_ascii(selection, "filtered_data.dat")
io.pawian.read_ascii("filtered_data.dat", ["gamma", "pi0", "pi0"])
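
The selection step relies only on standard {mod}`pandas` features: adding two particle sub-frames and indexing with a boolean mask. Here is a sketch with plain {mod}`pandas` and {mod}`numpy`, without the `pwa` accessor. The momentum values are made up and are not meant to be physical.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["pi0_1", "pi0_2"], ["p_x", "p_y", "p_z", "E"]]
)
# Made-up values, chosen so that the pair masses are easy to read off
frame = pd.DataFrame(
    [
        [0.3, 0.0, 0.0, 0.6, -0.3, 0.0, 0.0, 0.6],  # pair mass 1.2
        [0.1, 0.2, 0.0, 0.5, -0.1, -0.2, 0.0, 0.49],  # pair mass 0.99
        [0.0, 0.0, 0.4, 0.5, 0.0, 0.0, -0.4, 0.5],  # pair mass 1.0
    ],
    columns=columns,
)

# Selecting on the first column level gives one sub-frame per particle;
# adding them sums the four-momenta component by component
pair = frame["pi0_1"] + frame["pi0_2"]
momentum_squared = pair[["p_x", "p_y", "p_z"]].pow(2).sum(axis=1)
pair_mass = np.sqrt(pair["E"] ** 2 - momentum_squared)

# A boolean Series selects rows (events) of the original frame
selection = frame[(pair_mass - 0.990).abs() < 0.05]
print(list(selection.index))
```

Because the mask is applied to the original frame, the selection keeps the two-level column layout.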

## Conversion to kinematic variables

Having a {class}`PWA DataFrame <.PwaAccessor>`, we can use ComPWA to convert the momentum tuples to kinematic variables. For that, of course, we first need to {doc}`create a model file </usage/workflow/step1>` for the kinematics. As can be seen from the column names of the {class}`PWA DataFrame <.PwaAccessor>` that we imported from the `pawianHists.root` file, we have momentum tuples for a $J/\psi \to \gamma\pi^0\pi^0$ decay and we saw that there is only one resonance ($f_0(980)$):

In [None]:
import logging

import pycompwa.ui as pwa

# not interested in warnings now
logger = logging.getLogger()
logger.setLevel(logging.ERROR)
pwa.Logging("ERROR");

In [None]:
from pycompwa.expertsystem.ui.system_control import (
    InteractionTypes,
    StateTransitionManager,
)

initial_state = [("J/psi", [-1, 1])]
final_state = ["gamma", "pi0", "pi0"]
tbd_manager = StateTransitionManager(
    initial_state,
    final_state,
    formalism_type="helicity",
    topology_building="isobar",
)

tbd_manager.set_allowed_interaction_types([InteractionTypes.EM])
tbd_manager.allowed_intermediate_particles = ["f0(980)"]

graph_interaction_settings_groups = tbd_manager.prepare_graphs()
solutions, _ = tbd_manager.find_solutions(graph_interaction_settings_groups)

from pycompwa.expertsystem.amplitude.helicitydecay import (
    HelicityAmplitudeGeneratorXML,
)

model_file = "jpsi_f0_gammapipi.xml"
xml_generator = HelicityAmplitudeGeneratorXML()
xml_generator.generate(solutions)
xml_generator.write_to_file(model_file)

Now that we have an XML model file defining the kinematics and a {class}`PWA DataFrame <.PwaAccessor>`, we can use the {mod}`.convert` module to convert the {class}`~pandas.DataFrame` to an {class}`.EventCollection`. Note, however, that we will run into an exception:

In [None]:
from pycompwa.data import convert

In [None]:
try:
    convert.pandas_to_events(frame_data, model_file)
except Exception as exc:
    print("EXCEPTION:", exc)

What's going on here? The kinematics file works with final state IDs, so it doesn't understand the particle names here. Now, we could try to follow the first suggestion from the exception message, but this won't work:

In [None]:
from pycompwa.data import naming

In [None]:
naming.particle_to_id(frame_data, model_file)
frame_data.pwa.particles

As you can see, the $\gamma$ has been nicely renamed to its final state ID, but the renaming failed for the pions (it would have worked if the separator used for the added index for duplicate particles were a `-`). If we follow the second suggestion, it will work:

In [None]:
mapping = {"gamma": 2, "pi0_1": 3, "pi0_2": 4}
frame_data.rename(columns=mapping, inplace=True)
frame_fit.rename(columns=mapping, inplace=True)
events_data = convert.pandas_to_events(frame_data, model_file)
events_fit = convert.pandas_to_events(frame_fit, model_file)
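
As an aside on the separator issue mentioned above: the particle names can also be renamed with plain {mod}`pandas`, for instance to replace the `_` separator with a `-`. The sketch below shows only the renaming on a placeholder frame. Whether {func}`naming.particle_to_id` then succeeds is an assumption based on the remark above and has not been verified here.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["gamma", "pi0_1", "pi0_2"], ["p_x", "p_y", "p_z", "E"]]
)
frame = pd.DataFrame([[0.0] * 12], columns=columns)  # placeholder values

# level=0 restricts the renaming to the particle names,
# so that momentum labels like "p_x" keep their underscore
renamed = frame.rename(
    columns=lambda name: name.replace("_", "-"), level=0
)

particles = list(renamed.columns.get_level_values(0).unique())
labels = list(renamed.columns.get_level_values(1).unique())
print(particles, labels)
```

The `level` argument matters here, because a callable is otherwise applied to the labels of both column levels.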

Now that you have {class}`.EventCollection` instances, you are free to use all ComPWA functionality from {doc}`/usage/workflow/step3` onwards. If, however, you were more interested in the kinematic variables for these imported data sets immediately, you can expand the original {class}`PWA DataFrame <.PwaAccessor>` with the kinematic variables as follows:

In [None]:
import pycompwa.ui as pwa
from pycompwa.data import append

In [None]:
set_data = pwa.compute_kinematic_variables(events_data, model_file)
set_fit = pwa.compute_kinematic_variables(events_fit, model_file)
naming.id_to_particle(frame_data, model_file, make_unique=True)
naming.id_to_particle(frame_fit, model_file, make_unique=True)
append(frame_data, convert.data_set_to_pandas(set_data))
append(frame_fit, convert.data_set_to_pandas(set_fit))
frame_data.pwa.other_columns

Finally, we can plot the distributions of the kinematic variables (as computed by ComPWA) **of the imported data**.

In [None]:
var = "theta_2_4_vs_3"
frame_data[var].hist(label="data", bins=60, density=True, alpha=0.5)
frame_fit[var].hist(
    label="fit",
    bins=60,
    density=True,
    histtype="step",
    color="red",
    weights=frame_fit.pwa.intensities,
)
plt.gca().set_title(naming.replace_ids(var, model_file))
plt.legend();