The newest tool for PhysiCell provides an easy way to load your PhysiCell output data into python for analysis. This builds upon previous work on loading data into MATLAB. A post on that tool can be found at:
PhysiCell stores output data as a MultiCell Digital Snapshot (MultiCellDS) that consists of several files for each time step and is probably stored in your ./output directory. pyMCDS is a python object that is initialized with the .xml file
What you’ll need
- python-loader, available on GitHub at
- Python 3.x, recommended distribution available at
- A number of Python packages, included in anaconda or available through pip
- Some PhysiCell data, probably in your ./output directory
Anatomy of a MultiCell Digital Snapshot
Each time PhysiCell’s internal time tracker passes a time step where data is to be saved, it generates a number of files of various types. Each of these files will have a number at the end that indicates where it belongs in the sequence of outputs. All of the files from the first round of output will end in 00000000.* and the second round will be 00000001.* and so on. Let’s say we’re interested in a set of output from partway through the run, the 88th set of output files. The files we care about most from this set consists of:
- output00000087.xml: This file is the main organizer of the data. It contains an overview of the data stored in the MultiCellDS as well as some actual data including:
- Metadata about the time and runtime for the current time step
- Coordinates for the computational domain
- Parameters for diffusing substrates in the microenvironment
- Column labels for the cell data
- File names for the files that contain microenvironment and cell data at this time step
- output00000087_microenvironment0.mat: This is a MATLAB matrix file that contains all of the data about the microenvironment at this time step
- output00000087_cells_physicell.mat: This is a MATLAB matrix file that contains all of the tracked information about the individual cells in the model. It tells us things like the cells’ position, volume, secretion, cell cycle status, and user-defined cell parameters.
From the appropriate file in your PhysiCell directory, wherever pyMCDS.py lives, you can use the data loader in your own scripts or in an interactive session. To start you have to import the pyMCDS class?
Loading the data
Data is loaded into python from the MultiCellDS by initializing the pyMCDS object. The initialization function for pyMCDS takes one required and one optional argument.
We are interested in reading output00000087.xml that lives in ~/path/to/PhysiCell/output (don’t worry Windows paths work too). We would initialize our pyMCDS object using those names and the actual data would be stored in a member dictionary called data.?
We’ve tried to keep everything organized inside of this dictionary but let’s take a look at what we actually have in here. Of course in real output, there will probably not be a chemical named my_chemical, this is simply there to illustrate how multiple chemicals are handled.
The data member dictionary is a dictionary of dictionaries whose child dictionaries can be accessed through normal python dictionary syntax.?
Each of these subdictionaries contains data, we will take a look at exactly what that data is and how it can be accessed in the following sections.
The metadata dictionary contains information about the time of the simulation as well as units for both times and space. Here and in later sections blue boxes indicate scalars and green boxes indicate strings. We can access each of these things using normal dictionary syntax. We’ve also got access to a helper function get_time() for the common operation of retrieving the simulation time.?
The mesh dictionary has a lot more going on than the metadata dictionary. It contains three numpy arrays, indicated by orange boxes, as well as another dictionary. The three arrays contain 𝑥, 𝑦 and 𝑧 coordinates for the centers of the voxels that constiture the computational domain in a meshgrid format. This means that each of those arrays is tensors of rank three. Together they identify the coordinates of each possible point in the space.
In contrast, the arrays in the voxel dictionary are stored linearly. If we know that we care about voxel number 42, we want to use the stuff in the voxels dictionary. If we want to make a contour plot, we want to use the x_coordinates, y_coordinates, and z_coordinates arrays.?
The continuum_variables dictionary is the most complicated of the four. It contains subdictionaries that we access using the names of each of the chemicals in the microenvironment. In our toy example above, these are oxygen and my_chemical. If our model tracked diffusing oxygen, VEGF, and glucose, then the continuum_variables dictionary would contain a subdirectory for each of them.
For a particular chemical species in the microenvironment we have two more dictionaries called decay_rate and diffusion_coefficient, and a numpy array called data. The diffusion and decay dictionaries each complete the value stored as a scalar and the unit stored as a string. The numpy array contains the concentrations of the chemical in each voxel at this time and is the same shape as the meshgrids of the computational domain stored in the .data[‘mesh’] arrays.?
The discrete cells dictionary is relatively straightforward. It contains a number of numpy arrays that contain information regarding individual cells. These are all 1-dimensional arrays and each corresponds to one of the variables specified in the output*.xml file. With the default settings, these are:
- ID: unique integer that will identify the cell throughout its lifetime in the simulation
- position(_x, _y, _z): floating point positions for the cell in 𝑥, 𝑦, and 𝑧 directions
- total_volume: total volume of the cell
- cell_type: integer label for the cell as used in PhysiCell
- cycle_model: integer label for the cell cycle model as used in PhysiCell
- current_phase: integer specification for which phase of the cycle model the cell is currently in
- elapsed_time_in_phase: time that cell has been in current phase of cell cycle model
- nuclear_volume: volume of cell nucleus
- cytoplasmic_volume: volume of cell cytoplasm
- fluid_fraction: proportion of the volume due to fliud
- calcified_fraction: proportion of volume consisting of calcified material
- orientation(_x, _y, _z): direction in which cell is pointing
- migration_speed: current speed of cell
- motility_vector(_x, _y, _z): current direction of movement of cell
- migration_bias: coefficient for stochastic movement (higher is “more deterministic”)
- motility_bias_direction(_x, _y, _z): direction of movement bias
- persistence_time: time in-between direction changes for cell
These examples will not be made using our toy dataset described above but will instead be made using a single timepoint dataset that can be found at:
Substrate contour plot
One of the big advantages of working with PhysiCell data in python is that we have access to its plotting tools. For the sake of example let’s plot the partial pressure of oxygen throughout the computational domain along the 𝑧=0 plane. Once we’ve loaded our data by initializing a pyMCDS object, we can work entirely within python to produce the plot.?
Adding a cells layer
We can also use pandas to do fairly complex selections of cells to add to our plots. Below we use pandas and the previous plot to add a cells layer.?
The first extension of this project will be timeseries functionality. This will provide similar data loading functionality but for a time series of MultiCell Digital Snapshots instead of simply one point in time.