PhysiCell Tools : Python-Loader

The newest tool for PhysiCell provides an easy way to load your PhysiCell output data into python for analysis. This builds upon previous work on loading data into MATLAB. A post on that tool can be found at:

PhysiCell stores output data as a MultiCell Digital Snapshot (MultiCellDS), which consists of several files for each time step and is probably stored in your ./output directory. pyMCDS is a python object that is initialized with the .xml file.

What you’ll need

Anatomy of a MultiCell Digital Snapshot

Each time PhysiCell’s internal time tracker passes a time step where data is to be saved, it generates a number of files of various types. Each of these files has a number at the end that indicates where it belongs in the sequence of outputs. All of the files from the first round of output end in 00000000.*, those from the second round end in 00000001.*, and so on. Let’s say we’re interested in a set of output from partway through the run, the 88th set of output files. Because numbering starts at zero, these carry the index 00000087. The files we care about most from this set are:

  • output00000087.xml: This file is the main organizer of the data. It contains an overview of the data stored in the MultiCellDS as well as some actual data including:
    • Metadata about the time and runtime for the current time step
    • Coordinates for the computational domain
    • Parameters for diffusing substrates in the microenvironment
    • Column labels for the cell data
    • File names for the files that contain microenvironment and cell data at this time step
  • output00000087_microenvironment0.mat: This is a MATLAB matrix file that contains all of the data about the microenvironment at this time step
  • output00000087_cells_physicell.mat: This is a MATLAB matrix file that contains all of the tracked information about the individual cells in the model. It tells us things like the cells’ position, volume, secretion, cell cycle status, and user-defined cell parameters.
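
The naming scheme described above can be sketched in a few lines of python. This is only an illustration: the helper name snapshot_files is hypothetical and not part of PhysiCell or pyMCDS, and it assumes the eight-digit, zero-padded index and the three file suffixes listed above.

```python
def snapshot_files(index):
    """Return the file names expected for one round of output.

    Hypothetical helper: assumes an eight-digit, zero-padded index that
    starts at 0, as described in the text.
    """
    stem = 'output{:08d}'.format(index)
    return [
        stem + '.xml',
        stem + '_microenvironment0.mat',
        stem + '_cells_physicell.mat',
    ]


# The 88th set of output files has index 87 because counting starts at 0
for name in snapshot_files(87):
    print(name)
```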


Using pyMCDS

From the appropriate directory in your PhysiCell installation, wherever pyMCDS.py lives, you can use the data loader in your own scripts or in an interactive session. To start, you have to import the pyMCDS class:

    from pyMCDS import pyMCDS

Loading the data

Data is loaded into python from the MultiCellDS by initializing the pyMCDS object. The initialization function for pyMCDS takes one required and one optional argument.

    __init__(xml_file, [output_path = '.'])
    '''
    xml_file : string
        String containing the name of the output xml file
    output_path : string
        String containing the path (relative or absolute) to the directory
        where PhysiCell output files are stored
    '''

Suppose we are interested in reading output00000087.xml, which lives in ~/path/to/PhysiCell/output (don’t worry, Windows paths work too). We would initialize our pyMCDS object using those names, and the actual data would be stored in a member dictionary called data:

    mcds = pyMCDS('output00000087.xml', '~/path/to/PhysiCell/output')
    # Now our data lives in:
    mcds.data

We’ve tried to keep everything organized inside of this dictionary, so let’s take a look at what we actually have in here. Of course, real output will probably not contain a chemical named my_chemical; it is simply there to illustrate how multiple chemicals are handled.

Overview of dictionary-of-dictionaries structure

The data member dictionary is a dictionary of dictionaries whose child dictionaries can be accessed through normal python dictionary syntax:

    mcds.data['metadata']
    mcds.data['continuum_variables']['my_chemical']

Each of these subdictionaries contains data. We will take a look at exactly what that data is and how it can be accessed in the following sections.
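
To get a feel for the nesting before loading real output, the structure can be mimicked with an ordinary dictionary. The sketch below is a mock, not pyMCDS itself: mock_data and list_keys are hypothetical names, and the stored values are placeholders taken from the examples in this post.

```python
# Mock stand-in for mcds.data; the values are placeholders, not real output
mock_data = {
    'metadata': {'time_units': 'min'},
    'mesh': {'voxels': {}},
    'continuum_variables': {
        'oxygen': {'decay_rate': {'value': 0.1, 'units': '1/min'}},
        'my_chemical': {},
    },
    'discrete_cells': {},
}


def list_keys(d, prefix=''):
    """Recursively collect the key paths of a nested dictionary."""
    paths = []
    for key, value in d.items():
        path = prefix + "['{}']".format(key)
        paths.append(path)
        if isinstance(value, dict):
            paths.extend(list_keys(value, path))
    return paths


# Print every path that can be followed with normal dictionary syntax
for path in list_keys(mock_data):
    print(path)
```

Running the same list_keys function on a real mcds.data dictionary should give a quick overview of what a snapshot contains.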


Expanded metadata subdictionary

The metadata dictionary contains information about the time of the simulation as well as units for both time and space. Here and in later sections, blue boxes indicate scalars and green boxes indicate strings. We can access each of these things using normal dictionary syntax. We’ve also got access to a helper function, get_time(), for the common operation of retrieving the simulation time:

    >>> mcds.data['metadata']['time_units']
    'min'
    >>> mcds.get_time()
    5220.0


Expanded mesh dictionary

The mesh dictionary has a lot more going on than the metadata dictionary. It contains three numpy arrays, indicated by orange boxes, as well as another dictionary. The three arrays contain 𝑥, 𝑦 and 𝑧 coordinates for the centers of the voxels that constitute the computational domain in a meshgrid format. This means that each of those arrays is a tensor of rank three. Together they identify the coordinates of each possible point in the space.

In contrast, the arrays in the voxels dictionary are stored linearly. If we know that we care about voxel number 42, we want to use the arrays in the voxels dictionary. If we want to make a contour plot, we want to use the x_coordinates, y_coordinates, and z_coordinates arrays:

    # We can extract one of the meshgrid arrays as a numpy array
    >>> y_coords = mcds.data['mesh']['y_coordinates']
    >>> y_coords.shape
    (75, 75, 75)
    >>> y_coords[0, 0, :4]
    array([-740., -740., -740., -740.])

    # We can also extract the array of voxel centers
    >>> centers = mcds.data['mesh']['voxels']['centers']
    >>> centers.shape
    (3, 421875)
    >>> centers[:, :4]
    array([[-740., -720., -700., -680.],
           [-740., -740., -740., -740.],
           [-740., -740., -740., -740.]])

    # We have a handy function to quickly extract the components of the full meshgrid
    >>> xx, yy, zz = mcds.get_mesh()
    >>> yy.shape
    (75, 75, 75)
    >>> yy[0, 0, :4]
    array([-740., -740., -740., -740.])

    # We can also use this to return the meshgrid describing an x, y plane
    >>> xx, yy = mcds.get_2D_mesh()
    >>> yy.shape
    (75, 75)
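
To make the difference between the two layouts concrete, here is a minimal sketch that converts a linear voxel number into the coordinates of its center. It assumes the example domain shown above (75 voxels per side, centers starting at -740 with a spacing of 20) and assumes that 𝑥 varies fastest in the linear ordering, which is what the centers[:, :4] output suggests. The function voxel_center is hypothetical and not part of pyMCDS.

```python
def voxel_center(n, n_side=75, first_center=-740.0, spacing=20.0):
    """Return the (x, y, z) center of linear voxel number n.

    Assumes a cubic domain with n_side voxels per axis and an ordering in
    which x varies fastest, then y, then z. These are assumptions based on
    the example output, not guarantees about every PhysiCell mesh.
    """
    i = n % n_side                  # index along x
    j = (n // n_side) % n_side      # index along y
    k = n // (n_side * n_side)      # index along z
    return (first_center + i * spacing,
            first_center + j * spacing,
            first_center + k * spacing)


# Voxel number 42 sits in the first row of the first plane
print(voxel_center(42))
```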

Continuum variables

Expanded microenvironment dictionaries

The continuum_variables dictionary is the most complicated of the four. It contains subdictionaries that we access using the names of each of the chemicals in the microenvironment. In our toy example above, these are oxygen and my_chemical. If our model tracked diffusing oxygen, VEGF, and glucose, then the continuum_variables dictionary would contain a subdictionary for each of them.

For a particular chemical species in the microenvironment we have two more dictionaries called decay_rate and diffusion_coefficient, and a numpy array called data. The diffusion and decay dictionaries each contain the value stored as a scalar and the unit stored as a string. The numpy array contains the concentrations of the chemical in each voxel at this time and is the same shape as the meshgrids of the computational domain stored in the mcds.data['mesh'] arrays:

    # we need to know the names of the substrates to work with
    # this data. We have a function to help us find them.
    >>> mcds.get_substrate_names()
    ['oxygen', 'my_chemical']

    # The diffusable chemical dictionaries are messy
    # if we need to do a lot with them it might be easier
    # to put them into their own instance
    >>> oxy_dict = mcds.data['continuum_variables']['oxygen']
    >>> oxy_dict['decay_rate']
    {'value': 0.1, 'units': '1/min'}

    # What we care about most is probably the numpy
    # array of concentrations
    >>> oxy_conc = oxy_dict['data']
    >>> oxy_conc.shape
    (75, 75, 75)

    # Alternatively, we can get the same array with a function
    >>> oxy_conc2 = mcds.get_concentrations('oxygen')
    >>> oxy_conc2.shape
    (75, 75, 75)

    # We can also get the concentrations on a plane using the
    # same function and supplying a z value to "slice through"
    # note that right now the z value must be an exact match
    # for a plane of voxel centers, in the future we may add
    # interpolation.
    >>> oxy_plane = mcds.get_concentrations('oxygen', z_slice=100.0)
    >>> oxy_plane.shape
    (75, 75)

    # we can also find the concentration in a single voxel using the
    # position of a point within that voxel. This will give us an
    # array of all concentrations at that point.
    >>> mcds.get_concentrations_at(x=0., y=550., z=0.)
    array([17.94514446,  0.99113448])
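
Looking up a point by position means deciding which voxel contains it. The sketch below shows one way that lookup could work on the example mesh. It is not taken from the pyMCDS source: center_index and containing_voxel_center are hypothetical names, and the code assumes voxel centers start at -740 with a spacing of 20, so each voxel extends half a spacing on either side of its center.

```python
import math


def center_index(coord, first_center=-740.0, spacing=20.0, n_side=75):
    """Return the index along one axis of the voxel that contains coord."""
    lower_edge = first_center - spacing / 2.0
    index = math.floor((coord - lower_edge) / spacing)
    if not 0 <= index < n_side:
        raise ValueError('coordinate {} is outside the domain'.format(coord))
    return index


def containing_voxel_center(x, y, z, first_center=-740.0, spacing=20.0):
    """Return the center of the voxel containing the point (x, y, z)."""
    return tuple(first_center + center_index(c, first_center, spacing) * spacing
                 for c in (x, y, z))


# A point slightly off-center still maps to a single voxel center
print(containing_voxel_center(3.0, 555.0, -4.0))
```

The z coordinate of the returned center is the kind of value that works as an exact match when slicing a plane of concentrations.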

Discrete Cells

The discrete cells dictionary is relatively straightforward. It contains a number of numpy arrays that contain information regarding individual cells.  These are all 1-dimensional arrays and each corresponds to one of the variables specified in the output*.xml file. With the default settings, these are:

  • ID: unique integer that will identify the cell throughout its lifetime in the simulation
  • position(_x, _y, _z): floating point positions for the cell in 𝑥, 𝑦, and 𝑧 directions
  • total_volume: total volume of the cell
  • cell_type: integer label for the cell as used in PhysiCell
  • cycle_model: integer label for the cell cycle model as used in PhysiCell
  • current_phase: integer specification for which phase of the cycle model the cell is currently in
  • elapsed_time_in_phase: time that cell has been in current phase of cell cycle model
  • nuclear_volume: volume of cell nucleus
  • cytoplasmic_volume: volume of cell cytoplasm
  • fluid_fraction: proportion of the volume due to fluid
  • calcified_fraction: proportion of volume consisting of calcified material
  • orientation(_x, _y, _z): direction in which cell is pointing
  • polarity:
  • migration_speed: current speed of cell
  • motility_vector(_x, _y, _z): current direction of movement of cell
  • migration_bias: coefficient for stochastic movement (higher is “more deterministic”)
  • motility_bias_direction(_x, _y, _z): direction of movement bias
  • persistence_time: time in-between direction changes for cell
  • motility_reserved:


    # Extracting single variables is just like before
    >>> cell_ids = mcds.data['discrete_cells']['ID']
    >>> cell_ids.shape
    (18595,)
    >>> cell_ids[:4]
    array([0., 1., 2., 3.])

    # If we're clever we can extract 2D arrays
    >>> cell_vec = np.zeros((cell_ids.shape[0], 3))
    >>> vec_list = ['position_x', 'position_y', 'position_z']
    >>> for i, lab in enumerate(vec_list):
    ...     cell_vec[:, i] = mcds.data['discrete_cells'][lab]
    ...
    >>> cell_vec[:4]
    array([[ -69.72657128,  -39.02046405, -233.63178904],
           [ -69.84507464,  -22.71693265, -233.59277388],
           [ -69.84891462,   -6.04070516, -233.61816711],
           [ -69.845265  ,   10.80035554, -233.61667313]])

    # We can get the list of all of the variables stored in this dictionary
    >>> mcds.get_cell_variables()
    ['ID',
     'position_x',
     'position_y',
     'position_z',
     'total_volume',
     'cell_type',
     'cycle_model',
     'current_phase',
     'elapsed_time_in_phase',
     'nuclear_volume',
     'cytoplasmic_volume',
     'fluid_fraction',
     'calcified_fraction',
     'orientation_x',
     'orientation_y',
     'orientation_z',
     'polarity',
     'migration_speed',
     'motility_vector_x',
     'motility_vector_y',
     'motility_vector_z',
     'migration_bias',
     'motility_bias_direction_x',
     'motility_bias_direction_y',
     'motility_bias_direction_z',
     'persistence_time',
     'motility_reserved',
     'oncoprotein',
     'elastic_coefficient',
     'kill_rate',
     'attachment_lifetime',
     'attachment_rate']

    # We can also get all of the cell data as a pandas DataFrame
    >>> cell_df = mcds.get_cell_df()
    >>> cell_df.head()
     ID  position_x  position_y  position_z  total_volume  cell_type  cycle_model ...
    0.0  -69.726571  -39.020464 -233.631789        2494.0        0.0          5.0 ...
    1.0  -69.845075  -22.716933 -233.592774        2494.0        0.0          5.0 ...
    2.0  -69.848915   -6.040705 -233.618167        2494.0        0.0          5.0 ...
    3.0  -69.845265   10.800356 -233.616673        2494.0        0.0          5.0 ...
    4.0  -69.828161   27.324530 -233.631579        2494.0        0.0          5.0 ...

    # if we want to we can also get just the subset of cells that
    # are in a specific voxel
    >>> vox_df = mcds.get_cell_df_at(x=0.0, y=550.0, z=0.0)
    >>> vox_df.iloc[:, :5]
                 ID  position_x  position_y  position_z  total_volume
    26718  228761.0    6.623617  536.709341   -1.282934   2454.814507
    52736  270274.0   -7.990034  538.184921    9.648955   1523.386488


These examples will not be made using our toy dataset described above but will instead be made using a single timepoint dataset that can be found at:

Substrate contour plot

One of the big advantages of working with PhysiCell data in python is that we have access to its plotting tools. For the sake of example, let’s plot the partial pressure of oxygen throughout the computational domain along the 𝑧=0 plane. Once we’ve loaded our data by initializing a pyMCDS object, we can work entirely within python to produce the plot:

    from pyMCDS import pyMCDS
    import numpy as np
    import matplotlib.pyplot as plt

    # load data
    mcds = pyMCDS('output00003696.xml', '../output')

    # Set our z plane and get our substrate values along it
    z_val = 0.00
    plane_oxy = mcds.get_concentrations('oxygen', z_slice=z_val)

    # Get the 2D mesh for contour plotting
    xx, yy = mcds.get_2D_mesh()

    # We want to be able to control the number of contour levels so we
    # need to do a little set up
    num_levels = 21
    min_conc = plane_oxy.min()
    max_conc = plane_oxy.max()
    my_levels = np.linspace(min_conc, max_conc, num_levels)

    # set up the figure area and add data layers
    fig, ax = plt.subplots()
    cs = ax.contourf(xx, yy, plane_oxy, levels=my_levels)
    ax.contour(xx, yy, plane_oxy, colors='black', levels=my_levels,
               linewidths=0.5)

    # Now we need to add our color bar
    cbar1 = fig.colorbar(cs, shrink=0.75)
    cbar1.set_label('mmHg')

    # Let's put the time in to make these look nice
    ax.set_aspect('equal')
    ax.set_xlabel('x (micron)')
    ax.set_ylabel('y (micron)')
    ax.set_title('oxygen (mmHg) at t = {:.1f} {:s}, z = {:.2f} {:s}'.format(
        mcds.get_time(),
        mcds.data['metadata']['time_units'],
        z_val,
        mcds.data['metadata']['spatial_units']))

    plt.show()
oxygen partial pressures over z=0

Adding a cells layer

We can also use pandas to do fairly complex selections of cells to add to our plots. Below we use pandas and the previous plot to add a cells layer:

    from pyMCDS import pyMCDS
    import numpy as np
    import matplotlib.pyplot as plt

    # load data
    mcds = pyMCDS('output00003696.xml', '../output')

    # Set our z plane and get our substrate values along it
    z_val = 0.00
    plane_oxy = mcds.get_concentrations('oxygen', z_slice=z_val)

    # Get the 2D mesh for contour plotting
    xx, yy = mcds.get_2D_mesh()

    # We want to be able to control the number of contour levels so we
    # need to do a little set up
    num_levels = 21
    min_conc = plane_oxy.min()
    max_conc = plane_oxy.max()
    my_levels = np.linspace(min_conc, max_conc, num_levels)

    # get our cells data and figure out which cells are in the plane
    cell_df = mcds.get_cell_df()
    ds = mcds.get_mesh_spacing()
    inside_plane = (cell_df['position_z'] < z_val + ds) \
                   & (cell_df['position_z'] > z_val - ds)
    plane_cells = cell_df[inside_plane]

    # We're going to plot two types of cells and we want it to look nice
    colors = ['black', 'grey']
    sizes = [20, 8]
    labels = ['Alive', 'Dead']

    # set up the figure area and add microenvironment layer
    fig, ax = plt.subplots()
    cs = ax.contourf(xx, yy, plane_oxy, levels=my_levels)

    # get our cells of interest
    alive_cells = plane_cells[plane_cells['cycle_model'] < 6]
    dead_cells = plane_cells[plane_cells['cycle_model'] > 6]

    # plot the cell layer
    for i, plot_cells in enumerate((alive_cells, dead_cells)):
        ax.scatter(plot_cells['position_x'].values,
                   plot_cells['position_y'].values,
                   facecolor='none',
                   edgecolors=colors[i],
                   alpha=0.6,
                   s=sizes[i],
                   label=labels[i])

    # Now we need to add our color bar
    cbar1 = fig.colorbar(cs, shrink=0.75)
    cbar1.set_label('mmHg')

    # Let's put the time in to make these look nice
    ax.set_aspect('equal')
    ax.set_xlabel('x (micron)')
    ax.set_ylabel('y (micron)')
    ax.set_title('oxygen (mmHg) at t = {:.1f} {:s}, z = {:.2f} {:s}'.format(
        mcds.get_time(),
        mcds.data['metadata']['time_units'],
        z_val,
        mcds.data['metadata']['spatial_units']))
    ax.legend(loc='upper right')

    plt.show()
adding a cell layer to the oxygen plot
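
The cell selection in this example keeps every cell whose 𝑧 position lies strictly within one mesh spacing of the chosen plane. The same rule can be sketched without pandas to show exactly which cells pass the filter. The function cells_near_plane is a hypothetical name and the positions below are made up for illustration.

```python
def cells_near_plane(z_positions, z_val, ds):
    """Return the indices of cells with z_val - ds < z < z_val + ds."""
    return [i for i, z in enumerate(z_positions)
            if z_val - ds < z < z_val + ds]


# Made-up z positions; a cell exactly one spacing away is excluded
z_positions = [-233.6, -1.3, 9.6, 19.9, 20.0, 150.0]
print(cells_near_plane(z_positions, z_val=0.0, ds=20.0))
```

Because both comparisons are strict, the slab is open at both ends. A narrower slab, for example half a mesh spacing, would restrict the layer to cells inside the plane of voxels being plotted.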

Future Direction

The first extension of this project will be timeseries functionality. This will provide similar data loading functionality but for a time series of MultiCell Digital Snapshots instead of simply one point in time.
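
Until that functionality exists, a series of snapshots can be handled by collecting the main .xml file names in order. The sketch below shows one possible way to do this and is not the planned implementation: ordered_snapshots is a hypothetical helper, and it assumes the output????????.xml naming scheme described earlier in this post.

```python
import re

# Matches only the main snapshot files, e.g. output00000087.xml
SNAPSHOT_PATTERN = re.compile(r'^output(\d{8})\.xml$')


def ordered_snapshots(file_names):
    """Return the main snapshot .xml names sorted by their output index."""
    indexed = []
    for name in file_names:
        match = SNAPSHOT_PATTERN.match(name)
        if match:
            indexed.append((int(match.group(1)), name))
    return [name for _, name in sorted(indexed)]


# Example directory listing, given here as a plain list of names
listing = [
    'output00000002.xml',
    'output00000000_cells_physicell.mat',
    'output00000000.xml',
    'initial.xml',
    'output00000001.xml',
]
print(ordered_snapshots(listing))
```

Each name in the resulting list could then be passed to pyMCDS together with the output path, one snapshot at a time.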