The Algorithm

In this section, we describe the functions used for actually running the algorithm. Here we only describe how to run the results after obtaining a CellProblem. If you want to know more about setting up the CellProblems, or solving any PDEs, please see:

Basis Functions

The functions below are used for defining basis functions:

StepwiseEQL.BasisSetType
BasisSet{F<:Tuple} <: Function

A set of basis functions.

Fields

  • bases::F: A Tuple of basis functions, each function being of the form (q, p) -> Number, with the same p for each basis function.

Constructors

BasisSet((f1, f2, ...))
BasisSet(f1, f2, ...)

Evaluating

You can evaluate a BasisSet using the method

(f::BasisSet{F})(q, θ, p) where {F}

which returns

\[\sum_{i=1}^n \theta_if_i(q, p)\]

source
StepwiseEQL.PolynomialBasisFunction
PolynomialBasis(d1, d2)

Construct a set of polynomial basis functions of degree d1 to d2, returning a BasisSet object.

Examples

julia> basis = PolynomialBasis(2, 4);

julia> basis(0.5, [0.1, 0.67, -2.3], nothing)
-0.034999999999999976

julia> 0.1 * 0.5^2 + 0.67 * 0.5^3 - 2.3 * 0.5^4
-0.034999999999999976
source

For example, the basis set $\{q, q^2, q^3\}$ can be defined in two ways:

using StepwiseEQL
f1 = (q, p) -> q 
f2 = (q, p) -> q^2 
f3 = (q, p) -> q^3 
B = BasisSet(f1, f2, f3)
(::BasisSet{Tuple{var"#307#308", var"#309#310", var"#311#312"}}) (generic function with 3 methods)
B = PolynomialBasis(1, 3)
(::BasisSet{Tuple{StepwiseEQL.var"#52#54"{Int64}, StepwiseEQL.var"#52#54"{Int64}, StepwiseEQL.var"#52#54"{Int64}}}) (generic function with 3 methods)

The reason for the second argument p is in case you want to give better control of the scaling on the basis function's coefficients. For example, the basis set $\{aq, bq^{-2}, cq^3\}$ can be defined as:

f1 = (q, p) -> p.a * q 
f2 = (q, p) -> p.b * q^(-2)
f3 = (q, p) -> p.c * q^3
B = BasisSet(f1, f2, f3)
(::BasisSet{Tuple{var"#313#314", var"#315#316", var"#317#318"}}) (generic function with 3 methods)

and you would then provide p when evaluating B, for example:

p = (a = 2.0, b = 3.0, c = 5.0) # same p for each basis function
B(0.2, [0.3, 0.5, 1], p)
37.66

which returns $\theta_1aq + \theta_2bq^{-2} + \theta_3cq^3$ with $\theta_1 = 0.3$, $\theta_2 = 0.5$, $\theta_3 = 1$, $a = 2$, $b = 3$, $c = 5$, and $q = 0.2$.

Averaging ODE Solutions

Usually averaging over multiple realisations from the discrete model can be handled internally by the stepwise function, but you may want to re-average for the purpose of plotting (as we do in case studies 3 and 4). For this reason, we provide AveragedODESolution as an exported function that you can use:

StepwiseEQL.AveragedODESolutionType
AveragedODESolution{K,S}

Struct representing a cell simulation averaged over many simulations.

Fields

  • u::K

These are the knots, storing the knots using for interpolating the solutions at each time, where u[j] the knots for the jth time.

  • t::Vector{Float64}

The vector of times.

  • q::Vector{Vector{Float64}}

The averaged densities, with q[j] the averaged density at time t[j] and at the corresponding knots in u[j].

  • cell_sol::S

The original cell simulation.

Constructor

You can construct an AveragedODESolution using the following method:

AveragedODESolution(
    sol::EnsembleSolution, 
    num_knots=100, 
    indices=eachindex(sol), 
    interp_fnc=LinearInterpolation, 
    stat=mean, 
    extrapolate=true
)

Note that the arguments are positional arguments, not keyword arguments.

Arguments

  • sol

The ensemble of cell solutions.

  • num_knots

The number of knots to use in the interpolant for averaging the cell solutions. Defaults to 100.

  • indices

The indices of the simulations to consider for averaging. Defaults to all simulations.

  • interp_fnc

A function of the form (u, t) that is used for averaging. Defaults to linear interpolation from DataInterpolations.jl.

  • stat

This should be a function, or a Tuple of two functions with one for each endpoint, that decides the range to use for the knots at each time step. Defaults to mean, meaning the knots for a given time will span between a and b, where a is the average left-most cell position over each simulation at that time, and b is the average right-most cell position over each simulation at that time. If, for example, stat were (minimum, maximum), then the minimum and maximum end-positions would be used for a and b, respectively, at the corresponding time.

  • extrapolate=true

Whether to allow for extrapolation when averaging the cell solutions. Defaults to true. If false, then when evaluating the interpolant for a given realisation and a given time, then the density for a node is set to zero if it exceeds the right-most knot position for the given time.

Output

Returns the corresponding AveragedODESolution.

source

The Stepwise Algorithm

The entry point into our stepwise procedure is stepwise_selection, documented in detail below.

StepwiseEQL.stepwise_selectionFunction
stepwise_selection(cell_sol; kwargs...)

Perform stepwise selection on the given cell solution cell_sol, given as a CellSolution from EpithelialDynamics1D.jl. In the list of keyword arguments below, the notation (A/B/C/D)_Y, for example, means that there are keyword arguments A_Y, B_Y, C_Y, and D_Y.

The result returned from this function will be one of:

  • EQLSolution

This gives the solution. You can display this in the REPL to see the table of results automatically. You can inspect the individual coefficients and other relevant objects by inspecting the fields of the struct. For example, if you have

sol = stepwise_selection(cell_sol; kwargs...)

then sol.diffusion_theta will be the set of coefficients for the diffusion function. You can query the full list of fields by using

propertynames(sol)

or typing sol.<TAB> in the REPL (<TAB> means hit tab). We also note that if you want to have more interactivity with the table that gets displayed in the REPL, you should look into the show method for EQLSolution, which has signature

Base.show(io::IO, ::MIME"text/plain", eql_sol::EQLSolution;
    step_limit=6,
    crop=:horizontal,
    backend=Val(:text),
    booktabs=true,
    show_votes=false,
    show_all_loss=false,
    crayon=Crayon(bold=true, foreground=:green),
    latex_crayon=["color{blue}", "textbf"],
    transpose=true)

For example,

Base.show(stdout, MIME"text/plain"(), sol; show_votes=false, show_all_loss=false, transpose=true)

prints the table without including the votes, only shows the complete loss function rather than also including its individual components, and transposes the table. A LaTeX version of the table can be obtained using backend=Val(:latex), e.g. a LaTeX version of the above could be printed using

Base.show(stdout, MIME"text/plain"(), sol; backend=Val(:latex), show_votes=false, show_all_loss=false, transpose=true)

You can also just use

latex_table(sol; kwargs...)

to get the LaTeX format printed.

  • EnsembleEQLSolution

In this case, you have provided model_samples as a keyword argument and model_samples > 1. This struct has four fields: solutions, which stores all the individual EQLSolutions; final_loss, which stores a Dict mapping final vectors of active coefficients to Tuples of the form (loss, n), where loss is the loss function at that final model, and n is the number of times that model was found out of the complete set of initial indicator vectors sampled; best_model, which is an indicator vector which gives the model out of those found with the least loss; best_model_indices, which gives the indices of all solutions in solutions which had a final indicator vector matching that of best_model.

Keyword Arguments

  • (diffusion/reaction/rhs/moving_boundary)_basis::BasisSet

The basis expansion to use for the diffusion, reaction, right-hand side, and moving boundary terms, respectively. If not provided, the functions are replaced with the zero function. For diffusion_basis, a BasisSet is required.

  • (diffusion/reaction/rhs/moving_boundary)_parameters

The parameters to use for evaluating the corresponding basis set. The default is nothing.

  • (diffusion/reaction/rhs/moving_boundary)_theta

This keyword argument indicates whether the coefficients θ used for evaluating the basis sets are fixed or are to be learned, with a default of nothing. If nothing, then they will be learned. Otherwise, they should be a vector of numbers indicating the coefficients θ to be used for the respective mechanism. For example, diffusion_theta = [1.0, 0.5] means that the diffusion function is evaluated with coefficients θ₁ᵈ = 1 and θ₂ᵈ = 0.5.

  • threshold_tol

The threshold tolerances to use for pruning the matrix, with a default of a zero tolerance for each quantity. This should be a NamedTuple if provided, only including any of the names q, x, dt, dx, dx2, dx_bc, and dL. The q, dt, dx, and dx2 tolerances are used for pruning the interior PDE, corresponding to tolerances based on the quantiles of q(x, t), ∂q(x, t)/∂t, ∂q(x, t)/∂x, and ∂²q(x, t)/∂x². The dx_bc tolerance is used for pruning based on ∂q(x, t)/∂t evaluated at the boundary, and is only used for the matrix corresponding to the right-hand side boundary condition. The dL tolerance is for dL/dt, and is only used for the matrix corresponding to the moving boundary condition. For example, providing

threshold_tol = (q = 0.1, dL = 0.3, x = 0.7, dx = 0.1, dx_bc = 0.05)

means that the matrix for the interior problem will only include points whose densities are between the 10% and 90% density quantiles, the 10% and 90% quantiles for ∂q(x, t)/∂x, at positions between 0 and 70% of the current leading edge for the given time. The matrix for the right-hand side boundary condition will only include points between the 5% and 95% quantiles for ∂q(x, t)/∂x at the boundary. The matrix for the moving boundary condition will only include points where the velocity dL/dt is between the 30% and 70% quantiles of dL/dt.

  • mesh_points

The number of mesh points to use for discretising the corresponding PDE. The default is 100.

  • cross_validation

Whether to use cross-validation for selecting the model. The default is false.

  • rng

The random number generator to use. This defaults to the global RNG, Random.default_rng().

  • skip

Indices of any coefficients that should be skipped over when deciding which terms to include or remove. By default, this is empty

  • regression

Whether to include the regression loss in the loss function. Defaults to false.

  • density

Whether to include the density loss in the loss function. Defaults to true.

  • complexity

The complexity penalty to use for the loss function. Defaults to 1. Use 0 if you would not like to penalise complexity.

  • loss_function

The loss function use. Defaults to default_loss(; regression, density, complexity). The loss function should take the same form as described in default_loss.

  • bidirectional

Whether to allow for steps in both directions, meaning terms can be either added or deleted at each step. The default is true. If false, then steps can only be taken backwards.

  • trials

If cross_validation is true, this is the number of votes to attempt at each step of the algorithm.

  • initial

The initial set of active coefficients. The default is :all, meaning all terms are initially active. If :random, then a random set of terms is initially active. If :none, then all terms are initially inactive. Otherwise, initial should be a vector of Booleans indicating the initially active terms, with true for active and false for inactive. The indices should match the flattened form of the coefficients, with the order of coefficients being diffusion, reaction, rhs, and then moving_boundary. For example, if the basis functions are

\[\begin{align*} D(q) &= \theta_1^d\varphi_1^d + \theta_2^d\varphi_2^d & (\text{diffusion}), \\ R(q) &= \theta_1^r\varphi_1^r + \theta_2^r\varphi_2^r + \theta_3^r\varphi_3^r & (\text{reaction}), \\ H(q) &= \theta_1^h\varphi_1^h + \theta_2^h\varphi_2^h + \theta_3^h\varphi_3^h + \theta_4^h\varphi_4^h & (\text{rhs}), \\ E(q) &= \theta_1^e\varphi_1^e + \theta_2^e\varphi_2^e & (\text{moving boundary}), \end{align*}\]

and you want to start with θ₁ᵈ, θ₁ʳ, θ₂ʳ, θ₂ʰ, and θ₂ᵉ active, then you should provide

initial = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1]
  • model_samples

How many initial models to randomly sample when providing the results. This defaults to 1. If model_samples > 1, then initial must be :random. The result returned in this case will be an EnsembleEQLSolution.

  • use_relative_err

Whether to use relative error (true) or absolute error (false) in the loss function summands. Defaults to true.

  • aggregate

When considering an ensemble of cell solutions, aggregate=true indicates that the matrix constructed should combine all the cell solutions into one matrix, and the loss function should be the sum of the loss functions for each cell solution. The default is false. If this is false, then for an ensemble of cell solutions you must have average=Val(true); if both are false and the cell solution is an ensemble, the call will error.

  • time_range

The time range, provided as a Tuple of the form (tmin, tmax), to use for constructing the matrix. The default is the complete time span that the cell simulation is performed over.

  • average

When considering an ensemble of cell solutions, average=Val(true) indicates that the matrix constructed should average the solutions together into a single matrix, and the loss function should use this averaged solution, given by an AveragedODESolution. The default is Val(true). If this is Val(false), then for an ensemble of cell solutions you must have aggregate=true; if both are false and the cell solution is an ensemble, the call will error.

  • simulation_indices

The simulations to consider in the matrix. Only relevant if an ensemble of cell solutions is being considered. Defaults to all simulations.

  • num_knots

The number of knots to use in the interpolant for averaging the cell solutions. Only relevant if an ensemble of cell solutions is being considered. Defaults to 100.

  • avg_interp_fnc

A function of the form (u, t) that is used for averaging. Only relevant if an ensemble of cell solutions is being considered. Defaults to linear interpolation from DataInterpolations.jl.

  • stat

This should be a function, or a Tuple of two functions with one for each endpoint, that decides the range to use for the knots at each time step. Only relevant if an ensemble of cell solutions is being considered. Defaults to mean, meaning the knots for a given time will span between a and b, where a is the average left-most cell position over each simulation at that time, and b is the average right-most cell position over each simulation at that time. If, for example, stat were (minimum, maximum), then the minimum and maximum end-positions would be used for a and b, respectively, at the corresponding time.

  • show_progress

Whether to print the current step of the algorithm at each step. Defaults to true.

  • max_steps

Maximum number of steps to allow for the algorithm. Defaults to 100.

  • leading_edge_error

Whether to include the leading edge error in the loss function. If this is true, then density must also be true. Only relevant if the cell simulation being considered is a moving boundary problem. Defaults to true.

  • extrapolate_pde

If the time range is smaller than the time span of the cell simulation, then setting extrapolate_pde to true means that the loss function will still consider all saved time points, not just those within the provided time_range. The default is false.

  • num_constraint_checks

The number of equally-spaced nodes to use between the minimum and maximum densities for constraining the diffusion term and the moving boundary term. Defaults to 100.

  • conserve_mass

Whether to enforce conservation of mass by constraining the diffusion and moving boundary terms to be equal. Defaults to false.

  • extrapolate_average

Whether to allow for extrapolation when averaging the cell solutions. Only relevant if an ensemble of cell solutions is being considered. Defaults to true. If false, then when evaluating the interpolant for a given realisation and a given time, then the density for a node is set to zero if it exceeds the right-most knot position for the given time.

  • couple_rhs

Whether to couple the boundary conditions at the right-hand side and at the moving boundary together. Only relevant if the moving boundary coefficients are fixed. If true, then the matrix for the right-hand boundary condition will also include terms coming from the moving boundary, replacing ∂q/∂x with the right-hand side's assumed basis expansion. Defaults to true. If false, then the matrix for the right-hand boundary condition will only include terms coming from the right-hand side.

source

Printing LaTeX Tables

From an EQLSolution, returned by stepwise_selection, you can print the LaTeX form of the table using latex_table:

StepwiseEQL.latex_tableFunction
latex_table(eql_sol::EQLSolution; kwargs...)

Prints the EQLSolution eql_sol in LaTeX format. Keyword arguments are passed to show, which has signature

Base.show(io::IO, ::MIME"text/plain", eql_sol::EQLSolution;
    step_limit=6,
    crop=:horizontal,
    backend=Val(:text),
    booktabs=true,
    show_votes=false,
    show_all_loss=false,
    crayon=Crayon(bold=true, foreground=:green),
    latex_crayon=["color{blue}", "textbf"],
    transpose=true)

(Of course, backend=Val(:latex) is passed - you cannot change that.)

source

Loss Function

If you want to provide a new loss function, you can provide it as a keyword loss_function to stepwise_selection. The function that is used to construct the default loss is below for example.

StepwiseEQL.default_lossFunction
default_loss(; kwargs...)
default_loss(regression_loss, density_loss, indicators; density=true, regression=false, complexity=1)

The default loss function. The first method, accepting the same keyword arguments as in the second method, returns a function that accepts the regression loss, density loss, and indicators, and returns the total loss, evaluated via the second method.

Arguments

  • regression_loss: The value of the regression loss.
  • density_loss: The value of the density loss.
  • indicators: A vector of Boolean values, indicating which the set of active coefficients.

Keyword Arguments

  • density=true: Whether to include the density loss.
  • regression=false: Whether to include the regression loss.
  • complexity=1: The complexity penalty coefficient. Use 0 to disable the complexity penalty.
source