The Algorithm
In this section, we describe the functions used for actually running the algorithm. Here we only describe how to run the results after obtaining a CellProblem
. If you want to know more about setting up the CellProblem
s, or solving any PDEs, please see:
- EpithelialDynamics1D.jl: This is the package that actually implements the discrete model of the epithelial dynamics. Please see its documentation for detailed information and examples.
- FiniteVolumeMethod1D.jl: This is the package that solves the PDEs on a fixed boundary. Please see its documentation for detailed information and examples.
- MovingBoundaryProblems1D.jl: This is the package that solves the PDEs on a moving boundary. Please see its documentation for detailed information and examples.
Basis Functions
The functions below are used for defining basis functions:
StepwiseEQL.BasisSet
— TypeBasisSet{F<:Tuple} <: Function
A set of basis functions.
Fields
bases::F
: ATuple
of basis functions, each function being of the form(q, p) -> Number
, with the samep
for each basis function.
Constructors
BasisSet((f1, f2, ...))
BasisSet(f1, f2, ...)
Evaluating
You can evaluate a BasisSet
using the method
(f::BasisSet{F})(q, θ, p) where {F}
which returns
\[\sum_{i=1}^n \theta_if_i(q, p)\]
StepwiseEQL.PolynomialBasis
— FunctionPolynomialBasis(d1, d2)
Construct a set of polynomial basis functions of degree d1
to d2
, returning a BasisSet
object.
Examples
julia> basis = PolynomialBasis(2, 4);
julia> basis(0.5, [0.1, 0.67, -2.3], nothing)
-0.034999999999999976
julia> 0.1 * 0.5^2 + 0.67 * 0.5^3 - 2.3 * 0.5^4
-0.034999999999999976
For example, the basis set $\{q, q^2, q^3\}$ can be defined in two ways:
using StepwiseEQL
f1 = (q, p) -> q
f2 = (q, p) -> q^2
f3 = (q, p) -> q^3
B = BasisSet(f1, f2, f3)
(::BasisSet{Tuple{var"#307#308", var"#309#310", var"#311#312"}}) (generic function with 3 methods)
B = PolynomialBasis(1, 3)
(::BasisSet{Tuple{StepwiseEQL.var"#52#54"{Int64}, StepwiseEQL.var"#52#54"{Int64}, StepwiseEQL.var"#52#54"{Int64}}}) (generic function with 3 methods)
The reason for the second argument p
is in case you want to give better control of the scaling on the basis function's coefficients. For example, the basis set $\{aq, bq^{-2}, cq^3\}$ can be defined as:
f1 = (q, p) -> p.a * q
f2 = (q, p) -> p.b * q^(-2)
f3 = (q, p) -> p.c * q^3
B = BasisSet(f1, f2, f3)
(::BasisSet{Tuple{var"#313#314", var"#315#316", var"#317#318"}}) (generic function with 3 methods)
and you would then provide p
when evaluating B
, for example:
p = (a = 2.0, b = 3.0, c = 5.0) # same p for each basis function
B(0.2, [0.3, 0.5, 1], p)
37.66
which returns $\theta_1aq + \theta_2bq^{-2} + \theta_3cq^3$ with $\theta_1 = 0.3$, $\theta_2 = 0.5$, $\theta_3 = 1$, $a = 2$, $b = 3$, $c = 5$, and $q = 0.2$.
Averaging ODE Solutions
Usually averaging over multiple realisations from the discrete model can be handled internally by the stepwise function, but you may want to re-average for the purpose of plotting (as we do in case studies 3 and 4). For this reason, we provide AveragedODESolution
as an exported function that you can use:
StepwiseEQL.AveragedODESolution
— TypeAveragedODESolution{K,S}
Struct representing a cell simulation averaged over many simulations.
Fields
u::K
These are the knots, storing the knots using for interpolating the solutions at each time, where u[j]
the knots for the j
th time.
t::Vector{Float64}
The vector of times.
q::Vector{Vector{Float64}}
The averaged densities, with q[j]
the averaged density at time t[j]
and at the corresponding knots in u[j]
.
cell_sol::S
The original cell simulation.
Constructor
You can construct an AveragedODESolution
using the following method:
AveragedODESolution(
sol::EnsembleSolution,
num_knots=100,
indices=eachindex(sol),
interp_fnc=LinearInterpolation,
stat=mean,
extrapolate=true
)
Note that the arguments are positional arguments, not keyword arguments.
Arguments
sol
The ensemble of cell solutions.
num_knots
The number of knots to use in the interpolant for averaging the cell solutions. Defaults to 100
.
indices
The indices of the simulations to consider for averaging. Defaults to all simulations.
interp_fnc
A function of the form (u, t)
that is used for averaging. Defaults to linear interpolation from DataInterpolations.jl.
stat
This should be a function, or a Tuple
of two functions with one for each endpoint, that decides the range to use for the knots at each time step. Defaults to mean
, meaning the knots for a given time will span between a
and b
, where a
is the average left-most cell position over each simulation at that time, and b
is the average right-most cell position over each simulation at that time. If, for example, stat
were (minimum, maximum)
, then the minimum and maximum end-positions would be used for a
and b
, respectively, at the corresponding time.
extrapolate=true
Whether to allow for extrapolation when averaging the cell solutions. Defaults to true
. If false
, then when evaluating the interpolant for a given realisation and a given time, then the density for a node is set to zero if it exceeds the right-most knot position for the given time.
Output
Returns the corresponding AveragedODESolution
.
The Stepwise Algorithm
The entry point into our stepwise procedure is stepwise_selection
, documented in detail below.
StepwiseEQL.stepwise_selection
— Functionstepwise_selection(cell_sol; kwargs...)
Perform stepwise selection on the given cell solution cell_sol
, given as a CellSolution
from EpithelialDynamics1D.jl. In the list of keyword arguments below, the notation (A/B/C/D)_Y
, for example, means that there are keyword arguments A_Y
, B_Y
, C_Y
, and D_Y
.
The result returned from this function will be one of:
EQLSolution
This gives the solution. You can display this in the REPL to see the table of results automatically. You can inspect the individual coefficients and other relevant objects by inspecting the fields of the struct. For example, if you have
sol = stepwise_selection(cell_sol; kwargs...)
then sol.diffusion_theta
will be the set of coefficients for the diffusion function. You can query the full list of fields by using
propertynames(sol)
or typing sol.<TAB>
in the REPL (<TAB>
means hit tab). We also note that if you want to have more interactivity with the table that gets displayed in the REPL, you should look into the show
method for EQLSolution
, which has signature
Base.show(io::IO, ::MIME"text/plain", eql_sol::EQLSolution;
step_limit=6,
crop=:horizontal,
backend=Val(:text),
booktabs=true,
show_votes=false,
show_all_loss=false,
crayon=Crayon(bold=true, foreground=:green),
latex_crayon=["color{blue}", "textbf"],
transpose=true)
For example,
Base.show(stdout, MIME"text/plain"(), sol; show_votes=false, show_all_loss=false, transpose=true)
prints the table without including the votes
, only shows the complete loss function rather than also including its individual components, and transposes the table. A LaTeX version of the table can be obtained using backend=Val(:latex)
, e.g. a LaTeX version of the above could be printed using
Base.show(stdout, MIME"text/plain"(), sol; backend=Val(:latex), show_votes=false, show_all_loss=false, transpose=true)
You can also just use
latex_table(sol; kwargs...)
to get the LaTeX format printed.
EnsembleEQLSolution
In this case, you have provided model_samples
as a keyword argument and model_samples > 1
. This struct has four fields: solutions
, which stores all the individual EQLSolutions
; final_loss
, which stores a Dict
mapping final vectors of active coefficients to Tuples
of the form (loss, n)
, where loss
is the loss function at that final model, and n
is the number of times that model was found out of the complete set of initial indicator vectors sampled; best_model
, which is an indicator vector which gives the model out of those found with the least loss; best_model_indices
, which gives the indices of all solutions in solutions
which had a final indicator vector matching that of best_model
.
Keyword Arguments
(diffusion/reaction/rhs/moving_boundary)_basis::BasisSet
The basis expansion to use for the diffusion, reaction, right-hand side, and moving boundary terms, respectively. If not provided, the functions are replaced with the zero function. For diffusion_basis
, a BasisSet
is required.
(diffusion/reaction/rhs/moving_boundary)_parameters
The parameters to use for evaluating the corresponding basis set. The default is nothing
.
(diffusion/reaction/rhs/moving_boundary)_theta
This keyword argument indicates whether the coefficients θ
used for evaluating the basis sets are fixed or are to be learned, with a default of nothing
. If nothing
, then they will be learned. Otherwise, they should be a vector of numbers indicating the coefficients θ
to be used for the respective mechanism. For example, diffusion_theta = [1.0, 0.5]
means that the diffusion function is evaluated with coefficients θ₁ᵈ = 1
and θ₂ᵈ = 0.5
.
threshold_tol
The threshold tolerances to use for pruning the matrix, with a default of a zero tolerance for each quantity. This should be a NamedTuple
if provided, only including any of the names q
, x
, dt
, dx
, dx2
, dx_bc
, and dL
. The q
, dt
, dx
, and dx2
tolerances are used for pruning the interior PDE, corresponding to tolerances based on the quantiles of q(x, t)
, ∂q(x, t)/∂t
, ∂q(x, t)/∂x
, and ∂²q(x, t)/∂x²
. The dx_bc
tolerance is used for pruning based on ∂q(x, t)/∂t
evaluated at the boundary, and is only used for the matrix corresponding to the right-hand side boundary condition. The dL
tolerance is for dL/dt
, and is only used for the matrix corresponding to the moving boundary condition. For example, providing
threshold_tol = (q = 0.1, dL = 0.3, x = 0.7, dx = 0.1, dx_bc = 0.05)
means that the matrix for the interior problem will only include points whose densities are between the 10% and 90% density quantiles, the 10% and 90% quantiles for ∂q(x, t)/∂x
, at positions between 0
and 70% of the current leading edge for the given time. The matrix for the right-hand side boundary condition will only include points between the 5% and 95% quantiles for ∂q(x, t)/∂x
at the boundary. The matrix for the moving boundary condition will only include points where the velocity dL/dt
is between the 30% and 70% quantiles of dL/dt
.
mesh_points
The number of mesh points to use for discretising the corresponding PDE. The default is 100
.
cross_validation
Whether to use cross-validation for selecting the model. The default is false
.
rng
The random number generator to use. This defaults to the global RNG, Random.default_rng()
.
skip
Indices of any coefficients that should be skipped over when deciding which terms to include or remove. By default, this is empty
regression
Whether to include the regression loss in the loss function. Defaults to false
.
density
Whether to include the density loss in the loss function. Defaults to true
.
complexity
The complexity penalty to use for the loss function. Defaults to 1
. Use 0
if you would not like to penalise complexity.
loss_function
The loss function use. Defaults to default_loss(; regression, density, complexity)
. The loss function should take the same form as described in default_loss
.
bidirectional
Whether to allow for steps in both directions, meaning terms can be either added or deleted at each step. The default is true
. If false
, then steps can only be taken backwards.
trials
If cross_validation
is true, this is the number of votes to attempt at each step of the algorithm.
initial
The initial set of active coefficients. The default is :all
, meaning all terms are initially active. If :random
, then a random set of terms is initially active. If :none
, then all terms are initially inactive. Otherwise, initial
should be a vector of Booleans indicating the initially active terms, with true
for active and false
for inactive. The indices should match the flattened form of the coefficients, with the order of coefficients being diffusion
, reaction
, rhs
, and then moving_boundary
. For example, if the basis functions are
\[\begin{align*} D(q) &= \theta_1^d\varphi_1^d + \theta_2^d\varphi_2^d & (\text{diffusion}), \\ R(q) &= \theta_1^r\varphi_1^r + \theta_2^r\varphi_2^r + \theta_3^r\varphi_3^r & (\text{reaction}), \\ H(q) &= \theta_1^h\varphi_1^h + \theta_2^h\varphi_2^h + \theta_3^h\varphi_3^h + \theta_4^h\varphi_4^h & (\text{rhs}), \\ E(q) &= \theta_1^e\varphi_1^e + \theta_2^e\varphi_2^e & (\text{moving boundary}), \end{align*}\]
and you want to start with θ₁ᵈ
, θ₁ʳ
, θ₂ʳ
, θ₂ʰ
, and θ₂ᵉ
active, then you should provide
initial = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1]
model_samples
How many initial models to randomly sample when providing the results. This defaults to 1
. If model_samples > 1
, then initial
must be :random
. The result returned in this case will be an EnsembleEQLSolution
.
use_relative_err
Whether to use relative error (true
) or absolute error (false)
in the loss function summands. Defaults to true
.
aggregate
When considering an ensemble of cell solutions, aggregate=true
indicates that the matrix constructed should combine all the cell solutions into one matrix, and the loss function should be the sum of the loss functions for each cell solution. The default is false
. If this is false
, then for an ensemble of cell solutions you must have average=Val(true)
; if both are false
and the cell solution is an ensemble, the call will error.
time_range
The time range, provided as a Tuple
of the form (tmin, tmax)
, to use for constructing the matrix. The default is the complete time span that the cell simulation is performed over.
average
When considering an ensemble of cell solutions, average=Val(true)
indicates that the matrix constructed should average the solutions together into a single matrix, and the loss function should use this averaged solution, given by an AveragedODESolution
. The default is Val(true)
. If this is Val(false)
, then for an ensemble of cell solutions you must have aggregate=true
; if both are false
and the cell solution is an ensemble, the call will error.
simulation_indices
The simulations to consider in the matrix. Only relevant if an ensemble of cell solutions is being considered. Defaults to all simulations.
num_knots
The number of knots to use in the interpolant for averaging the cell solutions. Only relevant if an ensemble of cell solutions is being considered. Defaults to 100
.
avg_interp_fnc
A function of the form (u, t)
that is used for averaging. Only relevant if an ensemble of cell solutions is being considered. Defaults to linear interpolation from DataInterpolations.jl.
stat
This should be a function, or a Tuple
of two functions with one for each endpoint, that decides the range to use for the knots at each time step. Only relevant if an ensemble of cell solutions is being considered. Defaults to mean
, meaning the knots for a given time will span between a
and b
, where a
is the average left-most cell position over each simulation at that time, and b
is the average right-most cell position over each simulation at that time. If, for example, stat
were (minimum, maximum)
, then the minimum and maximum end-positions would be used for a
and b
, respectively, at the corresponding time.
show_progress
Whether to print the current step of the algorithm at each step. Defaults to true
.
max_steps
Maximum number of steps to allow for the algorithm. Defaults to 100
.
leading_edge_error
Whether to include the leading edge error in the loss function. If this is true
, then density
must also be true
. Only relevant if the cell simulation being considered is a moving boundary problem. Defaults to true
.
extrapolate_pde
If the time range is smaller than the time span of the cell simulation, then setting extrapolate_pde
to true
means that the loss function will still consider all saved time points, not just those within the provided time_range
. The default is false
.
num_constraint_checks
The number of equally-spaced nodes to use between the minimum and maximum densities for constraining the diffusion term and the moving boundary term. Defaults to 100
.
conserve_mass
Whether to enforce conservation of mass by constraining the diffusion and moving boundary terms to be equal. Defaults to false
.
extrapolate_average
Whether to allow for extrapolation when averaging the cell solutions. Only relevant if an ensemble of cell solutions is being considered. Defaults to true
. If false
, then when evaluating the interpolant for a given realisation and a given time, then the density for a node is set to zero if it exceeds the right-most knot position for the given time.
couple_rhs
Whether to couple the boundary conditions at the right-hand side and at the moving boundary together. Only relevant if the moving boundary coefficients are fixed. If true
, then the matrix for the right-hand boundary condition will also include terms coming from the moving boundary, replacing ∂q/∂x
with the right-hand side's assumed basis expansion. Defaults to true
. If false
, then the matrix for the right-hand boundary condition will only include terms coming from the right-hand side.
Printing LaTeX Tables
From an EQLSolution
, returned by stepwise_selection
, you can print the LaTeX form of the table using latex_table
:
StepwiseEQL.latex_table
— Functionlatex_table(eql_sol::EQLSolution; kwargs...)
Prints the EQLSolution
eql_sol
in LaTeX format. Keyword arguments are passed to show
, which has signature
Base.show(io::IO, ::MIME"text/plain", eql_sol::EQLSolution;
step_limit=6,
crop=:horizontal,
backend=Val(:text),
booktabs=true,
show_votes=false,
show_all_loss=false,
crayon=Crayon(bold=true, foreground=:green),
latex_crayon=["color{blue}", "textbf"],
transpose=true)
(Of course, backend=Val(:latex)
is passed - you cannot change that.)
Loss Function
If you want to provide a new loss function, you can provide it as a keyword loss_function
to stepwise_selection
. The function that is used to construct the default loss is below for example.
StepwiseEQL.default_loss
— Functiondefault_loss(; kwargs...)
default_loss(regression_loss, density_loss, indicators; density=true, regression=false, complexity=1)
The default loss function. The first method, accepting the same keyword arguments as in the second method, returns a function that accepts the regression loss, density loss, and indicators, and returns the total loss, evaluated via the second method.
Arguments
regression_loss
: The value of the regression loss.density_loss
: The value of the density loss.indicators
: A vector of Boolean values, indicating which the set of active coefficients.
Keyword Arguments
density=true
: Whether to include the density loss.regression=false
: Whether to include the regression loss.complexity=1
: The complexity penalty coefficient. Use0
to disable the complexity penalty.