Differentiation Example
The purpose of this example is to explore derivative generation. For this, it is important to note that we are thinking of generating derivatives rather than estimating them: Following Alfeld (1989), derivative generation only seeks to find derivatives that best fit our assumptions of the data, i.e. that give a most satisfactory interpolant, rather than trying to find exact derivative values. The complete quote for this by Alfeld (1989) is below:
It seems inevitable that in order to obtain an interpolant that is both local and smooth one has to supply derivative data. Typically, such data are not part of the interpolation problem and have to be made up from existing functional data. This process is usually referred as derivative estimation, but this is probably a misnomer. The objective is not to estimate existing but unknown values of derivatives. Instead, it is to generate values that will yield a satisfactory interpolant. Even if an underlying primitive function did exist it might be preferable to use derivative values that differ from the exact ones. (For example, a maximum error might be decreased by using the "wrong" derivative values.) Therefore, I prefer the term derivative generation rather than derivative estimation.
For the purpose of this exploration, we use Franke's test function. This function, introduced by Franke and Nielson (1980), is given by
\[\begin{align*} f(x, y) &= \frac34\exp\left\{-\frac{(9x-2)^2 + (9y-2)^2}{4}\right\} + \frac34\exp\left\{-\frac{(9x+1)^2}{49} - \frac{9y+1}{10}\right\} \\ &+ \frac12\exp\left\{-\frac{(9x-7)^2 + (9y-3)^2}{4}\right\} - \frac15\exp\left\{-(9x-4)^2 - (9y-7)^2\right\}. \end{align*}\]
This function, and its derivatives, are defined below.
f = (x, y) -> 0.75 * exp(-((9 * x - 2)^2 + (9 * y - 2)^2) / 4) + 0.75 * exp(-(9 * x + 1)^2 / 49 - (9 * y + 1) / 10) + 0.5 * exp(-((9 * x - 7)^2 + (9 * y - 3)^2) / 4) - 0.2 * exp(-(9 * x - 4)^2 - (9 * y - 7)^2)
f′ = (x, y) -> [(exp(-(9 * x - 4)^2 - (9 * y - 7)^2) * (162 * x - 72)) / 5 - (3 * exp(-(9 * x - 2)^2 / 4 - (9 * y - 2)^2 / 4) * ((81 * x) / 2 - 9)) / 4 - (exp(-(9 * x - 7)^2 / 4 - (9 * y - 3)^2 / 4) * ((81 * x) / 2 - 63 / 2)) / 2 - (3 * exp(-(9 * y) / 10 - (9 * x + 1)^2 / 49 - 1 / 10) * ((162 * x) / 49 + 18 / 49)) / 4
(exp(-(9 * x - 4)^2 - (9 * y - 7)^2) * (162 * y - 126)) / 5 - (3 * exp(-(9 * x - 2)^2 / 4 - (9 * y - 2)^2 / 4) * ((81 * y) / 2 - 9)) / 4 - (exp(-(9 * x - 7)^2 / 4 - (9 * y - 3)^2 / 4) * ((81 * y) / 2 - 27 / 2)) / 2 - (27 * exp(-(9 * y) / 10 - (9 * x + 1)^2 / 49 - 1 / 10)) / 40]
f′′ = (x, y) -> [(162*exp(-(9 * x - 4)^2 - (9 * y - 7)^2))/5-(243*exp(-(9 * y) / 10 - (9 * x + 1)^2 / 49 - 1 / 10))/98-(243*exp(-(9 * x - 2)^2 / 4 - (9 * y - 2)^2 / 4))/8-(81*exp(-(9 * x - 7)^2 / 4 - (9 * y - 3)^2 / 4))/4+(3*exp(-(9 * y) / 10 - (9 * x + 1)^2 / 49 - 1 / 10)*((162*x)/49+18/49)^2)/4+(3*exp(-(9 * x - 2)^2 / 4 - (9 * y - 2)^2 / 4)*((81*x)/2-9)^2)/4+(exp(-(9 * x - 7)^2 / 4 - (9 * y - 3)^2 / 4)*((81*x)/2-63/2)^2)/2-(exp(-(9 * x - 4)^2 - (9 * y - 7)^2)*(162*x-72)^2)/5 (27*exp(-(9 * y) / 10 - (9 * x + 1)^2 / 49 - 1 / 10)*((162*x)/49+18/49))/40+(3*exp(-(9 * x - 2)^2 / 4 - (9 * y - 2)^2 / 4)*((81*x)/2-9)*((81*y)/2-9))/4+(exp(-(9 * x - 7)^2 / 4 - (9 * y - 3)^2 / 4)*((81*x)/2-63/2)*((81*y)/2-27/2))/2-(exp(-(9 * x - 4)^2 - (9 * y - 7)^2)*(162*x-72)*(162*y-126))/5
(27*exp(-(9 * y) / 10 - (9 * x + 1)^2 / 49 - 1 / 10)*((162*x)/49+18/49))/40+(3*exp(-(9 * x - 2)^2 / 4 - (9 * y - 2)^2 / 4)*((81*x)/2-9)*((81*y)/2-9))/4+(exp(-(9 * x - 7)^2 / 4 - (9 * y - 3)^2 / 4)*((81*x)/2-63/2)*((81*y)/2-27/2))/2-(exp(-(9 * x - 4)^2 - (9 * y - 7)^2)*(162*x-72)*(162*y-126))/5 (243*exp(-(9 * y) / 10 - (9 * x + 1)^2 / 49 - 1 / 10))/400+(162*exp(-(9 * x - 4)^2 - (9 * y - 7)^2))/5-(243*exp(-(9 * x - 2)^2 / 4 - (9 * y - 2)^2 / 4))/8-(81*exp(-(9 * x - 7)^2 / 4 - (9 * y - 3)^2 / 4))/4+(3*exp(-(9 * x - 2)^2 / 4 - (9 * y - 2)^2 / 4)*((81*y)/2-9)^2)/4+(exp(-(9 * x - 7)^2 / 4 - (9 * y - 3)^2 / 4)*((81*y)/2-27/2)^2)/2-(exp(-(9 * x - 4)^2 - (9 * y - 7)^2)*(162*y-126)^2)/5]
Here is the surface of $f$ along with its derivatives.
using CairoMakie
function plot_f(fig, x, y, vals, title, i, show_3d=true, zlabel="z")
ax = Axis(fig[1, i], xlabel="x", ylabel="y", width=600, height=600, title=title, titlealign=:left)
c = contourf!(ax, x, y, vals, colormap=:viridis, extendhigh=:auto)
if show_3d
ax = Axis3(fig[2, i], xlabel="x", ylabel="y", zlabel=zlabel, width=600, height=600, title=" ", titlealign=:left, azimuth=0.49)
surface!(ax, x, y, vals, color=vals, colormap=:viridis)
end
return c
end
x = LinRange(0, 1, 100)
y = LinRange(0, 1, 100)
z = [f(x, y) for x in x, y in y]
∇ = [f′(x, y) for x in x, y in y]
∇₁ = first.(∇)
∇₂ = last.(∇)
H = [f′′(x, y) for x in x, y in y]
H₁₁ = getindex.(H, 1)
H₁₂ = getindex.(H, 2)
H₂₂ = getindex.(H, 4)
fig = Figure(fontsize = 36)
plot_f(fig, x, y, z, "(a): f", 1, true, "z")
plot_f(fig, x, y, ∇₁, "(b): ∂f/∂x", 2, true, "∂f/∂x")
plot_f(fig, x, y, ∇₂, "(c): ∂f/∂y", 3, true, "∂f/∂y")
plot_f(fig, x, y, H₁₁, "(d): ∂²f/∂x²", 4, true, "∂²f/∂x²")
plot_f(fig, x, y, H₂₂, "(f): ∂²f/∂y²", 5, true, "∂²f/∂y²")
plot_f(fig, x, y, H₁₂, "(e): ∂²f/∂x∂y", 6, true, "∂²f/∂x∂y")
resize_to_layout!(fig)
fig
For our analysis, we use the following data set:
using StableRNGs
using DelaunayTriangulation
using CairoMakie
rng = StableRNG(9199)
x = rand(rng, 500)
y = rand(rng, 500)
z = f.(x, y)
tri = triangulate([x'; y'])
vorn = voronoi(tri)
fig = Figure(fontsize=36, size=(1800, 600))
ax = Axis(fig[1, 1], xlabel="x", ylabel="y", width=600, height=600, title="(a): Data and triangulation", titlealign=:left)
scatter!(ax, x, y, color=:black, markersize=9)
triplot!(ax, tri, strokecolor=:black, strokewidth=2, show_convex_hull=false)
voronoiplot!(ax, vorn, strokecolor=:blue)
xlims!(ax, 0, 1)
ylims!(ax, 0, 1)
ax = Axis3(fig[1, 2], xlabel="x", ylabel="y", zlabel="z", width=600, height=600, azimuth=0.25, title="(b): Function values", titlealign=:left)
triangles = [T[j] for T in each_solid_triangle(tri), j in 1:3]
surface!(ax, x, y, z)
scatter!(ax, x, y, z, color=:black, markersize=9)
colgap!(fig.layout, 1, 75)
resize_to_layout!(fig)
fig
Generation at the Data Sites
To start with the example, we consider generating the derivatives at the data sites.
Gradients
Let us first estimate some gradients using the direct method.
using NaturalNeighbours
using DelaunayTriangulation
using CairoMakie
using LinearAlgebra
function plot_f2(fig, x, y, vals, title, i, tri, levels, show_3d=true, zlabel="z")
triangles = [T[j] for T in each_solid_triangle(tri), j in 1:3]
ax = Axis(fig[1, i], xlabel="x", ylabel="y", width=600, height=600, title=title, titlealign=:left)
c = tricontourf!(ax, x, y, vals, triangulation=triangles', colormap=:viridis, extendhigh=:auto, levels=levels)
if show_3d
ax = Axis3(fig[2, i], xlabel="x", ylabel="y", zlabel=zlabel, width=600, height=600, title=" ", titlealign=:left, azimuth=0.49)
mesh!(ax, hcat(x, y, vals), triangles, color=vals, colormap=:viridis, colorrange=extrema(levels))
end
return c
end
function plot_gradients(∇g, tri, f′, x, y)
∇g1 = first.(∇g)
∇g2 = last.(∇g)
fig = Figure(fontsize=36, size=(2400, 600))
plot_f2(fig, x, y, ∇g1, "(a): ∂f̂/∂x", 1, tri, -3.5:0.5:3.0, true, "∂f̂/∂x")
plot_f2(fig, x, y, ∇g2, "(b): ∂f̂/∂y", 3, tri, -3.5:0.5:3.0, true, "∂f̂/∂y")
plot_f2(fig, x, y, getindex.(f′.(x, y), 1), "(c): ∂f/∂x", 2, tri, -3.5:0.5:3.0, true, "∂f/∂x")
plot_f2(fig, x, y, getindex.(f′.(x, y), 2), "(d): ∂f/∂y", 4, tri, -3.5:0.5:3.0, true, "∂f/∂y")
plot_f2(fig, x, y, norm.(collect.(∇g) .- f′.(x, y)), "(e): Gradient error", 5, tri, 0:0.1:0.5, true, "|∇ε|")
resize_to_layout!(fig)
ε = 100sqrt(sum((norm.(collect.(∇g) .- f′.(x, y))) .^ 2) / sum(norm.(∇g) .^ 2))
return fig, ε
end
points = [x'; y']
z = f.(x, y)
tri = triangulate(points)
∇g = generate_gradients(tri, z)
fig, ε = plot_gradients(∇g, tri, f′, x, y)
julia> ε
10.251180094083372
A 10% error is not terrible, and the derivatives we obtain are reasonable.
Let's also look at the results when we jointly estimate the gradients and Hessians (this is the default option).
∇gr, _ = generate_derivatives(tri, z; method=Direct())
fig, ε = plot_gradients(∇gr, tri, f′, x, y)
julia> ε
7.717791597731752
The figures are smoother, and the error has now decreased to 7.7% – an improvement. We could also try and see what happens if we use the Iterative()
approach, using the first gradients we got as the initial gradients, or we could see what happens with use_cubic_terms=false
for the Direct()
method, but we won't show that here.
Hessians
Let's now look at estimating Hessians. We first consider the direct approach, including cubic terms.
to_mat(H::NTuple{3,Float64}) = [H[1] H[3]; H[3] H[2]]
function plot_hessians(H, tri, f′′, x, y)
H₁₁ = getindex.(H, 1)
H₁₂ = getindex.(H, 3)
H₂₂ = getindex.(H, 2)
fig = Figure(fontsize=36, size=(2400, 600))
plot_f2(fig, x, y, H₁₁, "(a): ∂²f̂/∂x²", 1, tri, -35:5:30, true, "∂²f̂/∂x²")
plot_f2(fig, x, y, H₂₂, "(c): ∂²f̂/∂y²", 3, tri, -35:5:30, true, "∂²f̂/∂y²")
plot_f2(fig, x, y, H₁₂, "(e): ∂²f̂/∂x∂y", 5, tri, -35:5:30, true, "∂²f̂/∂x∂y")
plot_f2(fig, x, y, getindex.(f′′.(x, y), 1), "(b): ∂²f/∂x²", 2, tri, -35:5:30, true, "∂²f/∂x²")
plot_f2(fig, x, y, getindex.(f′′.(x, y), 4), "(d): ∂²f/∂y²", 4, tri, -35:5:30, true, "∂²f/∂y²")
plot_f2(fig, x, y, getindex.(f′′.(x, y), 2), "(f): ∂²f/∂x∂y", 6, tri, -35:5:30, true, "∂²f/∂x∂y")
resize_to_layout!(fig)
ε = 100sqrt(sum((norm.(to_mat.(H) .- f′′.(x, y))) .^ 2) / sum(norm.(to_mat.(H)) .^ 2))
return fig, ε
end
_, Hg = generate_derivatives(tri, z)
fig, ε = plot_hessians(Hg, tri, f′′, x, y)
julia> ε
42.085578794275605
The error is certainly quite large, but remember that we are doing derivative generation here rather than estimation. Judging from the figures themselves, the derivatives we have obtained are actually pretty good.
Let's now see what happens if we only go up to quadratic terms.
_, Hg = generate_derivatives(tri, z, use_cubic_terms=false)
fig, ε = plot_hessians(Hg, tri, f′′, x, y)
julia> ε
35.20873081559232
The error has actually decreased, and the figures do indeed look better. So, in this case, including cubic terms does not improve the results significantly (sometimes it does).
What if we used the iterative approach?
_, Hg = generate_derivatives(tri, z, method=Iterative()) # the gradients will be generated first automatically
fig, ε = plot_hessians(Hg, tri, f′′, x, y)
julia> ε
39.58481626576425
The results are slightly worse, and varying alpha
doesn't seem to do much.
Generation Away from the Data Sites
Now let's consider derivative generation away from the data sites. The function differentiate
is used for this. We first construct our interpolant, ensuring we set derivatives=true
so that we get the gradients at the data sites first, and then we differentiate
.
itp = interpolate(tri, z; derivatives=true, method = Direct(), use_cubic_terms=false)
∂ = differentiate(itp, 1)
The second argument specifies the order of the resulting derivatives. Since we specify order 1, we will get gradients $(\partial_xf,\partial_yf)$.
Let's now define the grid for differentiating.
xg = LinRange(0, 1, 500)
yg = LinRange(0, 1, 500)
_x = vec([x for x in xg, _ in yg])
_y = vec([y for _ in xg, y in yg])
We can now evaluate ∂
. To approximate the function values at each point, we will use the Sibson(1)
method, since this will incorporate the gradient information. I would really like to eventually get Hiyoshi's $C^2$ interpolant, as discussed in Section 3.2.7.3 here, as this will also give us $C^2$ continuity at the derivative sites and thus give smoother surfaces (noting some complexity issues discussed in Section 6.5 of the linked thesis), but I've just not found the time to comprehend how to best implement it yet / digest the spline notation (see issue #1 if you are interested on contributing to this). Note also that, just as with the interpolation methods, it is best to give vectors to ∂
. Lastly, since we are evaluating away from the data sites, remember that the Sibson coordinates are now incorporated into the weights of the associated weighted least squares problem (that you could disable if you for some reason wanted to with use_sibson_weight=false
).
∇g = ∂(_x, _y; interpolant_method = Sibson(1))
Let's now plot our gradients. Note that there are some Inf
values in the computed gradients, and these correspond to points evaluated outside of the convex hull of our data sites.
function rrmserr(z, ẑ)
num = 0.0
den = 0.0
for (zᵢ, ẑᵢ) in zip(z, ẑ)
if all(isfinite, (zᵢ..., ẑᵢ...))
num += norm(zᵢ .- ẑᵢ)^2
den += norm(ẑᵢ)^2
end
end
# num /= length(ẑ)
return 100sqrt(num / den)
end
function plot_f2(fig, x, y, vals, title, i, levels, show_3d=true, zlabel="z")
ax = Axis(fig[1, i], xlabel="x", ylabel="y", width=600, height=600, title=title, titlealign=:left)
c = contourf!(ax, x, y, vals, colormap=:viridis, extendhigh=:auto, levels=levels)
if show_3d
ax = Axis3(fig[2, i], xlabel="x", ylabel="y", zlabel=zlabel, width=600, height=600, title=" ", titlealign=:left, azimuth=0.49)
surface!(ax, x, y, vals, color=vals, colormap=:viridis, colorrange=extrema(levels))
end
return c
end
function plot_gradients(∇g, f′, xg, yg)
∇g = reshape(∇g, (length(xg), length(yg)))
∇g1 = first.(∇g)
∇g2 = last.(∇g)
∇f = [f′(x, y) for x in xg, y in yg]
fig = Figure(fontsize=36, size=(2400, 600))
plot_f2(fig, xg, yg, ∇g1, "(a): ∂f̂/∂x", 1, -3.5:0.5:3.0, true, "∂f̂/∂x")
plot_f2(fig, xg, yg, ∇g2, "(b): ∂f̂/∂y", 3, -3.5:0.5:3.0, true, "∂f̂/∂y")
plot_f2(fig, xg, yg, first.(∇f), "(c): ∂f/∂x", 2, -3.5:0.5:3.0, true, "∂f/∂x")
plot_f2(fig, xg, yg, last.(∇f), "(d): ∂f/∂y", 4, -3.5:0.5:3.0, true, "∂f/∂y")
plot_f2(fig, xg, yg, norm.(collect.(∇g) .- ∇f), "(e): Gradient error", 5, 0:0.1:0.5, true, "|∇ε|")
resize_to_layout!(fig)
ε = rrmserr(∇f, collect.(∇g))
return fig, ε
end
fig, ε = plot_gradients(∇g, f′, xg, yg)
julia> ε
13.185747607565729
There are of course some strange artifacts near the convex hull, but the results are not terrible. Let's see what happens to the error if we instead use the other interpolant methods.
other_methods = [Sibson(), Laplace(), Nearest(), Triangle()]
∇gs = [∂(_x, _y; interpolant_method=method) for method in other_methods]
∇f = [f′(x, y) for x in xg, y in yg]
εs = [rrmserr(∇f, collect.(∇g)) for ∇g in ∇gs]
julia> hcat(other_methods, εs)
4×2 Matrix{Any}:
Sibson{0}() 28.6753
Laplace{0}() 25.499
Nearest{0}() 69.5744
Triangle{0}() 27.7737
Of course, the other methods are much worse.
Now let's go up to second order.
function plot_hessians(H, f′′, xg, yg)
H = reshape(H, (length(xg), length(yg)))
H₁₁ = getindex.(H, 1)
H₁₂ = getindex.(H, 3)
H₂₂ = getindex.(H, 2)
Hf = [f′′(x, y) for x in xg, y in yg]
fig = Figure(fontsize=36, size=(2400, 600))
plot_f2(fig, xg, yg, H₁₁, "(a): ∂²f̂/∂x²", 1, -35:5:30, true, "∂²f̂/∂x²")
plot_f2(fig, xg, yg, H₂₂, "(c): ∂²f̂/∂y²", 3, -35:5:30, true, "∂²f̂/∂y²")
plot_f2(fig, xg, yg, H₁₂, "(e): ∂²f̂/∂x∂y", 5, -35:5:30, true, "∂²f̂/∂x∂y")
plot_f2(fig, xg, yg, getindex.(Hf, 1), "(b): ∂²f/∂x²", 2, -35:5:30, true, "∂²f/∂x²")
plot_f2(fig, xg, yg, getindex.(Hf, 4), "(d): ∂²f/∂y²", 4, -35:5:30, true, "∂²f/∂y²")
plot_f2(fig, xg, yg, getindex.(Hf, 2), "(f): ∂²f/∂x∂y", 6, -35:5:30, true, "∂²f/∂x∂y")
resize_to_layout!(fig)
ε = rrmserr(Hf, to_mat.(H))
return fig, ε
end
∂ = differentiate(itp, 2)
∇Hg = ∂(_x, _y; interpolant_method=Sibson(1), method = Iterative())
∇g = first.(∇Hg)
Hg = last.(∇Hg)
zlims!(figH.content[4], -25, 25)
fig∇, ε∇ = plot_gradients(∇g, f′, xg, yg)
figH, εH = plot_hessians(Hg, f′′, xg, yg)
julia> ε∇
19.07546882353911
julia> εH
51.1267212244942
The gradients actually look better here, despite the greater error, especially around the convex hull. The Hessians are a bit problematic around the convex hull especially, but we are really asking a lot of the interpolant to get Hessians unfortunately.
Let's see if the direct approach can give us any improvements (the default is Iterative()
since we have derivative information in the interpolant).
∇Hg = ∂(_x, _y; interpolant_method=Sibson(1), method=Direct())
∇g = first.(∇Hg)
Hg = last.(∇Hg)
fig∇, ε∇ = plot_gradients(∇g, f′, xg, yg)
figH, εH = plot_hessians(Hg, f′′, xg, yg)
zlims!(figH.content[4], -25, 25)
julia> ε∇
9.853286514069882
julia> εH
46.7610990050276
Indeed, both the gradients and the Hessians appear to have improved, with some difficulties at the convex hull. Perhaps a better way to measure the error is to only include points that are away fro the convex hull. The following function can do this for us:
function rrmserr(z, ẑ, tri, x, y)
num = 0.0
den = 0.0
points = get_point(tri)
ch = get_convex_hull_vertices(tri)
for (zᵢ, ẑᵢ, xᵢ, yᵢ) in zip(z, ẑ, x, y)
q = (xᵢ, yᵢ)
δ = DelaunayTriangulation.distance_to_polygon(q, points, ch)
if δ > 1e-4 && all(isfinite, (zᵢ..., ẑᵢ...))
num += norm(zᵢ .- ẑᵢ)^2
den += norm(ẑᵢ)^2
end
end
# num /= length(ẑ)
return 100sqrt(num / den)
end
If we instead use this metric, then:
julia> ε∇_nohull = rrmserr(f′.(_x, _y), ∇g, ∂, _x, _y)
7.479964687679311
julia> εH_nohull = rrmserr(f′′.(_x, _y), to_mat.(Hg), ∂, _x, _y)
38.884740966379056
The errors are smaller, though not by much.