Overview
The package considers the following functional adaptive shrinkage (FASH) scenario. Given $N$ sets of series data $\boldsymbol{y}_1, \dots, \boldsymbol{y}_N$, where $n_i$ is the length of the $i$-th series, we assume that each series $\boldsymbol{y}_i = (y_{i1}, \dots, y_{in_i})$ represents measurements of an underlying function $f_i$ over the continuous treatment $t$ at levels $t_{i1}, \dots, t_{in_i}$.
Different from traditional smoothing methods that assume each $f_i$ has a separate prior $g_i$, FASH assumes that all $f_i$ share a common prior $g$. Generalizing the idea from Stephens, 2017 and Urbut et al., 2018, the prior $g$ takes the following form of a finite mixture of Gaussian processes (GP):
$$g = \sum_{k=0}^{K} \pi_k \, \mathrm{GP}(m_k, C_k),$$
where $\boldsymbol{\pi} = (\pi_0, \dots, \pi_K)$ is the prior mixing weight vector, $m_k$ is the mean function, and $C_k$ is the covariance function of the $k$-th GP.
Rather than integrating out the prior mixing weights $\boldsymbol{\pi}$ with a given prior, as in a fully Bayesian treatment, FASH optimizes $\boldsymbol{\pi}$ by maximizing the marginal likelihood of the data $\{\boldsymbol{y}_i\}_{i=1}^{N}$:
$$\hat{\boldsymbol{\pi}} = \arg\max_{\boldsymbol{\pi}} \sum_{i=1}^{N} \log \left( \sum_{k=0}^{K} \pi_k L_{ik} \right),$$
where $L_{ik}$ denotes the marginal likelihood of the $i$-th series data under the $k$-th GP component.
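Concretely, if the marginal likelihoods $L_{ik}$ are stored in an $N \times K$ matrix, the weights can be obtained with a simple EM iteration. The following Python sketch illustrates this under that assumption (all names are illustrative and the package's actual optimizer may differ; in practice one would also work on the log scale for numerical stability):

```python
import numpy as np

def estimate_mixture_weights(lik, n_iter=500, tol=1e-8):
    """EM updates for the mixing weights pi maximizing sum_i log(sum_k pi_k * lik[i, k]).

    lik : (N, K) array; lik[i, k] is the marginal likelihood of the
          i-th series under the k-th GP component (one column per component).
    """
    N, K = lik.shape
    pi = np.full(K, 1.0 / K)                 # start from uniform weights
    prev_loglik = -np.inf
    for _ in range(n_iter):
        weighted = lik * pi                  # proportional to pi_k * L_ik
        row_sums = weighted.sum(axis=1, keepdims=True)
        resp = weighted / row_sums           # posterior mixing weights, one row per series
        pi = resp.mean(axis=0)               # M-step: average responsibilities
        loglik = np.log(row_sums).sum()
        if loglik - prev_loglik < tol:       # EM increases the objective monotonically
            break
        prev_loglik = loglik
    return pi, resp
```

The rows of `resp` returned here correspond to the posterior mixing weights $\tilde{\pi}_{ik}$ used below.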
Then the prior $\hat{g}$ is determined as:
$$\hat{g} = \sum_{k=0}^{K} \hat{\pi}_k \, \mathrm{GP}(m_k, C_k).$$
Based on the estimated prior $\hat{g}$, FASH then obtains the posterior of each $f_i$ by:
$$p(f_i \mid \boldsymbol{y}_i, \hat{g}) = \sum_{k=0}^{K} \tilde{\pi}_{ik} \, p_k(f_i \mid \boldsymbol{y}_i), \qquad \tilde{\pi}_{ik} = \frac{\hat{\pi}_k L_{ik}}{\sum_{k'=0}^{K} \hat{\pi}_{k'} L_{ik'}},$$
where $p_k(f_i \mid \boldsymbol{y}_i)$ is the posterior of the $i$-th series data under the $k$-th GP component and $\tilde{\pi}_{ik}$ is the corresponding posterior mixing weight.
With the posterior, FASH aims to simultaneously answer any subset of the following questions:

- What is the estimated function $f_i$ for each series $i$? (Smoothing)
- With false discovery rate (FDR) control, which functions $f_i$ differ from the null model? (Hypothesis testing; see the sketch after this list)
- Is there any clustering structure in the estimated functions $\hat{f}_1, \dots, \hat{f}_N$ in terms of their behaviors? (Clustering)
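One common way to operationalize the FDR-controlled test in this mixture setting (a sketch of the usual local false discovery rate recipe, not necessarily the exact rule exposed by the package) is to treat each series' posterior mixing weight on the null component as its local false discovery rate, and declare non-null the largest set of series whose average local false discovery rate stays below the target level:

```python
import numpy as np

def fdr_select(lfdr, alpha=0.05):
    """Select series while controlling the (Bayesian) FDR at level alpha.

    lfdr : array of local false discovery rates, e.g. each series' posterior
           mixing weight on the null / base-model component.
    Returns a boolean mask of the series declared non-null.
    """
    order = np.argsort(lfdr)                            # most significant series first
    cum_fdr = np.cumsum(lfdr[order]) / np.arange(1, len(lfdr) + 1)
    n_keep = int(np.sum(cum_fdr <= alpha))              # largest set with estimated FDR <= alpha
    selected = np.zeros(len(lfdr), dtype=bool)
    selected[order[:n_keep]] = True
    return selected
```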
LGP Prior
For now, let's assume the mean function $m_k$ is zero, and each GP component is defined through the following ordinary differential equation (ODE):
$$L f(t) = \sigma W(t),$$
where $W(t)$ is a Gaussian white noise process and $L$ is a known $p$-th order linear differential operator. Given the $L$ operator, the covariance function is completely specified by the single standard deviation parameter $\sigma$.
This prior shrinks the function $f$ toward the base model $S_0$, which is the set of functions that satisfy $Lf = 0$. The smaller $\sigma$ is, the stronger the shrinkage. By choosing a different $L$ operator, this one-parameter GP family can produce priors that encode different kinds of shapes. Some examples are discussed in Zhang et al., 2023 and Zhang et al., 2024.

The above one-parameter family of GP priors is flexible and interpretable. By choosing the $L$ operator, we can choose different types of base model to shrink the function toward. In other words, it specifies the center of the shrinkage (like the null hypothesis).
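For instance (two illustrative choices of $L$, stated here for intuition rather than as the package's built-in options):
$$L = \frac{d}{dt} \;\Longrightarrow\; \{f : Lf = 0\} = \operatorname{span}\{1\} \;\text{(constant functions)}, \qquad
L = \frac{d^{2}}{dt^{2}} + \omega^{2} \;\Longrightarrow\; \{f : Lf = 0\} = \operatorname{span}\{\sin(\omega t), \cos(\omega t)\} \;\text{(sinusoids of period } 2\pi/\omega\text{)}.$$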
Example: Integrated Wiener Process
For example, when $L = \frac{d^{2}}{dt^{2}}$, the prior is called a second-order Integrated Wiener Process (IWP) prior, which shrinks the function toward the base model $\operatorname{span}\{1, t\}$, the space of linear functions. When all the observations are Gaussian, the posterior mean using the second-order IWP is exactly the cubic smoothing spline estimate in Kimeldorf and Wahba, 1970.
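For reference, under the common construction with zero initial conditions at $t = 0$ (an assumption made here; the package's boundary treatment may differ), the second-order IWP defined by $f''(t) = \sigma W(t)$ has the closed-form covariance
$$C(s, t) = \sigma^{2} \int_{0}^{\min(s,t)} (s - u)(t - u)\, du = \sigma^{2} \left( \frac{\min(s,t)^{2} \max(s,t)}{2} - \frac{\min(s,t)^{3}}{6} \right),$$
which is the familiar cubic-spline kernel.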
Computation Issue
To simplify the posterior computation with each GP component, we apply the following two tricks:

- Finite Element Method: The finite element method approximates each GP $f(t)$ as a linear combination of basis functions: $f(t) \approx \sum_{j=1}^{m} w_j \psi_j(t)$, where the $m$ basis functions $\psi_1, \dots, \psi_m$ are fixed and the weights $\boldsymbol{w} = (w_1, \dots, w_m)^{\top}$ follow a Gaussian distribution. This simplifies the computation over each function $f$ to a computation over $\boldsymbol{w}$. The weights not only have smaller dimension than the function $f$, but also have a sparse precision matrix. See Zhang et al., 2023 and Lindgren et al., 2011 for more details.
- Laplace Approximation: An efficient way to compute the posterior of the weights $\boldsymbol{w}$ is to use the Laplace approximation, as discussed in Rue et al., 2009. The Laplace approximation replaces the posterior distribution with a Gaussian distribution centered at the posterior mode, with covariance matrix equal to the inverse of the negative Hessian of the log posterior at the mode: $p(\boldsymbol{w} \mid \boldsymbol{y}) \approx \mathcal{N}\big(\hat{\boldsymbol{w}},\, \boldsymbol{H}(\hat{\boldsymbol{w}})^{-1}\big)$, where $\hat{\boldsymbol{w}}$ is the posterior mode and $\boldsymbol{H}(\hat{\boldsymbol{w}}) = -\nabla^{2} \log p(\boldsymbol{w} \mid \boldsymbol{y}) \big|_{\boldsymbol{w} = \hat{\boldsymbol{w}}}$. In this way, the complicated integration required in the posterior computation is replaced by a simpler optimization task with sparse matrices. When the observations are Gaussian, the Laplace approximation is exact. When the observations are not Gaussian, the Laplace approximation provides a reasonable approximation at very small computational cost. A minimal sketch combining both tricks is shown below.
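To make the two tricks concrete, here is a minimal, self-contained Python sketch (illustrative only, not the package's implementation): it represents one series with a piecewise-linear hat basis as a stand-in for the spline basis actually used, places a sparse second-difference (random-walk-2) precision on the weights as a crude analogue of an IWP-type prior, and computes the posterior of the weights under a Gaussian likelihood, the case where the Laplace approximation is exact.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def hat_basis(t, knots):
    """Piecewise-linear 'hat' basis evaluated at points t.

    A stand-in for the package's spline basis: psi_j is the hat function
    centred at knots[j], so evaluating it is linear interpolation of an
    indicator vector.
    """
    return np.column_stack(
        [np.interp(t, knots, np.eye(len(knots))[j]) for j in range(len(knots))]
    )

def weight_posterior_gaussian(y, A, Q_prior, noise_sd):
    """Posterior of the basis weights w for Gaussian observations.

    Sketch of the model: y = A w + noise, w ~ N(0, Q_prior^{-1}),
    noise ~ N(0, noise_sd^2 I). Here the Laplace approximation is exact:
        Q_post = Q_prior + A' A / noise_sd^2     (sparse posterior precision)
        mu     = Q_post^{-1} A' y / noise_sd^2   (posterior mode = mean)
    For non-Gaussian likelihoods, mu and Q_post would instead come from a
    few Newton steps on the log posterior (mode and Hessian at the mode).
    """
    A = sparse.csr_matrix(A)
    Q_post = (Q_prior + (A.T @ A) / noise_sd**2).tocsc()
    mu = spsolve(Q_post, A.T @ y / noise_sd**2)
    return mu, Q_post

# Toy usage: smooth one noisy series.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(len(t))

knots = np.linspace(0, 1, 20)
A = hat_basis(t, knots)                       # n x m design matrix psi_j(t_i)

# Second-difference penalty: a sparse stand-in for an IWP-type prior precision,
# with sigma = 0.1 and a tiny jitter so the (improper) prior precision is invertible.
D2 = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(len(knots) - 2, len(knots)))
Q_prior = (D2.T @ D2) / 0.1**2 + 1e-8 * sparse.eye(len(knots))

w_hat, Q_post = weight_posterior_gaussian(y, A, Q_prior, noise_sd=0.2)
f_hat = A @ w_hat                             # posterior mean of f at the observed t
```

The sparsity of the posterior precision is what keeps this step cheap as the number of basis functions and series grows.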