Overview

The package considers the following functional adaptive shrinkage (FASH) scenario. Given $N$ sets of series data $\boldsymbol{y} = \{y_i(t_j): j \in [n_i]\}_{i=1}^{N}$, where $n_i$ is the length of the $i$-th series, we assume that each series $y_i$ consists of $n_i$ noisy measurements of an underlying function $f_i(t)$ over the continuous treatment $t$ at levels $\{t_j\}_{j=1}^{n_i}$.
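To make the data layout concrete, here is a minimal simulation sketch in Python (purely illustrative, not the package interface) that generates $N$ series of unequal lengths observed on their own grids:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 5                                        # number of series (hypothetical)
series = []
for i in range(N):
    n_i = int(rng.integers(8, 16))           # length n_i of the i-th series
    t = np.sort(rng.uniform(0, 10, n_i))     # treatment levels t_1, ..., t_{n_i}
    f = np.sin(0.5 * t) * rng.normal()       # an (unknown) underlying function f_i(t)
    y = f + rng.normal(0, 0.3, n_i)          # noisy measurements y_i(t_j)
    series.append({"t": t, "y": y})
```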

Unlike traditional smoothing methods that assume each $f_i(t)$ has a separate prior $g_i$, FASH assumes that all $f_i(t)$ are i.i.d. with a common prior $g_f$. Generalizing the idea from Stephens (2017) and Urbut et al. (2018), the prior $g_f$ takes the form of a finite mixture of Gaussian processes (GPs):
$$g_f|\boldsymbol{\pi} = \sum_{k=0}^{K} \pi_k\,\text{GP}(m_k, C_k),$$
where $\boldsymbol{\pi} = [\pi_0, \ldots, \pi_K]^T$ is the prior mixing weight vector, $m_k$ is the mean function, and $C_k$ is the covariance function of the $k$-th GP.
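To see what a draw from this mixture prior looks like, the following illustrative Python sketch first picks a component $k$ with probability $\pi_k$ and then draws a function from the corresponding GP on a grid; the zero means and squared-exponential covariances here are placeholders (the operator-based covariances actually used by FASH are introduced below):

```python
import numpy as np

def se_cov(t, amp, ls):
    """Squared-exponential covariance matrix on grid t (placeholder kernel)."""
    d = t[:, None] - t[None, :]
    return amp**2 * np.exp(-0.5 * (d / ls) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 50)

pi = np.array([0.5, 0.3, 0.2])               # prior mixing weights
covs = [se_cov(t, a, l) + 1e-8 * np.eye(len(t))   # small jitter for numerical stability
        for a, l in [(0.1, 5.0), (1.0, 2.0), (2.0, 0.5)]]

k = rng.choice(len(pi), p=pi)                # pick a mixture component with probability pi_k
f = rng.multivariate_normal(np.zeros(len(t)), covs[k])  # draw f ~ GP(0, C_k) on the grid
```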

Rather than integrating out the prior mixing weights $\boldsymbol{\pi}$ with a given prior $p(\boldsymbol{\pi})$ as
$$g_f = \int (g_f|\boldsymbol{\pi})\, p(\boldsymbol{\pi})\, d\boldsymbol{\pi},$$
FASH optimizes $\hat{\boldsymbol{\pi}}$ by maximizing the marginal likelihood of the data $\boldsymbol{y}$:
$$\hat{\boldsymbol{\pi}} = \arg\max_{\boldsymbol{\pi}} \sum_{i=1}^{N} \log\left(\sum_{k=0}^{K} \pi_k \mathbf{L}_{ik}\right),$$
where $\mathbf{L}_{ik}$ denotes the marginal likelihood of the $i$-th series under the $k$-th GP component.
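This is a standard mixture-proportion estimation problem: given the $N \times (K+1)$ matrix of marginal likelihoods $\mathbf{L}$, the objective is concave in $\boldsymbol{\pi}$ over the simplex and can be maximized with EM or faster convex solvers (as in Stephens, 2017). A minimal EM sketch in Python, assuming $\mathbf{L}$ has already been computed:

```python
import numpy as np

def estimate_pi(L, n_iter=1000, tol=1e-10):
    """EM updates for the mixing weights pi, given the N x (K+1) matrix
    L[i, k] = marginal likelihood of series i under GP component k."""
    N, K1 = L.shape
    pi = np.full(K1, 1.0 / K1)                   # start from uniform weights
    for _ in range(n_iter):
        resp = pi * L                            # unnormalized responsibilities
        resp /= resp.sum(axis=1, keepdims=True)  # E-step: per-series component weights
        pi_new = resp.mean(axis=0)               # M-step: average the responsibilities
        if np.max(np.abs(pi_new - pi)) < tol:
            return pi_new
        pi = pi_new
    return pi
```

Each EM iteration never decreases the log marginal likelihood above; implementations in this family often swap EM for a faster sequential quadratic programming solver, but the maximizer is the same.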

Then the prior $g_f$ is determined as:
$$\hat{g}_f = \int (g_f|\boldsymbol{\pi})\, \delta_{\hat{\boldsymbol{\pi}}}(\boldsymbol{\pi})\, d\boldsymbol{\pi} = g_f|\hat{\boldsymbol{\pi}}.$$

Based on the estimated prior $\hat{g}_f$, FASH then obtains the posterior $p(f_i(t)|\boldsymbol{y})$ for each series by:
$$p(f_i(t)|\boldsymbol{y}, \hat{\boldsymbol{\pi}}) = \sum_{k=0}^{K} \tilde{\pi}_{ik}\, p_k(f_i(t)|\boldsymbol{y}_i),$$

where $p_k(f_i(t)|\boldsymbol{y}_i)$ is the posterior of $f_i(t)$ given $\boldsymbol{y}_i$ under the $k$-th GP component, and $\tilde{\pi}_{ik} = \hat{\pi}_k \mathbf{L}_{ik} / \sum_{k'=0}^{K} \hat{\pi}_{k'} \mathbf{L}_{ik'}$ is the corresponding posterior mixing weight.
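Concretely, once $\hat{\boldsymbol{\pi}}$, the marginal likelihoods, and the per-component posteriors are available, mixture posterior summaries are just weighted combinations. A small illustrative sketch for one series (the per-component posterior means `mu_i` are assumed to be precomputed on a common grid):

```python
import numpy as np

def posterior_summary(L_i, pi_hat, mu_i):
    """Posterior mixing weights and posterior mean for one series.

    L_i    : (K+1,)   marginal likelihoods of series i under each component
    pi_hat : (K+1,)   estimated prior mixing weights
    mu_i   : (K+1, m) per-component posterior means of f_i(t) on a grid of size m
    """
    w = pi_hat * L_i
    w = w / w.sum()          # posterior mixing weights pi-tilde_{ik}
    post_mean = w @ mu_i     # E[f_i(t) | y_i] = sum_k pi-tilde_{ik} E_k[f_i(t) | y_i]
    return w, post_mean
```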

With the posterior, FASH aims to simultaneously answer any subset of the following questions:

  • What is the estimated function $f_i(t)$ for each series $y_i$? (Smoothing)
  • Under false discovery rate (FDR) control, which functions $f_i(t)$ belong to the base model $S_0$ (a subset of the function space $S$)? (Hypothesis testing; see the sketch after this list)
  • Is there any clustering structure in the estimated functions $f_i(t)$ in terms of their behaviors? (Clustering)
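For the hypothesis-testing question, one standard empirical-Bayes route (in the spirit of Stephens, 2017) is to let component $k = 0$ be the base model itself (i.e., $\sigma_0 = 0$, so $f_i \in S_0$ exactly), treat each series' posterior weight on that component as its local false discovery rate, and control the FDR by thresholding the running average of the sorted local rates. A hypothetical sketch:

```python
import numpy as np

def fdr_select(lfdr, alpha=0.05):
    """Indices of series declared non-null at FDR level alpha.

    lfdr[i] = posterior weight of series i on the k = 0 (base-model) component.
    """
    order = np.argsort(lfdr)
    cum_fdr = np.cumsum(lfdr[order]) / np.arange(1, len(lfdr) + 1)  # estimated FDR of top-j set
    n_sig = int(np.sum(cum_fdr <= alpha))    # cum_fdr is non-decreasing, so this counts a prefix
    return order[:n_sig]
```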

LGP Prior

For now, let's assume the mean function $m_k$ is zero, and each GP component is defined through the following ordinary differential equation (ODE):
$$Lf(t) = \sigma_k W(t),$$
where $W(t)$ is a Gaussian white noise process and $L$ is a known $p$-th order linear differential operator. Given the $L$ operator, the covariance function $C_k$ is completely specified by the single standard deviation parameter $\sigma_k$.

This prior shrinks the function $f$ toward the base model $S_0 = \text{Null}\{L\}$, the set of functions that satisfy $Lf = 0$. The smaller $\sigma_k$ is, the stronger the shrinkage. By choosing different $L$ operators, this one-parameter GP family can produce priors that encode different kinds of shapes. Some examples are discussed in Zhang et al. (2023) and Zhang et al. (2024).

The above one-parameter family of GP priors is flexible and interpretable. By choosing the $L$ operator, we select the type of base model to shrink the function toward. In other words, it specifies the center of the shrinkage (analogous to a null hypothesis).

Example: Integrated Wiener Process

For example, when $L = \frac{d^2}{dt^2}$, the prior is called a second-order Integrated Wiener Process (IWP) prior, which shrinks the function toward the base model $S_0 = \text{Null}\{L\} = \text{span}\{1, t\}$.

When all the observations are Gaussian, the posterior mean $\mathbb{E}(f|\boldsymbol{y}_i)$ using the second-order IWP is exactly the cubic smoothing spline estimate of Kimeldorf and Wahba (1970).
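To make this concrete: with $L = \frac{d^2}{dt^2}$ and zero initial conditions at $t = 0$, the covariance of the resulting integrated Wiener process is $C(s,t) = \sigma^2\big(\tfrac{\min(s,t)^2 \max(s,t)}{2} - \tfrac{\min(s,t)^3}{6}\big)$, and with Gaussian observations the posterior mean is ordinary conjugate GP regression, as in the illustrative Python sketch below (the exact smoothing-spline correspondence additionally requires a diffuse prior on the null space $\text{span}\{1, t\}$, omitted here for brevity):

```python
import numpy as np

def iwp2_cov(s, t, sigma=1.0):
    """Covariance of a second-order IWP with zero initial conditions at t = 0."""
    ss, tt = np.meshgrid(s, t, indexing="ij")
    lo, hi = np.minimum(ss, tt), np.maximum(ss, tt)
    return sigma**2 * (lo**2 * hi / 2.0 - lo**3 / 6.0)

rng = np.random.default_rng(0)
t_obs = np.linspace(0.5, 10, 30)
y = np.sin(t_obs) + rng.normal(0, 0.2, t_obs.size)   # Gaussian observations of f
t_new = np.linspace(0.5, 10, 200)

noise_var = 0.2**2
K_oo = iwp2_cov(t_obs, t_obs) + noise_var * np.eye(t_obs.size)
K_no = iwp2_cov(t_new, t_obs)
post_mean = K_no @ np.linalg.solve(K_oo, y)          # E(f | y) on the prediction grid
```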

Computation Issue

To simplify the posterior computation for each GP component, we apply the following two tricks:

  • Finite Element Method: The finite element method approximates each GP $f(t)$ as a linear combination of basis functions: $f(t) = \sum_{l=1}^{m} w_l \psi_l(t)$, where the $m$ basis functions $\psi_l(t)$ are fixed and the weights $\boldsymbol{w}$ follow a Gaussian distribution. This simplifies the computation of each $p(f_i(t)|\boldsymbol{y}_i, \sigma_k)$ to $p(\boldsymbol{w}|\boldsymbol{y}_i, \sigma_k)$. The weights not only have smaller dimension than the function $f(t)$, but also have a sparse precision matrix (see the sketch after this list). See Zhang et al. (2023) and Lindgren et al. (2011) for more details.
  • Laplace Approximation: An efficient way to compute the posterior of the weights $\boldsymbol{w}$ is the Laplace approximation, as discussed in Rue et al. (2009). It approximates the posterior with a Gaussian distribution centered at the posterior mode, with covariance matrix equal to the inverse of the negative Hessian of the log posterior at the mode: $p_G(\boldsymbol{w}|\boldsymbol{y}, \sigma_k) = \mathcal{N}(\hat{\boldsymbol{w}}, \hat{V})$.
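As an illustration of the first point, the sketch below builds piecewise-linear ("hat") basis functions on a set of knots and evaluates $f(t) \approx \sum_l w_l \psi_l(t)$; each evaluation point touches at most two basis functions, which is what makes the resulting matrices sparse. The hat functions here are only for illustration; the basis and the prior precision of $\boldsymbol{w}$ actually used for the operator-based priors follow Zhang et al. (2023):

```python
import numpy as np

def hat_basis(t, knots):
    """n x m design matrix A with A[j, l] = psi_l(t_j), where psi_l is the
    piecewise-linear hat function equal to 1 at knots[l] and 0 at the other knots."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(t, knots, eye[l]) for l in range(len(knots))])

knots = np.linspace(0, 10, 12)                            # m = 12 basis functions
t = np.sort(np.random.default_rng(0).uniform(0, 10, 40))
A = hat_basis(t, knots)                                   # f(t_j) ≈ (A @ w)[j], w Gaussian
assert np.allclose(A.sum(axis=1), 1.0)                    # hat functions sum to 1 everywhere
```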

In this way, the complicated integration required in the posterior computation is replaced by a simpler optimization task involving sparse matrices. When the observations are Gaussian, the Laplace approximation is exact. When the observations are not Gaussian, it still provides a reasonable approximation at very small computational cost.
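The optimization step can be illustrated with a small Newton-iteration sketch: for a non-Gaussian likelihood (Poisson counts, chosen here purely as an example) and a Gaussian prior with precision matrix $Q$ on the weights, the Laplace approximation finds the posterior mode and uses the curvature there as the Gaussian precision. In the Gaussian-observation case the log posterior is exactly quadratic, so a single Newton step recovers the exact posterior, matching the exactness noted above.

```python
import numpy as np

def laplace_poisson(A, y, Q, n_iter=50, tol=1e-10):
    """Gaussian (Laplace) approximation to p(w | y) when y_j ~ Poisson(exp((A @ w)_j))
    and the prior is w ~ N(0, Q^{-1}); A is the basis design matrix."""
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        mu = np.exp(A @ w)
        grad = A.T @ (y - mu) - Q @ w            # gradient of the log posterior
        H = A.T @ (mu[:, None] * A) + Q          # negative Hessian of the log posterior
        step = np.linalg.solve(H, grad)
        w = w + step                             # Newton update toward the mode w-hat
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(A @ w)
    V = np.linalg.inv(A.T @ (mu[:, None] * A) + Q)   # V-hat: inverse curvature at the mode
    return w, V                                      # p_G(w | y) = N(w-hat, V-hat)
```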