knitr::opts_chunk$set(fig.width = 8, fig.height = 6)
library(fashr)

Setup

We consider the effect size estimate of 100100 eQTLs measured in day t=1t = 1 to day t=16t = 16:

  • there are 5050 eQTLs that are not dynamic, i.e., the effect size is constant over time (Category A).
  • there are 3030 eQTLs that are linear dynamic, i.e., the effect size is changing linearly over time (Category B).
  • there are 2020 eQTLs that are non-linear dynamic, i.e., the effect size is changing non-linearly over time (Category C).

Let’s take a look at the data structure:

length(datasets)
## [1] 100
str(datasets[[1]])
## 'data.frame':    16 obs. of  4 variables:
##  $ x    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ y    : num  -0.0611 0.0756 0.039 -0.0311 -0.4429 ...
##  $ truef: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ sd   : num  0.2 0.05 0.1 0.05 0.2 0.2 0.1 0.1 0.2 0.2 ...

Take a look at the true label of the datasets:

table(labels)
## labels
##  A  B  C 
## 50 30 20

Fitting FASH

The default way of fitting FASH is to input the list of datasets (data_list), and specify the column names for the effect size (Y), the standard deviation of the effect size (S), and the time points (smooth_var).

The computation could be paralleled by specifying the number of cores (num_cores). Reducing the number of basis functions (num_basis) can also greatly speed up the computation.

Testing non-linearity

Let’s first use the IWP2 model (order = 2), and try to detect the dynamic eQTLs with non-linear dynamics (Category C).

fash_fit <- fash(Y = "y", smooth_var = "x", S = "sd", data_list = datasets, 
                  likelihood = "gaussian", order = 2, pred_step = 1,
                  num_cores = 2, num_basis = 20, grid = seq(0, 1, by = 0.025),
                  verbose = TRUE)
## Starting data setup...
## Completed data setup in 0.01 seconds.
## Starting likelihood computation...
## Completed likelihood computation in 12.03 seconds.
## Starting empirical Bayes estimation...
## Completed empirical Bayes estimation in 0.01 seconds.
## fash object created successfully.
fash_fit
## Fitted fash Object
## -------------------
## Number of datasets: 100
## Likelihood: gaussian
## Number of PSD grid values: 41 (initial), 6 (non-trivial)
## Order of Integrated Wiener Process (IWP): 2

Let’s take a look at the estimated prior:

fash_fit$prior_weights
##     psd prior_weight
## 1 0.000   0.81369365
## 2 0.025   0.01755264
## 3 0.125   0.11104094
## 4 0.175   0.02131837
## 5 0.200   0.01002227
## 6 0.350   0.02637213

We can take a look at their posterior weights in each GP component:

plot(fash_fit)

The plot function automatically plots the posterior weights of the GP components, with darker colors indicating larger PSD (e.g. larger σk\sigma_k). The observations are ordered based on posterior mean of the PSD (i.e. σk\sigma_k) in descending order from top to bottom. We could also customize the plot in various different ways:

plot(fash_fit, discrete = TRUE, ordering = "lfdr")

The above structure plot ignores the continuous structure of the PSD (discrete = TRUE) and orders the observations based on the local false discovery rate (LFDR) of p(βi(t)S0|𝐘,ĝ)p(\beta_i(t)\in S_0|\boldsymbol{Y}, \hat{g}). In this case, since the order of IWP is p = 2, the base model S0S_0 denotes the space of linear function.

We can then use fdr_control to test the null hypothesis that H0:βi(t)S0H_0: \beta_i(t) \in S_0 at a given FDR level:

fdr_result <- fdr_control(fash_fit, alpha = 0.1, plot = TRUE)
## 20 datasets are significant at alpha level 0.10. Total datasets tested: 100.

There are 21 eQTLs flagged as significant at FDR level 0.1. We can take out the indices of these eQTLs:

detected_indices <- fdr_result$fdr_results$index[fdr_result$fdr_results$FDR < 0.1]

How many of the true (non-linear) dynamic eQTLs are detected?

sum(labels[detected_indices] == "C")/sizeC
## [1] 0.95

What is the false discovery rate?

sum(labels[detected_indices] != "C")/length(detected_indices)
## [1] 0.05

Let’s take a look at the inferred eQTL effect βi(t)\beta_i(t) for the detected eQTLs.

fitted_beta <- predict(fash_fit, index = detected_indices[1])
str(fitted_beta)
## 'data.frame':    16 obs. of  5 variables:
##  $ x     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ mean  : num  0.7521 0.3925 0.1712 -0.0469 -0.2553 ...
##  $ median: num  0.7526 0.3991 0.1708 -0.0477 -0.2585 ...
##  $ lower : num  0.6575 0.1543 0.0765 -0.1963 -0.4183 ...
##  $ upper : num  0.8486 0.602 0.2677 0.1074 -0.0902 ...

The predict function returns the posterior information of the effect size βi(t)\beta_i(t) for the eQTL specified by index. By default, it returns the posterior of the effect size at each observed time point. We can also specify the time points to predict the effect size at:

fitted_beta_new <- predict(fash_fit, index = detected_indices[1], smooth_var = seq(0, 16, length.out = 100))
str(fitted_beta_new)
## 'data.frame':    100 obs. of  5 variables:
##  $ x     : num  0 0.162 0.323 0.485 0.646 ...
##  $ mean  : num  1.162 1.096 1.03 0.963 0.897 ...
##  $ median: num  1.147 1.083 1.02 0.956 0.893 ...
##  $ lower : num  0.568 0.594 0.613 0.635 0.659 ...
##  $ upper : num  1.84 1.67 1.5 1.32 1.16 ...

It is also possible to store M posterior samples rather than the posterior summary:

fitted_beta_samples <- predict(fash_fit, index = detected_indices[1], 
                               smooth_var = seq(0, 16, length.out = 100), 
                               only.samples = TRUE, M = 50)
str(fitted_beta_samples)
##  num [1:100, 1:50] 1.012 0.974 0.937 0.899 0.862 ...

Let’s plot the inferred effect size for the first detected eQTL:

plot(datasets[[detected_indices[1]]]$x, datasets[[detected_indices[1]]]$y, type = "p", col = "black", lwd = 2, xlab = "Time", ylab = "Effect Size")
lines(fitted_beta_new$x, fitted_beta_new$mean, col = "red", lwd = 2)
lines(datasets[[detected_indices[1]]]$x, datasets[[detected_indices[1]]]$truef, col = "black", lwd = 1, lty = 2)
polygon(c(fitted_beta_new$x, rev(fitted_beta_new$x)), c(fitted_beta_new$lower, rev(fitted_beta_new$upper)), col = rgb(1, 0, 0, 0.2), border = NA)

Testing dynamic eQTLs

What if we are interested in detecting any dynamic eQTLs, not just the non-linear ones? (i.e., Categories B and C)

We can specify the order of the IWP model to be 1, which will setup a base model S0S_0 as the space of constant functions.

fash_fit_2 <- fash(Y = "y", smooth_var = "x", S = "sd", data_list = datasets, 
                  likelihood = "gaussian", order = 1, pred_step = 1,
                  num_cores = 2, num_basis = 20, grid = seq(0, 1, by = 0.025),
                  verbose = TRUE)
## Starting data setup...
## Completed data setup in 0.01 seconds.
## Starting likelihood computation...
## Completed likelihood computation in 9.82 seconds.
## Starting empirical Bayes estimation...
## Completed empirical Bayes estimation in 0.01 seconds.
## fash object created successfully.
fash_fit_2
## Fitted fash Object
## -------------------
## Number of datasets: 100
## Likelihood: gaussian
## Number of PSD grid values: 41 (initial), 9 (non-trivial)
## Order of Integrated Wiener Process (IWP): 1

Take a look at the structure plot:

plot(fash_fit_2, discrete = TRUE, ordering = "mean")

Let’s test the null hypothesis that H0:βi(t)S0H_0: \beta_i(t) \in S_0 at a given FDR level:

fdr_result_2 <- fdr_control(fash_fit_2, alpha = 0.1, plot = TRUE)
## 48 datasets are significant at alpha level 0.10. Total datasets tested: 100.

detected_indices_2 <- fdr_result_2$fdr_results$index[fdr_result_2$fdr_results$FDR < 0.1]

How many of the true dynamic eQTLs are detected?

sum(labels[detected_indices_2] != "A")/(sizeB + sizeC)
## [1] 0.88

What is the false discovery rate?

sum(labels[detected_indices_2] == "A")/length(detected_indices_2)
## [1] 0.08333333