simulate_two_order_gp_dataset.Rd
Simulates an \(N \times D\) dataset with two latent sample orderings \(t_1\) and \(t_2\). The first \(D/2\) features are generated from a Gaussian process over \(t_1\), and the remaining \(D/2\) features are generated from an independent Gaussian process over \(t_2\).
Each feature is a GP draw evaluated at the \(N\) sample locations. The function can optionally:
permute feature columns (permute_cols),
permute the sample order within block 2 (permute_rows_block2),
shift the data to be positive (shift_positive),
add i.i.d. Gaussian noise (noise_sd).
Column names are assigned *before* permutation and then permuted consistently with the columns,
so that names remain aligned with the returned true_group.
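The generative process described above can be sketched in base R. This is an illustrative sketch, not the package's implementation: the closed-form Matern 5/2 kernel and its sqrt(5)/range scaling, the uniform sampling of t1 and t2, the Cholesky-based draw, the jitter value, the order of the optional steps, and the feature names f1, f2, ... are all assumptions made for the example.

```r
# Base-R sketch of the two-ordering GP simulation (assumptions noted inline).
set.seed(1)
N <- 40; D <- 6                      # small sizes so the example runs quickly
t_range <- c(0, 10)
rho <- 5; sigma2 <- 3; noise_sd <- 0.05

# Matern covariance with smoothness 5/2 in closed form. Scaling the distance
# by sqrt(5) / rho is one common convention and is an assumption here;
# fields::Matern may parameterise the range differently.
matern52 <- function(t, rho, sigma2) {
  r <- sqrt(5) * as.matrix(dist(t)) / rho
  sigma2 * (1 + r + r^2 / 3) * exp(-r)
}

# Draw k independent zero-mean GP features evaluated at the locations t.
draw_gp_block <- function(t, k, rho, sigma2, jitter = 1e-6) {
  K <- matern52(t, rho, sigma2) + diag(jitter, length(t))  # jitter for stability
  L <- t(chol(K))                                          # K = L %*% t(L)
  L %*% matrix(rnorm(length(t) * k), nrow = length(t), ncol = k)
}

# Two independent latent orderings (uniform sampling is an assumption).
t1 <- runif(N, t_range[1], t_range[2])
t2 <- runif(N, t_range[1], t_range[2])
X1 <- draw_gp_block(t1, D / 2, rho, sigma2)
X2 <- draw_gp_block(t2, D / 2, rho, sigma2)

# Optional steps: permute the rows of block 2, shift each block so its
# minimum is 1, combine the blocks, then add i.i.d. Gaussian noise.
row_perm_block2 <- sample.int(N)
X2 <- X2[row_perm_block2, , drop = FALSE]
X1 <- X1 - min(X1) + 1
X2 <- X2 - min(X2) + 1
X <- cbind(X1, X2) + matrix(rnorm(N * D, sd = noise_sd), N, D)

colnames(X) <- paste0("f", seq_len(D))   # hypothetical feature names
true_group <- rep(1:2, each = D / 2)     # 1 = generated over t1, 2 = over t2
```

Because noise is added after the shift in this sketch, individual entries of X can fall slightly below 1 even though each noiseless block has minimum 1.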
simulate_two_order_gp_dataset(
N = 1000,
D = 16,
t_range = c(0, 10),
range = 5,
smoothness = 2.5,
variance = 3,
noise_sd = 0.05,
shift_positive = TRUE,
permute_cols = FALSE,
permute_rows_block2 = TRUE,
seed = NULL
)
N: Integer \(\ge 2\). Number of samples (rows).
D: Integer \(\ge 2\) and even. Number of features (columns).
t_range: Numeric length-2 vector. Range for sampling t1 and t2.
range, smoothness, variance: Matern GP hyperparameters.
noise_sd: Nonnegative numeric. Standard deviation of i.i.d. Gaussian noise added to X.
shift_positive: Logical; if TRUE, shift each GP block so its minimum is 1.
permute_cols: Logical; if TRUE, permute feature columns.
permute_rows_block2: Logical; if TRUE, permute rows of the second GP block before combining.
seed: Optional integer seed. If not NULL, the function calls set.seed(seed).
A list with components:
X: numeric matrix \(N \times D\).
t1, t2: numeric vectors length N (latent orderings).
permut_cols: integer vector length D (the applied column permutation; identity if permute_cols=FALSE).
true_group: integer vector length D with values in \(\{1, 2\}\) indicating which ordering generated each feature (after permutation).
row_perm_block2: integer vector length N giving the row permutation applied to block 2 (or NULL).
Requires fields::rdist, fields::Matern, and MASS::mvrnorm.
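The column bookkeeping behind permut_cols and true_group can be illustrated with a small base-R sketch. It uses stand-in data rather than the function's output, and it assumes the permutation is applied as X[, perm]; the names f1, f2, ... and the variables group and perm are hypothetical.

```r
# Sketch: names assigned before permutation stay aligned with group labels.
set.seed(2)
D <- 6
X <- matrix(rnorm(4 * D), nrow = 4, ncol = D)   # stand-in data, not a GP draw
colnames(X) <- paste0("f", seq_len(D))          # names assigned before permuting
group <- rep(1:2, each = D / 2)                 # block membership before permuting

perm <- sample.int(D)                           # plays the role of permut_cols
X_perm <- X[, perm, drop = FALSE]               # names move with their columns
true_group <- group[perm]                       # labels permuted the same way

# Each permuted column still carries its original name and group label.
original_index <- as.integer(sub("^f", "", colnames(X_perm)))
stopifnot(identical(original_index, perm))
stopifnot(identical(true_group, group[original_index]))
```

Under this convention, a column's name identifies its original position, so true_group can be checked against the names at any time.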