library(smoothEMr)

The feature-partition problem

Problem statement

We observe a data matrix Xn×d, X \in \mathbb{R}^{n \times d}, where each column (feature) may follow one of two different latent orderings of the samples. Let the feature index set be [d]:={1,2,,d}. [d] := \{1,2,\dots,d\}.

We seek a partition of features into two disjoint sets J1J2=[d],J1J2=, J_1 \cup J_2 = [d], \qquad J_1 \cap J_2 = \emptyset, such that features in J1J_1 are best explained by ordering system 1, and features in J2J_2 are best explained by ordering system 2.

Latent ordering via responsibilities

In csmoothEM, each ordering system is represented by an n×Kn \times K responsibility matrix Γ(m)=(γik(m))i=1..n,k=1..K,m{1,2}, \Gamma^{(m)} = \left(\gamma^{(m)}_{ik}\right)_{i=1..n,\,k=1..K}, \qquad m \in \{1,2\}, where γik(m)\gamma^{(m)}_{ik} is the (soft) membership probability of sample ii in component/bin kk under ordering mm. You can view Γ(m)\Gamma^{(m)} as a soft discretization of a latent 1D ordering into KK bins.

Each ordering system mm also has csmoothEM parameters θ(m)=(π(m),μ(m),σ(m),λ(m)), \theta^{(m)} = (\pi^{(m)}, \mu^{(m)}, \sigma^{(m)}, \lambda^{(m)}), where (in the homoskedastic csmoothEM setting used here):

  • π(m)K\pi^{(m)} \in \mathbb{R}^K are mixture weights,
  • μ(m)dm×K\mu^{(m)} \in \mathbb{R}^{d_m \times K} are component means for features in the corresponding subset,
  • σ(m)dm\sigma^{(m)} \in \mathbb{R}^{d_m} are per-feature standard deviations,
  • λ(m)dm\lambda^{(m)} \in \mathbb{R}^{d_m} are per-feature smoothness strengths for a RW prior along KK.

(Here dm=|Jm|d_m = |J_m|.)

Scoring a feature against a fixed ordering

A key ingredient in both greedy partition procedures is a feature-wise alignment score: for a candidate feature jj, given a fixed responsibilities matrix Γ\Gamma, we compute a scalar score S(jΓ) S(j \mid \Gamma) that measures how well feature jj aligns with the ordering encoded by Γ\Gamma.

The package provides two score modes through score_mode:

  • score_mode = "none": a plug-in (ELBO) score with fixed λ=1\lambda=1.
  • score_mode = "ml": a collapsed (marginal-like) score with per-feature λj\lambda_j optimized.

Both modes rely on the same idea: given Γ\Gamma, we can fit feature-specific μj\mu_{j\cdot} (and σj2\sigma_j^2) in closed form (or near closed form), then evaluate a comparable scalar objective.


Simulated example

We simulate a dataset where half of the features follow one latent ordering and the other half follow a second ordering. We also permute the feature columns to make the labels nontrivial.

sim <- simulate_two_order_gp_dataset(
  N = 1000,
  D = 16,
  t_range = c(0, 10),
  range = 5,
  smoothness = 2.5,
  variance = 3.0,
  noise_sd = 0.05,
  seed = 123
)

X <- sim$X
true_group <- sim$true_group

cat("Simulated X:", nrow(X), "x", ncol(X), "\n")
#> Simulated X: 1000 x 16
print(table(true_group))
#> true_group
#> 1 2 
#> 8 8

Quick visualization

Instead of a heatmap, we first plot a few representative features from each ordering. Because permute_rows_block2=TRUE permutes the sample order for the second block, features from ordering 2 typically look misaligned when plotted against the raw sample index.

# Pick one representative feature from each true group
idx1 <- which(true_group == 1)
idx2 <- which(true_group == 2)

set.seed(1)
j1 <- sample(idx1, 1)  # a feature generated by ordering 1
j2 <- sample(idx2, 1)  # a feature generated by ordering 2

x1 <- as.numeric(scale(X[, j1]))  # z-score for comparability
x2 <- as.numeric(scale(X[, j2]))

# Construct the two "true" orderings for rows
t1 <- sim$t1
t2 <- sim$t2
if (!is.null(sim$row_perm_block2)) {
  t2 <- t2[sim$row_perm_block2]
}

# ordering indices
o1 <- order(t1)
o2 <- order(t2)

op <- par(no.readonly = TRUE)
on.exit(par(op), add = TRUE)
par(mfrow = c(2, 2), mar = c(4, 4, 2.5, 1))

# (1) feature from ordering 1 vs ordering 1
plot(t1[o1], x1[o1], type = "l", lwd = 2,
     xlab = "t1 (ordering 1)", ylab = "z-scored feature value",
     main = sprintf("ord1 feature (%s)\nagainst ordering 1", colnames(X)[j1]))

# (2) feature from ordering 2 vs ordering 1
plot(t1[o1], x2[o1], type = "l", lwd = 2,
     xlab = "t1 (ordering 1)", ylab = "z-scored feature value",
     main = sprintf("ord2 feature (%s)\nagainst ordering 1", colnames(X)[j2]))

# (3) feature from ordering 1 vs ordering 2
plot(t2[o2], x1[o2], type = "l", lwd = 2,
     xlab = "t2 (ordering 2)", ylab = "z-scored feature value",
     main = sprintf("ord1 feature (%s)\nagainst ordering 2", colnames(X)[j1]))

# (4) feature from ordering 2 vs ordering 2
plot(t2[o2], x2[o2], type = "l", lwd = 2,
     xlab = "t2 (ordering 2)", ylab = "z-scored feature value",
     main = sprintf("ord2 feature (%s)\nagainst ordering 2", colnames(X)[j2]))

Approach 1: Forward greedy partition

Description

The forward greedy algorithm proceeds as follows:

  1. Choose a seed feature j1j_1 and fit a 1D csmoothEM ordering to obtain Γ(1)\Gamma^{(1)}.
  2. Choose a second seed feature j2j_2 as the feature with the worst alignment score under Γ(1)\Gamma^{(1)}, and fit a 1D csmoothEM ordering to obtain Γ(2)\Gamma^{(2)}.
  3. While unassigned features remain:
    • score each remaining feature under Γ(1)\Gamma^{(1)} and Γ(2)\Gamma^{(2)},
    • pick the feature with the largest absolute score gap,
    • assign it to the ordering with higher score,
    • warm-start update the corresponding ordering (update θ(m)\theta^{(m)} and then Γ(m)\Gamma^{(m)}).

In the implementation, the ordering-specific objects are returned as two csmooth_em fits (fit1, fit2).

Run

set.seed(1)
res_fwd <- forward_two_ordering_partition_csmooth(
  X = X,
  K = 50,
  score_mode = "ml",
  greedy_em_refine = TRUE,
  greedy_em_max_iter = 10,  
  verbose = 1
)
#> [Greedy-csmooth] seeds: j1=4, j2=9
#> [Greedy-csmooth] pick 3 -> ord1 (|Δ|=2357.548), remaining=13
#> [Greedy-csmooth] pick 5 -> ord1 (|Δ|=1979.785), remaining=12
#> [Greedy-csmooth] pick 13 -> ord2 (|Δ|=1898.020), remaining=11
#> [Greedy-csmooth] pick 1 -> ord1 (|Δ|=1815.358), remaining=10
#> [Greedy-csmooth] pick 6 -> ord1 (|Δ|=1547.641), remaining=9
#> [Greedy-csmooth] pick 2 -> ord1 (|Δ|=1511.510), remaining=8
#> [Greedy-csmooth] pick 7 -> ord1 (|Δ|=1344.981), remaining=7
#> [Greedy-csmooth] pick 10 -> ord2 (|Δ|=1295.184), remaining=6
#> [Greedy-csmooth] pick 16 -> ord2 (|Δ|=1062.702), remaining=5
#> [Greedy-csmooth] pick 8 -> ord1 (|Δ|=788.595), remaining=4
#> [Greedy-csmooth] pick 14 -> ord2 (|Δ|=513.117), remaining=3
#> [Greedy-csmooth] pick 11 -> ord2 (|Δ|=1374.790), remaining=2
#> [Greedy-csmooth] pick 12 -> ord2 (|Δ|=1106.892), remaining=1
#> [Greedy-csmooth] pick 15 -> ord2 (|Δ|=825.989), remaining=0
#> [Greedy-csmooth] done: |J1|=8, |J2|=8

Confusion table (with label switching handled)

The partition labels 1/2 may swap. We compute both possible mappings and report the better one.

assign_raw <- as.integer(res_fwd$coord_assign)
tab1 <- table(true = true_group, est = assign_raw)
tab2 <- table(true = true_group, est = 3L - assign_raw)

acc1 <- sum(diag(tab1)) / sum(tab1)
acc2 <- sum(diag(tab2)) / sum(tab2)

if (acc2 > acc1) {
  cat("Forward: using swapped labels\n")
  print(tab2)
  cat("Accuracy:", acc2, "\n")
} else {
  cat("Forward: using original labels\n")
  print(tab1)
  cat("Accuracy:", acc1, "\n")
}
#> Forward: using original labels
#>     est
#> true 1 2
#>    1 8 0
#>    2 0 8
#> Accuracy: 1

You can also inspect the two ordering-specific fitted objects:

print(res_fwd$fit1)
#> Fitted csmoothEM object with RW(2) prior along K = 50
#> -----
#>   n = 1000, d = 8, modelName = homoskedastic
#>   iter = 0; init_method = forward_greedy; adaptive = ml;
#>   lambda_vec: range=[1, 1], mean=1, relative = TRUE
#>   ELBO last = NA; penLogLik last = NA; ML/C last = NA
print(res_fwd$fit2)
#> Fitted csmoothEM object with RW(2) prior along K = 50
#> -----
#>   n = 1000, d = 8, modelName = homoskedastic
#>   iter = 0; init_method = forward_greedy; adaptive = ml;
#>   lambda_vec: range=[1, 1], mean=1, relative = TRUE
#>   ELBO last = NA; penLogLik last = NA; ML/C last = NA

Approach 2: Backward greedy partition

Description

The backward greedy algorithm starts from a single ordering fit and then moves features to a second ordering if doing so improves the alignment score.

  1. Fit ordering 1 on all features to obtain Γ(1)\Gamma^{(1)}.
  2. Initialize ordering 2 from a seed feature j2j_2 (by default, the worst-aligned feature under ordering 1), yielding Γ(2)\Gamma^{(2)}.
  3. Repeat:
    • for each feature currently in ordering 1, compute the gain Δj=S2(jΓ2)S1(jΓ1). \Delta_j = S_2(j \mid \Gamma_2) - S_1(j \mid \Gamma_1).
    • move the best feature if the gain is positive,
    • warm-start update both orderings (update θ(1),θ(2)\theta^{(1)}, \theta^{(2)} and then Γ(1),Γ(2)\Gamma^{(1)}, \Gamma^{(2)}).

Importantly, the implementation maintains warm-start state and does not re-run global ordering initializers after the initial construction.

Run

set.seed(1)
res_bwd <- backward_two_ordering_partition_csmooth(
  X = X,
  K = 50,
  score_mode = "ml",
  warm_iter_init = 10,
  warm_iter_refit = 10,
  verbose = TRUE
)
#> [backward-init] seedB=9 (ord2_1), |A|=15 |B|=1
#> [backward 01] move 13 (ord2_5) gain=1011.576 |A|=14 |B|=2
#> [backward 02] move 10 (ord2_2) gain=542.226 |A|=13 |B|=3
#> [backward 03] move 16 (ord2_8) gain=378.083 |A|=12 |B|=4
#> [backward 04] no positive gain; stopping.

Confusion table (with label switching handled)

assign_raw <- ifelse(res_bwd$assign == "A", 1L, 2L)
tab1 <- table(true = true_group, est = assign_raw)
tab2 <- table(true = true_group, est = 3L - assign_raw)

acc1 <- sum(diag(tab1)) / sum(tab1)
acc2 <- sum(diag(tab2)) / sum(tab2)

if (acc2 > acc1) {
  cat("Backward: using swapped labels\n")
  print(tab2)
  cat("Accuracy:", acc2, "\n")
} else {
  cat("Backward: using original labels\n")
  print(tab1)
  cat("Accuracy:", acc1, "\n")
}
#> Backward: using original labels
#>     est
#> true 1 2
#>    1 8 0
#>    2 4 4
#> Accuracy: 0.75