Greedy backward filtering of features for csmoothEM ordering inference — greedy_backward_filter

Implements a greedy "backward" feature filtering strategy to mitigate the case where most features are noise and do not follow any latent ordering. The algorithm:

Fits csmoothEM on all features to obtain an initial ordering (via responsibilities \(\Gamma\)).
Computes per-feature collapsed contributions \(C_j\) (via compute_C_by_coord_csmooth).
Removes a batch of features with the smallest \(C_j\).
Refits csmoothEM on the remaining features for a few iterations.
Repeats until a stopping rule is met.

This procedure is intended as a preprocessing step before more ambitious tasks such as feature partitioning across multiple orderings.

greedy_backward_filter_csmooth(
  X,
  method = c("fiedler", "PCA", "tSNE", "pcurve", "random"),
  K = 50,
  modelName = c("homoskedastic", "heteroskedastic"),
  adaptive = "prior",
  num_iter_init = 10,
  num_iter_refit = 5,
  discretization = c("equal", "quantile", "kmeans"),
  batch = 20,
  min_keep = 20,
  tau = NULL,
  max_rounds = 50,
  verbose = TRUE,
  ...
)

Arguments

X: Numeric matrix (n x d).
method: Ordering method passed to initialize_csmoothEM. One of "fiedler", "PCA", "tSNE", "pcurve", "random".
K: Integer \(\ge 2\). Number of mixture components.
modelName: Either "homoskedastic" or "heteroskedastic".
adaptive: Adaptive mode passed to do_csmoothEM when refitting. Typically "prior" for speed (or "ml" if using collapsed-ML).
num_iter_init: Integer \(\ge 1\). Number of warm-start iterations for the initial fit.
num_iter_refit: Integer \(\ge 1\). Number of iterations for each refit after feature removal.
discretization: Discretization method for initialization passed to initialize_csmoothEM. Recommended: "quantile" to avoid empty components.
batch: Integer \(\ge 1\). Number of lowest-scoring features (smallest \(C_j\)) removed per round.
min_keep: Integer \(\ge 1\). Minimum number of features to keep; stops if fewer would remain.
tau: Optional numeric threshold. If provided, stops when min(Cj) >= tau.
max_rounds: Integer \(\ge 1\). Maximum number of greedy rounds.
verbose: Logical; print a one-line summary each round.
...: Additional arguments passed to initialize_csmoothEM (e.g. ordering controls).

Value

A list with components:

keep_cols: integer indices of retained features (w.r.t. the original X).
drop_cols: integer indices of removed features (w.r.t. the original X).
fit: final fitted csmooth_em object on the retained features.
history: data.frame with per-round diagnostics (C_total, min_Cj, etc.).