greedy_backward_filter_csmooth.Rd
Implements a greedy "backward" feature-filtering strategy for data in which most features are noise and do not follow any latent ordering. The algorithm:
Fits csmoothEM on all features to obtain an initial ordering (via responsibilities \(\Gamma\)).
Computes per-feature collapsed contributions \(C_j\) (via compute_C_by_coord_csmooth).
Removes a batch of features with the smallest \(C_j\).
Refits csmoothEM on the remaining features for a few iterations.
Repeats until a stopping rule is met.
This procedure is intended as a preprocessing step before more ambitious tasks such as feature partitioning across multiple orderings.
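The loop described above can be sketched in base R. This is only an illustration of the greedy backward pattern, not the package's implementation: `toy_score` is a hypothetical stand-in for `compute_C_by_coord_csmooth` (it scores each feature by its squared correlation with the first principal component of the retained features), no csmoothEM fit is performed, and the order in which the stopping rules are checked is an assumption.

```r
set.seed(42)
n <- 200
# 10 features that follow a shared latent ordering, plus 40 pure-noise features
t_latent <- runif(n)
X_signal <- sapply(1:10, function(j) t_latent + rnorm(n, sd = 0.1))
X_noise  <- matrix(rnorm(n * 40), nrow = n, ncol = 40)
X <- cbind(X_signal, X_noise)

# Hypothetical per-feature score (NOT the package's C_j): squared correlation
# of each retained feature with the first principal component.
toy_score <- function(X) {
  ord <- prcomp(X, center = TRUE, scale. = TRUE)$x[, 1]
  as.numeric(cor(X, ord))^2
}

greedy_filter_sketch <- function(X, batch = 20, min_keep = 20,
                                 tau = NULL, max_rounds = 50) {
  keep <- seq_len(ncol(X))
  history <- list()
  for (r in seq_len(max_rounds)) {
    # "Refit" on the retained features and rescore them
    Cj <- toy_score(X[, keep, drop = FALSE])
    history[[r]] <- data.frame(round = r, n_keep = length(keep),
                               min_Cj = min(Cj))
    # Stop if every remaining feature already scores at least tau
    if (!is.null(tau) && min(Cj) >= tau) break
    # Stop if removing another batch would leave fewer than min_keep features
    if (length(keep) - batch < min_keep) break
    # Drop the batch of lowest-scoring features
    keep <- keep[-order(Cj)[seq_len(batch)]]
  }
  list(keep_cols = keep,
       drop_cols = setdiff(seq_len(ncol(X)), keep),
       history   = do.call(rbind, history))
}

res <- greedy_filter_sketch(X, batch = 10, min_keep = 10, tau = 0.5)
res$history
```

Because scores are recomputed after every removal, features that looked weak only because noise distorted the initial ordering get a chance to recover before the next batch is dropped.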
greedy_backward_filter_csmooth(
X,
method = c("fiedler", "PCA", "tSNE", "pcurve", "random"),
K = 50,
modelName = c("homoskedastic", "heteroskedastic"),
adaptive = "prior",
num_iter_init = 10,
num_iter_refit = 5,
discretization = c("equal", "quantile", "kmeans"),
batch = 20,
min_keep = 20,
tau = NULL,
max_rounds = 50,
verbose = TRUE,
...
)
X: Numeric matrix (n x d).
method: Ordering method passed to initialize_csmoothEM. One of "fiedler", "PCA", "tSNE", "pcurve", "random".
K: Integer \(\ge 2\). Number of mixture components.
modelName: Either "homoskedastic" or "heteroskedastic".
adaptive: Adaptive mode passed to do_csmoothEM when refitting. Typically "prior" for speed (or "ml" if using collapsed-ML).
num_iter_init: Integer \(\ge 1\). Number of warm-start iterations for the initial fit.
num_iter_refit: Integer \(\ge 1\). Number of iterations for each refit after feature removal.
discretization: Discretization method for initialization passed to initialize_csmoothEM. Recommended: "quantile" to avoid empty components.
batch: Integer \(\ge 1\). Number of lowest-scoring features (smallest \(C_j\)) removed per round.
min_keep: Integer \(\ge 1\). Minimum number of features to keep; stops if fewer would remain.
tau: Optional numeric threshold. If provided, stops when min(Cj) >= tau.
max_rounds: Integer \(\ge 1\). Maximum number of greedy rounds.
verbose: Logical; print a one-line summary each round.
...: Additional arguments passed to initialize_csmoothEM (e.g. ordering controls).
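The recommendation of "quantile" discretization can be motivated with a small base-R comparison. The sketch assumes that "equal" means equal-width bins over a one-dimensional ordering score and that "quantile" means bins with breaks at empirical quantiles; how initialize_csmoothEM actually discretizes is not shown here, and the `score` vector is simulated for illustration.

```r
set.seed(1)
K <- 50
# Hypothetical 1-D ordering score with a heavy right tail
score <- rlnorm(500, sdlog = 2)

# Equal-width bins: K intervals of identical length over the score range
equal_bins <- cut(score, breaks = K, labels = FALSE)

# Quantile bins: breaks at empirical quantiles, so counts are roughly equal
q_breaks <- quantile(score, probs = seq(0, 1, length.out = K + 1))
quantile_bins <- cut(score, breaks = q_breaks, labels = FALSE,
                     include.lowest = TRUE)

# Number of bins (candidate mixture components) that receive no observations
n_empty_equal    <- K - length(unique(equal_bins))
n_empty_quantile <- K - length(unique(quantile_bins))
c(equal = n_empty_equal, quantile = n_empty_quantile)
```

With a skewed score, equal-width binning concentrates most observations in the first few bins and leaves many of the rest empty, whereas quantile breaks assign observations to every bin.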
A list with components:
keep_cols: integer indices of retained features (w.r.t. the original X).
drop_cols: integer indices of removed features (w.r.t. the original X).
fit: final fitted csmooth_em object on the retained features.
history: data.frame with per-round diagnostics (C_total, min_Cj, etc.).
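A short sketch of how the returned indices might be used downstream. The `res` list below is a hand-made stand-in containing only keep_cols and drop_cols (fit and history are omitted), and the data and column names are made up for illustration; in practice `res` would come from a call to greedy_backward_filter_csmooth.

```r
set.seed(7)
X <- matrix(rnorm(30 * 6), nrow = 30, ncol = 6,
            dimnames = list(NULL, paste0("feat", 1:6)))

# Hand-made stand-in for the returned list; in practice:
# res <- greedy_backward_filter_csmooth(X, ...)
res <- list(keep_cols = c(1L, 2L, 5L), drop_cols = c(3L, 4L, 6L))

# Both index vectors refer to columns of the original X, so together
# they are expected to cover every column exactly once.
stopifnot(setequal(c(res$keep_cols, res$drop_cols), seq_len(ncol(X))),
          length(intersect(res$keep_cols, res$drop_cols)) == 0)

# Subset the original matrix; drop = FALSE keeps a matrix even for one column
X_kept <- X[, res$keep_cols, drop = FALSE]

kept_names    <- colnames(X)[res$keep_cols]
dropped_names <- colnames(X)[res$drop_cols]
```

Since the indices are given with respect to the original X, they can be applied directly to any other matrix that shares its column layout, such as held-out data.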