LUCIDus implements Latent Unknown Clustering Integrating Multi-Omics Data (LUCID) for joint analysis of exposures, omics, latent clusters, and outcomes. The current package version in this repository is 3.1.0.
Compared with the original single-omics workflow, the current package supports:
- Early integration: one latent structure for a single omics matrix.
- Parallel integration: layer-specific latent structures for multiple omics layers.
- Serial integration: multi-stage workflows that chain early and/or parallel submodels.
- Model tuning over latent-cluster counts and, for early/parallel models, regularization penalties.
- Feature selection for exposures and omics means/covariances.
- Missing-data diagnostics and imputation for incomplete omics matrices.
- Prediction of latent-cluster membership and outcomes, including g_computation = TRUE.
- Bootstrap inference for early, parallel, and serial models.
- S3 summaries for all supported model types and Sankey-style plots for early models.
The package builds on:
- Peng et al. (2020), A latent unknown clustering integrating multi-omics data (LUCID) with phenotypic traits
- Zhao et al. (2024), An extension of latent unknown clustering integrating multi-omics data (LUCID) incorporating incomplete omics data
- Zhao et al. (2024), LUCIDus: An R Package for Implementing Latent Unknown Clustering by Integrating Multi-omics Data (LUCID) With Phenotypic Traits
LUCIDus now supports three model families with different data structures:
| Model | Z input | K input | Typical use |
|---|---|---|---|
| "early" | One matrix | Integer or integer vector | Single integrated omics layer |
| "parallel" | List of matrices | Integer vector or list by layer | Multiple omics layers modeled in parallel |
| "serial" | Nested list of stages | Nested list matching Z | Multi-stage biology or late integration pipelines |
For serial models, each stage can itself be an early block (a matrix) or a parallel block (a list of matrices), and K must mirror that topology.
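Because a mismatch between Z and K is easy to introduce in nested serial inputs, a small pre-flight check can help. The sketch below is illustrative only: mirrors_topology() is a hypothetical helper, not a LUCIDus function, and it assumes each matrix in Z pairs with a single integer in K and each list in Z pairs with a list of the same length in K.

```r
# Hypothetical helper (not part of LUCIDus): TRUE when K mirrors the nesting of Z.
mirrors_topology <- function(Z, K) {
  if (is.matrix(Z)) {
    # An early block (a matrix) pairs with a single cluster count.
    return(is.numeric(K) && length(K) == 1)
  }
  # A parallel block or a list of stages pairs with a list of equal length.
  is.list(Z) && is.list(K) && length(Z) == length(K) &&
    all(mapply(mirrors_topology, Z, K))
}

# Toy serial input: stage 1 is a parallel block, stage 2 is an early block.
set.seed(1)
Z_stage1 <- list(layer1 = matrix(rnorm(20), 10, 2),
                 layer2 = matrix(rnorm(30), 10, 3))
Z_stage2 <- matrix(rnorm(40), 10, 4)
Z_serial <- list(Z_stage1, Z_stage2)

ok_K  <- mirrors_topology(Z_serial, list(list(2, 2), 2))  # same nesting as Z
bad_K <- mirrors_topology(Z_serial, list(2, 2))           # stage 1 is not nested
print(c(ok = ok_K, bad = bad_K))
```

The nesting list(list(2, 2), 2) matches the serial example later in this README; whether estimate_lucid() also accepts other K encodings for serial stages is not covered by this sketch.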
Install the current CRAN release with:
install.packages("LUCIDus")

The comprehensive guide in vignette/lucidus_full_functionality_guide.Rmd organizes the package around a full analysis workflow:
- inspect missingness and imputation quality
- fit early, parallel, or serial models with estimate_lucid()
- tune K and penalties with tune_lucid() or lucid()
- summarize fitted models
- predict latent clusters and outcomes, including g_computation
- quantify uncertainty with boot_lucid()
- visualize early-model structure with plot()
- use numerical-stability helpers when building robust workflows

library(LUCIDus)
data(sim_data)
G <- sim_data$G[1:300, , drop = FALSE]
Z <- sim_data$Z[1:300, , drop = FALSE]
Y <- sim_data$Y_normal[1:300]
CoY <- sim_data$Covariate[1:300, , drop = FALSE]
fit_early <- estimate_lucid(
G = G,
Z = Z,
Y = Y,
CoY = CoY,
lucid_model = "early",
family = "normal",
K = 2,
seed = 1008
)
summary(fit_early)
plot(fit_early)

library(LUCIDus)
data(simulated_HELIX_data)
G <- simulated_HELIX_data$exposure
Y <- simulated_HELIX_data$outcome
CoY <- simulated_HELIX_data$covariate
omics <- simulated_HELIX_data$omics
Z_parallel <- list(
methylomics = omics[, 1:10, drop = FALSE],
transcriptomics = omics[, 11:20, drop = FALSE],
miRNA = omics[, 21:30, drop = FALSE]
)
fit_parallel <- estimate_lucid(
G = G,
Z = Z_parallel,
Y = Y,
CoY = CoY,
lucid_model = "parallel",
family = "normal",
K = c(2, 2, 2),
seed = 1008
)
summary(fit_parallel)

library(LUCIDus)
data(sim_data)
G <- sim_data$G[1:200, , drop = FALSE]
Y <- sim_data$Y_normal[1:200]
Z_stage1 <- list(
layer1 = sim_data$Z[1:200, 1:5, drop = FALSE],
layer2 = sim_data$Z[1:200, 6:10, drop = FALSE]
)
Z_stage2 <- sim_data$Z[1:200, 1:4, drop = FALSE]
fit_serial <- estimate_lucid(
G = G,
Z = list(Z_stage1, Z_stage2),
Y = Y,
lucid_model = "serial",
family = "normal",
K = list(list(2, 2), 2),
seed = 1008
)
summary(fit_serial)

| Task | Main API |
|---|---|
| Fit a model directly | estimate_lucid() |
| Tune K and penalties | tune_lucid() |
| Fit or auto-tune in one step | lucid() |
| Summarize fitted objects | summary(), summary_lucid() |
| Predict clusters and outcomes | predict_lucid() |
| Bootstrap confidence intervals | boot_lucid() |
| Diagnose missingness | analyze_missing_pattern(), check_na() |
| Impute missing omics values | safe_impute(), fill_data() |
| Check imputation quality | check_imputation_quality() |
| Numerical stability helpers | safe_log_sum_exp(), safe_normalize(), safe_solve(), check_and_stabilize_sigma(), check_convergence() |
| Visualize fitted early models | plot() |
The guide also demonstrates these workflows across continuous and binary outcomes, missing-data settings, and multi-stage serial topologies.
Use estimate_lucid() when you already know the model topology and cluster counts. This is the lowest-level exported fitting API and supports:
- supervised and unsupervised fits
- optional CoG and CoY covariates
- normal and binary outcomes
- early, parallel, and serial model structures
- missing-data handling during fitting
For development and diagnostics, the guide also uses verbose = TRUE to print iteration-level fitting traces:
fit_verbose <- estimate_lucid(
G = G,
Z = Z,
Y = Y,
lucid_model = "early",
family = "normal",
K = 2,
max_itr = 2,
max_tot.itr = 8,
verbose = TRUE
)

The guide treats tuning as a first-class workflow rather than an optional extra:
- tune_lucid() explicitly evaluates a grid of K and penalty values
- lucid() is the higher-level wrapper that fits directly or auto-tunes depending on whether K and penalty inputs are scalars or vectors
- penalty tuning is available for "early" and "parallel"
- serial models currently accept scalar penalties, while still supporting topology selection through K

# Using G, Z, and Y from the early integration example above
tuned <- tune_lucid(
G = G,
Z = Z,
Y = Y,
lucid_model = "early",
family = "normal",
K = 2:3,
Rho_G = c(0, 0.1),
Rho_Z_Mu = c(0, 5),
Rho_Z_Cov = c(0, 0.1),
seed = 1008
)

The lucid() wrapper uses the same inputs and will automatically dispatch to tuning when K or penalty arguments are vectors. Penalty tuning is currently supported for "early" and "parallel"; "serial" accepts scalar penalty inputs.
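
To get a feel for how quickly a tuning run grows, the grid from the tune_lucid() call above can be enumerated in base R. This is a sketch under two assumptions: that every combination of K and penalty values is fitted (a full Cartesian grid), and that, for early models, any input of length greater than one triggers tuning. needs_tuning() is a hypothetical helper written for illustration, not the package's dispatch code.

```r
# Candidate values copied from the tune_lucid() call above.
grid <- expand.grid(
  K         = 2:3,
  Rho_G     = c(0, 0.1),
  Rho_Z_Mu  = c(0, 5),
  Rho_Z_Cov = c(0, 0.1)
)
n_fits <- nrow(grid)   # 2 * 2 * 2 * 2 candidate settings under the full-grid assumption
print(n_fits)

# Hypothetical helper (not a LUCIDus function): do these inputs imply tuning?
needs_tuning <- function(...) any(lengths(list(...)) > 1)

direct <- needs_tuning(K = 2, Rho_G = 0)      # all scalars: a direct fit
tuned  <- needs_tuning(K = 2:3, Rho_G = 0)    # vector K: a tuning run
print(c(direct = direct, tuned = tuned))
```

For parallel models, K is itself a vector with one entry per layer, so the scalar-versus-vector rule in needs_tuning() applies to the early-model case only.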
The summary methods are richer than a simple coefficient printout. As shown in the guide, summary() returns model information, fit statistics, feature-selection summaries, parameter tables, missing-data summaries, and optionally bootstrap confidence intervals.
s <- summary(fit_early)

After running boot_lucid(), you can also attach bootstrap confidence intervals with summary(fit_early, boot.se = boot_fit).
# Using fit_early, G, Z, and Y from the early integration example above
pred <- predict_lucid(
model = fit_early,
G = G,
Z = Z,
Y = Y,
lucid_model = "early"
)
pred_g <- predict_lucid(
model = fit_early,
G = G,
Z = NULL,
Y = NULL,
lucid_model = "early",
g_computation = TRUE
)

predict_lucid() supports early, parallel, and serial models. In g_computation = TRUE mode, it uses the fitted G -> X path to generate cluster and outcome predictions under modified exposures.
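
The idea behind predicting from exposures alone can be sketched with a softmax mapping from G to cluster probabilities. The code below is not the predict_lucid() implementation: it assumes a multinomial-logistic G -> X model, and the coefficient matrix beta and the helper cluster_prob() are hypothetical values and names chosen for illustration.

```r
# Hypothetical coefficients for a 2-cluster model with 2 exposures.
# Rows: clusters (cluster 1 is the reference); columns: intercept, G1, G2.
beta <- rbind(c(0, 0, 0),
              c(-0.5, 1.2, 0.3))

# Softmax over the cluster-specific linear predictors.
cluster_prob <- function(G, beta) {
  eta <- cbind(1, G) %*% t(beta)     # n x K matrix of linear predictors
  eta <- eta - apply(eta, 1, max)    # subtract row maxima for numerical stability
  exp(eta) / rowSums(exp(eta))
}

G_observed <- matrix(c(0, 0), nrow = 1)   # baseline exposure profile
G_modified <- matrix(c(1, 0), nrow = 1)   # first exposure raised by one unit

p_obs <- cluster_prob(G_observed, beta)
p_mod <- cluster_prob(G_modified, beta)
print(rbind(p_obs, p_mod))
```

Comparing the two rows shows how a change in exposure shifts cluster membership probabilities without any omics or outcome data, which is the kind of contrast the g_computation mode is described as producing.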
# Using fit_early, G, Z, and Y from the early integration example above
boot_fit <- boot_lucid(
G = G,
Z = Z,
Y = Y,
lucid_model = "early",
model = fit_early,
R = 20,
conf = 0.95
)
summary(fit_early, boot.se = boot_fit)

Bootstrap inference is available for early, parallel, and serial models. If a fitted model uses nonzero penalties, boot_lucid() will refit a zero-penalty version internally because bootstrap CIs are derived from unpenalized fits. If feature selection has removed variables, refit the reduced model before bootstrapping.
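
For intuition about what R and conf control, the following base-R sketch computes a percentile bootstrap interval for a simple mean. It is a generic illustration on toy data; the interval type and resampling scheme that boot_lucid() uses internally may differ.

```r
set.seed(1008)
x <- rnorm(100, mean = 1)   # toy data standing in for an estimated quantity
R <- 200                    # number of bootstrap replicates
conf <- 0.95                # confidence level

# Re-estimate the statistic on R resamples drawn with replacement.
boot_means <- replicate(R, mean(sample(x, replace = TRUE)))

# Percentile interval: the central `conf` mass of the bootstrap distribution.
alpha <- 1 - conf
ci <- quantile(boot_means, c(alpha / 2, 1 - alpha / 2))
print(ci)
```

With small R the interval endpoints vary noticeably from run to run, which is one reason to raise R well above the 20 used in the quick example.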
The package now includes dedicated helpers for incomplete omics data:
# Using Z from the early integration example above
Z_miss <- Z
Z_miss[1, ] <- NA
Z_miss[2:4, 1] <- NA
miss_pattern <- analyze_missing_pattern(Z_miss)
na_summary <- check_na(Z_miss, lucid_model = "early")
Z_imp <- safe_impute(Z_miss, method = "mean")
imp_quality <- check_imputation_quality(Z_miss, Z_imp)

These helpers are useful both before fitting and when validating simple imputations outside the model-based missing-data workflow.
The guide uses them in three roles:
- pre-fit diagnostics with analyze_missing_pattern() and check_na()
- simple standalone imputation with safe_impute()
- low-level likelihood-based filling via fill_data()
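
As a point of reference for what the simple route does, the sketch below computes missingness fractions and a column-mean fill in base R on a toy matrix. It assumes that mean imputation replaces each missing entry with the observed mean of its feature; the exact behavior of safe_impute() and the output format of analyze_missing_pattern() are not reproduced here.

```r
# Toy omics matrix with the same missingness layout as the example above.
set.seed(1)
Z_toy <- matrix(rnorm(50), nrow = 10, ncol = 5)
Z_toy[1, ] <- NA       # one sample with every feature missing
Z_toy[2:4, 1] <- NA    # sporadic missingness in the first feature

# Per-feature and per-sample missing fractions.
frac_by_feature <- colMeans(is.na(Z_toy))
frac_by_sample  <- rowMeans(is.na(Z_toy))

# Column-mean imputation: fill each NA with the observed mean of its column.
col_means <- colMeans(Z_toy, na.rm = TRUE)
Z_filled <- Z_toy
missing_idx <- which(is.na(Z_toy), arr.ind = TRUE)
Z_filled[missing_idx] <- col_means[missing_idx[, "col"]]

print(round(frac_by_feature, 2))
print(anyNA(Z_filled))
```

The fully missing first sample is filled entirely with feature means, so it carries no sample-specific information after imputation; this is the kind of case where the model-based missing-data workflow may be preferable.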
The guide includes a dedicated binary-outcome example. LUCIDus supports:
- family = "normal" for continuous outcomes
- family = "binary" for 0/1 outcomes
- predict_lucid(..., response = TRUE) for class labels
- predict_lucid(..., response = FALSE) for predicted probabilities

fit_early_binary <- estimate_lucid(
G = G,
Z = Z,
Y = sim_data$Y_binary[1:300],
lucid_model = "early",
family = "binary",
K = 2,
seed = 1008
)
pred_binary_prob <- predict_lucid(
model = fit_early_binary,
G = G,
Z = Z,
Y = sim_data$Y_binary[1:300],
lucid_model = "early",
response = FALSE
)

The guide covers both plotting and the lower-level numerical helpers that support more stable workflows.
- plot() currently provides Sankey-style visualization for early models
- parallel and serial plotting methods are present but still under development
- safe_log_sum_exp(), safe_normalize(), safe_solve(), check_and_stabilize_sigma(), and check_convergence() help with numerically fragile workflows
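
Why such helpers matter can be seen with the log-sum-exp computation, which appears whenever mixture likelihoods are evaluated on the log scale. The sketch below uses a hypothetical log_sum_exp() written in base R to show the standard max-shift trick; it is not the package's safe_log_sum_exp() implementation, which may handle additional edge cases.

```r
x <- c(-1000, -1001, -999)

# Naive evaluation: every exp() underflows to 0, so the log is -Inf.
naive <- log(sum(exp(x)))

# Stable evaluation: factor out the maximum before exponentiating.
# log_sum_exp() is a hypothetical name used only in this sketch.
log_sum_exp <- function(v) {
  m <- max(v)
  m + log(sum(exp(v - m)))
}
stable <- log_sum_exp(x)

print(c(naive = naive, stable = stable))
```

The naive result is -Inf because exp(-999) underflows to zero in double precision, while the shifted version returns a finite value close to -998.59.
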
plot(fit_early)
safe_log_sum_exp(c(-1000, -1001, -999))
safe_normalize(c(1e-300, 2, 3))
sigma_stable <- check_and_stabilize_sigma(matrix(c(1, 0.999999, 0.999999, 0.999998), 2))
sigma_inv <- safe_solve(sigma_stable)

The full functionality guide is intentionally lightweight so examples run quickly. For real analyses, it recommends increasing:
- max_itr and max_tot.itr
- bootstrap replicates R
- the size of the tuning grid
For serial pipelines in particular, the guide recommends closely reviewing stage-wise missingness summaries and fitted submodels.
For fuller, end-to-end examples, see the comprehensive guide, which is the best reference for the current package surface. It covers:
- missing-data diagnostics and imputation helpers
- direct fitting with estimate_lucid()
- wrapper workflows with tune_lucid() and lucid()
- verbose fitting demos
- summaries, prediction, and g_computation
- plotting and numerical stability helpers
- continuous and binary outcome examples
If you use LUCIDus, please cite:
- Peng C., Wang J., Asante I., Louie S., Jin R., Chatzi L., Casey G., Thomas D. C., Conti D. V. (2020). A latent unknown clustering integrating multi-omics data (LUCID) with phenotypic traits. Bioinformatics.
- Zhao Y., Jia Q., Goodrich J. A., Conti D. V. (2024). LUCIDus: An R Package for Implementing Latent Unknown Clustering by Integrating Multi-omics Data (LUCID) With Phenotypic Traits. The R Journal.
- Zhao Y., Jia Q., Goodrich J., Darst B., Conti D. V. (2024). An extension of latent unknown clustering integrating multi-omics data (LUCID) incorporating incomplete omics data. Bioinformatics Advances.

