LUCIDus: LUCID with Multiple Omics Data

LUCIDus implements Latent Unknown Clustering Integrating Multi-Omics Data (LUCID) for joint analysis of exposures, omics, latent clusters, and outcomes. The current package version in this repository is 3.1.0.

Compared with the original single-omics workflow, the current package supports:

Early integration: one latent structure for a single omics matrix.
Parallel integration: layer-specific latent structures for multiple omics layers.
Serial integration: multi-stage workflows that chain early and/or parallel submodels.
Model tuning over latent-cluster counts and, for early/parallel models, regularization penalties.
Feature selection for exposures and omics means/covariances.
Missing-data diagnostics and imputation for incomplete omics matrices.
Prediction of latent-cluster membership and outcomes, including g-computation (g_computation = TRUE).
Bootstrap inference for early, parallel, and serial models.
S3 summaries for all supported model types and Sankey-style plots for early models.

The package builds on:

Peng et al. (2020), A latent unknown clustering integrating multi-omics data (LUCID) with phenotypic traits
Zhao et al. (2024), An extension of latent unknown clustering integrating multi-omics data (LUCID) incorporating incomplete omics data
Zhao et al. (2024), LUCIDus: An R Package for Implementing Latent Unknown Clustering by Integrating Multi-omics Data (LUCID) With Phenotypic Traits

Integration strategies

LUCIDus now supports three model families with different data structures:

Model	`Z` input	`K` input	Typical use
`"early"`	One matrix	Integer or integer vector	Single integrated omics layer
`"parallel"`	List of matrices	Integer vector or list by layer	Multiple omics layers modeled in parallel
`"serial"`	Nested list of stages	Nested list matching `Z`	Multi-stage biology or late integration pipelines

For serial models, each stage can itself be an early block (a matrix) or a parallel block (a list of matrices), and K must mirror that topology.
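
The mirroring rule can be checked before fitting. The sketch below is base R only and uses a hypothetical helper, mirrors_topology(), which is not part of LUCIDus. It assumes the nested-list form of K shown in the serial example (a list wherever Z has a list, a plain number wherever Z has a matrix) and does not cover the integer-vector form that parallel models also accept.

```r
# Hypothetical helper (not a LUCIDus function): TRUE when K has the same
# nesting as Z, i.e. a plain numeric entry for every matrix in Z and a list
# of matching length for every list in Z.
mirrors_topology <- function(Z, K) {
  if (is.matrix(Z)) {
    # Early block: K should be numeric, not a list
    return(is.numeric(K) && !is.list(K))
  }
  if (is.list(Z)) {
    # Parallel block or list of stages: compare element by element
    if (!is.list(K) || length(K) != length(Z)) return(FALSE)
    return(all(mapply(mirrors_topology, Z, K)))
  }
  FALSE
}

# Toy data: stage 1 is a parallel block with two layers, stage 2 is an early block
set.seed(1)
Z_stage1 <- list(layer1 = matrix(rnorm(20), 5), layer2 = matrix(rnorm(20), 5))
Z_stage2 <- matrix(rnorm(15), 5)
Z_serial <- list(Z_stage1, Z_stage2)

K_ok <- list(list(2, 2), 2)   # mirrors Z_serial
K_bad <- list(2, 2)           # stage 1 is a list in Z but a number in K

ok_result <- mirrors_topology(Z_serial, K_ok)
bad_result <- mirrors_topology(Z_serial, K_bad)
```

Here ok_result is TRUE and bad_result is FALSE, because K_bad gives a single number for a stage that holds two omics layers.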

Installation

Install the current CRAN release with:

install.packages("LUCIDus")

Workflow overview

The comprehensive guide in vagnette/lucidus_full_functionality_guide.Rmd organizes the package around a full analysis workflow:

inspect missingness and imputation quality
fit early, parallel, or serial models with estimate_lucid()
tune K and penalties with tune_lucid() or lucid()
summarize fitted models
predict latent clusters and outcomes, including g_computation
quantify uncertainty with boot_lucid()
visualize early-model structure with plot()
use numerical-stability helpers when building robust workflows

Quick start

Early integration

library(LUCIDus)

data(sim_data)

G <- sim_data$G[1:300, , drop = FALSE]
Z <- sim_data$Z[1:300, , drop = FALSE]
Y <- sim_data$Y_normal[1:300]
CoY <- sim_data$Covariate[1:300, , drop = FALSE]

fit_early <- estimate_lucid(
  G = G,
  Z = Z,
  Y = Y,
  CoY = CoY,
  lucid_model = "early",
  family = "normal",
  K = 2,
  seed = 1008
)

summary(fit_early)
plot(fit_early)

Parallel integration

library(LUCIDus)

data(simulated_HELIX_data)

G <- simulated_HELIX_data$exposure
Y <- simulated_HELIX_data$outcome
CoY <- simulated_HELIX_data$covariate
omics <- simulated_HELIX_data$omics

Z_parallel <- list(
  methylomics = omics[, 1:10, drop = FALSE],
  transcriptomics = omics[, 11:20, drop = FALSE],
  miRNA = omics[, 21:30, drop = FALSE]
)

fit_parallel <- estimate_lucid(
  G = G,
  Z = Z_parallel,
  Y = Y,
  CoY = CoY,
  lucid_model = "parallel",
  family = "normal",
  K = c(2, 2, 2),
  seed = 1008
)

summary(fit_parallel)

Serial integration with mixed topology

library(LUCIDus)

data(sim_data)

G <- sim_data$G[1:200, , drop = FALSE]
Y <- sim_data$Y_normal[1:200]

Z_stage1 <- list(
  layer1 = sim_data$Z[1:200, 1:5, drop = FALSE],
  layer2 = sim_data$Z[1:200, 6:10, drop = FALSE]
)
Z_stage2 <- sim_data$Z[1:200, 1:4, drop = FALSE]

fit_serial <- estimate_lucid(
  G = G,
  Z = list(Z_stage1, Z_stage2),
  Y = Y,
  lucid_model = "serial",
  family = "normal",
  K = list(list(2, 2), 2),
  seed = 1008
)

summary(fit_serial)

Main functionality

Task	Main API
Fit a model directly	`estimate_lucid()`
Tune `K` and penalties	`tune_lucid()`
Fit or auto-tune in one step	`lucid()`
Summarize fitted objects	`summary()`, `summary_lucid()`
Predict clusters and outcomes	`predict_lucid()`
Bootstrap confidence intervals	`boot_lucid()`
Diagnose missingness	`analyze_missing_pattern()`, `check_na()`
Impute missing omics values	`safe_impute()`, `fill_data()`
Check imputation quality	`check_imputation_quality()`
Numerical stability helpers	`safe_log_sum_exp()`, `safe_normalize()`, `safe_solve()`, `check_and_stabilize_sigma()`, `check_convergence()`
Visualize fitted early models	`plot()`

The guide also demonstrates these workflows across continuous and binary outcomes, missing-data settings, and multi-stage serial topologies.

Full workflow patterns

Direct fitting with `estimate_lucid()`

Use estimate_lucid() when you already know the model topology and cluster counts. This is the lowest-level exported fitting API and supports:

supervised and unsupervised fits
optional CoG and CoY covariates
normal and binary outcomes
early, parallel, and serial model structures
missing-data handling during fitting

For development and diagnostics, the guide also uses verbose = TRUE to print iteration-level fitting traces:

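# Using G, Z, and Y from the early integration example above.
# The iteration caps are kept small so the trace stays short; raise them for real analyses.
fit_verbose <- estimate_lucid(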
  G = G,
  Z = Z,
  Y = Y,
  lucid_model = "early",
  family = "normal",
  K = 2,
  max_itr = 2,
  max_tot.itr = 8,
  verbose = TRUE
)

Tuning with `tune_lucid()` and `lucid()`

The guide treats tuning as a first-class workflow rather than an optional extra:

tune_lucid() explicitly evaluates a grid of K and penalty values
lucid() is the higher-level wrapper that fits directly or auto-tunes depending on whether K and penalty inputs are scalars or vectors
penalty tuning is available for "early" and "parallel"
serial models currently accept scalar penalties, while still supporting topology selection through K

Tuning, prediction, and inference

Tune over `K` and penalties

# Using G, Z, and Y from the early integration example above
tuned <- tune_lucid(
  G = G,
  Z = Z,
  Y = Y,
  lucid_model = "early",
  family = "normal",
  K = 2:3,
  Rho_G = c(0, 0.1),
  Rho_Z_Mu = c(0, 5),
  Rho_Z_Cov = c(0, 0.1),
  seed = 1008
)

The lucid() wrapper uses the same inputs and will automatically dispatch to tuning when K or penalty arguments are vectors. Penalty tuning is currently supported for "early" and "parallel"; "serial" accepts scalar penalty inputs.
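
To get a feel for the cost of a tuning run, it can help to count the candidate settings first. The base-R sketch below assumes that every combination of K and penalty values is evaluated (a full Cartesian grid); it only counts combinations and does not call LUCIDus.

```r
# Candidate values copied from the tune_lucid() call above
grid <- expand.grid(
  K = 2:3,
  Rho_G = c(0, 0.1),
  Rho_Z_Mu = c(0, 5),
  Rho_Z_Cov = c(0, 0.1)
)

# 2 * 2 * 2 * 2 = 16 candidate settings, each requiring its own model fit
n_fits <- nrow(grid)
head(grid)
```

Because the count multiplies across arguments, adding a third value to each of the four arguments would raise it from 16 to 81.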

Structured summaries

The summary methods are richer than a simple coefficient printout. As shown in the guide, summary() returns model information, fit statistics, feature-selection summaries, parameter tables, missing-data summaries, and optionally bootstrap confidence intervals.

s <- summary(fit_early)

After running boot_lucid(), you can also attach bootstrap confidence intervals with summary(fit_early, boot.se = boot_fit).

Prediction and g-computation

# Using fit_early, G, Z, and Y from the early integration example above
pred <- predict_lucid(
  model = fit_early,
  G = G,
  Z = Z,
  Y = Y,
  lucid_model = "early"
)

pred_g <- predict_lucid(
  model = fit_early,
  G = G,
  Z = NULL,
  Y = NULL,
  lucid_model = "early",
  g_computation = TRUE
)

predict_lucid() supports early, parallel, and serial models. In g_computation = TRUE mode, it uses the fitted G -> X path to generate cluster and outcome predictions under modified exposures.
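
The idea behind the G -> X path can be illustrated without a fitted model. The base-R sketch below assumes a multinomial-logit link from a single exposure to two latent clusters and cluster-specific outcome means; all parameter values are made up for illustration, and the function names are hypothetical rather than part of LUCIDus.

```r
# Illustrative parameters for a two-cluster model with one exposure.
# These numbers are invented; a fitted model would supply its own estimates.
beta <- rbind(c(0, 0),        # cluster 1 is the reference (intercept, slope)
              c(-0.5, 1.2))   # cluster 2
mu_y <- c(1, 3)               # cluster-specific outcome means

# P(X = k | G): softmax of the linear predictors
cluster_prob <- function(g, beta) {
  eta <- cbind(1, g) %*% t(beta)    # n x K matrix of linear predictors
  eta <- eta - apply(eta, 1, max)   # subtract each row's max for stability
  p <- exp(eta)
  p / rowSums(p)
}

# E[Y | G] = sum over k of P(X = k | G) * mu_k
expected_outcome <- function(g, beta, mu_y) {
  drop(cluster_prob(g, beta) %*% mu_y)
}

g_observed <- c(-1, 0, 1)
g_shifted <- g_observed + 1   # hypothetical intervention: raise exposure by one unit

y_observed <- expected_outcome(g_observed, beta, mu_y)
y_shifted <- expected_outcome(g_shifted, beta, mu_y)
effect <- y_shifted - y_observed
```

In this toy setup the exposure raises the probability of the cluster with the higher outcome mean, so every entry of effect is positive.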

Bootstrap inference

# Using fit_early, G, Z, and Y from the early integration example above
boot_fit <- boot_lucid(
  G = G,
  Z = Z,
  Y = Y,
  lucid_model = "early",
  model = fit_early,
  R = 20,
  conf = 0.95
)

summary(fit_early, boot.se = boot_fit)

Bootstrap inference is available for early, parallel, and serial models. If a fitted model uses nonzero penalties, boot_lucid() will refit a zero-penalty version internally because bootstrap CIs are derived from unpenalized fits. If feature selection has removed variables, refit the reduced model before bootstrapping.
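
For readers new to the bootstrap, the base-R sketch below shows the generic resampling idea on a toy statistic (a sample mean). It is not how boot_lucid() is implemented; the percentile interval is chosen here for simplicity, and the package may construct its intervals differently.

```r
set.seed(1008)
x <- rnorm(100, mean = 2)   # toy data; the statistic of interest is mean(x)

R <- 200                    # number of bootstrap replicates
# Resample the data with replacement and recompute the statistic each time
boot_means <- replicate(R, mean(sample(x, replace = TRUE)))

# Percentile interval at the 95% level
ci <- quantile(boot_means, probs = c(0.025, 0.975))
```

With few replicates the interval endpoints themselves are noisy, which is one reason to use a larger R in real analyses than in quick examples.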

Missing-data utilities

The package now includes dedicated helpers for incomplete omics data:

# Using Z from the early integration example above
Z_miss <- Z
Z_miss[1, ] <- NA
Z_miss[2:4, 1] <- NA

miss_pattern <- analyze_missing_pattern(Z_miss)
na_summary <- check_na(Z_miss, lucid_model = "early")
Z_imp <- safe_impute(Z_miss, method = "mean")
imp_quality <- check_imputation_quality(Z_miss, Z_imp)

These helpers are useful both before fitting and when validating simple imputations outside the model-based missing-data workflow.

The guide uses them in three roles:

pre-fit diagnostics with analyze_missing_pattern() and check_na()
simple standalone imputation with safe_impute()
low-level likelihood-based filling via fill_data()
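
The two kinds of missingness created in the example above (a sample with no omics values at all, and samples missing only some features) can be told apart with a few lines of base R. The sketch below is an illustration on toy data, not the internals of analyze_missing_pattern() or safe_impute(); it assumes that mean imputation fills each gap with the observed mean of its column.

```r
set.seed(1)
Z_toy <- matrix(rnorm(24), nrow = 6, ncol = 4)
Z_toy[1, ] <- NA        # every feature missing for sample 1
Z_toy[2:4, 1] <- NA     # only feature 1 missing for samples 2 to 4

# Classify each sample by how many of its features are missing
n_missing <- rowSums(is.na(Z_toy))
row_status <- ifelse(n_missing == 0, "complete",
              ifelse(n_missing == ncol(Z_toy), "all_missing", "partial"))

# Column-mean imputation: replace each NA with the observed mean of its column
col_means <- colMeans(Z_toy, na.rm = TRUE)
Z_filled <- Z_toy
na_idx <- which(is.na(Z_toy), arr.ind = TRUE)
Z_filled[na_idx] <- col_means[na_idx[, "col"]]
```

Note that a fully missing sample ends up with the column means in every feature, so it carries no information of its own after simple imputation.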

Binary outcomes

The guide includes a dedicated binary-outcome example. LUCIDus supports:

family = "normal" for continuous outcomes
family = "binary" for 0/1 outcomes
predict_lucid(..., response = TRUE) for class labels
predict_lucid(..., response = FALSE) for predicted probabilities

# Using G and Z from the early integration example above
fit_early_binary <- estimate_lucid(
  G = G,
  Z = Z,
  Y = sim_data$Y_binary[1:300],
  lucid_model = "early",
  family = "binary",
  K = 2,
  seed = 1008
)

pred_binary_prob <- predict_lucid(
  model = fit_early_binary,
  G = G,
  Z = Z,
  Y = sim_data$Y_binary[1:300],
  lucid_model = "early",
  response = FALSE
)

Visualization and stability helpers

The guide covers both plotting and the lower-level numerical helpers that support more stable workflows.

plot() currently provides Sankey-style visualization for early models
parallel and serial plotting methods are present but still under development
safe_log_sum_exp(), safe_normalize(), safe_solve(), check_and_stabilize_sigma(), and check_convergence() help with numerically fragile workflows

plot(fit_early)

safe_log_sum_exp(c(-1000, -1001, -999))
safe_normalize(c(1e-300, 2, 3))
sigma_stable <- check_and_stabilize_sigma(matrix(c(1, 0.999999, 0.999999, 0.999998), 2))
sigma_inv <- safe_solve(sigma_stable)
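
To see why a helper of this kind is useful, the base-R sketch below compares a naive log-sum-exp with the standard max-shift version on the same input. The function log_sum_exp() is a hypothetical stand-in written for this illustration; it is not the package's safe_log_sum_exp(), whose internals may differ.

```r
x <- c(-1000, -1001, -999)

# Naive version: each exp() underflows to 0, so the log of the sum is -Inf
naive <- log(sum(exp(x)))

# Stable version: factor out the maximum before exponentiating
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}
stable <- log_sum_exp(x)   # approximately -998.5924
```

The shift works because log(sum(exp(x))) equals m + log(sum(exp(x - m))) for any constant m, and choosing m = max(x) keeps the largest exponent at zero.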

Practical notes from the full guide

The full functionality guide is intentionally lightweight so examples run quickly. For real analyses, it recommends increasing:

max_itr and max_tot.itr
bootstrap replicates R
the size of the tuning grid

For serial pipelines in particular, the guide recommends closely reviewing stage-wise missingness summaries and fitted submodels.

Documentation and tutorials

For fuller, end-to-end examples, see the comprehensive guide, which is the best reference for the current package surface. It covers:

missing-data diagnostics and imputation helpers
direct fitting with estimate_lucid()
wrapper workflows with tune_lucid() and lucid()
verbose fitting demos
summaries, prediction, and g_computation
bootstrap inference
plotting and numerical stability helpers
continuous and binary outcome examples

Citation

If you use LUCIDus, please cite:

Peng C., Wang J., Asante I., Louie S., Jin R., Chatzi L., Casey G., Thomas D. C., Conti D. V. (2020). A latent unknown clustering integrating multi-omics data (LUCID) with phenotypic traits. Bioinformatics.
Zhao Y., Jia Q., Goodrich J. A., Conti D. V. (2024). LUCIDus: An R Package for Implementing Latent Unknown Clustering by Integrating Multi-omics Data (LUCID) With Phenotypic Traits. The R Journal.
Zhao Y., Jia Q., Goodrich J., Darst B., Conti D. V. (2024). An extension of latent unknown clustering integrating multi-omics data (LUCID) incorporating incomplete omics data. Bioinformatics Advances.
