Skip to content

Remove loose source-file fallback from language project detection#92

Open
brandonSc wants to merge 4 commits intomainfrom
brandon/tighten-project-detection
Open

Remove loose source-file fallback from language project detection#92
brandonSc wants to merge 4 commits intomainfrom
brandon/tighten-project-detection

Conversation

@brandonSc
Copy link
Copy Markdown
Contributor

@brandonSc brandonSc commented Mar 24, 2026

Summary

All language collectors (Go, Rust, Python, Java, PHP) had a find-based fallback in their is_*_project() helper that detected stray source files (e.g. .go, .rs, .java within depth 3). This caused false positives: repos with a few source files but no project manifest would trigger collectors that then wrote empty/error data to Component JSON — e.g. a Go block with go_mod_exists: false, go_sum_exists: false, and golangci-lint failing with "directory prefix . does not contain main module."

Now each detector requires its language's manifest file(s), matching the Node.js collector which already only checked for package.json.

Changes

Collector Before After
golang go.mod OR .go files go.mod only
rust Cargo.toml OR .rs files Cargo.toml only
python manifest files OR .py files Manifest files only
java pom.xml/build.gradle OR .java files Build files only
php composer.json OR .php files composer.json only

Also removes a now-redundant Cargo.toml guard in rust/clippy.sh since is_rust_project() already guarantees it.

🤖 This PR was implemented by an AI agent.

Summary by CodeRabbit

  • Bug Fixes
    • More accurate project detection for Go, Java, PHP, Python, Rust, and Node.js by requiring proper manifest/config files instead of loose source-file heuristics.
    • Rust license-detection improved with clearer error handling and more robust path resolution.
    • Rust lint preflight behavior adjusted (may affect when the linter runs if no manifest is present).

All language collectors (Go, Rust, Python, Java, PHP) had a fallback in
their is_*_project() check that detected stray source files (e.g. .go,
.rs, .java) via find. This caused false positives: repos with a few
source files but no project manifest (go.mod, Cargo.toml, etc.) would
trigger collectors that then wrote empty/error data to Component JSON.

Now each detector requires its language's manifest file(s), matching
the Node.js collector which already only checked for package.json.

Also removes a now-redundant Cargo.toml guard in rust/clippy.sh since
is_rust_project() already guarantees it.
@brandonSc
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d6fe8d85-e42e-4285-9c34-0a9d16b95242

📥 Commits

Reviewing files that changed from the base of the PR and between d4a5ba5 and 9f1dedf.

📒 Files selected for processing (6)
  • collectors/golang/helpers.sh
  • collectors/java/helpers.sh
  • collectors/nodejs/helpers.sh
  • collectors/php/helpers.sh
  • collectors/python/helpers.sh
  • collectors/rust/helpers.sh
🚧 Files skipped from review as they are similar to previous changes (4)
  • collectors/php/helpers.sh
  • collectors/golang/helpers.sh
  • collectors/rust/helpers.sh
  • collectors/java/helpers.sh

📝 Walkthrough

Walkthrough

This PR tightens language project detection to require manifest files (no source-file fallbacks), adjusts Rust clippy preflight behavior, and makes Syft Rust license-detection path and existence checks more explicit and robust.

Changes

Cohort / File(s) Summary
Language project detectors
collectors/golang/helpers.sh, collectors/java/helpers.sh, collectors/php/helpers.sh, collectors/python/helpers.sh, collectors/nodejs/helpers.sh
Removed source-file heuristics (e.g., *.go, *.java, *.php, *.py) and now detect projects only via manifests (go.mod, build manifests, composer.json, Python manifests, package.json), using -f at CWD or git ls-files for tracked manifests.
Rust detection & clippy
collectors/rust/helpers.sh, collectors/rust/clippy.sh
is_rust_project() now requires Cargo.toml (CWD or tracked via git); removed .rs-file fallback. clippy.sh no longer performs an explicit CWD Cargo.toml preflight check and relies on existing guards.
Syft Rust license detection
collectors/syft/generate.sh
Use ${BASH_SOURCE[0]:-$0} fallback for script dir. Split post-cargo fetch checks into explicit branches: verify REGISTRY_SRC existence, verify presence of rust-license-map.py, produce specific error messages and conditionally run the Python license-map script (otherwise skip with logged stderr).

Sequence Diagram(s)

sequenceDiagram
    participant Generate as collectors/syft/generate.sh
    participant Cargo as cargo
    participant FS as FileSystem
    participant Py as rust-license-map.py
    participant Logger as stderr

    Generate->>Cargo: cargo fetch (populate registry)
    Cargo-->>Generate: exit (success/failure)
    Generate->>FS: test -d $REGISTRY_SRC
    alt REGISTRY_SRC exists
        Generate->>FS: test -f $PLUGIN_DIR/rust-license-map.py
        alt python script exists
            Generate->>Py: invoke rust-license-map.py with registry path
            Py-->>Generate: output (license map)
        else python script missing
            Generate->>Logger: write "rust-license-map.py missing" error
        end
    else REGISTRY_SRC missing
        Generate->>Logger: write "cargo registry source not found" error
    end
Loading

Possibly related PRs

Suggested reviewers

  • vladaionescu
  • mikejholly
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately and concisely summarizes the main change: removing the fallback detection based on source files in favor of requiring manifest files for language project detection.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch brandon/tighten-project-detection

Comment @coderabbitai help to get the list of available commands and usage tips.

@brandonSc brandonSc requested a review from vladaionescu March 31, 2026 14:58
@brandonSc brandonSc marked this pull request as ready for review March 31, 2026 14:58
@github-actions
Copy link
Copy Markdown

Claude Auto-Approve Review

Summary: Tightens language project detection across Go, Java, PHP, Python, and Rust collectors to require a build manifest file (go.mod, pom.xml, Cargo.toml, etc.) rather than falling back to scanning for loose source files. This is a sound change: the downstream tools in each collector (golangci-lint, cargo clippy, composer, etc.) all require these manifests to function, so detecting a "project" without one would just lead to failures later. The redundant Cargo.toml check in clippy.sh is correctly removed since is_rust_project now guarantees it. The syft/generate.sh change adds useful diagnostic logging and fixes a BASH_SOURCE portability edge case. All changes are consistent, well-commented, and low-risk.

Scope: ISOLATED

The change is narrowly focused: it removes the loose source-file fallback (find *.ext) from 5 language detection helpers, making them require a proper project manifest. A small related cleanup in clippy.sh removes a now-redundant guard, and syft/generate.sh gets improved diagnostics. All changes are in collector shell scripts with a clear, consistent pattern.


Conclusion: ✅ Auto-approved (isolated changes, no issues)

@brandonSc
Copy link
Copy Markdown
Contributor Author

Related to: https://github.com/earthly/lunar-lib/pull/85/changes

One PR makes the language detection more strict, the other makes policies leverage it so they are also not overly generous with applying to things they shouldn't

Instead of only checking the repo root for manifest files, also search
subdirectories via git ls-files (fast, reads the git index). This
handles monorepos and projects with nested language roots. Falls back
gracefully to root-only detection outside a git repo.

Applies the same pattern to all six language helpers: golang, java,
nodejs, php, python, rust.
@brandonSc
Copy link
Copy Markdown
Contributor Author

Pushed 9f1dedf — all six language helpers (Go, Java, Node.js, PHP, Python, Rust) now search subdirs for manifest files using git ls-files per your suggestion. Fast path still checks the root first, then falls back to the git index scan. Gracefully degrades to root-only if not in a git repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants