Skip to content

Distinguish cache hits from real downloads in p2 transport log#5971

Open
vogella wants to merge 2 commits intoeclipse-tycho:mainfrom
vogella:fix/distinguish-cache-hits-in-log
Open

Distinguish cache hits from real downloads in p2 transport log#5971
vogella wants to merge 2 commits intoeclipse-tycho:mainfrom
vogella:fix/distinguish-cache-hits-in-log

Conversation

@vogella
Copy link
Copy Markdown
Contributor

@vogella vogella commented Apr 20, 2026

Summary

Tycho's p2 transport currently logs Downloading from <url> whenever a resource is requested, even when the response is served entirely from the local p2 bundle cache (~/.m2/repository/.cache/tycho/) with no network transfer. This makes incremental / warm-cache builds look like they are downloading gigabytes, and makes it hard to spot real download activity in a CI log.

This PR teaches SharedHttpCacheStorage to report the outcome of every cache access, and adjusts TychoRepositoryTransport so the log matches what actually happened.

Before

Every access prints the same two lines even when nothing was fetched from the network:

[INFO] Downloading from p2: https://download.eclipse.org/releases/2025-03/content.xml
[INFO] Downloaded from p2: https://download.eclipse.org/releases/2025-03/content.xml (12 MB at 1.5 GB/s)

After

Outcome Log line Level
Cache hit (no network) Fetched from cache: <url> DEBUG
Revalidation (304 Not Modified) Up-to-date: <url> DEBUG
Real download (2xx with body) Downloading from <url> (+ Downloaded from ... at X/s as before) INFO

For a warm-cache build the noisy Downloading from / Downloaded from pair is now silent at INFO, while real network downloads keep the existing INFO output so CI output is unchanged for that case.

Implementation notes

  • New CacheState { FROM_CACHE, NOT_MODIFIED, DOWNLOADED, UNKNOWN } enum.
  • HttpCache gains getLastCacheState(URI).
  • SharedHttpCacheStorage.CacheLine.fetchFile now emits the three-way log at the point where it actually knows the outcome, and signals the state back to the storage via a callback.
  • TychoRepositoryTransport.downloadArtifact no longer emits the misleading pre-download Downloading from log, and suppresses the post-download Downloaded from ... at X/s summary when the state is FROM_CACHE or NOT_MODIFIED.
  • Transport behavior is unchanged; this is a logging-only change.

Test plan

  • New unit test SharedHttpCacheStorageLoggingTest covers all three cases (cache hit, 304, real 200) plus the UNKNOWN default. Uses a Mockito-mocked HttpTransport so it never touches the network.
  • mvn -pl p2-maven-plugin -am test — 4/4 green.
  • mvn -pl tycho-core -am verify — 581/581 green.

🤖 Generated with Claude Code

Tycho's p2 transport previously logged "Downloading from <url>" for
every resource access, even when the response was served entirely from
the local p2 bundle cache. This made builds look like they were
downloading gigabytes when they were effectively offline.

Logging now branches on the cache outcome:

 - "Fetched from cache: <url>"  (DEBUG) - no network
 - "Up-to-date: <url>"          (DEBUG) - 304 Not Modified
 - "Downloading from <url>"     (INFO)  - real 2xx download (unchanged)

The log is emitted from the cache layer after the outcome is known.
SharedHttpCacheStorage now records the per-URI CacheState so the
transport can suppress the misleading "Downloaded from ... at X/s"
summary for cache hits and revalidations while keeping it unchanged
for real downloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Test Results

1 047 files  1 047 suites   5h 9m 11s ⏱️
1 399 tests 1 378 ✅ 21 💤 0 ❌
4 197 runs  4 131 ✅ 66 💤 0 ❌

Results for commit 0f6b621.

♻️ This comment has been updated with latest results.

@laeubi
Copy link
Copy Markdown
Member

laeubi commented Apr 21, 2026

Tycho's p2 transport currently logs Downloading from whenever a resource is requested, even when the response is served entirely from the local p2 bundle cache

This is actually desired to make it mostly similar how Maven works. As you can see you can even suppress these with -B (non interactive mode) if you like.

and makes it hard to spot real download activity in a CI log

This logging is not meant as a debug facility of the caching framework (for what we have system properties available).

Copy link
Copy Markdown
Member

@laeubi laeubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

String id = "p2"; // TODO we might compute the id from the IRepositoryIdManager based on the URI?
if (cacheConfig.isInteractive()) {
logger.info("Downloading from " + id + ": " + source);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log message should remain to stay consistent with how maven works and to know that the download has started.

Comment on lines +83 to +84
@Requirement
HttpCache httpCache;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not bind directly to a specific cache implementation there might be other caching techniques see comment below.

logger.info("Downloaded from " + id + ": " + source + " ("
+ FileUtils.byteCountToDisplaySize(downloadStatus.getFileSize()) + " at "
+ FileUtils.byteCountToDisplaySize(downloadStatus.getTransferRate()) + "/s)");
CacheState state = httpCache.getLastCacheState(source);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a side-car approach here, the information should be transported through the DownloadStatusOutputStream

+ FileUtils.byteCountToDisplaySize(downloadStatus.getTransferRate()) + "/s)");
CacheState state = httpCache.getLastCacheState(source);
if (state == CacheState.DOWNLOADED || state == CacheState.UNKNOWN) {
logger.info("Downloaded from " + id + ": " + source + " ("
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The info should always be logged and we should just replace the transfer rate by a status e.g.

Downloaded from " + id + ": " + source + " ("
+ FileUtils.byteCountToDisplaySize(downloadStatus.getFileSize()) + " (from cache)"

* for the three possible outcomes of a cache access: pure cache hit, 304
* revalidation, and real download.
*/
class SharedHttpCacheStorageLoggingTest {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has shown that mocking does not work well on the long run in Tycho as it always tests internal behavior what is hard to maintain.
Instead please add a real integration-test that uses e.g. a java 21 jdk server (we also have some examples using jetty already).
This proves it works in real world use-cases end-to-.end

* produce accurate log output (e.g. distinguish "Downloading" from "Fetched
* from cache").
*/
public enum CacheState {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please think about better naming here that is independent from caching and agnostic from the transport technique. e.g. we also have ftp transports that have no meaning of (http) status codes.

@laeubi
Copy link
Copy Markdown
Member

laeubi commented Apr 24, 2026

@vogella do you plan to further work on this? I want to make a new Tycho release likely end of month and it could then be included if we finish before.

@vogella
Copy link
Copy Markdown
Contributor Author

vogella commented Apr 24, 2026

Your initial comment indicates that this does not fit to Maven.

So you agree with the change now? I can finish it in this case

- Restore "Downloading from" INFO log to match Maven behaviour.
- Drop direct HttpCache dependency from TychoRepositoryTransport.
- Carry the cache-hit flag through DownloadStatusOutputStream (via a
  thread-local registry) instead of querying the cache as a side-car.
- Always emit "Downloaded from" and only swap the suffix: "(from cache)"
  for cache hits / 304 revalidations, "at X/s" for real transfers.
- Remove the CacheState enum; a simple fromCache boolean on the stream
  carries the same information without transport-specific terminology.
- Replace the mock-based SharedHttpCacheStorageLoggingTest with a real
  integration test that exercises the cache against an actual
  com.sun.net.httpserver instance, and drop the mockito test dep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants