[Bug] SingleFile uploads incorrectly trigger HTTP GET on original URL, causing missing titles and crawling failures

### Describe the Bug

When uploading a packaged HTML file via the /api/v1/bookmarks/singlefile endpoint (e.g., using the SingleFile extension), the backend still attempts to make an HTTP GET request to the original URL to determine its content type.

If the original website blocks crawlers, requires authentication, or is otherwise inaccessible, this pre-check fails. As a result, the crawler's parsing logic breaks, causing the bookmark to lose its title and other metadata, even though a perfectly valid HTML file was successfully uploaded.

### Steps to Reproduce

Steps to reproduce the behavior:

Configure the SingleFile extension to upload to the /api/v1/bookmarks/singlefile endpoint as per the official documentation.
Visit a webpage that actively blocks crawlers or requires being logged in.
Use the SingleFile extension to package and upload the page to Karakeep.
Check the newly created bookmark in Karakeep.

### Expected Behaviour

The system should recognize that a precrawledArchiveAssetId (the uploaded HTML file) was provided and entirely skip making any network requests to the original URL. The uploaded HTML file should be parsed directly to extract the correct title, description, and content.

### Screenshots or Additional Context

I looked into the source code and found the root cause. This issue was introduced in commit `2743d9e` (#1163), which added support for saving direct image/PDF bookmarks.

In `apps/workers/workers/crawlerWorker.ts`, inside the `runCrawler` function, the crawler unconditionally executes `getContentType(url)`:

```typescript
// crawlerWorker.ts - runCrawler function
const contentType = await getContentType(url, jobId, job.abortSignal);
```

This HTTP `GET` request fires *before* the system checks for `precrawledArchiveAssetId` inside `crawlAndParseUrl` .

When `getContentType` hits an anti-bot restriction, it affects the crawling pipeline, even though `crawlAndParseUrl`itself correctly tries to read the `precrawledArchiveAssetId`.


### Device Details

_No response_

### Exact Karakeep Version

v0.31.0

### Environment Details

Docker

### Debug Logs

_No response_

### Have you checked the troubleshooting guide?

- [x] I have checked the troubleshooting guide and I haven't found a solution to my problem

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug] SingleFile uploads incorrectly trigger HTTP GET on original URL, causing missing titles and crawling failures #2579

Describe the Bug

Steps to Reproduce

Expected Behaviour

Screenshots or Additional Context

Device Details

Exact Karakeep Version

Environment Details

Debug Logs

Have you checked the troubleshooting guide?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Bug] SingleFile uploads incorrectly trigger HTTP GET on original URL, causing missing titles and crawling failures #2579

Description

Describe the Bug

Steps to Reproduce

Expected Behaviour

Screenshots or Additional Context

Device Details

Exact Karakeep Version

Environment Details

Debug Logs

Have you checked the troubleshooting guide?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions