
If the parser OOMs, orphaned assets / invalid data can be left in the database #2519

@BryanWall

Describe the Bug

I tried adding an asset through SinglePage (an article on X.com). It failed because the parser ran out of memory (OOM). I deleted the failed bookmark and increased CRAWLER_PARSER_MEM_LIMIT_MB to 768, after which the crawl succeeded.

However, it left an orphaned banner image for the deleted asset in the database. I've attached screenshots. I think the image was deleted from storage, but this entry was left in the database with a null bookmark ID. I was able to fix the issue by manually deleting that entry from the database.

Is there, or should there be, a process to remove invalid data from the database? Alternatively, the crawl process may need error handling so that it fails gracefully when the parser runs out of memory.
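To illustrate what such a cleanup process could look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name `assets` and the column name `bookmarkId` are assumptions made for the example and are not taken from Karakeep's actual schema; the only detail drawn from this report is that the orphaned row had a null bookmark ID.

```python
import sqlite3


def delete_orphaned_assets(conn: sqlite3.Connection) -> int:
    """Delete asset rows with no bookmark reference; return how many were removed."""
    with conn:  # commits on success, rolls back if the statement raises
        cur = conn.execute("DELETE FROM assets WHERE bookmarkId IS NULL")
    return cur.rowcount


# Demo on an in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets (id TEXT PRIMARY KEY, assetType TEXT, bookmarkId TEXT)"
)
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?)",
    [
        ("a1", "bannerImage", "b1"),  # still attached to a bookmark
        ("a2", "bannerImage", None),  # orphaned: bookmark ID is NULL
    ],
)
conn.commit()

removed = delete_orphaned_assets(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM assets ORDER BY id")]
print(removed, remaining)
```

This is equivalent to the manual fix described above, wrapped in a function that could be run periodically or on startup.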

Steps to Reproduce

1. Try to add an asset that causes the parser to run out of memory
2. Delete the failed asset

Expected Behaviour

A failed crawl doesn't leave invalid data in the database
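One way to get this behaviour, sketched below in Python with sqlite3, is to write everything a crawl produces inside a single transaction so that a parser failure rolls back the partial rows. This is only an illustration under assumptions: the `bookmarks` and `assets` tables, the `ParserOutOfMemory` exception, and the `store_crawl_result` helper are all hypothetical and do not describe how Karakeep's crawler actually works.

```python
import sqlite3


class ParserOutOfMemory(RuntimeError):
    """Hypothetical error standing in for a parser that hit its memory limit."""


def store_crawl_result(conn, bookmark_id, parse):
    """Store the asset row and the parsed content atomically.

    Returns True if both rows were committed, False if parsing failed
    and the transaction was rolled back.
    """
    try:
        with conn:  # one transaction: commit on success, roll back on exception
            conn.execute(
                "INSERT INTO assets (id, bookmarkId) VALUES (?, ?)",
                (f"banner-{bookmark_id}", bookmark_id),
            )
            html = parse()  # may raise ParserOutOfMemory
            conn.execute(
                "INSERT INTO bookmarks (id, content) VALUES (?, ?)",
                (bookmark_id, html),
            )
        return True
    except ParserOutOfMemory:
        return False


def failing_parse():
    raise ParserOutOfMemory("parser exceeded its memory limit")


# Demo on an in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (id TEXT PRIMARY KEY, content TEXT)")
conn.execute("CREATE TABLE assets (id TEXT PRIMARY KEY, bookmarkId TEXT)")

failed = store_crawl_result(conn, "b1", failing_parse)
assets_after_failure = conn.execute("SELECT COUNT(*) FROM assets").fetchone()[0]

succeeded = store_crawl_result(conn, "b2", lambda: "<p>ok</p>")
assets_after_success = conn.execute("SELECT COUNT(*) FROM assets").fetchone()[0]
```

In the failing case the asset row inserted before parsing is rolled back, so nothing is left behind. If the whole process is killed rather than raising an exception, SQLite still discards the uncommitted transaction the next time the database is opened.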

Screenshots or Additional Context

[Two screenshots attached]

Device Details

No response

Exact Karakeep Version

v0.31.0

Environment Details

No response

Debug Logs

No response

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), pri/medium (Med priority issue), status/approved (This issue is ready to be implemented)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
