Skip to content

feat: storage template file move hardening#5917

Merged
zackpollard merged 7 commits intomainfrom
feat/storage-template-hardening
Dec 29, 2023
Merged

feat: storage template file move hardening#5917
zackpollard merged 7 commits intomainfrom
feat/storage-template-hardening

Conversation

@zackpollard
Copy link
Copy Markdown
Member

@zackpollard zackpollard commented Dec 21, 2023

Storage template moving can sometimes fail and cause data corruption when the upload folder and library folder are not the same filesystem. This PR aims to address the existing problem and add validation of the file operations to ensure the moved file data matches the original asset data by the way of file size checks and optional hash checks.

TODO:

  • When file is copied rather than renamed, validate filesize in database matches filesize on disk
  • When file is copied rather than renamed, validate hash in database matches hash of copied asset
  • Change job concurrency to one and remove configuration options for this
  • Add setting to enable and disable storage template system entirely, default storage template system to off
  • Change structure of default upload folder to split GUIDs into seperate folders to improve performance Will be done in seperate PR

Optional TODO:

  • Have the filename itself indicate if it is partial or not (.tmp suffix?) This actually substantially increases complexity with the move table, not going ahead with this for now
  • Keep an audit log of moves/file operations in the database Could be done in seperate PR
  • Use database locks to force template migration concurrency to one, avoiding race conditions where two processes move a file with the same name to the same location, not appending a +n Done in feat: storage template locking + fix for database locks #6054

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages bot commented Dec 22, 2023

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: b5d1617
Status: ✅  Deploy successful!
Preview URL: https://24067ebe.immich.pages.dev
Branch Preview URL: https://feat-storage-template-harden.immich.pages.dev

View logs

@zackpollard zackpollard force-pushed the feat/storage-template-hardening branch from 0c89f0f to 023f08f Compare December 22, 2023 01:35
Copy link
Copy Markdown
Member

@danieldietzler danieldietzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far! :)

@zackpollard zackpollard force-pushed the feat/storage-template-hardening branch 4 times, most recently from 527720b to f99db77 Compare December 23, 2023 02:25
@zackpollard zackpollard marked this pull request as ready for review December 24, 2023 00:38
@zackpollard
Copy link
Copy Markdown
Member Author

Ready for review and merge 😄

Copy link
Copy Markdown
Member

@danieldietzler danieldietzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about nesting the files by default in a ab/cd/abcdxx-xxxx-xxxxx structure?

Copy link
Copy Markdown
Member

@bo0tzz bo0tzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff. Did you not end up adding the locking to this?

@alextran1502
Copy link
Copy Markdown
Member

I just tested the PR, it seems like the motion part of LivePhotos aren't moved to the library

@zackpollard
Copy link
Copy Markdown
Member Author

I just tested the PR, it seems like the motion part of LivePhotos aren't moved to the library

We have explicit tests that check this behaviour so they should be. I haven't actually changed any of that logic, it's just two seperate calls to move each seperate part.

@zackpollard
Copy link
Copy Markdown
Member Author

Good stuff. Did you not end up adding the locking to this?

No, I will only add locking if the PR refactoring the locking mechanism gets merged before this PR does.

@zackpollard
Copy link
Copy Markdown
Member Author

Have you thought about nesting the files by default in a ab/cd/abcdxx-xxxx-xxxxx structure?

Forgot about that, can change it to that structure, yea

@danieldietzler
Copy link
Copy Markdown
Member

Have you thought about nesting the files by default in a ab/cd/abcdxx-xxxx-xxxxx structure?

Forgot about that, can change it to that structure, yea

Ty :)

@zackpollard zackpollard force-pushed the feat/storage-template-hardening branch 4 times, most recently from 459008c to 49389eb Compare December 24, 2023 23:19
Copy link
Copy Markdown
Member

@danieldietzler danieldietzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff 🚀

@alextran1502
Copy link
Copy Markdown
Member

alextran1502 commented Dec 28, 2023

@zackpollard This condition prevents moving the video part of the LivePhotos upload

image

image

I tested with main on a small set of LivePhotos, and they are moved correctly.

If I remember correctly, we are not extracting EXIF and add a record in EXIF table for the motion part of LivePhotos

@zackpollard zackpollard force-pushed the feat/storage-template-hardening branch from 49389eb to ab750e1 Compare December 28, 2023 12:40
@zackpollard
Copy link
Copy Markdown
Member Author

Changed exif extraction to run for all assets, including assets marked isVisible = false. This should resolve the issues with live photos not moving correctly.

Copy link
Copy Markdown
Member

@alextran1502 alextran1502 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zackpollard zackpollard merged commit 2e38fa7 into main Dec 29, 2023
@zackpollard zackpollard deleted the feat/storage-template-hardening branch December 29, 2023 18:41
@alextran1502 alextran1502 mentioned this pull request Jan 1, 2024
3 tasks
martabal pushed a commit that referenced this pull request Jan 9, 2024
* fix: pgvecto.rs extension breaks typeorm schema:drop command

* fix: parse postgres bigints to javascript number types when selecting data

* feat: verify file size is the same as original asset after copying file for storage template job

* feat: allow disabling of storage template job, defaults to disabled for new instances

* fix: don't allow setting concurrency for storage template migration, can cause race conditions above 1

* feat: add checksum verification when file is copied for storage template job

* fix: extract metadata for assets that aren't visible on timeline
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants