Context
Before users can provision files with checksum validation, they need to generate the appropriate checksums for their dataset files. The documentation should provide clear guidance and tools for users to compute the checksums, especially for large datasets, taking into account complex uploads with recursive directory structures.
Current Behaviour
- Users have no guidance on how to compute checksums for their files
- Users may generate checksums using incompatible algorithms or methods
Expected Behaviour
- Users have clear, platform-specific instructions for generating CRC32 checksums
- Users can efficiently generate checksums for entire directory trees
- Users understand the expected output format and how to integrate checksums into their metadata
- Generated checksums match the algorithm and format our system expects
Implementation Details
Ideally, the documentation offers solutions based on tools that ship natively with each platform, so that users generate checksums in exactly the form our systems expect.
This includes:
- Linux
- macOS
- Windows (PowerShell)
Output Format
- Algorithm: CRC32 (OpenBIS compatible)
- Output format: 8-character hexadecimal string (e.g., a1b2c3d4)
- All outputs should be lowercase for consistency
- Each checksum is paired with the relative path of its file in a tab-separated format
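As a cross-platform illustration of the format above (not one of the native per-platform solutions this issue asks for), the following minimal Python sketch streams a file in chunks, so large files do not need to fit in memory, and formats the result as an 8-character lowercase hexadecimal string. It assumes the system expects the standard CRC-32 as implemented by zlib; that assumption should be confirmed against the checksums our system produces. The function names `crc32_hex` and `checksum_line` are hypothetical.

```python
import zlib
from pathlib import Path


def crc32_hex(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the CRC-32 of a file as an 8-character lowercase hex string.

    Assumption: the target system uses the standard zlib CRC-32.
    """
    crc = 0
    # Binary mode: bytes are read as stored, so line endings and text
    # encodings are never translated before hashing.
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    # Zero-pad to 8 digits; small CRC values would otherwise come out shorter.
    return f"{crc & 0xFFFFFFFF:08x}"


def checksum_line(path: Path, root: Path) -> str:
    """Format one record: relative path, a tab, then the checksum."""
    relative = path.relative_to(root).as_posix()
    return f"{relative}\t{crc32_hex(path)}"
```

Calling `checksum_line(Path("data/run1/File_Example.raw"), Path("data"))` would yield a line starting with `run1/File_Example.raw`, followed by a tab and the checksum.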
Integration with Metadata
Users should integrate generated checksums into their metadata.txt or checksums.txt file.
(Dependent on the solution chosen for the parent issue)
Option A: Extended metadata.txt
<MeasurementId1> File_Example.raw a1b2c3d4
<MeasurementId2> File_Example2.raw e5f6a7b8
Option B: Separate checksums.txt
File_Example.raw a1b2c3d4
File_Example2.raw e5f6a7b8
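If Option B is chosen, generating the file for an entire directory tree could look roughly like the sketch below. It is written in Python for illustration only, assumes the standard zlib CRC-32, and assumes one tab-separated record per line with the path relative to the upload root. The names `file_crc32` and `write_checksums` are hypothetical, as is the choice to write `checksums.txt` into the root directory itself.

```python
import os
import zlib
from pathlib import Path


def file_crc32(path: Path) -> str:
    """CRC-32 of a file (zlib variant) as 8 lowercase hex characters."""
    crc = 0
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc:08x}"


def write_checksums(root: Path, output_name: str = "checksums.txt") -> Path:
    """Walk `root` recursively and write '<relative path>\\t<crc32>' lines."""
    root = Path(root)
    output = root / output_name
    lines = []
    for current_dir, dir_names, file_names in os.walk(root):
        dir_names.sort()  # sorting in place makes the traversal deterministic
        for name in sorted(file_names):
            path = Path(current_dir) / name
            if path == output:
                continue  # do not checksum a listing left by an earlier run
            # Forward slashes keep the paths identical on every platform.
            relative = path.relative_to(root).as_posix()
            lines.append(f"{relative}\t{file_crc32(path)}")
    # newline="\n" prevents Windows from rewriting line endings as CRLF.
    with open(output, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("".join(line + "\n" for line in lines))
    return output
```

Sorting the directory and file names is a design choice rather than a requirement: it makes repeated runs produce byte-identical listings, which simplifies comparing two versions of the file.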
Acceptance Criteria
Documentation Updates
- Add documentation to "Data Upload" Section
- Include troubleshooting section for common issues:
  - Checksum mismatches (e.g., due to different line ending handling)
  - File encoding issues (UTF-8 vs. ASCII)
  - Path handling (relative vs. absolute paths)
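A troubleshooting section could be accompanied by a small self-check that users run before uploading. The Python sketch below is one possible shape for it, under the assumptions that the listing holds tab-separated `<relative path>` and `<crc32>` fields and that the standard zlib CRC-32 is expected. It touches each issue listed above: files are hashed in binary mode, a UTF-8 byte order mark and CRLF line endings in the listing are tolerated, and absolute or backslash-separated paths are reported. The name `verify_checksums` and the message wording are hypothetical.

```python
import zlib
from pathlib import Path, PurePosixPath


def file_crc32(path: Path) -> str:
    """CRC-32 of a file (zlib variant) as 8 lowercase hex characters."""
    crc = 0
    with open(path, "rb") as handle:  # binary mode: no newline translation
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc:08x}"


def verify_checksums(root: Path, listing: Path) -> list[str]:
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    # "utf-8-sig" strips a byte order mark if an editor added one.
    text = listing.read_text(encoding="utf-8-sig")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                f"line {number}: expected 2 tab-separated fields, "
                f"found {len(fields)}"
            )
            continue
        relative, expected = fields[0], fields[1].strip().lower()
        # Reject absolute paths and Windows-style separators.
        if PurePosixPath(relative).is_absolute() or "\\" in relative:
            problems.append(f"line {number}: path is not relative: {relative}")
            continue
        target = Path(root) / relative
        if not target.is_file():
            problems.append(f"line {number}: file not found: {relative}")
            continue
        actual = file_crc32(target)
        if actual != expected:
            problems.append(
                f"line {number}: checksum mismatch for {relative} "
                f"(expected {expected}, computed {actual})"
            )
    return problems
```

An empty return value means every listed file was found and its recomputed checksum matched; files that exist on disk but are absent from the listing are not reported by this sketch.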
Code of Conduct
- I agree to follow this project's Code of Conduct