[Feature] Provide documentation on how users can compute CRC32 checksums for their SFTP based dataset uploads #49

@Steffengreiner

Context

Before users can provision files with checksum validation, they need to generate the appropriate checksums for their dataset files. The documentation should provide clear guidance and tools for users to compute the checksums, especially for large datasets, taking into account complex uploads with recursive directory structures.

Current Behaviour

  • Users have no guidance on how to compute checksums for their files
  • Users may generate checksums using incompatible algorithms or methods

Expected Behaviour

  • Users have clear, platform-specific instructions for generating CRC32 checksums
  • Users can efficiently generate checksums for entire directory trees
  • Users understand the expected output format and how to integrate checksums into their metadata
  • The generated checksums are compatible with the checksums our system expects

Implementation Details

Ideally, the documentation provides platform-specific solutions based on natively available tools, so that users generate checksums in the format our systems expect.

This includes:

  • Linux
  • macOS
  • Windows (PowerShell)

Output Format

  • Algorithm: CRC32 (openBIS compatible)
  • Output format: 8-character hexadecimal string (e.g., a1b2c3d4)
  • All outputs should be lowercase for consistency
  • Each checksum is associated with the relative path of its file in a tab-separated format
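
As a cross-platform illustration of the format above (not one of the native per-platform solutions this issue asks for), the following is a minimal Python sketch. It assumes that "CRC32" means the standard CRC-32 implemented by zlib; whether that matches the checksum our system expects still needs to be confirmed. The name `crc32_hex` is hypothetical. Note that the POSIX `cksum` utility uses a different CRC variant, so its output is not interchangeable with zlib's CRC-32.

```python
import zlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files are never fully in memory


def crc32_hex(path: Path) -> str:
    """Return the CRC-32 of a file as an 8-character lowercase hex string.

    Assumption: the target system expects the zlib (IEEE 802.3) CRC-32.
    """
    crc = 0
    with open(path, "rb") as handle:  # binary mode: no newline or encoding translation
        while chunk := handle.read(CHUNK_SIZE):
            crc = zlib.crc32(chunk, crc)  # feed the running value back in
    # Mask to 32 bits and zero-pad so leading zeros are kept.
    return f"{crc & 0xFFFFFFFF:08x}"
```

Zero-padding matters here: a checksum whose value starts with a zero nibble would otherwise be printed with fewer than 8 characters.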

Integration with Metadata

Users should integrate the generated checksums into their metadata.txt or checksums.txt file, depending on the solution chosen for the parent issue.

Option A: Extended metadata.txt

<MeasurementId1>    File_Example.raw    a1b2c3d4
<MeasurementId2>    File_Example2.raw    e5f6a7b8

Option B: Separate checksums.txt

File_Example.raw    a1b2c3d4
File_Example2.raw    e5f6a7b8
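
A separate checksums file for a whole directory tree could be produced along the following lines. This is a sketch under assumptions, not a decided design: it assumes the Option B layout (relative path, tab, checksum), forward slashes in paths on every platform, LF line endings, and the zlib CRC-32 variant. `file_crc32` and `write_manifest` are hypothetical names.

```python
import os
import zlib
from pathlib import Path


def file_crc32(path: Path) -> str:
    """CRC-32 (zlib variant, an assumption) as 8 lowercase hex characters."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08x}"


def write_manifest(dataset_dir: Path, manifest: Path) -> int:
    """Write '<relative path>TAB<crc32>' for every file below dataset_dir.

    Returns the number of files listed.
    """
    dataset_dir = dataset_dir.resolve()
    manifest = manifest.resolve()
    count = 0
    # newline="\n" forces LF line endings, also on Windows.
    with open(manifest, "w", encoding="utf-8", newline="\n") as out:
        for root, dirs, files in os.walk(dataset_dir):
            dirs.sort()  # visit subdirectories in a stable order
            for name in sorted(files):
                path = Path(root) / name
                if path == manifest:
                    continue  # do not checksum the manifest itself
                # Path relative to the dataset root, with forward slashes.
                rel = path.relative_to(dataset_dir).as_posix()
                out.write(f"{rel}\t{file_crc32(path)}\n")
                count += 1
    return count
```

For example, `write_manifest(Path("my_dataset"), Path("my_dataset/checksums.txt"))` would list every file below `my_dataset` except the manifest itself. For Option A, the same values would instead be merged into the existing metadata.txt rows.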

Acceptance Criteria

  • Step-by-step instructions provided for Linux checksum generation
  • Step-by-step instructions provided for macOS checksum generation
  • Step-by-step instructions provided for Windows checksum generation
  • The instructions take into account the need for recursive directory checksum generation
  • Users understand how to format checksums for integration with metadata files
  • Documentation includes examples of common pitfalls (e.g., line endings, encoding)
  • All provided solutions generate checksums compatible with the expected checksum standard (CRC32)
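
To make the formatting criteria above checkable, the documentation could include a small validator. The sketch below assumes the two-column Option B layout and the output format described earlier (tab-separated, relative path, 8 lowercase hexadecimal characters); `validate_line` is a hypothetical helper, not an existing tool.

```python
import re

CHECKSUM_RE = re.compile(r"[0-9a-f]{8}")
WINDOWS_ABSOLUTE_RE = re.compile(r"^[A-Za-z]:[\\/]")


def validate_line(line: str) -> list[str]:
    """Return the problems found in one 'path<TAB>checksum' line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return [f"expected 2 tab-separated fields, found {len(fields)}"]
    path, checksum = fields
    problems = []
    if path.startswith("/") or WINDOWS_ABSOLUTE_RE.match(path):
        problems.append("path looks absolute")
    # A trailing '\r' from CRLF line endings also fails this check.
    if not CHECKSUM_RE.fullmatch(checksum):
        problems.append("checksum is not 8 lowercase hex characters")
    return problems
```

A line such as `File_Example.raw<TAB>a1b2c3d4` passes, while a value that is too short, contains non-hexadecimal letters, or is uppercase is reported.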

Documentation Updates

  • Add documentation to the "Data Upload" section
  • Include troubleshooting section for common issues:
    • Checksum mismatches (e.g., due to different line ending handling)
    • File encoding issues (UTF-8 vs. ASCII)
    • Path handling (relative vs. absolute paths)
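
The troubleshooting section could illustrate these issues with a short demonstration. The sketch below assumes the zlib CRC-32 variant and assumes that relative paths are written with forward slashes; the sample strings and the `C:\data\upload` directory are made up for illustration.

```python
import zlib
from pathlib import PureWindowsPath


def crc32_hex(data: bytes) -> str:
    """CRC-32 of a byte string as 8 lowercase hex characters."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


# Line endings: the same text with LF and CRLF endings is different bytes,
# e.g. after a transfer tool or editor has converted a text file.
lf_bytes = "sample line\n".encode("utf-8")
crlf_bytes = "sample line\r\n".encode("utf-8")
line_endings_match = crc32_hex(lf_bytes) == crc32_hex(crlf_bytes)

# Encoding: the same text saved as UTF-8 and as Latin-1 is different bytes.
text = "caf\u00e9"
encodings_match = crc32_hex(text.encode("utf-8")) == crc32_hex(text.encode("latin-1"))

# Paths: convert an absolute Windows path into a path relative to the
# upload directory, using forward slashes.
absolute = PureWindowsPath(r"C:\data\upload\sub\File_Example.raw")
relative = absolute.relative_to(PureWindowsPath(r"C:\data\upload")).as_posix()

print(line_endings_match, encodings_match, relative)
```

Because the checksum is computed over raw bytes, any conversion of line endings or encoding between checksum generation and upload changes the file content and should be expected to cause a mismatch.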

Code of Conduct

  • I agree to follow this project's Code of Conduct
