
Metadata for SeqDev Workspace files - Protocol and API #1802

@dandelany


Summary

Introduce a metadata protocol for workspace files in SeqDev. Metadata is stored in hidden sibling JSON files and accessed exclusively through a Metadata API in the Workspace Service.

Metadata files are created lazily by the Workspace Service and follow the lifecycle of their associated workspace file: e.g. when the underlying file is deleted or copied, its metadata file should also be deleted or copied.

This ticket implements the metadata storage protocol and API. Enforcement of metadata-controlled behavior (e.g., read-only files) is covered in a separate ticket.


Metadata Storage Protocol

Each workspace file may have a metadata file stored as a hidden sibling named .<filename>.meta.seqdev. (Note: this will require a DB migration to add .meta.seqdev as an allowed file extension.)

For example:

```
foo.seqn               // existing file
.foo.seqn.meta.seqdev  // metadata file
```
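A minimal TypeScript sketch of the naming rule, mapping a main file path to its metadata path and back. The helper names (metadataPathFor, mainPathFor) are hypothetical and assume POSIX-style paths relative to the workspace root; they are not part of the existing Workspace Service.

```typescript
// Hypothetical helpers for the ".<filename>.meta.seqdev" naming rule.
// Assumes POSIX-style paths relative to the workspace root.
const METADATA_SUFFIX = ".meta.seqdev";

// Splits "plans/foo.seqn" into { dir: "plans/", base: "foo.seqn" }.
function splitPath(filePath: string): { dir: string; base: string } {
  const idx = filePath.lastIndexOf("/");
  return { dir: filePath.slice(0, idx + 1), base: filePath.slice(idx + 1) };
}

// Main file path -> hidden sibling metadata path.
function metadataPathFor(filePath: string): string {
  const { dir, base } = splitPath(filePath);
  if (base === "") throw new Error(`Expected a file path, got "${filePath}"`);
  return `${dir}.${base}${METADATA_SUFFIX}`;
}

// Metadata path -> main file path, or null if the name doesn't match the pattern.
function mainPathFor(metaPath: string): string | null {
  const { dir, base } = splitPath(metaPath);
  const isMeta =
    base.startsWith(".") && base.endsWith(METADATA_SUFFIX) && base.length > METADATA_SUFFIX.length + 1;
  return isMeta ? dir + base.slice(1, -METADATA_SUFFIX.length) : null;
}
```

For example, metadataPathFor("plans/foo.seqn") returns "plans/.foo.seqn.meta.seqdev", and mainPathFor maps that back to "plans/foo.seqn".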

Rules

  • Metadata files are not guaranteed to exist for every file.
  • Metadata files are created when metadata is first written.
  • Any regular workspace file (not a directory, and not itself a metadata file) may have a metadata file.
  • If a file with metadata is moved, renamed, copied, or deleted, its metadata file should be moved, renamed, copied, or deleted along with it.
  • Combined file/metadata operations should be as atomic as feasible, i.e. both succeed or both fail. Where atomicity is not possible, prefer approaches that let the main file operation succeed even if the metadata operation fails.
  • We should take every feasible step to prevent "orphaned" metadata files (metadata files with no associated regular file) for operations done through our API, though orphans will still be possible if a user changes files directly.
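One way to honor the lifecycle and atomicity rules when full transactions are unavailable is to perform the main file operation first and treat the metadata step as best effort. The sketch below assumes a hypothetical synchronous FileStore interface (the real storage layer is not specified in this ticket) and includes an in-memory store only for illustration.

```typescript
// Hypothetical synchronous storage interface; the Workspace Service's real
// storage layer (and its async API) is not specified in this ticket.
interface FileStore {
  exists(path: string): boolean;
  move(from: string, to: string): void;
  copy(from: string, to: string): void;
  remove(path: string): void;
}

// In-memory store used only to exercise the sketch.
class MemoryStore implements FileStore {
  readonly files = new Map<string, string>();
  exists(path: string): boolean {
    return this.files.has(path);
  }
  copy(from: string, to: string): void {
    const data = this.files.get(from);
    if (data === undefined) throw new Error(`No such file: ${from}`);
    this.files.set(to, data);
  }
  move(from: string, to: string): void {
    this.copy(from, to);
    this.files.delete(from);
  }
  remove(path: string): void {
    if (!this.files.delete(path)) throw new Error(`No such file: ${path}`);
  }
}

const metaPath = (p: string): string => {
  const i = p.lastIndexOf("/");
  return `${p.slice(0, i + 1)}.${p.slice(i + 1)}.meta.seqdev`;
};

interface OpResult {
  metadataWarning?: string; // set when the main op succeeded but metadata did not follow
}

// Move or copy the main file, then its metadata (if any). Errors on the main
// file propagate; a metadata failure is reported but does not undo the main op.
function transferWithMetadata(store: FileStore, op: "move" | "copy", from: string, to: string): OpResult {
  store[op](from, to);
  const src = metaPath(from);
  try {
    if (store.exists(src)) store[op](src, metaPath(to));
    return {};
  } catch (err) {
    return { metadataWarning: `Metadata ${op} failed: ${String(err)}` };
  }
}

// Delete the main file first, then its metadata (if any), under the same policy.
function deleteWithMetadata(store: FileStore, path: string): OpResult {
  store.remove(path);
  const m = metaPath(path);
  try {
    if (store.exists(m)) store.remove(m);
    return {};
  } catch (err) {
    return { metadataWarning: `Metadata delete failed: ${String(err)}` };
  }
}
```

Ordering the main operation first means a metadata failure can leave an orphan (or a file without its metadata), which is the trade-off the rule above prefers; the returned warning gives the caller a chance to log or clean up. Overwriting an existing destination file and its metadata is not handled in this sketch.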

Metadata File JSON Structure

Metadata files will be JSON files with the following structure:

```json
{
  "version": "1",
  "readOnly": false,
  "createdBy": "username",
  "createdAt": "timestamp",
  "lastEditedBy": "username",
  "lastEditedAt": "timestamp",
  "user": {
    "arbitrary": "values"
  }
}
```

Top-level fields

| Field | Type | Description |
|---|---|---|
| version | string | Metadata format version (service-controlled) |
| readOnly | boolean | Indicates the file should be treated as read-only |
| createdBy | string | User who created the metadata |
| createdAt | string | Timestamp when the metadata was created |
| lastEditedBy | string | Last user who modified the file or its metadata |
| lastEditedAt | string | Timestamp of the last modification |
| user | object | Arbitrary user metadata |

Rules

  • Top-level fields (other than user) are read and written directly by SeqDev and have a controlled schema.
    • readOnly can be updated by the user and is used internally by the Workspace Service.
    • version, createdBy, createdAt, lastEditedBy, and lastEditedAt cannot be edited by the user through the API; they are managed internally by SeqDev.
  • The user field is an arbitrary nested JSON object controlled by the user. Its keys can be any string except that they cannot contain dots (e.g. "user.name" is an invalid key), since dot-paths are used to address nested keys when unsetting metadata.
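The schema and key rule can be expressed as a TypeScript type plus a recursive check for dotted keys. This is a hypothetical sketch: treating every field except version as optional, and assuming ISO-8601 strings for timestamps, are assumptions not stated in the ticket. The check only inspects keys; rejecting a user value that is not an object is left to the caller.

```typescript
// Hypothetical TypeScript view of the metadata file. Fields other than
// `version` are optional because metadata may be only partially populated.
// Timestamps are assumed to be ISO-8601 strings.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
type JsonObject = { [key: string]: JsonValue };

interface SeqDevMetadata {
  version: string; // service-controlled
  readOnly?: boolean; // user-editable
  createdBy?: string; // service-managed audit fields
  createdAt?: string;
  lastEditedBy?: string;
  lastEditedAt?: string;
  user?: JsonObject; // arbitrary, user-controlled
}

// Lists every key (at any depth, including inside arrays) that contains a dot.
// An empty result means the value satisfies the key rule.
function findDottedKeys(value: JsonValue, path = "user"): string[] {
  const found: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => found.push(...findDottedKeys(item, `${path}[${i}]`)));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      if (key.includes(".")) found.push(`${path}: "${key}"`);
      found.push(...findDottedKeys(child, `${path}.${key}`));
    }
  }
  return found;
}
```

Returning the offending paths rather than a boolean lets the 400 response name exactly which keys were rejected.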

Metadata API

In all endpoints below, <filePath> refers to the main file's path, not the metadata file's path.


Get Metadata

Returns the metadata for the file at <filePath>.

GET /metadata/<workspaceId>/<filePath>

Behavior

| Condition | Response |
|---|---|
| File missing | 404 error |
| File path is a directory | 400 error (only files have metadata) |
| Metadata file missing | { "version": "1" } |
| Metadata file present | Metadata file contents, returned as-is |

NOTE: This returns the contents of the metadata file AS-IS. If the JSON is malformed, the client must detect and handle it.
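A sketch of the GET handler logic against the behavior table, assuming a hypothetical WorkspaceReader interface and a simple { status, body } result rather than any particular HTTP framework. The JSON error bodies are illustrative only.

```typescript
// Hypothetical read-only view of a workspace; not the real service's API.
interface WorkspaceReader {
  kind(path: string): "file" | "directory" | "missing";
  readText(path: string): string;
}

interface HttpResult {
  status: number;
  body: string;
}

const metaPath = (p: string): string => {
  const i = p.lastIndexOf("/");
  return `${p.slice(0, i + 1)}.${p.slice(i + 1)}.meta.seqdev`;
};

// GET /metadata/<workspaceId>/<filePath>, following the behavior table above.
function getMetadata(ws: WorkspaceReader, filePath: string): HttpResult {
  const kind = ws.kind(filePath);
  if (kind === "missing") return { status: 404, body: JSON.stringify({ error: "File not found" }) };
  if (kind === "directory") {
    return { status: 400, body: JSON.stringify({ error: "Only files have metadata" }) };
  }
  const meta = metaPath(filePath);
  if (ws.kind(meta) !== "file") return { status: 200, body: JSON.stringify({ version: "1" }) };
  // Deliberately not parsed: contents are returned as-is, malformed or not.
  return { status: 200, body: ws.readText(meta) };
}
```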


Set Metadata (Upsert)

Sets values in the metadata file from an update object.

POST /metadata/<workspaceId>/<filePath>

Example request

```json
{
  "readOnly": false,
  "user": {
    "status": "draft"
  }
}
```

Rules

  • Creates metadata if missing.
  • Performs a shallow merge of the top-level fields the user is allowed to edit (namely readOnly).
  • Also performs a shallow merge of any provided user object, so that existing user keys not included in the request are preserved rather than the entire user object being overwritten.
  • Rejects unknown top-level keys.
  • If the existing metadata file is malformed, responds with a 500 error and an appropriate error message.
  • If the request's user value is malformed or not an object (including an object whose keys contain dots), responds with a 400 error.
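The upsert rules can be sketched as a pure function that takes the existing metadata text (null if the file does not exist) and the parsed request body. Using 400 for unknown keys and for a non-boolean readOnly is an assumption, as is treating an existing non-object user field as malformed. Per the Out of Scope list, audit fields such as lastEditedAt are not updated here.

```typescript
// Sketch of the Set Metadata merge rules; status codes for unknown keys and bad
// readOnly values (400) are assumptions.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
type JsonObject = { [key: string]: JsonValue };

type SetResult =
  | { status: 200; metadata: JsonObject }
  | { status: 400 | 500; error: string };

const EDITABLE_KEYS = new Set(["readOnly", "user"]);

function isPlainObject(v: unknown): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// True if any object key, at any depth, contains a dot.
function hasDottedKey(v: unknown): boolean {
  if (Array.isArray(v)) return v.some(hasDottedKey);
  if (!isPlainObject(v)) return false;
  return Object.entries(v).some(([k, child]) => k.includes(".") || hasDottedKey(child));
}

function applyMetadataUpdate(existingText: string | null, update: unknown): SetResult {
  // 1. Validate the request body (400 on any problem).
  if (!isPlainObject(update)) return { status: 400, error: "Body must be a JSON object" };
  for (const key of Object.keys(update)) {
    if (!EDITABLE_KEYS.has(key)) return { status: 400, error: `Key not allowed: ${key}` };
  }
  const readOnly: unknown = update.readOnly;
  if (readOnly !== undefined && typeof readOnly !== "boolean") {
    return { status: 400, error: "readOnly must be a boolean" };
  }
  const userUpdate: unknown = update.user;
  if (userUpdate !== undefined && (!isPlainObject(userUpdate) || hasDottedKey(userUpdate))) {
    return { status: 400, error: "user must be an object whose keys contain no dots" };
  }

  // 2. Load existing metadata, creating it if missing; refuse to merge into a malformed file.
  let current: JsonObject = { version: "1" };
  if (existingText !== null) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(existingText);
    } catch {
      return { status: 500, error: "Existing metadata file is not valid JSON" };
    }
    if (!isPlainObject(parsed) || (parsed.user !== undefined && !isPlainObject(parsed.user))) {
      return { status: 500, error: "Existing metadata file has an unexpected shape" };
    }
    current = parsed;
  }

  // 3. Shallow-merge editable top-level fields, and shallow-merge `user`.
  const merged: JsonObject = { ...current };
  if (typeof readOnly === "boolean") merged.readOnly = readOnly;
  if (isPlainObject(userUpdate)) {
    const prevUser = current.user;
    merged.user = { ...(isPlainObject(prevUser) ? prevUser : {}), ...userUpdate };
  }
  return { status: 200, metadata: merged };
}
```

Validating the request before reading the existing file means a bad request gets a 400 even when the stored metadata is also malformed.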

Unset Metadata Keys

Removes the values for the given list of keys from the metadata file.

POST /metadata/unset/<workspaceId>/<filePath>

Request body

```json
["readOnly", "user.status", "user.info.name"]
```

Rules

  • Removes keys from metadata.
  • Supports dot-path syntax for keys nested at any depth inside user.
  • Removing missing keys is a no-op.
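A sketch of the unset rules, assuming only readOnly, user, and user.<path> keys may be unset and that any other key (such as version) fails the request; the ticket does not specify that case. Traversal follows own properties only, so a path such as "user.__proto__.x" cannot reach the object prototype.

```typescript
// Sketch of the Unset Metadata Keys rules. Assumption: only "readOnly", "user",
// and "user.<path>" may be unset. Works on a deep copy so the input is untouched.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
type JsonObject = { [key: string]: JsonValue };

const isPlainObject = (v: unknown): v is JsonObject =>
  typeof v === "object" && v !== null && !Array.isArray(v);

const hasOwn = (o: object, k: string): boolean => Object.prototype.hasOwnProperty.call(o, k);

function unsetKeys(metadata: JsonObject, keys: readonly string[]): JsonObject {
  // Validate every key first so a bad key fails the whole request.
  for (const key of keys) {
    if (key !== "readOnly" && key !== "user" && !key.startsWith("user.")) {
      throw new Error(`Key cannot be unset: ${key}`);
    }
  }
  const result = JSON.parse(JSON.stringify(metadata)) as JsonObject; // deep copy
  for (const key of keys) {
    const [head, ...path] = key.split(".");
    if (path.length === 0) {
      delete result[head]; // "readOnly" or the whole "user" object
      continue;
    }
    // Walk own properties only down to the parent of the last segment;
    // a missing step makes the whole key a no-op.
    let node: JsonValue | undefined = result.user;
    for (const segment of path.slice(0, -1)) {
      node = isPlainObject(node) && hasOwn(node, segment) ? node[segment] : undefined;
    }
    const last = path[path.length - 1];
    if (isPlainObject(node) && hasOwn(node, last)) delete node[last];
  }
  return result;
}
```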

Delete Metadata

DELETE /metadata/<workspaceId>/<filePath>

Behavior

| Condition | Result |
|---|---|
| File missing | 404 |
| Metadata missing | 200 |

Changes to Existing Workspace API

We want to modify the existing workspace directory listing API to optionally return metadata associated with the files. Specifically, this endpoint:

GET /ws/<workspaceId>/<folderPath>

Should accept a new query parameter, withMetadata=true. When it is set, each file entry in the response should include two additional keys: metadata (the contents of the file's metadata JSON) and metadataStatus (indicating whether the metadata file is missing or malformed).


Example

```json
{
  "name": "plan.seqn",
  "metadata": {...},
  "metadataStatus": "ok"
}
```

Status values

| Status | Meaning |
|---|---|
| ok | Metadata file exists and parsed successfully |
| missing | Metadata file is absent |
| malformed | Metadata file contains invalid JSON |

  • If metadata is malformed, the metadata value should be null.
  • Metadata files themselves should not be listed by this endpoint by default.
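A sketch of how the listing could attach metadata. The DirEntry shape and the readMetadataText callback are hypothetical; reporting missing metadata as null and classifying JSON that parses to a non-object (for example an array) as malformed are assumptions beyond the status table.

```typescript
// Sketch of the withMetadata option for GET /ws/<workspaceId>/<folderPath>.
type MetadataStatus = "ok" | "missing" | "malformed";

interface DirEntry {
  name: string;
  type: "file" | "directory";
}

interface ListedEntry extends DirEntry {
  metadata?: Record<string, unknown> | null;
  metadataStatus?: MetadataStatus;
}

const SUFFIX = ".meta.seqdev";
const isMetadataName = (name: string): boolean =>
  name.startsWith(".") && name.endsWith(SUFFIX) && name.length > SUFFIX.length + 1;

// Classifies raw metadata text (null = file absent) into a status and value.
function classify(raw: string | null): Pick<ListedEntry, "metadata" | "metadataStatus"> {
  if (raw === null) return { metadata: null, metadataStatus: "missing" };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return { metadata: parsed as Record<string, unknown>, metadataStatus: "ok" };
    }
  } catch {
    // fall through: invalid JSON
  }
  return { metadata: null, metadataStatus: "malformed" };
}

// readMetadataText receives a main file name and returns its sibling metadata text or null.
function listWithMetadata(
  entries: DirEntry[],
  readMetadataText: (fileName: string) => string | null,
  withMetadata: boolean,
): ListedEntry[] {
  // Metadata files are never listed as regular entries.
  const visible = entries.filter((e) => !(e.type === "file" && isMetadataName(e.name)));
  if (!withMetadata) return visible;
  return visible.map((e) => (e.type === "file" ? { ...e, ...classify(readMetadataText(e.name)) } : e));
}
```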

Restrictions on Metadata Files

Metadata files cannot be accessed through generic file APIs.

The Workspace Service must reject:

  • Reading metadata files directly
  • Writing metadata files directly
  • Moving/renaming metadata files
  • Deleting metadata files
  • Uploading files whose names match the metadata naming pattern

All metadata operations must go through the Metadata API.
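A sketch of a guard the generic file endpoints could share. The operation names and the 403 status are assumptions (the ticket only says these requests must be rejected), and checking every path an operation touches, including a move's destination, goes slightly beyond the list by also preventing a regular file from being renamed into the metadata namespace.

```typescript
// Sketch of a shared guard for generic file endpoints; names and 403 are assumptions.
const METADATA_SUFFIX = ".meta.seqdev";

type GenericFileOp = "read" | "write" | "move" | "delete" | "upload";

interface Rejection {
  status: number;
  error: string;
}

// Only the final path segment determines whether a path is a metadata file.
function isMetadataPath(path: string): boolean {
  const base = path.slice(path.lastIndexOf("/") + 1);
  return base.startsWith(".") && base.endsWith(METADATA_SUFFIX) && base.length > METADATA_SUFFIX.length + 1;
}

// Pass every path the operation touches (e.g. both source and destination of a move).
// Returns a rejection, or null if the operation may proceed.
function rejectMetadataAccess(op: GenericFileOp, paths: readonly string[]): Rejection | null {
  const hit = paths.find((p) => isMetadataPath(p));
  return hit === undefined
    ? null
    : { status: 403, error: `Cannot ${op} "${hit}" directly; use the Metadata API` };
}
```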


Out of Scope

  • Metadata enforcement (read-only behavior)
  • Automatic audit field updates
  • Actions API support
  • Metadata indexing/search
  • Bulk metadata operations
  • Schema migrations
  • Handling filesystem drift between file and metadata

Labels: 4.2.0, feature, sequencing

Project status: Todo
