Lantern - Data Model

Overview

Entities Overview

Records

Base records

lantern.lib.metadata_library.models.record.Record.

Records are a partial representation of the ISO 19115 information model in Python. They generically describe resources (maps/products, datasets, collections, etc.).

Note

Unless stated otherwise, references to 'Records' elsewhere in this documentation refer to Record Revisions and not this concept of a record.

Note

The base Records model is considered part of the BAS Metadata Library. See the Library docs for more information.

Catalogue records

lantern.models.record.Record

Catalogue Records represent Records within the Data Catalogue specifically.

Note

Unless stated otherwise, references to 'Records' elsewhere in this documentation refer to Record Revisions and not this concept of a record.

Catalogue Records extend the Base Record class by implementing the Catalogue's Record Requirements.

Note

This subclass SHOULD be used for any additional subclasses within the Catalogue.

Record revisions

lantern.models.record.revision.RecordRevision

Record Revisions represent Records at a particular point in time indicated by a revision identifier.

Note

Unless stated otherwise, references to 'Records' elsewhere in this documentation refer to this concept of a record.

Revision identifiers are a local addition and not part of the ISO 19115 information model. Identifiers SHOULD come from a Version Control System (VCS) such as Git. Identifiers MUST be unique within the history of each Record but MAY be shared across multiple Records, for example to represent a coordinated set of changes (i.e. a records changeset).

Implemented as a (catalogue) Record subclass with an additional top-level file_revision property.
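The relationship between Records and Record Revisions can be sketched as below. This is a minimal illustration using simplified stand-in dataclasses, not the real lantern classes: only the file_identifier and file_revision property names come from this documentation, and the title field and revisions_are_unique helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Simplified stand-in for a (catalogue) Record."""

    file_identifier: str
    title: str  # illustrative field only


@dataclass(frozen=True)
class RecordRevision(Record):
    """A Record at a point in time, indicated by a revision identifier."""

    file_revision: str  # e.g. a Git commit hash


def revisions_are_unique(history: list) -> bool:
    """Check revision identifiers are unique within one Record's history (hypothetical helper)."""
    revisions = [revision.file_revision for revision in history]
    return len(revisions) == len(set(revisions))


# Two revisions of the same Record must use different identifiers ...
history_a = [RecordRevision("a1", "Map", "abc123"), RecordRevision("a1", "Map (v2)", "def456")]
# ... but another Record may share an identifier, as part of the same changeset.
history_b = [RecordRevision("b2", "Dataset", "def456")]
```

Here "def456" appears in both histories, which is allowed because uniqueness only applies within each Record's own history.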

Record requirements

In addition to Record Validation, the Data Catalogue requires all records:

  • MUST use a UUID value for the file_identifier:
    • to ensure resources can be distinguished without relying on a value such as title that may change or not be unique
    • to ensure resource identifiers and aliases are distinct and can't be ambiguous
  • MUST include an identification.identifier, as per [1]:
    • to determine if a record is part of the Catalogue
  • MUST NOT use the database.bas.ac.uk namespace in any identification.identifier elements:
    • as this value is ambiguous across systems, use a more specific value (e.g. foo.data.bas.ac.uk)
  • MUST include an identification.identifier.contacts.*.contact with at least the 'pointOfContact' role:
    • for use with the item contact tab
  • MUST use unique identifiers for extents
  • MUST structure any Aliases as below if included:
    • MUST use values in the form: {prefix}/{value}
    • MUST use an allowed prefix for each hierarchy level, as per [2]
    • MUST NOT use UUIDs in values (to avoid conflicts with file_identifier values)
    • MUST set the href property to https://lantern.data.bas.ac.uk/{alias} (e.g. https://lantern.data.bas.ac.uk/collections/foo)
    • MUST use the namespace: alias.lantern.data.bas.ac.uk

These requirements are enforced by the validate() method in the Catalogue Record class.

Note

Whilst not required, records without Administration Metadata will be interpreted as Restricted Items.

Caution

The catalogue does not enforce metadata access constraints if set.

[1]

  • identifier: {file_identifier}
  • href: https://lantern.data.bas.ac.uk/items/{file_identifier}
  • namespace: lantern.data.bas.ac.uk

[2]

| Hierarchy level | Allowed prefixes |
| --- | --- |
| collection | collections |
| dataset | datasets |
| initiative | projects |
| product | products |
| mapProduct (local) | maps |
| paperMapProduct (local) | maps |
| webMapProduct (local) | maps |
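The alias requirements above can be sketched as a standalone check. This is an illustration only and not the Catalogue Record validate() implementation: the alias_errors function, its error messages, and the use of a plain dict with identifier, href and namespace keys (mirroring the structure shown in [1]) are assumptions.

```python
from uuid import UUID

# Allowed alias prefixes per hierarchy level, copied from [2].
ALLOWED_PREFIXES = {
    "collection": "collections",
    "dataset": "datasets",
    "initiative": "projects",
    "product": "products",
    "mapProduct": "maps",
    "paperMapProduct": "maps",
    "webMapProduct": "maps",
}
ALIAS_NAMESPACE = "alias.lantern.data.bas.ac.uk"
BASE_URL = "https://lantern.data.bas.ac.uk"


def _is_uuid(value: str) -> bool:
    """Whether a string can be parsed as a UUID."""
    try:
        UUID(value)
    except ValueError:
        return False
    return True


def alias_errors(identifier: dict, hierarchy_level: str) -> list:
    """Return a list of problems with an alias identifier (empty if valid)."""
    errors = []
    alias = identifier.get("identifier", "")
    prefix, separator, value = alias.partition("/")
    if not separator or not prefix or not value:
        errors.append("alias must use the form {prefix}/{value}")
    elif prefix != ALLOWED_PREFIXES.get(hierarchy_level):
        errors.append(f"prefix '{prefix}' not allowed for '{hierarchy_level}'")
    if value and _is_uuid(value):
        errors.append("alias value must not be a UUID")
    if identifier.get("href") != f"{BASE_URL}/{alias}":
        errors.append("href must be https://lantern.data.bas.ac.uk/{alias}")
    if identifier.get("namespace") != ALIAS_NAMESPACE:
        errors.append("namespace must be alias.lantern.data.bas.ac.uk")
    return errors


example_alias = {
    "identifier": "collections/foo",
    "href": "https://lantern.data.bas.ac.uk/collections/foo",
    "namespace": "alias.lantern.data.bas.ac.uk",
}
```

Calling alias_errors(example_alias, "collection") returns an empty list, whereas the same alias checked against the "dataset" hierarchy level reports a disallowed prefix.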

Items

Items are wrappers around Records to provide convenience properties and methods to access an underlying Record's configuration for use in a specific context.

Note

Items do not follow a formal specification and are not interoperable outside of this project.

Item base

lantern.models.item.base.ItemBase

A base item contains common properties and methods across all item subclasses.

For example:

  • Item.citation_html returns an HTML formatted version of identification.other_citation_details, if set
  • Item.kv returns a dict of key-values if identification.supplemental_information is a suitable JSON encoded object
  • Item.resource_access returns a local access type enumeration value by parsing any resource permissions set in optional Administrative Metadata

Item super-types

lantern.models.item.catalogue.enums.ItemSuperType

Item types (set by their underlying Record's hierarchy_level property) can be sorted into two broad 'super-types':

  • containers: for types such as collections and projects that group and organise records
  • resources: for types such as datasets and products that represent actual data holdings

This higher level grouping is useful to simplify and generalise logic in catalogue item tabs and other elements. For example, whether to enable the licence tab.

Implemented as an enumeration available as the Item._super_type private property (for use in Item elements and tabs).
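As a rough sketch of how such a grouping might work (not the actual ItemSuperType implementation), the example below maps hierarchy levels to a super-type. The member names, the choice to treat every non-container level as a resource, and the show_licence_tab rule are all assumptions for illustration.

```python
from enum import Enum


class ItemSuperType(Enum):
    """Stand-in enumeration; member names are assumed."""

    CONTAINER = "container"
    RESOURCE = "resource"


# Hierarchy levels that group and organise other records.
# All other levels are assumed to be resources in this sketch.
_CONTAINER_LEVELS = {"collection", "initiative"}


def super_type(hierarchy_level: str) -> ItemSuperType:
    """Sort a Record hierarchy level into a broad super-type."""
    if hierarchy_level in _CONTAINER_LEVELS:
        return ItemSuperType.CONTAINER
    return ItemSuperType.RESOURCE


def show_licence_tab(hierarchy_level: str) -> bool:
    """Hypothetical tab rule: only resources show a licence tab."""
    return super_type(hierarchy_level) is ItemSuperType.RESOURCE
```

Keeping the decision in one place means tab logic can ask a single question about the super-type rather than enumerating every hierarchy level.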

Item aliases

Items are canonically identified by their Record's UUIDv4 file_identifier property, including in URLs for item pages. These values are intentionally non-meaningful, and due to their length and randomness, not memorable. Whilst useful for ensuring uniqueness, they are not useful when referring to items, or providing self-describing URLs.

Item aliases provide a way to create additional URLs for an item with more useful values, such as a slugified title or existing codes or shorthand.

Aliases are defined as Record identifiers using the alias.lantern.data.bas.ac.uk namespace. Values are prefixed by a pluralised term related to the Record hierarchy level (e.g. collections/foo for a collection record). See the Record requirements section for allowed prefixes and other requirements.

Caution

The catalogue does not enforce aliases to be unique across records, and the behaviour of conflicting aliases is left undefined. Any implicit behaviour MUST NOT be relied upon.

Item key value data

The ItemBase class includes a kv property returning a dictionary of parsed Key Value (KV) data, or an empty dict if no KV value is defined or the value is malformed.
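The behaviour described above can be sketched as a standalone function. This is an assumed approximation rather than the ItemBase.kv implementation: the parse_kv name is hypothetical, as is the example key used afterwards.

```python
import json
from typing import Optional


def parse_kv(supplemental_information: Optional[str]) -> dict:
    """Parse a JSON encoded object into a dict, or return {} if unset or malformed."""
    if not supplemental_information:
        return {}
    try:
        value = json.loads(supplemental_information)
    except json.JSONDecodeError:
        return {}
    # Only JSON objects are suitable; lists, strings, numbers, etc. are treated as malformed.
    return value if isinstance(value, dict) else {}
```

For example, parse_kv('{"physical_size_width_mm": 890}') returns a dict with that single key, while parse_kv("not json"), parse_kv("[1, 2]") and parse_kv(None) all return an empty dict.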

Item administrative metadata

The ItemBase class includes an optional admin_metadata property returning:

Administration metadata properties are available as admin_ prefixed Item properties with optional return values.

JSON Web Keys (JWKs) for decrypting JWEs and verifying the signature of JWTs should be configured using the ADMIN_METADATA_ENCRYPTION_KEY_PRIVATE and ADMIN_METADATA_SIGNING_KEY_PUBLIC Config Options respectively.

Tip

These keys can be accessed from Export Metadata if created from a Config object.

Item access levels

lantern.models.item.base.enums.AccessLevel

Access levels for each item are available via:

  • Item.admin_metadata_access Item property (for who can view a description of the item)
  • Item.admin_resource_access Item property (for who can access the item itself, if applicable)

Both properties return an enumeration value, determined by permissions from Administrative Metadata.

Both properties default to AccessLevel.NONE. To allow open access, include permissions equivalent to the lantern.lib.metadata_library.models.record.presets.admin.OPEN_ACCESS permission.

Caution

The catalogue does not enforce metadata access permissions. They will always evaluate to open access (unrestricted).

Warning

External data access systems are responsible for enforcing any resource permissions that may apply. The catalogue only indicates whether restrictions may apply at an informative level.

Warning

The catalogue does not consider access constraints set in metadata.constraints or identification.constraints, as they cannot be verified as trustworthy.

Note

Access constraints SHOULD still be set for visibility to end users and for interoperability with other systems.

Catalogue Items simplify the admin_resource_access access level to a binary restricted property, returning and defaulting to true unless Item.admin_access_level == AccessLevel.PUBLIC.
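A minimal sketch of this simplification is shown below. The AccessLevel stand-in only includes the two members named in this section (the real enumeration may define others), the member values are assumed, and is_restricted is a hypothetical function rather than the Catalogue Item property itself.

```python
from enum import Enum


class AccessLevel(Enum):
    """Stand-in enumeration; values are assumed."""

    NONE = "none"
    PUBLIC = "public"


def is_restricted(resource_access: AccessLevel = AccessLevel.NONE) -> bool:
    """Treat anything other than public access as restricted."""
    return resource_access is not AccessLevel.PUBLIC
```

Because the default is AccessLevel.NONE, an item with no usable permissions is reported as restricted, which fails safe.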

Where restricted, Item Templates display additional context in item summaries and the data tab (if applicable).

Catalogue items

lantern.models.item.catalogue.ItemCatalogue.

Catalogue Items are tightly coupled to the Data Catalogue and its user interface. Features include:

  • properties organised under classes for each UI tab (including logic to determine whether a tab should be shown)
  • local enums mapping Record properties to UI values for improved readability
  • a render() method to output an HTML page for each item
  • classes (lantern.models.item.catalogue.distributions) for processing distribution options for the catalogue UI
  • an item summary implementation (lantern.models.item.catalogue.elements.ItemSummaryCatalogue)

Catalogue item limitations

Supported properties (references not normative or exhaustive):

  • file_identifier
  • file_revision
  • hierarchy_level
  • metadata.date_stamp
  • metadata.metadata_standard.name
  • metadata.metadata_standard.version
  • metadata.constraints[type='usage', restriction_code='licence'] (only where a href is included)
  • reference_system_info
  • identification.citation.title
  • identification.citation.dates
  • identification.citation.edition
  • identification.citation.contacts ('author' and single 'point of contact' roles only, excludes contact.position)
  • identification.citation.series
  • identification.citation.identifiers[namespace='doi']
  • identification.citation.identifiers[namespace='isbn']
  • identification.citation.identifiers[namespace='alias.lantern.data.bas.ac.uk']
  • identification.abstract
  • identification.aggregations (only as below)
    • 'part of' (items in collections)
    • item and collection cross-references
    • supersedes (not 'superseded by')
    • 'one side of' (physical maps only)
    • 'opposite side of' (physical maps only)
  • identification.constraints[type='usage', restriction_code='licence']
  • identification.maintenance
  • identification.extent (single bounding extent only, with temporal and geographic bounding box elements)
  • identification.other_citation_details
  • identification.graphic_overviews ('overview' image only)
  • identification.spatial_resolution
  • identification.supplemental_information ('physical_size_*' and 'admin_meta' KVs only)
  • data_quality.lineage.statement
  • data_quality.domain_consistency
  • distribution.distributor.format (format and href only)
  • distribution.distributor.transfer_option (except online_resource.protocol)
  • Administrative Metadata (in trusted contexts only)

Unsupported properties (references not normative or exhaustive):

  • identification.purpose (except as used in ItemSummaries)
  • identification.citation.identifiers (except 'doi', 'isbn', 'alias.lantern.data.bas.ac.uk' namespaces)

Intentionally omitted properties (references not normative or exhaustive):

  • *.character_set (not useful to end-users, present in underlying record)
  • *.language (not useful to end-users, present in underlying record)
  • *.online_resource.protocol (not useful to end-users, present in underlying record)
  • *.constraints[type='access'] (not trustworthy, see Item Access)
  • distribution.distributor (not useful to end-users)

Catalogue items supported distribution options

Supported distribution options:

  • services:
    • ArcGIS Feature Layer/Service
    • ArcGIS OGC API Features Layer/Service
    • ArcGIS (Raster) Tile Layer/Service
    • ArcGIS Vector Tile Layer/Service
  • file types:
    • CSV
    • Garmin FPL (aviation GPS data)
    • OGC GeoPackage
    • GeoJSON
    • GPX
    • JPEG
    • Mapbox Vector Tiles
    • PDF (with optional geo-referencing)
    • PNG
    • Esri Shapefile
  • other special cases:
    • BAS published maps purchasing options
    • BAS SAN references

Implemented via classes in the lantern.models.item.catalogue.distributions package.

Special catalogue items

To support more complex use-cases, ItemCatalogue subclasses can be used to implement special handling for items.

Special catalogue item classes MUST implement a public matches class method returning a boolean indicating whether the special class applies to a given Record.

Suitable logic needs to be implemented where Records are processed into items to call these matches methods to determine which Catalogue Item class or subclass to use.
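One way this selection logic might look is sketched below. The classes are simplified stand-ins that take a plain dict as a Record, the SPECIAL_CLASSES list and item_for_record function are hypothetical, and the matches condition shown is deliberately reduced to a hierarchy level check.

```python
class ItemCatalogue:
    """Simplified stand-in for the default Catalogue Item class."""

    def __init__(self, record: dict) -> None:
        self.record = record


class ItemCataloguePhysicalMap(ItemCatalogue):
    """Simplified stand-in for a special Catalogue Item subclass."""

    @classmethod
    def matches(cls, record: dict) -> bool:
        """Whether this special class applies to the given Record (reduced condition)."""
        return record.get("hierarchy_level") == "paperMapProduct"


# Special classes to try, in order, before falling back to the default class.
SPECIAL_CLASSES = [ItemCataloguePhysicalMap]


def item_for_record(record: dict) -> ItemCatalogue:
    """Return an instance of the first matching special class, else the default."""
    for special_class in SPECIAL_CLASSES:
        if special_class.matches(record):
            return special_class(record)
    return ItemCatalogue(record)
```

Ordering SPECIAL_CLASSES explicitly makes the outcome predictable if more than one special class could match the same Record.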

Physical map items

lantern.models.item.catalogue.special.physical_map.ItemCataloguePhysicalMap.

Physical maps are represented by a trio of Records, one per side plus a third Record for the overall map itself. Aggregations are used to associate the records together with the local 'physicalReverseOf' aggregation association [1] and local 'paperMap' aggregation initiative [2].

Records for each side are typical Catalogue Items. The overall Record is a special Physical Map subclass, which is used automatically when a Record:

  • uses the local 'paperMapProduct' hierarchy level [3]
  • includes at least one aggregation for a map side ('isComposedOf' association, 'paperMap' initiative)

This subclass overloads some properties in Catalogue Item tabs to show a common, aggregated, value, or separate values per side:

  • if the values in each side are the same, they are ignored and the value from the overall Record is shown
  • if different, values for each side are shown - the value from the overall Record is ignored
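The rule above can be sketched as a small function. This is an assumed illustration, not the subclass's actual code: displayed_values is a hypothetical name and values are treated as plain strings.

```python
def displayed_values(overall: str, side_values: list) -> list:
    """Return the value(s) to show for a property of a physical map."""
    if len(set(side_values)) <= 1:
        # Sides agree (or there are no sides): show the overall Record's value.
        return [overall]
    # Sides differ: show each side's value and ignore the overall value.
    return list(side_values)
```

For instance, two sides that both use the scale "1:50,000" produce the single overall value, while sides with "1:50,000" and "1:25,000" produce both side values.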

[1] lantern.lib.metadata_library.models.record.enums.AggregationAssociationCode.PHYSICAL_REVERSE_OF

[2] lantern.lib.metadata_library.models.record.enums.AggregationInitiativeCode.PAPER_MAP

[3] lantern.lib.metadata_library.models.record.enums.HierarchyLevelCode.PAPER_MAP_PRODUCT

BAS public website search items

lantern.models.item.public_website.ItemWebsiteSearch

BAS public website search items represent Items within the search index of the BAS Public Website for use with the BAS Public Website Search Output.

Consists of limited properties needed to render a search result for an Item. Includes logic to:

  • select the most relevant date for the item (revision > publication > creation)
  • select the most suitable description for the item (purpose > abstract)
  • determine whether an item should be marked as removed/deleted (based on resource maintenance information)
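The date and description preferences listed above can be sketched as follows. Both function names, the dict keyed by date type, and the use of plain strings for dates are assumptions made for illustration.

```python
from typing import Optional

# Date types in order of preference (revision > publication > creation).
DATE_PRIORITY = ("revision", "publication", "creation")


def most_relevant_date(dates: dict) -> Optional[str]:
    """Return the first available date in priority order, or None if there are none."""
    for date_type in DATE_PRIORITY:
        if dates.get(date_type):
            return dates[date_type]
    return None


def most_suitable_description(purpose: Optional[str], abstract: str) -> str:
    """Prefer the purpose where set, otherwise fall back to the abstract."""
    return purpose if purpose else abstract
```

An item with only creation and publication dates would therefore use its publication date, and an item without a purpose would use its abstract.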

ArcGIS items

lantern.models.item.arcgis.ItemArcGIS

ArcGIS items represent Items as ArcGIS content.

Features include:

  • reflecting Item properties, such as summary, description, access permissions and licence constraints, etc. in ArcGIS content items consistently
  • establishing a one-to-many relationship between an Item and ArcGIS content items via ArcGIS item Metadata

Templates, stored in src/lantern/resources/templates/_arcgis, are used to:

  • combine the record abstract, lineage and link to the catalogue item as the ArcGIS item description
  • format supported licences to look consistent with Catalogue Items

ArcGIS items require a Record and an ArcGIS content item, represented by the lantern.lib.arcgis.gis.dataclasses.Item class, to set ArcGIS specific properties, such as the ArcGIS content type, needed to represent Items as valid ArcGIS content items (via an .item() property).

ArcGIS items sharing levels

The sharing level of an ArcGIS item is set based on the Item Access Level.

Warning

This logic does not take account of group based sharing options. Use with caution if this applies to an item.

ArcGIS items metadata

ArcGIS Item Metadata can store full metadata instances using a range of information models, including ISO 19115 (via the ISO 19139 encoding).

The ArcGIS Item class uses this feature to associate a catalogue resource with ArcGIS resources. This is implemented by storing the ISO file identifier, using the Esri vendor specific information model, as a one-way, one-to-many, child-parent relationship.

Note

Whilst uni-directional, the inverse of this relationship (from parent to children, i.e. from catalogue resources to ArcGIS resources) is effectively represented by ArcGIS layer distribution options within catalogue resources, which link to the respective ArcGIS content items.

ArcGIS items limitations

Warning

This section is Work in Progress (WIP) and may not be complete/accurate.

  • ArcGIS items are limited to the properties used in ArcGIS item pages that map to the ArcGIS content item information model (e.g. maintenance and progress information is not included as it cannot be represented)
  • only the OGL v3 licence is supported as a licence usage constraint, using others will raise an exception
  • group based sharing options are not supported
  • the Esri vendor specific Metadata used is too minimal to be considered valid by the ArcGIS Online/Portal metadata editor

Site checks

lantern.models.checks.Check

Represent checks run as part of Site Checks.

Consists of:

  • a check type for configuring how checks are run
  • properties to configure how to run a check (URL to check, expected HTTP status, etc.)
  • properties to track the execution of a check and record results (lifecycle state, actual HTTP status, duration, etc.)
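The three groups of properties above might be modelled along these lines. This is a hypothetical sketch and not the lantern.models.checks.Check class: every field name, the CheckState members, and the record_result method are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CheckState(Enum):
    """Assumed lifecycle states for a check."""

    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class Check:
    # Configuration: what kind of check this is and how to run it.
    check_type: str
    url: str
    expected_status: int = 200
    # Execution tracking: filled in once the check has run.
    state: CheckState = CheckState.PENDING
    actual_status: Optional[int] = None
    duration_seconds: Optional[float] = None

    def record_result(self, actual_status: int, duration_seconds: float) -> None:
        """Store the outcome of a run and update the lifecycle state."""
        self.actual_status = actual_status
        self.duration_seconds = duration_seconds
        passed = actual_status == self.expected_status
        self.state = CheckState.PASSED if passed else CheckState.FAILED
```

Separating configuration from results in one object lets the same instance be created up front, run later, and then reported on.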

Record checks

lantern.models.checks.RecordChecks

Represents a generator for additional checks to check selected properties in a given Record (e.g. distributions).

Tip

These checks are included via the Record ISO XML Output.

Static site metadata

lantern.models.site.SiteMeta

Site metadata represents site-wide information and page specific context, including:

  • base URL, build time, etc.
  • Open Graph metadata
    • which can be constructed via the lantern.models.site.OpenGraphMeta class
  • Schema.org metadata string
    • which can be constructed via the lantern.models.site.SchemaOrgMeta class

Export metadata

lantern.models.site.ExportMeta

Exporter metadata inherits from and extends Site metadata with additional properties for use by Outputs and Exporters, including:

Static site content

lantern.models.site.SiteContent

Static site content represents pages or other files within a Site generated by Outputs for use by Exporters.

Content items wrap a text or binary value with additional metadata including:

  • the relative path for the file within the static site
  • its media type and any optional profiles
  • optionally, Content Metadata
  • optionally, a redirect target (the URL the item should redirect to)
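The properties listed above could be represented roughly as below. This is an assumed stand-in rather than the lantern.models.site.SiteContent class: the field names, types, and the relative path check are illustrative only.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional, Union


@dataclass
class SiteContent:
    """Hypothetical stand-in wrapping a text or binary value with metadata."""

    content: Union[str, bytes]
    path: PurePosixPath  # relative path within the static site
    media_type: str
    profiles: list = field(default_factory=list)  # optional media type profiles
    metadata: dict = field(default_factory=dict)  # optional content metadata
    redirect: Optional[str] = None  # optional redirect target URL

    def __post_init__(self) -> None:
        # Assumed safeguard: paths are relative to the site root.
        if self.path.is_absolute():
            raise ValueError("path must be relative to the static site root")


page = SiteContent("<html></html>", PurePosixPath("items/foo/index.html"), "text/html")
```

Using default factories for the optional collections avoids sharing one mutable list or dict between content items.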

Tip

Where using a redirect, consider using a Site Redirect instead.

Static site content metadata

Site content items can optionally include a set of key-value pairs (as a dict).

These values MAY have a functional use, such as for determining outdated content, and/or for troubleshooting.

Note

Exporters may not support content metadata where the target storage system does not support it.

Static site redirects

lantern.models.site.SiteRedirect

Specialised form of Static Site Content representing redirects.

Auto-generates a minimal HTML page including lantern.models.site.SiteRedirect tag as a content value.

Static site page meta

lantern.models.site.SitePageMeta

Used by Static Site Pages to set information needed for HTML Metadata and Item Previews.