move upstream peer tracking and block validation out of select_chain stage #658

@rkuhn

Abstract

We bring the select_chain stage back within the complexity budget for a single stage by factoring out parts of its behaviour that are not intrinsically required for selecting chains.

Why?

Currently, select_chain does four things:

  1. it tracks upstream peer progress and asserts correct peer behaviour
  2. it selects a best chain candidate whenever new information becomes available
  3. it uses the ledger to validate blocks (EDIT: this is incorrect, validation happens before chain selection, which is arguably worse)
  4. it extends the chain for downstream peers upon successful validation

This is the most complex stage in the current setup and it needs to be simplified before we can make substantial improvements. The complexity needs to be factored into multiple orthogonal parts that can then be improved independently.

How?

Upstream peer tracking will be moved to the beginning of the consensus pipeline, in effect replacing the pull stage and enlarging its scope to also validate headers (because that is necessary to properly track the upstream peer’s state). This means that any header that is stored is already valid, avoiding duplication of work. Incorrect peer behaviour is recognised closer to the network stack and will trigger disconnections.
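
To make the intended shape of this stage more concrete, here is a minimal sketch in Rust. Everything in it is an assumption made for illustration: `Header`, `Misbehaviour` and `PeerTracker` are hypothetical names rather than existing Amaru types, and the two checks stand in for real header validation, which involves far more than this.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified header; real headers carry far more data.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Header {
    pub hash: u64,
    pub parent: u64,
    pub slot: u64,
}

/// Illustrative reasons for disconnecting an upstream peer.
#[derive(Debug, PartialEq, Eq)]
pub enum Misbehaviour {
    /// The header does not extend the tip this peer announced previously.
    DoesNotExtendTip,
    /// The slot number does not strictly increase along the peer's chain.
    NonIncreasingSlot,
}

/// Tracks the last accepted header for each upstream peer.
#[derive(Default)]
pub struct PeerTracker {
    tips: HashMap<String, Header>,
}

impl PeerTracker {
    /// Handle a roll-forward announced by `peer`.
    ///
    /// `Ok(())` means the header passed the (placeholder) checks and may be
    /// stored; `Err(_)` means the caller should disconnect the peer.
    pub fn roll_forward(&mut self, peer: &str, header: Header) -> Result<(), Misbehaviour> {
        if let Some(tip) = self.tips.get(peer) {
            if header.parent != tip.hash {
                return Err(Misbehaviour::DoesNotExtendTip);
            }
            if header.slot <= tip.slot {
                return Err(Misbehaviour::NonIncreasingSlot);
            }
        }
        // The first header from a peer is accepted as its starting point.
        self.tips.insert(peer.to_string(), header);
        Ok(())
    }

    /// Forget all state for a peer, for example after disconnecting it.
    pub fn disconnect(&mut self, peer: &str) {
        self.tips.remove(peer);
    }

    /// The tip currently tracked for `peer`, if any.
    pub fn tip(&self, peer: &str) -> Option<&Header> {
        self.tips.get(peer)
    }
}
```

Because a header only reaches storage after `roll_forward` returns `Ok(())`, later stages in this sketch never need to re-validate it, and the error value gives the network layer a concrete reason for the disconnection.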

Block validation will be moved into a new part of the pipeline, which will also fetch blocks using a strategy that allows batching to reduce the impact of network link latency. Our current setup requires one RTT plus bandwidth delay for each block, and it does this in 1:1 correspondence with header processing, thereby slowing the whole pipeline down. The main difficulty will be to correctly feed block validation errors back into select_chain to trigger the switch to a different fork (the approach is not unclear; it will just be the most subtle part of the logic).
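
The effect of batching and the feedback path can be sketched as follows. This is a rough illustration under stated assumptions, not the planned implementation: `BlockId`, `plan_batches`, `fetch_time`, `Feedback`, `validate_batch` and `best_candidate` are all hypothetical names, the cost model assumes one round trip per request with no pipelining, and "longest remaining candidate" is a stand-in for the real chain selection rule.

```rust
use std::time::Duration;

/// Hypothetical block identifier; real code would use a header hash type.
pub type BlockId = u64;

/// Split the blocks to download into requests of at most `max_batch`
/// blocks each, preserving chain order.
pub fn plan_batches(ids: &[BlockId], max_batch: usize) -> Vec<Vec<BlockId>> {
    assert!(max_batch > 0, "batch size must be positive");
    ids.chunks(max_batch).map(|c| c.to_vec()).collect()
}

/// Simplified cost model: one round trip per request plus a fixed
/// transfer time per block (assumption: requests are not pipelined).
pub fn fetch_time(blocks: u32, max_batch: u32, rtt: Duration, transfer: Duration) -> Duration {
    assert!(max_batch > 0, "batch size must be positive");
    let requests = (blocks + max_batch - 1) / max_batch; // ceiling division
    rtt * requests + transfer * blocks
}

/// Hypothetical message sent from block validation back to chain selection.
#[derive(Debug, PartialEq, Eq)]
pub enum Feedback {
    Validated(BlockId),
    Invalid(BlockId),
}

/// Validate a batch in chain order and stop at the first invalid block,
/// since nothing built on top of an invalid block can be adopted.
pub fn validate_batch(batch: &[BlockId], is_valid: impl Fn(BlockId) -> bool) -> Vec<Feedback> {
    let mut out = Vec::new();
    for &id in batch {
        if is_valid(id) {
            out.push(Feedback::Validated(id));
        } else {
            out.push(Feedback::Invalid(id));
            break;
        }
    }
    out
}

/// Chain selection side: ignore every candidate that contains a block
/// reported as invalid, then pick the longest remaining one.
pub fn best_candidate<'a>(
    candidates: &'a [Vec<BlockId>],
    invalid: &[BlockId],
) -> Option<&'a Vec<BlockId>> {
    candidates
        .iter()
        .filter(|c| !c.iter().any(|b| invalid.contains(b)))
        .max_by_key(|c| c.len())
}
```

With purely illustrative numbers (100 ms RTT, 10 ms transfer per block), `fetch_time` gives 1.1 s for ten blocks fetched one at a time and 300 ms for the same ten blocks in batches of five. On the selection side, reporting block 2 as invalid makes `best_candidate` skip the candidate `[1, 2, 3, 4]` and fall back to a fork such as `[1, 5, 6]`.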

Testing Strategy / Acceptance Criteria

Currently, e2e testing on CI has been reduced to run only up to epoch 176. With these changes it should easily run until epoch 182 again; it is reasonable to expect that it can run even further within 15 minutes.

Tests will be added for the disconnection of misbehaving upstream peers.

Discussion points

This will likely be implemented via multiple PRs, starting with factoring out upstream peer tracking.

Dependencies & Related Tasks

No response

Checklist

  • I understand that feature requests and unrefined work items should be opened as GitHub Discussions instead.
  • I have assigned this item to an existing milestone from the roadmap
  • I have added a label capturing the impact of this item (i.e. value for users/stakeholders if successful)
  • I have added a label capturing the delivery risk of this item (i.e. how likely is it that this task will succeed as planned)
  • I have added a label capturing the effort of this item (i.e. how large is the task?)

Metadata

    Labels

    TOPIC.Consensus: Mostly related to amaru-consensus / amaru-ouroboros

    Projects

    Status
    Done
