-
Notifications
You must be signed in to change notification settings - Fork 325
feat: add adr-001 for node key refactoring #608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 22 commits
4f9a004
8646a9a
6e5b081
4ee7804
9468ee2
0c2d610
1459a18
d8d0bf9
88885f1
5272de4
006d76c
7cc7280
4e6044f
0bf7486
a0dcc0e
ee11dff
d9f7d2e
10184ac
5f94844
85a90e7
8fb87b6
6bbf7f9
204d881
f743311
bf85d92
b064ede
6095bc4
7873267
fb0f182
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,154 @@ | ||
| # ADR ADR-001: Node Key Refactoring | ||
|
|
||
| ## Changelog | ||
|
|
||
| - 2022-10-31: First draft | ||
|
|
||
| ## Status | ||
|
|
||
| Proposed | ||
|
|
||
| ## Context | ||
|
|
||
| The original key format of IAVL nodes is a hash of the node. It does not take advantage of data locality on disk. Nodes are stored in random locations on the disk due to the random hash value, so it needs to scan the disk to find the corresponding node which can be very inefficient. | ||
|
|
||
| The `orphans` are used to manage node removal in the current design and allow the deletion of removed nodes for the specific version from the disk through the `DeleteVersion` API. It needs to track every time when updating the tree and also requires extra storage to store `orphans`. But there are only 2 use cases for `DeleteVersion`: | ||
|
|
||
| 1. Rollback of the tree to a previous version | ||
| 2. Remove unnecessary old nodes | ||
|
|
||
| ## Decision | ||
|
|
||
| - Use the version and the path as a node key like `bigendian(version) | byte array(path)` format. Here the `path` is a binary expression of the path from the root to the current node. | ||
| ``` | ||
| `10101` : (right, left, right, left, right) -> [0x15] | ||
| ``` | ||
| - Store only the child node key for the below version in node body writes. Because it is possible to get the child path for the same version. | ||
| ```go | ||
| func (node *Node) getLeftNode() (*Node, error) { | ||
| if node.leftNode != nil { | ||
| return node.leftNode, nil | ||
| } | ||
| if node.leftNodeKey != nil { | ||
| return getNode(node.leftNodeKey) // get the node from the storage | ||
| } | ||
| return getNode(&NodeKey{ | ||
| version: node.nodeKey.version, | ||
| path: node.nodeKey.path + '0', // it will be more complicated in the real implementation | ||
| }) | ||
|
aaronc marked this conversation as resolved.
Outdated
|
||
| } | ||
| ``` | ||
| - Remove the `version` field from node body writes. | ||
| - Remove the `leftHash` and `rightHash` fields, and instead store `hash` field in the node body. | ||
| - Remove the `orphans` completely from both tree and storage. | ||
|
|
||
| New node structure | ||
|
tac0turtle marked this conversation as resolved.
|
||
|
|
||
| ```go | ||
| type NodeKey struct { | ||
| version int64 | ||
| path []byte | ||
| } | ||
|
|
||
| type Node struct { | ||
| key []byte | ||
| value []byte | ||
| hash []byte // keep it in the storage instead of leftHash and rightHash | ||
| nodeKey *NodeKey // new field, the key in the storage | ||
| leftNodeKey *NodeKey // new field, need to store in the storage | ||
| rightNodeKey *NodeKey // new field, need to store in the storage | ||
| leftNode *Node | ||
| rightNode *Node | ||
| size int64 | ||
| subtreeHeight int8 | ||
| } | ||
| ``` | ||
|
|
||
| New tree structure | ||
|
|
||
| ```go | ||
| type MutableTree struct { | ||
| *ImmutableTree | ||
| lastSaved *ImmutableTree | ||
| versions map[int64]bool | ||
| allRootLoaded bool | ||
| unsavedFastNodeAdditions map[string]*fastnode.Node | ||
| unsavedFastNodeRemovals map[string]interface{} | ||
| ndb *nodeDB | ||
| skipFastStorageUpgrade bool | ||
|
|
||
| mtx sync.Mutex | ||
| } | ||
| ``` | ||
|
|
||
| We will assign the `nodeKey` when saving the current version in `SaveVersion`. It will reduce unnecessary checks in CRUD operations of the tree and keep sorted the order of insertion in the LSM tree. | ||
|
|
||
| ### Migration | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An alternative way to migrate is using the state streamer, resync the chain, and recreate the new iavl db with the state changes, then switch.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a breaking change, so validators need to update at once (at least 67% of them). So we need a migration process anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's not consensus breaking, since the root hashes don't change, validators don't need to update at the same time?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably depends on the migration mechanism. I think the one described below won't guarantee the same IAVL tree.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, today IAVL key is a hash, so this update will change the keys stored in the IAVL, hence it will change an order in the tree, and the tree Merkle Hash. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, it can migrate IAVL nodes one to one, change references between them, but the hashes of each node is not changed. |
||
|
|
||
| We can migrate nodes one by one by iterating the version. | ||
|
|
||
| - Iterate the version in order, and get the root node for the specific version. | ||
|
cool-develope marked this conversation as resolved.
Outdated
|
||
| - Iterate the tree based on the root and pick only nodes the node version is the same as the given version. | ||
| - Store them using the new node key format. | ||
|
|
||
| ### Pruning | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this section needs more details. |
||
|
|
||
| We assume keeping only the range versions of `fromVersion` to `toVersion`. Refer to [this issue](https://github.com/cosmos/cosmos-sdk/issues/12989). | ||
|
cool-develope marked this conversation as resolved.
Outdated
|
||
|
|
||
| Here we are introducing a new way how to get orphaned nodes which remove in the `n+1`th version updates. | ||
|
tac0turtle marked this conversation as resolved.
Outdated
|
||
|
|
||
| - Traverse the tree in-order way based on the root of `n+1`th version. | ||
| - If we visit the lower version node, pick the node and don't visit further deeply. Pay attention to the order of these nodes. | ||
| - Traverse the tree in-order way based on the root of `n`th version. | ||
|
tac0turtle marked this conversation as resolved.
|
||
| - Iterate the tree until meet the first node among the above nodes(stack) and delete all visited nodes so far from `n`th tree. | ||
| - Pop the first node from the stack and iterate again. | ||
|
|
||
| If we assume `1 to (n-1)` versions already been removed, when we want to remove the `n`th version, we can just remove the above orphaned nodes. | ||
|
|
||
| ### Rollback | ||
|
|
||
| When we want to rollback to the specific version `n` | ||
|
|
||
| - Iterate the version from `n+1`. | ||
| - Traverse key-value through `traversePrefix` with `prefix=bigendian(version)`. | ||
| - Remove all iterated nodes. | ||
|
|
||
| ## Consequences | ||
|
cool-develope marked this conversation as resolved.
|
||
|
|
||
| ### Positive | ||
|
|
||
| Using the version and the path, we take advantage of data locality in the LSM tree. Since we commit the sorted data, it can reduce compactions and makes it easy to find the key. Also, it can reduce the key and node size in the storage. | ||
|
cool-develope marked this conversation as resolved.
Outdated
cool-develope marked this conversation as resolved.
Outdated
|
||
|
|
||
| ``` | ||
| # node body | ||
|
|
||
| add `hash`: +32 byte | ||
| add `leftNodeKey`, `rightNodeKey`: max 8 + 8 = +16 byte | ||
| remove `leftHash`, `rightHash`: -64 byte | ||
| remove `version`: max -8 byte | ||
| ------------------------------------------------------------ | ||
| total save 24 byte | ||
|
|
||
| # node key | ||
|
|
||
| remove `hash`: -32 byte | ||
| add `version|path`: +16 byte | ||
| ------------------------------------ | ||
| total save 16 byte | ||
| ``` | ||
|
|
||
| Removing orphans also provides performance improvements including memory and storage saving. | ||
|
cool-develope marked this conversation as resolved.
Outdated
|
||
|
|
||
| ### Negative | ||
|
|
||
| The `Update` operation will require extra DB access because we need to take children to calculate the hash of updated nodes. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you mean to say that it requires extra DB reads? Why is this not needed in Set and Remove?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. because
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so we are getting
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the trade off here is storage saving on per node basis, right? maybe we mention this as a tradeoff
cool-develope marked this conversation as resolved.
Outdated
|
||
| It doesn't require more access in other cases including `Set`, `Remove`, and `Proof`. | ||
|
cool-develope marked this conversation as resolved.
Outdated
|
||
|
|
||
| It is impossible to remove the individual version. The new design requires more restrict pruning strategies. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you elaborate on this?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is enough, it will remove There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even if removing
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. lets edit to include this.
cool-develope marked this conversation as resolved.
Outdated
|
||
|
|
||
| ## References | ||
|
|
||
| - https://github.com/cosmos/iavl/issues/548 | ||
| - https://github.com/cosmos/iavl/issues/137 | ||
| - https://github.com/cosmos/iavl/issues/571 | ||
| - https://github.com/cosmos/cosmos-sdk/issues/12989 | ||
Uh oh!
There was an error while loading. Please reload this page.