Commit c61d1de

cool-develope, alexanderbez, robert-zaremba, and aaronc authored
feat: add adr-001 for node key refactoring (#608)
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com> Co-authored-by: Robert Zaremba <robert@zaremba.ch> Co-authored-by: Aaron Craelius <aaron@regen.network>
1 parent ad8c0eb commit c61d1de

2 files changed

Lines changed: 144 additions & 0 deletions

docs/architecture/README.md

Lines changed: 2 additions & 0 deletions
@@ -20,3 +20,5 @@ If recorded decisions turned out to be lacking, convene a discussion, record the
and then modify the code to match.

## ADR Table of Contents

- [ADR 001: Node Key Refactoring](./adr-001-node-key-refactoring.md)
Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
# ADR 001: Node Key Refactoring

## Changelog

- 2022-10-31: First draft

## Status

Proposed

## Context

IAVL nodes are currently keyed by the hash of the node. Because hashes are effectively random, nodes that are written together end up scattered across the key space, so this format takes no advantage of data locality in an LSM tree: it increases the number of compactions and makes nodes harder to find. The new key format takes advantage of data locality in the LSM tree and reduces the number of compactions.

In the current design, `orphans` manage node removal: they allow the nodes removed in a specific version to be deleted from disk through the `DeleteVersion` API. Orphans must be tracked on every tree update and require extra storage. However, `DeleteVersion` has only two use cases:

1. Rolling back the tree to a previous version
2. Removing unnecessary old nodes

## Decision

- Use the version and a local nonce as the node key, in the format `bigendian(version) | bigendian(nonce)`. The `nonce` is a local sequence ID within a single version.
- Store the child node keys (`leftNodeKey` and `rightNodeKey`) in the node body.
- Remove the `version` field from node body writes.
- Remove the `leftHash` and `rightHash` fields, and instead store the `hash` field in the node body.
- Remove `orphans` completely from both the tree and storage.
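
To make the key format concrete, here is a minimal sketch of encoding `bigendian(version) | bigendian(nonce)`. The `Bytes` method name and the fixed 12-byte layout are assumptions made for illustration; the ADR does not prescribe them. The sketch also assumes non-negative versions and nonces.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// NodeKey mirrors the struct proposed in this ADR.
type NodeKey struct {
	version int64
	nonce   int32
}

// Bytes encodes the key as bigendian(version) | bigendian(nonce): 8 + 4 = 12 bytes.
// The method name and exact layout are assumptions for illustration.
func (k NodeKey) Bytes() []byte {
	b := make([]byte, 12)
	binary.BigEndian.PutUint64(b[:8], uint64(k.version))
	binary.BigEndian.PutUint32(b[8:], uint32(k.nonce))
	return b
}

func main() {
	a := NodeKey{version: 1, nonce: 300}.Bytes()
	b := NodeKey{version: 2, nonce: 1}.Bytes()
	// Big-endian encoding makes byte order match (version, nonce) order,
	// so nodes of one version sit next to each other in the LSM tree.
	fmt.Println(len(a), bytes.Compare(a, b) < 0) // prints "12 true"
}
```

Because the byte order of the encoded keys matches the numeric order of `(version, nonce)`, all nodes written for one version form a contiguous key range.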

New node structure

```go
type NodeKey struct {
	version int64
	nonce   int32
}

type Node struct {
	key           []byte
	value         []byte
	hash          []byte   // stored in place of leftHash and rightHash
	nodeKey       *NodeKey // new field: this node's key in storage
	leftNodeKey   *NodeKey // new field: persisted to storage
	rightNodeKey  *NodeKey // new field: persisted to storage
	leftNode      *Node
	rightNode     *Node
	size          int64
	subtreeHeight int8
}
```

New tree structure

```go
type MutableTree struct {
	*ImmutableTree                                     // The current, working tree.
	lastSaved                *ImmutableTree            // The most recently saved tree.
	unsavedFastNodeAdditions map[string]*fastnode.Node // FastNodes that have not yet been saved to disk
	unsavedFastNodeRemovals  map[string]interface{}    // FastNodes that have not yet been removed from disk
	ndb                      *nodeDB
	skipFastStorageUpgrade   bool // If true, the tree behaves as if there were no fast storage and never upgrades it

	mtx sync.Mutex
}
```

The `nodeKey` is assigned when the current version is saved in `SaveVersion`. This removes unnecessary checks from the tree's CRUD operations and keeps insertions into the LSM tree in sorted order.
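
One way to picture key assignment at save time is the sketch below. It is not the actual `SaveVersion` implementation: the `assignNodeKeys` helper, the trimmed-down `Node`, and the pre-order numbering are all assumptions made for illustration, since the ADR does not specify a traversal order.

```go
package main

import "fmt"

type NodeKey struct {
	version int64
	nonce   int32
}

// Node is a trimmed-down stand-in for the ADR's Node struct.
type Node struct {
	key       string
	nodeKey   *NodeKey // nil until the node has been saved
	leftNode  *Node
	rightNode *Node
}

// assignNodeKeys (hypothetical helper) gives every unsaved node under n a key
// for the given version. A node that already has a key was saved in an earlier
// version, so it and its whole subtree are skipped.
// Pre-order numbering is an assumption of this sketch.
func assignNodeKeys(n *Node, version int64, nonce *int32) {
	if n == nil || n.nodeKey != nil {
		return
	}
	*nonce++
	n.nodeKey = &NodeKey{version: version, nonce: *nonce}
	assignNodeKeys(n.leftNode, version, nonce)
	assignNodeKeys(n.rightNode, version, nonce)
}

func main() {
	// "a" was saved in version 1; "b" and "c" are new in version 2.
	old := &Node{key: "a", nodeKey: &NodeKey{version: 1, nonce: 2}}
	root := &Node{key: "b", leftNode: old, rightNode: &Node{key: "c"}}

	var nonce int32
	assignNodeKeys(root, 2, &nonce)
	fmt.Println(*root.nodeKey, *old.nodeKey, *root.rightNode.nodeKey)
}
```

Deferring assignment to save time means in-memory updates never need to look up or reserve a key, and the keys written for one version are produced in increasing order.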

### Migration

We can migrate nodes through the following steps:

- Export a snapshot of the tree from the original version.
- Import the snapshot into the new version.
- Track the nonce for each version using an int32 array whose length is the number of versions.
- Assign the `nodeKey` when saving the node.
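
The nonce-tracking step above can be sketched as follows. The `nonceTracker` type and its methods are hypothetical names, and the sequence of versions in `main` is made-up sample input; the only detail taken from the ADR is that one int32 counter is kept per version.

```go
package main

import "fmt"

type NodeKey struct {
	version int64
	nonce   int32
}

// nonceTracker (hypothetical) hands out per-version nonces during import.
// It holds one int32 counter per version, as the migration steps describe.
type nonceTracker struct {
	last []int32 // last[v] is the last nonce issued for version v
}

func newNonceTracker(latestVersion int64) *nonceTracker {
	// Index 0 is unused so that versions can be used as indices directly.
	return &nonceTracker{last: make([]int32, latestVersion+1)}
}

// assign returns the next node key for a node that was created in `version`.
func (nt *nonceTracker) assign(version int64) NodeKey {
	nt.last[version]++
	return NodeKey{version: version, nonce: nt.last[version]}
}

func main() {
	tracker := newNonceTracker(3)
	// Versions of exported nodes in the order an importer might see them.
	for _, v := range []int64{1, 3, 1, 2, 3} {
		fmt.Println(tracker.assign(v))
	}
}
```

The array grows linearly with the number of versions, which is the memory cost noted under the negative consequences.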

### Pruning

The current pruning strategies allow intermediate versions to exist. With the adoption of this ADR, only a contiguous range of versions may exist (for example, 50-100 instead of 1, 25, 50-100).

This ADR introduces a new way to find the nodes orphaned by the updates of the `n+1`th version without storing orphans in storage.

When we want to remove the `n+1`th version:

- Traverse the tree in-order, starting from the root of the `n+1`th version.
- If we visit a node from a lower version, collect it and do not descend any further. Preserve the order in which these nodes are collected.
- Traverse the tree in-order, starting from the root of the `n`th version.
- Iterate over the tree until we meet the first of the nodes collected above (the stack), and delete all nodes visited so far from the `n`th tree.
- Pop the first node from the stack and iterate again.

If we assume that versions `1` to `n-1` have already been removed, then removing the `n`th version only requires removing the orphaned nodes found above.
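
The idea behind these steps can be sketched as below. This is a simplified variant, not the procedure as written: it records the shared nodes in a set and looks them up, instead of matching them in order against a stack. The function names and the in-memory `Node` are assumptions for illustration.

```go
package main

import "fmt"

type NodeKey struct {
	version int64
	nonce   int32
}

type Node struct {
	nodeKey   NodeKey
	leftNode  *Node
	rightNode *Node
}

// collectShared records the nodes reachable from the root of `version` that
// were created in an earlier version. Their subtrees are shared as well, so
// the traversal does not descend into them.
func collectShared(n *Node, version int64, shared map[NodeKey]bool) {
	if n == nil {
		return
	}
	if n.nodeKey.version < version {
		shared[n.nodeKey] = true
		return
	}
	collectShared(n.leftNode, version, shared)
	collectShared(n.rightNode, version, shared)
}

// collectOrphans walks the older tree in-order and returns every node that
// the newer tree no longer references.
func collectOrphans(n *Node, shared map[NodeKey]bool, out []NodeKey) []NodeKey {
	if n == nil || shared[n.nodeKey] {
		return out
	}
	out = collectOrphans(n.leftNode, shared, out)
	out = append(out, n.nodeKey)
	return collectOrphans(n.rightNode, shared, out)
}

func main() {
	// Version 1: root1 with children a and b.
	a := &Node{nodeKey: NodeKey{1, 2}}
	b := &Node{nodeKey: NodeKey{1, 3}}
	root1 := &Node{nodeKey: NodeKey{1, 1}, leftNode: a, rightNode: b}
	// Version 2 rewrites b, which also produces a new root; a is reused.
	b2 := &Node{nodeKey: NodeKey{2, 2}}
	root2 := &Node{nodeKey: NodeKey{2, 1}, leftNode: a, rightNode: b2}

	shared := map[NodeKey]bool{}
	collectShared(root2, 2, shared)
	fmt.Println(collectOrphans(root1, shared, nil)) // prints "[{1 1} {1 3}]"
}
```

In this example the old root and the old `b` are orphaned by version 2, while `a` survives because the version 2 tree still points at it.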

### Rollback

When we want to roll back to a specific version `n`:

- Iterate over the versions, starting from `n+1`.
- Traverse the key-value pairs through `traversePrefix` with `prefix=bigendian(version)`.
- Remove all iterated nodes.
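
The rollback steps can be illustrated with an in-memory map standing in for the node database. This is a sketch under that assumption: `rollback`, `nodeKey`, and `versionPrefix` are hypothetical helpers, and a real store would use a prefix iterator such as the `traversePrefix` mentioned above rather than scanning a map.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// nodeKey builds bigendian(version) | bigendian(nonce).
func nodeKey(version int64, nonce int32) []byte {
	b := make([]byte, 12)
	binary.BigEndian.PutUint64(b[:8], uint64(version))
	binary.BigEndian.PutUint32(b[8:], uint32(nonce))
	return b
}

// versionPrefix is the 8-byte prefix shared by every node key of a version.
func versionPrefix(version int64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, uint64(version))
	return b
}

// rollback removes every node written after version n and returns how many
// entries were deleted. The map is a stand-in for the node database.
func rollback(store map[string][]byte, n, latest int64) int {
	removed := 0
	for v := n + 1; v <= latest; v++ {
		prefix := versionPrefix(v)
		for k := range store {
			if bytes.HasPrefix([]byte(k), prefix) {
				delete(store, k) // deleting during range is safe in Go
				removed++
			}
		}
	}
	return removed
}

func main() {
	store := map[string][]byte{}
	for _, k := range [][]byte{nodeKey(1, 1), nodeKey(2, 1), nodeKey(2, 2), nodeKey(3, 1)} {
		store[string(k)] = []byte("node body")
	}
	fmt.Println(rollback(store, 1, 3), len(store)) // prints "3 1"
}
```

Because every node of a version shares the same 8-byte prefix, a rollback never needs to walk the tree; it only deletes key ranges.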

## Consequences

### Positive

* Using the version and a local nonce as the key takes advantage of data locality in the LSM tree. Because we commit sorted data, this can reduce compactions and make keys easier to find. It can also reduce the key and node size in storage.

```
# node body

add `hash`: +32 bytes
add `leftNodeKey`, `rightNodeKey`: max (8 + 4) * 2 = +24 bytes
remove `leftHash`, `rightHash`: -64 bytes
remove `version`: max -8 bytes
------------------------------------------------------------
total saved: 16 bytes

# node key

remove `hash`: -32 bytes
add `version|nonce`: +12 bytes
------------------------------------
total saved: 20 bytes
```

* Removing orphans also improves performance, including memory and storage savings.

### Negative

* `Update` operations will require extra DB access because we need to load the children to calculate the hash of updated nodes.
    * No extra access is required in other cases, including `Set`, `Remove`, and `Proof`.

* It is impossible to remove an individual version. The new design requires stricter pruning strategies.

* Importing the tree may require more memory because of the int32 array whose length is the number of versions. We will introduce a new importing strategy to reduce memory usage.

## References

- https://github.com/cosmos/iavl/issues/548
- https://github.com/cosmos/iavl/issues/137
- https://github.com/cosmos/iavl/issues/571
- https://github.com/cosmos/cosmos-sdk/issues/12989
