WIP: Integrate ISA-L & Generalised Erasure Coding.#80

Draft
BlamKiwi wants to merge 367 commits into koverstreet:master from BlamKiwi:isal

BlamKiwi commented Dec 1, 2019

Over the weekend I got ISA-L building and integrated CRC64 (5-15x speed-up on a Ryzen 2200G) as a quick proof point. I would like some quick feedback before tackling full Erasure Coding.
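
For anyone who wants something to check an accelerated CRC64 path against, here is a minimal bit-at-a-time reference. It assumes the checksum in question is the MSB-first ECMA-182 CRC64 (the variant the kernel's `crc64_be()` implements); the name `crc64_ref` is made up for this sketch and is not code from this PR.

```c
#include <stddef.h>
#include <stdint.h>

/* ECMA-182 polynomial, MSB-first (assumed to match the kernel's crc64_be()). */
#define CRC64_ECMA182_POLY 0x42F0E1EBA9EA3693ULL

/*
 * Bit-at-a-time reference CRC64. Slow, but easy to audit, which makes it
 * useful as an oracle when testing an optimised implementation.
 * Pass the previous return value as `crc` to checksum data in chunks.
 */
uint64_t crc64_ref(uint64_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		/* Fold the next byte into the top of the register. */
		crc ^= (uint64_t)*p++ << 56;
		for (int bit = 0; bit < 8; bit++) {
			if (crc & (1ULL << 63))
				crc = (crc << 1) ^ CRC64_ECMA182_POLY;
			else
				crc <<= 1;
		}
	}
	return crc;
}
```

Running both implementations over the same random buffers and comparing results is a cheap way to catch polynomial or bit-order mismatches before benchmarking.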

KBuild -
I've added ISA-L and EC as boolean flags to KBuild. I assume you don't want EC support as a separate module?

Makefile -
The ISA-L code builds without modification from Intel's upstream. This has resulted in very verbose KBuild Special Rules due to the NASM dependency and unnecessary CRC implementations. I would be interested in advice for a better approach until I can port ISA-L to GAS and strip out unused code.

Accel.h/c -
This is a temporary integration point for accessing optimised primitives. I intend to move them to the appropriate kernel lib folders once everything is working.
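
A rough sketch of what such an integration point could look like: a function pointer that defaults to a portable implementation and can be swapped for an optimised one at init time. Every name here (`accel_xor`, `accel_set_xor`, `xor_generic`) is hypothetical and chosen for illustration; the actual Accel.h/c in this branch may be organised quite differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical primitive: XOR `src` into `dst` (the P-parity building block). */
typedef void (*accel_xor_fn)(uint8_t *dst, const uint8_t *src, size_t len);

/* Portable fallback used when no optimised routine has been registered. */
static void xor_generic(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

static accel_xor_fn xor_impl = xor_generic;

/* Register an optimised routine; passing NULL restores the fallback. */
void accel_set_xor(accel_xor_fn fn)
{
	xor_impl = fn ? fn : xor_generic;
}

/* Callers use this wrapper and never see which backend is active. */
void accel_xor(uint8_t *dst, const uint8_t *src, size_t len)
{
	xor_impl(dst, src, len);
}
```

Keeping callers behind one wrapper means the primitives can later move into the kernel lib folders without touching their users.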

MD-RAID Compatibility -
The website TODO list mentions Andrea Mazzoleni's technique of combining Vandermonde and Cauchy matrices to implement Erasure Coding compatible with MD-RAID. To begin with I won't be implementing this technique. Once things are stable I will dig into the mathematics.
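
For context, the arithmetic underneath both matrix families is GF(2^8). The sketch below assumes the 0x11d reduction polynomial (the one used by Linux RAID-6) and builds a plain Cauchy coefficient 1/(x_i + y_j) with x_i = k + i and y_j = j. That choice of points and all function names are assumptions for illustration; this is not Mazzoleni's combined construction and not code from this PR.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		/* Multiply a by x, reducing if the x^8 term appears. */
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return r;
}

/* Inverse via a^254 (since a^255 == 1 for nonzero a). a must be nonzero. */
uint8_t gf_inv(uint8_t a)
{
	uint8_t r = 1;

	for (int i = 0; i < 254; i++)
		r = gf_mul(r, a);
	return r;
}

/*
 * Cauchy coefficient for parity row i and data column j, with k data
 * blocks: 1 / (x_i + y_j), x_i = k + i, y_j = j. Addition in GF(2^8)
 * is XOR, and x_i != y_j as long as k + i < 256.
 */
uint8_t cauchy_coeff(unsigned k, unsigned i, unsigned j)
{
	return gf_inv((uint8_t)((k + i) ^ j));
}

/* Parity byte i for one byte position across k data blocks. */
uint8_t ec_parity_byte(unsigned k, unsigned i, const uint8_t *data)
{
	uint8_t p = 0;

	for (unsigned j = 0; j < k; j++)
		p ^= gf_mul(cauchy_coeff(k, i, j), data[j]);
	return p;
}
```

A Cauchy matrix has the property that every square submatrix is invertible, which is what lets any combination of lost blocks (up to the parity count) be reconstructed. Real implementations replace `gf_mul` with table lookups or SIMD shuffles.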

koverstreet and others added 30 commits November 18, 2019 11:48
We weren't checking for errors when trying to delete stripes, which meant
ec_stripe_delete_work() would spin trying to delete the same stripe over
and over.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
If there is only a single entry at 0, the first time we call xas_next(),
we return the entry.  Unfortunately, all subsequent times we call
xas_next(), we also return the entry at 0 instead of noticing that the
xa_index is now greater than zero.  This broke find_get_pages_contig().

Fixes: 64d3e9a ("xarray: Step through an XArray")
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
There was a null ptr deref when there wasn't a stripes heap allocated

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Change it to not mark keys that will be overwritten by keys in the
journal - this fixes a bug where we pop an assertion in
bucket_set_stripe() because of a stale pointer - because the stripe that
has the stale pointer has been deleted.

This code could be factored out and used elsewhere, at some point.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Actual repair code will come later, but this is a start

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With reflink, we'll no longer be able to calculate the offset of the
data we want into the extent we're reading from using the extent pos and
the iter pos - we'll have to pass it in separately.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
for_each_btree_key() calls bch2_trans_get_iter() - we have to reset the
transaction state before getting the iterator again, in the retry path

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Where unlink_on_commit is used, on unsuccessful commit we're likely
retrying the whole update and we're going to be using the same iterators
again.

The management of multiple iterators needs to be gone over a fair bit
more at some point...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Prep work for reflink - for reflink, we're going to be using
bch2_extent_update() with other updates in the same transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Minor cleanup - prep work for new key types for reflink

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With reflink, various code now has to handle both KEY_TYPE_extent
or KEY_TYPE_reflink_v - so, convert it to be generic across all keys
with pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
More prep work for reflink: for extents, we're not looking for an exact
match on pos, rather that the pos is within the range of the key the
iterator points to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_btree_node_iter_prev_filter() tried to be smart about iterating
backwards when skipping over whiteouts/discards - but unfortunately,
doing so can leave the node iterator in an inconsistent state; the sane
solution is to just always iterate backwards one key at a time.

But we compact btree nodes when more than a quarter of the keys are
whiteouts/discards, so the optimization wasn't buying us that much
anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

4 participants