yelhousni
reviewed
Sep 21, 2021
Collaborator
LGTM. The choice of C is not theoretically optimal, but in practice this leads to nice speedups, especially when many scalars are equal to 1.
In a few instances, inputs to the msm (MultiExp) may contain a significant number of 0 and 1 scalars. For large instances, the cost of filtering these values out is high, since we must allocate a large area of memory and copy.
Zeroes are not too costly, and in practice we don't seem to suffer from a high number of branch mispredictions. Each goroutine processing a chunk of bits tests the digit it gets; if it is zero, it does nothing (and all the other goroutines running at the same time do nothing either, since all the digits are 0).
However, "1" values mean that the first goroutine (processing the c lowest bits of the scalars) does more work than the other goroutines (which only hit "0" digits).
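The imbalance described above can be illustrated with a small sketch. It assumes 64-bit scalars and plain unsigned c-bit windows, which is a simplification: the real scalars are multi-limb field elements and the library may recode digits differently. The names digitAt and nonZeroDigitsPerChunk are hypothetical and not taken from the code base.

```go
package main

import "fmt"

// digitAt returns the c-bit digit of s at the given chunk index
// (chunk 0 holds the c lowest bits). Hypothetical helper; assumes 1 <= c <= 64.
func digitAt(s uint64, chunk, c uint) uint64 {
	mask := uint64(1)<<c - 1
	return (s >> (chunk * c)) & mask
}

// nonZeroDigitsPerChunk counts, for each chunk, how many scalars have a
// non-zero digit there. Each count is a rough proxy for the work done by
// the goroutine assigned to that chunk.
func nonZeroDigitsPerChunk(scalars []uint64, c uint) []int {
	nbChunks := (64 + c - 1) / c // ceil(64 / c)
	counts := make([]int, nbChunks)
	for _, s := range scalars {
		for k := uint(0); k < nbChunks; k++ {
			if digitAt(s, k, c) != 0 {
				counts[k]++
			}
		}
	}
	return counts
}

func main() {
	// Only 0 and 1 scalars: every non-zero digit lands in chunk 0.
	scalars := []uint64{0, 1, 1, 1, 0, 1}
	fmt.Println(nonZeroDigitsPerChunk(scalars, 16)) // prints [4 0 0 0]
}
```

With c = 16, the goroutine for chunk 0 sees four non-zero digits while the other three see none, which is the skew the description refers to.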
To work around this while avoiding a memory copy, we count the number of scalars in which only the c lowest bits are set. If these represent more than 10% of the total msm instance, we spawn 2 goroutines for the msm, each processing half of the scalars.
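A minimal sketch of this heuristic follows, again assuming 64-bit scalars for brevity. The function names and the msm callback are hypothetical stand-ins, not the PR's actual code, and the sketch reads "only the c lowest bits are set" as "non-zero and smaller than 2^c", which is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// countSmallScalars counts the non-zero scalars that fit in the c lowest bits.
// Assumes c < 64.
func countSmallScalars(scalars []uint64, c uint) int {
	n := 0
	for _, s := range scalars {
		if s != 0 && s>>c == 0 {
			n++
		}
	}
	return n
}

// shouldSplit reports whether small scalars make up more than 10% of the input.
func shouldSplit(scalars []uint64, c uint) bool {
	return countSmallScalars(scalars, c)*10 > len(scalars)
}

// runMSM calls the hypothetical msm callback once on the whole input, or
// concurrently on two halves when the heuristic fires. It returns the number
// of msm calls made. Combining the partial results is left out.
func runMSM(scalars []uint64, c uint, msm func(part []uint64)) int {
	if !shouldSplit(scalars, c) {
		msm(scalars)
		return 1
	}
	mid := len(scalars) / 2
	var wg sync.WaitGroup
	for _, part := range [][]uint64{scalars[:mid], scalars[mid:]} {
		wg.Add(1)
		go func(p []uint64) {
			defer wg.Done()
			msm(p)
		}(part)
	}
	wg.Wait()
	return 2
}

func main() {
	scalars := []uint64{1, 1, 5, 1 << 40, 0, 1 << 20}
	// 3 of the 6 scalars fit in 16 bits, so the input is split.
	fmt.Println(runMSM(scalars, 16, func(p []uint64) {})) // prints 2
}
```

No scalars are copied here: the two halves are sub-slices of the original input, and the two partial results would simply be added together at the end.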
This is likely not an optimal solution, and we may iterate to improve complexity for all cases.
However, in practice, it allows the msm instance in such cases to finish faster, since most of the goroutines finish at the same time.