Increase performance and reduce allocations in hclsyntax.LexConfig #490
apparentlymart merged 3 commits into hashicorp:main

Conversation
Hi! Any updates on this?
Force-pushed from 2856001 to c4f9e5b
Force-pushed from c31e897 to 818a4cf
Hi @inkel! Sorry for not responding sooner. I had to do some tooling maintenance on this repository to get the per-commit checks working again before review, and that's now done, so we have the tests running here again.

At the time I'm writing this there seems to be a test failure, but my initial investigation suggests it's a pre-existing issue unrelated to what you proposed here: a test currently relies on a particular iteration order for maps in Go, which Go does not guarantee, so there's a 50% chance of the test failing. I'd like to get that fixed first, because the code you modified here was introduced specifically to fix an earlier bug around accidental reuse of values. Although what you implemented does look correct to me as a human reviewer (the closure you introduced seems to allocate a new copy of the range for each use), I'd like to confirm that confidence by seeing the tests run cleanly before we move forward. Thanks for working on this!
Hey @apparentlymart, thanks for taking the time to review this! What you say makes total sense to me. I'll be happy to help with anything you need.
Force-pushed from 818a4cf to a2faa08
Doing this reduces the memory used by ~11%, as the following benchstat
comparison shows:
name old time/op new time/op delta
LexConfig-12 9.27µs ± 0% 9.03µs ± 1% -2.55% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
LexConfig-12 8.94kB ± 0% 7.98kB ± 0% -10.74% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
LexConfig-12 37.0 ± 0% 37.0 ± 0% ~ (all equal)
Benchmarks were created using:
go test -benchmem -benchtime=200000x -count=10 -bench=.
In a similar fashion to the parent commit, instead of always copying
tok.Range for later use, we define a function that returns the copied
value, so we only allocate the copy when it's actually needed.
For the benchmark introduced earlier, the reduction in allocations and
memory usage is outstanding:
name old time/op new time/op delta
LexConfig-12 9.05µs ± 1% 7.83µs ± 1% -13.54% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
LexConfig-12 7.98kB ± 0% 6.06kB ± 0% -24.07% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
LexConfig-12 37.0 ± 0% 7.0 ± 0% -81.08% (p=0.000 n=10+10)
Benchmarks were created using:
go test -benchmem -benchtime=200000x -count=10 -bench=.
Force-pushed from a2faa08 to 3d38be3
Hi again, @inkel! Thanks for your patience. With the automated test runs now working we can see the existing tests passing, and for good measure I also tried some manual tests to convince myself that this approach does indeed result in each diagnostic pointing to a separate range value. This does mean that if there are multiple diagnostics in a single iteration we will now allocate a new range for each of them. Thanks again for working on this! I'm going to merge it now.
@apparentlymart thank you for merging this and for the detailed explanation! |
Hello, team. First of all, thank you for sharing this library.

While working on an internal project that uses this library to parse `.hcl` files to retrieve a list of used modules, I added some benchmarks and found that one of my bottlenecks was in `hclsyntax.LexConfig`. Further inspection highlighted `checkInvalidTokens` as one of the culprits (the other being `emitToken`, which is in a generated file and thus I didn't want to mess much with).

I took the liberty of adding a benchmark for `LexConfig` with a rather small HCL file, but one that serves the purpose of finding out whether I was able to improve the results. I then ran the benchmarks against the original code, found where most of the allocations were being made, and located a spot responsible for most of them. I then worked on reducing the allocation count and bytes, and compared the benchmark results from the original code and the last commit.

According to `benchstat`, the difference between both benchmarks shows this reduced the number of allocations by ~80% while using 32% fewer bytes. As a side effect, performance improved by ~15%.

I hope you find this useful and worthy of merging, thanks!