Optimize parser on large Viper files by fpoli · Pull Request #477 · viperproject/silver

fpoli · 2020-11-09T10:42:33Z

Profiling Silicon on a large Viper file (6k loc), the computeFrom in PosComputer is marked as "hot". This PR replaces a linear scan in its implementation with a binary search, which even counting the preprocessing should be faster than before on any input case. On the 6k loc Viper file this removes ~3s (tested on 4 cold runs of Silicon).

Is there any test suite that checks the precise line/column of Viper's error messages?

mschwerhoff · 2020-11-09T17:36:32Z

Profiling Silicon on a large Viper file (6k loc), the computeFrom in PosComputer is marked as "hot". This PR replaces a linear scan in its implementation with a binary search, which even counting the preprocessing should be faster than before on any input case.

Interesting observation! I wonder if it would even be possible (and reasonable) to use some kind of specialised data structure for spatial information here.

On the 6k loc Viper file this removes ~3s (tested on 4 cold runs of Silicon).

You could also benchmark both versions against the Viper test suite, using this benchmark script to repeatedly execute a version and record runtime statistics in a CSV file.

Is there any test suite that checks the precise line/column of Viper's error messages?

Our test annotations, e.g. //:: ExpectedOutput(...) rely on positional information, if I'm not mistaken, and similar for caching in the IDE.

fpoli · 2020-11-10T08:41:37Z

Interesting observation! I wonder if it would even be possible (and reasonable) to use some kind of specialised data structure for spatial information here.

I don't know the parser well enough to answer this. I would say it depends on what kind of assumptions we can make. For example, if the index argument of consecutive computeFrom calls always increases and if it's not called concurrently, we can remove the binary search and make the time complexity O(1).

You could also benchmark both versions against the Viper test suite, using this benchmark script to repeatedly execute a version and record runtime statistics in a CSV file.

Nice, thanks! I didn't know that. Before the optimization:

[info] - /tmp/programs/Knights_tour.rs.vpr [Silicon-Statistics-Silver]
[info]   + [Benchmark] Trimmed 2 marginal runs. 
[info]   + [Benchmark] Mean /  8 runs:   2.29 sec  (Parsing),   5.83 sec  (Semantic Analysis),    202 msec (Translation),    565 msec (Consistency Check),  36.65 sec  (Verification),  45.54 sec  (Overall). 
[info]   + [Benchmark] Stddevs:           195 msec (Parsing),    512 msec (Semantic Analysis),     32 msec (Translation),     47 msec (Consistency Check),   1.46 sec  (Verification),   1.41 sec  (Overall). 
[info]   + [Benchmark] Rel. stddev:         8 %    (Parsing),      8 %    (Semantic Analysis),     15 %    (Translation),      8 %    (Consistency Check),      3 %    (Verification),      3 %    (Overall). 
[info]   + [Benchmark] Best run:         2.15 sec  (Parsing),   6.35 sec  (Semantic Analysis),    169 msec (Translation),    538 msec (Consistency Check),  34.58 sec  (Verification),  43.78 sec  (Overall). 
[info]   + [Benchmark] Median run:       2.31 sec  (Parsing),   4.92 sec  (Semantic Analysis),    266 msec (Translation),    664 msec (Consistency Check),  38.15 sec  (Verification),  46.31 sec  (Overall). 
[info]   + [Benchmark] Worst run:        2.08 sec  (Parsing),   6.04 sec  (Semantic Analysis),    233 msec (Translation),    620 msec (Consistency Check),  38.88 sec  (Verification),  47.85 sec  (Overall).

After the optimization:

[info] - /tmp/programs/Knights_tour.rs.vpr [Silicon-Statistics-Silver]
[info]   + [Benchmark] Trimmed 2 marginal runs. 
[info]   + [Benchmark] Mean /  8 runs:    404 msec (Parsing),   4.62 sec  (Semantic Analysis),    195 msec (Translation),    519 msec (Consistency Check),  29.49 sec  (Verification),  35.23 sec  (Overall). 
[info]   + [Benchmark] Stddevs:            48 msec (Parsing),    593 msec (Semantic Analysis),     63 msec (Translation),     93 msec (Consistency Check),   2.91 sec  (Verification),   3.59 sec  (Overall). 
[info]   + [Benchmark] Rel. stddev:        11 %    (Parsing),     12 %    (Semantic Analysis),     32 %    (Translation),     17 %    (Consistency Check),      9 %    (Verification),     10 %    (Overall). 
[info]   + [Benchmark] Best run:          345 msec (Parsing),   4.01 sec  (Semantic Analysis),    145 msec (Translation),    445 msec (Consistency Check),  26.69 sec  (Verification),  31.63 sec  (Overall). 
[info]   + [Benchmark] Median run:        439 msec (Parsing),   4.67 sec  (Semantic Analysis),    167 msec (Translation),    495 msec (Consistency Check),  28.07 sec  (Verification),  33.84 sec  (Overall). 
[info]   + [Benchmark] Worst run:         501 msec (Parsing),   5.66 sec  (Semantic Analysis),    331 msec (Translation),    615 msec (Consistency Check),  36.15 sec  (Verification),  43.25 sec  (Overall).

Verification takes ~7s less probably due to a different seed, but parsing clearly got 5x faster.

Our test annotations, e.g. //:: ExpectedOutput(...) rely on positional information, if I'm not mistaken, and similar for caching in the IDE.

Are those tests already run as part of sbt test?

mschwerhoff · 2020-11-10T17:24:35Z

I don't know the parser well enough to answer this. I would say it depends on what kind of assumptions we can make. For example, if the index argument of consecutive computeFrom calls always increases and if it's not called concurrently, we can remove the binary search and make the time complexity O(1).

I'm not sure, either. @fabiopakk Do you know more about this?

Verification takes ~7s less probably due to a different seed, but parsing clearly got 5x faster.

Nice effect on parsing!

Our test annotations, e.g. //:: ExpectedOutput(...) rely on positional information, if I'm not mistaken, and similar for caching in the IDE.

Are those tests already run as part of sbt test?

Yes: if you run sbt test in the Silicon directory, our Viper test suite is run, plus a few Silicon-specific tests. Similarly for Carbon.

fpoli · 2020-12-07T16:03:13Z

@mschwerhoff Should I merge or should someone review this?

fpoli · 2020-12-07T16:34:43Z

+    // implementation of `computeFrom` used to do.
    val lines = s.linesWithSeparators
-    _lines = lines.map(_.length).toArray
+    _line_offset = (lines.map(_.length) ++ Seq(0)).toArray


As described in the comment above, the ++ Seq(0) here is used to accurately reproduce an edge case of the old behavior. However, It's not clear to me how important that edge case is. Maybe we can ignore it.

mschwerhoff · 2020-12-07T16:35:19Z

@mschwerhoff Should I merge or should someone review this?

No objections from my side

fabiopakk · 2020-12-07T18:09:44Z

Really interesting work, @fpoli. Thanks for that. I'm changing the PosComputer in the new FastParse, but I think your work can still be useful there. I'll merge into the current version and try to make its way into FastParse 2.2.2.

Use a binary search in PosComputer#computeFrom

8cc6f64

fpoli changed the title ~~Use a binary search in PosComputer#computeFrom~~ Optimize parser on large Viper files Nov 9, 2020

fpoli requested a review from fabiopakk November 9, 2020 10:43

fpoli commented Dec 7, 2020

View reviewed changes

mschwerhoff approved these changes Dec 7, 2020

View reviewed changes

fabiopakk approved these changes Dec 7, 2020

View reviewed changes

fabiopakk closed this Dec 7, 2020

fabiopakk reopened this Dec 7, 2020

fabiopakk merged commit 1828cf8 into master Dec 7, 2020

bobismijnnaam mentioned this pull request Jan 18, 2021

Build problems with VerCors which uses Viper as a library #492

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize parser on large Viper files#477

Optimize parser on large Viper files#477
fabiopakk merged 1 commit into
masterfrom
optimize-parser

fpoli commented Nov 9, 2020

Uh oh!

mschwerhoff commented Nov 9, 2020

Uh oh!

fpoli commented Nov 10, 2020

Uh oh!

mschwerhoff commented Nov 10, 2020 •

edited

Loading

Uh oh!

fpoli commented Dec 7, 2020

Uh oh!

fpoli Dec 7, 2020

Uh oh!

mschwerhoff commented Dec 7, 2020

Uh oh!

fabiopakk commented Dec 7, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

fpoli commented Nov 9, 2020

Uh oh!

mschwerhoff commented Nov 9, 2020

Uh oh!

fpoli commented Nov 10, 2020

Uh oh!

mschwerhoff commented Nov 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fpoli commented Dec 7, 2020

Uh oh!

fpoli Dec 7, 2020

Choose a reason for hiding this comment

Uh oh!

mschwerhoff commented Dec 7, 2020

Uh oh!

fabiopakk commented Dec 7, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mschwerhoff commented Nov 10, 2020 •

edited

Loading