Commit 035987d
Make sanitizeGpxContent work on long input strings
Change the regular expression so that the matched and captured text is as short as possible, and so that the expression only matches <time> and </time> tags that are related (that belong to the same element).
In the original version, because of the .* (dot asterisk) in the expression, the expression matches and captures the text between a <time> tag and a </time> tag that are, most often, unrelated. This produces large captures, which only succeed if the pcre.backtrack_limit PHP setting is raised, and it causes a huge number of <time>-to-</time> tag combinations to be tried.
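A minimal sketch of the problem. The commit page does not show the actual expression, so the pattern `<time>(.*)</time>` and the GPX fragment below are hypothetical, and Python's `re` module stands in for PHP's PCRE engine (both treat `.*` as greedy). On a single-line input, the one match that is found runs from the first <time> tag to the last </time> tag:

```python
import re

# Hypothetical GPX fragment: three trackpoints on one line, as in a minified file.
gpx = (
    '<trkpt><time>2020-05-01T10:00:00Z</time></trkpt>'
    '<trkpt><time>2020-05-01T10:00:05Z</time></trkpt>'
    '<trkpt><time>2020-05-01T10:00:10Z</time></trkpt>'
)

# Hypothetical "before" pattern: the greedy .* first runs to the end of the
# input, then backtracks only as far as the LAST </time> in the string.
greedy = re.compile(r'<time>(.*)</time>')

m = greedy.search(gpx)
print(len(greedy.findall(gpx)))     # 1   -> one match instead of three
print(len(m.group(1)))              # 116 -> one capture spanning all elements
print(m.group(1).count('<time>'))   # 2   -> it swallowed the other <time> tags
```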
When preg_replace internally iterates over the input string, the .* (dot asterisk) construct matches from the first <time> tag to the last </time> tag, then from the second <time> tag to the last </time> tag, then from the third to the last, and so forth. It can then start over from the first <time> tag to the second-to-last </time> tag, from the second <time> tag to the second-to-last </time> tag, and so on, until every combination of a <time> tag and a subsequent </time> tag has been tried. In a file with 30,000 time elements, this results in over 450 million tries (30,000 + 29,999 + 29,998 + ... + 1). It takes a very long time and is unnecessary.
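The 450 million figure is the sum of an arithmetic series: assuming, as described above, that the k-th <time> tag can be paired with its own and every later </time> tag, a file with n time elements yields n(n+1)/2 candidate pairs, compared with n tries when only related tags are paired:

$$
\sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad n = 30\,000:\quad \frac{30\,000 \cdot 30\,001}{2} = 450\,015\,000
$$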
The suggested change ensures that only the text between related <time> and </time> tags is matched and captured, and that this text is less than 50 characters long in each iteration.
With the suggested change, the expression does not match the characters [ and < after the <time> tag, so matching and capturing stops at the first timezone name, or at the related </time> tag if the element contains no timezone name. Because only related <time> and </time> tags are tried, a file with 30,000 time elements needs 30,000 tries instead of 450 million.
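A minimal sketch of the bounded approach, again using Python's `re` in place of PCRE. The pattern `<time>([^\[<]*)` is a hypothetical reduction of the idea described above (the committed expression is not shown on this page), and the bracketed timezone name in the second element is an assumed input format chosen for illustration:

```python
import re

# Hypothetical GPX fragment; the bracketed timezone name in the second
# element is an assumption about the input format, for illustration only.
gpx = (
    '<trkpt><time>2020-05-01T10:00:00Z</time></trkpt>'
    '<trkpt><time>2020-05-01T12:00:05+02:00[Europe/Oslo]</time></trkpt>'
    '<trkpt><time>2020-05-01T10:00:10Z</time></trkpt>'
)

# Hypothetical "after" pattern: the negated class [^\[<] cannot cross a '['
# or a '<', so each capture ends at the first timezone name or at the
# element's own </time>, and never reaches into another element.
bounded = re.compile(r'<time>([^\[<]*)')

captures = bounded.findall(gpx)
for c in captures:
    print(repr(c))
# '2020-05-01T10:00:00Z'
# '2020-05-01T12:00:05+02:00'
# '2020-05-01T10:00:10Z'
```

Each <time> element produces exactly one short match, so the work grows linearly with the number of time elements rather than with the number of <time>/</time> pairs.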
On my server, with a high value of pcre.backtrack_limit in the PHP settings, a request to handle a GPX file with 30,000+ time elements times out after an hour. With the suggested change, the same file is processed in less than a second, and it works with the default pcre.backtrack_limit value. I believe it will work with input of any size, because no capture is longer than 50 characters, so the test for a null result (preg_replace returns null when it fails) should no longer be needed; however, I have not tested whether this is correct.
1 parent 2b0073e
1 file changed
Lines changed: 1 addition & 1 deletion
Diff: original line 122 was replaced by new line 122 (context lines 119 to 125); the code on these lines was not captured.