
Make sanitizeGpxContent work on long input strings#96

Open
StBr65 wants to merge 1 commit into julien-nc:main from StBr65:patch-1

Conversation


StBr65 commented May 6, 2026


Change the regular expression so that the matched and captured text is as short as possible, and the expression only matches <time> and </time> tags that are related.

In the original version, the expression matches and captures a string between <time> and </time> tags that are most often unrelated, because of the .* (dot asterisk) in the expression. This results in large captures that require an increased pcre.backtrack_limit PHP setting to succeed, and a huge number of <time> to </time> tag combinations are tried out.

When preg_replace internally iterates over the input string, the .* (dot asterisk) construct will match from the first <time> tag to the last </time> tag, then from the second <time> tag to the last </time> tag, from the third to the last, and so forth; then it starts over from the first <time> tag to the second-to-last </time> tag, from the second <time> tag to the second-to-last </time> tag, and so on, until every combination of a <time> tag and a subsequent </time> tag has been tried. In a file with 30,000 time elements, that is over 450 million attempts (30,000 + 29,999 + 29,998 + ... + 1 = 30,000 × 30,001 / 2 ≈ 450 million). This takes a very long time and is needless.
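To make the pairing behaviour concrete, here is a minimal PHP sketch; the two patterns are illustrative only, not the exact expressions from sanitizeGpxContent:

```php
<?php
// Greedy: with /s, .* runs to the end of the string and backtracks, so
// the first <time> ends up paired with the LAST </time> in the input.
$greedy = '/<time>(.*)<\/time>/s';

// Bounded: the class [^<[] cannot cross a '<' (or a '['), so the match
// stays inside a single element.
$bounded = '/<time>([^<[]*)<\/time>/';

$gpx = '<time>A</time><time>B</time>';

preg_match($greedy, $gpx, $m);
echo $m[1] . "\n"; // "A</time><time>B" -- crossed into the next element

preg_match($bounded, $gpx, $m);
echo $m[1] . "\n"; // "A" -- stopped at the first '<'
```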

The suggested change ensures that the text matched and captured is only the text between related <time> and </time> tags, and that the matched and captured text is less than 50 characters in each iteration.

With the suggested change, the expression will not match the characters [ and < after the <time> tag, which stops the matching and capturing at the first timezone name that appears, or at the related </time> if no timezone name is present in the element. Because only related <time> and </time> tags are tried, this results in 30,000 tries for a file with 30,000 time elements instead of 450 million.
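For reference, a minimal sketch of the kind of bounded pattern described above; the actual expression and replacement in sanitizeGpxContent may differ:

```php
<?php
// Assumed goal: strip a bracketed timezone name such as [Europe/Paris]
// from <time> elements while keeping the timestamp itself.
// [^<[]* stops at the first '[' or '<', so each match stays inside a
// single <time>...</time> element; the bracket part is optional, so
// elements without a timezone name still match and are left unchanged.
$pattern = '/<time>([^<[]*)(?:\[[^\]]*\])?<\/time>/';

echo preg_replace($pattern, '<time>$1</time>',
    '<time>2024-05-06T12:00:00+02:00[Europe/Paris]</time>');
// -> <time>2024-05-06T12:00:00+02:00</time>
```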

On my server, with a high value of pcre.backtrack_limit in the PHP settings, a request to handle a gpx file with 30,000+ time elements gives up with a timeout after an hour. With the suggested change, the same file is processed in less than a second, and it works with the default pcre.backtrack_limit value. I believe it will actually work with input of any size, because no capture is longer than 50 characters, so the test for a null result should not be needed, but I have not tested whether that is correct.
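If the null check is kept as a safety net anyway, it could look like the following sketch (preg_last_error_msg() requires PHP 8.0+):

```php
<?php
// preg_replace returns null when PCRE fails, for example with
// PREG_BACKTRACK_LIMIT_ERROR under the default pcre.backtrack_limit.
$pattern = '/<time>([^<[]*)(?:\[[^\]]*\])?<\/time>/'; // same sketch pattern as above
$gpxContent = '<time>2024-05-06T12:00:00+02:00[Europe/Paris]</time>';

$result = preg_replace($pattern, '<time>$1</time>', $gpxContent);
if ($result === null) {
    error_log('sanitizeGpxContent: ' . preg_last_error_msg());
    $result = $gpxContent; // fall back to the unmodified content
}
echo $result;
```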
