Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

./gen.pl is horribly slow #9230

Closed
emilengler opened this issue Jul 30, 2022 · 1 comment
Closed

./gen.pl is horribly slow #9230

emilengler opened this issue Jul 30, 2022 · 1 comment

Comments

@emilengler
Copy link
Contributor

I did this

engler@lennon:~/src/curl/docs/cmdline-opts$ time ./gen.pl mainpage *.d > /dev/null 
    1m49.23s real     0m54.19s user     0m53.80s system

I expected the following

A faster generation of the man-page.

curl/libcurl version

git master

operating system

OpenBSD lennon 7.1 GENERIC.MP#3 amd64

@emilengler
Copy link
Contributor Author

emilengler commented Jul 30, 2022

My first impression is that fde0925 introduced the decrease in the performance. Before that commit, the results were as follows:

engler@lennon:~/src/curl/docs/cmdline-opts$ time ./gen.pl mainpage *.d > /dev/null 
    0m28.47s real     0m23.97s user     0m03.54s system

emanuele6 added a commit to emanuele6/curl that referenced this issue Jul 30, 2022
The (\W) capture substituted with itself is unnecessary, a simple \b
assertion is equivalent and faster.

I also quoted the injection of $k in the regex since the unquoted
injection was not intended and could potentially cause problems.

Reported-by: Emil Engler <me@emilengler.com>
Fixes curl#9230
Closes curl#9231
emilengler added a commit to emilengler/curl that referenced this issue Jul 30, 2022
On some systems, the gen.pl script takes nearly two minutes for the
generation of the main-page, which is a completely unacceptable time.

The commit author thinks, that the terrible performance comes from some
low-level locale checks regarding the use of "\W" in a regular
expression.

This commit replaces the "\W" with [^a-zA-Z0-9_], which is, according to
regex101.com, functionally equivalent to the previous operation, except
that it is obviously limited to ASCII, which is fine, as the curl
project is English-only anyway.

See curl#8299
Fixes curl#9230
Closes #XXXX
emilengler added a commit to emilengler/curl that referenced this issue Jul 30, 2022
On some systems, the gen.pl script takes nearly two minutes for the
generation of the main-page, which is a completely unacceptable time.

The commit author thinks, that the terrible performance comes from some
low-level locale checks regarding the use of "\W" in a regular
expression.

This commit replaces the "\W" with [^a-zA-Z0-9_], which is, according to
regex101.com, functionally equivalent to the previous operation, except
that it is obviously limited to ASCII, which is fine, as the curl
project is English-only anyway.

See curl#8299
Fixes curl#9230
Closes curl#9232
emanuele6 added a commit to emanuele6/curl that referenced this issue Jul 30, 2022
The (\W) capture substituted with itself is unnecessary, a simple \b
assertion is equivalent and faster.

I also quoted the injection of $k in the regex since the unquoted
injection was not intended and could potentially cause problems.

This improved the speed of

 ./gen.pl mainpage *.d > /dev/null

from 12.3 seconds to 8.8 seconds on my PC.

Reported-by: Emil Engler <me@emilengler.com>
Fixes curl#9230
Closes curl#9231
emanuele6 added a commit to emanuele6/curl that referenced this issue Jul 30, 2022
The character capure and the (\W) capture substituted with themselves is
unnecessary. I replaced them with a lookahead assertion and a \b
assertion.

I also quoted the injection of $k in the regex since the unquoted
injection was not intended and could potentially cause problems.

This improved the speed of

 ./gen.pl mainpage *.d > /dev/null

from 12.3 seconds to 0.6 seconds on my PC!

Reported-by: Emil Engler <me@emilengler.com>
Fixes curl#9230
Closes curl#9231
emanuele6 added a commit to emanuele6/curl that referenced this issue Jul 30, 2022
The (\W) capture substituted with itself is unnecessary, a simple \b
assertion is equivalent and faster.

I also quoted the injection of $k in the regex since the unquoted
injection was not intended and could potentially cause problems.

This improved the speed of

 ./gen.pl mainpage *.d > /dev/null

from 12.3 seconds to 8.8 seconds on my PC.

Reported-by: Emil Engler <me@emilengler.com>
Fixes curl#9230
Closes curl#9231
emilengler added a commit to emilengler/curl that referenced this issue Jul 31, 2022
On some systems, the gen.pl script takes nearly two minutes for the
generation of the main-page, which is a completely unacceptable time.

The commit author thinks, that the terrible performance comes from some
low-level locale checks regarding the use of "\W" in a regular
expression.

This commit replaces the "\W" with [^a-zA-Z0-9_], which is, according to
regex101.com, functionally equivalent to the previous operation, except
that it is obviously limited to ASCII, which is fine, as the curl
project is English-only anyway.

Co-authored-by: Emanuele Torre <torreemanuele6@gmail.com>

See curl#8299
Fixes curl#9230
Closes curl#9232
emilengler added a commit to emilengler/curl that referenced this issue Aug 1, 2022
On some systems, the gen.pl script takes nearly two minutes for the
generation of the main-page, which is a completely unacceptable time.

The slow performance has two causes:
1. Use of a regex locale operator
2. Useless invokations of loops

The commit addresses the first issue by replacing the "\W" wiht
[^a-zA-Z0-9_], which is, according to regex101.com, functionally
equivalent to the previous operation, except that it is obviously
limited to ASCII only, which is fine, as the curl project is
English-only anyway.

The second issue is being addressed by only running the loop if the line
contains a "--" in it. The loop may be completeley removed in the
future.

Co-authored-by: Emanuele Torre <torreemanuele6@gmail.com>

See curl#8299
Fixes curl#9230
Closes curl#9232
@bagder bagder closed this as completed in 9217935 Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants