digitalmars.D.learn - std.regex is fat
- Chris Katko (12/12) Oct 12 2018 Like, insanely fat.
- Alex (3/16) Oct 12 2018 Hm... maybe, you run into this:
- Chris Katko (14/36) Oct 13 2018 So wait, if their solution was to simply REMOVE std.regex from
- Chris Katko (4/11) Oct 13 2018 For comparison, I just tested and grep uses about 4 MB of RAM to
- Adam D. Ruppe (6/8) Oct 13 2018 Running and compiling are two entirely different things. Running
- Chris Katko (60/68) Oct 14 2018 I know that. I figured people would miss my point on it though so
- Adam D. Ruppe (4/9) Oct 13 2018 Template instantiation, which is a big issue for ctRegex, but not
Like, insanely fat.

All I wanted was a simple regex. The second I included a regex function, my program would no longer compile: "out of memory for fork".

/usr/bin/time -v reports that compiling went from 150MB of RAM for D, DAllegro, and Allegro5 to over 650MB of RAM, and from 1.5 seconds to >5.5 seconds.

Now I have to close all my Chrome tabs just to compile. Just for one line of regex. And I get it, it's the overhead of the library import, not the single line. But good gosh, more than 3X the RAM of the entire project for a single library import? Something doesn't add up!
Oct 12 2018
On Friday, 12 October 2018 at 13:25:33 UTC, Chris Katko wrote:
> Like, insanely fat. All I wanted was a simple regex. [...]

Hm... maybe you ran into this: https://forum.dlang.org/post/mailman.3091.1517866806.9493.digitalmars-d puremagic.com
Oct 12 2018
On Friday, 12 October 2018 at 13:42:34 UTC, Alex wrote:
> On Friday, 12 October 2018 at 13:25:33 UTC, Chris Katko wrote:
>> Like, insanely fat. All I wanted was a simple regex. [...]
>
> Hm... maybe you ran into this: https://forum.dlang.org/post/mailman.3091.1517866806.9493.digitalmars-d puremagic.com

So wait, their solution was simply to REMOVE std.regex from isEmail? That doesn't solve the regex problem at all. And from what I read in that thread, this penalty is paid per template INSTANTIATION, which could explode.

1 - Does anyone know WHY it's so incredibly fat?

2 - If this isn't going to be fixed anytime soon, shouldn't there be a DISCLAIMER in the documentation (plus potential workarounds, like keeping regex queries in their own file)?

I mean, this kind of thing shouldn't require looking through forums. It's a clear bug, and if it's a WONTFIX (even temporarily), it should be documented clearly as such. If I'm running into this issue, how many other people already did, and possibly even gave up on using D?
Oct 13 2018
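A minimal sketch of the "keep regex queries in their own file" workaround mentioned in the post above. Nothing like this was posted in the thread: the module name `rx` and the function `lineContains` are hypothetical. The idea, stated as an assumption rather than a measured result, is that a plain (non-template) function with a function-local import keeps std.regex out of the modules that call it, provided `rx.d` is compiled to its own object file. Whether this lowers peak memory for a particular build would need to be checked with /usr/bin/time -v.

```d
// rx.d (hypothetical file and module name)
module rx;

/// Returns true if `line` contains a match for `pattern`.
/// Non-template on purpose: callers only see plain strings,
/// so they do not need to import std.regex themselves.
bool lineContains(const(char)[] line, string pattern)
{
    // Function-local import: std.regex is only pulled in
    // when this function body is compiled.
    import std.regex : matchFirst, regex;

    // Built on every call for simplicity; a real program
    // would probably cache the Regex object.
    auto re = regex(pattern);
    return !matchFirst(line, re).empty;
}

unittest
{
    assert(lineContains("import std.regex;", `std\.\w+`));
    assert(!lineContains("int x = 1;", `std\.\w+`));
}
```

Other modules would then `import rx;` and call `rx.lineContains(...)` instead of importing std.regex directly.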
On Sunday, 14 October 2018 at 02:44:55 UTC, Chris Katko wrote:
> On Friday, 12 October 2018 at 13:42:34 UTC, Alex wrote:
>> [...]
>
> So wait, their solution was simply to REMOVE std.regex from isEmail? That doesn't solve the regex problem at all. And from what I read in that thread, this penalty is paid per template INSTANTIATION, which could explode. [...]

For comparison, I just tested and grep uses about 4 MB of RAM to run. So it's not the regex. It's the dmd / templates / CTFE, right?
Oct 13 2018
On Sunday, 14 October 2018 at 03:07:59 UTC, Chris Katko wrote:
> For comparison, I just tested and grep uses about 4 MB of RAM to run.

Running and compiling are two entirely different things. Running the D regex code should be comparable, but compiling it is slow, in great part because of internal templates... There was an effort to speed up the template code, but it is still not complete.
Oct 13 2018
On Sunday, 14 October 2018 at 03:26:33 UTC, Adam D. Ruppe wrote:
> On Sunday, 14 October 2018 at 03:07:59 UTC, Chris Katko wrote:
>> For comparison, I just tested and grep uses about 4 MB of RAM to run.
>
> Running and compiling are two entirely different things. Running the D regex code should be comparable, but compiling it is slow, in great part because of internal templates... There was an effort to speed up the template code, but it is still not complete.

I know that. I figured people would miss my point on it though, so I should have clarified. That's why I said it's likely the templates/DMD that's exploding--not the actual regex action.

A simple program takes ~100-150MB of RAM to compile. Adding a single regex (not a compiled regex) balloons that to 550MB and 5 seconds of compile time.

-----------

Anyhow, I wrote my own simple "dgrep" and compared the results with grep. It's very competitive (NOT to be confused with the above RAM stats for COMPILING):

    Command being timed: "sh -c cat dgrep.d | ./dgrep 'write' "
    User time (seconds): 0.00
    System time (seconds): 0.00
    Percent of CPU this job got: 0%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 3192
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 301
    Voluntary context switches: 5
    Involuntary context switches: 124
    Swaps: 0
    File system inputs: 8
    File system outputs: 8
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

    Command being timed: "sh -c cat dgrep.d | grep 'write'"
    User time (seconds): 0.00
    System time (seconds): 0.00
    Percent of CPU this job got: 0%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2224
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 2
    Minor (reclaiming a frame) page faults: 282
    Voluntary context switches: 10
    Involuntary context switches: 0
    Swaps: 0
    File system inputs: 760
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

So I have to say I'm impressed with the actual performance of the regular expressions engine--especially considering "grep" is, IIRC, considered a fine-tuned beast.
Oct 14 2018
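The dgrep source itself was not posted. As a rough guess at what a minimal version could look like, the sketch below reads lines from stdin and prints those that match a runtime `regex` built from the first command-line argument. The helper name `lineMatches`, the usage message, and the grep-like exit codes are all assumptions made for illustration, not the original program.

```d
import std.regex : Regex, matchFirst, regex;
import std.stdio : stderr, stdin, writeln;

/// True if `line` contains at least one match for `re`.
bool lineMatches(const(char)[] line, Regex!char re)
{
    return !matchFirst(line, re).empty;
}

int main(string[] args)
{
    if (args.length != 2)
    {
        stderr.writeln("usage: dgrep PATTERN < input");
        return 2;
    }

    // Runtime regex (not ctRegex): the pattern is parsed when the program runs.
    auto re = regex(args[1]);

    bool found = false;
    // byLine reuses its buffer, which is fine because each line
    // is printed before the next one is read.
    foreach (line; stdin.byLine)
    {
        if (lineMatches(line, re))
        {
            writeln(line);
            found = true;
        }
    }
    // Assumed convention, mirroring grep: 0 if anything matched, 1 otherwise.
    return found ? 0 : 1;
}
```

It would be invoked the same way as in the timings above: `cat dgrep.d | ./dgrep 'write'`.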
On Sunday, 14 October 2018 at 02:44:55 UTC, Chris Katko wrote:
> So wait, their solution was simply to REMOVE std.regex from isEmail?

That was ctRegex, which is different from regex.

> That doesn't solve the regex problem at all. And from what I read in that thread, this penalty is paid per template INSTANTIATION, which could explode.

Template instantiation is a big issue for ctRegex, but not for regular regex.
Oct 13 2018
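To make the regex / ctRegex distinction concrete, here is a small sketch showing both forms from std.regex side by side. The function names `firstNumberRuntime` and `firstNumberCompileTime` and the sample input are made up for illustration. `regex(...)` parses the pattern when the program runs, while `ctRegex!(...)` takes the pattern as a template argument and builds the matcher during compilation, which is where the per-instantiation cost discussed above would be paid.

```d
import std.regex : ctRegex, matchFirst, regex;

/// Runtime pattern: parsed each time this function runs.
string firstNumberRuntime(string s)
{
    auto re = regex(`\d+`);
    auto m = matchFirst(s, re);
    return m.empty ? null : m.hit;
}

/// Compile-time pattern: the matcher is generated by the compiler,
/// one template instantiation per distinct pattern.
string firstNumberCompileTime(string s)
{
    auto re = ctRegex!(`\d+`);
    auto m = matchFirst(s, re);
    return m.empty ? null : m.hit;
}

void main()
{
    // Both forms are used the same way and find the same match.
    assert(firstNumberRuntime("RAM: 650 MB") == "650");
    assert(firstNumberCompileTime("RAM: 650 MB") == "650");
    assert(firstNumberRuntime("no digits here") is null);
}
```

Both functions return the same results; the difference is only in when the pattern is processed, so swapping one for the other is a quick way to compare compile time and memory on a given project.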