digitalmars.D.learn - InvalidMemoryOperationError from File.byLine()?

"Stiff" <stiff example.com> writes:
Hi,

I'm reading in a rather large text file (~25Mb) line by line, 
where I don't need to hang on to a line for more than one 
iteration through my loop. I'm consistently getting an 
InvalidMemoryOperationError on my 2,547th iteration, and based on 
console outputs, I'm fairly certain it's coming from the byLine() 
method. To be clear, I'm just doing a simple:

foreach (line; file.byLine()){ ... }

Am I possibly screwing something up? I haven't declared any 
classes or written any destructors. Is this a bug? If so, is 
there a workaround?

Thanks!
Jun 02 2015
"Alex Parrill" <initrd.gz gmail.com> writes:
On Tuesday, 2 June 2015 at 19:18:15 UTC, Stiff wrote:
> Hi,
>
> I'm reading in a rather large text file (~25Mb) line by line,
> where I don't need to hang on to a line for more than one
> iteration through my loop. I'm consistently getting an
> InvalidMemoryOperationError on my 2,547th iteration, and based
> on console outputs, I'm fairly certain it's coming from the
> byLine() method. To be clear, I'm just doing a simple:
>
> foreach (line; file.byLine()){ ... }
>
> Am I possibly screwing something up? I haven't declared any
> classes or written any destructors. Is this a bug? If so, is
> there a workaround?
>
> Thanks!

Can you provide a minimal test case? I.e. as short of a program and input file as you can get that still has the error?
Jun 02 2015
"Stiff" <stiff example.com> writes:
On Tuesday, 2 June 2015 at 19:35:04 UTC, Alex Parrill wrote:
> On Tuesday, 2 June 2015 at 19:18:15 UTC, Stiff wrote:
>> Hi,
>>
>> I'm reading in a rather large text file (~25Mb) line by line,
>> where I don't need to hang on to a line for more than one
>> iteration through my loop. I'm consistently getting an
>> InvalidMemoryOperationError on my 2,547th iteration, and based
>> on console outputs, I'm fairly certain it's coming from the
>> byLine() method. To be clear, I'm just doing a simple:
>>
>> foreach (line; file.byLine()){ ... }
>>
>> Am I possibly screwing something up? I haven't declared any
>> classes or written any destructors. Is this a bug? If so, is
>> there a workaround?
>>
>> Thanks!
>
> Can you provide a minimal test case? I.e. as short of a program and input file as you can get that still has the error?

Here's the least code I can reproduce it with:

//appropriate imports here
import std.stdio;

void main(string[] args)
{
     auto groupFile = new File(args[1]);
     int groupNum = 0;
     foreach (groupLine; groupFile.byLine()){
         writeln(groupNum);
         groupNum++;
     }
}

Data (i.e. groupFile, 2 lines):

41 7, 144 0, 473 2, 730 1, 229 6, 333 11, 961 15, 856 16, 20 17, 165 18, 200 19, 395 20, 939 21, 240 24, 760 24, 434 29, 169 30, 718 30, 845 32, 942 33, 414 35, 889 36, 944 36, 918 37, 891 38, 976 38, 325 40, 840 40, 884 40, 931 40, 991 40, 690 41, 802 41, 138 43, 827 43, 934 43, 886 44, 894 47, 979 53, 892 60, 225 63, 858 67, 144 229 3 2, 473 229 1 2, 730 229 2 2, 144 333 6 2, 730 333 5 2, 144 961 8 2, 473 961 5 2, 144 856 11 2, 473 856 9 2, 473 20 10 2, 730 20 11 2, 144 165 14 2, 473 165 9 2, 229 165 3 3, 730 200 12 2, 229 200 6 3, 473 395 13 2, 730 395 14 2, 333 939 3 3, 144 939 13 3, 144 240 18 2, 473 240 15 2, 144 760 19 2, 165 760 1 4, 165 434 5 4, 229 434 16 4, 395 169 9 3, 333 169 12 3, 229 169 16 3, 20 718 7 3, 395 718 3 3, 200 845 8 4, 395 845 7 4, 395 942 6 3, 333 942 14 3, 165 414 15 4, 20 414 13 4, 760 414 6 5, 240 889 5 3, 20 889 11 3, 165 944 13 4, 718 944 1 4, 240 918 8 3, 718 918 2 4, 229 918 17 4, 165 891 17 4, 760 891 9 5, 169 891 2 5, 165 976 15 4, 434 976 4 5, 200 325 15 4, 20 325 16 4, 20 840 18 3, 169 840 5 4, 200 840 8 4, 414 884 3 6, 395 884 14 6, 165 884 15 6, 169 931 5 4, 718 931 5 4, 200 991 10 4, 165 991 10 4, 414 690 0 6, 200 690 15 6, 165 802 19 4, 718 802 5 4, 240 802 9 4, 200 138 13 4, 760 138 8 5, 414 827 5 6, 169 827 7 6, 760 827 12 6, 200 934 19 4, 414 934 3 6, 434 934 0 6, 240 886 15 3, 169 886 9 4, 718 886 5 4, 434 894 13 5, 414 894 7 6, 690 979 9 7, 138 979 5 7, 169 979 17 7, 138 892 9 6, 325 892 11 6, 690 225 15 7, 138 225 12 7, 690 858 19 7, 325 858 19 7, 47 8, 144 4, 557 0, 730 5, 229 10, 333 15, 86 17, 200 23, 184 24, 939 25, 355 26, 169 34, 287 34, 1 37, 138 41, 949 41, 119 42, 51 46, 356 47, 884 50, 873 51, 783 52, 346 53, 900 53, 594 54, 110 55, 925 56, 236 57,
856 60, 374 61, 840 64, 993 64, 828 65, 858 66, 548 67, 992 68, 525 70, 563 70, 709 72, 614 74, 746 76, 278 82, 961 83, 959 85, 509 87, 962 91, 448 96, 256 97, 144 229 3 2, 557 229 7 2, 730 229 2 2, 144 333 6 2, 730 333 5 2, 229 86 0 3, 557 86 9 3, 730 200 12 2, 229 200 6 3, 333 184 2 3, 144 184 12 3, 333 939 3 3, 144 939 13 3, 730 355 15 2, 86 355 2 4, 333 169 12 3, 229 169 16 3, 200 287 9 4, 86 287 11 4, 229 287 17 4, 86 1 13 4, 200 1 6 4, 200 138 13 4, 355 138 10 5, 200 949 11 4, 86 949 16 4, 287 119 2 5, 169 119 1 5, 200 51 18 4, 184 51 17 4, 184 356 17 4, 169 356 6 4, 169 884 14 4, 355 884 17 5, 287 884 8 5, 119 873 6 6, 356 873 1 6, 51 873 1 6, 138 783 0 6, 200 783 17 6, 169 346 14 4, 287 346 14 5, 287 900 12 5, 138 900 4 6, 287 594 14 5, 169 594 13 5, 86 594 17 5, 287 110 14 5, 355 110 16 5, 184 110 16 5, 169 925 17 4, 51 925 5 5, 287 925 3 5, 169 236 16 4, 51 236 3 5, 346 856 3 6, 110 856 1 6, 783 856 1 7, 1 856 5 7, 356 374 7 5, 287 374 19 5, 1 840 19 5, 356 840 8 5, 51 993 11 5, 346 993 3 6, 51 828 16 5, 783 828 10 7, 594 828 7 7, 594 858 5 6, 51 858 12 6, 1 858 16 6, 356 548 14 5, 110 548 5 6, 783 992 11 7, 346 992 10 7, 51 992 11 7, 346 525 12 6, 110 525 10 6, 374 563 3 6, 346 563 10 6, 783 709 13 7, 374 709 3 7, 346 614 15 6, 110 614 12 6, 594 746 16 6, 110 746 14 6, 525 278 6 7, 236 278 18 7, 709 961 5 8, 746 961 1 8, 374 961 9 8, 783 961 12 8, 525 959 12 7, 614 959 3 7, 746 959 0 7, 346 959 14 7, 746 509 3 7, 374 509 17 7, 548 962 19 7, 563 962 16 7, 614 962 7 7, 614 448 16 7, 525 448 19 7, 709 256 19 8, 614 256 16 8,

Sorry for the mess of data, I'm not sure how to attach something. It looks like a data-dependent problem. So is there another method I can use, do I need to punt on that line, or can I alter the data in some way to make it not crash?
Jun 02 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/2/15 3:57 PM, Stiff wrote:
>
> Here's the least code I can reproduce it with:
>
> //appropriate imports here
>
> void main(string[] args)
> {
>      auto groupFile = new File(args[1]);
>      int groupNum = 0;
>      foreach (groupLine; groupFile.byLine()){
>          writeln(groupNum);
>          groupNum++;
>      }
> }
>
> Data (i.e. groupFile, 2 lines):

This looks strikingly similar to an issue already in bugzilla:

https://issues.dlang.org/show_bug.cgi?id=14578

Look at the description of comment 4.

I could not reproduce your error either (on OSX version 2.067.0). What OS/arch are you using? I'll try it in linux.

-Steve
Jun 02 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/2/15 4:30 PM, Steven Schveighoffer wrote:
>
> I could not reproduce your error either (on OSX version 2.067.0). What
> OS/arch are you using? I'll try it in linux.

Nope, still can't reproduce.

-Steve
Jun 02 2015
"anonymous" <anonymous example.com> writes:
On Tuesday, 2 June 2015 at 19:18:15 UTC, Stiff wrote:
> Hi,
>
> I'm reading in a rather large text file (~25Mb) line by line,
> where I don't need to hang on to a line for more than one
> iteration through my loop. I'm consistently getting an
> InvalidMemoryOperationError on my 2,547th iteration, and based
> on console outputs, I'm fairly certain it's coming from the
> byLine() method. To be clear, I'm just doing a simple:
>
> foreach (line; file.byLine()){ ... }
>
> Am I possibly screwing something up? I haven't declared any
> classes or written any destructors. Is this a bug? If so, is
> there a workaround?
>
> Thanks!

Might be issue 14005/13856 (14005 has the InvalidMemoryOperationError test case).

It's an embarrassing issue that comes up again and again. I really hope a fix makes it through soon.

https://issues.dlang.org/show_bug.cgi?id=14005
https://issues.dlang.org/show_bug.cgi?id=13856
Jun 02 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/2/15 3:35 PM, anonymous wrote:
>
> It's an embarrassing issue that comes up again and again. I really hope
> a fix makes it through soon.
>
> https://issues.dlang.org/show_bug.cgi?id=14005

I can reproduce this one. I'll figure this out.

-Steve
Jun 02 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/2/15 4:43 PM, Steven Schveighoffer wrote:
> On 6/2/15 3:35 PM, anonymous wrote:
>>
>> It's an embarrassing issue that comes up again and again. I really hope
>> a fix makes it through soon.
>>
>> https://issues.dlang.org/show_bug.cgi?id=14005
>
> I can reproduce this one. I'll figure this out.

Hm... I think the issue might possibly be solved. What version of the compiler are you using?

-Steve
Jun 02 2015
"Stiff" <stiff example.com> writes:
On Tuesday, 2 June 2015 at 20:56:41 UTC, Steven Schveighoffer wrote:
> On 6/2/15 4:43 PM, Steven Schveighoffer wrote:
>> On 6/2/15 3:35 PM, anonymous wrote:
>>>
>>> It's an embarrassing issue that comes up again and again. I
>>> really hope a fix makes it through soon.
>>>
>>> https://issues.dlang.org/show_bug.cgi?id=14005
>>
>> I can reproduce this one. I'll figure this out.
>
> Hm... I think the issue might possibly be solved. What version of the compiler are you using?
>
> -Steve

I'm on dmd v2.067 (according to dmd --version), using Arch Linux kernel version 4.0.4-2. I installed dmd from the Arch community repo just a couple days ago.
Jun 02 2015
"Stiff" <stiff example.com> writes:
On Tuesday, 2 June 2015 at 21:10:20 UTC, Stiff wrote:
> On Tuesday, 2 June 2015 at 20:56:41 UTC, Steven Schveighoffer wrote:
>> On 6/2/15 4:43 PM, Steven Schveighoffer wrote:
>>> On 6/2/15 3:35 PM, anonymous wrote:
>>>>
>>>> It's an embarrassing issue that comes up again and again. I
>>>> really hope a fix makes it through soon.
>>>>
>>>> https://issues.dlang.org/show_bug.cgi?id=14005
>>>
>>> I can reproduce this one. I'll figure this out.
>>
>> Hm... I think the issue might possibly be solved. What version of the compiler are you using?
>>
>> -Steve
>
> I'm on dmd v2.067 (according to dmd --version), using Arch Linux kernel version 4.0.4-2. I installed dmd from the Arch community repo just a couple days ago.

Sorry, x86_64.
Jun 02 2015
"Stiff" <stiff example.com> writes:
On Tuesday, 2 June 2015 at 21:15:19 UTC, Stiff wrote:
> On Tuesday, 2 June 2015 at 21:10:20 UTC, Stiff wrote:
>> On Tuesday, 2 June 2015 at 20:56:41 UTC, Steven Schveighoffer wrote:
>>> On 6/2/15 4:43 PM, Steven Schveighoffer wrote:
>>>> On 6/2/15 3:35 PM, anonymous wrote:
>>>>>
>>>>> It's an embarrassing issue that comes up again and again. I
>>>>> really hope a fix makes it through soon.
>>>>>
>>>>> https://issues.dlang.org/show_bug.cgi?id=14005
>>>>
>>>> I can reproduce this one. I'll figure this out.
>>>
>>> Hm... I think the issue might possibly be solved. What version of the compiler are you using?
>>>
>>> -Steve
>>
>> I'm on dmd v2.067 (according to dmd --version), using Arch Linux kernel version 4.0.4-2. I installed dmd from the Arch community repo just a couple days ago.
>
> Sorry, x86_64.

...and for what it's worth, it looks like my second line has 2046 characters, including the newline, which is sorta consistent with Iain's test case.
Jun 02 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/2/15 5:22 PM, Stiff wrote:
> On Tuesday, 2 June 2015 at 21:15:19 UTC, Stiff wrote:
>> On Tuesday, 2 June 2015 at 21:10:20 UTC, Stiff wrote:
>>> I'm on dmd v2.067 (according to dmd --version), using Arch Linux
>>> kernel version 4.0.4-2. I installed dmd from the Arch community repo
>>> just a couple days ago.
>>
>> Sorry, x86_64.
>
> ...and for what it's worth, it looks like my second line has 2046 characters, including the newline, which is sorta consistent with Iain's test case.

OK. I can reproduce now with all 3 test cases. I tweaked the line sizes a little bit. Noting that it fails on 2.067.0 and 2.067.1 on Linux x64, but passes just fine on OSX same versions.

FWIW, you don't need anything except foreach(line; somefile.byLine()) {} as your main code.

I think the correct fix is to fix readln as outlined in Rainer's pull request: https://github.com/D-Programming-Language/phobos/pull/2794

But that seems to be stalled. I will at least try to fix the assumeSafeAppend calls so they are not crashing.

-Steve
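
For anyone who wants to check their own toolchain, here is a minimal sketch of a probe built around the empty byLine loop described above. It is not the test case from the bug reports. The function name probeLineLengths, the scratch file name, and the 2040 to 2056 length range are all assumptions for illustration, picked only because the thread reports a failing line of about 2046 characters. Whether these lengths trigger anything depends on the compiler version and platform.

```d
import std.array : replicate;
import std.file : remove, tempDir;
import std.path : buildPath;
import std.stdio : File, writeln;

/// Writes one line for every length in [minLen, maxLen] to `path`, then
/// walks the file with an empty byLine loop and returns the line count.
/// (Hypothetical helper; the name and parameters are illustrative only.)
size_t probeLineLengths(string path, size_t minLen, size_t maxLen)
{
    {
        auto output = File(path, "w");
        foreach (len; minLen .. maxLen + 1)
            output.writeln(replicate("x", len));
    } // `output` goes out of scope here, which flushes and closes the file

    size_t count;
    foreach (line; File(path).byLine())
        ++count; // the body is irrelevant; iterating is the whole test
    return count;
}

void main()
{
    // Scratch file name is an assumption made for this sketch.
    auto path = buildPath(tempDir(), "byline_probe.txt");
    scope (exit) remove(path);

    // Line lengths near 2048 are a guess based on the sizes reported in the thread.
    writeln(probeLineLengths(path, 2040, 2056), " lines read without error");
}
```

On an affected build the loop would be expected to abort with InvalidMemoryOperationError before the final writeln; on a fixed build it should simply report the number of lines written.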
Jun 02 2015
"Stiff" <stiff example.com> writes:
On Tuesday, 2 June 2015 at 21:35:54 UTC, Steven Schveighoffer wrote:
> OK. I can reproduce now with all 3 test cases. I tweaked the
> line sizes a little bit. Noting that it fails on 2.067.0 and
> 2.067.1 on Linux x64, but passes just fine on OSX same versions.
>
> FWIW, you don't need anything except foreach(line;
> somefile.byLine()) {} as your main code.
>
> I think the correct fix is to fix readln as outlined in
> Rainer's pull request:
> https://github.com/D-Programming-Language/phobos/pull/2794
>
> But that seems to be stalled. I will at least try to fix the
> assumeSafeAppend calls so they are not crashing.
>
> -Steve

So...in the meantime, I'll just pad my input I guess?
Jun 02 2015
"anonymous" <anonymous example.com> writes:
On Tuesday, 2 June 2015 at 21:43:45 UTC, Stiff wrote:
> So...in the meantime, I'll just pad my input I guess?

It's a mess and I'm not sure what works and what doesn't, but here are some options:

byLineCopy: Could be fine as it doesn't reuse any buffers.

readln without passing a buffer: as above.

readln with passing a buffer: Problematic, readln won't respect the boundaries of the buffer (issue 13856). But it may work ok, if you give it its very own GC allocation.

byLine from git head: If you can build phobos from source, the InvalidMemoryOperationError doesn't seem to happen there anymore. readln is still problematic, though.
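
The first two options above can be sketched roughly as follows. This is only an illustration, not a verified fix for the bug: the helper names countWithByLineCopy and countWithReadln and the scratch file name are made up for the example, and the sketch assumes a Phobos version that provides File.byLineCopy.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

/// Option 1: byLineCopy hands out a freshly allocated string per line,
/// so no buffer is shared between iterations.
size_t countWithByLineCopy(string path)
{
    size_t count;
    foreach (line; File(path).byLineCopy())
        ++count;
    return count;
}

/// Option 2: readln() with no buffer argument allocates a new string on
/// every call. It keeps the line terminator and yields an empty string
/// at end of file.
size_t countWithReadln(string path)
{
    auto file = File(path);
    size_t count;
    for (;;)
    {
        auto line = file.readln();
        if (line.length == 0)
            break; // end of file
        ++count;
    }
    return count;
}

void main()
{
    // Scratch file used only for this illustration (name is an assumption).
    auto path = buildPath(tempDir(), "byline_workaround_demo.txt");
    write(path, "first line\nsecond line\nthird line\n");
    scope (exit) remove(path);

    writeln(countWithByLineCopy(path), " ", countWithReadln(path));
}
```

Both variants allocate once per line, so they trade some speed for not depending on buffer reuse inside byLine.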
Jun 02 2015
Ali Çehreli <acehreli yahoo.com> writes:
On 06/02/2015 01:56 PM, Steven Schveighoffer wrote:
> On 6/2/15 4:43 PM, Steven Schveighoffer wrote:
>> On 6/2/15 3:35 PM, anonymous wrote:
>>>
>>> It's an embarrassing issue that comes up again and again. I really hope
>>> a fix makes it through soon.
>>>
>>> https://issues.dlang.org/show_bug.cgi?id=14005
>>
>> I can reproduce this one. I'll figure this out.
>
> Hm... I think the issue might possibly be solved. What version of the compiler are you using?
>
> -Steve

I had the impression that it was solved at some point but the problem came back. I had to remove byLine from my book project altogether so that the sample code can be tested: :)

https://bitbucket.org/acehreli/ddili/commits/45c183d078e144d68b96541c858e8fd43b4734e9#Lsrc/codetester.dF257

Ali
Jun 02 2015
"anonymous" <anonymous example.com> writes:
On Tuesday, 2 June 2015 at 20:56:41 UTC, Steven Schveighoffer wrote:
> Hm... I think the issue might possibly be solved. What version
> of the compiler are you using?

I think 14005 (and 14578) are fixed in git head, because there's no assumeSafeAppend in byLine anymore. But the underlying issue 13856 is still there: readln still stomps happily over the bounds of its buffer.
Jun 02 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/2/15 5:42 PM, anonymous wrote:
> On Tuesday, 2 June 2015 at 20:56:41 UTC, Steven Schveighoffer wrote:
>> Hm... I think the issue might possibly be solved. What version of the
>> compiler are you using?
>
> I think 14005 (and 14578) are fixed in git head, because there's no assumeSafeAppend in byLine anymore.

Yep, I can confirm that.

Stiff, sorry to say that you are stuck unless you want to use the bleeding edge of D, which means building from source. Wish I had a better answer. And in reality, even if it needed a fix and I provided it, this was going to be your option.

-Steve
Jun 02 2015
"Stiff" <stiff example.com> writes:
On Tuesday, 2 June 2015 at 22:36:36 UTC, Steven Schveighoffer wrote:
> On 6/2/15 5:42 PM, anonymous wrote:
>> On Tuesday, 2 June 2015 at 20:56:41 UTC, Steven Schveighoffer wrote:
>>> Hm... I think the issue might possibly be solved. What
>>> version of the compiler are you using?
>>
>> I think 14005 (and 14578) are fixed in git head, because there's no assumeSafeAppend in byLine anymore.
>
> Yep, I can confirm that.
>
> Stiff, sorry to say that you are stuck unless you want to use the bleeding edge of D, which means building from source. Wish I had a better answer. And in reality, even if it needed a fix and I provided it, this was going to be your option.
>
> -Steve

Thanks for the workarounds, Steve and anonymous. I ended up getting around it by just using awk to pad any of my input lines that were around 2048 chars with a few spaces, since I have access to that file. Not particularly convenient, but it works.

Seeing as this is only my second day of using D (thus why I posted this to .learn), I'll try to get a grip on the language itself before I start playing with the bleeding edge. :)
Jun 02 2015
"Per Nordlöw" <per.nordlow gmail.com> writes:
On Tuesday, 2 June 2015 at 19:18:15 UTC, Stiff wrote:
> iteration through my loop. I'm consistently getting an
> InvalidMemoryOperationError on my 2,547th iteration, and based
> on console outputs, I'm fairly certain it's coming from the
> byLine() method. To be clear, I'm just doing a simple:

I get this too! Interestingly enough, this self-contained alternative solution

https://github.com/nordlow/justd/blob/master/bylinefast.d

is about 2-3 times faster and does *not* trigger the error. I have used it successfully for reading files containing tens of millions of lines of text, so I believe it works!
Jun 02 2015
"Per Nordlöw" <per.nordlow gmail.com> writes:
On Tuesday, 2 June 2015 at 20:06:12 UTC, Per Nordlöw wrote:
> https://github.com/nordlow/justd/blob/master/bylinefast.d

I'm thinking of renaming it to something other than `byLine` (perhaps `byChunk`) because it's (no longer) limited to reading "lines". This is because the terminator is an arbitrary string.
Jun 02 2015
Jonathan M Davis via Digitalmars-d-learn writes:
On Tuesday, June 02, 2015 19:18:13 Stiff via Digitalmars-d-learn wrote:
> Hi,
>
> I'm reading in a rather large text file (~25Mb) line by line,
> where I don't need to hang on to a line for more than one
> iteration through my loop. I'm consistently getting an
> InvalidMemoryOperationError on my 2,547th iteration, and based on
> console outputs, I'm fairly certain it's coming from the byLine()
> method. To be clear, I'm just doing a simple:
>
> foreach (line; file.byLine()){ ... }
>
> Am I possibly screwing something up? I haven't declared any
> classes or written any destructors. Is this a bug? If so, is
> there a workaround?

We'd need to see actual code which exhibited the problem in order to say much, but it may be related to the fact that byLine reuses the buffer that it uses for front, so you can't keep it around between iterations. If you want to do that, you have to dup/idup it or use byLineCopy. But that's just a shot in the dark. Without seeing actual code, it's hard to judge what could be going on.

- Jonathan M Davis
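
To make the buffer-reuse point concrete, here is a small sketch of keeping lines beyond a single iteration by copying each one with idup. The helper name collectLines and the scratch file name are assumptions for illustration. It shows general byLine usage and is unrelated to the crash discussed elsewhere in this thread.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

/// Collects every line of the file at `path`, copying each one out of
/// the buffer that byLine reuses between iterations.
string[] collectLines(string path)
{
    string[] kept;
    foreach (line; File(path).byLine())
    {
        // `line` is a char[] slice of an internal buffer that the next
        // iteration may overwrite, so take an immutable copy before
        // storing it. Storing `line` itself would not be safe.
        kept ~= line.idup;
    }
    return kept;
}

void main()
{
    // Scratch file used only for this illustration (name is an assumption).
    auto path = buildPath(tempDir(), "byline_idup_demo.txt");
    write(path, "alpha\nbeta\ngamma\n");
    scope (exit) remove(path);

    auto lines = collectLines(path);
    assert(lines == ["alpha", "beta", "gamma"]);
}
```

Using byLineCopy instead of byLine plus idup should give the same result with the copy made by the range itself.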
Jun 03 2015
Jonathan M Davis via Digitalmars-d-learn writes:
On Wednesday, June 03, 2015 01:38:17 Jonathan M Davis via Digitalmars-d-learn wrote:
> On Tuesday, June 02, 2015 19:18:13 Stiff via Digitalmars-d-learn wrote:
>> Hi,
>>
>> I'm reading in a rather large text file (~25Mb) line by line,
>> where I don't need to hang on to a line for more than one
>> iteration through my loop. I'm consistently getting an
>> InvalidMemoryOperationError on my 2,547th iteration, and based on
>> console outputs, I'm fairly certain it's coming from the byLine()
>> method. To be clear, I'm just doing a simple:
>>
>> foreach (line; file.byLine()){ ... }
>>
>> Am I possibly screwing something up? I haven't declared any
>> classes or written any destructors. Is this a bug? If so, is
>> there a workaround?
>
> We'd need to see actual code which exhibited the problem in order to say much, but it may be related to the fact that byLine reuses the buffer that it uses for front, so you can't keep it around between iterations. If you want to do that, you have to dup/idup it or use byLineCopy. But that's just a shot in the dark. Without seeing actual code, it's hard to judge what could be going on.

LOL. And of course, the later messages appear in my account after I responded, so _now_ I can see that my response was unnecessary. :| Oh, well. Clearly, my e-mail client hates me. ;)

- Jonathan M Davis
Jun 03 2015