www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 6495] New: array(file.byLine()) is a problem

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495

           Summary: array(file.byLine()) is a problem
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: andrej.mitrovich gmail.com



12:12:05 PDT ---
I can use readText, but if I want to use the byLine range with a call to
array() to save the lines of a text I end up getting garbage (I know why):

auto file = File("foo.txt", "r");
auto lineBuff = array(file.byLine());

I don't know if it can be fixed, I mean array() is not to blame (it only
exhausts a range), and byLine reuses a char[] buffer to stay efficient. But it
looks like something that could be thought about.

As a workaround I can do:

lineBuff = readText(r"file.txt").splitLines;

So it's not a big issue, but I ran into it and hence this report.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 14 2011
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495


Vladimir Panteleev <thecybershadow gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thecybershadow gmail.com



12:14:15 PDT ---
Another workaround:

array(map!`a.idup`(file.byLine()))

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 14 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor



12:33:03 PDT ---

 Another workaround:
 
 array(map!`a.idup`(file.byLine()))
Thanks, first I tried .idup on the right side but that didn't work. I think this falls under "suble things you should be aware of", not really a bug. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 14 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc



One solution to avoid this problem is to modify byLine to perform a copy on
default and not perform it on request:

auto file = File("foo.txt", "r");
auto lineBuff = array(file.byLine());

==> good



auto file = File("foo.txt", "r");
auto lineBuff = array(file.byLine!false());

==> does't copy, lineBuff contains garbage

This makes the code safe (but a little slower) on default, when you don't know
what you are doing, and offers a simple way to have faster code when you know
what you are doing and you know you don't want a copy (this ides also follows
the general safety philosophy of D).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 16 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495




I like the idea of making byLine non-transient by default, but allowing a
compile-time option to make it transient when you want to optimize your code
(so the user is responsible for making sure transience doesn't break his code
before turning it on).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 31 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495




I don't like the use of a boolean flag, though. Maybe byLine!CopyBuffer()
(default) vs. byLine!ReuseBuffer()?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 31 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495




06:33:14 PDT ---

 I like the idea of making byLine non-transient
This "non-transient" term is being used often lately but it has no definition on the website, do you think you could write up a pull for dlang.org so we can officially document it? Otherwise we're lost in the conversation. :) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495




Transient just means the value of .front changes when you call .popFront(), so
you cannot simply assign .front to a variable and expect it to retain its value
after you popFront(). Non-transient just means that you can assign .front to a
variable and rest assured that popFront() will not mutate its value from under
you.

I'm not sure we want to put this on the website just yet -- it *is* a term that
we coined for the sake of distinguishing between, say, the current byLine(),
which reuses the .front buffer, and the proposed change where it calls .dup on
every line returned.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |monarchdodra gmail.com




 Transient just means the value of .front changes when you call .popFront(), so
 you cannot simply assign .front to a variable and expect it to retain its value
 after you popFront(). Non-transient just means that you can assign .front to a
 variable and rest assured that popFront() will not mutate its value from under
 you.
 
 I'm not sure we want to put this on the website just yet -- it *is* a term that
 we coined for the sake of distinguishing between, say, the current byLine(),
 which reuses the .front buffer, and the proposed change where it calls .dup on
 every line returned.
Not *quite* as simple as that. If front returns a "non reference type" (eg an int) but by reference, is it transient? struct S { int front; void popFront() {++front;} enum empty = false; } int a = myRange.front; //Non transient int* b = &myRange.front; //Transient myRange.popFront(); assert(*b == a); //Fails Is that range transient? Is the range "reference transient"? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495




No, transience is a property of the *range*, not what you do with the data
returned by the range. Taking the address of .front technically undefined,
since you can't guarantee that that's even possible for arbitrary range types
(in general, it's not possible, since .front may be implemented as a  property
function). If .front were a ref function, then you may have a point.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 01 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6495





 Taking the address of .front technically undefined,
 since you can't guarantee that that's even possible for arbitrary range types
 (in general, it's not possible, since .front may be implemented as a  property
 function). If .front were a ref function, then you may have a point.
That's BS. That's like saying indexing is undefined because an arbitrary range isn't indexable. We have the `hasLValueElements` trait specifically for this. Ranges that offer references to their internal exist, and are a class of their own. Also, the " property" thing is a non-argument: It is just buggy behaviour; The new implementation (as explained by Kenji) is that &propFun should and *will* return the address of what is returned after having called propFun. That said, I'll *agree* that there is not a single trait that guarantees references remain valid after a pop, nor does *anybody* rely on it, so we can indeed safely say that my point was, well, pointless :) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 01 2013