www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 4572] New: std.file.read return type

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572

           Summary: std.file.read return type
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



I am unsure about this, but I think that it is better for std.file.read() to
return something like a ubyte[] instead of a void[].

It's a problem to perform a cast(ubyte[]) in SafeD:
auto data = cast(ubyte[])std.file.read();

Or maybe in SafeD I have to use other safer functions to load binary data, like
slurp() or something similar.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 02 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572


David Simcha <dsimcha yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dsimcha yahoo.com



I agree, and here's another reason to add to the list:

void[] implies an array that may contain pointers.  I just looked at the impl.
of std.file and was surprised to learn that it allocates the array as not
containing pointers via GC.malloc.  However, when you do a new void[N], you get
an array that may contain pointers.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 03 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572


nfxjfg gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfxjfg gmail.com



I say using the void[] type is a design error in this case. void[] is just a
safer void*. void[] should only be used in cases when you have to reference a
specific region of memory. You use it because the type system is not expressive
enough to describe the actual type (see Variant and so on).

But in this case, it's very clear what the actual data type is: it's an array
of bytes. File contents are nothing else than untyped bag of bytes. Thus the
return type should be ubyte[].

Note that reinterpret casting ubyte[] slices to other arrays or structs is not
clean: there are issues of byte order and padding. This too doesn't apply to
void[]. There's a clear difference.

Tange makes the same error.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 03 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572




Thank you David and nfxjfg :-)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 03 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572




One can add that .dup-ing a void[] will make all the precaution in
std.file.read of allocating the void[] with GC.BlkAttr.NO_SCAN useless. The
dup'ed array won't have the NO_SCAN flag set. It really shows that void[] is
not the natural type to use here.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 03 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com



05:24:44 PDT ---

 One can add that .dup-ing a void[] will make all the precaution in
 std.file.read of allocating the void[] with GC.BlkAttr.NO_SCAN useless. The
 dup'ed array won't have the NO_SCAN flag set. It really shows that void[] is
 not the natural type to use here.
any array operation which copies a block to another copies the flags from the original array block. So the NO_SCAN flag should persist. If it doesn't, that's a bug (looking at it, dup is the only function that doesn't copy the block attributes, I'll address this on the mailing list). I think the void[] type is more consistent than ubyte. Consider two things. First, a read where the array is provided by the caller. If this function is typed: ubyte[] read(ubyte[] into); then, you must cast your data to a ubyte[]. But void[] can be implicitly cast to from anything, so void[] wins here. Second, a write operation: int write(ubyte[] from); Again, cast is necessary where void[] would not require it. To be sure std.file.read() could return a ubyte[], and unless you intend to actually deal with it as a ubyte[], you need to cast to the type you need. But shouldn't it be consistent with other operations? But we can forgo all this stuff if we add a template parameter to read, so you can specify exactly the type you want. If you know your file is a bunch of int's, you could do: std.file.read!(int)(); -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 04 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572




Thank you for your answer Steven.

 then, you must cast your data to a ubyte[].  But void[] can be implicitly cast
 to from anything, so void[] wins here.
If you compile this program with dmd 2.047: void main() { auto v = new void[10]; ubyte a1, a2; a1 = v; v = a2; } It produces the errors: test.d(4): Error: cannot implicitly convert expression (v) of type void[] to ubyte test.d(5): Error: cannot implicitly convert expression (a2) of type ubyte to void[]
 But we can forgo all this stuff if we add a template parameter to read, so you
 can specify exactly the type you want.  If you know your file is a bunch of
 int's, you could do:
 std.file.read!(int)();
It seems a good idea. So I presume std.file.read!(int[])(); reads a matrix. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 04 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572




06:02:21 PDT ---

 Thank you for your answer Steven.
 
 then, you must cast your data to a ubyte[].  But void[] can be implicitly cast
 to from anything, so void[] wins here.
If you compile this program with dmd 2.047: void main() { auto v = new void[10]; ubyte a1, a2; a1 = v; v = a2; } It produces the errors: test.d(4): Error: cannot implicitly convert expression (v) of type void[] to ubyte test.d(5): Error: cannot implicitly convert expression (a2) of type ubyte to void[]
Yes, D does not convert non-arrays to void[]. You can do something like this: v = (&a2)[0..1]; which is typically what is done. ubyte[] provides no automatic conversion from anything, including other array types.
 But we can forgo all this stuff if we add a template parameter to read, so you
 can specify exactly the type you want.  If you know your file is a bunch of
 int's, you could do:
 std.file.read!(int)();
It seems a good idea. So I presume std.file.read!(int[])(); reads a matrix.
Well, I would assume it would return an int[][], which probably would mean nothing since arrays are pointer/length values, and any pointer/length values read from a file would be bogus. I'd say read should reject reading elements that have references in them. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 04 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572




Steven Schveighoffer:

 Well, I would assume it would return an int[][], which probably would mean
 nothing since arrays are pointer/length values, and any pointer/length values
 read from a file would be bogus.  I'd say read should reject reading elements
 that have references in them.
Yes, you are right, I probably meant a static array, so this reads a dynamic array of ints: std.file.read!(int[])(); A static array of two ints: std.file.read!(int[2])(); A static array of static array of ints (the memory of a fixed-size matrix is contiguous in D): std.file.read!(int[2][5])(); The problem is that such syntax doesn't read the items of the static array in-place, the result is a static array that gets copied. So you need an argument by ref: int[2][5] m; std.file.read(T)(ref T m); I don't like this a lot. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 04 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572





 any array operation which copies a block to another copies the flags from the
 original array block.  So the NO_SCAN flag should persist.  If it doesn't,
 that's a bug (looking at it, dup is the only function that doesn't copy the
 block attributes, I'll address this on the mailing list).
Concatenating arrays doesn't preserve the bit either. And if the user allocates the void[], the NO_SCAN bit won't be set anyway. The user must remember to allocate a ubyte[] array to get it right. How many users will even understand why? (Also if you copy one array into another, flags won't be preserved. If you allocate the void[] with C-malloc, lifetime.d won't do the right thing either.) Let me repeat: the only reason why you want to use void[] here is because void[] implicitly converts to any other array type. But doing that is bogus anyway. If you convert it to an int[], you implicitly assume the data stored on the disk is in the same endian as your machine's. If it's an array of struct, you assume there are no padding issues (or further byte order issues). If it's a char[], you assume it's valid utf-8. No really, there should be specialized functions for proper I/O with certain types. In no case should the user be encouraged to do the false thing. void[] encourages to do 2 false things: 1. allocating the incorrect type, and 2. reading/writing data in a machine specific format. I think it's a good thing to remind the user of 2. by forcing him to explicitly cast the array. Yes, dear user, you ARE reinterpret casting your data to a bunch of bytes! I guess Andrei will remove the usage of void[] anyway because of his beloved Java subset of D. void[] would allow aliasing pointers with integers, which is a no-no in SafeD.
 But we can forgo all this stuff if we add a template parameter to read, so you
 can specify exactly the type you want.  If you know your file is a bunch of
 int's, you could do:
 
 std.file.read!(int)();
I guess nothing can stop the devs from killing Phobos with even more template bloat. Your function call signature forgets anything about endians, though. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 04 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572




10:31:17 PDT ---


 any array operation which copies a block to another copies the flags from the
 original array block.  So the NO_SCAN flag should persist.  If it doesn't,
 that's a bug (looking at it, dup is the only function that doesn't copy the
 block attributes, I'll address this on the mailing list).
Concatenating arrays doesn't preserve the bit either.
Yes, that should be addressed too, thanks for pointing it out!
 And if the user allocates
 the void[], the NO_SCAN bit won't be set anyway. The user must remember to
 allocate a ubyte[] array to get it right. How many users will even understand
 why? (Also if you copy one array into another, flags won't be preserved.
You're splitting hairs here. ubyte and void are both just as nebulous, it's just that void[] requires you to cast to something before reading it. It means "I don't know what the type is." ubyte means "this is an array of unsigned bytes", which is somewhat as nebulous, and one could argue more useful. But wouldn't it be best to just ask for the type you wish? What if you want to read a utf-8 file, wouldn't it be better for file.read() to return a char[]? I think the template solution works best.
 If you
 allocate the void[] with C-malloc, lifetime.d won't do the right thing either.)
Who's allocating a void[] with c-malloc? I thought std.file.read allocated the data with gc_malloc?
 Let me repeat: the only reason why you want to use void[] here is because
 void[] implicitly converts to any other array type.
Well, it's for consistency. void[] does not implicitly convert to another array type, rather another array type can implicitly convert to void[]. So as far as returning void[], it forces you to cast, even if you expect the data to be ubyte[].
 But doing that is bogus anyway. If you convert it to an int[], you implicitly
 assume the data stored on the disk is in the same endian as your machine's. If
 it's an array of struct, you assume there are no padding issues (or further
 byte order issues). If it's a char[], you assume it's valid utf-8.
So? Is one not allowed to assume the contents of a file are in a format they expect? Typically one inserts magic numbers or something similar in a fie to ensure the file is correct, but I don't see why casting makes the user more keenly aware that the file might be incorrect.
 I think it's a good thing to remind the user of 2. by forcing him to explicitly
 cast the array. Yes, dear user, you ARE reinterpret casting your data to a
 bunch of bytes!
And that is what a void[] return does... not sure what you are getting at here.
 I guess Andrei will remove the usage of void[] anyway because of his beloved
 Java subset of D. void[] would allow aliasing pointers with integers, which is
 a no-no in SafeD.
hehe, I don't think you know how close Andrei and Java are.
 But we can forgo all this stuff if we add a template parameter to read, so you
 can specify exactly the type you want.  If you know your file is a bunch of
 int's, you could do:
 
 std.file.read!(int)();
I guess nothing can stop the devs from killing Phobos with even more template bloat. Your function call signature forgets anything about endians, though.
All it does is apply a type to a common function's return. Endianness is an implementation issue. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 04 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572




05:13:42 PDT ---
After discussion on the druntime mailing list, it was agreed that dup should
preserve the bits, but we decided to leave concatenation as-is.  This shouldn't
be a huge issue, because one doesn't often concatenate void[]s together. 
There's no clear choice in concatenation because multiple arrays are being
concatenated together, and none of them is going to "own" the data afterwards.

So I think in light of dup now preserving the scan bit, the issue of using
void[] as the return type of std.file.read (non-templated version) is moot.

But I still think being able to apply a type to the return data array via a
template parameter would be a good enhancement.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 06 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572




10:39:39 PDT ---
To me void[] stinks a lot and it's not a big deal to avoid it completely. It's
an abomination from trying to evolve C concepts for D.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 15 2010
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4572


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei metalanguage.com
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 09 2011