digitalmars.D.bugs - [Issue 1985] New: import expression should return ubyte[] not string


d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1985

           Summary: import expression should return ubyte[] not string
           Product: D
           Version: 1.028
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: wbaxter gmail.com


The compiler does not convert the encoding of the imported files to utf8, so it
should not pretend that it knows the contents of the file are utf8.

In fact, probably one of the most practically useful applications of the import
expression is to import binary files, which it is impossible for the compiler
to put in utf8 format.

So the conclusion is that import("foo.dat") should evaluate to ubyte[], not
char[].  It can be cast to char[] if the developer happens to know that the
file is, in fact, text.  Currently the situation is reversed -- the data must
be cast to ubyte[] if the developer knows it is not, in fact, utf8 text.
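
As a rough illustration of the cast described above (a sketch in D2 syntax, not
taken from the report): the string literal below is a hypothetical stand-in for
import("foo.dat"), chosen so the example compiles without a -J import path, and
the helper names asBytes and isValidUtf8 are made up for this example.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: reinterpret text-typed data as raw bytes.
// This is the kind of cast the report says is needed today for binary files.
immutable(ubyte)[] asBytes(string s)
{
    return cast(immutable(ubyte)[]) s;
}

// Hypothetical helper: check whether the data really is valid UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Stand-in for import("foo.dat"): bytes that are not valid UTF-8.
    string raw = "\x89PNG\r\n";

    auto bytes = asBytes(raw);
    assert(bytes.length == 6);
    assert(bytes[0] == 0x89);

    // The type says "UTF-8 text", but the contents are not.
    assert(!isValidUtf8(raw));
    assert(isValidUtf8("plain text"));
}
```

With a real file, the literal would be replaced by import("foo.dat") and the
compiler would need a -J switch naming the directory that contains the file.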


--

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 11/04/2008, d-bugmail puremagic.com <d-bugmail puremagic.com> wrote:
    import expression should return ubyte[] not string

Shouldn't it return invariant(ubyte)[] ?

Apr 11 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Janice Caron wrote:
 On 11/04/2008, d-bugmail puremagic.com <d-bugmail puremagic.com> wrote:
    import expression should return ubyte[] not string

 
 Shouldn't it return invariant(ubyte)[] ?

In D2 probably so, but I filed the bug against D1.028.

--bb

Apr 11 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 11/04/2008, Bill Baxter <dnewsgroup billbaxter.com> wrote:
 Shouldn't it return invariant(ubyte)[] ?

  In D2 probably so, but I filed the bug against D1.028.

Makes sense.

I suppose in D2 it depends on the answer to the following question. If I write

    auto a = import("filename");
    auto b = import("filename");

do we get two separate copies, or do a and b both point to the same
memory? If the latter, it should definitely be invariant(ubyte)[] in
D2, though as you say, ubyte[] in D1.

Apr 11 2008

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1985

Yes, I also think that implying the imported file is char[] is not the best
decision.

Maybe the imported type should even be void[], so that it must be explicitly
cast to the proper type.


--

Apr 11 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 11/04/2008, d-bugmail puremagic.com <d-bugmail puremagic.com> wrote:
  Maybe even imported type should be void[], so that it must be explicitly casted
  to proper type.

No, it should be ubyte. The reason is that void arrays can contain
pointers, and ubyte arrays can't. A void array means that the garbage
collector has to scan it, looking for anything that looks like it
might be an address, and if it finds such a collection of bits by
accident, then something will be marked as "in use", that actually
isn't.

If the array came from a file, it can't very well have meaningful
pointers into RAM, so I agree with the original poster that it should
be ubyte[] for D1, and I would add invariant(ubyte)[] for D2.

Apr 11 2008

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1985

This isn't a (direct) argument against my proposal. It would be if the GC
scanning void[] were fundamentally a "good thing". But in fact I don't think
that, in this case, it is that kind of design decision. (There have already
been proposals to change this behavior.)

But in the case of reading external files, the problem is that the compiler
just *doesn't know* the format of the imported file. So IMHO the best thing to
do is to reflect this situation in the language, and force the user to cast the
content of the file to its real type.

If import keeps a default cast to some type, it is really difficult to justify
which default behavior is better. I agree with Bill that in most GUI
applications it would be better to have the import be an array of bytes. But
you can also think of a DB framework in which an external file is used to
define the schema of the database (see: the U++ framework written in C++). In
this case some text format is much more natural...


--

Apr 11 2008

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1985

 This isn't a (direct) argument against my proposal. It would be if the GC
 scanning void[] were fundamentally a "good thing". But in fact I don't think
 that, in this case, it is that kind of design decision. (There have already
 been proposals to change this behavior.)
 
 But in the case of reading external files, the problem is that the compiler
 just *doesn't know* the format of the imported file. So IMHO the best thing to
 do is to reflect this situation in the language, and force the user to cast the
 content of the file to its real type.
 
 If import keeps a default cast to some type, it is really difficult to justify
 which default behavior is better. I agree with Bill that in most GUI
 applications it would be better to have the import be an array of bytes. But
 you can also think of a DB framework in which an external file is used to
 define the schema of the database (see: the U++ framework written in C++). In
 this case some text format is much more natural...
 


Your argument is right on, but ubyte[] *is* the type that means "I don't know
what the heck this data is, but it's just data, not pointers".  void[] means "I
don't know what the heck this data is, it might even be full of pointers".  

I don't see any way that file that is on disk at *compile time* could contain
pointers that are relevant to the program later on at *run time*.  So ubyte[]
is the proper type.


--

Apr 11 2008

Aarti_pl <aarti interia.pl> writes:

 Your argument is right on, but ubyte[] *is* the type that means "I don't know
 what the heck this data is, but it's just data, not pointers".  void[] means "I
 don't know what the heck this data is, it might even be full of pointers".  
 I don't see any way that file that is on disk at *compile time* could contain
 pointers that are relevant to the program later on at *run time*.  So ubyte[]
 is the proper type.

I think I understand your way of thinking. But the problem here is
different. With void[] you cannot do anything, so you have to cast.
With ubyte[] you can use the data as it is, because it does in fact have a
specific type. But what if the file contains an array of ints? In that case
you are in exactly the same situation as with char[]. The compiler chooses one
type to which it casts by default, exactly as it currently does with char[].
And this choice can be wrong. Whether it is right or wrong depends only on
the application. In a GUI, ubyte[] is more appropriate (e.g. importing an icon),
but in a DB framework a string is better (importing a db schema).

Having written all this, I am slowly coming to the conclusion that the current
situation is not so bad :-) char[] would usually be much more
appropriate for the lower layers of an application than ubyte[] (mostly for
loading and compiling domain languages). The alternative with void[] is IMHO
theoretically better, but in practice you will have even more casts in
your code... Tough decision... :-)

Apr 11 2008

Frank Benoit <keinfarbton googlemail.com> writes:

Aarti_pl wrote:
 Your argument is right on, but ubyte[] *is* the type that means "I don't know
 what the heck this data is, but it's just data, not pointers".  void[] means "I
 don't know what the heck this data is, it might even be full of pointers".
 I don't see any way that file that is on disk at *compile time* could contain
 pointers that are relevant to the program later on at *run time*.  So ubyte[]
 is the proper type.

 
 I think I understand your way of thinking. But the problem here is
 different. With void[] you cannot do anything, so you have to cast.
 With ubyte[] you can use the data as it is, because it does in fact have a
 specific type. But what if the file contains an array of ints? In that case
 you are in exactly the same situation as with char[]. The compiler chooses one
 type to which it casts by default, exactly as it currently does with char[].
 And this choice can be wrong. Whether it is right or wrong depends only on
 the application. In a GUI, ubyte[] is more appropriate (e.g. importing an icon),
 but in a DB framework a string is better (importing a db schema).
 
 Having written all this, I am slowly coming to the conclusion that the current
 situation is not so bad :-) char[] would usually be much more
 appropriate for the lower layers of an application than ubyte[] (mostly for
 loading and compiling domain languages). The alternative with void[] is IMHO
 theoretically better, but in practice you will have even more casts in
 your code... Tough decision... :-)

foreach( c; import("data")){
	// do something
}

- cannot work with void[]
- will always work in a reasonable way with ubyte[]
- might throw UnicodeException with char[]

Apr 11 2008

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1985

 I think I understand your way of thinking. But the problem here is
 different. With void[] you cannot do anything, so you have to cast.
 With ubyte[] you can use the data as it is, because it does in fact have a
 specific type.

Well, it's a specific type that represents raw memory.

But I get your point too.  If there were a void_without_pointers type that
couldn't be used without casting, then I'd be happy for import("file") to use
that.  But with void[] getting scanned by the gc for pointers it's a no-go.  If
you do just one import("big_binary_file") it could add enough false pointers
that your app would hold on to big gobs of memory it doesn't need to.


--

Apr 11 2008

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1985


dawg dawgfoto.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dawg dawgfoto.de
           Platform|x86                         |All
            Version|1.028                       |D1 & D2
         OS/Version|Windows                     |All



UTF validation and conversions can be done at compile time,
although the interpreter is currently a little slow for this.


May 27 2012
