digitalmars.D - Why are void[] contents marked as having pointers?

reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
I just went through a ~15000-line project and replaced most occurrences of
void[]. Now the project is an ugly mess of void[], ubyte[] and casts, but at
least it doesn't leak memory like crazy any more.

I don't know why it was decided to mark the contents of void[] as "might have
pointers". It makes no sense! Consider:

1) void[] has this wonderful, magical property that any array type implicitly
casts to void[]. This makes it wonderful to use in libraries and functions that
manipulate data with no regard to what it actually contains. Network
libraries, compression libraries, etc. - right about anywhere where you'd use a
void* and length in C++, a void[] is just as appropriate.
2) Despite that void[] is "typeless", you can still operate on it - namely,
slice and concatenate them. Pass a void[] to a network send() function - how
much did you send? Half the buffer? No problem, slice it away and store the
rest - and no casts.
3) It's very rare in practice for the only pointer to your object (which you
still plan to access later) to be stored in a void[]-allocated array! Remember,
the properties of memory regions are determined when the memory is allocated,
so casting an array of structures to a void[] will not lose you that reference.
You'd need to move your pointer into a void[] array (which you'd need to
allocate explicitly or create by, for example, concatenating your reference to
the void[]), then drop the reference to your original structure, for this to
happen.

Here's a simple naive implementation of a buffer:

void[] buffer;
void queue(void[] data)
{
	buffer ~= data;
}
...
queue([1,2,3][]);
queue("Hello, World!");

No casts! So simple and beautiful. However, should you use this pattern to work
with larger amounts of data with a high entropy, the "minefield" effect will
cause the GC to stop collecting most data. Sure, you can call
std.gc.hasNoPointers, but you need to do it after every single concatenation...
and it makes expressions with more than one concatenation unsafe.
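
A minimal sketch of one way to keep the cast-free call sites while storing the bytes in a ubyte[] (which the GC allocates as not containing pointers). It is written in D2 and assumes std.traits.hasIndirections; ByteQueue and its members are hypothetical names used only for illustration, not an existing library type.

```d
import std.traits : hasIndirections;

/// Hypothetical byte queue: the data lives in a ubyte[], so the GC does not
/// scan the buffer for pointers.
struct ByteQueue
{
    ubyte[] buffer;

    /// Appends the raw bytes of any array whose element type holds no
    /// pointers or references. The single cast lives here, not at call sites.
    void queue(T)(const(T)[] data)
        if (!hasIndirections!T)
    {
        buffer ~= cast(const(ubyte)[]) data;
    }
}

void main()
{
    ByteQueue q;
    q.queue([1, 2, 3]);        // int[]: 3 * int.sizeof bytes
    q.queue("Hello, World!");  // string: 13 bytes
    assert(q.buffer.length == 3 * int.sizeof + 13);
}
```

The template constraint rejects arrays of class references or pointers at compile time, which is the case where reinterpreting the data as plain bytes could hide a live reference from the GC.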

I heard that Tango copies over the properties of arrays when they are
reallocated, which helps but solves the problem only partially.

So, I ask you: is there actually code out there that depends on the way void[]
works right now? I brought up this argument a year or so ago on IRC, and there
were people who defended ferociously the current design using idealisms ("it
should work like what it sounds like, it should contain any type" or something
like that), but I've yet to see a practical argument.


P.S. How come the standard library doesn't have a simple function like this?

T[] toArray(T)(inout T data) { return (&data)[0..1]; }

It happens often that I need to get a slice of memory around an object's
reference (for example to pass it to a function that takes a void[] :D), and
typing (&x)[0..1] every time feels like a hack.
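
For readers on D2, a sketch of the same helper: the D1 `inout` parameter class used above corresponds to `ref` in D2. This is an illustration under that assumption, not a function from the standard library.

```d
/// Returns a one-element slice over `data` without copying it.
/// The slice is only valid while `data` itself is alive.
T[] toArray(T)(ref T data)
{
    return (&data)[0 .. 1];
}

void main()
{
    int x = 42;
    int[] s = toArray(x);
    assert(s.length == 1 && s.ptr is &x);

    s[0] = 7;        // writes through to x, since the slice aliases it
    assert(x == 7);
}
```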

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
May 31 2009
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:
[...]
 3) It's very rare in practice that the only pointer to your
 object (which you still plan to access later) to be stored in a
 void[]-allocated array!
Rare or common, it still would be a nasty bug lurking to catch someone. The default behavior in D should be to be correct code. Doing potentially unsafe things to improve performance should require extra effort - in this case it would be either using the gc function to mark the memory as not containing pointers, or storing them as ubyte[] instead.
May 31 2009
next sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 31 May 2009 22:41:47 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:
[...]
 3) It's very rare in practice that the only pointer to your
 object (which you still plan to access later) to be stored in a
 void[]-allocated array!
Rare or common, it still would be a nasty bug lurking to catch someone. The default behavior in D should be to be correct code. Doing potentially unsafe things to improve performance should require extra effort - in this case it would be either using the gc function to mark the memory as not containing pointers, or storing them as ubyte[] instead.
This isn't about performance, this is about having one thousand casts all over my code. It becomes a burden to cast everything to ubyte[] when working with abstract binary data. For example, when building a MIME multipart message with binary fields, every line needs to have a cast in it - when we could have just used the ~= operator to append to a void[]. Alternative solutions would be to have a second type (either new or one of the existing, e.g. ubyte[]) act as void[] (any array type casts to it implicitly) but not be scanned by the GC, but I doubt this is something you'll consider -- Best regards, Vladimir mailto:thecybershadow gmail.com
May 31 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Vladimir Panteleev wrote:
 On Sun, 31 May 2009 22:41:47 +0300, Walter Bright
 <newshound1 digitalmars.com> wrote:
 
 Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
  "might have pointers". It makes no sense! Consider:
[...]
 3) It's very rare in practice that the only pointer to your 
 object (which you still plan to access later) to be stored in a 
 void[]-allocated array!
Rare or common, it still would be a nasty bug lurking to catch someone. The default behavior in D should be to be correct code. Doing potentially unsafe things to improve performance should require extra effort - in this case it would be either using the gc function to mark the memory as not containing pointers, or storing them as ubyte[] instead.
This isn't about performance, this is about having one thousand casts all over my code. It becomes a burden to cast everything to ubyte[] when working with abstract binary data. For example, when building a MIME multipart message with binary fields, every line needs to have a cast in it - when we could have just used the ~= operator to append to a void[].
Another alternative would be to allow implicitly casting arrays of any type to const(ubyte)[] which is always safe. But I think this is too much ado about nothing - you're avoiding the type system to start with, so use ubyte, insert a cast, and call it a day. If you have too many casts, the problem is most likely elsewhere so that argument I'm not buying. Andrei
May 31 2009
next sibling parent reply BCS <none anon.com> writes:
Hello Andrei,

 Vladimir Panteleev wrote:
 
 This isn't about performance, this is about having one thousand casts
 all over my code. It becomes a burden to cast everything to ubyte[]
 when working with abstract binary data. For example, when building a
 MIME multipart message with binary fields, every line needs to have a
 cast in it - when we could have just used the ~= operator to append
 to a void[].
 
Another alternative would be to allow implicitly casting arrays of any type to const(ubyte)[] which is always safe.
sounds like something that might work.
 But I think this is too
 much ado about nothing - you're avoiding the type system to start
 with,
I'm not sure he is (or at least, he is in a very well defined way; "I need to look at this data as its bytes")
 so use ubyte, insert a cast, and call it a day. If you have too
 many casts, the problem is most likely elsewhere
You might be correct, but I don't think any of us have enough info right now to make that assertion.
May 31 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
BCS wrote:
 so use ubyte, insert a cast, and call it a day. If you have too
 many casts, the problem is most likely elsewhere
You might be correct, but I don't think any of us have enough info right now to make that assertion.
Oh there is enough information. What's needed is:

const(ubyte)[] getRepresentation(T)(T[] data)
{
    return cast(typeof(return)) data;
}

If you have many calls to getRepresentation(), then that anticlimactically shows that you need to look at arrays' representations often. If there are too many of those, maybe some of the said arrays should be dealt with as ubyte[] in the first place.

Andrei
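
A short usage sketch of the getRepresentation() helper quoted in this post, assuming a D2 compiler. The variable names are illustrative only. It shows that no copy is made and that the const element type prevents writing through the byte view.

```d
const(ubyte)[] getRepresentation(T)(T[] data)
{
    return cast(typeof(return)) data;
}

void main()
{
    int[] values = [1, 2, 3];
    auto bytes = getRepresentation(values);

    // Same memory, viewed as bytes: the length scales by the element size.
    assert(bytes.length == values.length * int.sizeof);
    assert(bytes.ptr is cast(const(ubyte)*) values.ptr);

    // The view is read-only, so the original array cannot be altered through it.
    static assert(!__traits(compiles, bytes[0] = 0));
}
```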
May 31 2009
next sibling parent BCS <none anon.com> writes:
Hello Andrei,

 BCS wrote:
 
 so use ubyte, insert a cast, and call it a day. If you have too many
 casts, the problem is most likely elsewhere
 
You might be correct, but I don't think any of us have enough info right now to make that assertion.
Oh there is enough information. What's needed is: const(ubyte)[] getRepresentation(T)(T[] data) { return cast(typeof(return)) data; } If you have many calls to getRepresentation(), then that anticlimatically shows that you need to look at arrays' representations often. If there are too many of those, maybe some of the said arrays should be dealt with as ubyte[] in the first place.
Maybe in some cases, but if the primary function of the code is processing stuff between "raw data" and other data types, then the above is irrelevant. The OP sort of hinted somewhere that this is the kind of thing he is working on. Without knowing what the OP is doing, I still don't think we can say if his program is well designed.
May 31 2009
prev sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 01 Jun 2009 00:00:45 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 const(ubyte)[] getRepresentation(T)(T[] data)
 {
      return cast(typeof(return)) data;
 }
This is functionally equivalent to (forgive the D1):

ubyte[] getRepresentation(void[] data)
{
    return cast(ubyte[]) data;
}

Since no allocation is done in this case, the use of void[] is safe, and it doesn't instantiate a version of the function for every type you call it with. I remarked about this in my other reply.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
May 31 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 00:00:45 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 
 const(ubyte)[] getRepresentation(T)(T[] data)
 {
      return cast(typeof(return)) data;
 }
This is functionally equivalent to (forgive the D1): ubyte[] getRepresentation(void[] data) { return cast(ubyte[]) data; } Since no allocation is done in this case, the use of void[] is safe, and it doesn't instantiate a version of the function for every type you call it with. I remarked about this in my other reply.
This is not safe because you can change the data. Andrei
May 31 2009
parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 01 Jun 2009 02:18:46 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 00:00:45 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 const(ubyte)[] getRepresentation(T)(T[] data)
 {
      return cast(typeof(return)) data;
 }
This is functionally equivalent to (forgive the D1): ubyte[] getRepresentation(void[] data) { return cast(ubyte[]) data; } Since no allocation is done in this case, the use of void[] is safe, and it doesn't instantiate a version of the function for every type you call it with. I remarked about this in my other reply.
Which is why I wrote "forgive the D1" :) I've yet to switch to D2, but it's obvious that the const should be there to ensure safety. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Jun 01 2009
prev sibling next sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 31 May 2009 23:24:09 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 But I think this is too much ado about nothing - you're avoiding the type
system to start with, so use ubyte, insert a cast, and call it a day. 
I don't get it - not using casts is avoiding the type system? :P Note that I am NOT up-casting the void[] later back to some other type - it goes out to the network, a file, etc. void[] sounds like it fits perfectly in the type hierarchy for "just a bunch of bytes", except for the "may contain pointers" fine print.
 If you have too many casts, the problem is most likely elsewhere so that
argument I'm not buying.
I could cut down on the number of casts if I were to replace most array appending operations to calls to a function that takes a void[] and then internally casts to an ubyte[] and appends that somewhere. There's a lot of diversity of types being worked with in my case - strings, various structs, more raw data, etc. I'm more annoyed that I'd need to do something like that to work around a design decision that may not have been fully thought out. -- Best regards, Vladimir mailto:thecybershadow gmail.com
May 31 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Vladimir Panteleev wrote:
 On Sun, 31 May 2009 23:24:09 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 But I think this is too much ado about nothing - you're avoiding
 the type system to start with, so use ubyte, insert a cast, and
 call it a day.
I don't get it - not using casts is avoiding the type system? :P Note that I am NOT up-casting the void[] later back to some other type - it goes out to the network, a file, etc. void[] sounds like it fits perfectly in the type hierarchy for "just a bunch of bytes", except for the "may contain pointers" fine print.
I understand. You are sending around object representation. void[] may contain pointers, so you're simply not looking at the right abstraction.
 If you have too many casts, the problem is most likely elsewhere so
 that argument I'm not buying.
I could cut down on the number of casts if I were to replace most array appending operations to calls to a function that takes a void[] and then internally casts to an ubyte[] and appends that somewhere. There's a lot of diversity of types being worked with in my case - strings, various structs, more raw data, etc. I'm more annoyed that I'd need to do something like that to work around a design decision that may not have been fully thought out.
Walter has written a class called OutBuffer (see std.outbuffer) the likes of which could be used to encapsulate representation marshaling. Andrei
May 31 2009
prev sibling parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 31 May 2009 23:24:09 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Another alternative would be to allow implicitly casting arrays of any  
 type to const(ubyte)[] which is always safe. But I think this is too  
 much ado about nothing - you're avoiding the type system to start with,  
 so use ubyte, insert a cast, and call it a day. If you have too many  
 casts, the problem is most likely elsewhere so that argument I'm not  
 buying.
I've thought about this for a bit. If we allow any *non-reference* type except void[] to implicitly cast to ubyte[], but still allow implicitly casting ubyte[] to void[], it will put ubyte[] in the perfect spot in the type hierarchy - it'll allow safely (portability issues notwithstanding) getting the representation of value-type (POD) arrays, while still allowing abstracting it even further to the "might have pointers" type - at which point it is unsafe to access individual bytes, which void[] disallows without casts. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Jun 01 2009
prev sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 31 May 2009 22:41:47 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:
[...]
 3) It's very rare in practice that the only pointer to your
 object (which you still plan to access later) to be stored in a
 void[]-allocated array!
Rare or common, it still would be a nasty bug lurking to catch someone. The default behavior in D should be to be correct code. Doing potentially unsafe things to improve performance should require extra effort - in this case it would be either using the gc function to mark the memory as not containing pointers, or storing them as ubyte[] instead.
I just realized that by "performance" you might have meant memory leaks. Well, sure, if you can say that my programs crashing every few hours due to running out of memory is a "performance" problem. I'm sorry to sound bitter, but this was the cause of much annoyance for my software's users. It took me to write a memory debugger to understand that no matter how much you chase void[]s with hasNoPointers, there will always be that one ~ which you overlooked.

As much as I try to look from an objective perspective, I don't see how a memory leak (and memory leaks in D usually mean that NO memory is being freed, except for small lucky objects not having bogus pointers to them) is a problem less significant than an obscure case that involves allocating a void[], storing a pointer in it and losing all other references to the object.

In fact, I just searched the D documentation and I couldn't find a statement saying whether void[] are scanned by the GC or not. Enter mr. D-newbie, who wants to write his own network/compression/file-copying/etc. library/program and stumbles upon void[], the seemingly perfect abstract-binary-data-container type for the job... (which is exactly what happened with yours truly).

P.S. Not trying to push my point of view, but just trying to offer some perspective from someone who has been bit by this design choice...

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
May 31 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Vladimir Panteleev wrote:
 I just realized that by "performance" you might have meant memory
 leaks.
No, in this context I meant improving performance by not scanning the void[] memory for pointers.
 Well, sure, if you can say that my programs crashing every few
 hours due to running out of memory is a "performance" problem. I'm
 sorry to sound bitter, but this was the cause of much annoyance for
 my software's users. It took me to write a memory debugger to
 understand that no matter how much you chase void[]s with
 hasNoPointers, there will always be that one ~ which you overlooked.
I'm curious what form of data you have that always seem to look like valid pointers. There are a couple other options you can pursue - moving the gc pool to another location in the address space, or changing the alignment of your void[] data so it won't look like aligned pointers (the gc won't look for misaligned pointers). Or just use ubyte[] instead.
 As much as I try to look from an objective perspective, I don't see
 how a memory leak (and memory leaks in D usually mean that NO memory
 is being freed, except for small lucky objects not having bogus
 pointers to them) is a problem less significant than an obscure case
 that involves allocating a void[], storing a pointer in it and losing
 all other references to the object.
Because one is an obvious failure, and the other will be memory corruption. Memory corruption is pernicious and awful.
 In fact, I just searched the D
 documentation and I couldn't find a statement saying whether void[]
 are scanned by the GC or not. Enter mr. D-newbie, who wants to write
 his own network/compression/file-copying/etc. library/program and
 stumbles upon void[], the seemingly perfect
 abstract-binary-data-container type for the job... (which is exactly
 what happened with yours truly).
 
 P.S. Not trying to push my point of view, but just trying to offer
 some perspective from someone who has been bit by this design
 choice...
Hmm. Wouldn't compression data be naturally a ubyte[] type?
May 31 2009
next sibling parent BCS <none anon.com> writes:
Hello Walter,

 I'm curious what form of data you have that always seem to look like
 valid pointers. There are a couple other options you can pursue -
 moving the gc pool to another location in the address space, or
 changing the alignment of your void[] data so it won't look like
 aligned pointers (the gc won't look for misaligned pointers).
 
Most (but not all) of the cases I can think of where you get false pointers, re-aligning stuff or moving the heap won't help as the false pointer source will hit the full address space.
May 31 2009
prev sibling next sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 01 Jun 2009 00:28:21 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Vladimir Panteleev wrote:
 I just realized that by "performance" you might have meant memory
 leaks.
No, in this context I meant improving performance by not scanning the void[] memory for pointers.
 Well, sure, if you can say that my programs crashing every few
 hours due to running out of memory is a "performance" problem. I'm
 sorry to sound bitter, but this was the cause of much annoyance for
 my software's users. It took me to write a memory debugger to
 understand that no matter how much you chase void[]s with
 hasNoPointers, there will always be that one ~ which you overlooked.
I'm curious what form of data you have that always seem to look like valid pointers. There are a couple other options you can pursue - moving the gc pool to another location in the address space, or changing the alignment of your void[] data so it won't look like aligned pointers (the gc won't look for misaligned pointers).
It's just compressed data, which is evenly distributed across the 32-bit address space. Let's do the math:

Suppose we have an application which has two blocks of memory, M and N. Block M is a block with random data which is erroneously marked as having pointers, while block N is a block which shouldn't have any pointers towards it. Now, the chance that a random DWORD will point inside N is sizeof(N)/0x100000000 - or rather, we can say that it will NOT point inside N with the probability of 1-(sizeof(N)/0x100000000). For as many DWORDs as there are in M, raise that to the power sizeof(M)/4: the chance that no DWORD in M points inside N is (1-(sizeof(N)/0x100000000))^(sizeof(M)/4). For values already as small as 1 MB for M and N, it's pretty much guaranteed that you'll have pointers inside N.

Relocating or re-aligning the data won't help - it won't affect the entropy or the value range.
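
The estimate above can be worked through numerically. The following is a rough sketch in D2, assuming uniformly random, independent 32-bit words as the post does; falsePointerChance is a hypothetical helper name. For M = N = 1 MB the exponent works out to about -64, so the chance of no false pointer is on the order of e^-64.

```d
import std.math : exp, log1p;
import std.stdio : writefln;

/// Chance that at least one aligned 32-bit word in a block of `mBytes`
/// random bytes happens to point into a block of `nBytes`.
double falsePointerChance(double mBytes, double nBytes)
{
    immutable double addressSpace = 4294967296.0; // 0x100000000, i.e. 2^32
    immutable double words = mBytes / 4;          // DWORDs in M

    // (1 - sizeof(N)/2^32) ^ (sizeof(M)/4), evaluated through logarithms
    // so that tiny per-word probabilities do not lose precision.
    immutable double pNone = exp(words * log1p(-nBytes / addressSpace));
    return 1.0 - pNone;
}

void main()
{
    // Same size for both blocks, from 1 KB up to 1 MB.
    foreach (kb; [1, 16, 64, 256, 1024])
    {
        immutable double size = kb * 1024.0;
        writefln("%5d KB: %.6f", kb, falsePointerChance(size, size));
    }
}
```

Under these assumptions the chance is already around one in five at 64 KB per block and close to certainty at 256 KB, which is consistent with the claim that 1 MB blocks are practically guaranteed to collide.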
 Or just use ubyte[] instead.
And the casts that come with it :(
 As much as I try to look from an objective perspective, I don't see
 how a memory leak (and memory leaks in D usually mean that NO memory
 is being freed, except for small lucky objects not having bogus
 pointers to them) is a problem less significant than an obscure case
 that involves allocating a void[], storing a pointer in it and losing
 all other references to the object.
Because one is an obvious failure, and the other will be memory corruption. Memory corruption is pernicious and awful.
It is, yes. But if you add "don't put your only references inside void[]s" to the "don'ts" on the GC page, the programmer will only have himself to blame for not reading the language documentation. This goes right along with other tricks IMHO.
 In fact, I just searched the D
 documentation and I couldn't find a statement saying whether void[]
 are scanned by the GC or not. Enter mr. D-newbie, who wants to write
 his own network/compression/file-copying/etc. library/program and
 stumbles upon void[], the seemingly perfect
 abstract-binary-data-container type for the job... (which is exactly
 what happened with yours truly).
  P.S. Not trying to push my point of view, but just trying to offer
 some perspective from someone who has been bit by this design
 choice...
Hmm. Wouldn't compression data be naturally a ubyte[] type?
That's a subjective opinion :) I could just as well continue arguing that void[] is the perfect type for any kind of "opaque" binary data due to its properties. -- Best regards, Vladimir mailto:thecybershadow gmail.com
May 31 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Vladimir Panteleev wrote:
 That's a subjective opinion :) I could just as well continue arguing
 that void[] is the perfect type for any kind of "opaque" binary data
 due to its properties.
To argue that convincingly, you'd need to disable conversions from arrays of class objects to void[]. Andrei
May 31 2009
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 01 Jun 2009 02:21:33 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 To argue that convincingly, you'd need to disable conversions from  
 arrays of class objects to void[].
You're right. Perhaps implicit cast of reference types to void[] should result in an error. -- Best regards, Vladimir mailto:thecybershadow gmail.com
May 31 2009
parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 02:21:33 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 
 To argue that convincingly, you'd need to disable conversions from  
 arrays of class objects to void[].
You're right. Perhaps implicit cast of reference types to void[] should result in an error.
If only there were a way to indicate that void[]s could contain pointers, then they would behave uniformly across types... Oh wait.
May 31 2009
prev sibling next sibling parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 01 Jun 2009 00:28:21 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Because one is an obvious failure, and the other will be memory  
 corruption. Memory corruption is pernicious and awful.
I wanted to add that debugging memory corruptions and other memory problems for D right now is complicated due to lack of proper tools in this area. Hopefully this will change in the near future. -- Best regards, Vladimir mailto:thecybershadow gmail.com
May 31 2009
prev sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 01 Jun 2009 00:28:21 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Hmm. Wouldn't compression data be naturally a ubyte[] type?
(again, something I forgot to add... shouldn't hit Send so soon)

Consider this really basic example of file concatenation:

auto data = read("file1") ~ read("file2"); // oops! void[] concatenation - minefield created

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
May 31 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Vladimir Panteleev:
 Consider this really basic example of file concatenation:
 auto data = read("file1") ~ read("file2"); // oops! void[] concatenation -
minefield created
I think a better design for that read() function is to return ubyte[]. I have never understood why it returns a void[]. To manage generic data in your program, ubyte[] is better than void[] (sometimes uint[] is useful to increase efficiency compared to ubyte[]).

Bye,
bearophile
May 31 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sun, 31 May 2009 22:45:23 +0400, Vladimir Panteleev
<thecybershadow gmail.com> wrote:

 I just went through a ~15000-line project and replaced most occurrences  
 of void[]. Now the project is an ugly mess of void[], ubyte[] and casts,  
 but at least it doesn't leak memory like crazy any more.

 I don't know why it was decided to mark the contents of void[] as "might  
 have pointers". It makes no sense!
FWIW, I also consider void[] as storage for arbitrary untyped binary data, and thus I believe the GC shouldn't scan it. While it is possible to prevent the GC from scanning an arbitrary void[] array, there is no reasonable way to prevent it from scanning all arrays. It is a breaking change, but may be changed for D2. In 99% it is a correct behavior (and a bug in the rest), but reduces application execution speed significantly. ++vote
May 31 2009
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Sun, 31 May 2009 22:45:23 +0400, Vladimir Panteleev
<thecybershadow gmail.com> wrote:
 
 I just went through a ~15000-line project and replaced most occurrences  
 of void[]. Now the project is an ugly mess of void[], ubyte[] and casts,  
 but at least it doesn't leak memory like crazy any more.
  
 I don't know why it was decided to mark the contents of void[] as "might  
 have pointers". It makes no sense!
  
FWIW, I also consider void[] as storage for arbitrary untyped binary data, and thus I believe the GC shouldn't scan it. Ignoring void[] arrays is correct behavior in 99% of cases (and a bug in the rest), but improves application execution speed significantly. While it is possible to prevent the GC from scanning an arbitrary void[] array, there is no reasonable way to prevent it from scanning all arrays (without modifying GC code). It is a breaking change, but not too late for D2. ++vote
May 31 2009
parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
Denis Koroskin wrote:
 On Sun, 31 May 2009 22:45:23 +0400, Vladimir Panteleev
<thecybershadow gmail.com> wrote:
  
 I just went through a ~15000-line project and replaced most occurrences  
 of void[]. Now the project is an ugly mess of void[], ubyte[] and casts,  
 but at least it doesn't leak memory like crazy any more.
  
 I don't know why it was decided to mark the contents of void[] as "might  
 have pointers". It makes no sense!
  
FWIW, I also consider void[] as a storage for an arbitrary untyped binary data, and thus I believe GC shouldn't scan it.
You're contradicting yourself there. void[] is arbitrary untyped data, so it could contain uints, floats, bytes, pointers, arrays, strings, etc. or structs with any of those. I think the current behavior is correct: ubyte[] is the new void*. I also agree that std.file.read (and similar functions) should return ubyte[] instead of void[], to prevent surprises after concatenation. L.
May 31 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Lionello Lunesu wrote:
 Denis Koroskin wrote:
 On Sun, 31 May 2009 22:45:23 +0400, Vladimir Panteleev 
 <thecybershadow gmail.com> wrote:
  
 I just went through a ~15000-line project and replaced most 
 occurrences  of void[]. Now the project is an ugly mess of void[], 
 ubyte[] and casts,  but at least it doesn't leak memory like crazy 
 any more.
  
 I don't know why it was decided to mark the contents of void[] as 
 "might  have pointers". It makes no sense!
  
FWIW, I also consider void[] as a storage for an arbitrary untyped binary data, and thus I believe GC shouldn't scan it.
You're contradicting yourself there. void[] is arbitrary untyped data, so it could contain uints, floats, bytes, pointers, arrays, strings, etc. or structs with any of those. I think the current behavior is correct: ubyte[] is the new void*.
Even in C, people often use unsigned char* for arbitrary data that does not include pointers.
May 31 2009
prev sibling next sibling parent reply grauzone <none example.net> writes:
 3) It's very rare in practice that the only pointer to your object (which you
still plan to access later) to be stored in a void[]-allocated array! Remember,
the properties of memory regions are determined when the memory is allocated,
so casting an array of structures to a void[] will not lose you that reference.
You'd need to move your pointer to a void[]-array (which you need to allocate
explicitly or, for example, concatenating your reference to the void[]), then
drop the reference to your original structure, for this to happen.
void[] = can contain pointers
ubyte[] = can not contain pointers

void[] just wraps void*, which is a low level type and can contain anything. Because of that, the conservative GC needs to scan it for pointers. ubyte[], on the other hand, contains sequences of 8 bit integers.

For untyped binary data, ubyte[] is the most correct type. You want to send it over network or write it into a file? Use ubyte[]. The data will never contain any pointers. You want to play low level tricks, that involve copying around arbitrary memory contents (like boxing, see std.boxer)? Use void[].

I think that's a good way to distinguish it.

You shouldn't cast structs or any other types to ubyte[], because the memory representation of those types is highly platform specific. Structs can contain padding, integers are endian-dependent... If you want to convert these to binary data, write a marshaller. You _never_ want to do direct casts, because they're simply unportable. If you do the cast, you have to know what you're doing.
May 31 2009
next sibling parent BCS <none anon.com> writes:
Hello grauzone,

 You shouldn't cast structs or any other types to ubyte[], because the
 memory representation of those types is highly platform specific.
 Structs can contain padding, integers are endian-dependent... If you
 want to convert these to binary data, write a marshaller. You _never_
 want to do direct casts, because they're simply unportable. If you do
 the cast, you have to know what you're doing.
 
Never say never. Some cases, like tmp files or whatnot where the same exe will save and load the file, never* have any need for portability.
*"never" used intentionally :b.
May 31 2009
prev sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 31 May 2009 23:11:57 +0300, grauzone <none example.net> wrote:

void[] = can contain pointers ubyte[] = can not contain pointers void[] just wraps void*, which is a low level type and can contain anything. Because of that, the conservative GC needs to scan it for pointers. ubyte[], on the other hand, contains sequences of 8 bit integers. For untyped binary data, ubyte[] is the most correct type. You want to send it over network or write it into a file? Use ubyte[]. The data will never contain any pointers. You want to play low level tricks, that involve copying around arbitrary memory contents (like boxing, see std.boxer)? Use void[].
std.boxer is actually a valid counter-example for my post. The specific fix is simple: replace the void[] with void*[]. The generic "fix" is just to add a line to http://www.digitalmars.com/d/garbage.html adding that hiding your only reference in a void[] results in undefined behavior. I don't think this should be an inconvenience to any projects?
 You shouldn't cast structs or any other types to ubyte[], because the  
 memory representation of those types is highly platform specific. Structs  
 can contain padding, integers are endian-dependent... If you want to  
 convert these to binary data, write a marshaller. You _never_ want to do  
 direct casts, because they're simply unportable. If you do the cast, you  
 have to know what you're doing.
Thanks for the advice, but I actually know what I'm doing. Unlike C, D's structure alignment rules are actually part of the specification. If I wanted my programs to be safe/cross-platform/etc. regardless of execution speed, I'd use a scripting or VM-ed language. -- Best regards, Vladimir mailto:thecybershadow gmail.com
May 31 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Vladimir Panteleev wrote:
 std.boxer is actually a valid counter-example for my post.
 The specific fix is simple: replace the void[] with void*[].
 The generic "fix" is just to add a line to
http://www.digitalmars.com/d/garbage.html adding that hiding your only
reference in a void[] results in undefined behavior. I don't think this should
be an inconvenience to any projects?
What do you use for "may contain unaligned pointers"?
May 31 2009
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 01 Jun 2009 05:28:39 +0300, Christopher Wright <dhasenan gmail.com>
wrote:

What do you use for "may contain unaligned pointers"?
Sorry, what do you mean? I don't understand why such a type is needed? Implementing support for scanning memory ranges for unaligned pointers will slow down the GC even more. -- Best regards, Vladimir mailto:thecybershadow gmail.com
May 31 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Vladimir Panteleev wrote:
Sorry, what do you mean? I don't understand why such a type is needed? Implementing support for scanning memory ranges for unaligned pointers will slow down the GC even more.
Because you can have a struct with align(1) that contains pointers. Then these pointers can be unaligned. Then an array of those structs cast to a void*[] would contain pointers, but as an optimization, the GC would consider the pointers in this array aligned because you tell it they are.
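A small sketch of the situation described here, in present-day D. The Packed struct is hypothetical; it only shows that align(1) can place a pointer field at an address that is not a multiple of the pointer size, which is exactly what a scanner assuming aligned pointers would skip over.

```d
// Hypothetical packed struct, for illustration only.
align(1) struct Packed
{
align(1):
    ubyte tag;
    void* ptr;   // starts at byte offset 1
}

void main()
{
    // The pointer field sits directly after the one-byte tag.
    static assert(Packed.ptr.offsetof == 1);

    Packed p;
    auto start = cast(size_t) &p;
    auto field = cast(size_t) &p.ptr;
    assert(field - start == 1);
}
```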
Jun 01 2009
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 01 Jun 2009 14:10:57 +0300, Christopher Wright <dhasenan gmail.com>
wrote:

Because you can have a struct with align(1) that contains pointers. Then these pointers can be unaligned. Then an array of those structs cast to a void*[] would contain pointers, but as an optimization, the GC would consider the pointers in this array aligned because you tell it they are.
The GC will not "see" unaligned pointers, regardless if they're in a struct or void[] array. The GC doesn't know the type of the data it's scanning - it just knows if it might contain pointers or it definitely doesn't contain pointers. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Jun 01 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Vladimir Panteleev wrote:
The GC will not "see" unaligned pointers, regardless if they're in a struct or void[] array. The GC doesn't know the type of the data it's scanning - it just knows if it might contain pointers or it definitely doesn't contain pointers.
Okay, so currently the GC doesn't do anything interesting with its type information. You're suggesting that that be enforced and codified.
Jun 01 2009
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 02 Jun 2009 01:01:00 +0300, Christopher Wright <dhasenan gmail.com>
wrote:

Okay, so currently the GC doesn't do anything interesting with its type information. You're suggesting that that be enforced and codified.
I wasn't suggesting any GC modifications, I was just suggesting that void[]'s TypeInfo "has pointers" flag be set to false. -- Best regards, Vladimir mailto:thecybershadow gmail.com
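For readers who want to look at the flag under discussion: in present-day druntime, bit 0 of TypeInfo.flags says whether the GC should scan values of that type. The sketch below only reads that bit; the struct S and the helper mayHavePointers are names invented for the example.

```d
// Hypothetical struct mixing plain data with a pointer.
struct S
{
    int i;
    int* j;
}

// True if the runtime type information asks the GC to scan this type.
bool mayHavePointers(const TypeInfo ti)
{
    return (ti.flags & 1) != 0;
}

void main()
{
    // Element type of void[]: treated as "may contain pointers".
    assert(mayHavePointers(typeid(void[]).next));

    // Element type of ubyte[]: plain bytes, not scanned.
    assert(!mayHavePointers(typeid(ubyte[]).next));

    // A struct with a pointer member is scanned as a whole.
    assert(mayHavePointers(typeid(S)));
}
```

The proposal in this post amounts to making the first assertion fail, i.e. giving void the same answer as ubyte.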
Jun 02 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Vladimir Panteleev wrote:
 I wasn't suggesting any GC modifications, I was just suggesting that void[]'s
TypeInfo "has pointers" flag be set to false.
The suggestion was that void[] be used as ubyte[] currently is, and then to use void*[] to indicate an array of unknown type that may have pointers. This works when all pointers are aligned, or when the garbage collector does not optimize in cases where a type is known not to contain unaligned pointers. Alternatively, you can change the runtime to notify the GC on array copies so it can keep track of type information when you're avoiding the type system. But it's so easy to get around this by accident, it's not a reasonable solution (even if it could be made fast).
Jun 02 2009
parent reply Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Tue, Jun 2, 2009 at 7:11 PM, Christopher Wright <dhasenan gmail.com> wrote:
 Vladimir Panteleev wrote:
 I wasn't suggesting any GC modifications, I was just suggesting that
 void[]'s TypeInfo "has pointers" flag be set to false.
The suggestion was that void[] be used as ubyte[] currently is, and then to use void*[] to indicate an array of unknown type that may have pointers.
How do you have a void*[] point to a block of memory that is not a multiple of (void*).sizeof?
Jun 02 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Jarrett Billingsley wrote:
How do you have a void*[] point to a block of memory that is not a multiple of (void*).sizeof?
Another good point. Or how do you index it by byte?
Jun 03 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Christopher Wright:
 Another good point. Or how do you index it by byte?
How can you read & write files of 3 bytes if voids are 4 bytes long chunks? :o) I don't understand. I want to read and write files byte-by-byte. Bye, bearophile
Jun 03 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
bearophile wrote:
 Christopher Wright:
 Another good point. Or how do you index it by byte?
How can you read & write files of 3 bytes if voids are 4 bytes long chunks? :o) I don't understand. I want to read and write files byte-by-byte. Bye, bearophile
Vladimir was suggesting that void[] be the same as ubyte[] and that you use void*[] if you might include a pointer. So that use case would be safe.
Jun 03 2009
next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Christopher Wright wrote:
 bearophile wrote:
 Christopher Wright:
 Another good point. Or how do you index it by byte?
How can you read & write files of 3 bytes if voids are 4 bytes long chunks? :o) I don't understand. I want to read and write files byte-by-byte. Bye, bearophile
Vladimir was suggesting that void[] be the same as ubyte[] and that you use void*[] if you might include a pointer. So that use case would be safe.
How would you generically store the bits of this, then?

struct Gotcha
{
	void* ptr;
	ubyte boo;
}
Jun 04 2009
prev sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thu, 04 Jun 2009 05:10:17 +0300, Christopher Wright  
<dhasenan gmail.com> wrote:

 bearophile wrote:
 Christopher Wright:
 Another good point. Or how do you index it by byte?
How can you read & write files of 3 bytes if voids are 4 bytes long chunks? :o) I don't understand. I want to read and write files byte-by-byte. Bye, bearophile
Vladimir was suggesting that void[] be the same as ubyte[] and that you use void*[] if you might include a pointer. So that use case would be safe.
Actually, I think Andrei's idea is better (to allow implicit casting arrays of non-reference types to const(ubyte)[]). It introduces an abstract no-pointers type, but still allows implicit casting to "might have pointers". -- Best regards, Vladimir mailto:thecybershadow gmail.com
Jun 04 2009
parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 04 Jun 2009 22:16:42 +0400, Vladimir Panteleev  
<thecybershadow gmail.com> wrote:

Actually, I think Andrei's idea is better (to allow implicit casting arrays of non-reference types to const(ubyte)[]). It introduces an abstract no-pointers type, but still allows implicit casting to "might have pointers".
There is a pitfall: should an "array of non-reference types" be implicitly castable to const(byte)[] or const(ubyte)[]? Should const(byte)[] also be implicitly castable to const(ubyte)[] (or vice versa)?
Jun 04 2009
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thu, 04 Jun 2009 21:31:07 +0300, Denis Koroskin <2korden gmail.com>  
wrote:

There is a pitfall: should an "array of non-reference types" be implicitly castable to const(byte)[] or const(ubyte)[]? Should const(byte)[] also be implicitly castable to const(ubyte)[] (or vice versa)?
I don't see why you'd want to work with arrays of signed bytes. It doesn't make sense to allow implicit casting between the two; the programmer should just pick one and stick with it. I think unsigned makes more sense. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Jun 05 2009
parent reply BCS <none anon.com> writes:
Hello Vladimir,

 I don't see why you'd want to work with arrays of signed bytes.
I can think of a number of cases where I would expect numbers to be in a range like [-20,+20], for instance, delta of small integral value or golf scores relative to par.
Jun 05 2009
next sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Fri, 05 Jun 2009 10:15:11 +0300, BCS <none anon.com> wrote:

 Hello Vladimir,

 I don't see why you'd want to work with arrays of signed bytes.
I can think of a number of cases where I would expect numbers to be in a range like [-20,+20], for instance, delta of small integral value or golf scores relative to par.
Yes, but how is this related to abstracting data types to a generic type that can be used for stuff like buffering or networking? -- Best regards, Vladimir mailto:thecybershadow gmail.com
Jun 05 2009
parent reply BCS <none anon.com> writes:
Hello Vladimir,

 On Fri, 05 Jun 2009 10:15:11 +0300, BCS <none anon.com> wrote:
 
 Hello Vladimir,
 
 I don't see why you'd want to work with arrays of signed bytes.
 
I can think of a number of cases where I would expect numbers to be in a range like [-20,+20], for instance, delta of small integral value or golf scores relative to par.
Yes, but how is this related to abstracting data types to a generic type that can be used for stuff like buffering or networking?
It's not and that's the point. The point is there are uses for 8-bit signed integer values other than as raw data. I might have read your comment out of context but it seemed you were saying there is no use for the signed byte type.
Jun 05 2009
parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Fri, 05 Jun 2009 20:16:08 +0300, BCS <none anon.com> wrote:

It's not and that's the point. The point is there are uses for 8-bit signed integer values other than as raw data. I might have read your comment out of context but it seemed you were saying there is no use for the signed byte type.
Oh yes; I was definitely not suggesting removing byte[] from the language. <insidejoke namespace="#d">I'm sure he wouldn't be pleased one bit if we did that! :P</insidejoke> -- Best regards, Vladimir mailto:thecybershadow gmail.com
Jun 05 2009
prev sibling parent Derek Parnell <derek psych.ward> writes:
On Fri, 5 Jun 2009 07:15:11 +0000 (UTC), BCS wrote:

 Hello Vladimir,
 
 I don't see why you'd want to work with arrays of signed bytes.
I can think of a number of cases where I would expect numbers to be in a range like [-20,+20], for instance, delta of small integral value or golf scores relative to par.
Or sound wave sample points [-127, 127] -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Jun 05 2009
prev sibling next sibling parent reply BCS <none anon.com> writes:
Hello Vladimir,

 I just went through a ~15000-line project and replaced most
 occurrences of void[]. Now the project is an ugly mess of void[],
 ubyte[] and casts, but at least it doesn't leak memory like crazy any
 more.
 
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:
 
 2) Despite that void[] is "typeless", you can still operate on it -
 namely, slice and concatenate them. Pass a void[] to a network send()
 function - how much did you send? Half the buffer? No problem, slice
 it away and store the rest - and no casts.
 
 3) It's very rare in practice that the only pointer to your object
 (which you still plan to access later) to be stored in a
 void[]-allocated array! Remember, the properties of memory regions are
 determined when the memory is allocated, so casting an array of
 structures to a void[] will not lose you that reference. You'd need to
 move your pointer to a void[]-array (which you need to allocate
 explicitly or, for example, concatenating your reference to the
 void[]), then drop the reference to your original structure, for this
 to happen.
 
I think the idea is that void[] is the most general data type; it can be anything, including pointers. Also, for a real-world use case where void[]=mightHavePointers is valid, consider a system that reads blocks of data structures from a file and then does in-place substitution of file references with memory references. You can't allocate buffers of the correct type because you may not even know what that is until you have already loaded the data.
 Here's a simple naive implementation of a buffer:
 
 void[] buffer;
 void queue(void[] data)
 {
 buffer ~= data;
 }
 ...
 queue([1,2,3][]);
 queue("Hello, World!");
 No casts! So simple and beautiful. However, should you use this
 pattern to work with larger amounts of data with a high entropy, the
 "minefield" effect will cause the GC to stop collecting most data.
 Sure, you can call std.gc.hasNoPointers, but you need to do it after
 every single concatenation... and it makes expressions with more than
 one concatenation unsafe.
Yes, when data is being copied into void[] from another type[] it is reasonable to ignore pointers but as above, going the other way (IMHO the /common/ case) it's not so easy.
 
 I heard that Tango copies over the properties of arrays when they are
 reallocated, which helps but solves the problem only partially.
 
 So, I ask you: is there actually code out there that depends on the
 way void[] works right now? I brought up this argument a year or so
 ago on IRC, and there were people who defended ferociously the current
 design using idealisms ("it should work like what it sounds like, it
 should contain any type" or something like that), but I've yet to see
 a practical argument.
I think that void[] should be left as is but I'm almost ready to throw in with the idea that we **need** another type that has the no-cast parts of void[] but assume no pointers as well.
May 31 2009
parent "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 01 Jun 2009 00:53:02 +0400, BCS <none anon.com> wrote:

I think the idea is that void[] is the most general data type; it can be anything, including pointers. Also, for a real-world use case where void[]=mightHavePointers is valid, consider a system that reads blocks of data structures from a file and then does in-place substitution of file references with memory references. You can't allocate buffers of the correct type because you may not even know what that is until you have already loaded the data.
In this case you should *explicitly* mark that void[] array as "mightHavePointers".
May 31 2009
prev sibling parent reply MLT <none anon.com> writes:
Walter Bright Wrote:

 Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:
[...]
 3) It's very rare in practice that the only pointer to your
 object (which you still plan to access later) to be stored in a
 void[]-allocated array!
Rare or common, it still would be a nasty bug lurking to catch someone. The default behavior in D should be to be correct code. Doing potentially unsafe things to improve performance should require extra effort - in this case it would be either using the gc function to mark the memory as not containing pointers, or storing them as ubyte[] instead.
As quite a newbie, I can sum up what I understood as follows:

1. The idea of void[] is that you can put anything in it without casting.
2. Because of this, you might put pointers in a void[].
3. Since you have "legitimately" stored pointers, and we don't want to have the GC throw away something that we still have valid pointers for, we have to have the GC scan over void[] arrays for possible hits.
4. This pretty much means that any "big"(*) D program cannot afford to put uniformly distributed data in a void[] array, because the GC will stop working correctly - it will not dispose of stuff that you don't need any more.

(*) where "big" means a program that creates and destroys a lot of objects.

So, currently if you want to use void[] to store non-pointers, you need to use the gc function to mark the memory as not containing pointers.

A comment and a question. I agree that suddenly losing data because you stored a pointer in a void[] is worse than GC not working well. However, since GC in D is so automatic, almost any use of void[] to store non-pointer data will cause massive memory leaks and eventual program failure.

I can see 4 solutions...

First, to not allow non-pointers to be stored in void[]. So non-pointers are stored in ubyte[], pointers in void[]. Kinda loses the main point of using void[].

Second, void[] is not scanned by GC, but you can mark it to be. This can cause bugs if you store a pointer in void[], and later retrieve it, but don't mark correctly.

Third, void[] is scanned by GC, but you can mark it not to be. This can cause memory leaks if you store complex data in void[] in a big program, and don't handle GC marking correctly.

Fourth - somewhat more complex. Since the compiler knows exactly when a pointer is stored in a void[] and when not, it would be possible to have the compiler handle it all by itself, as long as the property of having to be scanned by GC is dirty - once a variable has it, any other that touches that variable gets the property.

Of these four solutions, the last 3 can still cause bugs if one stores both pointers and data in the same void[] array, no matter how the memory is marked, unless one does that marking on a very fine scale (is that possible?)

My conclusion from all this is either "don't use void[]", or "only use void[] to store pointers" if you don't want bugs in a valid program.
Jun 03 2009
parent Christopher Wright <dhasenan gmail.com> writes:
MLT wrote:
As quite a newbie, I can sum up what I understood as follows: 1. The idea of void[] is that you can put anything in it without casting. 2. Because of this, you might put pointers in a void[]. 3. Since you have "legitimately" stored pointers, and we don't want to have the GC throw away something that we still have valid pointers for, we have to have the GC scan over void[] arrays for possible hits. 4. This pretty much means that any "big"(*) D program cannot afford to put uniformly distributed data in a void[] array, because the GC will stop working correctly - it will not dispose of stuff that you don't need any more. (*) where "big" means a program that creates and destroys a lot of objects. So, currently if you want to use void[] to store non-pointers, you need to use the gc function to mark the memory as not containing pointers. A comment and a question. I agree that suddenly losing data because you stored a pointer in a void[] is worse than GC not working well. However, since GC in D is so automatic, almost any use of void[] to store non-pointer data will cause massive memory leaks and eventual program failure.
First, this is no problem if you are merely aliasing an existing array. In order for it to be an issue, you must copy from some array to a void[] -- for instance, appending to an existing void[], or .dup'ing a void[] alias. (While a GC could work around the latter case, it would be unsafe -- you can append something with pointers to a void[] copy of an int[].)
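To illustrate the aliasing case with present-day druntime (a sketch, not a statement about the 2009 runtime): converting an int[] to void[] allocates nothing, and the memory block keeps the attributes it got when it was allocated as int[]. The variable names are arbitrary; GC.addrOf and GC.getAttr are from core.memory.

```d
import core.memory : GC;

void main()
{
    int[] numbers = new int[](3);
    numbers[] = [1, 2, 3];

    // Implicit conversion: same memory, length now counted in bytes.
    void[] view = numbers;
    assert(view.ptr is cast(void*) numbers.ptr);
    assert(view.length == 3 * int.sizeof);

    // Slicing needs no cast and does not allocate either.
    void[] rest = view[int.sizeof .. $];
    assert(rest.length == 2 * int.sizeof);

    // The block was allocated for int[], so it is still marked NO_SCAN,
    // whatever the static type of the slice that refers to it.
    auto base = GC.addrOf(view.ptr);
    assert(GC.getAttr(base) & GC.BlkAttr.NO_SCAN);
}
```

Only when data is appended or copied into a block that was allocated as void[] does the "may contain pointers" marking come into play.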
 I can see 4 solutions...
 
 First, to not allow non-pointers to be stored in void[]. So non-pointers are stored in ubyte[], pointers in void[]. Kinda loses the main point of using void[].
 
 Second, void[] is not scanned by GC, but you can mark it to be. This can cause bugs if you store a pointer in void[], and later retrieve it, but don't mark correctly.
This is an unsafe option.
 Third, void[] is scanned by GC,  but you can mark it not to be. This can cause
memory leaks if you store complex data in void[] in a big program, and don't
handle GC marking correctly.
This is already available. If you know your array doesn't have pointers, you can call GC.hasNoPointers(array.ptr). This is a safe option.
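GC.hasNoPointers is the spelling used at the time of this thread. In current druntime the same effect is, as far as I know, obtained through the NO_SCAN block attribute in core.memory. A minimal sketch, with an arbitrary buffer size:

```d
import core.memory : GC;

void main()
{
    enum size = 4096;

    // GC.malloc without attributes returns a block the collector will scan.
    void* p = GC.malloc(size);
    void[] buffer = p[0 .. size];
    assert(buffer.length == size);
    assert((GC.getAttr(p) & GC.BlkAttr.NO_SCAN) == 0);

    // Promise the collector that this block holds no pointers.
    GC.setAttr(p, GC.BlkAttr.NO_SCAN);
    assert(GC.getAttr(p) & GC.BlkAttr.NO_SCAN);

    // Alternatively, request the attribute at allocation time.
    void* q = GC.malloc(size, GC.BlkAttr.NO_SCAN);
    assert(GC.getAttr(q) & GC.BlkAttr.NO_SCAN);
}
```

Note that the attribute belongs to the block, so if an append reallocates the array, whether the new block inherits it depends on the runtime; that is the "after every single concatenation" problem raised at the top of the thread.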
 Fourth - somewhat more complex. Since the compiler knows exactly when a pointer is stored in a void[] and when not, it would be possible to have the compiler handle it all by itself, as long as the property of having to be scanned by GC is dirty - once a variable has it, any other that touches that variable gets the property.
This isn't really the case unless you get some really invasive whole program analysis (not available with D's compilation model, or if you want to interact with code written in other languages, or if you want to do runtime dynamic linking) or a really invasive runtime (think of calling a method every time you access an array). In point of fact, that's not going to be enough. You need to call the runtime with every assignment, since you might be passing individual ubytes around when they're part of a pointer and reassembling them somewhere else.
 Of these four solutions, the last 3 can still cause bugs if one stores both
pointers and data in the same void[] array, no matter how the memory is marked,
unless one does that marking on a very fine scale (is that possible?)
struct S { int i; int* j; } You're screwed.
 My conclusion from all this is either "don't use void[]", or "only use void[]
to store pointers" if you don't want bugs in a valid program.
Not bugs, but potential performance issues. And the advice should be "don't allocate void[]", to split hairs.
Jun 03 2009