digitalmars.D - built-in array ptrEnd

"monarch_dodra" <monarchdodra gmail.com> writes:

I love D's concept of arrays (fat pointers).

However, one thing I've found it lacks is a (convenient) way to 
get the end ptr.

Phobos (and druntime) are riddled with "arr.ptr + arr.length". It 
is ugly and inconvenient, and makes something that should be easy 
to understand that much harder.

Then I thought: "std.array" defines all the functions required to 
enhance arrays. Why not just add a "ptrEnd" in there? So I did.

The rationale is that now, we can write:

bool isDisjoint = a.ptrEnd <= b.ptr || b.ptrEnd <= a.ptr;

More elegant than:

bool isDisjoint = a.ptr + a.length <= b.ptr ||
                   b.ptr + b.length <= a.ptr;

Nothing revolutionary, but it *is* easier on the fingers when 
typing :D . Also, it *does* make a difference in some bigger and 
more complicated cases.

Anyways, pull request:
https://github.com/D-Programming-Language/phobos/pull/798

I wanted to have some feedback, as this is introducing something 
new (as opposed to fixing something existing).

IMO, this should really be built-in, in particular, since, in my 
understanding, an array is internally represented by the ptr and 
ptrEnd pair anyways. If the compiler has access to it, it might 
as well communicate it (rather than us re-calculating it...)
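
For readers who want to see the shape of the proposal, here is a minimal sketch of such a helper. It is not the code from the pull request: the body of `ptrEnd` and the `isDisjoint` wrapper are assumptions written for illustration only.

```d
// Hypothetical sketch, not the code from the pull request.
// Returns a pointer one past the last element of `arr`.
auto ptrEnd(T)(T[] arr)
{
    return arr.ptr + arr.length;
}

// True when the two slices share no elements.
bool isDisjoint(T)(const(T)[] a, const(T)[] b)
{
    return a.ptrEnd <= b.ptr || b.ptrEnd <= a.ptr;
}

unittest
{
    int[] data = [1, 2, 3, 4, 5, 6];
    assert(isDisjoint(data[0 .. 3], data[3 .. 6]));  // adjacent, no overlap
    assert(!isDisjoint(data[0 .. 4], data[2 .. 6])); // share two elements
}
```

Because `ptrEnd` is an ordinary free function, the `a.ptrEnd` spelling relies on UFCS rather than on any compiler support.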

Sep 17 2012

"bearophile" <bearophileHUGS lycos.com> writes:

monarch_dodra:

 IMO, this should really be built-in, in particular, since, in 
 my understanding, an array is internally represented by the ptr 
 and ptrEnd pair anyways.

Currently this is not true, take a look at the ABI part in the D 
site. Currently it's a pointer and length. Walter and/or Andrei 
discussed the idea of turning them into two pointers, but I don't 
know if and why that change was refused.

Bye,
bearophile

Sep 17 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 17 September 2012 at 16:39:21 UTC, bearophile wrote:
 monarch_dodra:

 IMO, this should really be built-in, in particular, since, in 
 my understanding, an array is internally represented by the 
 ptr and ptrEnd pair anyways.

 Currently this is not true, take a look at the ABI part in the 
 D site. Currently it's a pointer and length. Walter and/or 
 Andrei discussed the idea of turning them into two pointers, 
 but I don't know if and why that change was refused.

 Bye,
 bearophile

Thank you for sharing. My guess would be this makes more sense, 
since "arr.length" is the most frequently used property. I 
*supposed* it was the other way, because C++ tends to work with 
pairs of pointers. That said, C is more of a pointer plus length 
approach.

Not that it should matter for us users anyways, such low level 
implementation details should not leak into code. The pull is 
purely for convenience. My "motivation" for making it built-in is 
just that it makes sense to have it as such. You shouldn't have 
to import a module just to (conveniently) get the end pointer.

Sep 17 2012

Don Clugston <dac nospam.com> writes:

On 17/09/12 18:40, bearophile wrote:
 monarch_dodra:

 IMO, this should really be built-in, in particular, since, in my
 understanding, an array is internally represented by the ptr and
 ptrEnd pair anyways.

 Currently this is not true, take a look at the ABI part in the D site.
 Currently it's a pointer and length. Walter and/or Andrei discussed the
 idea of turning them into two pointers, but I don't know if and why that
 change was refused.

Because it would be a mistake. You can efficiently get from (ptr, 
length) to (ptr, endPtr) but the reverse is not true.
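
One way to read the asymmetry, sketched under the assumption that the cost in question is the arithmetic needed to convert between the two layouts. `PtrPair` and `toPtrPair` are hypothetical names, not anything in druntime or Phobos.

```d
// Hypothetical two-pointer slice, used only to compare the two layouts.
struct PtrPair(T)
{
    T* ptr;
    T* end;

    // Recovering the length needs a subtraction plus a division by
    // T.sizeof (pointer subtraction yields a count of elements).
    size_t length() const
    {
        return cast(size_t)(end - ptr);
    }
}

// Going from (ptr, length) to (ptr, end) is a single scaled addition.
PtrPair!T toPtrPair(T)(T[] arr)
{
    return PtrPair!T(arr.ptr, arr.ptr + arr.length);
}

unittest
{
    long[] values = [10, 20, 30];
    auto pair = toPtrPair(values);
    assert(pair.length == values.length);
}
```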

Sep 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 18 Sep 2012 08:02:29 -0400, Don Clugston <dac nospam.com> wrote:

 On 17/09/12 18:40, bearophile wrote:
 monarch_dodra:

 IMO, this should really be built-in, in particular, since, in my
 understanding, an array is internally represented by the ptr and
 ptrEnd pair anyways.

 Currently this is not true, take a look at the ABI part in the D site.
 Currently it's a pointer and length. Walter and/or Andrei discussed the
 idea of turning them into two pointers, but I don't know if and why that
 change was refused.

 Because it would be a mistake. You can efficiently get from (ptr,  
 length) to (ptr, endPtr) but the reverse is not true.

There is another reason to avoid this.

Note that if I have two consecutive blocks of memory:

0...4
and
4...8

If we define an array that points to the first block as a pointer to 0 and  
a pointer to 4, then that array also effectively points at the second  
block (4...8).  The way the GC works, it will not release the second block  
as long as you have a pointer to the first, even though the second pointer  
is not technically pointing at the block.

-Steve
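
The address arithmetic behind this can be sketched as follows. A single static buffer stands in for the two consecutive blocks, which is an assumption made so the result is deterministic; real GC blocks are not guaranteed to be adjacent. `endTouchesNext` is a hypothetical helper.

```d
// Hypothetical helper: true when the one-past-the-end pointer of `a`
// is numerically the same address as the start of `b`.
bool endTouchesNext(T)(T[] a, T[] b)
{
    return a.ptr + a.length == b.ptr;
}

unittest
{
    ubyte[8] memory;                  // stands in for blocks 0...4 and 4...8
    ubyte[] first = memory[0 .. 4];
    ubyte[] second = memory[4 .. 8];

    // The end pointer of the first block is the address of the second block.
    assert(endTouchesNext(first, second));
}
```

A scanner that only sees the raw address cannot tell whether it was meant as "end of the first block" or "start of the second".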

Sep 18 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 18 September 2012 at 12:06:15 UTC, Steven 
Schveighoffer wrote:
 There is another reason to avoid this.

 Note that if I have two consecutive blocks of memory:

 0...4
 and
 4...8

 If we define an array that points to the first block as a 
 pointer to 0 and a pointer to 4, then that array also 
 effectively points at the second block (4...8).  The way the GC 
 works, it will not release the second block as long as you have 
 a pointer to the first, even though the second pointer is not 
 technically pointing at the block.

 -Steve

That's a good point. It also shows another danger of ptrEnd: Not 
only is it not a reference to the current range, it could *also* 
be a reference to an un-related range.

Sep 18 2012

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On Tue, 18 Sep 2012 15:43:37 +0200
"monarch_dodra" <monarchdodra gmail.com> wrote:

 On Tuesday, 18 September 2012 at 12:06:15 UTC, Steven 
 Schveighoffer wrote:
 There is another reason to avoid this.

 Note that if I have two consecutive blocks of memory:

 0...4
 and
 4...8

 If we define an array that points to the first block as a 
 pointer to 0 and a pointer to 4, then that array also 
 effectively points at the second block (4...8).  The way the GC 
 works, it will not release the second block as long as you have 
 a pointer to the first, even though the second pointer is not 
 technically pointing at the block.

 -Steve

 
 That's a good point. It also shows another danger of ptrEnd: Not 
 only is it not a reference to the current range, it could *also* 
 be a reference to an un-related range.

FWIW, a ptrLast would avoid that (ie, arr.ptrLast == &arr[$-1])

Sep 18 2012

Ben Davis <entheh cantab.net> writes:

On 18/09/2012 21:10, Nick Sabalausky wrote:
 On Tue, 18 Sep 2012 15:43:37 +0200
 "monarch_dodra" <monarchdodra gmail.com> wrote:

 On Tuesday, 18 September 2012 at 12:06:15 UTC, Steven
 Schveighoffer wrote:
 There is another reason to avoid this.

 Note that if I have two consecutive blocks of memory:

 0...4
 and
 4...8

 If we define an array that points to the first block as a
 pointer to 0 and a pointer to 4, then that array also
 effectively points at the second block (4...8).  The way the GC
 works, it will not release the second block as long as you have
 a pointer to the first, even though the second pointer is not
 technically pointing at the block.

 -Steve

 That's a good point. It also shows another danger of ptrEnd: Not
 only is it not a reference to the current range, it could *also*
 be a reference to an un-related range.


Does the above not also mean that the second array's 'ptr' prevents the 
first array from being garbage-collected?

In any case, maybe the heap leaves gaps (perhaps if it has to insert 
metadata), so this is a non-issue anyway?

 FWIW, a ptrLast would avoid that (ie, arr.ptrLast == &arr[$-1])

That would make the problem worse for zero-length arrays though. Don't 
forget the corner-cases :)
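
A small sketch of the corner case: a loop over the half-open range [ptr, end) needs no special handling for an empty array, whereas a last-element pointer has no valid element to point at when the length is zero. `sumByPointer` is a hypothetical helper written for this illustration.

```d
// Hypothetical helper: sums elements by walking the half-open
// range [ptr, end) with raw pointers.
long sumByPointer(const(int)[] arr)
{
    long total = 0;
    const(int)* end = arr.ptr + arr.length; // one past the last element
    for (const(int)* p = arr.ptr; p != end; ++p)
        total += *p;
    return total;
}

unittest
{
    assert(sumByPointer([1, 2, 3]) == 6);

    int[] empty;
    assert(sumByPointer(empty) == 0); // ptr == end, the loop body never runs
}
```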

Sep 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 18 Sep 2012 17:04:53 -0400, Ben Davis <entheh cantab.net> wrote:

 On 18/09/2012 21:10, Nick Sabalausky wrote:
 On Tue, 18 Sep 2012 15:43:37 +0200
 "monarch_dodra" <monarchdodra gmail.com> wrote:

 That's a good point. It also shows another danger of ptrEnd: Not
 only is it not a reference to the current range, it could *also*
 be a reference to an un-related range.


 Does the above not also mean that the second array's 'ptr' prevents the  
 first array from being garbage-collected?

No, a pointed-at location in memory does not refer to the prior bytes,  
only the subsequent bytes.

 In any case, maybe the heap leaves gaps (perhaps if it has to insert  
 metadata), so this is a non-issue anyway?

Yes and no.  In the case of a block allocated as an array, metadata is  
stored, and the runtime takes care to put at least one byte between the  
allocated block and the next.  The main reason being, you can do  
arr[$..$], and even with a single-pointer array type, it points at the  
next block.

However, it's definitely possible to allocate (and have slices point at)  
blocks that do not have padding.
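
To make the arr[$..$] case concrete, here is a short sketch showing that the empty tail slice already carries the one-past-the-end address in its `ptr`. `emptyTail` is a hypothetical helper name.

```d
// Hypothetical helper: returns the empty slice at the end of `arr`.
T[] emptyTail(T)(T[] arr)
{
    return arr[$ .. $];
}

unittest
{
    int[] arr = [1, 2, 3];
    auto tail = emptyTail(arr);

    assert(tail.length == 0);
    // Even with the pointer-plus-length layout, this slice's pointer
    // is the address just past the last element.
    assert(tail.ptr == arr.ptr + arr.length);
}
```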

 FWIW, a ptrLast would avoid that (ie, arr.ptrLast == &arr[$-1])

 That would make the problem worse for zero-length arrays though. Don't  
 forget the corner-cases :)

Agreed, referencing one *past* the last element is a much more useful  
idiom, as I've experienced with C++ iterators and D ranges.

-Steve

Sep 18 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/17/12 12:34 PM, monarch_dodra wrote:
 I love D's concept of arrays (fat pointers).

 However, one thing I've found it lacks is a (convenient) way to get the
 end ptr.

 Phobos (and druntime) are riddled with "arr.ptr + arr.length". It is
 ugly and inconvenient, and makes something that should be easy to
 understand that much harder.

 Then I thought: "std.array" defines all the functions required to
 enhance arrays. Why not just add a "ptrEnd" in there? So I did.

 The rationale is that now, we can write:

 bool isDisjoint = a.ptrEnd <= b.ptr || b.ptrEnd <= a.ptr;

 More elegant than:

 bool isDisjoint = a.ptr + a.length <= b.ptr ||
 b.ptr + b.length <= a.ptr;

 Nothing revolutionary, but it *is* easier on the fingers when typing :D
 . Also, it *does* make a change in some bigger and more complicated cases.

 Anyways, pull request:
 https://github.com/D-Programming-Language/phobos/pull/798

 I wanted to have some feedback, as this is introducing something new (as
 opposed to fixing something existing).

 IMO, this should really be built-in, in particular, since, in my
 understanding, an array is internally represented by the ptr and ptrEnd
 pair anyways. If the compiler has access to it, it might as well
 communicate it (rather than us re-calculating it...)

To be blunt, I think this is a terrible idea for a convenience function. 
Note that I'm only allowing myself to say this because monarch_dodra has 
clearly made other excellent contributions so I assume his ideas can 
take a bit of a destruction.

Normal code isn't supposed to mess with pointers and stuff, particularly 
with pointers past the end of arrays. That's rare. If Phobos uses .ptr 
with any frequency it's because it's low-level code that should optimize 
for performance compulsively.


Andrei

Sep 17 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Monday, September 17, 2012 13:11:30 Andrei Alexandrescu wrote:
 To be blunt, I think this is a terrible idea for a convenience function.
 Note that I'm only allowing myself to say this because monarch_dodra has
 clearly made other excellent contributions so I assume his ideas can
 take a bit of a destruction.
 
 Normal code isn't supposed to mess with pointers and stuff, particularly
 with pointers past the end of arrays. That's rare. If Phobos uses .ptr
 with any frequency it's because it's low-level code that should optimize
 for performance compulsively.

I concur. Pointer arithmetic should be rare (particularly outside of Phobos), 
and ptrEnd does almost nothing for you. It just slightly shortens code for a 
rare use case. It's not worth it.

- Jonathan M Davis

Sep 17 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 17 September 2012 at 17:10:43 UTC, Andrei Alexandrescu 
wrote:
 On 9/17/12 12:34 PM, monarch_dodra wrote:

 To be blunt, I think this is a terrible idea for a convenience 
 function. Note that I'm only allowing myself to say this 
 because monarch_dodra has clearly made other excellent 
 contributions so I assume his ideas can take a bit of a 
 destruction.

 Normal code isn't supposed to mess with pointers and stuff, 
 particularly with pointers past the end of arrays. That's rare. 
 If Phobos uses .ptr with any frequency it's because it's 
 low-level code that should optimize for performance 
 compulsively.


 Andrei

I think I spent way too much time in the past weeks doing array 
arithmetic actually. Hence the proposal. It is actually true 
you'd never need this outside of low-level code.

No problem if you think it was a bad idea, that was the point of 
the thread, to get some insight from others. Thank you for the 
compliment about my contributions. It means a lot.

Sep 17 2012

Jacob Carlborg <doob me.com> writes:

On 2012-09-17 18:34, monarch_dodra wrote:
 I love D's concept of arrays (fat pointers).

 However, one thing I've found it lacks is a (convenient) way to get the
 end ptr.

 Phobos (and druntime) are riddled with "arr.ptr + arr.length". It is
 ugly and inconvenient, and makes something that should be easy to
 understand that much harder.

 Then I thought: "std.array" defines all the functions required to
 enhance arrays. Why not just add a "ptrEnd" in there? So I did.

 The rationale is that now, we can write:

 bool isDisjoint = a.ptrEnd <= b.ptr || b.ptrEnd <= a.ptr;

 More elegant than:

 bool isDisjoint = a.ptr + a.length <= b.ptr ||
                    b.ptr + b.length <= a.ptr;

 Nothing revolutionary, but it *is* easier on the fingers when typing :D
 . Also, it *does* make a change in some bigger and more complicated cases.

 Anyways, pull request:
 https://github.com/D-Programming-Language/phobos/pull/798

 I wanted to have some feedback, as this is introducing something new (as
 opposed to fixing something existing).

 IMO, this should really be built-in, in particular, since, in my
 understanding, an array is internally represented by the ptr and ptrEnd
 pair anyways. If the compiler has access to it, it might as well
 communicate it (rather than us re-calculating it...)

Rather than adding new language features we're moving stuff out of the 
core language and into the runtime/standard library. This is a perfect 
example of a library function. Since we have UFCS it would behave and 
look exactly the same as if it was a built-in property on arrays. If 
this is added to the "object" module in druntime you wouldn't even need 
to import anything.

-- 
/Jacob Carlborg
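
A minimal sketch of the UFCS point, assuming a plain free function named `ptrEnd`. It is written here for illustration and is not an existing druntime or Phobos symbol.

```d
// Hypothetical free function; thanks to UFCS it can be called as
// `arr.ptrEnd`, just like the built-in `arr.ptr` and `arr.length`.
auto ptrEnd(T)(T[] arr)
{
    return arr.ptr + arr.length;
}

unittest
{
    int[] numbers = [1, 2, 3];
    string text = "abc";

    // Both spellings call the same function.
    assert(numbers.ptrEnd == ptrEnd(numbers));
    assert(text.ptrEnd - text.ptr == 3);
}
```

The only visible difference from a built-in property is that the defining module has to be in scope, which is why placing it in `object` would remove even the import.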

Sep 17 2012
