digitalmars.D.learn - Array start index

DLearner (9/9) Aug 01 2015 Does the D language set in stone that the first element of an

Rikki Cattermole (22/31) Aug 01 2015 In c style languages (like D) the index actually defines the offset in
bachmeier (8/17) Aug 01 2015 Seems you could easily wrap an array in a struct, define opIndex
John Colvin (7/16) Aug 01 2015 See

DLearner (24/43) Aug 01 2015 D is a C derivative, so it seems a shame not to identify causes

Andrej Mitrovic via Digitalmars-d-learn (17/20) Aug 01 2015 This has already been done! D defines an array to be a struct with a

bachmeier (8/30) Aug 01 2015 But what type of programming are you doing? Even after decades of

QAston (5/12) Aug 02 2015 Adding 1-indexed arrays to the language fixes nothing. Just write

bachmeier (6/10) Aug 03 2015 Oh, I don't think that's a good idea. It's too confusing to have

DLearner (8/18) Aug 03 2015 Looks like 0-base is fixed, to avoid problems with existing code.

Jonathan M Davis via Digitalmars-d-learn (20/40) Aug 03 2015 Almost all programming languages in heavy use at this point in time star...
QAston (13/21) Aug 04 2015 There're quite a few things stopping this from being added to the

jmh530 (7/16) Aug 03 2015 I come from matlab and R which start matrices at 1. I used to
Observer (8/16) Aug 03 2015 This experiment has already been run. Perl used to support a $[
"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (31/36) Aug 04 2015 I, too, don't think this is a good idea in general, but I can see

Bastiaan Veelo (13/15) Feb 06 2017 Careful here, you've got the arguments reversed. The unit test

Bastiaan Veelo (14/15) Feb 06 2017 Sorry for that misinformation. I should have said that

pineapple (9/9) Feb 06 2017 One reason for zero-based indexes that isn't "it's what we're all

Nemanja Boric (3/12) Feb 06 2017 Related:
bachmeier (5/14) Feb 06 2017 In the scientific programming community, languages like Matlab,

Bastiaan Veelo (103/103) Feb 06 2017 (There is an honest question in the end, please read on.)

=?UTF-8?Q?Ali_=c3=87ehreli?= (30/39) Feb 06 2017 struct StaticArray(T, ptrdiff_t first, ptrdiff_t last) {

Bastiaan Veelo (115/123) Feb 07 2017 Thank you very much for your time. Sadly this gives me an access

=?UTF-8?Q?Ali_=c3=87ehreli?= (32/41) Feb 07 2017 You forgot to call that most important function. ;)

Bastiaan Veelo (9/42) Feb 07 2017 Hah of course. I assumed the name would give it some special

=?UTF-8?Q?Ali_=c3=87ehreli?= (9/10) Feb 07 2017 Of course we have to and the struct that I wrote is illegal because it

Bastiaan Veelo (12/22) Feb 07 2017 Gosh am I glad I brought that up, and for people like you hanging

=?UTF-8?Q?Ali_=c3=87ehreli?= (28/40) Feb 07 2017 Mandatory disclaimer: We can't be sure without testing.

Bastiaan Veelo (12/12) Feb 08 2017 Wrapup: I am going to go for the original approach of index

WhatMeForget (7/16) Feb 06 2017 There is a good metaphor in multi-story buildings. Americans

"DLearner" <bmqazwsx123 gmail.com> writes:

Does the D language set in stone that the first element of an 
array _has_ to be index zero?
Wouldn't starting array elements at one avoid the common 
'off-by-one' logic error, it does
seem more natural to begin a count at 1.

Actually, maybe even better to allow array definitions of form
int foo[x:y];
(y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

I think the (very old) IBM PL/I language was like this.

Aug 01 2015

Rikki Cattermole <alphaglosined gmail.com> writes:

On 1/08/2015 9:35 p.m., DLearner wrote:
 Does the D language set in stone that the first element of an array
 _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 'off-by-one'
 logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.

In c style languages (like D) the index actually defines the offset in 
memory. Not actually the index.
So while in some languages 1 is used instead of 0, 0 maps better to the 
hardware.

Think of this byte array:

ubyte* myptr = [

|---|-------|
| i | value |
|---|-------|
| 0 | 1     |
| 1 | 255   |

];

For 0 start of index:
size_t i = 0;
assert(myptr[i] == 1);

For 1 start of index:
size_t i = 1;
assert(i != 0);
assert(myptr[i-1] == 1);

While this is not the complete reason why 0 is chosen, it is something 
to think about.

Aug 01 2015

"bachmeier" <no spam.net> writes:

On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.

Seems you could easily wrap an array in a struct, define opIndex 
and opSlice appropriately, and use alias this to keep the other 
properties of the array. The problem you'll run into is 
interacting with other code that assumes zero-based indexing. I 
thought about setting the first index at 1 for my library that 
embeds D inside R, since that's how it's done in R, but very 
quickly realized how confusing it would be.

Aug 01 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?

For the builtin slice types? Yes, set in stone.

 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.

See 
https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

As other commenters have/will point out, you can easily define a 
custom type in D that behaves as you describe, and - Dijkstra 
notwithstanding - there are valid uses for such things.

Aug 01 2015

"DLearner" <bmqazwsx123 gmail.com> writes:

On Saturday, 1 August 2015 at 17:55:06 UTC, John Colvin wrote:
 On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?

 For the builtin slice types? Yes, set in stone.

 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], 
 foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.

 See 
 https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

 As other commenters have/will point out, you can easily define 
 a custom type in D that behaves as you describe, and - Dijkstra 
 notwithstanding - there are valid uses for such things.

D is a C derivative, so it seems a shame not to identify causes 
of bugs in C,
and design them out in D.

For example, in C difficult to write non-trivial commercial 
programs without using pointers.
Pointer manipulation has a terrible reputation for bugs.
But in D, easy to write commercial programs without using 
pointers.
Problem has been designed away.

Similarly, off-by-one array bugs are commonplace in C.
We should seek to eliminate the source of those bugs, which 
basically reduces to the
issue that programmers find it unnatural to start a count at zero.
Whether they _should_ find a zero start unnatural is irrelevant - 
they just do as an
observed fact, so let's change the language so the issue is 
avoided (designed away).

Suggestion: if the codebase for D is considered so large that 
zero-basing cannot now be changed,
why not extend the language to allow for array definitions like 
'int[x:y] foo'?
And then have a rule that 'int[:y] bar' defines a 1-based array 
of y elements?

Aug 01 2015

Andrej Mitrovic via Digitalmars-d-learn writes:

On 8/1/15, DLearner via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:
 D is a C derivative, so it seems a shame not to identify causes
 of bugs in C,
 and design them out in D.

This has already been done! D defines an array to be a struct with a
pointer and a length. See this article:
http://www.drdobbs.com/architecture-and-design/cs-biggest-mistake/228701625

I would argue it's not "off-by-one" that's causing most issues when
dealing with C "arrays", but instead it's in general out-of-bounds
issues (whether it's off bye one or off by 50..) since you often don't
have the length or could easily use the wrong variable as the length.

Think about how much D code would actually have subtle off-by-one
errors if D didn't use 0-based indexing like the majority of popular
languages use. Any time you would interface with other languages you
would have to double, triple-check all your uses of arrays.

FWIW at the very beginning I also found it odd that languages use
0-based indexing, but that was before I had any significant
programming experience under my belt. By now it's second nature to me
to use 0-based indexing.

Aug 01 2015

"bachmeier" <no spam.net> writes:

On Saturday, 1 August 2015 at 19:04:10 UTC, Andrej Mitrovic wrote:
 On 8/1/15, DLearner via Digitalmars-d-learn 
 <digitalmars-d-learn puremagic.com> wrote:
 D is a C derivative, so it seems a shame not to identify causes
 of bugs in C,
 and design them out in D.

 This has already been done! D defines an array to be a struct 
 with a pointer and a length. See this article: 
 http://www.drdobbs.com/architecture-and-design/cs-biggest-mistake/228701625

 I would argue it's not "off-by-one" that's causing most issues 
 when dealing with C "arrays", but instead it's in general 
 out-of-bounds issues (whether it's off bye one or off by 50..) 
 since you often don't have the length or could easily use the 
 wrong variable as the length.

 Think about how much D code would actually have subtle 
 off-by-one errors if D didn't use 0-based indexing like the 
 majority of popular languages use. Any time you would interface 
 with other languages you would have to double, triple-check all 
 your uses of arrays.

 FWIW at the very beginning I also found it odd that languages 
 use 0-based indexing, but that was before I had any significant 
 programming experience under my belt. By now it's second nature 
 to me to use 0-based indexing.

But what type of programming are you doing? Even after decades of 
programming and trying out dozens of languages, zero-based 
indexing still gets me at times when the arrays I work with 
represent vectors and matrices. Especially when porting code from 
other languages that use one-based indexing. One of the nice 
things about D is that it gives you the tools to easily make the 
change if you want.

Aug 01 2015

"QAston" <qastonx gmail.com> writes:

On Saturday, 1 August 2015 at 23:02:51 UTC, bachmeier wrote:
 But what type of programming are you doing? Even after decades 
 of programming and trying out dozens of languages, zero-based 
 indexing still gets me at times when the arrays I work with 
 represent vectors and matrices. Especially when porting code 
 from other languages that use one-based indexing. One of the 
 nice things about D is that it gives you the tools to easily 
 make the change if you want.

Adding 1-indexed arrays to the language fixes nothing. Just write 
your 1-indexed array type and if you enjoy using it, publish it 
as a library. Who knows, if demand is high it may even end up in 
phobos.

Aug 02 2015

"bachmeier" <no spam.com> writes:

On Sunday, 2 August 2015 at 21:58:48 UTC, QAston wrote:

 Adding 1-indexed arrays to the language fixes nothing. Just 
 write your 1-indexed array type and if you enjoy using it, 
 publish it as a library. Who knows, if demand is high it may 
 even end up in phobos.

Oh, I don't think that's a good idea. It's too confusing to have 
more than one method of indexing within the same language. You 
just have to do a thorough job of testing, as the possibility of 
errors is something you'll have to live with, given the different 
design choices of different languages.

Aug 03 2015

"DLearner" <bmqazwsx123 gmail.com> writes:

On Monday, 3 August 2015 at 13:45:01 UTC, bachmeier wrote:
 On Sunday, 2 August 2015 at 21:58:48 UTC, QAston wrote:

 Adding 1-indexed arrays to the language fixes nothing. Just 
 write your 1-indexed array type and if you enjoy using it, 
 publish it as a library. Who knows, if demand is high it may 
 even end up in phobos.

 Oh, I don't think that's a good idea. It's too confusing to 
 have more than one method of indexing within the same language. 
 You just have to do a thorough job of testing, as the 
 possibility of errors is something you'll have to live with, 
 given the different design choices of different languages.

Looks like 0-base is fixed, to avoid problems with existing code.

But nothing stops _adding_ to the language by allowing
int[x:y] foo to mean valid symbols are foo[x], foo[x+1],..., 
foo[y].
Plus rule that int[:y] means valid symbols are foo[1], 
foo[2],..., foo[y].

That way, 1-start achieved, with no conflict with existing code?

Aug 03 2015

Jonathan M Davis via Digitalmars-d-learn writes:

On Monday, August 03, 2015 21:32:03 DLearner via Digitalmars-d-learn wrote:
 On Monday, 3 August 2015 at 13:45:01 UTC, bachmeier wrote:
 On Sunday, 2 August 2015 at 21:58:48 UTC, QAston wrote:

 Adding 1-indexed arrays to the language fixes nothing. Just
 write your 1-indexed array type and if you enjoy using it,
 publish it as a library. Who knows, if demand is high it may
 even end up in phobos.

 Oh, I don't think that's a good idea. It's too confusing to
 have more than one method of indexing within the same language.
 You just have to do a thorough job of testing, as the
 possibility of errors is something you'll have to live with,
 given the different design choices of different languages.

 Looks like 0-base is fixed, to avoid problems with existing code.

 But nothing stops _adding_ to the language by allowing
 int[x:y] foo to mean valid symbols are foo[x], foo[x+1],...,
 foo[y].
 Plus rule that int[:y] means valid symbols are foo[1],
 foo[2],..., foo[y].

 That way, 1-start achieved, with no conflict with existing code?

Almost all programming languages in heavy use at this point in time start
indexing at 0. It would be highly confusing to almost all programmers out
there to have 1-based indexing. In addition, having 0-based indexing
actually makes checking against the end of arrays and other random-access
ranges easier. You can just check against length without having to do any
math. In general, I would expect 1-based indexing to _increase_ the number
of off by one errors in code - both because 0-based indexing helps avoid
such problems when dealing with the end of the array and more importantly,
because almost everyone expects 0-based indexing.

You're really barking up the wrong tree if you're trying to get any support
for 1-based indexing in D. I doubt that you will see much of anyone who
thinks that it's even vaguely a good idea, and there's no way that Walter or
Andrei (or probably anyone in the main dev team) who is going to agree that
it's even worth considering.

I think that the reality of the matter is that if you're going to do much
programming - especially if you're going to be professional programmer - you
just need to get used to the idea that array indices start at 0. There are a
few languages out there where they don't, but they are far from the norm.

- Jonathan M Davis

Aug 03 2015

"QAston" <qaston gmail.com> writes:

On Monday, 3 August 2015 at 21:32:05 UTC, DLearner wrote:
 Looks like 0-base is fixed, to avoid problems with existing 
 code.

 But nothing stops _adding_ to the language by allowing
 int[x:y] foo to mean valid symbols are foo[x], foo[x+1],..., 
 foo[y].
 Plus rule that int[:y] means valid symbols are foo[1], 
 foo[2],..., foo[y].

 That way, 1-start achieved, with no conflict with existing code?

There're quite a few things stopping this from being added to the 
language.

1. People will have to learn this new feature and it's 
interaction with gazillion of other D features.

2. There would be a redundancy - core language will have 2 array 
types while one of them can be easily implemented using the other.

3. Devs will have to maintain it - as if they don't have enough 
things to fix atm.

Really, this is so simple to do as a library - just use opIndex, 
opSlice with a template struct.

As a general rule - start asking for language features only when 
things can't be done without them.

Aug 04 2015

"jmh530" <john.michael.hall gmail.com> writes:

On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.

I come from matlab and R which start matrices at 1. I used to 
think it was more natural. However after I started using numpy I 
now think 0 index is better.
Also see 
http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html
Give it a try. You might find you like it.

Aug 03 2015

"Observer" <spurious.address yahoo.com> writes:

On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

This experiment has already been run.  Perl used to support a $[ 
variable to set the array base.  After experience with the 
confusion and problems that causes, it was finally deprecated and 
effectively removed from the language.  See the end paragraphs of 
http://perldoc.perl.org/perlvar.html and also
http://search.cpan.org/~wolfsage/perl/ext/arybase/arybase.pm for 
more info.

Aug 03 2015

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:

On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

I, too, don't think this is a good idea in general, but I can see 
a few use-cases where 1-based indices may be more natural. It's 
easy to define a wrapper:

     struct OneBasedArray(T) {
         T[] _payload;

         alias _payload this;

         T opIndex(size_t index) {
             assert(index > 0);
             return _payload[index-1];
         }

         void opIndexAssign(U : T)(size_t index, auto ref U value) 
{
             assert(index > 0);
             _payload[index-1] = value;
         }
     }

     unittest {
         OneBasedArray!int arr;
         arr = [1,2,3];
         arr ~= 4;
         assert(arr.length == 4);
         assert(arr[1] == 1);
         assert(arr[2] == 2);
         assert(arr[3] == 3);
         assert(arr[4] == 4);
     }

Test with:

     rdmd -main -unittest xx.d

This can of course be easily extended to support other bases than 
one.

Aug 04 2015

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Tuesday, 4 August 2015 at 08:18:50 UTC, Marc Schütz wrote:
         void opIndexAssign(U : T)(size_t index, auto ref U 
 value) {

Careful here, you've got the arguments reversed. The unit test 
didn't detect this because it was ambiguous. This one isn't:

     unittest {
         OneBasedArray!int arr;
         arr = [1,2,3];
         arr ~= 14;
         assert(arr.length == 4);
         assert(arr[1] == 1);
         assert(arr[2] == 2);
         assert(arr[3] == 3);
         assert(arr[4] == 14);
     }

Feb 06 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Monday, 6 February 2017 at 14:26:35 UTC, Bastiaan Veelo wrote:
 The unit test didn't detect this because it was ambiguous.

Sorry for that misinformation. I should have said that 
opIndexAssign wasn't tested. Here is a better test.

     unittest {
         OneBasedArray!int arr;
         arr = [1,2,3];
         arr ~= 4;
         arr[4] = 14;
         assert(arr.length == 4);
         assert(arr[1] == 1);
         assert(arr[2] == 2);
         assert(arr[3] == 3);
         assert(arr[4] == 14);
     }

Feb 06 2017

pineapple <meapineapple gmail.com> writes:

One reason for zero-based indexes that isn't "it's what we're all 
used to" is that if you used one-based indexes, you would be able 
to represent one fewer index than zero-based, since one of the 
representable values - zero - could no longer be used to 
represent any index.

Also, it's what we're all used to, and it makes perfect sense to 
a lot of us, and the only times in recent memory I've ever made 
off-by-one errors were when I was trying to use Lua and its 
one-based indexing.

Feb 06 2017

Nemanja Boric <4burgos gmail.com> writes:

On Monday, 6 February 2017 at 18:55:19 UTC, pineapple wrote:
 One reason for zero-based indexes that isn't "it's what we're 
 all used to" is that if you used one-based indexes, you would 
 be able to represent one fewer index than zero-based, since one 
 of the representable values - zero - could no longer be used to 
 represent any index.

 Also, it's what we're all used to, and it makes perfect sense 
 to a lot of us, and the only times in recent memory I've ever 
 made off-by-one errors were when I was trying to use Lua and 
 its one-based indexing.

Related:

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

Feb 06 2017

bachmeier <no spam.net> writes:

On Monday, 6 February 2017 at 18:55:19 UTC, pineapple wrote:
 One reason for zero-based indexes that isn't "it's what we're 
 all used to" is that if you used one-based indexes, you would 
 be able to represent one fewer index than zero-based, since one 
 of the representable values - zero - could no longer be used to 
 represent any index.

 Also, it's what we're all used to, and it makes perfect sense 
 to a lot of us, and the only times in recent memory I've ever 
 made off-by-one errors were when I was trying to use Lua and 
 its one-based indexing.

In the scientific programming community, languages like Matlab, 
Fortran, and R use one-based indexing for a very good reason. 
Zero-based indexing is a computer science thing that makes no 
sense to someone working with equations.

Feb 06 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

(There is an honest question in the end, please read on.)

All good reasons set aside, both in favour and against 0-based 
arrays, the only reason that is relevant to me right now is that 
we are seriously looking into translating close to a million 
lines of foreign code to D, from a language that supports arrays 
over arbitrary ranges of indices (positive and negative). 
Adapting this much program logic to a different array base is out 
of the question; this is an engineering application and lives are 
at stake. So let's not bring the base-arguments back into this 
sub-thread but focus on a performant solution.

Expanding on Marc's outset, I now have:

/*****************
  * A fixed-length array with an index that runs from $(D_PARAM 
first)
  * to $(D_PARAM last) inclusive.
  *
  * Indices are converted, which involves a small overhead.
  */
struct StaticArray(T, ptrdiff_t first, ptrdiff_t last) {
     T[last - first + 1] _payload;

     alias _payload this;

     // Support e = arr[5];
     ref T opIndex(ptrdiff_t index) {
         assert(index >= first);
         assert(index <= last);
         return _payload[index - first];
     }

     // Support arr[5] = e;
     void opIndexAssign(U : T)(auto ref U value, ptrdiff_t index) {
         assert(index >= first);
         assert(index <= last);
         _payload[index - first] = value;
     }

     // Support foreach(e; arr).
     int opApply(scope int delegate(ref T) dg)
     {
         int result = 0;

         for (int i = 0; i < _payload.length; i++)
         {
             result = dg(_payload[i]);
             if (result)
                 break;
         }
         return result;
     }

     // Support foreach(i, e; arr).
     int opApply(scope int delegate(ptrdiff_t index, ref T) dg)
     {
         int result = 0;

         for (int i = 0; i < _payload.length; i++)
         {
             result = dg(i + first, _payload[i]);
             if (result)
                 break;
         }
         return result;
     }

     // Write to binary file.
     void toFile(string fileName)
     {
         import std.stdio;
         auto f = File(fileName, "wb");
         if (f.tryLock)
         {
             f.rawWrite(_payload);
         }
     }
}

unittest {
     StaticArray!(int, -10, 10) arr;
     assert(arr.length == 21);

     foreach (ref e; arr)
         e = 42;
     assert(arr[-10] == 42);
     assert(arr[0]   == 42);
     assert(arr[10]  == 42);

     foreach (i, ref e; arr)
         e = i;
     assert(arr[-10] == -10);
     assert(arr[0]   ==   0);
     assert(arr[5]   ==   5);
     assert(arr[10]  ==  10);

     arr[5] = 15;
     assert(arr[5]   == 15);
}

//////////////

(The first and last indices probably don't need to be template 
arguments, they could possibly be immutable members of the 
struct; But that is not what worries me now.)
The thing is that a small overhead is paid in opIndex() and 
opIndexAssign() (the other member functions are fine). I'd like 
to get rid of that overhead.

In "Numerical Recipes in C", section 1.2, Press et al. propose an 
easy solution using an offset pointer:

float b[4], *bb;
bb = b - 1;

Thus bb[1] through bb[4] all exist, no space is wasted nor is 
there a run-time overhead.

I have tried to adapt the internals of my struct to Press' 
approach, but it seems to me it is not that easy in D -- or I'm 
just not proficient enough. Can someone here show me how that 
could be done?

Thanks!

Feb 06 2017

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 02/06/2017 03:00 PM, Bastiaan Veelo wrote:

 In "Numerical Recipes in C", section 1.2, Press et al. propose an easy
 solution using an offset pointer:

 float b[4], *bb;
 bb = b - 1;

 Thus bb[1] through bb[4] all exist, no space is wasted nor is there a
 run-time overhead.

 I have tried to adapt the internals of my struct to Press' approach, but
 it seems to me it is not that easy in D -- or I'm just not proficient
 enough. Can someone here show me how that could be done?

struct StaticArray(T, ptrdiff_t first, ptrdiff_t last) {
     T[last - first + 1] _payload;

If the static array has to be a member like that (as opposed to a 
dynamic one), you need an extra member:

     T* _ptr;

Because structs cannot have default constructors, you need an init() 
function to establish the invariant:

     void init() {
         this._ptr = _payload.ptr - first;
     }

Of course, that relies on the fact that _payload.ptr - first is a valid 
address to hold in a pointer.

[OT]:
     // Commented-out as it seems dangerous because _payload does not
     // have the same semantics as StaticArray.
     //
     //  alias _payload this;

Which means, you would implement length() yourself:

     size_t length() {
         return _payload.length;
     }

Then you use _ptr when indexing:

     // Support e = arr[5];
     ref T opIndex(ptrdiff_t index) {
         assert(index >= first);
         assert(index <= last);
         return *(_ptr + index);
     }

Ali

Feb 06 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Monday, 6 February 2017 at 23:42:55 UTC, Ali Çehreli wrote:
 Then you use _ptr when indexing:

     // Support e = arr[5];
     ref T opIndex(ptrdiff_t index) {
         assert(index >= first);
         assert(index <= last);
         return *(_ptr + index);
     }

 Ali

Thank you very much for your time. Sadly this gives me an access 
violation. The traceback doesn't seem right though, as it seems 
to jump to opIndexAssign from where opIndex is used. If I'm not 
mistaken, I have confirmed that _ptr is a valid address, see 
below. We do not need to take measures against the GC?

- Bastiaan.


On Windows:

rdmd -main -unittest -debug -g source\epcompat\array.d

object.Error (0): Access Violation
----------------
0x00402057 in void epcompat.array.__unittestL81_1() at 
C:\SARC\Pascal2017\D\epcompat\source\epcompat\array.d(88)
0x0040465C in void epcompat.array.__modtest() at 
C:\SARC\Pascal2017\D\epcompat\source\epcompat\array.d(39)
0x0040A455 in int 
core.runtime.runModuleUnitTests().__foreachbody1(object.ModuleInfo*)
0x0040C007 in int object.ModuleInfo.opApply(scope int 
delegate(object.ModuleInfo*)).__lambda2(immutable(object.ModuleInfo*))
0x00406808 in _d_run_main
0x004046D4 in main at 
C:\SARC\Pascal2017\D\epcompat\source\epcompat\array.d(7)
0x004225DD in mainCRTStartup
0x779C62C4 in BaseThreadInitThunk
0x77B50FD9 in RtlSubscribeWnfStateChangeNotification
0x77B50FA4 in RtlSubscribeWnfStateChangeNotification

The complete source:

module epcompat.array;

// Test with rdmd -main -unittest -debug -g 
source\epcompat\array.d

alias StaticArray_offset StaticArray;

/*****************
  * A fixed-length array with an index that runs from $(D_PARAM 
first)
  * to $(D_PARAM last) inclusive.
  *
  * Implemented by means of an offset pointer.
  */
struct StaticArray_offset(T, ptrdiff_t first, ptrdiff_t last) {
     T[last - first + 1] _payload;
     T* _ptr;

     void init() {
         assert( first < cast(size_t)_payload.ptr);              
// Address space underrun.
         assert(-first < size_t.max - cast(size_t)_payload.ptr); 
// Address space overrun.
         this._ptr = _payload.ptr - first;
     }

     size_t length() {
         return _payload.length;
     }

     // Support e = arr[5];
     ref T opIndex(ptrdiff_t index) {
         assert(index >= first);
         assert(index <= last);
         return *(_ptr + index);
     }

     // Support arr[5] = e;
     void opIndexAssign(U : T)(auto ref U value, ptrdiff_t index) {
         assert(index >= first);
         assert(index <= last);
         *(_ptr + index) = value;
     }   // Line 39

     // Support foreach(e; arr).
     int opApply(scope int delegate(ref T) dg)
     {
         int result = 0;

         for (int i = 0; i < _payload.length; i++)
         {
             result = dg(_payload[i]);
             if (result)
                 break;
         }
         return result;
     }

     // Support foreach(i, e; arr).
     int opApply(scope int delegate(ptrdiff_t index, ref T) dg)
     {
         int result = 0;

         for (int i = first; i <= last; i++)
         {
             result = dg(i, *(_ptr + i));
             if (result)
                 break;
         }
         return result;
     }

     // Write to binary file.
     void toFile(string fileName)
     {
         import std.stdio;
         auto f = File(fileName, "wb");
         if (f.tryLock)
         {
             f.rawWrite(_payload);
         }
     }
}

unittest {
     StaticArray!(int, -10, 10) arr;
     assert(arr.length == 21);

     foreach (ref e; arr)
         e = 42;

     assert(arr[-10] == 42); // Line 88
     assert(arr[0]   == 42);
     assert(arr[10]  == 42);

     foreach (i, ref e; arr)
         e = i;

     assert(arr[-10] == -10);
     assert(arr[0]   ==   0);
     assert(arr[5]   ==   5);
     assert(arr[10]  ==  10);

     arr[5] = 15;
     assert(arr[5]   == 15);
}

Feb 07 2017

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 02/07/2017 02:11 AM, Bastiaan Veelo wrote:

     void init() {
         assert( first < cast(size_t)_payload.ptr);              //
 Address space underrun.
         assert(-first < size_t.max - cast(size_t)_payload.ptr); //
 Address space overrun.
         this._ptr = _payload.ptr - first;
     }

You forgot to call that most important function. ;)

1) I don't understand the first assert there, which does not pass for 
me, so I commented it out.

2) May bad: init() is not a good name for a struct member, so it should 
be renamed:

     void initialize() {
         // assert( first < cast(size_t)_payload.ptr);              // 
Address space underrun.
         assert(-first < size_t.max - cast(size_t)_payload.ptr); // 
Address space overrun.
         this._ptr = _payload.ptr - first;
     }

3) Instead of having to remember to call it, let's introduce a function 
that does it for us:

auto makeStaticArray(T, ptrdiff_t first, ptrdiff_t last)() {
     auto s = StaticArray!(T, first, last)();
     s.initialize();
     return s;
}

unittest {
     // StaticArray!(int, -10, 10) arr;
     auto arr = makeStaticArray!(int, -10, 10);

     foreach (i, ref e; arr)
         e = i;

Unrelated: That line passes because you're building 32-bits. Here is the 
error I got:

   Error: cannot implicitly convert expression (i) of type long to int

You can cast it:

         e = cast(int)i;

or by

         import std.conv : to;
         e = i.to!int;

Ali

Feb 07 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Tuesday, 7 February 2017 at 20:28:30 UTC, Ali Çehreli wrote:
 You forgot to call that most important function. ;)

Hah of course. I assumed the name would give it some special 
meaning, like postblit.

 1) I don't understand the first assert there, which does not 
 pass for me, so I commented it out.

The intention is to check that (_payload.ptr - first) is larger 
than 0.

 2) May bad: init() is not a good name for a struct member, so 
 it should be renamed:

     void initialize() {
         // assert( first < cast(size_t)_payload.ptr);
    // Address space underrun.
         assert(-first < size_t.max - cast(size_t)_payload.ptr); 
 // Address space overrun.
         this._ptr = _payload.ptr - first;
     }

 3) Instead of having to remember to call it, let's introduce a 
 function that does it for us:

 auto makeStaticArray(T, ptrdiff_t first, ptrdiff_t last)() {
     auto s = StaticArray!(T, first, last)();
     s.initialize();
     return s;
 }

OK good.

 unittest {
     // StaticArray!(int, -10, 10) arr;
     auto arr = makeStaticArray!(int, -10, 10);

     foreach (i, ref e; arr)
         e = i;

 Unrelated: That line passes because you're building 32-bits. 
 Here is the error I got:

   Error: cannot implicitly convert expression (i) of type long 
 to int

 You can cast it:

         e = cast(int)i;

 or by

         import std.conv : to;
         e = i.to!int;

Thanks a lot for your illustrative answers, including the next 
one!

Bastiaan.

Feb 07 2017

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

Pressed send too soon, before considering your GC question.

On 02/07/2017 02:11 AM, Bastiaan Veelo wrote:

 We do not need to take measures against the GC?

Of course we have to and the struct that I wrote is illegal because it 
is self-referencing through the _ptr member. (D has the right to move 
structs around, making my _ptr potentially pointing to an illegal address.)

This optimization cannot work if the array is a static array inside the 
same struct. It would work with a dynamic array but then it would 
probably be slower than applying the *(ptr+index) technique.

Ali

Feb 07 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Tuesday, 7 February 2017 at 20:33:35 UTC, Ali Çehreli wrote:
 On 02/07/2017 02:11 AM, Bastiaan Veelo wrote:

 We do not need to take measures against the GC?

 Of course we have to and the struct that I wrote is illegal 
 because it is self-referencing through the _ptr member. (D has 
 the right to move structs around, making my _ptr potentially 
 pointing to an illegal address.)

Gosh am I glad I brought that up, and for people like you hanging 
out in the learn group. Thanks for pointing this out!

 This optimization cannot work if the array is a static array 
 inside the same struct. It would work with a dynamic array but 
 then it would probably be slower than applying the *(ptr+index) 
 technique.

You mean slower than the _payload[index - first] technique? Is 
that because of heap memory versus stack memory?

Am I correct that the reason that a dynamic array would work is 
because it is allocated independently from the struct, meaning 
that its address stays the same even if the struct is moved? We 
could .reserve a dynamic array in initialize() to prevent it 
being resized.

Thanks again.
Bastiaan.

Feb 07 2017

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 02/07/2017 02:04 PM, Bastiaan Veelo wrote:

 This optimization cannot work if the array is a static array inside
 the same struct. It would work with a dynamic array but then it would
 probably be slower than applying the *(ptr+index) technique.

 You mean slower than the _payload[index - first] technique? Is that
 because of heap memory versus stack memory?

Mandatory disclaimer: We can't be sure without testing.

Not exactly because stack versus heap because the whole object might be 
sitting in heap memory anyway:

alias S = StaticArray!(/* ... */);

     S[] objects;
     objects ~= S(/* ... */);

So, all of those are on the heap.

It's more about having everything at hand, near each other, close in 
memory. If the member is a static array, then when we have an S[], all 
members are there to be operated on. If the array in dynamic, then an 
S[] has indirections, reaching out to the heap, which may involve 
long-latency memory reads.

     foreach (s; objects) {
         s[42];    // May have a cache miss and read from memory
     }

In the static array case, the entire body of s is already on the cache.

On the other hand, when the objects are large, then few of those can fit 
in the cache. etc. One needs to profile to see what works better.

 Am I correct that the reason that a dynamic array would work is because
 it is allocated independently from the struct, meaning that its address
 stays the same even if the struct is moved?

Yes, that would be a requirement to prevent the self-reference.

 We could .reserve a dynamic
 array in initialize() to prevent it being resized.

Makes sense.

Another solution that came to my mind: You can keep the self-referencing 
solution but make sure that the objects are not moved after initialize():

     S[] objects;
     objects.reserve(100);
     objects.each!(o => o.initialize());
     // The objects will not move if you're careful

 Thanks again.
 Bastiaan.

Ali

Feb 07 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

Wrapup: I am going to go for the original approach of index 
conversion, and leaving the offset-pointer approach for what it 
is.

Reasons:
1) uncertain efficiency gain/loss,
2) theoretically it may fail,
3) .sizeof does not include the payload,
4) analysis of the assembler generated by our reference compiler 
(Prospero) shows that it also uses index conversion. 
(Interestingly, it does so at compile time, when possible).

Thanks for many interesting insights.

Bastiaan.

Feb 08 2017

WhatMeForget <kheaser gmail.com> writes:

On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.

There is a good metaphor in multi-story buildings. Americans 

the UK, their first floor is what Americans would call the 2nd 
floor.

But then how can you trust a group of people who drive on the 
wrong side of the road :)

Feb 06 2017

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Array start index