www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Should r.length of type ulong be supported on 32-bit systems?

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
https://github.com/dlang/phobos/pull/4827 still allows that but 
specifies that phobos algorithms are not required to. -- Andrei
Sep 30 2016
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/30/2016 7:31 PM, Andrei Alexandrescu wrote:
 https://github.com/dlang/phobos/pull/4827 still allows that but specifies that
 phobos algorithms are not required to. -- Andrei
A couple instances where a length may be longer than the address space: 1. large files 2. sequences of mathematically generated values I don't see a particular advantage to restricting hasLength() to be size_t or ulong. Whether an algorithm requires the length to be within the addressable capabilities of the CPU or not should be up to the algorithm. There is also cause for r.length to return an int even on a 64 bit system - it can be more efficient.
Sep 30 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 09/30/2016 11:58 PM, Walter Bright wrote:
 On 9/30/2016 7:31 PM, Andrei Alexandrescu wrote:
 https://github.com/dlang/phobos/pull/4827 still allows that but
 specifies that
 phobos algorithms are not required to. -- Andrei
A couple instances where a length may be longer than the address space: 1. large files 2. sequences of mathematically generated values I don't see a particular advantage to restricting hasLength() to be size_t or ulong. Whether an algorithm requires the length to be within the addressable capabilities of the CPU or not should be up to the algorithm. There is also cause for r.length to return an int even on a 64 bit system - it can be more efficient.
Indeed the "I have a dream" cases are quite the lure. The reality in the field, however, is that Phobos thoroughly assumes length has type size_t. The problem is defining hasLength in ways that are at the same time general and that also work is extremely laborious. Just grep through Consider: 1. std.algorithm.comparison:1007 (abridged) const rc = mulu(r, c, overflow); if (_matrix.length < rc) So here we assume length is comparable for inequality with a const size_t. That should probably rule out signed integrals. 2. ibid, 1694 (abridged) static if (hasLength!(Range1) && hasLength!(Range2)) { return r1.length == r2.length; } Here we assume any two ranges satisfying hasLength have lengths comparable for equality. That projects a rather tenuous definition of hasLength: "the range must define a length that is comparable for equality with any other range that also defines a length". Not impossible, but definitely not what we have right now, de facto or de jure. 3. ibid, 634 (abridged) immutable len = min(r1.length, r2.length); So here two lengths must be comparable for ordering, have a common type, and have an immutable constructor. 4. ibid, 656 (abridged) return threeWay(r1.length, r2.length); Here threeWay takes two size_t. So both lengths must be at least convertible implicitly to size_t. I got to the end of the first file I scanned with grep, and I don't know what a robust definition of hasLength should be - outside of size_t itself. The more locations one examines, the more the capabilities required get closer to those of size_t - or the more latent bugs exist in phobos if we want to support something else. We have two options here. 1. Support a range of types (e.g.: any integral and any user-defined type that supports the following list of primitives: ...) for length with hasLength, and then have each range-based algorithm define its own additional restrictions if they have such. That means right now a bunch of phobos code is incorrect and needs to be fixed. 2. Require lengths to be size_t. I'm all for generality, but of the useful kind. It seems to me supporting a complicated definition of length is a protracted proposition with little upside. That's why I'm going with the engineering solution in the PR. Andrei
Sep 30 2016
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 9/30/2016 10:04 PM, Andrei Alexandrescu wrote:
 I'm all for generality, but of the useful kind. It seems to me supporting a
 complicated definition of length is a protracted proposition with little
upside.
 That's why I'm going with the engineering solution in the PR.
You make a good case. Agreed.
Sep 30 2016
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 1 Oct 2016 01:04:01 -0400
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 09/30/2016 11:58 PM, Walter Bright wrote:
 On 9/30/2016 7:31 PM, Andrei Alexandrescu wrote:  
 https://github.com/dlang/phobos/pull/4827 still allows that but
 specifies that
 phobos algorithms are not required to. -- Andrei  
A couple instances where a length may be longer than the address space: 1. large files 2. sequences of mathematically generated values I don't see a particular advantage to restricting hasLength() to be size_t or ulong. Whether an algorithm requires the length to be within the addressable capabilities of the CPU or not should be up to the algorithm. There is also cause for r.length to return an int even on a 64 bit system - it can be more efficient.
Indeed the "I have a dream" cases are quite the lure. The reality in the field, however, is that Phobos thoroughly assumes length has type size_t. The problem is defining hasLength in ways that are at the same time general and that also work is extremely laborious. Just grep through Consider: 1. std.algorithm.comparison:1007 (abridged) const rc = mulu(r, c, overflow); if (_matrix.length < rc) So here we assume length is comparable for inequality with a const size_t. That should probably rule out signed integrals. 2. ibid, 1694 (abridged) static if (hasLength!(Range1) && hasLength!(Range2)) { return r1.length == r2.length; } Here we assume any two ranges satisfying hasLength have lengths comparable for equality. That projects a rather tenuous definition of hasLength: "the range must define a length that is comparable for equality with any other range that also defines a length". Not impossible, but definitely not what we have right now, de facto or de jure. 3. ibid, 634 (abridged) immutable len = min(r1.length, r2.length); So here two lengths must be comparable for ordering, have a common type, and have an immutable constructor. 4. ibid, 656 (abridged) return threeWay(r1.length, r2.length); Here threeWay takes two size_t. So both lengths must be at least convertible implicitly to size_t. I got to the end of the first file I scanned with grep, and I don't know what a robust definition of hasLength should be - outside of size_t itself. The more locations one examines, the more the capabilities required get closer to those of size_t - or the more latent bugs exist in phobos if we want to support something else. We have two options here. 1. Support a range of types (e.g.: any integral and any user-defined type that supports the following list of primitives: ...) for length with hasLength, and then have each range-based algorithm define its own additional restrictions if they have such. That means right now a bunch of phobos code is incorrect and needs to be fixed. 2. Require lengths to be size_t. I'm all for generality, but of the useful kind. It seems to me supporting a complicated definition of length is a protracted proposition with little upside. That's why I'm going with the engineering solution in the PR. Andrei
Me too, I haven't looked into it, but I noticed that all the points you mentioned are resolved if length was required to be a natural number, i.e. unsigned basic type. Based on these points alone, I tend to agree with Walter. Note: Point 4. is arguable, because `threeWay` is a local function designed to compare char types and only used once to compare lengths under the (silent) assumption that they are < int.max. I.e. It would be cleaner to replace that call with an explicit comparison. -- Marco
Oct 01 2016
prev sibling next sibling parent Meta <jared771 gmail.com> writes:
On Saturday, 1 October 2016 at 02:31:23 UTC, Andrei Alexandrescu 
wrote:
 https://github.com/dlang/phobos/pull/4827 still allows that but 
 specifies that phobos algorithms are not required to. -- Andrei
In at least some cases it's bad. See: http://forum.dlang.org/thread/vqeeyqprqphwktebgaiw forum.dlang.org?page=1
Sep 30 2016
prev sibling next sibling parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Friday, September 30, 2016 22:31:23 Andrei Alexandrescu via Digitalmars-d 
wrote:
 https://github.com/dlang/phobos/pull/4827 still allows that but
 specifies that phobos algorithms are not required to. -- Andrei
Every time this comes up, I strongly argue in favor of requiring that length be size_t. Occasionally, that restriction is annoying, but allowing anything else plays havoc with generic code. It's even worse when you have to start dealing with the fact that the same code needs to compile and function the same on both 32-bit and 64-bit systems. At one point, iota was changed to support lengths of long or ulong in order to give you the full range of numbers for a range of long or ulong on 32-bit systems, and that's caused us a fair bit of grief in various places in Phobos, because everything was assuming that length was size_t and doing anything else can get fairly complicated. And I think that it's still the case that a fair bit of Phobos will fall flat on its face when dealing with a length that isn't size_t. Fortunately, it looks like iota was finally fixed to use size_t for length again. This is some loss of expressiveness as a result of requiring that length be size_t, but it's rarely a problem, and it simplifies code that deals with length significantly. - Jonathan M Davis
Oct 01 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 1 October 2016 at 08:09:45 UTC, Jonathan M Davis 
wrote:
 On Friday, September 30, 2016 22:31:23 Andrei Alexandrescu via 
 Digitalmars-d wrote:
 [...]
Every time this comes up, I strongly argue in favor of requiring that length be size_t. Occasionally, that restriction is annoying, but allowing anything else plays havoc with generic code. It's even worse when you have to start dealing with the fact that the same code needs to compile and function the same on both 32-bit and 64-bit systems. [...]
+1
Oct 01 2016
prev sibling parent reply Lodovico Giaretta <lodovico giaretart.net> writes:
On Saturday, 1 October 2016 at 02:31:23 UTC, Andrei Alexandrescu 
wrote:
 https://github.com/dlang/phobos/pull/4827 still allows that but 
 specifies that phobos algorithms are not required to. -- Andrei
I don't have much experience, but IMHO the length should be restricted to be any of the built-in unsigned integral types (ubyte, ushort, uint, size_t, ulong). This way, all comparisons for equality or inequality are defined and correct, and the common type exists and is the one you'd expect. Also, every other operation on them works as expected, because smaller types are widened and the absence of signed types guarantees no strange results. The only problem that arises with this definition of length is that functions cannot take a length parameter as size_t, because that could truncate it. Possible solutions, ordered from the best to the worst: 1) functions should try not to take lengths; they should take ranges or slices, which can be queried for their length; 2) functions could take lengths as the greatest possible type (ulong); 3) functions could be parameterized on the length type; 4) a minority of functions could, if deemed really necessary (for performance or whatever), use size_t, and it should be documented that they expect the length to be limited by the addressable length of the machine, and for which reason they do.
Oct 01 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/01/2016 05:30 AM, Lodovico Giaretta wrote:
 On Saturday, 1 October 2016 at 02:31:23 UTC, Andrei Alexandrescu wrote:
 https://github.com/dlang/phobos/pull/4827 still allows that but
 specifies that phobos algorithms are not required to. -- Andrei
I don't have much experience, but IMHO the length should be restricted to be any of the built-in unsigned integral types (ubyte, ushort, uint, size_t, ulong). This way, all comparisons for equality or inequality are defined and correct, and the common type exists and is the one you'd expect. Also, every other operation on them works as expected, because smaller types are widened and the absence of signed types guarantees no strange results.
I've looked through phobos and sizes smaller than size_t present less risk. The issue is that e.g. code is liable to do: auto len = r.length; and the proceeds to use len tacitly assuming it's a size_t. If r.length were allowed to return e.g. ubyte, doing something like ++len may overflow a lot earlier than the code would expect. I made a quick pass through Phobos looking for potential issues. I found a couple in replaceLast (uses arithmetic on indexes obtained with auto) but not much else (in fairness I stopped at std/file.d in whatever order "git grep" lists files). So in all likelihood sizes smaller than size_t may be easier to make work. However, this seems to complicate definitions without a corresponding payoff. For example a natural consequence would be to have operator[] take the same type as the length. That would definitely cause problems because people pass index computations to operator[] such as 1, idx + 1, or $ - 1.
 The only problem that arises with this definition of length is that
 functions cannot take a length parameter as size_t, because that could
 truncate it.
I think Jonathan is right: sizes longer than size_t have numerous issues - in fact enough that the limiting decision was made (by a group of contributors and approved by us) to force iota.length to return size_t. Andrei
Oct 01 2016