digitalmars.D.learn - Range length property

=?UTF-8?B?Tm9yZGzDtnc=?= (3/3) Apr 10 2018 Should ranges always provide a length property?

Adam D. Ruppe (8/11) Apr 10 2018 Just provide it whenever it is cheap to do so. If you need to do

=?UTF-8?B?Tm9yZGzDtnc=?= (5/16) Apr 10 2018 I'm thinking of my own container Hashmap having its range

Jonathan M Davis (26/46) Apr 10 2018 That would depend entirely on what you're trying to do, but in general, ...

Steven Schveighoffer (8/17) Apr 10 2018 e.g. std.array.array is going to pre-allocate an array of the correct

Per =?UTF-8?B?Tm9yZGzDtnc=?= (3/9) Apr 10 2018 Thanks, that's what I thought too.

Cym13 (25/78) Apr 10 2018 I find that discussion very interesting as I had never considered

Jonathan M Davis (27/109) Apr 10 2018 In general, if you care about efficient code, it becomes critical that
H. S. Teoh (12/23) Apr 10 2018 [...]
Steven Schveighoffer (15/90) Apr 10 2018 O(lg n) is fine for .length, it doesn't need to be O(1). It just can't

Jonathan M Davis (31/34) Apr 10 2018 Whether a range has a length property or not is primarily dependent on h...

=?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

Should ranges always provide a length property?

If so, in which cases is a length property an advantage or a 
requirement?

Apr 10 2018

Adam D. Ruppe <destructionator gmail.com> writes:

On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?

No.

 If so, in which cases is a length property an advantage or a 
 requirement?

Just provide it whenever it is cheap to do so. If you need to do 
complex calculations or especially loop over contents to figure 
out the length, do NOT provide it.

But if it is as simple as returning some value, provide it and 
algorithms can take advantage of it for optimizations etc. as 
needed.

Apr 10 2018

=?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?

 No.

 If so, in which cases is a length property an advantage or a 
 requirement?

 Just provide it whenever it is cheap to do so. If you need to 
 do complex calculations or especially loop over contents to 
 figure out the length, do NOT provide it.

 But if it is as simple as returning some value, provide it and 
 algorithms can take advantage of it for optimizations etc. as 
 needed.

I'm thinking of my own container Hashmap having its range 
ByKeyValue requiring one extra word of memory to store the 
iteration count which, in turn, can be used to calculate the 
length of the remaining range. Is this motivated?

Apr 10 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, April 10, 2018 19:47:10 Nordl�w via Digitalmars-d-learn wrote:
 On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordl�w wrote:
 Should ranges always provide a length property?

 No.

 If so, in which cases is a length property an advantage or a
 requirement?

 Just provide it whenever it is cheap to do so. If you need to
 do complex calculations or especially loop over contents to
 figure out the length, do NOT provide it.

 But if it is as simple as returning some value, provide it and
 algorithms can take advantage of it for optimizations etc. as
 needed.

 I'm thinking of my own container Hashmap having its range
 ByKeyValue requiring one extra word of memory to store the
 iteration count which, in turn, can be used to calculate the
 length of the remaining range. Is this motivated?

That would depend entirely on what you're trying to do, but in general, if a
range has length, then some algorithms will be more efficient, and some
algorithms do require length. So, if you can provide length, then the range
will be more useful, just like a bidirectional range can be more useful than
a forward range or a random-access range can be more useful than either.
However, if you're not doing anything that ever benefits from it having
length, then it doesn't buy you anything. So, it ultimately depends on what
you're doing. In a general purpose library, I'd say that it should have
length if it can do so in O(1), but if it's just for you, then it may or may
not be worth it.

The other thing to consider is what happens when the container is mutated. I
don't think that ranges necessarily behave all that well when an underlying
container is mutated, but it is something that has to be considered when
dealing with a range over a container. Even if mutating the underlying
container doesn't necessarily invalidate a range, maintaining the length in
the manner that you're suggesting probably makes it so that it would be
invalidated in more cases, since if any elements are added or removed in the
portion that was already popped off the range, then the iteration count
couldn't be used to calculate the length in the same way anymore. Now, with
a hash map, the range is probably fully invalidated when anything gets added
or removed anyway, since that probably screws with the order of the elements
in the range, but how the range is going to behave when the underlying
container is mutated and how having the length property does or doesn't
affect that is something that you'll need to consider.

- Jonathan M Davis

Apr 10 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 4/10/18 4:08 PM, Jonathan M Davis wrote:
 On Tuesday, April 10, 2018 19:47:10 Nordlöw via Digitalmars-d-learn wrote:
 I'm thinking of my own container Hashmap having its range
 ByKeyValue requiring one extra word of memory to store the
 iteration count which, in turn, can be used to calculate the
 length of the remaining range. Is this motivated?

 
 That would depend entirely on what you're trying to do, but in general, if a
 range has length, then some algorithms will be more efficient, and some
 algorithms do require length.

e.g. std.array.array is going to pre-allocate an array of the correct 
length and fill it in, vs. appending each element as it gets them from 
the range.

Personally, I would store the length because typically a container range 
is short-lived. It also jives with the container itself which likely has 
O(1) length.

-Steve

Apr 10 2018

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Tuesday, 10 April 2018 at 20:16:18 UTC, Steven Schveighoffer 
wrote:
 e.g. std.array.array is going to pre-allocate an array of the 
 correct length and fill it in, vs. appending each element as it 
 gets them from the range.

 Personally, I would store the length because typically a 
 container range is short-lived. It also jives with the 
 container itself which likely has O(1) length.

Thanks, that's what I thought too.

Apr 10 2018

Cym13 <cpicard openmailbox.org> writes:

On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
 On Tuesday, April 10, 2018 19:47:10 Nordlöw via 
 Digitalmars-d-learn wrote:
 On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?

 No.

 If so, in which cases is a length property an advantage or 
 a requirement?

 Just provide it whenever it is cheap to do so. If you need 
 to do complex calculations or especially loop over contents 
 to figure out the length, do NOT provide it.

 But if it is as simple as returning some value, provide it 
 and algorithms can take advantage of it for optimizations 
 etc. as needed.

 I'm thinking of my own container Hashmap having its range 
 ByKeyValue requiring one extra word of memory to store the 
 iteration count which, in turn, can be used to calculate the 
 length of the remaining range. Is this motivated?

 That would depend entirely on what you're trying to do, but in 
 general, if a range has length, then some algorithms will be 
 more efficient, and some algorithms do require length. So, if 
 you can provide length, then the range will be more useful, 
 just like a bidirectional range can be more useful than a 
 forward range or a random-access range can be more useful than 
 either. However, if you're not doing anything that ever 
 benefits from it having length, then it doesn't buy you 
 anything. So, it ultimately depends on what you're doing. In a 
 general purpose library, I'd say that it should have length if 
 it can do so in O(1), but if it's just for you, then it may or 
 may not be worth it.

 The other thing to consider is what happens when the container 
 is mutated. I don't think that ranges necessarily behave all 
 that well when an underlying container is mutated, but it is 
 something that has to be considered when dealing with a range 
 over a container. Even if mutating the underlying container 
 doesn't necessarily invalidate a range, maintaining the length 
 in the manner that you're suggesting probably makes it so that 
 it would be invalidated in more cases, since if any elements 
 are added or removed in the portion that was already popped off 
 the range, then the iteration count couldn't be used to 
 calculate the length in the same way anymore. Now, with a hash 
 map, the range is probably fully invalidated when anything gets 
 added or removed anyway, since that probably screws with the 
 order of the elements in the range, but how the range is going 
 to behave when the underlying container is mutated and how 
 having the length property does or doesn't affect that is 
 something that you'll need to consider.

 - Jonathan M Davis

I find that discussion very interesting as I had never considered 
that because of design by introspection having a costly length 
method would lead to unexpected calls by generic algorithms 
making it a disadventage if present.

On the other hand I don't think the end user should have to 
scratch his head to find the length of a range, especially if 
it's not trivial to get (say, O(log n) kind of case). Therefore 
exposing a method in any case seems the best from an API 
perspective.

But to avoid the performance issues mentionned earlier it means 
it should bear a different name (get/setLength comes to mind). I 
believe this is the same kind of issue that lead to having "in" 
for associative arrays but not regular ones. However this also 
leads to less coherent APIs in contradiction with the principle 
of least surprise.

In retrospect since only "unexpected" calls to such methods cause 
the issue I wonder if it wouldn't be best to have an UDA saying 
"Hey, please, this method is costly, if you're a generic template 
performing introspection you should probably not call me". And 
writing that Andrei's work on complexity annotations comes to 
mind. Anyway, I don't think the user should use different names 
just to alleviate an issue on the library side but the 
alternative would be costly to put in place...

Any thoughts?

Apr 10 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, April 10, 2018 22:07:40 Cym13 via Digitalmars-d-learn wrote:
 On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
 On Tuesday, April 10, 2018 19:47:10 Nordl�w via

 Digitalmars-d-learn wrote:
 On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordl�w wrote:
 Should ranges always provide a length property?

 No.

 If so, in which cases is a length property an advantage or
 a requirement?

 Just provide it whenever it is cheap to do so. If you need
 to do complex calculations or especially loop over contents
 to figure out the length, do NOT provide it.

 But if it is as simple as returning some value, provide it
 and algorithms can take advantage of it for optimizations
 etc. as needed.

 I'm thinking of my own container Hashmap having its range
 ByKeyValue requiring one extra word of memory to store the
 iteration count which, in turn, can be used to calculate the
 length of the remaining range. Is this motivated?

 That would depend entirely on what you're trying to do, but in
 general, if a range has length, then some algorithms will be
 more efficient, and some algorithms do require length. So, if
 you can provide length, then the range will be more useful,
 just like a bidirectional range can be more useful than a
 forward range or a random-access range can be more useful than
 either. However, if you're not doing anything that ever
 benefits from it having length, then it doesn't buy you
 anything. So, it ultimately depends on what you're doing. In a
 general purpose library, I'd say that it should have length if
 it can do so in O(1), but if it's just for you, then it may or
 may not be worth it.

 The other thing to consider is what happens when the container
 is mutated. I don't think that ranges necessarily behave all
 that well when an underlying container is mutated, but it is
 something that has to be considered when dealing with a range
 over a container. Even if mutating the underlying container
 doesn't necessarily invalidate a range, maintaining the length
 in the manner that you're suggesting probably makes it so that
 it would be invalidated in more cases, since if any elements
 are added or removed in the portion that was already popped off
 the range, then the iteration count couldn't be used to
 calculate the length in the same way anymore. Now, with a hash
 map, the range is probably fully invalidated when anything gets
 added or removed anyway, since that probably screws with the
 order of the elements in the range, but how the range is going
 to behave when the underlying container is mutated and how
 having the length property does or doesn't affect that is
 something that you'll need to consider.

 - Jonathan M Davis

 I find that discussion very interesting as I had never considered
 that because of design by introspection having a costly length
 method would lead to unexpected calls by generic algorithms
 making it a disadventage if present.

 On the other hand I don't think the end user should have to
 scratch his head to find the length of a range, especially if
 it's not trivial to get (say, O(log n) kind of case). Therefore
 exposing a method in any case seems the best from an API
 perspective.

 But to avoid the performance issues mentionned earlier it means
 it should bear a different name (get/setLength comes to mind). I
 believe this is the same kind of issue that lead to having "in"
 for associative arrays but not regular ones. However this also
 leads to less coherent APIs in contradiction with the principle
 of least surprise.

 In retrospect since only "unexpected" calls to such methods cause
 the issue I wonder if it wouldn't be best to have an UDA saying
 "Hey, please, this method is costly, if you're a generic template
 performing introspection you should probably not call me". And
 writing that Andrei's work on complexity annotations comes to
 mind. Anyway, I don't think the user should use different names
 just to alleviate an issue on the library side but the
 alternative would be costly to put in place...

 Any thoughts?

In general, if you care about efficient code, it becomes critical that
anything that's going to be used in generic code has well-known Big-O
complexity, which is why C++ cares about that sort of thing with its
containers and why Andrei specifically put the Big-O complexity of all of
the generic container operations in std.container. And that extends to
ranges. If ranges are implemented without any care about the algorithmic
complexity of the operations, then performance is ultimately going to tank.
And in the case of ranges, it's really not all that hard to understand the
algorithmic complexity, because pretty much all of the range API functions
are supposed to be O(1). Certainly, having any of them be O(n) would be a
disaster.

Ranges can expose other functions if they want to, but they're not part of
the range API and thus would not be useful in any code that wasn't written
specifically for that range type. And honestly, at some point, it makes more
sense to just allocate a dynamic array and get the full range API than to
try and find ways to have inefficient versions of any of the functions in
the range API.

In the case of length, we have walkLength. So, anything that can't have an
efficient length can have its length obtained using walkLength, and any code
that doesn't care whether getting the length is efficient can call
walkLength rather than ever dealing with length, since walkLength will use
length if it's present. So, I don't think that it makes any sense to try and
implement alternate function names for length. That problem has already been
solved.

As for in, canFind and find solve the problem.

- Jonathan M Davis

Apr 10 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Apr 10, 2018 at 10:07:40PM +0000, Cym13 via Digitalmars-d-learn wrote:
[...]
 On the other hand I don't think the end user should have to scratch
 his head to find the length of a range, especially if it's not trivial
 to get (say, O(log n) kind of case). Therefore exposing a method in
 any case seems the best from an API perspective.
 
 But to avoid the performance issues mentionned earlier it means it
 should bear a different name (get/setLength comes to mind). I believe
 this is the same kind of issue that lead to having "in" for
 associative arrays but not regular ones. However this also leads to
 less coherent APIs in contradiction with the principle of least
 surprise.

[...]

I've run into this in my own code, and I've been using `walkLength` as
the name of the method, just for consistency with Phobos' walkLength.
I'm not 100% sure this is a good idea, since overloading Phobos names
can sometimes lead to annoying symbol conflict situations.  But the one
good thing is that you won't forget what it's called because it's so
familiar.


T

-- 
An elephant: A mouse built to government specifications. -- Robert Heinlein

Apr 10 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 4/10/18 6:07 PM, Cym13 wrote:
 On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
 On Tuesday, April 10, 2018 19:47:10 Nordlöw via Digitalmars-d-learn 
 wrote:
 On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?

 No.

 If so, in which cases is a length property an advantage or >> a 


 requirement?
 Just provide it whenever it is cheap to do so. If you need > to do 

 complex calculations or especially loop over contents > to figure out 
 the length, do NOT provide it.
 But if it is as simple as returning some value, provide it > and 

 algorithms can take advantage of it for optimizations > etc. as needed.

 I'm thinking of my own container Hashmap having its range ByKeyValue 
 requiring one extra word of memory to store the iteration count 
 which, in turn, can be used to calculate the length of the remaining 
 range. Is this motivated?

 That would depend entirely on what you're trying to do, but in 
 general, if a range has length, then some algorithms will be more 
 efficient, and some algorithms do require length. So, if you can 
 provide length, then the range will be more useful, just like a 
 bidirectional range can be more useful than a forward range or a 
 random-access range can be more useful than either. However, if you're 
 not doing anything that ever benefits from it having length, then it 
 doesn't buy you anything. So, it ultimately depends on what you're 
 doing. In a general purpose library, I'd say that it should have 
 length if it can do so in O(1), but if it's just for you, then it may 
 or may not be worth it.

 The other thing to consider is what happens when the container is 
 mutated. I don't think that ranges necessarily behave all that well 
 when an underlying container is mutated, but it is something that has 
 to be considered when dealing with a range over a container. Even if 
 mutating the underlying container doesn't necessarily invalidate a 
 range, maintaining the length in the manner that you're suggesting 
 probably makes it so that it would be invalidated in more cases, since 
 if any elements are added or removed in the portion that was already 
 popped off the range, then the iteration count couldn't be used to 
 calculate the length in the same way anymore. Now, with a hash map, 
 the range is probably fully invalidated when anything gets added or 
 removed anyway, since that probably screws with the order of the 
 elements in the range, but how the range is going to behave when the 
 underlying container is mutated and how having the length property 
 does or doesn't affect that is something that you'll need to consider.

 - Jonathan M Davis

 
 I find that discussion very interesting as I had never considered that 
 because of design by introspection having a costly length method would 
 lead to unexpected calls by generic algorithms making it a disadventage 
 if present.
 
 On the other hand I don't think the end user should have to scratch his 
 head to find the length of a range, especially if it's not trivial to 
 get (say, O(log n) kind of case). Therefore exposing a method in any 
 case seems the best from an API perspective.

O(lg n) is fine for .length, it doesn't need to be O(1). It just can't 
be O(n). I think we established that "fast" operations are O(lg n) or 
better.

That being said, I don't know of a use case where you can get the length 
in O(lg n). It's usually O(1) or O(n).

 But to avoid the performance issues mentionned earlier it means it 
 should bear a different name (get/setLength comes to mind). I believe 
 this is the same kind of issue that lead to having "in" for associative 
 arrays but not regular ones. However this also leads to less coherent 
 APIs in contradiction with the principle of least surprise.

It's definitely a tradeoff. It pushes some implementation details to the 
user, but it also makes the runtime complexity more predictable.

 In retrospect since only "unexpected" calls to such methods cause the 
 issue I wonder if it wouldn't be best to have an UDA saying "Hey, 
 please, this method is costly, if you're a generic template performing 
 introspection you should probably not call me". And writing that 
 Andrei's work on complexity annotations comes to mind. Anyway, I don't 
 think the user should use different names just to alleviate an issue on 
 the library side but the alternative would be costly to put in place...

Potentially, but remember at the time length and walkLength were 
conceived, UDA's didn't exist!

Using UDAs would also have the unfortunate side effect of eliminating 
self-documentation. When you see walkLength right now, you know it's 
"slow", when you see length, you know "fast". If you have to look at 
UDAs to figure that out, then reading the code is that much harder.

-Steve

Apr 10 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, April 10, 2018 14:25:52 Nordl�w via Digitalmars-d-learn wrote:
 Should ranges always provide a length property?

 If so, in which cases is a length property an advantage or a
 requirement?

Whether a range has a length property or not is primarily dependent on how
efficient it is to implement length.
https://dlang.org/phobos/std_container.html lists the Big-O complexity of
the various functions and properties that are expected for containers, and
in the case of length, that also applies to ranges. It lists O(log n) as the
Big-O complexity of length, so length should only be implemented if its
Big-O complexity is no worse than O(log n). In most cases, that means
implementing it only if it's O(1) (I think that it's O(log n) rather than
O(1) because of binary trees), and length should certainly never be
implemented if it's O(n). Basically, if you can just return the length
without calculating it, then it makes sense to implement it, but if you have
to calculate it, then in most cases, it shouldn't be there. Range-based
functions are going to expect length to be very cheap to call if it is
present, and any function that needs to ascertain the length of a range even
if it doesn't have length will call walkLength (which returns length if
present and iterates the entire range to count it if it's not). As iterating
the entire range to count it, is O(n), most algorithms won't do that are
more likely to require length if they need to know how many elements there
are in the range, but that depends on what they're doing.

A function that requires a length property or which has a path optimized for
ranges that have a length property checks for that with
std.range.primitives.hasLength. Random access ranges are required to either
be infinite or provide length, but for other range types, it's optional and
must be checked for if an algorithm is going to use it.

And actually, a large percentage of ranges are lazy, in which case, it's
pretty rare that they can have a length property, because you usually have
no way of knowing how long they're going to be without actually iterating
through the range. So, while it's not uncommon for a range to define length,
it's very common that they don't.

- Jonathan M Davis

Apr 10 2018

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Range length property