www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Range length property

reply =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
Should ranges always provide a length property?

If so, in which cases is a length property an advantage or a 
requirement?
Apr 10 2018
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?
No.
 If so, in which cases is a length property an advantage or a 
 requirement?
Just provide it whenever it is cheap to do so. If you need to do complex calculations or especially loop over contents to figure out the length, do NOT provide it. But if it is as simple as returning some value, provide it and algorithms can take advantage of it for optimizations etc. as needed.
Apr 10 2018
parent reply =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?
No.
 If so, in which cases is a length property an advantage or a 
 requirement?
Just provide it whenever it is cheap to do so. If you need to do complex calculations or especially loop over contents to figure out the length, do NOT provide it. But if it is as simple as returning some value, provide it and algorithms can take advantage of it for optimizations etc. as needed.
I'm thinking of my own container Hashmap having its range ByKeyValue requiring one extra word of memory to store the iteration count which, in turn, can be used to calculate the length of the remaining range. Is this motivated?
Apr 10 2018
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Tuesday, April 10, 2018 19:47:10 Nordlöw via Digitalmars-d-learn wrote:
 On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?
No.
 If so, in which cases is a length property an advantage or a
 requirement?
Just provide it whenever it is cheap to do so. If you need to do complex calculations or especially loop over contents to figure out the length, do NOT provide it. But if it is as simple as returning some value, provide it and algorithms can take advantage of it for optimizations etc. as needed.
I'm thinking of my own container Hashmap having its range ByKeyValue requiring one extra word of memory to store the iteration count which, in turn, can be used to calculate the length of the remaining range. Is this motivated?
That would depend entirely on what you're trying to do, but in general, if a range has length, then some algorithms will be more efficient, and some algorithms do require length. So, if you can provide length, then the range will be more useful, just like a bidirectional range can be more useful than a forward range or a random-access range can be more useful than either. However, if you're not doing anything that ever benefits from it having length, then it doesn't buy you anything. So, it ultimately depends on what you're doing. In a general purpose library, I'd say that it should have length if it can do so in O(1), but if it's just for you, then it may or may not be worth it. The other thing to consider is what happens when the container is mutated. I don't think that ranges necessarily behave all that well when an underlying container is mutated, but it is something that has to be considered when dealing with a range over a container. Even if mutating the underlying container doesn't necessarily invalidate a range, maintaining the length in the manner that you're suggesting probably makes it so that it would be invalidated in more cases, since if any elements are added or removed in the portion that was already popped off the range, then the iteration count couldn't be used to calculate the length in the same way anymore. Now, with a hash map, the range is probably fully invalidated when anything gets added or removed anyway, since that probably screws with the order of the elements in the range, but how the range is going to behave when the underlying container is mutated and how having the length property does or doesn't affect that is something that you'll need to consider. - Jonathan M Davis
Apr 10 2018
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/10/18 4:08 PM, Jonathan M Davis wrote:
 On Tuesday, April 10, 2018 19:47:10 Nordlöw via Digitalmars-d-learn wrote:
 I'm thinking of my own container Hashmap having its range
 ByKeyValue requiring one extra word of memory to store the
 iteration count which, in turn, can be used to calculate the
 length of the remaining range. Is this motivated?
That would depend entirely on what you're trying to do, but in general, if a range has length, then some algorithms will be more efficient, and some algorithms do require length.
e.g. std.array.array is going to pre-allocate an array of the correct length and fill it in, vs. appending each element as it gets them from the range. Personally, I would store the length because typically a container range is short-lived. It also jives with the container itself which likely has O(1) length. -Steve
Apr 10 2018
parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Tuesday, 10 April 2018 at 20:16:18 UTC, Steven Schveighoffer 
wrote:
 e.g. std.array.array is going to pre-allocate an array of the 
 correct length and fill it in, vs. appending each element as it 
 gets them from the range.

 Personally, I would store the length because typically a 
 container range is short-lived. It also jives with the 
 container itself which likely has O(1) length.
Thanks, that's what I thought too.
Apr 10 2018
prev sibling parent reply Cym13 <cpicard openmailbox.org> writes:
On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
 On Tuesday, April 10, 2018 19:47:10 Nordlöw via 
 Digitalmars-d-learn wrote:
 On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?
No.
 If so, in which cases is a length property an advantage or 
 a requirement?
Just provide it whenever it is cheap to do so. If you need to do complex calculations or especially loop over contents to figure out the length, do NOT provide it. But if it is as simple as returning some value, provide it and algorithms can take advantage of it for optimizations etc. as needed.
I'm thinking of my own container Hashmap having its range ByKeyValue requiring one extra word of memory to store the iteration count which, in turn, can be used to calculate the length of the remaining range. Is this motivated?
That would depend entirely on what you're trying to do, but in general, if a range has length, then some algorithms will be more efficient, and some algorithms do require length. So, if you can provide length, then the range will be more useful, just like a bidirectional range can be more useful than a forward range or a random-access range can be more useful than either. However, if you're not doing anything that ever benefits from it having length, then it doesn't buy you anything. So, it ultimately depends on what you're doing. In a general purpose library, I'd say that it should have length if it can do so in O(1), but if it's just for you, then it may or may not be worth it. The other thing to consider is what happens when the container is mutated. I don't think that ranges necessarily behave all that well when an underlying container is mutated, but it is something that has to be considered when dealing with a range over a container. Even if mutating the underlying container doesn't necessarily invalidate a range, maintaining the length in the manner that you're suggesting probably makes it so that it would be invalidated in more cases, since if any elements are added or removed in the portion that was already popped off the range, then the iteration count couldn't be used to calculate the length in the same way anymore. Now, with a hash map, the range is probably fully invalidated when anything gets added or removed anyway, since that probably screws with the order of the elements in the range, but how the range is going to behave when the underlying container is mutated and how having the length property does or doesn't affect that is something that you'll need to consider. - Jonathan M Davis
I find that discussion very interesting as I had never considered that because of design by introspection having a costly length method would lead to unexpected calls by generic algorithms making it a disadventage if present. On the other hand I don't think the end user should have to scratch his head to find the length of a range, especially if it's not trivial to get (say, O(log n) kind of case). Therefore exposing a method in any case seems the best from an API perspective. But to avoid the performance issues mentionned earlier it means it should bear a different name (get/setLength comes to mind). I believe this is the same kind of issue that lead to having "in" for associative arrays but not regular ones. However this also leads to less coherent APIs in contradiction with the principle of least surprise. In retrospect since only "unexpected" calls to such methods cause the issue I wonder if it wouldn't be best to have an UDA saying "Hey, please, this method is costly, if you're a generic template performing introspection you should probably not call me". And writing that Andrei's work on complexity annotations comes to mind. Anyway, I don't think the user should use different names just to alleviate an issue on the library side but the alternative would be costly to put in place... Any thoughts?
Apr 10 2018
next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Tuesday, April 10, 2018 22:07:40 Cym13 via Digitalmars-d-learn wrote:
 On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
 On Tuesday, April 10, 2018 19:47:10 Nordlöw via

 Digitalmars-d-learn wrote:
 On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?
No.
 If so, in which cases is a length property an advantage or
 a requirement?
Just provide it whenever it is cheap to do so. If you need to do complex calculations or especially loop over contents to figure out the length, do NOT provide it. But if it is as simple as returning some value, provide it and algorithms can take advantage of it for optimizations etc. as needed.
I'm thinking of my own container Hashmap having its range ByKeyValue requiring one extra word of memory to store the iteration count which, in turn, can be used to calculate the length of the remaining range. Is this motivated?
That would depend entirely on what you're trying to do, but in general, if a range has length, then some algorithms will be more efficient, and some algorithms do require length. So, if you can provide length, then the range will be more useful, just like a bidirectional range can be more useful than a forward range or a random-access range can be more useful than either. However, if you're not doing anything that ever benefits from it having length, then it doesn't buy you anything. So, it ultimately depends on what you're doing. In a general purpose library, I'd say that it should have length if it can do so in O(1), but if it's just for you, then it may or may not be worth it. The other thing to consider is what happens when the container is mutated. I don't think that ranges necessarily behave all that well when an underlying container is mutated, but it is something that has to be considered when dealing with a range over a container. Even if mutating the underlying container doesn't necessarily invalidate a range, maintaining the length in the manner that you're suggesting probably makes it so that it would be invalidated in more cases, since if any elements are added or removed in the portion that was already popped off the range, then the iteration count couldn't be used to calculate the length in the same way anymore. Now, with a hash map, the range is probably fully invalidated when anything gets added or removed anyway, since that probably screws with the order of the elements in the range, but how the range is going to behave when the underlying container is mutated and how having the length property does or doesn't affect that is something that you'll need to consider. - Jonathan M Davis
I find that discussion very interesting as I had never considered that because of design by introspection having a costly length method would lead to unexpected calls by generic algorithms making it a disadventage if present. On the other hand I don't think the end user should have to scratch his head to find the length of a range, especially if it's not trivial to get (say, O(log n) kind of case). Therefore exposing a method in any case seems the best from an API perspective. But to avoid the performance issues mentionned earlier it means it should bear a different name (get/setLength comes to mind). I believe this is the same kind of issue that lead to having "in" for associative arrays but not regular ones. However this also leads to less coherent APIs in contradiction with the principle of least surprise. In retrospect since only "unexpected" calls to such methods cause the issue I wonder if it wouldn't be best to have an UDA saying "Hey, please, this method is costly, if you're a generic template performing introspection you should probably not call me". And writing that Andrei's work on complexity annotations comes to mind. Anyway, I don't think the user should use different names just to alleviate an issue on the library side but the alternative would be costly to put in place... Any thoughts?
In general, if you care about efficient code, it becomes critical that anything that's going to be used in generic code has well-known Big-O complexity, which is why C++ cares about that sort of thing with its containers and why Andrei specifically put the Big-O complexity of all of the generic container operations in std.container. And that extends to ranges. If ranges are implemented without any care about the algorithmic complexity of the operations, then performance is ultimately going to tank. And in the case of ranges, it's really not all that hard to understand the algorithmic complexity, because pretty much all of the range API functions are supposed to be O(1). Certainly, having any of them be O(n) would be a disaster. Ranges can expose other functions if they want to, but they're not part of the range API and thus would not be useful in any code that wasn't written specifically for that range type. And honestly, at some point, it makes more sense to just allocate a dynamic array and get the full range API than to try and find ways to have inefficient versions of any of the functions in the range API. In the case of length, we have walkLength. So, anything that can't have an efficient length can have its length obtained using walkLength, and any code that doesn't care whether getting the length is efficient can call walkLength rather than ever dealing with length, since walkLength will use length if it's present. So, I don't think that it makes any sense to try and implement alternate function names for length. That problem has already been solved. As for in, canFind and find solve the problem. - Jonathan M Davis
Apr 10 2018
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 10, 2018 at 10:07:40PM +0000, Cym13 via Digitalmars-d-learn wrote:
[...]
 On the other hand I don't think the end user should have to scratch
 his head to find the length of a range, especially if it's not trivial
 to get (say, O(log n) kind of case). Therefore exposing a method in
 any case seems the best from an API perspective.
 
 But to avoid the performance issues mentionned earlier it means it
 should bear a different name (get/setLength comes to mind). I believe
 this is the same kind of issue that lead to having "in" for
 associative arrays but not regular ones. However this also leads to
 less coherent APIs in contradiction with the principle of least
 surprise.
[...] I've run into this in my own code, and I've been using `walkLength` as the name of the method, just for consistency with Phobos' walkLength. I'm not 100% sure this is a good idea, since overloading Phobos names can sometimes lead to annoying symbol conflict situations. But the one good thing is that you won't forget what it's called because it's so familiar. T -- An elephant: A mouse built to government specifications. -- Robert Heinlein
Apr 10 2018
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/10/18 6:07 PM, Cym13 wrote:
 On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
 On Tuesday, April 10, 2018 19:47:10 Nordlöw via Digitalmars-d-learn 
 wrote:
 On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
 On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
 Should ranges always provide a length property?
No.
 If so, in which cases is a length property an advantage or >> a 
requirement?
 Just provide it whenever it is cheap to do so. If you need > to do 
complex calculations or especially loop over contents > to figure out the length, do NOT provide it.
 But if it is as simple as returning some value, provide it > and 
algorithms can take advantage of it for optimizations > etc. as needed. I'm thinking of my own container Hashmap having its range ByKeyValue requiring one extra word of memory to store the iteration count which, in turn, can be used to calculate the length of the remaining range. Is this motivated?
That would depend entirely on what you're trying to do, but in general, if a range has length, then some algorithms will be more efficient, and some algorithms do require length. So, if you can provide length, then the range will be more useful, just like a bidirectional range can be more useful than a forward range or a random-access range can be more useful than either. However, if you're not doing anything that ever benefits from it having length, then it doesn't buy you anything. So, it ultimately depends on what you're doing. In a general purpose library, I'd say that it should have length if it can do so in O(1), but if it's just for you, then it may or may not be worth it. The other thing to consider is what happens when the container is mutated. I don't think that ranges necessarily behave all that well when an underlying container is mutated, but it is something that has to be considered when dealing with a range over a container. Even if mutating the underlying container doesn't necessarily invalidate a range, maintaining the length in the manner that you're suggesting probably makes it so that it would be invalidated in more cases, since if any elements are added or removed in the portion that was already popped off the range, then the iteration count couldn't be used to calculate the length in the same way anymore. Now, with a hash map, the range is probably fully invalidated when anything gets added or removed anyway, since that probably screws with the order of the elements in the range, but how the range is going to behave when the underlying container is mutated and how having the length property does or doesn't affect that is something that you'll need to consider. - Jonathan M Davis
I find that discussion very interesting as I had never considered that because of design by introspection having a costly length method would lead to unexpected calls by generic algorithms making it a disadventage if present. On the other hand I don't think the end user should have to scratch his head to find the length of a range, especially if it's not trivial to get (say, O(log n) kind of case). Therefore exposing a method in any case seems the best from an API perspective.
O(lg n) is fine for .length, it doesn't need to be O(1). It just can't be O(n). I think we established that "fast" operations are O(lg n) or better. That being said, I don't know of a use case where you can get the length in O(lg n). It's usually O(1) or O(n).
 But to avoid the performance issues mentionned earlier it means it 
 should bear a different name (get/setLength comes to mind). I believe 
 this is the same kind of issue that lead to having "in" for associative 
 arrays but not regular ones. However this also leads to less coherent 
 APIs in contradiction with the principle of least surprise.
It's definitely a tradeoff. It pushes some implementation details to the user, but it also makes the runtime complexity more predictable.
 In retrospect since only "unexpected" calls to such methods cause the 
 issue I wonder if it wouldn't be best to have an UDA saying "Hey, 
 please, this method is costly, if you're a generic template performing 
 introspection you should probably not call me". And writing that 
 Andrei's work on complexity annotations comes to mind. Anyway, I don't 
 think the user should use different names just to alleviate an issue on 
 the library side but the alternative would be costly to put in place...
Potentially, but remember at the time length and walkLength were conceived, UDA's didn't exist! Using UDAs would also have the unfortunate side effect of eliminating self-documentation. When you see walkLength right now, you know it's "slow", when you see length, you know "fast". If you have to look at UDAs to figure that out, then reading the code is that much harder. -Steve
Apr 10 2018
prev sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Tuesday, April 10, 2018 14:25:52 Nordlöw via Digitalmars-d-learn wrote:
 Should ranges always provide a length property?

 If so, in which cases is a length property an advantage or a
 requirement?
Whether a range has a length property or not is primarily dependent on how efficient it is to implement length. https://dlang.org/phobos/std_container.html lists the Big-O complexity of the various functions and properties that are expected for containers, and in the case of length, that also applies to ranges. It lists O(log n) as the Big-O complexity of length, so length should only be implemented if its Big-O complexity is no worse than O(log n). In most cases, that means implementing it only if it's O(1) (I think that it's O(log n) rather than O(1) because of binary trees), and length should certainly never be implemented if it's O(n). Basically, if you can just return the length without calculating it, then it makes sense to implement it, but if you have to calculate it, then in most cases, it shouldn't be there. Range-based functions are going to expect length to be very cheap to call if it is present, and any function that needs to ascertain the length of a range even if it doesn't have length will call walkLength (which returns length if present and iterates the entire range to count it if it's not). As iterating the entire range to count it, is O(n), most algorithms won't do that are more likely to require length if they need to know how many elements there are in the range, but that depends on what they're doing. A function that requires a length property or which has a path optimized for ranges that have a length property checks for that with std.range.primitives.hasLength. Random access ranges are required to either be infinite or provide length, but for other range types, it's optional and must be checked for if an algorithm is going to use it. And actually, a large percentage of ranges are lazy, in which case, it's pretty rare that they can have a length property, because you usually have no way of knowing how long they're going to be without actually iterating through the range. So, while it's not uncommon for a range to define length, it's very common that they don't. - Jonathan M Davis
Apr 10 2018