digitalmars.D - DList and SList missing length property?

Paulo Pinto (12/12) May 28 2013 Hi,

Jonathan M Davis (25/39) May 28 2013 It never says that. It lists what complexity each operation must have in...

Paulo Pinto (9/48) May 28 2013 Ok, maybe the documentation should explicitly state to which containers
monarch_dodra (19/34) May 28 2013 SList can be spliced, but the semantics are a bit more

Jonathan M Davis (11/14) May 28 2013 I do remember hearing something about that, though that will actually re...

Steven Schveighoffer (11/30) May 28 2013 I actually had performance bugs the OTHER way -- I needed the length of ...

monarch_dodra (6/42) May 28 2013 A proper implementation should be able to track length anyway:

Steven Schveighoffer (4/8) May 28 2013 I'm not sure how that works, can you explain/have a link?

monarch_dodra (39/48) May 29 2013 Well, the basic idea is to give your list an "m_size" member.

Steven Schveighoffer (8/29) May 29 2013 This is O(n) length, not amortized O(1) length, as it is highly dependen...

monarch_dodra (16/50) May 29 2013 That's why I said "amortized" with quotes. Not the exact term, I

Steven Schveighoffer (4/8) May 30 2013 That is true, and a good point.

monarch_dodra (16/21) May 28 2013 I think having two lists in list would be overkill.

Paulo Pinto <pjmlp progtools.org> writes:

Hi,

maybe this is more indicated for d.learn, but I am not sure if
it is not a bug in the documentation/implementation.

According to the phobos documentation, all containers in std.container
should provide a length property.

However it is only available for arrays.

Sure one could use the std.algorithm.count() or std.range.walkLength(), 
but I don't see a reason why SList and DList aren't aware of their 
current size.

Is this on purpose?

--
Paulo

May 28 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, May 28, 2013 20:03:49 Paulo Pinto wrote:
 Hi,
 
 maybe this is more indicated for d.learn, but I am not sure if
 it is not a bug in the documentation/implementation.
 
 According to the phobos documentation, all containers in std.container
 should provide a length property.

It never says that. It lists what complexity each operation must have in order 
for a container to have that operation. It doesn't guarantee that any 
particular operation will be present on all containers (though any that's O(n) 
or worse is pretty much bound to be on all containers unless in does something 
fundamentally incompatible with a particular container's data structure).

 However it is only available for arrays.
 
 Sure one could use the std.algorithm.count() or std.range.walkLength(),
 but I don't see a reason why SList and DList aren't aware of their
 current size.
 
 Is this on purpose?

Yes. It's on purpose. When dealing with lists, because you can insert at any 
point, you have two choices:

1. Make length efficent.

2. Make splicing efficient.

If you keep track of the length, then you have to count all of the elements 
when you splice two lists, so splicing is O(n) instead of O(1). If you don't 
keep track of the length, then splicing is O(1), but then length is O(n), and 
length can't be O(n) per std.container's complexity requirements.

Now, that being said, I don't see a splice operation on either SList or DList 
(and I don't think that splicing an SList can be done in O(1) anyway), so it 
would appear that the implementation is optimizing for an operation that it 
doesn't currently support.

The same situation exists with C++'s std::list type though - size() is O(n) 
and splicing is O(1). However, we opted for not having containers implement 
inefficient operations like that (which C++ mostly did as well, but not in this 
case).

If you want the length of a container which doesn't have a length property, 
then use walkLength on its range - e.g. walkLength(container[]).

- Jonathan M Davis

May 28 2013

Paulo Pinto <pjmlp progtools.org> writes:

Am 28.05.2013 20:16, schrieb Jonathan M Davis:
 On Tuesday, May 28, 2013 20:03:49 Paulo Pinto wrote:
 Hi,

 maybe this is more indicated for d.learn, but I am not sure if
 it is not a bug in the documentation/implementation.

 According to the phobos documentation, all containers in std.container
 should provide a length property.

 It never says that. It lists what complexity each operation must have in order
 for a container to have that operation. It doesn't guarantee that any
 particular operation will be present on all containers (though any that's O(n)
 or worse is pretty much bound to be on all containers unless in does something
 fundamentally incompatible with a particular container's data structure).

 However it is only available for arrays.

 Sure one could use the std.algorithm.count() or std.range.walkLength(),
 but I don't see a reason why SList and DList aren't aware of their
 current size.

 Is this on purpose?

 Yes. It's on purpose. When dealing with lists, because you can insert at any
 point, you have two choices:

 1. Make length efficent.

 2. Make splicing efficient.

 If you keep track of the length, then you have to count all of the elements
 when you splice two lists, so splicing is O(n) instead of O(1). If you don't
 keep track of the length, then splicing is O(1), but then length is O(n), and
 length can't be O(n) per std.container's complexity requirements.

 Now, that being said, I don't see a splice operation on either SList or DList
 (and I don't think that splicing an SList can be done in O(1) anyway), so it
 would appear that the implementation is optimizing for an operation that it
 doesn't currently support.

 The same situation exists with C++'s std::list type though - size() is O(n)
 and splicing is O(1). However, we opted for not having containers implement
 inefficient operations like that (which C++ mostly did as well, but not in this
 case).

 If you want the length of a container which doesn't have a length property,
 then use walkLength on its range - e.g. walkLength(container[]).

 - Jonathan M Davis

Ok, maybe the documentation should explicitly state to which containers 
each operation applies to.

Currently it is a bit confusing to see the complexity listed, but not 
finding the methods.

Thanks for the lengthy explanation, it is actually in sync with what I 
was thinking might be the reason.

--
Paulo

May 28 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 28 May 2013 at 18:16:48 UTC, Jonathan M Davis wrote:
 Now, that being said, I don't see a splice operation on either 
 SList or DList
 (and I don't think that splicing an SList can be done in O(1) 
 anyway), so it
 would appear that the implementation is optimizing for an 
 operation that it
 doesn't currently support.

SList can be spliced, but the semantics are a bit more 
complicated. In most cases, if you need to splice, you'd use a 
DList anyway: SList is really for particular use cases.

I think that if splice is not in DList, it is only because it is 
not *yet* in DList. I have an implementation ready to submit, 
which adheres to Andrei's proposed container semantics. I have 
not yet submitted it though, because fixing DList is more 
important that inserting more functionality.

It's current semantics are neither value nor ref. It's... 
something else. My fault for inserting it when trying to fix 
previous clusterfuck. I've since submitted fix that gives it 
classic semantics, but it's kind of just sitting there...

We should *really* see to getting it (or something else) pulled 
through.

 The same situation exists with C++'s std::list type though - 
 size() is O(n)
 and splicing is O(1). However, we opted for not having 
 containers implement
 inefficient operations like that (which C++ mostly did as well, 
 but not in this
 case).

 - Jonathan M Davis

That was in C++03. In C++11, length was reduced to 0(1). And 
splice was increased to 0(N) (I think, I don't remember the exact 
details, but it was definitely changed).

I'll look up the exact requirement later tonight.

May 28 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, May 28, 2013 20:51:55 monarch_dodra wrote:
 That was in C++03. In C++11, length was reduced to 0(1). And
 splice was increased to 0(N) (I think, I don't remember the exact
 details, but it was definitely changed).

I do remember hearing something about that, though that will actually result 
in some nasty performance changes to some of the code out there (including 
code that I've written).

Regardless, you have the choice of making length O(1) or splicing O(1) and not 
both, and C++98/03 made splicing O(1), which I'm inclined to believe is the 
better choice, though having size when it's O(n) was a definite mistake on 
C++'s part IMHO. I hate to think of how many times I've seen programmer's 
write container.size() == 0 instead of empty(), particularly when size() was 
O(n)...

- Jonathan M Davis

May 28 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 28 May 2013 15:06:53 -0400, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Tuesday, May 28, 2013 20:51:55 monarch_dodra wrote:
 That was in C++03. In C++11, length was reduced to 0(1). And
 splice was increased to 0(N) (I think, I don't remember the exact
 details, but it was definitely changed).

 I do remember hearing something about that, though that will actually  
 result
 in some nasty performance changes to some of the code out there  
 (including
 code that I've written).

 Regardless, you have the choice of making length O(1) or splicing O(1)  
 and not
 both, and C++98/03 made splicing O(1), which I'm inclined to believe is  
 the
 better choice, though having size when it's O(n) was a definite mistake  
 on
 C++'s part IMHO. I hate to think of how many times I've seen programmer's
 write container.size() == 0 instead of empty(), particularly when size()  
 was
 O(n)...

I actually had performance bugs the OTHER way -- I needed the length of a  
list, and was using .size() without realizing it was O(n).  In profiling  
my code, I found that std::list<T>::iterator::operator++ was the most used  
function, and I had no idea why for a LONG time :)

You know, it is trivial to have an O(1) splice O(n) length list wrapped  
into an o(n) splice O(1) length list.  All you need is to add a tracked  
length.  And this is a REAL pain when you have to do it manually.  There  
should just be two list types in <list>

-Steve

May 28 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 28 May 2013 at 19:24:05 UTC, Steven Schveighoffer 
wrote:
 On Tue, 28 May 2013 15:06:53 -0400, Jonathan M Davis 
 <jmdavisProg gmx.com> wrote:

 On Tuesday, May 28, 2013 20:51:55 monarch_dodra wrote:
 That was in C++03. In C++11, length was reduced to 0(1). And
 splice was increased to 0(N) (I think, I don't remember the 
 exact
 details, but it was definitely changed).

 I do remember hearing something about that, though that will 
 actually result
 in some nasty performance changes to some of the code out 
 there (including
 code that I've written).

 Regardless, you have the choice of making length O(1) or 
 splicing O(1) and not
 both, and C++98/03 made splicing O(1), which I'm inclined to 
 believe is the
 better choice, though having size when it's O(n) was a 
 definite mistake on
 C++'s part IMHO. I hate to think of how many times I've seen 
 programmer's
 write container.size() == 0 instead of empty(), particularly 
 when size() was
 O(n)...

 I actually had performance bugs the OTHER way -- I needed the 
 length of a list, and was using .size() without realizing it 
 was O(n).  In profiling my code, I found that 
 std::list<T>::iterator::operator++ was the most used function, 
 and I had no idea why for a LONG time :)

 You know, it is trivial to have an O(1) splice O(n) length list 
 wrapped into an o(n) splice O(1) length list.  All you need is 
 to add a tracked length.  And this is a REAL pain when you have 
 to do it manually.  There should just be two list types in 
 <list>

 -Steve

A proper implementation should be able to track length anyway: 
provide 0(1) splice, and an "amortized" 0(1) length.

I've always wondered why the standard didn't decide to do *that*? 
I think *we* should provide that...

May 28 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 28 May 2013 17:11:06 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 A proper implementation should be able to track length anyway: provide  
 0(1) splice, and an "amortized" 0(1) length.

 I've always wondered why the standard didn't decide to do *that*? I  
 think *we* should provide that...

I'm not sure how that works, can you explain/have a link?

-Steve

May 28 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 28 May 2013 at 22:29:02 UTC, Steven Schveighoffer 
wrote:
 On Tue, 28 May 2013 17:11:06 -0400, monarch_dodra 
 <monarchdodra gmail.com> wrote:

 A proper implementation should be able to track length anyway: 
 provide 0(1) splice, and an "amortized" 0(1) length.

 I've always wondered why the standard didn't decide to do 
 *that*? I think *we* should provide that...

 I'm not sure how that works, can you explain/have a link?

 -Steve

Well, the basic idea is to give your list an "m_size" member. 
This starts at 0. Whenever the user does a push_back/pop_back or 
whatnot operation, the the "m_size" attribute gets correctly 
upgraded. At that point, calling "size()" simply returns "m_size".

Now, once a splice operation gets called, the m_size gets reset 
to a magic value, to indicate that tracking of the size has been 
lost. Operations will seize upgrading m_size, until a call to 
size is made, at which point it will be re-calculated, once.

This, I think is the best solution, since it keeps people that 
use length in a safe position, while users of splice are also 
satisfied. Also, I *think* people who use splice tend to be more 
aware of the situation, and avoid calling length entirely. 
Implementation wise, it would mostly look like this:

template <typename T>
class list
{
     size_t m_size = 0;

     size_t push_back(T other)
     {
         //Normal Code

         if ( m_size != std::numeric_limits<size_t>::max() )
             ++m_size;
     }
     size_t splice(list::iterator first, list::iterator last)
     {
         //Normal Code

         //Reset length
         m_size == std::numeric_limits<size_t>::max();
     }

     size_t size() const
     {
         if ( m_size == std::numeric_limits<size_t>::max() )
             //Re-evaluate m_size
             m_size = std::distance( cbegin(), cend() );
         return m_size;
     }
};

May 29 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 29 May 2013 05:15:00 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Tuesday, 28 May 2013 at 22:29:02 UTC, Steven Schveighoffer wrote:
 On Tue, 28 May 2013 17:11:06 -0400, monarch_dodra  
 <monarchdodra gmail.com> wrote:

 A proper implementation should be able to track length anyway: provide  
 0(1) splice, and an "amortized" 0(1) length.

 I've always wondered why the standard didn't decide to do *that*? I  
 think *we* should provide that...

 I'm not sure how that works, can you explain/have a link?

 -Steve

 Well, the basic idea is to give your list an "m_size" member. This  
 starts at 0. Whenever the user does a push_back/pop_back or whatnot  
 operation, the the "m_size" attribute gets correctly upgraded. At that  
 point, calling "size()" simply returns "m_size".

 Now, once a splice operation gets called, the m_size gets reset to a  
 magic value, to indicate that tracking of the size has been lost.  
 Operations will seize upgrading m_size, until a call to size is made, at  
 which point it will be re-calculated, once.

This is O(n) length, not amortized O(1) length, as it is highly dependent  
on usage.

For example, in a project I am currently working on, I most frequently am  
using splice to move one list element at a time.  This degrades your  
solution to basically O(n) length for every move, and I move a lot.

-Steve

May 29 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 30 May 2013 at 01:56:03 UTC, Steven Schveighoffer 
wrote:
 On Wed, 29 May 2013 05:15:00 -0400, monarch_dodra 
 <monarchdodra gmail.com> wrote:

 On Tuesday, 28 May 2013 at 22:29:02 UTC, Steven Schveighoffer 
 wrote:
 On Tue, 28 May 2013 17:11:06 -0400, monarch_dodra 
 <monarchdodra gmail.com> wrote:

 A proper implementation should be able to track length 
 anyway: provide 0(1) splice, and an "amortized" 0(1) length.

 I've always wondered why the standard didn't decide to do 
 *that*? I think *we* should provide that...

 I'm not sure how that works, can you explain/have a link?

 -Steve

 Well, the basic idea is to give your list an "m_size" member. 
 This starts at 0. Whenever the user does a push_back/pop_back 
 or whatnot operation, the the "m_size" attribute gets 
 correctly upgraded. At that point, calling "size()" simply 
 returns "m_size".

 Now, once a splice operation gets called, the m_size gets 
 reset to a magic value, to indicate that tracking of the size 
 has been lost. Operations will seize upgrading m_size, until a 
 call to size is made, at which point it will be re-calculated, 
 once.

 This is O(n) length, not amortized O(1) length, as it is highly 
 dependent on usage.

That's why I said "amortized" with quotes. Not the exact term, I 
know. I should have been clearer. "You can get 0(1) length, 
except in certain situations, where you'll have to pay once an 
0(n) to get back to 0(1)".

And yeah, it is dependent on usage. Like vector's push_back. 
(although incidentally *that* is true amortized)

 For example, in a project I am currently working on, I most 
 frequently am using splice to move one list element at a time.  
 This degrades your solution to basically O(n) length for every 
 move, and I move a lot.

http://www.cplusplus.com/reference/list/list/splice/

Note that 1 element splice (2) is not a problem. It has always 
been 0(1), regardless of scheme. In my scheme, it wouldn't 
invalidate legnth.

Only [iterator .. iterator] splice (3) is is problematic.

Full list splice (1) status is more complex. It is actually 0(1) 
in the scheme where length is 0(1). With my scheme, the 
guarantees depends on where you want to place the best compromise.

May 29 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 30 May 2013 02:09:41 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:


 http://www.cplusplus.com/reference/list/list/splice/

 Note that 1 element splice (2) is not a problem. It has always been  
 0(1), regardless of scheme. In my scheme, it wouldn't invalidate legnth.

 Only [iterator .. iterator] splice (3) is is problematic.

That is true, and a good point.

-Steve

May 30 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 28 May 2013 at 19:24:05 UTC, Steven Schveighoffer 
wrote:
 You know, it is trivial to have an O(1) splice O(n) length list 
 wrapped into an o(n) splice O(1) length list.  All you need is 
 to add a tracked length.  And this is a REAL pain when you have 
 to do it manually.  There should just be two list types in 
 <list>

I think having two lists in list would be overkill.

I *do* find the standard's choice odd though: The *primary* 
reason I ever use list, ever, is for its splice ability to move 
elements without copying them, and still having their addresses 
valid. No other container can do that.

I don't get why they borked list's primary strength for a 
function you really shouldn't be using anyway. Worst, as you say, 
it is easy to implement one in terms of the other... but not the 
other way around. By doing what they just did, they effectively 
closed *any* way to splice in 0(1)...

If you don't need splice, then most of the time, deque is a 
better choice of a container, and *that* has 0(1) length (along 
with a few neat properties).

All this really doesn't make sense to me.

May 28 2013

D Programming

C/C++ Programming

Other

digitalmars.D - DList and SList missing length property?