digitalmars.D - Complexity nomenclature
- Andrei Alexandrescu (14/14) Dec 03 2015 Consider the collections universe. So we have an imperative primitive li...
- Jack Stouffer (6/8) Dec 03 2015 I don't see why. IMO, names should convey what the function does,
- Andrei Alexandrescu (3/9) Dec 03 2015 Complexity is part of the semantics, and in my design names and their
- Timon Gehr (5/15) Dec 03 2015 Which is sensible. But in any given context, some parts of the semantics...
- Andrei Alexandrescu (6/23) Dec 03 2015 Sure, and there will be provision for that. The collections framework
- Timon Gehr (6/30) Dec 03 2015 This only works if the module imports std.container. Also, I might be
- Andrei Alexandrescu (7/14) Dec 03 2015 Hmmmm... right, no provision for that right now. Need to think about it....
- Andrea Fontana (5/11) Dec 04 2015 I agree. It sounds like a bad idea.
- Andrei Alexandrescu (4/7) Dec 04 2015 Of course. The old name will still work because, remember, there's a
- Timon Gehr (7/21) Dec 03 2015 Regarding nomenclature, it would be awesome if we could call this
- Andrei Alexandrescu (2/4) Dec 03 2015 Instant understanding by everyone. -- Andrei
- Timon Gehr (2/6) Dec 03 2015 Well, no. "running time" is actually more descriptive.
- Andrei Alexandrescu (3/10) Dec 03 2015 http://en.cppreference.com/w/cpp/container/forward_list/insert_after
- Timon Gehr (2/13) Dec 03 2015 Yes, I know it is popular.
- Timon Gehr (2/13) Dec 03 2015 http://www.merriam-webster.com/dictionary/literally
- Xinok (10/17) Dec 03 2015 I've seen you mention this before, and while you may be correct,
- Timon Gehr (8/25) Dec 03 2015 I understand what is meant, but it is painful, much like when somebody
- Ola Fosheim Grøstad (3/6) Dec 04 2015 It will make static arrays look very good. All operations are
- Idan Arye (8/25) Dec 03 2015 The complexities of the operations is a property of the data
- Andrei Alexandrescu (3/8) Dec 03 2015 Your premise is right but you reach the negation of the correct
- Jack Stouffer (5/18) Dec 03 2015 How so? If a singly linked list and a doubly linked list have two
- Andrei Alexandrescu (12/25) Dec 03 2015 Took me a while to figure. There's a hierarchy of operations, e.g. if a
- Tofu Ninja (10/46) Dec 03 2015 This sounds really complicated to use? What are the benefits?
- Ola Fosheim Grøstad (8/13) Dec 04 2015 Yes, but can you export it reliably in generically composed
- Jakob Ovrum (9/16) Dec 04 2015 A generic algorithm that uses exponential time when given the
- Andrei Alexandrescu (2/17) Dec 04 2015 WORD! -- Andrei
- ZombineDev (18/68) Dec 04 2015 Ranges are a good example - they provide only the operations thay
- Andrei Alexandrescu (2/4) Dec 04 2015 Great idea, will keep it in mind. Thanks! -- Andrei
- Jonathan M Davis (16/21) Dec 04 2015 Yeah. If the primary reason to have stable and linear and whatnot
- Andrei Alexandrescu (2/4) Dec 04 2015 Only always :o). -- Andrei
- Andrei Alexandrescu (10/14) Dec 04 2015 Well look at what the cat dragged in:
- ref2401 (3/13) Dec 04 2015 Wow! It looks really great.
- Jack Stouffer (4/6) Dec 04 2015 This is a great compromise; I (and everyone else probably) would
- Andrei Alexandrescu (34/41) Dec 04 2015 Following up on this, I thought I'd define a little algebra for
- Andrei Alexandrescu (5/16) Dec 04 2015 The question mark at log * linlog is spurious; should be replaced with
- Timon Gehr (30/46) Dec 04 2015 I think that the exponents matter. E.g. n^1.5 vs n^2.
- Andrei Alexandrescu (4/65) Dec 04 2015 Nice, I'll continue with this. Thanks! I won't use expBase because no
- Timon Gehr (7/27) Dec 04 2015 This is probably sensible in the context of std.container.
- Andrei Alexandrescu (9/12) Dec 04 2015 I'm thinking we could add complexity annotations to range functions. For...
- Timon Gehr (6/19) Dec 04 2015 If only products should be expressible, this would maybe be reasonable
- Timon Gehr (2/4) Dec 04 2015 But I think not. O(n+m) is common.
- Andrei Alexandrescu (12/14) Dec 05 2015 That's actually pretty neat and easy to work around like this:
- BLM768 (17/28) Dec 05 2015 Well, we could see how CAS libraries handle this kind of stuff
- H. S. Teoh via Digitalmars-d (15/48) Dec 05 2015 Multi-term complexities arise from trivial graph algorithms. They are
- BLM768 (19/31) Dec 05 2015 True. I don't expect that very many people would want to specify
- Timon Gehr (2/4) Dec 05 2015 log(x^2) = 2 log x.
- BLM768 (2/3) Dec 05 2015 Why do log rules have to make everything so simple? ;)
- Andrei Alexandrescu (11/25) Dec 05 2015 Well you'd need multiple terms if you want to say things like,
- Timon Gehr (2/8) Dec 05 2015 Computer Algebra System, I assume.
- tsbockman (3/9) Dec 05 2015 https://en.wikipedia.org/wiki/Computer_algebra_system
- Timon Gehr (12/28) Dec 05 2015 Some upper bounds are incomparable, so there would not be a total order....
- Ola Fosheim Grøstad (7/9) Dec 05 2015 It is a problem in all cases as you usually dont have an optimal
- Timon Gehr (13/22) Dec 06 2015 Are you complaining that the given implementation does not support
- Timon Gehr (5/6) Dec 06 2015 Hm, this does not actually work too well. E.g., we want
- Ola Fosheim Grøstad (14/21) Dec 06 2015 I am saying that comparing bounds is not the same as comparing
- Timon Gehr (3/8) Dec 06 2015 Yes, that's what you said later down the post. It's completely unrelated...
- Ola Fosheim Grøstad (8/10) Dec 06 2015 i would assume that there would have to be a usecase for
- Andrei Alexandrescu (9/49) Dec 06 2015 Brilliant! My wife has a work emergency so I've been with the kids all d...
- Timon Gehr (22/42) Dec 06 2015 The implementation of power on BigO given does not actually work in
- Andrei Alexandrescu (27/55) Dec 06 2015 No matter, you may always use runtime assertions - after all it's all
- Timon Gehr (14/46) Dec 07 2015 Even better: I was wrong when I claimed it does not work.
- Andrei Alexandrescu (14/20) Dec 07 2015 I fail to see how no parens after log or "^" in lieu "^^" would make a
- Timon Gehr (13/29) Dec 07 2015 Neither have I attempted to show anything like that. All arguments in
- Andrei Alexandrescu (9/21) Dec 07 2015 These are still expressible without a DSL:
- Timon Gehr (21/48) Dec 07 2015 This just goes from one string to two strings and adds some noise on top...
- ZombineDev (9/11) Dec 08 2015 We can remove the use of strings if we tag walkLength with BigO:
- Jonny (116/141) Dec 10 2015 There are many "operations" that can be carried out on methods.
- Timon Gehr (7/18) Dec 06 2015 I have noticed another thing. The comparison operator is an
- Andrei Alexandrescu (2/20) Dec 06 2015 I think it's okay to leave N^^2 + N1 and N + N1^^2 unordered. -- Andrei
- Timon Gehr (3/26) Dec 07 2015 Yes, certainly. By complete, I mean it orders everything that can be
- Nordlöw (6/8) Mar 17 2016 What about a compile-time expression tree similar to how, for
- H. S. Teoh via Digitalmars-d (68/107) Dec 04 2015 [...]
- Andrei Alexandrescu (6/6) Dec 04 2015 I'll get back to this (on the phone) but this is incorrect:
- H. S. Teoh via Digitalmars-d (15/21) Dec 04 2015 OK, I guess by poly you mean n^k for k>1. I was lumping everything
- Dmitry Olshansky (5/19) Dec 04 2015 Was vaguely terrified reading this whole thread until hitting this gem.
- Walter Bright (3/5) Dec 04 2015 Yeah, I think it puts UDA on the map!
- Andrei Alexandrescu (8/14) Dec 04 2015 "Even better".
- Minas Mina (3/20) Dec 04 2015 Looks good :)
- Steven Schveighoffer (3/18) Dec 04 2015 This looks very excellent, nice work!
- H. S. Teoh via Digitalmars-d (8/26) Dec 04 2015 [...]
- Walter Bright (5/18) Dec 04 2015 This is very similar to the si units program Oskar Linde wrote back in 2...
- Dmitry Olshansky (5/21) Dec 05 2015 Better still - no more strings:
- Walter Bright (2/6) Dec 04 2015 Dang, you beat me to it!
- tn (7/37) Dec 04 2015 "I just want to insert an element. I don't care how long it
- Jakob Ovrum (4/9) Dec 04 2015 In the current std.container, linearInsert aliases to the
- tn (10/20) Dec 04 2015 I understand this.
- Andrei Alexandrescu (3/8) Dec 04 2015 When complexity information is not present, the name of the function may...
- Andrei Alexandrescu (4/9) Dec 04 2015 What would be the complexity assumption of "insert"?
- tn (11/22) Dec 04 2015 I don't, if I have a small container of constant size (or bounded
- Idan Arye (4/7) Dec 03 2015 I find it ironic that this thread has moved to discuss how to
- Walter Bright (4/11) Dec 03 2015 Are you sure there are only 3 complexities? What if it's a self-balancin...
- Minas Mina (10/30) Dec 03 2015 I agree -- putting the complexity (eh, running time) to the
- ZombineDev (10/43) Dec 04 2015 That's wrong. Imagine a sort method that calls to an API server
- Ola Fosheim Grøstad (8/11) Dec 04 2015 I am sorry to say this, but hard performance requirements require
- Ola Fosheim Grøstad (8/14) Dec 04 2015 And even then you cannot assume that it is real time as the
- ZombineDev (6/17) Dec 04 2015 Well, I agree, but I didn't say hard real-time, only performance
- Ola Fosheim Grøstad (17/21) Dec 04 2015 Yes, if you are writing for large scalable systems then it is
- ZombineDev (16/38) Dec 04 2015 I strongly agree with what you said earlier that typical
- Ola Fosheim Grøstad (12/19) Dec 04 2015 Well, if the algorithm can choose between containers it might be
- Jonathan M Davis (29/40) Dec 04 2015 Only when dealing with an arbitrary number of elements. O(n^2)
- Ola Fosheim Grøstad (49/64) Dec 04 2015 I think there was a misunderstanding about notation here. If we
- Andrei Alexandrescu (5/13) Dec 04 2015 Sort is almost always assumed to take n log n time. The problem is when
- Andrei Alexandrescu (21/40) Dec 04 2015 Looks exaggerated, innit? The fact of the matter is people choose
- CraigDillabaugh (11/21) Dec 04 2015 My personal preference would be not to have the complexity in the
- Jonathan M Davis (25/32) Dec 04 2015 std.container puts linear and/or stable in some of its names and
- Walter Bright (14/27) Dec 04 2015 Excessive use of aliases is a problem in and of itself - for example, tr...
- BLM768 (13/19) Dec 04 2015 The nice thing about this is that it can be easy to specify which
- Jonathan M Davis (11/31) Dec 04 2015 Oh, I agree. But if we end up with stuff like stableLinearRemove
- Jonny (10/27) Dec 08 2015 Why not create the capability to get the complexity by the user:
- Nordlöw (21/26) Mar 17 2016 I'm currently sitting in a software sector with one of the most
Consider the collections universe. So we have an imperative primitive like:

    c.insertAfter(r, x)

where c is a collection, r is a range previously extracted from c, and x is a value convertible to collection's element type. The expression imperatively inserts x in the collection right after r.

Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)
* linear in the number of elements after r in the collection (e.g. c is an array)
* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o).

Destroy!

Andrei
Dec 03 2015
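For illustration only, a minimal sketch of how complexity-bearing names could look on two hypothetical containers (the SList/DList names and signatures below are assumptions made for this example, not the actual design under discussion):

    struct DList(T)
    {
        static struct Range { /* elided */ }

        // Constant time: a doubly-linked list splices the new node in place.
        void insertAfter(V)(Range r, V x) { /* O(1) */ }
    }

    struct SList(T)
    {
        static struct Range { /* elided */ }

        // Linear in the length of r: a singly-linked list must walk r to its
        // last node before linking, so only the weaker name is offered.
        void linearInsertAfter(V)(Range r, V x) { /* O(length of r) */ }
    }

Generic code that is content with a linear upper bound calls linearInsertAfter and works with both; code that spells out insertAfter compiles only against containers that can honor the constant-time bound (given a forwarding scheme like the one discussed later in the thread).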
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:These complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/03/2015 08:37 PM, Jack Stouffer wrote:On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Complexity is part of the semantics, and in my design names and their semantics go hand in hand. -- AndreiThese complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/04/2015 03:18 AM, Andrei Alexandrescu wrote:On 12/03/2015 08:37 PM, Jack Stouffer wrote:Which is sensible. But in any given context, some parts of the semantics are usually abstracted away. Sometimes one wants to abstract over running times of called methods. ("This function calls this other function at most O(n) times.")On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Complexity is part of the semantics, and in my design names and their semantics go hand in hand. -- AndreiThese complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/03/2015 09:24 PM, Timon Gehr wrote:On 12/04/2015 03:18 AM, Andrei Alexandrescu wrote:Sure, and there will be provision for that. The collections framework will contain logic such as: "If collection type C implements insertAfter, then implement UFCS function linearInsertAfter that forwards to it". So if as user you're fine with linearInsertAfter, it'll work as an upper bound. -- AndreiOn 12/03/2015 08:37 PM, Jack Stouffer wrote:Which is sensible. But in any given context, some parts of the semantics are usually abstracted away. Sometimes one wants to abstract over running times of called methods. ("This function calls this other function at most O(n) times.")On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Complexity is part of the semantics, and in my design names and their semantics go hand in hand. -- AndreiThese complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/04/2015 03:35 AM, Andrei Alexandrescu wrote:On 12/03/2015 09:24 PM, Timon Gehr wrote:This only works if the module imports std.container. Also, I might be okay with arbitrary running times, even super-linear. There's also the case where I want to implement a wrapper around a generic container. I will then need to provide static introspection in order to expose the methods with the correct names.On 12/04/2015 03:18 AM, Andrei Alexandrescu wrote:Sure, and there will be provision for that. The collections framework will contain logic such as: "If collection type C implements insertAfter, then implement UFCS function linearInsertAfter that forwards to it". So if as user you're fine with linearInsertAfter, it'll work as an upper bound. -- AndreiOn 12/03/2015 08:37 PM, Jack Stouffer wrote:Which is sensible. But in any given context, some parts of the semantics are usually abstracted away. Sometimes one wants to abstract over running times of called methods. ("This function calls this other function at most O(n) times.")On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Complexity is part of the semantics, and in my design names and their semantics go hand in hand. -- AndreiThese complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/03/2015 09:52 PM, Timon Gehr wrote:This only works if the module imports std.container.Yeah, that's the way it works.Also, I might be okay with arbitrary running times, even super-linear.Hmmmm... right, no provision for that right now. Need to think about it. Maybe put them all in a "superlinear" bin. Names are hard.There's also the case where I want to implement a wrapper around a generic container. I will then need to provide static introspection in order to expose the methods with the correct names.Yep, again that's the way it's done. Andrei
Dec 03 2015
On Friday, 4 December 2015 at 01:37:33 UTC, Jack Stouffer wrote:
> On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:
>> These complexities must be reflected in the name of the primitives.
> I don't see why. IMO, names should convey what the function does, not how it does it.

I agree. It sounds like a bad idea.

And who knows, maybe someone will discover a more efficient way to implement "insertAfter" or the collection itself... We should change its name?
Dec 04 2015
On 12/04/2015 04:27 AM, Andrea Fontana wrote:And who knows, maybe someone will discover a more efficient way to implement "insertAfter" or the collection itself... We should change its name?Of course. The old name will still work because, remember, there's a hierarchy of operations; those that promise less automatically forward to those that promise more. -- Andrei
Dec 04 2015
On 12/04/2015 02:27 AM, Andrei Alexandrescu wrote:Consider the collections universe. So we have an imperative primitive like: c.insertAfter(r, x) where c is a collection, r is a range previously extracted from c, and x is a value convertible to collection's element type. The expression imperatively inserts x in the collection right after r. Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy! AndreiRegarding nomenclature, it would be awesome if we could call this "running time" instead of "complexity". Algorithms have running times and memory usages. Problems have (time and space) complexities. I know that calling running time 'complexity' is awfully popular outside of actual complexity theory papers. I don't really understand what calling it 'complexity' actually buys though.
Dec 03 2015
On 12/03/2015 08:55 PM, Timon Gehr wrote:I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On 12/04/2015 03:19 AM, Andrei Alexandrescu wrote:On 12/03/2015 08:55 PM, Timon Gehr wrote:Well, no. "running time" is actually more descriptive.I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On 12/03/2015 09:27 PM, Timon Gehr wrote:On 12/04/2015 03:19 AM, Andrei Alexandrescu wrote:http://en.cppreference.com/w/cpp/container/forward_list/insert_after AndreiOn 12/03/2015 08:55 PM, Timon Gehr wrote:Well, no. "running time" is actually more descriptive.I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On 12/04/2015 03:36 AM, Andrei Alexandrescu wrote:On 12/03/2015 09:27 PM, Timon Gehr wrote:Yes, I know it is popular.On 12/04/2015 03:19 AM, Andrei Alexandrescu wrote:http://en.cppreference.com/w/cpp/container/forward_list/insert_after AndreiOn 12/03/2015 08:55 PM, Timon Gehr wrote:Well, no. "running time" is actually more descriptive.I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On 12/04/2015 03:36 AM, Andrei Alexandrescu wrote:On 12/03/2015 09:27 PM, Timon Gehr wrote:http://www.merriam-webster.com/dictionary/literallyOn 12/04/2015 03:19 AM, Andrei Alexandrescu wrote:http://en.cppreference.com/w/cpp/container/forward_list/insert_after AndreiOn 12/03/2015 08:55 PM, Timon Gehr wrote:Well, no. "running time" is actually more descriptive.I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On Friday, 4 December 2015 at 01:55:15 UTC, Timon Gehr wrote:Regarding nomenclature, it would be awesome if we could call this "running time" instead of "complexity". Algorithms have running times and memory usages. Problems have (time and space) complexities. I know that calling running time 'complexity' is awfully popular outside of actual complexity theory papers. I don't really understand what calling it 'complexity' actually buys though.I've seen you mention this before, and while you may be correct, the term "complexity" is so widely applied to algorithms that it's not worth going against the norm. For the vast majority of computer scientists, when they hear the term "time complexity", they immediately understand it as running time. In fact, what I've seen most often is that algorithms have complexities and problems fall into a *complexity class*. For example, just take a look at the Wikipedia page on time complexity: https://en.wikipedia.org/wiki/Time_complexity
Dec 03 2015
On 12/04/2015 03:34 AM, Xinok wrote:On Friday, 4 December 2015 at 01:55:15 UTC, Timon Gehr wrote:I understand what is meant, but it is painful, much like when somebody uses "literally" but actually means "figuratively". :-)Regarding nomenclature, it would be awesome if we could call this "running time" instead of "complexity". Algorithms have running times and memory usages. Problems have (time and space) complexities. I know that calling running time 'complexity' is awfully popular outside of actual complexity theory papers. I don't really understand what calling it 'complexity' actually buys though.I've seen you mention this before, and while you may be correct, the term "complexity" is so widely applied to algorithms that it's not worth going against the norm. For the vast majority of computer scientists, when they hear the term "time complexity", they immediately understand it as running time.In fact, what I've seen most often is that algorithms have complexities and problems fall into a *complexity class*. For example, just take a look at the Wikipedia page on time complexity: https://en.wikipedia.org/wiki/Time_complexityIt's a schizophrenic article. It mostly uses the standard terminology starting from this section: https://en.wikipedia.org/wiki/Time_complexity#Constant_time I guess the article has multiple authors, only some of which are complexity theorists.
Dec 03 2015
On Friday, 4 December 2015 at 01:55:15 UTC, Timon Gehr wrote:awfully popular outside of actual complexity theory papers. I don't really understand what calling it 'complexity' actually buys though.It will make static arrays look very good. All operations are O(1).
Dec 04 2015
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Consider the collections universe. So we have an imperative primitive like: c.insertAfter(r, x) where c is a collection, r is a range previously extracted from c, and x is a value convertible to collection's element type. The expression imperatively inserts x in the collection right after r. Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy! AndreiThe complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).
Dec 03 2015
On 12/03/2015 09:10 PM, Idan Arye wrote:The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
Dec 03 2015
On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:On 12/03/2015 09:10 PM, Idan Arye wrote:How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
Dec 03 2015
On 12/3/15 10:29 PM, Jack Stouffer wrote:
> On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:
>> On 12/03/2015 09:10 PM, Idan Arye wrote:
>
> How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.

Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both.

Another way to look at it: in STL container-independent code is near impossible because different containers use the same signature for operations that are different (either wrt iterator invalidation or complexity). My design avoids that by giving distinct operations distinct names.

Andrei
Dec 03 2015
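A rough sketch of the forwarding idea described above (the hasMember test and the exact names are assumptions, not the framework's actual code):

    import std.traits : hasMember;

    // Any collection that provides the stronger, constant-time 'insert'
    // automatically satisfies callers that only ask for the linear bound.
    auto linearInsert(C, V)(ref C c, V x)
        if (hasMember!(C, "insert"))
    {
        return c.insert(x);   // forward to the cheaper primitive
    }

    // Containers that can only insert in linear time define linearInsert as
    // a member; member functions take precedence over this UFCS fallback.

Client code written against linearInsert then accepts both kinds of containers, while code written against insert statically rejects containers that cannot meet the constant-time promise.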
On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu wrote:On 12/3/15 10:29 PM, Jack Stouffer wrote:This sounds really complicated to use? What are the benefits? When would a generic algorithm even need to know the complexity of the container? Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both. Another way to look at it: in STL container-independent code is near impossible because different containers use the same signature for operations that are different (either wrt iterator invalidation or complexity). My design avoids that by giving distinct operations distinct names. AndreiOn 12/03/2015 09:10 PM, Idan Arye wrote:How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
Dec 03 2015
On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.Yes, but can you export it reliably in generically composed functions? Say you have a loop of N iterations calling a function that involves M operations, you want to export N*M, but then the caller needs to know what N and M are referring to, so you need some standard way of getting there. (e.g. N could be number of edges and M could be number of nodes or something).
Dec 04 2015
On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:When would a generic algorithm even need to know the complexity of the container?A generic algorithm that uses exponential time when given the "wrong" inputs is a hidden danger. This can easily happen when composing with linear algorithms. This naming scheme, also employed by std.container, prevents this.Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.If you don't care about the complexity, why are you using collections? Just use T[] and T[U]. Surely choosing a data structure is all about complexity (barring some details like optimizing for cache locality or small data structures).
Dec 04 2015
On 12/04/2015 03:41 AM, Jakob Ovrum wrote:On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:WORD! -- AndreiWhen would a generic algorithm even need to know the complexity of the container?A generic algorithm that uses exponential time when given the "wrong" inputs is a hidden danger. This can easily happen when composing with linear algorithms. This naming scheme, also employed by std.container, prevents this.Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.If you don't care about the complexity, why are you using collections? Just use T[] and T[U]. Surely choosing a data structure is all about complexity (barring some details like optimizing for cache locality or small data structures).
Dec 04 2015
On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu wrote:Ranges are a good example - they provide only the operations thay can efficiently implement. For example forward ranges could provide random access but at the cost of linear running time. std.containers provides the `elem in container` operation only if they can implement it in logarithmic time or faster. The fact that you can't use opIn for DList or Array is very good, because it prevents you from writing inefficient genric code. You're algorithm will manifest that it can only work with containers provide efficient operations and because of that you can be sure what its time complexity would be. You should choose a specific data structure only because you can efficiently implement the algorithm you have in mind with it. One of worst examples of the opposite is .Count extention method in .NET (which happens to have the same name as .Count member function of ICollection - one of the most used (often implicitly) interfaces), which has linear running time. The most horrible thing I have seen!On 12/3/15 10:29 PM, Jack Stouffer wrote:This sounds really complicated to use? What are the benefits? When would a generic algorithm even need to know the complexity of the container? Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both. Another way to look at it: in STL container-independent code is near impossible because different containers use the same signature for operations that are different (either wrt iterator invalidation or complexity). My design avoids that by giving distinct operations distinct names. AndreiOn 12/03/2015 09:10 PM, Idan Arye wrote:How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
Dec 04 2015
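A tiny sketch of that convention, with made-up container names (not std.container types):

    struct TreeSet(T)
    {
        // 'x in set' is offered because the lookup is O(log n).
        bool opBinaryRight(string op : "in")(T value) const
        {
            return lookup(value);                            // assumed O(log n) helper
        }
        private bool lookup(T value) const { return false; } // stub
    }

    struct ArrayBag(T)
    {
        T[] data;
        // No 'in' operator here: a linear scan has to be requested explicitly
        // (e.g. with std.algorithm.searching.canFind), keeping the O(n) cost
        // visible at the call site.
    }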
On 12/04/2015 01:05 AM, Tofu Ninja wrote:Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs.Great idea, will keep it in mind. Thanks! -- Andrei
Dec 04 2015
On Friday, 4 December 2015 at 13:59:41 UTC, Andrei Alexandrescu wrote:On 12/04/2015 01:05 AM, Tofu Ninja wrote:Yeah. If the primary reason to have stable and linear and whatnot in the names is so that introspection can be used on them, then UDAs will work fine for that. Where it's stickier is if you're just calling them in generic code, since then you have no protection against the complexity changing when the container being used changes. But pretty much no one is going to be wanting to use stuff like stableLinearInsert or stable.linear.insert in their code instead of insert. So, while having the complexity be part of the API has some serious advantages, it's not user-friendly for normal use, whereas the UDAs put you somewhere in between not having the complexity in the API at all and forcing everyone to type out the complexity every time they use a function. - Jonathan M DavisAlso maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs.Great idea, will keep it in mind. Thanks! -- Andrei
Dec 04 2015
On 12/04/2015 01:05 AM, Tofu Ninja wrote:When would a generic algorithm even need to know the complexity of the container?Only always :o). -- Andrei
Dec 04 2015
On 12/04/2015 01:05 AM, Tofu Ninja wrote:
> Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.

Well look at what the cat dragged in:

http://dpaste.dzfl.pl/711aecacc450

That's quite promising. The code is very preliminary and uses strings for typical complexity values (e.g. "constant", "linear", and later "loglinear" etc). I need to see how to integrate this whole idea. Also an unpleasant problem is overloading - when present, user code needs to specify which overload they're referring to.

Anyhow, this is really interesting. Thanks Tofu.

Andrei
Dec 04 2015
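The paste is the authoritative version; purely as a hedged approximation of the idea, a UDA-with-strings scheme might look roughly like this (the complexity type and the container names here are invented for illustration):

    import std.traits : getUDAs;

    struct complexity { string value; }   // "constant", "linear", "loglinear", ...

    struct MyList(T)
    {
        @complexity("constant") void insertFront(T x) { /* ... */ }
        @complexity("linear")   void insertBack(T x)  { /* ... */ }
    }

    // Client code can introspect the promised bound at compile time:
    static assert(getUDAs!(MyList!int.insertFront, complexity)[0].value == "constant");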
On Friday, 4 December 2015 at 16:06:25 UTC, Andrei Alexandrescu wrote:Well look at what the cat dragged in: http://dpaste.dzfl.pl/711aecacc450 That's quite promising. The code is very preliminary and uses strings for typical complexity values (e.g. "constant", "linear", and later "loglinear" etc). I need to see how to integrate this whole idea. Also an unpleasant problem is overloading - when present, user code needs to specify which overload they're referring to. Anyhow, this is really interesting. Thanks Tofu. AndreiWow! It looks really great.
Dec 04 2015
On Friday, 4 December 2015 at 16:06:25 UTC, Andrei Alexandrescu wrote:Well look at what the cat dragged in: http://dpaste.dzfl.pl/711aecacc450This is a great compromise; I (and everyone else probably) would love to see a prototype of this when it's ready.
Dec 04 2015
On 12/04/2015 11:06 AM, Andrei Alexandrescu wrote:
> On 12/04/2015 01:05 AM, Tofu Ninja wrote:

Following up on this, I thought I'd define a little algebra for complexities. It will be closed (no arbitrary expressions) and will support only the usually encountered complexities.

The idea is to allow functions to compute their own complexity in terms of complexities of their respective primitives.

Here's the algebra.

Terms:

1       = O(1)
log     = O(log n)
plog    = O((log n)^k)
sqrt    = O(sqrt n)
lin     = O(n)
linlog  = O(n * log n)
linplog = O(n * (log n)^k)
poly    = O(n^k)
exp     = O(2^n)

Ordering: Terms above are in increasing order.

Summation: x + y = max(x, y)

Product:

       | 1       log      plog     sqrt  lin      linlog   poly  exp
-------+------------------------------------------------------------
1      | 1       log      plog     sqrt  lin      linlog   poly  exp
log    | log     plog     plog     ?     linlog   ?        poly  exp
plog   | plog    plog     plog     ?     linplog  linplog  poly  exp
sqrt   | sqrt    ?        ?        lin   poly     poly     poly  exp
lin    | lin     linlog   linplog  poly  poly     poly     poly  exp
linlog | linlog  linplog  linplog  poly  poly     poly     poly  exp
poly   | poly    poly     poly     poly  poly     poly     poly  exp
exp    | exp     exp      exp      exp   exp      exp      exp   exp

I've left a few question marks for the tricky cases. Ideas?

Andrei
Dec 04 2015
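For concreteness, one hedged way such a closed algebra could be encoded (the names are illustrative, and the question-marked cells would need their own entries or a conservative widening):

    enum Cx { one, log, plog, sqrt, lin, linlog, linplog, poly, exp }

    // Summation: the faster-growing term absorbs the slower one.
    Cx add(Cx a, Cx b) { return a > b ? a : b; }

    // Product: a few representative rows of the table; cells not handled
    // here fall back conservatively to poly (exp is caught first).
    Cx mul(Cx a, Cx b)
    {
        with (Cx)
        {
            if (a == one) return b;
            if (b == one) return a;
            if (a == exp || b == exp) return exp;
            if (a == poly || b == poly) return poly;
            if ((a == lin && b == log)  || (a == log  && b == lin)) return linlog;
            if ((a == lin && b == plog) || (a == plog && b == lin)) return linplog;
            // ... remaining cells per the table above ...
            return poly;
        }
    }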
On 12/04/2015 01:37 PM, Andrei Alexandrescu wrote:
>        | 1       log      plog     sqrt  lin      linlog   poly  exp
> -------+------------------------------------------------------------
> 1      | 1       log      plog     sqrt  lin      linlog   poly  exp
> log    | log     plog     plog     ?     linlog   ?        poly  exp
> plog   | plog    plog     plog     ?     linplog  linplog  poly  exp
> sqrt   | sqrt    ?        ?        lin   poly     poly     poly  exp
> lin    | lin     linlog   linplog  poly  poly     poly     poly  exp
> linlog | linlog  linplog  linplog  poly  poly     poly     poly  exp
> poly   | poly    poly     poly     poly  poly     poly     poly  exp
> exp    | exp     exp      exp      exp   exp      exp      exp   exp
>
> I've left a few question marks for the tricky cases. Ideas?

The question mark at log * linlog is spurious; should be replaced with linplog. So the tricky cases are sqrt * log and sqrt * plog.

Andrei
Dec 04 2015
On 12/04/2015 07:37 PM, Andrei Alexandrescu wrote:
> Following up on this, I thought I'd define a little algebra
> ....
> It will be closed (no arbitrary expressions)
> ...

I think that the exponents matter. E.g. n^1.5 vs n^2. The algebra can be made more expressive while simplifying its implementation.

> The idea is to allow functions to compute their own complexity in terms of complexities of their respective primitives. Here's the algebra.
>
> Terms:
> 1       = O(1)
> log     = O(log n)
> plog    = O((log n)^k)
> sqrt    = O(sqrt n)
> lin     = O(n)
> linlog  = O(n * log n)
> linplog = O(n * (log n)^k)
> poly    = O(n^k)
> exp     = O(2^n)

Example alternative:

struct BigO{
    Fraction logExp;
    Fraction polyExp;
    Fraction expBase;

    int opCmp(BigO rhs){
        return tuple(expBase, polyExp, logExp).opCmp(tuple(rhs.expBase, rhs.polyExp, rhs.logExp));
    }
    BigO opBinary(string op:"*")(BigO rhs){
        return BigO(logExp + rhs.logExp, polyExp + rhs.polyExp, expBase * rhs.expBase);
    }
    // ...
}

Your cases then become:

O(1):             expBase=1, polyExp=0,             logExp=0;
O(log n):         expBase=1, polyExp=0,             logExp=1;
O((log n)^k):     expBase=1, polyExp=0,             logExp=k;
O(sqrt n):        expBase=1, polyExp=Fraction(1,2), logExp=0;
O(n):             expBase=1, polyExp=1,             logExp=0;
O(n * log n):     expBase=1, polyExp=1,             logExp=1;
O(n * (log n)^k): expBase=1, polyExp=1,             logExp=k;
O(n^k):           expBase=1, polyExp=k,             logExp=0;
O(2^n):           expBase=2, polyExp=0,             logExp=0;

Combinations just work, especially n^(1/2) * log n.
Dec 04 2015
Timon Gehr <timon.gehr gmx.ch> wrote:On 12/04/2015 07:37 PM, Andrei Alexandrescu wrote:Nice, I'll continue with this. Thanks! I won't use expBase because no operation leads to it; instead I'll just lump everything superpoly into one "fast growing" category.Following up on this, I thought I'd define a little algebra .... It will be closed (no arbitrary expressions) ...I think that the exponents matter. E.g. n^1.5 vs n^2. The algebra can be made more expressive while simplifying its implementation.The idea is to allow functions to compute their own complexity in terms of complexities of their respective primitives. Here's the algebra. Terms: 1 = O(1) log = O(log n) plog = O((log n)^k) sqrt = O(sqrt n) lin = O(n) linlog = O(n * log n) linplog = O(n * (log n)^k) poly = O(n^k) exp = O(2^n)Example alternative: struct BigO{ Fraction logExp; Fraction polyExp; Fraction expBase; bool opCmp(BigO rhs){ return tuple(expBase,polyExp,logExp).opCmp(tuple(rhs.expBase,polyExp,logExp)); } BigO opBinary(string op:"*")(BigO rhs){ return BigO(logExp+rhs.logExp,polyExp+rhs.polyExp,expBase*rhs.expBase); } // ... } Your cases then become: O(1): expBase=1, polyExp=0, logExp=0; O(log n): expBase=1, polyExp=0, logExp=1; O((log n)^k): expBase=1, polyExp=0, logExp=k; O(sqrt n): expBase=1, polyExp=Fraction(1,2), logExp=0; O(n): expBase=1, polyExp=1, logExp=0; O(n * log n): expBase=1, polyExp=1, logExp=1; O(n * (log n)^k): expBase=1, polyExp=1, logExp=k; O(n^k): expBase=1, polyExp=k, logExp=0; O(2^n): expBase=2, polyExp=0, logExp=0; Combinations just work, especially n^(1/2) * log n.
Dec 04 2015
On 12/04/2015 09:03 PM, Andrei Alexandrescu wrote:This is probably sensible in the context of std.container. Note that some containers only give amortized guarantees. E.g. in C++, calling push_back n times on a vector that starts out empty is O(n), but the worst case for a single push_back is Ω(n). Another question is how widely applicable BigO should become beyond std.container. E.g. some runtime bounds have multiple parameters.Nice, I'll continue with this. Thanks! I won't use expBase because no operation leads to it; instead I'll just lump everything superpoly into one "fast growing" category.Your cases then become: O(1): expBase=1, polyExp=0, logExp=0; O(log n): expBase=1, polyExp=0, logExp=1; O((log n)^k): expBase=1, polyExp=0, logExp=k; O(sqrt n): expBase=1, polyExp=Fraction(1,2), logExp=0; O(n): expBase=1, polyExp=1, logExp=0; O(n * log n): expBase=1, polyExp=1, logExp=1; O(n * (log n)^k): expBase=1, polyExp=1, logExp=k; O(n^k): expBase=1, polyExp=k, logExp=0; O(2^n): expBase=2, polyExp=0, logExp=0; Combinations just work, especially n^(1/2) * log n.
Dec 04 2015
On 12/04/2015 04:40 PM, Timon Gehr wrote:Another question is how widely applicable BigO should become beyond std.container.I'm thinking we could add complexity annotations to range functions. For example the complexity of map is linear etc.E.g. some runtime bounds have multiple parameters.Yah, I was wondering how necessary that'd be. One problem is how to align parameters of different algorithms, for example say one is n * m and the other is m log n. What if they swap the order of m and n? I guess some reasonable convention should take care of this. Otherwise, we'll need to use a hashtable mapping names to (exp, logExp) tuples. Andrei
Dec 04 2015
On 12/04/2015 11:57 PM, Andrei Alexandrescu wrote:
> On 12/04/2015 04:40 PM, Timon Gehr wrote:
>> Another question is how widely applicable BigO should become beyond std.container.
> I'm thinking we could add complexity annotations to range functions. For example the complexity of map is linear etc.
> ...

It depends on the mapped function.

If only products should be expressible, this would maybe be reasonable as well:

void foo(@(BigO.linear) int n, @(BigO.linear) int m);

But UDAs for parameters are not supported.

>> E.g. some runtime bounds have multiple parameters.
> Yah, I was wondering how necessary that'd be. One problem is how to align parameters of different algorithms, for example say one is n * m and the other is m log n. What if they swap the order of m and n? I guess some reasonable convention should take care of this. Otherwise, we'll need to use a hashtable mapping names to (exp, logExp) tuples.
>
> Andrei
Dec 04 2015
On 12/05/2015 04:24 AM, Timon Gehr wrote:But I think not. O(n+m) is common.If only products should be expressible,
Dec 04 2015
On 12/04/2015 10:24 PM, Timon Gehr wrote:
> void foo(@(BigO.linear) int n, @(BigO.linear) int m);
>
> But UDAs for parameters are not supported.

That's actually pretty neat and easy to work around like this:

void foo(int n, int m) @(BigOParam!2(BigO.linear, BigO.linear));

In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult.

I gave up on this for the time being. Ideas welcome.

Andrei
Dec 05 2015
On Saturday, 5 December 2015 at 20:48:16 UTC, Andrei Alexandrescu wrote:On 12/04/2015 10:24 PM, Timon Gehr wrote: In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult. I gave up on this for the time being. Ideas welcome. AndreiWell, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)... Really, though, with the more complex algorithms, you start running into the kinds of issues noted in the first reply to this article: http://www.perlmonks.org/?node_id=573138 Besides, is anyone actually going to specify that they need an algorithm that is O(log(n) + m^2) or whatever? Or would a function just take _two_ algorithms, assert that each is O(log(n)) and O(m^2), respectively, and then compose them itself? The complexities depend on the types of container anyway, so in general, you should get only multi-term big-O notation when you're using multiple containers (or Cartesian products of a container with itself or something like that). Can't we just make sure one container's insert() and the other container's sort() work within a certain complexity?
Dec 05 2015
On Sat, Dec 05, 2015 at 09:52:08PM +0000, BLM768 via Digitalmars-d wrote:On Saturday, 5 December 2015 at 20:48:16 UTC, Andrei Alexandrescu wrote:Multi-term complexities arise from trivial graph algorithms. They are not limited to the use of multiple containers. More precisely, the multiple terms arise because of the structure of the graph (being composed of nodes and edges); what the algorithms add are functions on nodes and edges. Unfortunately, once you have more than a single variable in your functions, the big-O expressions rapidly become inextricably complex, and can no longer map to nice abstractions like the reals + infinities + infinitesimals linear ordering. For example, two graph algorithms may be, respectively, O(V^2 + E) and O(V + E^2); it's difficult to judge based on that which one is "superior". T -- They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to KillOn 12/04/2015 10:24 PM, Timon Gehr wrote: In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult. I gave up on this for the time being. Ideas welcome. AndreiWell, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)... Really, though, with the more complex algorithms, you start running into the kinds of issues noted in the first reply to this article: http://www.perlmonks.org/?node_id=573138 Besides, is anyone actually going to specify that they need an algorithm that is O(log(n) + m^2) or whatever? Or would a function just take _two_ algorithms, assert that each is O(log(n)) and O(m^2), respectively, and then compose them itself? The complexities depend on the types of container anyway, so in general, you should get only multi-term big-O notation when you're using multiple containers (or Cartesian products of a container with itself or something like that). Can't we just make sure one container's insert() and the other container's sort() work within a certain complexity?
Dec 05 2015
On Saturday, 5 December 2015 at 22:56:23 UTC, H. S. Teoh wrote:Multi-term complexities arise from trivial graph algorithms. They are not limited to the use of multiple containers. More precisely, the multiple terms arise because of the structure of the graph (being composed of nodes and edges); what the algorithms add are functions on nodes and edges.True. I don't expect that very many people would want to specify an algorithmic complexity for those operations, though. It seems much more rigidly defined than for lists, arrays, etc. The question is not really about whether "complex complexities" will exist but whether the user has a practical reason to care about specifying them.Unfortunately, once you have more than a single variable in your functions, the big-O expressions rapidly become inextricably complex, and can no longer map to nice abstractions like the reals + infinities + infinitesimals linear ordering. For example, two graph algorithms may be, respectively, O(V^2 + E) and O(V + E^2); it's difficult to judge based on that which one is "superior".Well, if one variable is "capped" (or at least expected to be "small") it's easy enough to eliminate that term. Beyond that, there isn't necessarily a single ordering of big-O expressions, but many of them can be ordered relative to a single variable. For instance, O(n + m^2) is less complex than O(n^2 + m) with respect to n (and vice versa for m). It's trickier when expressions are more deeply nested, but if select one term (in this case, n), set all the others to be constant (since none of them depends on n), and evaluate the resulting expression tree, you should get something half-usable. Some simplifying rules may be useful. For instance, log(x^2) should approach log(x) as x approaches infinity, I think. (My calculus is a bit rusty.)
Dec 05 2015
On 12/06/2015 03:47 AM, BLM768 wrote:log(x^2) should approach log(x) as x approaches infinity, I think. (My calculus is a bit rusty.)log(x^2) = 2 log x.
Dec 05 2015
On Sunday, 6 December 2015 at 03:30:51 UTC, Timon Gehr wrote:log(x^2) = 2 log x.Why do log rules have to make everything so simple? ;)
Dec 05 2015
On 12/05/2015 04:52 PM, BLM768 wrote:Well, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)...CAS = ?Really, though, with the more complex algorithms, you start running into the kinds of issues noted in the first reply to this article: http://www.perlmonks.org/?node_id=573138 Besides, is anyone actually going to specify that they need an algorithm that is O(log(n) + m^2) or whatever? Or would a function just take _two_ algorithms, assert that each is O(log(n)) and O(m^2), respectively, and then compose them itself? The complexities depend on the types of container anyway, so in general, you should get only multi-term big-O notation when you're using multiple containers (or Cartesian products of a container with itself or something like that). Can't we just make sure one container's insert() and the other container's sort() work within a certain complexity?Well you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength). Part of the art here, I feel, is knowing when to stop going too crazy about this. At this point we have a nice, round system that's simple to understand and use. Making it more complex should come only with a corresponding growth in power. Andrei
Dec 05 2015
On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:On 12/05/2015 04:52 PM, BLM768 wrote:Computer Algebra System, I assume.Well, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)...CAS = ? ...
Dec 05 2015
On Saturday, 5 December 2015 at 23:55:16 UTC, Andrei Alexandrescu wrote:On 12/05/2015 04:52 PM, BLM768 wrote:https://en.wikipedia.org/wiki/Computer_algebra_systemWell, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)...CAS = ?
Dec 05 2015
On 12/05/2015 09:48 PM, Andrei Alexandrescu wrote:
> On 12/04/2015 10:24 PM, Timon Gehr wrote:
>> void foo(@(BigO.linear) int n, @(BigO.linear) int m);
>>
>> But UDAs for parameters are not supported.
> That's actually pretty neat and easy to work around like this:
>
> void foo(int n, int m) @(BigOParam!2(BigO.linear, BigO.linear));
>
> In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult. ...

Some upper bounds are incomparable, so there would not be a total order. But that is not a problem.

> I gave up on this for the time being. Ideas welcome. ...

The next step up the expressiveness scale would be to have a sum-of-products representation.

Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly):

http://dpaste.dzfl.pl/d1512905accd

I think this general approach is probably close to the sweet spot. (The implementation is not feature-complete yet, though. It would be nice if it supported automatically computing a new asymptotic runtime bound from asymptotic bounds on the arguments.)
Dec 05 2015
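The paste is the actual proof of concept; as a rough paraphrase of the kind of representation it uses (simplified, with invented names, and without the fractional exponents and dominance pruning of the real code):

    struct Atom { string var; int polyExp; int logExp; }  // var^polyExp * log(var)^logExp
    struct Term { Atom[] atoms; }                          // product of atoms
    struct BigO { Term[] terms; }                          // sum of such products

    // O(n + m log n) would be represented as:
    //   BigO([ Term([Atom("n", 1, 0)]),
    //          Term([Atom("m", 1, 0), Atom("n", 0, 1)]) ])
    //
    // '*' distributes term lists pairwise and merges atoms with the same
    // variable; '+' concatenates term lists and drops dominated terms.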
On Sunday, 6 December 2015 at 03:24:24 UTC, Timon Gehr wrote:Some upper bounds are incomparable, so there would not be a total order. But that is not a problem.It is a problem in all cases as you usually dont have an optimal bound. And with your approach that will most certainly be guaranteed to happen. Comparing bounds does not mean you are comparing running time. O(1) implies O(f(x)), O(N) implies O(N^2). You can also get tighter bounds for specific input models.
Dec 05 2015
On 12/06/2015 08:59 AM, Ola Fosheim Grøstad wrote:On Sunday, 6 December 2015 at 03:24:24 UTC, Timon Gehr wrote:Are you complaining that the given implementation does not support 'min', or what are you trying to say here?Some upper bounds are incomparable, so there would not be a total order. But that is not a problem.It is a problem in all cases as you usually dont have an optimal bound.And with your approach that will most certainly be guaranteed to happen.How is that specific to my approach? I only showed a more expressive BigO implementation.Comparing bounds does not mean you are comparing running time. ...BigO represents a set of functions. Comparing BigO checks for subset inclusion.O(1) implies O(f(x)), O(N) implies O(N^2).For our purposes, O(f) = { g | ∃c. ∀x⃗. |g(x⃗)| ≤ c·|f(x⃗)|+c }. O(1) ⊆ O(f(x)), O(N) ⊆ O(N²). <= checks ⊆. == checks =. (The final implementation should use exact fractions instead of doubles.)You can also get tighter bounds for specific input models.Yes, you can.
Dec 06 2015
On 12/06/2015 03:39 PM, Timon Gehr wrote:For our purposes, O(f) = { g | ∃c. ∀x⃗. |g(x⃗)| ≤ c·|f(x⃗)|+c }.Hm, this does not actually work too well. E.g., we want O(n+m) ⊆ O(n log m + m log n). This breaks down with that definition if we e.g. fix m=1 and let n vary. Anyway, I think this can be fixed.
Dec 06 2015
On Sunday, 6 December 2015 at 14:39:05 UTC, Timon Gehr wrote:Are you complaining that the given implementation does not support 'min', or what are you trying to say here?I am saying that comparing bounds is not the same as comparing running time of implemented algorithms. Insertion sort is both O(n^2) and O(n^3), but if you run it on a sorted array where each element has been swapped with neighbouring elements 16 times, then it is O(N). So these derived bounds are too loose to be useful; generic algorithms cannot make use of them beyond the trivial case.BigO represents a set of functions. Comparing BigO checks for subset inclusion.But what can you use it for? When you compose algorithms and even run an optimizer over it, then combining an O(N^2) with an O(N) can turn into O(1). You need advanced compiler support for this to be valuable.Exactly, and when you compose/combine algorithms you often end up constraining the input model.You can also get tighter bounds for specific input models.Yes, you can.
Dec 06 2015
On 12/06/2015 10:35 PM, Ola Fosheim Grøstad wrote:On Sunday, 6 December 2015 at 14:39:05 UTC, Timon Gehr wrote:Yes, that's what you said later down the post. It's completely unrelated to the sentence you claimed was false.Are you complaining that the given implementation does not support 'min', or what are you trying to say here?I am saying that comparing bounds is not the same as comparing running time of implemented algorithms.
Dec 06 2015
On Sunday, 6 December 2015 at 23:49:00 UTC, Timon Gehr wrote:Yes, that's what you said later down the post. It's completely unrelated to the sentence you claimed was false.I would assume that there would have to be a use case for something added to a standard library. Based on the presented use case it is like using the classifications "younger than 100" and "younger than 16", applying them randomly to individuals of the same age and using the classifications for making decisions about whether they should be allowed to see adult movies or not. Putting an ordering on the classification is in that case useless.
Dec 06 2015
Timon Gehr <timon.gehr gmx.ch> wrote:On 12/05/2015 09:48 PM, Andrei Alexandrescu wrote:On 12/04/2015 10:24 PM, Timon Gehr wrote:Some upper bounds are incomparable, so there would not be a total order. But that is not a problem.Brilliant! My wife has a work emergency so I've been with the kids all day, but here's what can be done to make this simpler. Use D parsing and eliminate the whole parsing routine. Add, multiply and power are all defined, so you only need log of BigO. Define global constants K, N, and N1 through N7. K is constant complexity; all others are free variables. Then complexities are regular D expressions, e.g. BigO(N^2 * log(N)). On the phone, sorry for typos.void foo( (BigO.linear) int n, (BigO.linear) int m); But UDAs for parameters are not supported.That's actually pretty neat and easy to work around like this: void foo(int n, int m) (BigOParam!2(BigO.linear, BigO.linear); In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult. ...I gave up on this for the time being. Ideas welcome. ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. (The implementation is not feature-complete yet, though. It would be nice if it supported automatically computing a new asymptotic runtime bound from asymptotic bounds on the arguments.)
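(For illustration only, a hypothetical single-variable sketch of the "just use D expressions" idea: with a couple of predefined constants and operator overloads, complexities become ordinary D expressions evaluated at compile time, and no parser is needed. This collapses (log N)^k information into a single exponent and supports only one variable.)

struct Cx
{
    double npow = 0, logpow = 0;      // stands for N^^npow * log(N)^^logpow

    Cx opBinary(string op : "*")(Cx rhs)
    {
        return Cx(npow + rhs.npow, logpow + rhs.logpow);
    }
    Cx opBinary(string op : "^^")(double e)
    {
        return Cx(npow * e, logpow * e);
    }
}

enum K = Cx(0, 0);                    // O(1)
enum N = Cx(1, 0);                    // O(N)
Cx log(Cx x) { return Cx(0, 1); }     // crude: log of any polynomial is ~ log N

enum quadratic = N ^^ 2;              // O(N^2)
enum linlog    = N * log(N);          // O(N log N)
static assert(quadratic.npow > linlog.npow); // N^2 dominates N log N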
Dec 06 2015
On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:The implementation of power on BigO given does not actually work in general (there should have been an enforcement that makes sure there is only one summand). I'll think about whether and how it can be made to work for multiple summands.Brilliant! My wife has a work emergency so I've been with the kids all day, but here's what can be done to make this simpler. Use D parsing and eliminate the whole parsing routine. Add multiply and power are all defined so you only need log of bigo. ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. ...Define global constants K, N, and N1 through N7. K is constant complexity all others are free variables. Then complexities are regular D expressions e.g BigO(N^2 * log(N)). On a the phone sry typos.I generally tend to avoid DSLs based solely on operator overloading, as they don't always work and hence are awkward to evolve. Here, the biggest current nuisance is that the variable names are not very descriptive. If we'll go with a log(BigO) function, we possibly want to make BigO closed under log without approximating iterated logarithms: struct Term{ Variable n; Fraction[] exponents; // exponents[0] is exponent of n, // exponents[1] is exponent of log n, // exponents[2] is exponent of log log n, ... } Then O(log(f^c*g^d)) = O(log(f)+log(g)) = O(log(f+g)) [1], and hence every BigO has a well-defined logarithm. [1] O(log(f+g)) ⊆ O(log(f*g)) = O(log(f)+log(g)). O(log(f)+log(g)) ⊆ O(max(log(f),log(g))) = O(log(max(f,g))) ⊆ O(log(f+g)).
Dec 06 2015
On 12/06/2015 06:21 PM, Timon Gehr wrote:The implementation of power on BigO given does not actually work in general (there should have been an enforcement that makes sure there is only one summand). I'll think about whether and how it can be made to work for multiple summands.No matter, you may always use runtime assertions - after all it's all during compilation. That's the beauty of it!The bright side is you get to standardize names. If you limit names to K, N, and N1 through N7 then you can always impose to APIs the meaning of these. Another bright side is people don't need to learn a new, juuust slightly different grammar for complexity expressions - just use D. For example the grammar you defined allows log n without parens - what's the priority of log compared to power etc? Why is power ^ in this notation and ^^ in D? All these differences without a distinction are gratuitous. Just use D.Define global constants K, N, and N1 through N7. K is constant complexity all others are free variables. Then complexities are regular D expressions e.g BigO(N^2 * log(N)). On a the phone sry typos.I generally tend to avoid DSLs based solely on operator overloading, as they don't always work and hence are awkward to evolve. Here, the biggest current nuisance is that the variable names are not very descriptive.If we'll go with a log(BigO) function, we possibly want to make BigO closed under log without approximating iterated logarithms: struct Term{ Variable n; Fraction[] exponents; // exponents[0] is exponent of n, // exponents[1] is exponent of log n, // exponents[2] is exponent of log log n, ... } Then O(log(f^c*g^d)) = O(log(f)+log(g)) = O(log(f+g)) [1], and hence every BigO has a well-defined logarithm. [1] O(log(f+g)) ⊆ O(log(f*g)) = O(log(f)+log(g)). O(log(f)+log(g)) ⊆ O(max(log(f),log(g))) = O(log(max(f,g))) ⊆ O(log(f+g)).Yah, the stuff under log must be restricted. Here's the grammar I'm toying with: Atom ::= K | N | N1 | ... | N7 SimpleExp ::= SimpleTerm ('+' SimpleTerm)* SimpleTerm ::= SimpleFactor ('*' SimpleFactor)* SimpleFactor ::= Atom ('^^' double)? | '(' SimpleExp ')' BigO ::= Term ('+' Term)* Term ::= SimpleFactor ('*' 'log' '(' SimpleExp ')' ('^^' double)?)? (I used regex notations for "optional" and "zero or more".) This is expressible with D's native operations (so no need for custom parsing) and covers, I think, what we need. It could be further simplified if we move some of the grammar's restrictions to runtime (e.g. no need for SimpleXxx, some expressions can be forced to be simple during "runtime"). Andrei
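(For concreteness, a few illustrative complexities this grammar admits - and which, given suitable operator overloads on the predefined constants, are also legal D expressions:

  K                                // O(1)
  N ^^ 2 + N1                      // O(n^2 + m)
  N * log(N)                       // O(n log n)
  (N + N1) * log(N + N1) ^^ 2      // O((n + m) * (log(n + m))^2)

Note the grammar forces parentheses after log and allows a fractional exponent after ^^.)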
Dec 06 2015
On 12/07/2015 02:05 AM, Andrei Alexandrescu wrote:On 12/06/2015 06:21 PM, Timon Gehr wrote:Even better: I was wrong when I claimed it does not work. For 0<=x, it actually works as-is: O((f+g)^x) = O(max(f,g)^x) = O(max(f^x,g^x)) = O(f^x+g^x).The implementation of power on BigO given does not actually work in general (there should have been an enforcement that makes sure there is only one summand). I'll think about whether and how it can be made to work for multiple summands.No matter, you may always use runtime assertions - after all it's all during compilation. That's the beauty of it! ...Well, how?The bright side is you get to standardize names. If you limit names to K, N, and N1 through N7 then you can always impose to APIs the meaning of these. ...Define global constants K, N, and N1 through N7. K is constant complexity all others are free variables. Then complexities are regular D expressions e.g BigO(N^2 * log(N)). On a the phone sry typos.I generally tend to avoid DSLs based solely on operator overloading, as they don't always work and hence are awkward to evolve. Here, the biggest current nuisance is that the variable names are not very descriptive.Another bright side is people don't need to learn a new, juuust slightly different grammar for complexity expressions - just use D. For example the grammar you defined allows log n without parens - what's the priority of log compared to power etc?There is no priority. The parser explicitly rejects log n^x. :-)Why is power ^ in this notation and ^^ in D?Because ^ is taken for bitwise xor in D due to backwards-compatibility constraints.All these differences without a distinction are gratuitous. Just use D. ...D does not allow overloading of syntax to the extent necessary to make similar things really pleasant in the long run, and it has been repeatedly argued that this is a good thing; that custom parsing should be used instead. It is easy to align the parser with D syntax. Anyway, syntax is not the problem here, and the implementation can be downgraded to not support parsing and/or proper names at any point.
Dec 07 2015
On 12/7/15 5:14 AM, Timon Gehr wrote:D does not allow overloading of syntax to the extent necessary to make similar things really pleasant in the long run, and it has been repeatedly argued that this is a good thing; that custom parsing should be used instead. It is easy to align the parser with D syntax. Anyway, syntax is not the problem here, and the implementation can be downgraded to not support parsing and/or proper names at any point.I fail to see how no parens after log or "^" in lieu of "^^" would make a positive difference. What would be a few examples of things that won't work pleasantly enough? I'm not sure whether the DSL argument is well applied here. Clearly using D expressions for e.g. regex or SQL syntax would be best avoided in favor of a DSL. In this case we're defining an algebra over restricted expressions, which are a subset of the usual mathematical expressions that D's syntax already emulates. Looks like a debate on whether to choose one standard language vs. an obscure one (in this case invented ad-hoc) is starting. This is deja vu all over again :o). I hope you won't mind if I give your idea a slightly different angle. Andrei
Dec 07 2015
On 12/07/2015 02:46 PM, Andrei Alexandrescu wrote:On 12/7/15 5:14 AM, Timon Gehr wrote:Neither have I attempted to show anything like that. All arguments in this debate are obvious and boring, so no need to discuss it.D does not allow overloading of syntax to the extent necessary to make similar things really pleasant in the long run, and it has been repeatedly argued that this is a good thing; that custom parsing should be used instead. It is easy to align the parser with D syntax. Anyway, syntax is not the problem here, and the implementation can be downgraded to not support parsing and/or proper names at any point.I fail to see how no parens after log or "^" in lieu "^^" would make a positive difference.What would be a few examples of things that won't work pleasantly enough? ...You gave this example: http://en.cppreference.com/w/cpp/container/forward_list/insert_after 1-2) BigO("1") 3) BigO("count") 4) BigO("distance(first,last)") 5) BigO("ilist.size()") There's also this: On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:Well you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength).BigO("array[].walkLength + r.walkLength"); BigO("log(array[].walkLength) * r.walkLength");
Dec 07 2015
On 12/07/2015 09:43 AM, Timon Gehr wrote:1-2) BigO("1") 3) BigO("count") 4) BigO("distance(first,last)") 5) BigO("ilist.size()") There's also this: On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:These are still expressible without a DSL: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc. Somewhat independently of DSL or not: At this point I'm unclear whether supporting free variables with arbitrary names is a good thing. The key to unleashing the power of BigO is to compute it when combining functions, and the names seem to be specific to the function and therefore not easy to combine. AndreiWell you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength).BigO("array[].walkLength + r.walkLength"); BigO("log(array[].walkLength) * r.walkLength");
Dec 07 2015
On 12/07/2015 05:15 PM, Andrei Alexandrescu wrote:On 12/07/2015 09:43 AM, Timon Gehr wrote:This just goes from one string to two strings and adds some noise on top of it. Also, now, what is D and what is in a string is arbitrary. BigO(Var.array[].walkLength + Var.r.walkLength);1-2) BigO("1") 3) BigO("count") 4) BigO("distance(first,last)") 5) BigO("ilist.size()") There's also this: On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:These are still expressible without a DSL: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc. ...Well you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength).BigO("array[].walkLength + r.walkLength"); BigO("log(array[].walkLength) * r.walkLength");Somewhat independently of DSL or not:Yes, notation does not matter at this stage.At this point I'm unclear whether supporting free variables with arbitrary names is a good thing.Restricting the set of names to 8 predefined ones does not simplify anything. It just means that a mapping of predefined names to real names has to be carefully managed manually and clashes have to be avoided, all while limiting the value of BigO as documentation.The key to unleashing the power of BigO is to compute it when combining functions, and the names seem to be specific to the function and therefore not easy to combine. ...Just substitute something else for those names to get rid of them. That is how passing of parameters is handled. Probably we should get some actual examples where combination should work to see what could be done. If you have: void foo(int n,int m) BigO("n*m^2"){ ... } Something like this is easy to implement: // compute runtime bound from bounds on parameters: // first expression passed is O(x^3) and second expression passed // is O(log(y)^(1/2)). enum bound = getRuntimeBound!foo(BigO("x^3"),BigO("log(y)^(1/2)")); static assert(bound == BigO("x^3*log(y)")); Note how the end user does not need to worry much about names.
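(A tiny self-contained sketch of how that substitution step could go for a single product term; real bounds are sums of such terms, and all names here are made up.)

// "term" maps each parameter of the callee to its exponent in the callee's
// declared bound; "argBounds" maps each parameter to a product term over the
// caller's variables. Substitution multiplies exponents and collects them.
double[string] substitute(double[string] term, double[string][string] argBounds)
{
    double[string] result;
    foreach (param, exp; term)
        foreach (var, e; argBounds[param])
            result[var] = result.get(var, 0.0) + e * exp;
    return result;
}

unittest
{
    // foo's bound is n * m^^2; the caller passes n in O(x^^3) and
    // m in O(log(y)^^0.5), giving the combined bound x^^3 * log(y).
    auto combined = substitute(
        ["n": 1.0, "m": 2.0],
        ["n": ["x": 3.0], "m": ["log(y)": 0.5]]);
    assert(combined == ["x": 3.0, "log(y)": 1.0]);
}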
Dec 07 2015
On Monday, 7 December 2015 at 16:15:46 UTC, Andrei Alexandrescu wrote:These are still expressible without a DSL: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc.We can remove the use of strings if we tag walkLength with BigO: // this: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) // becomes this: (complexity!(array[].walkLength) + complexity!(r.walkLength)) Or if we rename complexity to bigO: (bigO!(array[].walkLength) + bigO!(r.walkLength))
Dec 08 2015
On Monday, 7 December 2015 at 16:15:46 UTC, Andrei Alexandrescu wrote:On 12/07/2015 09:43 AM, Timon Gehr wrote:There are many "operations" that can be carried out on methods. Many are composable and decomposable in the exact same way and can be generalized as an algorithm: let f,f1,f2, etc be functions, if f = f1 + f2 then L(f) = L(f1) + L(f2) Hence if L is a linear operator, this always holds. Addition(+) is linear, BigO is linear. Composing linear operators is linear(but it doesn't have an inverse like addition). If D supported some generic way to handle such linear operators acting on methods, with calls to other methods as decomposing the method, one could do many cool things quite easily in D. BigO would just be one. e.g., void foo() { bar1(); bar2(); for(auto i = 1; i < 100; i++) for(auto j = 1; j < 100; j++) bar3(); } then we can see L(foo) = L(bar1) + L(bar2) + X(n^2)*L(bar3) (Here, X(n^2) is because we have a double nested for loop which multiples the complexity by n^2 for big notation. To be more precise, we should refactor all code inside the function in such a way to make things precise, i.e., void foo() { bar1(); bar2(); } void fooTemp() { for(auto i = 1; i < 100; i++) for(auto j = 1; j < 100; j++) bar3(); } which then gives the more precise decomposition as L(foo) = L(bar1) + L(bar2) + L(fooTemp) The main point of all this, is that L can be many things. If one has the complete hierarchical call stack(with root = Main) then L can be scene as recursively working on the tree. In this case, addition would rather be "insertion into tree". We could easily check relations such as if foo is called, calls, or disjoint from bar. We can, as already stated, implement complexity analysis automatically: BigO - The sup of the complexity of a function assuming all unknowns(loops, user input, unknown sub-function calls) have maximal complexity. bigO - The sup of the complexity of all functions assuming all unknowns have 0 complexity. LittleO - The inf of complexity.... litleO - ... The point, by simply implementing a decomposable structure on the "call hierarchy", one can implement all kinds of cool stuff. If it is exposed to D at run time, then even more amazing things can happen. e.g., one could implement a real time profiler. L would be an "operator" that simply sends the current ticks when first called and when returning. The "addition" is really just code that does the computations on the differences it is getting. Imagine hitting some hot key and your current app showing a tree(something like a fractal tree) where each node is a function call and sub nodes are functions that are the functions that the function calls("uses"). Color the tree based on the use rate(count how many times the function is called, or even accumulate hierarchically) or the execution time, or even BigO. You'll then see a visual of where the program is active the most. It might sort of look like the heat distribution map of the earth. I realize it is more complex to implement than I'm making it out, but it's basically tree traversal and hooks stuff. Once you start allowing meta coding, the power becomes magnified: e.g. (psuedo-code) // Define generic meta code function composer(the + in the algorithm) for BigO function!composer!BigO(functions) { let f1, ..., fn = functions return function!composer!BigO(f1) + ... 
+ function!composer!BigO(f2) // Would need to separate parse f1 in such a way that it is composed of only calls to functions or none calls(the loop stuff I mentioned above) } // The Linear operator BigO that returns the complexity of a function f function!composer!BigO(f) { return complexity(f); } // Explicitly specify complexity for foo void foo() [function!composer!BigO() { return 0; } { ... } ------------------------------------ The above represents a mock way to allow the programmer to "hook" into the compilers call hierarchy structure. Obviously it would require some work for actual implementation. But once such a feature is implemented, the programmer can do all sorts of things. Implement "attributes"(not needed, of course), real time profiling, BigO, Code replacement/Self Modifying Programs(essentially change how a function behaves), etc... Of course, maybe we are starting to get into AI here? ;) (e.g., one could use multiple similar functions in a program for performance enhancements, but implement a few lines of code to have those functions reconfigure themselves to use the most optimal one for performance reasons) Also, why stop at functions? what about all sub-structures in the program? We could work on classes, structs, etc...1-2) BigO("1") 3) BigO("count") 4) BigO("distance(first,last)") 5) BigO("ilist.size()") There's also this: On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:These are still expressible without a DSL: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc. Somewhat independently of DSL or not: At this point I'm unclear whether supporting free variables with arbitrary names is a good thing. The key to unleashing the power of BigO is to compute it when combining functions, and the names seem to be specific to the function and therefore not easy to combine. AndreiWell you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength).BigO("array[].walkLength + r.walkLength"); BigO("log(array[].walkLength) * r.walkLength");
Dec 10 2015
On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:I have noticed another thing. The comparison operator is an underapproximation (it sometimes returns NaN when ordering would actually be possible). E.g., O(n·m) ⊆ O(n²+m²), because n·m ≤ n²+m². Interesting. It would be nice if the final version had a complete decision procedure for ⊆.Brilliant! ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. ...
Dec 06 2015
On 12/06/2015 07:50 PM, Timon Gehr wrote:On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:I think it's okay to leave N^^2 + N1 and N + N1^^2 unordered. -- AndreiI have noticed another thing. The comparison operator is an underapproximation (it sometimes returns NaN when ordering would actually be possible). E.g., O(n·m) ⊆ O(n²+m²), because n·m ≤ n²+m². Interesting. It would be nice if the final version had a complete decision procedure for ⊆.Brilliant! ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. ...
Dec 06 2015
On 12/07/2015 02:07 AM, Andrei Alexandrescu wrote:On 12/06/2015 07:50 PM, Timon Gehr wrote:Yes, certainly. By complete, I mean it orders everything that can be ordered.On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:I think it's okay to leave N^^2 + N1 and N + N1^^2 unordered. -- AndreiI have noticed another thing. The comparison operator is an underapproximation (it sometimes returns NaN when ordering would actually be possible). E.g., O(n·m) ⊆ O(n²+m²), because n·m ≤ n²+m². Interesting. It would be nice if the final version had a complete decision procedure for ⊆.Brilliant! ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. ...
Dec 07 2015
On Saturday, 5 December 2015 at 20:48:16 UTC, Andrei Alexandrescu wrote:Then what do we do with more complex relationships like O((m + n) log n) etc.What about a compile-time expression tree similar to how, for instance, `BaseUnit`, `ScaledUnit` (and, as suggested, `ShiftedUnit` and `LinearUnit`) are defined and used in David Nadlinger's std.units?
Mar 17 2016
On Fri, Dec 04, 2015 at 01:37:06PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:On 12/04/2015 11:06 AM, Andrei Alexandrescu wrote:[...]Here's the algebra. Terms: 1 = O(1) log = O(log n) plog = O((log n)^k) sqrt = O(sqrt n) lin = O(n) linlog = O(n * log n) linplog = O(n * (log n)^k) poly = O(n^k) exp = O(2^n) Ordering: Terms above are in increasing order. Summation: x + y = max(x, y) Product: | 1 log plog sqrt lin linlog poly exp -------+------------------------------------------------------------ 1 | 1 log plog sqrt lin linlog poly exp log | log plog plog ? linlog ? poly exp plog | plog plog plog ? linplog linplog poly exp sqrt | sqrt ? ? lin poly poly poly exp lin | lin linlog linplog poly poly poly poly exp linlog | linlog linplog linplog poly poly poly poly exp poly | poly poly poly poly poly poly poly exp exp | exp exp exp exp exp exp exp exp I've left a few question marks for the tricky cases. Ideas?[...] sqrt really belongs under poly, as far as asymptotic behaviour is concerned. Basically, O(n^j) * O(n^k) for any real j, k > 0 is asymptotically equal to O(n^(j+k)). Furthermore, O(O(n^j)^k) = O(n^(j*k)). So you can treat polynomial complexities as a field of real numbers, where + on the O(...) terms behaves like max(), * behaves like +, and function composition behaves like ^. Then exp behaves like an "infinite real", where exp > O(n^k) for all real k>0. Its inverse, log, therefore, behaves like an "infinitesimal", where O((log n)^k) for all real k>0 is less than O(n^k) for all real k>0. (Yes, even O(n^(1/1000000)) will eventually outgrow O(log n).) The log powers behave like "intermediate infinitesimals", where you have O((log n)^j) < O((log n)^k) for all positive real j < k. So these O-terms behave approximately like real numbers (poly) enhanced with infinite quantities (exp and its derivations) and infinitesimal quantities (log and its derivations), they follow the usual laws of arithmetic. Therefore, O(n^k) * O(log n) can be treated as the equivalent of a real number k + an infinitesimal L, such that k < (k+L) < k+j for all real j>0. In other words, O(n) < O(n * log n) < O(n^(1+k)) for all k>0. (Yes, even O(n^1.00000000001) will eventually outgrow O(n*log n). O(log n) behaves like an infinitesimal.) The nice thing about this is that you can simplify a lot of complicated O(...) expressions just by applying algebra as described above on the analogous {real + infinities + infinitesimals} system. Ordering relations are preserved (for the most part -- this only breaks down with pathological cases like O(sin n) which is unlikely to be ever encountered). Also, function composition between poly and non-poly complexities will generally be non-commutative, and mostly will break the + = max analogy. (But it seems unlikely, in real-world algorithms, to ever need to worry about the intricacies of exponential-time algorithms, since generally they are to be avoided where possible.) So you can get a (mostly) closed algebra just by mapping poly to the real numbers, and then adding: L = infinitesimal quantity corresponding to O(log n) E = infinite quantity corresponding to exp Various operations inside O(...) are thus mapped: + inside O(...) = max(...) * inside O(...) = + between mapped quantities O(f(g(n))) = O(f(n)) * O(g(n)) O(1) = 0 Example: is O(n^3*log(n)) better than O(n^2*(log(n))^3)? Answer: O(n^3*log(n)) maps to 3 + L, and O(n^2*(log(n))^3) maps to 2 + L^3. Since L^3 is infinitesimal, (2 + L^3) is very close to 2, whereas (3 + L) lies slightly above 3. 
Therefore O(n^3*log(n)) is definitely faster-growing than O(n^2*(log(n))^3). This algebra leads to various interesting questions like: - Is there an intermediate complexity that lies between poly and exp? Yes: since exp is mapped to the infinite quantity E, it follows that E/2 is still an infinite quantity (even though it's strictly less than E). E/2 corresponds to E*1/2, which is the composition of exp with sqrt, so it follows that O(n^k) < O(e^sqrt(n)) < O(e^n) for all real k>0. (There are, of course, many other possibilities.) - Similarly, the log powers O((log n)^k) are *always* slower-growing than any polynomial complexity, because L*k, being still infinitesimal, will always < j for all real j>0. So even O((log n)^10000) will not outgrow O(n^1.000000001). Multiplying with a poly function preserves this relationship: O(n^j * (log n)^k) --> j + L*k < j + m, for all real j,m>0, so O(n^j * (log n)^k) < O(n^(j+m)) for all j,m>0. Basically you're just displacing the mapped values by a constant. T -- People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird. -- D. Knuth
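(A minimal sketch of the mapping just described, restricted to single-variable bounds and leaving exp and composition out: n^a * (log n)^b maps to the pair (a, b), multiplication adds pairs, and ordering is lexicographic because the log exponent only matters when the polynomial exponents tie. All names are made up.)

import std.typecons : Tuple;

alias Cost = Tuple!(double, "poly", double, "logExp");

// multiplication inside O(...) adds the mapped quantities
Cost mul(Cost a, Cost b) { return Cost(a.poly + b.poly, a.logExp + b.logExp); }

// compare growth: the poly exponent decides; the log exponent breaks ties
int cmp(Cost a, Cost b)
{
    if (a.poly != b.poly) return a.poly < b.poly ? -1 : 1;
    if (a.logExp != b.logExp) return a.logExp < b.logExp ? -1 : 1;
    return 0;
}

unittest
{
    auto c1 = Cost(3, 1);  // n^3 * log n
    auto c2 = Cost(2, 3);  // n^2 * (log n)^3
    assert(cmp(c1, c2) > 0);                        // the worked example above
    assert(cmp(Cost(1, 0), Cost(1, 1)) < 0);        // n < n log n
    assert(cmp(Cost(1, 1), Cost(1.000001, 0)) < 0); // ... < n^1.000001
    assert(cmp(mul(Cost(0.5, 0), Cost(0.5, 0)), Cost(1, 0)) == 0); // sqrt * sqrt = n
}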
Dec 04 2015
I'll get back to this (on the phone), but this is incorrect: "sqrt really belongs under poly, as far as asymptotic behaviour is concerned." Fractional powers are sublinear, and sqrt times sqrt is linear, which is important. Andrei
Dec 04 2015
On Fri, Dec 04, 2015 at 08:03:17PM +0000, Andrei Alexandrescu via Digitalmars-d wrote:I'll get back to this (on the phone) but this is incorrect: sqrt really belongs under poly, as far as asymptotic behaviour is concerned.OK, I guess by poly you mean n^k for k>1. I was lumping everything under polynomial for k>0. The reason is because of the homogeneity for all values of k>0 when it comes to the algebra of complexities. Obviously, for performance-measuring purposes, k=1 is a significant landmark.Fractional powers are sublinear.Wrong, O(n^(4/3)) is a fractional power, but it's not sublinear.And sqrt times sqrt is linear which is important.[...] But it's just a special case of the general multiplicative->additive rule. Everyone knows 1/2 + 1/2 = 1; it doesn't seem to warrant special treatment above, say, 1/3 + 2/3 = 1, or any of the other endless identities involving 1. T -- Meat: euphemism for dead animal. -- Flora
Dec 04 2015
On 04-Dec-2015 19:06, Andrei Alexandrescu wrote:On 12/04/2015 01:05 AM, Tofu Ninja wrote:Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Also maybe a simpler idea would just be to annotate the operations with their complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self-documenting as well.Well look at what the cat dragged in: http://dpaste.dzfl.pl/711aecacc450That's quite promising. The code is very preliminary and uses strings for typical complexity values (e.g. "constant", "linear", and later "loglinear" etc.). I need to see how to integrate this whole idea. Also an unpleasant problem is overloading - when present, user code needs to specify which overload they're referring to. Anyhow, this is really interesting. Thanks Tofu. Andrei-- Dmitry Olshansky
Dec 04 2015
On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
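(A hypothetical sketch of what the enum-based tagging could look like - this is not the code behind the dpaste links, and the names are made up.)

import std.traits : getUDAs;

enum Complexity { constant, logarithmic, linear, linearithmic, quadratic }

struct Container(T)
{
    @(Complexity.constant) void insertFront(T value) { /* ... */ }
    @(Complexity.linear)   void insertBack(T value)  { /* ... */ }
}

// client code reads the attribute at compile time and can refuse to
// instantiate with an operation that is too slow
static assert(getUDAs!(Container!int.insertFront, Complexity)[0]
              == Complexity.constant);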
Dec 04 2015
On 12/04/2015 03:43 PM, Walter Bright wrote:On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language. AndreiWas vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On Friday, 4 December 2015 at 22:48:03 UTC, Andrei Alexandrescu wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:Looks good :)On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language. AndreiWas vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On 12/4/15 5:48 PM, Andrei Alexandrescu wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:This looks very excellent, nice work! -SteveOn 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language.Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On Fri, Dec 04, 2015 at 05:48:03PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:[...] Nice, very nice! But ... opMul? I thought that was deprecated? Shouldn't we be using opBinary!"*" instead? T -- Food and laptops don't mix.On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language.Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On 12/4/2015 2:48 PM, Andrei Alexandrescu wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:This is very similar to the SI units program Oskar Linde wrote back in 2006. Anyhow, this is very cool and should: 1. have an article written about it 2. get its own Phobos moduleOn 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language.Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On 05-Dec-2015 01:48, Andrei Alexandrescu wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:Better still - no more strings: http://dpaste.dzfl.pl/5ed2b3ad3043On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language. Andrei-- Dmitry Olshansky
Dec 05 2015
On 12/3/2015 10:05 PM, Tofu Ninja wrote:Also maybe a simpler idea would just be to annotate the operations with their complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self-documenting as well.Dang, you beat me to it!
Dec 04 2015
On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu wrote:On 12/3/15 10:29 PM, Jack Stouffer wrote:"I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?" In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both.On 12/03/2015 09:10 PM, Idan Arye wrote:How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
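(A hypothetical sketch of how such a default could be provided - toy containers and made-up names, not the proposed framework: a free linearInsert forwards to a container's own insert when one exists, so code written against the weaker linear guarantee accepts both kinds of containers.)

void linearInsert(C, T)(ref C c, T value)
    if (__traits(hasMember, C, "insert"))
{
    c.insert(value); // anything at least as fast as linear qualifies
}

struct List(T) // stand-in for a container with quick insert
{
    T[] payload;
    void insert(T value) { payload ~= value; }
}

struct Vector(T) // stand-in for a container that only offers the linear operation
{
    T[] payload;
    void linearInsert(T value) { payload ~= value; }
}

// collection-independent client code: member if present, else the free default
void addAll(C, R)(ref C c, R r)
{
    foreach (e; r) c.linearInsert(e);
}

unittest
{
    List!int l;
    Vector!int v;
    addAll(l, [1, 2, 3]); // goes through the free linearInsert -> insert
    addAll(v, [1, 2, 3]); // uses Vector's own linearInsert
    assert(l.payload == [1, 2, 3] && v.payload == [1, 2, 3]);
}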
Dec 04 2015
On Friday, 4 December 2015 at 09:51:05 UTC, tn wrote:"I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?" In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.In the current std.container, linearInsert aliases to the sublinear insert function when it's provided. If you don't care about running time, just use linearInsert.
Dec 04 2015
On Friday, 4 December 2015 at 09:57:48 UTC, Jakob Ovrum wrote:On Friday, 4 December 2015 at 09:51:05 UTC, tn wrote:I understand this. However, my point is that it seems backwards if 1) When I don't care about the complexity, then I need to specify one (e.g. linearInsert). 2) When I care and want a constant complexity, then I don't specify one (e.g. use insert instead of constantInsert). In addition, if a new container is added that only provides O(n log n) insert, then my "don't care" code that uses linearInsert does not support the new container without changes."I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?" In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.In the current std.container, linearInsert aliases to the sublinear insert function when it's provided. If you don't care about running time, just use linearInsert.
Dec 04 2015
On 12/04/2015 09:50 AM, tn wrote:However, my point is that it seems backwards if 1) When I don't care about the complexity, then I need to specify one (e.g. linearInsert). 2) When I care and want a constant complexity, then I don't specify one (e.g. use insert instead of constantInsert).When complexity information is not present, the name of the function may carry an implicit and documented assumption of complexity bounds. -- Andrei
Dec 04 2015
On 12/04/2015 04:51 AM, tn wrote:"I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?"Oh but you do care.In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.What would be the complexity assumption of "insert"? Andrei
Dec 04 2015
On Friday, 4 December 2015 at 14:08:08 UTC, Andrei Alexandrescu wrote:On 12/04/2015 04:51 AM, tn wrote:I don't, if I have a small container of constant size (or bounded by a small constant). Say, less than 10 elements. Or perhaps I just want to quickly make some prototype. And if the other part of my algorithm is exponential or even superexponential, then it is going to dominate anyway. Or should there be also baseTwoExponentialInsert etc. for the case in which I don't care as long as the complexity is at most O(2^n)?"I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?"Oh but you do care.None. So in a sense that would be an alias of whateverInsert or idontcareInsert or infinityInsert.In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.What would be the complexity assumption of "insert"?
Dec 04 2015
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:I know names are something we're awfully good at discussing :o). Destroy! AndreiI find it ironic that this thread has moved to discuss how to name complexity/running-time...
Dec 03 2015
On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!Are you sure there are only 3 complexities? What if it's a self-balancing tree? I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?
Dec 03 2015
On Friday, 4 December 2015 at 03:46:39 UTC, Walter Bright wrote:On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:I agree -- putting the complexity (eh, running time) in the primitive's name seems like a bad idea. I believe that stability is a more important "feature" of a sorting algorithm than its running time, because you can draw implications from it and use them in your own algorithms (I use it for implementing delta compression), and it affects the output. Running time is still a good guarantee to have, but whether `sort` runs in O(n^2) or O(n log n) is not going to affect how your output looks!Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!Are you sure there are only 3 complexities? What if it's a self-balancing tree? I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?
Dec 03 2015
On Friday, 4 December 2015 at 07:50:19 UTC, Minas Mina wrote:On Friday, 4 December 2015 at 03:46:39 UTC, Walter Bright wrote:That's wrong. Imagine a sort method that calls to an API server to sort the numbers. Given a good internet connection, this implementation detail will not affect the correctness of the result. But I'm sure that no sane person would want to use this method. When the only job of a function is to implement an algorithm then the only thing you should care about is the time and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:I agree -- putting the complexity (eh, running time) to the primitive's name seems like a bad idea. I believe that stability is a more important "feature" of a sorting algorithm than its running time, because you can make implications about it and use them for your own algorithms (I use it for implementing delta compression) and affects the output. Running time is still a good guarantee to have, but if `sort` runs O(n^2) or O(nlogn) is not going to affect how your output will look like!Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!Are you sure there are only 3 complexities? What if it's a self-balancing tree? I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?
Dec 04 2015
On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.I am sorry to say this, but hard performance requirements require O(1) on everything. Big-Oh tells you essentially very little if it is more than O(1). Quick sort and insertion sort are O(N^2), bucket sort is O(N). That does not make bucket sort faster than quick sort or even insertion sort on a specific case as it is dominated by the number of possible values.
Dec 04 2015
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad wrote:On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:And even then you cannot assume that it is real time as the following is O(1): if (full_moon()) sleep(1000); So you can have an O(1) algorithm that occasionally triggers an expensive constant-time/memory path that causes big problems in a running system. You need a hard upper bound measured in cycles.and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.I am sorry to say this, but hard performance requirements require O(1) on everything.
Dec 04 2015
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad wrote:On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:Well, I agree, but I didn't say hard real-time, only performance requirements that are hard to achieve because the system is large, and because it would cost even more if the system was larger (slower).and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.I am sorry to say this, but hard performance requirements require O(1) on everything. Big-Oh tells you essentially very little if it is more than O(1). Quick sort and insertion sort are O(N^2), bucket sort is O(N). That does not make bucket sort faster than quick sort or even insertion sort on a specific case as it is dominated by the number of possible values.
Dec 04 2015
On Friday, 4 December 2015 at 09:33:42 UTC, ZombineDev wrote:Well, I agree, but I didn't say hard real-time, only performance requirements that are hard to achieve because the system is large, and because it would cost even more if the system was larger (slower).Yes, if you are writing for large scalable systems then it is interesting to know what the ceiling is, but then you probably also want something that is no worse than O(log N). Say for a chat service you'd know that the more expensive to develop solution can scale to 10 million users (O(1)), and the less expensive to develop solution will become very expensive to run when you hit 1 million (O(log N)). So Big-Oh can be useful when you work with very large sizes and large variations, but in most cases I think it can be misleading without a good description of the actual implementation. Another very useful measure in real time is the amount of variation between repeated calls. Say if there is a guarantee of <50% variation, you can measure actual running time and add 50% headroom on the hardware requirements. Some algorithms have some fixed-size tables they need to clear out every once in a while and that can be very bad.
Dec 04 2015
On Friday, 4 December 2015 at 09:50:15 UTC, Ola Fosheim Grøstad wrote:On Friday, 4 December 2015 at 09:33:42 UTC, ZombineDev wrote:I strongly agree with what you said earlier that typical complexity analysis is a too naive model for real-world systems. Very often (e.g. due to the nature of real hardware) your time may be dominated by constants. Or a function which looks like O(1) is sometimes O(n), due to hidden complexity. This (the fact that constants matter) is especially true when you aim for O(1). And yes, variation of execution time is also very important. However, here we're talking about API for containers and algorithms and I think that putting the complexity in the function signature is better than only using abstract data structures, because it makes you more aware of the consequences of your choice of data structures and algorithms that you make. If we can figure out a way to put even more information it may be even better.Well, I agree, but I didn't say hard real-time, only performance requirements that are hard to achieve because the system is large, and becuase it would cost even more if the system was larger (slower).Yes, if you are writing for large scalable systems then it is interesting to know what the ceiling is, but then you probably also want something that is no worse than O(log N). Say for a chat service you'd know that the more expensive to develop solution can scale to 10 million users (O(1)), and the less expensive do develop solution will become very expensive to run when you hit 1 million (O(log N)). So Big-Oh can be useful when you work with very large sizes and large variations, but in most cases I think it can be misleading without a good description of the actual implementation. Another very useful measure in real time is the amount of variation between repeated calls. Say if there is a guarantee of <50% variation, you can measure actual running time and add 50% headroom on the hardware requirements. Some algorithms have some fixed size tables they need to clear out every once in a while and that can be very bad.
Dec 04 2015
On Friday, 4 December 2015 at 10:35:33 UTC, ZombineDev wrote:However, here we're talking about API for containers and algorithms and I think that putting the complexity in the function signature is better than only using abstract data structures, because it makes you more aware of the consequences of your choice of data structures and algorithms that you make. If we can figure out a way to put even more information it may be even better.Well, if the algorithm can choose between containers it might be interesting to query the characteristics of a set of operations on each container. Not sure if the method name is the best way to do it. Although I guess having a generic find(x) and find_instantly(x) would be ok, if the latter version basically means less than 200 cycles in all cases. If it means less than 2000 cycles I think it is not so useful. Maybe it is possible to make the inference as static analysis and in addition make it possible for libraries to express various guarantees. Since it is worst case it would be acceptable that if analysis fails it will result in "unbounded".
Dec 04 2015
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad wrote:On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:Only when dealing with an arbitrary number of elements. O(n^2) could be fine if you know for a fact that you're always dealing with a very small number of elements. And some algorithms with worse complexity are actually faster than algorithms with lower complexity when dealing with a small number of elements - particularly since with a small number of elements, the coefficients can matter a lot more than n. So, really, to know which algorithm is appropriate, you need to know how it's actually going to be used in the program in question and how different algorithms perform there. O(1) is definitely better, but it's not necessarily required. But yes, if you're dealing with an arbitrarily large number of elements, anything worse than O(1) isn't going to cut it if you have hard performance requirements.and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.I am sorry to say this, but hard performance requirements require O(1) on everything.Big-Oh tells you essentially very little if it is more than O(1). Quick sort and insertion sort are O(N^2), bucket sort is O(N). That does not make bucket sort faster than quick sort or even insertion sort on a specific case as it is dominated by the number of possible values.Yeah. Really, Big-Oh tells you something about the worst case with an arbitrary number of elements. What actually happens in the program depends heavily on the number of elements which are actually involved. With large numbers of elements, Big-Oh starts mattering, but if the number of elements isn't large, then coefficients and minor implementation details of an algorithm start mattering a lot more than its conceptual worst case. Still, in general, it's better to be aware of algorithmic complexity and favor algorithms with lower complexity until you need to optimize the code based on your specific use case and determine that a higher complexity algorithm actually performs better in that specific use case. - Jonathan M Davis
Dec 04 2015
On Friday, 4 December 2015 at 15:33:56 UTC, Jonathan M Davis wrote:Only when dealing with an arbitrary number of elements. O(n^2) could be fine if you know for a fact that you're always dealing with a very small number of elements.I think there was a misunderstanding about notation here. If we know that we use a small number of elements, then all operations are O(1) by definition, for any small constant.different algorithms perform there. O(1) is definitely better, but it's not necessarily required.Well, O(1) isn't better, it is just mandatory for anything real time. E.g: you fix your buffer to 1 seconds of audio and that is 44100 samples, then filling it is O(44100) == O(1). If am doing a scanline renderer for 2048 pixels, then an insertion sort is faster than merge sort, because most linesegments only move a few pixels per scanline. But here the problem is that there might be a 0.1% chance that most lines are almost horizontal in a variety of angles and then it breaks down, so that will be very limiting, but I still can figure out the hard maximum time required to complete, so it is still O(1) and tractable as a real time operation. If I accept arbitrary input, then I no longer can meet real time requirements, I cannot get to O(1) and therefore cannot put a fixed limit on maximum time required to complete. And that is really the only thing O(1) says: you can put a fixed limit on the time needed to complete the operation at compile time.actually involved. With large numbers of elements, Big-Oh starts mattering, but if the number of elements isn't large, then coefficients and minor implementation details of an algorithm start mattering a lot more than its conceptual worst case.Yes, but most collections of entities are not very large by nature. Usually, when you try to solve a problem faster you break it down by useful segmentation and clusters (like age, kind, proximity). Complexity is mostly for thinking about what options you have when designing an algorithm. The cookbook/library approach to complexity is rather limited since they were not written for the specifics of the problem.Still, in general, it's better to be aware of algorithmic complexity and favor algorithms with lower complexity until you need to optimize the code based on your specific use case and determine that a higher complexity algorithm actually performs better in that specific use case.Be aware of it, yes. Although in real code it is often best to just start out with something flexible like a dynamic array, and optimize when you know which operations are required and what operations you never will need. That often changes over time. I find that I am able to solve most of my real time problems with arrays. When I need hard requirements, I often end up using very simple datastructures like fixed sized circular buffers with a carefully calculated headroom (number of mostly unused elements) or linked lists of chunks or something really easy to reason about. During initialisation of the program it often does not matter. So if I use linear search and it completes in 100ms then fine. I don't care if there is a delay of 100ms at startup. For batch things done in Python, I just use whatever is easy, with current machines most things complete in less than a few seconds anyway. Making libraries easy to use is by far the most important parameter, I think. What would be cool is if the compiler could adjust the implementation of collections based on profiling!
Dec 04 2015
On 12/04/2015 02:50 AM, Minas Mina wrote:
> I agree -- putting the complexity (eh, running time) in the primitive's name seems like a bad idea. I believe that stability is a more important "feature" of a sorting algorithm than its running time, because you can make implications about it and use them for your own algorithms (I use it for implementing delta compression), and it affects the output. Running time is still a good guarantee to have, but whether `sort` runs in O(n^2) or O(n log n) is not going to affect what your output will look like!

Sort is almost always assumed to take n log n time. The problem is when you combine simpler operations, e.g. inserting elements at the front in a loop.

Andrei
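A minimal sketch of the trap Andrei alludes to - an innocent-looking primitive repeated in a loop (the container choices and the count are arbitrary, for illustration only):

---
import std.container.dlist : DList;

void main()
{
    enum n = 10_000;

    // Quadratic overall: each front insertion into a dynamic array shifts
    // every element already stored, so the whole loop is O(n^2).
    int[] arr;
    foreach (i; 0 .. n)
        arr = i ~ arr;

    // Linear overall: insertFront on a doubly-linked list is O(1),
    // so the same loop is O(n).
    DList!int list;
    foreach (i; 0 .. n)
        list.insertFront(i);
}
---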
Dec 04 2015
On 12/03/2015 10:46 PM, Walter Bright wrote:
> On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:
>> Now this primitive may have three complexities:
>>
>> * linear in the length of r (e.g. c is a singly-linked list)
>> * linear in the number of elements after r in the collection (e.g. c is an array)
>> * constant (c is a doubly-linked list)
>>
>> These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!
>
> Are you sure there are only 3 complexities? What if it's a self-balancing tree? I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?

Looks exaggerated, innit? The fact of the matter is people choose collections based on the complexity of their operations all the time. "I need to insert and remove at the front, so I'll use a list here." Or: "I need random access, I'll use a vector" etc. Not designing nomenclature for complexity leads to the usual design disasters such as providing a list with the same indexing operator as an array.

Turning that on its head, if we want to allow people to design collection-independent code and mix and match collections based only on their APIs, those APIs must reflect the complexity of operations. Collection-independent code is big in my design btw.

One idea is to have the very name reflect the operation. I'm thinking "insert" to mean "scoot over elements at the tail and insert the new element in the gap" and "link" to mean "create a new node and link it within this collection without scooting over anything". Then the relative costs will be implied.

Another idea is to only classify operations as "quick" (expected O(1) up to O(log n)) and "not quick" (everything worse). Then we only have two categories, and "quick" will be in the name of the respective methods.

Andrei
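A rough sketch of what the "the name itself implies the cost" idea could look like - hypothetical types, not the proposed collections API:

---
// "insert" scoots the tail over: linear in the number of elements after pos.
struct Array(T)
{
    T[] payload;

    void insert(size_t pos, T value)
    {
        payload.length += 1;
        foreach_reverse (i; pos .. payload.length - 1)
            payload[i + 1] = payload[i];
        payload[pos] = value;
    }
}

// "link" splices a new node in place: O(1), nothing is moved.
struct DoublyLinkedList(T)
{
    static struct Node { T value; Node* prev, next; }
    Node* head;

    void link(Node* after, T value)
    {
        auto n = new Node(value, after, after ? after.next : head);
        if (n.next) n.next.prev = n;
        if (after) after.next = n; else head = n;
    }
}
---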
Dec 04 2015
On Friday, 4 December 2015 at 13:58:50 UTC, Andrei Alexandrescu wrote:
> On 12/03/2015 10:46 PM, Walter Bright wrote:
>> On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:
>>> Now this primitive may have three complexities:
>
> Looks exaggerated, innit? The fact of the matter is people choose collections based on the complexity of their operations all the time. "I need to insert and remove at the front, so I'll use a list here." Or: "I need random access, I'll use a vector" etc.
>
> Andrei

My personal preference would be not to have the complexity in the names, as I prefer shorter/concise names. Typically when I am writing code using containers of any sort I will check the documentation to determine what the cost of the operations I need is and base my choice off of that. I would think (hope) most others do this too. However, I don't have a strong objection to what is being proposed.

Would you have an insertLogarithmic (insertInverseAckermann :o) too?
Dec 04 2015
On Friday, 4 December 2015 at 14:51:40 UTC, CraigDillabaugh wrote:
> My personal preference would be not to have the complexity in the names, as I prefer shorter/concise names. Typically when I am writing code using containers of any sort I will check the documentation to determine what the cost of the operations I need is and base my choice off of that. I would think (hope) most others do this too. However, I don't have a strong objection to what is being proposed.

std.container puts linear and/or stable in some of its names and then creates an alias to whichever one makes sense as the default, where the alias doesn't have linear or stable in the name. E.g. linearRemove becomes remove via an alias.

But actually using something like stableLinearRemove in code (or stable.linear.remove) becomes hideous _very_ quickly. No one is going to like it, and it really only benefits generic code, which most container code really isn't (and given how different containers tend to be in the complexity of their operations, I question that it even makes sense to use containers generically in many cases if you actually care about efficiency). Usually the genericity is achieved via iterators or ranges, not via the code that actually operates on the container.

So, while I applaud Andrei's attempt to make it so that containers can be used generically where appropriate (and having a consistent API across containers is a big win even if they're never used generically), I do fear that attempts to put the complexity and stability of the functions into the API are going to make it so unwieldy that it's useless (or at least, very un-user-friendly). Obviously, we'll have to wait and see what he comes up with, but what almost everyone is going to want is remove and insert and the like, not stable.linear.insert or stableLinearInsert or any variant on that.

- Jonathan M Davis
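For illustration, the aliasing pattern described above looks roughly like this - a simplified sketch of the idea, not the actual std.container source:

---
struct MyArray(T)
{
    T[] payload;

    // The "honest" name spells out the complexity...
    void linearRemove(size_t index)
    {
        foreach (i; index .. payload.length - 1)
            payload[i] = payload[i + 1];
        payload.length -= 1;
    }

    // ...and the short name users actually write is just an alias for it.
    alias remove = linearRemove;
}

unittest
{
    MyArray!int a;
    a.payload = [1, 2, 3];
    a.remove(1);                 // same call, shorter name
    assert(a.payload == [1, 3]);
}
---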
Dec 04 2015
On 12/4/2015 7:11 AM, Jonathan M Davis wrote:
> On Friday, 4 December 2015 at 14:51:40 UTC, CraigDillabaugh wrote:
>> My personal preference would be not to have the complexity in the names, as I prefer shorter/concise names. Typically when I am writing code using containers of any sort I will check the documentation to determine what the cost of the operations I need is and base my choice off of that. I would think (hope) most others do this too. However, I don't have a strong objection to what is being proposed.
>
> std.container puts linear and/or stable in some of its names and then creates an alias to whichever one makes sense as the default, where the alias doesn't have linear or stable in the name. E.g. linearRemove becomes remove via an alias.

Excessive use of aliases is a problem in and of itself - for example, trying to use a debugger with it, or examining the object code.

> But actually using something like stableLinearRemove in code (or stable.linear.remove) becomes hideous _very_ quickly. No one is going to like it,

I agree. Having such long names consisting of repeated use of the same words is a problem.

I suggested in the pseudo namespaces thread using template parameters to express characteristics, as in:

    remove!(stable, linear)

with sensible defaults so most of the time the user would just use:

    remove

Generic code could use the parameters.

Another alternative is to have attributes for the algorithm:

    @stable @linear remove() { ... }

and then generic code could test for those attributes.
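For illustration, here is one way the attribute variant could be spelled today with plain UDAs and std.traits.hasUDA - a hypothetical sketch with made-up tag names, not an agreed-upon design:

---
import std.traits : hasUDA;

// Hypothetical tag types; these are not existing Phobos attributes.
struct stable {}
struct linear {}
struct constant {}

struct List(T)
{
    @stable @linear void remove(size_t index) { /* ... */ }
    @stable @constant void linkFront(T value) { /* ... */ }
}

// Generic code can then require a cost class via the attributes.
void removeCheaply(C)(ref C c, size_t index)
    if (hasUDA!(C.remove, constant) || hasUDA!(C.remove, linear))
{
    c.remove(index);
}

unittest
{
    List!int l;
    removeCheaply(l, 0);   // compiles because List.remove is tagged @linear
}
---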
Dec 04 2015
On Friday, 4 December 2015 at 18:21:41 UTC, Walter Bright wrote:
> I suggested in the pseudo namespaces thread using template parameters to express characteristics, as in:
>
>     remove!(stable, linear)
>
> with sensible defaults so most of the time the user would just use:
>
>     remove

The nice thing about this is that it can be easy to specify which complexities an operation supports.

---
// or however complexity objects work...
void remove(complexity, T)(List!T list, size_t index)
    if (complexity >= Complexity.linear);

List l;
//...
l.remove(3);
l.remove!(Complexity.polynomial(2))(3);
l.remove!(Complexity.constant)(3); // fails; there's no template specialization
                                   // for this case because constant < linear.
---
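For reference, one way such complexity values could be made comparable so that a constraint like the one above compiles - a hypothetical sketch using an ordered enum, where the caller states the worst complexity they are willing to accept:

---
enum Complexity
{
    constant,     // O(1)
    logarithmic,  // O(log n)
    linear,       // O(n)
    linearithmic, // O(n log n)
    quadratic     // O(n^2)
}

// Removal from an array really costs O(n), so the caller must accept at least linear.
void remove(Complexity accepted, T)(ref T[] arr, size_t index)
    if (accepted >= Complexity.linear)
{
    foreach (i; index .. arr.length - 1)
        arr[i] = arr[i + 1];
    arr.length -= 1;
}

unittest
{
    int[] a = [1, 2, 3];
    a.remove!(Complexity.linear)(1);
    assert(a == [1, 3]);
    // a.remove!(Complexity.constant)(0); // would not compile: constant < linear
}
---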
Dec 04 2015
On Friday, 4 December 2015 at 18:21:41 UTC, Walter Bright wrote:
> On 12/4/2015 7:11 AM, Jonathan M Davis wrote:
>> std.container puts linear and/or stable in some of its names and then creates an alias to whichever one makes sense as the default, where the alias doesn't have linear or stable in the name. E.g. linearRemove becomes remove via an alias.
>
> Excessive use of aliases is a problem in and of itself - for example, trying to use a debugger with it, or examining the object code.

Oh, I agree. But if we end up with stuff like stableLinearRemove all over the place, it's better to have the aliases than not. However, it's far better IMHO to avoid having all of those long names in the first place.

> I suggested in the pseudo namespaces thread using template parameters to express characteristics, as in:
>
>     remove!(stable, linear)
>
> with sensible defaults so most of the time the user would just use:
>
>     remove
>
> Generic code could use the parameters.
>
> Another alternative is to have attributes for the algorithm:
>
>     @stable @linear remove() { ... }
>
> and then generic code could test for those attributes.

Either of those would be better than stableLinearRemove or stable.linear.remove. The UDAs would be more documentation friendly, though being able to pass around template arguments could be valuable for the cases where you're trying to enforce specific complexity requirements.

- Jonathan M Davis
Dec 04 2015
On Friday, 4 December 2015 at 22:17:21 UTC, Jonathan M Davis wrote:
> Either of those would be better than stableLinearRemove or stable.linear.remove. The UDAs would be more documentation friendly, though being able to pass around template arguments could be valuable for the cases where you're trying to enforce specific complexity requirements.

It should be possible to unify the template and UDA forms. Assuming a template supportsComplexity(func, complexity) that determines whether a function or method has the specified complexity UDA (or a "faster" one), it's not hard to implement something like this:

---
void removeWithComplexity(T, complexity)(T collection, size_t index)
    if (supportsComplexity!(T.remove, complexity))
{
    collection.remove(index);
}

List!int list;
//...
list.removeWithComplexity(Complexity.linear, 3);
---

Of course, this implementation has issues with collection operations which are defined as UFCS functions (as opposed to methods), but it should be possible to get around that issue.
Dec 04 2015
On Saturday, 5 December 2015 at 00:13:02 UTC, BLM768 wrote:
> list.removeWithComplexity(Complexity.linear, 3);

Uh, I mean list.removeWithComplexity!(Complexity.linear)(3).
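A compilable sketch of how the supportsComplexity check assumed above might be put together, tagging operations with values of an ordered enum used as UDAs - all names are hypothetical, and the corrected call syntax from the follow-up is used:

---
import std.traits : getUDAs;

enum Complexity { constant, logarithmic, linear, quadratic }

// True if func carries a Complexity UDA that is no worse than `accepted`.
template supportsComplexity(alias func, Complexity accepted)
{
    alias tags = getUDAs!(func, Complexity);
    static if (tags.length == 0)
        enum supportsComplexity = false;
    else
        enum supportsComplexity = tags[0] <= accepted;
}

struct List(T)
{
    @(Complexity.linear) void remove(size_t index) { /* ... */ }
}

void removeWithComplexity(Complexity accepted, T)(ref T collection, size_t index)
    if (supportsComplexity!(T.remove, accepted))
{
    collection.remove(index);
}

unittest
{
    List!int list;
    list.removeWithComplexity!(Complexity.linear)(3);      // fine: linear <= linear
    // list.removeWithComplexity!(Complexity.constant)(3); // would not compile
}
---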
Dec 04 2015
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:
> Consider the collections universe. So we have an imperative primitive like:
>
>     c.insertAfter(r, x)
>
> where c is a collection, r is a range previously extracted from c, and x is a value convertible to collection's element type. The expression imperatively inserts x in the collection right after r.
>
> Now this primitive may have three complexities:
>
> * linear in the length of r (e.g. c is a singly-linked list)
> * linear in the number of elements after r in the collection (e.g. c is an array)
> * constant (c is a doubly-linked list)
>
> These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!
>
> Andrei

Why not create the capability for the user to query the complexity:

    writeln(c.insertAfter(r, null).complexity);

If all algorithms implemented such a feature, one could have a compile-time argument to display the complexity of the algorithms along with profiling. Complexity is not important except when profiling. It's also generally static, depending on the types rather than on the values of those types.
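A minimal sketch of one way such a queryable property could be exposed: the operation returns a thin wrapper whose compile-time complexity tag tooling could then print. All names here are hypothetical, not part of any existing API.

---
import std.stdio : writeln;

enum Complexity { constant, linear }

// Wraps the returned range and carries the operation's cost as a compile-time tag.
struct OpResult(Complexity c, R)
{
    enum complexity = c;
    R range;
    alias range this;
}

struct DoublyLinkedList(T)
{
    // insertAfter on a doubly-linked list is O(1).
    auto insertAfter(R)(R r, T x)
    {
        // ... actual splicing elided ...
        return OpResult!(Complexity.constant, R)(r);
    }
}

unittest
{
    DoublyLinkedList!int c;
    int[] r;                                  // stand-in for a range over c
    writeln(c.insertAfter(r, 0).complexity);  // prints "constant"
}
---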
Dec 08 2015
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:
> These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!
>
> Andrei

I'm currently sitting in a software sector with some of the most demanding requirements on software stability and predictability there are. A key issue here is time-deterministic execution. I've seen no language that addresses this problem.

What about adding yet another code qualifier, say @deterministic, that either

- forbids non-time-deterministic execution paths such as `if`-statements, or, perhaps better,
- makes the compiler generate code that strives to be as deterministic as possible, for instance executing all branches of a conditional statement,

and of course checking this transitively over function calls. This would have to take into account direct or indirect recursions, which might become complicated.

I'm of course also aware of the inherent variations in microcode execution time for most CPU architectures of today. Despite this fact, safety-critical systems (in, for instance, aerospace) use these architectures and are in need of tools that can do as well as currently possible, at least at the source code level.

Destroy!
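For what it's worth, a hand-written illustration of the "execute all branches" idea as it is practiced today in constant-time code: a branch-free select, where both values are always computed and a mask picks the result. The hypothetical qualifier above would, in effect, ask the compiler to perform this kind of transformation automatically.

---
// Branch-free select: no data-dependent branch, both inputs always evaluated.
uint select(bool cond, uint a, uint b)
{
    // mask is all ones when cond is true, all zeros otherwise.
    uint mask = cast(uint) -(cast(int) cond);
    return (a & mask) | (b & ~mask);
}

unittest
{
    assert(select(true, 1, 2) == 1);
    assert(select(false, 1, 2) == 2);
}
---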
Mar 17 2016