digitalmars.D - Complexity nomenclature
- Andrei Alexandrescu (14/14) Dec 03 2015 Consider the collections universe. So we have an imperative primitive li...
- Jack Stouffer (6/8) Dec 03 2015 I don't see why. IMO, names should convey what the function does,
- Andrei Alexandrescu (3/9) Dec 03 2015 Complexity is part of the semantics, and in my design names and their
- Timon Gehr (5/15) Dec 03 2015 Which is sensible. But in any given context, some parts of the semantics...
- Andrei Alexandrescu (6/23) Dec 03 2015 Sure, and there will be provision for that. The collections framework
- Timon Gehr (6/30) Dec 03 2015 This only works if the module imports std.container. Also, I might be
- Andrei Alexandrescu (7/14) Dec 03 2015 Hmmmm... right, no provision for that right now. Need to think about it....
- Andrea Fontana (5/11) Dec 04 2015 I agree. It sounds like a bad idea.
- Andrei Alexandrescu (4/7) Dec 04 2015 Of course. The old name will still work because, remember, there's a
- Timon Gehr (7/21) Dec 03 2015 Regarding nomenclature, it would be awesome if we could call this
- Andrei Alexandrescu (2/4) Dec 03 2015 Instant understanding by everyone. -- Andrei
- Timon Gehr (2/6) Dec 03 2015 Well, no. "running time" is actually more descriptive.
- Andrei Alexandrescu (3/10) Dec 03 2015 http://en.cppreference.com/w/cpp/container/forward_list/insert_after
- Timon Gehr (2/13) Dec 03 2015 Yes, I know it is popular.
- Timon Gehr (2/13) Dec 03 2015 http://www.merriam-webster.com/dictionary/literally
- Xinok (10/17) Dec 03 2015 I've seen you mention this before, and while you may be correct,
- Timon Gehr (8/25) Dec 03 2015 I understand what is meant, but it is painful, much like when somebody
- Ola Fosheim Grøstad (3/6) Dec 04 2015 It will make static arrays look very good. All operations are
- Idan Arye (8/25) Dec 03 2015 The complexities of the operations is a property of the data
- Andrei Alexandrescu (3/8) Dec 03 2015 Your premise is right but you reach the negation of the correct
- Jack Stouffer (5/18) Dec 03 2015 How so? If a singly linked list and a doubly linked list have two
- Andrei Alexandrescu (12/25) Dec 03 2015 Took me a while to figure. There's a hierarchy of operations, e.g. if a
- Tofu Ninja (10/46) Dec 03 2015 This sounds really complicated to use? What are the benefits?
- Ola Fosheim Grøstad (8/13) Dec 04 2015 Yes, but can you export it reliably in generically composed
- Jakob Ovrum (9/16) Dec 04 2015 A generic algorithm that uses exponential time when given the
- Andrei Alexandrescu (2/17) Dec 04 2015 WORD! -- Andrei
- ZombineDev (18/68) Dec 04 2015 Ranges are a good example - they provide only the operations thay
- Andrei Alexandrescu (2/4) Dec 04 2015 Great idea, will keep it in mind. Thanks! -- Andrei
- Jonathan M Davis (16/21) Dec 04 2015 Yeah. If the primary reason to have stable and linear and whatnot
- Andrei Alexandrescu (2/4) Dec 04 2015 Only always :o). -- Andrei
- Andrei Alexandrescu (10/14) Dec 04 2015 Well look at what the cat dragged in:
- ref2401 (3/13) Dec 04 2015 Wow! It looks really great.
- Jack Stouffer (4/6) Dec 04 2015 This is a great compromise; I (and everyone else probably) would
- Andrei Alexandrescu (34/41) Dec 04 2015 Following up on this, I thought I'd define a little algebra for
- Andrei Alexandrescu (5/16) Dec 04 2015 The question mark at log * linlog is spurious; should be replaced with
- Timon Gehr (30/46) Dec 04 2015 I think that the exponents matter. E.g. n^1.5 vs n^2.
- Andrei Alexandrescu (4/65) Dec 04 2015 Nice, I'll continue with this. Thanks! I won't use expBase because no
- Timon Gehr (7/27) Dec 04 2015 This is probably sensible in the context of std.container.
- Andrei Alexandrescu (9/12) Dec 04 2015 I'm thinking we could add complexity annotations to range functions. For...
- Timon Gehr (6/19) Dec 04 2015 If only products should be expressible, this would maybe be reasonable
- Timon Gehr (2/4) Dec 04 2015 But I think not. O(n+m) is common.
- Andrei Alexandrescu (12/14) Dec 05 2015 That's actually pretty neat and easy to work around like this:
- BLM768 (17/28) Dec 05 2015 Well, we could see how CAS libraries handle this kind of stuff
- H. S. Teoh via Digitalmars-d (15/48) Dec 05 2015 Multi-term complexities arise from trivial graph algorithms. They are
- BLM768 (19/31) Dec 05 2015 True. I don't expect that very many people would want to specify
- Timon Gehr (2/4) Dec 05 2015 log(x^2) = 2 log x.
- BLM768 (2/3) Dec 05 2015 Why do log rules have to make everything so simple? ;)
- Andrei Alexandrescu (11/25) Dec 05 2015 Well you'd need multiple terms if you want to say things like,
- Timon Gehr (2/8) Dec 05 2015 Computer Algebra System, I assume.
- tsbockman (3/9) Dec 05 2015 https://en.wikipedia.org/wiki/Computer_algebra_system
- Timon Gehr (12/28) Dec 05 2015 Some upper bounds are incomparable, so there would not be a total order....
- Ola Fosheim Grøstad (7/9) Dec 05 2015 It is a problem in all cases as you usually dont have an optimal
- Timon Gehr (13/22) Dec 06 2015 Are you complaining that the given implementation does not support
- Timon Gehr (5/6) Dec 06 2015 Hm, this does not actually work too well. E.g., we want
- Ola Fosheim Grøstad (14/21) Dec 06 2015 I am saying that comparing bounds is not the same as comparing
- Timon Gehr (3/8) Dec 06 2015 Yes, that's what you said later down the post. It's completely unrelated...
- Ola Fosheim Grøstad (8/10) Dec 06 2015 i would assume that there would have to be a usecase for
- Andrei Alexandrescu (9/49) Dec 06 2015 Brilliant! My wife has a work emergency so I've been with the kids all d...
- Timon Gehr (22/42) Dec 06 2015 The implementation of power on BigO given does not actually work in
- Andrei Alexandrescu (27/55) Dec 06 2015 No matter, you may always use runtime assertions - after all it's all
- Timon Gehr (14/46) Dec 07 2015 Even better: I was wrong when I claimed it does not work.
- Andrei Alexandrescu (14/20) Dec 07 2015 I fail to see how no parens after log or "^" in lieu "^^" would make a
- Timon Gehr (13/29) Dec 07 2015 Neither have I attempted to show anything like that. All arguments in
- Andrei Alexandrescu (9/21) Dec 07 2015 These are still expressible without a DSL:
- Timon Gehr (21/48) Dec 07 2015 This just goes from one string to two strings and adds some noise on top...
- ZombineDev (9/11) Dec 08 2015 We can remove the use of strings if we tag walkLength with BigO:
- Jonny (116/141) Dec 10 2015 There are many "operations" that can be carried out on methods.
- Timon Gehr (7/18) Dec 06 2015 I have noticed another thing. The comparison operator is an
- Andrei Alexandrescu (2/20) Dec 06 2015 I think it's okay to leave N^^2 + N1 and N + N1^^2 unordered. -- Andrei
- Timon Gehr (3/26) Dec 07 2015 Yes, certainly. By complete, I mean it orders everything that can be
- Nordlöw (6/8) Mar 17 2016 What about a compile-time expression tree similar to how, for
- H. S. Teoh via Digitalmars-d (68/107) Dec 04 2015 [...]
- Andrei Alexandrescu (6/6) Dec 04 2015 I'll get back to this (on the phone) but this is incorrect:
- H. S. Teoh via Digitalmars-d (15/21) Dec 04 2015 OK, I guess by poly you mean n^k for k>1. I was lumping everything
- Dmitry Olshansky (5/19) Dec 04 2015 Was vaguely terrified reading this whole thread until hitting this gem.
- Walter Bright (3/5) Dec 04 2015 Yeah, I think it puts UDA on the map!
- Andrei Alexandrescu (8/14) Dec 04 2015 "Even better".
- Minas Mina (3/20) Dec 04 2015 Looks good :)
- Steven Schveighoffer (3/18) Dec 04 2015 This looks very excellent, nice work!
- H. S. Teoh via Digitalmars-d (8/26) Dec 04 2015 [...]
- Walter Bright (5/18) Dec 04 2015 This is very similar to the si units program Oskar Linde wrote back in 2...
- Dmitry Olshansky (5/21) Dec 05 2015 Better still - no more strings:
- Walter Bright (2/6) Dec 04 2015 Dang, you beat me to it!
- tn (7/37) Dec 04 2015 "I just want to insert an element. I don't care how long it
- Jakob Ovrum (4/9) Dec 04 2015 In the current std.container, linearInsert aliases to the
- tn (10/20) Dec 04 2015 I understand this.
- Andrei Alexandrescu (3/8) Dec 04 2015 When complexity information is not present, the name of the function may...
- Andrei Alexandrescu (4/9) Dec 04 2015 What would be the complexity assumption of "insert"?
- tn (11/22) Dec 04 2015 I don't, if I have a small container of constant size (or bounded
- Idan Arye (4/7) Dec 03 2015 I find it ironic that this thread has moved to discuss how to
- Walter Bright (4/11) Dec 03 2015 Are you sure there are only 3 complexities? What if it's a self-balancin...
- Minas Mina (10/30) Dec 03 2015 I agree -- putting the complexity (eh, running time) to the
- ZombineDev (10/43) Dec 04 2015 That's wrong. Imagine a sort method that calls to an API server
- Ola Fosheim Grøstad (8/11) Dec 04 2015 I am sorry to say this, but hard performance requirements require
- Ola Fosheim Grøstad (8/14) Dec 04 2015 And even then you cannot assume that it is real time as the
- ZombineDev (6/17) Dec 04 2015 Well, I agree, but I didn't say hard real-time, only performance
- Ola Fosheim Grøstad (17/21) Dec 04 2015 Yes, if you are writing for large scalable systems then it is
- ZombineDev (16/38) Dec 04 2015 I strongly agree with what you said earlier that typical
- Ola Fosheim Grøstad (12/19) Dec 04 2015 Well, if the algorithm can choose between containers it might be
- Jonathan M Davis (29/40) Dec 04 2015 Only when dealing with an arbitrary number of elements. O(n^2)
- Ola Fosheim Grøstad (49/64) Dec 04 2015 I think there was a misunderstanding about notation here. If we
- Andrei Alexandrescu (5/13) Dec 04 2015 Sort is almost always assumed to take n log n time. The problem is when
- Andrei Alexandrescu (21/40) Dec 04 2015 Looks exaggerated, innit? The fact of the matter is people choose
- CraigDillabaugh (11/21) Dec 04 2015 My personal preference would be not to have the complexity in the
- Jonathan M Davis (25/32) Dec 04 2015 std.container puts linear and/or stable in some of its names and
- Walter Bright (14/27) Dec 04 2015 Excessive use of aliases is a problem in and of itself - for example, tr...
- BLM768 (13/19) Dec 04 2015 The nice thing about this is that it can be easy to specify which
- Jonathan M Davis (11/31) Dec 04 2015 Oh, I agree. But if we end up with stuff like stableLinearRemove
- Jonny (10/27) Dec 08 2015 Why not create the capability to get the complexity by the user:
- Nordlöw (21/26) Mar 17 2016 I'm currently sitting in a software sector with one of the most
Consider the collections universe. So we have an imperative primitive like:

    c.insertAfter(r, x)

where c is a collection, r is a range previously extracted from c, and x is a value convertible to collection's element type. The expression imperatively inserts x in the collection right after r.

Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)
* linear in the number of elements after r in the collection (e.g. c is an array)
* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o).

Destroy!

Andrei
Dec 03 2015
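For illustration only, a minimal sketch of how complexity-bearing names could look on two hypothetical containers (the SList/DList names and signatures below are assumptions made for this example, not the actual design under discussion):

    struct DList(T)
    {
        static struct Range { /* elided */ }

        // Constant time: a doubly-linked list splices the new node in place.
        void insertAfter(V)(Range r, V x) { /* O(1) */ }
    }

    struct SList(T)
    {
        static struct Range { /* elided */ }

        // Linear in the length of r: a singly-linked list must walk r to its
        // last node before linking, so only the weaker name is offered.
        void linearInsertAfter(V)(Range r, V x) { /* O(length of r) */ }
    }

Generic code that is content with a linear upper bound calls linearInsertAfter and works with both; code that spells out insertAfter compiles only against containers that can honor the constant-time bound (given a forwarding scheme like the one discussed later in the thread).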
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:These complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/03/2015 08:37 PM, Jack Stouffer wrote:On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Complexity is part of the semantics, and in my design names and their semantics go hand in hand. -- AndreiThese complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/04/2015 03:18 AM, Andrei Alexandrescu wrote:On 12/03/2015 08:37 PM, Jack Stouffer wrote:Which is sensible. But in any given context, some parts of the semantics are usually abstracted away. Sometimes one wants to abstract over running times of called methods. ("This function calls this other function at most O(n) times.")On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Complexity is part of the semantics, and in my design names and their semantics go hand in hand. -- AndreiThese complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/03/2015 09:24 PM, Timon Gehr wrote:On 12/04/2015 03:18 AM, Andrei Alexandrescu wrote:Sure, and there will be provision for that. The collections framework will contain logic such as: "If collection type C implements insertAfter, then implement UFCS function linearInsertAfter that forwards to it". So if as user you're fine with linearInsertAfter, it'll work as an upper bound. -- AndreiOn 12/03/2015 08:37 PM, Jack Stouffer wrote:Which is sensible. But in any given context, some parts of the semantics are usually abstracted away. Sometimes one wants to abstract over running times of called methods. ("This function calls this other function at most O(n) times.")On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Complexity is part of the semantics, and in my design names and their semantics go hand in hand. -- AndreiThese complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/04/2015 03:35 AM, Andrei Alexandrescu wrote:On 12/03/2015 09:24 PM, Timon Gehr wrote:This only works if the module imports std.container. Also, I might be okay with arbitrary running times, even super-linear. There's also the case where I want to implement a wrapper around a generic container. I will then need to provide static introspection in order to expose the methods with the correct names.On 12/04/2015 03:18 AM, Andrei Alexandrescu wrote:Sure, and there will be provision for that. The collections framework will contain logic such as: "If collection type C implements insertAfter, then implement UFCS function linearInsertAfter that forwards to it". So if as user you're fine with linearInsertAfter, it'll work as an upper bound. -- AndreiOn 12/03/2015 08:37 PM, Jack Stouffer wrote:Which is sensible. But in any given context, some parts of the semantics are usually abstracted away. Sometimes one wants to abstract over running times of called methods. ("This function calls this other function at most O(n) times.")On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Complexity is part of the semantics, and in my design names and their semantics go hand in hand. -- AndreiThese complexities must be reflected in the name of the primitives.I don't see why. IMO, names should convey what the function does, not how it does it. Complexity is usually put in the function documentation in Phobos when it's not constant, especially for range based ones, I don't see a reason to change that.
Dec 03 2015
On 12/03/2015 09:52 PM, Timon Gehr wrote:This only works if the module imports std.container.Yeah, that's the way it works.Also, I might be okay with arbitrary running times, even super-linear.Hmmmm... right, no provision for that right now. Need to think about it. Maybe put them all in a "superlinear" bin. Names are hard.There's also the case where I want to implement a wrapper around a generic container. I will then need to provide static introspection in order to expose the methods with the correct names.Yep, again that's the way it's done. Andrei
Dec 03 2015
On Friday, 4 December 2015 at 01:37:33 UTC, Jack Stouffer wrote:
> On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:
>> These complexities must be reflected in the name of the primitives.
> I don't see why. IMO, names should convey what the function does, not how it does it.

I agree. It sounds like a bad idea.

And who knows, maybe someone will discover a more efficient way to implement "insertAfter" or the collection itself... We should change its name?
Dec 04 2015
On 12/04/2015 04:27 AM, Andrea Fontana wrote:And who knows, maybe someone will discover a more efficient way to implement "insertAfter" or the collection itself... We should change its name?Of course. The old name will still work because, remember, there's a hierarchy of operations; those that promise less automatically forward to those that promise more. -- Andrei
Dec 04 2015
On 12/04/2015 02:27 AM, Andrei Alexandrescu wrote:Consider the collections universe. So we have an imperative primitive like: c.insertAfter(r, x) where c is a collection, r is a range previously extracted from c, and x is a value convertible to collection's element type. The expression imperatively inserts x in the collection right after r. Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy! AndreiRegarding nomenclature, it would be awesome if we could call this "running time" instead of "complexity". Algorithms have running times and memory usages. Problems have (time and space) complexities. I know that calling running time 'complexity' is awfully popular outside of actual complexity theory papers. I don't really understand what calling it 'complexity' actually buys though.
Dec 03 2015
On 12/03/2015 08:55 PM, Timon Gehr wrote:I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On 12/04/2015 03:19 AM, Andrei Alexandrescu wrote:On 12/03/2015 08:55 PM, Timon Gehr wrote:Well, no. "running time" is actually more descriptive.I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On 12/03/2015 09:27 PM, Timon Gehr wrote:On 12/04/2015 03:19 AM, Andrei Alexandrescu wrote:http://en.cppreference.com/w/cpp/container/forward_list/insert_after AndreiOn 12/03/2015 08:55 PM, Timon Gehr wrote:Well, no. "running time" is actually more descriptive.I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On 12/04/2015 03:36 AM, Andrei Alexandrescu wrote:On 12/03/2015 09:27 PM, Timon Gehr wrote:Yes, I know it is popular.On 12/04/2015 03:19 AM, Andrei Alexandrescu wrote:http://en.cppreference.com/w/cpp/container/forward_list/insert_after AndreiOn 12/03/2015 08:55 PM, Timon Gehr wrote:Well, no. "running time" is actually more descriptive.I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On 12/04/2015 03:36 AM, Andrei Alexandrescu wrote:On 12/03/2015 09:27 PM, Timon Gehr wrote:http://www.merriam-webster.com/dictionary/literallyOn 12/04/2015 03:19 AM, Andrei Alexandrescu wrote:http://en.cppreference.com/w/cpp/container/forward_list/insert_after AndreiOn 12/03/2015 08:55 PM, Timon Gehr wrote:Well, no. "running time" is actually more descriptive.I don't really understand what calling it 'complexity' actually buys though.Instant understanding by everyone. -- Andrei
Dec 03 2015
On Friday, 4 December 2015 at 01:55:15 UTC, Timon Gehr wrote:Regarding nomenclature, it would be awesome if we could call this "running time" instead of "complexity". Algorithms have running times and memory usages. Problems have (time and space) complexities. I know that calling running time 'complexity' is awfully popular outside of actual complexity theory papers. I don't really understand what calling it 'complexity' actually buys though.I've seen you mention this before, and while you may be correct, the term "complexity" is so widely applied to algorithms that it's not worth going against the norm. For the vast majority of computer scientists, when they hear the term "time complexity", they immediately understand it as running time. In fact, what I've seen most often is that algorithms have complexities and problems fall into a *complexity class*. For example, just take a look at the Wikipedia page on time complexity: https://en.wikipedia.org/wiki/Time_complexity
Dec 03 2015
On 12/04/2015 03:34 AM, Xinok wrote:On Friday, 4 December 2015 at 01:55:15 UTC, Timon Gehr wrote:I understand what is meant, but it is painful, much like when somebody uses "literally" but actually means "figuratively". :-)Regarding nomenclature, it would be awesome if we could call this "running time" instead of "complexity". Algorithms have running times and memory usages. Problems have (time and space) complexities. I know that calling running time 'complexity' is awfully popular outside of actual complexity theory papers. I don't really understand what calling it 'complexity' actually buys though.I've seen you mention this before, and while you may be correct, the term "complexity" is so widely applied to algorithms that it's not worth going against the norm. For the vast majority of computer scientists, when they hear the term "time complexity", they immediately understand it as running time.In fact, what I've seen most often is that algorithms have complexities and problems fall into a *complexity class*. For example, just take a look at the Wikipedia page on time complexity: https://en.wikipedia.org/wiki/Time_complexityIt's a schizophrenic article. It mostly uses the standard terminology starting from this section: https://en.wikipedia.org/wiki/Time_complexity#Constant_time I guess the article has multiple authors, only some of which are complexity theorists.
Dec 03 2015
On Friday, 4 December 2015 at 01:55:15 UTC, Timon Gehr wrote:awfully popular outside of actual complexity theory papers. I don't really understand what calling it 'complexity' actually buys though.It will make static arrays look very good. All operations are O(1).
Dec 04 2015
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:Consider the collections universe. So we have an imperative primitive like: c.insertAfter(r, x) where c is a collection, r is a range previously extracted from c, and x is a value convertible to collection's element type. The expression imperatively inserts x in the collection right after r. Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy! AndreiThe complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).
Dec 03 2015
On 12/03/2015 09:10 PM, Idan Arye wrote:The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
Dec 03 2015
On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:On 12/03/2015 09:10 PM, Idan Arye wrote:How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
Dec 03 2015
On 12/3/15 10:29 PM, Jack Stouffer wrote:
> On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:
>> On 12/03/2015 09:10 PM, Idan Arye wrote:
>
> How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.

Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both.

Another way to look at it: in STL container-independent code is near impossible because different containers use the same signature for operations that are different (either wrt iterator invalidation or complexity). My design avoids that by giving distinct operations distinct names.

Andrei
Dec 03 2015
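A rough sketch of the forwarding idea described above (the hasMember test and the exact names are assumptions, not the framework's actual code):

    import std.traits : hasMember;

    // Any collection that provides the stronger, constant-time 'insert'
    // automatically satisfies callers that only ask for the linear bound.
    auto linearInsert(C, V)(ref C c, V x)
        if (hasMember!(C, "insert"))
    {
        return c.insert(x);   // forward to the cheaper primitive
    }

    // Containers that can only insert in linear time define linearInsert as
    // a member; member functions take precedence over this UFCS fallback.

Client code written against linearInsert then accepts both kinds of containers, while code written against insert statically rejects containers that cannot meet the constant-time promise.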
On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu wrote:On 12/3/15 10:29 PM, Jack Stouffer wrote:This sounds really complicated to use? What are the benefits? When would a generic algorithm even need to know the complexity of the container? Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both. Another way to look at it: in STL container-independent code is near impossible because different containers use the same signature for operations that are different (either wrt iterator invalidation or complexity). My design avoids that by giving distinct operations distinct names. AndreiOn 12/03/2015 09:10 PM, Idan Arye wrote:How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
Dec 03 2015
On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.Yes, but can you export it reliably in generically composed functions? Say you have a loop of N iterations calling a function that involves M operations, you want to export N*M, but then the caller needs to know what N and M are referring to, so you need some standard way of getting there. (e.g. N could be number of edges and M could be number of nodes or something).
Dec 04 2015
On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:When would a generic algorithm even need to know the complexity of the container?A generic algorithm that uses exponential time when given the "wrong" inputs is a hidden danger. This can easily happen when composing with linear algorithms. This naming scheme, also employed by std.container, prevents this.Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.If you don't care about the complexity, why are you using collections? Just use T[] and T[U]. Surely choosing a data structure is all about complexity (barring some details like optimizing for cache locality or small data structures).
Dec 04 2015
On 12/04/2015 03:41 AM, Jakob Ovrum wrote:On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:WORD! -- AndreiWhen would a generic algorithm even need to know the complexity of the container?A generic algorithm that uses exponential time when given the "wrong" inputs is a hidden danger. This can easily happen when composing with linear algorithms. This naming scheme, also employed by std.container, prevents this.Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.If you don't care about the complexity, why are you using collections? Just use T[] and T[U]. Surely choosing a data structure is all about complexity (barring some details like optimizing for cache locality or small data structures).
Dec 04 2015
On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu wrote:Ranges are a good example - they provide only the operations thay can efficiently implement. For example forward ranges could provide random access but at the cost of linear running time. std.containers provides the `elem in container` operation only if they can implement it in logarithmic time or faster. The fact that you can't use opIn for DList or Array is very good, because it prevents you from writing inefficient genric code. You're algorithm will manifest that it can only work with containers provide efficient operations and because of that you can be sure what its time complexity would be. You should choose a specific data structure only because you can efficiently implement the algorithm you have in mind with it. One of worst examples of the opposite is .Count extention method in .NET (which happens to have the same name as .Count member function of ICollection - one of the most used (often implicitly) interfaces), which has linear running time. The most horrible thing I have seen!On 12/3/15 10:29 PM, Jack Stouffer wrote:This sounds really complicated to use? What are the benefits? When would a generic algorithm even need to know the complexity of the container? Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both. Another way to look at it: in STL container-independent code is near impossible because different containers use the same signature for operations that are different (either wrt iterator invalidation or complexity). My design avoids that by giving distinct operations distinct names. AndreiOn 12/03/2015 09:10 PM, Idan Arye wrote:How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
Dec 04 2015
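A tiny sketch of that convention, with made-up container names (not std.container types):

    struct TreeSet(T)
    {
        // 'x in set' is offered because the lookup is O(log n).
        bool opBinaryRight(string op : "in")(T value) const
        {
            return lookup(value);                            // assumed O(log n) helper
        }
        private bool lookup(T value) const { return false; } // stub
    }

    struct ArrayBag(T)
    {
        T[] data;
        // No 'in' operator here: a linear scan has to be requested explicitly
        // (e.g. with std.algorithm.searching.canFind), keeping the O(n) cost
        // visible at the call site.
    }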
On 12/04/2015 01:05 AM, Tofu Ninja wrote:Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs.Great idea, will keep it in mind. Thanks! -- Andrei
Dec 04 2015
On Friday, 4 December 2015 at 13:59:41 UTC, Andrei Alexandrescu wrote:On 12/04/2015 01:05 AM, Tofu Ninja wrote:Yeah. If the primary reason to have stable and linear and whatnot in the names is so that introspection can be used on them, then UDAs will work fine for that. Where it's stickier is if you're just calling them in generic code, since then you have no protection against the complexity changing when the container being used changes. But pretty much no one is going to be wanting to use stuff like stableLinearInsert or stable.linear.insert in their code instead of insert. So, while having the complexity be part of the API has some serious advantages, it's not user-friendly for normal use, whereas the UDAs put you somewhere in between not having the complexity in the API at all and forcing everyone to type out the complexity every time they use a function. - Jonathan M DavisAlso maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs.Great idea, will keep it in mind. Thanks! -- Andrei
Dec 04 2015
On 12/04/2015 01:05 AM, Tofu Ninja wrote:When would a generic algorithm even need to know the complexity of the container?Only always :o). -- Andrei
Dec 04 2015
On 12/04/2015 01:05 AM, Tofu Ninja wrote:
> Also maybe a simpler idea would just be to annotate the the operations with there complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self documenting as well.

Well look at what the cat dragged in:

http://dpaste.dzfl.pl/711aecacc450

That's quite promising. The code is very preliminary and uses strings for typical complexity values (e.g. "constant", "linear", and later "loglinear" etc). I need to see how to integrate this whole idea. Also an unpleasant problem is overloading - when present, user code needs to specify which overload they're referring to.

Anyhow, this is really interesting. Thanks Tofu.

Andrei
Dec 04 2015
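The paste is the authoritative version; purely as a hedged approximation of the idea, a UDA-with-strings scheme might look roughly like this (the complexity type and the container names here are invented for illustration):

    import std.traits : getUDAs;

    struct complexity { string value; }   // "constant", "linear", "loglinear", ...

    struct MyList(T)
    {
        @complexity("constant") void insertFront(T x) { /* ... */ }
        @complexity("linear")   void insertBack(T x)  { /* ... */ }
    }

    // Client code can introspect the promised bound at compile time:
    static assert(getUDAs!(MyList!int.insertFront, complexity)[0].value == "constant");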
On Friday, 4 December 2015 at 16:06:25 UTC, Andrei Alexandrescu wrote:Well look at what the cat dragged in: http://dpaste.dzfl.pl/711aecacc450 That's quite promising. The code is very preliminary and uses strings for typical complexity values (e.g. "constant", "linear", and later "loglinear" etc). I need to see how to integrate this whole idea. Also an unpleasant problem is overloading - when present, user code needs to specify which overload they're referring to. Anyhow, this is really interesting. Thanks Tofu. AndreiWow! It looks really great.
Dec 04 2015
On Friday, 4 December 2015 at 16:06:25 UTC, Andrei Alexandrescu wrote:Well look at what the cat dragged in: http://dpaste.dzfl.pl/711aecacc450This is a great compromise; I (and everyone else probably) would love to see a prototype of this when it's ready.
Dec 04 2015
On 12/04/2015 11:06 AM, Andrei Alexandrescu wrote:
> On 12/04/2015 01:05 AM, Tofu Ninja wrote:

Following up on this, I thought I'd define a little algebra for complexities. It will be closed (no arbitrary expressions) and will support only the usually encountered complexities.

The idea is to allow functions to compute their own complexity in terms of complexities of their respective primitives.

Here's the algebra.

Terms:

1       = O(1)
log     = O(log n)
plog    = O((log n)^k)
sqrt    = O(sqrt n)
lin     = O(n)
linlog  = O(n * log n)
linplog = O(n * (log n)^k)
poly    = O(n^k)
exp     = O(2^n)

Ordering: Terms above are in increasing order.

Summation: x + y = max(x, y)

Product:

       | 1       log      plog     sqrt  lin      linlog   poly  exp
-------+------------------------------------------------------------
1      | 1       log      plog     sqrt  lin      linlog   poly  exp
log    | log     plog     plog     ?     linlog   ?        poly  exp
plog   | plog    plog     plog     ?     linplog  linplog  poly  exp
sqrt   | sqrt    ?        ?        lin   poly     poly     poly  exp
lin    | lin     linlog   linplog  poly  poly     poly     poly  exp
linlog | linlog  linplog  linplog  poly  poly     poly     poly  exp
poly   | poly    poly     poly     poly  poly     poly     poly  exp
exp    | exp     exp      exp      exp   exp      exp      exp   exp

I've left a few question marks for the tricky cases. Ideas?

Andrei
Dec 04 2015
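For concreteness, one hedged way such a closed algebra could be encoded (the names are illustrative, and the question-marked cells would need their own entries or a conservative widening):

    enum Cx { one, log, plog, sqrt, lin, linlog, linplog, poly, exp }

    // Summation: the faster-growing term absorbs the slower one.
    Cx add(Cx a, Cx b) { return a > b ? a : b; }

    // Product: a few representative rows of the table; cells not handled
    // here fall back conservatively to poly (exp is caught first).
    Cx mul(Cx a, Cx b)
    {
        with (Cx)
        {
            if (a == one) return b;
            if (b == one) return a;
            if (a == exp || b == exp) return exp;
            if (a == poly || b == poly) return poly;
            if ((a == lin && b == log)  || (a == log  && b == lin)) return linlog;
            if ((a == lin && b == plog) || (a == plog && b == lin)) return linplog;
            // ... remaining cells per the table above ...
            return poly;
        }
    }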
On 12/04/2015 01:37 PM, Andrei Alexandrescu wrote:
>        | 1       log      plog     sqrt  lin      linlog   poly  exp
> -------+------------------------------------------------------------
> 1      | 1       log      plog     sqrt  lin      linlog   poly  exp
> log    | log     plog     plog     ?     linlog   ?        poly  exp
> plog   | plog    plog     plog     ?     linplog  linplog  poly  exp
> sqrt   | sqrt    ?        ?        lin   poly     poly     poly  exp
> lin    | lin     linlog   linplog  poly  poly     poly     poly  exp
> linlog | linlog  linplog  linplog  poly  poly     poly     poly  exp
> poly   | poly    poly     poly     poly  poly     poly     poly  exp
> exp    | exp     exp      exp      exp   exp      exp      exp   exp
>
> I've left a few question marks for the tricky cases. Ideas?

The question mark at log * linlog is spurious; should be replaced with linplog. So the tricky cases are sqrt * log and sqrt * plog.

Andrei
Dec 04 2015
On 12/04/2015 07:37 PM, Andrei Alexandrescu wrote:
> Following up on this, I thought I'd define a little algebra
> ....
> It will be closed (no arbitrary expressions)
> ...

I think that the exponents matter. E.g. n^1.5 vs n^2. The algebra can be made more expressive while simplifying its implementation.

> The idea is to allow functions to compute their own complexity in terms of complexities of their respective primitives. Here's the algebra.
>
> Terms:
> 1       = O(1)
> log     = O(log n)
> plog    = O((log n)^k)
> sqrt    = O(sqrt n)
> lin     = O(n)
> linlog  = O(n * log n)
> linplog = O(n * (log n)^k)
> poly    = O(n^k)
> exp     = O(2^n)

Example alternative:

struct BigO{
    Fraction logExp;
    Fraction polyExp;
    Fraction expBase;

    int opCmp(BigO rhs){
        return tuple(expBase, polyExp, logExp).opCmp(tuple(rhs.expBase, rhs.polyExp, rhs.logExp));
    }
    BigO opBinary(string op:"*")(BigO rhs){
        return BigO(logExp + rhs.logExp, polyExp + rhs.polyExp, expBase * rhs.expBase);
    }
    // ...
}

Your cases then become:

O(1):             expBase=1, polyExp=0,             logExp=0;
O(log n):         expBase=1, polyExp=0,             logExp=1;
O((log n)^k):     expBase=1, polyExp=0,             logExp=k;
O(sqrt n):        expBase=1, polyExp=Fraction(1,2), logExp=0;
O(n):             expBase=1, polyExp=1,             logExp=0;
O(n * log n):     expBase=1, polyExp=1,             logExp=1;
O(n * (log n)^k): expBase=1, polyExp=1,             logExp=k;
O(n^k):           expBase=1, polyExp=k,             logExp=0;
O(2^n):           expBase=2, polyExp=0,             logExp=0;

Combinations just work, especially n^(1/2) * log n.
Dec 04 2015
Timon Gehr <timon.gehr gmx.ch> wrote:On 12/04/2015 07:37 PM, Andrei Alexandrescu wrote:Nice, I'll continue with this. Thanks! I won't use expBase because no operation leads to it; instead I'll just lump everything superpoly into one "fast growing" category.Following up on this, I thought I'd define a little algebra .... It will be closed (no arbitrary expressions) ...I think that the exponents matter. E.g. n^1.5 vs n^2. The algebra can be made more expressive while simplifying its implementation.The idea is to allow functions to compute their own complexity in terms of complexities of their respective primitives. Here's the algebra. Terms: 1 = O(1) log = O(log n) plog = O((log n)^k) sqrt = O(sqrt n) lin = O(n) linlog = O(n * log n) linplog = O(n * (log n)^k) poly = O(n^k) exp = O(2^n)Example alternative: struct BigO{ Fraction logExp; Fraction polyExp; Fraction expBase; bool opCmp(BigO rhs){ return tuple(expBase,polyExp,logExp).opCmp(tuple(rhs.expBase,polyExp,logExp)); } BigO opBinary(string op:"*")(BigO rhs){ return BigO(logExp+rhs.logExp,polyExp+rhs.polyExp,expBase*rhs.expBase); } // ... } Your cases then become: O(1): expBase=1, polyExp=0, logExp=0; O(log n): expBase=1, polyExp=0, logExp=1; O((log n)^k): expBase=1, polyExp=0, logExp=k; O(sqrt n): expBase=1, polyExp=Fraction(1,2), logExp=0; O(n): expBase=1, polyExp=1, logExp=0; O(n * log n): expBase=1, polyExp=1, logExp=1; O(n * (log n)^k): expBase=1, polyExp=1, logExp=k; O(n^k): expBase=1, polyExp=k, logExp=0; O(2^n): expBase=2, polyExp=0, logExp=0; Combinations just work, especially n^(1/2) * log n.
Dec 04 2015
On 12/04/2015 09:03 PM, Andrei Alexandrescu wrote:This is probably sensible in the context of std.container. Note that some containers only give amortized guarantees. E.g. in C++, calling push_back n times on a vector that starts out empty is O(n), but the worst case for a single push_back is Ω(n). Another question is how widely applicable BigO should become beyond std.container. E.g. some runtime bounds have multiple parameters.Nice, I'll continue with this. Thanks! I won't use expBase because no operation leads to it; instead I'll just lump everything superpoly into one "fast growing" category.Your cases then become: O(1): expBase=1, polyExp=0, logExp=0; O(log n): expBase=1, polyExp=0, logExp=1; O((log n)^k): expBase=1, polyExp=0, logExp=k; O(sqrt n): expBase=1, polyExp=Fraction(1,2), logExp=0; O(n): expBase=1, polyExp=1, logExp=0; O(n * log n): expBase=1, polyExp=1, logExp=1; O(n * (log n)^k): expBase=1, polyExp=1, logExp=k; O(n^k): expBase=1, polyExp=k, logExp=0; O(2^n): expBase=2, polyExp=0, logExp=0; Combinations just work, especially n^(1/2) * log n.
Dec 04 2015
On 12/04/2015 04:40 PM, Timon Gehr wrote:Another question is how widely applicable BigO should become beyond std.container.I'm thinking we could add complexity annotations to range functions. For example the complexity of map is linear etc.E.g. some runtime bounds have multiple parameters.Yah, I was wondering how necessary that'd be. One problem is how to align parameters of different algorithms, for example say one is n * m and the other is m log n. What if they swap the order of m and n? I guess some reasonable convention should take care of this. Otherwise, we'll need to use a hashtable mapping names to (exp, logExp) tuples. Andrei
Dec 04 2015
On 12/04/2015 11:57 PM, Andrei Alexandrescu wrote:
> On 12/04/2015 04:40 PM, Timon Gehr wrote:
>> Another question is how widely applicable BigO should become beyond std.container.
> I'm thinking we could add complexity annotations to range functions. For example the complexity of map is linear etc.
> ...

It depends on the mapped function.

If only products should be expressible, this would maybe be reasonable as well:

void foo(@(BigO.linear) int n, @(BigO.linear) int m);

But UDAs for parameters are not supported.

>> E.g. some runtime bounds have multiple parameters.
> Yah, I was wondering how necessary that'd be. One problem is how to align parameters of different algorithms, for example say one is n * m and the other is m log n. What if they swap the order of m and n? I guess some reasonable convention should take care of this. Otherwise, we'll need to use a hashtable mapping names to (exp, logExp) tuples.
>
> Andrei
Dec 04 2015
On 12/05/2015 04:24 AM, Timon Gehr wrote:But I think not. O(n+m) is common.If only products should be expressible,
Dec 04 2015
On 12/04/2015 10:24 PM, Timon Gehr wrote:
> void foo(@(BigO.linear) int n, @(BigO.linear) int m);
>
> But UDAs for parameters are not supported.

That's actually pretty neat and easy to work around like this:

void foo(int n, int m) @(BigOParam!2(BigO.linear, BigO.linear));

In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult.

I gave up on this for the time being. Ideas welcome.

Andrei
Dec 05 2015
On Saturday, 5 December 2015 at 20:48:16 UTC, Andrei Alexandrescu wrote:On 12/04/2015 10:24 PM, Timon Gehr wrote: In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult. I gave up on this for the time being. Ideas welcome. AndreiWell, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)... Really, though, with the more complex algorithms, you start running into the kinds of issues noted in the first reply to this article: http://www.perlmonks.org/?node_id=573138 Besides, is anyone actually going to specify that they need an algorithm that is O(log(n) + m^2) or whatever? Or would a function just take _two_ algorithms, assert that each is O(log(n)) and O(m^2), respectively, and then compose them itself? The complexities depend on the types of container anyway, so in general, you should get only multi-term big-O notation when you're using multiple containers (or Cartesian products of a container with itself or something like that). Can't we just make sure one container's insert() and the other container's sort() work within a certain complexity?
Dec 05 2015
On Sat, Dec 05, 2015 at 09:52:08PM +0000, BLM768 via Digitalmars-d wrote:On Saturday, 5 December 2015 at 20:48:16 UTC, Andrei Alexandrescu wrote:Multi-term complexities arise from trivial graph algorithms. They are not limited to the use of multiple containers. More precisely, the multiple terms arise because of the structure of the graph (being composed of nodes and edges); what the algorithms add are functions on nodes and edges. Unfortunately, once you have more than a single variable in your functions, the big-O expressions rapidly become inextricably complex, and can no longer map to nice abstractions like the reals + infinities + infinitesimals linear ordering. For example, two graph algorithms may be, respectively, O(V^2 + E) and O(V + E^2); it's difficult to judge based on that which one is "superior". T -- They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to KillOn 12/04/2015 10:24 PM, Timon Gehr wrote: In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult. I gave up on this for the time being. Ideas welcome. AndreiWell, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)... Really, though, with the more complex algorithms, you start running into the kinds of issues noted in the first reply to this article: http://www.perlmonks.org/?node_id=573138 Besides, is anyone actually going to specify that they need an algorithm that is O(log(n) + m^2) or whatever? Or would a function just take _two_ algorithms, assert that each is O(log(n)) and O(m^2), respectively, and then compose them itself? The complexities depend on the types of container anyway, so in general, you should get only multi-term big-O notation when you're using multiple containers (or Cartesian products of a container with itself or something like that). Can't we just make sure one container's insert() and the other container's sort() work within a certain complexity?
Dec 05 2015
On Saturday, 5 December 2015 at 22:56:23 UTC, H. S. Teoh wrote:Multi-term complexities arise from trivial graph algorithms. They are not limited to the use of multiple containers. More precisely, the multiple terms arise because of the structure of the graph (being composed of nodes and edges); what the algorithms add are functions on nodes and edges.True. I don't expect that very many people would want to specify an algorithmic complexity for those operations, though. It seems much more rigidly defined than for lists, arrays, etc. The question is not really about whether "complex complexities" will exist but whether the user has a practical reason to care about specifying them.Unfortunately, once you have more than a single variable in your functions, the big-O expressions rapidly become inextricably complex, and can no longer map to nice abstractions like the reals + infinities + infinitesimals linear ordering. For example, two graph algorithms may be, respectively, O(V^2 + E) and O(V + E^2); it's difficult to judge based on that which one is "superior".Well, if one variable is "capped" (or at least expected to be "small") it's easy enough to eliminate that term. Beyond that, there isn't necessarily a single ordering of big-O expressions, but many of them can be ordered relative to a single variable. For instance, O(n + m^2) is less complex than O(n^2 + m) with respect to n (and vice versa for m). It's trickier when expressions are more deeply nested, but if select one term (in this case, n), set all the others to be constant (since none of them depends on n), and evaluate the resulting expression tree, you should get something half-usable. Some simplifying rules may be useful. For instance, log(x^2) should approach log(x) as x approaches infinity, I think. (My calculus is a bit rusty.)
Dec 05 2015
On 12/06/2015 03:47 AM, BLM768 wrote:log(x^2) should approach log(x) as x approaches infinity, I think. (My calculus is a bit rusty.)log(x^2) = 2 log x.
Dec 05 2015
On Sunday, 6 December 2015 at 03:30:51 UTC, Timon Gehr wrote:log(x^2) = 2 log x.Why do log rules have to make everything so simple? ;)
Dec 05 2015
On 12/05/2015 04:52 PM, BLM768 wrote:Well, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)...CAS = ?Really, though, with the more complex algorithms, you start running into the kinds of issues noted in the first reply to this article: http://www.perlmonks.org/?node_id=573138 Besides, is anyone actually going to specify that they need an algorithm that is O(log(n) + m^2) or whatever? Or would a function just take _two_ algorithms, assert that each is O(log(n)) and O(m^2), respectively, and then compose them itself? The complexities depend on the types of container anyway, so in general, you should get only multi-term big-O notation when you're using multiple containers (or Cartesian products of a container with itself or something like that). Can't we just make sure one container's insert() and the other container's sort() work within a certain complexity?Well you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength). Part of the art here, I feel, is knowing when to stop going too crazy about this. At this point we have a nice, round system that's simple to understand and use. Making it more complex should come only with a corresponding growth in power. Andrei
Dec 05 2015
On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:On 12/05/2015 04:52 PM, BLM768 wrote:Computer Algebra System, I assume.Well, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)...CAS = ? ...
Dec 05 2015
On Saturday, 5 December 2015 at 23:55:16 UTC, Andrei Alexandrescu wrote:On 12/05/2015 04:52 PM, BLM768 wrote:https://en.wikipedia.org/wiki/Computer_algebra_systemWell, we could see how CAS libraries handle this kind of stuff (if there _is_ one which handles it)...CAS = ?
Dec 05 2015
On 12/05/2015 09:48 PM, Andrei Alexandrescu wrote:
> On 12/04/2015 10:24 PM, Timon Gehr wrote:
>> void foo(@(BigO.linear) int n, @(BigO.linear) int m);
>>
>> But UDAs for parameters are not supported.
> That's actually pretty neat and easy to work around like this:
>
> void foo(int n, int m) @(BigOParam!2(BigO.linear, BigO.linear));
>
> In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult. ...

Some upper bounds are incomparable, so there would not be a total order. But that is not a problem.

> I gave up on this for the time being. Ideas welcome. ...

The next step up the expressiveness scale would be to have a sum-of-products representation.

Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly):

http://dpaste.dzfl.pl/d1512905accd

I think this general approach is probably close to the sweet spot. (The implementation is not feature-complete yet, though. It would be nice if it supported automatically computing a new asymptotic runtime bound from asymptotic bounds on the arguments.)
Dec 05 2015
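The paste is the actual proof of concept; as a rough paraphrase of the kind of representation it uses (simplified, with invented names, and without the fractional exponents and dominance pruning of the real code):

    struct Atom { string var; int polyExp; int logExp; }  // var^polyExp * log(var)^logExp
    struct Term { Atom[] atoms; }                          // product of atoms
    struct BigO { Term[] terms; }                          // sum of such products

    // O(n + m log n) would be represented as:
    //   BigO([ Term([Atom("n", 1, 0)]),
    //          Term([Atom("m", 1, 0), Atom("n", 0, 1)]) ])
    //
    // '*' distributes term lists pairwise and merges atoms with the same
    // variable; '+' concatenates term lists and drops dominated terms.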
On Sunday, 6 December 2015 at 03:24:24 UTC, Timon Gehr wrote:Some upper bounds are incomparable, so there would not be a total order. But that is not a problem.It is a problem in all cases as you usually dont have an optimal bound. And with your approach that will most certainly be guaranteed to happen. Comparing bounds does not mean you are comparing running time. O(1) implies O(f(x)), O(N) implies O(N^2). You can also get tighter bounds for specific input models.
Dec 05 2015
On 12/06/2015 08:59 AM, Ola Fosheim Grøstad wrote:On Sunday, 6 December 2015 at 03:24:24 UTC, Timon Gehr wrote:Are you complaining that the given implementation does not support 'min', or what are you trying to say here?Some upper bounds are incomparable, so there would not be a total order. But that is not a problem.It is a problem in all cases as you usually dont have an optimal bound.And with your approach that will most certainly be guaranteed to happen.How is that specific to my approach? I only showed a more expressive BigO implementation.Comparing bounds does not mean you are comparing running time. ...BigO represents a set of functions. Comparing BigO checks for subset inclusion.O(1) implies O(f(x)), O(N) implies O(N^2).For our purposes, O(f) = { g | ∃c. ∀x⃗. |g(x⃗)| ≤ c·|f(x⃗)|+c }. O(1) ⊆ O(f(x)), O(N) ⊆ O(N²). <= checks ⊆. == checks =. (The final implementation should use exact fractions instead of doubles.)You can also get tighter bounds for specific input models.Yes, you can.
Dec 06 2015
On 12/06/2015 03:39 PM, Timon Gehr wrote:For our purposes, O(f) = { g | ∃c. ∀x⃗. |g(x⃗)| ≤ c·|f(x⃗)|+c }.Hm, this does not actually work too well. E.g., we want O(n+m) ⊆ O(n log m + m log n). This breaks down with that definition if we e.g. fix m=1 and let n vary. Anyway, I think this can be fixed.
Dec 06 2015
On Sunday, 6 December 2015 at 14:39:05 UTC, Timon Gehr wrote:Are you complaining that the given implementation does not support 'min', or what are you trying to say here?I am saying that comparing bounds is not the same as comparing running time of implemented algorithms. Insertion sort is both O(n^2) and O(n^3), but if you run it on a sorted array where each element has been swapped with neighbouring elements 16 times, then it is O(N). So these derived bounds are too loose to be useful; generic algorithms cannot make use of them beyond the trivial case.BigO represents a set of functions. Comparing BigO checks for subset inclusion.But what can you use it for? When you compose algorithms and even run an optimizer over it, then combining an O(N^2) with an O(N) can turn into O(1). You need advanced compiler support for this to be valuable.Exactly, and when you compose/combine algorithms you often end up constraining the input model.You can also get tighter bounds for specific input models.Yes, you can.
Dec 06 2015
On 12/06/2015 10:35 PM, Ola Fosheim Grøstad wrote:On Sunday, 6 December 2015 at 14:39:05 UTC, Timon Gehr wrote:Yes, that's what you said later down the post. It's completely unrelated to the sentence you claimed was false.Are you complaining that the given implementation does not support 'min', or what are you trying to say here?I am saying that comparing bounds is not the same as comparing running time of implemented algorithms.
Dec 06 2015
On Sunday, 6 December 2015 at 23:49:00 UTC, Timon Gehr wrote:Yes, that's what you said later down the post. It's completely unrelated to the sentence you claimed was false.I would assume that there would have to be a use case for something added to a standard library. Based on the presented use case it is like using the classifications "younger than 100" and "younger than 16", applying them randomly to individuals of the same age and using the classifications for making decisions about whether they should be allowed to see adult movies or not. Putting an ordering on the classification is in that case useless.
Dec 06 2015
Timon Gehr <timon.gehr gmx.ch> wrote:On 12/05/2015 09:48 PM, Andrei Alexandrescu wrote:On 12/04/2015 10:24 PM, Timon Gehr wrote:Some upper bounds are incomparable, so there would not be a total order. But that is not a problem.Brilliant! My wife has a work emergency so I've been with the kids all day, but here's what can be done to make this simpler. Use D parsing and eliminate the whole parsing routine. Add, multiply and power are all defined, so you only need log of BigO. Define global constants K, N, and N1 through N7. K is constant complexity; all others are free variables. Then complexities are regular D expressions, e.g. BigO(N^2 * log(N)). On the phone, sorry for typos.void foo( (BigO.linear) int n, (BigO.linear) int m); But UDAs for parameters are not supported.That's actually pretty neat and easy to work around like this: void foo(int n, int m) (BigOParam!2(BigO.linear, BigO.linear); In fact I went through the implementation but soon I hit a wall: what's the _relationship_ between the two growths? It may be the sum O(m + n) but also the product O(m * n). So the operator must be encoded as well. Then what do we do with more complex relationships like O((m + n) log n) etc. Then once you get to some reasonable formula, what's the ordering on top of these complexities? Probably difficult. ...I gave up on this for the time being. Ideas welcome. ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. (The implementation is not feature-complete yet, though. It would be nice if it supported automatically computing a new asymptotic runtime bound from asymptotic bounds on the arguments.)
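(For illustration only, a hypothetical single-variable sketch of the "just use D expressions" idea: with a couple of predefined constants and operator overloads, complexities become ordinary D expressions evaluated at compile time, and no parser is needed. This collapses (log N)^k information into a single exponent and supports only one variable.)

struct Cx
{
    double npow = 0, logpow = 0;      // stands for N^^npow * log(N)^^logpow

    Cx opBinary(string op : "*")(Cx rhs)
    {
        return Cx(npow + rhs.npow, logpow + rhs.logpow);
    }
    Cx opBinary(string op : "^^")(double e)
    {
        return Cx(npow * e, logpow * e);
    }
}

enum K = Cx(0, 0);                    // O(1)
enum N = Cx(1, 0);                    // O(N)
Cx log(Cx x) { return Cx(0, 1); }     // crude: log of any polynomial is ~ log N

enum quadratic = N ^^ 2;              // O(N^2)
enum linlog    = N * log(N);          // O(N log N)
static assert(quadratic.npow > linlog.npow); // N^2 dominates N log N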
Dec 06 2015
On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:The implementation of power on BigO given does not actually work in general (there should have been an enforcement that makes sure there is only one summand). I'll think about whether and how it can be made to work for multiple summands.Brilliant! My wife has a work emergency so I've been with the kids all day, but here's what can be done to make this simpler. Use D parsing and eliminate the whole parsing routine. Add multiply and power are all defined so you only need log of bigo. ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. ...Define global constants K, N, and N1 through N7. K is constant complexity all others are free variables. Then complexities are regular D expressions e.g BigO(N^2 * log(N)). On a the phone sry typos.I generally tend to avoid DSLs based solely on operator overloading, as they don't always work and hence are awkward to evolve. Here, the biggest current nuisance is that the variable names are not very descriptive. If we'll go with a log(BigO) function, we possibly want to make BigO closed under log without approximating iterated logarithms: struct Term{ Variable n; Fraction[] exponents; // exponents[0] is exponent of n, // exponents[1] is exponent of log n, // exponents[2] is exponent of log log n, ... } Then O(log(f^c*g^d)) = O(log(f)+log(g)) = O(log(f+g)) [1], and hence every BigO has a well-defined logarithm. [1] O(log(f+g)) ⊆ O(log(f*g)) = O(log(f)+log(g)). O(log(f)+log(g)) ⊆ O(max(log(f),log(g))) = O(log(max(f,g))) ⊆ O(log(f+g)).
Dec 06 2015
On 12/06/2015 06:21 PM, Timon Gehr wrote:The implementation of power on BigO given does not actually work in general (there should have been an enforcement that makes sure there is only one summand). I'll think about whether and how it can be made to work for multiple summands.No matter, you may always use runtime assertions - after all it's all during compilation. That's the beauty of it!The bright side is you get to standardize names. If you limit names to K, N, and N1 through N7 then you can always impose to APIs the meaning of these. Another bright side is people don't need to learn a new, juuust slightly different grammar for complexity expressions - just use D. For example the grammar you defined allows log n without parens - what's the priority of log compared to power etc? Why is power ^ in this notation and ^^ in D? All these differences without a distinction are gratuitous. Just use D.Define global constants K, N, and N1 through N7. K is constant complexity all others are free variables. Then complexities are regular D expressions e.g BigO(N^2 * log(N)). On a the phone sry typos.I generally tend to avoid DSLs based solely on operator overloading, as they don't always work and hence are awkward to evolve. Here, the biggest current nuisance is that the variable names are not very descriptive.If we'll go with a log(BigO) function, we possibly want to make BigO closed under log without approximating iterated logarithms: struct Term{ Variable n; Fraction[] exponents; // exponents[0] is exponent of n, // exponents[1] is exponent of log n, // exponents[2] is exponent of log log n, ... } Then O(log(f^c*g^d)) = O(log(f)+log(g)) = O(log(f+g)) [1], and hence every BigO has a well-defined logarithm. [1] O(log(f+g)) ⊆ O(log(f*g)) = O(log(f)+log(g)). O(log(f)+log(g)) ⊆ O(max(log(f),log(g))) = O(log(max(f,g))) ⊆ O(log(f+g)).Yah, the stuff under log must be restricted. Here's the grammar I'm toying with: Atom ::= K | N | N1 | ... | N7 SimpleExp ::= SimpleTerm ('+' SimpleTerm)* SimpleTerm ::= SimpleFactor ('*' SimpleFactor)* SimpleFactor ::= Atom ('^^' double)? | '(' SimpleExp ')' BigO ::= Term ('+' Term)* Term ::= SimpleFactor ('*' 'log' '(' SimpleExp ')' ('^^' double)?)? (I used regex notations for "optional" and "zero or more".) This is expressible with D's native operations (so no need for custom parsing) and covers, I think, what we need. It could be further simplified if we move some of the grammar's restrictions to runtime (e.g. no need for SimpleXxx, some expressions can be forced to be simple during "runtime"). Andrei
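(For concreteness, a few illustrative complexities this grammar admits - and which, given suitable operator overloads on the predefined constants, are also legal D expressions:

  K                                // O(1)
  N ^^ 2 + N1                      // O(n^2 + m)
  N * log(N)                       // O(n log n)
  (N + N1) * log(N + N1) ^^ 2      // O((n + m) * (log(n + m))^2)

Note the grammar forces parentheses after log and allows a fractional exponent after ^^.)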
Dec 06 2015
On 12/07/2015 02:05 AM, Andrei Alexandrescu wrote:On 12/06/2015 06:21 PM, Timon Gehr wrote:Even better: I was wrong when I claimed it does not work. For 0<=x, it actually works as-is: O((f+g)^x) = O(max(f,g)^x) = O(max(f^x,g^x)) = O(f^x+g^x).The implementation of power on BigO given does not actually work in general (there should have been an enforcement that makes sure there is only one summand). I'll think about whether and how it can be made to work for multiple summands.No matter, you may always use runtime assertions - after all it's all during compilation. That's the beauty of it! ...Well, how?The bright side is you get to standardize names. If you limit names to K, N, and N1 through N7 then you can always impose to APIs the meaning of these. ...Define global constants K, N, and N1 through N7. K is constant complexity all others are free variables. Then complexities are regular D expressions e.g BigO(N^2 * log(N)). On a the phone sry typos.I generally tend to avoid DSLs based solely on operator overloading, as they don't always work and hence are awkward to evolve. Here, the biggest current nuisance is that the variable names are not very descriptive.Another bright side is people don't need to learn a new, juuust slightly different grammar for complexity expressions - just use D. For example the grammar you defined allows log n without parens - what's the priority of log compared to power etc?There is no priority. The parser explicitly rejects log n^x. :-)Why is power ^ in this notation and ^^ in D?Because ^ is taken for bitwise xor in D due to backwards-compatibility constraints.All these differences without a distinction are gratuitous. Just use D. ...D does not allow overloading of syntax to the extent necessary to make similar things really pleasant in the long run, and it has been repeatedly argued that this is a good thing; that custom parsing should be used instead. It is easy to align the parser with D syntax. Anyway, syntax is not the problem here, and the implementation can be downgraded to not support parsing and/or proper names at any point.
Dec 07 2015
On 12/7/15 5:14 AM, Timon Gehr wrote:D does not allow overloading of syntax to the extent necessary to make similar things really pleasant in the long run, and it has been repeatedly argued that this is a good thing; that custom parsing should be used instead. It is easy to align the parser with D syntax. Anyway, syntax is not the problem here, and the implementation can be downgraded to not support parsing and/or proper names at any point.I fail to see how no parens after log or "^" in lieu of "^^" would make a positive difference. What would be a few examples of things that won't work pleasantly enough? I'm not sure whether the DSL argument is well applied here. Clearly using D expressions for e.g. regex or SQL syntax would be best avoided in favor of a DSL. In this case we're defining an algebra over restricted expressions, which are a subset of the usual mathematical expressions that D's syntax already emulates. Looks like a debate on whether to choose one standard language vs. an obscure one (in this case invented ad-hoc) is starting. This is deja vu all over again :o). I hope you won't mind if I give your idea a slightly different angle. Andrei
Dec 07 2015
On 12/07/2015 02:46 PM, Andrei Alexandrescu wrote:On 12/7/15 5:14 AM, Timon Gehr wrote:Neither have I attempted to show anything like that. All arguments in this debate are obvious and boring, so no need to discuss it.D does not allow overloading of syntax to the extent necessary to make similar things really pleasant in the long run, and it has been repeatedly argued that this is a good thing; that custom parsing should be used instead. It is easy to align the parser with D syntax. Anyway, syntax is not the problem here, and the implementation can be downgraded to not support parsing and/or proper names at any point.I fail to see how no parens after log or "^" in lieu "^^" would make a positive difference.What would be a few examples of things that won't work pleasantly enough? ...You gave this example: http://en.cppreference.com/w/cpp/container/forward_list/insert_after 1-2) BigO("1") 3) BigO("count") 4) BigO("distance(first,last)") 5) BigO("ilist.size()") There's also this: On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:Well you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength).BigO("array[].walkLength + r.walkLength"); BigO("log(array[].walkLength) * r.walkLength");
Dec 07 2015
On 12/07/2015 09:43 AM, Timon Gehr wrote:1-2) BigO("1") 3) BigO("count") 4) BigO("distance(first,last)") 5) BigO("ilist.size()") There's also this: On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:These are still expressible without a DSL: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc. Somewhat independently of DSL or not: At this point I'm unclear whether supporting free variables with arbitrary names is a good thing. The key to unleashing the power of BigO is to compute it when combining functions, and the names seem to be specific to the function and therefore not easy to combine. AndreiWell you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength).BigO("array[].walkLength + r.walkLength"); BigO("log(array[].walkLength) * r.walkLength");
Dec 07 2015
On 12/07/2015 05:15 PM, Andrei Alexandrescu wrote:On 12/07/2015 09:43 AM, Timon Gehr wrote:This just goes from one string to two strings and adds some noise on top of it. Also, now, what is D and what is in a string is arbitrary. BigO(Var.array[].walkLength + Var.r.walkLength);1-2) BigO("1") 3) BigO("count") 4) BigO("distance(first,last)") 5) BigO("ilist.size()") There's also this: On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:These are still expressible without a DSL: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc. ...Well you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength).BigO("array[].walkLength + r.walkLength"); BigO("log(array[].walkLength) * r.walkLength");Somewhat independently of DSL or not:Yes, notation does not matter at this stage.At this point I'm unclear whether supporting free variables with arbitrary names is a good thing.Restricting the set of names to 8 predefined ones does not simplify anything. It just means that a mapping of predefined names to real names has to be carefully managed manually and clashes have to be avoided, all while limiting the value of BigO as documentation.The key to unleashing the power of BigO is to compute it when combining functions, and the names seem to be specific to the function and therefore not easy to combine. ...Just substitute something else for those names to get rid of them. That is how passing of parameters is handled. Probably we should get some actual examples where combination should work to see what could be done. If you have: void foo(int n,int m) BigO("n*m^2"){ ... } Something like this is easy to implement: // compute runtime bound from bounds on parameters: // first expression passed is O(x^3) and second expression passed // is O(log(y)^(1/2)). enum bound = getRuntimeBound!foo(BigO("x^3"),BigO("log(y)^(1/2)")); static assert(bound == BigO("x^3*log(y)")); Note how the end user does not need to worry much about names.
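(A tiny self-contained sketch of how that substitution step could go for a single product term; real bounds are sums of such terms, and all names here are made up.)

// "term" maps each parameter of the callee to its exponent in the callee's
// declared bound; "argBounds" maps each parameter to a product term over the
// caller's variables. Substitution multiplies exponents and collects them.
double[string] substitute(double[string] term, double[string][string] argBounds)
{
    double[string] result;
    foreach (param, exp; term)
        foreach (var, e; argBounds[param])
            result[var] = result.get(var, 0.0) + e * exp;
    return result;
}

unittest
{
    // foo's bound is n * m^^2; the caller passes n in O(x^^3) and
    // m in O(log(y)^^0.5), giving the combined bound x^^3 * log(y).
    auto combined = substitute(
        ["n": 1.0, "m": 2.0],
        ["n": ["x": 3.0], "m": ["log(y)": 0.5]]);
    assert(combined == ["x": 3.0, "log(y)": 1.0]);
}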
Dec 07 2015
On Monday, 7 December 2015 at 16:15:46 UTC, Andrei Alexandrescu wrote:These are still expressible without a DSL: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc.We can remove the use of strings if we tag walkLength with BigO: // this: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) // becomes this: (complexity!(array[].walkLength) + complexity!(r.walkLength)) Or if we rename complexity to bigO: (bigO!(array[].walkLength) + bigO!(r.walkLength))
Dec 08 2015
On Monday, 7 December 2015 at 16:15:46 UTC, Andrei Alexandrescu wrote:On 12/07/2015 09:43 AM, Timon Gehr wrote:There are many "operations" that can be carried out on methods. Many are composable and decomposable in the exact same way and can be generalized as an algorithm: let f,f1,f2, etc be functions, if f = f1 + f2 then L(f) = L(f1) + L(f2) Hence if L is a linear operator, this always holds. Addition(+) is linear, BigO is linear. Composing linear operators is linear(but it doesn't have an inverse like addition). If D supported some generic way to handle such linear operators acting on methods, with calls to other methods as decomposing the method, one could do many cool things quite easily in D. BigO would just be one. e.g., void foo() { bar1(); bar2(); for(auto i = 1; i < 100; i++) for(auto j = 1; j < 100; j++) bar3(); } then we can see L(foo) = L(bar1) + L(bar2) + X(n^2)*L(bar3) (Here, X(n^2) is because we have a double nested for loop which multiples the complexity by n^2 for big notation. To be more precise, we should refactor all code inside the function in such a way to make things precise, i.e., void foo() { bar1(); bar2(); } void fooTemp() { for(auto i = 1; i < 100; i++) for(auto j = 1; j < 100; j++) bar3(); } which then gives the more precise decomposition as L(foo) = L(bar1) + L(bar2) + L(fooTemp) The main point of all this, is that L can be many things. If one has the complete hierarchical call stack(with root = Main) then L can be scene as recursively working on the tree. In this case, addition would rather be "insertion into tree". We could easily check relations such as if foo is called, calls, or disjoint from bar. We can, as already stated, implement complexity analysis automatically: BigO - The sup of the complexity of a function assuming all unknowns(loops, user input, unknown sub-function calls) have maximal complexity. bigO - The sup of the complexity of all functions assuming all unknowns have 0 complexity. LittleO - The inf of complexity.... litleO - ... The point, by simply implementing a decomposable structure on the "call hierarchy", one can implement all kinds of cool stuff. If it is exposed to D at run time, then even more amazing things can happen. e.g., one could implement a real time profiler. L would be an "operator" that simply sends the current ticks when first called and when returning. The "addition" is really just code that does the computations on the differences it is getting. Imagine hitting some hot key and your current app showing a tree(something like a fractal tree) where each node is a function call and sub nodes are functions that are the functions that the function calls("uses"). Color the tree based on the use rate(count how many times the function is called, or even accumulate hierarchically) or the execution time, or even BigO. You'll then see a visual of where the program is active the most. It might sort of look like the heat distribution map of the earth. I realize it is more complex to implement than I'm making it out, but it's basically tree traversal and hooks stuff. Once you start allowing meta coding, the power becomes magnified: e.g. (psuedo-code) // Define generic meta code function composer(the + in the algorithm) for BigO function!composer!BigO(functions) { let f1, ..., fn = functions return function!composer!BigO(f1) + ... 
+ function!composer!BigO(f2) // Would need to separate parse f1 in such a way that it is composed of only calls to functions or none calls(the loop stuff I mentioned above) } // The Linear operator BigO that returns the complexity of a function f function!composer!BigO(f) { return complexity(f); } // Explicitly specify complexity for foo void foo() [function!composer!BigO() { return 0; } { ... } ------------------------------------ The above represents a mock way to allow the programmer to "hook" into the compilers call hierarchy structure. Obviously it would require some work for actual implementation. But once such a feature is implemented, the programmer can do all sorts of things. Implement "attributes"(not needed, of course), real time profiling, BigO, Code replacement/Self Modifying Programs(essentially change how a function behaves), etc... Of course, maybe we are starting to get into AI here? ;) (e.g., one could use multiple similar functions in a program for performance enhancements, but implement a few lines of code to have those functions reconfigure themselves to use the most optimal one for performance reasons) Also, why stop at functions? what about all sub-structures in the program? We could work on classes, structs, etc...1-2) BigO("1") 3) BigO("count") 4) BigO("distance(first,last)") 5) BigO("ilist.size()") There's also this: On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:These are still expressible without a DSL: BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc. Somewhat independently of DSL or not: At this point I'm unclear whether supporting free variables with arbitrary names is a good thing. The key to unleashing the power of BigO is to compute it when combining functions, and the names seem to be specific to the function and therefore not easy to combine. AndreiWell you'd need multiple terms if you want to say things like, "inserting a range into an array is O(array[].walkLength + r.walkLength)." When you insert a range in a binary search tree, the complexity would be O(log(array[].walkLength) * r.walkLength).BigO("array[].walkLength + r.walkLength"); BigO("log(array[].walkLength) * r.walkLength");
Dec 10 2015
On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:I have noticed another thing. The comparison operator is an underapproximation (it sometimes returns NaN when ordering would actually be possible). E.g., O(n·m) ⊆ O(n²+m²), because n·m ≤ n²+m². Interesting. It would be nice if the final version had a complete decision procedure for ⊆.Brilliant! ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. ...
Dec 06 2015
On 12/06/2015 07:50 PM, Timon Gehr wrote:On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:I think it's okay to leave N^^2 + N1 and N + N1^^2 unordered. -- AndreiI have noticed another thing. The comparison operator is an underapproximation (it sometimes returns NaN when ordering would actually be possible). E.g., O(n·m) ⊆ O(n²+m²), because n·m ≤ n²+m². Interesting. It would be nice if the final version had a complete decision procedure for ⊆.Brilliant! ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. ...
Dec 06 2015
On 12/07/2015 02:07 AM, Andrei Alexandrescu wrote:On 12/06/2015 07:50 PM, Timon Gehr wrote:Yes, certainly. By complete, I mean it orders everything that can be ordered.On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:I think it's okay to leave N^^2 + N1 and N + N1^^2 unordered. -- AndreiI have noticed another thing. The comparison operator is an underapproximation (it sometimes returns NaN when ordering would actually be possible). E.g., O(n·m) ⊆ O(n²+m²), because n·m ≤ n²+m². Interesting. It would be nice if the final version had a complete decision procedure for ⊆.Brilliant! ...The next step up the expressiveness scale would be to have a sum-of-products representation. Proof of concept (disclaimer: hacked together in the middle of the night, and not tested thoroughly): http://dpaste.dzfl.pl/d1512905accd I think this general approach is probably close to the sweet spot. ...
Dec 07 2015
On Saturday, 5 December 2015 at 20:48:16 UTC, Andrei Alexandrescu wrote:Then what do we do with more complex relationships like O((m + n) log n) etc.What about a compile-time expression tree similar to how, for instance, `BaseUnit`, `ScaledUnit` (and, as suggested, `ShiftedUnit` and `LinearUnit`) are defined and used in David Nadlinger's std.units?
Mar 17 2016
On Fri, Dec 04, 2015 at 01:37:06PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:On 12/04/2015 11:06 AM, Andrei Alexandrescu wrote:[...]Here's the algebra. Terms: 1 = O(1) log = O(log n) plog = O((log n)^k) sqrt = O(sqrt n) lin = O(n) linlog = O(n * log n) linplog = O(n * (log n)^k) poly = O(n^k) exp = O(2^n) Ordering: Terms above are in increasing order. Summation: x + y = max(x, y) Product: | 1 log plog sqrt lin linlog poly exp -------+------------------------------------------------------------ 1 | 1 log plog sqrt lin linlog poly exp log | log plog plog ? linlog ? poly exp plog | plog plog plog ? linplog linplog poly exp sqrt | sqrt ? ? lin poly poly poly exp lin | lin linlog linplog poly poly poly poly exp linlog | linlog linplog linplog poly poly poly poly exp poly | poly poly poly poly poly poly poly exp exp | exp exp exp exp exp exp exp exp I've left a few question marks for the tricky cases. Ideas?[...] sqrt really belongs under poly, as far as asymptotic behaviour is concerned. Basically, O(n^j) * O(n^k) for any real j, k > 0 is asymptotically equal to O(n^(j+k)). Furthermore, O(O(n^j)^k) = O(n^(j*k)). So you can treat polynomial complexities as a field of real numbers, where + on the O(...) terms behaves like max(), * behaves like +, and function composition behaves like ^. Then exp behaves like an "infinite real", where exp > O(n^k) for all real k>0. Its inverse, log, therefore, behaves like an "infinitesimal", where O((log n)^k) for all real k>0 is less than O(n^k) for all real k>0. (Yes, even O(n^(1/1000000)) will eventually outgrow O(log n).) The log powers behave like "intermediate infinitesimals", where you have O((log n)^j) < O((log n)^k) for all positive real j < k. So these O-terms behave approximately like real numbers (poly) enhanced with infinite quantities (exp and its derivations) and infinitesimal quantities (log and its derivations), they follow the usual laws of arithmetic. Therefore, O(n^k) * O(log n) can be treated as the equivalent of a real number k + an infinitesimal L, such that k < (k+L) < k+j for all real j>0. In other words, O(n) < O(n * log n) < O(n^(1+k)) for all k>0. (Yes, even O(n^1.00000000001) will eventually outgrow O(n*log n). O(log n) behaves like an infinitesimal.) The nice thing about this is that you can simplify a lot of complicated O(...) expressions just by applying algebra as described above on the analogous {real + infinities + infinitesimals} system. Ordering relations are preserved (for the most part -- this only breaks down with pathological cases like O(sin n) which is unlikely to be ever encountered). Also, function composition between poly and non-poly complexities will generally be non-commutative, and mostly will break the + = max analogy. (But it seems unlikely, in real-world algorithms, to ever need to worry about the intricacies of exponential-time algorithms, since generally they are to be avoided where possible.) So you can get a (mostly) closed algebra just by mapping poly to the real numbers, and then adding: L = infinitesimal quantity corresponding to O(log n) E = infinite quantity corresponding to exp Various operations inside O(...) are thus mapped: + inside O(...) = max(...) * inside O(...) = + between mapped quantities O(f(g(n))) = O(f(n)) * O(g(n)) O(1) = 0 Example: is O(n^3*log(n)) better than O(n^2*(log(n))^3)? Answer: O(n^3*log(n)) maps to 3 + L, and O(n^2*(log(n))^3) maps to 2 + L^3. Since L^3 is infinitesimal, (2 + L^3) is very close to 2, whereas (3 + L) lies slightly above 3. 
Therefore O(n^3*log(n)) is definitely faster-growing than O(n^2*(log(n))^3). This algebra leads to various interesting questions like: - Is there an intermediate complexity that lies between poly and exp? Yes: since exp is mapped to the infinite quantity E, it follows that E/2 is still an infinite quantity (even though it's strictly less than E). E/2 corresponds to E*1/2, which is the composition of exp with sqrt, so it follows that O(n^k) < O(e^sqrt(n)) < O(e^n) for all real k>0. (There are, of course, many other possibilities.) - Similarly, the log powers O((log n)^k) are *always* slower-growing than any polynomial complexity, because L*k, being still infinitesimal, will always < j for all real j>0. So even O((log n)^10000) will not outgrow O(n^1.000000001). Multiplying with a poly function preserves this relationship: O(n^j * (log n)^k) --> j + L*k < j + m, for all real j,m>0, so O(n^j * (log n)^k) < O(n^(j+m)) for all j,m>0. Basically you're just displacing the mapped values by a constant. T -- People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird. -- D. Knuth
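(A minimal sketch of the mapping just described, restricted to single-variable bounds and leaving exp and composition out: n^a * (log n)^b maps to the pair (a, b), multiplication adds pairs, and ordering is lexicographic because the log exponent only matters when the polynomial exponents tie. All names are made up.)

import std.typecons : Tuple;

alias Cost = Tuple!(double, "poly", double, "logExp");

// multiplication inside O(...) adds the mapped quantities
Cost mul(Cost a, Cost b) { return Cost(a.poly + b.poly, a.logExp + b.logExp); }

// compare growth: the poly exponent decides; the log exponent breaks ties
int cmp(Cost a, Cost b)
{
    if (a.poly != b.poly) return a.poly < b.poly ? -1 : 1;
    if (a.logExp != b.logExp) return a.logExp < b.logExp ? -1 : 1;
    return 0;
}

unittest
{
    auto c1 = Cost(3, 1);  // n^3 * log n
    auto c2 = Cost(2, 3);  // n^2 * (log n)^3
    assert(cmp(c1, c2) > 0);                        // the worked example above
    assert(cmp(Cost(1, 0), Cost(1, 1)) < 0);        // n < n log n
    assert(cmp(Cost(1, 1), Cost(1.000001, 0)) < 0); // ... < n^1.000001
    assert(cmp(mul(Cost(0.5, 0), Cost(0.5, 0)), Cost(1, 0)) == 0); // sqrt * sqrt = n
}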
Dec 04 2015
I'll get back to this (on the phone), but this is incorrect: "sqrt really belongs under poly, as far as asymptotic behaviour is concerned." Fractional powers are sublinear, and sqrt times sqrt is linear, which is important. Andrei
Dec 04 2015
On Fri, Dec 04, 2015 at 08:03:17PM +0000, Andrei Alexandrescu via Digitalmars-d wrote:I'll get back to this (on the phone) but this is incorrect: sqrt really belongs under poly, as far as asymptotic behaviour is concerned.OK, I guess by poly you mean n^k for k>1. I was lumping everything under polynomial for k>0. The reason is because of the homogeneity for all values of k>0 when it comes to the algebra of complexities. Obviously, for performance-measuring purposes, k=1 is a significant landmark.Fractional powers are sublinear.Wrong, O(n^(4/3)) is a fractional power, but it's not sublinear.And sqrt times sqrt is linear which is important.[...] But it's just a special case of the general multiplicative->additive rule. Everyone knows 1/2 + 1/2 = 1; it doesn't seem to warrant special treatment above, say, 1/3 + 2/3 = 1, or any of the other endless identities involving 1. T -- Meat: euphemism for dead animal. -- Flora
Dec 04 2015
On 04-Dec-2015 19:06, Andrei Alexandrescu wrote:On 12/04/2015 01:05 AM, Tofu Ninja wrote:Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Also maybe a simpler idea would just be to annotate the operations with their complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self-documenting as well.Well look at what the cat dragged in: http://dpaste.dzfl.pl/711aecacc450That's quite promising. The code is very preliminary and uses strings for typical complexity values (e.g. "constant", "linear", and later "loglinear" etc.). I need to see how to integrate this whole idea. Also an unpleasant problem is overloading - when present, user code needs to specify which overload they're referring to. Anyhow, this is really interesting. Thanks Tofu. Andrei-- Dmitry Olshansky
Dec 04 2015
On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
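(A hypothetical sketch of what the enum-based tagging could look like - this is not the code behind the dpaste links, and the names are made up.)

import std.traits : getUDAs;

enum Complexity { constant, logarithmic, linear, linearithmic, quadratic }

struct Container(T)
{
    @(Complexity.constant) void insertFront(T value) { /* ... */ }
    @(Complexity.linear)   void insertBack(T value)  { /* ... */ }
}

// client code reads the attribute at compile time and can refuse to
// instantiate with an operation that is too slow
static assert(getUDAs!(Container!int.insertFront, Complexity)[0]
              == Complexity.constant);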
Dec 04 2015
On 12/04/2015 03:43 PM, Walter Bright wrote:On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language. AndreiWas vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On Friday, 4 December 2015 at 22:48:03 UTC, Andrei Alexandrescu wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:Looks good :)On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language. AndreiWas vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On 12/4/15 5:48 PM, Andrei Alexandrescu wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:This looks very excellent, nice work! -SteveOn 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language.Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On Fri, Dec 04, 2015 at 05:48:03PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:[...] Nice, very nice! But ... opMul? I thought that was deprecated? Shouldn't we be using opBinary!"*" instead? T -- Food and laptops don't mix.On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language.Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On 12/4/2015 2:48 PM, Andrei Alexandrescu wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:This is very similar to the SI units program Oskar Linde wrote back in 2006. Anyhow, this is very cool and should: 1. have an article written about it 2. get its own Phobos moduleOn 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92 That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language.Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.
Dec 04 2015
On 05-Dec-2015 01:48, Andrei Alexandrescu wrote:On 12/04/2015 03:43 PM, Walter Bright wrote:Better still - no more strings: http://dpaste.dzfl.pl/5ed2b3ad3043On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:"Even better". http://dpaste.dzfl.pl/50340e189f92Was vaguely terrified reading this whole thread until hitting this gem. Seems like a creative use for UDA.Yeah, I think it puts UDA on the map! Instead of string arguments, though, I suggest enums.That code is actually remarkably complete, with algebra and remarkable constants. Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language. Andrei-- Dmitry Olshansky
Dec 05 2015
On 12/3/2015 10:05 PM, Tofu Ninja wrote:Also maybe a simpler idea would just be to annotate the operations with their complexity with UDAs. That way things that really care about the complexity can get it, and those who don't can ignore it. It has the benefit of being self-documenting as well.Dang, you beat me to it!
Dec 04 2015
On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu wrote:On 12/3/15 10:29 PM, Jack Stouffer wrote:"I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?" In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both.On 12/03/2015 09:10 PM, Idan Arye wrote:How so? If a singly linked list and a doubly linked list have two different method names for the same operation, then they cannot be easily templated.The complexities of the operations is a property of the data structure being used. If each collection type will have it's own set of method names based on the complexity of operations on it, we won't be able to have templated functions that operate on any kind of collection(or at the very least, these functions will be really tedious to code).Your premise is right but you reach the negation of the correct conclusion. -- Andrei
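(A hypothetical sketch of how such a default could be provided - toy containers and made-up names, not the proposed framework: a free linearInsert forwards to a container's own insert when one exists, so code written against the weaker linear guarantee accepts both kinds of containers.)

void linearInsert(C, T)(ref C c, T value)
    if (__traits(hasMember, C, "insert"))
{
    c.insert(value); // anything at least as fast as linear qualifies
}

struct List(T) // stand-in for a container with quick insert
{
    T[] payload;
    void insert(T value) { payload ~= value; }
}

struct Vector(T) // stand-in for a container that only offers the linear operation
{
    T[] payload;
    void linearInsert(T value) { payload ~= value; }
}

// collection-independent client code: member if present, else the free default
void addAll(C, R)(ref C c, R r)
{
    foreach (e; r) c.linearInsert(e);
}

unittest
{
    List!int l;
    Vector!int v;
    addAll(l, [1, 2, 3]); // goes through the free linearInsert -> insert
    addAll(v, [1, 2, 3]); // uses Vector's own linearInsert
    assert(l.payload == [1, 2, 3] && v.payload == [1, 2, 3]);
}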
Dec 04 2015
On Friday, 4 December 2015 at 09:51:05 UTC, tn wrote:"I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?" In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.In the current std.container, linearInsert aliases to the sublinear insert function when it's provided. If you don't care about running time, just use linearInsert.
Dec 04 2015
On Friday, 4 December 2015 at 09:57:48 UTC, Jakob Ovrum wrote:On Friday, 4 December 2015 at 09:51:05 UTC, tn wrote:I understand this. However, my point is that it seems backwards if 1) When I don't care about the complexity, then I need to specify one (e.g. linearInsert). 2) When I care and want a constant complexity, then I don't specify one (e.g. use insert instead of constantInsert). In addition, if a new container is added that only provides O(n log n) insert, then my "don't care" code that uses linearInsert does not support the new container without changes."I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?" In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.In the current std.container, linearInsert aliases to the sublinear insert function when it's provided. If you don't care about running time, just use linearInsert.
Dec 04 2015
On 12/04/2015 09:50 AM, tn wrote:However, my point is that it seems backwards if 1) When I don't care about the complexity, then I need to specify one (e.g. linearInsert). 2) When I care and want a constant complexity, then I don't specify one (e.g. use insert instead of constantInsert).When complexity information is not present, the name of the function may carry an implicit and documented assumption of complexity bounds. -- Andrei
Dec 04 2015
On 12/04/2015 04:51 AM, tn wrote:"I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?"Oh but you do care.In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.What would be the complexity assumption of "insert"? Andrei
Dec 04 2015
On Friday, 4 December 2015 at 14:08:08 UTC, Andrei Alexandrescu wrote:On 12/04/2015 04:51 AM, tn wrote:I don't, if I have a small container of constant size (or bounded by a small constant). Say, less than 10 elements. Or perhaps I just want to quickly make some prototype. And if the other part of my algorithm is exponential or even superexponential, then it is going to dominate anyway. Or should there be also baseTwoExponentialInsert etc. for the case in which I don't care as long as the complexity is at most O(2^n)?"I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?"Oh but you do care.None. So in a sense that would be an alias of whateverInsert or idontcareInsert or infinityInsert.In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.What would be the complexity assumption of "insert"?
Dec 04 2015
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:I know names are something we're awfully good at discussing :o). Destroy! AndreiI find it ironic that this thread has moved to discuss how to name complexity/running-time...
Dec 03 2015
On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!Are you sure there are only 3 complexities? What if it's a self-balancing tree? I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?
Dec 03 2015
On Friday, 4 December 2015 at 03:46:39 UTC, Walter Bright wrote:On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:I agree -- putting the complexity (eh, running time) in the primitive's name seems like a bad idea. I believe that stability is a more important "feature" of a sorting algorithm than its running time, because you can draw implications from it and use them in your own algorithms (I use it for implementing delta compression), and it affects the output. Running time is still a good guarantee to have, but whether `sort` runs in O(n^2) or O(n log n) is not going to affect how your output looks!Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!Are you sure there are only 3 complexities? What if it's a self-balancing tree? I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?
Dec 03 2015
On Friday, 4 December 2015 at 07:50:19 UTC, Minas Mina wrote:On Friday, 4 December 2015 at 03:46:39 UTC, Walter Bright wrote:That's wrong. Imagine a sort method that calls to an API server to sort the numbers. Given a good internet connection, this implementation detail will not affect the correctness of the result. But I'm sure that no sane person would want to use this method. When the only job of a function is to implement an algorithm then the only thing you should care about is the time and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:I agree -- putting the complexity (eh, running time) to the primitive's name seems like a bad idea. I believe that stability is a more important "feature" of a sorting algorithm than its running time, because you can make implications about it and use them for your own algorithms (I use it for implementing delta compression) and affects the output. Running time is still a good guarantee to have, but if `sort` runs O(n^2) or O(nlogn) is not going to affect how your output will look like!Now this primitive may have three complexities: * linear in the length of r (e.g. c is a singly-linked list) * linear in the number of elements after r in the collection (e.g. c is an array) * constant (c is a doubly-linked list) These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!Are you sure there are only 3 complexities? What if it's a self-balancing tree? I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?
Dec 04 2015
On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.I am sorry to say this, but hard performance requirements require O(1) on everything. Big-Oh tells you essentially very little if it is more than O(1). Quick sort and insertion sort are O(N^2), bucket sort is O(N). That does not make bucket sort faster than quick sort or even insertion sort on a specific case as it is dominated by the number of possible values.
Dec 04 2015
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad wrote:On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:And even then you cannot assume that it is real time as the following is O(1): if (full_moon()) sleep(1000); So you can have an O(1) algorithm that occasionally triggers an expensive constant-time/memory path that causes big problems in a running system. You need a hard upper bound measured in cycles.and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.I am sorry to say this, but hard performance requirements require O(1) on everything.
Dec 04 2015
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad wrote:On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:Well, I agree, but I didn't say hard real-time, only performance requirements that are hard to achieve because the system is large, and because it would cost even more if the system was larger (slower).and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.I am sorry to say this, but hard performance requirements require O(1) on everything. Big-Oh tells you essentially very little if it is more than O(1). Quick sort and insertion sort are O(N^2), bucket sort is O(N). That does not make bucket sort faster than quick sort or even insertion sort on a specific case as it is dominated by the number of possible values.
Dec 04 2015
On Friday, 4 December 2015 at 09:33:42 UTC, ZombineDev wrote:Well, I agree, but I didn't say hard real-time, only performance requirements that are hard to achieve because the system is large, and because it would cost even more if the system was larger (slower).Yes, if you are writing for large scalable systems then it is interesting to know what the ceiling is, but then you probably also want something that is no worse than O(log N). Say for a chat service you'd know that the more expensive to develop solution can scale to 10 million users (O(1)), and the less expensive to develop solution will become very expensive to run when you hit 1 million (O(log N)). So Big-Oh can be useful when you work with very large sizes and large variations, but in most cases I think it can be misleading without a good description of the actual implementation. Another very useful measure in real time is the amount of variation between repeated calls. Say if there is a guarantee of <50% variation, you can measure actual running time and add 50% headroom on the hardware requirements. Some algorithms have some fixed-size tables they need to clear out every once in a while and that can be very bad.
Dec 04 2015
On Friday, 4 December 2015 at 09:50:15 UTC, Ola Fosheim Grøstad wrote:On Friday, 4 December 2015 at 09:33:42 UTC, ZombineDev wrote:I strongly agree with what you said earlier that typical complexity analysis is a too naive model for real-world systems. Very often (e.g. due to the nature of real hardware) your time may be dominated by constants. Or a function which looks like O(1) is sometimes O(n), due to hidden complexity. This (the fact that constants matter) is especially true when you aim for O(1). And yes, variation of execution time is also very important. However, here we're talking about API for containers and algorithms and I think that putting the complexity in the function signature is better than only using abstract data structures, because it makes you more aware of the consequences of your choice of data structures and algorithms that you make. If we can figure out a way to put even more information it may be even better.Well, I agree, but I didn't say hard real-time, only performance requirements that are hard to achieve because the system is large, and becuase it would cost even more if the system was larger (slower).Yes, if you are writing for large scalable systems then it is interesting to know what the ceiling is, but then you probably also want something that is no worse than O(log N). Say for a chat service you'd know that the more expensive to develop solution can scale to 10 million users (O(1)), and the less expensive do develop solution will become very expensive to run when you hit 1 million (O(log N)). So Big-Oh can be useful when you work with very large sizes and large variations, but in most cases I think it can be misleading without a good description of the actual implementation. Another very useful measure in real time is the amount of variation between repeated calls. Say if there is a guarantee of <50% variation, you can measure actual running time and add 50% headroom on the hardware requirements. Some algorithms have some fixed size tables they need to clear out every once in a while and that can be very bad.
Dec 04 2015
On Friday, 4 December 2015 at 10:35:33 UTC, ZombineDev wrote:However, here we're talking about API for containers and algorithms and I think that putting the complexity in the function signature is better than only using abstract data structures, because it makes you more aware of the consequences of your choice of data structures and algorithms that you make. If we can figure out a way to put even more information it may be even better.Well, if the algorithm can choose between containers it might be interesting to query the characteristics of a set of operations on each container. Not sure if the method name is the best way to do it. Although I guess having a generic find(x) and find_instantly(x) would be ok, if the latter version basically means less than 200 cycles in all cases. If it means less than 2000 cycles I think it is not so useful. Maybe it is possible to make the inference as static analysis and in addition make it possible for libraries to express various guarantees. Since it is worst case it would be acceptable that if analysis fails it will result in "unbounded".
Dec 04 2015
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad wrote:On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:Only when dealing with an arbitrary number of elements. O(n^2) could be fine if you know for a fact that you're always dealing with a very small number of elements. And some algorithms with worse complexity are actually faster than algorithms with lower complexity when dealing with a small number of elements - particularly since with a small number of elements, the coefficients can matter a lot more than n. So, really, to know which algorithm is appropriate, you need to know how it's actually going to be used in the program in question and how different algorithms perform there. O(1) is definitely better, but it's not necessarily required. But yes, if you're dealing with an arbitrarily large number of elements, anything worse than O(1) isn't going to cut it if you have hard performance requirements.and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.I am sorry to say this, but hard performance requirements require O(1) on everything.Big-Oh tells you essentially very little if it is more than O(1). Quick sort and insertion sort are O(N^2), bucket sort is O(N). That does not make bucket sort faster than quick sort or even insertion sort on a specific case as it is dominated by the number of possible values.Yeah. Really, Big-Oh tells you something about the worst case with an arbitrary number of elements. What actually happens in the program depends heavily on the number of elements which are actually involved. With large numbers of elements, Big-Oh starts mattering, but if the number of elements isn't large, then coefficients and minor implementation details of an algorithm start mattering a lot more than its conceptual worst case. Still, in general, it's better to be aware of algorithmic complexity and favor algorithms with lower complexity until you need to optimize the code based on your specific use case and determine that a higher complexity algorithm actually performs better in that specific use case. - Jonathan M Davis
Dec 04 2015
On Friday, 4 December 2015 at 15:33:56 UTC, Jonathan M Davis wrote:Only when dealing with an arbitrary number of elements. O(n^2) could be fine if you know for a fact that you're always dealing with a very small number of elements.I think there was a misunderstanding about notation here. If we know that we use a small number of elements, then all operations are O(1) by definition, for any small constant.different algorithms perform there. O(1) is definitely better, but it's not necessarily required.Well, O(1) isn't better, it is just mandatory for anything real time. E.g: you fix your buffer to 1 seconds of audio and that is 44100 samples, then filling it is O(44100) == O(1). If am doing a scanline renderer for 2048 pixels, then an insertion sort is faster than merge sort, because most linesegments only move a few pixels per scanline. But here the problem is that there might be a 0.1% chance that most lines are almost horizontal in a variety of angles and then it breaks down, so that will be very limiting, but I still can figure out the hard maximum time required to complete, so it is still O(1) and tractable as a real time operation. If I accept arbitrary input, then I no longer can meet real time requirements, I cannot get to O(1) and therefore cannot put a fixed limit on maximum time required to complete. And that is really the only thing O(1) says: you can put a fixed limit on the time needed to complete the operation at compile time.actually involved. With large numbers of elements, Big-Oh starts mattering, but if the number of elements isn't large, then coefficients and minor implementation details of an algorithm start mattering a lot more than its conceptual worst case.Yes, but most collections of entities are not very large by nature. Usually, when you try to solve a problem faster you break it down by useful segmentation and clusters (like age, kind, proximity). Complexity is mostly for thinking about what options you have when designing an algorithm. The cookbook/library approach to complexity is rather limited since they were not written for the specifics of the problem.Still, in general, it's better to be aware of algorithmic complexity and favor algorithms with lower complexity until you need to optimize the code based on your specific use case and determine that a higher complexity algorithm actually performs better in that specific use case.Be aware of it, yes. Although in real code it is often best to just start out with something flexible like a dynamic array, and optimize when you know which operations are required and what operations you never will need. That often changes over time. I find that I am able to solve most of my real time problems with arrays. When I need hard requirements, I often end up using very simple datastructures like fixed sized circular buffers with a carefully calculated headroom (number of mostly unused elements) or linked lists of chunks or something really easy to reason about. During initialisation of the program it often does not matter. So if I use linear search and it completes in 100ms then fine. I don't care if there is a delay of 100ms at startup. For batch things done in Python, I just use whatever is easy, with current machines most things complete in less than a few seconds anyway. Making libraries easy to use is by far the most important parameter, I think. What would be cool is if the compiler could adjust the implementation of collections based on profiling!
Dec 04 2015
On 12/04/2015 02:50 AM, Minas Mina wrote:
> I agree -- putting the complexity (eh, running time) in the primitive's name seems like a bad idea. I believe that stability is a more important "feature" of a sorting algorithm than its running time, because you can make implications about it and use them for your own algorithms (I use it for implementing delta compression), and it affects the output. Running time is still a good guarantee to have, but whether `sort` runs in O(n^2) or O(n log n) is not going to affect what your output will look like!

Sort is almost always assumed to take n log n time. The problem is when you combine simpler operations, e.g. inserting elements at the front in a loop.

Andrei
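A minimal sketch of the trap Andrei alludes to - an innocent-looking primitive repeated in a loop (the container choices and the count are arbitrary, for illustration only):

---
import std.container.dlist : DList;

void main()
{
    enum n = 10_000;

    // Quadratic overall: each front insertion into a dynamic array shifts
    // every element already stored, so the whole loop is O(n^2).
    int[] arr;
    foreach (i; 0 .. n)
        arr = i ~ arr;

    // Linear overall: insertFront on a doubly-linked list is O(1),
    // so the same loop is O(n).
    DList!int list;
    foreach (i; 0 .. n)
        list.insertFront(i);
}
---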
Dec 04 2015
On 12/03/2015 10:46 PM, Walter Bright wrote:
> On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:
>> Now this primitive may have three complexities:
>>
>> * linear in the length of r (e.g. c is a singly-linked list)
>> * linear in the number of elements after r in the collection (e.g. c is an array)
>> * constant (c is a doubly-linked list)
>>
>> These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!
>
> Are you sure there are only 3 complexities? What if it's a self-balancing tree? I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?

Looks exaggerated, innit? The fact of the matter is people choose collections based on the complexity of their operations all the time. "I need to insert and remove at the front, so I'll use a list here." Or: "I need random access, I'll use a vector" etc. Not designing nomenclature for complexity leads to the usual design disasters such as providing a list with the same indexing operator as an array.

Turning that on its head, if we want to allow people to design collection-independent code and mix and match collections based only on their APIs, those APIs must reflect the complexity of operations. Collection-independent code is big in my design btw.

One idea is to have the very name reflect the operation. I'm thinking "insert" to mean "scoot over elements at the tail and insert the new element in the gap" and "link" to mean "create a new node and link it within this collection without scooting over anything". Then the relative costs will be implied.

Another idea is to only classify operations as "quick" (expected O(1) up to O(log n)) and "not quick" (everything worse). Then we only have two categories, and "quick" will be in the name of the respective methods.

Andrei
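A rough sketch of what the "the name itself implies the cost" idea could look like - hypothetical types, not the proposed collections API:

---
// "insert" scoots the tail over: linear in the number of elements after pos.
struct Array(T)
{
    T[] payload;

    void insert(size_t pos, T value)
    {
        payload.length += 1;
        foreach_reverse (i; pos .. payload.length - 1)
            payload[i + 1] = payload[i];
        payload[pos] = value;
    }
}

// "link" splices a new node in place: O(1), nothing is moved.
struct DoublyLinkedList(T)
{
    static struct Node { T value; Node* prev, next; }
    Node* head;

    void link(Node* after, T value)
    {
        auto n = new Node(value, after, after ? after.next : head);
        if (n.next) n.next.prev = n;
        if (after) after.next = n; else head = n;
    }
}
---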
Dec 04 2015
On Friday, 4 December 2015 at 13:58:50 UTC, Andrei Alexandrescu wrote:
> On 12/03/2015 10:46 PM, Walter Bright wrote:
>> On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:
>>> Now this primitive may have three complexities:
>
> Looks exaggerated, innit? The fact of the matter is people choose collections based on the complexity of their operations all the time. "I need to insert and remove at the front, so I'll use a list here." Or: "I need random access, I'll use a vector" etc.
>
> Andrei

My personal preference would be not to have the complexity in the names, as I prefer shorter/concise names. Typically when I am writing code using containers of any sort I will check the documentation to determine what the cost of the operations I need is and base my choice off of that. I would think (hope) most others do this too. However, I don't have a strong objection to what is being proposed.

Would you have an insertLogarithmic (insertInverseAckermann :o) too?
Dec 04 2015
On Friday, 4 December 2015 at 14:51:40 UTC, CraigDillabaugh wrote:
> My personal preference would be not to have the complexity in the names, as I prefer shorter/concise names. Typically when I am writing code using containers of any sort I will check the documentation to determine what the cost of the operations I need is and base my choice off of that. I would think (hope) most others do this too. However, I don't have a strong objection to what is being proposed.

std.container puts linear and/or stable in some of its names and then creates an alias to whichever one makes sense as the default, where the alias doesn't have linear or stable in the name. E.g. linearRemove becomes remove via an alias.

But actually using something like stableLinearRemove in code (or stable.linear.remove) becomes hideous _very_ quickly. No one is going to like it, and it really only benefits generic code, which most container code really isn't (and given how different containers tend to be in the complexity of their operations, I question that it even makes sense to use containers generically in many cases if you actually care about efficiency). Usually the genericity is achieved via iterators or ranges, not via the code that actually operates on the container.

So, while I applaud Andrei's attempt to make it so that containers can be used generically where appropriate (and having a consistent API across containers is a big win even if they're never used generically), I do fear that attempts to put the complexity and stability of the functions into the API are going to make it so unwieldy that it's useless (or at least, very un-user-friendly). Obviously, we'll have to wait and see what he comes up with, but what almost everyone is going to want is remove and insert and the like, not stable.linear.insert or stableLinearInsert or any variant on that.

- Jonathan M Davis
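For illustration, the aliasing pattern described above looks roughly like this - a simplified sketch of the idea, not the actual std.container source:

---
struct MyArray(T)
{
    T[] payload;

    // The "honest" name spells out the complexity...
    void linearRemove(size_t index)
    {
        foreach (i; index .. payload.length - 1)
            payload[i] = payload[i + 1];
        payload.length -= 1;
    }

    // ...and the short name users actually write is just an alias for it.
    alias remove = linearRemove;
}

unittest
{
    MyArray!int a;
    a.payload = [1, 2, 3];
    a.remove(1);                 // same call, shorter name
    assert(a.payload == [1, 3]);
}
---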
Dec 04 2015
On 12/4/2015 7:11 AM, Jonathan M Davis wrote:
> On Friday, 4 December 2015 at 14:51:40 UTC, CraigDillabaugh wrote:
>> My personal preference would be not to have the complexity in the names, as I prefer shorter/concise names. Typically when I am writing code using containers of any sort I will check the documentation to determine what the cost of the operations I need is and base my choice off of that. I would think (hope) most others do this too. However, I don't have a strong objection to what is being proposed.
>
> std.container puts linear and/or stable in some of its names and then creates an alias to whichever one makes sense as the default, where the alias doesn't have linear or stable in the name. E.g. linearRemove becomes remove via an alias.

Excessive use of aliases is a problem in and of itself - for example, trying to use a debugger with it, or examining the object code.

> But actually using something like stableLinearRemove in code (or stable.linear.remove) becomes hideous _very_ quickly. No one is going to like it,

I agree. Having such long names consisting of repeated use of the same words is a problem.

I suggested in the pseudo namespaces thread using template parameters to express characteristics, as in:

    remove!(stable, linear)

with sensible defaults so most of the time the user would just use:

    remove

Generic code could use the parameters.

Another alternative is to have attributes for the algorithm:

    @stable @linear remove() { ... }

and then generic code could test for those attributes.
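For illustration, here is one way the attribute variant could be spelled today with plain UDAs and std.traits.hasUDA - a hypothetical sketch with made-up tag names, not an agreed-upon design:

---
import std.traits : hasUDA;

// Hypothetical tag types; these are not existing Phobos attributes.
struct stable {}
struct linear {}
struct constant {}

struct List(T)
{
    @stable @linear void remove(size_t index) { /* ... */ }
    @stable @constant void linkFront(T value) { /* ... */ }
}

// Generic code can then require a cost class via the attributes.
void removeCheaply(C)(ref C c, size_t index)
    if (hasUDA!(C.remove, constant) || hasUDA!(C.remove, linear))
{
    c.remove(index);
}

unittest
{
    List!int l;
    removeCheaply(l, 0);   // compiles because List.remove is tagged @linear
}
---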
Dec 04 2015
On Friday, 4 December 2015 at 18:21:41 UTC, Walter Bright wrote:
> I suggested in the pseudo namespaces thread using template parameters to express characteristics, as in:
>
>     remove!(stable, linear)
>
> with sensible defaults so most of the time the user would just use:
>
>     remove

The nice thing about this is that it can be easy to specify which complexities an operation supports.

---
// or however complexity objects work...
void remove(complexity, T)(List!T list, size_t index)
    if (complexity >= Complexity.linear);

List l;
//...
l.remove(3);
l.remove!(Complexity.polynomial(2))(3);
l.remove!(Complexity.constant)(3); // fails; there's no template specialization
                                   // for this case because constant < linear.
---
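For reference, one way such complexity values could be made comparable so that a constraint like the one above compiles - a hypothetical sketch using an ordered enum, where the caller states the worst complexity they are willing to accept:

---
enum Complexity
{
    constant,     // O(1)
    logarithmic,  // O(log n)
    linear,       // O(n)
    linearithmic, // O(n log n)
    quadratic     // O(n^2)
}

// Removal from an array really costs O(n), so the caller must accept at least linear.
void remove(Complexity accepted, T)(ref T[] arr, size_t index)
    if (accepted >= Complexity.linear)
{
    foreach (i; index .. arr.length - 1)
        arr[i] = arr[i + 1];
    arr.length -= 1;
}

unittest
{
    int[] a = [1, 2, 3];
    a.remove!(Complexity.linear)(1);
    assert(a == [1, 3]);
    // a.remove!(Complexity.constant)(0); // would not compile: constant < linear
}
---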
Dec 04 2015
On Friday, 4 December 2015 at 18:21:41 UTC, Walter Bright wrote:
> On 12/4/2015 7:11 AM, Jonathan M Davis wrote:
>> std.container puts linear and/or stable in some of its names and then creates an alias to whichever one makes sense as the default, where the alias doesn't have linear or stable in the name. E.g. linearRemove becomes remove via an alias.
>
> Excessive use of aliases is a problem in and of itself - for example, trying to use a debugger with it, or examining the object code.

Oh, I agree. But if we end up with stuff like stableLinearRemove all over the place, it's better to have the aliases than not. However, it's far better IMHO to avoid having all of those long names in the first place.

> I suggested in the pseudo namespaces thread using template parameters to express characteristics, as in:
>
>     remove!(stable, linear)
>
> with sensible defaults so most of the time the user would just use:
>
>     remove
>
> Generic code could use the parameters.
>
> Another alternative is to have attributes for the algorithm:
>
>     @stable @linear remove() { ... }
>
> and then generic code could test for those attributes.

Either of those would be better than stableLinearRemove or stable.linear.remove. The UDAs would be more documentation friendly, though being able to pass around template arguments could be valuable for the cases where you're trying to enforce specific complexity requirements.

- Jonathan M Davis
Dec 04 2015
On Friday, 4 December 2015 at 22:17:21 UTC, Jonathan M Davis wrote:
> Either of those would be better than stableLinearRemove or stable.linear.remove. The UDAs would be more documentation friendly, though being able to pass around template arguments could be valuable for the cases where you're trying to enforce specific complexity requirements.

It should be possible to unify the template and UDA forms. Assuming a template supportsComplexity(func, complexity) that determines whether a function or method has the specified complexity UDA (or a "faster" one), it's not hard to implement something like this:

---
void removeWithComplexity(T, complexity)(T collection, size_t index)
    if (supportsComplexity!(T.remove, complexity))
{
    collection.remove(index);
}

List!int list;
//...
list.removeWithComplexity(Complexity.linear, 3);
---

Of course, this implementation has issues with collection operations which are defined as UFCS functions (as opposed to methods), but it should be possible to get around that issue.
Dec 04 2015
On Saturday, 5 December 2015 at 00:13:02 UTC, BLM768 wrote:
> list.removeWithComplexity(Complexity.linear, 3);

Uh, I mean list.removeWithComplexity!(Complexity.linear)(3).
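A compilable sketch of how the supportsComplexity check assumed above might be put together, tagging operations with values of an ordered enum used as UDAs - all names are hypothetical, and the corrected call syntax from the follow-up is used:

---
import std.traits : getUDAs;

enum Complexity { constant, logarithmic, linear, quadratic }

// True if func carries a Complexity UDA that is no worse than `accepted`.
template supportsComplexity(alias func, Complexity accepted)
{
    alias tags = getUDAs!(func, Complexity);
    static if (tags.length == 0)
        enum supportsComplexity = false;
    else
        enum supportsComplexity = tags[0] <= accepted;
}

struct List(T)
{
    @(Complexity.linear) void remove(size_t index) { /* ... */ }
}

void removeWithComplexity(Complexity accepted, T)(ref T collection, size_t index)
    if (supportsComplexity!(T.remove, accepted))
{
    collection.remove(index);
}

unittest
{
    List!int list;
    list.removeWithComplexity!(Complexity.linear)(3);      // fine: linear <= linear
    // list.removeWithComplexity!(Complexity.constant)(3); // would not compile
}
---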
Dec 04 2015
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:
> Consider the collections universe. So we have an imperative primitive like:
>
>     c.insertAfter(r, x)
>
> where c is a collection, r is a range previously extracted from c, and x is a value convertible to collection's element type. The expression imperatively inserts x in the collection right after r.
>
> Now this primitive may have three complexities:
>
> * linear in the length of r (e.g. c is a singly-linked list)
> * linear in the number of elements after r in the collection (e.g. c is an array)
> * constant (c is a doubly-linked list)
>
> These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!
>
> Andrei

Why not create the capability for the user to query the complexity:

    writeln(c.insertAfter(r, null).complexity);

If all algorithms implemented such a feature, one could have a compile-time argument to display the complexity of the algorithms along with profiling. Complexity is not important except when profiling. It's also generally static, depending on the types rather than on the values of those types.
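A minimal sketch of one way such a queryable property could be exposed: the operation returns a thin wrapper whose compile-time complexity tag tooling could then print. All names here are hypothetical, not part of any existing API.

---
import std.stdio : writeln;

enum Complexity { constant, linear }

// Wraps the returned range and carries the operation's cost as a compile-time tag.
struct OpResult(Complexity c, R)
{
    enum complexity = c;
    R range;
    alias range this;
}

struct DoublyLinkedList(T)
{
    // insertAfter on a doubly-linked list is O(1).
    auto insertAfter(R)(R r, T x)
    {
        // ... actual splicing elided ...
        return OpResult!(Complexity.constant, R)(r);
    }
}

unittest
{
    DoublyLinkedList!int c;
    int[] r;                                  // stand-in for a range over c
    writeln(c.insertAfter(r, 0).complexity);  // prints "constant"
}
---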
Dec 08 2015
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:
> These complexities must be reflected in the name of the primitives. Or perhaps it's okay to conflate a couple of them. I know names are something we're awfully good at discussing :o). Destroy!
>
> Andrei

I'm currently sitting in a software sector with some of the most demanding requirements on software stability and predictability there are. A key issue here is time-deterministic execution. I've seen no language that addresses this problem.

What about adding yet another code qualifier, say @deterministic, that either

- forbids non-time-deterministic execution paths such as `if`-statements, or, perhaps better,
- makes the compiler generate code that strives to be as deterministic as possible, for instance executing all branches of a conditional statement,

and of course checking this transitively over function calls. This would have to take into account direct or indirect recursions, which might become complicated.

I'm of course also aware of the inherent variations in microcode execution time for most CPU architectures of today. Despite this fact, safety-critical systems (in, for instance, aerospace) use these architectures and are in need of tools that can do as well as currently possible, at least at the source code level.

Destroy!
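For what it's worth, a hand-written illustration of the "execute all branches" idea as it is practiced today in constant-time code: a branch-free select, where both values are always computed and a mask picks the result. The hypothetical qualifier above would, in effect, ask the compiler to perform this kind of transformation automatically.

---
// Branch-free select: no data-dependent branch, both inputs always evaluated.
uint select(bool cond, uint a, uint b)
{
    // mask is all ones when cond is true, all zeros otherwise.
    uint mask = cast(uint) -(cast(int) cond);
    return (a & mask) | (b & ~mask);
}

unittest
{
    assert(select(true, 1, 2) == 1);
    assert(select(false, 1, 2) == 2);
}
---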
Mar 17 2016