digitalmars.D - "in" everywhere
- atommixz (12/12) Oct 07 2010 It would be nice if it were possible to use the "in" expression wherever
- Steven Schveighoffer (4/12) Oct 07 2010 This has been suggested before. The problem is that 'in' is generally
- Daniel Gibson (7/22) Oct 07 2010 Is it?
- Steven Schveighoffer (12/32) Oct 07 2010 The problem is with generic code. Generic code that accepts some type
- Daniel Gibson (5/43) Oct 07 2010 I didn't know about arr.find(). Is that documented somewhere?
- Adam D. Ruppe (4/4) Oct 07 2010 arr.find()
- Daniel Gibson (3/9) Oct 07 2010 Ah ok, I thought it was a property of arrays or something like that.
- Steven Schveighoffer (7/47) Oct 07 2010 std.algorithm.find:
- Stewart Gordon (6/14) Oct 09 2010 Surely, what matters most for generic code is that in has consistent...
- Jonathan M Davis (17/35) Oct 09 2010 _Both_ matter.
- Stewart Gordon (19/40) Oct 12 2010 True, yet you don't address semantic differences at all in what follows,...
- Simen kjaeraas (11/28) Oct 12 2010 Not at all. What is being argued is that an interface not only dictates
- Steven Schveighoffer (13/35) Oct 13 2010 No, what is suggested is that we build the complexity guarantees into th...
- Stewart Gordon (10/16) Oct 14 2010 And make get assert(false) if a fast get isn't implementable?
- Simen kjaeraas (4/18) Oct 14 2010
- Steven Schveighoffer (17/32) Oct 14 2010 No, not really. What I meant by "define get to be a fast operation", I ...
- Andrei Alexandrescu (8/20) Oct 07 2010 I'm a bit leary of adopting this feature (it has been discussed). To me
- Daniel Gibson (6/35) Oct 07 2010 That feels inconsistent.. to be able to use it with "literal arrays to
- Andrei Alexandrescu (4/38) Oct 07 2010 It's not. It's all about constant size in the size of the input vs.
- Daniel Gibson (4/47) Oct 07 2010 So what about static arrays?
- Andrei Alexandrescu (4/50) Oct 07 2010 Same deal - same as literal arrays: they can be searched. The expected
- Michel Fortin (14/65) Oct 07 2010 What about a static array of 200000 elements? At which point does it
- Andrei Alexandrescu (4/62) Oct 07 2010 At no point. "Linear" means "linear in the input size". I don't think
- Don (11/78) Oct 07 2010 I actually see a very significant difference between 'in' on a
- Steven Schveighoffer (4/14) Oct 07 2010 Let's first make literals not allocate heap data or else the big-O
- Michel Fortin (22/24) Oct 07 2010 It is linear in regard to the array length, the static array being the
- Andrei Alexandrescu (4/12) Oct 07 2010 "Input" == "Input to the program" i.e. not known during compilation of
- Michel Fortin (16/30) Oct 07 2010 Sorry, but my interpretation in this context is that "Input" should be
- Steven Schveighoffer (5/19) Oct 08 2010 100% agree. If it can do something non-linear (i.e. auto sort the array...
- Manfred_Nowak (5/8) Oct 07 2010 Seems that you are arguing for some elaborate sort of Hungarian
- Andrei Alexandrescu (3/11) Oct 07 2010 All I'm saying is that complexity is not an implementation detail.
- Jonathan M Davis (12/34) Oct 07 2010 It feels inconsistent because it only works on a particular type (arrays...
- Marianne Gagnon (4/17) Oct 07 2010 IMO this could be added, with the documentation specifying the implement...
- Austin Hastings (11/34) Oct 07 2010 This is a non-argument. "In" is an operator. "*" is an operator.
- Steven Schveighoffer (10/47) Oct 07 2010 No, this is not just a difference in runtime, this is a difference in
- Andrei Alexandrescu (3/45) Oct 07 2010 Sorry, I disagree with this so completely, I wouldn't know where to star...
- so (15/38) Oct 09 2010 Reading these boards, both new and old posts, I hardly find any topic th...
- so (8/19) Oct 09 2010 Language libraries are special and naturally care needs to be taken, you...
- bearophile (11/20) Oct 07 2010 The "in" is useful to tell if an item is present in a collection (dynami...
- Pelle (3/23) Oct 07 2010 I think it's better to know that in always has logarithmic or better
- Andrei Alexandrescu (4/5) Oct 07 2010 That means we'd have to define another operation, i.e. "quickIn" that
- Rainer Deyke (18/24) Oct 07 2010 Why?
- Steven Schveighoffer (12/21) Oct 07 2010 Then you don't understand how important it is. Here is an example of ho...
- Stanislav Blinov (16/34) Oct 07 2010 Personally, I'd vote for inclusion of such a function in Phobos. The big...
- Rainer Deyke (20/32) Oct 07 2010 Let me rephrase that. I care about performance. Big-O complexity can
- Steven Schveighoffer (26/56) Oct 08 2010 You'd be surprised at how important it is. I remember competing on
- Jérôme M. Berger (10/26) Oct 08 2010 If big O complexity is so important, then why does everyone use
- Seth Hoenig (5/30) Oct 08 2010 Because on average quicksort is faster than heap sort, and uses much les...
- Jérôme M. Berger (11/28) Oct 09 2010
- Andrei Alexandrescu (10/30) Oct 09 2010 In fact, guards can be put to ensure that the _expected_ (not average,
- bearophile (4/7) Oct 09 2010 Isn't median of 3 a better default?
- Peter Alexander (12/19) Oct 09 2010 There is no "best". They are all trade-offs between the time taken to
- Andrei Alexandrescu (3/25) Oct 09 2010 Also: in-place sorting for the median cases.
- Peter Alexander (3/32) Oct 09 2010 Could you elaborate? I don't see how you lose in-place sorting with the
- Andrei Alexandrescu (4/38) Oct 09 2010 You just take median-of-some and also sort those elements right away
- Sean Kelly (2/27) Oct 09 2010 Introsort (a quicksort variant) degrades to heapsort for sorts that are ...
- Sean Kelly (2/14) Oct 09 2010 It would be nice if it used insertion sort for small ranges as well. Th...
- Andrei Alexandrescu (3/17) Oct 09 2010 I guess that's "fit pivot" :o). Fat pivot does cost some more.
- Lutger (5/28) Oct 08 2010 For average case big O and dwarfing heap/merge sort in the constant fact...
- Andrei Alexandrescu (16/38) Oct 07 2010 Complexity composes very badly. Issues tend to manifest at moderate
- Stanislav Blinov (12/39) Oct 07 2010 But that is not a matter of library interface isn't it? It's a matter of...
- Stanislav Blinov (2/4) Oct 07 2010 Sorry, that should be 'cases', not 'containers' there.
- Jonathan M Davis (21/40) Oct 07 2010 Except that when you're dealing with generic code which has to deal with...
- Juanjo Alvarez (8/11) Oct 07 2010
- Pelle (2/13) Oct 08 2010 What do you suggest for fast lookup in a container?
- Juanjo Alvarez (3/18) Oct 08 2010 What is being used now? How can having "in" and not using it (just like ...
- Steven Schveighoffer (8/19) Oct 08 2010 That kind of "documentation" is useless, it doesn't prevent use, and it ...
- Juanjo Alvarez (5/30) Oct 08 2010 True! And that's the only drawback I see on generalizing "in", but there...
- Simen kjaeraas (13/23) Oct 08 2010 Absolutely. And one of its trademarks is being as fast as C. Now, C clea...
- Steven Schveighoffer (42/63) Oct 08 2010 Let's move off of in for a minute, so I can illustrate what I mean. in ...
- Lutger (7/44) Oct 08 2010 Suppose I would like to use a faster AA (for some cases) and define opIn...
- so (13/33) Oct 09 2010 Sure, write some random strings and compile it, if it doesn't compile, y...
- Stanislav Blinov (25/53) Oct 08 2010 Yet still, generality ends at some point. You can't devise every
- Jonathan M Davis (13/40) Oct 08 2010 All true. However, the point is that operations need to have a know comp...
- Stanislav Blinov (3/21) Oct 08 2010 Ahh, I think I get the perspective now, though I had to reread the whole...
- Torarin (10/15) Oct 08 2010 void push_back(const _Tp& __x) {
- Andrei Alexandrescu (3/18) Oct 08 2010 My code wasn't using push_back.
- Tomek Sowiński (6/17) Oct 07 2010 Good idea. It would make a nice use case for user-defined attributes in ...
- Simen kjaeraas (8/9) Oct 07 2010 How does this test for things like N log M?
- Tomek Sowiński (16/24) Oct 07 2010 Or:
- Simen kjaeraas (7/14) Oct 07 2010
- Tomek Sowiński (4/14) Oct 07 2010 What's the halting problem?
- Jonathan M Davis (17/30) Oct 07 2010 http://en.wikipedia.org/wiki/Halting_problem
- Bruno Medeiros (4/11) Oct 29 2010 O_o
- Simen kjaeraas (23/35) Oct 07 2010
- Stewart Gordon (20/21) Oct 09 2010 Basically, it's the challenge of determining algorithmically whether an
- Peter Alexander (5/8) Oct 09 2010 But solving those problems would mean nothing in that hypothetical
- Seth Hoenig (68/90) Oct 09 2010
- Tomek Sowiński (8/42) Oct 07 2010 If O.linear was a template, it could take an alias...
- Daniel Gibson (7/24) Oct 07 2010
- Tomek Sowiński (10/38) Oct 07 2010 I don't think so, there can always be uniform syntax wrappers (there's a...
- bearophile (4/5) Oct 08 2010 Yes, that's overkill, and it's not a good solution. But sometimes it's u...
It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative arrays, which is weird. Examples of how this could be used: - Find a string in a string - Search for a character in a string - Search for an item in an array, an array of characters, an array of strings, tuples, enums, structures - what else? Python does something like this. Here is where it would be useful to me: http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d
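For comparison, a minimal sketch of the proposed syntax next to the spelling that already compiles today (assuming std.algorithm's find, discussed later in the thread):

    import std.algorithm;

    void main()
    {
        int[] arr = [1, 2, 3];

        // Proposed syntax; does not compile for arrays today:
        // if (2 in arr) { ... }

        // Today's spelling: find returns the slice starting at
        // the first match, or an empty slice if there is none.
        if (arr.find(2).length != 0)
        {
            // found
        }
    }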
Oct 07 2010
On Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com> wrote:It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else?This has been suggested before. The problem is that 'in' is generally considered to be a fast operation (< O(n)) so linear search is out. -Steve
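For reference, a minimal sketch of the associative-array behaviour that expectation comes from:

    void main()
    {
        int[string] aa = ["one": 1, "two": 2];

        // On an associative array, 'in' is an expected-O(1)
        // hash lookup that returns a pointer to the stored
        // value, or null when the key is absent.
        if (auto p = "one" in aa)
            assert(*p == 1);
        assert(("three" in aa) is null);
    }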
Oct 07 2010
Steven Schveighoffer schrieb:On Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com> wrote:Is it? I wouldn't expect it to be < O(n) on a regular array, but it'd still be convenient to have instead of iterating over the array and comparing the contents yourself. Cheers, - DanielIt would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else?This has been suggested before. The problem is that 'in' is generally considered to be a fast operation (< O(n)) so linear search is out. -Steve
Oct 07 2010
On Thu, 07 Oct 2010 10:09:17 -0400, Daniel Gibson <metalcaedes gmail.com> wrote:Steven Schveighoffer schrieb:The problem is with generic code. Generic code that accepts some type that defines the "in" operator. What can that generic code expect for performance when using "in"? As a general rule, generic programming must always assume the worst case, and if we have no rules for 'in', the worst case is linear. Which means generic code may not use 'in' when it would be a fast operation. Same thing with indexing. Try sorting a 'random access' range which uses a linear search on opIndex, and see what the performance is. In addition, arr.find(x) seems pretty simple to me. -SteveOn Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com> wrote:Is it? I wouldn't expect it to be < O(n) on a regular array, but it'd still be convenient to have instead of iterating over the array and comparing the contents yourself.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else?This has been suggested before. The problem is that 'in' is generally considered to be a fast operation (< O(n)) so linear search is out. -Steve
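A hypothetical sketch of the opIndex point (the SlowRange type is invented for the example): an index operator that secretly walks the container makes every "random access" cost O(n), so an O(n log n) sort built on it silently does O(n^2 log n) work.

    // Hypothetical container whose opIndex looks like random
    // access but walks from the front, as a linked list would.
    struct SlowRange
    {
        int[] data; // stand-in for list nodes

        int opIndex(size_t i)
        {
            int current;
            // Simulate stepping through i nodes: O(i) per access.
            foreach (e; data[0 .. i + 1])
                current = e;
            return current;
        }
    }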
Oct 07 2010
Steven Schveighoffer schrieb:On Thu, 07 Oct 2010 10:09:17 -0400, Daniel Gibson <metalcaedes gmail.com> wrote:I didn't know about arr.find(). Is that documented somewhere? I do however agree that arr.find() is sufficient if it's there. Cheers, - DanielSteven Schveighoffer schrieb:The problem is with generic code. Generic code that accepts some type that defines the "in" operator. What can that generic code expect for performance when using "in"? As a general rule, generic programming must always assume the worst case, and if we have no rules for 'in', the worst case is linear. Which means generic code may not use 'in' when it would be a fast operation. Same thing with indexing. Try sorting a 'random access' range which uses a linear search on opIndex, and see what the performance is. In addition, arr.find(x) seems pretty simple to me. -SteveOn Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com> wrote:Is it? I wouldn't expect it to be < O(n) on a regular array, but it'd still be convenient to have instead of iterating over the array and comparing the contents yourself.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else?This has been suggested before. The problem is that 'in' is generally considered to be a fast operation (< O(n)) so linear search is out. -Steve
Oct 07 2010
arr.find() http://dpldocs.info/std.algorithm.find With arrays, you can call free functions with the dot member syntax, so import std.algorithm and that just works.
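In other words, the two calls in this minimal sketch end up in the same function:

    import std.algorithm;

    void main()
    {
        int[] arr = [1, 2, 3];

        // The compiler rewrites the member syntax on arrays
        // into an ordinary free-function call.
        assert(find(arr, 2) == arr.find(2));
    }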
Oct 07 2010
Adam D. Ruppe schrieb:arr.find() http://dpldocs.info/std.algorithm.find With arrays, you can call free functions with the dot member syntax so import std.algorithm and that just works.Ah ok, I thought it was a property of arrays or something like that. Thanks :-)
Oct 07 2010
On Thu, 07 Oct 2010 13:51:36 -0400, Daniel Gibson <metalcaedes gmail.com> wrote:Steven Schveighoffer schrieb:std.algorithm.find: http://www.digitalmars.com/d/2.0/phobos/std_algorithm.html#find arr.find works because of the universal call syntax for arrays makes arr.find(x) equivalent to find(arr, x). -SteveOn Thu, 07 Oct 2010 10:09:17 -0400, Daniel Gibson <metalcaedes gmail.com> wrote:I didn't know about arr.find(). Is that documented somewhere? I do however agree that arr.find() is sufficient if it's there.Steven Schveighoffer schrieb:The problem is with generic code. Generic code that accepts some type that defines the "in" operator. What can that generic code expect for performance when using "in"? As a general rule, generic programming must always assume the worst case, and if we have no rules for 'in', the worst case is linear. Which means generic code may not use 'in' when it would be a fast operation. Same thing with indexing. Try sorting a 'random access' range which uses a linear search on opIndex, and see what the performance is. In addition, arr.find(x) seems pretty simple to me. -SteveOn Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com> wrote:Is it? I wouldn't expect it to be < O(n) on a regular array, but it'd still be convenient to have instead of iterating over the array and comparing the contents yourself.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else?This has been suggested before. The problem is that 'in' is generally considered to be a fast operation (< O(n)) so linear search is out. -Steve
Oct 07 2010
On 07/10/2010 15:22, Steven Schveighoffer wrote: <snip>The problem is with generic code. Generic code that accepts some type that defines the "in" operator. What can that generic code expect for performance when using "in"? As a general rule, generic programming must always assume the worst case, and if we have no rules for 'in', the worst case is linear. Which means generic code may not use 'in' when it would be a fast operation. Same thing with indexing. Try sorting a 'random access' range which uses a linear search on opIndex, and see what the performance is.<snip> Surely, what matters most for generic code is that in has consistent _semantics_, rather than the computational complexity? Stewart.
Oct 09 2010
On Saturday 09 October 2010 06:54:31 Stewart Gordon wrote:On 07/10/2010 15:22, Steven Schveighoffer wrote: <snip>_Both_ matter. Imagine, for instance, if we were to make Java's mistake of having List be an interface which ArrayList and LinkedList implement and that List has a get() function. Using ArrayList, get() has a complexity of O(1). Using LinkedList, it has a complexity of O(n). Now, imagine code written with List rather than ArrayList or LinkedList directly. If you use get(), then your code will vary considerably in how efficient it is. In small cases, this won't matter, but with large lists, it could matter a lot. Your generic code which uses List _has_ to worry about the computational complexity of get() or it could become very inefficient. In this case, for most situations, that would mean using a foreach loop or iterators directly, but using get() is likely a _bad_ idea. You need to know the computational complexity of get() in order to use List efficiently. To write efficient algorithms, you _must_ be able to rely on certain complexity guarantees for the operations that you use. So, regardless of what those complexity guarantees actually _are_, they need to be there. - Jonathan M DavisThe problem is with generic code. Generic code that accepts some type that defines the "in" operator. What can that generic code expect for performance when using "in"? As a general rule, generic programming must always assume the worst case, and if we have no rules for 'in', the worst case is linear. Which means generic code may not use 'in' when it would be a fast operation. Same thing with indexing. Try sorting a 'random access' range which uses a linear search on opIndex, and see what the performance is.<snip> Surely, what matters most for generic code is that in has consistent _semantics_, rather than the computational complexity? Stewart.
Oct 09 2010
On 10/10/2010 02:05, Jonathan M Davis wrote:On Saturday 09 October 2010 06:54:31 Stewart Gordon wrote:<snip>True, yet you don't address semantic differences at all in what follows, so it doesn't really follow on from what I said. But anyway....Surely, what matters most for generic code is that in has consistent _semantics_, rather than the computational complexity? Stewart._Both_ matter.Imagine, for instance, if we were to make Java's mistake of having List be an interface which ArrayList and LinkedList implement and that List has a get() function. Using ArrayList, get() has a complexity of O(1). Using LinkedList, it has a complexity of O(n).Are you suggesting we just don't have a List interface, just to prevent people from writing algorithms that work well in some list implementations but poorly in others? I'm not sure if that risk outweighs any benefits....Now, imagine code written with List rather than ArrayList or LinkedList directly. If you use get(), then your code will vary considerably in how efficient is. In small cases, this won't matter, but with large lists, it could matter a lot. Your generic code which uses List _has_ to worry about the computational complexity of get() or it could become very inefficient. In this case, for most situations, that would mean using a foreach loop or iterators directly, but using get() is likely a _bad_ idea. You need to know the computational complexity of get() in order to use List efficiently.Indeed. But what is an efficient algorithm depends on how the data structure works in the first place, not just the computational complexity of particular operations. It is perhaps for this reason that it is foolish to try to write such generic algorithms. An example of this is Collections.sort(List) in Java, which gets around it by dumping the list into an array, an operation which itself adds overhead. Better would be to have sort as a method of List, so that for each list class an efficient implementation of sort can be provided.To write efficient algorithms, you _must_ be able to rely on certain complexity guarantees for the operations that you use. So, regardless of what those complexity gurantees actually _are_, they need to be there.Setting the complexity guarantee of everything to O(n!) would achieve _that_ aim in most everyday cases.... :) Stewart.
Oct 12 2010
Stewart Gordon <smjg_1998 yahoo.com> wrote:Not at all. What is being argued is that an interface not only dictates what the object does, but to an extent how it does it (i.e. complexity guarantees). Just as an interface offering only function signatures, with no explanation of what a particular function does, would be considered insufficient or outright dangerous, so would one that gives no warning of complexity considerations.Imagine, for instance, if we were to make Java's mistake of having List be an interface which ArrayList and LinkedList implement and that List has a get() function. Using ArrayList, get() has a complexity of O(1). Using LinkedList, it has a complexity of O(n).Are you suggesting we just don't have a List interface, just to prevent people from writing algorithms that work well in some list implementations but poorly in others? I'm not sure if that risk outweighs any benefits....As well as being completely useless. -- SimenTo write efficient algorithms, you _must_ be able to rely on certain complexity guarantees for the operations that you use. So, regardless of what those complexity gurantees actually _are_, they need to be there.Setting the complexity guarantee of everything to O(n!) would achieve _that_ aim in most everyday cases.... :)
Oct 12 2010
On Tue, 12 Oct 2010 10:48:26 -0400, Stewart Gordon <smjg_1998 yahoo.com> wrote:On 10/10/2010 02:05, Jonathan M Davis wrote:No, what is suggested is that we build the complexity guarantees into the interface. For example, if you had a List interface, and it could only guarantee O(n) performance for get, you'd call it e.g. slowGet(), to signify it's a slow operation, and define get to signify a fast operation. Similarly, 'in' is defined to be fast, so it has no business being a function on an array. So you can still have the List interface, but you don't hijack the "fast operation" of 'in' with a slow operation. This allows it to still provide the slow operation, just not in a way that makes generic code use it assuming it is fast. -SteveOn Saturday 09 October 2010 06:54:31 Stewart Gordon wrote:<snip>True, yet you don't address semantic differences at all in what follows, so it doesn't really follow on from what I said. But anyway....Surely, what matters most for generic code is that in has consistent _semantics_, rather than the computational complexity? Stewart._Both_ matter.Imagine, for instance, if we were to make Java's mistake of having List be an interface which ArrayList and LinkedList implement and that List has a get() function. Using ArrayList, get() has a complexity of O(1). Using LinkedList, it has a complexity of O(n).Are you suggesting we just don't have a List interface, just to prevent people from writing algorithms that work well in some list implementations but poorly in others? I'm not sure if that risk outweighs any benefits....
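A minimal sketch of that naming convention (the interfaces and method names here are invented for the example, not an actual library API):

    // The method name itself carries the complexity contract,
    // so generic code cannot mistake a linear lookup for a
    // constant-time one.
    interface List(T)
    {
        T slowGet(size_t i); // allowed to be O(n)
    }

    interface RandomAccessList(T) : List!T
    {
        T get(size_t i); // contract: O(1)
    }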
Oct 13 2010
On 13/10/2010 21:11, Steven Schveighoffer wrote: <snip>No, what is suggested is that we build the complexity guarantees into the interface. For example, if you had a List interface, and it could only guarantee O(n) performance for get, you'd call it e.g. slowGet(), to signify it's a slow operation, and define get to signify a fast operation. Similarly, 'in' is defined to be fast, so it has no business being a function on an array.<snip> And make get assert(false) if a fast get isn't implementable? I think this is bug-prone - somebody may use it forgetting that it won't work on all List implementations, and then the algorithm will refuse at runtime to work at all rather than degrading gracefully to something that does. But something else that would be useful is methods in the interface to tell which operations are fast and which are slow. Stewart.
Oct 14 2010
Stewart Gordon <smjg_1998 yahoo.com> wrote:On 13/10/2010 21:11, Steven Schveighoffer wrote: <snip>No. That list would not be able to implement the List interface.No, what is suggested is that we build the complexity guarantees into the interface. For example, if you had a List interface, and it could only guarantee O(n) performance for get, you'd call it e.g. slowGet(), to signify it's a slow operation, and define get to signify a fast operation. Similarly, 'in' is defined to be fast, so it has no business being a function on an array.<snip> And make get assert(false) if a fast get isn't implementable?I think this is bug-prone - somebody may use it forgetting that it won't work on all List implementations, and then the algorithm will refuse at runtime to work at all rather than degrading gracefully to something that does.-- Simen
Oct 14 2010
On Thu, 14 Oct 2010 06:04:24 -0400, Stewart Gordon <smjg_1998 yahoo.com> wrote:On 13/10/2010 21:11, Steven Schveighoffer wrote: <snip>No, not really. What I meant by "define get to be a fast operation", I meant: define the term "get" to mean a fast operation, not define get on the List interface and let you implement the one you want. You'd only define slowGet as a method on the List interface, and then compile-time checks that require fast operations wouldn't be able to use List. If it's acceptable for a compile-time checked function to use slowGet, then it could use either. Essentially, in your interface, you define the operation's complexity, and avoid using terms that imply a fast operation. This allows both compile-time interfaces and runtime interfaces. For example, dcollections' List interface doesn't even define something like get. Why? Because Lists aren't meant to be storage containers for quick access to elements. It used to have a contains method, but I eliminated that because the value to cost ratio was too low. -SteveNo, what is suggested is that we build the complexity guarantees into the interface. For example, if you had a List interface, and it could only guarantee O(n) performance for get, you'd call it e.g. slowGet(), to signify it's a slow operation, and define get to signify a fast operation. Similarly, 'in' is defined to be fast, so it has no business being a function on an array.<snip> And make get assert(false) if a fast get isn't implementable? I think this is bug-prone - somebody may use it forgetting that it won't work on all List implementations, and then the algorithm will refuse at runtime to work at all rather than degrading gracefully to something that does. But something else that would be useful is methods in the interface to tell which operations are fast and which are slow.
Oct 14 2010
On 10/7/10 6:54 CDT, atommixz wrote:It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leery of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
Andrei Alexandrescu schrieb:On 10/7/10 6:54 CDT, atommixz wrote:That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird. Cheers, - DanielIt would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
On 10/7/10 9:59 CDT, Daniel Gibson wrote:Andrei Alexandrescu schrieb:It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me. AndreiOn 10/7/10 6:54 CDT, atommixz wrote:That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
Andrei Alexandrescu schrieb:On 10/7/10 9:59 CDT, Daniel Gibson wrote:So what about static arrays? Cheers, - DanielAndrei Alexandrescu schrieb:It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.On 10/7/10 6:54 CDT, atommixz wrote:That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
On 10/7/10 10:39 CDT, Daniel Gibson wrote:Andrei Alexandrescu schrieb:Same deal - same as literal arrays: they can be searched. The expected run time is known during compilation and independent of the input size. AndreiOn 10/7/10 9:59 CDT, Daniel Gibson wrote:So what about static arrays?Andrei Alexandrescu schrieb:It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.On 10/7/10 6:54 CDT, atommixz wrote:That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
On 2010-10-07 11:47:21 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:On 10/7/10 10:39 CDT, Daniel Gibson wrote:What about a static array of 200000 elements? At which point does it become linear? I'm okay with it for compile-time constant literals, because the values are known at compile time and the compiler can optimize the search at compile time (so it doesn't have to be linear), but for static array where the content isn't known at compile time, I'd not allow it. Perhaps it could be allowed with a tuple. :-) if (x in ("abcde", "asd")) { ... } -- Michel Fortin michel.fortin michelf.com http://michelf.com/Andrei Alexandrescu schrieb:Same deal - same as literal arrays: they can be searched. The expected run time is known during compilation and independent of the input size.On 10/7/10 9:59 CDT, Daniel Gibson wrote:So what about static arrays?Andrei Alexandrescu schrieb:It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.On 10/7/10 6:54 CDT, atommixz wrote:That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
On 10/7/10 12:11 CDT, Michel Fortin wrote:On 2010-10-07 11:47:21 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:At no point. "Linear" means "linear in the input size". I don't think such arguments are valid. AndreiOn 10/7/10 10:39 CDT, Daniel Gibson wrote:What about a static array of 200000 elements? At which point does it become linear?Andrei Alexandrescu schrieb:Same deal - same as literal arrays: they can be searched. The expected run time is known during compilation and independent of the input size.On 10/7/10 9:59 CDT, Daniel Gibson wrote:So what about static arrays?Andrei Alexandrescu schrieb:It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.On 10/7/10 6:54 CDT, atommixz wrote:That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
Andrei Alexandrescu wrote:On 10/7/10 12:11 CDT, Michel Fortin wrote:I actually see a very significant difference between 'in' on a compile-time array literal, vs 'in' on a static array. When all values are known, the compiler can make a perfect hash. So potentially it could be O(1). But 'find' on a static array will run the same function which would be used for a dynamic array. The difference is theoretical ONLY -- the exact same asm instructions will be executed! In fact, x in ["abc", "def"] could be supported by allowing implicit conversion from array literals to AA literals. ["abc", "def"] -----> ["abc" : 0, "def" : 1]On 2010-10-07 11:47:21 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:At no point. "Linear" means "linear in the input size". I don't think such arguments are valid.On 10/7/10 10:39 CDT, Daniel Gibson wrote:What about a static array of 200000 elements? At which point does it become linear?Andrei Alexandrescu schrieb:Same deal - same as literal arrays: they can be searched. The expected run time is known during compilation and independent of the input size.On 10/7/10 9:59 CDT, Daniel Gibson wrote:So what about static arrays?Andrei Alexandrescu schrieb:It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.On 10/7/10 6:54 CDT, atommixz wrote:That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
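That rewrite can be spelled by hand today; a minimal sketch (the AA values are placeholders, only the keys matter):

    void main()
    {
        string x = "def";

        // Hand-written form of the conversion described above:
        // membership through an AA literal is a hash lookup
        // rather than a linear scan.
        if (x in ["abc" : 0, "def" : 1])
        {
            // found
        }
    }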
Oct 07 2010
On Thu, 07 Oct 2010 16:32:22 -0400, Don <nospam nospam.com> wrote:I actually see a very significant difference between 'in' on a compile-time array literal, vs 'in' on a static array. When all values are known, the compiler can make a perfect hash. So potentially it could be O(1). But 'find' on a static array will run the same function which would be used for a dynamic array. The difference is theoretical ONLY -- the exact same asm instructions will be executed! In fact, x in ["abc", "def"] could be supported by allowing implicit conversion from array literals to AA literals. ["abc", "def"] -----> ["abc" : 0, "def" : 1]Let's first make literals not allocate heap data or else the big-O complexity doesn't matter at all :) -Steve
Oct 07 2010
On 2010-10-07 14:38:50 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:At no point. "Linear" means "linear in the input size". I don't think such arguments are valid.It is linear in regard to the array length, the static array being the input. That the length is known at compile time doesn't make it less of an input for the "in" operator, even though it's not an input of the program at runtime. Let's say for instance that you have a big static array of keywords and you want to check if a string contains one of those keywords, would you really do this? string[200] keywords = ["for", "foreach", "auto", .... 200 keywords .... ]; foreach (word; listOfWords) if (word in keywords) writeln("Found a keyword! ", word); That's so easy to read and write! but so stupid at the same time if "in" performs a linear search. In the example above, if "keywords" was immutable then the compiler could optimize things by using a good search strategy, but otherwise it's just ridiculous to allow it to compile. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Oct 07 2010
On 10/7/10 16:23 CDT, Michel Fortin wrote:On 2010-10-07 14:38:50 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:"Input" == "Input to the program" i.e. not known during compilation of the program. AndreiAt no point. "Linear" means "linear in the input size". I don't think such arguments are valid.It is linear in regard to the array length, the static array being the input. That the length is known at compile time doesn't make it less of an input for the "in" operator, even though it's not an input of the program at runtime.
Oct 07 2010
On 2010-10-07 18:53:51 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:On 10/7/10 16:23 CDT, Michel Fortin wrote:Sorry, but my interpretation in this context is that "Input" should be "Input to the 'in' operator". Sure, it won't affect the complexity of the program if the input of the operator is constant, but the complexity of that operator is still linear in the sense that the time it takes for one search scales linearly with the size of the input. That the size of the input is decided at runtime or at compile time does not change the complexity of the operator, nor does it make it less stupid to use the operator on arrays longer than a few elements. "in" shouldn't be allowed to perform a linear operation, and I mean linear relative to the size of its input. -- Michel Fortin michel.fortin michelf.com http://michelf.com/On 2010-10-07 14:38:50 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:"Input" == "Input to the program" i.e. not known during compilation of the program.At no point. "Linear" means "linear in the input size". I don't think such arguments are valid.It is linear in regard to the array length, the static array being the input. That the length is known at compile time doesn't make it less of an input for the "in" operator, even though it's not an input of the program at runtime.
Oct 07 2010
On Thu, 07 Oct 2010 19:28:59 -0400, Michel Fortin <michel.fortin michelf.com> wrote:On 2010-10-07 18:53:51 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:100% agree. If it can do something non-linear (i.e. auto sort the array or optimize using a static hash table), then I think that's fine. -Steve"Input" == "Input to the program" i.e. not known during compilation of the program.Sorry, but my interpretation in this context is that "Input" should be "Input to the 'in' operator". Sure, it won't affect the complexity of the program if the input of the operator is constant, but the complexity of that operator is still linear in the sense that the time it takes for one search scales linearly with the size of the input. That the size of the input is decided at runtime or at compile time does not change the complexity of the operator, nor does it make it less stupid to use the operator on arrays longer than a few elements. "in" shouldn't be allowed to perform a linear operation, and I mean linear relative to the size of it's input.
Oct 08 2010
Andrei Alexandrescu wrote:The size of the operand is constant, known, and visible.The expected run time is known during compilation and independent of the input size.Seems that you are arguing for some elaborate sort of Hungarian notation: operators should be attributed by some otherwise hidden expectations of their runtime? -manfred
Oct 07 2010
On 10/7/10 15:00 CDT, Manfred_Nowak wrote:Andrei Alexandrescu wrote:All I'm saying is that complexity is not an implementation detail. AndreiThe size of the operand is constant, known, and visible.The expected run time is known during compilation and independent of the input size.Seems that you are argumenting for some elaborated sort of hungarian notation: operators should be attributed by some otherwise hidden expectations of their runtime? -manfred
Oct 07 2010
On Thursday, October 07, 2010 08:37:19 Andrei Alexandrescu wrote:It feels inconsistent because it only works on a particular type (arrays) some of the time. At least with associative arrays it _always_ works. You don't have to worry about whether it's legal in a particular context. Being able to use in on only a subset of arrays (like array literals) would not be consistent with regards to type (even if it's consistent with regards to complexity) and could lead to confusion. Personally, I think that find() and indexof() do the job just fine and I see no need to expand in. Regardless, I certainly don't want to have in working on arrays only some of the time. I completely agree with Daniel that doing so would be inconsistent. - Jonathan M DavisIt's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me. AndreiI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. AndreiThat feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.
Oct 07 2010
IMO this could be added, with the documentation specifying that the implementation doesn't need to be fast. So people who want quick development for things that don't need high performance may use it, while people coding for performance just need not use it. The compiler may try to optimize it if it can. I have seen Python programs avoid the use of "in" since it was slower. About substrings I don't know, though. But I think it makes sense for collections (arrays), and opIn or similar could be added so other types of collections may use the "in" keyword, perhaps. -- AuriaOn 10/7/10 6:54 CDT, atommixz wrote: I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
On 10/7/2010 10:52 AM, Andrei Alexandrescu wrote:On 10/7/10 6:54 CDT, atommixz wrote:This is a non-argument. "In" is an operator. "*" is an operator. Everyone "knew" that multiplying longs was slower than multiplying ints. Everyone "knew" that multiplying floating point was slower than multiplying integers. But * was still defined for floating point types because it made sense. If querying a collection for the presence of a key makes sense (and it does to me) then the operator should work across as many collections as possible, regardless of the running time. (The presence of optimized variations is a bonus. Not a requirement.) =AustinIt would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
On Thu, 07 Oct 2010 13:28:51 -0400, Austin Hastings <ah08010-d yahoo.com> wrote:On 10/7/2010 10:52 AM, Andrei Alexandrescu wrote:No, this is not just a difference in runtime, this is a difference in complexity. The complexity of in on an associative array is O(1), the complexity of in on an array is O(n). This is a huge difference from comparing multiplying floats and ints. in's complexity should be O(lg(n)) or better. I also disagree with Andrei that in could work on arrays in certain cases. It has no business being called on an array, we already have find. -SteveOn 10/7/10 6:54 CDT, atommixz wrote:This is a non-argument. "In" is an operator. "*" is an operator. Everyone "knew" that multiplying longs was slower than multiplying ints. Everyone "knew" that multiplying floating point was slower than multiplying integers. But * was still defined for floating point types because it made sense.It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
On 10/7/10 12:28 CDT, Austin Hastings wrote:On 10/7/2010 10:52 AM, Andrei Alexandrescu wrote:Sorry, I disagree with this so completely, I wouldn't know where to start. AndreiOn 10/7/10 6:54 CDT, atommixz wrote:This is a non-argument. "In" is an operator. "*" is an operator. Everyone "knew" that multiplying longs was slower than multiplying ints. Everyone "knew" that multiplying floating point was slower than multiplying integers. But * was still defined for floating point types because it made sense. If querying a collection for the presence of a key makes sense (and it does to me) then the operator should work across as many collections as possible, regardless of the running time. (The presence of optimized variations is a bonus. Not a requirement.) =AustinIt would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
On Thu, 07 Oct 2010 17:52:10 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 10/7/10 6:54 CDT, atommixz wrote:Reading these boards, both new and old posts, I hardly find any topic that I disagree with you on. This one is one of those rare cases. It is really hard for me to connect "operator usage" and "operation complexity". To me, operators are just "cute", and mostly (but not necessarily) meaningful shortcuts. You can overload any operator, no matter what it says or feels like, and do pretty much anything you like, at any complexity level. Either purge it (you get Java) or lift the restrictions and let the coder have it. Thanks. -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/It would be nice if it were possible to use the "in" expression wherever possible. Now it is only implemented for associative. arrays. (Weird). Examples of how this could be used: - Find string in string - Search for a character in a string - Search for an item in the array, array of characters, array of strings, tuples, enum, structure - what else? In Python done something like this. Here it would be useful to me http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.dI'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 09 2010
Reading these boards, both new and old posts, i hardly find any topic that i disagree with you. This one is one of those rare cases. It is really hard for me to connect "operator usage" and "operation complexity". To me, operators are just "cute", and mostly meaningful (and not necessarily) shortcuts. You can overload any operator, doesn't matter what it says/feels, you can do pretty much anything you like, on any complexity level. Either purge it (you get Java) or lift the restrictions and let coder have it. Thanks.Language libraries are special, and naturally care needs to be taken; you would need strong guarantees for pretty much anything in Phobos. You can just use some other function; as a reference library, by doing that you also discourage its usage for things that require strong guarantees. Thanks. -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Oct 09 2010
Andrei Alexandrescu:I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible.The "in" is useful to tell if an item is present in a collection (dynamic arrays, fixed-sized arrays, sorted arrays, tuples, typetuples, hash sets, tree sets, hash maps, tree maps, trees, graphs, deques, skip lists, finger trees, etc.), and it's useful to tell if a substring is present in a string. Those two purposes are very commonly useful in programs. Although those purposes are very common, there is no need for an "in" syntax. In dlibs1 I have created an isIn() function that was good enough. D supports the "in" operator for associative arrays, and in its operator overloading (my Set()/set() implementation supports the "in" operator), so it's natural for people to desire that operator to be usable for the other kinds of collections too, like arrays. Maybe those people have used Python, where the in operator (__contains__) is widely used. Regarding generic programming, Python doesn't have templates, but it has dynamic typing, which also serves purposes similar to generic programming. To a Python function that uses the "in" operator you may give both a list and an associative array, or other objects that support __contains__. In this case the Python code performs an O(n) or O(1) operation according to the dynamic type given to the function. I can live without the "in" operation for arrays and strings (useful to look for substrings too). So one solution is to not change the D language, and not add support for "in" to arrays. Another solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what the problem with this is. Another solution is to support the "in" operator for dynamic arrays too, and define a new attribute, like complexity(), plus an enum that allows one to specify the worst-case complexity. So associative arrays are annotated with complexity(O.constant), while the function that searches for items/substrings in arrays/strings is complexity(O.linear). At compile time the generic code is then able to query the computational complexity of the "in" operator implementation. The problem is that the compiler today can't enforce that functions annotated with complexity(O.constant) actually don't perform a linear search (but I don't think that's a problem; today, if the Range protocol asks an operation to not be worse than O(n ln n), the compiler doesn't enforce it). I don't like the idea of allowing the "in" operator for sorted arrays, tuples, fixed-sized arrays, skip lists, maps, sets, and trees while disallowing it for dynamic arrays. It looks a bit silly. People will then spend the next ten years asking for "in" to be extended to unsorted dynamic arrays & substring search too, you can bet on it. Bye, bearophile
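A rough sketch of what such an annotation might look like (the O enum, the complexity attribute, and any compile-time checking are entirely hypothetical; nothing like this exists in the language or Phobos):

    // Hypothetical complexity annotation; at best it could act
    // as machine-readable documentation that generic code
    // inspects, since the compiler cannot verify it.
    enum O { constant, logarithmic, linear }

    struct complexity
    {
        O worstCase;
    }

    @complexity(O.constant)
    bool contains(K, V)(V[K] aa, K key)
    {
        return (key in aa) !is null; // hash lookup
    }

    @complexity(O.linear)
    bool contains(T)(T[] arr, T x)
    {
        foreach (e; arr) // linear scan
            if (e == x)
                return true;
        return false;
    }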
Oct 07 2010
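For reference, this is how a user type like bearophile's Set overloads "in" with D2 operator overloading; a minimal sketch, not his actual dlibs code:

    // A toy set backed by a built-in AA; lookup via 'in' is expected O(1).
    struct Set(T)
    {
        private bool[T] data;

        void add(T x) { data[x] = true; }

        // 'x in s' is rewritten by the compiler as s.opBinaryRight!"in"(x)
        bool opBinaryRight(string op : "in")(T x) const
        {
            return (x in data) !is null;
        }
    }

    unittest
    {
        Set!int s;
        s.add(3);
        assert(3 in s);
        assert(4 !in s);
    }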
On 10/07/2010 09:40 PM, bearophile wrote:
[full post snipped -- quoted verbatim above]
I think it's better to know that in always has logarithmic or better complexity. Otherwise, there is always canFind (which needs a new name :-)
Oct 07 2010
On 10/7/10 14:40 CDT, bearophile wrote:Another solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what's the problem in this.That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound. Andrei
Oct 07 2010
On 10/7/2010 13:57, Andrei Alexandrescu wrote:On 10/7/10 14:40 CDT, bearophile wrote:Why? I can't say I've ever cared about the big-O complexity of an operation. All I care about is that it's "fast enough", which is highly context-dependent and may have nothing to do with complexity. I can't see myself replacing my 'int[]' arrays with the much slower and bigger 'int[MAX_SIZE]' arrays just to satisfy the compiler. I shouldn't have to. The type system shouldn't encourage me to. I think it's an abuse of the type system to use it to guarantee performance. However, if I wanted the type system to provide performance guarantees, I would need a lot more language support than a convention that certain operations are supposed to be O(n). I'm talking performance specification on *all* functions, with a compile-time error if the compiler can't prove that the compiled function meets those guarantees. And *even then*, I would like to be able to use an O(n) implementation of 'in' where I know that O(n) performance is acceptable. -- Rainer Deyke - rainerd eldwood.comAnother solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what's the problem in this.That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.
Oct 07 2010
On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com> wrote:On 10/7/2010 13:57, Andrei Alexandrescu wrote:Then you don't understand how important it is. Here is an example of how caring about the big O complexity cut the runtime of dmd to about 1/3: http://d.puremagic.com/issues/show_bug.cgi?id=4721 big O complexity is very important when you are writing libraries. Not so much when you are writing applications -- if you can live with it in your application, then fine. But Phobos should not have these problems for people who *do* care. What I'd suggest is to write your own function that uses in when possible and find when not possible. Then use that in your code. -SteveOn 10/7/10 14:40 CDT, bearophile wrote:Why? I can't say I've ever cared about the big-O complexity of an operation.Another solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what's the problem in this.That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.
Oct 07 2010
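The pattern behind that bug is easy to demonstrate. A small sketch contrasting a re-allocating append with amortized appending via std.array.Appender (illustrative only; the speedup in the bug above came from the same kind of change):

    import std.array : appender;

    void main()
    {
        auto words = ["the", "quick", "brown", "fox"];

        // Worst case: every concatenation allocates a fresh array and
        // copies everything so far -- O(n^2) copying over n appends.
        string slow;
        foreach (w; words) slow = slow ~ w;

        // Appender grows its buffer geometrically, so n appends cost
        // amortized O(n).
        auto fast = appender!string();
        foreach (w; words) fast.put(w);

        assert(slow == fast.data);
    }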
Steven Schveighoffer wrote:
On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com> wrote:
Personally, I'd vote for inclusion of such a function in Phobos. The big problem with C++ containers was and is the absence of a uniform interface. Typedefing those templates to save typing was good, but it didn't completely solve the problem of interchangeability. E.g., you make a change from list to vector and suddenly realize that remove() doesn't work anymore. Having a uniform interface for insertion/removal/lookup is plainly awesome. What if Phobos defined a function, say, E* contains(C,E)(C container, E element) that'd behave as the 'in' operator as best it could for type C? It'd fit frequent demands without making big promises -- in other words, it would still be largely useful. Yes, I understand that a good (performance-wise) generic solution isn't that simple to discover and implement. But having an easily accessible (i.e. standard), though maybe not all that efficient solution is a good thing nevertheless.
I can't say I've ever cared about the big-O complexity of an operation.
Then you don't understand how important it is. Here is an example of how caring about the big O complexity cut the runtime of dmd to about 1/3: http://d.puremagic.com/issues/show_bug.cgi?id=4721 big O complexity is very important when you are writing libraries. Not so much when you are writing applications -- if you can live with it in your application, then fine. But Phobos should not have these problems for people who *do* care. What I'd suggest is to write your own function that uses in when possible and find when not possible. Then use that in your code.
Oct 07 2010
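A sketch of what such a helper could look like: it prefers a container's own 'in' when one exists and falls back to a linear scan otherwise. The name contains is the poster's proposal, not an existing Phobos function, and the bool return simplifies the E* he suggests:

    import std.algorithm : canFind;

    bool contains(C, E)(C container, E element)
    {
        static if (is(typeof(element in container)))
            // The container provides 'in', presumably fast.
            return cast(bool)(element in container);
        else
            // No 'in': fall back to an explicit O(n) search.
            return canFind(container, element);
    }

    unittest
    {
        int[int] aa = [1: 10, 2: 20];
        assert(contains(aa, 2));        // uses the AA's expected-O(1) 'in'
        assert(contains([1, 2, 3], 2)); // linear canFind over the slice
    }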
On 10/7/2010 14:33, Steven Schveighoffer wrote:
On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com> wrote:
Let me rephrase that. I care about performance. Big-O complexity can obviously have a significant effect on performance, so I do care about it, but only to the extent that it affects performance. Low big-O complexity is a means to an end, not a goal in and of itself. If 'n' is low enough, then an O(2**n) algorithm may well be faster than an O(1) algorithm.
I can't say I've ever cared about the big-O complexity of an operation.
Then you don't understand how important it is.
I also believe that, in the absence of a sophisticated system that actually verifies performance guarantees, the language and standard library should trust the programmer to know what he is doing. The standard library should only provide transitive performance guarantees, e.g. this algorithm calls function 'f' 'n' times, so the algorithm's performance is O(n * complexity(f)). If 'f' runs in constant time, the algorithm runs in linear time. If 'f' runs in exponential time, the algorithm still runs.
big O complexity is very important when you are writing libraries. Not so much when you are writing applications -- if you can live with it in your application, then fine. But Phobos should not have these problems for people who *do* care. What I'd suggest is to write your own function that uses in when possible and find when not possible. Then use that in your code.
The issue is that algorithms may use 'in' internally, so I may have to rewrite large parts of Phobos. (And the issue isn't about 'in' specifically, but complexity guarantees in general.) -- Rainer Deyke - rainerd eldwood.com
Oct 07 2010
On Thu, 07 Oct 2010 19:07:55 -0400, Rainer Deyke <rainerd eldwood.com> wrote:
On 10/7/2010 14:33, Steven Schveighoffer wrote:
You'd be surprised at how important it is. I remember competing on TopCoder when they just barely added C++ as a possible language. I remember wondering "how are all the Java guys going to even compete with C++?" But the big-O complexity was always way way more important than the performance differences of native vs. JVM'd code. The thing about big-O complexity is that it gives you a good idea of how well your library will perform under defined circumstances. And libraries must be completely aware and rigid about it, or else you have situations where things are nice and easy syntax-wise but perform horribly when actually used. What you end up with when libraries don't care about it are "mysterious" slowdowns, or cases where people just start blaming the language for being so slow. Imagine if a database backend developer said "you know, who cares about big-O complexity, I think linear performance is good enough, most people have small databases anyways" -- who would ever use that backend? This is akin to how phobos needs to strive for the best performance.
On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com> wrote:
Let me rephrase that. I care about performance. Big-O complexity can obviously have a significant effect on performance, so I do care about it, but only to the extent that it affects performance. Low big-O complexity is a means to an end, not a goal in and of itself. If 'n' is low enough, then an O(2**n) algorithm may well be faster than an O(1) algorithm.
I can't say I've ever cared about the big-O complexity of an operation.
Then you don't understand how important it is.
I also believe that, in the absence of a sophisticated system that actually verifies performance guarantees, the language and standard library should trust the programmer to know what he is doing. The standard library should only provide transitive performance guarantees, e.g. this algorithm calls function 'f' 'n' times, so the algorithm's performance is O(n * complexity(f)). If 'f' runs in constant time, the algorithm runs in linear time. If 'f' runs in exponential time, the algorithm still runs.
You have two options: define opIn how you want if you don't care about complexity guarantees (not recommended), or define a wrapper function that uses the best available option, which your code can use. Despite phobos defining opIn to require lg(n) or better complexity, it does not restrict you from defining opIn how you want (even on arrays if you wish). I personally find the tools phobos provides completely adequate for the tasks you would need to do. -Steve
Oct 08 2010
Steven Schveighoffer wrote:
On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com> wrote:
On 10/7/2010 13:57, Andrei Alexandrescu wrote:
On 10/7/10 14:40 CDT, bearophile wrote:
Another solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what's the problem in this.
That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.
Why? I can't say I've ever cared about the big-O complexity of an operation.
Then you don't understand how important it is.
If big O complexity is so important, then why does everyone use quicksort (which is O(n**2)) and not heap sort or merge sort (which are O(n*log(n)))? Jerome
Oct 08 2010
2010/10/8 "J=E9r=F4me M. Berger" <jeberger free.fr>Steven Schveighoffer wrote:.On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com> wrote:On 10/7/2010 13:57, Andrei Alexandrescu wrote:On 10/7/10 14:40 CDT, bearophile wrote:Why? I can't say I've ever cared about the big-O complexity of an operation=Another solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what's the problem in this.That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.Because on average quicksort is faster than heap sort, and uses much less space than merge sort. Also, trivial guards can be put in place to avoid running quicksort in a worst case (pre-sorted data) scenario.Then you don't understand how important it is.If big O complexity is so important, then why does everyone use quicksort (which is O(n**2)) and not heap sort or merge sort (which are O(n*log(n)))? Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Oct 08 2010
Seth Hoenig wrote:
2010/10/8 "Jérôme M. Berger" <jeberger free.fr>
Steven Schveighoffer wrote:
On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com> wrote:
I can't say I've ever cared about the big-O complexity of an operation.
Then you don't understand how important it is.
If big O complexity is so important, then why does everyone use quicksort (which is O(n**2)) and not heap sort or merge sort (which are O(n*log(n)))? Jerome
Because on average quicksort is faster than heap sort, and uses much less space than merge sort. Also, trivial guards can be put in place to avoid running quicksort in a worst case (pre-sorted data) scenario.

I rest my case. Jerome
Oct 09 2010
On 10/9/10 9:05 CDT, "Jérôme M. Berger" wrote:Seth Hoenig wrote:In fact, guards can be put to ensure that the _expected_ (not average, not best-case) complexity is O(n log n). This makes the risk of hitting a worst-case scenario negligible in a principled manner. http://en.wikipedia.org/wiki/Quicksort#Randomized_quicksort_expected_complexity http://docs.google.com/viewer?a=v&q=cache:MdBVR26N5UsJ:www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf+randomized+quicksort&hl=en&gl=us&pid=bl&srcid=ADGEESi3GTSxfHWkeb_f14H0pkbigduS94qJVc9XLQ7aPa6lPUJ5JZbggI0izFe3ogiVOJCYcVkGtdumaS9hBvrGw0-TA_yZQj2qd1-AEudKyEWEGXnO4sTwqCZL95OpFkdFHDF2WXFV&sig=AHIEtbT1R0q5RIR4rob17QUKlYVl90vXyQ Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc). Andrei2010/10/8 "Jérôme M. Berger"<jeberger free.fr>I rest my case. JeromeSteven Schveighoffer wrote:Because on average quicksort is faster than heap sort, and uses much less space than merge sort. Also, trivial guards can be put in place to avoid running quicksort in a worst case (pre-sorted data) scenario.On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke<rainerd eldwood.com> wrote:If big O complexity is so important, then why does everyone use quicksort (which is O(n**2)) and not heap sort or merge sort (which are O(n*log(n)))? JeromeI can't say I've ever cared about the big-O complexity of an operation.Then you don't understand how important it is.
Oct 09 2010
Andrei:Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).Isn't median of 3 a better default? Bye, bearophile
Oct 09 2010
On 9/10/10 4:53 PM, bearophile wrote:
Andrei:
There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot.
Middle - Fastest, not very good
Median-of-3 - Slower, but slightly better
Median-of-5 - Slower still, but even better
Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.
Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).
Isn't median of 3 a better default? Bye, bearophile
Oct 09 2010
On 10/9/10 11:23 CDT, Peter Alexander wrote:On 9/10/10 4:53 PM, bearophile wrote:Also: in-place sorting for the median cases. AndreiAndrei:There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot. Middle - Fastest, not very good Median-of-3 - Slower, but slightly better Median-of-5 - Slower still, but even better Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).Isn't median of 3 a better default? Bye, bearophile
Oct 09 2010
On 9/10/10 5:25 PM, Andrei Alexandrescu wrote:On 10/9/10 11:23 CDT, Peter Alexander wrote:Could you elaborate? I don't see how you lose in-place sorting with the random and middle selection cases.On 9/10/10 4:53 PM, bearophile wrote:Also: in-place sorting for the median cases. AndreiAndrei:There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot. Middle - Fastest, not very good Median-of-3 - Slower, but slightly better Median-of-5 - Slower still, but even better Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).Isn't median of 3 a better default? Bye, bearophile
Oct 09 2010
On 10/9/10 12:12 CDT, Peter Alexander wrote:On 9/10/10 5:25 PM, Andrei Alexandrescu wrote:You just take median-of-some and also sort those elements right away before returning the pivot. It's a minor improvement. AndreiOn 10/9/10 11:23 CDT, Peter Alexander wrote:Could you elaborate? I don't see how you lose in-place sorting with the random and middle selection cases.On 9/10/10 4:53 PM, bearophile wrote:Also: in-place sorting for the median cases. AndreiAndrei:There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot. Middle - Fastest, not very good Median-of-3 - Slower, but slightly better Median-of-5 - Slower still, but even better Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).Isn't median of 3 a better default? Bye, bearophile
Oct 09 2010
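Concretely, a median pivot selection can sort its sample in place as a free side effect; a simplified sketch of the idea, not Phobos's actual getPivot:

    import std.algorithm : swap;

    // Median-of-3: order the first, middle and last elements in place,
    // then return the middle index as the pivot position.
    size_t medianOf3(T)(T[] r)
    {
        assert(r.length >= 3);
        immutable size_t a = 0, b = r.length / 2, c = r.length - 1;
        if (r[b] < r[a]) swap(r[a], r[b]);
        if (r[c] < r[b]) swap(r[b], r[c]);
        if (r[b] < r[a]) swap(r[a], r[b]);
        // r[a] <= r[b] <= r[c] now holds: three elements are already
        // sorted before partitioning even starts.
        return b;
    }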
Peter Alexander Wrote:
On 9/10/10 4:53 PM, bearophile wrote:
Introsort (a quicksort variant) degrades to heapsort for sorts that are going quadratic (it tracks the recursion depth and transitions when the depth is more than a desired threshold for the size of the range). The sort routine I wrote for Tango uses this approach plus median-of-3, the insertion sort fallback, and the quicksort variant that separately tracks equal values. All told, it's as fast or faster than everything else I've tested, even for contrived degenerate cases. Perhaps I'll convert it to use ranges and see how it does.
Andrei:
There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot.
Middle - Fastest, not very good
Median-of-3 - Slower, but slightly better
Median-of-5 - Slower still, but even better
Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.
Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).
Isn't median of 3 a better default? Bye, bearophile
Oct 09 2010
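A compact sketch of that scheme -- quicksort with a recursion-depth budget, a heap-sort fallback when the budget runs out, and an insertion-sort cutoff for small ranges. This only illustrates the idea; it is not Sean's Tango code:

    import std.algorithm : swap;

    void introSort(T)(T[] r)
    {
        // Customary budget: about 2 * log2(n) levels of quicksort.
        size_t limit = 0;
        for (size_t n = r.length; n > 1; n >>= 1) limit += 2;
        introImpl(r, limit);
    }

    private void introImpl(T)(T[] r, size_t depth)
    {
        while (r.length > 16)
        {
            if (depth == 0) { heapSort(r); return; } // going quadratic
            --depth;
            immutable p = lomuto(r);
            introImpl(r[0 .. p], depth);
            r = r[p + 1 .. $];                       // loop on the rest
        }
        insertionSort(r);                            // small ranges
    }

    private size_t lomuto(T)(T[] r)
    {
        swap(r[r.length / 2], r[$ - 1]);             // middle as pivot
        auto pivot = r[$ - 1];
        size_t i = 0;
        foreach (j; 0 .. r.length - 1)
            if (r[j] < pivot) swap(r[i++], r[j]);
        swap(r[i], r[$ - 1]);
        return i;
    }

    private void insertionSort(T)(T[] r)
    {
        foreach (i; 1 .. r.length)
            for (size_t j = i; j > 0 && r[j] < r[j - 1]; --j)
                swap(r[j], r[j - 1]);
    }

    private void heapSort(T)(T[] r)
    {
        // Max-heap sift-down over r[0 .. end].
        void sift(size_t root, size_t end)
        {
            for (size_t child; (child = 2 * root + 1) < end; root = child)
            {
                if (child + 1 < end && r[child] < r[child + 1]) ++child;
                if (!(r[root] < r[child])) return;
                swap(r[root], r[child]);
            }
        }
        foreach_reverse (start; 0 .. r.length / 2) sift(start, r.length);
        foreach_reverse (end; 1 .. r.length)
        {
            swap(r[0], r[end]);
            sift(0, end);
        }
    }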
Andrei Alexandrescu Wrote:In fact, guards can be put to ensure that the _expected_ (not average, not best-case) complexity is O(n log n). This makes the risk of hitting a worst-case scenario negligible in a principled manner. http://en.wikipedia.org/wiki/Quicksort#Randomized_quicksort_expected_complexity http://docs.google.com/viewer?a=v&q=cache:MdBVR26N5UsJ:www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf+randomized+quicksort&hl=en&gl=us&pid=bl&srcid=ADGEESi3GTSxfHWkeb_f14H0pkbigduS94qJVc9XLQ7aPa6lPUJ5JZbggI0izFe3ogiVOJCYcVkGtdumaS9hBvrGw0-TA_yZQj2qd1-AEudKyEWEGXnO4sTwqCZL95OpFkdFHDF2WXFV&sig=AHIEtbT1R0q5RIR4rob17QUKlYVl90vXyQ Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).It would be nice if it used insertion sort for small ranges as well. There's also a quicksort variant that partitions equal-to elements separately from less-than and greater-than values, which is faster for ranges with a lot of equal elements (and no slower for ranges without).
Oct 09 2010
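The "fat pivot" variant groups elements equal to the pivot in the middle so they drop out of further recursion; a sketch using the classic three-way (Dutch national flag) pass:

    import std.algorithm : swap;

    // Partitions r into [< pivot][== pivot][> pivot] and returns the
    // bounds of the equal run; quicksort then recurses only on the
    // outer two segments.
    size_t[2] partition3(T)(T[] r, T pivot)
    {
        size_t lo = 0, mid = 0, hi = r.length;
        while (mid < hi)
        {
            if (r[mid] < pivot)      { swap(r[lo], r[mid]); ++lo; ++mid; }
            else if (pivot < r[mid]) { --hi; swap(r[mid], r[hi]); }
            else                     ++mid;
        }
        return [lo, hi];
    }

    unittest
    {
        auto a = [3, 1, 3, 2, 5, 3];
        auto eq = partition3(a, 3);
        assert(a[eq[0] .. eq[1]] == [3, 3, 3]);
    }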
On 10/9/10 11:52 CDT, Sean Kelly wrote:Andrei Alexandrescu Wrote:I guess that's "fit pivot" :o). Fat pivot does cost some more. AndreiIn fact, guards can be put to ensure that the _expected_ (not average, not best-case) complexity is O(n log n). This makes the risk of hitting a worst-case scenario negligible in a principled manner. http://en.wikipedia.org/wiki/Quicksort#Randomized_quicksort_expected_complexity http://docs.google.com/viewer?a=v&q=cache:MdBVR26N5UsJ:www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf+randomized+quicksort&hl=en&gl=us&pid=bl&srcid=ADGEESi3GTSxfHWkeb_f14H0pkbigduS94qJVc9XLQ7aPa6lPUJ5JZbggI0izFe3ogiVOJCYcVkGtdumaS9hBvrGw0-TA_yZQj2qd1-AEudKyEWEGXnO4sTwqCZL95OpFkdFHDF2WXFV&sig=AHIEtbT1R0q5RIR4rob17QUKlYVl90vXyQ Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).It would be nice if it used insertion sort for small ranges as well. There's also a quicksort variant that partitions equal-to elements separately from less-than and greater-than values, which is faster for ranges with a lot of equal elements (and no slower for ranges without).
Oct 09 2010
"Jérôme M. Berger" wrote:Steven Schveighoffer wrote:For average case big O and dwarfing heap/merge sort in the constant factor. But in fact its not totally true that everyone uses quicksort pur sang: most sort implementations deal with the worst case of quicksort, by switching to heapsort for example. It is exactly because of this behavior.On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com> wrote:If big O complexity is so important, then why does everyone use quicksort (which is O(n**2)) and not heap sort or merge sort (which are O(n*log(n)))? JeromeOn 10/7/2010 13:57, Andrei Alexandrescu wrote:Then you don't understand how important it is.On 10/7/10 14:40 CDT, bearophile wrote:Why? I can't say I've ever cared about the big-O complexity of an operation.Another solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what's the problem in this.That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.
Oct 08 2010
On 10/7/10 15:23 CDT, Rainer Deyke wrote:On 10/7/2010 13:57, Andrei Alexandrescu wrote:Complexity composes very badly. Issues tend to manifest at moderate sizes and may make things unbearable at large sizes. I'm really grateful I'm at a workplace where the exclamation "Damn! I was waiting like an idiot for this quadratic append!" is met with understanding nods from workmates who've made the same mistake before. As an example, only last week I was working on cramming a sort of an index of the entire Wikipedia on one machine. I was waiting for the indexer which ran slower and slower and slower. In the end I figured there was only _one_ quadratic operation - appending to a vector<size_t> that held document lengths. That wasn't even the bulk of the data and it was the last thing I looked at! Yet it made the run time impossible to endure.On 10/7/10 14:40 CDT, bearophile wrote:Why? I can't say I've ever cared about the big-O complexity of an operation.Another solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what's the problem in this.That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.All I care about is that it's "fast enough", which is highly context-dependent and may have nothing to do with complexity. I can't see myself replacing my 'int[]' arrays with the much slower and bigger 'int[MAX_SIZE]' arrays just to satisfy the compiler. I shouldn't have to. The type system shouldn't encourage me to. I think it's an abuse of the type system to use it to guarantee performance. However, if I wanted the type system to provide performance guarantees, I would need a lot more language support than a convention that certain operations are supposed to be O(n). I'm talking performance specification on *all* functions, with a compile-time error if the compiler can't prove that the compiled function meets those guarantees. And *even then*, I would like to be able to use an O(n) implementation of 'in' where I know that O(n) performance is acceptable.std.container introduces the convention that O(n) methods start with "linear". Andrei
Oct 07 2010
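Today's std.container does follow that convention; for instance, assuming the current std.container.Array API:

    import std.algorithm : equal, find;
    import std.container : Array;

    void main()
    {
        auto a = Array!int(1, 2, 3, 4);
        // The "linear" prefix warns at the call site that this removal
        // may shift O(n) elements.
        a.linearRemove(a[].find(3));  // removes 3 and everything after it
        assert(equal(a[], [1, 2]));
    }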
Andrei Alexandrescu wrote:
Complexity composes very badly. Issues tend to manifest at moderate sizes and may make things unbearable at large sizes. I'm really grateful I'm at a workplace where the exclamation "Damn! I was waiting like an idiot for this quadratic append!" is met with understanding nods from workmates who've made the same mistake before. As an example, only last week I was working on cramming a sort of an index of the entire Wikipedia on one machine. I was waiting for the indexer which ran slower and slower and slower. In the end I figured there was only _one_ quadratic operation - appending to a vector<size_t> that held document lengths. That wasn't even the bulk of the data and it was the last thing I looked at! Yet it made the run time impossible to endure.

But that is not a matter of library interface, is it? It's a matter of algorithm/container choice. It's not the push_back that was slow in the end, it was std::vector (yes, that's arguable, but the point is that the container defines rules for its methods, not vice-versa).

I find such convention useful indeed, though it brings a fact to the surface: if we need to emphasize methods that make strong promises about complexity with prefixes/suffixes, or, say, by putting them into a separate module, then why don't we have a non-emphasized counterpart that won't make strong promises but would fit a wider range of containers? After all, std.algorithm offers different search mechanisms with varying complexity (e.g. find() vs boyerMooreFinder()).
I think it's an abuse of the type system to use it to guarantee performance. However, if I wanted the type system to provide performance guarantees, I would need a lot more language support than a convention that certain operations are supposed to be O(n). I'm talking performance specification on *all* functions, with a compile-time error if the compiler can't prove that the compiled function meets those guarantees. And *even then*, I would like to be able to use an O(n) implementation of 'in' where I know that O(n) performance is acceptable.
std.container introduces the convention that O(n) methods start with "linear".
Oct 07 2010
Stanislav Blinov wrote:... make strong promises but would fit wider range of containers? After all, std.algorithm offers ...Sorry, that should be 'cases', not 'containers' there.
Oct 07 2010
On Thursday, October 07, 2010 14:44:50 Stanislav Blinov wrote:
Andrei Alexandrescu wrote:
Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation since it could happen on any container that it's given. Using operator in in an algorithm could be perfectly reasonable if it had O(1) or O(log n) complexity but be completely unreasonable if it had O(n) complexity. So, the algorithm then is reasonable with some containers and not others, whereas if in were restricted to O(1) or O(log n), it could be used by the algorithm without worrying that a container would be used with it which would make it an order of complexity greater, or worse. If it were strictly a matter of writing code which directly used a particular algorithm and container type, then you could know what the complexities of the algorithm and operations on the container are, but once you're dealing with generic code, operations need to have complexity guarantees regardless of their container, or the complexity of the algorithm will vary with the container type. And if that algorithm is used in yet another algorithm, then it gets that much worse. It can quickly become easy to bury the place where what you're trying to do becomes an order of complexity greater, because the container that you selected was an order of complexity greater than other containers on an operation that an algorithm is using, buried in code somewhere. - Jonathan M Davis
Complexity composes very badly. Issues tend to manifest at moderate sizes and may make things unbearable at large sizes. I'm really grateful I'm at a workplace where the exclamation "Damn! I was waiting like an idiot for this quadratic append!" is met with understanding nods from workmates who've made the same mistake before. As an example, only last week I was working on cramming a sort of an index of the entire Wikipedia on one machine. I was waiting for the indexer which ran slower and slower and slower. In the end I figured there was only _one_ quadratic operation - appending to a vector<size_t> that held document lengths. That wasn't even the bulk of the data and it was the last thing I looked at! Yet it made the run time impossible to endure.
But that is not a matter of library interface, is it? It's a matter of algorithm/container choice. It's not the push_back that was slow in the end, it was std::vector (yes, that's arguable, but the point is that the container defines rules for its methods, not vice-versa).
Oct 07 2010
On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg gmx.com> wrote:
Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation since it could happen on any container that it's given.
Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.
Oct 07 2010
On 10/08/2010 03:18 AM, Juanjo Alvarez wrote:
On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg gmx.com> wrote:
What do you suggest for fast lookup in a container?
Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation since it could happen on any container that it's given.
Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.
Oct 08 2010
Pelle Wrote:
On 10/08/2010 03:18 AM, Juanjo Alvarez wrote:
What is being used now? How can having "in" and not using it (just like now) in functions requiring guaranteed complexity be worse than not having it? The only drawback I can see to having an "in" operator with all containers is that some programmers would not read the documentation and use it expecting it to be fast. But then that also happens with many other language constructs, and some programmers will write crappy algorithms anyway.
On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg gmx.com> wrote:
What do you suggest for fast lookup in a container?
Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation since it could happen on any container that it's given.
Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.
Oct 08 2010
On 10/08/2010 01:45 PM, Juanjo Alvarez wrote:
Pelle Wrote:
If I write a generic algorithm, I can use opIn and assume it is fast. If I don't need the speed, I can use canFind over the container's range instead. If we say opIn can be slow, the fast version goes away.
On 10/08/2010 03:18 AM, Juanjo Alvarez wrote:
What is being used now? How can having "in" and not using it (just like now) in functions requiring guaranteed complexity be worse than not having it?
On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg gmx.com> wrote:
What do you suggest for fast lookup in a container?
Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation since it could happen on any container that it's given.
Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.
The only drawback I can see to having an "in" operator with all containers is that some programmers would not read the documentation and use it expecting it to be fast. But then that also happens with many other language constructs, and some programmers will write crappy algorithms anyway.
Oct 08 2010
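Both spellings exist today, which keeps the cost visible at the call site; a tiny illustration:

    import std.algorithm : canFind;

    void main()
    {
        int[string] table = ["red": 1, "green": 2];
        assert("red" in table);   // 'in': expected O(1) AA lookup

        auto arr = [10, 20, 30];
        assert(arr.canFind(20));  // explicit, visibly O(n) search
    }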
On Fri, 08 Oct 2010 14:45:43 +0300, Juanjo Alvarez <juanjux gmail.com> wrote:
Pelle Wrote:
The only drawback I can see to having an "in" operator with all containers is that some programmers would not read the documentation and use it expecting it to be fast. But then that also happens with many other language constructs, and some programmers will write crappy algorithms anyway.
This is pretty much it. I would never use a language which is designed for "some programmers".
Oct 09 2010
On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake fakeemail.com> wrote:
On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg gmx.com> wrote:
That kind of "documentation" is useless: it doesn't prevent use, and it doesn't feel right to the person who accidentally uses it. When I call sort(x); and it performs horribly, am I going to blame x or sort? Certainly, I'll never think it's my own fault :) -Steve
Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation since it could happen on any container that it's given.
Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.
Oct 08 2010
Steven Schveighoffer Wrote:
On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake fakeemail.com> wrote:
True! And that's the only drawback I see on generalizing "in", but there are many things in programming languages that don't feel right when you don't know the language well. That doesn't mean that D should be the "programming for dummies on rails with a constant automated tutor included" language; if I read the site correctly, it is meant to be a practical language with the ability to shoot yourself in the foot. Still, I don't understand how generalizing "in" could affect std.algorithm et al if they only use "in" for AAs, just like now.
That kind of "documentation" is useless: it doesn't prevent use, and it doesn't feel right to the person who accidentally uses it. When I call sort(x); and it performs horribly, am I going to blame x or sort? Certainly, I'll never think it's my own fault :) -Steve
Oct 08 2010
Juanjo Alvarez <juanjux gmail.com> wrote:
True! And that's the only drawback I see on generalizing "in", but there are many things in programming languages that don't feel right when you don't know the language well. That doesn't mean that D should be the "programming for dummies on rails with a constant automated tutor included" language; if I read the site correctly, it is meant to be a practical language with the ability to shoot yourself in the foot.
Absolutely. And one of its trademarks is being as fast as C. Now, C clearly does not have the 'in' operator, but it is a goal in D that the obvious way to do something should be fast and correct.
Still, I don't understand how generalizing "in" could affect std.algorithm et al if they only use "in" for AAs, just like now.
Because if 'in' is available for other uses for other containers, one would be tempted to use it also there. The alternative is to put it in the coding standards:
43. Thou shalt not use magic numbers.
44. Thou shalt not use 'in', as it may be slow as heck.
45. Thou shalt not write spaghetti code. Nor macaroni.
This would make the feature useless. -- Simen
Oct 08 2010
On Fri, 08 Oct 2010 09:46:54 -0400, Juanjo Alvarez <juanjux gmail.com> wrote:
Steven Schveighoffer Wrote:
Let's move off of in for a minute, so I can illustrate what I mean. in is kind of a non-universal operator (other languages define it differently). Let's try indexing. If you see the following code:

    for(int i = 0; i < x.length; i++) x[i]++;

you might think it's O(n). But let's say the author of x didn't care about complexity, and just felt like [] should be for indexing, no matter what the cost. Then x could possibly be a linked list, each index operation is O(n), and this block of code becomes O(n^2), and your performance suffers. You may not notice it, it might be "acceptable", but then somewhere down the road you start calling this function more, and your program all of a sudden gets really slow. What gives? Let's say you spend 1-2 hours looking for this and find out that the problem is that indexing x is linear. You can change the loop to this:

    foreach(ref i; x) i++;

and all of a sudden your performance comes back. Maybe the author of x puts right in his docs that indexing is an O(n) operation. You might grumble about it, and move on. But if this happens all the time, you are just going to start blaming the author of x more than your own incompetence. It's one of those things where user interface designers get it, and most engineers don't -- people have certain flaws, and a big one is not reading the manual. Making the interface as intuitive as possible is very important for the success of a product. But what if we made the language such that this *couldn't possibly happen* because you don't allow indexing on linked lists? This has two very good properties:

1. It makes the user more aware of the limitations, even though the syntax is harder to use (hm.. it doesn't let me use indexing, there must be a good reason).
2. It makes *even experienced users* avoid this bug because they can't possibly compile it. Experienced developers make the above mistake too (look at Walter's mistake on the compiler that I mentioned earlier). We don't want to set minefields for experienced developers if we can help it. Yes they can shoot themselves in the foot, but we don't want to remove the safety on the gun so they are *more likely* to shoot themselves in the foot.

-Steve
Oct 08 2010
Juanjo Alvarez wrote:
Steven Schveighoffer Wrote:
On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake fakeemail.com> wrote:
True! And that's the only drawback I see on generalizing "in", but there are many things in programming languages that don't feel right when you don't know the language well. That doesn't mean that D should be the "programming for dummies on rails with a constant automated tutor included" language; if I read the site correctly, it is meant to be a practical language with the ability to shoot yourself in the foot. Still, I don't understand how generalizing "in" could affect std.algorithm et al if they only use "in" for AAs, just like now.
On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg gmx.com> wrote:
That kind of "documentation" is useless: it doesn't prevent use, and it doesn't feel right to the person who accidentally uses it. When I call sort(x); and it performs horribly, am I going to blame x or sort? Certainly, I'll never think it's my own fault :) -Steve
Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation since it could happen on any container that it's given.
Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.

Suppose I would like to use a faster AA (for some cases) and define opIn with good lookup behavior. Then the algorithms in phobos would think my opIn is crap, what now? All code using regular AA's could have just been replaced by my super-duper user-defined AA if it were not for this generalizing of opIn. So we need another operation that replaces the previously fast opIn, and make phobos use that instead. But why bother if we have a perfectly good one to begin with?
Oct 08 2010
On Fri, 08 Oct 2010 14:58:27 +0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:
On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake fakeemail.com> wrote:
Sure, write some random strings and compile it; if it doesn't compile, you can always blame Walter, right? If documentation is useless, so are most of the programmers, you got to accept it :) The question is, should this affect compiler design? If you think it should, you can't write a single line that calls "some other guy"'s code; it doesn't matter if he uses "in" or "out", operators or just simple functions. Thanks!
Oct 09 2010
Jonathan M Davis wrote:
On Thursday, October 07, 2010 14:44:50 Stanislav Blinov wrote:
Yet still, generality ends at some point. You can't devise every possible algorithm for every possible type and have it have set-in-stone complexity independently of type. Take std.range.popFrontN(). It's generic, and it's used in other algorithms. Yet it has O(1) complexity for ranges that support slicing, and O(n) for those that do not. But you don't take into account the complexity of the slicing operation, or the complexity of stepping through the range. Or std.algorithm.find(). It's basically O(n), but then again, when using it, one should also consider the complexity of the predicate used. Just the same happened in the case Andrei described: the algorithm was O(n) judging from the description (performing n insertions into a container). But the container itself blew it up into quadratic time because of its own insertion algorithm.

What I mean is you'll always have algorithms that will perform differently for different containers, and you'll always have to choose containers that best fit your needs. Generic code is not only about efficiency: you'd at least expect it to work for different types. Replacing vector with list in Andrei's case would probably solve the problem at the cost of losing random access (together with contiguous storage). Which means it'd also require changing code if random access was in use somewhere. Having a generic random access function at(Container,index) (offering complexity that varies with the container used: e.g. O(1) for arrays, O(n) for lists) would save you maintenance.
But that is not a matter of library interface, is it? It's a matter of algorithm/container choice. It's not the push_back that was slow in the end, it was std::vector (yes, that's arguable, but the point is that the container defines rules for its methods, not vice-versa).
Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation since it could happen on any container that it's given. Using operator in in an algorithm could be perfectly reasonable if it had O(1) or O(log n) complexity but be completely unreasonable if it had O(n) complexity. So, the algorithm then is reasonable with some containers and not others, whereas if in were restricted to O(1) or O(log n), it could be used by the algorithm without worrying that a container would be used with it which would make it an order of complexity greater, or worse. If it were strictly a matter of writing code which directly used a particular algorithm and container type, then you could know what the complexities of the algorithm and operations on the container are, but once you're dealing with generic code, operations need to have complexity guarantees regardless of their container, or the complexity of the algorithm will vary with the container type. And if that algorithm is used in yet another algorithm, then it gets that much worse. It can quickly become easy to bury the place where what you're trying to do becomes an order of complexity greater, because the container that you selected was an order of complexity greater than other containers on an operation that an algorithm is using, buried in code somewhere. - Jonathan M Davis
Oct 08 2010
On Friday 08 October 2010 03:06:01 Stanislav Blinov wrote:
Yet still, generality ends at some point. You can't devise every possible algorithm for every possible type and have it have set-in-stone complexity independently of type. Take std.range.popFrontN(). It's generic, and it's used in other algorithms. Yet it has O(1) complexity for ranges that support slicing, and O(n) for those that do not. But you don't take into account the complexity of the slicing operation, or the complexity of stepping through the range. Or std.algorithm.find(). It's basically O(n), but then again, when using it, one should also consider the complexity of the predicate used. Just the same happened in the case Andrei described: the algorithm was O(n) judging from the description (performing n insertions into a container). But the container itself blew it up into quadratic time because of its own insertion algorithm. What I mean is you'll always have algorithms that will perform differently for different containers, and you'll always have to choose containers that best fit your needs. Generic code is not only about efficiency: you'd at least expect it to work for different types. Replacing vector with list in Andrei's case would probably solve the problem at the cost of losing random access (together with contiguous storage). Which means it'd also require changing code if random access was in use somewhere. Having a generic random access function at(Container,index) (offering complexity that varies with the container used: e.g. O(1) for arrays, O(n) for lists) would save you maintenance.

All true. However, the point is that operations need to have a known complexity. If in is known to be O(1), the algorithms can safely use it, knowing that it's O(1). Any container that can't do it in O(1) doesn't implement in. You could, on the other hand, have in be O(n) (which doesn't stop containers from implementing it as O(1), since that's better than O(n)), and then any algorithm that uses it has to assume that it could be O(n) and therefore may need to avoid it. The real question is what complexity we want to define it as. At the moment, the only containers to use it are the built-in associative arrays, and for them, it's O(1). Since find() is already going to be O(n), it seems to me that in should be better than O(n) - be it O(1) or O(log n) - and Andrei appears to agree, but others don't, hence this long discussion... - Jonathan M Davis
Oct 08 2010
Jonathan M Davis wrote:
On Friday 08 October 2010 03:06:01 Stanislav Blinov wrote:
[...] What I mean is you'll always have algorithms that will perform differently for different containers, and you'll always have to choose containers that best fit your needs [...]
All true. However, the point is that operations need to have a known complexity. If in is known to be O(1), the algorithms can safely use it, knowing that it's O(1). Any container that can't do it in O(1) doesn't implement in. You could, on the other hand, have in be O(n) (which doesn't stop containers from implementing it as O(1), since that's better than O(n)), and then any algorithm that uses it has to assume that it could be O(n) and therefore may need to avoid it. The real question is what complexity we want to define it as. At the moment, the only containers to use it are the built-in associative arrays, and for them, it's O(1). Since find() is already going to be O(n), it seems to me that in should be better than O(n) - be it O(1) or O(log n) - and Andrei appears to agree, but others don't, hence this long discussion...

Ahh, I think I get the perspective now, though I had to reread the whole thread two times. Thank you.
Oct 08 2010
2010/10/7 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
In the end I figured there was only _one_ quadratic operation - appending to a vector<size_t> that held document lengths. That wasn't even the bulk of the data and it was the last thing I looked at! Yet it made the run time impossible to endure.

From sgi's stl vector:

    void push_back(const _Tp& __x) {
      if (_M_finish != _M_end_of_storage) {
        construct(_M_finish, __x);
        ++_M_finish;
      }
      else
        _M_insert_aux(end(), __x);
    }

How is this quadratic, let alone linear?
Oct 08 2010
On 10/8/10 5:24 CDT, Torarin wrote:
2010/10/7 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
My code wasn't using push_back. Andrei
In the end I figured there was only _one_ quadratic operation - appending to a vector<size_t> that held document lengths. That wasn't even the bulk of the data and it was the last thing I looked at! Yet it made the run time impossible to endure.
From sgi's stl vector:

    void push_back(const _Tp& __x) {
      if (_M_finish != _M_end_of_storage) {
        construct(_M_finish, __x);
        ++_M_finish;
      }
      else
        _M_insert_aux(end(), __x);
    }

How is this quadratic, let alone linear?
Oct 08 2010
bearophile wrote:
Another solution is to support the "in" operator for dynamic arrays too, and define a new attribute, like complexity(), plus an enum that allows one to specify the worst-case complexity. So associative arrays are annotated with complexity(O.constant), while the function that searches for items/substrings in arrays/strings is complexity(O.linear). At compile time the generic code is then able to query the computational complexity of the "in" operator implementation. The problem is that the compiler today can't enforce that functions annotated with complexity(O.constant) actually don't perform a linear search (but I don't think it's a problem; today, where the range protocol asks an operation to be no worse than O(n ln n), the compiler doesn't enforce it either).

Good idea. It would make a nice use case for user-defined attributes in D3. Making the language aware of complexity specifically doesn't buy much; all you need is:

    __traits(getAttribute, opIn, complexity).bigOh == O.constant

-- Tomek
Oct 07 2010
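Modern D's user-defined attributes (which the language did not yet have when this was posted) can express the shape of this. Every name below -- O, complexity, bigOh -- is hypothetical, and std.traits.getUDAs stands in for the imagined getAttribute trait:

    import std.traits : getUDAs;

    enum O { constant, logarithmic, linear }
    struct complexity { O bigOh; }

    @complexity(O.linear)
    bool isIn(T)(T needle, T[] haystack)
    {
        foreach (e; haystack)
            if (e == needle) return true;
        return false;
    }

    // Generic code can reject slow implementations at compile time:
    static assert(getUDAs!(isIn!int, complexity)[0].bigOh == O.linear);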
Tomek Sowiński <just ask.me> wrote:
__traits(getAttribute, opIn, complexity).bigOh == O.constant

How does this test for things like N log M?

    __traits(getAttribute, opIn, complexity).bigOh == tuple( O.linear, O.logarithmic ) ?

-- Simen
Oct 07 2010
Simen kjaeraas wrote:
Tomek Sowiński <just ask.me> wrote:
Or: __traits(getAttribute, opIn, complexity).bigOh == O.linear * O.log

bigOh could be e.g. a struct with an overloaded multiplier. But you brought up something interesting -- how to bind N, M with different properties of the function arguments; big-oh expressions can get quite complex, e.g.

    void completeSort(alias less = "a < b", SwapStrategy ss = SwapStrategy.unstable, Range1, Range2)(SortedRange!(Range1,less) lhs, Range2 rhs);

"[...] Performs Ο(lhs.length + rhs.length * log(rhs.length)) (best case) to Ο((lhs.length + rhs.length) * log(lhs.length + rhs.length)) (worst-case) evaluations of swap."

Even if the attribute properties could see the arguments, how to deal with things like lhs.length + rhs.length? It has to be inspectable at compile-time. One idea is to store the expression's abstract syntax tree (we want AST macros in D3 anyway)... But I got a feeling we're heading for an overkill :)

-- Tomek
Oct 07 2010
Tomek Sowiński <just ask.me> wrote:
> Even if the attribute properties could see the arguments, how to deal
> with things like lhs.length + rhs.length? It has to be inspectable at
> compile-time. One idea is to store the expression's abstract syntax tree
> (we want AST macros in D3 anyway)... But I got a feeling we're heading
> for an overkill :)

Just as soon as we solve the halting problem, eh?

--
Simen
Oct 07 2010
Simen kjaeraas wrote:
> Tomek Sowiński <just ask.me> wrote:
>> Even if the attribute properties could see the arguments, how to deal
>> with things like lhs.length + rhs.length? It has to be inspectable at
>> compile-time. One idea is to store the expression's abstract syntax
>> tree (we want AST macros in D3 anyway)... But I got a feeling we're
>> heading for an overkill :)
>
> Just as soon as we solve the halting problem, eh?

What's the halting problem?

-- Tomek
Oct 07 2010
On Thursday, October 07, 2010 16:39:15 Tomek Sowiński wrote:
> Simen kjaeraas wrote:
>> Just as soon as we solve the halting problem, eh?
>
> What's the halting problem?

http://en.wikipedia.org/wiki/Halting_problem

It's a classic problem in computer science. Essentially, what it comes
down to is that you can't determine when - or even if - a program will
halt until it actually has. It's why stuff like file transfer dialogs can
never be totally accurate. At best, you can estimate how long a file
transfer will take based on the current progress, but you can't _know_
when it will complete. It's even worse for algorithms where you can't even
estimate how much work is left. And of course, you don't even necessarily
know that the program _will_ halt. It could be an infinite loop for all
you know.

- Jonathan M Davis
Oct 07 2010
On 08/10/2010 00:42, Jonathan M Davis wrote:
> It's a classic problem in computer science. Essentially, what it comes
> down to is that you can't determine when - or even if - a program will
> halt until it actually has. It's why stuff like file transfer dialogs
> can never be totally accurate. At best, you can estimate how long a
> file transfer will take based on the current progress, but you can't
> _know_ when it will complete.

O_o

-- Bruno Medeiros - Software Engineer
Oct 29 2010
Tomek Sowiński <just ask.me> wrote:
> Simen kjaeraas wrote:
>> Just as soon as we solve the halting problem, eh?
>
> What's the halting problem?

http://en.wikipedia.org/wiki/Halting_problem

Basically, inspecting the AST in search of the complexity might have an
infinite (or at least, arbitrarily high) complexity itself. It is likely
possible in some situations, but in the general case, not so much.

Also, consider the while loop. It may have an arbitrarily complex
termination condition, making it hard or impossible to find the
complexity. Example, with added complexity by omitting the source of foo:

    import std.functional : toDelegate;

    extern (C) bool foo( ); // defined elsewhere; the analyzer can't see it

    void bar( bool delegate( ) dg )
    {
        while ( true )
        {
            if ( dg( ) )
                break;
        }
    }

    void main()
    {
        bar( toDelegate( &foo ) ); // does this ever return? Depends on foo.
    }

--
Simen
Oct 07 2010
On 08/10/2010 00:39, Tomek Sowiński wrote:
<snip>
> What's the halting problem?

Basically, it's the challenge of determining algorithmically whether an
arbitrary algorithm given arbitrary input will eventually halt or carry on
running forever.

The point is that the halting problem is known to be unsolvable. The
standard proof of this is as follows. Suppose the halt analyser algorithm
we seek exists. Call it WillHalt(Algorithm, Input). Then we can consider
WillHalt(Algorithm, Algorithm). Then we can define a new algorithm,
LoopIfHaltsOnItself(Algorithm), defined as

    if WillHalt(Algorithm, Algorithm) then loop forever else return

Now try to analyse the outcome of LoopIfHaltsOnItself(LoopIfHaltsOnItself).

Personally, I think it's a shame that the halting problem can't be solved.
If it could, we could use it to solve many mathematical problems that
have, as it happens, remained unsolved for centuries.

Stewart.
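The same contradiction can be phrased directly in D; a minimal sketch that
folds the WillHalt(Algorithm, Algorithm) self-application into a single
call, where willHalt is the hypothetical oracle (it can never be given a
body, which is the whole point):

    // Hypothetical oracle: true iff running `algorithm` would eventually
    // halt. Assume it exists, for the sake of contradiction.
    bool willHalt(void function() algorithm);

    void loopIfHaltsOnItself()
    {
        if (willHalt(&loopIfHaltsOnItself))
            while (true) {} // predicted to halt, so loop forever
        // predicted to loop forever, so return immediately
    }

    // willHalt(&loopIfHaltsOnItself) can return neither true nor false
    // without contradicting its own prediction, so willHalt cannot exist.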
Oct 09 2010
On 9/10/10 3:11 PM, Stewart Gordon wrote:
> Personally, I think it's a shame that the halting problem can't be
> solved. If it could, we could use it to solve many mathematical
> problems that have, as it happens, remained unsolved for centuries.

But solving those problems would mean nothing in that hypothetical
situation, because, for the halting problem to be solvable, it would
require that P <=> ~P, so any "theorem" would be meaningless. Besides, I
don't care to think about universes where P <=> ~P :-)
Oct 09 2010
On Sat, Oct 9, 2010 at 9:11 AM, Stewart Gordon <smjg_1998 yahoo.com> wrote:
> On 08/10/2010 00:39, Tomek Sowiński wrote:
> <snip>
>> What's the halting problem?
<snip>

Or more poetically,

No general procedure for bug checks succeeds.
Now, I won't just assert that, I'll show where it leads:
I will prove that although you might work till you drop,
you cannot tell if computation will stop.

For imagine we have a procedure called P
that for specified input permits you to see
whether specified source code, with all of its faults,
defines a routine that eventually halts.

You feed in your program, with suitable data,
and P gets to work, and a little while later
(in finite compute time) correctly infers
whether infinite looping behavior occurs.

If there will be no looping, then P prints out `Good.'
That means work on this input will halt, as it should.
But if it detects an unstoppable loop,
then P reports `Bad!' -- which means you're in the soup.

Well, the truth is that P cannot possibly be,
because if you wrote it and gave it to me,
I could use it to set up a logical bind
that would shatter your reason and scramble your mind.

Here's the trick that I'll use -- and it's simple to do.
I'll define a procedure, which I will call Q,
that will use P's predictions of halting success
to stir up a terrible logical mess.

For a specified program, say A, one supplies,
the first step of this program called Q I devise
is to find out from P what's the right thing to say
of the looping behavior of A run on A.

If P's answer is `Bad!', Q will suddenly stop.
But otherwise, Q will go back to the top,
and start off again, looping endlessly back,
till the universe dies and turns frozen and black.

And this program called Q wouldn't stay on the shelf;
I would ask it to forecast its run on itself.
When it reads its own source code, just what will it do?
What's the looping behavior of Q run on Q?

If P warns of infinite loops, Q will quit;
yet P is supposed to speak truly of it!
And if Q's going to quit, then P should say `Good' --
which makes Q start to loop! (P denied that it would.)

No matter how P might perform, Q will scoop it:
Q uses P's output to make P look stupid.
Whatever P says, it cannot predict Q:
P is right when it's wrong, and is false when it's true!

I've created a paradox, neat as can be --
and simply by using your putative P.
When you posited P you stepped into a snare;
Your assumption has led you right into my lair.

So where can this argument possibly go?
I don't have to tell you; I'm sure you must know.
By reductio, there cannot possibly be
a procedure that acts like the mythical P.

You can never find general mechanical means
for predicting the acts of computing machines.
It's something that cannot be done. So we users
must find our own bugs. Our computers are losers!

- Geoffrey K. Pullum
Oct 09 2010
Tomek Sowiński wrote:
> Even if the attribute properties could see the arguments, how to deal
> with things like lhs.length + rhs.length? It has to be inspectable at
> compile-time. One idea is to store the expression's abstract syntax
> tree (we want AST macros in D3 anyway)... But I got a feeling we're
> heading for an overkill :)

If O.linear was a template, it could take an alias...

    alias completeSort f;
    __traits(getAttribute, f, complexity).bigOh ==
        (O.linear!(f.args.lhs.length) + O.linear!(f.args.rhs.length))
            * O.log!(O.linear!(f.args.lhs.length) + O.linear!(f.args.rhs.length))

:)

-- Tomek
Oct 07 2010
2010/10/8 Tomek Sowiński <just ask.me>:
> bearophile wrote:
>> Another solution is to support the "in" operator for dynamic arrays
>> too, and define a new attribute, like complexity(), plus an Enum that
>> allows to specify the worst case complexity.
<snip>
>
> Good idea. It would make a nice use case for user-defined attributes in
> D3. Making the language aware of complexity specifically doesn't buy
> much; all you need is:
>
> __traits(getAttribute, opIn, complexity).bigOh == O.constant
>
> -- Tomek

Doesn't the language have to be aware of this if it's supposed to work
with ordinary arrays?
Oct 07 2010
Daniel Gibson wrote:
> 2010/10/8 Tomek Sowiński <just ask.me>:
>> Making the language aware of complexity specifically doesn't buy much;
>> all you need is:
>>
>> __traits(getAttribute, opIn, complexity).bigOh == O.constant
>
> Doesn't the language have to be aware of this if it's supposed to work
> with ordinary arrays?

I don't think so; there can always be uniform-syntax wrappers (there's
already a bunch of them in std.array):

    complexity(O.constant) size_t length(T)(T[] arr) {
        return arr.length;
    }

or special cases similar to hasLength!string.

-- Tomek
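Phobos already takes this trait-style route for length; a minimal sketch
of generic code that leans on the real std.range hasLength (quickLength
itself is a made-up name):

    import std.algorithm : filter;
    import std.range : hasLength;

    // Accepts only ranges that advertise a cheap .length, in the same
    // spirit as the proposed complexity annotation:
    size_t quickLength(R)(R r) if (hasLength!R)
    {
        return r.length;
    }

    unittest
    {
        assert(quickLength([1, 2, 3]) == 3); // dynamic arrays qualify
        // A filtered range has no O(1) length, so it is rejected at
        // compile time:
        static assert(!__traits(compiles,
            quickLength([1, 2, 3].filter!(a => a > 0))));
    }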
Oct 07 2010
Tomek S.:
> But I got a feeling we're heading for an overkill :)

Yes, that's overkill, and it's not a good solution. But sometimes it's
useful to discuss even bad solutions to problems, because they may lead
to different and more acceptable solutions. During brainstorming you need
to lower the censoring threshold to increase creativity (it's the second
rule: http://en.wikipedia.org/wiki/Brainstorming#Ground_Rules ).

Bye,
bearophile
Oct 08 2010