www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Algorithms, term rewriting and compile time reflection

reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
Generic programming without high level analytical capabilities 
are problematic since you cannot optimize everything on a low 
level.

For instance, some algorithms run better if the input is sorted, 
and although you can cover one property with typing (e.g. sorted 
range) the combinatorial explosion is too big and the code 
becomes very messy.

The better approach is to let the compiler deduce properties of 
the input/output based on stronger assertions than 
postconditions. And it should work for FFI as well as for D. That 
way D can obtain meta-information about C functions.

Let's call it input-output-assumptions and take a look at:

y = f(x)

After calling f on x, f could provide propositions about y such 
as:

- y is null iff x is null
- length(y) == length(x) if x is not null
- y is sorted for all x if x is not null
- y contains the same elements as x
- y is alias free (copy by value)

It should also state whether the semantic side effects has been 
completely described.

Then you need statments for f that says something about how 
functions affects the properties of input like. E.g.:

z = g( f(x) )

For g we could say that it preserves:
- element order
- element values
- z is an alias free copy of the input

Thus we cannot assume length(z) == length(x), but we can assume 
that it is sorted and that z is a subset of x. That means the 
compiler can optimize expressions. E.g.:

y = f(x);
z = g(y);

inline_sort(z);  ==>  ;

x = union(z,y); ==> ;

canFind(y,v) || canFind(z,v) ==> canFind(y,v);

linear_find(x,v)  =>  binary_search(y,v)

With term rewriting and compile time reflection we can now 
perform optimizations by querying properties of expressions, 
looking up dependent results of an expression, and doing a search 
over various ways of ordering expressions.

f(sort(g(x))) can be optimized into f(g(x)) if the knowledge of 
the effects of sort() is complete and if the total effects of f,g 
and x completely covers what sort() do.

These kinds of optimizations are very difficult to achieve in a 
low level backend, but you really need them in order to do 
generic programming properly.

A simple start would be to not provide term rewriting as a 
language feature, but rather define a vocabulary that is useful 
for phobos and hardcode term rewriting for that vocabulary.

I think this is feasible.
Oct 22 2014
parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Wednesday, 22 October 2014 at 11:10:35 UTC, Ola Fosheim 
Grøstad wrote:
 [snip]

 These kinds of optimizations are very difficult to achieve in a 
 low level backend, but you really need them in order to do 
 generic programming properly.

 A simple start would be to not provide term rewriting as a 
 language feature, but rather define a vocabulary that is useful 
 for phobos and hardcode term rewriting for that vocabulary.

 I think this is feasible.
Term rewriting is very interesting, and I believe some work has been done for this in Haskell. I don't believe anything has been done with this propositions and inference approach you describe. I see a number of problems: First, annotating this, in the presence of templates, is very hard. Consider: auto f(alias g, T)(T x) { return g(x); } We cannot possibly annotate this function with any of propositions you described because we know nothing about g or T. Like purity and nothrow, we'd have to deduce these properties, but most escape deduction in all but the most trivial cases. Suppose we could deduce a large subset of useful propositions, how does the programmer know what has been deduced? How can I tell what has been deduced and applied without having to disassemble the code to see what's actually going on? And even if everything is deduced correctly, and I know what's deduced, what if it does a transformation that's undesirable? For example, you changed linear_search to binary_search. What if I knew the element was likely to be at the front and would prefer linear_search to binary_search? If you have any, I'd love to see some papers on this kind of work.
Oct 22 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Wednesday, 22 October 2014 at 21:34:51 UTC, Peter Alexander 
wrote:
 Term rewriting is very interesting, and I believe some work has 
 been done for this in Haskell. I don't believe anything has 
 been done with this propositions and inference approach you 
 describe.
I would always assume it has been done plenty of work related to proving properties about a program since this is the holy grail of CS! I think it is related to so-called dependent types? http://en.wikipedia.org/wiki/Dependent_type I'd like to take a look at Xanadu some time… http://www.cs.bu.edu/~hwxi/Xanadu/Xanadu.html And Agda which also guarantee termination! http://en.wikipedia.org/wiki/Agda_(programming_language)
 I see a number of problems:

 First, annotating this, in the presence of templates, is very 
 hard. Consider:

 auto f(alias g, T)(T x) { return g(x); }

 We cannot possibly annotate this function with any of 
 propositions you described because we know nothing about g or T.
Why not? You know the moment you instantiate. If you had pattern matching you could break the signature into multiple distinguishing ones, but the equivalent is to put conditionals into the postcondition.
 Like purity and nothrow, we'd have to deduce these properties, 
 but most escape deduction in all but the most trivial cases.
I think you can solve this with logic/constraints programming and simple dataflow analysis. // a is notnull and aliasfree b = sort(a) // b is notnull, aliasfree, sorted_ascending(somekey) // assume f is sortpreserving, pure c = f(b) // b is notnull, aliasfree, sorted_ascending(somekey) // assume reverse is sortpreserving, pure d = reverse(b) // d is notnull, aliasfree, sorted_descending(somekey) e = unknownfunction(b) // e is typeinfobased
 Suppose we could deduce a large subset of useful propositions, 
 how does the programmer know what has been deduced?
He doesn't have to. The point is to express common patterns in the most terse and readable way. Developers analyze this and come up with the appropriate heuristics. Juste like with peephole optimization?
 How can I tell what has been deduced and applied without having 
 to disassemble the code to see what's actually going on?
If you are that performance oriented then you will not use templates at all! Generic programming is not suitable for performance… Because performance comes with changing the data to fit the hardware, e.g. SIMD instruction set, caches etc
 And even if everything is deduced correctly, and I know what's 
 deduced, what if it does a transformation that's undesirable? 
 For example, you changed linear_search to binary_search. What 
 if I knew the element was likely to be at the front and would 
 prefer linear_search to binary_search?
That's a good point, but then you should be able to guide the search by adding an optimization hint that states that the sought element probably is at the front, or that you constrain the search to a linear_search. But that does not have to be part of the semantics of the program. So you can keep correctness and optimization separate. IMO it is always an advantage to keep the code that has to do with correctness short, simple and readable.
Oct 22 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim 
Grøstad wrote:
 I think it is related to so-called dependent types?
 http://en.wikipedia.org/wiki/Dependent_type
And: http://en.wikipedia.org/wiki/Refinement_(computing)#Refinement_types And perhaps: http://en.wikipedia.org/wiki/Liskov_substitution_principle
Oct 22 2014
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On a related note, semantic slicing is an interesting concept:

http://en.wikipedia.org/wiki/FermaT_Transformation_System
Oct 22 2014
prev sibling parent reply "thedeemon" <dlang thedeemon.com> writes:
On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim 
Grøstad wrote:

 I  would always assume it has been done plenty of work related 
 to proving properties about a program since this is the holy 
 grail of CS!

 I think it is related to so-called dependent types?
 http://en.wikipedia.org/wiki/Dependent_type
Yes, dependent types allow expressing properties like the ones you describe. However a) it's not easy at all even for simple data structures, often requiring defining many additional types and lemmas, b) checking them requires turning your compiler into a proof-checker, c) what works in "clean room" (like high-level total functional language) is hardly feasible in a "dirty" language like D where you can go as unsafe as you want.
 I'd like to take a look at Xanadu some time…
 http://www.cs.bu.edu/~hwxi/Xanadu/Xanadu.html
That's obsolete, I guess, better take a look at ATS language from the same author. It's really really close to what you're thinking of here, minus the rewriting.
 And Agda which also guarantee termination!
 http://en.wikipedia.org/wiki/Agda_(programming_language)
Yep, all dependently typed languages like Agda, ATS or Idris require the functions you use in types to be terminating (otherwise your proofs become unsound and worthless) and they all include termination checking. To scare you well, here, for example, is my Smoothsort implementation in ATS http://stuff.thedeemon.com/lj/smooth_dats.html that includes proofs that the array really gets sorted and the Leonardo heaps used in the process have proper form and properties. Writing it took me a few weeks. You don't want to turn D into this mess. ;)
Oct 23 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Thursday, 23 October 2014 at 09:41:04 UTC, thedeemon wrote:
 Yes, dependent types allow expressing properties like the ones 
 you describe. However
 a) it's not easy at all even for simple data structures, often 
 requiring defining many additional types and lemmas,
Which is why I only suggest language-builtin-propositions on the library level.
 b) checking them requires turning your compiler into a 
 proof-checker,
I don't propose checking, as in an assert(). I propose to assume() them and then run regular dataflow over the assumptions that have been vetted by the library author. Pretty close to what Walter wanted in the "assert as assume" thread, but safer.
 c) what works in "clean room" (like high-level total functional 
 language) is hardly feasible in a "dirty" language like D where 
 you can go as unsafe as you want.
That is true, which is why you have to do this at the high level, and assume the worst.
 To scare you well, here, for example, is my Smoothsort 
 implementation in ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html
Thanks! I'll look at it later, but note that I am not proposing something like coq, ats or agda. No theorem prover. What I am proposing is that libraries can provide assumed facts about the result, then propagate those facts in the dataflow until they have all become invalid.
 properties. Writing it took me a few weeks. You don't want to 
 turn D into this mess. ;)
No, I want to turn it into a more pragmatic mess. ;-)
Oct 23 2014
parent reply "Low Functioning" <nathan.judge gmail.com> writes:
How about a function returns a T', which is implicitly 
convertible to T, where T' has some enum "tags" attached to it.
Oct 23 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning 
wrote:
 How about a function returns a T', which is implicitly 
 convertible to T, where T' has some enum "tags" attached to it.
Why is implicit conversion a problem? To the compiler it would just be another function call?
Oct 23 2014
parent "Low Functioning" <nathan.judge gmail.com> writes:
On Thursday, 23 October 2014 at 15:18:09 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning 
 wrote:
 How about a function returns a T', which is implicitly 
 convertible to T, where T' has some enum "tags" attached to it.
Why is implicit conversion a problem? To the compiler it would just be another function call?
Not everything is generic, for one reason or another. struct notimplicit(T) { T _x; enum fubared; } struct foo(T) { T _x; alias _x this; enum fubared; } unittest { notimplicit!int a; //int _a = a; //error foo!int b; int _b = b; } While it wouldn't matter for a fully generic pipeline, and you'd lose the fubared tag if you turned it back to the base type, it might be handy to propagate the fubared type while remaining compatible with the base.
Oct 23 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/23/14 2:41 AM, thedeemon wrote:
 To scare you well, here, for example, is my Smoothsort implementation in
 ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html
 that includes proofs that the array really gets sorted and the Leonardo
 heaps used in the process have proper form and properties. Writing it
 took me a few weeks. You don't want to turn D into this mess. ;)
I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted. Andrei
Oct 27 2014
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu
wrote:
 On 10/23/14 2:41 AM, thedeemon wrote:
 To scare you well, here, for example, is my Smoothsort 
 implementation in
 ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html
 that includes proofs that the array really gets sorted and the 
 Leonardo
 heaps used in the process have proper form and properties. 
 Writing it
 took me a few weeks. You don't want to turn D into this mess. 
 ;)
I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted. Andrei
Sound like madness :) I'd love to see it.
Oct 27 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/27/14 6:53 PM, deadalnix wrote:
 On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu
 wrote:
 On 10/23/14 2:41 AM, thedeemon wrote:
 To scare you well, here, for example, is my Smoothsort implementation in
 ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html
 that includes proofs that the array really gets sorted and the Leonardo
 heaps used in the process have proper form and properties. Writing it
 took me a few weeks. You don't want to turn D into this mess. ;)
I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted. Andrei
Sound like madness :) I'd love to see it.
Looked again for it, couldn't find it. Got this other one instead: http://lara.epfl.ch/~psuter/articles/BlancETAL13OverviewLeonVerificationSystem.pdf Andrei
Oct 28 2014
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 I recall there was an earlier implementation of a 
 statically-checked sort, maybe in Agda? It wouldn't typecheck 
 if the output array weren't sorted.
Yes, there is a similar code even in ATS language (that is much simpler than Agda, you can't verify a generic proof as in Agda), this is a verified QuickSort-like on lists (I don't remember an equivalent verified QuickSort on arrays in ATS): http://dpaste.dzfl.pl/e60eeb30e3b6 Bye, bearophile
Oct 28 2014
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
 http://dpaste.dzfl.pl/e60eeb30e3b6
That code is in ATS1. Now there is ATS2 that has a better syntax, and is a bit more powerful (and can compile even to JavaScript). On the ATS site all the ATS1 examples apparently have being removed. Bye, bearophile
Oct 28 2014
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/28/14 2:26 AM, bearophile wrote:
 Andrei Alexandrescu:

 I recall there was an earlier implementation of a statically-checked
 sort, maybe in Agda? It wouldn't typecheck if the output array weren't
 sorted.
Yes, there is a similar code even in ATS language (that is much simpler than Agda, you can't verify a generic proof as in Agda), this is a verified QuickSort-like on lists (I don't remember an equivalent verified QuickSort on arrays in ATS): http://dpaste.dzfl.pl/e60eeb30e3b6
Looks like the one I saw years ago: a proof that you don't want that kind of stuff :o). -- Andrei
Oct 28 2014
parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 Looks like the one I saw years ago: a proof that you don't want 
 that kind of stuff :o). -- Andrei
Silly Andrei :-) Bye, bearophile
Oct 28 2014
prev sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 28 October 2014 at 09:26:35 UTC, bearophile wrote:
 Yes, there is a similar code even in ATS language (that is much 
 simpler than Agda, you can't verify a generic proof as in 
 Agda), this is a verified QuickSort-like on lists (I don't 
 remember an equivalent verified QuickSort on arrays in ATS):
 http://dpaste.dzfl.pl/e60eeb30e3b6
Quick sort and insertion sort is relatively easy to prove correct using induction: 1. prove termination 2. prove output is permutation of input 3. prove that the processed units are sorted (left part in insertion sort) But this is not what I am suggesting in this thread. This is not meant for application level code, but for libraries and compiler. So rather than proving property 2 and 3 it should just be provided as facts from the library, then the compiler will propagate that knowledge through the call graph until it has been deemed uncertain. For instance if you have asserted that a value exist in an array, then that knowledge will hold for all permutations of that array further down the call graph.
Oct 29 2014
prev sibling parent "thedeemon" <dlang thedeemon.com> writes:
On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu 
wrote:
 On 10/23/14 2:41 AM, thedeemon wrote:
 To scare you well, here, for example, is my Smoothsort 
 implementation in
 ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html
 that includes proofs that the array really gets sorted and the 
 Leonardo
 heaps used in the process have proper form and properties. 
 Writing it
 took me a few weeks. You don't want to turn D into this mess. 
 ;)
I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted. Andrei
Here's one in Agda: http://twanvl.nl/blog/agda/sorting
Oct 28 2014