digitalmars.D - Algorithms, term rewriting and compile time reflection
- Ola Fosheim Grøstad (52/52) Oct 22 2014 Generic programming without high level analytical capabilities
- Peter Alexander (23/31) Oct 22 2014 Term rewriting is very interesting, and I believe some work has
- Ola Fosheim Grøstad (44/65) Oct 22 2014 I would always assume it has been done plenty of work related to
- Ola Fosheim Grøstad (6/8) Oct 22 2014 And:
- Ola Fosheim Grøstad (2/2) Oct 22 2014 On a related note, semantic slicing is an interesting concept:
- thedeemon (25/34) Oct 23 2014 Yes, dependent types allow expressing properties like the ones
- Ola Fosheim Grøstad (16/30) Oct 23 2014 Which is why I only suggest language-builtin-propositions on the
- Low Functioning (2/2) Oct 23 2014 How about a function returns a T', which is implicitly
- Ola Fosheim Grøstad (4/6) Oct 23 2014 Why is implicit conversion a problem? To the compiler it would
- Low Functioning (22/28) Oct 23 2014 Not everything is generic, for one reason or another.
- Andrei Alexandrescu (5/11) Oct 27 2014 I recall there was an earlier implementation of a statically-checked
- deadalnix (3/18) Oct 27 2014 Sound like madness :) I'd love to see it.
- Andrei Alexandrescu (4/20) Oct 28 2014 Looked again for it, couldn't find it. Got this other one instead:
- bearophile (8/11) Oct 28 2014 Yes, there is a similar code even in ATS language (that is much
- bearophile (6/7) Oct 28 2014 That code is in ATS1. Now there is ATS2 that has a better syntax,
- Andrei Alexandrescu (3/12) Oct 28 2014 Looks like the one I saw years ago: a proof that you don't want that
- bearophile (4/6) Oct 28 2014 Silly Andrei :-)
- Ola Fosheim Grøstad (16/21) Oct 29 2014 Quick sort and insertion sort is relatively easy to prove correct
- thedeemon (4/19) Oct 28 2014 Here's one in Agda:
Generic programming without high level analytical capabilities is problematic, since you cannot optimize everything on a low level. For instance, some algorithms run better if the input is sorted, and although you can cover one property with typing (e.g. a sorted range), the combinatorial explosion is too big and the code becomes very messy.

The better approach is to let the compiler deduce properties of the input/output based on stronger assertions than postconditions. And it should work for FFI as well as for D. That way D can obtain meta-information about C functions.

Let's call it input-output-assumptions and take a look at:

y = f(x)

After calling f on x, f could provide propositions about y such as:

- y is null iff x is null
- length(y) == length(x) if x is not null
- y is sorted for all x if x is not null
- y contains the same elements as x
- y is alias free (copy by value)

It should also state whether the semantic side effects have been completely described.

Then you need statements for f that say something about how functions affect the properties of their input. E.g.:

z = g( f(x) )

For g we could say that it preserves:

- element order
- element values
- z is an alias free copy of the input

Thus we cannot assume length(z) == length(x), but we can assume that it is sorted and that z is a subset of x.

That means the compiler can optimize expressions. E.g.:

y = f(x);
z = g(y);
inline_sort(z); ==> ;
x = union(z,y); ==> ;
canFind(y,v) || canFind(z,v) ==> canFind(y,v);
linear_find(x,v) => binary_search(y,v)

With term rewriting and compile time reflection we can now perform optimizations by querying properties of expressions, looking up dependent results of an expression, and doing a search over various ways of ordering expressions. f(sort(g(x))) can be optimized into f(g(x)) if the knowledge of the effects of sort() is complete and if the total effects of f, g and x completely cover what sort() does.
These kinds of optimizations are very difficult to achieve in a low level backend, but you really need them in order to do generic programming properly. A simple start would be to not provide term rewriting as a language feature, but rather define a vocabulary that is useful for phobos and hardcode term rewriting for that vocabulary. I think this is feasible.
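As a rough illustration of the hardcoded-vocabulary idea, two of the rewrites above can be emulated in library code today, because Phobos already records the "is sorted" fact in the type SortedRange. This is only a sketch: smartFind, ensureSorted and isKnownSorted are hypothetical names invented for this example, and the dispatch is done by hand with static if rather than by any compiler-level term rewriting.

```d
import std.algorithm.searching : canFind;
import std.algorithm.sorting : sort;
import std.range : SortedRange;
import std.traits : isInstanceOf;

/// True when the type itself records that the range is sorted.
/// (Hypothetical helper; only SortedRange is recognized.)
enum isKnownSorted(R) = isInstanceOf!(SortedRange, R);

/// Hypothetical: emulates linear_find(x, v) => binary_search(x, v).
bool smartFind(R, T)(R haystack, T needle)
{
    static if (isKnownSorted!R)
        return haystack.contains(needle); // binary search on SortedRange
    else
        return haystack.canFind(needle);  // plain linear scan
}

/// Hypothetical: emulates sort(x) => x when x is already known sorted.
auto ensureSorted(R)(R input)
{
    static if (isKnownSorted!R)
        return input;        // the fact already holds, nothing to do
    else
        return sort(input);  // sorts in place, returns a SortedRange
}

unittest
{
    int[] raw = [5, 1, 4, 2, 3];
    assert(smartFind(raw, 4));        // linear path

    auto s = ensureSorted(raw.dup);   // sorts once
    auto again = ensureSorted(s);     // no second sort is performed
    static assert(is(typeof(again) == typeof(s)));
    assert(smartFind(s, 4) && !smartFind(s, 9));
}
```

This covers exactly one property through typing, which is the approach the post argues does not scale to many properties; it is shown here only to make the intended rewrites concrete.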
Oct 22 2014
On Wednesday, 22 October 2014 at 11:10:35 UTC, Ola Fosheim Grøstad wrote:
> [snip]
> These kinds of optimizations are very difficult to achieve in a low level backend, but you really need them in order to do generic programming properly. A simple start would be to not provide term rewriting as a language feature, but rather define a vocabulary that is useful for phobos and hardcode term rewriting for that vocabulary. I think this is feasible.

Term rewriting is very interesting, and I believe some work has been done for this in Haskell. I don't believe anything has been done with the propositions and inference approach you describe.

I see a number of problems:

First, annotating this, in the presence of templates, is very hard. Consider:

auto f(alias g, T)(T x) { return g(x); }

We cannot possibly annotate this function with any of the propositions you described because we know nothing about g or T. Like purity and nothrow, we'd have to deduce these properties, but most escape deduction in all but the most trivial cases.

Suppose we could deduce a large subset of useful propositions, how does the programmer know what has been deduced? How can I tell what has been deduced and applied without having to disassemble the code to see what's actually going on?

And even if everything is deduced correctly, and I know what's deduced, what if it does a transformation that's undesirable? For example, you changed linear_search to binary_search. What if I knew the element was likely to be at the front and would prefer linear_search to binary_search?

If you have any, I'd love to see some papers on this kind of work.
Oct 22 2014
On Wednesday, 22 October 2014 at 21:34:51 UTC, Peter Alexander wrote:
> Term rewriting is very interesting, and I believe some work has been done for this in Haskell. I don't believe anything has been done with the propositions and inference approach you describe.

I would always assume that plenty of work has been done on proving properties about a program, since this is the holy grail of CS! I think it is related to so-called dependent types?

http://en.wikipedia.org/wiki/Dependent_type

I'd like to take a look at Xanadu some time…

http://www.cs.bu.edu/~hwxi/Xanadu/Xanadu.html

And Agda, which also guarantees termination!

http://en.wikipedia.org/wiki/Agda_(programming_language)

> I see a number of problems: First, annotating this, in the presence of templates, is very hard. Consider: auto f(alias g, T)(T x) { return g(x); } We cannot possibly annotate this function with any of the propositions you described because we know nothing about g or T.

Why not? You know the moment you instantiate. If you had pattern matching you could break the signature into multiple distinguishing ones, but the equivalent is to put conditionals into the postcondition.

> Like purity and nothrow, we'd have to deduce these properties, but most escape deduction in all but the most trivial cases.

I think you can solve this with logic/constraint programming and simple dataflow analysis.

// a is notnull and aliasfree
b = sort(a)
// b is notnull, aliasfree, sorted_ascending(somekey)
// assume f is sortpreserving, pure
c = f(b)
// b is notnull, aliasfree, sorted_ascending(somekey)
// assume reverse is sortpreserving, pure
d = reverse(b)
// d is notnull, aliasfree, sorted_descending(somekey)
e = unknownfunction(b)
// e is typeinfobased

> Suppose we could deduce a large subset of useful propositions, how does the programmer know what has been deduced?

He doesn't have to. The point is to express common patterns in the most terse and readable way. Developers analyze this and come up with the appropriate heuristics. Just like with peephole optimization?

> How can I tell what has been deduced and applied without having to disassemble the code to see what's actually going on?

If you are that performance oriented then you will not use templates at all! Generic programming is not suitable for performance… Because performance comes with changing the data to fit the hardware, e.g. SIMD instruction set, caches etc.

> And even if everything is deduced correctly, and I know what's deduced, what if it does a transformation that's undesirable? For example, you changed linear_search to binary_search. What if I knew the element was likely to be at the front and would prefer linear_search to binary_search?

That's a good point, but then you should be able to guide the search by adding an optimization hint that states that the sought element probably is at the front, or that you constrain the search to a linear_search. But that does not have to be part of the semantics of the program. So you can keep correctness and optimization separate. IMO it is always an advantage to keep the code that has to do with correctness short, simple and readable.
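To make "you know the moment you instantiate" a bit more concrete, here is one way it might be approximated with user-defined attributes in D. Everything fact-related below is hypothetical: PreservesOrder, PreservesLength, keepsOrder and keepsLength are invented for this sketch, the attributes are trusted rather than checked, and the compiler does nothing with them on its own.

```d
import std.traits : hasUDA;

// Hypothetical fact markers; not part of Phobos or the language.
struct PreservesOrder {}
struct PreservesLength {}

@PreservesOrder @PreservesLength
int[] addOne(int[] xs)
{
    auto result = xs.dup;
    foreach (ref x; result) x += 1;
    return result;
}

@PreservesOrder
int[] dropNegatives(int[] xs)
{
    int[] result;
    foreach (x; xs) if (x >= 0) result ~= x;
    return result;
}

// The wrapper from the post: nothing is known about g when f is
// declared, but every instantiation sees a concrete g.
auto f(alias g, T)(T x) { return g(x); }

// Facts that an instantiation f!g could inherit from g.
enum keepsOrder(alias g)  = hasUDA!(g, PreservesOrder);
enum keepsLength(alias g) = hasUDA!(g, PreservesLength);

static assert(keepsOrder!addOne && keepsLength!addOne);
static assert(keepsOrder!dropNegatives && !keepsLength!dropNegatives);

unittest
{
    assert(f!addOne([1, 2, 3]) == [2, 3, 4]);
    assert(f!dropNegatives([-1, 2, -3, 4]) == [2, 4]);
}
```

The static asserts show that the facts are available at compile time for each concrete g, which is the part that cannot be written down for f in the abstract.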
Oct 22 2014
On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim Grøstad wrote:
> I think it is related to so-called dependent types?
> http://en.wikipedia.org/wiki/Dependent_type

And:

http://en.wikipedia.org/wiki/Refinement_(computing)#Refinement_types

And perhaps:

http://en.wikipedia.org/wiki/Liskov_substitution_principle
Oct 22 2014
On a related note, semantic slicing is an interesting concept: http://en.wikipedia.org/wiki/FermaT_Transformation_System
Oct 22 2014
On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim Grøstad wrote:
> I would always assume that plenty of work has been done on proving properties about a program, since this is the holy grail of CS! I think it is related to so-called dependent types? http://en.wikipedia.org/wiki/Dependent_type

Yes, dependent types allow expressing properties like the ones you describe. However a) it's not easy at all even for simple data structures, often requiring defining many additional types and lemmas, b) checking them requires turning your compiler into a proof-checker, c) what works in a "clean room" (like a high-level total functional language) is hardly feasible in a "dirty" language like D where you can go as unsafe as you want.

> I'd like to take a look at Xanadu some time… http://www.cs.bu.edu/~hwxi/Xanadu/Xanadu.html

That's obsolete, I guess, better take a look at the ATS language from the same author. It's really really close to what you're thinking of here, minus the rewriting.

> And Agda, which also guarantees termination! http://en.wikipedia.org/wiki/Agda_(programming_language)

Yep, all dependently typed languages like Agda, ATS or Idris require the functions you use in types to be terminating (otherwise your proofs become unsound and worthless) and they all include termination checking.

To scare you well, here, for example, is my Smoothsort implementation in ATS http://stuff.thedeemon.com/lj/smooth_dats.html that includes proofs that the array really gets sorted and the Leonardo heaps used in the process have proper form and properties. Writing it took me a few weeks. You don't want to turn D into this mess. ;)
Oct 23 2014
On Thursday, 23 October 2014 at 09:41:04 UTC, thedeemon wrote:
> Yes, dependent types allow expressing properties like the ones you describe. However a) it's not easy at all even for simple data structures, often requiring defining many additional types and lemmas,

Which is why I only suggest language-builtin-propositions on the library level.

> b) checking them requires turning your compiler into a proof-checker,

I don't propose checking, as in an assert(). I propose to assume() them and then run regular dataflow over the assumptions that have been vetted by the library author. Pretty close to what Walter wanted in the "assert as assume" thread, but safer.

> c) what works in a "clean room" (like a high-level total functional language) is hardly feasible in a "dirty" language like D where you can go as unsafe as you want.

That is true, which is why you have to do this at the high level, and assume the worst.

> To scare you well, here, for example, is my Smoothsort implementation in ATS http://stuff.thedeemon.com/lj/smooth_dats.html

Thanks! I'll look at it later, but note that I am not proposing something like Coq, ATS or Agda. No theorem prover. What I am proposing is that libraries can provide assumed facts about the result, then propagate those facts in the dataflow until they have all become invalid.

> properties. Writing it took me a few weeks. You don't want to turn D into this mess. ;)

No, I want to turn it into a more pragmatic mess. ;-)
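A minimal sketch of "assume facts, propagate them, drop them when they become invalid", done by hand in the type system rather than by a compiler dataflow pass. The fact vocabulary (ascending, descending, aliasFree) and the names Tagged, sortedCopy, reversedCopy and unknown are all made up for this example, and nothing is proven: each function simply asserts facts about its result, the way a library author would vouch for them.

```d
import std.algorithm.mutation : reverse;
import std.algorithm.sorting : sort;

// Hypothetical fact vocabulary, encoded as bit flags.
enum uint ascending  = 1;
enum uint descending = 2;
enum uint aliasFree  = 4;

/// An int[] together with the facts currently assumed about it.
struct Tagged(uint facts)
{
    int[] data;
    enum bool knows(uint fact) = (facts & fact) == fact;
}

/// Vetted by the (hypothetical) library author:
/// the result is an ascending, alias free copy.
auto sortedCopy(uint facts)(Tagged!facts input)
{
    auto copy = input.data.dup;
    sort(copy);
    return Tagged!(ascending | aliasFree)(copy);
}

/// Vetted: reversing a copy swaps ascending and descending.
auto reversedCopy(uint facts)(Tagged!facts input)
{
    enum uint swapped = aliasFree
        | ((facts & ascending)  != 0 ? descending : 0)
        | ((facts & descending) != 0 ? ascending  : 0);
    auto copy = input.data.dup;
    reverse(copy);
    return Tagged!swapped(copy);
}

/// An unknown function: assume the worst, all facts become invalid.
auto unknown(uint facts)(Tagged!facts input)
{
    return Tagged!0(input.data);
}

unittest
{
    auto a = Tagged!aliasFree([3, 1, 2]);
    auto b = sortedCopy(a);
    auto d = reversedCopy(b);
    auto e = unknown(d);

    static assert(typeof(b).knows!ascending);
    static assert(typeof(d).knows!descending && !typeof(d).knows!ascending);
    static assert(!typeof(e).knows!descending && !typeof(e).knows!aliasFree);
    assert(b.data == [1, 2, 3] && d.data == [3, 2, 1]);
}
```

Encoding facts in the type is only a stand-in for the proposed dataflow analysis; it has the combinatorial drawbacks mentioned at the start of the thread, but it shows the propagate-until-invalid behaviour.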
Oct 23 2014
How about a function that returns a T', which is implicitly convertible to T, where T' has some enum "tags" attached to it.
Oct 23 2014
On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning wrote:
> How about a function that returns a T', which is implicitly convertible to T, where T' has some enum "tags" attached to it.

Why is implicit conversion a problem? To the compiler it would just be another function call?
Oct 23 2014
On Thursday, 23 October 2014 at 15:18:09 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning wrote:
>> How about a function that returns a T', which is implicitly convertible to T, where T' has some enum "tags" attached to it.
> Why is implicit conversion a problem? To the compiler it would just be another function call?

Not everything is generic, for one reason or another.

struct notimplicit(T) {
    T _x;
    enum fubared;
}

struct foo(T) {
    T _x;
    alias _x this;
    enum fubared;
}

unittest {
    notimplicit!int a;
    //int _a = a; //error
    foo!int b;
    int _b = b;
}

While it wouldn't matter for a fully generic pipeline, and you'd lose the fubared tag if you turned it back to the base type, it might be handy to propagate the fubared type while remaining compatible with the base.
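A small sketch of how such a tag might be consumed, assuming the tag means "sorted". The names SortedTag, sortedTag, isTaggedSorted, has and first are hypothetical and chosen only for this example; the point is that generic code can test for the tag at compile time, while non-generic code still accepts the value through alias this.

```d
import std.range : assumeSorted;

/// Hypothetical tagged wrapper, same shape as foo(T) above.
struct SortedTag(T)
{
    T[] _x;
    alias _x this;          // still usable wherever a T[] is expected
    enum sortedTag = true;  // the tag generic code can look for
}

/// True when the type carries the (hypothetical) tag.
enum isTaggedSorted(R) = is(typeof(R.sortedTag));

/// Generic consumer: uses the tag when it is present.
bool has(R)(R haystack, int needle)
{
    static if (isTaggedSorted!R)
        return assumeSorted(haystack._x).contains(needle); // binary search
    else
    {
        foreach (x; haystack)
            if (x == needle) return true;                  // linear scan
        return false;
    }
}

/// Non-generic consumer: knows nothing about tags.
int first(int[] xs) { return xs[0]; }

unittest
{
    auto t = SortedTag!int([1, 3, 5]);
    static assert(isTaggedSorted!(typeof(t)) && !isTaggedSorted!(int[]));

    assert(has(t, 3) && !has(t, 4)); // tag is visible here
    assert(first(t) == 1);           // implicit conversion via alias this

    int[] plain = t;                 // the tag is lost from here on
    assert(has(plain, 5));
}
```

As noted in the post, the tag only survives as long as the wrapper type does; once the value decays to the base type, the fact is gone.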
Oct 23 2014
On 10/23/14 2:41 AM, thedeemon wrote:
> To scare you well, here, for example, is my Smoothsort implementation in ATS http://stuff.thedeemon.com/lj/smooth_dats.html that includes proofs that the array really gets sorted and the Leonardo heaps used in the process have proper form and properties. Writing it took me a few weeks. You don't want to turn D into this mess. ;)

I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted.

Andrei
Oct 27 2014
On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu wrote:
> On 10/23/14 2:41 AM, thedeemon wrote:
>> To scare you well, here, for example, is my Smoothsort implementation in ATS http://stuff.thedeemon.com/lj/smooth_dats.html that includes proofs that the array really gets sorted and the Leonardo heaps used in the process have proper form and properties. Writing it took me a few weeks. You don't want to turn D into this mess. ;)
> I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted. Andrei

Sounds like madness :) I'd love to see it.
Oct 27 2014
On 10/27/14 6:53 PM, deadalnix wrote:
> On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu wrote:
>> On 10/23/14 2:41 AM, thedeemon wrote:
>>> To scare you well, here, for example, is my Smoothsort implementation in ATS http://stuff.thedeemon.com/lj/smooth_dats.html that includes proofs that the array really gets sorted and the Leonardo heaps used in the process have proper form and properties. Writing it took me a few weeks. You don't want to turn D into this mess. ;)
>> I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted. Andrei
> Sounds like madness :) I'd love to see it.

Looked again for it, couldn't find it. Got this other one instead:

http://lara.epfl.ch/~psuter/articles/BlancETAL13OverviewLeonVerificationSystem.pdf

Andrei
Oct 28 2014
Andrei Alexandrescu:
> I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted.

Yes, there is similar code even in the ATS language (that is much simpler than Agda, you can't verify a generic proof as in Agda), this is a verified QuickSort-like on lists (I don't remember an equivalent verified QuickSort on arrays in ATS):

http://dpaste.dzfl.pl/e60eeb30e3b6

Bye,
bearophile
Oct 28 2014
> http://dpaste.dzfl.pl/e60eeb30e3b6

That code is in ATS1. Now there is ATS2 that has a better syntax, and is a bit more powerful (and can compile even to JavaScript). On the ATS site all the ATS1 examples apparently have been removed.

Bye,
bearophile
Oct 28 2014
On 10/28/14 2:26 AM, bearophile wrote:
> Andrei Alexandrescu:
>> I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted.
> Yes, there is similar code even in the ATS language (that is much simpler than Agda, you can't verify a generic proof as in Agda), this is a verified QuickSort-like on lists (I don't remember an equivalent verified QuickSort on arrays in ATS): http://dpaste.dzfl.pl/e60eeb30e3b6

Looks like the one I saw years ago: a proof that you don't want that kind of stuff :o). -- Andrei
Oct 28 2014
Andrei Alexandrescu:
> Looks like the one I saw years ago: a proof that you don't want that kind of stuff :o). -- Andrei

Silly Andrei :-)

Bye,
bearophile
Oct 28 2014
On Tuesday, 28 October 2014 at 09:26:35 UTC, bearophile wrote:
> Yes, there is similar code even in the ATS language (that is much simpler than Agda, you can't verify a generic proof as in Agda), this is a verified QuickSort-like on lists (I don't remember an equivalent verified QuickSort on arrays in ATS): http://dpaste.dzfl.pl/e60eeb30e3b6

Quick sort and insertion sort are relatively easy to prove correct using induction:

1. prove termination
2. prove output is permutation of input
3. prove that the processed units are sorted (left part in insertion sort)

But this is not what I am suggesting in this thread. This is not meant for application level code, but for libraries and the compiler. So rather than proving properties 2 and 3, they should just be provided as facts from the library, then the compiler will propagate that knowledge through the call graph until it has been deemed uncertain.

For instance, if you have asserted that a value exists in an array, then that knowledge will hold for all permutations of that array further down the call graph.
Oct 29 2014
On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu wrote:
> On 10/23/14 2:41 AM, thedeemon wrote:
>> To scare you well, here, for example, is my Smoothsort implementation in ATS http://stuff.thedeemon.com/lj/smooth_dats.html that includes proofs that the array really gets sorted and the Leonardo heaps used in the process have proper form and properties. Writing it took me a few weeks. You don't want to turn D into this mess. ;)
> I recall there was an earlier implementation of a statically-checked sort, maybe in Agda? It wouldn't typecheck if the output array weren't sorted. Andrei

Here's one in Agda:

http://twanvl.nl/blog/agda/sorting
Oct 28 2014