digitalmars.D - Algorithms, term rewriting and compile time reflection

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (52/52) Oct 22 2014 Generic programming without high level analytical capabilities

Peter Alexander (23/31) Oct 22 2014 Term rewriting is very interesting, and I believe some work has

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (44/65) Oct 22 2014 I would always assume it has been done plenty of work related to

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (6/8) Oct 22 2014 And:

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (2/2) Oct 22 2014 On a related note, semantic slicing is an interesting concept:

thedeemon (25/34) Oct 23 2014 Yes, dependent types allow expressing properties like the ones

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (16/30) Oct 23 2014 Which is why I only suggest language-builtin-propositions on the

Low Functioning (2/2) Oct 23 2014 How about a function returns a T', which is implicitly

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (4/6) Oct 23 2014 Why is implicit conversion a problem? To the compiler it would

Low Functioning (22/28) Oct 23 2014 Not everything is generic, for one reason or another.

Andrei Alexandrescu (5/11) Oct 27 2014 I recall there was an earlier implementation of a statically-checked

deadalnix (3/18) Oct 27 2014 Sound like madness :) I'd love to see it.

Andrei Alexandrescu (4/20) Oct 28 2014 Looked again for it, couldn't find it. Got this other one instead:

bearophile (8/11) Oct 28 2014 Yes, there is a similar code even in ATS language (that is much

bearophile (6/7) Oct 28 2014 That code is in ATS1. Now there is ATS2 that has a better syntax,
Andrei Alexandrescu (3/12) Oct 28 2014 Looks like the one I saw years ago: a proof that you don't want that

bearophile (4/6) Oct 28 2014 Silly Andrei :-)

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (16/21) Oct 29 2014 Quick sort and insertion sort is relatively easy to prove correct

thedeemon (4/19) Oct 28 2014 Here's one in Agda:

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

Generic programming without high level analytical capabilities 
are problematic since you cannot optimize everything on a low 
level.

For instance, some algorithms run better if the input is sorted, 
and although you can cover one property with typing (e.g. sorted 
range) the combinatorial explosion is too big and the code 
becomes very messy.

The better approach is to let the compiler deduce properties of 
the input/output based on stronger assertions than 
postconditions. And it should work for FFI as well as for D. That 
way D can obtain meta-information about C functions.

Let's call it input-output-assumptions and take a look at:

y = f(x)

After calling f on x, f could provide propositions about y such 
as:

- y is null iff x is null
- length(y) == length(x) if x is not null
- y is sorted for all x if x is not null
- y contains the same elements as x
- y is alias free (copy by value)

It should also state whether the semantic side effects has been 
completely described.

Then you need statments for f that says something about how 
functions affects the properties of input like. E.g.:

z = g( f(x) )

For g we could say that it preserves:
- element order
- element values
- z is an alias free copy of the input

Thus we cannot assume length(z) == length(x), but we can assume 
that it is sorted and that z is a subset of x. That means the 
compiler can optimize expressions. E.g.:

y = f(x);
z = g(y);

inline_sort(z);  ==>  ;

x = union(z,y); ==> ;

canFind(y,v) || canFind(z,v) ==> canFind(y,v);

linear_find(x,v)  =>  binary_search(y,v)

With term rewriting and compile time reflection we can now 
perform optimizations by querying properties of expressions, 
looking up dependent results of an expression, and doing a search 
over various ways of ordering expressions.

f(sort(g(x))) can be optimized into f(g(x)) if the knowledge of 
the effects of sort() is complete and if the total effects of f,g 
and x completely covers what sort() do.

These kinds of optimizations are very difficult to achieve in a 
low level backend, but you really need them in order to do 
generic programming properly.

A simple start would be to not provide term rewriting as a 
language feature, but rather define a vocabulary that is useful 
for phobos and hardcode term rewriting for that vocabulary.

I think this is feasible.

Oct 22 2014

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Wednesday, 22 October 2014 at 11:10:35 UTC, Ola Fosheim 
Grøstad wrote:
 [snip]

 These kinds of optimizations are very difficult to achieve in a 
 low level backend, but you really need them in order to do 
 generic programming properly.

 A simple start would be to not provide term rewriting as a 
 language feature, but rather define a vocabulary that is useful 
 for phobos and hardcode term rewriting for that vocabulary.

 I think this is feasible.

Term rewriting is very interesting, and I believe some work has 
been done for this in Haskell. I don't believe anything has been 
done with this propositions and inference approach you describe. 
I see a number of problems:

First, annotating this, in the presence of templates, is very 
hard. Consider:

auto f(alias g, T)(T x) { return g(x); }

We cannot possibly annotate this function with any of 
propositions you described because we know nothing about g or T. 
Like purity and nothrow, we'd have to deduce these properties, 
but most escape deduction in all but the most trivial cases.

Suppose we could deduce a large subset of useful propositions, 
how does the programmer know what has been deduced? How can I 
tell what has been deduced and applied without having to 
disassemble the code to see what's actually going on?

And even if everything is deduced correctly, and I know what's 
deduced, what if it does a transformation that's undesirable? For 
example, you changed linear_search to binary_search. What if I 
knew the element was likely to be at the front and would prefer 
linear_search to binary_search?

If you have any, I'd love to see some papers on this kind of work.

Oct 22 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Wednesday, 22 October 2014 at 21:34:51 UTC, Peter Alexander 
wrote:
 Term rewriting is very interesting, and I believe some work has 
 been done for this in Haskell. I don't believe anything has 
 been done with this propositions and inference approach you 
 describe.

I  would always assume it has been done plenty of work related to 
proving properties about a program since this is the holy grail 
of CS!

I think it is related to so-called dependent types?
http://en.wikipedia.org/wiki/Dependent_type

I'd like to take a look at Xanadu some time…
http://www.cs.bu.edu/~hwxi/Xanadu/Xanadu.html

And Agda which also guarantee termination!
http://en.wikipedia.org/wiki/Agda_(programming_language)


 I see a number of problems:

 First, annotating this, in the presence of templates, is very 
 hard. Consider:

 auto f(alias g, T)(T x) { return g(x); }

 We cannot possibly annotate this function with any of 
 propositions you described because we know nothing about g or T.

Why not? You know the moment you instantiate.

If you had pattern matching you could break the signature into 
multiple distinguishing ones, but the equivalent is to put 
conditionals into the postcondition.

 Like purity and nothrow, we'd have to deduce these properties, 
 but most escape deduction in all but the most trivial cases.

I think you can solve this with logic/constraints programming and 
simple dataflow analysis.

// a is  notnull and  aliasfree
b = sort(a)
// b is  notnull,  aliasfree,  sorted_ascending(somekey)

// assume f is  sortpreserving,  pure
c = f(b)
// b is  notnull,  aliasfree,  sorted_ascending(somekey)

// assume reverse is  sortpreserving,  pure
d = reverse(b)
// d is  notnull,  aliasfree,  sorted_descending(somekey)

e = unknownfunction(b)
// e is  typeinfobased

 Suppose we could deduce a large subset of useful propositions, 
 how does the programmer know what has been deduced?

He doesn't have to. The point is to express common patterns in 
the most terse and readable way. Developers analyze this and come 
up with the appropriate heuristics. Juste like with peephole 
optimization?

 How can I tell what has been deduced and applied without having 
 to disassemble the code to see what's actually going on?

If you are that performance oriented then you will not use 
templates at all! Generic programming is not suitable for 
performance… Because performance comes with changing the data to 
fit the hardware, e.g. SIMD instruction set, caches etc

 And even if everything is deduced correctly, and I know what's 
 deduced, what if it does a transformation that's undesirable? 
 For example, you changed linear_search to binary_search. What 
 if I knew the element was likely to be at the front and would 
 prefer linear_search to binary_search?

That's a good point, but then you should be able to guide the 
search by adding an optimization hint that states that the sought 
element probably is at the front, or that you constrain the 
search to a linear_search.

But that does not have to be part of the semantics of the 
program. So you can keep correctness and optimization separate.

IMO it is always an advantage to keep the code that has to do 
with correctness short, simple and readable.

Oct 22 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim 
Grøstad wrote:
 I think it is related to so-called dependent types?
 http://en.wikipedia.org/wiki/Dependent_type

And:

http://en.wikipedia.org/wiki/Refinement_(computing)#Refinement_types

And perhaps:

http://en.wikipedia.org/wiki/Liskov_substitution_principle

Oct 22 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On a related note, semantic slicing is an interesting concept:

http://en.wikipedia.org/wiki/FermaT_Transformation_System

Oct 22 2014

"thedeemon" <dlang thedeemon.com> writes:

On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim 
Grøstad wrote:

 I  would always assume it has been done plenty of work related 
 to proving properties about a program since this is the holy 
 grail of CS!

 I think it is related to so-called dependent types?
 http://en.wikipedia.org/wiki/Dependent_type

Yes, dependent types allow expressing properties like the ones 
you describe. However
a) it's not easy at all even for simple data structures, often 
requiring defining many additional types and lemmas,
b) checking them requires turning your compiler into a 
proof-checker,
c) what works in "clean room" (like high-level total functional 
language) is hardly feasible in a "dirty" language like D where 
you can go as unsafe as you want.

 I'd like to take a look at Xanadu some time…
 http://www.cs.bu.edu/~hwxi/Xanadu/Xanadu.html

That's obsolete, I guess, better take a look at ATS language from 
the same author. It's really really close to what you're thinking 
of here, minus the rewriting.

 And Agda which also guarantee termination!
 http://en.wikipedia.org/wiki/Agda_(programming_language)

Yep, all dependently typed languages like Agda, ATS or Idris 
require the functions you use in types to be terminating 
(otherwise your proofs become unsound and worthless) and they all 
include termination checking.

To scare you well, here, for example, is my Smoothsort 
implementation in ATS
http://stuff.thedeemon.com/lj/smooth_dats.html
that includes proofs that the array really gets sorted and the 
Leonardo heaps used in the process have proper form and 
properties. Writing it took me a few weeks. You don't want to 
turn D into this mess. ;)

Oct 23 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Thursday, 23 October 2014 at 09:41:04 UTC, thedeemon wrote:
 Yes, dependent types allow expressing properties like the ones 
 you describe. However
 a) it's not easy at all even for simple data structures, often 
 requiring defining many additional types and lemmas,

Which is why I only suggest language-builtin-propositions on the 
library level.

 b) checking them requires turning your compiler into a 
 proof-checker,

I don't propose checking, as in an assert(). I propose to 
assume() them and then run regular dataflow over the assumptions 
that have been vetted by the library author.

Pretty close to what Walter wanted in the "assert as assume" 
thread, but safer.

 c) what works in "clean room" (like high-level total functional 
 language) is hardly feasible in a "dirty" language like D where 
 you can go as unsafe as you want.

That is true, which is why you have to do this at the high level, 
and assume the worst.

 To scare you well, here, for example, is my Smoothsort 
 implementation in ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html

Thanks! I'll look at it later, but note that I am not proposing 
something like coq, ats or agda. No theorem prover.

What I am proposing is that libraries can provide assumed facts 
about the result, then propagate those facts in the dataflow 
until they have all become invalid.

 properties. Writing it took me a few weeks. You don't want to 
 turn D into this mess. ;)

No, I want to turn it into a more pragmatic mess. ;-)

Oct 23 2014

"Low Functioning" <nathan.judge gmail.com> writes:

How about a function returns a T', which is implicitly 
convertible to T, where T' has some enum "tags" attached to it.

Oct 23 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning 
wrote:
 How about a function returns a T', which is implicitly 
 convertible to T, where T' has some enum "tags" attached to it.

Why is implicit conversion a problem? To the compiler it would 
just be another function call?

Oct 23 2014

"Low Functioning" <nathan.judge gmail.com> writes:

On Thursday, 23 October 2014 at 15:18:09 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning 
 wrote:
 How about a function returns a T', which is implicitly 
 convertible to T, where T' has some enum "tags" attached to it.

 Why is implicit conversion a problem? To the compiler it would 
 just be another function call?

Not everything is generic, for one reason or another.

struct notimplicit(T) {
	T _x;
	enum fubared;
}

struct foo(T) {
	T _x;
	alias _x this;
	enum fubared;
}

unittest {
	notimplicit!int a;
	//int _a = a; //error

	foo!int b;
	int _b = b;
}

While it wouldn't matter for a fully generic pipeline, and you'd 
lose the fubared tag if you turned it back to the base type, it 
might be handy to propagate the fubared type while remaining 
compatible with the base.

Oct 23 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/23/14 2:41 AM, thedeemon wrote:
 To scare you well, here, for example, is my Smoothsort implementation in
 ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html
 that includes proofs that the array really gets sorted and the Leonardo
 heaps used in the process have proper form and properties. Writing it
 took me a few weeks. You don't want to turn D into this mess. ;)

I recall there was an earlier implementation of a statically-checked 
sort, maybe in Agda? It wouldn't typecheck if the output array weren't 
sorted.

Andrei

Oct 27 2014

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu
wrote:
 On 10/23/14 2:41 AM, thedeemon wrote:
 To scare you well, here, for example, is my Smoothsort 
 implementation in
 ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html
 that includes proofs that the array really gets sorted and the 
 Leonardo
 heaps used in the process have proper form and properties. 
 Writing it
 took me a few weeks. You don't want to turn D into this mess. 
 ;)

 I recall there was an earlier implementation of a 
 statically-checked sort, maybe in Agda? It wouldn't typecheck 
 if the output array weren't sorted.

 Andrei

Sound like madness :) I'd love to see it.

Oct 27 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/27/14 6:53 PM, deadalnix wrote:
 On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu
 wrote:
 On 10/23/14 2:41 AM, thedeemon wrote:
 To scare you well, here, for example, is my Smoothsort implementation in
 ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html
 that includes proofs that the array really gets sorted and the Leonardo
 heaps used in the process have proper form and properties. Writing it
 took me a few weeks. You don't want to turn D into this mess. ;)

 I recall there was an earlier implementation of a statically-checked
 sort, maybe in Agda? It wouldn't typecheck if the output array weren't
 sorted.

 Andrei

 Sound like madness :) I'd love to see it.

Looked again for it, couldn't find it. Got this other one instead:

http://lara.epfl.ch/~psuter/articles/BlancETAL13OverviewLeonVerificationSystem.pdf


Andrei

Oct 28 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 I recall there was an earlier implementation of a 
 statically-checked sort, maybe in Agda? It wouldn't typecheck 
 if the output array weren't sorted.

Yes, there is a similar code even in ATS language (that is much 
simpler than Agda, you can't verify a generic proof as in Agda), 
this is a verified QuickSort-like on lists (I don't remember an 
equivalent verified QuickSort on arrays in ATS):
http://dpaste.dzfl.pl/e60eeb30e3b6

Bye,
bearophile

Oct 28 2014

"bearophile" <bearophileHUGS lycos.com> writes:

 http://dpaste.dzfl.pl/e60eeb30e3b6

That code is in ATS1. Now there is ATS2 that has a better syntax, 
and is a bit more powerful (and can compile even to JavaScript). 
On the ATS site all the ATS1 examples apparently have being 
removed.

Bye,
bearophile

Oct 28 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/28/14 2:26 AM, bearophile wrote:
 Andrei Alexandrescu:

 I recall there was an earlier implementation of a statically-checked
 sort, maybe in Agda? It wouldn't typecheck if the output array weren't
 sorted.

 Yes, there is a similar code even in ATS language (that is much simpler
 than Agda, you can't verify a generic proof as in Agda), this is a
 verified QuickSort-like on lists (I don't remember an equivalent
 verified QuickSort on arrays in ATS):
 http://dpaste.dzfl.pl/e60eeb30e3b6

Looks like the one I saw years ago: a proof that you don't want that 
kind of stuff :o). -- Andrei

Oct 28 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 Looks like the one I saw years ago: a proof that you don't want 
 that kind of stuff :o). -- Andrei

Silly Andrei :-)

Bye,
bearophile

Oct 28 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Tuesday, 28 October 2014 at 09:26:35 UTC, bearophile wrote:
 Yes, there is a similar code even in ATS language (that is much 
 simpler than Agda, you can't verify a generic proof as in 
 Agda), this is a verified QuickSort-like on lists (I don't 
 remember an equivalent verified QuickSort on arrays in ATS):
 http://dpaste.dzfl.pl/e60eeb30e3b6

Quick sort and insertion sort is relatively easy to prove correct 
using induction:
1. prove termination
2. prove output is permutation of input
3. prove that the processed units are sorted (left part in 
insertion sort)

But this is not what I am suggesting in this thread. This is not 
meant for application level code, but for libraries and compiler.

So rather than proving property 2 and 3 it should just be 
provided as facts from the library, then the compiler will 
propagate that knowledge through the call graph until it has been 
deemed uncertain.

For instance if you have asserted that a value exist in an array, 
then that knowledge will hold for all permutations of that array 
further down the call graph.

Oct 29 2014

"thedeemon" <dlang thedeemon.com> writes:

On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu 
wrote:
 On 10/23/14 2:41 AM, thedeemon wrote:
 To scare you well, here, for example, is my Smoothsort 
 implementation in
 ATS
 http://stuff.thedeemon.com/lj/smooth_dats.html
 that includes proofs that the array really gets sorted and the 
 Leonardo
 heaps used in the process have proper form and properties. 
 Writing it
 took me a few weeks. You don't want to turn D into this mess. 
 ;)

 I recall there was an earlier implementation of a 
 statically-checked sort, maybe in Agda? It wouldn't typecheck 
 if the output array weren't sorted.

 Andrei

Here's one in Agda:
http://twanvl.nl/blog/agda/sorting

Oct 28 2014

D Programming

C/C++ Programming

Other

digitalmars.D - Algorithms, term rewriting and compile time reflection