## digitalmars.D - Algorithms, term rewriting and compile time reflection

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
```Generic programming without high level analytical capabilities
are problematic since you cannot optimize everything on a low
level.

For instance, some algorithms run better if the input is sorted,
and although you can cover one property with typing (e.g. sorted
range) the combinatorial explosion is too big and the code
becomes very messy.

The better approach is to let the compiler deduce properties of
the input/output based on stronger assertions than
postconditions. And it should work for FFI as well as for D. That
way D can obtain meta-information about C functions.

Let's call it input-output-assumptions and take a look at:

y = f(x)

After calling f on x, f could provide propositions about y such
as:

- y is null iff x is null
- length(y) == length(x) if x is not null
- y is sorted for all x if x is not null
- y contains the same elements as x
- y is alias free (copy by value)

It should also state whether the semantic side effects has been
completely described.

Then you need statments for f that says something about how
functions affects the properties of input like. E.g.:

z = g( f(x) )

For g we could say that it preserves:
- element order
- element values
- z is an alias free copy of the input

Thus we cannot assume length(z) == length(x), but we can assume
that it is sorted and that z is a subset of x. That means the
compiler can optimize expressions. E.g.:

y = f(x);
z = g(y);

inline_sort(z);  ==>  ;

x = union(z,y); ==> ;

canFind(y,v) || canFind(z,v) ==> canFind(y,v);

linear_find(x,v)  =>  binary_search(y,v)

With term rewriting and compile time reflection we can now
perform optimizations by querying properties of expressions,
looking up dependent results of an expression, and doing a search
over various ways of ordering expressions.

f(sort(g(x))) can be optimized into f(g(x)) if the knowledge of
the effects of sort() is complete and if the total effects of f,g
and x completely covers what sort() do.

These kinds of optimizations are very difficult to achieve in a
low level backend, but you really need them in order to do
generic programming properly.

A simple start would be to not provide term rewriting as a
language feature, but rather define a vocabulary that is useful
for phobos and hardcode term rewriting for that vocabulary.

I think this is feasible.
```
Oct 22 2014
"Peter Alexander" <peter.alexander.au gmail.com> writes:
```On Wednesday, 22 October 2014 at 11:10:35 UTC, Ola Fosheim
[snip]

These kinds of optimizations are very difficult to achieve in a
low level backend, but you really need them in order to do
generic programming properly.

A simple start would be to not provide term rewriting as a
language feature, but rather define a vocabulary that is useful
for phobos and hardcode term rewriting for that vocabulary.

I think this is feasible.

Term rewriting is very interesting, and I believe some work has
been done for this in Haskell. I don't believe anything has been
done with this propositions and inference approach you describe.
I see a number of problems:

First, annotating this, in the presence of templates, is very
hard. Consider:

auto f(alias g, T)(T x) { return g(x); }

We cannot possibly annotate this function with any of
propositions you described because we know nothing about g or T.
Like purity and nothrow, we'd have to deduce these properties,
but most escape deduction in all but the most trivial cases.

Suppose we could deduce a large subset of useful propositions,
how does the programmer know what has been deduced? How can I
tell what has been deduced and applied without having to
disassemble the code to see what's actually going on?

And even if everything is deduced correctly, and I know what's
deduced, what if it does a transformation that's undesirable? For
example, you changed linear_search to binary_search. What if I
knew the element was likely to be at the front and would prefer
linear_search to binary_search?

If you have any, I'd love to see some papers on this kind of work.
```
Oct 22 2014
"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
```On Wednesday, 22 October 2014 at 21:34:51 UTC, Peter Alexander
wrote:
Term rewriting is very interesting, and I believe some work has
been done for this in Haskell. I don't believe anything has
been done with this propositions and inference approach you
describe.

I  would always assume it has been done plenty of work related to
proving properties about a program since this is the holy grail
of CS!

I think it is related to so-called dependent types?
http://en.wikipedia.org/wiki/Dependent_type

I'd like to take a look at Xanadu some time…

And Agda which also guarantee termination!
http://en.wikipedia.org/wiki/Agda_(programming_language)

I see a number of problems:

First, annotating this, in the presence of templates, is very
hard. Consider:

auto f(alias g, T)(T x) { return g(x); }

We cannot possibly annotate this function with any of
propositions you described because we know nothing about g or T.

Why not? You know the moment you instantiate.

If you had pattern matching you could break the signature into
multiple distinguishing ones, but the equivalent is to put
conditionals into the postcondition.

Like purity and nothrow, we'd have to deduce these properties,
but most escape deduction in all but the most trivial cases.

I think you can solve this with logic/constraints programming and
simple dataflow analysis.

// a is  notnull and  aliasfree
b = sort(a)
// b is  notnull,  aliasfree,  sorted_ascending(somekey)

// assume f is  sortpreserving,  pure
c = f(b)
// b is  notnull,  aliasfree,  sorted_ascending(somekey)

// assume reverse is  sortpreserving,  pure
d = reverse(b)
// d is  notnull,  aliasfree,  sorted_descending(somekey)

e = unknownfunction(b)
// e is  typeinfobased

Suppose we could deduce a large subset of useful propositions,
how does the programmer know what has been deduced?

He doesn't have to. The point is to express common patterns in
the most terse and readable way. Developers analyze this and come
up with the appropriate heuristics. Juste like with peephole
optimization?

How can I tell what has been deduced and applied without having
to disassemble the code to see what's actually going on?

If you are that performance oriented then you will not use
templates at all! Generic programming is not suitable for
performance… Because performance comes with changing the data to
fit the hardware, e.g. SIMD instruction set, caches etc

And even if everything is deduced correctly, and I know what's
deduced, what if it does a transformation that's undesirable?
For example, you changed linear_search to binary_search. What
if I knew the element was likely to be at the front and would
prefer linear_search to binary_search?

That's a good point, but then you should be able to guide the
search by adding an optimization hint that states that the sought
element probably is at the front, or that you constrain the
search to a linear_search.

But that does not have to be part of the semantics of the
program. So you can keep correctness and optimization separate.

IMO it is always an advantage to keep the code that has to do
with correctness short, simple and readable.
```
Oct 22 2014
"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
```On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim
I think it is related to so-called dependent types?
http://en.wikipedia.org/wiki/Dependent_type

And:

http://en.wikipedia.org/wiki/Refinement_(computing)#Refinement_types

And perhaps:

http://en.wikipedia.org/wiki/Liskov_substitution_principle
```
Oct 22 2014
"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
```On a related note, semantic slicing is an interesting concept:

http://en.wikipedia.org/wiki/FermaT_Transformation_System
```
Oct 22 2014
"thedeemon" <dlang thedeemon.com> writes:
```On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim

I  would always assume it has been done plenty of work related
to proving properties about a program since this is the holy
grail of CS!

I think it is related to so-called dependent types?
http://en.wikipedia.org/wiki/Dependent_type

Yes, dependent types allow expressing properties like the ones
you describe. However
a) it's not easy at all even for simple data structures, often
requiring defining many additional types and lemmas,
b) checking them requires turning your compiler into a
proof-checker,
c) what works in "clean room" (like high-level total functional
language) is hardly feasible in a "dirty" language like D where
you can go as unsafe as you want.

I'd like to take a look at Xanadu some time…

That's obsolete, I guess, better take a look at ATS language from
the same author. It's really really close to what you're thinking
of here, minus the rewriting.

And Agda which also guarantee termination!
http://en.wikipedia.org/wiki/Agda_(programming_language)

Yep, all dependently typed languages like Agda, ATS or Idris
require the functions you use in types to be terminating
(otherwise your proofs become unsound and worthless) and they all
include termination checking.

To scare you well, here, for example, is my Smoothsort
implementation in ATS
http://stuff.thedeemon.com/lj/smooth_dats.html
that includes proofs that the array really gets sorted and the
Leonardo heaps used in the process have proper form and
properties. Writing it took me a few weeks. You don't want to
turn D into this mess. ;)
```
Oct 23 2014
"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
```On Thursday, 23 October 2014 at 09:41:04 UTC, thedeemon wrote:
Yes, dependent types allow expressing properties like the ones
you describe. However
a) it's not easy at all even for simple data structures, often
requiring defining many additional types and lemmas,

Which is why I only suggest language-builtin-propositions on the
library level.

b) checking them requires turning your compiler into a
proof-checker,

I don't propose checking, as in an assert(). I propose to
assume() them and then run regular dataflow over the assumptions
that have been vetted by the library author.

Pretty close to what Walter wanted in the "assert as assume"

c) what works in "clean room" (like high-level total functional
language) is hardly feasible in a "dirty" language like D where
you can go as unsafe as you want.

That is true, which is why you have to do this at the high level,
and assume the worst.

To scare you well, here, for example, is my Smoothsort
implementation in ATS
http://stuff.thedeemon.com/lj/smooth_dats.html

Thanks! I'll look at it later, but note that I am not proposing
something like coq, ats or agda. No theorem prover.

What I am proposing is that libraries can provide assumed facts
about the result, then propagate those facts in the dataflow
until they have all become invalid.

properties. Writing it took me a few weeks. You don't want to
turn D into this mess. ;)

No, I want to turn it into a more pragmatic mess. ;-)
```
Oct 23 2014
"Low Functioning" <nathan.judge gmail.com> writes:
```How about a function returns a T', which is implicitly
convertible to T, where T' has some enum "tags" attached to it.
```
Oct 23 2014
"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
```On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning
wrote:
How about a function returns a T', which is implicitly
convertible to T, where T' has some enum "tags" attached to it.

Why is implicit conversion a problem? To the compiler it would
just be another function call?
```
Oct 23 2014
"Low Functioning" <nathan.judge gmail.com> writes:
```On Thursday, 23 October 2014 at 15:18:09 UTC, Ola Fosheim Grøstad
wrote:
On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning
wrote:
How about a function returns a T', which is implicitly
convertible to T, where T' has some enum "tags" attached to it.

Why is implicit conversion a problem? To the compiler it would
just be another function call?

Not everything is generic, for one reason or another.

struct notimplicit(T) {
T _x;
enum fubared;
}

struct foo(T) {
T _x;
alias _x this;
enum fubared;
}

unittest {
notimplicit!int a;
//int _a = a; //error

foo!int b;
int _b = b;
}

While it wouldn't matter for a fully generic pipeline, and you'd
lose the fubared tag if you turned it back to the base type, it
might be handy to propagate the fubared type while remaining
compatible with the base.
```
Oct 23 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 10/23/14 2:41 AM, thedeemon wrote:
To scare you well, here, for example, is my Smoothsort implementation in
ATS
http://stuff.thedeemon.com/lj/smooth_dats.html
that includes proofs that the array really gets sorted and the Leonardo
heaps used in the process have proper form and properties. Writing it
took me a few weeks. You don't want to turn D into this mess. ;)

I recall there was an earlier implementation of a statically-checked
sort, maybe in Agda? It wouldn't typecheck if the output array weren't
sorted.

Andrei
```
Oct 27 2014
```On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu
wrote:
On 10/23/14 2:41 AM, thedeemon wrote:
To scare you well, here, for example, is my Smoothsort
implementation in
ATS
http://stuff.thedeemon.com/lj/smooth_dats.html
that includes proofs that the array really gets sorted and the
Leonardo
heaps used in the process have proper form and properties.
Writing it
took me a few weeks. You don't want to turn D into this mess.
;)

I recall there was an earlier implementation of a
statically-checked sort, maybe in Agda? It wouldn't typecheck
if the output array weren't sorted.

Andrei

Sound like madness :) I'd love to see it.
```
Oct 27 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 10/27/14 6:53 PM, deadalnix wrote:
On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu
wrote:
On 10/23/14 2:41 AM, thedeemon wrote:
To scare you well, here, for example, is my Smoothsort implementation in
ATS
http://stuff.thedeemon.com/lj/smooth_dats.html
that includes proofs that the array really gets sorted and the Leonardo
heaps used in the process have proper form and properties. Writing it
took me a few weeks. You don't want to turn D into this mess. ;)

I recall there was an earlier implementation of a statically-checked
sort, maybe in Agda? It wouldn't typecheck if the output array weren't
sorted.

Andrei

Sound like madness :) I'd love to see it.

Looked again for it, couldn't find it. Got this other one instead:

http://lara.epfl.ch/~psuter/articles/BlancETAL13OverviewLeonVerificationSystem.pdf

Andrei
```
Oct 28 2014
"bearophile" <bearophileHUGS lycos.com> writes:
```Andrei Alexandrescu:

I recall there was an earlier implementation of a
statically-checked sort, maybe in Agda? It wouldn't typecheck
if the output array weren't sorted.

Yes, there is a similar code even in ATS language (that is much
simpler than Agda, you can't verify a generic proof as in Agda),
this is a verified QuickSort-like on lists (I don't remember an
equivalent verified QuickSort on arrays in ATS):
http://dpaste.dzfl.pl/e60eeb30e3b6

Bye,
bearophile
```
Oct 28 2014
"bearophile" <bearophileHUGS lycos.com> writes:
``` http://dpaste.dzfl.pl/e60eeb30e3b6

That code is in ATS1. Now there is ATS2 that has a better syntax,
and is a bit more powerful (and can compile even to JavaScript).
On the ATS site all the ATS1 examples apparently have being
removed.

Bye,
bearophile
```
Oct 28 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 10/28/14 2:26 AM, bearophile wrote:
Andrei Alexandrescu:

I recall there was an earlier implementation of a statically-checked
sort, maybe in Agda? It wouldn't typecheck if the output array weren't
sorted.

Yes, there is a similar code even in ATS language (that is much simpler
than Agda, you can't verify a generic proof as in Agda), this is a
verified QuickSort-like on lists (I don't remember an equivalent
verified QuickSort on arrays in ATS):
http://dpaste.dzfl.pl/e60eeb30e3b6

Looks like the one I saw years ago: a proof that you don't want that
kind of stuff :o). -- Andrei
```
Oct 28 2014
"bearophile" <bearophileHUGS lycos.com> writes:
```Andrei Alexandrescu:

Looks like the one I saw years ago: a proof that you don't want
that kind of stuff :o). -- Andrei

Silly Andrei :-)

Bye,
bearophile
```
Oct 28 2014
"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
```On Tuesday, 28 October 2014 at 09:26:35 UTC, bearophile wrote:
Yes, there is a similar code even in ATS language (that is much
simpler than Agda, you can't verify a generic proof as in
Agda), this is a verified QuickSort-like on lists (I don't
remember an equivalent verified QuickSort on arrays in ATS):
http://dpaste.dzfl.pl/e60eeb30e3b6

Quick sort and insertion sort is relatively easy to prove correct
using induction:
1. prove termination
2. prove output is permutation of input
3. prove that the processed units are sorted (left part in
insertion sort)

But this is not what I am suggesting in this thread. This is not
meant for application level code, but for libraries and compiler.

So rather than proving property 2 and 3 it should just be
provided as facts from the library, then the compiler will
propagate that knowledge through the call graph until it has been
deemed uncertain.

For instance if you have asserted that a value exist in an array,
then that knowledge will hold for all permutations of that array
further down the call graph.
```
Oct 29 2014
"thedeemon" <dlang thedeemon.com> writes:
```On Tuesday, 28 October 2014 at 01:10:18 UTC, Andrei Alexandrescu
wrote:
On 10/23/14 2:41 AM, thedeemon wrote:
To scare you well, here, for example, is my Smoothsort
implementation in
ATS
http://stuff.thedeemon.com/lj/smooth_dats.html
that includes proofs that the array really gets sorted and the
Leonardo
heaps used in the process have proper form and properties.
Writing it
took me a few weeks. You don't want to turn D into this mess.
;)

I recall there was an earlier implementation of a
statically-checked sort, maybe in Agda? It wouldn't typecheck
if the output array weren't sorted.

Andrei

Here's one in Agda:
http://twanvl.nl/blog/agda/sorting
```
Oct 28 2014