
digitalmars.D - Dependent typing and Refinement Typing for D

reply sighoya <sighoya gmail.com> writes:
Reviving bearophile's thread 
https://forum.dlang.org/thread/gjkzgykpgwqryfxonnsy forum.dlang.org

 https://github.com/tomprimozic/type-systems/tree/master/refined_types

The implementation of a refined type-checker is actually very 
straightforward and turned out to be much simpler than I 
expected. Essentially, program expressions and contracts on 
function parameters and return types are converted into a 
series of mathematical formulas and logical statements, the 
validity of which is then assessed using an external automated 
theorem prover Z3.
It seems possible to integrate Z3 with D contract programming. Bye, bearophile
I find it easier if we do it the way D already does things at compile time. With gradual typing, the checks are done at runtime, so you don't need to propagate the runtime restrictions in the if clause upward, the way non-gradual dependent typing does with compile-time checking.

The idea is that you can apply any List and any Integer to "get(List list, int index) if index < list.length"; the compiler inserts a runtime check asserting the condition in the if clause and throws an IndexOutOfBoundException when it fails. If the compiler is clever enough, it can remove some runtime assertions when the enclosing function already guarantees them, but that, together with overload resolution, might be a bit problematic and requires some sort of SAT solving; alternatively, we do it only partially and give errors for the negative cases instead.

The point is that even with non-gradual dependent typing you have to make runtime assertions at some point; you never get runtime dependent typing at zero cost. Further, gradual dependent typing fits easily into D's existing ecosystem. I would use it purely for argument constraints, not for result constraints, as those incur unnecessary runtime checking, except in unit tests where it makes sense.
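A minimal sketch in today's D of what that lowering could look like; the if clause on get is hypothetical syntax, and the hand-written check stands in for what the compiler would insert (ConstraintViolationException is the name used later in this thread):

// Proposed (hypothetical) declaration:
//   E get(E)(E[] list, size_t index) if index < list.length
// What the compiler would lower it to:

class ConstraintViolationException : Exception
{
    this(string msg) { super(msg); }
}

E get(E)(E[] list, size_t index)
{
    // compiler-inserted runtime assertion for the if clause
    if (!(index < list.length))
        throw new ConstraintViolationException(
            "violated: index < list.length");
    return list[index];
}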
Dec 20 2019
next sibling parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 20 December 2019 at 22:06:03 UTC, sighoya wrote:
 I find it easier if we do it the way D already does things at 
 compile time. With gradual typing, the checks are done at 
 runtime, so you don't need to propagate the runtime 
 restrictions in the if clause upward, the way non-gradual 
 dependent typing does with compile-time checking.
I understand what you mean, but I don't really think you mean gradual typing. That would be more common in dynamic languages where you may choose to gradually apply more and more constraints. So you can start with a runtime checked prototype and end up with a statically checked final program. Which is kinda cool, but not possible for D.
 If the compiler is clever enough, it can remove some runtime 
 assertions when the enclosing function already guarantees them, 
 but that, together with overload resolution, might be a bit 
 problematic and requires some sort of SAT solving; 
 alternatively, we do it only partially and give errors for the 
 negative cases instead.
You need more machinery and annotations than just a SAT solver. I think it is better for D to instead explore what it already does, that is to do more thorough data flow analysis or abstract interpretation. Essentially add flow-typing and refinement typing (of some sort).
 I would use it purely for argument constraints, not for result 
 constraints, as those incur unnecessary runtime checking, 
 except in unit tests where it makes sense.
Wouldn't the problem be exactly the same? You need to deduce what the arguments are within the constraints, so you need to analyze all paths that led to the value you submit to the function to establish such constraint. Generally speaking, you need heavy inter-procedural analysis?
Dec 20 2019
next sibling parent reply sighoya <sighoya gmail.com> writes:
 I understand what you mean, but I don't really think you mean 
 gradual typing. That would be more common in dynamic  languages 
 where you may choose to gradually apply more and more 
 constraints.
I did mean gradual typing. D already has a form of dynamic typing in its generic constraints, except that D's duck typing happens at compile time; gradual typing of runtime values at runtime would be the dual of compile-time duck typing of compile-time values.
 So you can start with a runtime checked prototype 
 and end up with a statically checked final program.
No, I would only use runtime checking. Even general dependent typing requires runtime assertions at some point, aka path-dependent typing à la Scala.
 Which is kinda cool, but not possible for D.
Why?
 You need more machinery and annotations than just a SAT solver.
Why? You only need SAT solving where you want to propagate upward the runtime assertions inserted by the compiler, or for overload resolution. It doesn't have to be perfect, though, and we could help it with some kind of annotation, e.g. by manually specifying the overload order.
 I think it is better for D to instead explore what it already 
 does, that is to do more thorough data flow analysis or 
 abstract interpretation. Essentially add flow-typing and 
 refinement typing (of some sort).
This is non-gradual dependent typing (general/classic dependent typing). First, it is very costly in compile time because of the tracking involved: e.g. proving that a list was extended with m additional members requires tracking the number of push calls in scope. Second, it is generally undecidable, so you need to insert proofs manually, which you don't need with gradual dependent typing, since all runtime variables become constants at runtime.
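A small illustration of why that tracking is hard: the number of appends can be data dependent, so a compile-time proof has to carry it symbolically (hypothetical example, written in plain D):

// n is only known at runtime (e.g. read from input)
int[] build(size_t n)
{
    int[] xs;
    foreach (i; 0 .. n)
        xs ~= 42;        // the checker must track n appends symbolically
    // xs.length == n holds here, but proving it statically requires
    // reasoning about the loop, not just constant folding.
    return xs;
}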
 Wouldn't the problem be exactly the same? You need to deduce 
 what the arguments are within the constraints, so you need to 
 analyze all paths that led to the value you submit to the 
 function to establish such constraint.
No, return values depending on input values would be evaluated at runtime before the function is called. The point is the following: it seems to add additional runtime checking, but you pay for such checks anyway in lists and vectors, which throw IndexOutOfBoundException or NullPointerException. Instead of doing this manually, the compiler can automate the exception handling by inserting, at compile time, runtime asserts before the function is called. Instead of an NPE or IndexOutOfBound you get a ConstraintViolationException.
Dec 21 2019
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 21 December 2019 at 12:52:47 UTC, sighoya wrote:
 So you can start with a runtime checked prototype 
 and end up with a statically checked final program.
 No, I would only use runtime checking. Even general dependent 
 typing requires runtime assertions at some point, aka 
 path-dependent typing à la Scala.
I don't know Scala, how does that work?
 You need more machinery and annotations than just a SAT solver.
 Why? You only need SAT solving where you want to propagate 
 upward the runtime assertions inserted by the compiler, or for 
 overload resolution. It doesn't have to be perfect, though, and 
 we could help it with some kind of annotation, e.g. by manually 
 specifying the overload order.
It will be very limited considering you can call functions in assertions. So you need to emit verification conditions to get anywhere, and that takes quite a bit of machinery.
 First, it is very costly in compile time because of the 
 tracking involved: e.g. proving that a list was extended with m 
 additional members requires tracking the number of push calls 
 in scope.
Verification is time consuming; dataflow/abstract interpretation is more reasonable (but less powerful).
 Second, it is generally undecidable, so you need to insert 
 proofs manually, which you don't need with gradual dependent 
 typing, since all runtime variables become constants at 
 runtime.
Yes, it will not always succeed. However, you might be able to cache results from previous compiles or make it optional (i.e. just emit dynamic checks where applicable).
 No, return values depending on input values would be evaluated 
 at runtime before the function is called.
After?
 The point is the following: it seems to add additional runtime 
 checking, but you pay for such checks anyway in lists and 
 vectors, which throw IndexOutOfBoundException or 
 NullPointerException. Instead of doing this manually, the 
 compiler can automate the exception handling by inserting, at 
 compile time, runtime asserts before the function is called.
Contracts are supposed to have their asserts evaluated at the calling point, just not implemented in D IIRC.
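For reference, this is what D's in-contracts express today (real D syntax, not the proposed extension); the check currently runs on entry to the callee and is compiled out in release builds:

// D's existing contract syntax (expression form, DMD 2.081+):
// the in-condition plays the role of the proposed if clause.
int getElem(int[] a, size_t i)
in (i < a.length)
{
    return a[i];
}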
Dec 21 2019
prev sibling next sibling parent reply sighoya <sighoya gmail.com> writes:
 Wouldn't the problem be exactly the same? You need to deduce 
 what the arguments are within the constraints, so you need to 
 analyze all paths that led to the value you submit to the 
 function to establish such constraint.
If you suggest taking values from the function body into the return value expression, then no, we shouldn't allow this. If we have a constant value expression as the return value in the function signature, then we would indeed need to check all possible paths leading to this value; therefore I would only use it for unit tests, since unit tests could then be executed at compile time (because their inputs are constants). Another option would be to add a runtime assertion before returning a value inside the function body.
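That last option maps onto D's existing out-contracts, which assert on the return value inside the callee; a small sketch:

// The return-value assertion, written as an out-contract:
// `r` names the value being returned.
double clamp01(double x)
out (r; 0.0 <= r && r <= 1.0)
{
    return x < 0 ? 0 : (x > 1 ? 1 : x);
}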
Dec 21 2019
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 21 December 2019 at 13:13:26 UTC, sighoya wrote:
 Wouldn't the problem be exactly the same? You need to deduce 
 what the arguments are within the constraints, so you need to 
 analyze all paths that led to the value you submit to the 
 function to establish such constraint.
 If you suggest taking values from the function body into the 
 return value expression, then no, we shouldn't allow this.
I think it would be easier to follow what the scope of your idea is if we looked at some pseudo-code examples?
 If we have a constant value expression as the return value in 
 the function signature, then we would indeed need to check all 
 possible paths leading to this value; therefore I would only 
 use it for unit tests, since unit tests could then be executed 
 at compile time (because their inputs are constants).
Yes, that is a good point actually. Verification is kinda like unit-tests at compile time. :-)
Dec 21 2019
next sibling parent reply sighoya <sighoya gmail.com> writes:
 I think it would be easier to follow what the scope of your 
 idea is if we looked at some pseudo-code examples?
Good point:

ElemType method(Array array)
{
    uint index = readUInt(...);
    return array(index);
}

with

ElemType array(uint index) if index < array.length

becomes:

ElemType method(Array array)
{
    uint index = readUInt(...);
    if (index < array.length)
    {
        return array(index);
    }
    else
    {
        throw new ContradictionException(
            "Violation of constraint index < array.length where index=$index and array.length=$array.length");
    }
}

If we have

ElemType method(Array array, uint index) if index < array.length
{
    return array(index);
}

instead, then the compiler could check whether the above constraint implies the constraint of array(index), but that may be complicated; maybe we add some hint to the compiler that it is trivial to check:

@constraintsForMethod
ElemType method(Array array, uint index) if index < array.length
{
    return array(index) impliedBy(constraintsForMethod);
}

such that no ContradictionException can be thrown inside method, but before each call of "method" at the call site, unless the call is implied by the constraint of the enclosing function, in which case the runtime assertion propagates further up.

Overload resolution:

method1(uint32 i, uint32 j, uint32 k) if i^2 < j^2 < k^2
method2(int64 i, int64 j, int64 k) if i^2 < j^2 < k^2

may be hard to grok for the compiler and likewise for Z3; a better approach would be to reject it in the compiler and require the user to give an annotation:

@MoreSpecificThan(method2)
method(uint32 i, uint32 j, uint32 k) if i^2 < j^2 < k^2

@method2
method(int64 i, int64 j, int64 k) if i^2 < j^2 < k^2

Dispatching of method "method" is then done at runtime as follows:

If i,j,k is applicable to only one of the methods, then this method is chosen.
If i,j,k is applicable to none of them, then throw a ContradictionException.
If i,j,k is applicable to more than one, choose the most specific by following the annotations given to them.

Unit test:

void method():a if 2<a<10
{
    testObject = mock(...);
    return functionToTest(testObject); // will be run at compile time in order to check the signature of method
}
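A hedged sketch, in plain D, of what the generated dispatcher for the annotated overloads might amount to; methodNarrow, methodWide and fitsUint are made-up stand-ins for the two overloads above:

class ContradictionException : Exception
{
    this(string msg) { super(msg); }
}

void methodNarrow(uint i, uint j, uint k) { /* the @MoreSpecificThan overload */ }
void methodWide(long i, long j, long k) { /* the "method2" overload */ }

bool fitsUint(long x) { return 0 <= x && x <= uint.max; }

// Hand-written version of the runtime dispatch rules above:
// try the most specific overload first, fall back to the wider
// one, and throw when no constraint holds.
void method(long i, long j, long k)
{
    // shared constraint i^2 < j^2 < k^2 (overflow ignored in this sketch)
    if (!(i * i < j * j && j * j < k * k))
        throw new ContradictionException("no overload's constraint holds");
    if (fitsUint(i) && fitsUint(j) && fitsUint(k))
        methodNarrow(cast(uint) i, cast(uint) j, cast(uint) k);
    else
        methodWide(i, j, k);
}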
 Verification is time consuming; dataflow/abstract 
 interpretation is more reasonable (but less powerful).
What's the difference? Can you give an example?
Dec 21 2019
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 21 December 2019 at 14:51:39 UTC, sighoya wrote:
 instead, then the compiler could check whether the above 
 constraint implies the constraint of array(index), but that may 
 be complicated; maybe we add some hint to the compiler that it 
 is trivial to check:
Ok, I believe I understand, but how is that dynamic check different from the in/out contracts that D has? Well, except that in-contracts should be inlined before the function is called.
 such that no ContradictionException can be thrown inside 
 method, but before each call of "method" at the call site, 
 unless the call is implied by the constraint of the enclosing 
 function, in which case the runtime assertion propagates 
 further up.
Understood.
 If i,j,k is applicable to only one of the methods, then this 
 method is chosen.
 If i,j,k is applicable to none of them, then throw a 
 ContradictionException.
 If i,j,k is applicable to more than one, choose the most 
 specific by following the annotations given to them.
Do you want to have overloading on dependent/refined types? That would be interesting, but then it has to be resolved at compile time...
 Verification is time consuming; dataflow/abstract 
 interpretation is more reasonable (but less powerful).
 What's the difference? Can you give an example?
Verification tries to prove assertions by transforming the code into something that can be handled by an SMT/SAT solver. It is often done by over-approximation (gross oversimplification: putting a line between two points instead of calculating the exact position of the points). Flow analysis/abstract interpretation tries to collect information over the various paths in the code, usually using very specific measures that are dealt with in an approximate manner, like keeping track of variables that are negative/positive/unknown, or of value ranges. It can move both backwards and forwards.
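A toy sketch of the abstract-interpretation side: an interval domain that tracks value ranges instead of concrete values (names invented for illustration):

// Each variable is tracked as a range [lo, hi]; operations
// over-approximate the set of possible concrete values.
struct Interval
{
    long lo = long.min;
    long hi = long.max;

    Interval opBinary(string op : "+")(Interval rhs) const
    {
        // over-approximation; overflow ignored in this toy version
        return Interval(lo + rhs.lo, hi + rhs.hi);
    }
}

// Entering the `x > 0` branch refines x's interval:
Interval refinePositive(Interval x)
{
    return Interval(x.lo > 1 ? x.lo : 1, x.hi);
}

unittest
{
    auto x = Interval(-5, 10);
    auto p = refinePositive(x);     // inside `if (x > 0)`: [1, 10]
    assert(p.lo == 1 && p.hi == 10);
    auto s = p + Interval(2, 2);    // s in [3, 12]
    assert(s.lo == 3 && s.hi == 12);
}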
Dec 21 2019
parent reply sighoya <sighoya gmail.com> writes:
 Ok, I believe I understand, but how is that dynamic check 
 different from the in/out contracts that D has? Well, except 
 that in-contracts should be inlined before the function is 
 called.
In addition to contracts, we provide overloading over these conditions and optional upward propagation, both of which require either SAT/SMT/flow typing or manual annotations.
 Do you want to have overloading on dependent/refined types?
Absolutely.
 That would be interesting, but then it has to be resolved at 
 compile time...
Yes, as I said, together with upward constraint propagation it requires SAT/SMT/flow typing or manual annotations; I prefer the latter for simplicity. It doesn't have to be perfect; even the upcoming overload resolution of C++20 isn't perfect.
 Verification tries to prove assertions by transforming the code 
 into something that can be handled by a SMT/SAT solver. It is 
 often done by overapproximation (gross oversimplificaton: 
 putting a line between two points instead of calculating the 
 exact position of points)
Hmm, I have to read up on it again; I thought it was equivalent to running the simplex algorithm (in the linear case), but it doesn't seem to have an objective function.
 Flow analysis/abstract interpretation tries to collect 
 information over the various paths in the code, usually using 
 very specific measures that are dealt with in an approximate 
 manner. Like keeping track of variables that are 
 negative/positive/unknown or value ranges. Can move both 
 backwards and forwards.
Understood, but what about more general cases: if we know x+y<=z in the if branch of function f, and in this branch we call a function g requiring x^2+y^2=z^2, I think we need SAT/SMT in that case. However, in simple cases we can just fall back to simple static refinement types like:

enum {1,2,3} x=...
enum {"hello","world","!"} x=...
enum(n,m) {"repetition{$n,$m}"} x=...
enum Range(n,k,m) {n..k..m} x=...

It would be nice if D supported these kinds of things; see the sketch below. As long as we constrain one variable, solving compatibility becomes (mostly) easy.

I wasn't fully aware of D's contracts, so maybe we should extend their semantics for overload resolution and upward constraint propagation in a backward compatible manner; I think this is possible.
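The single-variable case can be approximated in today's D with a wrapper type whose invariant carries the refinement; the enum Range syntax above stays hypothetical, but this sketch works now:

// A value refined to [lo, hi]; the constraint is checked at
// construction and by the struct invariant in debug builds.
struct Ranged(long lo, long hi)
{
    private long value;

    invariant (lo <= value && value <= hi);

    this(long v)
    {
        assert(lo <= v && v <= hi, "refinement violated");
        value = v;
    }

    long get() const { return value; }
}

unittest
{
    auto x = Ranged!(1, 10)(7);      // ok
    assert(x.get == 7);
    // auto y = Ranged!(1, 10)(42);  // fails the assert at runtime
}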
Dec 22 2019
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 22 December 2019 at 13:25:31 UTC, sighoya wrote:
 Yes, as I said, together with upward constraint propagation it 
 requires SAT/SMT/flow typing or manual annotations; I prefer 
 the latter for simplicity. It doesn't have to be perfect; even 
 the upcoming overload resolution of C++20 isn't perfect.
Hm, C++ has only added deduction guides, nothing to do with dependent types? Isn't that mostly a convenience feature?
 Hmm, I have to read up on it again; I thought it was equivalent 
 to running the simplex algorithm (in the linear case), but it 
 doesn't seem to have an objective function.
To speed things up, SMT solvers use over-approximation, so that they don't have to trace the full behaviour of a function but can instead deduce, say, that a function always stays below a line (or something else). It is an optimization. They also model types/sorts like integers. So how you formulate the problem can affect performance, and indeed whether the problem gets solved at all. That makes automation a bit tricky.
 Understood, but what about more general cases: if we know 
 x+y<=z in the if branch of function f, and in this branch we 
 call a function g requiring x^2+y^2=z^2, I think we need 
 SAT/SMT in that case.
Dataflow is usually not precise, so if you want precise results to overload on, then you need SAT/SMT.
Dec 22 2019
prev sibling parent sighoya <sighoya gmail.com> writes:
 Yes, it will not always succeed. However, you might be able to 
 cache results from previous compiles or make it optional (i.e. 
 just emit dynamic checks where applicable).
That is an option, but release builds will be slow; runtime checking isn't good in this case, as you would prove argument-to-return-value relationships each time a function is called.

I know that general dependent typing is the right approach from an idiomatic viewpoint, but it has the following drawbacks:

1.) Function signatures become much more complicated than gradually dependently typed signatures, in that all return values are related to their input arguments.

2.) You can't characterize all properties of a function/type in the signature; you don't know them all up front, and the compiler can't infer them for you, so adding some properties later might break your code.

3.) Signatures become more sensitive to change (unstable); changing them causes a cascade of errors propagating up the stack.

4.) It doesn't fit D, as D is already lazy (and unsafe) with compile-time duck typing: you can create generic entities that can never be valid for any type parameter, and type errors are thrown at instantiation time, deep in the call stack (not nice). Handling runtime values differently would be an idiomatic break with how D does things.

Theoretically, general dependent typing could replace all kinds of unit testing, but to do so you would need to write the whole function body as the return value in the function signature, which is unreadable, and it is undecidable for the compiler to infer all such properties. It would further break the idiom of encapsulation, which is important for any kind of commercial use. So practically, you still need unit tests.
Dec 21 2019
prev sibling parent sighoya <sighoya gmail.com> writes:
Generally speaking, for enterprises, gradual dependent typing is 
fine, because if they don't want to get accustomed to it, they 
can simply ignore to up propagate additional constraints upwards 
to the stack whereas for general dependent typing, they are 
forced to do so, giving them a second Rust.
Dec 21 2019
prev sibling parent reply bpr <brogoff gmail.com> writes:
On Friday, 20 December 2019 at 22:06:03 UTC, sighoya wrote:
.
.
.

I realize this is a D forum, but it seems highly unlikely to me 
that you'll be able to get anywhere trying to add this stuff to 
D. If you don't know, there is a research language called F* or 
Fstar that integrates dependent types, refinement types, the Z3 
prover, and more. It has a low level subset called Low* and a 
backend which can emit verified C, and has been used to implement 
verified cryptographic libraries.
Dec 21 2019
parent sighoya <sighoya gmail.com> writes:
On Sunday, 22 December 2019 at 00:52:56 UTC, bpr wrote:
 On Friday, 20 December 2019 at 22:06:03 UTC, sighoya wrote:
 .
 .
 .

 I realize this is a D forum, but it seems highly unlikely to me 
 that you'll be able to get anywhere trying to add this stuff to 
 D.
The point is that D already offers part of the kind of dependent typing I'm searching for; we just need to extend it in some ways to make it more usable. Further, some kinds of refinement types like

enum Range(n,k,m) {n..k..m}

would be nice. D already offers limited versions of these things, e.g. static arrays, which can be seen as a static refinement of dynamic arrays.
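The static-array point can be made concrete in current D: int[4] pins the length in the type, statically refining int[]:

// int[4] is, in effect, a statically checked refinement of int[]:
// the length constraint lives in the type itself.
int sumFixed(ref const int[4] a)
{
    int s = 0;
    foreach (x; a) s += x;
    return s;
}

unittest
{
    int[4] ok = [1, 2, 3, 4];
    assert(sumFixed(ok) == 10);

    int[] dyn = [1, 2, 3, 4];
    // sumFixed(dyn);               // compile error: length not proven
    int[4] bridged;
    bridged[] = dyn[0 .. 4];        // runtime-checked bridge into the
    assert(sumFixed(bridged) == 10); // refined type (gradual, in a sense)
}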
 If you don't know, there is a research language called F* or 
 Fstar that integrates dependent types, refinement types, the Z3 
 prover, and more.
Yes, it has general (non-gradual) dependent typing, which I think is too much for a pragmatic language like D. And really, who knows F*? It will stay a research language only. Idris and Agda are candidates as well, but they don't support refinement types directly; instead you have to define them via some weird successor machinery. I think F*, Idris and Agda are too alienating because of their ML-like syntax and their dogmatism about functional programming.
 It has a low level subset called Low* and a 
 backend which can emit verified C, and has been used to 
 implement verified cryptographic libraries.
Interesting, I may take a look at it.
Dec 22 2019