digitalmars.D - DIP1000
- Andrei Alexandrescu (17/17) Aug 30 2016 I'd like to initiate collaboration on an effort to do DIP1000 rigorously...
- Lodovico Giaretta (11/15) Aug 30 2016 If I may suggest, a repository with some LaTeX code may be a good
- Seb (9/22) Aug 30 2016 Okay this is a bit unrelated to the original question, but it's
- Andrei Alexandrescu (3/23) Aug 30 2016 Isn't it amazing how fast we got to a debate on choosing tools... :o) --...
- Seb (15/53) Aug 30 2016 Sorry, please ignore, but it would still be nice to put it on
- Timon Gehr (5/17) Aug 30 2016 No, it should be ArgumentList, which is a comma-separated list of
- Andrei Alexandrescu (5/18) Aug 30 2016 Good idea. In fact I'm using a .dd file right now, which can be purposed...
- Timon Gehr (15/35) Aug 30 2016 - Missing a production rule for ArgumentList.
- Andrei Alexandrescu (7/30) Aug 30 2016 Yah. There are a few papers on modeling heaps, it's not difficult.
- Kagamin (5/6) Aug 31 2016 Return values can't have `scope` annotation? If the container has
- Andrei Alexandrescu (3/7) Sep 12 2016 I don't think scope returns are supported, but I'll think of it. Thanks....
- Stefan Koch (4/8) Sep 12 2016 That link leads to to a grammar.
- ZombineDev (20/41) Aug 31 2016 BTW, the Rust Mid-level IR (MIR) has a similar purpose: it lowers
- Andrei Alexandrescu (51/51) Sep 14 2016 Some progress with the DIP1000 semantics. So I made a few changes to the...
- Timon Gehr (41/99) Sep 14 2016 I would support an arbitrary number of parameters. Leaving that out does...
- Kagamin (11/11) Sep 15 2016 Well, aliasing can be reproduced with locals
- Andrei Alexandrescu (11/22) Sep 15 2016 Nit: in MiniD1000 you'd need to declare vars first, assign them second:
- Timon Gehr (6/14) Sep 15 2016 This fails when considering scope and return annotations though.
- Andrei Alexandrescu (2/4) Sep 15 2016 I was planning to allow scope annotations for struct fields. -- Andrei
- Timon Gehr (2/6) Sep 15 2016 I see. In that case, single-parameter functions only is probably okay.
I'd like to initiate collaboration on an effort to do DIP1000 rigorously. First we need to reduce D to a bare subset that only has integers, structs, pointers, and functions. That's a working subset of actual D code. The grammar I have in mind is at http://erdani.com/d/DIP1000.html. There is no type deduction, member functions, classes, arrays, constructors, loops, etc. etc. etc. Only the ability to create arbitrarily complex graphs containing integers and pointers to other nodes. On this language we need to define typing rules and evaluation semantics. Once we have those, we need to prove what we want: for scope variables the accessibility never outlives lifetime. As a consequence we're good to deallocate them early etc. The model for typing, evaluation, and proofs is at https://www.cis.upenn.edu/~bcpierce/papers/fj-toplas.pdf. It would be great if those interested in helping could give the paper a close read. Once we get this done we'll have a fantastic model for any other language changes. Andrei
Aug 30 2016
On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:I'd like to initiate collaboration on an effort to do DIP1000 rigorously. [...]If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l'you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf. By the way, do you want to formalize with the aforementioned notation the D subset? If that's the case, I'd really like to help (even if I don't have much time).(I mean, diagrams like the ones at page 9 of the paper). Then
Aug 30 2016
On Tuesday, 30 August 2016 at 16:27:05 UTC, Lodovico Giaretta wrote:On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:Okay this is a bit unrelated to the original question, but it's 2016 and we can do a lot better. It's quite easy to combine LaTeX and Markdowm to get something easy to use and yielding pretty HTML. Have a look at the sources of my blog post from last week: https://github.com/libmir/blog/blob/master/_posts/2016-08-19-transformed-density-rejection-sampling.md (As I am quite familiar with the setup and would be happy to set it up - including auto-deploy)I'd like to initiate collaboration on an effort to do DIP1000 rigorously. [...]If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l' >> (I mean, diagrams like the ones at page 9 of the paper). Then you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf.
Aug 30 2016
On 08/30/2016 12:39 PM, Seb wrote:On Tuesday, 30 August 2016 at 16:27:05 UTC, Lodovico Giaretta wrote:Isn't it amazing how fast we got to a debate on choosing tools... :o) -- AndreiOn Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:Okay this is a bit unrelated to the original question, but it's 2016 and we can do a lot better. It's quite easy to combine LaTeX and Markdowm to get something easy to use and yielding pretty HTML. Have a look at the sources of my blog post from last week: https://github.com/libmir/blog/blob/master/_posts/2016-08-19-transformed-density-rejection-sampling.md (As I am quite familiar with the setup and would be happy to set it up - including auto-deploy)I'd like to initiate collaboration on an effort to do DIP1000 rigorously. [...]If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l' >> (I mean, diagrams like the ones at page 9 of the paper). Then you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf.
Aug 30 2016
On Tuesday, 30 August 2016 at 17:46:41 UTC, Andrei Alexandrescu wrote:On 08/30/2016 12:39 PM, Seb wrote:Sorry, please ignore, but it would still be nice to put it on Github, so that we can make pull requests. A couple of nits from a first pass: - ArgumentList is not defined -> ParameterList - The parameters is limited to 0,1,2 - on purpose? (opt -> * would allow arbitrary numbers of parameters) - In contrast to FeatherweightJava the grammar allows a lot of weird stuff like: ref int** foo { 42 = 42; return null; *42 = &42; }On Tuesday, 30 August 2016 at 16:27:05 UTC, Lodovico Giaretta wrote:Isn't it amazing how fast we got to a debate on choosing tools... :o) -- AndreiOn Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:Okay this is a bit unrelated to the original question, but it's 2016 and we can do a lot better. It's quite easy to combine LaTeX and Markdowm to get something easy to use and yielding pretty HTML. Have a look at the sources of my blog post from last week: https://github.com/libmir/blog/blob/master/_posts/2016-08-19-transformed-density-rejection-sampling.md (As I am quite familiar with the setup and would be happy to set it up - including auto-deploy)I'd like to initiate collaboration on an effort to do DIP1000 rigorously. [...]If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l' >> (I mean, diagrams like the ones at page 9 of the paper). Then you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf.
Aug 30 2016
On 30.08.2016 21:27, Seb wrote:Sorry, please ignore, but it would still be nice to put it on Github, so that we can make pull requests. A couple of nits from a first pass: - ArgumentList is not defined -> ParameterListNo, it should be ArgumentList, which is a comma-separated list of expressions.- The parameters is limited to 0,1,2 - on purpose? (opt -> * would allow arbitrary numbers of parameters) - In contrast to FeatherweightJava the grammar allows a lot of weird stuff like: ref int** foo {Parentheses are required by the grammar, no?42 = 42; return null; *42 = &42; }Those would be ruled out by type checking.
Aug 30 2016
On 08/30/2016 12:27 PM, Lodovico Giaretta wrote:On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:Good idea. In fact I'm using a .dd file right now, which can be purposed to generate latex :o).I'd like to initiate collaboration on an effort to do DIP1000 rigorously. [...]If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l' >> (I mean, diagrams like the ones at page 9 of the paper). Then you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf.By the way, do you want to formalize with the aforementioned notation the D subset? If that's the case, I'd really like to help (even if I don't have much time).Correct. Awesome! Andrei
Aug 30 2016
On 30.08.2016 18:12, Andrei Alexandrescu wrote:I'd like to initiate collaboration on an effort to do DIP1000 rigorously. First we need to reduce D to a bare subset that only has integers, structs, pointers, and functions. That's a working subset of actual D code. The grammar I have in mind is at http://erdani.com/d/DIP1000.html. ...- Missing a production rule for ArgumentList. - There is no ParameterList of length three or more. - 'return' annotations missing.There is no type deduction, member functions, classes, arrays, constructors, loops, etc. etc. etc. Only the ability to create arbitrarily complex graphs containing integers and pointers to other nodes. On this language we need to define typing rules and evaluation semantics. Once we have those, we need to prove what we want: for scope variables the accessibility never outlives lifetime. As a consequence we're good to deallocate them early etc. ...I'd just actually deallocate the memory slots when their lifetime ends and then prove type safety. (This implies in particular that the accessed addressed are still allocated for all dereferencing expressions.)The model for typing, evaluation, and proofs is at https://www.cis.upenn.edu/~bcpierce/papers/fj-toplas.pdf.Typing and proofs yes, to some extent, evaluation not really. This paper does not allow mutation and does not model memory/aliasing. One possibility would be small-step semantics with an explicit heap. (The heap would be infinite and never reuse memory slots between allocations. This way, memory slots can be reliably marked as already deallocated. The stack would not need to be modeled separately, just allocate non-ref parameters and locals when entering the function and deallocate them when the function exits.)It would be great if those interested in helping could give the paper a close read. Once we get this done we'll have a fantastic model for any other language changes. Andrei
Aug 30 2016
On 8/30/16 3:42 PM, Timon Gehr wrote:On 30.08.2016 18:12, Andrei Alexandrescu wrote:Cool, thx.I'd like to initiate collaboration on an effort to do DIP1000 rigorously. First we need to reduce D to a bare subset that only has integers, structs, pointers, and functions. That's a working subset of actual D code. The grammar I have in mind is at http://erdani.com/d/DIP1000.html. ...- Missing a production rule for ArgumentList. - There is no ParameterList of length three or more. - 'return' annotations missing.I'd just actually deallocate the memory slots when their lifetime ends and then prove type safety. (This implies in particular that the accessed addressed are still allocated for all dereferencing expressions.)Yah. There are a few papers on modeling heaps, it's not difficult.The model for typing, evaluation, and proofs is at https://www.cis.upenn.edu/~bcpierce/papers/fj-toplas.pdf.Typing and proofs yes, to some extent, evaluation not really. This paper does not allow mutation and does not model memory/aliasing. One possibility would be small-step semantics with an explicit heap. (The heap would be infinite and never reuse memory slots between allocations. This way, memory slots can be reliably marked as already deallocated.The stack would not need to be modeled separately, just allocate non-ref parameters and locals when entering the function and deallocate them when the function exits.)Ah, neat. Didn't think of that. Thanks. I'm embarking on a trip right now, but will get to this as soon as I can. Andrei
Aug 30 2016
On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:http://erdani.com/d/DIP1000.htmlReturn values can't have `scope` annotation? If the container has trusted opAssign, use after free in this case is not accounted for?
Aug 31 2016
On 08/31/2016 08:05 AM, Kagamin wrote:On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:I don't think scope returns are supported, but I'll think of it. Thanks. -- Andreihttp://erdani.com/d/DIP1000.htmlReturn values can't have `scope` annotation? If the container has trusted opAssign, use after free in this case is not accounted for?
Sep 12 2016
On Monday, 12 September 2016 at 14:00:53 UTC, Andrei Alexandrescu wrote:On 08/31/2016 08:05 AM, Kagamin wrote:That link leads to to a grammar. Isn't DIP1000 the scope and safe DIP ?On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:http://erdani.com/d/DIP1000.html
Sep 12 2016
On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:I'd like to initiate collaboration on an effort to do DIP1000 rigorously. First we need to reduce D to a bare subset that only has integers, structs, pointers, and functions. That's a working subset of actual D code. The grammar I have in mind is at http://erdani.com/d/DIP1000.html. There is no type deduction, member functions, classes, arrays, constructors, loops, etc. etc. etc. Only the ability to create arbitrarily complex graphs containing integers and pointers to other nodes. On this language we need to define typing rules and evaluation semantics. Once we have those, we need to prove what we want: for scope variables the accessibility never outlives lifetime. As a consequence we're good to deallocate them early etc. The model for typing, evaluation, and proofs is at https://www.cis.upenn.edu/~bcpierce/papers/fj-toplas.pdf. It would be great if those interested in helping could give the paper a close read. Once we get this done we'll have a fantastic model for any other language changes. AndreiBTW, the Rust Mid-level IR (MIR) has a similar purpose: it lowers the complex AST to a sufficiently simpler one so that one can more easily do things like lifetime analysis / borrow checking on it. Similarly, Swift have a similar step in their compiler pipeline for ARC. I don't know if such intermediate representation in the DMD FE is worth pursuing (obviously would be big development effort), but it may be worth having a look to see how such IR analysis is done in practice (on actual non-toy, non-purely academic languages), even if only for checking that your theoretical model captures sufficient information. Here are some links, which I hope you would find helpful: https://blog.rust-lang.org/2016/04/19/MIR.html (high-level non-technical introduction) https://github.com/rust-lang/rfcs/blob/master/text/1211-mir.md (Rust RFC ~ equivalent of DIP) https://news.ycombinator.com/item?id=10487502 (Presentation slides about Swift's compiler pipeline)
Aug 31 2016
Some progress with the DIP1000 semantics. So I made a few changes to the grammar of the mini-language (call it MiniD1000), which is at http://erdani.com/d/DIP1000-grammar.html. Notable aspects: * Programs are a sequence of function definitions, struct definitions, and global variable definitions. A pass is supposed to go through all and create the initial typing environment. * Functions now only have one parameter. This is to reduce the size of the language (two parameters increase the numbers of typing and evaluation rules). Aliasing can be studied with globals. I'm unclear whether we need the second parameter for investigating things like cross-parameter aliasing; if so, I'll add it back. * All functions are required to end with a return statement. This avoids any need for flow control. * There are no rvalues in MinbiD1000, which I thought is quite clever. That means `null` and integral literals are not expressions; they are explicitly present in assignment statements. This simplifies MiniD1000 a great deal, but again will require us to be careful when we "port" whatever conclusions we draw back to D. * The `if` statement has no `else` clause. It only increases the size of the language without an increase in expressiveness. Next, I defined a baseline set of typing rules at http://erdani.com/d/DIP1000-typing-baseline.html for "bad" MiniD1000: all ref, scope and return attributes are ignored, so there's no special protection against bad escapes. Based on these rules we'll later define better rules that enforce safety. A few notes: * For a general understanding on how to read these rules, see the Featherweight Java (FJ) paper or see e.g. http://www.hedonisticlearning.com/posts/understanding-typing-judgments.html. * Like in FJ, sequences of zero or more things are shown with an overbar. * In order to identify the current function being compiled, I introduce the symbol f_{crt}. For example, in rule S-VarDeclaration, a prerequisite of the judgment is that f_{crt} exists, i.e. the rule S-VarDeclaration only applies inside a function, not outside. For the outside rule see D-GlobalVarDeclaration. * S-VarDeclaration introduces a new name visible to the statements that follow. This is done by typing not only the variable declaration, but also all following statements in its scope. The typing environment for the statements is the union of the preexisting typing environment and the new variable. * In fact the typing rules of all statements carry the appendix "and here are the following statements in the scope" which is workable but possibly a bit goofy. Any cleverer idea? * For that matter I need the rule S-NoStmts: "a sequence of zero statements types just fine". * In D-Struct, I vaguely state that a struct shall not be a transitive field of itself. Is there a better way to enforce that succintly? Please let me know of anything I missed or got wrong. Once the baseline typing is in place, the hard part starts: defining the enhanced typing that enforces safe scoping rules. Thanks, Andrei
Sep 14 2016
On 14.09.2016 21:49, Andrei Alexandrescu wrote:Some progress with the DIP1000 semantics. So I made a few changes to the grammar of the mini-language (call it MiniD1000), which is at http://erdani.com/d/DIP1000-grammar.html. Notable aspects: * Programs are a sequence of function definitions, struct definitions, and global variable definitions. A pass is supposed to go through all and create the initial typing environment. ...There probably should be typing rules for that.* Functions now only have one parameter. This is to reduce the size of the language (two parameters increase the numbers of typing and evaluation rules). Aliasing can be studied with globals. I'm unclear whether we need the second parameter for investigating things like cross-parameter aliasing; if so, I'll add it back. ...I would support an arbitrary number of parameters. Leaving that out does not buy much in terms of simplicity, and it makes miniD quite a lot less relevant for arguing about the soundness of the full language.* All functions are required to end with a return statement. This avoids any need for flow control. ...How so? You still allow return statements at arbitrary places in the function.* There are no rvalues in MinbiD1000,Yes there are. Function calls might be rvalues because 'ref' is optional on them.which I thought is quite clever. That means `null` and integral literals are not expressions; they are explicitly present in assignment statements. This simplifies MiniD1000 a great deal, but again will require us to be careful when we "port" whatever conclusions we draw back to D. ...That should be fine. I think a better way is to special-case only the lhs though. I.e. x=e, *e₁=e₂, f(e₁,…,eₙ)=eₙ₊₁.* The `if` statement has no `else` clause. It only increases the size of the language without an increase in expressiveness. Next, I defined a baseline set of typing rules at http://erdani.com/d/DIP1000-typing-baseline.html for "bad" MiniD1000: all ref, scope and return attributes are ignored, so there's no special protection against bad escapes. Based on these rules we'll later define better rules that enforce safety. A few notes:Looking good, except: - struct types are not checked for existence. E.g. the following program type checks: void main(){ Undefined x; Undefined* y=&x; } - 'fields' is not defined. (And how is it supposed to know what the fields of some syntactic type are?) Probably struct declarations should be part of the environment, then 'fields' can take the environment as an additional argument. I think it would make sense if your grammar enforced that all struct declarations come before all global variables which come before all function declarations. - Functions cannot call each other, because they cannot look up each other's names.* For a general understanding on how to read these rules, see the Featherweight Java (FJ) paper or see e.g. http://www.hedonisticlearning.com/posts/understanding-typing-judgments.html. * Like in FJ, sequences of zero or more things are shown with an overbar. * In order to identify the current function being compiled, I introduce the symbol f_{crt}. For example, in rule S-VarDeclaration, a prerequisite of the judgment is that f_{crt} exists, i.e. the rule S-VarDeclaration only applies inside a function, not outside. For the outside rule see D-GlobalVarDeclaration. * S-VarDeclaration introduces a new name visible to the statements that follow. This is done by typing not only the variable declaration, but also all following statements in its scope. The typing environment for the statements is the union of the preexisting typing environment and the new variable. * In fact the typing rules of all statements carry the appendix "and here are the following statements in the scope" which is workable but possibly a bit goofy. Any cleverer idea? ...I think it is fine.* For that matter I need the rule S-NoStmts: "a sequence of zero statements types just fine". * In D-Struct, I vaguely state that a struct shall not be a transitive field of itself.Well, the notation does not really make a lot of sense.Is there a better way to enforce that succintly? ...Typing rules for structs need to state that all struct declarations together are fine if each struct is fine individually given all other structs present in the environment. Then define something like transitiveFieldTypes(Γ,S) := μX. {T | ∃x. T x; ∈ (fields(Γ,S) ∪ ⋃{ fields(Γ,T') | T'∈X }) } (In English: transitiveFieldTypes(Γ,S) is the smallest set such that all field types of S and field types of other types in the set are also in the set.) You can then require that S∉transitiveFieldTypes(Γ,S)Please let me know of anything I missed or got wrong. Once the baseline typing is in place, the hard part starts: defining the enhanced typing that enforces safe scoping rules. ...Maybe we should define the semantics first (it is easier and it is necessary for providing formal counter-examples for the soundness of proposed typing rules).
Sep 14 2016
Well, aliasing can be reproduced with locals S s; int* r = getPayload(s); freePayload(s); int v = *r; //UAF Multiparameter functions can be declared to be equivalent to struct P { S* s; int* r; } P p; p.s = &s; p.r = getPayload(s); f(p); //as if f(S*,int*)
Sep 15 2016
On 09/15/2016 04:44 AM, Kagamin wrote:Well, aliasing can be reproduced with locals S s; int* r = getPayload(s); freePayload(s); int v = *r; //UAFNit: in MiniD1000 you'd need to declare vars first, assign them second: S s; int* r; r = getPayload(s); freePayload(s); int v; v = *r; (and there are no comments :o))Multiparameter functions can be declared to be equivalent to struct P { S* s; int* r; } P p; p.s = &s; p.r = getPayload(s); f(p); //as if f(S*,int*)Thanks for making this point. Andrei
Sep 15 2016
On 15.09.2016 14:57, Andrei Alexandrescu wrote:This fails when considering scope and return annotations though. Parameters can be annotated or not annotated with scope/return scope independently, but the struct can either be annotated or not annotated, the fields cannot be treated independently (which is one of the weaknesses of DIP1000).Multiparameter functions can be declared to be equivalent to struct P { S* s; int* r; } P p; p.s = &s; p.r = getPayload(s); f(p); //as if f(S*,int*)Thanks for making this point.
Sep 15 2016
On 9/15/16 5:14 PM, Timon Gehr wrote:struct can either be annotated or not annotated, the fields cannot be treated independently (which is one of the weaknesses of DIP1000).I was planning to allow scope annotations for struct fields. -- Andrei
Sep 15 2016
On 16.09.2016 01:07, Andrei Alexandrescu wrote:On 9/15/16 5:14 PM, Timon Gehr wrote:I see. In that case, single-parameter functions only is probably okay.struct can either be annotated or not annotated, the fields cannot be treated independently (which is one of the weaknesses of DIP1000).I was planning to allow scope annotations for struct fields. -- Andrei
Sep 15 2016