digitalmars.D.bugs - [Issue 8185] New: Pure functions and pointers
http://d.puremagic.com/issues/show_bug.cgi?id=8185

           Summary: Pure functions and pointers
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: spec
          Severity: major
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: verylonglogin.reg gmail.com

12:10:50 MSD ---
It looks like there is a big problem with pure functions and pointers. Consider these functions:
---
int* f1(in int* i) pure;
int** f2(in int** i) pure;
void* g1(in void* p) pure;
void** g2(in void** p) pure;
struct MyArray { int* p; size_t len; }
void** h(in MyArray arg) pure;
---
The Question: What exactly do these pure functions consider as `argument value` and as `returned value`? This looks neither documented nor obvious.

I see only two ways to document it properly (yes, the main problem is with the `h` function):
* disallow pure functions from accepting pointers or types with pointers;
* once a pure function accepts a pointer, consider it dependent on all process memory;
* state in BIG RED LETTERS that a pure function depends on the address only, and restrict dereferencing of the pointer at the compiler level.

The second way obviously just means the function isn't pure any more. The third way means the pointer isn't a pointer any more, so I'd prefer to replace it with "the first way" + "f(cast(size_t) ptr)".

More than that, the situation is very dangerous now. E.g. one can consider `strlen` to be pure. It should be clearly stated that purity is compiler checkable, not user checkable, with examples like `strlen`. See the discussion in Issue 3057.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

klickverbot <code klickverbot.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |code klickverbot.at
           Severity|major                       |enhancement

---
The current behavior is by design, and perfectly fine – note that `pure` in D just means that a function doesn't access global (mutable) state. A pointer somewhere isn't a problem either, since the caller must have obtained the address from somewhere, and if it was indeed from global state, the calling code couldn't be pure.

Do you have any suggestions on how to make this clearer in the spec? I admit that the design can take some time to wrap one's head around, but I'm not sure what the best way is to make the concept easier to grasp.
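The rule described in this reply can be sketched as follows. This is only an illustration under stated assumptions: `counter`, `bump` and `readCounter` are hypothetical names that do not appear in the thread, and the commented-out function is shown because a current compiler is expected to reject it.

```d
int counter; // mutable module-level (thread-local) variable, hypothetical

// Accepted: the function only touches memory reachable through its parameter.
void bump(int* p) pure
{
    ++*p;
}

// Expected to be rejected if uncommented: a pure function may not
// read or write the mutable global `counter` directly.
// int readCounter() pure { return counter; }

void main()
{
    int local = 1;
    bump(&local);
    assert(local == 2);

    // main is not pure, so it may take the address of the global and pass
    // it in; bump itself still never names any global state.
    bump(&counter);
    assert(counter == 1);
}
```

The point of the sketch is that the restriction sits on the function body (no direct access to mutable globals), not on what the caller chooses to pass.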
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

---
Also, please note that issue 3057 is really old – I think at that point we didn't even have the relaxed purity rules yet.
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

14:29:01 MSD ---
> The current behavior is by design, and perfectly fine – note that `pure` in D just means that a function doesn't access global (mutable) state. A pointer somewhere isn't a problem either, since the caller must have obtained the address from somewhere, and if it was indeed from global state, the calling code couldn't be pure.

OK. It looks like everything works, but I don't understand how. So could you please answer the question (read this to the end).

According to http://dlang.org/function.html#pure-functions
> Pure functions are functions that produce the same result for the same arguments.

And my original question is
> The Question: What exactly do these pure functions consider as `argument value` and as `returned value`?

Illustration:
---
int f(in int* p) pure;

void g()
{
    auto arr = new int[5];
    auto res = f(arr.ptr);
    assert(res == f(arr.ptr));
    assert(res == f(arr.ptr + 1)); // *p isn't changed
    arr[1] = 7;
    assert(res == f(arr.ptr)); // neither p nor *p is changed
    arr[0] = 7;
    assert(res == f(arr.ptr)); // p isn't changed
}
---
Which asserts must pass?

The second assert is here according to http://klickverbot.at/blog/2012/05/purity-in-d/ (yes, it's the "Indirections in the Return Type?" section, but the sentences look general and I think they can be treated this way):
> The first essential point are addresses, respectively the definition of equality applied when considering referential transparency. In functional languages, the actual memory address that some value resides at is usually of little to no importance. D being a system programming language, however, exposes this concept.
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

klickverbot <code klickverbot.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |normal

---
> And my original question is
>> The Question: What exactly do these pure functions consider as `argument value` and as `returned value`?

Thanks for the example, this certainly makes your concerns easier to see. You are right, the spec is really not clear in this regard – but in my opinion, only a single interpretation makes sense, in that it is actually enforceable by the compiler:

---
> Illustration:
> ---
> int f(in int* p) pure;
> auto res = f(arr.ptr);
> assert(res == f(arr.ptr));

This one obviously has to pass.

> assert(res == f(arr.ptr + 1)); // *p isn't changed

Might fail, f is allowed to return cast(int)p.

> arr[1] = 7;
> assert(res == f(arr.ptr)); // neither p nor *p is changed

Must pass, reading/modifying random bits of memory inside pure functions is obviously a bad idea. Bad idea meaning that pointer arithmetic is disallowed in @safe code anyway, and in @system code, you as the programmer are responsible for not violating the type system guarantees – for example, you can just call any impure function in a pure context using a cast. This also means that e.g. C string functions cannot be pure in D.

> arr[0] = 7;
> assert(res == f(arr.ptr)); // p isn't changed

Might fail, as discussed in the »What about Referential Transparency« section of the article – only if the parameters are _transitively_ equal (as defined by their type) are pure functions guaranteed to return the same value.

> The second assert is here according to http://klickverbot.at/blog/2012/05/purity-in-d/.

Then this aspect of the article is apparently not as clear as it could be – thanks for the feedback, I'll incorporate it in the next revision.
---

Do you disagree with any of these points? If so, I'd be happy to provide a more in-depth explanation of my view, so we can clarify the spec afterwards.
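The four asserts discussed in this reply can be tried out with two concrete stand-ins for `f`. This is a sketch only: `readFirst` and `addressOf` are hypothetical functions chosen for illustration, one depending on the pointee and one on the address itself, and the outcomes shown hold for these two bodies, not for every possible `f`.

```d
// Hypothetical f that depends only on the value *p.
int readFirst(in int* p) pure
{
    return *p;
}

// Hypothetical f that depends on the address itself.
size_t addressOf(in int* p) pure
{
    return cast(size_t) p;
}

void main()
{
    auto arr = new int[5];
    auto res = readFirst(arr.ptr);

    // Same pointer, same pointee: the results agree.
    assert(res == readFirst(arr.ptr));

    // A different argument may give a different result.
    assert(addressOf(arr.ptr) != addressOf(arr.ptr + 1));

    // arr[1] is not reachable through *p, so this f is unaffected.
    arr[1] = 7;
    assert(res == readFirst(arr.ptr));

    // The pointee changed, so the arguments are no longer transitively equal.
    arr[0] = 7;
    assert(res != readFirst(arr.ptr));
}
```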
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

art.08.09 gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |art.08.09 gmail.com

> I see only two ways to document it properly (yes, the main problem is with the `h` function):
> * once a pure function accepts a pointer, consider it dependent on all process memory;

That would work, but would probably be too limiting.

* Allow only dereferencing the pointer, disallow any kind of indexing.

Note it's not trivial, as pointer arithmetic should still work. But it is probably doable, by disallowing dereferencing at all, and making a special exception for accessing via an unmodified argument. This would also have to work recursively, so it basically comes down to introducing a special kind of pointer that behaves a bit more like a reference.

The alternatives are the ones you listed, either banning pointers or assuming the function depends on everything - neither is really acceptable. A pure function shouldn't deal with unbounded arrays, so this kind of restriction should be fine (the alternative is to have to slice everything, which is not a sane solution, e.g. when working with pointers to structs).
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

19:59:12 MSD ---
>> assert(res == f(arr.ptr + 1)); // *p isn't changed
> Might fail, f is allowed to return cast(int)p.

Am I understanding correctly that:
---
int[] f() pure;
int g(in int[] a) pure;
int gs(in int[] a) @safe pure;

void h()
{
    assert(g(f()) == g(f())); // May or may not pass
    assert(gs(f()) == gs(f())); // Should pass
}
---
?

>> arr[1] = 7;
>> assert(res == f(arr.ptr)); // neither p nor *p is changed
> Must pass,...

So this code is invalid:
---
void f(int* i) pure @safe // or unsafe, doesn't matter
{
    ++i[1];
}
---
and this is invalid too:
---
struct MyArray
{
    int* p;
    size_t len;
    ...
    int opIndex(size_t i) pure @safe // or unsafe, doesn't matter
    in { assert(i < len); }
    body { return p[i]; }
}
---
? And this is valid:
---
void f(int* i) pure @safe // or unsafe, doesn't matter
{
    ++*i;
}
---
?

> ...reading/modifying random bits of memory inside pure functions is obviously a bad idea. Bad idea meaning that pointer arithmetic is disallowed in @safe code anyway, and in @system code, you as the programmer are responsible for not violating the type system guarantees – for example, you can just call any impure function in a pure context using a cast. This also means that e.g. C string functions cannot be pure in D.

I'm a bit confused because I didn't mention the @safe attribute. If you have time, I'd like to see the differences between @safe and unsafe pure functions covered in your article, because it looks like these things are really different.

>> The second assert is here according to http://klickverbot.at/blog/2012/05/purity-in-d/.
> Then this aspect of the article is apparently not as clear as it could be – thanks for the feedback, I'll incorporate it in the next revision.

Not sure, my English is rather bad so I could just misunderstand something.

> Do you disagree with any of these points? If so, I'd be happy to provide a more in-depth explanation of my view, so we can clarify the spec afterwards.

`void f(void*) pure;` is still unclear to me. What can it do? What can it do if it's @safe?

And I completely misunderstand why pure functions can't be optimized out, as Steven Schveighoffer said in the druntime pull 198 comment:
> The fact that it returns mutable makes it weak pure (the optimizer cannot remove any calls to gc_malloc)

(yes, this is a general question, not pointers only)
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com

17:48:23 PDT ---
> According to http://dlang.org/function.html#pure-functions
>> Pure functions are functions that produce the same result for the same arguments.

This is certainly true. However, it's neither practical nor always possible for the compiler to determine if a call can be optimized out. Consider that on any call to a pure function that takes mutable data, the function could modify the data, so even calling with the same exact pointer again may result in a new effective parameter.

However, if a function has only immutable or implicitly convertible to immutable parameters and return values, the function *can* be optimized out, because it's guaranteed nothing ever changes. This situation is what has been called "strong pure". It's the equivalent of functional language purity.

It's possible in certain situations for a "weak pure" function to be considered strong pure. For example, consider a function which takes a const parameter and returns a const. Pass an immutable into it, and nothing could possibly have changed before the next call, so it can be optimized out. The compiler does not take advantage of these yet.

> And my original question is
>> The Question: What exactly do these pure functions consider as `argument value` and as `returned value`?

argument value is all the data reachable via the parameters. Argument result is all the data reachable via the result.

For pointers, you are under the same rules as normal functions -- safe functions cannot use pointers, unsafe ones can. If an unsafe pure function is called, a certain degree of freedom to screw up is available, just like any other unsafe function.

> int f(in int* p) pure;
> void g()
> {
>     auto arr = new int[5];
>     auto res = f(arr.ptr);
>     assert(res == f(arr.ptr));

obviously this passes, all the parameters are identical, and nothing could have changed between the two calls. The call will not currently be optimized out, because the compiler isn't smart enough yet.

>     assert(res == f(arr.ptr + 1)); // *p isn't changed

may or may not pass, parameter is different.

>     arr[1] = 7;
>     assert(res == f(arr.ptr)); // neither p nor *p is changed

may or may not pass. f is not safe, so it could possibly access arr[1].

>     arr[0] = 7;
>     assert(res == f(arr.ptr)); // p isn't changed

may or may not pass, the parameter is different.

> And I completely misunderstand why pure functions can't be optimized out, as Steven Schveighoffer said in the druntime pull 198 comment:

I hope I have helped to further your understanding with this post. Don just looked up the original thread which outlined the weak-pure proposal, which was submitted to digitalmars.D in August 2010. You may want to read that entire thread.

In general response to this bug, I'm unsure how pointers should be treated by the optimizer. My gut feeling is the compiler/optimizer should trust the code "knows what it's doing", and so should expect that the code implicitly knows how much data it can access after the pointer.

Consider an interesting case, using BSD sockets:

int f(immutable sockaddr *addr) pure;

sockaddr is a specific size, yet it's a "base class" of different types of address structures. Typically, one casts the sockaddr into the correct struct based on the sa_family member. But this may technically mean f accesses more data than it is given, based on a rigid interpretation of the type system. Should the compiler enforce this, given it makes this kind of function practically useless? I think not.
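The "strong pure" versus "weak pure" distinction drawn in this message can be sketched with two small functions. The names `sum` and `scale` are hypothetical, and whether a given compiler actually elides repeated calls is not something the sketch demonstrates; it only shows which signatures fall into which category as described above.

```d
// "Strong pure" in the sense used above: the only parameter is
// immutable data, so nothing can change between two identical calls.
int sum(immutable(int)[] a) pure nothrow @safe
{
    int r;
    foreach (x; a)
        r += x;
    return r;
}

// "Weak pure": still touches no globals, but may mutate data reachable
// through its mutable parameter, so calls cannot simply be dropped.
void scale(int[] a, int k) pure nothrow @safe
{
    foreach (ref x; a)
        x *= k;
}

void main()
{
    immutable(int)[] data = [1, 2, 3];
    assert(sum(data) == 6);
    assert(sum(data) == sum(data)); // identical immutable argument

    int[] m = [1, 2, 3];
    scale(m, 2);
    assert(m == [2, 4, 6]); // the effect is visible to the caller
}
```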
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com

PDT ---
This isn't true:

> safe functions cannot use pointers, unsafe ones can.

@safe functions can use pointers just fine. Pointers themselves are considered @safe (e.g. the AA's in operator works just fine in @safe code). It's unsafe pointer operations such as pointer arithmetic which are not @safe.
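A small sketch of this correction, assuming a hypothetical helper `deref`: dereferencing a pointer, including one obtained from the associative array `in` operator, is accepted in `@safe` code, while the commented-out lines use pointer indexing and arithmetic, which `@safe` is expected to reject.

```d
// Dereferencing a pointer is allowed in @safe code.
int deref(const(int)* p) pure @safe
{
    return *p;
}

// Expected to be rejected if uncommented: pointer indexing and
// pointer arithmetic are not allowed in @safe code.
// int next(const(int)* p) pure @safe { return p[1]; }
// int next2(const(int)* p) pure @safe { return *(p + 1); }

void main() @safe
{
    int[string] aa = ["x": 42];

    // The `in` operator yields a pointer to the value, or null.
    const(int)* p = "x" in aa;
    assert(p !is null);
    assert(deref(p) == 42);
}
```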
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

10:23:09 MSD ---
Such a mess! The more people write here, the more different opinions I see. IMHO, Walter and Andrei must also participate here to help reach a conclusion (or to finally mix everything up).
Jun 02 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

> argument value is all the data reachable via the parameters. Argument result is all the data reachable via the result.
> [...]
> the optimizer. My gut feeling is the compiler/optimizer should trust the code "knows what it's doing", and so should expect that the code implicitly knows how much data it can access after the pointer.

Having "pure" as a user-provided attribute, with the compiler completely trusting the programmer and only checking/enforcing certain assumptions when that is easy to do, is a reasonable solution. Anybody who understands the purity concept will have no problem determining whether some function is "pure" or not; this is how it is in C, in dialects supporting pure.

Unfortunately, D has purity inference.

uint f()(immutable ubyte* p)
{
    uint r;
    foreach (i; 0..size_t.max)
        r += p[i];
    return r;
}

Can this still be considered pure? What about "uint f2()(Struct* p) {/*same body*/}"? Or

uint f3()(ubyte* p)
{
    uint r;
    foreach (i; 0..size_t.max)
        r += p[i]++;
    return r;
}

? All three functions are tagged as pure by the compiler...
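One way to observe the purity inference mentioned here is to call a template from an explicitly `pure` function, which only compiles if the template instance was inferred `pure`. The sketch below is an adaptation, not the poster's code: `sumFirst` and `caller` are hypothetical names, and the loop is bounded by a template parameter `n` so the example stays within the buffer it is given.

```d
// Template function: its attributes, including purity, are inferred.
uint sumFirst(size_t n)(const(ubyte)* p)
{
    uint r;
    foreach (i; 0 .. n)
        r += p[i]; // pointer indexing, but no access to global state
    return r;
}

// Explicitly pure: this compiles only if sumFirst!4 was inferred pure.
uint caller(const(ubyte)* p) pure
{
    return sumFirst!4(p);
}

void main()
{
    ubyte[4] buf = [1, 2, 3, 4];
    assert(caller(buf.ptr) == 10);
}
```

Nothing in the signature tells the compiler how many elements past `p` are valid; that knowledge lives only in the template argument chosen by the caller.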
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

timon.gehr gmx.ch changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timon.gehr gmx.ch

> The Question: What exactly do these pure functions consider as `argument value` and as `returned value`? This looks neither documented nor obvious.

Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.

But why does it even matter? Isn't this discussion mostly philosophical?
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

> Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.

What does 'their own memory block' mean? The problem is that a pointer is basically an unbounded array, and, if the access isn't restricted somehow, it makes the function dependent on global memory state.

> But why does it even matter? Isn't this discussion mostly philosophical?

The compiler will happily assume that template functions are pure even when they clearly are not, and there isn't even a way to mark such functions as "impure" (w/o using hacks like calling dummy functions etc).

Example - a function that is designed to operate on arrays, will always be called with a pointer to inside an array, and can assume that the previous and next elements are always valid:

void f4(T)(T* p) { p[-1] += p[0]; }

The compiler thinks f4() is pure, when it clearly is not; optimizations based on that assumption are likely to result in corrupted data.
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

>> Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.
> What does 'their own memory block' mean?

The allocated memory block it points into.

> The problem is a pointer is basically an unbounded array,

That is wrong. The pointer is bounded, but it is generally impossible to derive the exact bounds from the pointer alone. This is why D has dynamic arrays.

> and, if the access isn't restricted somehow, makes the function dependent on global memory state.

? A function independent of memory state is useless.

>> But why does it even matter? Isn't this discussion mostly philosophical?
> The compiler will happily assume that template functions are pure even when they clearly are not, and there isn't even a way to mark such functions as "impure" (w/o using hacks like calling dummy functions etc). Example - a function that is designed to operate on arrays, will always be called with a pointer to inside an array, and can assume that the previous and next elements are always valid:
> void f4(T)(T* p) { p[-1] += p[0]; }
> The compiler thinks f4() is pure, when it clearly is not; optimizations based on that assumption are likely to result in corrupted data.

f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.
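The disputed `f4` can be exercised in a small sketch. `addIntoPrevious` is a hypothetical wrapper added here, and `f4` is given an explicit `void` return type so it compiles; the sketch shows that the write lands in the same memory block the argument points into, which is the sense of "pure" used in this reply.

```d
// The function from the thread, with a return type added.
void f4(T)(T* p)
{
    p[-1] += p[0]; // writes to the element before the one pointed at
}

// Explicitly pure wrapper: compiles only because f4!int is inferred pure.
void addIntoPrevious(int[] a) pure
{
    f4(&a[1]);
}

void main()
{
    int[] a = [1, 2, 3];
    addIntoPrevious(a);
    // a[0] became a[0] + a[1]; everything else is untouched.
    assert(a == [3, 2, 3]);
}
```

The caller has to guarantee that the element before `p` exists; the compiler does not check that bound.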
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

>>> Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.
>> What does 'their own memory block' mean?
> The allocated memory block it points into.

But, as the bounds are unknown to the compiler, it does not have this information, so it has to assume everything is reachable via the pointer. This is why I suggested above that only dereferencing a pointer should be allowed in pure functions.

>> The problem is a pointer is basically an unbounded array,
> That is wrong. The pointer is bounded, but it is generally impossible to derive the exact bounds from the pointer alone. This is why D has dynamic arrays.

And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known. But I don't think anybody would want to write "f(pointer_to_some_struct[0..1])"...

>> and, if the access isn't restricted somehow, makes the function dependent on global memory state.
> ? A function independent of memory state is useless.

int n(int i) {return i+42;}

>>> But why does it even matter? Isn't this discussion mostly philosophical?
>> The compiler will happily assume that template functions are pure even when they clearly are not, and there isn't even a way to mark such functions as "impure" (w/o using hacks like calling dummy functions etc). Example - a function that is designed to operate on arrays, will always be called with a pointer to inside an array, and can assume that the previous and next elements are always valid:
>> void f4(T)(T* p) { p[-1] += p[0]; }
>> The compiler thinks f4() is pure, when it clearly is not; optimizations based on that assumption are likely to result in corrupted data.
> f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.

f4 isn't pure, by any definition - it depends on (or in this example modifies) state which the caller may not even consider reachable.

The compiler can assume that a pure function does not access any mutable state other than what can be directly or indirectly reached via the arguments -- that is what function purity is all about. If the compiler has to assume that a pure function that takes a pointer argument can read or modify everything, the "pure" tag becomes worthless. And what's worse, it allows other "truly" pure functions to call our immoral one.

Hmm, another way out of this could be to require all pointer args in a pure function to target 'immutable' - but that, again, seems too limiting; "bool f(in Struct* s)" could not be pure.
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 PDT --- The _only_ thing that the pure attribute means by itself is that that function cannot directly access any mutable global or static variables. That is _all_. It means _nothing_ else. It can mess with pointers. It can mess with in, ref, out, and lazy parameters. It can mess with the elements in a slice (thereby alterining external state). It can mess with mutable global or static variables _indirectly_ via the arguments that it's passed (e.g if a pointer or ref is passed to a global variable). It just cannot _directly_ access any mutable global or static variables. pure by itself indicates a weakly pure function. That function enables _zero_ optimizations. It is _not_ pure in the sense that the functional or mathematical community would consider pure. It is not even _trying_ to be pure in that sense. What weak purity does is enable _strong_ purity to actually be useful. When the compiler can guarantee that all of a pure function's arguments _cannot_ be altered by that function, _then_ it is strongly pure. Currently, that gurantee is in effect only when all of the parameters of the function are immutable or implicitly convertible to immutable. It could be extended to const parameters in the case when they're passed immutable arguments, but that isn't currently done. A strongly pure function cannot alter its arguments at all, but it _can_ allocate memory, and it _can_ mutate any of its local state. 
_weakly_ pure functions can therefore be called from within a strongly pure function, because the only state that they can alter is the state of what's passed to them (because the fact that they're marked with pure means that they cannot access mutable global or mutable static state except via their arguments), and the only state that the strongly pure function _can_ pass to them is local to it, because it can't access global or static mutable state any more than they can, and it can't even access it via its arguments, because it's strongly pure. This is all very clear and well-defined. Having pointers sent off into la-la land doing unsafe system stuff is a _completely_ separate issue. You can break pretty much _anything_ with system code. You could even cast a function which called writeln so that that the signature was pure and then call it from a pure function. All bets are off when you're in system land. It's _your_ job to make sure that your code isn't doing something completely screwy at that point. Any function or operation which the compiler doesn't consider pure would still make a templated function be considered impure in such cases, but because it's system, you can trick it if you want to (e.g. by casting a function's signature). But it's system code - unsafe code - so it's your fault at that point, not the compiler's. I really don't know how the documentation could be much clearer. ref and pointer arguments are't "returned." Only the return value is returned. And arguments are clearly the arguments to the function. And as long as the compiler can determine that nothing has been done to an argument to alter it, it's going to consider to be the same value (and it's going to be _extremely_ conservative about that - even altering a reference or pointer of the same type would make its value be considered different, because they both might point to the same thing). 
As for stuff like strlen, in that case, you're doing the @system thing of saying that yes, I know what I'm doing. I know that this function isn't marked as pure, because it's a C function, but I also know that it _is_ actually pure. I know that it won't access global mutable state. So, I will mark it as pure so that it can be used in pure code. I'm telling the compiler that I know better than it does. And in this case, I do. If I didn't, then you'd have a bug, and it would be my fault, because I told the compiler what was best, and I was wrong. At that point, it's up to me to make sure that the compiler's guarantees aren't being violated. That's @system for you. D is a systems programming language. You can do that sort of thing.
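The weak/strong distinction described above can be sketched roughly as follows. This is only an illustration under the rules as stated in this comment; the names `scale` and `scaledSum` are hypothetical and do not come from the thread or from Phobos.

```d
import std.stdio : writeln;

// Weakly pure: it may mutate whatever its argument refers to,
// but it cannot touch mutable global or static state directly.
void scale(int[] values, int factor) pure
{
    foreach (ref v; values)
        v *= factor;
}

// Strongly pure under the rule described above: every parameter is
// immutable or implicitly convertible to immutable, so nothing the
// caller can see is altered.
int scaledSum(immutable(int)[] input, int factor) pure
{
    int[] local = input.dup;   // allocating memory is allowed
    scale(local, factor);      // weakly pure call on purely local state
    int total = 0;
    foreach (v; local)
        total += v;
    return total;
}

void main()
{
    immutable(int)[] data = [1, 2, 3];
    writeln(scaledSum(data, 2)); // prints 12
}
```

The weakly pure helper is only ever handed state that is local to `scaledSum`, which is why calling it does not weaken the guarantees of the outer function.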
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

> pure by itself indicates a weakly pure function. That function enables _zero_

Inventing terminology doesn't help, especially when the result is so confusing.

> optimizations. It is _not_ pure in the sense that the functional or mathematical community would consider pure. It is not even _trying_ to be pure in that sense. What weak purity does is enable _strong_ purity to actually be useful. When the compiler can guarantee that all of a pure function's arguments _cannot_ be altered by that function, _then_ it is strongly pure. Currently, that gurantee is in effect only when all of the parameters of the function are immutable or implicitly convertible to immutable. It could be extended to const parameters in the case when they're passed immutable arguments, but that isn't currently done.

[...]

tl;dr.

The bugtracker is probably not the right place for this discussion; we could move it to the ML, but talking about it only makes sense if D can be fixed; otherwise we would be wasting our time...

Limiting "pure" to just immutable data would work indeed, but it's much too limiting.

    struct S {int a,b; int[64] c; bool f() const pure {return a||b;}}

    int g(S* p) {
        int r;
        foreach (i; 0..64)
            if (p.f())
                r |= p.c[i];
        return r;
    }

Using your "weak pure" definition, f's "pure" would be a NOOP - that is not what most people would expect, and is not a sane purity implementation. It's not a problem for trivial examples such as this one because inlining should take care of it, but would make "pure" almost useless in real code, as it would almost never be, to use your terminology again, "strongly" pure (and couldn't be moved out of the loop).

Note that, even when using your "strong purity" definition, the compiler still does the wrong thing - some of the examples I gave previously in this bug are (and others can be trivially modified to be) inferred as "strongly" pure functions, when they are not pure at all.
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 PDT ---

They aren't _my_ definitions. They're official. They've been discussed in the newsgroup. They've even been used by folks like Walter Bright in talks at conferences. How purity is implemented in D has been discussed and was decided a while ago. It works well and is not going to change. Weak purity solved a real need. All we had before was strong purity, and it was almost useless, because it was so limited. It is _far_ more useful now than it was before.

A pure function is clearly defined as a function which cannot access global or static state which is mutable. It doesn't matter how other languages use the term pure. That's how D uses it. And in cases where a function is strongly pure, you _do_ get the optimizations based on passing the same arguments to the same pure function multiple times that you'd expect from a more functional language.

If you don't like how D's pure works, that's fine - you're free to have your own opinion, be it dissenting or otherwise - but how pure works in D is _not_ going to change. If bugs are found in the compiler's implementation of it, they will be addressed, but at this point, the design is what it is.
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 2012-06-04 09:38:21 MSD ---

> I really don't know how the documentation could be much clearer.

It will be clearer once it has examples showing which asserts must/may/shouldn't pass and/or (I prefer and) what optimizations can be done. Even the Setting Dynamic Array Length section has such examples, though it is far simpler.

> As for stuff like strlen, in that case, you're doing the system thing of saying that yes, I know what I'm doing.

And what is still missing is an answer to "What exactly do these pure functions consider as `argument value` and as `returned value`" from my original question, because some treat it as "only pointer dereferencing" and others as "access to any logically accessible address".

Again, all misunderstanding of pure functions in D can be easily solved by just adding (lots of) examples with difficult cases to the docs. IMHO, Jonathan M Davis, for example, would save a lot of his time (yes, and our time too) by just adding such examples with minimal comments to the docs instead of writing such big answers.
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 PDT ---

I honestly don't understand why much in the way of examples is needed. The documentation explains what pure is. When the compiler is able to optimize out calls to pure functions is an implementation detail - just like optimizations with const or immutable are. You use pure wherever you can, and the compiler will optimize where it can.

The documentation could go into more detail on weakly pure vs strongly pure (since it doesn't mention either), but that's pretty much the only relevant improvement that I can think of, and I know that Don would be annoyed by that, since he wants the terms strongly pure and weakly pure to die and just leave them as implementation details (though I think that he's the only one who really feels that way).

I think that there's a lot of overthinking of this going on here. The documentation quite clearly states what a pure function is and what it can and can't do. I don't see how more examples would really help much with that. But if anyone has an idea that they think will improve the documentation, then feel free to create a pull request with the changes.
Jun 03 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 2012-06-04 11:54:40 MSD ---

> I honestly don't understand why much in the way of examples are needed.

OK. I have written some examples. Are they too obvious to be in the docs? Honestly, I'll be amazed if most D programmers have thought about most of these cases.

Examples:

pure functions (not sure if @system only or @safe too) in D are guaranteed to be pure only if used according to their documentation. There are no guarantees otherwise.
---
/// b argument has to be true or the result will depend on global state
size_t f(size_t i, bool b) pure; // strongly pure

void main()
{
    size_t i1 = f(1, false); // can depend on global state
    size_t i2 = f(1, false); // f is free to produce a different result here
    // And if the second f call is optimized out using i2 = i1
    // (because f is strongly pure), the program will behave
    // differently in release mode, so be careful.
}
---

For @system pure functions, it's your responsibility to pass correct arguments to functions. These functions (even strongly pure ones) can be impure for "incorrect" arguments and can even result in "undefined behavior".
---
extern (C) size_t strlen(in char* s) nothrow pure; // strongly pure

/// cstr must be zero-ended
size_t myStrlen(in char[] cstr) pure // strongly pure
{
    return strlen(cstr.ptr);
}

void main()
{
    char[3] str = "abc";
    // str isn't zero-ended, so the myStrlen call
    // results in undefined behavior.
    size_t l1 = myStrlen(str);
    size_t l2 = myStrlen(str); // can give a different result
}
---

@system strongly pure functions often can't be optimized out:
---
extern (C) size_t strlen(in char* s) nothrow pure; // strongly pure

void f(in char* cstr, int* n) pure
{
    // strlen has to be executed every iteration,
    // because the compiler doesn't know if n is
    // connected with cstr in some way
    for(size_t i = 0; i < strlen(cstr); ++i)
    {
        *n += cstr[i];
    }
}
---

The same applies even if these functions have no pointers/arrays in their signatures:
---
size_t f(size_t) nothrow pure; // strongly pure

void g(size_t i1, ref size_t i2) pure
{
    // f has to be executed every iteration,
    // because the compiler doesn't know if i1 is
    // connected with i2 in some way (f can expect
    // that its argument is the address of i2)
    for(size_t i = 0; i < f(i1); ++i)
    {
        i2 *= 3;
    }
}
---

One has to carefully watch whether a function is strongly pure by its signature (the compiler is guaranteed to determine the kind of purity from the signature only, to prevent different behavior between cases with/without a body):
---
void f(size_t x) pure // strongly pure, can't have side effects
{
    *cast(int*) x = 5; // undefined behavior
}

__gshared int tmp;

void g(size_t x, ref int dummy = tmp) pure // weakly pure, can have side effects
{
    *cast(int*) x = 5; // correct
}
---
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 PDT ---

??? Why would you be marking a function as pure if it can access global state? The compiler would flag that unless you cheated through casts or the use of extern(C) functions where you marked the declaration as pure but not the definition (since pure isn't part of the name mangling for extern(C) functions).

Also, none of your examples using in are strongly pure. At present, the parameters must be _immutable_ or implicitly convertible to immutable for the function to be strongly pure. The only way that const or in would work is if they were passed immutable arguments, but the compiler doesn't treat that as strongly pure right now.

@system has _nothing_ to do with purity. There's no need to bring it up. It's just that @system will let you do dirty tricks (such as casting) to get around pure. Certainly, a @system pure function isn't pure based on its arguments unless it's doing something very wrong. The function would have to be specifically trying to break purity to do that, and then it's the same as when you're dealing with const and the like. There's no need to even bring it up. It's a given with _anything_ where you can cast to do nasty @system stuff.

Adding a description of weakly pure vs strongly pure to the documentation may be valuable, but adding any examples like these would be pointless without it. Also, if you'll notice, the documentation in general is very light on unnecessary examples. It explains exactly what the feature does and gives minimal examples on it. Any that are added should add real value.

pure functions cannot access global mutable state or call any other functions which aren't pure. The compiler will give an error if a function marked as pure does either of those things. What the compiler does in terms of optimizations is up to its implementation.
I don't see how going into great detail on whether this particular function signature or that particular function signature can be optimized is going to help much.

It seems to me that the core problem is that many programmers are having a hard time understanding that all that pure means is that pure functions cannot access global mutable state or call any other functions which aren't pure. They keep thinking that it means more than that, and it doesn't. The compiler will use that information to do optimizations where it can (which aren't even always related to strongly pure - e.g. combining const and weakly pure enables optimizations, just not the kind which elide function calls). If programmers would just believe what the description says about what pure means and stop trying to insist that it must mean more than that, I think that they would be a lot less confused.

In some respects, discussing stuff like weakly pure and strongly pure just confuses matters. They're effectively implementation details of how some pure-related optimizations are triggered.

It's so very simple and understandable if you leave it at something like "pure functions cannot access global or static variables which are at all mutable - either by the pure function or anything else - and they cannot call other functions which are not pure." That tells you all that you really need to know, and is quite valuable even if _zero_ optimizations were done based on pure, because it helps immensely in being able to think about and understand your program, because you know that a pure function cannot mutate anything which isn't passed to it.

I think that you're just overthinking this and overcomplicating things.
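The rule stated here (no direct access to mutable global state, but indirect access through arguments is fine) can be checked at compile time. The following is a minimal sketch; `counter`, `limit`, `readsImmutable` and `bump` are hypothetical names chosen for illustration.

```d
int counter;                 // mutable global (thread-local) state
immutable int limit = 10;    // immutable global, can never change

// Reading an immutable global is fine in a pure function.
int readsImmutable(int x) pure
{
    return x < limit ? x : limit;
}

// Directly reading or writing the mutable global is rejected, which
// __traits(compiles, ...) lets us observe without breaking the build.
static assert(!__traits(compiles, () pure { return counter; }));
static assert(!__traits(compiles, () pure { counter = 1; }));

// The same variable can still be reached indirectly if the caller
// passes it in; that is what makes this function only weakly pure.
void bump(ref int target) pure
{
    ++target;
}

void main()
{
    bump(counter);
    assert(counter == 1);
    assert(readsImmutable(42) == 10);
}
```

Nothing in this sketch says anything about which calls an optimizer may elide; it only shows what the compiler accepts and rejects.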
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 2012-06-04 13:07:21 MSD ---

> Why would you be marking a function as pure if it can access global state? The compiler would flag that unless you cheated through casts or the use of extern(C) functions where you marked the declaration as pure but not the definition (since pure isn't part of the name mangling for extern(C) functions).

From your comment before:

> As for stuff like strlen, in that case, you're doing the system thing of saying that yes, I know what I'm doing. I know that this function isn't marked as pure, because it's a C function, but I also know that it _is_ actually pure.

`strlen` is now pure (marked by Andrei Alexandrescu) and it can access global state once used with a non-zero-ended string. I just made the situation more evident.

> Also, none of your examples using in are strongly pure. At present, the parameters must be _immutable_ or implicitly convertible to immutable for the function to be strongly pure. The only way that const or in would work is if they were passed immutable arguments, but the compiler doesn't treat that as strongly pure right now.

From your comment before:

> When the compiler can guarantee that all of a pure function's arguments _cannot_ be altered by that function, _then_ it is strongly pure.

So I just don't know how strlen can change its argument...

> system has _nothing_ to do with purity. There's no need to bring it up.

IMHO, yes it does. Because @safe and @system pure functions look very different to me. And yes, I can be wrong.

> It's just that system will let you do dirty tricks (such as casting) to get around pure. Certainly, an system pure function isn't pure based on its arguments unless it's doing something very wrong. The function would have to be specifically trying to break purity to do that, and then it's the same as when you're dealing with const and the like. There's no need to even bring it up. It's a given with _anything_ where you can cast to do nasty system stuff.

Is strlen doing something very wrong or specifically trying to break purity when it accesses random memory?

> Adding a description of weakly pure vs strongly pure to the documentation may be valuable, but adding any examples like these would be pointless without it. Also, if you'll notice, the documentation in general is very light on unnecessary examples. It explains exactly what the feature does and gives minimal examples on it. Any that are added should add real value. pure functions cannot access global mutable state or call any other functions which aren't pure. The compiler will give an error if a function marked as pure does either of those things. What the compiler does in terms of optimizations is up to its implementation. I don't see how going into great detail on whether this particular function signature or that particular function signature can be optimized is going to help much.

Yes it will, because as I wrote:

> Once it will have examples showing what asserts have to/may/shouldn't pass and/or (I prefer and) what optimizations can be done.

optimizations = what asserts pure functions should satisfy = what a pure function is

> It seems to me that the core problem is that many programmers are having a hard time understanding that all that pure means is that pure functions cannot access global mutable state or call any other functions which aren't pure. They keep thinking that it means more than that, and it doesn't. The compiler will use that information to do optimizations where it can (which aren't even always related to strongly pure - e.g. combining const and weakly pure enable optimizations, just not the kind which elide function calls). If programmers would just believe what the description says about what pure means and stop trying to insist that it must mean more than that, I think that they would be a lot less confused. In some respects, discussing stuff like weakly pure and strongly pure just confuses matters. They're effectively implementation details of how some pure-related optimizations are triggered.

strlen and other @system functions do access global state in some cases. It's pure. And I'm confused if there is no explanation of _how exactly pure functions can access global state_.

> It's so very simple and understandable if you leave it at something like "pure functions cannot access global or static variables which are at all mutable - either by the pure function or anything else - and they cannot call other functions which are not pure."

No. They call everything they want and do everything they want (see druntime pull 198). They just should behave like pure functions for a user. And I don't clearly understand what it means "to behave like a pure function". That's why this issue was created. That's why I want to see what asserts pure functions should satisfy.

> That tells you all that you really need to know, and is quite valuable even if _zero_ optimizations were done based on pure,

Again, I'm not interested in optimizations for their own sake now. They can just explain what a pure function is.

> because it helps immensely in being able to think about and understand your program, because you know that a pure function cannot mutate anything which isn't passed to it.

It gives me nothing because I still don't know what is passed to it, as I wrote:

> What exactly does these pure functions consider as `argument value` and as `returned value`?

> I think that you're just overthinking this and overcomplicating things.

Maybe. Just like the contrary case.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

> > > > Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.
> > > What does 'their own memory block' mean?
> > The allocated memory block it points into.
> But, as the bounds are unknown to the compiler, it does not have this information, it has to assume everything is reachable via the pointer.

1. It does not need the information. Dereferencing a pointer outside the valid bounds results in undefined behavior. Therefore the compiler can just ignore the possibility.

2. It can gain some information at the call site. Eg:

    int foo(const(int)* y)pure;

    void main(){
        int* x = new int;
        int* y = new int;
        auto a = foo(x);
        auto b = foo(y);
        auto c = foo(x);
        assert(a == c);
    }

3. Aliasing is the classic optimization killer even without 'pure'.

4. Invalid use of pointers can break every other aspect of the type system. Why single out 'pure'?

> This is why i suggested above that only dereferencing a pointer should be allowed in pure functions.

This is too restrictive.

> And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known.

The bounds are usually known only at runtime. The compiler does not have more to work with. From the compiler's point of view, an array access out of bounds and an invalid pointer dereference are very similar.

> > > and, if the access isn't restricted somehow, makes the function dependent on global memory state.
> > ? A function independent of memory state is useless.
> int n(int i) {return i+42;}

Where do you store the parameter 'i' if not in some memory location?

> > f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.
> f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.

Then it is the caller's fault. What is considered reachable is well-defined, and f4 must document its valid inputs.

> The compiler can assume that a pure function does not access any mutable state other than what can be directly or indirectly reached via the arguments -- that is what function purity is all about. If the compiler has to assume that a pure function that takes a pointer argument can read or modify everything, the "pure" tag becomes worthless.

No pointer _argument_ necessary.

    int foo()pure{
        enum int* everything = cast(int*)...;
        return *everything;
    }

As I already pointed out, unsafe language features can be used to subvert the type system. If pure functions should be restricted to the safe subset, they can be marked @safe, or compiled with the -safe compiler switch.

> And what's worse, it allows other "truly" pure function to call our immoral one.

Nothing wrong with that.

> Hmm, another way out of this could be to require all pointers args in a pure function to target 'immutable' - but that, again, seems to limiting; "bool f(in Struct* s)" could not be pure.

This is why the restriction was dropped.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

> `strlen` is now pure (marked by Andrei Alexandrescu) and it can access global state once used with non-zero-ended string. I just made situation more evident.

It may not be used with a non-zero-ended string. See e.g. http://www.cplusplus.com/reference/clibrary/cstring/strlen/
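One way to respect that precondition from D code is to let a wrapper establish the terminator itself. The sketch below assumes the `pure` declaration of `strlen` in druntime's `core.stdc.string` and uses `std.string.toStringz`; the wrapper name `cLength` is hypothetical.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical wrapper: toStringz guarantees a terminating '\0',
// so strlen never reads past the data that was passed in.
size_t cLength(const(char)[] s) pure nothrow
{
    return strlen(s.toStringz);
}

void main()
{
    char[3] raw = "abc";           // not zero-terminated on its own
    assert(cLength(raw[]) == 3);   // safe: the wrapper adds the terminator
    assert(cLength("hello") == 5);
}
```

The design choice here is to move the responsibility for the precondition out of every call site and into one place, at the cost of a possible copy.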
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

I am partly playing Devil's advocate here, but:

> > This is why i suggested above that only dereferencing a pointer should be allowed in pure functions.
> This is too restrictive.

Why?

> > And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known.
> The bounds are usually known only at runtime. The compiler does not have more to work with. From the compiler's point of view, an array access out of bounds and an invalid pointer dereference are very similar.

There is an important semantic difference between these two – a slice is a bounded region of memory, whereas a pointer per se just represents a reference to a single value.
---
int foo(int* p) pure
{
    return *(p - 1); // Is this legal?
}

auto a = new int[10];
foo(a.ptr + 1);
---

> > > ? A function independent of memory state is useless.
> > int n(int i) {return i+42;}
> Where do you store the parameter 'i' if not in some memory location?

In a register, but that's beside the point – which is that the type of i, int, makes it clear that n depends on exactly four bytes of memory. In »struct Node { Node* next; } void foo(Node* n) pure;«, on the other hand, following your interpretation foo() might depend on an almost arbitrarily large amount of memory (consider e.g. uninitialized memory in the area between a heap-allocated Node instance and the end of the block where it resides, which, if interpreted as Node instance(s), might have »false pointers« to other memory blocks, etc.).

> > > f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.
> > f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.
> Then it is the caller's fault. What is considered reachable is well-defined […]

Is it? Could you please repeat the definition then, and point out how this is clear from the definition of purity according to the spec, »Pure functions are functions that produce the same result for the same arguments«.

> and f4 must document its valid inputs.

---
/// Passing anything other than `false` is illegal.
int g_state;
void foo(bool neverTrue) pure
{
    if (neverTrue) g_state = 42;
}
---

Should this be allowed to be pure? Well, if strlen is, then ostensibly yes, but isn't this too permissive of an interpretation, as the type system can't actually guarantee it? Shouldn't rather a cast to pure at the _call site_ be required if called with known good values, just as in other cases where the type system can't prove a certain invariant, but the programmer can?

Purity by convention works just fine without the pure keyword as well…
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 PDT ---

I'd actually argue that the line "Pure functions are functions that produce the same result for the same arguments" should be removed from the spec. Ostensibly, yes. The same arguments will result in the same result, but that doesn't really have anything to do with how pure is defined. It's more like it's a side effect of the fact that you can't access global mutable state. It's true that the compiler will elide additional function calls within an expression in cases where the same function is called multiple times with the same arguments and the compiler can guarantee that the result will be the same, but that's arguably an implementation detail of the optimizer.

While the origin and original motivation for pure in D was to enable optimizations based on functional purity (multiple calls to the same function with the same arguments are guaranteed to have the same results), that's not really what pure in D does now, and talking about that clouds the issue something awful, as this bug report demonstrates.

Pure means solely that the function cannot access any global or static variables which can be mutated either directly or indirectly once instantiated and that the function cannot call any other functions which are not pure. That enables the whole "same result for the same arguments" thing, but it does _not_ mean that in and of itself. The simple fact that an argument could have a function on it which returns the value of a mutable global variable without that variable being part of its state at all negates that.
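A small sketch of why "same result for the same arguments" does not follow from `pure` alone when a pointer is involved: the pointer value stays the same while the data behind it changes. The function name `deref` is hypothetical.

```d
// Weakly pure in the terminology of this thread: it touches no global
// state directly, but its result depends on the memory `p` refers to.
int deref(const(int)* p) pure
{
    return *p;
}

void main()
{
    int x = 1;
    immutable first = deref(&x);
    x = 2;                         // the pointee changes, the pointer does not
    immutable second = deref(&x);
    assert(first == 1 && second == 2);
}
```

Both calls receive the same pointer, yet the results differ, so whether the "argument" is taken to be the pointer or the memory it reaches is exactly what matters.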
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

> While the origin and original motivation for pure in D was to enable optimizations based on functional purity (multiple calls to the same function with the same arguments are guaranteed to have the same results), that's not really what pure in D does now, and talking about that clouds the issue something awful, as this bug report demonstrates.

I think you've provided a good explanation of the high-level design of the pure keyword, more than once, but it seems that you are missing that this issue, at least as stated in comment 3, is actually about a very specific detail: the extent to which memory reachable by manipulating passed-in pointers is still considered »local«, i.e. accessible by pure functions.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 PDT ---

https://github.com/D-Programming-Language/d-programming-language.org/pull/128
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

> I am partly playing Devil's advocate here, but:
>
> > > This is why i suggested above that only dereferencing a pointer should be allowed in pure functions.
> > This is too restrictive.
> Why?

Because safety is an orthogonal concern. E.g. strlen is a pure function. By the same way of reasoning, all unsafe features could be banned in all parts of the code, not just in pure functions.

> > > And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known.
> > The bounds are usually known only at runtime. The compiler does not have more to work with. From the compiler's point of view, an array access out of bounds and an invalid pointer dereference are very similar.
> There is an important semantic difference between these two – a slice is a bounded region of memory, whereas a pointer per se just represents a reference to a single value.

Yes, 'per se'. Effectively, it references all memory in the same allocated memory block. (This is also the view taken by the GC.)

> ---
> int foo(int* p) pure
> {
>     return *(p - 1); // Is this legal?
> }

Whether it is legal depends on whether or not *(p-1) is part of the same memory block. A conservative analysis (as is done in @safe code) would have to flag the access as illegal.

> auto a = new int[10];
> foo(a.ptr + 1);
> ---

a.ptr is a pointer. The arithmetic is flagged as illegal in @safe code even though it is safe. What do the examples show?

> > > > ? A function independent of memory state is useless.
> > > int n(int i) {return i+42;}
> > Where do you store the parameter 'i' if not in some memory location?
> In a register, but that's besides the point

Indeed, because a register is just memory after all.

> – which is that the type of i, int, makes it clear that n depends on exactly four bytes of memory. In »struct Node { Node* next; } void foo(Node* n) pure;«, on the other hand, following your interpretation foo() might depend on an almost arbitrarily large amount of memory (consider e.g. uninitialized memory in the area between a heap-allocated Node instance and the end of the block where it resides, which, if interpreted as Node instance(s), might have »false pointers« to other memory blocks, etc.).

The language does not define such a thing. Accessing this area therefore results in undefined behavior.

> > > > f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.
> > > f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.
> > Then it is the caller's fault. What is considered reachable is well-defined […]
> Is it? Could you please repeat the definition then,

It is written down in the C standard. There is no formal specification for D.

> and point out how this is clear from the definition of purity according to the spec,

This would not be defined in the pages about purity, but rather in the pages about pointer arithmetic, which are missing, presumably because they would be the same as in C.

> »Pure functions are functions that produce the same result for the same arguments«.

This is not a definition of the 'pure' keyword. It relies on informal terms such as 'the same' and does not require annotation of a function. Therefore the sentence should be dropped from the documentation. If a function is marked with 'pure', then it may not reference mutable free variables.

> > and f4 must document its valid inputs.
> ---
> /// Passing anything other than `false` is illegal.
> int g_state;
> void foo(bool neverTrue) pure
> {
>     if (neverTrue) g_state = 42;
> }
> ---
>
> Should this be allowed to be pure? Well, if strlen is, then ostensibly yes, but

No, because it is trivial to devise an equivalent implementation that does not require the compiler to read documentation comments:

    int g_state;
    void foo(bool neverTrue) pure
    in{assert(!neverTrue);}
    body
    {
    }

The same does not hold for 'strlen', therefore the analogy immediately breaks down.

> isn't this too permissive of an interpretation, as the type system can't actually guarantee it? Shouldn't rather a cast to pure at the _call site_ be required if called with know good values, just as in other cases where the type system can't prove a certain invariant, but the programmer can?

The type system of an unsafe language cannot prove _any_ invariants, because unsafe operations may result in undefined behavior. This does not imply we'd better drop the entire type system.

> Purity by convention works just fine without the pure keyword as well…

This is not only about purity by convention, it is about memory safety by convention. In @safe code, all the concerns raised immediately disappear.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 PDT ---

> I think you've provided a good explanation of the high-level design of the pure keyword, more than once, but it seems that you are missing that this issue, at least as stated in comment 3, is actually about a very specific detail: the extent to which memory reachable by manipulating passed-in pointers is still considered »local«, i.e. accessible by pure functions.

pure doesn't restrict pointers in any way, shape, or form. That's a @safe/@trusted/@system issue, and is completely orthogonal to pure.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

> pure doesn't restrict pointers in any way, shape, or form. That's a @safe/@trusted/@system issue, and is completely orthogonal to pure.

I guess I _might_ have understood what purity entails and what it doesn't… To quote myself, the question here is the extent to which memory reachable by manipulating passed-in pointers is still considered local, i.e. accessible by pure functions. This, conceptually, has nothing to do with @safe/@trusted/@system, even though @safe code cannot manipulate pointers for other reasons.

There are two options: either allow pure functions taking pointers to read other memory locations in the same block of allocated values, or restrict access to just the data directly pointed at (which incidentally is also what @safe does, but, again, that's not relevant). Both options are equally valid, and I think the current »spec« is not clear on which one should apply.

The first option, which is currently implemented in DMD, allows functions like strlen() to be pure. On the other hand, it also makes the semantics/implications of `pure` a lot more complex, because it links it to something which is fundamentally not expressible by the type system, namely that for any level of indirection, surrounding parts of the memory might be accessible or not, depending on how it was originally allocated. This is assuming C semantics, because, as Timon mentioned as well, OTOH the D docs don't have a formal definition for this at all.

For example, consider »struct Node { int val; Node* next; } int foo(in Node* head) pure;«. Using the first rule, it is almost impossible to figure out statically what parts of the program state »foo(someHead)« depends on, because if any of the Node instances in the chain was allocated as part of a contiguous block (i.e. array), it would be legal for foo() to read them as well, even though the function calling foo() might not even have been involved in the construction of the list. Thus, the compiler is forced to always assume the worst case in terms of optimization (at least without elaborate DFA), which, in most D programs, is needlessly conservative.

The second option avoids such complications, and allows function calls with parameters on the heap (and thus pointers) to receive the same kind of optimizations as if the parameters were passed on the stack, which might be impractical. It is also the expected behavior if you are thinking of a pointer literally just as an indirection to a single value stored somewhere else.

Personally, I am not sure which is the better choice; the second option seems like the cleaner design, but I can see the merits of the first one as well. But that's not my point – I am just trying to convince you that the »spec« (or whatever it should really be called) needs improvement in this area, because it their arguments« either, yet this is the crucial point.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug yahoo.com.au

> […] Thus, the compiler is forced to always assume the worst case in terms of optimization (at least without elaborate DFA), which, in most D programs, is needlessly conservative.

That's correct. You should not expect *any* optimizations from weakly pure functions. The ONLY purpose of weakly pure functions is to increase the number of strongly pure functions. In all other respects, they are no different from an impure function.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

> That's correct. You should not expect *any* optimizations from weakly pure functions. The ONLY purpose of weakly pure functions is to increase the number of strongly pure functions. In all other respects, they are no different from an impure function.

Const-pure functions invoked with immutable _arguments_ (even though the parameters might only be const) can receive exactly the same optimizations. Even if this is not implemented in DMD today (like many other possible purity-related optimizations), it is very useful, because otherwise functions would have to accept immutable values just for the sake of optimization even though they could work with const values just as well.
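Editorial sketch: the weakly pure / strongly pure / const-pure distinction discussed in the last two messages can be illustrated as follows. The function names (`addTo`, `sumTo`, `first`) are hypothetical and not taken from the thread, and the comments describe what the language permits, not what any particular DMD version actually optimizes.

```d
// Weakly pure: touches no global state, but may mutate data reachable
// through its parameters (here via `ref`).
void addTo(ref int acc, int x) pure nothrow @safe
{
    acc += x;
}

// Strongly pure: its parameter carries no mutable indirection.
// It may still call the weakly pure helper on its own local state.
int sumTo(int n) pure nothrow @safe
{
    int total = 0;
    foreach (i; 1 .. n + 1)
        addTo(total, i);
    return total;
}

// Const-pure: accepts mutable and immutable data alike.
int first(const(int)[] data) pure nothrow @safe
{
    return data.length ? data[0] : 0;
}

void main()
{
    assert(sumTo(4) == 10);

    immutable int[] fixed = [7, 8, 9];
    int[] scratch = [1, 2, 3];
    assert(first(fixed) == 7);   // immutable argument: nobody can change it later
    assert(first(scratch) == 1); // mutable argument: a later write could change the result
}
```

The point made above is that a call such as `first(fixed)` could in principle be treated like a strongly pure call, since the argument is immutable even though the parameter is only const.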
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 2012-06-04 19:08:08 MSD ---

>> That's correct. You should not expect *any* optimizations from weakly pure functions. The ONLY purpose of weakly pure functions is to increase the number of strongly pure functions. In all other respects, they are no different from an impure function.

> Const-pure functions invoked with immutable _arguments_ (even though parameters might only be const) can receive exactly the same amount of optimizations. Even if not implemented in DMD today (as are many other possible purity-related optimizations), this is very useful, because otherwise functions would have to accept immutable values just for the sake of optimization even though they could work with const values just as well otherwise.

Have you noticed that, as I wrote in comment 20, strong unsafe pure functions like
---
size_t f(size_t) nothrow pure;
---
also almost always can't be optimized out?
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 2012-06-04 19:18:33 MSD ---

For Jonathan M Davis: here (as before), when I say "optimization" I mean "does not behave in a way that can be optimized", which means "does not behave in the way that is expected/desired (IMHO)/etc.".

Example (for everybody):
---
int f(size_t) pure;

__gshared int tmp;
void g(size_t, ref int dummy = tmp) pure;

void h(size_t a, size_t b) pure
{
    int res = f(a);
    g(b);
    assert(res == f(a)); // may fail, no guarantees by language!
}
---

So pure looks more than just useless to me. It looks dangerous, because it confuses people and leads them to think that the second `assert` will pass. At least, with the existing docs (or with pull 128).
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 PDT ---

> int f(size_t) pure;
>
> __gshared int tmp;
> void g(size_t, ref int dummy = tmp) pure;
>
> void h(size_t a, size_t b) pure
> {
>     int res = f(a);
>     g(b);
>     assert(res == f(a)); // may fail, no guaranties by language!
> }

Your g(b) causes h to be impure, because it accesses tmp, which is __gshared.

Also, as far as eliding additional calls to pure functions goes, at present it only happens within the same line, and I think it may only ever happen within the same expression (it's either expression or statement, I'm not sure which). So, the eliding of additional pure function calls is going to be quite rare. The _primary_ benefit of pure is how it enables you to reason about your code. You _know_ that f doesn't mess with anything other than the argument that you passed to it, without having to look at its body at all.

Oh, and the assertion _is_ guaranteed to pass. a and res are both value types. Neither res nor a is passed to anything or accessed in any way other than in the lines with the calls to f, and even if g were impure and it screwed with whatever argument was passed as the first argument to the h call, it wouldn't be able to mess with the value of a, because it was already copied.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

> […] strong unsafe pure functions […]

Please note that the @safe-ty of a function has nothing to do with purity. Yes, in a @system/@trusted pure function it's easy to do impure things, but if you do, it's your fault, not that of the language/type system.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185The problem is there are no "valid bounds". Unless you'd like to declare (char* p) {return p[1];} as invalid, which as you yourself say is restrictive (but IMO acceptable for pure functions, at least the ones that are automatically inferred as pure).1. It does not need the information. Dereferencing a pointer outside the valid bounds results in undefined behavior. Therefore the compiler can just ignore the possibility.But, as the bounds are unknown to the compiler, it does not have the this information, it has to assume everything is reachable via the pointer.The allocated memory block it points into.Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.What does 'their own memory block' mean?2. It can gain some information at the call site. Eg: int foo(const(int)* y)pure; void main(){ int* x = new int; int* y = new int; auto a = foo(x); auto b = foo(y); auto c = foo(x); assert(a == c); }According to certain replies in this report, that assertion could fail. :) But i get what you're saying - now consider this foo() definition instead: int foo()(const(int)* y) { int r; foreach (i; 0..size_t.max) r += y[i]; return r; } /* same main () */ The compiler will treat foo() as pure, so if it would be able to act on the a==c assumption above, it could also do the same here. And now it would be completely wrong - the function doesn't even try to pretend that it's pure, yet it will be inferred as if it were and there's no (clean) way to prevent that. If the compiler optimizes based on a==c, it will miscompile the program. This is why the restrictions on what is accessed via a pointer in a pure function is necessary. Note it only matters for templates/literals/lambdas, ie the cases where purity is inferred; the programmer can always add the purity tag when he knows it is (logically) safe (eg most C string functions). 
And yes, my example code doesn't make sense as-is, but it only servers to illustrate the problem, there are sane implementations of foo(T*p) which under the right conditions will have the same issues. BTW, is my foo() above safe? According to the compiler here - it is.3. Aliasing is the classic optimization killer even without 'pure'.Yes. Maybe it's a good thing that D doesn't attempt to define it, given the amount of confusion something like "pure" causes...4. Invalid use of pointers can break every other aspect of the type system. Why single out 'pure' ?It has nothing to do with "invalid use of pointers", unless, again, p[1] is deemed invalid.What else do you want to be able to do with a pointer in a pure function? Dereferencing it and working with the value itself should work, anything else? Note that you should be able to explicitly tell the compiler to assume something is pure even when the code accesses more than just the pointed-to element.This is why i suggested above that only dereferencing a pointer should be allowed in pure functions.This is too restrictive.Having well defined aliasing rules would help, yes, but I think that's beyond the scope of this bug.And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known.The bounds are usually known only at runtime. The compiler does not have more to work with. From the compiler's point of view, an array access out of bounds and an invalid pointer dereference are very similar.I said "global memory state". The parameters are *local* state, just like variables - they can not escape (you can't return their address) and the values depend only on function inputs. Arguments containing references can be seen as part of the global state, but those are explicitly defined as inputs that the function depends on. 
And that definition wrt to pointers is exactly what this bug is about.Where do you store the parameter 'i' if not in some memory location?int n(int i) {return i+42;}and, if the access isn't restricted somehow, makes the function dependent on global memory state.? A function independent of memory state is useless.f4() takes a pointer; AFAICT you've said above that it should be able to do more than just dereference it. So what exactly is considered reachable?Then it is the caller's fault. What is considered reachable is well-defined, and f4 must document its valid inputs.f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.p[i] can be just as dangerous as the cast. The questions is - can the compiler treat a function containing these constructs as still pure? If the programmer says so, it's fine - purity by convention works.The compiler can assume that a pure function does not access any mutable state other than what can be directly or indirectly reached via the arguments -- that is what function purity is all about. If the compiler has to assume that a pure function that takes a pointer argument can read or modify everything, the "pure" tag becomes worthless.No pointer _argument_ necessary. int foo()pure{ enum int* everything = cast(int*)...; return *everything; } As I already pointed out, unsafe language features can be used to subvert thetype system. If pure functions should be restricted to the safe subset, they can be marked safe, or compiled with the -safe compiler switch.int foo()(int* y) safe { int r; foreach (i; 0..size_t.max) r += y[i]++; return r; } But it's not related to this bug.It is wrong - if a pure functions can be optimized out and it calls another one that has side effects. 
Again, the case where a human incorrectly tags a function is not really the problem; it's when the compiler does that behind the programmer's back.

>> And what's worse, it allows other "truly" pure function to call our immoral one.
> Nothing wrong with that.
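Editorial sketch: the concern in this message is that purity is inferred for templates without the programmer asking for it. A minimal illustration, assuming a hypothetical template `sum3` (not from the thread) that reads past the single pointed-to element:

```d
// Template function: attributes such as `pure` are inferred from the body.
// Precondition (by convention only): p points at the first of at least 3 ints.
int sum3()(const(int)* p)
{
    return p[0] + p[1] + p[2]; // reads beyond the single pointed-to int
}

// Explicitly pure caller: this only compiles if sum3!() was inferred pure.
int callFromPure(const(int)* p) pure
{
    return sum3(p);
}

void main()
{
    int[3] values = [1, 2, 3];
    assert(callFromPure(values.ptr) == 6);
}
```

Nothing in the signature of `sum3` tells the caller or the compiler how much memory behind `p` belongs to the function's input; that is exactly the gap this bug report is about.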
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

> BTW, is my foo() above @safe? According to the compiler here - it is.

If so, please open a new issue – this is clearly a bug.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 2012-06-04 20:27:24 MSD ---

>> int f(size_t) pure;
>>
>> __gshared int tmp;
>> void g(size_t, ref int dummy = tmp) pure;
>>
>> void h(size_t a, size_t b) pure
>> {
>>     int res = f(a);
>>     g(b);
>>     assert(res == f(a)); // may fail, no guaranties by language!
>> }

> Your g(b) causes h to be impure, because it accesses tmp, which is __gshared.

Yes, my mistake. Let's call "g(b, b)" instead.

> Also, as far as eliding additional calls to pure functions, at present, they only occur within the same line, and I think that may only ever occur within the same expression (it's either expression or statement, I'm not sure which). So, the eliding of additional pure function calls is going to be quite rare. The _primary_ benefit of pure is how it enables you to reason about your code. You _know_ that f doesn't mess with anything other than the argument that you passed to it without having to look at its body at all.

No, because the assert may not pass. See below.

> Oh, and the assertion _is_ guaranteed to pass. a and res are both value types. Neither res nor a are passed to anything or accessed in any way other than in the lines with the calls to f, and even if g were impure, and it screwed with whatever argument was passed as the first argument to the h call, it wouldn't be able to mess with the value of a, because it was already copied.

Again, the assert may not pass. If it were guaranteed to pass, I would not have raised this question. Example:
---
int f(size_t p) pure
{
    return *cast(int*) p;
}

void g(size_t p, ref size_t) pure
{
    ++*cast(int*) p;
}

void h(size_t a, size_t b) pure
{
    int res = f(a);
    g(b, b);
    assert(res == f(a)); // may fail, no guarantees by language!
}

void main()
{
    int a;
    h(cast(size_t) &a, cast(size_t) &a);
}
---
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 PDT ---

> void g(size_t p, ref size_t) pure
> {
>     ++*cast(int*) p;
> }

You're casting a size_t to a pointer. That's breaking the type system. The assertion is guaranteed to pass as long as you don't break the type system. That's exactly the same as what occurs when casting away const. When you subvert the type system, the compiler can't guarantee anything. It's the _programmer's_ job at that point to maintain the compiler's guarantees. The compiler is free to assume that the programmer did not violate those guarantees. If you do, you've created a bug. This is precisely the sort of thing that comes up when someone is crazy enough to cast away const on something and try to mutate it. Such an example is ultimately irrelevant, precisely because it violates the type system.
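Editorial sketch: for contrast, here is a variant of the disputed example that stays inside the type system. The bodies of `f` and `g` are hypothetical stand-ins (the thread only gives the cast-based versions); marking everything @safe rules out the integer-to-pointer casts.

```d
// f depends only on the value of its argument.
int f(size_t x) pure nothrow @safe
{
    return cast(int)(x % 100); // integral cast, allowed in @safe code
}

// g is weakly pure: it may mutate what it receives by `ref`, nothing else.
void g(size_t x, ref size_t counter) pure nothrow @safe
{
    counter += x;
}

void h(size_t a, size_t b) pure nothrow @safe
{
    int res = f(a);
    g(b, b);             // can only change h's own copy of b
    assert(res == f(a)); // holds: a is a value that g cannot reach
}

void main()
{
    h(3, 4);
}
```

With no casts available, `g` has no way to reach the memory that `f` reads, which is the guarantee being described in this message.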
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 2012-06-04 20:52:56 MSD ---

>> void g(size_t p, ref size_t) pure
>> {
>>     ++*cast(int*) p;
>> }

> You're casting a size_t to a pointer. That's breaking the type system. The assertion is guaranteed to pass as long as you don't break the type system. That's exactly the same as occurs when casting away const.

It isn't, and that is the point! It's explicitly stated that when I cast away const and then modify the data, the result is undefined. I will be happy if I have missed that this cast results in undefined behavior too.

> When you subvert the type system, the compiler can't guarantee anything. It's the _programmer's_ job at that point to maintain the compiler's guarantees. The compiler is free to assume that the programmer did not violate those guarantees.

No, it's not. Otherwise every such break of the rules would result in undefined behavior. E.g. C++ has strict aliasing, which narrows what function arguments can refer to: if a C++ program has the source of `strlen`, the compiler can inline it and move it out of a loop if, e.g., the loop only modifies an `int*`. In D this can't be done, because every `int*` can refer to every `char*`. So C++ supports pure functions better than D. :)

> If you do, you've created a bug. This is precisely the sort of thing that comes up when someone is crazy enough to cast away const on somethnig and try and mutate it. Such an example is ultimately irrelevant, precisely because it violates the type system.

Every @system function can do it. It can even be written in assembly language. I'm just saying here that it doesn't violate the definition of a `pure` function, and that is the problem. I will be happy once it does violate the definition.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 10:30:19 PDT ---

> It isn't and here is the point! It's explicitly stated that when I'm casting away const and than modify date the result is undefined. I will be happy if I'm missing that this casting results in undefined result too.

I believe it is undefined to cast a size_t to a pointer and use it as a pointer. But I could be wrong. In any case, pure function optimizations do not conservatively assume you will be doing that -- the compiler will optimize assuming you do *not* use it as a pointer.

Whenever you cast, you are telling the compiler "I know what I'm doing." At that point, you are on your own as far as guaranteeing type safety and guaranteeing that pure functions are actually pure.

> No it's not. Otherwise every such break of the rules will result in undefined behavior. E.g. C++ have strict aliasing and can shrink what function arguments can refer to and if C++ program has `strlen` source it can inline and move it out of loop if, e.g. in loop we only modify and `int*`, but in D it can't be done because every `int*` can refer to every `char*`. So C++ support pure functions better than D. :)

If you don't want the compiler to make bad optimization decisions, then don't use casting. At best, this will be implementation defined.

I think you are way overthinking this. D's compiler and optimizer are based on a C++ compiler, written by the same person. Most of the same rules from C++ apply to D. The compiler does not "assume the worst," it "assumes the reasonable," until you tell it otherwise. In other words, no reasonable developer will write code like you have, so the compiler assumes you are reasonable. Using toy examples to show how the compiler *must* behave does not work.

Yes, maybe this isn't spelled out fully in the spec, and it should be. But you are coming at this problem from the wrong end: start with what the compiler actually *does*, not what you *think it should do* based on the spec. The spec, like most software products, is usually the last to be updated when it comes to additional features, and the new pure rules are quite recent.

The priority of "who is right" goes like this:

1. TDPL (the book)
2. The reference implementation (DMD)
3. dlang.org
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 10:45:33 PDT ---

> In general response to this bug, I'm unsure how pointers should be treated by the optimizer. My gut feeling is the compiler/optimizer should trust the code "knows what it's doing." and so should expect that the code implicitly knows how much data it can access after the pointer.

After thinking about this for a couple of days (and watching the emails pour in with differing opinions), here is what I think pure functions with pointers should mean:

For @system or @trusted functions, the definition of what data the pointer has access to is defined by the programmer, and not expressed in any possible way to the type system or the compiler. In other words, if I have a pointer to something, the actual data referenced includes any number of bytes before or after the memory pointed at. The scope of that data is defined by the programmer of the function/type, and should be clearly documented to the user of the function.

For @safe functions, the compiler should allow access only to the specific item pointed to, as defined by the pointed-at type, and nothing else (pointer math is disallowed, pointer indexing is disallowed, and casting is disallowed).

For pure functions, no conservative assumptions that expect the function to have access to global data should be made or acted upon during optimization. In other words, the optimizer should rightly assume that a @system pure function that accepts a pointer does *not* access global data, and that whatever data the function accesses via its pointer was passed via its parameter, as expected by the caller. If the function incorrectly accesses global data via its pointer, then it results in undefined behavior.

These expectations and behaviors should be spelled out in the spec.
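Editorial sketch: the proposal above distinguishes how far a pointer "reaches" in @safe code versus @system code. A minimal illustration with hypothetical names (`readTarget`, `sumRun`), assuming the documented-contract style the message recommends:

```d
// @safe: only the pointed-to int itself is read; pointer arithmetic and
// pointer indexing are not available here.
int readTarget(const(int)* p) pure nothrow @safe
{
    return *p;
}

// @system: how far the pointer reaches is a documented contract, not
// something the type system knows.
// Precondition (by convention only): p points at the first of at least
// `count` ints belonging to the caller.
int sumRun(const(int)* p, size_t count) pure nothrow @system
{
    int total = 0;
    foreach (i; 0 .. count)
        total += p[i];
    return total;
}

void main()
{
    int[4] block = [1, 2, 3, 4];
    assert(readTarget(&block[2]) == 3);
    assert(sumRun(block.ptr, block.length) == 10);
}
```

Both functions are pure in the sense proposed here: neither touches global state, and everything `sumRun` reads was handed to it by the caller, provided the caller honors the precondition.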
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

Still thinking about the rest of the proposal, but:

> […] or @trusted functions […]

If a @trusted function accepts a pointer, it must _under no circumstances_ access anything except for the pointer target, because it can be called from @safe code.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 10:59:49 PDT ---

> Still thinking about the rest of the proposal, but:
>
>> […] or @trusted functions […]
>
> If a @trusted function accepts a pointer, it must _under no circumstances_ access anything except for the pointer target, because it can be called from @safe code.

The point of @trusted is that it is treated as @safe, but can do unsafe things. At that point, you are telling the compiler that you know better than it does that the code is safe.

The compiler is going to assume you did not access anything else beyond the target, so you have to keep that in mind when writing a @trusted function that accepts a pointer parameter.

Off the top of my head, I can't think of any valid usage of this, but that doesn't mean we should necessarily put a restriction on @trusted functions. This is a systems language, and @trusted is a tool used to circumvent @safe-ty when you know it is actually safe.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 2012-06-04 22:13:05 MSD ---

> The compiler does not "assume the worst," it "assumes the reasonable," until you tell it otherwise. In other words, no reasonable developer will write code like you have, so the compiler assumes you are reasonable. Using toy examples to show how the compiler *must* behave does not work.

Come on! A systems language must have strict rules. You have just said that D is JavaScript.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

>> If a @trusted function accepts a pointer, it must _under no circumstances_ access anything except for the pointer target, because it can be called from @safe code.

> The point of @trusted is that it is treated as @safe, but can do unsafe things. At that point, you are telling the compiler that you know better than it does that the code is safe. The compiler is going to assume you did not access anything else beyond the target, so you have to keep that in mind when writing a @trusted function that accepts a pointer parameter. Off the top of my head, I can't think of any valid usage of this, but it doesn't mean we should necessarily put a restriction on @trusted functions. This is a systems language, and @trusted is a tool used to circumvent @safe-ty when you know it is actually safe.

Sorry, but I think you got this wrong. Consider this example:
---
void gun(int* a) @trusted;

int fun() @safe
{
    auto val = new int;
    gun(val);
    return *val;
}
---

Here, calling gun needs to be safe under _any_ circumstances. Thus, the only memory location which gun is allowed to access is the one val points to. If it does so by evaluating *(a + k), where k = (catalanNumber(5) - meaningOfLife()), that's fine, it's @trusted, but ultimately k must always be zero. Otherwise, it might violate the memory safety guarantees that need to hold for fun().

This is definitely not »defined by the programmer, and not expressed in any possible way to the type system or the compiler«. Makes sense?
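Editorial sketch: the thread only declares `gun`; the body below is a hypothetical one that satisfies the constraint argued for in this message, namely that a @trusted function reachable from @safe code touches nothing but the pointer target.

```d
// Hypothetical body for gun: @trusted, yet it only ever touches *a.
void gun(int* a) pure nothrow @trusted
{
    if (a !is null)
        *a = 42; // the pointee is the only memory accessed
}

int fun() pure nothrow @safe
{
    auto val = new int; // GC allocation, zero-initialized
    gun(val);
    return *val;
}

void main()
{
    assert(fun() == 42);
}
```

Any `gun` that stepped outside the pointee (for example by indexing `a[1]`) would still compile as @trusted, but it would break the memory safety that `fun` is entitled to rely on.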
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

As this discussion was mostly about what *should* be happening, I decided to see what actually *is* happening right now.

It seems that the compiler will only optimize based on "pureness" if a function takes an 'immutable T*' argument; even 'immutable(T)*' is enough to turn the optimization off. So, right now, it is extremely conservative, and there is no bug in the implementation. (Accessing mutable data via an immutable pointer can be done, but would be clearly illegal, just like using a cast.)

But that also means that a lot of valid optimizations aren't done, making purity significantly less useful than it could be. Basically, only functions that don't take any (non-immutable) references as arguments can benefit from "pure". But it also means D can still be incrementally fixed, as long as a sane definition of function purity is used.

But this bug is a spec issue, hence probably INVALID, as there is no specification. Sorry for the noise.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 11:35:27 PDT ---

> Come on! System language must have strict rights. You just have said that D is JavaScript.

A systems language is very strict as long as you play within the type system. Once you use casts, all bets are off. The compiler can make *wrong assumptions* and your code may not do what you think it should.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 11:48:22 PDT ---

>>> If a @trusted function accepts a pointer, it must _under no circumstances_ access anything except for the pointer target, because it can be called from @safe code.
>> The point of @trusted is that it is treated as @safe, but can do unsafe things. At that point, you are telling the compiler that you know better than it does that the code is safe. The compiler is going to assume you did not access anything else beyond the target, so you have to keep that in mind when writing a @trusted function that accepts a pointer parameter. Off the top of my head, I can't think of any valid usage of this, but it doesn't mean we should necessarily put a restriction on @trusted functions. This is a systems language, and @trusted is a tool used to circumvent @safe-ty when you know it is actually safe.
> Sorry, but I think you got this wrong. Consider this example:
>
> ---
> void gun(int* a) @trusted;
>
> int fun() @safe
> {
>     auto val = new int;
>     gun(val);
>     return *val;
> }
> ---
>
> Here, calling gun needs to be safe under _any_ circumstances.

No, it does not. Once you use @trusted, the compiler stops checking that it's safe.

> Thus, the only memory location which gun is allowed to access is val. If it does so by evaluating *(a + k), where k = (catalanNumber(5) - meaningOfLife()), that's fine, it's @trusted, but ultimately k must always be zero. Otherwise, it might violate the memory safety guarantees that need to hold for fun(). This is definitely not »defined by the programmer, and not expressed in possible way to the type system or the compiler«.

Yeah, that's a hard one to spell out in docs. I'd recommend not writing that function :) But there's no way to specify this to the compiler, it must assume you have communicated it properly.
Here is an interesting example (I pointed it out before in terms of sockaddr):

struct PacketHeader
{
    int nBytes;
    int packetType;
}

struct DataPacket
{
    PacketHeader header = {packetType:5};
    ubyte[1] data; // extends through length of packet
}

How to specify to the compiler that PacketHeader * with packetType of 5 is really a DataPacket, and its data member has nBytes bytes in it? Such a well-described data structure system can be perfectly safe, as long as you follow the rules of construction.

Now, in order to ensure any function that receives a PacketHeader * is @trusted, you will have to control construction of the PacketHeader somehow. Perhaps you make PacketHeader an opaque type, and @safe functions can therefore never muck with the header information, or maybe you mark nBytes and packetType as private, so it can never be changed outside the module that knows how to build PacketHeaders.

In any case, it is wrong to assume that there isn't a valid way to make a @trusted call that is free to go beyond the target.
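A minimal sketch of the "control construction" idea from the message above, written in D. The function names makeDataPacket and payloadOf are hypothetical and not from the thread, and the header default is spelled PacketHeader(0, 5) instead of the struct-initializer form. The sketch only shows a @trusted accessor that reads past the header it is handed while relying on an invariant set up by the single place that builds packets. It does not try to close every loophole (for example, whole-struct assignment through the pointer, or code in the same module changing the fields), which is where the opaque-type or private-field suggestions would come in.

```d
struct PacketHeader
{
    int nBytes;      // payload length in bytes
    int packetType;  // 5 marks a DataPacket
}

struct DataPacket
{
    PacketHeader header = PacketHeader(0, 5);
    ubyte[1] data; // extends through length of packet
}

// Hypothetical: the only place packets are built, so the invariant
// "nBytes bytes follow the header" is established here.
PacketHeader* makeDataPacket(const(ubyte)[] payload) @trusted
{
    assert(payload.length <= int.max);
    enum offset = DataPacket.data.offsetof;
    size_t size = offset + payload.length;
    if (size < DataPacket.sizeof)
        size = DataPacket.sizeof;
    auto buf = new ubyte[](size);
    auto pkt = cast(DataPacket*) buf.ptr;
    pkt.header = PacketHeader(cast(int) payload.length, 5);
    pkt.data.ptr[0 .. payload.length] = payload[];
    return &pkt.header;
}

// Hypothetical @trusted accessor: goes beyond the pointer target, which is
// only memory safe if every header with packetType 5 came from makeDataPacket.
const(ubyte)[] payloadOf(const(PacketHeader)* h) @trusted
{
    if (h is null || h.packetType != 5 || h.nBytes < 0)
        return null;
    auto pkt = cast(const(DataPacket)*) h;
    return pkt.data.ptr[0 .. h.nBytes];
}

@safe unittest
{
    ubyte[] bytes = [10, 20, 30];
    auto h = makeDataPacket(bytes);
    assert(payloadOf(h) == bytes);
}
```

The compiler checks none of this; the safety argument lives entirely in the construction rules, which is the point being made in the thread.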
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 11:51:14 PDT ---

> It seems that the compiler will only optimize based on "pureness" if a function takes an 'immutable T*' argument, even 'immutable(T)*' is enough to turn the optimization off.

This is a bug, both should be optimized equally:

void foo(immutable int * _param) pure
{
    immutable(int)* param = _param; // legal
    ... // same code as if you had written void foo(immutable(int)* param)
}
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

>> Here, calling gun needs to be safe under _any_ circumstances.
> No, it does not. Once you use @trusted, the compiler stops checking that it's safe.

Yes, it does. As you noted correctly, you as the one implementing gun() must take care of that, the compiler doesn't help you here. But still, you must ensure that gun() never violates memory safety, regardless of what is passed in, because otherwise it might cause @safe code to be no longer memory safe.

> Now, in order to ensure any function that receives a PacketHeader * is @trusted, you will have to control construction of the PacketHeader somehow. […]

Okay, iff you are using a pointer more or less exclusively as an opaque handle, then I guess you are right – I thought only about pointers that are directly obtainable in @safe code.

But then, please be careful with including something along the lines of »For @safe functions, the compiler should allow access only to the specific item pointed to as defined by the pointed-at type, and nothing else« in the docs, because it is quite misleading (or even technically wrong, although I know what you are trying to say): A @safe function _can_ in effect access other memory, if only with the help from a @trusted function.

On a related note, the distinction between @safe and @trusted (especially the difference in mangling) is a horrible abomination and should die in a fire. @safe and @system are contracts, @trusted is an implementation detail – mixing them makes no sense.
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 ---

> This is a bug, both should be optimized equally:
>
> void foo(immutable int * _param) pure
> {
>     immutable(int)* param = _param; // legal
>     ... // same code as if you had written void foo(immutable(int)* param)
> }

Yep, both should be recognized PUREstrong in DMD – if not, please open a new bug report for that.
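To make the two parameter spellings concrete, here is a small sketch in D. The names seven, headImmutable and tailImmutable are made up for illustration. It only shows that the two signatures accept the same argument and can run the same body; whether a given compiler version treats both as strongly pure for optimization purposes is exactly what the messages above are asking, and the sketch does not demonstrate that.

```d
immutable int seven = 7;

// 'immutable int*' makes both the pointer and its target immutable.
int headImmutable(immutable int* p) pure
{
    immutable(int)* q = p; // legal: copying the pointer drops only the head qualifier
    return *q;
}

// 'immutable(int)*' is a mutable pointer to an immutable int.
int tailImmutable(immutable(int)* p) pure
{
    return *p;
}

unittest
{
    // The same argument is accepted by both signatures.
    assert(headImmutable(&seven) == 7);
    assert(tailImmutable(&seven) == 7);
}
```

Since the parameter is passed by value, the caller cannot observe any difference between the two forms, which is the argument for treating them alike.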
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185 13:34:50 PDT ---

>>> Here, calling gun needs to be safe under _any_ circumstances.
>> No, it does not. Once you use @trusted, the compiler stops checking that it's safe.
> Yes, it does. As you noted correctly, you as the one implementing gun() must take care of that, the compiler doesn't help you here. But still, you must ensure that gun() never violates memory safety, regardless of what is passed in, because otherwise it might cause @safe code to be no longer memory safe.

I think I misunderstood your original point. I thought you were saying that gun must be *prevented from* modifying other memory relative to its parameter. Were you simply saying that gun is not stopped by the compiler, but must avoid it in order to maintain safety? If so, I agree, for your example.

I can also see that my response was misleading. I did not mean it should not be safe, I meant it's not enforced as safe. Obviously something that is @trusted needs to maintain safety.

> On a related note, the distinction between @safe and @trusted (especially the difference in mangling) is a horrible abomination and should die in a fire. @safe and @system are contracts, @trusted is an implementation detail – mixing them makes no sense.

I'm not sure what you're saying here, but @trusted is *definitely* needed.
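The point both posters end up agreeing on can be sketched in D as follows. The body of gun is invented here (the thread only declares it), and the constant k stands in for the "k must always be zero" offset from the earlier example. The compiler accepts the pointer arithmetic because the function is @trusted; keeping the access on the pointer target is the author's job.

```d
// Assumed body for illustration: the thread only gives the declaration.
void gun(int* a) @trusted
{
    if (a is null)
        return;
    enum ptrdiff_t k = 0; // any other value could touch memory fun() never handed over
    *(a + k) = 42;        // pointer arithmetic is unchecked inside @trusted code
}

int fun() @safe
{
    auto val = new int;
    gun(val); // allowed: @safe code may call @trusted functions
    return *val;
}

@safe unittest
{
    assert(fun() == 42);
}
```

Changing k to a nonzero value would still compile, which is why the guarantee has to come from whoever writes gun.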
Jun 04 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

Commits pushed to master at https://github.com/D-Programming-Language/d-programming-language.org

https://github.com/D-Programming-Language/d-programming-language.org/commit/59670a7823d066f5146e276bdf5aac7bd93a3f45

This clarifies the definition of pure, since so many people seem to have a hard time understanding that _all_ that pure means is that the function cannot access global or static, mutable state or call impure functions. Everything else with regards to pure is a matter of implementation-specific optimizations - which does in some cases relate to full, functional purity, but pure itself does not indicate anything of the sort.

https://github.com/D-Programming-Language/d-programming-language.org/commit/8cc3ba694bc07ec684f2d1c5a088728aa18e7d93
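As a rough illustration of the definition in the commit message above, the following D sketch shows a pure function that still mutates memory through its parameter. The names counter, bump and twice are hypothetical. The commented-out line is the kind of access that pure rejects.

```d
int counter; // mutable module-level state (thread-local in D)

// Accepted as pure: it writes only through its argument.
void bump(int* p) pure
{
    ++*p;
    // ++counter; // rejected by the compiler: pure code cannot access mutable static data
}

// Also pure; with only value parameters it additionally behaves like a
// mathematical function, but the 'pure' keyword alone does not promise that.
int twice(int x) pure
{
    return 2 * x;
}

unittest
{
    int x = 1;
    bump(&x);
    assert(x == 2);
    assert(twice(21) == 42);
}
```

So bump is pure in the sense of the clarified definition even though calling it has a visible effect on the caller's data.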
Jul 01 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzilla digitalmars.com
         Resolution|                            |FIXED
Jul 02 2012