digitalmars.D - Array literals REALLY should be immutable
- Don (35/35) Nov 12 2009 I think this is quite horrible. [1, 2, 3] looks like an array literal,
- Moritz Warning (4/7) Nov 12 2009 I've hit the problem around four times last week
- Denis Koroskin (7/43) Nov 12 2009 Can't agree more.
- dsimcha (20/22) Nov 12 2009 I can see the value in this, but two issues:
- Denis Koroskin (7/46) Nov 12 2009 I can't give a formal definition of that, but for me a is allowed to
- Andrei Alexandrescu (3/26) Nov 12 2009 Please bugzilla that, thanks. I'll fix.
- Denis Koroskin (26/65) Nov 12 2009 I can't give a formal definition of that, but for me a function is allow...
- Bill Baxter (7/20) Nov 12 2009 I'm pretty sure the reason is that it means library code that's easier
- Denis Koroskin (6/28) Nov 12 2009 It also means that the former function can't be used in programs that
- dsimcha (7/29) Nov 12 2009 I don't understand this attitude. There are definitely times when reada...
- Max Samukha (4/7) Nov 12 2009 I absolutely agree.
- Don (2/13) Nov 12 2009 Yes, that's the intention. See bug 2559.
- Eldar Insafutdinov (2/51) Nov 12 2009 I agree too. that will be consistent.
- grauzone (6/9) Nov 12 2009 Can we make
- Walter Bright (7/9) Nov 12 2009 The inconsistency bothers me, too, but then there's the case:
- Steven Schveighoffer (10/18) Nov 12 2009 I thought so too, but I think Don is right. A library function can solv...
- Don (24/36) Nov 13 2009 I don't think it should work.
- Denis Koroskin (4/37) Nov 13 2009 They aren't: "abcd" has a null-terminator past the string, and ['a', 'b'...
- Don (7/54) Nov 13 2009 You mean the race condition is not an issue? It is an issue because the
I think this is quite horrible. [1, 2, 3] looks like an array literal, but it isn't -- it's an array constructor. It doesn't look like a function call. It shouldn't be.

Q1. How do you declare an array literal [1,2,3]?
A. It took me four attempts before I got it.
==========================
int main()
{
    immutable int[] x1 = [1, 2, 3];        // NO - not evaluated at compile time
    static int[] x2 = [1, 2, 3];           // NO - uses thread local storage
    enum int[] x3 = [1, 2, 3];             // NO - not indexable at run time.
    static immutable int[] x4 = [1, 2, 3]; // OK
    static const int[] x5 = [1, 2, 3];     // also OK

    for (int i=0; i< 3; ++i) {
        if (x4[i]==3) return i;
    }
    return 0;
}

(x3 is currently accepted, but that's a bug -- the whole point of 'enum' is that you can't take the address of it).

This is really ugly and non-intuitive for something so simple. x1 should just work.

Q2: How do you create such an array literal and pass it in a function call?
A. ??? Is this even possible right now?

My code is *full* of these guys. For example, function approximations use them (look at any of the special functions code in Tango.math, or etc.gamma). Unit tests are full of them. Everyone uses look-up tables. Bug 2356 is a consequence of this.

By contrast, the stupid array constructors we have now can be implemented in a trivial library function:

T[] array(T)(T[] x...) { return x.dup; }

I really don't see how syntax sugar for something so simple can be justified, at the expense of basic functionality (lookup tables, essentially). Especially when it's creating an inconsistency with string literals.
Nov 12 2009
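A minimal sketch of the two things the opening post contrasts, assuming a D2 compiler. The `array` helper is the one-liner from the post; the table name `primes` and the function `lookup` are made up for illustration. Only the `static immutable` form that the post marks as OK is used for the table, since how the other declarations behave has depended on the compiler version.

```d
// The library replacement for an "array constructor", as given in the post:
// a typesafe variadic function that copies its arguments into a new array.
T[] array(T)(T[] x...)
{
    return x.dup; // the allocation is explicit and visible in the source
}

// Hypothetical lookup table, declared the way the post marks as OK.
static immutable int[] primes = [2, 3, 5, 7];

// Hypothetical helper: index the table at run time.
int lookup(size_t i)
{
    return primes[i];
}

unittest
{
    int x = 42;
    auto a = array(1, 2, x); // run-time values are fine here
    a[0] = 7;                // the result is an ordinary mutable array
    assert(a == [7, 2, 42]);
    assert(lookup(2) == 5);
}
```

The block should build and run its checks with something like `dmd -unittest -main -run file.d`.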
On Thu, 12 Nov 2009 14:28:05 +0100, Don wrote:
> I think this is quite horrible. [1, 2, 3] looks like an array literal, but it isn't -- it's an array constructor. It doesn't look like a function call. It shouldn't be.

I've hit the problem around four times last week in combination with C bindings. I agree with the proposal, but I miss action.
Nov 12 2009
On Thu, 12 Nov 2009 16:28:05 +0300, Don <nospam nospam.com> wrote:
> I think this is quite horrible. [1, 2, 3] looks like an array literal, but it isn't -- it's an array constructor. It doesn't look like a function call. It shouldn't be.
> [...]
> I really don't see how syntax sugar for something so simple can be justified, at the expense of basic functionality (lookup tables, essentially). Especially when it's creating an inconsistency with string literals.

Can't agree more. I see no problem writing [1, 2, 3].dup; In fact, I can count all the uses of *dynamic* array literals on the fingers of one hand. I mostly need them for either indexing or iteration, so the contents are read-only.

I strongly believe that a "no hidden allocation" policy should be adopted by D/Phobos (it is already adopted by Tango with great success).
Nov 12 2009
== Quote from Denis Koroskin (2korden gmail.com)'s article
> I strongly believe that "No hidden allocation" policy should be adopted by D/Phobos (it is already adopted by Tango with a great success).

I can see the value in this, but two issues:

1. What counts as a "hidden" allocation? How non-obvious does it have to be that something requires an allocation? If something really has to allocate and it's not obvious from the nature of the function, is it enough to just document it?

2. How do you really design high-level library functions if they're not allowed to allocate memory? If you require the user to provide all kinds of details about where the memory they use comes from then you lose some of the high level-ness and make it seem more like an ugly C API that doesn't "just work" and requires attention to the irrelevant the 90% of the time that you don't care about an extra allocation.

The solution I personally use in my dstats lib, which works pretty well in the limited case of arrays of primitives, but might not generalize, is:

a. For stuff that returns an array, the last argument to the function is an optional buffer. If it is provided and is big enough, the results are returned in it. If it is not provided or is too small, a new one is allocated.

b. For temporary buffers used within a function, I use a thread-local second stack (TempAlloc). While this is not **guaranteed** never to result in an allocation (if we're out of space in our current chunk of memory, a new one will be allocated), it very seldom does and only when the only alternative would be to crash, throw an exception, etc.
Nov 12 2009
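The optional-buffer convention described in point (a) can be sketched roughly as follows. This is not the actual dstats API: the function name `squares`, its parameters and the element type are invented for the example.

```d
// Hypothetical function following the "optional buffer as last argument" pattern.
double[] squares(const double[] input, double[] buf = null)
{
    // Reuse the caller's buffer when it is big enough; otherwise allocate.
    double[] result = buf.length >= input.length
        ? buf[0 .. input.length]
        : new double[input.length];

    foreach (i, v; input)
        result[i] = v * v;
    return result;
}

unittest
{
    double[4] storage;

    // Buffer supplied and large enough: the result lives in the caller's memory.
    auto r = squares([1.0, 2.0, 3.0], storage[]);
    assert(r.ptr is storage.ptr);
    assert(r == [1.0, 4.0, 9.0]);

    // No buffer supplied: the function allocates a new array.
    auto r2 = squares([1.0, 2.0]);
    assert(r2 == [1.0, 4.0]);
}
```

A caller that does not care about the allocation writes the short form; a caller in a hot loop can pass the same scratch buffer on every call.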
On Thu, 12 Nov 2009 19:49:58 +0300, dsimcha <dsimcha yahoo.com> wrote:
> == Quote from Denis Koroskin (2korden gmail.com)'s article
>> I strongly believe that "No hidden allocation" policy should be adopted by D/Phobos (it is already adopted by Tango with a great success).
> I can see the value in this, but two issues:
> 1. What counts as a "hidden" allocation? How non-obvious does it have to be that something requires an allocation? If something really has to allocate and it's not obvious from the nature of the function, is it enough to just document it?

I can't give a formal definition of that, but for me a function is allowed to allocate if it produces something new or unique. For example, void mkdirRecurse(string pathname) shouldn't allocate, but it does, because the author didn't care about allocations when implementing it.
Nov 12 2009
Denis Koroskin wrote:
> On Thu, 12 Nov 2009 19:49:58 +0300, dsimcha <dsimcha yahoo.com> wrote:
>> 1. What counts as a "hidden" allocation? How non-obvious does it have to be that something requires an allocation? If something really has to allocate and it's not obvious from the nature of the function, is it enough to just document it?
> For example, void mkdirRecurse(string pathname) shouldn't allocate, but it does, because the author didn't care about allocations when implemented it.

Please bugzilla that, thanks. I'll fix.

Andrei
Nov 12 2009
On Thu, 12 Nov 2009 19:49:58 +0300, dsimcha <dsimcha yahoo.com> wrote:
> == Quote from Denis Koroskin (2korden gmail.com)'s article
>> I strongly believe that "No hidden allocation" policy should be adopted by D/Phobos (it is already adopted by Tango with a great success).
> I can see the value in this, but two issues:
> 1. What counts as a "hidden" allocation? How non-obvious does it have to be that something requires an allocation? If something really has to allocate and it's not obvious from the nature of the function, is it enough to just document it?

I can't give a formal definition of that, but for me a function is allowed to allocate if that allocation is returned back to the user. If a function allocates and the memory becomes unreferenced after the function returns, then the allocation is redundant and should be eliminated. For example, void mkdirRecurse(string pathname) shouldn't allocate, but it does, because the author didn't care about allocations when implementing it.

(It invokes mkdir() for each directory in a path, and mkdir allocates a new string to make sure it ends with \0. Alternatively, a copy of path could be created only once - on a stack buffer - and get reused by putting \0 in place of slashes to terminate it. Something like this:

// untested
import core.stdc.stdlib : alloca;
import core.stdc.string : memcpy;
import core.sys.posix.sys.stat : mkdir; // the POSIX C function, not the Phobos one

void mkdirRecurse(string path)
{
    // one stack copy of the path, with room for the terminating \0
    char* buffer = cast(char*) alloca(path.length + 1);
    memcpy(buffer, path.ptr, path.length);
    buffer[path.length] = 0;

    foreach (i, c; buffer[0..path.length]) {
        if (c == '/') {
            buffer[i] = 0;        // terminate at this component
            mkdir(buffer, 0x1FF); // mode 0777; errors ignored
            buffer[i] = '/';      // restore the slash
        }
    }
    mkdir(buffer, 0x1FF);         // last component when there is no trailing slash
}

There are a lot of functions that allocate without a clear reason.)

> The solution I personally use in my dstats lib, which works pretty well in the limited case of arrays of primitives, but might not generalize, is:
> a. For stuff that returns an array, the last argument to the function is an optional buffer. If it is provided and is big enough, the results are returned in it. If it is not provided or is too small, a new one is allocated.
> b. For temporary buffers used within a function, I use a thread-local second stack (TempAlloc). While this is not **guaranteed** never to result in an allocation (if we're out of space in our current chunk of memory, a new one will be allocated), it very seldom does and only when the only alternative would be to crash, throw an exception, etc.

Yes, this is a good solution.
Nov 12 2009
2009/11/12 Denis Koroskin <2korden gmail.com>:
> There are a lot of functions that allocate without a clear reason.)

I'm pretty sure the reason is that it means library code that's easier to write, understand and maintain.

But yeh, if you give me the choice of two different functions, one that allocates and one that doesn't, otherwise identical, I'll pick the non-allocating version.

--bb
Nov 12 2009
On Thu, 12 Nov 2009 20:26:44 +0300, Bill Baxter <wbaxter gmail.com> wrote:
> 2009/11/12 Denis Koroskin <2korden gmail.com>:
>> There are a lot of functions that allocate without a clear reason.)
> I'm pretty sure the reason is that it means library code that's easier to write, understand and maintain. But yeh, if you give me the choice of two different functions, one that allocates and one that doesn't, otherwise identical, I'll pick the non-allocating version. --bb

It also means that the former function can't be used in programs that disable GC (kernels, embedded development etc). Quality is in the details like that. [...] languages, and their GCs are a lot better than D's one.
Nov 12 2009
== Quote from Bill Baxter (wbaxter gmail.com)'s article
> 2009/11/12 Denis Koroskin <2korden gmail.com>:
>> There are a lot of functions that allocate without a clear reason.)
> I'm pretty sure the reason is that it means library code that's easier to write, understand and maintain. But yeh, if you give me the choice of two different functions, one that allocates and one that doesn't, otherwise identical, I'll pick the non-allocating version. --bb

I don't understand this attitude. There are definitely times when readability and maintainability count more than performance, but library code that will be used in hundreds of different places isn't one of them. Knuth says we should forget about small efficiencies about 97% of the time. He's right. However, when you are writing this kind of generic library code, the odds are pretty good that at least one place where it's used is going to be in the 3%.
Nov 12 2009
On Thu, 12 Nov 2009 14:28:05 +0100, Don <nospam nospam.com> wrote:
> I think this is quite horrible. [1, 2, 3] looks like an array literal, but it isn't -- it's an array constructor. It doesn't look like a function call. It shouldn't be.

I absolutely agree. One note: I hope that x3 will remain valid and be indexable with a compile-time value.
Nov 12 2009
Max Samukha wrote:
> On Thu, 12 Nov 2009 14:28:05 +0100, Don <nospam nospam.com> wrote:
>> I think this is quite horrible. [1, 2, 3] looks like an array literal, but it isn't -- it's an array constructor. It doesn't look like a function call. It shouldn't be.
> I absolutely agree. One note: I hope that x3 will remain valid and be indexable with a compile-time value.

Yes, that's the intention. See bug 2559.
Nov 12 2009
Don Wrote:
> I think this is quite horrible. [1, 2, 3] looks like an array literal, but it isn't -- it's an array constructor. It doesn't look like a function call. It shouldn't be.
> [...]
> I really don't see how syntax sugar for something so simple can be justified, at the expense of basic functionality (lookup tables, essentially). Especially when it's creating an inconsistency with string literals.

I agree too. That will be consistent.
Nov 12 2009
Don wrote:
> I think this is quite horrible. [1, 2, 3] looks like an array literal, but it isn't -- it's an array constructor. It doesn't look like a function call. It shouldn't be.

Can we make

int[3] a = [1,2,x];

Just Work (tm)? Because right now (D1), it allocates an array literal, and then copies it into the static array. Incredibly stupid.
Nov 12 2009
Don wrote:
> Especially when it's creating an inconsistency with string literals.

The inconsistency bothers me, too, but then there's the case:

int x;
...
[1, 2, x]

That can't be made immutable. Shouldn't it work? There's no analog for that for string literals, so the inconsistency isn't quite complete.
Nov 12 2009
On Thu, 12 Nov 2009 14:46:29 -0500, Walter Bright <newshound1 digitalmars.com> wrote:
> Don wrote:
>> Especially when it's creating an inconsistency with string literals.
> The inconsistency bothers me, too, but then there's the case:
> int x; ... [1, 2, x]
> That can't be made immutable. Shouldn't it work? There's no analog for that for string literals, so the inconsistency isn't quite complete.

I thought so too, but I think Don is right. A library function can solve that problem:

auto arr = array(1,2,x);

BTW, there is legitimate inconsistency here:

int[] x = [1,2,3]; // compiles and does what you expect
char[] str = "abc"; // should allocate a mutable string on the heap, should it not?

-Steve
Nov 12 2009
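The inconsistency Steve points out can be shown in a few lines. This is a sketch assuming a D2 compiler, where, as far as I know, assigning a string literal directly to a `char[]` is rejected because the literal has type `immutable(char)[]`; the variable names are arbitrary.

```d
unittest
{
    // An array literal initialises a mutable dynamic array directly.
    int[] x = [1, 2, 3];
    x[0] = 10;
    assert(x == [10, 2, 3]);

    // char[] str = "abc";   // expected to be a compile error in D2
    // A mutable string needs an explicit copy of the literal.
    char[] str = "abc".dup;
    str[0] = 'A';
    assert(str == "Abc");
}
```

The explicit `.dup` on the string is the same visible allocation that the thread proposes requiring for array literals.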
Walter Bright wrote:
> Don wrote:
>> Especially when it's creating an inconsistency with string literals.
> The inconsistency bothers me, too, but then there's the case:
> int x; ... [1, 2, x]
> That can't be made immutable. Shouldn't it work? There's no analog for that for string literals, so the inconsistency isn't quite complete.

I don't think it should work. [1, 2, x] is totally different from, and a far more complicated beast than [1, 2, 3]. [1, 2, x] either allocates memory and performs some form of memory copy, or else it pokes the 'x' value into a half-initialised static array, exposing the code to a possible race condition. The latter is just a standard lookup table, that results in no code generation. I don't see why these two very different operations should share the same syntax -- they don't actually have much in common. It'd be nice to be able to say that "abcd" and ['a', 'b', 'c', 'd'] are completely identical.

C++ got away with using the same syntax for these two totally different things, because (1) it doesn't have any kind of constant folding/CTFE, so it can look at the array entries and determine whether it's immutable or not; and (2) it ignores multi-core issues.

The fact that the language doesn't have any syntax for an immutable array literal is really a problem. Some people are getting around it by using a CTFE function to convert all the values to a string literal, then casting that string literal to (say) an array of ints. That's currently the only way to make the compiler generate decent code, and it's quite dreadful.

BTW, I'm pretty sure that making array literals immutable would simplify the compiler. EG, I've noticed that mutable array literals cause many problems for the interpreter.
Nov 13 2009
On Fri, 13 Nov 2009 12:57:52 +0300, Don <nospam nospam.com> wrote:
> I don't think it should work. [1, 2, x] is totally different from, and a far more complicated beast than [1, 2, 3]. [1, 2, x] either allocates memory and performs some form of memory copy, or else it pokes the 'x' value into a half-initialised static array, exposing the code to a possible race condition.

With thread-local-by-default in mind, this is not an issue.

> The latter is just a standard lookup table, that results in no code generation. I don't see why these two very different operations should share the same syntax -- they don't actually have much in common. It'd be nice to able to say that "abcd" and ['a', 'b', 'c', 'd'] are completely identical.

They aren't: "abcd" has a null-terminator past the string, and ['a', 'b', 'c', 'd'] doesn't.
Nov 13 2009
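The null-terminator difference can be checked with a short sketch, assuming a D2 compiler with druntime and Phobos available. It relies on the documented rule that string literals are followed by a `'\0'` that is not counted in `.length`; nothing comparable is promised for a char array literal, so the example copies it through `std.string.toStringz` before treating it as a C string.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

unittest
{
    string s = "abcd";
    assert(s.length == 4);      // the terminator is not part of the length
    assert(strlen(s.ptr) == 4); // but a '\0' does follow the literal's data

    char[] a = ['a', 'b', 'c', 'd'];
    assert(a.length == 4);
    // No terminator is guaranteed after a's data, so make a terminated copy
    // before handing it to a C function.
    assert(strlen(toStringz(a)) == 4);
}
```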
Denis Koroskin wrote:
> On Fri, 13 Nov 2009 12:57:52 +0300, Don <nospam nospam.com> wrote:
>> [1, 2, x] either allocates memory and performs some form of memory copy, or else it pokes the 'x' value into a half-initialised static array, exposing the code to a possible race condition.
> With thread-local-by-default in mind, this is not an issue.

You mean the race condition is not an issue? It is an issue because the compiler needs to deal with it (perhaps by using thread local variables!) But the second case is not implemented in DMD anyway.

>> It'd be nice to able to say that "abcd" and ['a', 'b', 'c', 'd'] are completely identical.
> They aren't: "abcd" has a null-terminator past the string, and ['a', 'b', 'c', 'd'] doesn't.

You're right. Though, it'd be easy to add a null terminator to the end of memory allocated to char-typed array literals, and make them identical in every respect.
Nov 13 2009