digitalmars.D.learn - Problems using strings in D
- Grzegorz Adam Hankiewicz (22/22) Jan 24 2006 I am trying to run this little program:
- Chris Miller (8/29) Jan 24 2006 Use copy-on-write unless you know you are the sole owner of a string.
- Jarrett Billingsley (8/20) Jan 24 2006 The equivalent Windows code (changing / to \ in the path name) doesn't
- Walter Bright (7/21) Jan 25 2006 String literals are read-only. std.path.getBaseName() is returning a slic...
- James Dunne (2/33) Jan 25 2006 const might've told us this. =D
- Sean Kelly (8/39) Jan 25 2006 The irritating thing is that the string literal is merely used for
- Sean Kelly (7/46) Jan 25 2006 Alternately, perhaps it should be a popular D idiom to do the following:
- Kris (12/20) Jan 25 2006 Alternatively, the compiler should support the notion that /some/ data i...
- Sean Kelly (12/31) Jan 25 2006 Agreed :-) And now that I think about it, the compiler should be able
- Sean Kelly (11/18) Jan 25 2006 I take it back :-P. Passing through an opaque function call as in the
- Sean Kelly (26/26) Jan 25 2006 Okay, I've given this some thought and perhaps the best approach would
- Russ Lewis (4/33) Jan 26 2006 IMHO, this is a very good idea! Assuming that it is part of bounds
- Ameer Armaly (3/36) Jan 26 2006 I agree; integrating this with bounds checks would be real nice.
- Walter Bright (11/29) Jan 25 2006 It is getting some detection - a seg fault. The whole reason for putting...
- Sean Kelly (22/35) Jan 25 2006 Is there any way to trap such a write attempt in Windows? For example,
- Jarrett Billingsley (17/24) Jan 26 2006 Using Visual Studio 6 (which uses WinDbg), and converting the given code...
- Derek Parnell (6/25) Jan 29 2006 Would it be possible to detect this at compile time rather than run time...
- Grzegorz Adam Hankiewicz (19/21) Jan 28 2006 Why does D allow assignment of read only data to a read/write
I am trying to run this little program:

    import std.stdio;
    import std.path;

    int main()
    {
        char[] test_string = null;
        char[] original = "/home/.resource";
        test_string = getBaseName(original);
        test_string[2] = 'a';
        writefln("is %s like %s?", original, test_string);
        return 0;
    }

But I get a core dump. gdb points at the line where getBaseName is being called.

    (gdb) bt
    (gdb) f 0
    8 test_string = getBaseName(original);

Why does this happen and how do I prevent this?
Jan 24 2006
On Tue, 24 Jan 2006 17:18:04 -0500, Grzegorz Adam Hankiewicz <fake dont.use> wrote:
> I am trying to run this little program:
> [...]
> But I get a core dump. gdb points at the line where getBaseName is being called.
> [...]
> Why does this happen and how do I prevent this?

Use copy-on-write unless you know you are the sole owner of a string. getBaseName() returns a slice of original.

    test_string = getBaseName(original);
    test_string = test_string.dup; // Get my own copy.
    test_string[2] = 'a';
    test_string[3] = 'b'; // I'm still the sole owner.
Jan 24 2006
"Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message news:pan.2006.01.24.22.18.02.498385 dont.use...
> I am trying to run this little program:
> [...]

The equivalent Windows code (changing / to \ in the path name) doesn't segfault. Try putting a .dup on the end of that string literal; I know there's a problem (?) in Linux where string literals are stored in a read-only segment, so trying to modify them (which is what your code will do) will cause a .. problem.

Maybe gdb or DMD got the line off by one, as I would expect the segfault to happen on line 9.
Jan 24 2006
"Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message news:pan.2006.01.24.22.18.02.498385 dont.use...
> I am trying to run this little program:
> [...]
> But I get a core dump. gdb points at the line where getBaseName is being called.

String literals are read-only. std.path.getBaseName() is returning a slice of its argument, which will be into read-only data. The seg fault comes from attempting to write into that read-only data. The COW (copy-on-write) fix to your code would be:

    test_string = getBaseName(original).dup;
Jan 25 2006
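The slice-versus-copy distinction described in the replies above can be sketched with a small program. This is a minimal sketch for a current D compiler, not code from the thread: `lastComponent` is a hypothetical stand-in for `getBaseName`, and the literal is copied with `.dup` up front so that the aliasing is visible instead of causing a fault.

```d
import std.stdio;

// Hypothetical stand-in for std.path.getBaseName: returns the part of
// `path` after the last '/' as a slice of `path` itself (no copy is made).
char[] lastComponent(char[] path)
{
    foreach_reverse (i, c; path)
    {
        if (c == '/')
            return path[i + 1 .. $];
    }
    return path;
}

void main()
{
    // .dup yields a mutable, heap-allocated copy of the read-only literal.
    char[] original = "/home/.resource".dup;

    char[] aliased = lastComponent(original);     // shares memory with original
    char[] owned   = lastComponent(original).dup; // private copy

    owned[2] = 'a';   // changes only the private copy
    aliased[2] = 'b'; // also changes original, since the memory is shared

    writefln("original: %s", original);
    writefln("aliased:  %s", aliased);
    writefln("owned:    %s", owned);
}
```

With these inputs the program should print `/home/.rbsource`, `.rbsource` and `.rasource`: the write through the slice shows up in `original`, while the write through the duplicate does not.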
Walter Bright wrote:
> "Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message news:pan.2006.01.24.22.18.02.498385 dont.use...
>> I am trying to run this little program:
>> [...]
>
> String literals are read-only. [...] The seg fault comes from attempting to write into that read-only data. The COW (copy-on-write) fix to your code would be:
>
>     test_string = getBaseName(original).dup;

const might've told us this. =D
Jan 25 2006
James Dunne wrote:
> Walter Bright wrote:
>> String literals are read-only. [...] The COW (copy-on-write) fix to your code would be:
>>
>>     test_string = getBaseName(original).dup;
>
> const might've told us this. =D

The irritating thing is that the string literal is merely used for initialization in the above case. This almost has me wishing such cases would always cause an allocation/memcpy instead of referencing the original string. Perhaps this could be a rule when non-const arrays are initialized with const data? What happens if a static initializer is used for an int[] array and then someone attempts an in-place modification?

Sean
Jan 25 2006
Sean Kelly wrote:
> The irritating thing is that the string literal is merely used for initialization in the above case. This almost has me wishing such cases would always cause an allocation/memcpy instead of referencing the original string. [...]

Alternately, perhaps it should be a popular D idiom to do the following:

    char[] original = "/home/.resource".dup;

This would allow for efficiency when it is desired (and eliminate the need for a language change), but should dramatically reduce the chance of such errors.

Sean
Jan 25 2006
"Sean Kelly" <sean f4.ca> wrote ...
>> The irritating thing is that the string literal is merely used for initialization in the above case. [...]
>
> Alternately, perhaps it should be a popular D idiom to do the following:
> [...]

Alternatively, the compiler should support the notion that /some/ data is actually read-only; and report it as such. That would solve many problems.

CoW may very well look OK on paper ~ yet in my experience, when applying it to anything but trivialities, it's actually full of hollow promise. Reality rarely follows academic theory.

The true problem here is not convention per se. Instead it is the lack of compiler enforcement with respect to one convention or another. It's easy to say "Oh, one should follow the gentleman's agreement of copy upon write" ~ that's just cheap talk. It would be quite another thing if the compiler would enforce this. I rather suspect such enforcement would be more difficult than providing a limited, language-supported, read-only attribute.
Jan 25 2006
Kris wrote:
> "Sean Kelly" <sean f4.ca> wrote ...
>> Alternately, perhaps it should be a popular D idiom to do the following:
>> [...]
>
> Alternatively, the compiler should support the notion that /some/ data is actually read-only; and report it as such. That would solve many problems.

Agreed :-) And now that I think about it, the compiler should be able to detect such problems, as it does not seem terribly difficult to determine whether a write is being performed on something in the const data area vs. somewhere else.

> The true problem here is not convention per se. Instead it is the lack of compiler enforcement with respect to one convention or another. It's easy to say "Oh, one should follow the gentleman's agreement of copy upon write" ~ that's just cheap talk. It would be quite another thing if the compiler would enforce this. I rather suspect such enforcement would be more difficult than providing a limited, language-supported, read-only attribute.

See above. I think such a flag may not actually be necessary in this case, simply because code generation for const data tends to be somewhat distinct. Perhaps some late stage analysis could be performed to detect this problem? I'm kind of guessing here, but in the small amount of compiler work I've done in the past I think this would have been fairly simple to implement.

Sean
Jan 25 2006
Sean Kelly wrote:
> See above. I think such a flag may not actually be necessary in this case, simply because code generation for const data tends to be somewhat distinct. Perhaps some late stage analysis could be performed to detect this problem? I'm kind of guessing here, but in the small amount of compiler work I've done in the past I think this would have been fairly simple to implement.

I take it back :-P. Passing through an opaque function call as in the original example tosses the possibility of code analysis out the window. But some detection might be better than none in this case.

Also, it would be nice if the system reported a meaningful error message if this occurs--perhaps something indicating that the segfault occurred from an attempted write to const data?

But once you're stuck with runtime detection, I don't really care if the problem is first noticed by a software flag or a hardware fault. In fact, loading a core dump makes reproducing the problem fairly simple in most cases.

Sean
Jan 25 2006
Okay, I've given this some thought and perhaps the best approach would be to reconsider bounds checking under the looser category of "data access checking." Bounds checking would be a required minimum and anything beyond that would be left as a QOI issue for the compiler developers.

Adding "write to static data" checking should be a trivial modification of the existing bounds checking code. If you assume the existing bounds checking code is this:

    // assume p is a pointer to the write
    // location and a is the array object
    if( p < &a[0] || p >= &a[$] )
    {
        onArrayBoundsError( __FILE__, __LINE__ );
    }

Then it would simply be a matter of adding two new constant variables to store the top and bottom of the static area (or determining the locations dynamically as in the current DMD GC code) and adding an additional check:

    // assume sb is a pointer to the base of the const data area
    // and st is a pointer to one past the top of that area
    if( p >= sb && p < st )
    {
        onInvalidWriteError( __FILE__, __LINE__ );
    }

This eliminates the need for per-variable flag maintenance and offers an easy way to turn off the checking if it is not desired. And since this is conceptually (and functionally) quite similar to bounds checking anyway, it should be a fairly painless extension of established practice.

Sean
Jan 25 2006
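The extra check proposed above amounts to a half-open range test on the write address. The following is a minimal sketch for a current D compiler, not the compiler-generated code the post has in mind: `pointsInto` and `region` are hypothetical names, and the `region` buffer merely stands in for the const data area, whose real bounds would have to come from the linker or the runtime.

```d
import std.stdio;

// Hypothetical helper: true when p lies in the half-open range [base, top).
bool pointsInto(const(void)* p, const(void)* base, const(void)* top)
{
    return p >= base && p < top;
}

// Stand-in for the const data area (assumption: a real runtime would
// obtain the actual segment bounds from the linker or loader).
immutable char[16] region = "0123456789abcdef";

void main()
{
    const(void)* sb = region.ptr;                 // base of the area
    const(void)* st = region.ptr + region.length; // one past the top

    char[] heap = new char[4];

    writeln(pointsInto(&region[3], sb, st)); // address inside the area
    writeln(pointsInto(heap.ptr, sb, st));   // heap memory, outside the area
    writeln(pointsInto(st, sb, st));         // one past the top, outside
}
```

This should print `true`, `false`, `false`. The test costs two pointer comparisons per checked write, which is why the post treats it as a natural companion to the existing bounds check that can be switched off together with it.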
Sean Kelly wrote:
> Okay, I've given this some thought and perhaps the best approach would be to reconsider bounds checking under the looser category of "data access checking."
> [...]
> And since this is conceptually (and functionally) quite similar to bounds checking anyway, it should be a fairly painless extension of established practice.

IMHO, this is a very good idea! Assuming that it is part of bounds checking, and thus it would disappear on release builds, then this would be a very good thing to do on debug builds.
Jan 26 2006
"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message news:drbdai$20ns$1 digitaldaemon.com...
> Sean Kelly wrote:
>> Okay, I've given this some thought and perhaps the best approach would be to reconsider bounds checking under the looser category of "data access checking."
>> [...]
>
> IMHO, this is a very good idea! Assuming that it is part of bounds checking, and thus it would disappear on release builds, then this would be a very good thing to do on debug builds.

I agree; integrating this with bounds checks would be real nice.
Jan 26 2006
"Sean Kelly" <sean f4.ca> wrote in message news:dr9642$rh$1 digitaldaemon.com...
> I take it back :-P. Passing through an opaque function call as in the original example tosses the possibility of code analysis out the window. But some detection might be better than none in this case.

It is getting some detection - a seg fault. The whole reason for putting const data into a read-only segment is to get hardware detection and enforcement.

> Also, it would be nice if the system reported a meaningful error message if this occurs--perhaps something indicating that the segfault occurred from an attempted write to const data?

You should get such an indication if you're running it under a decent debugger.

> But once you're stuck with runtime detection, I don't really care if the problem is first noticed by a software flag or a hardware fault. In fact, loading a core dump makes reproducing the problem fairly simple in most cases.

All seg faults are the hardware doing the checking for you rather than having to do it by adding instructions. Along with a good debugger, it's pretty good, and has the nice characteristic that it doesn't bloat the code or slow the execution.
Jan 25 2006
Walter Bright wrote:
> "Sean Kelly" <sean f4.ca> wrote in message news:dr9642$rh$1 digitaldaemon.com...
>> I take it back :-P. Passing through an opaque function call as in the original example tosses the possibility of code analysis out the window. But some detection might be better than none in this case.
>
> It is getting some detection - a seg fault. The whole reason for putting const data into a read-only segment is to get hardware detection and enforcement.

Is there any way to trap such a write attempt in Windows? For example, this code:

    import std.c.stdio;

    const char[] c = "hello";

    void main()
    {
        c[1] = 'a';
        printf( "%.*s\n", c );
    }

runs to completion in Windows and prints "hello" (i.e. the assignment is effectively ignored). Removing the 'const' prints "hallo" as expected. But while this is better than having the const data altered by a write, it also doesn't make bugs known. All in all, I do really prefer to rely on the hardware to signal this, but if that's not possible I still want to have *some* indication that such a write was attempted--this was one reason I suggested extending bounds checking. Is it simply that Windows doesn't have a trap set up for this situation?

> All seg faults are the hardware doing the checking for you rather than having to do it by adding instructions. Along with a good debugger, it's pretty good, and has the nice characteristic that it doesn't bloat the code or slow the execution.

Agreed. And every debugger I've used can halt on such errors to allow the problem to be debugged. But I'm not sure whether a debugger would catch the above situation in Windows (I'll admit I've never tried it).

Sean
Jan 25 2006
"Sean Kelly" <sean f4.ca> wrote in message news:dr9teh$hmt$1 digitaldaemon.com...
>> All seg faults are the hardware doing the checking for you rather than having to do it by adding instructions. Along with a good debugger, it's pretty good, and has the nice characteristic that it doesn't bloat the code or slow the execution.
>
> Agreed. And every debugger I've used can halt on such errors to allow the problem to be debugged. But I'm not sure whether a debugger would catch the above situation in Windows (I'll admit I've never tried it).

Using Visual Studio 6 (which uses WinDbg), and converting the given code into a WinMain() function, the debugger does indeed catch the access violation, but most of the time it breaks at.. some disassembly in the middle of NTDLL. Which is useless. And the call stack in that case doesn't help either - WinDbg doesn't really seem to like the D calling convention, so it somehow just hides calls to D functions, making the call stack something like

    NTDLL
    WinMain
    NTKERNEL

when there are supposed to be calls to any number of D functions between NTDLL and WinMain.

In fact, I've tried several different scenarios, and have yet to get VS6 to break to the line of the access violation. It always breaks to the middle of NTDLL.
Jan 26 2006
On Thu, 26 Jan 2006 12:45:20 +1100, Walter Bright <newshound digitalmars.com> wrote:
> "Sean Kelly" <sean f4.ca> wrote in message news:dr9642$rh$1 digitaldaemon.com...
>> I take it back :-P. Passing through an opaque function call as in the original example tosses the possibility of code analysis out the window. But some detection might be better than none in this case.
>
> It is getting some detection - a seg fault. The whole reason for putting const data into a read-only segment is to get hardware detection and enforcement.

Would it be possible to detect this at compile time rather than run time?

--
Derek Parnell
Melbourne, Australia
Jan 29 2006
The Wed, 25 Jan 2006 11:48:30 -0800, Walter Bright wrote:
> String literals are read-only. [...] The seg fault comes from attempting to write into that read-only data.

Why does D allow assignment of read-only data to a read/write variable (without an explicit cast)? Why is the following code allowed to compile with no warning and crash at runtime?

    int main()
    {
        const char[] test = "this is a test";
        test[2] = 'b';
        return 0;
    }

Why does the following not crash and yield "String is thbs is a"?

    import std.stdio;

    int main()
    {
        const char[10] test = "this is a ";
        test[2] = 'b';
        writefln("String is %s", test);
        return 0;
    }
Jan 28 2006
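On the compile-time question raised above: later versions of the language (D2) type string literals as `immutable(char)[]`, aliased as `string`, so an in-place write is rejected by the compiler. The following is a minimal sketch assuming a current D2 compiler; it is not code from the thread and does not describe the behaviour of the 2006 compiler discussed here.

```d
import std.stdio;

void main()
{
    string test = "this is a test"; // string is immutable(char)[]

    // Writing through the immutable slice does not compile.
    static assert(!__traits(compiles, test[2] = 'b'));

    // Nor does binding a literal to a mutable char[] without a copy.
    static assert(!__traits(compiles, { char[] s = "this is a test"; }));

    // An explicit copy is mutable and leaves the literal untouched.
    char[] copy = test.dup;
    copy[2] = 'b';
    writefln("%s / %s", test, copy);
}
```

Both `static assert` lines pass only because the compiler refuses the expressions inside `__traits(compiles, ...)`. The program should then print `this is a test / thbs is a test`, with the original literal unchanged.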