digitalmars.D.learn - Problems using strings in D

reply Grzegorz Adam Hankiewicz <fake dont.use> writes:
I am trying to run this little program:

    import std.stdio;
    import std.path;
    
    int main()
    {
        char[] test_string = null;
        char[] original = "/home/.resource";
        test_string = getBaseName(original);
        test_string[2] = 'a';
        writefln("is %s like %s?", original, test_string);
        return 0;
    }

But I get a core dump. gdb points at the line where getBaseName is
being called.

 (gdb) bt


 (gdb) f 0

 8           test_string = getBaseName(original);

Why does this happen and how do I prevent this?
Jan 24 2006
next sibling parent "Chris Miller" <chris dprogramming.com> writes:
On Tue, 24 Jan 2006 17:18:04 -0500, Grzegorz Adam Hankiewicz  
<fake dont.use> wrote:

 I am trying to run this little program:

     import std.stdio;
     import std.path;

     int main()
     {
         char[] test_string = null;
         char[] original = "/home/.resource";
         test_string = getBaseName(original);
         test_string[2] = 'a';
         writefln("is %s like %s?", original, test_string);
         return 0;
     }

 But I get a core dump. gdb points at the line where getBaseName is
 being called.

  (gdb) bt


  (gdb) f 0

  8           test_string = getBaseName(original);

 Why does this happen and how do I prevent this?
Use copy-on-write unless you know you are the sole owner of a string. getBaseName() returns a slice of original.

    test_string = getBaseName(original);
    test_string = test_string.dup; // Get my own copy.
    test_string[2] = 'a';
    test_string[3] = 'b'; // I'm still the sole owner.
Jan 24 2006
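
A minimal sketch of the slice-versus-copy distinction described above. The helper baseSlice is hypothetical: it is a hand-written stand-in for getBaseName() so the example does not depend on any particular Phobos version, and it assumes '/' is the only path separator. The literal is .dup'ed up front so that every write lands in writable memory.

```d
import std.stdio;

// Hypothetical stand-in for getBaseName(): returns the part of `path`
// after the last '/'. The result is a slice, so no characters are copied.
char[] baseSlice(char[] path)
{
    size_t i = path.length;
    while (i > 0 && path[i - 1] != '/')
        --i;
    return path[i .. $];
}

int main()
{
    // .dup makes a writable heap copy of the literal.
    char[] original = "/home/.resource".dup;

    char[] slice = baseSlice(original); // shares memory with `original`
    char[] copy  = slice.dup;           // private copy of the same characters

    copy[2] = 'a';                      // changes only the copy
    assert(original == "/home/.resource");

    slice[2] = 'a';                     // writes through to `original`
    assert(original == "/home/.rasource");

    writefln("%s %s %s", original, slice, copy);
    return 0;
}
```

Writing through the slice changes original as well, which is why the copy is taken before modifying anything you do not own.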
prev sibling next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message 
news:pan.2006.01.24.22.18.02.498385 dont.use...
I am trying to run this little program:

    import std.stdio;
    import std.path;

    int main()
    {
        char[] test_string = null;
        char[] original = "/home/.resource";
        test_string = getBaseName(original);
        test_string[2] = 'a';
        writefln("is %s like %s?", original, test_string);
        return 0;
    }
The equivalent Windows code (changing / to \ in the path name) doesn't segfault. Try putting a .dup on the end of that string literal; I know there's a problem (?) in Linux where string literals are stored in a read-only segment, so trying to modify them (which is what your code will do) will cause a .. problem. Maybe gdb or DMD got the line off by one, as I would expect the segfault to happen on line 9.
Jan 24 2006
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message 
news:pan.2006.01.24.22.18.02.498385 dont.use...
I am trying to run this little program:

    import std.stdio;
    import std.path;

    int main()
    {
        char[] test_string = null;
        char[] original = "/home/.resource";
        test_string = getBaseName(original);
        test_string[2] = 'a';
        writefln("is %s like %s?", original, test_string);
        return 0;
    }

 But I get a core dump. gdb points at the line where getBaseName is
 being called.
String literals are read-only. std.path.getBaseName() is returning a slice of its argument, which points into read-only data. The seg fault comes from attempting to write into that read-only data. The COW (copy-on-write) fix to your code would be:

    test_string = getBaseName(original).dup;
Jan 25 2006
next sibling parent reply James Dunne <james.jdunne gmail.com> writes:
Walter Bright wrote:
 "Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message 
 news:pan.2006.01.24.22.18.02.498385 dont.use...
 
I am trying to run this little program:

   import std.stdio;
   import std.path;

   int main()
   {
       char[] test_string = null;
       char[] original = "/home/.resource";
       test_string = getBaseName(original);
       test_string[2] = 'a';
       writefln("is %s like %s?", original, test_string);
       return 0;
   }

But I get a core dump. gdb points at the line where getBaseName is
being called.
String literals are read-only. std.path.getBaseName() is returning a slice of its argument, which will be into read-only data. The seg fault comes from attempting to write into that read-only data. The COW (copy-on-write) fix to your code would be: test_string = getBaseName(original).dup;
const might've told us this. =D
Jan 25 2006
parent reply Sean Kelly <sean f4.ca> writes:
James Dunne wrote:
 Walter Bright wrote:
 "Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message 
 news:pan.2006.01.24.22.18.02.498385 dont.use...

 I am trying to run this little program:

   import std.stdio;
   import std.path;

   int main()
   {
       char[] test_string = null;
       char[] original = "/home/.resource";
       test_string = getBaseName(original);
       test_string[2] = 'a';
       writefln("is %s like %s?", original, test_string);
       return 0;
   }

 But I get a core dump. gdb points at the line where getBaseName is
 being called.
String literals are read-only. std.path.getBaseName() is returning a slice of its argument, which will be into read-only data. The seg fault comes from attempting to write into that read-only data. The COW (copy-on-write) fix to your code would be: test_string = getBaseName(original).dup;
const might've told us this. =D
The irritating thing is that the string literal is merely used for initialization in the above case. This almost has me wishing such cases would always cause an allocation/memcpy instead of referencing the original string. Perhaps this could be a rule when non-const arrays are initialized with const data? What happens if a static initializer is used for an int[] array and then someone attempts an in-place modification?

Sean
Jan 25 2006
parent reply Sean Kelly <sean f4.ca> writes:
Sean Kelly wrote:
 James Dunne wrote:
 Walter Bright wrote:
 "Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message 
 news:pan.2006.01.24.22.18.02.498385 dont.use...

 I am trying to run this little program:

   import std.stdio;
   import std.path;

   int main()
   {
       char[] test_string = null;
       char[] original = "/home/.resource";
       test_string = getBaseName(original);
       test_string[2] = 'a';
       writefln("is %s like %s?", original, test_string);
       return 0;
   }

 But I get a core dump. gdb points at the line where getBaseName is
 being called.
String literals are read-only. std.path.getBaseName() is returning a slice of its argument, which will be into read-only data. The seg fault comes from attempting to write into that read-only data. The COW (copy-on-write) fix to your code would be: test_string = getBaseName(original).dup;
const might've told us this. =D
The irritating thing is that the string literal is merely used for initialization in the above case. This almost has me wishing such cases would always cause an allocation/memcpy instead of referencing the original string. Perhaps this could be a rule when non-const arrays are initialized with const data? What happens if a static initializer is used for an int[] array and then someone attempts an in-place modification?
Alternately, perhaps it should be a popular D idiom to do the following:

    char[] original = "/home/.resource".dup;

This would allow for efficiency when it is desired (and eliminate the need for a language change), but should dramatically reduce the chance of such errors.

Sean
Jan 25 2006
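
A small sketch of what the .dup-on-initialization idiom buys, under the assumption that each .dup allocates a fresh writable buffer. The helper freshPath is hypothetical and exists only so the idiom can be exercised twice.

```d
import std.stdio;

// Hypothetical helper: returns a new writable copy of the literal
// on every call, as the .dup idiom does at an initialization site.
char[] freshPath()
{
    return "/home/.resource".dup;
}

int main()
{
    char[] first  = freshPath();
    char[] second = freshPath();

    assert(first.ptr != second.ptr);     // two separate buffers

    first[1] = 'H';                      // in-place edit of one copy
    assert(first == "/Home/.resource");
    assert(second == "/home/.resource"); // the other copy is untouched

    writefln("%s %s", first, second);
    return 0;
}
```

The cost is one allocation and copy per initialization, which is the trade-off the idiom accepts in exchange for never writing into the literal itself.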
parent reply "Kris" <fu bar.com> writes:
"Sean Kelly" <sean f4.ca> wrote ...
 The irritating thing is that the string literal is merely used for 
 initialization in the above case.  This almost has me wishing such cases 
 would always cause an allocation/memcpy instead of referencing the 
 original string.  Perhaps this could be a rule when non-const arrays are 
 initialized with const data?  What happens if a static initializer is 
 used for an int[] array and then someone attempts an in-place 
 modification?
Alternately, perhaps it should be a popular D idiom to do the following:
Alternatively, the compiler should support the notion that /some/ data is actually read-only; and report it as such. That would solve many problems. CoW may very well look OK on paper ~ yet in my experience, when applying it to anything but trivialities, it's actually full of hollow promise. Reality rarely follows academic theory. The true problem here is not convention per se. Instead it is the lack of compiler enforcement with respect to one convention or another. It's easy to say "Oh, one should follow the gentleman's agreement of copy upon write" ~ that's just cheap talk. It would be quite another thing if the compiler would enforce this. I rather suspect such enforcement would be more difficult than providing a limited, language-supported, read-only attribute.
Jan 25 2006
parent reply Sean Kelly <sean f4.ca> writes:
Kris wrote:
 "Sean Kelly" <sean f4.ca> wrote ...
 The irritating thing is that the string literal is merely used for 
 initialization in the above case.  This almost has me wishing such cases 
 would always cause an allocation/memcpy instead of referencing the 
 original string.  Perhaps this could be a rule when non-const arrays are 
 initialized with const data?  What happens if a static initializer is 
 used for an int[] array and then someone attempts an in-place 
 modification?
Alternately, perhaps it should be a popular D idiom to do the following:
Alternatively, the compiler should support the notion that /some/ data is actually read-only; and report it as such. That would solve many problems.
Agreed :-) And now that I think about it, the compiler should be able to detect such problems, as it does not seem terribly difficult to determine whether a write is being performed on something in the const data area vs. somewhere else.
 The true problem here is not convention per se. Instead it is the lack of 
 compiler enforcement with respect to one convention or another. It's easy to 
 say "Oh, one should follow the gentleman's agreement of copy upon write" ~ 
 that's just cheap talk. It would be quite another thing if the compiler 
 would enforce this. I rather suspect such enforcement would be more 
 difficult than providing a limited, language-supported, read-only attribute. 
See above. I think such a flag may not actually be necessary in this case, simply because code generation for const data tends to be somewhat distinct. Perhaps some late stage analysis could be performed to detect this problem? I'm kind of guessing here, but in the small amount of compiler work I've done in the past I think this would have been fairly simple to implement.

Sean
Jan 25 2006
parent reply Sean Kelly <sean f4.ca> writes:
Sean Kelly wrote:
 
 See above.  I think such a flag may not actually be necessary in this 
 case, simply because code generation for const data tends to be somewhat 
 distinct.  Perhaps some late stage analysis could be performed to detect 
 this problem?  I'm kind of guessing here, but in the small amount of 
 compiler work I've done in the past I think this would have been fairly 
 simple to implement.
I take it back :-P. Passing through an opaque function call as in the original example tosses the possibility of code analysis out the window. But some detection might be better than none in this case. Also, it would be nice if the system reported a meaningful error message if this occurs--perhaps something indicating that the segfault occurred from an attempted write to const data? But once you're stuck with runtime detection, I don't really care if the problem is first noticed by a software flag or a hardware fault. In fact, loading a core dump makes reproducing the problem fairly simple in most cases.

Sean
Jan 25 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Okay, I've given this some thought and perhaps the best approach would 
be to reconsider bounds checking under the looser category of "data 
access checking."  Bounds checking would be a required minimum and 
anything beyond that would be left as a QOI issue for the compiler 
developers.  Adding "write to static data" checking should be a trivial 
modification of the existing bounds checking code.  If you assume the 
existing bounds checking code is this:

// assume p is a pointer to the write
// location and a is the array object
if( p < &a[0] || p >= &a[$] ) {
     onArrayBoundsError( __FILE__, __LINE__ );
}

Then it would simply be a matter of adding two new constant variables to 
store the top and bottom of the static area (or determining the 
locations dynamically as in the current DMD GC code) and adding an 
additional check:

// assume sb is a pointer to the base of the const data area
// and st is a pointer to one past the top of that area
if( p >= sb && p < st ) {
     onInvalidWriteError( __FILE__, __LINE__ );
}

This eliminates the need for per-variable flag maintenance and offers an 
easy way to turn off the checking if it is not desired.  And since this 
is conceptually (and functionally) quite similar to bounds checking 
anyway, it should be a fairly painless extension of established practice.


Sean
Jan 25 2006
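
The range test proposed above can be sketched as an ordinary function. This is only an illustration under stated assumptions: pointsInto is a hypothetical name, and the "const data area" is simulated with a local array, because the real segment bounds would have to come from the linker or the runtime and are platform-specific.

```d
import std.stdio;

// Hypothetical helper: true when p lies in the half-open range [lo, hi).
bool pointsInto(void* p, void* lo, void* hi)
{
    return cast(size_t) p >= cast(size_t) lo
        && cast(size_t) p <  cast(size_t) hi;
}

int main()
{
    ubyte[16] fakeConstArea; // stand-in for the const data area (assumption)
    ubyte[16] elsewhere;     // stand-in for ordinary writable memory

    void* sb = fakeConstArea.ptr;                        // base of the area
    void* st = fakeConstArea.ptr + fakeConstArea.length; // one past the top

    assert(pointsInto(&fakeConstArea[3], sb, st));  // would be rejected
    assert(!pointsInto(&elsewhere[3], sb, st));     // would be allowed
    assert(!pointsInto(st, sb, st));                // upper bound is exclusive

    writefln("write into const area detected: %s",
             pointsInto(&fakeConstArea[3], sb, st));
    return 0;
}
```

A compiler-inserted version would run this comparison before each checked write and call an error handler instead of returning a flag.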
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Sean Kelly wrote:
 Okay, I've given this some thought and perhaps the best approach would 
 be to reconsider bounds checking under the looser category of "data 
 access checking."  Bounds checking would be a required minimum and 
 anything beyond that would be left as a QOI issue for the compiler 
 developers.  Adding "write to static data" checking should be a trivial 
 modification of the existing bounds checking code.  If you assume the 
 existing bounds checking code is this:
 
 // assume p is a pointer to the write
 // location and a is the array object
 if( p < &a[0] || p >= &a[$] ) {
     onArrayBoundsError( __FILE__, __LINE__ );
 }
 
 Then it would simply be a matter of adding two new constant variables to 
 store the top and bottom of the static area (or determining the 
 locations dynamically as in the current DMD GC code) and adding an 
 additional check:
 
 // assume sb is a pointer to the base of the const data area
 // and st is a pointer to one past the top of that area
 if( p >= sb && p < st ) {
     onInvalidWriteError( __FILE__, __LINE__ );
 }
 
 This eliminates the need for per-variable flag maintenance and offers an 
 easy way to turn off the checking if it is not desired.  And since this 
 is conceptually (and functionally) quite similar to bounds checking 
 anyway, it should be a fairly painless extension of established practice.
IMHO, this is a very good idea! Assuming that it is part of bounds checking, and thus it would disappear on release builds, then this would be a very good thing to do on debug builds.
Jan 26 2006
parent "Ameer Armaly" <ameer_armaly hotmail.com> writes:
"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message 
news:drbdai$20ns$1 digitaldaemon.com...
 Sean Kelly wrote:
 Okay, I've given this some thought and perhaps the best approach would be 
 to reconsider bounds checking under the looser category of "data access 
 checking."  Bounds checking would be a required minimum and anything 
 beyond that would be left as a QOI issue for the compiler developers. 
 Adding "write to static data" checking should be a trivial modification 
 of the existing bounds checking code.  If you assume the existing bounds 
 checking code is this:

 // assume p is a pointer to the write
 // location and a is the array object
 if( p < &a[0] || p >= &a[$] ) {
     onArrayBoundsError( __FILE__, __LINE__ );
 }

 Then it would simply be a matter of adding two new constant variables to 
 store the top and bottom of the static area (or determining the locations 
 dynamically as in the current DMD GC code) and adding an additional 
 check:

 // assume sb is a pointer to the base of the const data area
 // and st is a pointer to one past the top of that area
 if( p >= sb && p < st ) {
     onInvalidWriteError( __FILE__, __LINE__ );
 }

 This eliminates the need for per-variable flag maintenance and offers an 
 easy way to turn off the checking if it is not desired.  And since this 
 is conceptually (and functionally) quite similar to bounds checking 
 anyway, it should be a fairly painless extension of established practice.
IMHO, this is a very good idea! Assuming that it is part of bounds checking, and thus it would disappear on release builds, then this would be a very good thing to do on debug builds.
I agree; integrating this with bounds checks would be real nice.
Jan 26 2006
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dr9642$rh$1 digitaldaemon.com...
 Sean Kelly wrote:
 See above.  I think such a flag may not actually be necessary in this 
 case, simply because code generation for const data tends to be somewhat 
 distinct.  Perhaps some late stage analysis could be performed to detect 
 this problem?  I'm kind of guessing here, but in the small amount of 
 compiler work I've done in the past I think this would have been fairly 
 simple to implement.
I take it back :-P. Passing through an opaque function call as in the original example tosses the possibility of code analysis out the window. But some detection might be better than none in this case.
It is getting some detection - a seg fault. The whole reason for putting const data into a read-only segment is to get hardware detection and enforcement.
  Also, it would be nice if the system reported a meaningful error message 
 if this occurs--perhaps something indicating that the segfault occurred 
 from an attempted write to const data?
You should get such an indication if you're running it under a decent debugger.
  But once you're stuck with runtime detection, I don't really care if the 
 problem is first noticed by a software flag or a hardware fault.  In fact, 
 loading a core dump makes reproducing the problem fairly simple in most 
 cases.
Seg faults are simply the hardware doing the checking for you rather than having to do it by adding instructions. Along with a good debugger, it's pretty good, and has the nice characteristic that it doesn't bloat the code or slow the execution.
Jan 25 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:dr9642$rh$1 digitaldaemon.com...

 I take it back :-P.  Passing through an opaque function call as in the 
 original example tosses the possibility of code analysis out the window. 
 But some detection might be better than none in this case.
It is getting some detection - a seg fault. The whole reason for putting const data into a read-only segment is to get hardware detection and enforcement.
Is there any way to trap such a write attempt in Windows? For example, this code:

    import std.c.stdio;

    const char[] c = "hello";

    void main()
    {
        c[1] = 'a';
        printf( "%.*s\n", c );
    }

runs to completion in Windows and prints "hello" (i.e., the assignment is effectively ignored). Removing the 'const' prints "hallo" as expected. But while this is better than having the const data altered by a write, it also doesn't make bugs known. All in all, I do really prefer to rely on the hardware to signal this, but if that's not possible I still want to have *some* indication that such a write was attempted--this was one reason I suggested extending bounds checking. Is it simply that Windows doesn't have a trap set up for this situation?
 Seg faults are simply the hardware doing the checking for you rather than 
 having to do it by adding instructions. Along with a good debugger, it's 
 pretty good, and has the nice characteristic that it doesn't bloat the code 
 or slow the execution. 
Agreed. And every debugger I've used can halt on such errors to allow the problem to be debugged. But I'm not sure whether a debugger would catch the above situation in Windows (I'll admit I've never tried it).

Sean
Jan 25 2006
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dr9teh$hmt$1 digitaldaemon.com...
 Seg faults are simply the hardware doing the checking for you rather 
 than having to do it by adding instructions. Along with a good debugger, 
 it's pretty good, and has the nice characteristic that it doesn't bloat 
 the code or slow the execution.
Agreed. And every debugger I've used can halt on such errors to allow the problem to be debugged. But I'm not sure whether a debugger would catch the above situation in Windows (I'll admit I've never tried it).
Using Visual Studio 6 (which uses WinDbg), and converting the given code into a WinMain() function, the debugger does indeed catch the access violation, but most of the time it breaks at some disassembly in the middle of NTDLL, which is useless. And the call stack in that case doesn't help either - WinDbg doesn't really seem to like the D calling convention, so it somehow just hides calls to D functions, making the call stack something like

    NTDLL
    WinMain
    NTKERNEL

when there are supposed to be calls to any number of D functions between NTDLL and WinMain. In fact, I've tried several different scenarios, and have yet to get VS6 to break to the line of the access violation. It always breaks to the middle of NTDLL.
Jan 26 2006
prev sibling parent "Derek Parnell" <derek psych.ward> writes:
On Thu, 26 Jan 2006 12:45:20 +1100, Walter Bright  
<newshound digitalmars.com> wrote:

 "Sean Kelly" <sean f4.ca> wrote in message
 news:dr9642$rh$1 digitaldaemon.com...
 Sean Kelly wrote:
 See above.  I think such a flag may not actually be necessary in this
 case, simply because code generation for const data tends to be  
 somewhat
 distinct.  Perhaps some late stage analysis could be performed to  
 detect
 this problem?  I'm kind of guessing here, but in the small amount of
 compiler work I've done in the past I think this would have been fairly
 simple to implement.
I take it back :-P. Passing through an opaque function call as in the original example tosses the possibility of code analysis out the window. But some detection might be better than none in this case.
It is getting some detection - a seg fault. The whole reason for putting const data into a read-only segment is to get hardware detection and enforcement.
Would it be possible to detect this at compile time rather than run time?

-- 
Derek Parnell
Melbourne, Australia
Jan 29 2006
prev sibling parent Grzegorz Adam Hankiewicz <fake dont.use> writes:
The Wed, 25 Jan 2006 11:48:30 -0800, Walter Bright wrote:
 String literals are read-only. [...] The seg fault comes from
 attempting to write into that read-only data.
Why does D allow assignment of read-only data to a read/write variable (without an explicit cast)? Why is the following code allowed to compile with no warning and crash at runtime?

    int main()
    {
        const char[] test = "this is a test";
        test[2] = 'b';
        return 0;
    }

Why does the following not crash and yield "String is thbs is a"?

    import std.stdio;

    int main()
    {
        const char[10] test = "this is a ";
        test[2] = 'b';
        writefln("String is %s", test);
        return 0;
    }
Jan 28 2006