digitalmars.D.learn - GC seems to crash my C-code function
- frame (27/27) Sep 16 2021 I have C-code translated in D that acts sometimes incorrect if
- bauss (3/30) Sep 16 2021 Use toStringz and not .ptr.
- bauss (10/44) Sep 16 2021 Also see the documentation for "toStringz" which has this:
- frame (4/6) Sep 16 2021 Stupid me should really know that already, thanks =)
- Steven Schveighoffer (8/21) Sep 16 2021 `dup` is a GC allocation. Are you using that in your C code? the GC
- frame (14/20) Sep 16 2021 The compiler doesn't show that lines with -vgc. Maybe it knows
- Steven Schveighoffer (9/36) Sep 16 2021 Are you sure? Be very pedantic about what C functions do with the data
- frame (8/18) Sep 16 2021 If you want to have a look the original C-library is here
- jfondren (79/82) Sep 16 2021 looks to me like it calls
- frame (3/8) Sep 17 2021 ...
- Steven Schveighoffer (27/49) Sep 17 2021 Looking at that signature, it does not appear that it uses
- frame (6/17) Sep 18 2021 No, the string appears inside the C-function. I'm calling the
- Steven Schveighoffer (21/31) Sep 18 2021 Oh wow, I totally misunderstood what you are doing. I thought you were
- frame (20/34) Sep 18 2021 ...
- Steven Schveighoffer (12/53) Sep 18 2021 `dup` definitely allocates. But when I look at the [dlang docs for
- Steven Schveighoffer (6/14) Sep 18 2021 Did you mean "long to char" cast? In that case, yes, you have to cast it...
- frame (8/16) Sep 18 2021 Not really :D but without cast it complains:
- frame (3/12) Sep 18 2021 But I guess it's because it tries to assign to a `char*`
- Steven Schveighoffer (10/26) Sep 18 2021 Well, the variable you are assigning it to should be a `const char *` if...
- Steven Schveighoffer (25/35) Sep 18 2021 So here is why you need a cast:
- frame (6/18) Sep 19 2021 Thanks again, good to know how D sees things.
I have C-code translated in D that acts sometimes incorrect if the GC has made some collect. I would like to know why. - Code runs correct if the GC collections are off - There are no allocations within the C-translated-code except `throw new` (but they are not called) - All allocations made in C-translated-code are still calloc/malloc `ed - Even if I disable the GC before calling the function and just enable it after there will be an incorrect result - Data passed to the function belongs to a struct and the function is called in a member function and is always correct The main public function accepts a char* and returns a char*. Signature is like this: ```d char* fun(ref int, ref int, size_t, const char*, out int, out int, out int, uint); ``` Input paramter gets the pointer from char[] `.ptr` property (and length must be supplied too). I didn't want to change the code much so I have some piece like that: ```d ``` Could this cause the issue? But the pointer is not used outside the function where it's created.
Sep 16 2021
On Thursday, 16 September 2021 at 10:28:37 UTC, frame wrote:I have C-code translated in D that acts sometimes incorrect if the GC has made some collect. I would like to know why. - Code runs correct if the GC collections are off - There are no allocations within the C-translated-code except `throw new` (but they are not called) - All allocations made in C-translated-code are still calloc/malloc `ed - Even if I disable the GC before calling the function and just enable it after there will be an incorrect result - Data passed to the function belongs to a struct and the function is called in a member function and is always correct The main public function accepts a char* and returns a char*. Signature is like this: ```d char* fun(ref int, ref int, size_t, const char*, out int, out int, out int, uint); ``` Input paramter gets the pointer from char[] `.ptr` property (and length must be supplied too). I didn't want to change the code much so I have some piece like that: ```d ``` Could this cause the issue? But the pointer is not used outside the function where it's created.Use toStringz and not .ptr. Or append \0 to your string.
Sep 16 2021
On Thursday, 16 September 2021 at 10:48:19 UTC, bauss wrote:On Thursday, 16 September 2021 at 10:28:37 UTC, frame wrote:Also see the documentation for "toStringz" which has this: ``` Important Note: When passing a char* to a C function, and the C function keeps it around for any reason, make sure that you keep a reference to it in your D code. Otherwise, it may become invalid during a garbage collection cycle and cause a nasty bug when the C code tries to use it. ``` It probably should tell that somewhere else too.I have C-code translated in D that acts sometimes incorrect if the GC has made some collect. I would like to know why. - Code runs correct if the GC collections are off - There are no allocations within the C-translated-code except `throw new` (but they are not called) - All allocations made in C-translated-code are still calloc/malloc `ed - Even if I disable the GC before calling the function and just enable it after there will be an incorrect result - Data passed to the function belongs to a struct and the function is called in a member function and is always correct The main public function accepts a char* and returns a char*. Signature is like this: ```d char* fun(ref int, ref int, size_t, const char*, out int, out int, out int, uint); ``` Input paramter gets the pointer from char[] `.ptr` property (and length must be supplied too). I didn't want to change the code much so I have some piece like that: ```d ``` Could this cause the issue? But the pointer is not used outside the function where it's created.Use toStringz and not .ptr. Or append \0 to your string.
Sep 16 2021
On Thursday, 16 September 2021 at 10:48:19 UTC, bauss wrote:Use toStringz and not .ptr. Or append \0 to your string.Stupid me should really know that already, thanks =) Of course I have dup'ed the \0 from the string away... But still I don't know why it works if the GC is off?
Sep 16 2021
On Thursday, 16 September 2021 at 11:06:04 UTC, frame wrote:On Thursday, 16 September 2021 at 10:48:19 UTC, bauss wrote:Did you see my second response?Use toStringz and not .ptr. Or append \0 to your string.Stupid me should really know that already, thanks =) Of course I have dup'ed the \0 from the string away... But still I don't know why it works if the GC is off?
Sep 16 2021
On Thursday, 16 September 2021 at 11:11:56 UTC, bauss wrote:On Thursday, 16 September 2021 at 11:06:04 UTC, frame wrote:Yes, but as I mentioned this pointer data isn't hold outside the function and the GC collect runs after data is processed, not in between. This pointer should be fully stack allocated and also newly generated each time the function is called, so I see no impact by the GC? But GC collecting works now without issues.On Thursday, 16 September 2021 at 10:48:19 UTC, bauss wrote:Did you see my second response?Use toStringz and not .ptr. Or append \0 to your string.Stupid me should really know that already, thanks =) Of course I have dup'ed the \0 from the string away... But still I don't know why it works if the GC is off?
Sep 16 2021
On Thursday, 16 September 2021 at 11:35:27 UTC, frame wrote:On Thursday, 16 September 2021 at 11:11:56 UTC, bauss wrote:My guess is something is undefined behavior in the C code, idk reallyOn Thursday, 16 September 2021 at 11:06:04 UTC, frame wrote:Yes, but as I mentioned this pointer data isn't hold outside the function and the GC collect runs after data is processed, not in between. This pointer should be fully stack allocated and also newly generated each time the function is called, so I see no impact by the GC? But GC collecting works now without issues.On Thursday, 16 September 2021 at 10:48:19 UTC, bauss wrote:Did you see my second response?Use toStringz and not .ptr. Or append \0 to your string.Stupid me should really know that already, thanks =) Of course I have dup'ed the \0 from the string away... But still I don't know why it works if the GC is off?
Sep 16 2021
On 9/16/21 6:28 AM, frame wrote:I have C-code translated in D that acts sometimes incorrect if the GC has made some collect. I would like to know why. - Code runs correct if the GC collections are off - There are no allocations within the C-translated-code except `throw new` (but they are not called)...I didn't want to change the code much so I have some piece like that: ```d ``` Could this cause the issue? But the pointer is not used outside the function where it's created.`dup` is a GC allocation. Are you using that in your C code? the GC might be collecting that string. You are better off to cast away the immutable (as long as you are 100% sure the C code isn't writing to it), as the string literal will not be collected. -Steve
Sep 16 2021
On Thursday, 16 September 2021 at 15:34:25 UTC, Steven Schveighoffer wrote:`dup` is a GC allocation. Are you using that in your C code? the GC might be collecting that string.The compiler doesn't show that lines with -vgc. Maybe it knows that it is only stack allocated? Technically, the GC could collect that data if it wants - it's not longer used after the function returns. At least I control that the GC cannot collect it till my data is processed, so there should be no problem. I guess the C-methods did corrupt the memory which the compiler has reserved for that function data statically and then the GC collect marks it as free or some other UB.You are better off to cast away the immutable (as long as you are 100% sure the C code isn't writing to it), as the string literal will not be collected. -SteveYes, I changed it to stringz and a cast and the problem is gone so far. All char* are read only in the function or passed to stdlib functions that should not modify it.
Sep 16 2021
On 9/16/21 1:08 PM, frame wrote:On Thursday, 16 September 2021 at 15:34:25 UTC, Steven Schveighoffer wrote:Are you sure? Be very pedantic about what C functions do with the data you send it. Sometimes they store it somewhere to use later. Sometimes they expect it to be allocated by the C heap, etc. Without seeing how you use it, I can't tell you if it's wrong or not.`dup` is a GC allocation. Are you using that in your C code? the GC might be collecting that string.The compiler doesn't show that lines with -vgc. Maybe it knows that it is only stack allocated? Technically, the GC could collect that data if it wants - it's not longer used after the function returns. At least I control that the GC cannot collect it till my data is processed, so there should be no problem.I guess the C-methods did corrupt the memory which the compiler has reserved for that function data statically and then the GC collect marks it as free or some other UB.If it's a literal, you don't need to toStringz (which also allocates). All string literals are zero-terminated (and actually implicitly castable to `immutable char *`). -SteveYou are better off to cast away the immutable (as long as you are 100% sure the C code isn't writing to it), as the string literal will not be collected.Yes, I changed it to stringz and a cast and the problem is gone so far. All char* are read only in the function or passed to stdlib functions that should not modify it.
Sep 16 2021
On Thursday, 16 September 2021 at 18:02:44 UTC, Steven Schveighoffer wrote:Are you sure? Be very pedantic about what C functions do with the data you send it. Sometimes they store it somewhere to use later. Sometimes they expect it to be allocated by the C heap, etc. Without seeing how you use it, I can't tell you if it's wrong or not.If you want to have a look the original C-library is here https://github.com/rdoeffinger/iec16022 I'm only using the encoder function iec16022ecc200f.If it's a literal, you don't need to toStringz (which also allocates). All string literals are zero-terminated (and actually implicitly castable to `immutable char *`). -SteveThanks, I'm just careful with casting. Does it really allocate from a literal if it's used on the stack only? Is `-vgc` switch reliable?
Sep 16 2021
On Friday, 17 September 2021 at 06:27:40 UTC, frame wrote:Thanks, I'm just careful with casting. Does it really allocate from a literal if it's used on the stack only? Is `-vgc` switch reliable?looks to me like it calls ```d // object private U[] _dup(T, U)(scope T[] a) pure nothrow trusted if (__traits(isPOD, T)) { if (__ctfe) return _dupCtfe!(T, U)(a); import core.stdc.string : memcpy; auto arr = _d_newarrayU(typeid(T[]), a.length); memcpy(arr.ptr, cast(const(void)*) a.ptr, T.sizeof * a.length); return *cast(U[]*) &arr; } ``` -> ```d // rt.lifetime extern (C) void[] _d_newarrayU(const scope TypeInfo ti, size_t length) pure nothrow weak { ... auto info = __arrayAlloc(size, ti, tinext); ... } ``` -> ```d // rt.lifetime BlkInfo __arrayAlloc(size_t arrsize, const scope TypeInfo ti, const TypeInfo tinext) nothrow pure { ... auto bi = GC.qalloc(padded_size, attr, tinext); ... } ``` -> ```d // gc.impl.conservative.gc BlkInfo qalloc( size_t size, uint bits, const TypeInfo ti) nothrow { if (!size) { return BlkInfo.init; } BlkInfo retval; retval.base = runLocked!(mallocNoSync, mallocTime, numMallocs)(size, bits, retval.size, ti); if (!(bits & BlkAttr.NO_SCAN)) { memset(retval.base + size, 0, retval.size - size); } retval.attr = bits; return retval; } ``` which you can also follow in an objdump. Conclusion: -vgc is missing this GC allocation. To stack-allocate a mutable copy of a string literal go with ```d safe nogc nothrow unittest { enum S = "hello world"; char[S.length+1] s1 = S; char* s2 = &s1[0]; // Normally you'd expect s1.ptr[s1.length] here, // but the '\0' byte is explicitly part of s1 due to length+1 above. assert(s1[s1.length-1] == '\0'); (() trusted { import core.stdc.stdio : puts; import std.string : fromStringz; puts(s2); assert(s2.fromStringz == S); })(); } ```
Sep 16 2021
On Friday, 17 September 2021 at 06:58:01 UTC, jfondren wrote:On Friday, 17 September 2021 at 06:27:40 UTC, frame wrote:... Interesting analysis, thanks for your effort!Thanks, I'm just careful with casting. Does it really allocate from a literal if it's used on the stack only? Is `-vgc` switch reliable?looks to me like it calls
Sep 17 2021
On 9/17/21 2:27 AM, frame wrote:On Thursday, 16 September 2021 at 18:02:44 UTC, Steven Schveighoffer wrote:Looking at that signature, it does not appear that it uses zero-termination at all, as it takes a length. So using `dup` and therefore the gc is totally unnecessary. I'm assuming that string is the barcode argument? What does your call look like?Are you sure? Be very pedantic about what C functions do with the data you send it. Sometimes they store it somewhere to use later. Sometimes they expect it to be allocated by the C heap, etc. Without seeing how you use it, I can't tell you if it's wrong or not.If you want to have a look the original C-library is here https://github.com/rdoeffinger/iec16022 I'm only using the encoder function iec16022ecc200f.The `-vgc` switch appears to only identify allocations that the compiler invokes via hooks, not ones that other functions invoke (or ones that are direct calls into the GC). In other words, it helps you find your direct allocations using the compiler, not ones that are buried in already-compiled code. Try this with `-vgc`, and it reports nothing: ```d import core.memory; void main() { auto x = GC.malloc(10); } ``` This makes sense, as it may not have the function code to analyze, and it also cannot infer GC allocation from a lack of nogc (a non- nogc function *may* allocate, but does not *necessarily* allocate). It also would be quite useless to see some function buried inside druntime that allocates, not knowing what the call stack was. The docs for `-vgc` should really be updated to clarify. It currently just says "List all gc allocations including hidden ones". -SteveIf it's a literal, you don't need to toStringz (which also allocates). All string literals are zero-terminated (and actually implicitly castable to `immutable char *`).Thanks, I'm just careful with casting. Does it really allocate from a literal if it's used on the stack only? Is `-vgc` switch reliable?
Sep 17 2021
On Friday, 17 September 2021 at 14:29:23 UTC, Steven Schveighoffer wrote:Looking at that signature, it does not appear that it uses zero-termination at all, as it takes a length. So using `dup` and therefore the gc is totally unnecessary. I'm assuming that string is the barcode argument?No, the string appears inside the C-function. I'm calling the function with .ptr and the .length property as it expects.The `-vgc` switch appears to only identify allocations that the compiler invokes via hooks, not ones that other functions invoke (or ones that are direct calls into the GC)....The docs for `-vgc` should really be updated to clarify. It currently just says "List all gc allocations including hidden ones". -SteveThanks for clarification!
Sep 18 2021
On 9/18/21 5:40 AM, frame wrote:On Friday, 17 September 2021 at 14:29:23 UTC, Steven Schveighoffer wrote:Oh wow, I totally misunderstood what you are doing. I thought you were *calling* that function, but you are *reimplementing* that code in D. Have you tried: ```d const(char)* s2 = "..."; ``` This will work because string literals are zero terminated and implicitly castable to `immutable(char)*`, which will also implicitly cast to `const(char)*`. That should allow s2 to be reassigned but not modify the data. IIRC, string literal data even in C is put into a read-only section by many compilers, so the code shouldn't be changing it. Now that I understand what you are doing, it becomes clear that this isn't a situation of C code being called. Or are you calling other parts of the C library with that translated function? The first rule of porting -- just translate, don't change anything. I would try to do exactly what C does without using the GC at all. Continue to use malloc/free. If you have issues with type representation, you may need to adjust for that. -SteveLooking at that signature, it does not appear that it uses zero-termination at all, as it takes a length. So using `dup` and therefore the gc is totally unnecessary. I'm assuming that string is the barcode argument?No, the string appears inside the C-function. I'm calling the function with .ptr and the .length property as it expects.
Sep 18 2021
On Saturday, 18 September 2021 at 11:47:52 UTC, Steven Schveighoffer wrote:Have you tried: ```d const(char)* s2 = "..."; ``` This will work because string literals are zero terminated and implicitly castable to `immutable(char)*`, which will also implicitly cast to `const(char)*`. That should allow s2 to be reassigned but not modify the data. IIRC, string literal data even in C is put into a read-only section by many compilers, so the code shouldn't be changing it....The first rule of porting -- just translate, don't change anything. I would try to do exactly what C does without using the GC at all. Continue to use malloc/free. If you have issues with type representation, you may need to adjust for that.This is what I try to achieve - not to change much. But I see there is a mistake by me, s2 __is__ const in the original C-code too. Unfortunately, with `const(char)*`, `strchr()` did complain and then I would have to cast it to `char*` - so I didn't used it in first place because I really thought `.dup` wouldn't allocate here. There were also parts where the pointer is used in calculations - which is accepted by the compiler - it just complains about implicitly `long` to `char*` cast: ``` // const char *e // char *w out[p++] = ((w - e) + 3) % 40; ``` Why doesn't the compiler complain about `char* - const(char)*`?
Sep 18 2021
On 9/18/21 12:52 PM, frame wrote:On Saturday, 18 September 2021 at 11:47:52 UTC, Steven Schveighoffer wrote:`dup` definitely allocates. But when I look at the [dlang docs for strchr](https://dlang.org/phobos/core_stdc_string.html#.strchr), it properly adds the `inout` type modifier which should correct that problem. Are you defining the prototype for `strchr` yourself instead of importing it from `core.stdc.string`?Have you tried: ```d const(char)* s2 = "..."; ``` This will work because string literals are zero terminated and implicitly castable to `immutable(char)*`, which will also implicitly cast to `const(char)*`. That should allow s2 to be reassigned but not modify the data. IIRC, string literal data even in C is put into a read-only section by many compilers, so the code shouldn't be changing it....The first rule of porting -- just translate, don't change anything. I would try to do exactly what C does without using the GC at all. Continue to use malloc/free. If you have issues with type representation, you may need to adjust for that.This is what I try to achieve - not to change much. But I see there is a mistake by me, s2 __is__ const in the original C-code too. Unfortunately, with `const(char)*`, `strchr()` did complain and then I would have to cast it to `char*` - so I didn't used it in first place because I really thought `.dup` wouldn't allocate here.There were also parts where the pointer is used in calculations - which is accepted by the compiler - it just complains about implicitly `long` to `char*` cast: ``` // const char *e // char *w out[p++] = ((w - e) + 3) % 40; ``` Why doesn't the compiler complain about `char* - const(char)*`?`long` doesn't implicitly cast to `char *`. But of course, subtracting two pointers works for the same base type. Note that `long` in C is *not the same* as `long` in D. You should take a look at https://dlang.org/spec/interfaceToC.html#data_type_compat -Steve
Sep 18 2021
On 9/18/21 12:52 PM, frame wrote:There were also parts where the pointer is used in calculations - which is accepted by the compiler - it just complains about implicitly `long` to `char*` cast: ``` // const char *e // char *w out[p++] = ((w - e) + 3) % 40; ```Did you mean "long to char" cast? In that case, yes, you have to cast it. Note, `out` is a keyword, it can't be used as a variable, but you probably already figured that out. But if `out` here is a `char *`, then yes, you need a cast there. -Steve
Sep 18 2021
On Saturday, 18 September 2021 at 18:48:07 UTC, Steven Schveighoffer wrote:Are you defining the prototype for strchr yourself instead of importing it from core.stdc.string?Not really :D but without cast it complains: ``` Error: cannot implicitly convert expression strchr(e, cast(int)c) of type const(char)* to char* ```Did you mean "long to char" cast? In that case, yes, you have to cast it. Note, `out` is a keyword, it can't be used as a variable, but you probably already figured that out. But if `out` here is a `char *`, then yes, you need a cast there. -SteveYes, of course. `out(put)` is a `char[6]` here.
Sep 18 2021
On Saturday, 18 September 2021 at 21:16:13 UTC, frame wrote:On Saturday, 18 September 2021 at 18:48:07 UTC, Steven Schveighoffer wrote:But I guess it's because it tries to assign to a `char*` therefore inout() doesn't work.Are you defining the prototype for strchr yourself instead of importing it from core.stdc.string?Not really :D but without cast it complains: ``` Error: cannot implicitly convert expression strchr(e, cast(int)c) of type const(char)* to char* ```
Sep 18 2021
On 9/18/21 5:20 PM, frame wrote:On Saturday, 18 September 2021 at 21:16:13 UTC, frame wrote:Well, the variable you are assigning it to should be a `const char *` if the source variable is a `const char *`. Possibly, the C code didn't mark it as `const char *`, because I'm pretty sure C has `strchr` being: `char * strchr(const char *, int)` Because, you know, const doesn't matter in C ;) If that doesn't work, then you *may* need to start allocating. But I'd try that first. -SteveOn Saturday, 18 September 2021 at 18:48:07 UTC, Steven Schveighoffer wrote:But I guess it's because it tries to assign to a `char*` therefore inout() doesn't work.Are you defining the prototype for strchr yourself instead of importing it from core.stdc.string?Not really :D but without cast it complains: ``` Error: cannot implicitly convert expression strchr(e, cast(int)c) of type const(char)* to char* ```
Sep 18 2021
On 9/18/21 5:16 PM, frame wrote:On Saturday, 18 September 2021 at 18:48:07 UTC, Steven Schveighoffer wrote:So here is why you need a cast: First, the code is doing math on 2 pointers, which returns a `ptrdiff_t`, a.k.a. `long` in 64-bits. Second, you are using mod 40, which is great, because D will recognize that the range of values must be within 40! However, since it's signed, that's -40 to 40. Which doesn't fit in a `char` (which is unsigned). D does not allow an implicit narrowing conversion. Since -40 to -1 won't fit into a char (dubious anyway for char to be an integer IMO), you need a cast. So my recommendation is shoehorn it into ulong to take away the sign before the mod, or cast to char at the end. Is there any chance that `w` is less than `e`? if so, do NOT use the first option. ```d // option 1 output[p++] = (ulong(w - e) + 3) % 40; // option 2 output[p++] = cast(char)(((w - e) + 3) % 40); ``` Remember also, `char` is C's only way to say "byte". So this may just be data, and not unicode data. You may want to consider using `ubyte` in your translation instead of `char`. But wait until your code is compiling and working as it did in C. -SteveDid you mean "long to char" cast? In that case, yes, you have to cast it. Note, `out` is a keyword, it can't be used as a variable, but you probably already figured that out. But if `out` here is a `char *`, then yes, you need a cast there.Yes, of course. `out(put)` is a `char[6]` here.
Sep 18 2021
On Sunday, 19 September 2021 at 02:18:20 UTC, Steven Schveighoffer wrote:```d // option 1 output[p++] = (ulong(w - e) + 3) % 40; // option 2 output[p++] = cast(char)(((w - e) + 3) % 40); ``` Remember also, `char` is C's only way to say "byte". So this may just be data, and not unicode data. You may want to consider using `ubyte` in your translation instead of `char`. But wait until your code is compiling and working as it did in C. -SteveThanks again, good to know how D sees things. That data is indeed meant to be raw bytes. Option 2 seems to be always the most reliable one and that kind of casts are the only one I did in first place ;-)
Sep 19 2021