digitalmars.D.learn - Program crash: GC destroys an object unexpectedly
- eugene (52/52) Sep 13 2021 Code snippets
- user1234 (13/14) Sep 13 2021 At first glance and given the fact that some code necessary to
- user1234 (2/16) Sep 13 2021 of sorry, or maybe the opposite, so addRange...
- eugene (10/12) Sep 13 2021 final Signal newSignal(int signum) {
- user1234 (3/15) Sep 13 2021 thx, so the problem is not what I suspected to be (mixed
- eugene (7/9) Sep 13 2021 I am actually C coder and do not have much experience
- Steven Schveighoffer (41/56) Sep 13 2021 The GC only scans things that it knows about.
- eugene (16/21) Sep 13 2021 That was a sort of exercise with operator overloading.
- Tejas (7/12) Sep 13 2021 Umm is it okay that he declared variables `init` and `idle` of
- eugene (5/7) Sep 14 2021 States of a machine are in associative array.
- Steven Schveighoffer (12/26) Sep 14 2021 Declaring a member/field named `init` is likely a bad idea, but this is
- eugene (28/33) Sep 14 2021 Yeah, in my first version I had
- Steven Schveighoffer (14/21) Sep 14 2021 Philosophically, it places the responsibility of making sure the object
- eugene (34/40) Sep 14 2021 run the server (do not run client):
- Adam D Ruppe (9/11) Sep 14 2021 I had a problem just like this before because I was sending
- eugene (5/6) Sep 14 2021 Really, "too big and complex"?
- jfondren (13/21) Sep 14 2021 A 5-pound phone isn't "too heavy" for an adult to carry but it
- eugene (30/32) Sep 14 2021 It is another "not so funny joke", isn't it?
- jfondren (36/49) Sep 14 2021 That's 96 bits. Add 32.
- eugene (16/18) Sep 14 2021 here:
- jfondren (10/28) Sep 14 2021 It is an example of deliberately static storage that does not fix
- eugene (3/7) Sep 14 2021 where did you find 'misaligned pointer'?...
- jfondren (6/14) Sep 14 2021 It doesn't seem like communication between us is possible, in the
- eugene (4/8) Sep 14 2021 I am not a 'selling boy'
- Steven Schveighoffer (4/16) Sep 14 2021 People are trying to help you here. With that attitude, you are likely
- eugene (5/6) Sep 14 2021 Then, answer the questions.
- Ali Çehreli (21/26) Sep 14 2021 I think it's the align(1) for EpollEvent.
- jfondren (45/74) Sep 15 2021 Yep. This patch is sufficient to prevent the segfault:
- eugene (10/11) Sep 19 2021 Your idea (hold references to all event sources somewhere) is
- eugene (48/57) Sep 18 2021 The definition of this struct was taken from
- jfondren (9/14) Sep 18 2021 The struct's fine as far as libc and the kernel are concerned.
- eugene (15/29) Sep 19 2021 Ok...
- eugene (9/11) Sep 19 2021 I do not think it's a problem, otherwise **both programs would
- jfondren (97/112) Sep 19 2021 The GC doesn't reliably punish objects living past there not
- Steven Schveighoffer (7/32) Sep 14 2021 I don't think this is the problem.
- eugene (7/9) Sep 14 2021 take a look at
- eugene (27/29) Sep 19 2021 I've also made two simple examples, just in case
- eugene (34/35) Sep 13 2021 It is echo-server/echo-client pair.
- jfondren (25/28) Sep 13 2021 engine/ecap.d(54): Error: field `EpollEvent.es` cannot assign to
- eugene (5/8) Sep 13 2021 Actually, initial version of all that was using array,
- eugene (14/18) Sep 14 2021 What? Allocate struct epoll_event on the heap?
- eugene (17/20) Sep 14 2021 ... forget to mention, crashes here:
- Steven Schveighoffer (8/37) Sep 14 2021 Note that s likely still points at a valid memory address. However, when...
- eugene (5/14) Sep 14 2021 yeah, this address is obtained from OS (epoll_event struct),
- eugene (16/22) Sep 19 2021 Look...
- jfondren (18/21) Sep 21 2021 Conclusion:
- eugene (42/48) Sep 21 2021 Okay, but how could you explain this then
- jfondren (10/27) Sep 21 2021 I don't think this is reliably OK. If you're not using Stopper
- eugene (8/14) Sep 22 2021 I saw a thread on this forum named
- eugene (7/8) Sep 21 2021 Actually, all proposed 'fixes'
- H. S. Teoh (16/28) Sep 21 2021 It's not strange. You're seeing these problems because you failed to
- H. S. Teoh (25/46) Sep 21 2021 [...]
- eugene (10/14) Sep 22 2021 In other words, compiler is trying to be smarter than a
- Steven Schveighoffer (37/79) Sep 22 2021 Here is what is happening. The compiler keeps track of how long it needs...
- eugene (3/4) Sep 22 2021 Many thanks for this so exhaustive explanation!
- eugene (5/10) Sep 22 2021 And it follows that programming in GC-supporting languages
- Steven Schveighoffer (5/18) Sep 22 2021 Only when interfacing with C ;) Which admittedly is a stated goal for D.
- eugene (27/35) Sep 22 2021 I meant my this particular trouble...
- Steven Schveighoffer (26/62) Sep 22 2021 In terms of any kind of memory management, whether it be ARC, manual,
- eugene (9/13) Sep 23 2021 Oh, yeah - I have special trait of bumping against
- Steven Schveighoffer (9/21) Sep 23 2021 They are here: https://dlang.org/spec/interfaceToC.html#storage_allocati...
- eugene (5/13) Sep 23 2021 Yes, as you explained me, the root of the
- Steven Schveighoffer (9/14) Sep 23 2021 C# KeepAlive (and Go KeepAlive) are a mechanism to do exactly what you
- eugene (13/15) Sep 23 2021 "
- Steven Schveighoffer (12/31) Sep 23 2021 Same effect, but writeln actually executes code to write data to the
- eugene (20/43) Sep 23 2021 ```d
- Steven Schveighoffer (4/56) Sep 23 2021 With dmd -O -inline, there is a chance it will be collected. Inlining is...
- eugene (3/5) Sep 23 2021 never mind, GC.addRoot() looks more trustworthy, anyway :)
- eugene (9/16) Sep 23 2021 For the moment I am personally quite happy
- eugene (12/13) Sep 23 2021 ```d
- jfondren (6/19) Sep 23 2021 Nice. I thought of GC.addRoot several times but I was distracted
- eugene (6/11) Sep 23 2021 Yes, these two must live until the end of main().
- Steven Schveighoffer (5/17) Sep 23 2021 Technically, they should live past the end of main, because it's still
- eugene (9/11) Sep 23 2021 No, as soon as an application get SIGTERM/SIGINT,
- Steven Schveighoffer (12/26) Sep 23 2021 That's not what is triggering the segfault though. The segfault is
- eugene (12/21) Sep 23 2021 With ease!
- Steven Schveighoffer (10/39) Sep 23 2021 Yes, I would recommend that. Always good for a destructor to clean up
- eugene (4/8) Sep 23 2021 but both SIGINT and SIGTERM are still **blocked**,
- eugene (9/15) Sep 23 2021 100% agree.
- eugene (16/19) Sep 23 2021 Any language is just an instrument,
- eugene (12/17) Sep 23 2021 Me? Blaming *myself* for C 'idiosyncrasies'? :) Where?
- Steven Schveighoffer (11/29) Sep 23 2021 "When my C program crashes, I'm 100% sure I made something stupid"
- H. S. Teoh (13/25) Sep 21 2021 Quick and dirty workaround: keep references to those objects in static
- eugene (2/6) Sep 23 2021 Now I guess, gdc optimization by size imply DSE.
Code snippets

```d
class Stopper : StageMachine {

    enum ulong M0_IDLE = 0;
    Signal sg0, sg1;

    this() {
        super("STOPPER");

        Stage init, idle;
        init = addStage("INIT", &stopperInitEnter);
        idle = addStage("IDLE", &stopperIdleEnter);

        init.addReflex("M0", idle);

        idle.addReflex("S0", &stopperIdleS0);
        idle.addReflex("S1", &stopperIdleS1);
    }

    void stopperInitEnter() {
        sg0 = newSignal(Signal.sigInt);
        sg1 = newSignal(Signal.sigTerm);
        msgTo(this, M0_IDLE);
    }

    // ... (rest of the class not shown)
}
```

The instance of Stopper is created in the scope of main():

```d
void main(string[] args) {

    auto stopper = new Stopper();
    stopper.run();
    // ...
}
```

stopperInitEnter(), where sg0 and sg1 are created, is invoked inside the run() method. About 6 seconds after the start, the (dummy) destructors of sg0 and sg1 are called:

```
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 24) this 0x7fa5410d4f60
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 25) this 0x7fa5410d4f90
```

Then, after pressing ^C (SIGINT), the program gets SIGSEGV, since the references to sg0 and sg1 are no longer valid (they are "sitting" in the epoll_event structure).

At first I thought I was being stupid and missing some obvious mistake, but... The crash happens if the program is compiled with dmd (v2.097.2). When using gdc (as well as ldc, both from the official Debian 8 repo), I do not observe any crashes: the program may run for hours, and after being interrupted by ^C it terminates as expected.

And the strangest thing is this: when using gdc with the -Os flag, the program behaves exactly as when compiled with fresh dmd; the destructors for sg0 and sg1 are called soon after program start.

I do not understand at all why the GC considers sg0 and sg1 unreferenced, and why old gdc (without -Os) and old ldc do not.
Sep 13 2021
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
> [...]

At first glance, and given that some code necessary to diagnose the problem accurately is missing: `new Stopper` allocates using the GC. It's then a "GC range": its content will be scanned and handled by the GC, including `sg0` and `sg1`. So far everything is simple.

The problem seems to lie in `newSignal()`, which "would" not allocate using the GC. So when the GC reaches the `sg0` and `sg1` values, indirectly when scanning a `Stopper` instance, it thinks that they are unused and, consequently, frees them. If you don't want them to be managed by the GC, remove them from the GC using `removeRange()`.
Sep 13 2021
On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:
> On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
>> [...]
>
> [...] If you don't want them to be managed by the GC, remove them from the GC using `removeRange()`.

Oh, sorry, or maybe the opposite, so `addRange`...
Sep 13 2021
On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:
> The problem seems to lie in `newSignal()`, which "would" not allocate using the GC.

```d
final Signal newSignal(int signum) {
    Signal sg = new Signal(signum);
    sg.owner = this;
    sg.number = sg_number++;
    sg.register();
    return sg;
}
```

Full source is here: http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz
Sep 13 2021
On Monday, 13 September 2021 at 17:54:43 UTC, eugene wrote:
> On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:
>> The problem seems to lie in `newSignal()`, which "would" not allocate using the GC.
>
> [`newSignal()` code and link to the full source]

thx, so the problem is not what I suspected (mixed GC-managed and manually managed memory). Sorry...
Sep 13 2021
On Monday, 13 September 2021 at 17:56:34 UTC, user1234 wrote:
> thx, so the problem is not what I suspected (mixed GC-managed and manually managed memory). Sorry...

I am actually a C coder and do not have much experience with GC languages, so I have not even attempted to use D without the GC yet; I just want to understand how all that GC magic works. The programs do not contain manual malloc()/free(); I am just not ready for such a mix.
Sep 13 2021
On 9/13/21 1:54 PM, eugene wrote:
> On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:
>> The problem seems to lie in `newSignal()`, which "would" not allocate using the GC.
>
> [`newSignal()` code and link to the full source]

The GC only scans things that it knows about.

Inside your EventQueue you have this code:

```d
void registerEventSource(EventSource es) {
    auto e = EpollEvent(0, es);
    int r = epoll_ctl(id, EPOLL_CTL_ADD, es.id, &e);
    assert(r == 0, "epoll_ctl(ADD) failed");
}

EventQueue opOpAssign(string op)(EventSource es)
if (("+" == op) || ("~" == op)) {
    registerEventSource(es);
    return this;
}

void deregisterEventSource(EventSource es) {
    auto e = EpollEvent(0, es);
    int r = epoll_ctl(id, EPOLL_CTL_DEL, es.id, &e);
    assert(r == 0, "epoll_ctl(DEL) failed");
}

EventQueue opOpAssign(string op)(EventSource es)
if ("-" == op) {
    deregisterEventSource(es);
    return this;
}
```

And you are registering your signals using the `+=` operator.

What is happening here is that `epoll_ctl` is adding your event source to a *C-allocated* structure (namely the epoll struct, allocated by `epoll_create1`, and possibly even managed by the OS). The GC does not have access to this struct, so if that's the only reference to them, they will get cleaned up by the GC.

Now, with your stopper code that you showed, it looks like you are storing the reference to stopper right on the main stack frame. This *should* prevent those from being destroyed, since Stopper has a reference to both signals.

But I would recommend using `core.memory.GC.addRoot` on your EventSource when registering it with epoll, and using `core.memory.GC.removeRoot` when unregistering. That will ensure they do not get cleaned up before being unregistered. If this doesn't fix the problem, perhaps there is some other issue happening.

-Steve
Sep 13 2021
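A minimal sketch of the pinning Steve suggests, assuming a hypothetical `ForeignTable` that stands in for the epoll instance: it stores each object's address XOR-masked, so the GC is unlikely to recognise the stored value as a reference, much as it cannot see the `epoll_data` the kernel holds. `GC.addRoot` and `GC.removeRoot` are the real `core.memory` functions; `EventSource`, `ForeignTable` and the mask value are illustrative assumptions, not code from the project.

```d
import core.memory : GC;

// Hypothetical stand-in for the thread's EventSource class.
class EventSource
{
    int id;
    this(int id) { this.id = id; }
}

// Hypothetical "foreign" storage mimicking epoll: the address is kept
// XOR-masked, so the GC is unlikely to treat it as a reference.
struct ForeignTable
{
    enum size_t mask = 0x5A5A_5A5A;
    size_t[int] hidden; // id -> masked address

    void register(EventSource es)
    {
        GC.addRoot(cast(void*) es);                      // pin until deregistered
        hidden[es.id] = (cast(size_t) cast(void*) es) ^ mask;
    }

    EventSource lookup(int id)
    {
        return cast(EventSource) cast(void*) (hidden[id] ^ mask);
    }

    void deregister(int id)
    {
        auto p = cast(void*) (hidden[id] ^ mask);
        hidden.remove(id);
        GC.removeRoot(p);                                // normal lifetime rules again
    }
}
```

In the thread's code, the `addRoot` call would sit next to `epoll_ctl(EPOLL_CTL_ADD, ...)` and the `removeRoot` call next to `EPOLL_CTL_DEL`, so an object stays alive exactly as long as the kernel can hand its pointer back.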
On Monday, 13 September 2021 at 18:42:47 UTC, Steven Schveighoffer wrote:
> And you are registering your signals using the `+=` operator.

That was a sort of exercise with operator overloading.

> Now, with your stopper code that you showed, it looks like you are storing the reference to stopper right on the main stack frame. This *should* prevent those from being destroyed, since Stopper has a reference to both signals.

Exactly, this is the main point of my confusion. By my expectation, the GC should not mark those as unreferenced.

Also, notice those dynamic arrays:

```d
void main(string[] args) {

    RxSm[] rxMachines;
    auto rxPool = new RestRoom();
    foreach (k; 0 .. nConnections) {
        auto sm = new RxSm(rxPool);
        rxMachines ~= sm;
        sm.run();
    }
    // ...
}
```

rxMachines (and the like) are not needed by the program itself; they are there just to keep references for the GC.
Sep 13 2021
On Monday, 13 September 2021 at 18:42:47 UTC, Steven Schveighoffer wrote:
> On 9/13/21 1:54 PM, eugene wrote:
>> [...]
>
> The GC only scans things that it knows about.
>
> Inside your EventQueue you have this code:
>
> [...]

Umm is it okay that he declared variables `init` and `idle` of type `Stage` inside the constructor? Maybe that has something to do with this?

Also, calling a variable `init` could be problematic since the compiler assigns a property of the same name to every single type?
Sep 13 2021
On Tuesday, 14 September 2021 at 05:49:58 UTC, Tejas wrote:
> Umm is it okay that he declared variables `init` and `idle` of type `Stage` inside the constructor?

The states of a machine are kept in an associative array. All other machines create their states in the constructor too; the local variables are just for using the addReflex() method.

But this Stopper machine is 'special' for the GC somehow.
Sep 14 2021
On 9/14/21 1:49 AM, Tejas wrote:
> On Monday, 13 September 2021 at 18:42:47 UTC, Steven Schveighoffer wrote:
>> On 9/13/21 1:54 PM, eugene wrote:
>>> [...]
>>
>> The GC only scans things that it knows about.
>>
>> Inside your EventQueue you have this code:
>>
>> [...]
>
> Umm is it okay that he declared variables `init` and `idle` of type `Stage` inside the constructor? Maybe that has something to do with this?
>
> Also, calling a variable `init` could be problematic since the compiler assigns a property of the same name to every single type?

Declaring a member/field named `init` is likely a bad idea, but this is not a member, it's just a variable. That's fine. `idle` doesn't mean anything special to D.

This project is too big and complex for me to diagnose by just reading; it would take some effort, and I don't have the time, sorry. Though as I have learned helping C converts before, most of the time things like this have to do with forgetting to store a GC reference somewhere. It can be subtle too...

I still recommend pinning the object when adding the epoll event and seeing if that helps.

-Steve
Sep 14 2021
On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven Schveighoffer wrote:
> Though as I have learned helping C converts before, most of the time things like this have to do with forgetting to store a GC reference somewhere.

Yeah, in my first version I had

```d
    foreach (k; 0 .. nConnections) {
        auto sm = new EchoClient(rxPool, txPool);
        sm.run();
    }
```

instead of

```d
    EchoClient[] wrkMachines;
    foreach (k; 0 .. nConnections) {
        auto sm = new EchoClient(rxPool, txPool);
        wrkMachines ~= sm;
        sm.run();
    }
```

and even

```d
    {
        auto stopper = new Stopper();
        stopper.run();
    }
```

:)

> I still recommend pinning the object when adding the epoll event and seeing if that helps.

I understand your idea, but even if this helps, the question remains: why is that particular object so special for the GC?
Sep 14 2021
On 9/14/21 8:42 AM, eugene wrote:
> On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven Schveighoffer wrote:
>> I still recommend pinning the object when adding the epoll event and seeing if that helps.
>
> I understand your idea, but even if this helps, the question remains: why is that particular object so special for the GC?

Philosophically, it places the responsibility of making sure the object is valid while using it on the thing that chooses to store it outside the GC's view.

Looking at your examples, you are having to store these object references elsewhere, surrounding seemingly innocuous and normal D usage of objects. You have put the burden on the caller to make sure the implementation details are sound.

But I agree that a superficial reading of your code seems like it ought to not be collected, and that problem is also worth figuring out. I have high confidence that it's probably not a design flaw in the GC, but rather some misunderstanding of GC-allocated lifetimes in your code. But that doesn't mean it's not actually a bug somewhere in D.

-Steve
Sep 14 2021
On Tuesday, 14 September 2021 at 12:52:44 UTC, Steven Schveighoffer wrote:
> But I agree that a superficial reading of your code seems like it ought to not be collected, and that problem is also worth figuring out. I have high confidence that it's probably not a design flaw in the GC, but rather some misunderstanding of GC-allocated lifetimes in your code. But that doesn't mean it's not actually a bug somewhere in D.

Run the server (do not run the client):

```
'LISTENER INIT' got 'M0' from 'SELF'
'LISTENER' registered 104 (esrc.TCPListener)
'LISTENER' enabled 104 (esrc.TCPListener)
'LISTENER' enabled 105 (esrc.Signal)
'LISTENER' enabled 106 (esrc.Signal)
```

Wait more than 6 seconds, press ^C, and observe:

```
___!!!___edsm.StageMachine.~this(): WORKER-95 destroyed...
___!!!___edsm.StageMachine.~this(): WORKER-96 destroyed...
___!!!___edsm.StageMachine.~this(): LISTENER destroyed...
```

Run the client (do not run the server) and observe:

```
'CLIENT-9 CONN' got 'M2' from 'TX-1'
CLIENT-9:client.EchoClient.clientConnM2() : connection to 'localhost:1111' failed error111)
CLIENT-9:client.EchoClient.clientConnM2() : connection to 'localhost:1111' failed(Connection refused)
```

Press ^C and observe:

```
___!!!___edsm.StageMachine.~this(): STOPPER destroyed...
```

Run the server again, then run the client like this:

```
./echo-client | grep owner
```

Wait more than 6 seconds and see:

```
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 24) this 0x7fa6cf12cf60
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 25) this 0x7fa6cf12cf90
```

WHY is this not happening with echo-server???
Sep 14 2021
On Tuesday, 14 September 2021 at 12:42:51 UTC, eugene wrote:
> I understand your idea, but even if this helps, the question remains: why is that particular object so special for the GC?

I had a problem just like this before because I was sending objects through the pipe. And while they were in the pipe (after send but before receive on the other side) they were liable to be collected.

idk your code, though; if you have a separate reference to the object you should be ok.

But here's the comment from my code when I broke it:

https://github.com/adamdruppe/arsd/blob/master/eventloop.d#L180
Sep 14 2021
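Adam's pipe case can be sketched the same way. Assuming a hypothetical `Message` class whose reference is written into a POSIX pipe (as in a self-pipe event loop), the object is pinned with `GC.addRoot` before its address leaves GC-visible memory and unpinned once it has been read back into a local variable. This illustrates the idea only; it is not the code from arsd's eventloop.d.

```d
import core.memory : GC;
import core.sys.posix.unistd : close, pipe, read, write;

// Hypothetical message type passed by reference through a self-pipe.
class Message
{
    int code;
    this(int code) { this.code = code; }
}

// While the pointer sits in the kernel's pipe buffer the GC cannot see it,
// so the object is pinned before the address leaves the process's memory.
void sendRef(int wfd, Message m)
{
    void* p = cast(void*) m;
    GC.addRoot(p);
    immutable n = write(wfd, &p, p.sizeof); // pointer-sized writes are atomic on a pipe
    assert(n == p.sizeof, "write failed");
}

// Once the pointer is back in a GC-visible variable, the pin is dropped.
Message recvRef(int rfd)
{
    void* p;
    immutable n = read(rfd, &p, p.sizeof);
    assert(n == p.sizeof, "read failed");
    auto m = cast(Message) p;
    GC.removeRoot(p);
    return m;
}
```

Without the `addRoot` call, a collection that happens between `sendRef` and `recvRef` may free the object if nothing else refers to it, which matches the failure Adam describes.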
On Tuesday, 14 September 2021 at 12:53:27 UTC, Adam D Ruppe wrote:
> I had a problem just like this before because I was sending objects through the pipe.

This reminds me of my (not very successful) attempts to implement the idea in Rust:

```rust
pub struct Edsm {
    name: String,
    pub states: Vec<State>,
    current: usize,
    // pub state : *mut State, (?)
    pub data: *const void, // long live void* !!!
    // pub buddy : &'a Edsm, // ... and a hell begins...
    mb: Option<Box<EventSource>>,
    /* self-pipe write end fd, for sending internal events to this machine */
    mxfd: i32,
    io: Option<Box<EventSource>>,
    pub tm: Vec<EventSource>,
    sg: Vec<EventSource>,
    // pub fs : Option<Box<EventSource>>,
    // pub ecap : &'a mut Ecap, // Welcome to <'x> HELL again!!!
    ecap: *mut Ecap,
    running: bool, /* self.run() has been invoked */
}
```

When something (a struct, for example) goes into a queue (a DList, for example), it is out of ANY scope, and clever things like the borrow checker cannot analyze its lifetime, oops...
Sep 14 2021
On Tuesday, 14 September 2021 at 12:53:27 UTC, Adam D Ruppe wrote:
> I had a problem just like this before because I was sending objects through the pipe. And while they were in the pipe -

```rust
    pub fn msg(&self, code: u32) {
        let ptr: *const u32 = &code;
        let n = unsafe { write(self.mxfd, ptr as *const void, 4) };
        if -1 == n {
            panic!("write({}): {:?}", self.mxfd, Error::last_os_error());
        }
    }
```

I failed to implement the message queue as a wrapper over a doubly linked list; the Rust borrow checker beat me :)
Sep 14 2021
On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven Schveighoffer wrote:
> This project is too big and complex

Really, "too big and complex"? It's as simple as a tabouret :) It's just a toy/hobby 'project'.
Sep 14 2021
On Tuesday, 14 September 2021 at 14:40:55 UTC, eugene wrote:
> On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven Schveighoffer wrote:
>> This project is too big and complex
>
> Really, "too big and complex"? It's as simple as a tabouret :) It's just a toy/hobby 'project'.

A 5-pound phone isn't "too heavy" for an adult to carry, but it won't sell well. It's not just about capabilities but about what efforts people are willing to expend.

I would troubleshoot your issue by gradually making it @safe and thinking about exceptions. One exception I didn't think about earlier was the 'misaligned pointer' one that I said I suppressed just to find the next @safe complaint:

https://dlang.org/spec/garbage.html says:

> Do not misalign pointers if those pointers may point into the GC heap,

So even if the lifetimes of your EventSource structs are fixed, the GC can reap the object they're pointing to. You could fix this by having a 128-bit struct and passing C an index into it, so to speak.
Sep 14 2021
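A sketch of the "index instead of pointer" idea, assuming Linux and druntime's `core.sys.linux.epoll` bindings: the kernel only ever sees an integer token in `epoll_data.u64`, and the objects themselves live in a GC-scanned associative array, so neither their lifetime nor pointer alignment depends on memory the GC cannot see. `TokenQueue` and `EventSource` are hypothetical names; this is not how the thread's project is structured.

```d
import core.memory : GC;
import core.sys.linux.epoll;
import core.sys.posix.unistd : close, pipe, write;

// Hypothetical stand-in for the thread's EventSource class.
class EventSource
{
    int fd;
    this(int fd) { this.fd = fd; }
}

// Hypothetical queue: the kernel stores only an integer token,
// while the EventSource objects live in a GC-scanned associative array.
final class TokenQueue
{
    private int epfd;
    private EventSource[ulong] live; // GC-visible, keeps sources alive
    private ulong nextToken = 1;

    this()
    {
        epfd = epoll_create1(0);
        assert(epfd >= 0, "epoll_create1 failed");
    }

    ulong add(EventSource es, uint events = EPOLLIN)
    {
        immutable tok = nextToken++;
        epoll_event ev;
        ev.events = events;
        ev.data.u64 = tok; // a token, never a GC pointer
        immutable r = epoll_ctl(epfd, EPOLL_CTL_ADD, es.fd, &ev);
        assert(r == 0, "epoll_ctl(ADD) failed");
        live[tok] = es;
        return tok;
    }

    void remove(ulong tok)
    {
        auto es = live[tok];
        epoll_ctl(epfd, EPOLL_CTL_DEL, es.fd, null);
        live.remove(tok); // from now on normal GC rules apply
    }

    // Returns the source of one ready event, or null on timeout.
    EventSource waitOne(int timeoutMs)
    {
        epoll_event ev;
        immutable n = epoll_wait(epfd, &ev, 1, timeoutMs);
        if (n <= 0)
            return null;
        return live.get(ev.data.u64, null);
    }

    void shutdown() { close(epfd); }
}

unittest
{
    int[2] fds;
    assert(pipe(fds) == 0);
    auto q = new TokenQueue();
    immutable tok = q.add(new EventSource(fds[0])); // only reference: `live`
    GC.collect();
    ubyte b = 1;
    assert(write(fds[1], &b, 1) == 1);
    auto es = q.waitOne(1000);
    assert(es !is null && es.fd == fds[0]);
    q.remove(tok);
    q.shutdown();
    close(fds[0]);
    close(fds[1]);
}
```

The trade-off is one hash lookup per event; in exchange, removing the token is the only thing that ends an object's registered lifetime.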
On Tuesday, 14 September 2021 at 14:56:00 UTC, jfondren wrote:
> You could fix this by having a 128-bit struct and passing C an index into it

It is another "not so funny joke", isn't it?

Look:

```c
typedef union epoll_data {
    void    *ptr;
    int      fd;
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;

struct epoll_event {
    uint32_t     events;    /* Epoll events */
    epoll_data_t data;      /* User data variable */
} __EPOLL_PACKED;

// inside the system
struct epoll_event {
    __u32 events;
    __u64 data;
} EPOLL_PACKED;
```

and notice

```d
align (1) struct EpollEvent {
    align(1):
    uint event_mask;
    EventSource es;
    /* just do not want to use that union, epoll_data_t */
}
static assert(EpollEvent.sizeof == 12);
```
Sep 14 2021
On Tuesday, 14 September 2021 at 15:37:27 UTC, eugene wrote:
> On Tuesday, 14 September 2021 at 14:56:00 UTC, jfondren wrote:
>> You could fix this by having a 128-bit struct and passing C an index into it
>
> It is another "not so funny joke", isn't it?

No. And when was the first one?

> ```d
> align (1) struct EpollEvent {
>     align(1):
>     uint event_mask;
>     EventSource es;
>     /* just do not want to use that union, epoll_data_t */
> }
> static assert(EpollEvent.sizeof == 12);
> ```

That's 96 bits. Add 32.

```d
class EventSource { }

align(1) struct EpollEvent {
    align(1):
    uint event_mask;
    EventSource es;
}

struct OuterEpollEvent {
    int _dummy;
    uint event_mask;
    EventSource es;
}

EpollEvent* epollEvent(return ref OuterEpollEvent ev) @trusted {
    return cast(EpollEvent*) &ev.event_mask;
}

void dumpEpollEvent(EpollEvent* ev) @trusted {
    import std.stdio : writeln;
    writeln(*ev);
}

unittest {
    // can't be @safe:
    // Error: field `EpollEvent.es` cannot modify misaligned pointers in `@safe` code
    EpollEvent ev;
    ev.es = new EventSource; // misaligned
}

@safe unittest {
    // this is fine
    OuterEpollEvent ev;
    ev.event_mask = 0;
    ev.es = new EventSource; // not misaligned
    ev.epollEvent.dumpEpollEvent;
}
```
Sep 14 2021
On Tuesday, 14 September 2021 at 16:07:00 UTC, jfondren wrote:
> No. And when was the first one?

Here:

On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:
> auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);

What? Allocate struct epoll_event on the heap? It is a feeble joke ;)

```c
static int ecap__add(int fd, void *dptr)
{
    struct epoll_event waitfor = {0};
    int flags, r;

    waitfor.data.ptr = dptr;
    r = epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &waitfor);
    if (-1 == r) {
```

All fds (sockets, timers, etc.) are added the same way, and the corresponding EventSources are not destroyed by the GC.
Sep 14 2021
On Tuesday, 14 September 2021 at 16:15:20 UTC, eugene wrote:
> On Tuesday, 14 September 2021 at 16:07:00 UTC, jfondren wrote:
>> No. And when was the first one?
>
> here: On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:
>> auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);
>
> What? Allocate struct epoll_event on the heap? It is a feeble joke ;)

It is an example of deliberately static storage that does not fix your problem, thereby proving that the broken lifetimes of the struct are not your only problem. I explained that one at the time, and I explained this one. If it comes with an explanation, it's probably not a joke.

> [`ecap__add()` C code]
>
> All fd's (sockets, timers etc) are added the same way and corresponding EventSources are not destroyed by GC.

GC needs to be able to stop your program and find all of the live objects in it. The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
Sep 14 2021
On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
> GC needs to be able to stop your program

nice fantasies...

> and find all of the live objects in it. The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.

where did you find 'misaligned pointer'?...
Sep 14 2021
On Tuesday, 14 September 2021 at 16:56:52 UTC, eugene wrote:
> On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
>> GC needs to be able to stop your program
>
> nice fantasies...
>
>> and find all of the live objects in it. The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
>
> where did you find 'misaligned pointer'?...

It doesn't seem like communication between us is possible, in the "a five-pound phone won't sell" way. You can find this answer explained with code in an earlier post.

My suggestion remains: try troubleshooting by making your program @safe.
Sep 14 2021
On Tuesday, 14 September 2021 at 17:02:32 UTC, jfondren wrote:
> It doesn't seem like communication between us is possible

and you are wrong, as usual ,)

> in the "a five-pound phone won't sell" way.

I am not a 'selling boy'

> My suggestion remains: try troubleshooting by making your program @safe.

Please, take that clever bot away.
Sep 14 2021
On 9/14/21 2:05 PM, eugene wrote:
> On Tuesday, 14 September 2021 at 17:02:32 UTC, jfondren wrote:
>> It doesn't seem like communication between us is possible
>
> and you are wrong, as usual ,)
>
> [...]
>
> Please, take that clever bot away.

People are trying to help you here. With that attitude, you are likely to stop getting help.

-Steve
Sep 14 2021
On Tuesday, 14 September 2021 at 18:33:33 UTC, Steven Schveighoffer wrote:
> People are trying to help you here.

Then, answer the questions. Why are those sg0 and sg1 'collected' by this so f... antastic GC?
Sep 14 2021
On 9/14/21 9:56 AM, eugene wrote:
> On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
>> The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
>
> where did you find 'misaligned pointer'?...

I think it's the align(1) for EpollEvent.

I was able to reproduce the segmentation fault and was seemingly able to fix it by keeping the EventSource class references alive, adding a constructor:

```d
align (1) struct EpollEvent {
    align(1):
    uint event_mask;
    EventSource es;

    this(uint event_mask, EventSource es) {
        this.event_mask = event_mask;
        this.es = es;
        living ~= es;  // <-- Introduced this constructor for this line
    }
    /* just do not want to use that union, epoll_data_t */
}

// Here is the array that keeps EventSource alive:
EventSource[] living;
```

If that really is the fix, of course the references must be taken out of that container when possible.

Ali
Sep 14 2021
On Tuesday, 14 September 2021 at 20:59:14 UTC, Ali Çehreli wrote:
> On 9/14/21 9:56 AM, eugene wrote:
>> On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
>>> The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
>>
>> where did you find 'misaligned pointer'?...
>
> I think it's the align(1) for EpollEvent.
>
> I was able to reproduce the segmentation fault and was seemingly able to fix it by keeping the EventSource class references alive, adding a constructor:
>
> [...]

Yep. This patch is sufficient to prevent the segfault:

```
diff --git a/engine/ecap.d b/engine/ecap.d
index 71cb646..d57829c 100644
--- a/engine/ecap.d
+++ b/engine/ecap.d
@@ -32,6 +32,7 @@ final class EventQueue {
     private int id;
     private bool done;
     private MessageQueue mq;
+    private EventSource[] sources;
 
     private this() {
         id = epoll_create1(0);
@@ -52,6 +53,7 @@ final class EventQueue {
 
     void registerEventSource(EventSource es) {
         auto e = EpollEvent(0, es);
+        sources ~= es;
         int r = epoll_ctl(id, EPOLL_CTL_ADD, es.id, &e);
         assert(r == 0, "epoll_ctl(ADD) failed");
     }
@@ -63,7 +65,10 @@ final class EventQueue {
     }
 
     void deregisterEventSource(EventSource es) {
+        import std.algorithm : countUntil, remove;
+
         auto e = EpollEvent(0, es);
+        sources = sources.remove(sources.countUntil(es));
         int r = epoll_ctl(id, EPOLL_CTL_DEL, es.id, &e);
         assert(r == 0, "epoll_ctl(DEL) failed");
     }
```

Going through the project and adding `@safe:` to the top of everything results in these errors:

https://gist.github.com/jrfondren/c7f7b47be057273830d6a31372895895

Some I/O, some system functions, some weird C APIs... and misaligned assignments to EpollEvent.es.

So debugging with @safe isn't bad, but I'd still like rustc-style error codes:

```
engine/ecap.d(89): Error E415: field `EpollEvent.es` cannot assign to misaligned pointers in `@safe` code

$ dmd --explain E415

Yeah see, the garbage collector only looks for pointers at pointer-aligned addresses.
```
Sep 15 2021
On Wednesday, 15 September 2021 at 23:07:45 UTC, jfondren wrote:
> Yep. This patch is sufficient to prevent the segfault:

Your idea (hold references to all event sources somewhere) is quite clear, but it confuses me a bit, since

1) there **are** references to all event sources **already**; they are data members in StageMachine subclasses.
2) only two of many event sources are destroyed, namely those referenced by sg1 and sg0 in the Stopper machine of echo-client. All other event sources are not destroyed.
Sep 19 2021
On Tuesday, 14 September 2021 at 20:59:14 UTC, Ali Çehreli wrote:
> On 9/14/21 9:56 AM, eugene wrote:
>> On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
>>> The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
>>
>> where did you find 'misaligned pointer'?...
>
> I think it's the align(1) for EpollEvent.

The definition of this struct was taken from /usr/include/dmd/druntime/import/core/sys/linux/epoll.d

```d
version (X86_Any)
{
    align(1) struct epoll_event
    {
    align(1):
        uint events;
        epoll_data_t data;
    }
}
```

I am using my own definition because the data field has no special meaning for the Linux kernel; it is returned as is by epoll_wait(). I always use this field as a pointer to EventSource.

This struct has to be 12 bytes on x86; in /usr/include/linux/eventpoll.h it looks like this:

```c
struct epoll_event {
    __u32 events;
    __u64 data;
} EPOLL_PACKED;
```

At some point I had a different definition (align only inside):

```d
struct EpollEvent {
align(1):
    uint event_mask;
    EventSource es;
    /* just do not want to use that union, epoll_data_t */
}
```

But it turned out that:

1) relatively fresh gdc (from Linux Mint 19) does the right thing: the structure is packed and is 12 bytes in size.
2) old gdc (from Debian 8) produces a 16-byte EpollEvent, and both programs get SIGSEGV right after the first return from epoll_wait(), hence this check:

```d
static assert(EpollEvent.sizeof == 12);
```

If the crash were caused by EpollEvent alignment, the programs would always segfault very soon after start, right after the very first return from epoll_wait().
Sep 18 2021
On Saturday, 18 September 2021 at 09:39:24 UTC, eugene wrote:
> The definition of this struct was taken from /usr/include/dmd/druntime/import/core/sys/linux/epoll.d
> ...
> If the reason for crash was in EpollEvent alignment, programs would segfaults always very soon after start, just right after the very first return from epoll_wait().

The struct's fine as far as libc and the kernel are concerned. epoll_wait is not even using those 64 bits or interpreting them as containing any kind of data, it's just moving them around for the caller to use. It's also not a hardware error to interpret those bits where they are as a pointer. They are, however, not 64-bit aligned, so D's GC is collecting objects that only they point to.
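The alignment claim can be checked at compile time. A minimal sketch, assuming a 64-bit target and using `Object` as a stand-in for the thread's `EventSource` class: the packed struct is 12 bytes, and its reference field starts at byte 4, which is not an 8-byte boundary. D's garbage collection documentation asks that pointers into the GC heap not be misaligned, so a reference stored only at such an offset should not be counted on to keep its object alive. `isPointerAligned` is a hypothetical helper for the illustration.

```d
import std.stdio : writefln;

// Packed like druntime's epoll_event on x86; `Object` stands in for EventSource.
align(1) struct EpollEvent
{
align(1):
    uint event_mask; // 4 bytes at offset 0
    Object es;       // class reference, 8 bytes on a 64-bit target, at offset 4
}

/// Hypothetical helper: true when an offset is a multiple of the pointer alignment.
bool isPointerAligned(size_t offset)
{
    return offset % (void*).alignof == 0;
}

static if (size_t.sizeof == 8)
{
    // Same size as the kernel's packed 12-byte struct epoll_event on x86_64.
    static assert(EpollEvent.sizeof == 12);
    // The reference starts at byte 4, so it is not 8-byte aligned.
    static assert(EpollEvent.es.offsetof == 4);
    static assert(!isPointerAligned(EpollEvent.es.offsetof));
}

void main()
{
    writefln("sizeof = %s, es.offsetof = %s, pointer-aligned = %s",
             EpollEvent.sizeof, EpollEvent.es.offsetof,
             isPointerAligned(EpollEvent.es.offsetof));
}
```

The check only documents the layout; whether this misalignment is what actually killed the Stopper objects is debated in the replies that follow.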
Sep 18 2021
On Saturday, 18 September 2021 at 09:54:05 UTC, jfondren wrote:
> On Saturday, 18 September 2021 at 09:39:24 UTC, eugene wrote:
>> The definition of this struct was taken from /usr/include/dmd/druntime/import/core/sys/linux/epoll.d
>> ...
>> If the reason for crash was in EpollEvent alignment, programs would segfaults always very soon after start, just right after the very first return from epoll_wait().
>
> The struct's fine as far as libc and the kernel are concerned. epoll_wait is not even using those 64 bits or interpreting them as containing any kind of data, it's just moving them around for the caller to use. It's also not a hardware error to interpret those bits where they are as a pointer.

Exactly.

> They are however not 64-bit aligned so D's GC is collecting objects that only they point to.

Ok...

1) There are 303 event sources in echo-server: 200 in RX machines (100 Ios and 100 Timers), 100 Ios in TX machines, and finally 3 in Listener (one Io and two signals, **sg0 and sg1**). All of these 303 references in the EpollEvent struct are 'misaligned' in this sense, but **none of the corresponding objects are collected**.

2) There are 32 event sources in echo-client: 20 in RX machines (10 Ios and 10 Timers), 10 Ios in TX machines, and finally 2 in the Stopper machine (**sg0 and sg1**, for handling SIGINT and SIGTERM), but **only the last two are collected**; all the others are not. Here is the problem.
Sep 19 2021
> reference-containing struct that vanishes on the return of your corresponding function

I do not think it's a problem, otherwise **both programs would not work at all**. However, echo-server works without any surprises; echo-client also works, except that the EventSources pointed to by the sg0 and sg1 data members of the Stopper instance are cleared by the GC soon after echo-client starts. This does not mean that echo-client gets SIGSEGV right after those objects are destroyed, no: the crash happens later, upon receiving SIGINT or SIGTERM.
Sep 19 2021
On Sunday, 19 September 2021 at 08:51:31 UTC, eugene wrote:
>> reference-containing struct that vanishes on the return of your corresponding function
>
> I do not think it's a problem, otherwise **both programs would not work at all**.

The GC doesn't reliably punish objects living past there not being any references to them because it's not always operating. If you have a tight loop where the GC is never invoked, you can do whatever crazy things you want. Your program doesn't crash until you hit ctrl-C, after all.

> Look... I have added stopper into an array...
>
> ```d
> Stopper[] stoppers;
> auto stopper = new Stopper();
> stoppers ~= stopper;
> stopper.run();
> ```
>
> and, you won't believe, this have fixed the problem - the objects, referenced by sg0 and sg1 are not destroyed anymore.

This is a sufficient patch to prevent the segfault:

```
diff --git a/echo_client.d b/echo_client.d
index 1f8270e..5ec41df 100644
--- a/echo_client.d
+++ b/echo_client.d
@@ -32,7 +32,7 @@ void main(string[] args) {
         sm.run();
     }
 
-    auto stopper = new Stopper();
+    scope stopper = new Stopper();
     stopper.run();
 
     writeln(" === Hello, world! === ");
```

The `scope` stack-allocates Stopper.

This is also a sufficient patch to prevent the segfault:

```
diff --git a/echo_client.d b/echo_client.d
index 1f8270e..0b968a8 100644
--- a/echo_client.d
+++ b/echo_client.d
@@ -39,4 +39,6 @@ void main(string[] args) {
     auto md = new MessageDispatcher();
     md.loop();
     writeln(" === Goodbye, world! === ");
+    writeln(stopper.sg0.number);
+    //writeln(stopper.sg1.number);
 }
```

Either one of those writelns will do it.

Without either of the above, STOPPER is destroyed a few seconds into a run of echo-client:

```
$ ./echo-client | grep STOPPER
'STOPPER' registered 24 (esrc.Signal)
'STOPPER' registered 25 (esrc.Signal)
'STOPPER INIT' got 'M0' from 'SELF'
'STOPPER' enabled 24 (esrc.Signal)
'STOPPER' enabled 25 (esrc.Signal)
(seconds pass)
stopper.Stopper.~this(): STOPPER destroyed
```

You can hit ctrl-C prior to Stopper's destruction and there's no segfault.
(On my system, it won't show the usual 'segfault' message to the terminal when grep is filtering like that, but if you turn on coredumps you can see one is only generated with a ctrl-C after Stopper's destroyed.)

So this looks at first to me like a bug: dmd is allowing Stopper to be collected before the end of its lexical scope if it isn't used later in it. Except, forcing a collection right after `stopper.run()` doesn't destroy it.

Here's a patch that destroys Stopper almost immediately, so that a ctrl-C within milliseconds of the program starting will still segfault it. This also no longer requires the server to be active.

```
diff --git a/engine/edsm.d b/engine/edsm.d
index 513d8a5..ea9ac3a 100644
--- a/engine/edsm.d
+++ b/engine/edsm.d
@@ -176,6 +176,8 @@ class StageMachine {
             "'%s %s' got '%s' from '%s'",
             name, currentStage.name, eventName,
             m.src ? (m.src is this ? "SELF" : m.src.name) : "OS"
         );
+        import core.memory : GC;
+        GC.collect;
 
         if (eventName !in currentStage.reflexes) {
```

valgrind:

```
^C==14893== Thread 1:
==14893== Jump to the invalid address stated on the next line
==14893==    at 0x2: ???
==14893==    by 0x187A3C: void disp.MessageDispatcher.loop()
==14893==    by 0x1BED89: _Dmain
```

With Stopper's collection prevented and some logging around reactTo:

```
^Csi.sizeof = 128
about to react to Message(null, stopper.Stopper, 0, esrc.Signal)
'STOPPER IDLE' got 'S0' from 'OS'
goodbye, world
reacted
 === Goodbye, world! ===
1
ecap.EventQueue.~this
stopper.Stopper.~this(): STOPPER destroyed
```

So the problem here is that ctrl-C causes that message to come, but Stopper's been collected and that address contains garbage. Since the Message in the MessageQueue should keep it alive, I think this is probably a bug in dmd.
Sep 19 2021
On Sunday, 19 September 2021 at 16:27:55 UTC, jfondren wrote:
> So the problem here is that ctrl-C causes that message to come but Stopper's been collected and that address contains garbage.

This is exactly what I was trying to say... Thanks a lot for your in-depth investigation of the trouble! I'll try your patches later.

> Since the Message in the MessageQueue should keep it alive, I think this is probably a bug in dmd.

In the starting post I noted that

- when compiled with gdc, echo-client does not crash
- when compiled with ldc, no crash
- but when compiled with gdc -Os, same crash as with dmd.

The last was (and still is) the most confusing observation for me.
Sep 19 2021
On Sunday, 19 September 2021 at 16:27:55 UTC, jfondren wrote:
> This is a sufficient patch to prevent the segfault:
>
> ```
> diff --git a/echo_client.d b/echo_client.d
> index 1f8270e..5ec41df 100644
> --- a/echo_client.d
> +++ b/echo_client.d
> @@ -32,7 +32,7 @@ void main(string[] args) {
>          sm.run();
>      }
>
> -    auto stopper = new Stopper();
> +    scope stopper = new Stopper();
>      stopper.run();
> ```

I tried a stack-allocated stopper in my second 'simple example' and... No segfault, but:

http://zed.karelia.ru/0/e/oops.png

As can be seen from the screenshot, the destructors of sg0 and sg1 were not called, but at the very end something went completely wrong.
Sep 19 2021
On Sunday, 19 September 2021 at 16:27:55 UTC, jfondren wrote:
> This is also a sufficient patch to prevent the segfault:
>
> ```
> diff --git a/echo_client.d b/echo_client.d
> index 1f8270e..0b968a8 100644
> --- a/echo_client.d
> +++ b/echo_client.d
> @@ -39,4 +39,6 @@ void main(string[] args) {
>      auto md = new MessageDispatcher();
>      md.loop();
>      writeln(" === Goodbye, world! === ");
> +    writeln(stopper.sg0.number);
> +    //writeln(stopper.sg1.number);
>  }
> ```

This one really helps; the program terminates as expected:

```
'MAIN IDLE' got 'T0' from 'OS'
'MAIN IDLE' got 'T0' from 'OS'
^Csi.sizeof = 128
'STOPPER IDLE' got 'S0' from 'OS'
0
 === Goodbye, world! ===
___!!!___edsm.StageMachine.~this(): MAIN destroyed...
ecap.EventQueue.~this
!!! esrc.EventSource.~this() : esrc.Timer (owner MAIN, fd = 4) this 0x7f15e6c870c0
___!!!___edsm.StageMachine.~this(): STOPPER destroyed...
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 5) this 0x7f15e6c8a150
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 6) this 0x7f15e6c8a180
```
Sep 19 2021
On 9/14/21 10:56 AM, jfondren wrote:
> On Tuesday, 14 September 2021 at 14:40:55 UTC, eugene wrote:
>> On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven Schveighoffer wrote:
>>> This project is too big and complex
>>
>> Really, "too big and complex"?
>> It's as simple as a tabouret :)
>> It's just a toy/hobby 'project'.
>
> A 5-pound phone isn't "too heavy" for an adult to carry but it won't sell well. It's not just about capabilities but what efforts people are willing to expend.
>
> I would troubleshoot your issue by gradually making it @safe and thinking about exceptions. One exception I didn't think about earlier was the 'misaligned pointer' one that I said I suppressed just to find the next @safe complaint:
>
> https://dlang.org/spec/garbage.html says:
>
>> Do not misalign pointers if those pointers may point into the GC heap,
>
> So even if the lifetimes of your EventSource structs are fixed, the GC can reap the object they're pointing to. You could fix this by having a 128-bit struct and passing C an index into it, so to speak.

I don't think this is the problem. The misaligned pointers are only happening within the stack frame, along with references to the objects stored also in another parameter. So they should not cause problems with the GC. The storage of the references inside other objects is not misaligned.

-Steve
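jfondren's alternative in the quote above (keep the references in GC-visible D memory and give epoll only an index) could look roughly like this. `SourceTable`, its methods, and the one-field `EventSource` class are hypothetical names for illustration, not code from the thread; the epoll calls are left out so the sketch runs anywhere, with comments marking where `epoll_ctl`/`epoll_wait` would use it.

```d
import std.stdio : writeln;

/// Packed like struct epoll_event, but `data` carries a table index.
align(1) struct EpollEvent
{
align(1):
    uint events;
    ulong data; // an integer index, not a reference, so the GC never needs to find it
}
static assert(EpollEvent.sizeof == 12);

/// Hypothetical stand-in for the thread's EventSource class.
class EventSource
{
    int id;
    this(int id) { this.id = id; }
}

/// Hypothetical registry: `slots` is GC-scanned, so registered sources stay alive.
final class SourceTable
{
    private EventSource[] slots;
    private size_t[] freeList; // indices of removed entries, reused by add()

    ulong add(EventSource es)
    {
        if (freeList.length)
        {
            immutable i = freeList[$ - 1];
            freeList = freeList[0 .. $ - 1];
            slots[i] = es;
            return i;
        }
        slots ~= es;
        return slots.length - 1;
    }

    EventSource get(ulong index)
    {
        return slots[cast(size_t) index];
    }

    void remove(ulong index)
    {
        slots[cast(size_t) index] = null; // drop our reference; the GC may now collect it
        freeList ~= cast(size_t) index;
    }
}

void main()
{
    auto table = new SourceTable;
    auto src = new EventSource(5);

    // What registerEventSource would hand to epoll_ctl:
    auto ev = EpollEvent(0, table.add(src));

    // What wait() would do with an event returned by epoll_wait:
    EventSource s = table.get(ev.data);
    writeln(s.id); // prints 5
}
```

Deregistration would call `remove` with the same index that was registered. Steve doubts above that misalignment is the real culprit, so this mainly makes ownership explicit rather than fixing this particular crash.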
Sep 14 2021
On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven Schveighoffer wrote:
> This project is too big and complex for me to diagnose by just reading, it would take some effort

Take a look at https://www.routledge.com/Modeling-Software-with-Finite-State-Machines-A-Practical-Approach/Wagner-Schmuki-Wagner-Wolstenholme/p/book/9780367390860#

'Event/Message Driven State Machines' (http://zed.karelia.ru/mmedia/bin/edsm-g2-rev-h.tar.gz) was inspired by this nice book.
Sep 14 2021
On Monday, 13 September 2021 at 17:54:43 UTC, eugene wrote:
> full src is here
> http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz

I've also made two simple examples, just in case:

http://zed.karelia.ru/0/e/edsm-in-d-simple-example-1.tar.gz

The program does nothing, just waits for ^C, and does not crash upon SIGINT.

Now, let's put some pressure on the garbage collector:

http://zed.karelia.ru/0/e/edsm-in-d-simple-example-2.tar.gz

Every 10 ms, do some allocations:

```d
void mainIdleEnter() {
    tm0.enable();
    tm0.heartBeat(10); // milliseconds
}

void mainIdleT0(StageMachine src, Object o) {
    int[] a;
    foreach (k; 0 .. 1000) {
        a ~= k;
    }
}
```

Three seconds after the start, destructors are called:

```
edsm-in-d-simple-example-2 $ ./test | grep owner
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 5) this 0x7fa267872150
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 6) this 0x7fa267872180
```

After this happens, pressing ^C results in a segfault.
Sep 19 2021
On Sunday, 19 September 2021 at 20:12:45 UTC, eugene wrote:
> On Monday, 13 September 2021 at 17:54:43 UTC, eugene wrote:
>> full src is here
>> http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz
>
> I've also made two simple examples, just in case -
> http://zed.karelia.ru/0/e/edsm-in-d-simple-example-1.tar.gz
> Program does nothing, just waits for ^c, does not crash upon SIGINT.
> Now, let's put some pressure on garbage collector -
> http://zed.karelia.ru/0/e/edsm-in-d-simple-example-2.tar.gz

I rearranged the code of main() like this:

```d
void main(string[] args) {
    auto Main = new Main();
    auto stopper = new Stopper();
    Main.run();
    stopper.run();
    writeln(" === Hello, world! === ");
    auto md = new MessageDispatcher();
    md.loop();
    writeln(" === Goodbye, world! === ");
}
```

And it works correctly! Miracles... :)
Sep 19 2021
On Sunday, 19 September 2021 at 21:10:16 UTC, eugene wrote:
> I rearranged the code of main() like this:

A similar rearrangement fixed the echo-client as well. (I moved the creation of Stopper to the very beginning of main().)
Sep 20 2021
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
> And the most strange thing is this

It is an echo-server/echo-client pair, and it is echo-client that crashes upon SIGINT. echo-server contains very similar code:

```d
class Listener : StageMachine {

    enum ulong M0_WORK = 0;
    enum ulong M1_WORK = 1;
    enum ulong M0_GONE = 0;

    RestRoom workerPool;
    ushort port;
    TCPListener reception;
    Signal sg0, sg1;

    this(RestRoom wPool, ushort port = 1111) {
        super("LISTENER");
        workerPool = wPool;
        this.port = port;

        Stage init, work;
        init = addStage("INIT", &listenerInitEnter);
        work = addStage("WORK", &listenerWorkEnter);

        init.addReflex("M0", work);

        work.addReflex("L0", &listenerWorkL0);
        work.addReflex("M0", &listenerWorkM0);
        work.addReflex("S0", &listenerWorkS0);
        work.addReflex("S1", &listenerWorkS1);
    }

    void listenerInitEnter() {
        reception = newTCPListener(port);
        sg0 = newSignal(Signal.sigInt);
        sg1 = newSignal(Signal.sigTerm);
        msgTo(this, M0_WORK);
    }

    // ... (rest of the class omitted)
}
```

but it does not crash (destruc). The only significant difference is that it has a TCPListener instance, besides absolutely the same sg0 and sg1 'channels'.
Sep 13 2021
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
> Then after pressing ^C (SIGINT) the program gets SIGSEGV, since references to sg0 and sg1 are no longer valid (they are "sitting" in epoll_event structure).

```
engine/ecap.d(54): Error: field `EpollEvent.es` cannot assign to misaligned pointers in `@safe` code
engine/ecap.d(56): Error: cannot take address of local `e` in `@safe` function `registerEventSource`
```

from adding @safe to ecap.EventQueue.registerEventSource, and then from using a @trusted block to silence the first complaint.

Instead of using a temporary EpollEvent array in EventQueue.wait, you could make the array an instance variable and have registerEventSource populate it directly, so that the GC can always trace from this array to an EventSource.

... however, I don't think this fixes your problem, or is your only problem, since the segfault's still observed when this memory is leaked:

```d
void registerEventSource(EventSource es) {
    import core.memory : pureMalloc, GC;
    auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);
    p.event_mask = 0;
    p.es = es;
    GC.addRoot(p);
    int r = epoll_ctl(id, EPOLL_CTL_ADD, es.id, p);
    assert(r == 0, "epoll_ctl(ADD) failed");
}
```
Sep 13 2021
On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:
> Instead of using a temporary EpollEvent array in EventQueue.wait, you could make the array an instance variable and have registerEventSource populate it directly

Actually, the initial version of all that used an array allocated in the constructor, but then (while struggling with the GC) I thought that an array on the stack would put less pressure on the GC...

... It seems I said something stupid just now )
Sep 13 2021
On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:
> ```d
> auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);
> ```

What? Allocate struct epoll_event on the heap? It is a feeble joke ;)

```c
static int ecap__add(int fd, void *dptr)
{
    struct epoll_event waitfor = {0};
    int flags, r;

    waitfor.data.ptr = dptr;
    r = epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &waitfor);
    if (-1 == r) {
```

All fds (sockets, timers, etc.) are added the same way, and the corresponding EventSources are not destroyed by the GC.
Sep 14 2021
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
> Then after pressing ^C (SIGINT) the program gets SIGSEGV, since references to sg0 and sg1 are no longer valid (they are "sitting" in epoll_event structure).

... forgot to mention, it crashes here:

```d
bool wait() {
    const int maxEvents = 8;
    EpollEvent[maxEvents] events;

    if (done)
        return false;

    int n = epoll_wait(id, events.ptr, maxEvents, -1);
    if (-1 == n)
        return false;

    foreach (k; 0 .. n) {
        EventSource s = events[k].es;
        ulong ecode = s.eventCode(events[k].event_mask); // <<<<< SIGSEGV
```

sg0/sg1 are destroyed, so s points to the wrong location.
Sep 14 2021
On 9/14/21 7:31 AM, eugene wrote:
> On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
>> Then after pressing ^C (SIGINT) the program gets SIGSEGV, since references to sg0 and sg1 are no longer valid (they are "sitting" in epoll_event structure).
>
> ... forget to mention, crashes here:
>
> ```d
> bool wait() {
>     const int maxEvents = 8;
>     EpollEvent[maxEvents] events;
>
>     if (done)
>         return false;
>
>     int n = epoll_wait(id, events.ptr, maxEvents, -1);
>     if (-1 == n)
>         return false;
>
>     foreach (k; 0 .. n) {
>         EventSource s = events[k].es;
>         ulong ecode = s.eventCode(events[k].event_mask); // <<<<< SIGSEGV
> ```
>
> sg0/sg1 are destroyed, so s points to wrong location.

Note that s likely still points at a valid memory address. However, when an object is destroyed, its vtable is nulled out (precisely to cause a segfault if you try to use an already-freed object). There is also the possibility the memory block has been reallocated to something else, and that is causing the segfault. But if the segfault is consistent, most likely it's the former problem.

-Steve
Sep 14 2021
On Tuesday, 14 September 2021 at 12:13:15 UTC, Steven Schveighoffer wrote:
> On 9/14/21 7:31 AM, eugene wrote:
>> On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
>>     EventSource s = events[k].es;
>>     ulong ecode = s.eventCode(events[k].event_mask); // <<<<< SIGSEGV
>
> Note that s likely still points at a valid memory address.

Yeah, this address is obtained from the OS (the epoll_event struct); the compiler cannot zero it.

> However, when an object is destroyed, its vtable is nulled out (precisely to cause a segfault if you try to use an already-freed object).

That's right: calling the eventCode() method results in a segfault.
Sep 14 2021
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
> The instance of Stopper is created in the scope of main():
>
> ```d
> void main(string[] args) {
>     auto stopper = new Stopper();
>     stopper.run();
> ```

Look... I have added stopper into an array...

```d
Stopper[] stoppers;
auto stopper = new Stopper();
stoppers ~= stopper;
stopper.run();
```

and, you won't believe it, this has fixed the problem: the objects referenced by sg0 and sg1 are not destroyed anymore.

This is a much more acceptable 'solution' for me than adding that whole bunch of event sources into some array. But I'm still puzzled: what is so special about the stopper? echo-server has its 'reception' just as a single variable, and it works fine.
Sep 19 2021
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
> I do not understand at all why GC considers those sg0 and sg1 as unreferenced. And why old gdc (without -Os) and old ldc do not.

Conclusion:

There's nothing special about sg0 and sg1, except that they're part of Stopper. The Stopper in main() is collected before the end of main() because it's not used later in the function and because there are apparently no other references to it that the GC can find (because the only reference is hidden inside the Linux epoll API).

More discussion:

https://forum.dlang.org/thread/siajpj$3p2$1@digitalmars.com
http://dpldocs.info/this-week-in-d/Blog.Posted_2021_09_20.html

Misaligned pointers are one way to hide objects from the GC, but in this case they really weren't relevant. I just had a confused idea of the epoll API, because I'd only ever used it with a single static array that all epoll functions referenced, similarly to poll(). But actually epoll copies the event structures that you give it, and returns them on epoll_wait. That's wild.
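The last point, that epoll keeps its own copy of the `epoll_event` passed to `epoll_ctl` and returns that copy from `epoll_wait`, can be checked with a small Linux-only program. It registers a pipe using an `epoll_event` that lives only in a helper's stack frame, then reads the `data` field back after that frame is gone. Because the only copy sits in kernel memory, a reference stored there is invisible to the GC. A sketch using druntime's `core.sys.linux.epoll` bindings; the helper names `registerWithTemporary` and `roundTrip` are made up for the illustration.

```d
// Linux only.
import core.sys.linux.epoll;
import core.sys.posix.unistd : close, pipe, write;
import std.exception : enforce;
import std.stdio : writeln;

/// Registers `fd` using a temporary epoll_event; the kernel keeps its own copy.
void registerWithTemporary(int epfd, int fd, ulong token)
{
    epoll_event ev;      // exists only in this stack frame
    ev.events = EPOLLIN;
    ev.data.u64 = token; // opaque to the kernel, handed back verbatim
    enforce(epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0, "epoll_ctl failed");
}   // `ev` is gone after this point

/// Returns the `data` value the kernel reports for a readable pipe.
ulong roundTrip(ulong token)
{
    int[2] fds;
    enforce(pipe(fds) == 0, "pipe failed");
    immutable epfd = epoll_create1(0);
    enforce(epfd >= 0, "epoll_create1 failed");
    scope (exit)
    {
        close(fds[0]);
        close(fds[1]);
        close(epfd);
    }

    registerWithTemporary(epfd, fds[0], token);

    ubyte b = 1;
    enforce(write(fds[1], &b, 1) == 1, "write failed");

    epoll_event[1] events;
    enforce(epoll_wait(epfd, events.ptr, 1, 1000) == 1, "epoll_wait failed");
    return events[0].data.u64;
}

void main()
{
    writeln(roundTrip(0xDEAD_BEEF)); // the token comes back unchanged
}
```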
Sep 21 2021
On Tuesday, 21 September 2021 at 19:42:48 UTC, jfondren wrote:
> On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
> There's nothing special about sg0 and sg1, except that they're part of Stopper. The Stopper in main() is collected before the end of main() because it's not used later in the function

Okay, but how could you explain this then:

```d
void main(string[] args) {
    auto Main = new Main();
    Main.run();

    auto stopper = new Stopper();
    stopper.run();
```

```
d-lang/edsm-in-d-simple-example-2 $ ./test | grep STOPPER
'STOPPER' registered 5 (esrc.Signal)
'STOPPER' registered 6 (esrc.Signal)
'STOPPER INIT' got 'M0' from 'SELF'
'STOPPER' enabled 5 (esrc.Signal)
'STOPPER' enabled 6 (esrc.Signal)
___!!!___edsm.StageMachine.~this(): STOPPER destroyed...
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 5) this 0x7fc9ab1a9150
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 6) this 0x7fc9ab1a9180
```

Now, change the operation order in main like this:

```d
void main(string[] args) {
    auto Main = new Main();
    auto stopper = new Stopper();
    Main.run();
    stopper.run();
```

```
d-lang/edsm-in-d-simple-example-2 $ ./test | grep STOPPER
'STOPPER' registered 5 (esrc.Signal)
'STOPPER' registered 6 (esrc.Signal)
'STOPPER INIT' got 'M0' from 'SELF'
'STOPPER' enabled 5 (esrc.Signal)
'STOPPER' enabled 6 (esrc.Signal)
```

Everything is OK now: stopper is not collected soon after start. So the question is how this innocent-looking change can affect GC behavior so much?...

> Misaligned pointers are one way to hide objects from the GC but in this case they really weren't relevant.

For sure.
Sep 21 2021
On Tuesday, 21 September 2021 at 20:17:15 UTC, eugene wrote:
> Now, change operation order in the main like this:
>
> ```d
> void main(string[] args) {
>     auto Main = new Main();
>     auto stopper = new Stopper();
>     Main.run();
>     stopper.run();
> ```
>
> ```
> d-lang/edsm-in-d-simple-example-2 $ ./test | grep STOPPER
> 'STOPPER' registered 5 (esrc.Signal)
> 'STOPPER' registered 6 (esrc.Signal)
> 'STOPPER INIT' got 'M0' from 'SELF'
> 'STOPPER' enabled 5 (esrc.Signal)
> 'STOPPER' enabled 6 (esrc.Signal)
> ```
>
> Everything is Ok now,

I don't think this is reliably OK. If you're not using Stopper later in the function, and if there are no other references to it, then the GC can collect it. It just has no obligation to collect it, so minor differences like this might prevent that from happening for particular compilers/options/versions.

and Java's just as aggressive about potential collection. It's just something that mostly doesn't matter until it becomes an incredibly weird bug with code like yours.
Sep 21 2021
On Tuesday, 21 September 2021 at 20:28:33 UTC, jfondren wrote:
>> Everything is Ok now,
>
> I don't think this is reliably OK. If you're not using Stopper later in the function, and if there are no other references to it, then the GC can collect it. It just has no obligation to collect it, so minor differences like this might prevent that from happening for particular compilers/options/versions.

I saw a thread on this forum named 'Why are so many programmers do not like GC' or something like that. After this adventure I would add my 5 cents: because (sometimes, ok) it behaves absolutely unpredictably, depending on operation order, "particular compilers/options/versions", etc.
Sep 22 2021
On Wednesday, 22 September 2021 at 08:03:59 UTC, eugene wrote:
> On Tuesday, 21 September 2021 at 20:28:33 UTC, jfondren wrote:
>>> Everything is Ok now,
>>
>> I don't think this is reliably OK. If you're not using Stopper later in the function, and if there are no other references to it, then the GC can collect it. It just has no obligation to collect it, so minor differences like this might prevent that from happening for particular compilers/options/versions.
>
> I saw a thread on this forum named 'Why are so many programmers do not like GC' or something like that. After this adventure I would add my 5 cents: because (sometimes, ok) it behaves absolutely unpredictable, depending on operation order, "particular compilers/options/versions" etc.

Nondeterminism in heap collection is a very common complaint, but here we have data that is apparently on the stack that is collected nondeterministically. I can't say I like that.
Sep 22 2021
On Wednesday, 22 September 2021 at 10:05:05 UTC, jfondren wrote:
> Nondeterminism in heap collection is a very common complaint,

It is another kind of nondeterminism that is usually complained about ("*sometime* in the future GC will collect if it wants" or so).

> but here we have data is that apparently on the stack that is collected nondeterministically. I can't say I like that.

Exactly, so we've finally come to an agreement, great! :)
Sep 22 2021
On Tuesday, 21 September 2021 at 20:17:15 UTC, eugene wrote:
> Now, change operation order in the main like this:

Actually, all the proposed 'fixes'

- use stopper somehow at the end (writeln(stopper.sg0.number))
- change the operation order
- etc.

are strange. I mean it's strange (to me) that these fixes make the garbage collector behave as needed.
Sep 21 2021
On Tue, Sep 21, 2021 at 08:36:49PM +0000, eugene via Digitalmars-d-learn wrote:
> On Tuesday, 21 September 2021 at 20:17:15 UTC, eugene wrote:
>> Now, change operation order in the main like this:
>
> Actually, all proposed 'fixes'
> - use stopper somehow in the end (writeln(stopper.sg0.number))
> - change operation order
> - etc
> are strange. I mean it's strange (for me) that these fixes make garbage collector behave as needed.

It's not strange. You're seeing these problems because you failed to inform the GC about the dependency between Main and stopper. So it's free to assume that these are two independent, unrelated objects, and therefore it can collect either one as soon as there are no more references to it. And since stopper isn't used anymore after declaration, an optimizing compiler is free to assume that it's not needed afterwards, so it's not obligated to keep the reference alive until the end of the function.

Since in actuality there *is* a dependency between these objects, the most "correct" solution is to include a reference to stopper somewhere in Main. Then the GC would be guaranteed never to collect stopper before Main becomes unreferenced.

T

--
Век живи - век учись. А дураком помрёшь. (Live and learn. And you'll die a fool anyway.)
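The "reference to stopper somewhere in Main" suggestion can be sketched as below. `Main` and `Stopper` here are simplified hypothetical stand-ins for the thread's state machines, keeping only the field that matters: once `Main` holds the `Stopper`, the `Stopper` stays reachable for as long as the `Main` is, regardless of how the compiler treats the local variable in `main()`.

```d
import std.stdio : writeln;

/// Simplified stand-in for the thread's Stopper machine.
class Stopper
{
    bool running;
    void run() { running = true; }
}

/// Simplified stand-in for Main: it owns a reference to its Stopper,
/// so the GC sees the dependency through this field.
class Main
{
    private Stopper stopper;

    this(Stopper stopper) { this.stopper = stopper; }

    void run()
    {
        stopper.run();
    }

    bool stopperRunning() const { return stopper.running; }
}

void main()
{
    auto m = new Main(new Stopper());
    m.run();
    writeln(m.stopperRunning()); // prints true
}
```

The Stopper is then exactly as safe as the Main that holds it, which moves the question to what keeps Main reachable.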
Sep 21 2021
On Tuesday, 21 September 2021 at 20:47:41 UTC, H. S. Teoh wrote:
> And since stopper isn't used anymore after declaration, an optimizing compiler is free to assume that it's not needed afterwards, so it's not obligated to keep the reference alive until the end of the function.

It was not obvious to me; I thought lifetimes always last until the end of a scope (main in this case).
Sep 21 2021
On Tuesday, 21 September 2021 at 20:47:41 UTC, H. S. Teoh wrote:
> Век живи - век учись. А дураком помрёшь.

:) The correct version is "Век живи - век учись, всё равно дураком помрёшь." ("Live and learn, and you'll still die a fool all the same.") :)
Sep 21 2021
On Tue, Sep 21, 2021 at 08:17:15PM +0000, eugene via Digitalmars-d-learn wrote:
[...]
> ```d
> void main(string[] args) {
>     auto Main = new Main();
>     Main.run();
>
>     auto stopper = new Stopper();
>     stopper.run();
> ```
[...]
> ```d
> void main(string[] args) {
>     auto Main = new Main();
>     auto stopper = new Stopper();
>     Main.run();
>     stopper.run();
> ```
[...]
> Everything is Ok now, stopper is not collected soon after start. So the question is how this innocent looking change can affect GC behavior so much?...

In the first example, the compiler sees that the lifetime of Main is disjoint from the lifetime of stopper, so it's free to reuse the same stack space (or register(s)) to store both variables. (This is a pretty standard optimization FYI.) So the line `auto stopper = new Stopper();` would overwrite the reference to Main, and the GC would see Main as an unreferenced object and may collect it at any point after the line `Main.run();`.

In the second case, since the lifetimes of Main and stopper overlap, the compiler (probably) conservatively assumes that their lifetimes last until the end of the function, and so reserves disjoint places for them on the stack.

This does not mean you're 100% safe, however. A sufficiently optimizing compiler may determine that since Main and stopper are independent, it is free to reorder the code such that the two lifetimes are independent, and therefore end up with the same situation as the first example.

If Main really depends on the existence of stopper, I'd argue that it really should store a reference to stopper somewhere, so that as long as Main is not unreferenced the GC would not collect stopper.

T

--
What's an anagram of "BANACH-TARSKI"?  BANACH-TARSKI BANACH-TARSKI.
Sep 21 2021
On Tuesday, 21 September 2021 at 20:42:12 UTC, H. S. Teoh wrote:
> A sufficiently optimizing compiler may determine that since Main and stopper are independent, it is free to reorder the code such that the two lifetimes are independent, and therefore end up with the same situation as the first example.

In other words, the compiler is trying to be smarter than the programmer :) With a poor result...

But... it is the **main** function, after all! Maybe main() should be an exception when performing those 'smart' optimizations? ;)

Btw, is there any dmd option for turning all/some optimizations off? Or some 'pragma/attribute'?
Sep 22 2021
On 9/21/21 4:17 PM, eugene wrote:
> On Tuesday, 21 September 2021 at 19:42:48 UTC, jfondren wrote:
>> On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
>> There's nothing special about sg0 and sg1, except that they're part of Stopper. The Stopper in main() is collected before the end of main() because it's not used later in the function
>
> Okay, but how could you explain this then
>
> ```d
> void main(string[] args) {
>     auto Main = new Main();
>     Main.run();
>
>     auto stopper = new Stopper();
>     stopper.run();
> ```

Here is what is happening. The compiler keeps track of how long it needs to keep `stopper` around.

In assembly, the `new Stopper()` call is a function which returns in a register.

On the very next instruction, you are calling the function `stopper.run` where it needs the value of the register (either pushed into an argument register, or put on the call stack, depending on the ABI). Either way, this is the last time in the function the value `stopper` is needed.

Therefore, it does not store it on the stack frame of `main`.

This is an optimization, but one that is taken even without optimizations enabled in some compilers. It's called [dead store elimination](https://en.wikipedia.org/wiki/Dead_store).

Since the register is overwritten by subsequent function calls, there no longer exists a reference to `stopper`, and it gets collected (along with the members that are only referenced via `stopper`).

> Now, change operation order in the main like this:
>
> ```d
> void main(string[] args) {
>     auto Main = new Main();
>     auto stopper = new Stopper();
>     Main.run();
>     stopper.run();
> ```
>
> ```
> d-lang/edsm-in-d-simple-example-2 $ ./test | grep STOPPER
> 'STOPPER' registered 5 (esrc.Signal)
> 'STOPPER' registered 6 (esrc.Signal)
> 'STOPPER INIT' got 'M0' from 'SELF'
> 'STOPPER' enabled 5 (esrc.Signal)
> 'STOPPER' enabled 6 (esrc.Signal)
> ```
>
> Everything is Ok now, stopper is not collected soon after start.
> So the question is how this innocent looking change can affect GC behavior so much?...

In this case, at the point you call `Main.run`, `stopper` is only in a register. Yet, it's needed later, so the compiler has no choice but to put `stopper` on the stack so it has access to it to call `stopper.run`. If it didn't, it's likely that `Main.run` will overwrite that register.

Once it's on the stack, the GC can see it for the full run of `main`. This is why this case is different.

Note that Java is even more aggressive, and might *still* collect it, because it could legitimately set `stopper` to null after the last use to signify that it's no longer needed. I don't anticipate D doing this though.

I recommend you read the blog post, it has details on how this is happening.

How do you fix it? I have proposed a possible solution, but I'm not sure if it's completely sound, see [here](https://forum.dlang.org/post/sichju$2gth$1@digitalmars.com). It may be that this works today, but a future more clever compiler can potentially see through this trick and still not store the pinned value.

I think the spec is wrong to say that just storing something as a local variable should solve the problem. We should follow the lead of other GC-supporting languages, and provide a mechanism to ensure a pointer is scannable by the GC through the entire scope.

-Steve
Sep 22 2021
On Wednesday, 22 September 2021 at 11:44:16 UTC, Steven Schveighoffer wrote:
> Here is what is happening.

Many thanks for such an exhaustive explanation!
Sep 22 2021
On Wednesday, 22 September 2021 at 11:44:16 UTC, Steven Schveighoffer wrote:
> Once it's on the stack, the GC can see it for the full run of `main`. This is why this case is different. Note that Java is even more aggressive, and might *still* collect it, because it could legitimately set `stopper` to null after the last use to signify that it's no longer needed.

And it follows that programming in GC-supporting languages *may* be harder than in languages with manual memory management, right?
Sep 22 2021
On 9/22/21 8:22 AM, eugene wrote:
> On Wednesday, 22 September 2021 at 11:44:16 UTC, Steven Schveighoffer wrote:
>> Once it's on the stack, the GC can see it for the full run of `main`. This is why this case is different. Note that Java is even more aggressive, and might *still* collect it, because it could legitimately set `stopper` to null after the last use to signify that it's no longer needed.
>
> And it follows that programming in GC-supporting languages *may* be harder than in languages with manual memory management, right?

Only when interfacing with C ;) Which admittedly is a stated goal for D.

It's telling that I've been using D for 14 years and never had or seen this problem.

-Steve
Sep 22 2021
On Wednesday, 22 September 2021 at 12:26:53 UTC, Steven Schveighoffer wrote:
> On 9/22/21 8:22 AM, eugene wrote:
>> And it follows that programming in GC-supporting languages *may* be harder than in languages with manual memory management, right?
>
> Only when interfacing with C ;)

I meant this particular trouble of mine... I do not want to understand how and what the compiler generates, I just want to get a working program without any oddities. Nevertheless, thank you again for your nice explanation!

> Which admittedly is a stated goal for D.

I know. And it is just fine to have the ability to use libc (especially system calls) without

> It's telling that I've been using D for 14 years and never had or seen this problem.

Bond. James Bond. :) 25 years of C coding. Now, imagine the shock I was under when I 'discovered' that swapping two lines of code can magically fix my program and make the GC do the right thing. :)

Actually, D is a nice language per se and I truly wish it to be as popular as Java/Python/etc. But these GC ... mmm... 'features' may reduce to zero any wish to learn D, that's about it.

When my C program crashes, I'm 100% sure I made something stupid:
- forgot to initialize a pointer (easy to find and fix)
- did some memory corruption (worse, but then Electric Fence is my best friend)

But if a crash is caused by 'optimization' + GC... It looks like a programmer must keep some implicit/unwritten rules in order to write correct code...
Sep 22 2021
On 9/22/21 11:47 AM, eugene wrote:
> On Wednesday, 22 September 2021 at 12:26:53 UTC, Steven Schveighoffer wrote:
>> On 9/22/21 8:22 AM, eugene wrote:
>>> And it follows that programming in GC-supporting languages *may* be harder than in languages with manual memory management, right?
>
> I meant this particular trouble of mine...

In terms of any kind of memory management, whether it be ARC, manual, GC, or anything else, there will always be pitfalls. It's just that you have to get used to the pitfalls and how to avoid them.

I could see a person used to GC complaining that C requires you to free every pointer *exactly once*. I mean, how can that be acceptable? ;)

> I do not want to understand how and what the compiler generates, I just want to get a working program without any oddities.

And for the most part, you do not. It's just when you travel outside the language, you must obey certain constraints. Those constraints are laid out, and unfortunately not exactly correct (we need to amend the spec/library for this), but given correct constraints, the rules are not super-difficult to follow.

> Nevertheless, thank you again for your nice explanation!

You are welcome!

>> It's telling that I've been using D for 14 years and never had or seen this problem.
>
> Bond. James Bond. :) 25 years of C coding. Now, imagine the shock I was under when I 'discovered' that swapping two lines of code can magically fix my program and make the GC do the right thing. :)

I'm right there with you (been writing code for about 25 years, maybe 26, depending on when I switched majors to CS in college). But realize that C has its share of "shocks" as well, you are just more used to them (or maybe you have been lucky so far?)

> Actually, D is a nice language per se and I truly wish it to be as popular as Java/Python/etc. But these GC ... mmm... 'features' may reduce to zero any wish to learn D, that's about it.

Your experience is not typical though (clearly, as many of us long-time D users had no idea why it was happening). But for sure if this turns you off, I can understand how it can be too frustrating to learn the new rules.

I personally would probably never write C code again if I can help it, despite having decades of experience in C/C++. I did recently have to port a C plugin library from PHP 5 to PHP 7, and it wasn't pleasant.

> When my C program crashes, I'm 100% sure I made something stupid:
> - forgot to initialize a pointer (easy to find and fix)
> - did some memory corruption (worse, but then Electric Fence is my best friend)
>
> But if a crash is caused by 'optimization' + GC... It looks like a programmer must keep some implicit/unwritten rules in order to write correct code...

I find it interesting how you blame yourself for C's idiosyncrasies, but not for D's ;) I would say C has far more pitfalls than D. Check out the undefined behaviors for C.

-Steve
Sep 22 2021
On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven Schveighoffer wrote:
> Your experience is not typical though (clearly, as many of us long-time D users had no idea why it was happening).

Oh, yeah - I have a special knack for bumping into various low-probability things :)

> But for sure if this turns you off, I can understand how it can be too frustrating to learn the new rules.

Show me these rules! Always use an object at the end of a function? Make a second reference to an object somewhere on the heap? The 'problem' here is that there is no clear rule. Any reasonable 'hack' will do.
Sep 23 2021
On 9/23/21 3:27 AM, eugene wrote:
> On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven Schveighoffer wrote:
>> Your experience is not typical though (clearly, as many of us long-time D users had no idea why it was happening).
>
> Oh, yeah - I have a special knack for bumping into various low-probability things :)
>
>> But for sure if this turns you off, I can understand how it can be too frustrating to learn the new rules.
>
> Show me these rules!

They are here: https://dlang.org/spec/interfaceToC.html#storage_allocation

With the caveat, of course, that the recommendation to "leave a pointer on the stack" is not as easy to follow as one might think with the optimizer fighting against that. We need to add a better way to do that [attempt](https://code.dlang.org/packages/keepalive), but I think it's not guaranteed to work, I've already found ways to prove it fails.

-Steve
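A minimal sketch of the `GC.addRoot` option from that spec page, applied to this thread's situation, where the only reference to an object lives in the kernel's epoll registration (`epoll_event.data.ptr`), which the GC cannot scan. It is Linux-only; the `Source` class and the `watch`/`unwatch` helpers are hypothetical names for illustration, not part of the thread's code or of druntime, and a pipe stands in for a real event source.

```d
import core.memory : GC;
import core.sys.linux.epoll;
import core.sys.posix.unistd : close, pipe, write;
import std.exception : enforce;

/// Hypothetical event source; epoll hands `this` back through data.ptr.
final class Source
{
    int fd;
    string name;
    this(int fd, string name) { this.fd = fd; this.name = name; }
}

/// Registers `src` with epoll. The GC cannot scan kernel memory, so the
/// object is pinned with GC.addRoot for as long as it stays registered.
bool watch(int ep, Source src)
{
    epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.ptr = cast(void*) src;  // this copy lives in the kernel, invisible to the GC
    GC.addRoot(cast(void*) src);
    if (epoll_ctl(ep, EPOLL_CTL_ADD, src.fd, &ev) != 0)
    {
        GC.removeRoot(cast(void*) src);  // registration failed: undo the pin
        return false;
    }
    return true;
}

/// Unregisters `src` and removes the pin.
void unwatch(int ep, Source src)
{
    epoll_ctl(ep, EPOLL_CTL_DEL, src.fd, null);
    GC.removeRoot(cast(void*) src);
}

void main()
{
    int ep = epoll_create1(EPOLL_CLOEXEC);
    enforce(ep >= 0, "epoll_create1 failed");
    int[2] p;
    enforce(pipe(p) == 0, "pipe failed");

    auto src = new Source(p[0], "pipe");
    enforce(watch(ep, src), "epoll_ctl failed");
    src = null;  // drop our own reference; only the root (and the kernel) remain

    ubyte b = 1;
    enforce(write(p[1], &b, 1) == 1, "write failed");

    epoll_event got;
    enforce(epoll_wait(ep, &got, 1, 1000) == 1, "no event");
    auto back = cast(Source) got.data.ptr;  // recover the object from the kernel's copy
    assert(back.name == "pipe");

    unwatch(ep, back);
    close(p[0]);
    close(p[1]);
    close(ep);
}
```

Pinning for exactly as long as the kernel holds the pointer keeps `addRoot`/`removeRoot` paired; for objects that live for the whole program, pinning once and never unpinning is simpler.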
Sep 23 2021
On Thursday, 23 September 2021 at 12:53:14 UTC, Steven Schveighoffer wrote:
>> Show me these rules!
>
> They are here: https://dlang.org/spec/interfaceToC.html#storage_allocation
>
> With the caveat, of course, that the recommendation to "leave a pointer on the stack" is not as easy to follow as one might think with the optimizer fighting against that.

Yes, as you explained to me, the root of the problem in my examples was dead store elimination.

> KeepAlive).

Do you mean some function attribute?..
Sep 23 2021
On 9/23/21 9:18 AM, eugene wrote:
> On Thursday, 23 September 2021 at 12:53:14 UTC, Steven Schveighoffer wrote:
> Do you mean some function attribute?..

suggested -- use the object later. However, they are recognized by the compiler as an intrinsic which generates no code or side effects, but is not subject to elimination by the optimizer.

See more details: https://docs.microsoft.com/en-us/dotnet/api/system.gc.keepalive?view=net-5.0#remarks

-Steve
Sep 23 2021
On Thursday, 23 September 2021 at 15:56:16 UTC, Steven Schveighoffer wrote:
> See more details: https://docs.microsoft.com/en-us/dotnet/api/system.gc.keepalive?view=net-5.0#remarks

"This method references the obj parameter, making that object ineligible for garbage collection from the start of the routine to the point, in execution order, where this method is called. Code this method at the end, not the beginning, of the range of instructions where obj must be available."

**Code this method at the end...** :) It is the same as the simple `writeln(stopper.sg0.number)` at the end of main proposed by jfondren, right?
Sep 23 2021
On 9/23/21 12:58 PM, eugene wrote:
> On Thursday, 23 September 2021 at 15:56:16 UTC, Steven Schveighoffer wrote:
>> See more details: https://docs.microsoft.com/en-us/dotnet/api/system.gc.keepalive?view=net-5.0#remarks
>
> "This method references the obj parameter, making that object ineligible for garbage collection from the start of the routine to the point, in execution order, where this method is called. Code this method at the end, not the beginning, of the range of instructions where obj must be available."
>
> **Code this method at the end...** :) It is the same as the simple `writeln(stopper.sg0.number)` at the end of main proposed by jfondren, right?

Same effect, but writeln actually executes code to write data to the console, whereas KeepAlive doesn't do anything. Essentially, you get the side effect of keeping the object as live, without paying the penalty of inserting frivolous code.

All my efforts to achieve the same via a library were thwarted by at least LDC (whose optimizer is very good). The only possible solution I can think of is to generate an opaque function that LDC cannot see into, in order to force it to avoid inlining, and have that function do nothing. However, there's always Link-Time Optimization...

-Steve
Sep 23 2021
On Thursday, 23 September 2021 at 17:16:23 UTC, Steven Schveighoffer wrote:
> Same effect, but writeln actually executes code to write data to the console, whereas KeepAlive doesn't do anything.

```d
void keepAlive(Object o) {
}

void main(string[] args) {

    import core.memory : GC;

    auto Main = new Main();
    Main.run();

    auto stopper = new Stopper();
    stopper.run();

    writeln(" === Hello, world! === ");
    auto md = new MessageDispatcher();
    md.loop();

    keepAlive(Main);
    keepAlive(stopper);
    writeln(" === Goodbye, world! === ");
}
```

Works ok with dmd, stopper is not collected.
Sep 23 2021
On 9/23/21 2:18 PM, eugene wrote:
> On Thursday, 23 September 2021 at 17:16:23 UTC, Steven Schveighoffer wrote:
>> Same effect, but writeln actually executes code to write data to the console, whereas KeepAlive doesn't do anything.
>
> ```d
> void keepAlive(Object o) {
> }
>
> void main(string[] args) {
>
>     import core.memory : GC;
>
>     auto Main = new Main();
>     Main.run();
>
>     auto stopper = new Stopper();
>     stopper.run();
>
>     writeln(" === Hello, world! === ");
>     auto md = new MessageDispatcher();
>     md.loop();
>
>     keepAlive(Main);
>     keepAlive(stopper);
>     writeln(" === Goodbye, world! === ");
> }
> ```
>
> Works ok with dmd, stopper is not collected.

With dmd -O -inline, there is a chance it will be collected. Inlining is key here.

-Steve
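One way to attempt the "opaque function" idea is sketched below, under stated assumptions: `keepAlive` and `keepAliveSink` are hypothetical names (not a druntime API), `core.volatile` must be available (it is in recent druntime releases, as far as I know), and nothing here is guaranteed to survive every optimizer. `pragma(inline, false)` asks the compiler not to inline the call, and the volatile store gives the function a side effect the optimizer may not remove, so the reference has to be held somewhere (a register or the stack, which, as I understand druntime, the GC scans) up to the call.

```d
import core.volatile : volatileStore;

/// Hypothetical sink. Writes to it are volatile, so the optimizer may not drop
/// the store, and therefore may not drop the use of the reference either.
/// It is thread-local and scanned by the GC, so it also keeps the most
/// recently passed object reachable until the next call.
private ulong keepAliveSink;

/// Sketch of a keepAlive helper in the spirit of C#'s GC.KeepAlive.
/// pragma(inline, false) requests an out-of-line call; the volatile store
/// gives that call an effect which cannot be optimized away.
pragma(inline, false)
void keepAlive(Object o) nothrow @nogc
{
    volatileStore(&keepAliveSink, cast(ulong) cast(size_t) cast(void*) o);
}

class Demo {}

void main()
{
    auto d = new Demo();
    // ... long-running code during which only C/kernel code references `d` ...
    keepAlive(d);  // `d` must still be held somewhere up to this call
}
```

A side effect of this design: the sink holds the last object passed in, which is harmless for objects that are meant to live until the end of `main` anyway.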
Sep 23 2021
On Thursday, 23 September 2021 at 18:43:36 UTC, Steven Schveighoffer wrote:
> With dmd -O -inline, there is a chance it will be collected. Inlining is key here.

Never mind, GC.addRoot() looks more trustworthy anyway :)
Sep 23 2021
On Thursday, 23 September 2021 at 12:53:14 UTC, Steven Schveighoffer wrote:
> With the caveat, of course, that the recommendation to "leave a pointer on the stack" is not as easy to follow as one might think with the optimizer fighting against that. We need to add [attempt](https://code.dlang.org/packages/keepalive), but I think it's not guaranteed to work, I've already found ways to prove it fails.

For the moment I am personally quite happy with any reasonable workaround (use an object at the end of the main function, put the reference to an object into some AA, whatever), because now I firmly understand that the source of the strange GC behavior is DSE optimization (in this case).
Sep 23 2021
On Thursday, 23 September 2021 at 14:00:30 UTC, eugene wrote:
> For the moment I am personally quite happy

```d
void main(string[] args) {

    import core.memory : GC;

    auto Main = new Main();
    GC.addRoot(cast(void*)Main);
    Main.run();

    auto stopper = new Stopper();
    GC.addRoot(cast(void*)stopper);
    stopper.run();
    // ...
}
```

Fine, works!
Sep 23 2021
On Thursday, 23 September 2021 at 14:23:40 UTC, eugene wrote:
> On Thursday, 23 September 2021 at 14:00:30 UTC, eugene wrote:
>> For the moment I am personally quite happy
>
> ```d
> void main(string[] args) {
>
>     import core.memory : GC;
>
>     auto Main = new Main();
>     GC.addRoot(cast(void*)Main);
>     Main.run();
>
>     auto stopper = new Stopper();
>     GC.addRoot(cast(void*)stopper);
>     stopper.run();
>     // ...
> }
> ```
>
> Fine, works!

Nice. I thought of GC.addRoot several times but I was distracted by the general solution of using object lifetimes with it, so that a struct's destructor would call GC.removeRoot. For your case just pinning these and forgetting about them is the easiest way to do it.
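The "struct destructor calls `GC.removeRoot`" idea might look like the sketch below. `Pinned` and `Demo` are hypothetical names used only for illustration; copying is disabled so that each `GC.addRoot` is matched by exactly one `GC.removeRoot`.

```d
import core.memory : GC;

/// Hypothetical RAII pin: registers `obj` as a GC root on construction and
/// removes the root when the wrapper is destroyed (e.g. at end of scope).
struct Pinned(T) if (is(T == class))
{
    T obj;

    @disable this();      // a pin always needs an object
    @disable this(this);  // no copies, so addRoot/removeRoot stay paired

    this(T o)
    {
        obj = o;
        if (o !is null)
            GC.addRoot(cast(void*) o);
    }

    ~this()
    {
        if (obj !is null)
            GC.removeRoot(cast(void*) obj);
    }

    alias obj this;       // a Pinned!Stopper can be used like a Stopper
}

/// Stand-in for the thread's Stopper class (hypothetical).
class Demo
{
    int value = 42;
    void run() {}
}

void main()
{
    auto stopper = Pinned!Demo(new Demo());
    stopper.run();  // forwarded through alias this
    // ... event loop would run here; `stopper` stays a root until main returns
    assert(stopper.value == 42);
}
```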
Sep 23 2021
On Thursday, 23 September 2021 at 14:31:34 UTC, jfondren wrote:
> Nice. I thought of GC.addRoot several times but I was distracted by the general solution of using object lifetimes with it, so that a struct's destructor would call GC.removeRoot. For your case just pinning these and forgetting about them is the easiest way to do it.

Yes, these two must live until the end of main(). Moreover, in real (C) programs I (usually) do not create state machines on the fly; instead I keep them in pools, like the RX/TX machine pools in echo-server and echo-client.
Sep 23 2021
On 9/23/21 10:55 AM, eugene wrote:
> On Thursday, 23 September 2021 at 14:31:34 UTC, jfondren wrote:
>> Nice. I thought of GC.addRoot several times but I was distracted by the general solution of using object lifetimes with it, so that a struct's destructor would call GC.removeRoot. For your case just pinning these and forgetting about them is the easiest way to do it.
>
> Yes, these two must live until the end of main(). Moreover, in real (C) programs I (usually) do not create state machines on the fly; instead I keep them in pools, like the RX/TX machine pools in echo-server and echo-client.

Technically, they should live past the end of main, because it's still possible to receive signals then. But the chances of someone hitting ctrl-c in that window are quite small.

-Steve
Sep 23 2021
On Thursday, 23 September 2021 at 15:53:37 UTC, Steven Schveighoffer wrote:
> Technically, they should live past the end of main, because it's still possible to receive signals then.

No, as soon as the application gets SIGTERM/SIGINT, the event queue is stopped and we do not need any more notifications from the OS (POLLIN/POLLOUT, I mean). Stopping the event queue in this case is just closing the file descriptor obtained from epoll_create(). After this, getting POLLIN from any fd (including the signal fd) is simply impossible.
Sep 23 2021
On 9/23/21 12:53 PM, eugene wrote:
> On Thursday, 23 September 2021 at 15:53:37 UTC, Steven Schveighoffer wrote:
>> Technically, they should live past the end of main, because it's still possible to receive signals then.
>
> No, as soon as the application gets SIGTERM/SIGINT, the event queue is stopped and we do not need any more notifications from the OS (POLLIN/POLLOUT, I mean). Stopping the event queue in this case is just closing the file descriptor obtained from epoll_create(). After this, getting POLLIN from any fd (including the signal fd) is simply impossible.

That's not what is triggering the segfault though. The segfault is triggered by the signal handler referencing the destroyed object.

So imagine the sequence:

1. ctrl-c, signal handler triggers, shutting down the loop
2. main exits
3. GC finalizes all objects, including the Stopper and its members
4. ctrl-c happens again, but you didn't unregister the signal handler, so it's run again, referencing the now-deleted object.
5. segfault

It's theoretically a very very small window.

-Steve
Sep 23 2021
On Thursday, 23 September 2021 at 17:20:18 UTC, Steven Schveighoffer wrote:
> So imagine the sequence:

With ease!

> 1. ctrl-c, signal handler triggers, shutting down the loop

Just a note: there is no 'signal handler' in the program. SIGINT/SIGTERM are **blocked**; notifications (POLLIN) are received via epoll_wait().

> 2. main exits
> 3. GC finalizes all objects, including the Stopper and its members

Probably a destructor for the Signal class should be added, in which we:
- close the fd obtained from signalfd()
- unblock the signal (so the default signal handler is back again)

> 4. ctrl-c happens again, but you didn't unregister the signal handler, so it's run again, referencing the now-deleted object.

At this point we have the default signal handler.

> 5. segfault
>
> It's theoretically a very very small window.

But even without a destructor, no segfault will happen, because **there is no signal handler**.
Sep 23 2021
On 9/23/21 1:44 PM, eugene wrote:
> On Thursday, 23 September 2021 at 17:20:18 UTC, Steven Schveighoffer wrote:
>> So imagine the sequence:
>
> With ease!
>
>> 1. ctrl-c, signal handler triggers, shutting down the loop
>
> Just a note: there is no 'signal handler' in the program. SIGINT/SIGTERM are **blocked**; notifications (POLLIN) are received via epoll_wait().

Oh interesting! I didn't read the code closely enough.

>> 2. main exits
>> 3. GC finalizes all objects, including the Stopper and its members
>
> Probably a destructor for the Signal class should be added, in which we:
> - close the fd obtained from signalfd()
> - unblock the signal (so the default signal handler is back again)

Yes, I would recommend that. Always good for a destructor to clean up any non-GC resources that haven't already been cleaned up. That's actually what class destructors are for.

>> 4. ctrl-c happens again, but you didn't unregister the signal handler, so it's run again, referencing the now-deleted object.
>
> At this point we have the default signal handler.
>
>> 5. segfault
>> It's theoretically a very very small window.
>
> But even without a destructor, no segfault will happen, because **there is no signal handler**.

So it gets written to the file descriptor instead? And nobody is there reading it, so it's just closed along with the process? I've not done signals this way, it seems pretty clever and less prone to asynchronous issues.

-Steve
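A sketch of the destructor proposed above, assuming a hypothetical `SignalSource` class that stands in for the thread's `Signal` class (Linux-only). `destroy` runs the destructor deterministically here; if the GC finalizes the object instead, keep in mind that `sigprocmask` changes only the calling thread's mask, so the unblock applies to whichever thread happens to run the finalizer.

```d
import core.sys.linux.sys.signalfd;
import core.sys.posix.signal;
import core.sys.posix.unistd : close;

/// Hypothetical stand-in for the thread's Signal class: it owns a signalfd
/// plus a blocked signal, and gives both back in its destructor.
final class SignalSource
{
    int fd = -1;
    private sigset_t sset;

    this(int signo)
    {
        sigemptyset(&sset);
        sigaddset(&sset, signo);
        sigprocmask(SIG_BLOCK, &sset, null);    // delivered via fd only
        fd = signalfd(-1, &sset, SFD_CLOEXEC);
    }

    ~this()
    {
        // Only non-GC resources are touched: no allocation, no other GC objects.
        if (fd >= 0)
        {
            close(fd);
            fd = -1;
        }
        sigprocmask(SIG_UNBLOCK, &sset, null);  // default disposition applies again
    }
}

/// True if `signo` is currently blocked in the calling thread.
bool isBlocked(int signo)
{
    sigset_t cur;
    sigprocmask(SIG_BLOCK, null, &cur);  // null set: only query the mask
    return sigismember(&cur, signo) == 1;
}

void main()
{
    auto s = new SignalSource(SIGTERM);
    assert(s.fd >= 0 && isBlocked(SIGTERM));
    destroy(s);  // runs ~this() right now
    assert(!isBlocked(SIGTERM));
}
```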
Sep 23 2021
On Thursday, 23 September 2021 at 18:53:25 UTC, Steven Schveighoffer wrote:
> On 9/23/21 1:44 PM, eugene wrote:
>> Just a note: there is no 'signal handler' in the program. SIGINT/SIGTERM are **blocked**; notifications (POLLIN) are received via epoll_wait().
>
> Oh interesting! I didn't read the code closely enough.

"Everything in Unix is a file" (c). All event sources (sockets, timers, signals, file system events) can be 'routed' through I/O multiplexing facilities like select/poll (POSIX), epoll (Linux), kqueue (FreeBSD), etc.

>> Probably a destructor for the Signal class should be added, in which we:
>
> Yes, I would recommend that. Always good for a destructor to clean up any non-GC resources that haven't already been cleaned up. That's actually what class destructors are for.

No, destructors are not necessary, since after SIGINT/SIGTERM the program is about to terminate and all resources will be released anyway. In C I do the same: I do not close fds that live from start to end, do not free() pointers and so on; there is no need.

> So it gets written to the file descriptor instead?

When a signal arrives (or a timer expires, or a file is deleted), the process gets EPOLLIN on the corresponding file descriptor via epoll_wait() and then has to read some info from that file descriptor.

> And nobody is there reading it, so it's just closed along with the process?

Yes, like any other file descriptor.

> I've not done signals this way, it seems pretty clever and less prone to asynchronous issues.

It's just great, thanks to the Linux kernel developers. Look into the engine dir in the source. C (more elaborate) variant: http://zed.karelia.ru/mmedia/bin/edsm-g2-rev-h.tar.gz
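The "signals as file descriptors" approach described above can be sketched in a few lines: block the signal, create a `signalfd` for it, and wait on that fd with `epoll`; no signal handler is ever installed. This is Linux-only, and `makeSignalFd`/`waitForSignal` are hypothetical helper names. `raise` is used so the example needs no real Ctrl-C: because the signal is blocked it stays pending, and the signalfd becomes readable.

```d
import core.stdc.signal : raise;
import core.sys.linux.epoll;
import core.sys.linux.sys.signalfd;
import core.sys.posix.signal;
import core.sys.posix.unistd : close, read;

/// Blocks `signo` for the calling thread and returns a signalfd that becomes
/// readable while the signal is pending (-1 on failure).
int makeSignalFd(int signo)
{
    sigset_t sset;
    sigemptyset(&sset);
    sigaddset(&sset, signo);
    // Blocked: the signal is never delivered asynchronously...
    if (sigprocmask(SIG_BLOCK, &sset, null) != 0)
        return -1;
    // ...instead it is queued and read from this file descriptor.
    return signalfd(-1, &sset, SFD_CLOEXEC);
}

/// Waits up to `timeoutMs` for `sfd` to become readable via epoll, then
/// reads one signalfd_siginfo and returns its signal number (0 on failure).
uint waitForSignal(int sfd, int timeoutMs)
{
    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0)
        return 0;
    scope (exit) close(ep);

    epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = sfd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sfd, &ev) != 0)
        return 0;

    epoll_event got;
    if (epoll_wait(ep, &got, 1, timeoutMs) != 1)
        return 0;

    signalfd_siginfo info;
    auto n = read(sfd, &info, info.sizeof);
    if (n < 0 || cast(size_t) n != info.sizeof)
        return 0;
    return info.ssi_signo;
}

void main()
{
    int sfd = makeSignalFd(SIGINT);
    assert(sfd >= 0);
    scope (exit) close(sfd);

    raise(SIGINT);  // SIGINT is blocked: it stays pending, no handler runs
    immutable signo = waitForSignal(sfd, 1000);
    assert(signo == SIGINT);
}
```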
Sep 23 2021
On Thursday, 23 September 2021 at 19:32:12 UTC, eugene wrote:
> C (more elaborate) variant: http://zed.karelia.ru/mmedia/bin/edsm-g2-rev-h.tar.gz

Sound, GUI? Easy, see http://zed.karelia.ru/mmedia/bin/xjiss4.tar.gz. It's a computer-keyboard 'piano', based on the same engine.

As I've already mentioned, I was inspired several years ago by a very nice book, 'Modeling Software with Finite State Machines: A Practical Approach' by Wagner F. et al. I applied some ideas from the book to the POSIX/Linux API and developed EDSM ('Event driven state machines'), and since then I do not need libev, libevent and the like. State machines per se are a very powerful methodology for modeling program behavior. Also notice that the machines communicate with each other by messages. Remember Alan Kay's main OOP principle? It's message exchange, not class hierarchy (inheritance blah-blah-blah).

As to the client-server echo pair in D, it is my 3rd (and most successful!!!) attempt to re-implement EDSM in some 'modern' language. Initially my criteria for choosing a language were:
- compiles to native code, goodbye Java
- no garbage collector, goodbye almost all :)
- no classes, only interfaces
- maybe something else, I do not remember

The only language that fit my initial desires was Rust. But the borrow checker (and especially `<'a>`, explicit lifetimes) turned out to be real hell for me. One can use raw pointers instead of references, but then your code is in `unsafe {}` blocks from head to toe. With Rust, I was not able to make signals work properly and I dropped it. After reading some texts a la 'alternatives to C++', I decided to try D, despite its 'unpopularity'.
Sep 23 2021
On Thursday, 23 September 2021 at 17:20:18 UTC, Steven Schveighoffer wrote:
> 1. ctrl-c, signal handler triggers, shutting down the loop
> 2. main exits
> 3. GC finalizes all objects, including the Stopper and its members

But both SIGINT and SIGTERM are still **blocked**; they just will not reach the process.
Sep 23 2021
On Thursday, 23 September 2021 at 17:49:43 UTC, eugene wrote:
> On Thursday, 23 September 2021 at 17:20:18 UTC, Steven Schveighoffer wrote:
>> 1. ctrl-c, signal handler triggers, shutting down the loop
>> 2. main exits
>> 3. GC finalizes all objects, including the Stopper and its members
>
> But both SIGINT and SIGTERM are still **blocked**; they just will not reach the process.

Oops... closing the epoll fd should be moved from the EventQueue dtor to the stop() method, then everything will be Ok.
Sep 23 2021
On Thursday, 23 September 2021 at 17:53:00 UTC, eugene wrote:
> On Thursday, 23 September 2021 at 17:49:43 UTC, eugene wrote:
>> On Thursday, 23 September 2021 at 17:20:18 UTC, Steven Schveighoffer wrote:
>>> 1. ctrl-c, signal handler triggers, shutting down the loop
>>> 2. main exits
>>> 3. GC finalizes all objects, including the Stopper and its members
>>
>> But both SIGINT and SIGTERM are still **blocked**; they just will not reach the process.
>
> Oops...

No oops, that's all right.

- When a Signal instance is created, the corresponding signal becomes blocked:

```d
final class Signal : EventSource {

    enum int sigInt = SIGINT;
    enum int sigTerm = SIGTERM;
    ulong number;

    this(int signo) {
        super('S');

        sigset_t sset;
        sigset_t old_sset;
        /* block the signal */
        sigemptyset(&sset);
        sigaddset(&sset, signo);
        sigprocmask(SIG_BLOCK, &sset, &old_sset);

        id = signalfd(-1, &sset, SFD_CLOEXEC);
        // ...
    }
}
```

- Upon receiving SIGINT, stopperIdleS0() is called; now the stop variable of EventQueue is false.
- The next call to the wait() method just returns.

(Remember, the signals are still blocked.)
Sep 23 2021
On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven Schveighoffer wrote:
> In terms of any kind of memory management, whether it be ARC, manual, GC, or anything else, there will always be pitfalls. It's just that you have to get used to the pitfalls and how to avoid them.

100% agree.

> I could see a person used to GC complaining that C requires you to free every pointer *exactly once*.

C (at the compiler level) does not require this. You can do it free()ly ;) With a subsequent... yes, 'shock' after you see the 'double free or corruption' message. Tell that imaginary person this: `if (p) {free(p); p = NULL;}`, that's all.
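In D, the same "free once, then null" discipline can be wrapped in a small helper when calling the C allocator directly. `freeAndNull` is a hypothetical name for illustration. Since `free(NULL)` is defined by the C standard to do nothing, the `if (p)` check in the C idiom is optional.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical helper: frees a C-heap pointer and nulls it, so calling it
/// twice is harmless (free(null) is defined to be a no-op).
void freeAndNull(T)(ref T* p) nothrow @nogc
{
    free(p);
    p = null;
}

void main()
{
    auto p = cast(int*) malloc(int.sizeof);
    assert(p !is null);
    *p = 42;
    freeAndNull(p);
    assert(p is null);
    freeAndNull(p);  // second call frees null: no "double free" abort
}
```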
Sep 23 2021
On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven Schveighoffer wrote:
> But realize that C has its share of "shocks" as well

Any language is just an instrument; most of the 'shocks' come not from languages themselves, but from the 'environment', so to say. An example that comes to mind... Did you know that sending data via write()/send() to a socket that is in the CLOSE_WAIT state results in sending data to nowhere, and write() indicates no error?

By the way, the GC is a sort of 'environment' (not the language itself); it acts behind the scenes (unless you are using it directly).

> you are just more used to them (or maybe you have been lucky so far?)

Once I was very 'lucky' with an unaligned pointer dereference on ARM...
Sep 23 2021
On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven Schveighoffer wrote:
> I find it interesting how you blame yourself for C's idiosyncrasies

Me? Blaming *myself* for C 'idiosyncrasies'? :) Where?

> but not for D's ;)

I've been learning D for only about 3 months.

> I would say C has far more pitfalls than D.

No doubt, and I've never said C is "better" than D. I was going to try the betterC subset (say, try to implement dynamic arrays), but have not had much free time yet.

> Check out the undefined behaviors for C.

Nothing interesting... Most UB in C is just programmer sloppiness. C requires a programmer to be careful/punctual, much more careful than with ... Python, for example.
Sep 23 2021
On 9/23/21 8:10 AM, eugene wrote:
> On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven Schveighoffer wrote:
>> I find it interesting how you blame yourself for C's idiosyncrasies
>
> Me? Blaming *myself* for C 'idiosyncrasies'? :) Where?

"When my C program crashes, I'm 100% sure I made something stupid"

One might argue that C's approach to memory management is a contributor to people writing code that fails.

>> I would say C has far more pitfalls than D.
>
> No doubt, and I've never said C is "better" than D. I was going to try the betterC subset (say, try to implement dynamic arrays), but have not had much free time yet.

Your assertion that programming in GC languages may be harder than manual memory languages is what I was addressing. My point is that C has a lot more memory management pitfalls than D, not addressing any "better than" arguments.

>> Check out the undefined behaviors for C.
>
> Nothing interesting... Most UB in C is just programmer sloppiness. C requires a programmer to be careful/punctual, much more careful than with ... Python, for example.

UB in C leaves traps for the programmer, similar to this trap you have found in the GC. Where code doesn't do what you are expecting it to do.

-Steve
Sep 23 2021
On Thursday, 23 September 2021 at 13:05:07 UTC, Steven Schveighoffer wrote:
> UB in C leaves traps for the programmer, similar to this trap you have found in the GC. Where code doesn't do what you are expecting it to do.

There is a difference, though. As I've already said, the GC is a sort of 'environment'; the GC's code exists on its own. In C, no such code is 'inserted' by the compiler into the code written by a programmer. So, in C it is MY (potentially wrong) code. In D, it is NOT MY code, it is the GC. From this point of view debugging *may* be harder (especially for beginners, like me).
Sep 23 2021
On Thursday, 23 September 2021 at 13:30:42 UTC, eugene wrote:
> So, in C it is MY (potentially wrong) code. In D, it is NOT MY code, it is the GC.

Actually in both cases it is MY + the compiler's code. A very similar example from C-land (without my digging up the exact details) is something like

```c
for (int i = 0; i >= 0; i++) {
    // exit loop on signed integer overflow
}
```

where gcc 2.95 would do what "MY code" said, but later gcc versions would 'optimize' into an infinite loop (followed by dead code that can now be removed):

```c
for (;;) {
    // never exit loop
}
```

Because in math, positive numbers never +1 into negative numbers. And in C this is undefined behavior, which is (modern understanding:) complete license for the compiler to do anything at all. And on the specific architecture we are specifically compiling for there is specific behavior--but who cares about that, this is optimization! And if you complained about it, well you were a sloppy coder actually, for wanting the target architecture's actual behavior with your actual code as you actually wrote it. (If you feel like defending C's honor here, please, I've heard it already. Everybody thinks very highly of the nasal demons joke.)

There are other cases where very security-minded software had defensive code that an optimizer decided would never be needed, which then exposed a software vulnerability, or where 'unnecessary' writes intended to remove a password from memory were removed: https://duckduckgo.com/?q=dead+code+elimination+security+vulnerability
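For contrast, the D spec defines integer overflow to wrap around (two's complement, so `int.max + 1 == int.min`) rather than leaving it undefined, so the same loop is expected to terminate in D. A sketch, with `countUntilWrap` a hypothetical helper; `core.checkedint.adds` is shown as the explicit way to detect overflow instead of relying on wrapping.

```d
import core.checkedint : adds;

/// Hypothetical helper: runs the loop from the post starting at `start` and
/// counts iterations. With wrapping overflow, `i` becomes negative right
/// after int.max, so the loop ends.
int countUntilWrap(int start)
{
    int n = 0;
    for (int i = start; i >= 0; i++)
        ++n;
    return n;
}

void main()
{
    // int.max - 3, int.max - 2, int.max - 1, int.max: four iterations.
    assert(countUntilWrap(int.max - 3) == 4);

    // Detecting overflow explicitly: the result wraps and the flag is set.
    bool overflow = false;
    int r = adds(int.max, 1, overflow);
    assert(overflow && r == int.min);
}
```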
Sep 23 2021
On Tue, Sep 21, 2021 at 07:42:48PM +0000, jfondren via Digitalmars-d-learn wrote:
> On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
>> I do not understand at all why GC considers those sg0 and sg1 as unreferenced. And why old gdc (without -Os) and old ldc do not.
>
> Conclusion: There's nothing special about sg0 and sg1, except that they're part of Stopper. The Stopper in main() is collected before the end of main() because it's not used later in the function and because there are apparently no other references to it that the GC can find (because the only reference is hidden inside the Linux epoll API).

Quick and dirty workaround: keep references to those objects in static variables to prevent GC collection:

```d
class MyType { /* ... */ }

void myFunc(/* ... */)
{
    // Function-local statics are thread-local globals, which the GC scans,
    // so this reference keeps `obj` alive until the function exits.
    static MyType dontCollect = null;
    auto obj = new MyType(/* ... */);
    dontCollect = obj;
    scope(exit) dontCollect = null; // may collect after function exits

    // ... function body goes here
}
```

T

-- 
Verbing weirds language. -- Calvin (& Hobbes)
Sep 21 2021
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
> And the most strange thing is this - if using gdc with -Os flag, the program behaves exactly as when compiled with fresh dmd - destructors for sg0 and sg1 are called soon after program start.

Now I guess gdc's optimization for size implies DSE.
Sep 23 2021