digitalmars.D - GC buggy in windows?
- tchaloupka (85/85) Nov 08 2019 We're experiencing some really strange nasty GC behavior in our
- rikki cattermole (3/3) Nov 08 2019 Just to confirm, this code snippet is meant to lock the entire process
- tchaloupka (8/11) Nov 08 2019 Yep, it just outputs:
- tchaloupka (5/17) Nov 08 2019 We've just tried it on 5 more physical PCs (all win 10 x86_64
- Dennis (5/6) Nov 08 2019 I tried it a few times on my Windows 10 laptop with dmd 2.088, it
- rikki cattermole (5/12) Nov 08 2019 Yes.
- tchaloupka (12/25) Nov 08 2019 Thanks for feedback, I've tried it on more servers where it
- rikki cattermole (13/41) Nov 08 2019 Bug on Eset's side.
- bachmeier (4/7) Nov 08 2019 But isn't that the purpose of antivirus software? Isn't the whole
- Gregor Mückl (6/13) Nov 08 2019 It's not OK if the interference consists of injecting random bugs
- Fathou (5/20) Nov 08 2019 OT, but the last time I used an AV was to disinfect a relative's
- bachmeier (6/21) Nov 09 2019 When you install antivirus on your computer, you're giving it
- Gregor Mückl (10/22) Nov 09 2019 I'm trying to make exactly that argument: ALL antivirus software
- JN (14/19) Nov 09 2019 I consider most of the AV software snake oil. They slow down your
We're experiencing some really strange, nasty GC behavior in our IOCP I/O-heavy Windows app. Sometimes it hangs with just:

"Unable to load thread context"

I've spent the last three days experimenting and trying to narrow it down to find the exact cause :(

The problem is in the GC and its stop-the-world behavior. In the suspend function of core.thread.osthread there is basically:

```
SuspendThread( t.m_hndl );
GetThreadContext( t.m_hndl, &context );
```

In some cases GetThreadContext fails with `ERROR_GEN_FAILURE(31)`, which leads to the error being thrown.

The first problem is that the application doesn't terminate after this error, but just hangs. That's because the thread is still suspended and somewhere down the line `join` is called on this thread, which won't return - ever.

This is a nice blog post explaining that `SuspendThread` is actually asynchronous: https://devblogs.microsoft.com/oldnewthing/?p=44743

But it also states that when `GetThreadContext` is called on the thread, we can be sure that it is actually already suspended.

So what could lead to the error? Searching the Windows API documentation - nah, nothing, as usual.. Searching the internet - sure, a lot of problems with some game engines using a GC (Unity) combined with some anticheat or antivirus programs - not our case.

OK, so I tried compiling a custom druntime (what a pleasure in itself) and found that:

* when you Thread.yield and try to get the context again, it doesn't help, still an error
* the only way I could work around this problem was to resume the thread, Thread.yield, suspend the thread and try the context again; usually the first or second try succeeds - HOORAY.

Then I spent a lot of time figuring out what is actually causing the error, and I have a theory that the problem is some IO operation running in kernel context that can't finish while the thread is suspended, and so the error is returned.

I ended up with this minimized test app that causes this error really fast.
```
import core.memory : GC;
import core.stdc.stdio;
import core.thread;
import std.random;
import std.range;

void main()
{
    Thread t;
    while (true)
    {
        // Force a stop-the-world collection on every iteration.
        GC.collect();

        // Keep exactly one worker thread alive at all times.
        if (t is null || !t.isRunning)
        {
            t = new Thread(&threadProc);
            t.start();
        }
    }
}

void threadProc()
{
    // Open and close a file a random number of times to generate file IO.
    foreach (_; iota(uniform(0, 100)))
    {
        FILE* f = fopen("dummy", "a");
        if (f is null)
            continue; // nothing to close if the open failed
        scope (exit) fclose(f);
    }
}
```

compiled with: `dmd -m64 -debug test.d`

Tested on 64-bit Windows 10.

I definitely think that this is a bug in the Windows GC implementation. Should I file it?

What seems to be a fix for both problems is:

* retry the resume/suspend/get context on the failing thread some more - how many times?
* before returning the error, resume the thread so it can be joined (I haven't looked at where it's being called from on termination)

For me it is also questionable whether terminating the application in this case is even the correct behavior. It might be better to scrap the GC attempt, resume the threads and retry on the next collection? That might lead to other problems, but as this occurs pretty rarely it might have a better outcome.

Ideas?

PS: I'm beginning to understand the C/C++ devs who don't like GC languages ;-)
PPS: Now I hate Windows even more.. (normally a Linux dev)
PPPS: This kind of experience would definitely lead away devs that just need to have "shit done" and don't bother with the tool used..
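The two proposed fixes (retry the resume/yield/suspend/get-context cycle, and never leave the thread suspended when giving up) could be sketched roughly as below. This is not druntime's actual code: `ThreadOps`, `SuspendResult` and `suspendWithRetry` are hypothetical names, the delegates stand in for `SuspendThread`, `ResumeThread`, `GetThreadContext` and `Thread.yield`, and the default of 5 attempts is an arbitrary assumption, since the right retry count is an open question.

```d
import std.stdio : writeln;

/// Hypothetical stand-ins for the OS calls. On Windows these would wrap
/// SuspendThread, ResumeThread, GetThreadContext and Thread.yield.
struct ThreadOps
{
    bool delegate() suspend;     // true if the thread was suspended
    bool delegate() resume;      // true if the thread was resumed
    bool delegate() loadContext; // true if the register context was read
    void delegate() yield;       // give the target thread time to run
}

enum SuspendResult { ok, suspendFailed, contextFailed }

/// Suspends a thread and reads its context, retrying when the read fails.
/// On success the thread is left suspended; on failure it is left running,
/// so a later join cannot hang on a suspended thread.
SuspendResult suspendWithRetry(ThreadOps ops, uint maxAttempts = 5)
{
    foreach (attempt; 0 .. maxAttempts)
    {
        if (!ops.suspend())
            return SuspendResult.suspendFailed;

        if (ops.loadContext())
            return SuspendResult.ok;

        // The context read failed: let the thread run again before retrying.
        ops.resume();
        ops.yield();
    }
    return SuspendResult.contextFailed;
}

void main()
{
    // Simulated thread whose context read fails twice and then succeeds.
    int failuresLeft = 2;
    int outstandingSuspends = 0;

    ThreadOps ops;
    ops.suspend = delegate bool() { ++outstandingSuspends; return true; };
    ops.resume = delegate bool() { --outstandingSuspends; return true; };
    ops.loadContext = delegate bool() {
        if (failuresLeft > 0) { --failuresLeft; return false; }
        return true;
    };
    ops.yield = delegate void() { };

    auto result = suspendWithRetry(ops);
    assert(result == SuspendResult.ok);
    assert(outstandingSuspends == 1); // suspended exactly once on success
    writeln(result);
}
```

Returning a result instead of throwing leaves the caller free to choose between terminating and skipping this collection to retry on the next one.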
Nov 08 2019
Just to confirm, this code snippet is meant to lock the entire process up and CPU usage go down to 0%? If so, so far I have not confirmed it using dmd 2.087.0.
Nov 08 2019
On Friday, 8 November 2019 at 14:39:34 UTC, rikki cattermole wrote:
> Just to confirm, this code snippet is meant to lock the entire process up and CPU usage go down to 0%? If so, so far I have not confirmed it using dmd 2.087.0.

Yep, it just outputs:

C:\Users\tcha\Workspace>test.exe
core.thread.osthread.ThreadError src\core\thread\osthread.d(3176): Unable to load thread context
----------------

and hangs on Thread.join (0% CPU). Tested both on physical and virtual windows 10 x86_64.
Nov 08 2019
On Friday, 8 November 2019 at 14:47:18 UTC, tchaloupka wrote:
> On Friday, 8 November 2019 at 14:39:34 UTC, rikki cattermole wrote:
> > Just to confirm, this code snippet is meant to lock the entire process up and CPU usage go down to 0%? If so, so far I have not confirmed it using dmd 2.087.0.
>
> Yep, it just outputs:
>
> C:\Users\tcha\Workspace>test.exe
> core.thread.osthread.ThreadError src\core\thread\osthread.d(3176): Unable to load thread context
> ----------------
>
> and hangs on Thread.join (0% CPU). Tested both on physical and virtual windows 10 x86_64.

We've just tried it on 5 more physical PCs (all win 10 x86_64 with ssd/m2, core i5/i7 of various models), with dmd-master, dmd-2.086.1 and dmd-2.089.0. All ended up the same within a few secs.
Nov 08 2019
On Friday, 8 November 2019 at 15:01:26 UTC, tchaloupka wrote:
> All ended up the same within a few secs.

I tried it a few times on my Windows 10 laptop with dmd 2.088, it just sat there for minutes using ~14% CPU (note: I have 8 logical processors) taking 10 MB, and nothing appeared in the console. So unfortunately I couldn't reproduce it either.
Nov 08 2019
On 09/11/2019 4:21 AM, Dennis wrote:
> On Friday, 8 November 2019 at 15:01:26 UTC, tchaloupka wrote:
> > All ended up the same within a few secs.
>
> I tried it a few times on my Windows 10 laptop with dmd 2.088, it just sat there for minutes using ~14% CPU (note: I have 8 logical processors) taking 10 MB, and nothing appeared in the console. So unfortunately I couldn't reproduce it either.

Yes. This is looking more and more like an environment issue, not a bug on druntime's end. Potentially AV related (I use Avast) and I'm on Windows 10 Home 64-bit.
Nov 08 2019
On Friday, 8 November 2019 at 15:30:18 UTC, rikki cattermole wrote:
> On 09/11/2019 4:21 AM, Dennis wrote:
> > On Friday, 8 November 2019 at 15:01:26 UTC, tchaloupka wrote:
> > > All ended up the same within a few secs.
> >
> > I tried it a few times on my Windows 10 laptop with dmd 2.088, it just sat there for minutes using ~14% CPU (note: I have 8 logical processors) taking 10 MB, and nothing appeared in the console. So unfortunately I couldn't reproduce it either.
>
> Yes. This is looking more and more like an environment issue, not a bug on druntime's end. Potentially AV related (I use Avast) and I'm on Windows 10 Home 64-bit.

Thanks for the feedback, I've tried it on more servers, where it actually worked as you both described. In the end the difference was that Eset antivirus was installed. I had it completely disabled to rule out exactly this, but it only started to work after uninstalling it.. So some crap in it was still active.

Well, it's still pretty unfortunate if some 3rd-party app can brick the GC runtime. We can't just say to customers "You've got Eset installed? Screw you it won't work together."

So bug or not a bug?
Nov 08 2019
On 09/11/2019 4:54 AM, tchaloupka wrote:
> On Friday, 8 November 2019 at 15:30:18 UTC, rikki cattermole wrote:
> > On 09/11/2019 4:21 AM, Dennis wrote:
> > > On Friday, 8 November 2019 at 15:01:26 UTC, tchaloupka wrote:
> > > > All ended up the same within a few secs.
> > >
> > > I tried it a few times on my Windows 10 laptop with dmd 2.088, it just sat there for minutes using ~14% CPU (note: I have 8 logical processors) taking 10 MB, and nothing appeared in the console. So unfortunately I couldn't reproduce it either.
> >
> > Yes. This is looking more and more like an environment issue, not a bug on druntime's end. Potentially AV related (I use Avast) and I'm on Windows 10 Home 64-bit.
>
> Thanks for the feedback, I've tried it on more servers, where it actually worked as you both described. In the end the difference was that Eset antivirus was installed. I had it completely disabled to rule out exactly this, but it only started to work after uninstalling it.. So some crap in it was still active.
>
> Well, it's still pretty unfortunate if some 3rd-party app can brick the GC runtime. We can't just say to customers "You've got Eset installed? Screw you it won't work together."
>
> So bug or not a bug?

Bug on Eset's side. They are misbehaving in some way.

You can confirm that this is the case by installing an AV like Avast with full firewall capability turned on (may need to pay, but worthwhile to confirm).

The reason I am confident that it is a bug on the AV side and not D's is that I don't remember hearing about this happening before. It may be possible to add a workaround on our end, but we'll need Eset on our side for that, I think.

Based upon a quick search on Google, it's looking like Eset considers this a feature, not a bug.

https://forum.unity.com/threads/getthreadcontext-failed.140925/
Nov 08 2019
On Friday, 8 November 2019 at 15:54:56 UTC, tchaloupka wrote:
> Well, it's still pretty unfortunate if some 3rd-party app can brick the GC runtime. We can't just say to customers "You've got Eset installed? Screw you it won't work together."

But isn't that the purpose of antivirus software? Isn't the whole point to allow it to be able to interfere with the execution of other programs?
Nov 08 2019
On Friday, 8 November 2019 at 20:30:28 UTC, bachmeier wrote:
> On Friday, 8 November 2019 at 15:54:56 UTC, tchaloupka wrote:
> > Well, it's still pretty unfortunate if some 3rd-party app can brick the GC runtime. We can't just say to customers "You've got Eset installed? Screw you it won't work together."
>
> But isn't that the purpose of antivirus software? Isn't the whole point to allow it to be able to interfere with the execution of other programs?

It's not OK if the interference consists of injecting random bugs into legitimate programs. Antivirus programs have a pretty awful track record in this regard. I can't think of an antivirus product that I used that didn't turn out to be defective in one way or another.
Nov 08 2019
On Saturday, 9 November 2019 at 03:02:12 UTC, Gregor Mückl wrote:
> On Friday, 8 November 2019 at 20:30:28 UTC, bachmeier wrote:
> > On Friday, 8 November 2019 at 15:54:56 UTC, tchaloupka wrote:
> > > Well, it's still pretty unfortunate if some 3rd-party app can brick the GC runtime. We can't just say to customers "You've got Eset installed? Screw you it won't work together."
> >
> > But isn't that the purpose of antivirus software? Isn't the whole point to allow it to be able to interfere with the execution of other programs?
>
> It's not OK if the interference consists of injecting random bugs into legitimate programs. Antivirus programs have a pretty awful track record in this regard. I can't think of an antivirus product that I used that didn't turn out to be defective in one way or another.

OT, but the last time I used an AV was to disinfect a relative's laptop... that already had AV on it. I don't see the point of it as a preventative measure, especially on mobile devices like phones. But, perhaps that's myopia.
Nov 08 2019
On Saturday, 9 November 2019 at 03:02:12 UTC, Gregor Mückl wrote:
> On Friday, 8 November 2019 at 20:30:28 UTC, bachmeier wrote:
> > On Friday, 8 November 2019 at 15:54:56 UTC, tchaloupka wrote:
> > > Well, it's still pretty unfortunate if some 3rd-party app can brick the GC runtime. We can't just say to customers "You've got Eset installed? Screw you it won't work together."
> >
> > But isn't that the purpose of antivirus software? Isn't the whole point to allow it to be able to interfere with the execution of other programs?
>
> It's not OK if the interference consists of injecting random bugs into legitimate programs. Antivirus programs have a pretty awful track record in this regard. I can't think of an antivirus product that I used that didn't turn out to be defective in one way or another.

When you install antivirus on your computer, you're giving it control over your computer. If other programs had a way around that, it would be useless. You could make the argument that the antivirus is crappy at its job. There's nothing the D compiler (or a compiler for any other language) can do about it.
Nov 09 2019
On Saturday, 9 November 2019 at 15:43:45 UTC, bachmeier wrote:
> On Saturday, 9 November 2019 at 03:02:12 UTC, Gregor Mückl wrote:
> > It's not OK if the interference consists of injecting random bugs into legitimate programs. Antivirus programs have a pretty awful track record in this regard. I can't think of an antivirus product that I used that didn't turn out to be defective in one way or another.
>
> When you install antivirus on your computer, you're giving it control over your computer. If other programs had a way around that, it would be useless. You could make the argument that the antivirus is crappy at its job. There's nothing the D compiler (or a compiler for any other language) can do about it.

I'm trying to make exactly that argument: ALL antivirus software I've ever used turned out to be crappy. They induced bugs in legitimate, clean programs in various ways. They were somewhat successful at stopping the occasional malicious file, but the collateral damage is not pretty. I could list a few fun examples, if you want.

There is a reason why first level support often tells people to temporarily deactivate their virus scanner and try again. This works more often than it actually should.
Nov 09 2019
On Saturday, 9 November 2019 at 03:02:12 UTC, Gregor Mückl wrote:
> It's not OK if the interference consists of injecting random bugs into legitimate programs. Antivirus programs have a pretty awful track record in this regard. I can't think of an antivirus product that I used that didn't turn out to be defective in one way or another.

I consider most AV software snake oil. It slows down your OS too. The only AV software I trust is Windows Defender, for a simple reason: it's in AV vendors' best interest for your PC to be infected, because it sells their software and other "malware removal" and "PC optimization" crapware. In the case of Microsoft, it's in their best interest not to have any viruses at all, because it reflects badly on them as a platform. Also, it's in their best interest to minimize any slowdowns and inconveniences AV brings.

Also, these are different times. In the pre-internet times virus infections were prevalent, carried over with USB drives or in drive-by Java applet/Flash attacks. The modern web environment is sandboxed well enough that it protects you from most attacks.
Nov 09 2019