digitalmars.D - Memory Corruption with AAs
- dsimcha (8/8) Apr 02 2010 Has anyone else still been noticing difficult to reproduce memory corrup...
- Walter Bright (3/11) Apr 02 2010 1. is it multithreaded?
- dsimcha (7/18) Apr 02 2010 The program as a whole is multithreaded, but the part where the bug occu...
- Walter Bright (6/26) Apr 02 2010 It should be easier to find then, by removing all the main code and ever...
- dsimcha (11/37) Apr 02 2010 The code has so many dependencies (both other code from the same project...
- Walter Bright (5/10) Apr 02 2010 Andrei is working on the design of the D collection class library. After...
- Andrei Alexandrescu (5/18) Apr 03 2010 I wouldn't call it research, but I agonized a fair amount over it. I
- Michel Fortin (13/32) Apr 04 2010 I think this is a sound decision. And I'm not necessarily talking about
- dsimcha (6/8) Apr 04 2010 The way I'm picturing this being implemented is that a GC'd class instan...
- Michel Fortin (17/29) Apr 04 2010 That wouldn't work with realloc: realloc copies to a new location then
- Steven Schveighoffer (8/37) Apr 05 2010 Another problem is if the elements of the container have references to
- Jordi (7/45) Apr 02 2010 I am having exactly the same situation. My project, which is also quite
- Walter Bright (2/3) Apr 02 2010 Any chance you can reduce it to a small test case?
- Jordi (14/18) Apr 02 2010 Actually i am trying to add an "autotest" mode to my project to be able
- Walter Bright (5/25) Apr 02 2010 Since it is single-threaded, it should crash in the same place in the sa...
- Steve Teale (7/11) Apr 02 2010 According to an article I was reading that constancy does not apply with...
- Steve Teale (3/18) Apr 02 2010 Got it - http://gcc.gnu.org/wiki/DebuggingGCC
- Jordi (10/30) Apr 03 2010 Well, i don't know if this applies to my case, but it is definitely rand...
- Walter Bright (3/23) Apr 03 2010 It links to this page which shows how to turn it off:
- Steven Schveighoffer (12/24) Apr 02 2010 Are you using the latest trunk for druntime, or the stock 2.042 version?...
- Steven Schveighoffer (10/22) Apr 02 2010 I just thought of a way to rule out or not the stomping patch, as long a...
Has anyone else still been noticing difficult to reproduce memory corruption issues in the presence of associative arrays with 2.042? They seem to happen very infrequently and non-deterministically. I can only reproduce them in the context of a large program. However, they don't occur in 2.040 (the release before the array stomping patch), and they are clearly a result of memory corruption, as contents of arrays change from what I expect them to be to completely random-looking values inside a loop that does a lot of memory management and uses AAs heavily but doesn't modify the values.
Apr 02 2010
dsimcha wrote:
> Has anyone else still been noticing difficult to reproduce memory corruption issues in the presence of associative arrays with 2.042? [...]

1. Is it multithreaded?
2. Does your code have any dangling pointers into AAs?
Apr 02 2010
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
> 1. is it multithreaded?
> 2. does your code have any dangling pointers into AAs?

The program as a whole is multithreaded, but the part where the bug occurs is an initialization routine that is executed before any threads other than the main one are launched.

As for the dangling pointers question, I don't understand how there could be dangling pointers into GC-managed memory, since if there are pointers to it, it won't be freed. (Ignoring dirty tricks that I'm not using in this case.)
Apr 02 2010
dsimcha wrote:
> The program as a whole is multithreaded, but the part where the bug occurs is an initialization routine that is executed before any threads other than the main one are launched.

It should be easier to find then, by removing all the main code and everything it calls.

> As far as the dangling pointers question, I don't understand how there could be dangling pointers into GC-managed memory, since if there are pointers to it, it won't be freed. (Ignoring dirty tricks that I'm not using in this case.)

What I meant was, do you save any pointers into the AAs, as in:

    auto p = &aa[key];

?
Apr 02 2010
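For readers following along, the pattern Walter asks about above can be sketched roughly as follows. This is an illustration only, not code from dsimcha's program: the names `counts`, `riskyPattern` and `saferPattern` are hypothetical, and what a stale pointer actually refers to after the AA is modified depends on the runtime's AA implementation.

```d
/// The pattern in question: a pointer into the AA that outlives
/// later modifications of the AA.
int riskyPattern()
{
    int[string] counts;
    counts["a"] = 1;

    int* p = &counts["a"];   // address of a value stored inside the AA

    counts.remove("a");      // the key is gone; p is now stale
    counts["b"] = 2;         // further inserts may reorganize the AA's storage

    // Dereferencing p at this point would rely on AA implementation details,
    // so the sketch only reads it if the key is still present (it is not).
    return ("a" in counts) is null ? -1 : *p;
}

/// Alternative: look the key up at the point of use instead of caching a pointer.
int saferPattern()
{
    int[string] counts;
    counts["a"] = 1;
    counts["b"] = 2;

    // `in` yields a pointer to the value, or null when the key is absent.
    if (auto p = "a" in counts)
        return *p;
    return -1;
}
```

The second function keeps the pointer only for the duration of a single lookup, which avoids depending on how the AA manages its memory.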
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
> It should be easier to find then, by removing all the main code and everything it calls.

The code has so many dependencies (both other code from the same project and libraries) and is such a mess (because it's a research prototype that evolved more than it was designed and also has all kinds of speed hacks) that it would probably be easier to try to reproduce it from scratch. I'll try tonight because I've got a long train ride with nothing else to do anyhow.

> What I meant was, do you save any pointers into the AAs, as in: auto p = &aa[key]; ?

No, I definitely wasn't. I almost never do this with any data structure other than an array because, even if it works for now, I consider it a horrible violation of encapsulation because you're relying on the details of how the data structure manipulates memory. This is also why, when I designed RandAA I didn't see this as an issue until you pointed it out to me.
Apr 02 2010
dsimcha wrote:
> I almost never do this with any data structure other than an array because, even if it works for now, I consider it a horrible violation of encapsulation because you're relying on the details of how the data structure manipulates memory. This is also why, when I designed RandAA I didn't see this as an issue until you pointed it out to me.

Andrei is working on the design of the D collection class library. After much thought and research, he finally came to the conclusion that a collection class should not allow the address of a member to be taken. I think his reasoning on the issue is pretty sound, and is consistent with your take on it.
Apr 02 2010
On 04/02/2010 03:53 PM, Walter Bright wrote:
> Andrei is working on the design of the D collection class library. After much thought and research, he finally came to the conclusion that a collection class should not allow the address of a member to be taken. I think his reasoning on the issue is pretty sound, and is consistent with your take on it.

I wouldn't call it research, but I agonized a fair amount over it. I think Phobos containers will all use malloc, realloc, and free for their own storage, while still being safe.

Andrei
Apr 03 2010
On 2010-04-03 23:21:48 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:
> I wouldn't call it research, but I agonized a fair amount over it. I think Phobos containers will all use malloc, realloc, and free for their own storage, while still being safe.

I think this is a sound decision. And I'm not necessarily talking about using malloc, realloc, and free (even though a container capable of using realloc is certainly a plus), but the one about decoupling the container interface from any particular memory management implementation.

Question: if the container's memory isn't garbage-collected, how do you implement iterators, eh, ranges so that they are still memory-safe?

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Apr 04 2010
== Quote from Michel Fortin (michel.fortin michelf.com)'s article
> Question: if the container's memory isn't garbage-collected, how do you implement iterators, eh, ranges so that they are still memory-safe?

The way I'm picturing this being implemented is that a GC'd class instance exists at the top level, and then the internal implementation-detail storage that the class uses is implemented via malloc and free. This storage would get freed in the class finalizer when the instance is GC'd. In this case all you'd need to do is make the range hold a reference to the class instance so it wouldn't be GC'd.
Apr 04 2010
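The design dsimcha describes above might look something like the sketch below. It is only an assumption-laden illustration: `IntBuffer`, its nested `Range` and `sumAll` are hypothetical names, not Phobos types, and the sketch uses only malloc and free (no realloc). Note also that the timing of GC finalizers is not deterministic, so the storage is released whenever the collector gets around to the object.

```d
import core.stdc.stdlib : malloc, free;

/// Hypothetical container: the object itself is GC-managed,
/// while its element storage comes from the C heap.
final class IntBuffer
{
    private int* data;
    private size_t len;

    this(size_t n)
    {
        data = cast(int*) malloc(n * int.sizeof);
        assert(data !is null || n == 0, "malloc failed");
        len = n;
        foreach (i; 0 .. n)
            data[i] = cast(int) i;   // fill with 0, 1, 2, ...
    }

    ~this()
    {
        // Runs when the GC finalizes the object; free() does not touch the GC heap.
        free(data);
        data = null;
    }

    /// buf[] returns a range over all elements.
    Range opSlice() { return Range(this, 0, len); }

    static struct Range
    {
        private IntBuffer owner; // keeps the container, and so its storage, alive
        private size_t lo, hi;

        @property bool empty() const { return lo >= hi; }
        @property int front() { return owner.data[lo]; }
        void popFront() { ++lo; }
    }
}

/// Consumes a range and returns the sum of its elements.
int sumAll(IntBuffer.Range r)
{
    int total = 0;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}
```

Because the range indexes through `owner` rather than holding a raw pointer into the malloc'd block, the block cannot be freed by the finalizer while any range is still reachable.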
On 2010-04-04 09:45:36 -0400, dsimcha <dsimcha yahoo.com> said:
> The way I'm picturing this being implemented is that a GC'd class instance exists at the top level, and then the internal implementation-detail storage that the class uses is implemented via malloc and free. This storage would get freed in the class finalizer when the instance is GC'd. In this case all you'd need to do is make the range hold a reference to the class instance so it wouldn't be GC'd.

That wouldn't work with realloc: realloc copies to a new location then frees the old memory if it cannot expand in place. You can't keep the old copy allocated.

I've been thinking of another method to ensure safety: don't allow expanding a container as long as there are ranges pointing to it. Easily implemented with a reference count.

For instance, if you have a vector container, expanding the container would invalidate ranges. Instead of allowing ranges to become invalid and potentially dangerous, just disallow expanding the container. The range would contain a pointer to its upper and lower bound, and a pointer to the container to increment the reference count when it's copied and decrement it when it's destroyed.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Apr 04 2010
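One way to read Michel's proposal above is the following sketch. Everything here is hypothetical (`GuardedVector`, `tryAppend`, `liveRanges`); it simply refuses to call realloc while the reference count of live ranges is nonzero, and it assumes the vector outlives its ranges.

```d
import core.stdc.stdlib : realloc, free;

/// Hypothetical vector whose growth is refused while any range refers to it.
struct GuardedVector
{
    private int* data;
    private size_t len;
    private size_t liveRanges; // number of ranges currently pointing into `data`

    @disable this(this);       // keep the sketch simple: the vector is not copyable

    ~this() { free(data); }

    /// Appends a value; returns false instead of reallocating under a live range.
    bool tryAppend(int value)
    {
        if (liveRanges != 0)
            return false;
        auto p = cast(int*) realloc(data, (len + 1) * int.sizeof);
        if (p is null)
            return false;
        data = p;
        data[len++] = value;
        return true;
    }

    /// v[] returns a range over all elements and bumps the reference count.
    Range opSlice() { return Range(&this); }

    static struct Range
    {
        private GuardedVector* owner;
        private int* lo, hi;   // lower and upper bound into owner.data

        this(GuardedVector* v)
        {
            owner = v;
            lo = v.data;
            hi = v.data + v.len;
            ++owner.liveRanges;
        }

        this(this) { if (owner) ++owner.liveRanges; } // copied: one more reference
        ~this()    { if (owner) --owner.liveRanges; } // destroyed: one fewer

        @property bool empty() const { return lo >= hi; }
        @property int front() const { return *lo; }
        void popFront() { ++lo; }
    }
}
```

With this shape, an append attempted while a range is in scope returns false, and the same append succeeds once the range has been destroyed. Whether a real container should fail, throw, or assert in that situation is a design choice the thread does not settle.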
On Sun, 04 Apr 2010 09:28:44 -0400, Michel Fortin <michel.fortin michelf.com> wrote:
> I think this is a sound decision. And I'm not necessarily talking about using malloc, realloc, and free (even though a container capable of using realloc is certainly a plus), but the one about decoupling the container interface from any particular memory management implementation.
>
> Question: if the container's memory isn't garbage-collected, how do you implement iterators, eh, ranges so that they are still memory-safe?

Another problem is if the elements of the container have references to GC-managed data. This means you have to addroot any memory you allocate with malloc. Non-reference type elements of course can use C's malloc and free. This is how Tango works.

-Steve
Apr 05 2010
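Steven's point above can be illustrated with druntime's `core.memory` module, which exposes both `GC.addRoot` and `GC.addRange`. The sketch below uses `GC.addRange`, on the assumption that the block to be scanned lives outside the GC heap; the helper names `allocScanned` and `freeScanned` are hypothetical.

```d
import core.memory : GC;
import core.stdc.stdlib : calloc, free;

/// Allocates `n` string slots on the C heap and tells the GC to scan them,
/// so GC-allocated strings stored in the slots are kept alive.
string[] allocScanned(size_t n)
{
    assert(n > 0, "this sketch expects at least one slot");
    auto p = cast(string*) calloc(n, string.sizeof); // zeroed, so every slot starts as null
    assert(p !is null, "calloc failed");
    GC.addRange(p, n * string.sizeof);               // register the block with the collector
    return p[0 .. n];
}

/// Reverses allocScanned: unregister the block first, then release it.
void freeScanned(string[] slots)
{
    GC.removeRange(slots.ptr);
    free(slots.ptr);
}
```

Without the `GC.addRange` call, a string whose only reference sits in the calloc'd block could be collected while the container still points at it. Element types that hold no references do not need the registration at all.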
dsimcha wrote:
> The code has so many dependencies (both other code from the same project and libraries) and is such a mess [...] that it would probably be easier to try to reproduce it from scratch.

I am having exactly the same situation. My project, which is also quite big, crashes randomly in 2.041 and 2.042 (without patches), but it works fine in 2.040. I didn't bother reporting because I thought I was doing something wrong (being a newbie in D).

Mine is single-threaded, no pointers in AAs.

j.
Apr 02 2010
Jordi wrote:
> I am having exactly the same situation.

Any chance you can reduce it to a small test case?
Apr 02 2010
Walter Bright wrote:
> Any chance you can reduce it to a small test case?

Actually I am trying to add an "autotest" mode to my project to be able to test and benchmark different compiler versions, compilation options and even GC implementations in the context of a realistic application (in size, kind of things it does and "average" quality of code).

However, I don't expect to be able to narrow this down in the short term, due to the relatively large amount of data involved and the randomness of its occurrence. Sometimes it is a crash somewhere gdb cannot provide info on; sometimes I hit my own assert in places that I am pretty sure are not reachable unless I get some corruption or a wrong result from an AA.

And finally, I am not entirely sure yet that it is really the compiler's fault... If I manage to get a "send-able" case I will submit it to the bugtracker.

j.
Apr 02 2010
Jordi wrote:
> However i don't expect to be able to narrow this down in the short term, due to the relatively big amount of data involved and the randomness of its occurrence. [...]

Since it is single-threaded, it should crash in the same place in the same way every time. This means you can put an assert on the crashing data (even without gdb it can be found by inserting printf's), and slowly work it backward to where the data goes wrong.
Apr 02 2010
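The work-it-backward approach Walter describes above could be mechanized with a small checksum helper like the one below. This is a sketch under assumptions: `fingerprint` and `Watch` are hypothetical names, the mixing is FNV-style applied per element rather than per byte, and it only helps for data that is supposed to stay unchanged between checkpoints.

```d
/// Order-sensitive checksum over data that is supposed to stay unchanged.
ulong fingerprint(const(int)[] data)
{
    ulong h = 14695981039346656037UL;  // FNV offset basis
    foreach (x; data)
    {
        h ^= cast(uint) x;
        h *= 1099511628211UL;          // FNV prime
    }
    return h;
}

/// Remembers the checksum of a slice so later checkpoints can detect changes.
struct Watch
{
    const(int)[] data;
    ulong expected;

    this(const(int)[] d)
    {
        data = d;
        expected = fingerprint(d);
    }

    /// True while the watched elements still match the recorded checksum.
    bool intact() const { return fingerprint(data) == expected; }
}
```

In use, one would sprinkle `assert(w.intact(), "corrupted before step N");` through the suspect loop and keep moving the checkpoints earlier until the first failing one brackets the statement after which the data goes wrong.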
On Fri, 02 Apr 2010 23:01:48 -0700, Walter Bright wrote:
> Since it is single-threaded, it should crash in the same place in the same way every time. This means you can put an assert on the crashing data (even without gdb it can be found by inserting printf's), and slowly work it backward to where the data goes wrong.

According to an article I was reading, that constancy does not apply with current Linux kernels. They deliberately randomize things to make life difficult for hackers. I'll try to find it again - it's about debugging GCC itself.

Steve
Apr 02 2010
On Sat, 03 Apr 2010 06:24:20 +0000, Steve Teale wrote:
> According to an article I was reading that constancy does not apply with current Linux kernels. They deliberately randomize things to make life difficult for hackers. I'll try to find it again - it's about debugging GCC itself.

Got it - http://gcc.gnu.org/wiki/DebuggingGCC

Right at the end.
Apr 02 2010
Steve Teale wrote:
> Got it - http://gcc.gnu.org/wiki/DebuggingGCC
> Right at the end.

Well, I don't know if this applies to my case, but it is definitely random:
- Compile
- Run - it crashes after some interaction.
- Run the same again - crashes immediately.

I am currently trying to run it on Windows to see what comes out. Trying to narrow it down makes it disappear in all of my attempts so far.

j.
Apr 03 2010
Steve Teale wrote:
> Got it - http://gcc.gnu.org/wiki/DebuggingGCC
> Right at the end.

It links to this page which shows how to turn it off:

http://gcc.gnu.org/wiki/Randomization
Apr 03 2010
On Fri, 02 Apr 2010 13:15:36 -0400, dsimcha <dsimcha yahoo.com> wrote:
> Has anyone else still been noticing difficult to reproduce memory corruption issues in the presence of associative arrays with 2.042? [...] However, they don't occur in 2.040 (the release before the array stomping patch) [...]

Are you using the latest trunk for druntime, or the stock 2.042 version? Because the AAs have been changed significantly since 2.042. Not saying I'm blaming that, but let's make sure we are all on the same page.

What are the AA key/value types?

Do you use appending at all in your code? The AA code only uses appending in one location, that is when setting the length of its array.

Can you narrow down what your code is doing when the corruption occurs? (I realize this might be impossible, but just thought I'd ask)

-Steve
Apr 02 2010
On Fri, 02 Apr 2010 13:15:36 -0400, dsimcha <dsimcha yahoo.com> wrote:
> Has anyone else still been noticing difficult to reproduce memory corruption issues in the presence of associative arrays with 2.042? [...] However, they don't occur in 2.040 (the release before the array stomping patch) [...]

I just thought of a way to rule out or not the stomping patch, as long as your code does not depend on preventing array stomping. Change your runtime's version of lifetime.d to the version before the append stomping patch:

http://www.dsource.org/projects/druntime/browser/trunk/src/rt/lifetime.d?rev=251

This version should be forwards compatible with the runtime in 2.042, so you should be able to run your code with this runtime. See if the errors still occur.

-Steve
Apr 02 2010
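For context on what "array stomping" refers to in this thread: the concern is an append to a slice overwriting elements of a longer array that shares the same memory. The sketch below shows the behavior on a runtime that prevents stomping, which is my understanding of what the patch introduced; the function name `appendLeavesOriginalIntact` is hypothetical. On a runtime from before the patch, the append could instead write into `a`'s third element.

```d
/// Returns true when appending to a prefix slice left the original array untouched.
bool appendLeavesOriginalIntact()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a[0 .. 2];   // b shares memory with the first two elements of a

    b ~= 99;               // with stomping prevention, b is reallocated before the append

    return a == [1, 2, 3, 4]    // a[2] was not overwritten with 99
        && b == [1, 2, 99]
        && b.ptr !is a.ptr;     // b now lives in its own block
}
```

Code that accidentally relied on the old in-place behavior would see different results after the patch, which is one reason swapping lifetime.d as Steven suggests is a reasonable way to isolate the change.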