www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 3929] New: GC.free() creates subtle bugs

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929

           Summary: GC.free() creates subtle bugs
           Product: D
           Version: 2.041
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Severity: regression
          Priority: P2
         Component: druntime
        AssignedTo: sean invisibleduck.org
        ReportedBy: dsimcha yahoo.com



See bug 3930 for one observable manifestation of this bug, though I'm filing
this as a separate bug report because it's probably the root cause of a lot of
general weird behavior I've noticed in 2.041.

It seems that, when GC.free() is called on an array now that Steve's LRU patch
is in, the BlkInfo stays in the LRU cache, leading to subtle bugs.  The patch
seems to rely on the reference from cache preventing the array from getting
GC'd.  This is also bad in that it prevents memory from being reclaimed.  A
proper fix would be for the cache to be cleared whenever the GC runs or
GC.free() is called.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 10 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929




Clarification:  The LRU cache would avoid causing an otherwise GC-able array
from being retained by casting the pointer to size_t or something and storing
the BlkInfo struct somewhere that's not scanned completely conservatively, or
maybe by xoring it with some arbitrary pattern.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 10 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com



03:43:16 PST ---
Hm... first off, the LRU cache is only used for appending.  So in order for the
LRU cache to mistakenly access unallocated memory, you must be appending to an
invalid array.

Second, the LRU cache stores blockinfos, which should never change except in
one case.  That one case is when an array greater than half a page size is
appended to, it can grow in place by appending more pages to the blockinfo. 
This means a stale blockinfo could identify a memory block as being larger or
smaller than it actually is.  I would expect AA nodes to be small enough to not
trigger this problem.

If you clear the cache on a GC run, then you remove the cache's usefulness
since a GC run occurs only when new memory is requested.  If you clear it on GC
free, you have to stop all threads.  I don't think either of these solutions
will work, but I also don't think the LRU cache is causing problems.

Can we try to disable the cache to rule it out?  I suspect there is another
issue with AA's related to the appending fix, but I think it has to do with the
fact that AAs are allocating arrays without using the runtime functions for
arrays (it calls GC.malloc directly and builds the array structure manually).

The only trouble is, I see the only place where arrays are appended (via adding
to the length) is in the rebalance function.  This seems to coincide with
another bug report (bug 3898) but the array starts out as a static array, so
unless the stack frame is heap allocated (which it should *not* be), the append
patch should work properly.

I will create some patches to disable the cache and the whole array stomping
fix, and see if either of those makes the problem go away.  If not, then we
have to look elsewhere.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 11 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929




05:11:55 PST ---
See my fix as identified in bug 3930

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 11 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929




What about the issue of unnecessary GC retention of (possibly large) arrays?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 11 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929




18:08:26 PST ---
The cache actually depends on it.  If the block info in the cache is freed and
then reallocated as smaller blocks, using append may utilize a stale blockinfo
and allow overwriting of data that is not allocated to the array.

I agree the situation is not ideal.  Do you have any ideas on how to fix it? 
One possibility is to allow manual clearing of the cache if you want to tune
the application.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 11 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929


David Simcha <dsimcha yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|GC.free() creates subtle    |Interactions between LRU
                   |bugs                        |array cache, memory
                   |                            |recycling



Well,

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 11 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929




Well, there are really two issues here:  What happens when GC.free() gets
called and what happens when the GC collects.  As much as people (Andrei comes
to mind) hate it from a theoretical purity point of view, I believe it's
absolutely necessary to be able to GC.free() a large array while the GC sucks
as bad as it currently does.

For the GC collection case I still don't understand what's wrong with clearing
the LRU.  If I understand how this stuff works correctly, the information is
also stored at the end of every block, so on the next append the cache will be
repopulated.  It will only cost one non-cached lookup per array per GC
collection.

For the GC.free() case you raise a very good point about thread safety.  I
really don't have a good answer for it.  Calling free() doesn't have to be
cheap, but stopping the world is a little too expensive.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 11 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|sean invisibleduck.org      |schveiguy yahoo.com
         OS/Version|Windows                     |All
           Severity|regression                  |enhancement



04:39:55 PST ---

 Well, there are really two issues here:  What happens when GC.free() gets
 called and what happens when the GC collects.  As much as people (Andrei comes
 to mind) hate it from a theoretical purity point of view, I believe it's
 absolutely necessary to be able to GC.free() a large array while the GC sucks
 as bad as it currently does.
GC.free, I don't agree with. Deleting an array, yes. The issue is that the GC is unaware of the runtime's array features, it's just a mechanism to allocate and free memory. It's the same issue as something like in C++ how you should never call C's free on something that you allocated with new. deleting an array calls a runtime function that is in the perfect place -- where all my other fixes are. One thing I have thought of which should help somewhat is to mark arrays with a flag in the blockinfo attributes, thereby disallowing in-place appending to a memory block that was not allocated via the arraynew routines. My biggest worry (indirectly identified by searching for this nasty bug we just fixed) is that someone will try to append to a stack-allocated item, but because the function is a closure, it's actually heap allocated and succeeds in appending. It also gets rid of a bizarre consequence of requiring padding for class allocations.
 For the GC collection case I still don't understand what's wrong with clearing
 the LRU.  If I understand how this stuff works correctly, the information is
 also stored at the end of every block, so on the next append the cache will be
 repopulated.  It will only cost one non-cached lookup per array per GC
 collection.
I just don't like how it would affect array append performance across threads in a strangely coupled way. If you have 15 threads all doing appending, as soon as one triggers a collection cycle, they all are affected. That's the potential to degrade 8x15 arrays. While it would not be a hugely noticeable degradation, any *avoidable* degradation should be discouraged. What I would be willing to look at is having the collection cycle *selectively* remove freed blocks from the cache, and leave allocated blocks alone. Since the GC already has to stop the world, this shouldn't be too much of an extra burden. What I don't know is how it will deal with threading issues. I think I can make sure it's OK if the entire blockinfo matches before erasing. A block shouldn't be being cycled *and* inserted into the cache at the same time. Does that make sense?
 For the GC.free() case you raise a very good point about thread safety.  I
 really don't have a good answer for it.  Calling free() doesn't have to be
 cheap, but stopping the world is a little too expensive.
Clearing all the caches on every free is not an option. Removing the associated block from the cache on an array delete is a good option, and should work well enough to satisfy. BTW, I'm changing this to enhancement, because that's what it really is. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 12 2010
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3929


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



06:59:47 PST ---
Fixed in http://www.dsource.org/projects/druntime/changeset/496

Note that only destroying an array via delete arr will clear from the cache. 
Freeing using GC.free does not remove the data from the cache.

If it becomes necessary to use GC.free to free an array because delete is
deprecated, then we can revisit this problem.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 10 2011