www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 10876] New: foreach and remove over associative array causes invalid data to appear

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10876

           Summary: foreach and remove over associative array causes
                    invalid data to appear
           Product: D
           Version: D2
          Platform: x86_64
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: maximechevalierb gmail.com



2013-08-23 09:35:40 PDT ---
This issue shows up using DMD64 D Compiler v2.063 on Linux.

I have a foreach loop in which I remove values from an associative array. The
behavior is completely broken. Items which are not in the associative array
(including the null value!) end up appearing in the iteration. The following
code snippet from my compiler produces the issue:

writeln("associative array length: ", allocState.length);
writeln("keys length: ", allocState.keys.length);

// Remove dead values from the alloc state
foreach (value, allocSt; allocState)
{
    writeln("key: ", value);
    writeln((value in allocState)? true:false);

    if (value is null)
        writeln(allocState[null]);

    if (liveInfo.liveAtEntry(value, block) is false)
        allocState.remove(value);
}

The output is as follows:

associative array length: 4
keys length: 4
key: $22 = arg 6 "v"
true
key: $20 = arg 4 "o"
true
key: $8 = add_i32 $1, $21
true
key: $21 = arg 5 "i"
true
key: $7 = phi [call_reg(5662):0 => $14, call_cont(565C):0 => $12]
false
key: $12 = load_u32 $20, $11
false
key: $12 = load_u32 $20, $11
false
key: $1 = add_i32 32, $0
false
key: $1 = add_i32 32, $0
false
key: $0 = mul_i32 8, $7
false
key: $0 = mul_i32 8, $7
false
key: null
false
core.exception.RangeError jit.jit(1410): Range violation

As you can see, most of the keys listed are not in the associative array, and
the null value appears as a key, out of nowhere, which causes a crash.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 23 2013
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10876


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich gmail.com



09:36:54 PDT ---
This will be marked as invalid, you can't safely iterate over an associative
array and remove its keys while iterating.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 23 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10876


hsteoh quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh quickfur.ath.cx



Modifying a container that you're iterating over has undefined results, because
you're iterating over a set of keys that changes from under you as you go. In
the case of AAs, you may end up dereferencing invalid pointers.

The safe way to do it is to get an array of keys and iterate over that:

    foreach (k; aa.keys) {
        auto value = aa[k];
        ...
        if (...)
            aa.remove(k);
    }

N.B. you can only use .keys, not .byKey, because the latter also amounts to
walking the data structure while it is being modified, which will cause
problems. Using .keys is OK because it creates an array of keys (a snapshot of
the set of keys at that point in time) before starting to modify the data
structure.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 23 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10876




14:16:26 PDT ---

 N.B. you can only use .keys, not .byKey, because the latter also amounts to
 walking the data structure while it is being modified, which will cause
 problems. Using .keys is OK because it creates an array of keys (a snapshot of
 the set of keys at that point in time) before starting to modify the data
 structure.
Yeah, as I learned in Issue 10821. Perhaps one could do an "!is null" check against the key if iterating via .byKey and removing at the same time? Well anyway, the safest bet is to use .keys. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 23 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10876




The problem with trying to iterate over a container that's being modified at
the same time is that you can't guarantee much of anything. For example, you
could insert !is null checks, but what happens if removing the key causes the
AA to rehash itself (e.g., to optimize for space as the number of keys
shrinks)? Then your subsequent iteration will be completely screwed up.
Basically, such hacks can only work by relying on implementation details, which
are not guaranteed by the spec. IOW, it's undefined behaviour.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 23 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10876




Some relevant references:

http://stackoverflow.com/questions/3346696/python-why-is-it-not-safe-to-modify-sequence-being-iterated-on

http://stackoverflow.com/questions/13981886/simultaneously-iterating-over-and-modifying-an-unordered-set

The bottomline is that modifying the container while you're iterating over it
is a bad idea (unless the container was specifically designed to support it).
It almost always doesn't do what you think it should do.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 23 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10876




14:34:18 PDT ---

 The bottomline is that modifying the container while you're iterating over it
 is a bad idea (unless the container was specifically designed to support it).
 It almost always doesn't do what you think it should do.
Btw, we should put this info up on dlang.org if it's not already there. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 23 2013