www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 4489] New: std.array.insert is slow

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489

           Summary: std.array.insert is slow
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Keywords: performance
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: torhu yahoo.com



DMD 2.046.

std.array.insert seemed slow when I tried it, so I made a simple benchmark
comparing it to a simple alternative using memmove.

When compiled with -O -release -inline, the lowest numbers I got where 127 ms
for myInsert and 254 ms for std.array.insert.  That's a 100% speedup just from
using the most obvious implementation.

---
import std.array;
import std.perf;
import std.stdio;
import std.c.string;


enum COUNT = 100_000;


void myInsert(T)(ref T[] arr, size_t where, T value)
{
    assert(where <= arr.length);

    size_t oldLen = arr.length;
    arr.length = arr.length + 1;

    T* ptr = arr.ptr + where;
    memmove(ptr + 1, ptr, (oldLen - where) * T.sizeof);
    arr[where] = value;
}


void bench(alias insert)(ref int[] arr)
{
    auto pc = new PerformanceCounter;
    pc.start();
    foreach (_; 0..10) {
        arr.length = 0;
        foreach (i; 0..COUNT)
            insert(arr, i, i);
    }
    pc.stop();
    writeln(pc.milliseconds);
}


void main()
{
    int[] arr = new int[COUNT];

    bench!(myInsert)(arr);
    bench!(std.array.insert)(arr);
}
---

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 20 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei metalanguage.com
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 09 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489




I just revisited this.  I tried with with DMD 2.046 again, just to be sure I'm
doing the same benchmark, and I got:
myInsert(T)     130
insert(T,Range) 240

Basically the same numbers.  Then I tried DMD 2.059 and got this:
myInsert(T)     123
insert(T,Range) if (isInputRange!(Range) && is(ElementEncodingType!(Range) :
T))
        78716

Wow. About 700 times slower.  Not a very realistic use case, though.

So I tried inserting in the middle instead (i/2 instead if i):
myInsert(T)     19694
insert(T,Range) if (isInputRange!(Range) && is(ElementEncodingType!(Range) :
T))
        210606

Well, that's only about 20 times slower.

I don't what's going on here, but this is pretty much in line with my general
impression of Phobos 2.  There should be a big ALPHA sticker on the whole
thing.


Signed,

Disgruntled D1/Tango user

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 01 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489




16:46:38 PDT ---
Brought the code to date with:

----
import std.array;
import std.datetime;
import std.stdio;
import std.c.string;

enum COUNT = 100_000;


void myInsert(T)(ref T[] arr, size_t where, T value)
{
    assert(where <= arr.length);

    size_t oldLen = arr.length;
    arr.length = arr.length + 1;

    T* ptr = arr.ptr + where;
    memmove(ptr + 1, ptr, (oldLen - where) * T.sizeof);
    arr[where] = value;
}


void bench(alias insert)(ref int[] arr)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0..10) {
        arr.length = 0;
        foreach (i; 0..COUNT)
            insert(arr, i, i);
    }
    writeln(sw.peek.msecs);
}


void main()
{
    int[] arr = new int[COUNT];

    bench!(myInsert)(arr);
    bench!(std.array.insertInPlace)(arr);
}
----

I was able to measure a notable the performance mismatch. The culprit is:

----
void insertInPlace(T, U...)(ref T[] array, size_t pos, U stuff)
    if(!isSomeString!(T[]) && allSatisfy!(isInputRangeOrConvertible!T, U))
{
    T[staticConvertible!(T, U)] stackSpace = void;
    auto range = chain(makeRangeTuple(stackSpace[], stuff).expand);
    insertInPlaceImpl(array, pos, range);
}
----

I replaced that with:

----
void insertInPlace(T, U...)(ref T[] array, size_t pos, U stuff)
    if(!isSomeString!(T[]) && allSatisfy!(isInputRangeOrConvertible!T, U))
{
    immutable oldLen = array.length;
    array.length += stuff.length;
    auto ptr = array.ptr + pos;
    import core.stdc.string;
    memmove(ptr + stuff.length, ptr, (oldLen - pos) * T.sizeof);

    ptr = array.ptr + pos;
    foreach (i, Unused; U)
    {
        emplace(ptr + i, stuff[i]);
    }
}
----

and was able to measure similar performance.

Clearly the code once approved belongs to the entire team, not only to its
author, and I should be a better man than this, but I can't stop noticing the
irony of this given
http://forum.dlang.org/thread/mailman.811.1343564241.31962.digitalmars-d puremagic.com

I'll make a pull request soon.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 01 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com



11:27:53 PDT ---
Actually the culprit is the final step and not the simmingly complicated
packing of parameters into a range. I'm talking of this one:

private void insertInPlaceImpl(T, Range)(ref T[] array, size_t pos, Range
stuff)
    if(isInputRange!Range &&
       (is(ElementType!Range : T) ||
        isSomeString!(T[]) && is(ElementType!Range : dchar)))
{
    auto app = appender!(T[])();
    app.put(array[0 .. pos]);
    app.put(stuff);
    app.put(array[pos .. $]);
    array = app.data;
}

I'm actually surprized that there is no other specialization but surely there
was was one. Basically the thing above allocates an appender and does up to 3
resizes (1 per each put).

To taste waters I tried this:

    static if(hasLength!Range)
    {
        import core.stdc.string;
        immutable to_insert = stuff.length;
        immutable len = array.length;
        T* ptr = array.ptr+pos;
        array.length += to_insert;
        memmove(ptr+to_insert, ptr, (len-pos)*T.sizeof);
        //TODO: need to blit T.init over vacant places if T.__dtor exists?
        copy(stuff, array[pos..pos+to_insert]); 
    }
    else
    {// Generic implementation 
        auto app = appender!(T[])();
        app.put(array[0 .. pos]);
        app.put(stuff);
        app.put(array[pos .. $]);
        array = app.data;
    }

And for inserting at front the numbers were 
23387
23599

For inserting at i-th (last) index  I got:
58
92
These last ones were not very stable but indicate that the this packing into a
range also adds up.

I'll get myself busy with pull request since I was adding this packing thingy.

Still no idea when and how the underlying insertInPlaceImpl become 'create a
copy and return'. Sorry wasn't on my watch :)
Back when I last touched it, it looked like this:

void insertInPlaceImpl(T, Range)(ref T[] array, size_t pos, Range stuff)
    static if(hasLength!Range &&
              is(ElementEncodingType!Range : T) &&
              !is(T == const T) &&
              !is(T == immutable T))
    {
        immutable
            delta = stuff.length,
            oldLength = array.length,
            newLength = oldLength + delta;

        // Reallocate the array to make space for new content
        array = (cast(T*) core.memory.GC.realloc(array.ptr,
                        newLength * array[0].sizeof))[0 .. newLength];
        assert(array.length == newLength);

        // Move data in pos .. pos + stuff.length to the end of the array
        foreach_reverse (i; pos .. oldLength)
        {
            // This will be guaranteed to not throw
            move(array[i], array[i + delta]);
        }

        // Copy stuff into array
        copy(stuff, array[pos .. pos + stuff.length]);
    }
    else
    {
        auto app = appender!(T[])();
        app.put(array[0 .. pos]);
        app.put(stuff);
        app.put(array[pos .. $]);
        array = app.data;
    }

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 28 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489




13:33:58 PDT ---
Pending a bugfix in compiler:
https://github.com/D-Programming-Language/phobos/pull/858

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 15 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |regression



11:14:03 PST ---
Given the magnitude of slowdown I belive it worths the title of regression.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Dec 10 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489


Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla digitalmars.com



22:24:00 PST ---

 Pending a bugfix in compiler:
 https://github.com/D-Programming-Language/phobos/pull/858
Phobos pulls are not part of the compiler. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 10 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489




00:19:55 PST ---


 Pending a bugfix in compiler:
 https://github.com/D-Programming-Language/phobos/pull/858
Phobos pulls are not part of the compiler.
Sorry, my comment must have been poorly phrased. What I meant was that there is a pull and it's waiting on a fix for the compiler on 64 bits. Either way I worked it around just a moment before you fixed the bug :) Now I going to revert the workaround in this pull and so that it's hopefully merged in time for 2.061. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 11 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489




Commit pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/14e457b37d0a870a8ed7dd901e10679456edc6b3
fix issue 4489 std.array.insert is slow

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Dec 13 2012
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4489


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED



09:23:17 PST ---

 Commit pushed to master at https://github.com/D-Programming-Language/phobos
 
 https://github.com/D-Programming-Language/phobos/commit/14e457b37d0a870a8ed7dd901e10679456edc6b3
 fix issue 4489 std.array.insert is slow
With latest master and compile options: -noboundscheck -O -inline -release I see no measurable difference in inserting at front or end as it should have been. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 13 2012