
digitalmars.D.bugs - [Issue 5813] New: [patch] std.array.Appender has severe performance and memory leak problems.

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813

           Summary: [patch] std.array.Appender has severe performance and
                    memory leak problems.
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Keywords: patch, performance
          Severity: normal
          Priority: P3
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: sandford jhu.edu
        Depends on: 5233,5780



std.array.Appender currently suffers from a major memory leak and, consequently,
a major performance problem. The primary problem is that Appender isn't sealed
and must let the GC collect the unused arrays it allocates. Even in a
micro-benchmark program, false pointers can lead to excess memory being kept
live. For example, a micro-benchmark that simply generates a 10 MB array of
reals used anywhere between 30 and 116 MB of RAM (50 MB seemed about average).
For me, the major issue with Appender has been its tendency, when generating a
150 MB array, to run my program out of memory (and to take minutes even when it
succeeds).

Recently, bearophile posted a set of micro-benchmarks
(http://www.digitalmars.com/d/archives/digitalmars/D/Conversion_to_string_string_building_benchmark_132129.html).
I'll highlight two results: test1, which appended to!string(uint), and test2,
which appended "12345678", took 7.38 and 5.91 seconds, respectively, using the
old appender. With this patch, the times on my machine are 9.0 and 0.40
seconds, respectively. Memory usage also decreased from 290 MB to 146 MB with
this patch.

Besides better large-scale performance, I wrote some micro-benchmarks to test
usage as a small buffer and small array builder (~ 100 elements). I saw a minor
performance increase of 3% and 10%, respectively.

Needless to say, I've seen a dramatic performance increase in my production
code, quite literally from minutes to milliseconds.
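For concreteness, a benchmark of the kind described above might look like the
following. This is only a sketch using current std.array and std.datetime
names (the std.datetime.stopwatch module postdates this report); sizes and
numbers are illustrative and machine-dependent:

```d
// Sketch of a large-array micro-benchmark: build ~10 MB of reals with
// builtin append vs. Appender. Names and sizes are illustrative only.
import std.array : appender;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

void main() {
    enum n = 10 * 1024 * 1024 / real.sizeof;    // roughly 10 MB of reals

    auto sw = StopWatch(AutoStart.yes);
    real[] plain;
    foreach (i; 0 .. n) plain ~= cast(real) i;  // builtin append
    writefln("builtin ~= : %s ms", sw.peek.total!"msecs");

    sw.reset();
    auto app = appender!(real[])();
    foreach (i; 0 .. n) app.put(cast(real) i);  // appender-based build
    auto built = app.data;
    writefln("Appender   : %s ms", sw.peek.total!"msecs");

    assert(built.length == n && plain.length == n);
}
```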

Implementation changes  
=======================================================
I've changed the basic architecture of Appender from a 'smart' array to a
linked list of arrays. data(), therefore, has to combine these segments to
return an array. If data() performs a new allocation and copy, the resulting
array is used to create a new head node, preventing repeated calls to data
from allocating additional arrays.

The growth schedule has also changed: the capacity of each segment now doubles
on growth, which yields a greater-than-2x length schedule while the length is
smallish. I've limited the capacity-based growth to 4 KB, i.e. one memory
page. As no copying is done on growth, the only real cost to amortize is
allocation, and I felt 4 KB was a good trade-off point between keeping memory
tight and allocation cost.

For performance, I found that I shouldn't maintain a current-node pointer and
should instead use a tail-node pointer. A side effect of this optimization is
that shrinkTo may drop previously allocated segments, and clear only retains
the last node, as it is assumed to be the largest.
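As a sketch, the capped-doubling schedule described above works out like this
(illustrative names; segment sizes in bytes, mirroring the nextCapacity logic
in the patch):

```d
// Sketch of the capped-doubling growth schedule: each new segment is
// twice the previous one's size in bytes, capped at one memory page.
enum pageSize = 4096;

size_t nextSegmentBytes(size_t currentBytes) {
    auto next = currentBytes * 2;
    return next < pageSize ? next : pageSize;
}

unittest {
    // Starting from a 64-byte segment, the per-segment allocations run
    // 128, 256, ..., 4096, then stay at one page per segment.
    size_t cap = 64;
    size_t[] schedule;
    foreach (_; 0 .. 8) {
        cap = nextSegmentBytes(cap);
        schedule ~= cap;
    }
    assert(schedule == [128, 256, 512, 1024, 2048, 4096, 4096, 4096]);
}
```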

RefAppender was updated to support the changes in Appender cleanly.

I've added some additional methods to the API:
- slack: Returns the number of elements that can be added before allocation.
- extend: Increases the slack by at least N elements.
- dup: Returns a mutable copy of the data.
- idup: Returns an immutable copy of the data.
- this(size_t N): Constructs an appender with a capacity of at least N elements.


[patch] Don't forget the dependent patches (bugs 5780, 5233)  
=================================================================================

/** Implements an output range which stores appended data. This is recommended
 *  over $(D a ~= data) when appending many elements because it is more memory
 *  and computationally efficient.
 *
 *  Example:
    ----
    auto buffer = Appender!(char[])(4);             // Reserve 4 elements
    foreach( str; ["Hello"d, "World"d] ) {
        buffer.clear;                               // Use appender as a
        foreach( dchar c; str )                     // dynamically sized buffer
            put( buffer, c    );
        assert( buffer.data == str );
    }

    int[] a = [ 1, 2 ];
    auto builder = appender(a);                     // Or for array building
    builder.put(3);
    builder.put([ 4, 5, 6 ]);
    assert(builder.data == [ 1, 2, 3, 4, 5, 6 ]);
    ----
 */
struct Appender(A : T[], T) {
    private {
        enum  PageSize = 4096;          // Memory page size
        alias Unqual!T E;               // Internal element type

        struct Data {
            Data*       next;           // The next data segment
            size_t      capacity;       // Capacity of this segment
            E[]         arr;            // This segment's array

            // Initialize a segment using an existing array
            void opAssign(E[] _arr) {
                next           = null;
                capacity       = _arr.capacity;
                arr            = _arr;
                if(_arr.length < capacity) {
                    arr.length = capacity;
                    arr.length = _arr.length;
                }
                assert(_arr.ptr is arr.ptr,"Unexpected reallocation occurred");
            }

            // Create a new segment using an existing array
            this(Unqual!T[] _arr) { this = _arr; }

            // Create a new segment with at least size bytes
            this(size_t size) {
                if(size > PageSize)
                    size = (size +  PageSize-1) & ~(PageSize-1);
                auto bi  = GC.qalloc(size, !hasIndirections!T * 2);
                next     = null;
                capacity = bi.size / T.sizeof;
                arr      = (cast(E*)bi.base)[0..0];
                static assert(!false*2 == GC.BlkAttr.NO_SCAN);
            }
        }
        Data*  _head = null;                   // The head data segment
        Data*  _tail = null;                   // The last data segment

        // Returns: the total number of elements in the appender
        size_t _length() {
            size_t len = 0;
            for(auto d = _head; d !is null; d = d.next)
                len   += d.arr.length;
            return len;
        }

        // Flatten all the data segments into a single array
        E[] flatten() {
            if(_head is _tail)
                return _head ? _head.arr : null;

            size_t N   = _length;
            size_t len = N;
            size_t i   = 0;
            auto arr   = new E[N];
            for(auto d = _head; N > 0; d = d.next, N -= len, i += len) {
                len    = min(N, d.arr.length);
                memcpy(arr.ptr+i, d.arr.ptr, len * T.sizeof);
            }
            return arr;
        }

        // Returns: the next capacity size
        size_t nextCapacity() nothrow pure {
            auto   cap = _tail.capacity * T.sizeof * 2;
            return cap < PageSize ? cap : PageSize;
        }
    }

    /** Construct an appender with a given array.  Note that this does not copy
     *  the data.  If the array has a larger capacity as determined by
     *  arr.capacity, it will be used by the appender.  After initializing an
     *  appender on an array, appending to the original array will reallocate.
     */
    this(T[] arr) {
        if(arr is null) _head = _tail = new Data( 16 * T.sizeof );
        else            _head = _tail = new Data( cast(E[]) arr );
    }

    /// Construct an appender with a capacity of at least N elements.
    this(size_t N) { _head = _tail = new Data( N * T.sizeof ); }

    /// Returns: a mutable copy of the data.
    E[] dup()  {
        return _head !is _tail ? flatten : flatten.dup;
    }

    /// Returns: an immutable copy of the data.
    immutable(E)[] idup() {
        return cast(immutable(E)[]) dup;
    }

    /// Returns: the appender's data as an array.
    T[] data() {
        auto arr = flatten;
        if(_head !is _tail) {
            *_head = arr;
             _tail = _head;
        }
        return cast(T[]) arr;
    }

    /// Returns: the number of elements that can be added before allocation.
    size_t slack() {
        return _tail ? _tail.capacity - _tail.arr.length : 0;
    }

    /// Increases the slack by at least N elements
    void extend(size_t N) {
        assert( size_t.max / T.sizeof  >= N, "Capacity overflow.");
        auto size = N * T.sizeof;

        // Initialize if not already done.
        if(!_tail) return _head = _tail = new Data( size );

        // Try extending
        if( auto u = GC.extend(_tail.arr.ptr, size, size) )
            return _tail.capacity = u / T.sizeof;

        // If full, add a segment
        if(_tail.arr.length == _tail.capacity)
             return _tail = _tail.next = new Data( size );

        // Allocate & copy
        auto next = Data(size);
        memcpy(next.arr.ptr, _tail.arr.ptr, _tail.arr.length * T.sizeof);
        _tail.arr       = next.arr.ptr[0.._tail.arr.length];
        _tail.capacity  = next.capacity;
    }

    /// Returns: the total number of elements currently allocated.
    size_t capacity() {
        size_t cap = 0;
        for(auto d = _head; d !is null; d = d.next)
            cap   += d.capacity;
        return cap;
    }

    /// Ensures that the capacity is at least newCapacity elements.
    void reserve(size_t newCapacity) {
        auto cap  = capacity;
        if(  cap >= newCapacity) return;
        extend( newCapacity - cap );
    }

    /// Appends to the output range
    void put(U)(U item) if ( isOutputRange!(Unqual!T[],U) ){
        // put(T)
        static if ( isImplicitlyConvertible!(U, E) ){
            if(!_head)
                _head = _tail  = new Data( 16 * T.sizeof );
            else if( _tail.arr.length == _tail.capacity  ) {   // Try extending
                if( auto u = GC.extend(_tail.arr.ptr, T.sizeof, nextCapacity) )
                     _tail.capacity     = u / T.sizeof;
                else _tail = _tail.next = new Data( nextCapacity );
            }
            auto          len  = _tail.arr.length;
            _tail.arr.ptr[len] = item;
            _tail.arr          = _tail.arr.ptr[0 .. len + 1];

        // fast put(T[])
        } else static if( is(typeof(_tail.arr[0..1] = item[0..1])) ){
            auto items  = cast(E[]) item[];
            if(!_tail)
                _head   = _tail = new Data(  items.length * T.sizeof );
            auto arr    = _tail.arr.ptr[_tail.arr.length.._tail.capacity];
            size_t len  = items.length;
            if(arr.length < len) {                             // Try extending
                auto size  = max(items.length*T.sizeof, nextCapacity);
                if( auto u = GC.extend(_tail.arr.ptr, T.sizeof, size) ) {
                    _tail.capacity = u / T.sizeof;
                    arr    = _tail.arr.ptr[_tail.arr.length.._tail.capacity];
                }
                if(arr.length < len) len = arr.length;
            }
            arr[0..len] = items[0..len];
            items       = items[len..$];
            _tail.arr   = _tail.arr.ptr[0 .. _tail.arr.length + len];
            if( items.length > 0 ) {               // Add a segment and advance
                _tail.next = new Data(max(items.length*T.sizeof,nextCapacity));
                _tail      = _tail.next;
                _tail.arr.ptr[0..items.length] = items[];
                _tail.arr   = _tail.arr.ptr[0..items.length];
            }

        // Everything else
        } else {
            .put!(typeof(this),U,true,Unqual!T)(this,item);
        }
    }

    // only allow overwriting data on non-immutable and non-const data
    static if(!is(T == immutable) && !is(T == const)) {
        /** Clears the managed array. This function may reduce the appender's
         *  capacity.
         */
        void clear() {
            _head     = _tail;            // Save the largest chunk and move on
            _tail.arr = _tail.arr.ptr[0..0];
        }

        /** Shrinks the appender to a given length. Passing in a length that's
         *  greater than the current array length throws an enforce exception.
         *  This function may reduce the appender's capacity.
         */
        void shrinkTo(size_t newlength) {
            for(auto d = _head; d !is null; d = d.next) {
                if(d.arr.length >= newlength) {
                    d.arr  = d.arr.ptr[0..newlength];
                    d.next = null;
                }
                newlength -= d.arr.length;
            }
            enforce(newlength == 0, "Appender.shrinkTo: newlength > capacity");
        }
    }
}

/** An appender that can update an array in-place.  It forwards all calls to an
 *  underlying appender implementation.  Calling data updates the pointer to
 *  the original array passed in.
 */
struct RefAppender(A : T[], T) {
    private {
        Appender!(A, T) impl;
        T[] *arr;
    }

    /** Construct a ref appender with a given array reference.  This does not
     *  copy the data.  If the array has a larger capacity as determined by
     *  arr.capacity, it will be used by the appender.  $(D RefAppender)
     *  assumes that arr is a non-null value.
     *
     *  Note, do not use builtin appending (i.e. ~=) on the original array
     *  passed in until you are done with the appender, because calls to the
     *  appender override those appends.
    */
    this(T[] *arr) {
        impl     = Appender!(A, T)(*arr);
        this.arr = arr;
    }

    auto opDispatch(string fn, Args...)(Args args)
        if (is(typeof(mixin("impl." ~ fn ~ "(args)"))))  {
        mixin("return impl." ~ fn ~ "(args);");
    }

    /// Returns the appender's data as an array and updates the original array.
    T[] data() {
         auto  arr = impl.data;
         *this.arr = arr;
        return arr;
    }
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Apr 03 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




*Update*
The previous version ended up having value semantics, not reference semantics,
because _tail wouldn't get updated and later nodes would be ignored.
For example:

    auto test = Appender!string(16);
    void func(Writer)(Writer w) {
        w.put("The quick brown fox jumps over the lazy dog");
    }
    func(test);
    assert(test.data == "The quick brown ");
    //assert(test.data == "The quick brown fox jumps over the lazy dog");

I've updated the code to check/update the tail lazily as needed. Since an
out-of-date _tail naturally triggers a memory allocation anyway, this doesn't
materially change the average performance of put.

==============================================================================

struct Appender(A : T[], T) {
    private {
        enum  PageSize = 4096;          // Memory page size
        alias Unqual!T E;               // Internal element type

        struct Data {
            Data*       next;           // The next data segment
            size_t      capacity;       // Capacity of this segment
            E[]         arr;            // This segment's array

            // Initialize a segment using an existing array
            void opAssign(E[] _arr) {
                next           = null;
                capacity       = _arr.capacity;
                arr            = _arr;
                if(_arr.length < capacity) {
                    arr.length = capacity;
                    arr.length = _arr.length;
                }
                assert(_arr.ptr is arr.ptr,"Unexpected reallocation occurred");
            }

            // Create a new segment using an existing array
            this(Unqual!T[] _arr) { this = _arr; }

            // Create a new segment with at least size bytes
            this(size_t size) {
                if(size > PageSize)
                    size = (size +  PageSize-1) & ~(PageSize-1);
                auto bi  = GC.qalloc(size, !hasIndirections!T * 2);
                next     = null;
                capacity = bi.size / T.sizeof;
                arr      = (cast(E*)bi.base)[0..0];
                static assert(!false*2 == GC.BlkAttr.NO_SCAN);
            }
        }
        Data*  _head = null;                   // The head data segment
        Data*  _tail = null;                   // The last data segment

        // Returns: the total number of elements in the appender
        size_t _length() {
            size_t len = 0;
            for(auto d = _head; d !is null; d = d.next)
                len   += d.arr.length;
            return len;
        }

        // Flatten all the data segments into a single array
        E[] flatten() {
            if(!_head) return null;
            if(_head.next is null)
                return _head.arr;

            size_t N   = _length;
            size_t len = N;
            size_t i   = 0;
            auto arr   = new E[N];
            for(auto d = _head; N > 0; d = d.next, N -= len, i += len) {
                len    = min(N, d.arr.length);
                memcpy(arr.ptr+i, d.arr.ptr, len * T.sizeof);
            }
            return arr;
        }

        // Returns: the next capacity size
        size_t nextCapacity() nothrow pure {
            auto   cap = _tail.capacity * T.sizeof * 2;
            return cap < PageSize ? cap : PageSize;
        }
    }

    /** Construct an appender with a given array.  Note that this does not copy
     *  the data.  If the array has a larger capacity as determined by
     *  arr.capacity, it will be used by the appender.  After initializing an
     *  appender on an array, appending to the original array will reallocate.
     */
    this(T[] arr) {
        if(arr is null) _head = _tail = new Data( 16 * T.sizeof );
        else            _head = _tail = new Data( cast(E[]) arr );
    }

    /// Construct an appender with a capacity of at least N elements.
    this(size_t N) { _head = _tail = new Data( N * T.sizeof ); }

    /// Returns: the maximum length that can be accommodated without allocation
    size_t capacity() {
        size_t cap = 0;
        for(auto d = _head; d !is null; d = d.next)
            cap   += d.capacity;
        return cap;
    }

    /// Returns: a mutable copy of the data.
    E[] dup()  {
        if(_head && _head.next is null)
            return flatten.dup;
        return flatten;
    }

    /// Returns: an immutable copy of the data.
    immutable(E)[] idup() {
        return cast(immutable(E)[]) dup;
    }

    /// Returns: the appender's data as an array.
    T[] data() {
        auto arr = flatten;
        if(_head !is _tail) {
            *_head = arr;
             _tail = _head;
        }
        return cast(T[]) arr;
    }

    /** Reserve at least newCapacity elements for appending.  Note that more
     *  elements may be reserved than requested.  If newCapacity < capacity,
     *  then nothing is done.
     */
    void reserve(size_t newCapacity) {
        auto cap  = capacity;
        if(  cap >= newCapacity) return;

        auto size = ( newCapacity - cap) * T.sizeof;

        // Initialize if not done so.
        if(!_head) return _head = _tail = new Data( size );

        // Update tail
        while(_tail.next !is null) _tail = _tail.next;

        // Try extending
        if( auto u = GC.extend(_tail.arr.ptr, size, size) )
            return _tail.capacity = u / T.sizeof;

        // If full, add a segment
        if(_tail.arr.length == _tail.capacity)
             return _tail = _tail.next = new Data( size );

        // Allocate & copy
        auto next = Data(size);
        memcpy(next.arr.ptr, _tail.arr.ptr, _tail.arr.length * T.sizeof);
        _tail.arr       = next.arr.ptr[0.._tail.arr.length];
        _tail.capacity  = next.capacity;
    }

    /// Appends to the output range
    void put(U)(U item) if ( isOutputRange!(Unqual!T[],U) ){
        // Transcoding is required to support char[].put(dchar)
        static if(isSomeChar!T && isSomeChar!U &&  T.sizeof < U.sizeof){
            E[T.sizeof == 1 ? 4 : 2] encoded;
            auto len = std.utf.encode(encoded, item);
            return put(encoded[0 .. len]);

        // put(T)
        } else static if(isImplicitlyConvertible!(U, E)) {
            if(!_head)
                _head = _tail  = new Data( 16 * T.sizeof );
            else if( _tail.arr.length == _tail.capacity  ) {   // Try extending
                while(_tail.next !is null) _tail = _tail.next; // Update tail
                if( auto u = GC.extend(_tail.arr.ptr, T.sizeof, nextCapacity) )
                     _tail.capacity     = u / T.sizeof;
                else _tail = _tail.next = new Data( nextCapacity );
            }
            auto          len  = _tail.arr.length;
            _tail.arr.ptr[len] = item;
            _tail.arr          = _tail.arr.ptr[0 .. len + 1];

        // fast put(T[])
        } else static if (is(typeof(_tail.arr[0..1] = item[0..1]))) {
            auto items  = cast(E[]) item[];
            if(!_head)
                _head   = _tail = new Data(  items.length * T.sizeof );
            auto arr    = _tail.arr.ptr[_tail.arr.length.._tail.capacity];
            size_t len  = items.length;
            if(arr.length < len) {                             // Try extending
                while(_tail.next !is null) {                   // Update tail
                    _tail = _tail.next;
                    arr   = _tail.arr.ptr[_tail.arr.length.._tail.capacity];
                }
                auto size  = max(items.length*T.sizeof, nextCapacity);
                if( auto u = GC.extend(_tail.arr.ptr, T.sizeof, size) ) {
                    _tail.capacity = u / T.sizeof;
                    arr    = _tail.arr.ptr[_tail.arr.length.._tail.capacity];
                }
                if(arr.length < len) len = arr.length;
            }
            arr[0..len] = items[0..len];
            items       = items[len..$];
            _tail.arr   = _tail.arr.ptr[0 .. _tail.arr.length + len];
            if( items.length > 0 ) {               // Add a segment and advance
                _tail.next = new Data(max(items.length*T.sizeof,nextCapacity));
                _tail      = _tail.next;
                _tail.arr.ptr[0..items.length] = items[];
                _tail.arr   = _tail.arr.ptr[0..items.length];
            }

        // Kitchen sink
        } else {
            .put!(typeof(this),U,true)(this,item);
        }
    }

    // only allow overwriting data on non-immutable and non-const data
    static if(!is(T == immutable) && !is(T == const)) {
        /** Clears the managed array. This function may reduce the appender's
         *  capacity.
         */
        void clear() {
            _head     = _tail;            // Save the largest chunk and move on
            if(_head) {
                _head.arr  = _head.arr.ptr[0..0];
                _head.next = null;
            }
        }

        /** Shrinks the appender to a given length. Passing in a length that's
         *  greater than the current array length throws an enforce exception.
         *  This function may reduce the appender's capacity.
         */
        void shrinkTo(size_t newlength) {
            for(auto d = _head; d !is null; d = d.next) {
                if(d.arr.length >= newlength) {
                    d.arr  = d.arr.ptr[0..newlength];
                    d.next = null;
                }
                newlength -= d.arr.length;
            }
            enforce(newlength == 0, "Appender.shrinkTo: newlength > capacity");
        }
    }
}

-- 
May 20 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




This patch passes the unittests listed in Issue 5663 (std.array.Appender.put
bug).

-- 
Jun 09 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com



11:58:55 PDT ---
I think this is a good idea, but the code is somewhat obfuscated.  For
example, you are returning a value from a void function (extend), and using
some weird shortcuts (like !false*2).  There are also some style differences
from the current D style guidelines.  These will have to be fixed before it's
put into Phobos.

One technical point -- the limiting to multiples of the page size when
requesting more than one page is already done by the GC.  That is, if you
request more than half a page, you will get a multiple of the page size.  I
think the code will perform exactly as well if the multiple-of-page limitation
is removed from your code, and then it makes no assumptions about how big a
page is (in case that ever changes).
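This rounding behavior can be observed directly through GC.qalloc's BlkInfo,
which reports the usable size of the allocated block. A small sketch (the
exact rounding is allocator-dependent; only the >= relationship is guaranteed):

```d
// GC.qalloc returns a BlkInfo whose .size field is the usable size of
// the allocated block -- often larger than requested, because the GC
// rounds requests up to its bin sizes (multiples of pages for large
// blocks).
import core.memory : GC;

void main() {
    auto bi = GC.qalloc(3000);
    assert(bi.size >= 3000);            // usable size >= requested size

    auto big = GC.qalloc(3 * 4096 + 1); // just over three pages
    assert(big.size >= 3 * 4096 + 1);
}
```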

Do you want to do a pull request and have the code reviewed?  I think it's
definitely worth changing Appender to do this.

One further thing -- I planned on making two versions of Appender: one that is
a reference type, and one that is not.  The version that is not should allow
filling an existing array without allocating anything.  The current Appender
(the one in Phobos now) always allocates a pImpl struct, even if the array
never needs to be extended.

The tradeoff is that the non-reference appender is not returnable, and you
can't make copies of it (i.e. this(this) is disabled).

Is this possible with your code?  If so, and it's easy enough, can you create a
version that does this too?
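For illustration, such a value-type appender might be sketched like this (the
FixedAppender name and shape are hypothetical, not the Phobos implementation):

```d
// Hypothetical value-type appender that fills a caller-supplied buffer
// without allocating; copying is disabled so it remains a unique value.
struct FixedAppender(T) {
    private T[] buf;        // caller-owned storage
    private size_t used;    // elements written so far

    @disable this(this);    // no copies: single owner, not returnable

    this(T[] storage) { buf = storage; }

    void put(T item) {
        assert(used < buf.length, "FixedAppender: buffer full");
        buf[used++] = item;
    }

    T[] data() { return buf[0 .. used]; }
}

unittest {
    int[4] storage;
    auto app = FixedAppender!int(storage[]);
    app.put(1); app.put(2);
    assert(app.data == [1, 2]);     // no GC allocation occurred
}
```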

-- 
Jun 09 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813





 I think this is a good idea to do, but the code is somewhat obfuscated.  For
 example, you are returning a value from a void function (extend), and doing
 some weird shortcuts (like !false*2).  There are also some style differences
 from the current D style guidelines.  These will have to be fixed before it's
 put into phobos.
The obfuscation generally comes from a desire to collapse multi-line code into
one-liners. But if you think these are too compressed, I'll change them to
something cleaner.

Regarding D style guidelines: there has been so much discussion over them that
I don't know if, or where, the current guidelines exist. And the existing
Phobos codebase is a mash-up of different styles; even ddoc comments vary
between /++ and /**. Is there anything in particular you'd recommend I change?
 One technical point -- the limit of multiples of page sizes when requesting
 more than one page is already done by the GC.  That is, If you request more
 than 1/2 page, you will get a multiple of pages.  I think the code will perform
 exactly as well if the multiple-of-page limitation is removed from your code,
 and this then does not make any assumptions about how big a page size is (if it
 ever changes).
Thank you. I'll remove that line.
 Do you want to do a pull request and have the code reviewed?  I think it's
 definitely worth changing Appender to do this.
Yes, but I haven't gotten around to learning/installing Git on Windows yet.
 One further thing -- I planned on making two versions of Appender: one that is
 a reference type, and one that is not.  The version that is not should allow
 filling an existing an array without allocating anything.  The current Appender
 (the one in phobos now) always allocates an pImpl struct, even if the array
 never needs to be extended.
 
 The tradeoff is that the non-reference appender is not returnable, and you
 can't make copies of it (i.e. this(this) is  disabled).
 
 Is this possible with your code?  If so, and it's easy enough, can you create a
 version that does this too?
The simple solution would be to emplace the first data node inside the
appender struct. But I'm not sure of the benefit of such a solution: appender
always checks/extends to the capacity of an array, which involves taking the
GC lock, etc. Since taking the lock can't be averted, avoiding an extra
16-byte allocation doesn't gain much.

But I can always implement it using a template parameter plus static ifs, so
it shouldn't be hard to try out. Hmm... I could also fold in RefAppender that
way.
Jun 09 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




14:22:32 PDT ---

 The obfuscation generally comes from a desire to minimize multi-lines to
 one-liners.
I don't mind the one-liners, but things like !hasIndirections!T*2 are better
written as hasIndirections!T ? 0 : GC.BlkAttr.NO_SCAN. It's more decipherable
and self-documenting. Shortcuts that hide the intention of the code are not
worth it.

The one-liners that assign to several things at once are at least clear,
although I think this one should be split into at least two statements:

    return _tail = _tail.next = new Data( size );
 Regarding D style guidelines, there has been so much
 discussion over them, that I don't know if/where there current guidelines
 exist/are. And the existing Phobos codebase is a mash-up of different styles. I
 mean, even ddoc comments vary between /++ and /**. Is there anything in
 particular you'd recommend I'd change?
I think the best thing to do for now is to follow the style of the original code. TBH, I'd say submit the pull request and the review will flesh out individual things that should be fixed (the github review process is really good).
 Do you want to do a pull request and have the code reviewed?  I think it's
 definitely worth changing Appender to do this.
Yes, but I haven't gotten around to learning/installing Git on Windows yet.
Highly recommend progit.org for learning git. Just getting git working on Windows myself...
 The simple solution would be to emplace the first data node inside the appender
 struct. But I'm not sure the benefit of such a solution: appender always
 checks/extends-to the capacity of an array, which involves taking the GC lock,
 etc. So since the taking lock can't be averted, avoiding an extra 16-byte
 allocation isn't much worse. But I can always implement it using a template
 parameter + static ifs, so it shouldn't be hard to try it out. Hmm... I could
 also fold in RefAppender that way.
You are probably right; I added the code to fill to capacity after thinking
about creating such a type. The capacity check definitely needs to take the
global lock. Maybe the non-reference appender should not try to expand to
capacity (that is, after all, an optimization). In any case, this isn't as
important as the other fixes you've already created.
Jun 09 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813


Vladimir Panteleev <thecybershadow gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thecybershadow gmail.com



02:45:59 PST ---
What's the status of this patch?

Note that today, it's extremely unlikely for patches from Bugzilla to be
incorporated unless someone creates a GitHub pull request for them.

Dec 02 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




04:49:25 PST ---
I think we were waiting on Rob to change this into a pull request.  Not sure if
that's going to happen?

Dec 02 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




I'm planning on putting in a pull request for this, probably sometime over the
Christmas holidays (I hope). Currently, I'm in the middle of finishing up my
Ph.D. thesis and moving to a new city and job, so my bandwidth is very limited
at the moment.

Dec 02 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




10:33:44 PST ---
I did some benchmarks with this and some other variants.

At least for small strings, this code performs worse than the current appender.
Judging by the description it's optimized for large blocks of data, but in the
case of many small strings it can perform as much as 50% worse than the current
appender for typical cases (e.g. building a few KB of HTML).

Here is my test code:

http://dump.thecybershadow.net/a05a2c4dc7cd2a8e21b3a447c7eff150/test2.d
http://dump.thecybershadow.net/eff5c7ef81e18bf75d8462ffe16a52e4/appender.d

I was able to get 25% more performance for my test case from the standard
appender by adding a method that accepts multiple arguments (which preallocates
only once). My code is a hack, but it would be nice to see something like that
in the official appender.
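A method like the one described could look as follows (a hedged sketch against the public std.array.Appender API; `putAll` and its length computation are my invention, not the code behind the dump links):

```d
import std.array : appender;

/// Append several slices with a single up-front reserve, so the
/// capacity check (and GC lock) is paid once instead of per argument.
void putAll(A, U...)(ref A app, U items)
{
    size_t total;
    foreach (item; items)
        total += item.length;              // assumes slice-like arguments
    app.reserve(app.data.length + total);  // one capacity extension
    foreach (item; items)
        app.put(item);                     // no further reallocation needed
}

void main()
{
    auto app = appender!string();
    putAll(app, "<p>", "hello", "</p>");
    assert(app.data == "<p>hello</p>");
}
```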

Dec 02 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813





 I did some benchmarks with this and some other variants.
 
 At least for small strings, this code performs worse than the current appender.
 Judging by the description it's optimized for large blocks of data, but in the
 case of many small strings it can perform as bad as 50% worse than the current
 appender for typical cases (e.g. building a few KB of HTML).

 Here is my test code:
 
 http://dump.thecybershadow.net/a05a2c4dc7cd2a8e21b3a447c7eff150/test2.d
 http://dump.thecybershadow.net/eff5c7ef81e18bf75d8462ffe16a52e4/appender.d
 
 I was able to get 25% more performance for my test case from the standard
 appender by adding a method that accepts multiple arguments (which preallocates
 only once). My code is a hack, but it would be nice to see something like that
 in the official appender.
Thank you for the additional benchmarks. The primary purpose of my code was to
optimize for memory usage, which in turn optimizes performance (i.e. the number
of main-memory copies and memory allocations). I also optimized appender for
use as a dynamic buffer. But this is all big-O stuff.

I did include a short-string test in my benchmark set while I was optimizing
the code; my first implementations (never posted) did have some major
slowdowns, which I fixed by reworking the put routine. So there might still be
some little-o issues there. And I will definitely add a void put(U...)(U items)
overload.

I don't think it would affect this benchmark in any way, but my appender is
somewhat dependent on my patch to put (Issue 5233).

I've run your benchmarks and played with them a little bit. Running the
benchmarks sequentially is definitely generating GC warm-up issues. I'd
recommend running them cold in the future, though the qualitative results are
the same, and cold runs are the ideal situation for the old appender.

I want to rewrite my __ctfe version of appender, since Don added support for
structs and new, so I'll also look at the short-string performance of appender
while I'm at it. I'll do some profiling and look into this some more. I'd
really like for appender to be the right solution for anything bigger than
`hello` ~ `world`.
Dec 02 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




I did some more digging. I reworked the put routine itself and changed the
growth strategy to be as aggressive as the original appender. This got me down
from a 52% slowdown to a 20% slowdown. I still have more work to do, but to
put this into perspective, a 20% slowdown is equivalent to adding an extra if
conditional to the put routine.
Again, thanks Vladimir for these benchmarks.

Dec 02 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




14:56:17 PST ---
I spent some more time working on an optimized appender. Things I found:

1) Extra indirection levels are performance killers
2) I failed to create a chained appender (like the one in this patch) that was
faster than a copying one, for my test cases
3) At least on Windows and with short strings, simple slice copy trumps all
memcpy implementations I tried
4) You can write a nextCapacity function with no branches
5) It's better to store an "end" pointer than a "capacity" integer

I put all my attempts and benchmarking code here:
https://github.com/CyberShadow/DAppenderResearch

The version I went with is based on fastappender2 in that repo.
The final version is here:
https://github.com/CyberShadow/ae/blob/master/utils/appender.d

For a chained appender, I would move the tail members into the main structure.
My attempt at a fast chained appender is in fastappender4.

Dec 28 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei metalanguage.com



11:58:44 PST ---
Great work! Why not a pull request?

Dec 29 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




22:28:34 PST ---
My code is more of a research project. It can't be safely copied, only supports
items 1 byte in size, and doesn't care much in terms of UTF encoding. So, it
can't be a drop-in replacement without a lot of hammering, and even then you
have to decide whether you want it to be fast or copyable. I hope my research
will be useful for others (like Rob) trying to improve the Phobos appender,
though.

Dec 29 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




Vladimir, the code in 
https://github.com/CyberShadow/ae/blob/master/utils/appender.d
seems to be under the MPL, which isn't Phobos compatible. What license is the
code in: 
https://github.com/CyberShadow/DAppenderResearch
under? If it's not under a Boost compatible license, could you make it
available under one? 

Thanks for all this work.


 1) Extra indirection levels are performance killers
I agree.
 2) I failed to create a chained appender (like the one in this patch) that was
 faster than a copying one, for my test cases
The primary purpose of a chained appender is to minimize memory leaks and
memory usage, and to maximize large-scale performance. That said, I did write a
faster chained appender for your benchmarks; however, I did this by
initializing the appender with a page of memory, which definitely should not be
the default behavior. Still, one viable strategy for appender would be to keep
one page of memory around as a scratch pad.
 3) At least on Windows and with short strings, simple slice copy trumps all
 memcpy implementations I tried
 4) You can write a nextCapacity function with no branches
Good to know.
 5) It's better to store an "end" pointer than a "capacity" integer
I'll look into this, but you cannot make this judgment call based on the
performance of a char appender alone. The issue is that pointer arithmetic
multiplies/divides by T.sizeof; when T.sizeof is a power of 2 this compiles
down to cheap shift operations, but otherwise it uses actual
multiplication/division instructions, which are very slow.
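To make the trade-off concrete, here is a minimal sketch (illustrative names, not code from either implementation) of the two bounds-check styles being compared:

```d
// Capacity-integer style: the check itself is a plain integer compare;
// no scaling by T.sizeof is needed until a write position is computed.
struct WithCapacity(T)
{
    T* ptr;
    size_t length, capacity;
    bool fits(size_t n) { return length + n <= capacity; }
}

// End-pointer style: the compiler scales n by T.sizeof for the pointer
// arithmetic -- a shift when T.sizeof is a power of 2, a real multiply
// otherwise, which is the cost being debated above.
struct WithEndPointer(T)
{
    T* cursor, end;
    bool fits(size_t n) { return cursor + n <= end; }
}

void main()
{
    char[8] buf;
    auto a = WithEndPointer!char(buf.ptr, buf.ptr + 8);
    assert(a.fits(8) && !a.fits(9));
    auto b = WithCapacity!char(buf.ptr, 0, 8);
    assert(b.fits(8) && !b.fits(9));
}
```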
Jan 02 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




12:10:18 PST ---

 Vladimir, the code in 
 https://github.com/CyberShadow/ae/blob/master/utils/appender.d
 seems to be under the MPL, which isn't Phobos compatible. What license is the
 code in: 
 https://github.com/CyberShadow/DAppenderResearch
 under? If it's not under a Boost compatible license, could you make it
 available under one? 
Sure, consider my code in DAppenderResearch public domain. ae is mainly MPL because it was the least restrictive license other contributors agreed on.
 That said, I did write a faster
 chained appender for your benchmarks; however I did this by initializing the
 appender with a page of memory, which definitely should not be the default
 behavior. That said, one viable strategy for appender would be to keep 1 page
 of memory around as a scratch pad. 
Can you elaborate on this (or share your code)?
 5) It's better to store an "end" pointer than a "capacity" integer
 I'll look into this, but you cannot make this judgment call based on the performance of a char appender alone. The issue is that pointer arithmetic multiplies/divides by T.sizeof; when T.sizeof is a power of 2 this compiles down to cheap shift operations, but otherwise it uses actual multiplication/division instructions, which are very slow.
I'm not following your explanation. I don't see how element size plays against
my conclusion; in fact, as far as I can see, calculating a
"position-after-append" pointer and comparing it to an "end" pointer is going
to be faster, because you need to update the position anyway.
Jan 02 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




 That said, I did write a faster
 chained appender for your benchmarks; however I did this by initializing the
 appender with a page of memory, which definitely should not be the default
 behavior. That said, one viable strategy for appender would be to keep 1 page
 of memory around as a scratch pad. 
Can you elaborate on this (or share your code)?
For part 1 (fastest possible 'chained' appender): simply construct it
specifying a large number of elements, i.e.

    auto app = Appender!string(4096);

FastestAppender7 seems to do something similar (enum MIN_SIZE = 4096;).

As for part 2 (fastest practical 'chained' appender), I haven't written it yet.
In essence you'd have two TLS variables: a scratch node (with its memory page)
and an in-use flag.

    private static void* __Appender_scratch_node;
    private static bool __Appender_scratch_in_use;

    Appender(T) {...}

Then, when an appender is constructed, instead of creating a new node and a
little bit of RAM each time, a single node and one page of memory would be
reused. The boolean flag would prevent a second appender from reusing the same
scratch pad (I figure 2+ simultaneous appenders would be relatively rare).
Then, when data is called, a single copy of the correct length would be made
(appending after data should also be relatively rare).
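A rough sketch of that scratch-pad idea (all names invented for illustration; module-level variables in D are thread-local by default, so no explicit TLS annotation is needed):

```d
// One reusable scratch buffer per thread, guarded by an in-use flag.
private ubyte[] scratchPage;   // module-level variables are TLS in D
private bool scratchInUse;

struct ScratchBuffer
{
    ubyte[] buf;
    size_t len;

    static ScratchBuffer acquire(size_t pageSize = 4096)
    {
        ScratchBuffer b;
        if (!scratchInUse && scratchPage.length >= pageSize)
        {
            scratchInUse = true;
            b.buf = scratchPage;          // reuse the cached page: no allocation
        }
        else
            b.buf = new ubyte[pageSize];  // rare case: 2+ live appenders
        return b;
    }

    // Extracting the data makes one right-sized copy and recycles the page.
    immutable(ubyte)[] data()
    {
        auto result = buf[0 .. len].idup;
        if (buf is scratchPage)
            scratchInUse = false;         // release the shared scratch pad
        else
            scratchPage = buf;            // cache this page for the next user
        return result;
    }
}

void main()
{
    auto b = ScratchBuffer.acquire();
    b.buf[0] = cast(ubyte)'h'; b.buf[1] = cast(ubyte)'i'; b.len = 2;
    assert(cast(string)b.data() == "hi");
}
```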
 5) It's better to store an "end" pointer than a "capacity" integer
I'll look into this, but this you can not make this judgment call based on the performance of a char appender. The issue is that anything with a power of 2 T.sizeof performs integer multiplication/division using shift operations instead of the actual underlying instructions, both of which are very slow.
I'm not following your explanation. I don't see how element size plays against my conclusion - in fact, as far as I can see, calculating a "position-after-append" pointer and comparing it to an "end" pointer is going to be faster, because you will need to update the position anyway.
I'm referring to the fact that ptr + i => cast(T*)(cast(size_t)ptr +
i*T.sizeof) and ptrA - ptrB => (cast(size_t)ptrA - cast(size_t)ptrB) /
T.sizeof, which can lead to subtle performance issues when T.sizeof is not 1 or
a power of 2. But my first code review doesn't reveal any worrying usages in
the primary code path; most cases of ptrA - ptrB seem to be behind rarely used
if conditionals.

P.S. Regarding your previous assertion:
 4) You can write a nextCapacity function with no branches
I'm having trouble finding this in the code. All I can find is:

    auto newCapacity = newSize < MIN_SIZE ? MIN_SIZE : newSize * 2;

which contains a branch (i.e. the ?: operator). Also, I now understand better
what you meant by
 1) Extra indirection levels are performance killers
Unfortunately, your attempt to reduce the number of indirections has broken one
of the invariants of Appender, specifically that it is a reference type.
Putting cursors, etc. into the Appender struct will result in data stomping.
Jan 02 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




13:32:01 PST ---

 For part 1 (fastest possible 'chained' appender): Simply construct specifying a
 large number of elements. i.e.
 auto app = Appender!string(4096);
 FastestAppender7 seems to do something similar (enum MIN_SIZE  = 4096;)
The 4th one does that too, albeit implicitly. MIN_SIZE is half a page, but it
will always allocate a little over that, which will cause the GC to return a
whole page. MIN_SIZE was chosen to be the smallest size that results in an
expandable allocation. Note that the 7th experiment is the least GC-friendly of
them all.
 As for part 2 (fastest practical 'chained' appender) I haven't written it yet.
 In essence you'd have two TLS variables, a scratch node and memory page and an
 in use flag.
Sounds like a nice idea.
 I'm referring to the fact that ptr + i => cast(T*)(cast(size_t)ptr +
 i*T.sizeof) and ptrA - ptrB => (cast(size_t)ptrA - cast(size_t)ptrB) /
 T.sizeof. Which can lead to a subtle performance issues when T.sizeof != 1 or a
 power of 2. But, my first code review doesn't reveal any worrying usages in the
 primary code path; most cases of ptrA-ptrB seem to be behind rarely used if
 conditionals.
Yes, that was my point. Only one multiplication by T.sizeof and one branch are
necessary on the hot path when appending a single item (I see that my code
doesn't follow this, due to its usage of slice copies). I wonder if putting the
"cursorEnd > end" check inside the per-item loop in such cases would be faster;
it would be exchanging a multiplication for a branch.
 Regarding your previous assertion:
 4) You can write a nextCapacity function with no branches
 I'm having trouble finding this in the code. All I can find is: auto newCapacity = newSize < MIN_SIZE ? MIN_SIZE : newSize * 2; which contains a branch (i.e. the ?: operator).
The main idea is in fastappender2.d. The "if" at the start can be replaced with
another bitwise or. It doesn't really matter, though, because that code will
not be executed often enough to make a significant difference; I mentioned it
more as a curiosity.
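For reference, a classic branchless next-capacity routine looks like this (my own sketch of the technique, not the code from fastappender2.d; it rounds up to the next power of two and folds the minimum-size check into a bitwise or, assuming a 64-bit size_t):

```d
size_t nextCapacity(size_t n)
{
    enum MIN_SIZE = 4096;
    n |= MIN_SIZE - 1;   // enforce the minimum size without an if
    n |= n >> 1;         // smear the highest set bit downward...
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;        // assumes 64-bit size_t
    return n + 1;        // ...then round up to the next power of two
}

void main()
{
    assert(nextCapacity(0) == 4096);
    assert(nextCapacity(5000) == 8192);
    assert(nextCapacity(4096) == 8192); // exact powers still grow
}
```

With this shape, growth stays geometric without a conditional on the hot path.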
 Also, I now understand better what you meant by 
 1) Extra indirection levels are performance killers
 Unfortunately, your attempt to reduce the number of indirections has broken one of the invariants of Appender, specifically that it is a reference type. Putting cursors, etc. into the Appender struct will result in data stomping.
I know, I said so in my answer to Andrei's comment. This is fine for my uses.
Jan 02 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813




https://github.com/D-Programming-Language/phobos/pull/502

Mar 19 2012
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5813


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc




 https://github.com/D-Programming-Language/phobos/pull/502
 Algorithmically, Appender is implemented using a sealed rope,
 a linked-list of arrays, with the first node being cached in
 a thread local free list.
Was a linked-list of arrays better/faster than a dynamic array of pointers to
equal-sized memory blocks (a deque-like data structure)?
Mar 19 2012