digitalmars.D.bugs - [Issue 6658] New: Slow short array equality

d-bugmail puremagic.com (49/49) Sep 12 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6658

d-bugmail puremagic.com (49/49) Mar 31 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (13/14) Apr 01 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (9/16) Apr 01 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (11/11) Apr 01 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (12/18) Apr 02 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (11/13) Apr 02 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (8/8) Apr 02 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (13/13) Apr 02 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (15/36) Apr 02 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (10/10) Apr 02 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (9/21) Apr 02 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658
d-bugmail puremagic.com (13/13) Apr 02 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6658

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658

           Summary: Slow short array equality
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Keywords: performance
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



Equality comparisons between very short arrays is quite slow in D/DMD. When the
arrays are both fixed-sized and short, I suggest to inline item comparisons
instead of calling a runtime function, to speed up the code significantly.

A benchmark:


bool eq1(T, size_t N)(const ref T[N] a, const ref T[N] b) pure nothrow {
    return a == b;
}

bool eq2(T, size_t N)(const ref T[N] a, const ref T[N] b) pure nothrow {
    foreach (size_t i; 0 .. a.length)
        if (a[i] != b[i])
            return false;
    return true;
}

void main() {
    import std.stdio, std.random;
    enum size_t N = 3;
    enum size_t M = 100;
    rndGen.seed(1);

    int[N][M] data;
    foreach (ref d; data)
        foreach (ref x; d)
            x = uniform(0, 4);

    int total;
    foreach (i; 0 .. 20_000)
        foreach (j; 0 .. M)
            foreach (k; 0 .. M)
                static if (1)
                    total += eq1(data[j], data[k]); // 11.5 seconds
                else
                    total += eq2(data[j], data[k]); // 2.45 seconds

    writeln(total);
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Sep 12 2011

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658




After closing Issue 9477  the situation is changed a little.

On the same PC with DMD 2.063alpha the run-time with eq2() is 2.04 seconds and
the run-time with eq1() is 6.37 seconds (3.12X).

The asm produced by dmd (-O -release -inline -noboundscheck) for the two
versions:


eq1:
        push EAX
        mov ECX, 0Ch
        push ESI
        mov ESI, 0Ch[ESP]
        push EDI
        mov EDI, EAX
        xor EAX, EAX
        rep
        cmpsb
        je  L19
        sbb EAX, EAX
        sbb EAX, 0FFFFFFFFh
L19:    neg EAX
        sbb EAX, EAX
        inc EAX
        pop EDI
        pop ESI
        pop ECX
        ret 4


eq2:
        push EBX
        mov EDX, EAX
        mov ECX, 8[ESP]
        xor EBX, EBX
L9:     mov EAX, [EBX*4][ECX]
        cmp EAX, [EBX*4][EDX]
        je  L17
        pop EBX
        xor EAX, EAX
        ret 4
L17:    inc EBX
        cmp EBX, 3
        jb  L9
        pop EBX
        mov EAX, 1
        ret 4


Is cmpsb efficient on modern CPUs? Maybe the answer is negative:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Mar 31 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658


Vladimir Panteleev <thecybershadow gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thecybershadow gmail.com



00:56:15 EEST ---
Can we close this issue as a duplicate of issue 5282?

 Is cmpsb efficient on modern CPUs?

As the linked discussion mentions, it's not as efficient for short, fixed-size
data blocks. Backends other than DMD might take advantage of a known array
length, but DMD does not.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 01 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658





 Can we close this issue as a duplicate of issue 5282?
 
 Is cmpsb efficient on modern CPUs?

 
 As the linked discussion mentions, it's not as efficient for short, fixed-size
 data blocks. Backends other than DMD might take advantage of a known array
 length, but DMD does not.

Do you know why DMD can't be smarter here? The compiler knows the types and
lengths, so it knows when one of such situations happens. At worst calling a
template function defined in the object module is able to solve the situation.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 01 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658




02:10:35 EEST ---
It can, it's just that no one has implemented such an optimization yet. Keep in
mind that the patch would be against DMDBE (assuming the strategy would be to
optimize the memcmp intrinsic), so it would only improve the non-free,
non-redistributable part of one D compiler. D users who want their programs to
be fast should use GDC or LDC. In contrast, the patch to issue 9477 was in the
frontend, so it benefits all of GDC, LDC and DMD.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 01 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658


Iain Buclaw <ibuclaw ubuntu.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibuclaw ubuntu.com




 It can, it's just that no one has implemented such an optimization yet. Keep in
 mind that the patch would be against DMDBE (assuming the strategy would be to
 optimize the memcmp intrinsic), so it would only improve the non-free,
 non-redistributable part of one D compiler. D users who want their programs to
 be fast should use GDC or LDC. In contrast, the patch to issue 9477 was in the
 frontend, so it benefits all of GDC, LDC and DMD.

I'd just like to say that statement is false.  It wasn't a patch in the
frontend, and no one but DMD benefits.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 02 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658




10:35:39 EEST ---

 I'd just like to say that statement is false.  It wasn't a patch in the
 frontend, and no one but DMD benefits.

Really??

We're talking about this, right?
https://github.com/D-Programming-Language/dmd/commit/c984175cf25dfa17b3956e8e33ff83547fa56b0a

I thought GDC/LDC simply provided their own implementation of el_* and the elem
type.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 02 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658




10:42:00 EEST ---
I don't understand why GDC and LDC did not use the existing code in e2ir. This
would imply that every implementation would have to reimplement exactly the
same things as DMD (emit calls to Druntime, for example).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 02 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658




10:55:41 EEST ---
DMD: https://github.com/D-Programming-Language/dmd/blob/v2.062/src/e2ir.c#L2431

GDC: https://github.com/D-Programming-GDC/GDC/blob/master/gcc/d/d-elem.cc#L100

LDC: https://github.com/ldc-developers/ldc/blob/master/gen/arrays.cpp#L812

It's all reimplementations of the same thing. I guess there was a reason for
why the layer of abstraction was chosen to be this high, but in this situation
it's not doing anyone any favors.

Well, I guess that now that DMD is faster than both GDC and LDC at something,
the burden is on those compilers' maintainers to catch up with it :)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 02 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658

DMD: https://github.com/D-Programming-Language/dmd/blob/v2.062/src/e2ir.c#L2431

GDC: https://github.com/D-Programming-GDC/GDC/blob/master/gcc/d/d-elem.cc#L100

LDC: https://github.com/ldc-developers/ldc/blob/master/gen/arrays.cpp#L812

It's all reimplementations of the same thing. I guess there was a reason for
why the layer of abstraction was chosen to be this high, but in this situation
it's not doing anyone any favors.

I'd just like to say that statement is false. It wasn't a patch in the
frontend, and no one but DMD benefits.

Really??

We're talking about this, right?
https://github.com/D-Programming-Language/dmd/commit/c984175cf25dfa17b3956e8e33ff83547fa56b0a

I thought GDC/LDC simply provided their own implementation of el_* and the elem
type.

When gdc was first released, it provided it's own implementation of dt* and the
dt_t type. This is now what I widely regard of as a bad move. The DMD
implementation of toobj.c; todt.c; typinf.c; has been getting further from
GDC's copy of that part of the frontend, thus has become a pain to maintain
upon each frontend update. Frankly I can never see it working trying to hack
and wrap dmd backend calls into producing something that another backend see's
as correct. So I'll be yanking those three files out of dfrontend/ sometime
around this fall too.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 02 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658




12:50:44 EEST ---
I don't know what dt_t is, but judging from all 3 of its mentions in the
5381-line e2ir.c, I can only suppose that it is a leaky abstraction poking out
of the DMD backend.

Anyway, I don't see how this applies to e2ir.c, since, as I've mentioned, dt_t
only occurs 3 times in the file.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 02 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658




---

 I don't know what dt_t is, but judging from all 3 of its mentions in the
 5381-line e2ir.c, I can only suppose that it is a leaky abstraction poking out
 of the DMD backend.
 
 Anyway, I don't see how this applies to e2ir.c, since, as I've mentioned, dt_t
 only occurs 3 times in the file.


 I don't know what dt_t is, but judging from all 3 of its mentions in the
 5381-line e2ir.c, I can only suppose that it is a leaky abstraction poking out
 of the DMD backend.
 
 Anyway, I don't see how this applies to e2ir.c, since, as I've mentioned, dt_t
 only occurs 3 times in the file.

In brief, why define all these dmd backend symbols (OPcall) that are of no use
to gcc, and when you can just build gcc trees directly (CALL_EXPR)?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 02 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=6658




21:57:32 EEST ---
I don't understand the argument.

Because... choosing a lower (and simpler) layer of abstraction would mean that
less code would need to be reimplemented?

As I understand it, every time DMD implements a new expression type (for
example, dot multiply), GDC and LDC would need to be updated. However, DMD
already has a glue layer that lowers all of the D-specific expressions to
something closer to abstract machine code, which I'd think is what alternative
backends would be more interested in.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Apr 02 2013

D Programming

C/C++ Programming

Other

digitalmars.D.bugs - [Issue 6658] New: Slow short array equality