digitalmars.D - Speed kills
- ixid (15/15) Feb 15 2016 Every time there is a D thread on reddit it feels like the new
- Guillaume Piolat (4/10) Feb 15 2016 Something that annoyed me a bit is floating-point comparisons,
- Wyatt (5/8) Feb 15 2016 I feel like this point comes up often, and that a lot of people
- Basile B. (18/28) Feb 15 2016 Same for std.math.lround
- Jack Stouffer (3/6) Feb 15 2016 Seems like you know a lot about the subject, and I know you
- Basile B. (12/19) Feb 17 2016 In the meantime:
- Jack Stouffer (6/10) Feb 17 2016 While I don't know about the post you're talking about, I don't
- Basile B. (12/22) Feb 17 2016 That's more subtile than that.
- Basile B. (3/13) Feb 17 2016 Also, forgot to say, but an uniform API is needed to set the
- w0rp (4/4) Feb 21 2016 I think it's important that DMD gets more of the easier
- Marco Leise (7/9) Mar 07 2016 At least GCC has a codegen switch for that. A solution would
- Luc J. Bourhis (4/7) Feb 21 2016 SSE goes back to Pentium III, doesn't it? And the Pentium 4
- Guillaume Piolat (3/19) Feb 15 2016 lround and friends have been a big performance problem at times.
- Basile B. (5/7) Feb 15 2016 I didn't know this trick. It generates almost the same sse
- Guillaume Piolat (3/13) Feb 16 2016 In SSE3 you also get an instruction that does this without
- rsw0x (3/6) Feb 15 2016 if you want better codegen, don't use dmd.
- ixid (6/21) Mar 08 2016 Since I posted this thread I've learned std.algorithm.sum is 4
- jmh530 (3/8) Mar 08 2016 There was a longer discussion here
- Andrei Alexandrescu (2/7) Mar 09 2016 Whoa. What's happening there? Do we have anyone on it? -- Andrei
- Daniel Kozak via Digitalmars-d (3/10) Mar 09 2016 I guess he speaks about this one:
- cym13 (5/16) Mar 09 2016 They just don't do the same thing, sum() uses pairwise summation
- jmh530 (3/6) Mar 09 2016 That third comment about how it's not obvious which algorithm sum
- John Colvin (6/17) Mar 09 2016 Ilya has long term plans for this, but I have a short-term fix
- Andrei Alexandrescu (2/14) Mar 09 2016 Thanks much! -- Andrei
- John Colvin (3/18) Mar 09 2016 https://github.com/D-Programming-Language/phobos/pull/4069
- Jon D (40/45) Mar 09 2016 I seen a few cases while exploring D. Not all fully researched
- H. S. Teoh via Digitalmars-d (44/79) Mar 09 2016 In the case of std.algorithm.sum, the focus is on accuracy rather than
- John Colvin (3/15) Mar 09 2016 I think you'd be good at reviewing this:
- Jon D (10/11) Mar 09 2016 Turns out there are issues filed for each of the performance
Every time there is a D thread on reddit it feels like the new user is expecting mind-blowing speed from D.

https://www.reddit.com/r/programming/comments/45v03g/porterstemmerd_an_implementation_of_the_porter/

This is the most recent one, where John Colvin provided some pointers to speed it up significantly. Walter has done some good work taking the low-hanging fruit to speed up DMD code, and there is a lot of effort going on with reference-counting machinery, but I wondered whether some of the common errors people make that slow down D code can be addressed. Literals used to be a hidden speed bump, but I think that was improved; now the append operator is one of the most common culprits. Can this not be enhanced behind the scenes to perform more like Appender? Do others notice common pitfalls between the article code and what the D community then suggests, so that we can bridge the gap and naive users get faster code?
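For readers who hit this particular pitfall, here is a minimal sketch of the difference being discussed (illustrative only; the function names and sizes are made up): repeated ~= on a slice goes through the runtime's append machinery and may reallocate as it grows, while std.array.Appender manages its own buffer and amortises the growth.

    import std.array : appender;

    // Naive version: each ~= re-checks capacity and may occasionally
    // reallocate and copy.
    string buildNaive(size_t n)
    {
        string s;
        foreach (i; 0 .. n)
            s ~= 'x';
        return s;
    }

    // Appender tracks its own capacity, so the hot loop stays cheap.
    string buildWithAppender(size_t n)
    {
        auto app = appender!string();
        app.reserve(n);              // optional, avoids regrowth entirely
        foreach (i; 0 .. n)
            app.put('x');
        return app.data;
    }

Whether the compiler or runtime could do such a rewrite automatically, as asked above, is a separate question.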
Feb 15 2016
On Monday, 15 February 2016 at 13:51:38 UTC, ixid wrote:This is the most recent one where John Colvin provided some pointers to speed it up significantly. Walter has done some good work taking the low-hanging fruit to speed up DMD code and there is a lot of effort going on with reference counting machinery but I wondered if some of the common errors people make that slow down D code can be addressed?Something that annoyed me a bit is floating-point comparisons: DMD does not seem to be able to handle them from SSE registers; it will convert to the x87 FPU and do the comparison there, IIRC.
Feb 15 2016
On Monday, 15 February 2016 at 14:16:02 UTC, Guillaume Piolat wrote:Something that annoyed me a bit is floating-point comparisons, DMD does not seem to be able to handle them from SSE registers, it will convert to FPU and do the comparison there IIRC.I feel like this point comes up often, and that a lot of people have argued x87 FP should just not happen anymore. -Wyatt
Feb 15 2016
On Monday, 15 February 2016 at 14:16:02 UTC, Guillaume Piolat wrote:On Monday, 15 February 2016 at 13:51:38 UTC, ixid wrote:Same for std.math.lround: they use the FPU way, while for float and double it's only one SSE instruction. Typically, with 6 functions similar to this one: int round(float value) { asm { naked; cvtss2si EAX, XMM0; ret; } } we could get ceil/trunc/round/floor, and almost as easily fmod and hypot. Classic, but I don't get why they're not in std.math. Goddamnit, we're in 2016.This is the most recent one where John Colvin provided some pointers to speed it up significantly. Walter has done some good work taking the low-hanging fruit to speed up DMD code and there is a lot of effort going on with reference counting machinery but I wondered if some of the common errors people make that slow down D code can be addressed?Something that annoyed me a bit is floating-point comparisons, DMD does not seem to be able to handle them from SSE registers, it will convert to FPU and do the comparison there IIRC.
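For illustration, here is a sketch of what the truncating sibling of that routine might look like (hypothetical, not Phobos code; it assumes x86-64, where the first float argument arrives in XMM0, and that DMD's inline assembler accepts the mnemonic). CVTTSS2SI truncates toward zero regardless of the rounding mode, whereas CVTSS2SI above uses the current rounding mode.

    // Hypothetical truncating counterpart, x86-64 only.
    int trunc(float value)
    {
        asm
        {
            naked;
            cvttss2si EAX, XMM0;   // convert with truncation toward zero
            ret;                   // result is returned in EAX
        }
    }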
Feb 15 2016
On Monday, 15 February 2016 at 22:29:00 UTC, Basile B. wrote:we could get ceil/trunc/round/floor, also almost as easily fmod, hypoth. classic but I dont get why thery're not in std.math.Seems like you know a lot about the subject, and I know you contributed to phobos before, so how about making a PR for this :)
Feb 15 2016
On Monday, 15 February 2016 at 23:13:13 UTC, Jack Stouffer wrote:On Monday, 15 February 2016 at 22:29:00 UTC, Basile B. wrote:In the meantime: https://github.com/BBasile/iz/blob/master/import/iz/math.d Actually, when I participated in this conversation I didn't remember that it was not good on X86. Using SSE rounding is really only good on AMD64; otherwise loading the input parameter "sucks" a lot (even for a 32-bit float, since it's not directly in EAX or XMM0). Anyway, not good for Phobos. Why? While looking for documentation last night I landed on a post by Walter who explained that the library for a system programming language shouldn't be specific to an architecture.we could get ceil/trunc/round/floor, and almost as easily fmod and hypot. Classic, but I don't get why they're not in std.math.Seems like you know a lot about the subject, and I know you contributed to phobos before, so how about making a PR for this :)
Feb 17 2016
On Wednesday, 17 February 2016 at 18:26:47 UTC, Basile B. wrote:Anyway, not good for Phobos, why? When looking for documentation yesterday night I've landed on a post by Walter who explained that the library for a system programming language shouldn't be specific to an architecture.While I don't know about the post you're talking about, I don't think what Walter says applies to internal version blocks in a function. You could make it so that on AMD64, lround and friends are much faster by using those ASM routines. Also, std.math is already chock full of architecture-specific code.
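As a rough illustration of the version-block idea (a sketch only, not Phobos code; the function name and the fallback are made up), the fast inline-asm path can be compiled in only where x86-64 inline assembly is available, with a portable fallback elsewhere:

    int fastRound(float value)
    {
        version (D_InlineAsm_X86_64)
        {
            // First float argument arrives in XMM0; int result goes in EAX.
            asm
            {
                naked;
                cvtss2si EAX, XMM0;   // rounds using the current rounding mode
                ret;
            }
        }
        else
        {
            // Portable fallback for other targets.
            import std.math : lround;
            return cast(int) lround(value);
        }
    }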
Feb 17 2016
On Wednesday, 17 February 2016 at 18:50:45 UTC, Jack Stouffer wrote:On Wednesday, 17 February 2016 at 18:26:47 UTC, Basile B. wrote:That's more subtle than that. The oldest 64-bit processor (AMD64) always supports SSE. So when we do "cast(int) 0.1;" on X86_64, the backend always generates SSE instructions. The oldest 32-bit processor (X86) doesn't support SSE, maybe MMX (not sure). So when we do "cast(int) 0.1;" on X86, the backend always generates FPU instructions. This is how I understand the post I landed on. My current work always uses SSE, so it doesn't conform with the "at least available" feature.Anyway, not good for Phobos, why? When looking for documentation yesterday night I've landed on a post by Walter who explained that the library for a system programming language shouldn't be specific to an architecture.While I don't know about the post you're talking about, I don't think what Walter says applies to internal version blocks in a function. You could make it so that on AMD64, lround and friends are much faster by using those ASM routines. Also, std.math is already chock full of architecture-specific code.
Feb 17 2016
On Wednesday, 17 February 2016 at 19:01:38 UTC, Basile B. wrote:That's more subtle than that. The oldest 64-bit processor (AMD64) always supports SSE. So when we do "cast(int) 0.1;" on X86_64, the backend always generates SSE instructions. The oldest 32-bit processor (X86) doesn't support SSE, maybe MMX (not sure). So when we do "cast(int) 0.1;" on X86, the backend always generates FPU instructions. This is how I understand the post I landed on. My current work always uses SSE, so it doesn't conform with the "at least available" feature.Also, I forgot to say, but a uniform API is needed to set the rounding mode, whether SSE is used or the FPU...
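One stopgap already exists: core.stdc.fenv exposes the C99 rounding-mode interface, and on typical x86 C runtimes fesetround is expected to update both the x87 control word and MXCSR, though that is a property of the C library rather than something the D spec guarantees. A minimal sketch, assuming such a libc:

    import core.stdc.fenv : fegetround, fesetround, FE_DOWNWARD;
    import std.math : rint;
    import std.stdio : writeln;

    void main()
    {
        double x = 2.7;
        const old = fegetround();   // remember the current rounding mode
        fesetround(FE_DOWNWARD);    // round toward negative infinity
        writeln(rint(x));           // typically prints 2 instead of the default 3
        fesetround(old);            // restore the previous mode
    }

std.math.FloatingPointControl offers a similar knob on the D side, with the same open question of which of the two control registers it reaches on a given target.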
Feb 17 2016
I think it's important that DMD gets more of the easier optimisations. Most new users won't bother trying GDC or LDC, and if DMD doesn't generate fast enough code, they might leave before they try the compilers with better optimisations.
Feb 21 2016
Am Wed, 17 Feb 2016 19:55:08 +0000 schrieb Basile B. <b2.temp gmx.com>:Also, forgot to say, but an uniform API is needed to set the rounding mode, whether SSE is used or the FPU...At least GCC has a codegen switch for that. A solution would have to either set both rounding modes at once or the compilers would need to expose version MathFPU/MathSSE. -- Marco
Mar 07 2016
On Wednesday, 17 February 2016 at 19:01:38 UTC, Basile B. wrote:The oldest 32-bit processor (X86) doesn't support SSE, maybe MMX (not sure). So when we do "cast(int) 0.1;" on X86, the backend always generates FPU instructions.SSE goes back to Pentium III, doesn't it? And the Pentium 4 supported SSE3, didn't it? Is running on a Pentium II, for example, actually part of D's specification?
Feb 21 2016
On Monday, 15 February 2016 at 22:29:00 UTC, Basile B. wrote:Same for std.math.lround: they use the FPU way, while for float and double it's only one SSE instruction. Typically, with 6 functions similar to this one: int round(float value) { asm { naked; cvtss2si EAX, XMM0; ret; } } we could get ceil/trunc/round/floor, and almost as easily fmod and hypot. Classic, but I don't get why they're not in std.math. Goddamnit, we're in 2016.lround and friends have been a big performance problem at times. Every time you can use cast(int) instead, it's way faster.
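One caveat worth spelling out (a small illustration, not from the original post): cast(int) truncates toward zero, while lround rounds to the nearest integer, so the substitution is only valid where that difference doesn't matter.

    import std.math : lround;
    import std.stdio : writeln;

    void main()
    {
        double x = 2.7;
        writeln(cast(int) x);     // 2  (truncation toward zero)
        writeln(lround(x));       // 3  (rounding to nearest)
        writeln(cast(int) -2.7);  // -2 (still truncation, not flooring)
        writeln(lround(-2.7));    // -3
    }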
Feb 15 2016
On Monday, 15 February 2016 at 23:19:44 UTC, Guillaume Piolat wrote:lround and friends have been a big performance problem at times. Every time you can use cast(int) instead, it's way faster.I didn't know this trick. It generates almost the same SSE instruction (it truncates) and has the advantage of being inline-able. Is it documented somewhere? If not, it should be.
Feb 15 2016
On Monday, 15 February 2016 at 23:35:54 UTC, Basile B. wrote:On Monday, 15 February 2016 at 23:19:44 UTC, Guillaume Piolat wrote:In SSE3 you also get an instruction that does this without messing with the x87 control word: FISTTP.lround and friends have been a big performance problem at times. Everytime you can use cast(int) instead, it's way faster.I didn't know this trick. It generates almost the same sse intruction (it truncates) and has the advantage to be inline-able. Is it documented somewhere ? If not it should.
Feb 16 2016
On Monday, 15 February 2016 at 13:51:38 UTC, ixid wrote:Every time there is a D thread on reddit it feels like the new user is expecting mind-blowing speed from D. [...]If you want better codegen, don't use dmd. Use ldc; it's usually only a version-ish behind dmd.
Feb 15 2016
On Monday, 15 February 2016 at 13:51:38 UTC, ixid wrote:Every time there is a D thread on reddit it feels like the new user is expecting mind-blowing speed from D. https://www.reddit.com/r/programming/comments/45v03g/porterstemmerd_an_implementation_of_the_porter/ This is the most recent one, where John Colvin provided some pointers to speed it up significantly. Walter has done some good work taking the low-hanging fruit to speed up DMD code, and there is a lot of effort going on with reference-counting machinery, but I wondered whether some of the common errors people make that slow down D code can be addressed. Literals used to be a hidden speed bump, but I think that was improved; now the append operator is one of the most common culprits. Can this not be enhanced behind the scenes to perform more like Appender? Do others notice common pitfalls between the article code and what the D community then suggests, so that we can bridge the gap and naive users get faster code?Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy, this is exactly what I am talking about: this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.
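For anyone who wants to reproduce the comparison, a quick-and-dirty timing sketch (the array size and fill value are arbitrary; build with optimisations turned on for meaningful numbers):

    import std.algorithm.iteration : sum;
    import std.stdio : writeln;
    import core.time : MonoTime;

    void main()
    {
        auto data = new double[10_000_000];
        data[] = 1.0001;

        auto t0 = MonoTime.currTime;
        double naive = 0;
        foreach (x; data)
            naive += x;                // plain accumulator loop
        auto t1 = MonoTime.currTime;

        double lib = data.sum;         // pairwise summation for random-access FP ranges
        auto t2 = MonoTime.currTime;

        writeln("naive loop: ", t1 - t0, "   sum(): ", t2 - t1);
        writeln(naive, " vs ", lib);   // the totals may differ in the last digits
    }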
Mar 08 2016
On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about: this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.There was a longer discussion here: https://forum.dlang.org/post/vkiwojmfjrwhigbkenaa@forum.dlang.org
Mar 08 2016
On 03/08/2016 09:14 AM, ixid wrote:Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about- this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.Whoa. What's happening there? Do we have anyone on it? -- Andrei
Mar 09 2016
On 9.3.2016 at 14:26, Andrei Alexandrescu via Digitalmars-d wrote:On 03/08/2016 09:14 AM, ixid wrote:I guess he's speaking about this one: http://forum.dlang.org/post/mailman.4748.1456070484.22025.digitalmars-d-learn@puremagic.com Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about: this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.Whoa. What's happening there? Do we have anyone on it? -- Andrei
Mar 09 2016
On Wednesday, 9 March 2016 at 13:26:45 UTC, Andrei Alexandrescu wrote:On 03/08/2016 09:14 AM, ixid wrote:They just don't do the same thing: sum() uses pairwise summation, which is safer as I understand it. Corresponding issue: https://issues.dlang.org/show_bug.cgi?id=15722Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about: this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.Whoa. What's happening there? Do we have anyone on it? -- Andrei
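For readers unfamiliar with the term, this is roughly what pairwise summation does (a sketch of the idea only, not Phobos's actual implementation): split the input in half recursively and add the partial sums, which keeps the accumulated rounding error growing roughly with log n rather than with n as in a left-to-right loop.

    // Minimal illustration of pairwise summation over a slice.
    double pairwiseSum(const(double)[] a)
    {
        if (a.length <= 16)            // small base case: plain loop
        {
            double s = 0;
            foreach (x; a)
                s += x;
            return s;
        }
        const mid = a.length / 2;      // split in half and recurse
        return pairwiseSum(a[0 .. mid]) + pairwiseSum(a[mid .. $]);
    }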
Mar 09 2016
On Wednesday, 9 March 2016 at 13:42:40 UTC, cym13 wrote:They just don't do the same thing, sum() uses pairwise summation which is safer as I understand it. Corresponding issue: https://issues.dlang.org/show_bug.cgi?id=15722That third comment about how it's not obvious which algorithm sum is using is a good one.
Mar 09 2016
On Wednesday, 9 March 2016 at 13:26:45 UTC, Andrei Alexandrescu wrote:On 03/08/2016 09:14 AM, ixid wrote:Ilya has long term plans for this, but I have a short-term fix that will buy us a very large performance improvement here (if my old benchmarks were correct). Give me a few mins to prep the pull request :)Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about- this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.Whoa. What's happening there? Do we have anyone on it? -- Andrei
Mar 09 2016
On 3/9/16 9:03 AM, John Colvin wrote:On Wednesday, 9 March 2016 at 13:26:45 UTC, Andrei Alexandrescu wrote:Thanks much! -- AndreiOn 03/08/2016 09:14 AM, ixid wrote:Ilya has long term plans for this, but I have a short-term fix that will buy us a very large performance improvement here (if my old benchmarks were correct). Give me a few mins to prep the pull request :)Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about- this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.Whoa. What's happening there? Do we have anyone on it? -- Andrei
Mar 09 2016
On Wednesday, 9 March 2016 at 14:04:40 UTC, Andrei Alexandrescu wrote:On 3/9/16 9:03 AM, John Colvin wrote:https://github.com/D-Programming-Language/phobos/pull/4069On Wednesday, 9 March 2016 at 13:26:45 UTC, Andrei Alexandrescu wrote:Thanks much! -- AndreiOn 03/08/2016 09:14 AM, ixid wrote:Ilya has long term plans for this, but I have a short-term fix that will buy us a very large performance improvement here (if my old benchmarks were correct). Give me a few mins to prep the pull request :)[...]Whoa. What's happening there? Do we have anyone on it? -- Andrei
Mar 09 2016
On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about: this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.

I've seen a few cases while exploring D. Not all fully researched (apologies for that), but since there appears to be interest in identification I'll list them.

* Lower-casing strings (likely upper-casing too), and some character type checks. Routines like toLower and asLowerCase call functions that work for all Unicode characters. I was able to create much faster versions by checking if the character was ASCII, then calling either the ASCII version or the more general version (a sketch of this check follows below). The same is true for a few routines like isNumber. Some have the ASCII check optimization built in, but not all. If this optimization is added, it might also be useful to add a few common combinations (or a generic solution, if that's feasible). For example, to check if a character is alpha-numeric, one currently ORs two tests from the standard library, isAlpha and isNumber. Putting in an ASCII optimization check requires putting it before doing the OR, rather than inside the tests being ORed.

* Large associative arrays. When associative arrays get beyond about 10 million entries, performance starts to decline. I believe this is due to resizing the arrays. It's worse with strings as keys than with integers as keys. Having a way to reserve capacity may help under some circumstances.

* Associative arrays - Converting keys to immutable versions for lookup. Associative arrays want immutable values as keys. As far as I can tell, immutable values are also required when performing a lookup, even if a new entry won't be stored. A couple of apps I've written walk through large lists of text values, naturally available as char[] because they are read from input streams. To test presence in an associative array, it's necessary to copy them to immutable strings first. I haven't fully researched this one, but my experience is that copying from char[] to string becomes a meaningful cost. On the surface, this appears to be an optimization opportunity: create the immutable strings only when actually storing a new value.

--Jon
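A minimal sketch of the ASCII-check idea from the first bullet above (illustrative only; the function name is made up, and this is not how Phobos implements it):

    import std.uni : uniToLower = toLower;

    // Cheap path for ASCII; fall back to full Unicode handling otherwise.
    dchar fastToLower(dchar c)
    {
        if (c < 0x80)
            return (c >= 'A' && c <= 'Z') ? cast(dchar)(c - 'A' + 'a') : c;
        return uniToLower(c);
    }

    unittest
    {
        assert(fastToLower('D') == 'd');
        assert(fastToLower('7') == '7');   // non-letters pass through
        assert(fastToLower('É') == 'é');   // non-ASCII goes through std.uni
    }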
Mar 09 2016
On Wed, Mar 09, 2016 at 08:30:10PM +0000, Jon D via Digitalmars-d wrote:On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about: this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.

In the case of std.algorithm.sum, the focus is on accuracy rather than performance. It does some extra work to ensure maximum accuracy in the result, so it shouldn't be expected to have top performance. Granted, though, the docs could be clearer about this accuracy vs. performance tradeoff. Please file a bug on this (or better yet, submit a PR for it). In any case, 4 times slower sounds a bit excessive... it would be good to investigate why this is happening and fix it.

I've seen a few cases while exploring D. Not all fully researched (apologies for that), but since there appears to be interest in identification I'll list them.

One of the big performance killers that people are likely to run into is autodecoding for strings in all range-based algorithms. Walter has repeatedly complained about this, but so far Andrei (and some others) have resisted getting rid of autodecoding. Fortunately, you can (mostly) suppress this by wrapping your strings in std.encoding.codeUnits before operating on them. Nevertheless, it's one of those hidden gotchas that could result in poorer performance than what you could get otherwise.

Another big performance killer that I repeatedly run into is the overly-eager GC collection schedule. Some may argue against the use of the GC, period, but I have found that even with the GC, it's possible to get 30-40% speedups just by calling GC.disable() and manually triggering GC.collect() at a lower frequency than the default. This is especially true when your program performs many allocations of small objects, like (small) strings. I have observed this in at least 2-3 different memory-intensive programs of mine, including a recent experiment in writing a faster CSV parser than std.csv. Arguably, we should fix this in druntime itself so that collection cycles are run less often, though I'm not sure what implications this may have on low-memory systems. Sometimes, you don't even need to do this for the entire program; you can get big performance boosts just by wrapping GC.disable() / GC.enable() around strategic sections of allocation-intensive code.

* Lower-casing strings (likely upper-casing too), and some character type checks. Routines like toLower and asLowerCase call functions that work for all Unicode characters. I was able to create much faster versions by checking if the character was ASCII, then calling either the ASCII version or the more general version. The same is true for a few routines like isNumber. Some have the ASCII check optimization built in, but not all. If this optimization is added, it might also be useful to add a few common combinations (or a generic solution, if that's feasible). For example, to check if a character is alpha-numeric, one currently ORs two tests from the standard library, isAlpha and isNumber. Putting in an ASCII optimization check requires putting it before doing the OR, rather than inside the tests being ORed.

These sound like valuable additions to Phobos. Submit a PR, please? [...]

* Associative arrays - Converting keys to immutable versions for lookup. Associative arrays want immutable values as keys. As far as I can tell, immutable values are also required when performing a lookup, even if a new entry won't be stored. A couple of apps I've written walk through large lists of text values, naturally available as char[] because they are read from input streams. To test presence in an associative array, it's necessary to copy them to immutable strings first. I haven't fully researched this one, but my experience is that copying from char[] to string becomes a meaningful cost. On the surface, this appears to be an optimization opportunity: create the immutable strings only when actually storing a new value.[...]

This is arguably a bug. If you're only looking up a key, you should be able to pass in const(char)[] rather than immutable(char)[] (aka string). The GC performance problem I mentioned above is especially noticeable when there are large numbers of small allocations, as would be the case if you constantly have to call .idup just because AA lookups demand immutable keys. Please file an issue for this, if there isn't one already filed.

T -- Windows 95 was a joke, and Windows 98 was the punchline.
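A minimal sketch of the GC.disable()/GC.collect() batching pattern described above (the workload and the collection interval are arbitrary placeholders):

    import core.memory : GC;
    import std.conv : to;

    void main()
    {
        GC.disable();                   // no automatic collection cycles
        scope(exit) GC.enable();        // restore the default behaviour on exit

        string[] kept;
        foreach (i; 0 .. 1_000_000)
        {
            kept ~= i.to!string;        // many small allocations
            if (i % 250_000 == 0)
                GC.collect();           // collect on our own, coarser schedule
        }
    }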
Mar 09 2016
On Wednesday, 9 March 2016 at 21:01:13 UTC, H. S. Teoh wrote:On Wed, Mar 09, 2016 at 08:30:10PM +0000, Jon D via Digitalmars-d wrote:I think you'd be good at reviewing this: https://github.com/D-Programming-Language/phobos/pull/4069On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:In the case of std.algorithm.sum, the focus is on accuracy rather than performance. It does some extra work to ensure maximum accuracy in the result, so it shouldn't be expected to have top performance. Granted, though, the docs could be clearer about this accuracy vs. performance tradeoff. Please file a bug on this (or better yet, submit a PR for it). In any case, 4 times slower sounds a bit excessive... it would be good to investigate why this is happening and fix it.[...]
Mar 09 2016
On Wednesday, 9 March 2016 at 20:30:10 UTC, Jon D wrote:I've seen a few cases while exploring D.Turns out there are issues filed for each of the performance issues I mentioned:

* Lower-casing strings: https://issues.dlang.org/show_bug.cgi?id=11229
* Large associative arrays: https://issues.dlang.org/show_bug.cgi?id=2504
* Associative arrays - Checking membership with mutable values (char arrays) rather than strings (immutable): https://issues.dlang.org/show_bug.cgi?id=15038
Mar 09 2016