www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 2331] New: Enum hashes many times slower than normal hashes

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2331

           Summary: Enum hashes many times slower than normal hashes
           Product: D
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: andrei metalanguage.com


This is quite incredible. Consider this map of stopwords for the English
language:

bool[string] stopWords;

static this() {
    stopWords = [
    "a"[]:1, "b":1, "c":1, "d":1, "e":1, "f":1, "g":1, "h":1, "i":1, "j":1,
"k":1, "l":1, "m":1, "n":1, "o":1, "p":1, "q":1, "r":1, "s":1, "t":1, "u":1,
"v":1, "w":1, "x":1, "y":1, "z":1,
    "the":1, "a":1,
    "about":1, "above":1, "across":1, "afterwards":1, "after":1, "again":1,
"against":1, "ago":1, "almost":1, "along":1,
    "already":1, "always":1, "among":1, "anywhere":1, "around":1, "as":1,
"at":1, "away":1,
    "back":1, "before":1, "behind":1, "beside":1, "between":1, "beyond":1,
"by":1,
    "down":1, "downstairs":1, "during":1,
    "else":1, "enough":1, "every":1, "everywhere":1,
    "far":1, "for":1, "from":1,
    "here":1,
    "in":1, "inside":1, "into":1,
    "just":1,
    "last":1, "lot":1, "lots":1,
    "many":1, "middle":1, "much":1,
    "near":1, "next":1, "never":1, "not":1, "now":1, "nowhere":1,
    "of":1, "off":1, "often":1, "on":1, "once":1, "over":1, "outside":1,
"over":1,
    "quite":1,
    "rather":1, "recently":1, "rarely":1, "round":1,
    "seldom":1, "sometimes":1, "somewhere":1,
    "there":1, "through":1, "till":1, "today":1, "to":1, "tomorrow":1,
"tonight":1, "too":1, "towards":1,
    "until":1, "up":1, "upstairs":1, "usually":1, "under":1,
    "very":1,
    "while":1, "with":1, "within":1, "without":1,
    "yes":1, "yesterday":1, "yet":1,
    "you":1, "he":1, "she":1, "it":1, "we":1, "they":1,
    "me":1, "him":1, "her":1, "it":1, "us":1, "them":1,
    "myself":1, "yourself":1, "himself":1, "herself":1, "itself":1,
"ourselves":1, "yourselves":1,
    "themselves":1, "oneself":1,
    "my":1, "your":1, "its":1, "our":1, "their":1,
    "mine":1, "yours":1, "hers":1, "ours":1, "theirs":1,
    "this":1, "these":1, "those":1,
    "some":1, "any":1, "no":1, "none":1,
    "other":1, "another":1, "every":1, "all":1, "others":1, "each":1,
"whole":1, "both":1,
    "neither":1, "none":1,
    "someone":1, "somebody":1, "something":1,
    "anyoneanybodyanything":1,
    "nobody":1, "nothing":1,
    "everyone":1, "everybody":1, "everything":1,
    "and":1, "or":1, "but":1, "because":1, "if":1,
    "as":1, "like":1, "such":1,
    "how":1, "who":1, "why":1, "what":1, "where":1, "whose":1, "when":1,
"whom":1, "which":1,
"bye":1, "hello":1,
    "be":1, "was":1, "been":1, "am":1, "is":1, "are":1, "were":1,
    "can":1, "could":1,
    "come":1, "came":1, "comes":1,
    "do":1, "did":1, "done":1, "does":1,
    "get":1, "got":1, "gets":1,
    "have":1, "had":1, "has":1, "having":1,
    "may":1, "might":1,
    "must":1,
    "shall":1, "should":1,
    "ought":1,
    "take":1, "took":1, "taken":1, "takes":1, "taking":1,
    "use":1, "uses":1, "used":1,
    "will":1, "would":1,
    "aren't":1,
    "cannot":1,
    "can't":1,
    "couldn't":1,
    "didn't":1,
    "doesn't":1,
    "don't":1,
    "wasn't":1,
    "wouldn't":1,
    "hadn't":1,
    "isn't":1,
    "one":1, "two":1, "three":1, "four":1, "five":1, "six":1, "seven":1,
    "eight":1, "nine":1, "ten":1, "nought":1, "zero":1,
    "first":1, "second":1, "third":1, "fourth":1, "fifth":1, "sixth":1,
"seventh":1,
    "eighth":1, "ninth":1, "tenth":1,
    "ii":1, "iii":1, "iv":1, "vi":1, "vii":1, "viii":1, "ix":1,
    "sunday":1, "monday":1, "tuesday":1, "wednesday":1, "thursday":1,
"friday":1, "saturday":1,
    "january":1, "february":1, "march":1, "april":1, "may":1, "june":1,
    "julyaugustseptember":1, "october":1, "november":1, "december":1,
    "date":1, "false":1, "e.geg":1, "i.e":1, "ie":1, "etc":1, "example":1,
"examples":1,
    "jrmiss":1, "thing":1, "things":1, "true":1, "year":1,
    "badbig":1, "close":1, "difficult":1, "easy":1, "empty":1, "full":1,
"good":1,
    "little":1, "long":1, "ready":1, "open":1, "short":1, "tall":1
];
}

The map can be used in help for parsing English text. Now say we make this
change:

enum bool[string] stopWords = [ ... same contents ... ];

Amazingly enough, looking up this new map (with "word in stopWords") is many
many times slower than looking up the old map. What's happening?


-- 
Sep 02 2008
next sibling parent "Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
Content-Disposition: inline

On Tue, Sep 2, 2008 at 11:18 PM, <d-bugmail puremagic.com> wrote:

 http://d.puremagic.com/issues/show_bug.cgi?id=2331

           Summary: Enum hashes many times slower than normal hashes
           Product: D
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: andrei metalanguage.com


 This is quite incredible. Consider this map of stopwords for the English
 language:

 bool[string] stopWords;

 static this() {
    stopWords = [
    "a"[]:1, "b":1, "c":1, "d":1, "e":1, "f":1, "g":1, "h":1, "i":1, "j":1,
 "k":1, "l":1, "m":1, "n":1, "o":1, "p":1, "q":1, "r":1, "s":1, "t":1,
 "u":1,
 "v":1, "w":1, "x":1, "y":1, "z":1,
    "the":1, "a":1,
    "about":1, "above":1, "across":1, "afterwards":1, "after":1, "again":1,
 "against":1, "ago":1, "almost":1, "along":1,
    "already":1, "always":1, "among":1, "anywhere":1, "around":1, "as":1,
 "at":1, "away":1,
    "back":1, "before":1, "behind":1, "beside":1, "between":1, "beyond":1,
 "by":1,
    "down":1, "downstairs":1, "during":1,
    "else":1, "enough":1, "every":1, "everywhere":1,
    "far":1, "for":1, "from":1,
    "here":1,
    "in":1, "inside":1, "into":1,
    "just":1,
    "last":1, "lot":1, "lots":1,
    "many":1, "middle":1, "much":1,
    "near":1, "next":1, "never":1, "not":1, "now":1, "nowhere":1,
    "of":1, "off":1, "often":1, "on":1, "once":1, "over":1, "outside":1,
 "over":1,
    "quite":1,
    "rather":1, "recently":1, "rarely":1, "round":1,
    "seldom":1, "sometimes":1, "somewhere":1,
    "there":1, "through":1, "till":1, "today":1, "to":1, "tomorrow":1,
 "tonight":1, "too":1, "towards":1,
    "until":1, "up":1, "upstairs":1, "usually":1, "under":1,
    "very":1,
    "while":1, "with":1, "within":1, "without":1,
    "yes":1, "yesterday":1, "yet":1,
    "you":1, "he":1, "she":1, "it":1, "we":1, "they":1,
    "me":1, "him":1, "her":1, "it":1, "us":1, "them":1,
    "myself":1, "yourself":1, "himself":1, "herself":1, "itself":1,
 "ourselves":1, "yourselves":1,
    "themselves":1, "oneself":1,
    "my":1, "your":1, "its":1, "our":1, "their":1,
    "mine":1, "yours":1, "hers":1, "ours":1, "theirs":1,
    "this":1, "these":1, "those":1,
    "some":1, "any":1, "no":1, "none":1,
    "other":1, "another":1, "every":1, "all":1, "others":1, "each":1,
 "whole":1, "both":1,
    "neither":1, "none":1,
    "someone":1, "somebody":1, "something":1,
    "anyoneanybodyanything":1,
    "nobody":1, "nothing":1,
    "everyone":1, "everybody":1, "everything":1,
    "and":1, "or":1, "but":1, "because":1, "if":1,
    "as":1, "like":1, "such":1,
    "how":1, "who":1, "why":1, "what":1, "where":1, "whose":1, "when":1,
 "whom":1, "which":1,
 "bye":1, "hello":1,
    "be":1, "was":1, "been":1, "am":1, "is":1, "are":1, "were":1,
    "can":1, "could":1,
    "come":1, "came":1, "comes":1,
    "do":1, "did":1, "done":1, "does":1,
    "get":1, "got":1, "gets":1,
    "have":1, "had":1, "has":1, "having":1,
    "may":1, "might":1,
    "must":1,
    "shall":1, "should":1,
    "ought":1,
    "take":1, "took":1, "taken":1, "takes":1, "taking":1,
    "use":1, "uses":1, "used":1,
    "will":1, "would":1,
    "aren't":1,
    "cannot":1,
    "can't":1,
    "couldn't":1,
    "didn't":1,
    "doesn't":1,
    "don't":1,
    "wasn't":1,
    "wouldn't":1,
    "hadn't":1,
    "isn't":1,
    "one":1, "two":1, "three":1, "four":1, "five":1, "six":1, "seven":1,
    "eight":1, "nine":1, "ten":1, "nought":1, "zero":1,
    "first":1, "second":1, "third":1, "fourth":1, "fifth":1, "sixth":1,
 "seventh":1,
    "eighth":1, "ninth":1, "tenth":1,
    "ii":1, "iii":1, "iv":1, "vi":1, "vii":1, "viii":1, "ix":1,
    "sunday":1, "monday":1, "tuesday":1, "wednesday":1, "thursday":1,
 "friday":1, "saturday":1,
    "january":1, "february":1, "march":1, "april":1, "may":1, "june":1,
    "julyaugustseptember":1, "october":1, "november":1, "december":1,
    "date":1, "false":1, "e.geg":1, "i.e":1, "ie":1, "etc":1, "example":1,
 "examples":1,
    "jrmiss":1, "thing":1, "things":1, "true":1, "year":1,
    "badbig":1, "close":1, "difficult":1, "easy":1, "empty":1, "full":1,
 "good":1,
    "little":1, "long":1, "ready":1, "open":1, "short":1, "tall":1
 ];
 }

 The map can be used in help for parsing English text. Now say we make this
 change:

 enum bool[string] stopWords = [ ... same contents ... ];

 Amazingly enough, looking up this new map (with "word in stopWords") is
 many
 many times slower than looking up the old map. What's happening?


 --
It may (and this is a big "may", you'd have to look at the disassembly to know) be related to http://d.puremagic.com/issues/show_bug.cgi?id=2237. In short, I had a constant array (D1, same as an enum array in D2) which, for some reason, DMD was re-constructing the array from scratch upon _every access to it_ instead of putting it in the static data segment. It's possible that the compiler is making the same stupid mistake here.
Sep 02 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2331






What's happening is that the static this() constructor builds the hash table
once. The enum version builds it every time it is used, as the enum name is
replaced with its initializer.


-- 
Oct 11 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2331






I believe enum should be smarter than that. If not, it is no different than
C-style define macro.


-- 
Oct 11 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2331







 What's happening is that the static this() constructor builds the hash table
 once. The enum version builds it every time it is used, as the enum name is
 replaced with its initializer.
 
I hope this is an explanation of the bug's cause, not a justification that this is not a bug. :o) --
Oct 11 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2331






Sounds eerily similar to
http://d.puremagic.com/issues/show_bug.cgi?id=2237

Coincidence?


-- 
Oct 11 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2331


Mitch Hayenga <mitch.hayenga gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mitch.hayenga gmail.com



PDT ---
I recently hit this performance issue myself while trying to use a lookup
table, rather than branching on logic for a function.  It can be avoided by
declaring the field as invariant, but I had originally used Enum as thats one
of the ways suggested by TDPL for doing CTFE.


pseudocode:

bool[256] generate_lookup_table(); // function declaration

// Performance = terrible here
enum lookup_as_enum = generate_lookup_table();

// Performance = great here
invariant lookup_as_enum = generate_lookup_table();

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 22 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2331


nfxjfg gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfxjfg gmail.com




 What's happening is that the static this() constructor builds the hash table
 once. The enum version builds it every time it is used, as the enum name is
 replaced with its initializer.
That's quite hilarious. There are now half a dozen of bugs related to dmd being stupid with static data construction. E.g. see bug 4397, bug 2526, bug 2356, bug 4881. They possibly are all caused by the same underlying issue. Walter, don't you think your users are finally annoyed enough so that you could look into fixing it? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 22 2010
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2331


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc




 Walter, don't you think your users are finally annoyed enough
 so that you could look into fixing it?
Be more gentle, please. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 22 2010