digitalmars.D - Refining AA's
- Manfred Nowak (6/6) Nov 30 2005 As walter stated, the current implementation is good for general
- Chris (6/12) Nov 30 2005 I believe he said his hashtable was good enough for general use, but
- Sean Kelly (7/11) Nov 30 2005 I would say it's insertion and lookup of datasets of reaonable size
- pragma (13/23) Nov 30 2005 Sean,
- Sean Kelly (5/21) Nov 30 2005 Yeah, the King James Bible from Project Gutenberg is my standard test
- Manfred Nowak (7/9) Dec 02 2005 [...]
- Sean Kelly (9/19) Dec 02 2005 I was cleaning up some directories about a week ago and lost my AA test
- Dave (8/18) Nov 30 2005 That's a good one as a 'stress test', IMO.
- Manfred Nowak (5/7) Dec 02 2005 I do not understand, why randomness is not suitable for general
- Sean Kelly (9/17) Dec 02 2005 I wanted to see results given a real-world example. Using a massive
- Manfred Nowak (9/12) Dec 02 2005 Word counts do not realize anything about a language, except the word
- Sean Kelly (4/18) Dec 02 2005 Only because it's an example of typical use, so it seems a reasonable
- Manfred Nowak (5/11) Dec 03 2005 What other examples of typical usage do you know and how should the
- Sean Kelly (5/16) Dec 03 2005 I suppose I should have qualified that. It's an example of how *I*
- Manfred Nowak (12/13) Dec 03 2005 [...]
- Dave (100/106) Nov 30 2005 Attached are the "Shootout Benchmark" tests involving AA's, both past an...
- Manfred Nowak (40/41) Dec 02 2005 Preliminary results after some tuning of the parameters of my
- Sean Kelly (7/12) Dec 02 2005 My unordered_set implementation doesn't support rehashing yet--I wrote
- Manfred Nowak (110/113) Dec 02 2005 I dont understand you fully. If you mean to give your implemetation
- Sean Kelly (24/33) Dec 03 2005 That's what I was asking for :-) I'll try to produce a C++ equivalent
- Manfred Nowak (6/7) Dec 03 2005 [...]
- Sean Kelly (6/14) Dec 03 2005 I just tried reversing the tests so the std::map test is performed
- Dave (5/46) Dec 02 2005 What D implementation (DMD or GDC) and version?
- Manfred Nowak (12/13) Dec 02 2005 Using DMD 0.140 under WinXP32.
- Dave (6/19) Dec 02 2005 Cool - is this something you've actually plugged into internal/aaA.d alr...
- Manfred Nowak (25/27) Dec 04 2005 Some more results for using variable sized keys:
- Manfred Nowak (25/27) Dec 04 2005 After fixing some bugs the gain table looks like this:
- Sean Kelly (18/27) Dec 04 2005 Language-level functionality should provide the basic or weak exception
- Manfred Nowak (16/20) Dec 04 2005 I fear that your answer passes the real problem without touching it.
- Sean Kelly (8/16) Dec 04 2005 I've noticed this behavior also. In fact, I've gone close to the limits...
- Dave (17/44) Dec 04 2005 Fixed key length like the 1st results, or variable like the second? If v...
- Manfred Nowak (31/43) Dec 04 2005 Variable length keys.
- Dave (12/18) Dec 04 2005 After reading:
- Manfred Nowak (15/18) Dec 04 2005 After delivering a chunk of memory to the application the GC isnt
- Sean Kelly (5/8) Dec 05 2005 Not strictly true. Some of the fancier GCs use data from the virtual
- Dave (35/53) Dec 05 2005 Ok - I gotcha now.. The earlier term 'error' threw me a little bit, but
- Manfred Nowak (28/29) Dec 07 2005 [...]
- Oskar Linde (4/29) Dec 08 2005 Can't you just make the algorithm change at a certain size?
- Manfred Nowak (5/6) Dec 08 2005 Maybe, but how to decide that point for the general case? Which
- Oskar Linde (27/32) Dec 08 2005 Hi,
As Walter stated, the current implementation is good for general usage. But what are the parameters for general usage? I ask because I just implemented a faster version according to the parameters I suggested. -manfred
Nov 30 2005
On Wed, 30 Nov 2005 08:02:18 +0000 (UTC), Manfred Nowak <svv1999 hotmail.com> wrote:

As Walter stated, the current implementation is good for general usage. But what are the parameters for general usage? [...]

I believe he said his hashtable was good enough for general use, but it also scales up better than most. I don't recall for sure though, and I haven't searched.

Chris
Nov 30 2005
Manfred Nowak wrote:

As Walter stated, the current implementation is good for general usage. But what are the parameters for general usage?

I would say it's insertion and lookup of datasets of reasonable size (less than perhaps 50MB as a shot in the dark). For example, I typically use a word occurrence count for AA performance testing, and my "large" dataset is the full text of the bible (as it's the largest non-random text file I could find).

Sean
Nov 30 2005
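For reference, the word-count style of test Sean describes can be written against the built-in AA in a few lines of D. This is only an illustrative sketch (the text file is passed as the first argument), not his actual benchmark code:

import std.file;    // read()
import std.string;  // split()
import std.stdio;   // writefln()

void main(char[][] args)
{
    // load the whole text file named on the command line
    char[] text = cast(char[]) std.file.read(args[1]);

    uint[char[]] freq;                      // built-in AA: word -> count
    foreach (char[] word; std.string.split(text))
    {
        uint* p = word in freq;             // `in` yields a pointer or null
        if (p)
            (*p)++;
        else
            freq[word] = 1;
    }

    writefln(freq.length, " unique words");
}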
In article <dmknqb$11aa$1 digitaldaemon.com>, Sean Kelly says...

[...] my "large" dataset is the full text of the bible (as it's the largest non-random text file I could find).

Sean,

On a whim, I tried looking for something bigger. As it turns out, "War and Peace" weighs in at 3.13MB (uncompressed ASCII) over at Project Gutenberg: http://www.gutenberg.org/etext/2600 But it loses out to the complete King James Bible (4.24 MB): http://www.gutenberg.org/etext/10

Alas, there's no way to search based on the size of an ebook. I also just learned that they support texts in different languages and encodings. They have an "online recoding service" that can change the ASCII into UTF-8 or some random ISO codepage and so forth. Provided their converter is accurate, it could make for some nice test data.

- EricAnderton at yahoo
Nov 30 2005
pragma wrote:

[...] Provided their converter is accurate, it could make for some nice test data.

Yeah, the King James Bible from Project Gutenberg is my standard test input :-) Interesting about the UTF conversion though. I hadn't realized they offered that feature.

Sean
Nov 30 2005
Sean Kelly wrote:

[...] the King James Bible from Project Gutenberg is my standard test input [...]

The standard wc example spends about 100 ms on the operations in the AA for the King James Bible. What are your criteria for performance, if this is indeed the largest possible test data?

-manfred
Dec 02 2005
Manfred Nowak wrote:

[...] What are your criteria for performance, if this is indeed the largest possible test data?

I was cleaning up some directories about a week ago and lost my AA test in the process. I'll rewrite something when I find some free time. But the metrics I was comparing were memory usage, cache misses, and test execution time, with execution time obviously being the most important factor. I didn't make any attempt to reduce or eliminate spurious memory allocations, though when I re-do the tests I'm going to use a Slice class I wrote in C++ instead of std::string.

Sean
Dec 02 2005
"Sean Kelly" <sean f4.ca> wrote in message news:dmknqb$11aa$1 digitaldaemon.com...Manfred Nowak wrote:That's a good one as a 'stress test', IMO. FWIW, the way I generally use them is at most 100 or so insertions followed by lots and lots of lookups and/or value updates. That would be a good "common case" test, maybe, and also stress the most important part - lookups (since obviously lookups are used for insertions and updates too) and leave the GC out of the results.As walter stated, the current implementation is good for general usage. But what are the parameters for general usage?I would say it's insertion and lookup of datasets of reaonable size (less than perhaps 50MB as a shot in the dark). For example, I typically use a word occurrence count for AA performance testing, and my "large" dataset is the full text of the bible (as it's the largest non-random text file I could find). Sean
Nov 30 2005
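A sketch of the "common case" Dave describes: a hundred or so insertions followed by many lookups and in-place updates, with no allocation in the timed loop so the GC stays out of the measurement. The key names and iteration counts here are made up for illustration:

import std.stdio;
import std.string;  // toString(int)
import std.random;  // rand()

void main()
{
    uint[char[]] table;
    char[][] keys;

    // roughly 100 insertions up front
    for (int i = 0; i < 100; i++)
    {
        char[] k = "key" ~ std.string.toString(i);
        table[k] = 0;
        keys ~= k;
    }

    // then lots and lots of lookups / in-place value updates;
    // nothing is allocated here, so the GC stays out of the timing
    for (int n = 0; n < 10_000_000; n++)
    {
        uint* p = keys[rand() % keys.length] in table;
        if (p)
            (*p)++;
    }

    writefln(table.length, " keys");
}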
Sean Kelly wrote:

[...] (as it's the largest non-random text file I could find)

I do not understand why randomness is not suitable for general usage.

-manfred
Dec 02 2005
Manfred Nowak wrote:

[...] I do not understand why randomness is not suitable for general usage.

I wanted to see results given a real-world example. Using a massive dataset of randomly generated gibberish, ie: "jkhfdsa jksvldijsd sajasd d asdlk wpejvccx..." seems like it would skew results slightly by reducing the chance of collisions. I suppose a fancier method might use a Markov generator to create arbitrarily large pseudo-English text to process, but I was only testing casually and didn't want to spend the time writing the code.

Sean
Dec 02 2005
Sean Kelly wrote:

[...] a fancier method might use a Markov generator to create arbitrarily large pseudo-English text to process

Word counts do not capture anything about a language except the word frequencies and the corresponding word lengths. You can get these from the known language institutes and build a test of the size you wish by randomizing over those frequencies and ...

But I still do not understand why word counting any example from a natural language could be a measure for the performance of an AA in the general case.

-manfred
Dec 02 2005
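Manfred's idea of building arbitrarily large test input from published word frequencies could be sketched in D as below. The vocabulary and the cumulative counts are tiny placeholders, not real corpus data:

import std.random;  // rand()
import std.stdio;

// Draw one word according to cumulative frequencies:
// cum[i] is the summed frequency of words[0 .. i+1].
char[] draw(char[][] words, uint[] cum)
{
    uint r = rand() % cum[cum.length - 1];
    for (int i = 0; i < cum.length; i++)
        if (r < cum[i])
            return words[i];
    return words[words.length - 1];
}

void main()
{
    char[][] words;
    uint[] cum;
    words ~= "the";  cum ~= 640;
    words ~= "and";  cum ~= 1150;
    words ~= "of";   cum ~= 1500;
    words ~= "to";   cum ~= 1590;
    words ~= "that"; cum ~= 1720;

    // count the generated words with the built-in AA, as in the wc test
    uint[char[]] count;
    for (int i = 0; i < 1_000_000; i++)
    {
        char[] w = draw(words, cum);
        uint* p = w in count;
        if (p) (*p)++; else count[w] = 1;
    }

    foreach (char[] w, uint c; count)
        writefln(w, ": ", c);
}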
Manfred Nowak wrote:

[...] But I still do not understand why word counting any example from a natural language could be a measure for the performance of an AA in the general case.

Only because it's an example of typical use, so it seems a reasonable metric to use.

Sean
Dec 02 2005
Sean Kelly wrote:

[...] Only because it's an example of typical use, so it seems a reasonable metric to use.

What other examples of typical usage do you know, and how should the measures taken for those examples be weighted?

-manfred
Dec 03 2005
Manfred Nowak wrote:

[...] What other examples of typical usage do you know, and how should the measures taken for those examples be weighted?

I suppose I should have qualified that. It's an example of how *I* typically use hashtables. For an informal test, I'd rather have one that was meaningful to me.

Sean
Dec 03 2005
Sean Kelly wrote:

[...] It's an example of how *I* typically use hashtables. [...]

Thx. You suggest a word count for a text taken from a natural language. Dave suggests some examples from the shootout. I suggest an abstract sizeable problem. Walter surely has some more examples. Then how should I build up a scenario from the examples individuals personally prefer, and how should I weight them, to arrive at performance differences most people agree upon?

-manfred
Dec 03 2005
"Manfred Nowak" <svv1999 hotmail.com> wrote in message news:Xns971E5BF112D92svv1999hotmailcom 63.105.9.61...As walter stated, the current implementation is good for general usage. But what are the parameters for general usage? I ask because I just implemented a faster version according to the parameters suggested by me. -manfredAttached are the "Shootout Benchmark" tests involving AA's, both past and present. They've been written to minimize GC thrashing so are a good test of just the AA implementation for common AA's, and generally emphasize different parts of the ABI so as to give a good indication of overall performance, I think. Plus, by looking at how D currently does compared to other languages on the Shootout site, we could extrapulate how your new implementation looks against the other languages too. Look over at: http://shootout.alioth.debian.org/ to find out how they are currently run and for input files. Please post your results too. (P.S.: v138 and v139 of phobos were apparently compiled without optimizations and/or with debug symbols, so you need to use v140 for a fair comparision). Thanks, - Dave begin 666 aa_test.zip M+QZ:V;QY\[Z93D=XSPG/CA3CT995P^3PHDS6J(SPEEO+MF$/0,Y<S:?3^BB% MJM"6\S"EM58FM"Z;>K]LN]TN7*NLH'W7^DYMDYUG>^?&F ]R3"G88HGU'DOU M07 JE&3:WE%6NJ YTC)%\(I FT(;0N"H(%43<E7+!MY(`G596<>H.0WETW9R M?#IMLDC:AE$J;8:;7+DXB1,HE]6^]]5RT'8-%IT6%F0RSG&/&1Z V.IA*\>S MQ,<<LZC;'5W,[4W2O 42G6+B/C_!ZFC<6C=LX[7$SR(I=PL8J>.QCQY^6D"X M`` `Y6Y^,_A3"'NP`0``/P,```\```!A85]T97-T+VAA<V R+F1MD4V+VS 0 MAN_^%4- P=XXBMU#H7'3'KJTET(/[2% 1%&BB:VM+1E93 E_>T=^2,Q;(2- MAYEGYGU'7C_#KQ+AFT7AX(NIF\ZAA>]"%YTH$'Z6QCC3N0``2N>:S7K=CBDF MPM=*T$PS$'6C*MR K"6L?L!*Z4IIA)7%"D6+4(JV?,=D\$P35=T8ZZ!UDM&K M3#R&5NDBH[)V4 NEPT,I;,YS#L(6;13\]4+ JQJV?8Y5J M7PB=(X3,(9U3H MF7-_`73^X*4E8)<P']T-\W'>251CG:(.1^*M>] /G1.(XE#V\&L\*GO1K%>. M;N"]Q9]=FE.9PW+;&\A?^7W#ZWR5LU4.CY4. MGJ?3!V Z1Z/QUUETG=6T=W -_ -02P,$% ```` `Y6Y^,],KMALG! ``E L` M`!4```!A85]T97-T+VMN=6-L96]T:61E+F255MUOVS80?_=?<?$P5(H=V>F& M`HN;#EE:9T73%E "["'S`RU1-F&)U$C*65;D?]\=*5F2K23K/5C2W>\^>%_T MY!ANUQRN-&<6+E5>E)9KN&9R5;(5AYNU4E:5= ``:VN+L\G$5*R(94+9=93P MI6 R4GHU&1#L(ELI+>PZA^ >GY9+$!(N?P A5M)JL40/"2P?X*-A+(8K53XX MO4LEMUR3S"I , %56)&+?Y&1* WO2>4]VW*89PQ]*M*),6"1\3-(\ 1.OL*) MDU$P_.^2RY [R/V:O!Z9B*LT"$/'\QDCJK+FJG,.!MVPY!H_` JB\DXDTH P M=5 [?F.I0AU1,L,.MXLAJA)^).*\<';OIE'T9C$>OKO]_8\/'X8MQS51DPI9 M\J[DL?/%L7XO>*;S0HR^O=?%H2,\`\K/X=6[5R$L,1^;0PPY\L C!,X0B!6B MJ>(!CF!18'N0_3!\*EK_]K K8LRRN,R8Y>TRIMK747#C<)^^-+*-K/JLQ0PP M"*MNW(0$M>^-C/ZDP.:-L>#T&=GKNK5JX:4JI0V&5U>WPSVU1G+QC.CV.;U; M>$ K\3L`M E;8E"4RTS$WKQ="Q-4/LS^B-6^:+Z:?MC YW36;H.M$ D<U(-< M/NI4"X(&Z" "YW-T[BRV)J$3%A;2]L?E8C&Z$TN$IDI\&+PZ<'7A/67X,_&A M3E5G*FA'MK\BB7IB&M8+XV!V_.:HRM8D>:[9*N?2?M_HU%K=VY (>^$8"CQ! 
M#YK M0F =>4EG\IQ&JEG.W5!6KV]Q`?K7T2B$3[6'A\#Q>BK3 I!!^I.!=]3<H?<" M<?N.&KK:FO7(G&"((S ]C$X N&UPAIRW(.F!NV;SU#IH%>BN]G4GHDB,-HO% M S=^2:PZ=F2_$*&*_O;Z8[7=KDH.D3(S[[V9R?P$?G0(7QT* B^V'T9"!TMA MVE&T"/>=M61'* " (QHNYG._A;C0RE+')6Z4,-RZ=E[LR:9IXAO1:GQ)U&]I MQW8WFY6U-<_H""60A05L7F AGA%NM [;5;T ])X`;*7,+N#F3):&8290XW" M(_ !M:X[K']R69R$M:H?K"/P)'GM*:3JJZ((ASR!, 3?KYZ6WVZOE]>W< GO MWG\,9(1[H4S)BE_Q9O!!N*H[X5;K=?R6HDIXAF#W;,;&A2T&ITS]W;VN4C[X MG\[96X/VC<'PWT_W!K<)PZ'_)4R_^3+:.$A9[22J*8_*I%,F-,)8O,TGIPB3 M/%)LOX]([_K( #HHR"&-SL!95;P6?P!02P,$% ```` `Y6Y^,]]('XE+` `` M' 4``!(```!A85]T97-T+W=O<F1F<F5Q+F2%5$UO&C$0O?,K1I$2UNQB2%6E M*INDZH?:2Z4>6O70%0>#O:R5Q=YZ34C:\-\[8V\H"ZIJ"6'[O7GS9A S&<&W M2L$GIX2']W;=;+QR\%F8U4:L%'RMK/5VXP<`4'G?S":3MKOBHM;65URJA1:& M5;"U3I9._>1R,$)%O6ZL\]!ZR9>M1YOK+!SPHVV.N/&P%MHD;/";]"E_ZV%9 M"5>\G+Z^FM/=8E.6RN4!QX BH/,`A44YVPA'"'J+8+ !H[81OKR:[[7 >!GT M-)U',E9/5M^$?0H)'<;DC\&,[OHJN]Y)U3AN:*QS\K\*C!]!\UW4F#C44DPY MCZ%S!.,,L5/7F("B&"2C\)VFIYS ) 41\I<;AHJ]?(TJ&L5S<$_BHR[W>!D M[J7PHC?N^+(.!_SP-TEA/^XX?DHLJZ0+(E86WL"]J/.N`4<32R *H7C;.#R4 M":;*SLY?23CGH_8LH] LQ![TCOP52*1&8/2^"Y3GL*8C0W?J,0^Q''<M;_&/ M`]_9O7*M.O:U==JKLC8),EE/TRF_<8::NAO\`5!+`P0*``````#F;GXS```` M````````````" ```&%A7W1E<W0O4$L!`A0`% ```` `Y6Y^,U+;CO5B`0`` M; (```X``````````0` `````````&%A7W1E<W0O:&%S:"YD4$L!`A0`% `` M`` `Y6Y^,_A3"'NP`0``/P,```\``````````0` ````C $``&%A7W1E<W0O M96-K+F102P$"% `4````" #E;GXSWT ?B4L"```>!0``$ `````````!`" ` M``"5"0``86%?=&5S="]W;W)D9G)E<2YD4$L!`A0`" ``````YFY^,P`````` M`````````` ````````````0````$ P``&%A7W1E<W0O4$L%! `````&``8` ` end
Nov 30 2005
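For readers without the attachment, the shootout "hash" test uses roughly the following access pattern. This sketch is reconstructed from memory, so the authoritative rules are the attachment above and the shootout site; details may differ:

import std.stdio;
import std.string;  // format(), atoi(), toString()

void main(char[][] args)
{
    int n = args.length > 1 ? cast(int) std.string.atoi(args[1]) : 100_000;

    int[char[]] X;
    for (int i = 1; i <= n; i++)
        X[std.string.format("%x", i)] = i;      // insert under hex-string keys

    int c = 0;
    for (int i = n; i > 0; i--)
        if (std.string.toString(i) in X)        // look up decimal-string keys
            c++;

    writefln(c);
}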
Dave wrote:

[...] Please post your results too.

Preliminary results after some tuning of the parameters of my scalable implementation:

size       ins       acc     t.W     t.me     gain
   4        16   9999984    1338     1137  -15,02%
   7       130   9999870    1345     1136  -15,54%
  10      1000   9999000    1395     1147  -17,78%
  14     16000   9984000    1554     1140  -26,64%
  17    130000   9870000    2092     1256  -39,96%
  20    970000   9030000    3354     1763  -47,44%
  22   2400000   7600000    7367     2183  -70,37%
  24   4900000   5100000   21227     3282  -84,54%
  26   8000000   2000000   58396    15691  -73,13%
  27   8900000   1100000  108875    51929  -52,30%
  28   9400000    600000  112160   193036   72,11%*

legend:
size := binary logarithm of the size of the abstract problem
ins  := number of insertions into the AA
acc  := number of accesses to elements of the AA
t.W  := total time in ms needed by Walter's implementation
t.me := total time in ms needed by my implementation
gain := relative time gain after subtracting 770 ms, which is the time needed with no accesses to AA's
*    := data insufficient

comments:

1. My implementation uses more space than Walter's. Therefore the time gain starts to stall at about 5M elements inserted, and it collapses when about 9M elements are inserted, based on the 2GB RAM of my machine.

2. One try with an extreme setting of the parameters of the implementation gave the hint that a time reduction to about 15% of Walter's implementation up to 5M elements might be possible, but dmd needed nearly an hour to produce the executable for that one try :-(

3. According to the answers in the thread, my implementation is currently not suitable for general use, because the abstract problem I used was restricted to fixed-length keys. I am experimenting with an appropriate extension.

-manfred
Dec 02 2005
Manfred Nowak wrote:

[...] According to the answers in the thread, my implementation is currently not suitable for general use, because the abstract problem I used was restricted to fixed-length keys. I am experimenting with an appropriate extension.

My unordered_set implementation doesn't support rehashing yet--I wrote it for use in a compiler and didn't have the time to bother with it. If you have a succinct test example I could run it on, I'll give it a shot, but it'll have a bit of an unfair advantage until I get around to finishing it.

Sean
Dec 02 2005
Sean Kelly wrote:

[...] If you have a succinct test example I could run it on, I'll give it a shot, but it'll have a bit of an unfair advantage until I get around to finishing it.

I don't understand you fully. If you mean to give your implementation a try on my framework, here it is:

-manfred

/+
  The abstract problem asks for the time needed for n value changes for keys
  somehow randomly chosen out of the range [ 0 .. s-1]. The abstract problem
  has two parameters n and s.

  n is the total number of operations on the AA. Initially I set it so that
  the resulting total time was within a reasonable limit, i.e. 60 s. But it
  turned out to also be about 10% more than the number of elements that fit
  into the RAM of the target machine. Therefore the latter approximation
  should be chosen.

  s is the size of the space from which the keys are drawn. It is reasonable
  to choose s to be a power of 2.

  The "somehow" in the description above reduces the randomness of the draw
  according to the well known 80:20 rule, i.e. 80% of the operations are
  executed on 20% of the key space.

  Compilation parameters:
    -version=a                a run will give the time needed without any accesses to AA
    -version=aaa              a run will give the time needed for the current implementation of the AA
    -version=aa               a run will give the time needed for the candidate
    -version=aa -debug=asrt   will test the candidate's correctness against the current implementation
+/

import std.stdio;    // writef, writefln (added; needed by the code below)
import std.random;   // rand
import std.perf;     // HighPerformanceCounter
// The candidate implementation is expected to supply a class AA with
// insert(), retrieveRval(), length(), depth() and position().

void main(){
    alias HighPerformanceCounter Timer;
    Timer time;
    time= new Timer;

    const uint ttotal= 64*1024*1024,   // Problem size s
               tsmall= ttotal/ 5;      // i.e. the target space
    const uint ntotal= 10_000_000,     // total number of operations n
               nsmall= ntotal/4,
               nlarge= ntotal-nsmall;

    time.start();
    version( aaa){
        uint[ uint] aaa;
        uint r= void;
        for( int i= 1; i<= nlarge; i++){
            r= rand();
            aaa[ (r)%tsmall]= i;
        }
        for( int i= 1; i<= nsmall; i++){
            r= rand();
            aaa[ (r)%ttotal]= i;
        }
        writef( aaa.length, " ");
    } else {
        version( aa){
            AA aa= new AA;
            debug(asrt) uint[ uint] a;
            uint r= void;
            for( int i= 1; i<= nlarge; i++){
                r= rand();
                aa.insert( (r)%tsmall, i);
                debug(asrt) a[ (r)%tsmall]= i;
                debug assert( aa.retrieveRval( (r)%tsmall) == i);
                debug {
                    writef( a.length, " ", aa.length, " ", (r)%tsmall);
                    for( int d= aa.depth; d>0; d--)
                        writef( " ", aa.position( (r)%tsmall, d));
                    writefln( "");
                }
                debug assert( a.length == aa.length);
                debug foreach( uint k, uint v; a){
                    assert( aa.retrieveRval( k) == v);
                }
            }
            for( int i= 1; i<= nsmall; i++){
                r= rand();
                aa.insert( (r)%ttotal, i);
                debug(asrt) a[ (r)%ttotal]= i;
                debug assert( aa.retrieveRval( (r)%ttotal) == i);
                debug assert( a.length == aa.length());
                debug foreach( uint k, uint v; a){
                    assert( aa.retrieveRval( k) == v);
                }
            }
            debug(asrt) assert( a.length == aa.length());
            debug(asrt) foreach( uint k, uint v; a){
                assert( aa.retrieveRval( k) == v);
            }
            writef( aa.length(), " ");
        } else {
            // baseline: same loops and rand() calls, no AA accesses
            uint r= void;
            for( int i= 1; i<= nlarge; i++){
                r= rand();
                ( (r)%tsmall, i);
            }
            for( int i= 1; i<= nsmall; i++){
                r= rand();
                ( (r)%ttotal, i);
            }
        }
    }
    time.stop();
    writefln( time.milliseconds());
}
Dec 02 2005
Manfred Nowak wrote:

[...] If you mean to give your implementation a try on my framework, here it is:

That's what I was asking for :-) I'll try to produce a C++ equivalent this weekend.

And for what it's worth, I mocked up a quick word-count test for std::map vs. my AA implementation. Here are the results of a run on a 5+ meg text file (the KJ Bible). Assume my unordered_map implementation is fixed at using 4097 hash buckets:

Testing std::map...
  Execution Time:       1.752000
  Unique Words:         61918
  Page Faults:          607
  Memory Used:          2490368

Testing unordered_map...
  Execution Time:       0.621000
  Unique Words:         61918
  Min Bucket Size:      3
  Max Bucket Size:      34
  Average Bucket Size:  15
  Page Faults:          338
  Memory Used:          1384448

Not bad at almost 1/3 the time of std::map, and it would be better if I added more buckets. For what it's worth, the main test program is available here: http://home.f4.ca/sean/d/aa_test.cpp

Sean
Dec 03 2005
Sean Kelly wrote:

[...] Not bad at almost 1/3 the time of std::map [...]

The test looks biased, because the burden of allocating memory for the process might fall on the std::map run.

-manfred
Dec 03 2005
Manfred Nowak wrote:

[...] The test looks biased, because the burden of allocating memory for the process might fall on the std::map run.

I just tried reversing the tests so the std::map test is performed second, and the results were the same. But as I said earlier, my AA implementation has a somewhat unfair advantage in that it doesn't rehash automatically.

Sean
Dec 03 2005
In article <Xns9720E101439BFsvv1999hotmailcom 63.105.9.61>, Manfred Nowak says...

[...] Preliminary results after some tuning of the parameters of my scalable implementation: [...]

What D implementation (DMD or GDC) and version?

Thanks,

- Dave
Dec 02 2005
Dave wrote:

[...] What D implementation (DMD or GDC) and version?

Using DMD 0.140 under WinXP32. Times reported are the lowest of three consecutive runs, and are the real times reported by the time command of cygwin.

Remark: Although WinXP32 is limited to less than 4GB of RAM, the memory management seems to have difficulties delivering memory chunks when a single thread needs more than 1 GB, which is far below that absolute limit.

-manfred
Dec 02 2005
In article <Xns972166DCE52Asvv1999hotmailcom 63.105.9.61>, Manfred Nowak says...

[...] Using DMD 0.140 under WinXP32. [...]

Cool - is this something you've actually plugged into internal/aaA.d already to test? If so post the code if you want, and I'll give it a shot with the benchmarks I sent earlier and post the results (I already have the input files handy, etc.).

- Dave
Dec 02 2005
Manfred Nowak wrote:

[...] Preliminary results after some tuning of the parameters of my scalable implementation:

Some more results for using variable sized keys:

P.size       Ins       Acc     t.W    t.me   Cal     Gain
     4        16   9999984    4873    4773  3495   -7,26%
     5        32              4902    5032  3646   10,35%
     6        64              5098    5174  3775    5,74%
     7       130   9999870    5237    5410  4000   13,99%
    10      1000   9999000    5897    6065  4477   11,83%
    11      2000              6271    6398  4733    8,26%
    12      4100              6429    6502  4827    4,56%
    13      8200              6774    6821  5051    2,73%
    14     16000   9984000    7225    7139  5279   -4,42%
    17    130000   9870000   13056    8606  5844  -61,70%
    20    970000   9030000   17572   10993  6300  -58,37%
    22   2400000   7600000   44757   13709  6517  -81,19%
    24   4900000   5100000  138197   30409  7011  -82,16%
    26   8000000   2000000  245284  165341  7275  -33,59%

As one can see, this implementation is up to 15% slower for up to 10,000 elements, and it collapses at about 8,200,000 elements. In the range from 20,000 to 8,000,000 elements it is significantly faster.

How should this outcome be interpreted if fine tuning will not bring any improvement in the range up to 10,000 elements?

-manfred
Dec 04 2005
Manfred Nowak wrote:

[...] Preliminary results after some tuning of the parameters of my scalable implementation:

After fixing some bugs the gain table looks like this:

size     Gain
   4  -31,57%
   5  -28,42%
   6  -29,10%
   7  -14,63%
  10  -14,15%
  11  -22,89%
  12  -24,09%
  13  -22,29%
  14  -24,36%
  17  -68,07%
  20  -61,68%
  22  -82,17%
  24  -82,36%
  26  -33,69%

An additional problem is left for discussion: does it suffice to rely on the swapping algorithm of the underlying OS, or is it the role of the language to provide a solution when memory gets tight? Usually out-of-memory errors and the like are thrown, but this forces every implementer to not trust the built-in language features and to build a custom implementation.

-manfred
Dec 04 2005
Manfred Nowak wrote:

[...] Does it suffice to rely on the swapping algorithm of the underlying OS, or is it the role of the language to provide a solution when memory gets tight?

Language-level functionality should provide the basic or weak exception guarantees, depending on context. Frankly, so should library code. I ran into this problem with my unordered_map implementation--the rehash function is required to provide the weak guarantee (that the contents of the table will be unchanged if an exception is thrown), and since I was relying on using a std::list to hold all the table elements, this meant that I would have to temporarily double memory usage by creating copies of all stored values to ensure I could fall back on the original list if an insertion failed. The alternative would be to use a custom linked-list where I could manipulate node pointers directly and therefore avoid ctor calls for creating/copying stored objects.

[...] Usually out-of-memory errors and the like are thrown, but this forces every implementer to not trust the built-in language features and to build a custom implementation.

See above. I consider this to be a documentation issue, as there should be explicit guarantees about data structure stability. However, since D objects are accessed via handles, the only issue should be OutOfMemory exceptions, and it's generally not too hard to preallocate needed memory before modifying data.

Sean
Dec 04 2005
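Sean's point about the weak guarantee can be made concrete with a small D sketch. The node layout here is hypothetical (it is neither his unordered_map nor Walter's aaA.d); the point is only the ordering: the single allocation that can throw happens before the existing table is touched, and relinking the existing nodes allocates nothing.

// Hypothetical bucket/node layout, for illustration only.
class Node
{
    uint hash;
    Node next;
    // key/value payload omitted
}

// Grow the bucket array while giving the weak guarantee.
Node[] rehash(Node[] buckets, size_t newLen)
{
    Node[] nb = new Node[newLen];       // may throw OutOfMemory; old table intact

    foreach (Node head; buckets)
    {
        for (Node n = head; n; )        // walk each chain and relink in place
        {
            Node nxt = n.next;
            size_t i = n.hash % newLen;
            n.next = nb[i];
            nb[i] = n;
            n = nxt;
        }
    }
    return nb;                          // caller commits: buckets = rehash(...)
}

void main()
{
    Node[] tab = new Node[4];
    Node n = new Node;
    n.hash = 42;
    tab[n.hash % 4] = n;

    tab = rehash(tab, 8);
    assert(tab[42 % 8] is n);
}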
Sean Kelly wrote:

[...] However, since D objects are accessed via handles, the only issue should be OutOfMemory exceptions, and it's generally not too hard to preallocate needed memory before modifying data.

I fear that your answer passes the real problem by without touching it. I reported that the runtime collapsed under WinXP when the problem size approached real memory limits. WinXP reported 2.3 GB needed in total whereas only 2.0 GB real memory were available. That led to very long stretches, about 20 seconds each, in which WinXP was accessing the swapfile, shoveling about 600 MB around each time, interspersed with bursts of CPU activity.

No OutOfMemory exception was thrown. Maybe because the thread's memory consumption did not reach the 2.0 GB real memory limit. But maybe also WinXP will report OutOfMemory only when the virtual address space of 1 TB is reached? In fact I do not remember ever having had an OutOfMemory exception under WinXP, and I filed a bug report on that issue as well.

-manfred
Dec 04 2005
Manfred Nowak wrote:

[...] No OutOfMemory exception was thrown. [...] In fact I do not remember ever having had an OutOfMemory exception under WinXP, and I filed a bug report on that issue as well.

I've noticed this behavior also. In fact, I've gone close to the limits of a fixed-size swapfile and the OS told me it was going to try and create more virtual memory. Though I didn't track things exactly, so I'm not entirely certain whether I ever actually exceeded my virtual memory limits or just passed the 1 GB barrier (my laptop has 512MB RAM and a 768MB swapfile IIRC).

Sean
Dec 04 2005
In article <Xns9722ABBE9EA2Fsvv1999hotmailcom 63.105.9.61>, Manfred Nowak says...

[...] After fixing some bugs the gain table looks like this: [...]

Fixed key length like the 1st results, or variable like the second? If variable, what does the set look like? Is it a 'word frequency' test similar to Sean's test?

How about some numbers on the shootout benchmarks - is it too time consuming to run those? No need to instrument them to isolate just the AA operations, because it's important and meaningful to see how your implementation runs in the context of the other runtime overhead as well (you mentioned that it consumes quite a bit more memory than the current AA impl). The code was posted publicly earlier, so everyone (or is it just you, me and Sean interested in this?) in this group could look at the numbers *and* the code and get a better idea of exactly what the benefits of your implementation are for different sets.

[...] An additional problem is left for discussion: does it suffice to rely on the swapping algorithm of the underlying OS, or is it the role of the language to provide a solution when memory gets tight? [...]

Does your impl. rely on the GC now? If so, I'd say you can leave these concerns to the language and GC implementors, which would be consistent with the rest of the D RT lib.
Dec 04 2005
Dave wrote:

[...] Fixed key length like the 1st results, or variable like the second? If variable, what does the set look like? Is it a 'word frequency' test similar to Sean's test?

Variable length keys. The framework used is posted here: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/30849

For the variable length keys test I replaced the assignments

! r= rand();
! aaa[ (r)%tsmall]= i;

with

! r= rand();
! sprintf( buf, toStringz("%d"), r%tsmall);
! aaa[ toString( buf)]= i;

and then adjusted the other calls and declarations. The results of the framework show that the underlying abstract problem, which is chosen without any respect to a particular implementation, smoothes very well from mostly accesses to mostly insertions.

[...] How about some numbers on the shootout benchmarks - is it too time consuming to run those? [...] (you mentioned that it consumes quite a bit more memory than the current AA impl).

I will do shootout benchmarks next. The test runs show that the average memory consumption increases by about a factor of 7 while the time consumption decreases by up to a factor of 5. Therefore even under the optimization criterion S*T**2 it seems to be a gain ;-)

[...] Does your impl. rely on the GC now? If so, I'd say you can leave these concerns to the language and GC implementors, which would be consistent with the rest of the D RT lib.

I do not understand this remark. Are you talking about management of the real memory whilst I was asking for the management of the virtual memory? Hmmmm... AA's are the core of management of the virtual memory, aren't they?

-manfred
Dec 04 2005
I do not understand this remark. Are you talking about management of the real memory whilst I was asking for the management of the virtual memory? Hmmmm... AA's are the core of management of the virtual memory, aren't they? -manfred

After reading: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/30905 I'm assuming that you are using the GC for the 'internals' of your implementation, and now I have a better understanding of what you were getting at.

IMHO, it is best to let the GC handle these issues (like the rest of the D runtime library implementation), and let a future 'better' allocator/GC system account for exceptional memory conditions. That way the AA implementation will stay consistent with the rest, along with being simpler to develop, debug and maintain. The current GC might not address the situation you describe very well right now, but that can always be addressed in the future if it becomes a serious problem.
Dec 04 2005
Dave wrote:

[...] The current GC might not address the situation you describe very well right now, but that can always be addressed in the future if it becomes a serious problem.

After delivering a chunk of memory to the application, the GC isn't aware of that memory anymore, except when allocating more memory. Therefore a part of the time-consuming swapping might stem from the activities of the GC, but that is the minor part. Every time the application accesses a segment currently swapped out, the OS will need to swap it in again. That is the major reason for swapping and has nothing to do with the GC. The GC does not know anything about the swapping mechanism of the underlying OS and therefore cannot be adapted to it.

But your argument, together with the necessary coupling of the GC with the virtual memory management, looks like every language with a GC must also implement a manager for the virtual memory.

-manfred
Dec 04 2005
Manfred Nowak wrote:

[...] The GC does not know anything about the swapping mechanism of the underlying OS and therefore cannot be adapted to it.

Not strictly true. Some of the fancier GCs use data from the virtual memory manager to avoid paging. Granted, this locks a GC implementation to a specific platform, but it's not unheard of.

Sean
Dec 05 2005
In article <Xns97235303CDD71svv1999hotmailcom 63.105.9.61>, Manfred Nowak says...

[...] Every time the application accesses a segment currently swapped out, the OS will need to swap it in again. That is the major reason for swapping and has nothing to do with the GC. [...]

Ok - I gotcha now.. The earlier term 'error' threw me a little bit, but basically you're talking about optimizing with regard to memory swapping because of its impact on performance, right? I hope so, because no one should have to think this hard right after a cold beer <g>

Temporal and spatial locality issues have been a big area of GC research for years now from what I gather. Of course, one of the big advantages of generational, compacting collectors is that they help to minimize swapping both during collection and (generally) give better locality for the current working set in the program itself. And even if those may not be the most efficient in terms of CPU instructions, they seem to be proving to be the best for CPU *cycle* efficiency.

From what I've read (and seen a bit of, actually), a non-generational, non-compacting GC like the current one for D would be the biggest 'offender' for swapping and VM 'abuse' during collection, as it scans over what could be the entire memory space depending on the working set. An AA implementation that pre-allocates a really large amount of memory may actually be defeating the purpose - making the job of the GC harder and actually increasing swapping both during AA accesses and then when the collector kicks in.

It's also very possible that designing a virtual memory manager tuned to something specific like an AA implementation could actually be self-defeating in the context of a GC, and increase swapping as the VMM and GC compete for locality of the AA memory and live memory for other data in the program. All this to say that I don't think a separate virtual memory management system tuned for one part of the runtime system would really be worth the effort, and it might be self-defeating. The GC implementation is really the key, IMHO. The AA implementation should really play in the same sandbox as the rest <g>

One idea is to make the memory pre-allocation for your AA implementation reverse logarithmic - as the AA set gets larger, the ratio of pre-allocated memory decreases. That should give really good performance for the more common, smaller sets and help to decrease swapping as the sets get larger. Plus it would help out the GC as it deals with other live data in the program.

- Dave
Dec 05 2005
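Dave's "reverse logarithmic" pre-allocation idea could look roughly like the sketch below; every constant in it is invented for illustration and is not taken from any existing implementation:

import std.math;   // log(), LN2
import std.stdio;

// Pre-allocation factor that shrinks as the table grows: generous headroom
// for small tables, tapering toward 1.25x for very large ones.
size_t nextCapacity(size_t used)
{
    double lg = used < 2 ? 1.0 : std.math.log(cast(double) used) / std.math.LN2;
    double factor = 1.25 + 8.0 / lg;   // falls roughly with log2 of the size
    if (factor > 4.0)
        factor = 4.0;
    return cast(size_t)(used * factor) + 16;
}

void main()
{
    writefln(nextCapacity(16), " ", nextCapacity(1024), " ", nextCapacity(1_000_000));
}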
Manfred Nowak wrote:

[...] I will do shootout benchmarks next. [...]

First results on hash.d:

Size   t.curr    t.me  t.cal      Gain
   4     14,6      16   14,6
   7     14,7      16   14,7
  10     15,3    17,4   15,2
  11     16,7    18,4   15,9   212,50%
  12     18,9    21,5   17,2   152,94%
  14     32,5      40   26,5   125,00%
  15       61      62     40     4,76%
  16      108     106     65    -4,65%
  17      291     214    123   -45,83%
  20     2610    2015    909   -34,98%
  22    25094   17466   1917   -32,91%
  23    96496   69447   3950   -29,23%
  24  >180000  371049

The time for the current implementation is not measurable for parameter values less than 1000. As already shown, my implementation is faster for larger amounts of stored data, but because of its increased memory allocation it reaches the limits of physical memory earlier and then collapses.

Some constants of the implementation can be adjusted so that the losses for fewer elements are smaller, but then the gains for more elements are also lower. How to decide for the general case?

-manfred
Dec 07 2005
In article <Xns9725F3778AE4Dsvv1999hotmailcom 63.105.9.61>, Manfred Nowak says...

[...] Some constants of the implementation can be adjusted so that the losses for fewer elements are smaller, but then the gains for more elements are also lower. How to decide for the general case?

Can't you just make the algorithm change at a certain size?

/Oskar
Dec 08 2005
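Oskar's suggestion can be sketched as a table that switches representation once a size threshold is crossed. The threshold, the linear-scan small representation, and the use of the built-in AA as a stand-in for the scalable candidate are all placeholders; choosing the crossover point is exactly the open question here:

const uint THRESHOLD = 64;   // made-up crossover point

class HybridTable
{
    private uint[] keys;     // small representation: plain arrays, linear scan
    private uint[] vals;
    private uint[uint] big;  // large representation (built-in AA as stand-in)
    private bool useBig = false;

    void insert(uint k, uint v)
    {
        if (useBig) { big[k] = v; return; }

        foreach (int i, uint key; keys)
            if (key == k) { vals[i] = v; return; }

        keys ~= k;
        vals ~= v;

        if (keys.length > THRESHOLD)    // migrate once, then stay hashed
        {
            foreach (int i, uint key; keys)
                big[key] = vals[i];
            keys = null;
            vals = null;
            useBig = true;
        }
    }

    uint* lookup(uint k)
    {
        if (useBig)
            return k in big;
        foreach (int i, uint key; keys)
            if (key == k)
                return &vals[i];
        return null;
    }
}

void main()
{
    HybridTable t = new HybridTable;
    for (uint i = 0; i < 1000; i++)
        t.insert(i, 2 * i);
    assert(*t.lookup(500) == 1000);
}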
Oskar Linde wrote:

[...] Can't you just make the algorithm change at a certain size?

Maybe, but how to decide that point for the general case? Which measure should be taken?

-manfred
Dec 08 2005
Hi,

In article <Xns9726A6F56EEE0svv1999hotmailcom 63.105.9.61>, Manfred Nowak says...

[...] Maybe, but how to decide that point for the general case? Which measure should be taken?

It's extremely difficult to define a general case. I'd guess that in most cases AAs are used for convenience, and neither speed nor size needs to be optimized. Then there are the specific uses with specific needs. It is impossible to make an optimal choice without having the case that should be optimized. The goal for a general implementation should probably be to perform decently in many cases rather than excellently in a few. Performance tuning parameters are easy to change in the future.

I'd say, make your best guess at a general usage case and tune the parameters to this case. Then verify your parameters against a number of different test cases (different hash functions: good/poor, slow/fast; different usage patterns: ratio of insertions, deletes and lookups; different sizes of key/value data, etc...). Make sure that there are no places where you are considerably worse than a reference implementation. I think you will find that cut-off points between different algorithms are not very sensitive. They would probably be placed in the middle of a range where the algorithms perform similarly.

If by measure you mean time vs. space, I'd say that since a hash table's primary feature is time complexity, speed should be more important than memory usage, but there should also be a weighing between the two. A 10% speed increase at the cost of 100% more memory usage is probably not worth it for the general case.

(I feel like I've only written vague and obvious things. Apologies for the waste of bandwidth :)

Regards,

Oskar
Dec 08 2005