digitalmars.D - Need a Faster Compressor
- Walter Bright (5/5) May 21 2016 The current one is effective, but slow:
- Era Scarecrow (9/10) May 21 2016 I recall being able to get a marginal boost in speed by assigning
- Walter Bright (7/9) May 21 2016 Give it a try, see if it pans out.
- Era Scarecrow (7/13) May 21 2016 I'm going to try for a lookback and scan which uses quite a bit
- Walter Bright (2/7) May 21 2016 Just a note: large lookup tables can have cold cache performance problem...
- Era Scarecrow (3/8) May 21 2016 I know, if it doesn't fit in 64k then you can't rely on the
- Era Scarecrow (8/10) May 21 2016 Finished. It's blazing fast and even compresses in my test
- Era Scarecrow (12/16) May 21 2016 With 1 Million iterations:
- Walter Bright (2/12) May 22 2016 It is promising. Need more!
- Walter Bright (5/6) May 22 2016 I did figure out an improvement to the existing algorithm whereby it cou...
- Era Scarecrow (10/19) May 22 2016 Well there's other good news. While I was working with about 2Mb
- Era Scarecrow (5/20) May 22 2016 Oh noes! Making the tiny change to id_compress and it's faster!!!
- Era Scarecrow (4/10) May 22 2016 And shrinking the lookback to a mere 127 increases it faster but
- Stefan Koch (2/6) May 22 2016 Nice results.
- Era Scarecrow (12/13) May 22 2016 This is with one sample to test against. I need a much bigger
- Marco Leise (8/25) May 22 2016 That would be my question, too. What is the allowed alphabet?
- Era Scarecrow (8/16) May 22 2016 Actually if compressed text (which uses symbols 128-155) are
- Walter Bright (2/5) May 22 2016 You're doing the gods' work, Era!
- Era Scarecrow (18/23) May 22 2016 Maybe... I have to wonder if LZ4 from the looks of things will
- Era Scarecrow (29/30) May 22 2016 Well here's the rundown of some numbers. min_compress uses a tiny
- ZombineDev (3/32) May 23 2016 If you want, you can give the test case for issue 16039 a try. It
- Walter Bright (13/14) May 22 2016 My idea for speeding things up is every time a new character c is matche...
- Walter Bright (6/9) May 22 2016 Currently there's a bug in id_compress() where it doesn't take into acco...
- Walter Bright (2/19) May 22 2016 There's also https://github.com/dlang/dmd/pull/5799 which should make th...
- deadalnix (2/4) May 22 2016 16k lookup table are the gold standard.
- Walter Bright (3/6) May 22 2016 I like 64 bit vectors because the lookup and test can be done in one
- Wyatt (4/6) May 23 2016 Maybe consider lz4 instead? Tends to be a bit faster, and it's
- Wyatt (2/3) May 23 2016 Disregard that: I see it's come up already.
- Timon Gehr (2/7) May 21 2016 Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?
- Timon Gehr (3/13) May 21 2016 (Those are worst case bounds, not sure what's better on average on D
- Walter Bright (9/10) May 21 2016 I don't understand the terms you use, but as to the "why" it is based on...
- Guillaume Boucher (12/22) May 21 2016 Sorry if I didn't memorize everything in this forum from the last
- Stefan Koch (3/5) May 21 2016 Exactly. It always is. Since you know where to look for
- Walter Bright (6/8) May 21 2016 DMC++ matches the Microsoft name mangling scheme, which includes such
- Rainer Schuetze (37/46) May 22 2016 You are right about the symbols using the VC mangling. The test case
- Era Scarecrow (3/8) May 22 2016 Unless ? is the proper/valid character those are far more likely
- Rainer Schuetze (4/14) May 22 2016 ? is an allowed character in VC++ and for example permits unambiguous
- Era Scarecrow (4/10) May 22 2016 Not quite what I asked. Is the sample provided accurate? Is
- Rainer Schuetze (4/15) May 22 2016 The 3 consecutive ?s are also in the object file, so they are actually
- Marco Leise (10/13) May 22 2016 In general I agree. The extended characters may be swallowed
- Walter Bright (8/14) May 22 2016 Haha, thus showing why people should never invent their own ad-hoc crypt...
- Timon Gehr (39/51) May 23 2016 E.g. the Knuth-Morris-Pratt (KMP) string search algorithm can be
- Walter Bright (7/11) May 23 2016 The trouble with that is the mangled string is formed from component pie...
- Timon Gehr (10/24) May 23 2016 Then don't do that. I.e. re-mangle recursively from scratch for each
- Walter Bright (9/17) May 23 2016 How is that essentially different from running a compression pass? The o...
- Stefan Koch (7/7) May 23 2016 Just a heads up on the LZ4.
- Walter Bright (3/9) May 23 2016 I'm not sure why you're working on a decompressor. The speed needs to be...
- Timon Gehr (10/36) May 24 2016 No. Just increase the recursion depth by a small number of levels to
- Stefan Koch (5/22) May 24 2016 I completely agree!
- deadalnix (2/33) May 24 2016 KAMOULOX!
- Anon (54/59) May 24 2016 Since Walter is evidently still having a hard time understanding
- Walter Bright (5/9) May 24 2016 Because identifiers are generated from substrings that don't know about ...
- Marco Leise (6/19) May 25 2016 Let's go this route. It seems to be a scalable solution
- Walter Bright (6/7) May 24 2016 BTW, all types in dmd have a 'deco' string which is the AST linearized a...
- Guillaume Boucher (6/15) May 24 2016 There's no reason not to use the compression in the deco name.
- Walter Bright (2/4) May 24 2016 Not totally, since you'd like to refer to types that are common across d...
- Stefan Koch (6/12) May 24 2016 On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote:
- Walter Bright (3/16) May 24 2016 The types do exist in the compiler, and are used in template name genera...
- Stefan Koch (4/13) May 24 2016 But would one ever have to be able to reconstruct the mangle ?
- Walter Bright (3/6) May 24 2016 Of course they can!
- Guillaume Boucher (21/29) May 24 2016 Yes, not all redundancy is covered. However, I think all
- Stefan Koch (4/11) May 21 2016 Is LZ4 acceptable ?
- Lionello Lunesu (4/16) May 21 2016 Agree. I've had the best results with LZ4. We can either link to the C
- Stefan Koch (2/9) May 21 2016 Could you post a link to code you use for benchmarking ?
- Walter Bright (3/4) May 21 2016 In the comments:
- qznc (8/15) May 22 2016 A quick search led to this SO thread:
- Era Scarecrow (11/17) May 22 2016 Curiously premaking a few primers as such for languages can give
- Marco Leise (5/17) May 22 2016 LZ4 can seemingly use a dictionary, too.
- Andrei Alexandrescu (2/18) May 22 2016 A primed dictionary is a great idea. -- Andrei
- Marco Leise (10/18) May 22 2016 I can write something 2 times more compact and 16 times faster
- Marco Leise (14/14) May 22 2016 ⬅ Download proof of concept
- Stefan Koch (2/13) May 22 2016 Please submit a PR.
- Marco Leise (13/31) May 22 2016 There are a few open questions, for me at least. Are there
- Stefan Koch (14/23) May 22 2016 Here are my answers please take them with a grain of ignorance.
- Marco Leise (11/16) May 22 2016 Maybe you are right about the dictionary. I haven't put much
- Stefan Koch (5/6) May 22 2016 I am already working on a non-ctfeable version that does not use
- Era Scarecrow (19/28) May 22 2016 How it's being benchmarked does make me question it a bit. I
- Walter Bright (7/14) May 22 2016 Not yet.
- Marco Leise (11/33) May 22 2016 Can you elaborate on how copying the source overrules your
- Walter Bright (6/11) May 22 2016 It does not. It just avoids having to link in a library that may or may ...
- Walter Bright (3/3) May 22 2016 The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html
- Marco Leise (8/12) May 23 2016 Ok, any volunteers? I'm not personally looking forward to
- Stefan Koch (8/14) May 23 2016 That it right. It's pretty simple.
- Walter Bright (3/11) May 23 2016 Also, the LZ4 compressor posted here has a 64K string limit, which won't...
- Stefan Koch (5/8) May 23 2016 This is only partially true.
- deadalnix (2/11) May 23 2016 If you want speed: 16k (aka half of the L1D).
- Walter Bright (3/9) May 23 2016 The starting line of the compression function posted here tests for it a...
- Era Scarecrow (18/22) May 24 2016 So Walter, big question: How big before the compressor kicks in?
- Walter Bright (3/6) May 24 2016 Yes, but it's a judgement call.
- Era Scarecrow (3/9) May 24 2016 64 bytes? Hmm I had the impression it was going to be far
- Era Scarecrow (18/20) Oct 17 2016 Well it's been a while since I last touched this topic (or code)
- Marco Leise (5/7) May 23 2016 Nice, if it can keep the compression speed up.
- Martin Nowak (7/10) May 22 2016 Yes, LZO, LZ4, or snappy would be good choices. They are
- Stefan Koch (4/10) May 22 2016 Agreed. The existing implementations have been fine-tuned for
- deadalnix (3/10) May 22 2016 Mandatory goldmine on compression:
The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?
May 21 2016
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:So really, how good are you at fast code?I recall being able to get a marginal boost in speed by assigning the first 1-2 bytes to preassigned buckets (256 & 65536 respectively), so you have immediate 1-2 byte matches (skipping the chaff, so to speak). Unfortunately it's highly inefficient memory-wise and requires 2 passes over the data, and probably doesn't play well with the cache. I assume this is related to the symbol-compression thread? I mentioned possibly considering the LZO library.
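A minimal sketch of that bucketing idea, with made-up names (just the shape of it, not the actual code): every position is filed under its first byte, so the longest-match search only visits candidates that already agree on the first character; the 65536-bucket variant does the same with the first two bytes.

    size_t[][256] buckets;   // positions grouped by their first byte

    void buildIndex(const(char)[] id)
    {
        foreach (pos, c; id)
            buckets[cast(ubyte)c] ~= pos;
    }

    // Candidate start positions for a match against id[pos .. $].
    // (A real matcher would only keep positions inside the lookback window.)
    size_t[] candidates(const(char)[] id, size_t pos)
    {
        return buckets[cast(ubyte)id[pos]];
    }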
May 21 2016
On 5/21/2016 2:27 PM, Era Scarecrow wrote:I assume this is related to compressing symbols thread?Yes.I mentioned possibly considering the LZO library.Give it a try, see if it pans out. Other possibilities are the lookback dictionary is fixed at 1023 characters. Experimenting and benchmarking can indicate if it should be more or less. Perhaps 512 will yield just as good compression on real world identifier strings. Or try another scheme entirely, like huffman coding.
May 21 2016
On Saturday, 21 May 2016 at 22:01:40 UTC, Walter Bright wrote:Give it a try, see if it pans out. Other possibilities are the lookback dictionary is fixed at 1023 characters. Experimenting and benchmarking can indicate if it should be more or less. Perhaps 512 will yield just as good compression on real world identifier strings. Or try another scheme entirely, like huffman coding.I'm going to try for a lookback and scan which uses quite a bit of memory. On the other hand you can reuse the memory over and over again, so that isn't an issue. Although it will take up a Meg in memory. Also written entirely in D, so... I think I'll go with a similar setup you have for the encoding (3 bytes). Should be nearly identical in output.
May 21 2016
On 5/21/2016 5:52 PM, Era Scarecrow wrote:I'm going to try for a lookback and scan which uses quite a bit of memory. On the other hand you can reuse the memory over and over again, so that isn't an issue. Although it will take up a Meg in memory. Also written entirely in D, so... I think I'll go with a similar setup you have for the encoding (3 bytes). Should be nearly identical in output.Just a note: large lookup tables can have cold cache performance problems.
May 21 2016
On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote:On 5/21/2016 5:52 PM, Era Scarecrow wrote:I know, if it doesn't fit in 64k then you can't rely on the cache...I'm going to try for a lookback and scan which uses quite a bit of memory.Just a note: large lookup tables can have cold cache performance problems.
May 21 2016
On Sunday, 22 May 2016 at 01:46:23 UTC, Era Scarecrow wrote:I know, if it doesn't fit in 64k then you can't rely on the cache...Finished. It's blazing fast and even compresses my test example as well as or better than gzip/zlib (although maybe that's because it's headerless). Now I just need to stress-test it against some really large input and check for failures. Regardless, it has compression and decompression functions, along with benchmarking against the compression algorithm I'm trying to beat.
May 21 2016
On Sunday, 22 May 2016 at 04:35:15 UTC, Era Scarecrow wrote:Now I just need to stress test it against some really large input and check for failures. Regardless it has a compression and decompression functions, along with benchmarking it vs the compression algorithm I'm trying to beat.With 1 Million iterations: new_compress: TickDuration(311404604) id_compress TickDuration(385806589) Approx 20% increase in speed (if i'm reading and did this right). I won't take this as the final verdict, but it is promising at least... Need a larger test sample to benchmark both functions against. Sample data is "E.s!(E.s!(E.s!(E.s!(E.s!(int).s(int).Result).s(E.s!(int)." etc from the bug listing: https://issues.dlang.org/show_bug.cgi?id=15831#c4
May 21 2016
On 5/21/2016 11:26 PM, Era Scarecrow wrote:On Sunday, 22 May 2016 at 04:35:15 UTC, Era Scarecrow wrote:It is promising. Need more!Now I just need to stress test it against some really large input and check for failures. Regardless it has a compression and decompression functions, along with benchmarking it vs the compression algorithm I'm trying to beat.With 1 Million iterations: new_compress: TickDuration(311404604) id_compress TickDuration(385806589) Approx 20% increase in speed (if i'm reading and did this right). I won't take this as the final verdict, but it is promising at least... Need a larger test sample to benchmark both functions against.
May 22 2016
On 5/22/2016 1:50 AM, Walter Bright wrote:It is promising. Need more!I did figure out an improvement to the existing algorithm whereby it could skip forward by the length of the existing match rather than test every character (when it's looking for a longer match). But I have no clue how much of a speedup that would be.
May 22 2016
On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote:On 5/21/2016 11:26 PM, Era Scarecrow wrote:Well, there's other good news. While I was working with about 2Mb for the data size in general, it looks like it can be reduced to about 8k-16k and run entirely on the stack. It doesn't appear to gain any speed, or any speed gained is lost with better memory management. Although, based on how things look, id_compress might perform better with a max window of 255 and max match of 127. I'll give the original one a tweak based on my own findings and see if it turns out true.With 1 Million iterations: new_compress: TickDuration(311404604) id_compress TickDuration(385806589) Approx 20% increase in speed (if I'm reading and did this right).It is promising. Need more!
May 22 2016
On Sunday, 22 May 2016 at 09:07:07 UTC, Era Scarecrow wrote:On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote:Oh noes! Making the tiny change to id_compress and it's faster!!! Now i can't take over the world!!! new_compress: TickDuration(306321939) 21% faster modified idcompress: TickDuration(269275629) 31% fasterOn 5/21/2016 11:26 PM, Era Scarecrow wrote:Although based on how things look, the id_compress might perform better with a max window of 255 and max match of 127. I'll give the original one a tweak based on my own findings and see if it turns out true.With 1 Million iterations: new_compress: TickDuration(311404604) id_compress TickDuration(385806589) Approx 20% increase in speed (if i'm reading and did this right).It is promising. Need more!
May 22 2016
On Sunday, 22 May 2016 at 09:19:54 UTC, Era Scarecrow wrote:On 5/21/2016 11:26 PM, Era Scarecrow wrote:And shrinking the lookback to a mere 127 makes it faster still, but compression starts taking a hit. Can't shrink it any more anyway. modified id_compress2: TickDuration(230528074) 41% faster!!With 1 Million iterations: id_compress: TickDuration(385806589) (original/baseline)modified id_compress: TickDuration(269275629) 31% faster
May 22 2016
On Sunday, 22 May 2016 at 09:43:29 UTC, Era Scarecrow wrote:And shrinking the lookback to a mere 127 increases it faster but compression starts taking a hit. Can't shrink it anymore anyways. modified id_compress2: TickDuration(230528074) 41% faster!!Nice results.
May 22 2016
On Sunday, 22 May 2016 at 09:48:32 UTC, Stefan Koch wrote:Nice results.This is with one sample to test against. I need a much bigger sample, but I'm not sure how to generate/find it to give it a full run for its money. Not to mention there's a huge memory leak while doing the tests since I'm trying not to have malloc/free in the code results as much as possible. Although with a 30%-40% boost it might be good enough, and it has minimal code changes. Question Walter (or Andrei): Would it be terrible to have lower characters as part of the identifier string? I'm referring to characters under 32 (control characters, especially the \r, \n and \t).
May 22 2016
Am Sun, 22 May 2016 09:57:20 +0000 schrieb Era Scarecrow <rtcvb32 yahoo.com>:On Sunday, 22 May 2016 at 09:48:32 UTC, Stefan Koch wrote:That would be my question, too. What is the allowed alphabet? Control characters might break output of debuggers, and spaces would also look confusing. So I guess everything in the range [33..127] (95 symbols)? -- MarcoNice results.This is with one sample to test against. I need a much bigger sample, but I'm not sure how to generate/find it to give it a full run for it's money. Not to mention there's a huge memory leak while doing the tests since I'm trying not to have malloc/free in the code results as much as possible. Although with a 30%-40% boost it might be enough to be good enough, and has minimal code changes. Question Walter (or Andrei): Would it be terrible to have lower characters as part of the identifier string? I'm referring to characters under 32 (control characters, especially the \r, \n and \t).
May 22 2016
On Sunday, 22 May 2016 at 10:12:11 UTC, Marco Leise wrote:Actually if compressed text (which uses symbols 128-255) is okay, then that gives us closer to 223 symbols to work with, not 95. Curiously the extra 95 symbols are actually just enough to keep compression up and bring the speed gain to about 37%, with what looks like no disadvantages. Then again I still need a larger set to test things on, but it's very promising!Question Walter (or Andrei): Would it be terrible to have lower characters as part of the identifier string? I'm referring to characters under 32 (control characters, especially the \r, \n and \t).That would be my question, too. What is the allowed alphabet? Control characters might break output of debuggers, and spaces would also look confusing. So I guess everything in the range [33..127] (95 symbols)?
May 22 2016
On 5/22/2016 3:32 AM, Era Scarecrow wrote:Curiously the extra 95 symbols actually is just enough to keep compression up and bring speed to about 37% speed gain, with it looks like no disadvantages. Then again I still need a larger set to test things on, but it's very promising!You're doing the gods' work, Era!
May 22 2016
On Sunday, 22 May 2016 at 19:21:13 UTC, Walter Bright wrote:On 5/22/2016 3:32 AM, Era Scarecrow wrote:Maybe... I have to wonder if LZ4, from the looks of things, will take over instead. Have to see how it performs (and if we can integrate it). Meanwhile the main advantage of id_compress (as I'm tinkering with it) is that it makes no changes to the signature/behavior, so taking advantage of its boost in speed can be immediate (although it will no doubt have a bit weaker compression, and demangling needs to be re-written for it). Honestly the majority of the speed gain with id_compress comes simply from having a much smaller window (1023 to 220), which also results in a smaller footprint for the compressed output (3 bytes to 2 bytes). In comparison, the speed gain from my new algorithm (at 20% faster) is pretty constant, and it retains a HUGE window (pre-scans 8k, has a 2k window) that it can find matches in, so it will win on compression. If LZ4 is half as good as the initial results suggest, then I'd go with it instead.Curiously the extra 95 symbols are actually just enough to keep compression up and bring the speed gain to about 37%, with what looks like no disadvantages.You're doing the gods' work, Era!
May 22 2016
On Sunday, 22 May 2016 at 19:44:08 UTC, Era Scarecrow wrote:...Well here's the rundown of some numbers. min_compress uses a tiny window, big_compress was my original algorithmn but modified to use 16k total for a window. reduced_id_compress is the original except reduced to a 220 window and 2 byte constant output. Along with the compressed outputs of each. min_compress: [TickDuration(46410084)] 0.779836 big_compress: [TickDuration(47998202)] 0.806545 orig_id_compress: [TickDuration(59519257)] baseline reduced_id_compress: [TickDuration(44033192)] 0.739894 1001 (original size) 72 testexpansion.s!(æεó▌æ╗int).så°Resulà≡Ñ╪¢╨╘ÿ¼É ↑─↑╜►╘fñv ÿ╜ ↑│↑Ä .foo() 73 testexpansion.s!(ÅæεÅó▌Åæ╗int).sÅå°ResulÅà≡ÅÑ╪Å¢╨Å╘ÿżÉ₧├ÿÄ╜É╝▓ÿëåâ.foo() 67 tes╤xpansion.s!(ÇææÇóóÇææint)∙⌡ResulÇàÅÇѺǢ»Ç╘τǼ∩ë├τü╜∩¢▓τ².foo() 78 testexpansion.s!(æ2óCæ2int).så(Resulà0ÑH¢P╘ê¼É╘íñÖ├ê┤ÿσ║ñ¬├ê¼É╘íñÖ├êÉ1å).foo() min_compress: [TickDuration(29210832)] 0.82391 big_compress: [TickDuration(31058664)] 0.87601 orig_id_compress: [TickDuration(35466130)] baseline reduced_id_compress: [TickDuration(25032532)] 0.705977 629 (original size) 52 E.s!(à·è⌡àδint).så°Resulà≡ÖΣÅ▄╝╝ö┤ líd _Ög læ◄.foo() 61 E.s!(Åà·Åè⌡Åàδint).sÅå°ResulÅà≡ÅÖΣÅÅ▄Å╝╝Åö┤₧ç∞ÄÖΣ¡ó╠ïå╗.foo() 52 E.s!(ΣÇèèΣint)∙⌡ResulÇàÅÇÖ¢ÇÅúÇ╝├Çö╦ëçôüÖ¢Æó│².foo() 52 E.s!(à&è+à&int).så(Resulà0Ö<ÅD╝döl ┤í╝ ┴Ö╣ ┤æ9.foo()
May 22 2016
On Monday, 23 May 2016 at 01:46:40 UTC, Era Scarecrow wrote:On Sunday, 22 May 2016 at 19:44:08 UTC, Era Scarecrow wrote:If you want, you can give the test case for issue 16039 a try. It produces 300-400MB binary, so it should be a nice test case :P...Well here's the rundown of some numbers. min_compress uses a tiny window, big_compress was my original algorithmn but modified to use 16k total for a window. reduced_id_compress is the original except reduced to a 220 window and 2 byte constant output. Along with the compressed outputs of each. min_compress: [TickDuration(46410084)] 0.779836 big_compress: [TickDuration(47998202)] 0.806545 orig_id_compress: [TickDuration(59519257)] baseline reduced_id_compress: [TickDuration(44033192)] 0.739894 1001 (original size) 72 testexpansion.s!(æεó▌æ╗int).så°Resulà≡Ñ╪¢╨╘ÿ¼É ↑─↑╜►╘fñv ÿ╜ ↑│↑Ä .foo() 73 testexpansion.s!(ÅæεÅó▌Åæ╗int).sÅå°ResulÅà≡ÅÑ╪Å¢╨Å╘ÿżÉ₧├ÿÄ╜É╝▓ÿëåâ.foo() 67 tes╤xpansion.s!(ÇææÇóóÇææint)∙⌡ResulÇàÅÇѺǢ»Ç╘τǼ∩ë├τü╜∩¢▓τ².foo() 78 testexpansion.s!(æ2óCæ2int).så(Resulà0ÑH¢P╘ê¼É╘íñÖ├ê┤ÿσ║ñ¬├ê¼É╘íñÖ├êÉ1å).foo() min_compress: [TickDuration(29210832)] 0.82391 big_compress: [TickDuration(31058664)] 0.87601 orig_id_compress: [TickDuration(35466130)] baseline reduced_id_compress: [TickDuration(25032532)] 0.705977 629 (original size) 52 E.s!(à·è⌡àδint).så°Resulà≡ÖΣÅ▄╝╝ö┤ líd _Ög læ◄.foo() 61 E.s!(Åà·Åè⌡Åàδint).sÅå°ResulÅà≡ÅÖΣÅÅ▄Å╝╝Åö┤₧ç∞ÄÖΣ¡ó╠ïå╗.foo() 52 E.s!(ΣÇèèΣint)∙⌡ResulÇàÅÇÖ¢ÇÅúÇ╝├Çö╦ëçôüÖ¢Æó│².foo() 52 E.s!(à&è+à&int).så(Resulà0Ö<ÅD╝döl ┤í╝ ┴Ö╣ ┤æ9.foo()
May 23 2016
On 5/22/2016 3:32 AM, Era Scarecrow wrote:[...]My idea for speeding things up is every time a new character c is matched in pattern[], a bit is set in mask:

    ulong mask;
    ...
    mask |= 1 << (c & 63);

Then, when scanning for longer matches, test:

    if (!((1 << (id[i + matchlen - 1] & 63)) & mask))
        i += matchlen;

because we know that id[i + matchlen - 1] cannot match any of the characters in the current match. The two expressions look complex, but they should compile to single instructions on x64.
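Spelled out a little more fully, the idea looks roughly like this (hypothetical helper names, not the actual dmd patch; note the 1UL so the shift is done in 64 bits):

    ulong mask;   // bit i set => some character c of the current match has (c & 63) == i

    void noteMatched(char c)
    {
        mask |= 1UL << (c & 63);
    }

    // True if c occurs nowhere in the current best match. In that case no
    // match of length >= matchlen can cover c's position, so the scan may
    // jump ahead by matchlen instead of advancing one character at a time.
    bool canSkip(char c)
    {
        return ((1UL << (c & 63)) & mask) == 0;
    }

Because (c & 63) folds several characters onto the same bit, the filter can only err on the side of not skipping, never on skipping a real match.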
May 22 2016
On 5/22/2016 2:57 AM, Era Scarecrow wrote:Question Walter (or Andrei): Would it be terrible to have lower characters as part of the identifier string? I'm referring to characters under 32 (control characters, especially the \r, \n and \t).Currently there's a bug in id_compress() where it doesn't take into account UTF-8 sequences, as a subset of them are allowed as identifiers. I rue the day I allowed that in D, but we're kinda stuck with it. I kinda hate disallowing control characters, because then the compressor gets a bit specific.
May 22 2016
On 5/22/2016 2:07 AM, Era Scarecrow wrote:On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote:There's also https://github.com/dlang/dmd/pull/5799 which should make things faster.On 5/21/2016 11:26 PM, Era Scarecrow wrote:Well there's other good news. While I was working with about 2Mb for the data size in general, it can be reduced it looks like to about 8k-16k, and run entirely on the stack. It doesn't appear to gain any speed, or any speed gained is lost with better memory management. Although based on how things look, the id_compress might perform better with a max window of 255 and max match of 127. I'll give the original one a tweak based on my own findings and see if it turns out true.With 1 Million iterations: new_compress: TickDuration(311404604) id_compress TickDuration(385806589) Approx 20% increase in speed (if i'm reading and did this right).It is promising. Need more!
May 22 2016
On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote:Just a note: large lookup tables can have cold cache performance problems.16k lookup table are the gold standard.
May 22 2016
On 5/22/2016 5:50 AM, deadalnix wrote:On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote:I like 64 bit vectors because the lookup and test can be done in one instruction, and the vector cached in a register!Just a note: large lookup tables can have cold cache performance problems.16k lookup table are the gold standard.
May 22 2016
On Saturday, 21 May 2016 at 21:27:37 UTC, Era Scarecrow wrote:I assume this is related to compressing symbols thread? I mentioned possibly considering the LZO library.Maybe consider lz4 instead? Tends to be a bit faster, and it's BSD instead of GPL. https://cyan4973.github.io/lz4/ -Wyatt
May 23 2016
On Monday, 23 May 2016 at 14:47:31 UTC, Wyatt wrote:Maybe consider lz4 instead?Disregard that: I see it's come up already.
May 23 2016
On 21.05.2016 23:12, Walter Bright wrote:The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?
May 21 2016
On 21.05.2016 23:37, Timon Gehr wrote:On 21.05.2016 23:12, Walter Bright wrote:(Those are worst case bounds, not sure what's better on average on D symbol strings.)The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?
May 21 2016
On 5/21/2016 2:37 PM, Timon Gehr wrote:Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?I don't understand the terms you use, but as to the "why" it is based on what I knew about LZ77 compression. I don't pretend to be an expert on compression, and this is what I churned out. As mentioned elsewhere, the C++ mangling scheme has a primitive and ineffective LZ77 scheme built in, so I wouldn't waste time on that. A quick google search on finding the longest match in a string did not turn up anything obvious. If you want to throw your hat (i.e. expertise) into the ring and post a faster compressor, please do so!
May 21 2016
On Saturday, 21 May 2016 at 22:07:27 UTC, Walter Bright wrote:As mentioned elsewhere, the C++ mangling scheme has a primitive and ineffective LZ77 scheme built in, so I wouldn't waste time on that.Sorry if I didn't memorize everything in this forum from the last 20 years, can you give a link to some reasoning? The only thing I found are some statements in https://github.com/dlang/dmd/pull/5793:Note that the VC++ and g++ name mangling schemes for C++ each use a similar, but primitive (and fairly ineffective) form of compression. It's like people who invent their own crypto algorithms.The name mangling in the Itanium C++ ABI is similar to LZ77 but heavily optimized for the typical use case in symbol names. I don't really see how the general LZ77 would be better. It's almost as if using information about the actual data leads to a better compression scheme.This is what C++ does, and results are pretty poor compression. Scanning to determine the backward references would consume the same order of time, anyway.It uses a constant amount of time if implemented correctly, which is much faster.
May 21 2016
On Saturday, 21 May 2016 at 22:41:49 UTC, Guillaume Boucher wrote:It's almost as if using information about the actual data leads to a better compression scheme.Exactly. It always is. Since you know where to look for redundancy instead of blindly comparing bytes.
May 21 2016
On 5/21/2016 3:41 PM, Guillaume Boucher wrote:Sorry if I didn't memorize everything in this forum from the last 20 years, can you give a link to some reasoning?DMC++ matches the Microsoft name mangling scheme, which includes such compression. It proved hopelessly inadequate, which is why I implemented compress.c in the first place (it was for the DMC++ compiler). (And frankly, I don't see how an ad-hoc scheme like that could hope to compare with a real compression algorithm.)
May 21 2016
On 22.05.2016 00:58, Walter Bright wrote:On 5/21/2016 3:41 PM, Guillaume Boucher wrote:You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields ?foo Result ?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z testexpansion YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z testexpansion YA U1?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z 2 YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z Z testexpansion YA U1?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z testexpansion YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z 2 YA U1?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z 2 YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z Z Z QAEXXZ i.e. 936 characters. I think this is due to the very bad limitation of just back referencing the first 10 types. The gcc mangling allows arbitrary back references and is closer to what some have proposed. It yields _ZZN13testexpansion1sIZNS0_IZNS0_IZNS0_IZNS0_IiEEDaT_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_EN6Result3fooEv which is only 39 characters longer than the compressed version of the D symbol. It uses a much smaller character set, though. This is my translation of the test case to C++14: namespace testexpansion { template<class T> struct S { void foo(){} }; template<class T> auto testexpansion_s(T t) { #ifdef bad struct Result { void foo(){} }; return Result(); #else return S<T>(); #endif } void xmain() { auto x = s(s(s(s(s(1))))); x.foo(); } }Sorry if I didn't memorize everything in this forum from the last 20 years, can you give a link to some reasoning?DMC++ matches the Microsoft name mangling scheme, which includes such compression. It proved hopelessly inadequate, which is why I implemented compress.c in the first place (it was for the DMC++ compiler). (And frankly, I don't see how an ad-hoc scheme like that could hope to compare with a real compression algorithm.)
May 22 2016
On Sunday, 22 May 2016 at 17:44:22 UTC, Rainer Schuetze wrote:You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields ?foo Result ?1???$s UResult ?1???$s UResult ?1???$s <snip> i.e. 936 characters. I think this is due to the very bad limitation of just back referencing the first 10 types.Unless ? is the proper/valid character those are far more likely to be failed UTF-8 decoding. Care to double check?
May 22 2016
On 22.05.2016 19:56, Era Scarecrow wrote:On Sunday, 22 May 2016 at 17:44:22 UTC, Rainer Schuetze wrote:You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields ?foo Result ?1???$s UResult ?1???$s UResult ?1???$s <snip> i.e. 936 characters. I think this is due to the very bad limitation of just back referencing the first 10 types.Unless ? is the proper/valid character those are far more likely to be failed UTF-8 decoding. Care to double check?? is an allowed character in VC++ and for example permits unambiguous demangling. The gcc style using only alpha numeric characters and '_' can also just be plain C symbols.
May 22 2016
On Sunday, 22 May 2016 at 18:04:25 UTC, Rainer Schuetze wrote:On 22.05.2016 19:56, Era Scarecrow wrote:Not quite what I asked. Is the sample provided accurate? Is having 3 ?s in a row what should be there not just spitting a ? because it doesn't know how to decode it?Unless ? is the proper/valid character those are far more likely to be failed UTF-8 decoding. Care to double check?? is an allowed character in VC++ and for example permits unambiguous demangling. The gcc style using only alpha numeric characters and '_' can also just be plain C symbols.
May 22 2016
On 22.05.2016 20:12, Era Scarecrow wrote:On Sunday, 22 May 2016 at 18:04:25 UTC, Rainer Schuetze wrote:The 3 consecutive ?s are also in the object file, so they are actually part of the mangled symbol. VC++ uses only ASCII characters for mangling (which D should do, too, IMO).On 22.05.2016 19:56, Era Scarecrow wrote:Not quite what I asked. Is the sample provided accurate? Is having 3 ?s in a row what should be there not just spitting a ? because it doesn't know how to decode it?Unless ? is the proper/valid character those are far more likely to be failed UTF-8 decoding. Care to double check?? is an allowed character in VC++ and for example permits unambiguous demangling. The gcc style using only alpha numeric characters and '_' can also just be plain C symbols.
May 22 2016
Am Sun, 22 May 2016 20:22:06 +0200 schrieb Rainer Schuetze <r.sagitario gmx.de>:The 3 consecutive ?s are also in the object file, so they are actually part of the mangled symbol. VC++ uses only ASCII characters for mangling (which D should do, too, IMO).In general I agree. The extended characters may be swallowed and the representation not copyable. In the worst case we may uncover bugs in debugging software. On the other hand, base-64 adds ~17% to the symbol length compared to base-128 and the linker in particular should only be interested in a terminating \0. -- Marco
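For reference, the ~17% follows directly from the bits carried per character: a 64-symbol alphabet carries 6 bits per character and a 128-symbol one carries 7, so the same payload needs log2(128)/log2(64) = 7/6 ≈ 1.17 times as many characters, i.e. roughly 17% more.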
May 22 2016
On 5/22/2016 10:44 AM, Rainer Schuetze wrote:You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields ?foo Result ?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z testexpansion YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z testexpansion YA U1?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z 2 YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z Z testexpansion YA U1?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z testexpansion YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z 2 YA U1?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z 2 YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z Z Z QAEXXZHaha, thus showing why people should never invent their own ad-hoc crypto nor compression algorithms :-)_ZZN13testexpansion1sIZNS0_IZNS0_IZNS0_IZNS0_IiEEDaT_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_EN6Result3fooEvNote "Result" appearing 5 times. It's still crappy compression. Ironically, this design predates the much worse VC++ one.which is only 39 characters longer than the compressed version of the D symbol. It uses a much smaller character set, though.The much smaller character set can be done, if desired, by using base64 encoding. The compressor is careful to not emit 0 bytes, as those definitely would screw up bintools, but the rest seem unperturbed by binary strings.
May 22 2016
On 22.05.2016 00:07, Walter Bright wrote:On 5/21/2016 2:37 PM, Timon Gehr wrote:E.g. the Knuth-Morris-Pratt (KMP) string search algorithm can be modified to compute longest match instead of full match (as it just efficiently builds and runs a finite state machine whose state is the length of the longest match ending at the current position). https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm Implementation: auto longestMatch(string x,string y){ if(y.length>x.length) y=y[0..x.length]; auto f=new size_t[y.length+1]; for(size_t i=0,j=f[0]=-1;i<y.length;f[++i]=++j) while(~j&&y[i]!=y[j]) j=f[j]; auto r=x[0..0]; for(size_t i=0,j=0;i<x.length&&j<y.length;++i,++j){ while(~j&&x[i]!=y[j]) j=f[j]; if(j+1>r.length) r=x[i-j..i+1]; } return r; } This returns a slice of x representing the leftmost longest match with y. Running time is O(x.length). (In practice, if this should be really fast, the allocation for 'f' should probably be shared among multiple calls.) (This probably only improves running time in case there are sufficiently many sufficiently long matches.) But this just improves longest_match. It should be possible to bring down the running time of the entire compression algorithm significantly using a suffix tree (the resulting running time bound is linear in the input size and independent of the window size).Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?I don't understand the terms you use, but as to the "why" it is based on what I knew about LZ77 compression. I don't pretend to be an expert on compression, and this is what I churned out. As mentioned elsewhere, the C++ mangling scheme has a primitive and ineffective LZ77 scheme built in, so I wouldn't waste time on that. A quick google search on finding the longest match in a string did not turn up anything obvious. ...If you want to throw your hat (i.e. expertise) into the ring and post a faster compressor, please do so!As far as I understand, the use case for this is compressing mangled names? I don't think that generating huge mangled names explicitly just to compress them afterwards is a viable strategy. This process throws away a lot of structure information during mangling that could be useful for fast and effective compression. Instead, compression should be performed while generating the mangled string. As far as I understand, mangled names only grow unmanageably large because the same symbols are mangled into them multiple times? Isn't it enough to create references to previously embedded mangled symbols (i-th symbol already mangled) while generating the mangled string?
May 23 2016
On 5/23/2016 12:32 PM, Timon Gehr wrote:Instead, compression should be performed while generating the mangled string.The trouble with that is the mangled string is formed from component pieces. Those component pieces may have common substrings with each other, which won't be detected until they are merged, when you're back to dealing with the string as a whole.Isn't it enough to create references to previously embedded mangled symbols (i-th symbol already mangled) while generating the mangled string?That's been covered upthread, and VC++/g++ do that. But as shown, it is inadequate. There's a lot more redundancy than just the identifiers.
May 23 2016
On 23.05.2016 22:34, Walter Bright wrote:On 5/23/2016 12:32 PM, Timon Gehr wrote:Then don't do that. I.e. re-mangle recursively from scratch for each mangled name and allow sharing parts between unrelated components within that mangled name. (That's similar to how the compiler stores the structure in memory, which also avoids the exponential blowup.)Instead, compression should be performed while generating the mangled string.The trouble with that is the mangled string is formed from component pieces.Those component pieces may have common substrings with each other, which won't be detected until they are merged, when you're back to dealing with the string as a whole.Yes, but I think that should be covered (up to constants) by what I suggested above? You can't have exponential growth before compression. Any scheme that does not prevent running time exponential in template instantiation depth is also inadequate in the long run.Isn't it enough to create references to previously embedded mangled symbols (i-th symbol already mangled) while generating the mangled string?That's been covered upthread, and VC++/g++ do that. But as shown, it is inadequate. There's a lot more redundancy than just the identifiers.
May 23 2016
On 5/23/2016 2:17 PM, Timon Gehr wrote:Then don't do that. I.e. re-mangle recursively from scratch for each mangled name and allow sharing parts between unrelated components within that mangled name.How is that essentially different from running a compression pass? The only real problem with the compressor was the compression speed, and the LZ4 test shows this is solvable.(That's similar to how the compiler stores the structure in memory, which also avoids the exponential blowup.)No, it doesn't.You can't have exponential growth before compression.The compiler is designed so that the internal names are pretty much write-only, uniquely hashed by their address. It's not spending time running strlen()'s on them.Any scheme that does not prevent running time exponential in template instantiation depth is also inadequate in the long run.Most algorithms seem to have pathological worst cases, but as long as they are rare, it doesn't impede their utility.
May 23 2016
Just a heads up on the LZ4. I have spent roughly 3 hours optimizing my decompressor, and while I had stunning success, a speed-up of about 400%, I am still about 600x slower than the C variant. It is still a mystery to me why that is :) since the generated code is both smaller and works almost without spills.
May 23 2016
On 5/23/2016 4:49 PM, Stefan Koch wrote:Just a heads up on the LZ4. I have spent roughly 3 hours optimizing my decompressor, and while I had stunning success, a speed-up of about 400%, I am still about 600x slower than the C variant. It is still a mystery to me why that is :) since the generated code is both smaller and works almost without spills.I'm not sure why you're working on a decompressor. The speed needs to be in the compressor.
May 23 2016
On 24.05.2016 01:02, Walter Bright wrote:On 5/23/2016 2:17 PM, Timon Gehr wrote:Speed. The title of this thread is "Need a faster compressor".Then don't do that. I.e. re-mangle recursively from scratch for each mangled name and allow sharing parts between unrelated components within that mangled name.How is that essentially different from running a compression pass?The only real problem with the compressor was the compression speed, and the LZ4 test shows this is solvable. ...No. Just increase the recursion depth by a small number of levels to completely kill the speedups gained by using a faster compression algorithm.> (That's similar to how the compiler stores the structure in memory, whichYes, it does. The compiler does not use exponential space to store the AST.also avoids the exponential blowup.)No, it doesn't. ...It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings.You can't have exponential growth before compression.The compiler is designed so that the internal names are pretty much write-only, uniquely hashed by their address. It's not spending time running strlen()'s on them. ...The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first place if mangled names don't grow large in practice?Any scheme that does not prevent running time exponential in template instantiation depth is also inadequate in the long run.Most algorithms seem to have pathological worst cases, but as long as they are rare, it doesn't impede their utility.
May 24 2016
On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote:On 24.05.2016 01:02, Walter Bright wrote:[...]Speed. The title of this thread is "Need a faster compressor".[...]No. Just increase the recursion depth by a small number of levels to completely kill the speedups gained by using a faster compression algorithm.[...]Yes, it does. The compiler does not use exponential space to store the AST.[...]It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings.[...]The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first place if mangled names don't grow large in practice?I completely agree! However such a thing will only work if dmd is used as a pre-linker and if we can stabilize the hash, i.e. every run of the compiler generates the same hash.
May 24 2016
On Tuesday, 24 May 2016 at 18:30:36 UTC, Stefan Koch wrote:On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote:KAMOULOX!On 24.05.2016 01:02, Walter Bright wrote:I completely agree! However such a thing will only work if dmd is used as a pre-linker and If we can stablize the hash. I.E. every run of the compiler generates the same hash.[...]Speed. The title of this thread is "Need a faster compressor".[...]No. Just increase the recursion depth by a small number of levels to completely kill the speedups gained by using a faster compression algorithm.[...]Yes, it does. The compiler does not use exponential space to store the AST.[...]It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings.[...]The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first place if mangled names don't grow large in practice?
May 24 2016
On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote:It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings. The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first place if mangled names don't grow large in practice?Since Walter is evidently still having a hard time understanding this, I've done a few more pathological cases, comparing LZ4 to Back Referencing Named Types. For BRNT I manually replaced struct names in the mangling with N<number> numeric identifiers for all but one of their appearances, to simulate what could actually be done by the compiler. No other mangling changes (e.g., for eponymous templates or hidden types) were applied. auto foo(T)(T x) { static struct Vold { T payload; } return Vold(x); } At a chain of length 10: foo(5).foo.foo.foo.foo.foo.foo.foo.foo.foo current: Generate a 57_581 byte symbol lz4 -9: Generate a 57_581 byte symbol, then compress it to 412 bytes lzma -9: Generate a 57_581 byte symbol, then compress it to 239 bytes BRNT: Generate a 332 byte symbol Upping it to twelve levels: foo(5).foo.foo.foo.foo.foo.foo.foo.foo.foo.foo.foo current: Generate a 230_489 byte symbol lz4 -9: Generate a 230_489 byte symbol, then compress it to 1128 bytes lzma -9: Generate a 230_489 byte symbol, then compress it to 294 bytes BRNT: Generate a 393 byte symbol Upping it to fourteen levels: I'm too lazy to do more manual BRNT, so beyond this point its numbers are estimated based on the previously established fact that BRNT symbols grow linearly. current: Generate a 922_127 byte symbol lz4 -9: Generate a 922_127 byte symbol, then compress it to 4012 bytes lzma -9: Generate a 922_127 byte symbol, compress it to 422 bytes BRNT: Generate a ~457 byte symbol Upping it to sixteen levels: current: Generate a 3_688_679 byte symbol lz4 -9: Generate a 3_688_679 byte symbol, then compress it to 15535 bytes lzma -9: Generate a 3_688_679 byte symbol, then compress it to 840 bytes BRNT: Generate a ~527 byte symbol I want to let that sink in: in the general case, BRNT beats even **LZMA**. As if winning the compactness war while still being a symbol linkers won't have problems with wasn't enough, this approach also avoids generating giant symbols in the first place. Compression and/or hashing cannot do that. If D really wants to, it can compress symbols, but it *needs* to fix this problem properly *first*.
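For illustration only (this is a made-up sketch, not an actual mangler; the N<index> notation is just the one used above): the core of the scheme is that an identifier is spelled out in full only the first time, and every later occurrence is a short back-reference, which is why the symbol grows with the number of distinct names instead of the number of occurrences.

    string[] seen;   // identifiers in the order they were first emitted

    string emit(string ident)
    {
        import std.conv : to;
        foreach (i, s; seen)
            if (s == ident)
                return "N" ~ i.to!string;            // back-reference, e.g. "N3"
        seen ~= ident;
        return ident.length.to!string ~ ident;       // first occurrence: "6Result"
    }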
May 24 2016
On 5/24/2016 12:16 PM, Anon wrote:As if winning the compactness war while still being a symbol linkers won't have problems with wasn't enough, this approach also avoids generating giant symbols in the first place. Compression and/or hashing cannot do that. If D really wants to, it can compress symbols, but it *needs* to fix this problem properly *first*.Because identifiers are generated from substrings that don't know about each other, to do your algorithm a scan of the combined string will still be necessary. In any case, you should be able to implement your algorithm as a replacement for id_compress(). How about giving it a go?
May 24 2016
Am Tue, 24 May 2016 19:16:07 +0000 schrieb Anon <anon anon.anon>:Upping it to sixteen levels: current: Generate a 3_688_679 byte symbol lz4 -9: Generate a 3_688_679 byte symbol, then compress it to 15535 bytes lzma -9: Generate a 3_688_679 byte symbol, then compress it to 840 bytes BRNT: Generate a ~527 byte symbol I want to let that sink in: in the general case, BRNT beats even **LZMA**.Let's go this route. It seems to be a scalable solution without any big memory operations. -- Marco
May 25 2016
On 5/24/2016 9:22 AM, Timon Gehr wrote:Yes, it does. The compiler does not use exponential space to store the AST.BTW, all types in dmd have a 'deco' string which is the AST linearized as a string. This string is also used to build the mangled names. All the deco's are put into a hashtable, and then all types can be uniquely identified by their address in the hashtable. This reduces type comparisons to a single pointer comparison.
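A toy sketch of that interning scheme (made-up names, not the dmd code): every distinct deco string maps to exactly one Type instance, so an equality check reduces to an identity comparison.

    class Type
    {
        string deco;
        this(string d) { deco = d; }
    }

    Type[string] decoTable;

    Type typeOf(string deco)
    {
        if (auto p = deco in decoTable)
            return *p;               // seen before: reuse the unique instance
        auto t = new Type(deco);
        decoTable[deco] = t;
        return t;
    }

    // typeOf("i") is typeOf("i")  -->  true; comparing two types is just `t1 is t2`.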
May 24 2016
On Tuesday, 24 May 2016 at 20:16:32 UTC, Walter Bright wrote:On 5/24/2016 9:22 AM, Timon Gehr wrote:There's no reason not to use the compression in the deco name. Just make sure the references are relative and you're set. No exponential space to store the AST. Please, if you change the ABI, do it right the first time. Some arbitrary compression algorithm doesn't solve the problem.Yes, it does. The compiler does not use exponential space to store the AST.BTW, all types in dmd have a 'deco' string which is the AST linearized as a string. This string is also used to build the mangled names. All the deco's are put into a hashtable, and then all types can be uniquely identified by their address in the hashtable. This reduces type comparisons to a single pointer comparison.
May 24 2016
On 5/24/2016 1:27 PM, Guillaume Boucher wrote:There's no reason not to use the compression in the deco name. Just make sure the references are relative and you're set.Not totally, since you'd like to refer to types that are common across deco names.
May 24 2016
On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote:On 5/24/2016 1:27 PM, Guillaume Boucher wrote:On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote: A serious question, do these templates even need a mangle ? Is anyone ever going to call them from a different module ? I don't think you could do that because those are voldemort types. Those symbols are only temporary aren't they ?There's no reason not to use the compression in the deco name. Just make sure the references are relative and you're set.Not totally, since you'd like to refer to types that are common across deco names.
May 24 2016
On 5/24/2016 2:06 PM, Stefan Koch wrote:On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote:The types do exist in the compiler, and are used in template name generation, generating types, etc.On 5/24/2016 1:27 PM, Guillaume Boucher wrote:On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote: A serious question, do these templates even need a mangle ? Is anyone ever going to call them from a different module ? I don't think you could do that because those are voldemort types. Those symbols are only temporary aren't they ?There's no reason not to use the compression in the deco name. Just make sure the references are relative and you're set.Not totally, since you'd like to refer to types that are common across deco names.
May 24 2016
On Tuesday, 24 May 2016 at 21:26:58 UTC, Walter Bright wrote:On 5/24/2016 2:06 PM, Stefan Koch wrote:But would one ever have to be able to reconstruct the mangle ? Templates cannot be exported, nor can instantiations. Or can they ?A serious question, do these templates even need a mangle ? Is anyone ever going to call them from a different module ? I don't think you could do that because those are voldemort types. Those symbols are only temporary aren't they ?The types do exist in the compiler, and are used in template name generation, generating types, etc.
May 24 2016
On 5/24/2016 2:34 PM, Stefan Koch wrote:But would one ever have to be able to reconstruct the mangle ?Symbolic debugging, and dealing with various linker errors.Templates cannot be exported, nor can instantiations. Or can they ?Of course they can!
May 24 2016
On Monday, 23 May 2016 at 20:34:19 UTC, Walter Bright wrote:On 5/23/2016 12:32 PM, Timon Gehr wrote:Yes, not all redundancy is covered. However, I think all non-pathological blow-ups are avoided. Let's look at the properties of the Itanium C++ ABI mangling:

1. Guaranteed linear size name for recursive types that have exponential size names.
2. Compression speed can be linear to the size of the compressed name (i.e. no need to create an exponential garbage name just to compress it).
3. Decompression easy, possible without tools and speed linear to size of the compressed name.
4. Name retains some redundancy (in itself and when compared to other mangled names), well suited for a follow-up compression.

Compare this to applying some arbitrary compression algorithm to a name of exponential size:

1. Exponential size names remain exponential size. Yes, up until the view size, some compression algorithms guarantee that, but they cannot guarantee it in the general case.
2. Compression needs the whole name first.
3. Decompression only possible with tools.
4. Multiple mangled names cannot be compressed well.

Isn't it enough to create references to previously embedded mangled symbols (i-th symbol already mangled) while generating the mangled string?That's been covered upthread, and VC++/g++ do that. But as shown, it is inadequate. There's a lot more redundancy than just the identifiers.
May 24 2016
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?Is LZ4 acceptable ? If so the current C implementation is hard to beat. And maybe it's best to just copy it in.
May 21 2016
On 22/5/2016 05:51, Stefan Koch wrote:On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:Agree. I've had the best results with LZ4. We can either link to the C code, or port it to D. It's very trivial. L.The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?Is LZ4 acceptable ? If so the current C implementation is hard to beat. And maybe it's best to just copy it in.
May 21 2016
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?Could you post a link to code you use for benchmarking ?
May 21 2016
On 5/21/2016 3:13 PM, Stefan Koch wrote:Could you post a link to code you use for benchmarking ?In the comments: https://github.com/dlang/dmd/pull/5793
May 21 2016
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?A quick search led to this SO thread: https://stackoverflow.com/questions/1138345/an-efficient-compression-algorithm-for-short-text-strings It links to SMAZ: https://github.com/antirez/smaz It uses a codebook primed for english text. Since this is for name mangling, priming for Phobos might be a good idea. E.g. put "std.", "core.", "etc." into the codebook.
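As a toy illustration of such a primed codebook (entries and byte values are made up, and a real mangler would have to pick bytes that are safe in object files): each codebook entry is replaced by a single high byte before, or instead of, any general-purpose compression.

    static immutable string[] codebook = ["std.", "core.", "etc.", "unittest"];

    string primeCompress(string id)
    {
        string result;
        outer: for (size_t i = 0; i < id.length; )
        {
            foreach (code, entry; codebook)
                if (id.length - i >= entry.length && id[i .. i + entry.length] == entry)
                {
                    result ~= cast(char)(0x80 + code);   // one byte stands for the whole entry
                    i += entry.length;
                    continue outer;
                }
            result ~= id[i++];
        }
        return result;
    }

Decompression just maps bytes >= 0x80 back to their codebook entries.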
May 22 2016
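For a concrete picture of the codebook approach: each codebook entry maps to a single output byte, and anything that matches no entry is escaped as a literal. A minimal sketch in D follows; the entries ("std.", "core.", "unittest", ...) are only guesses at common substrings in D symbols, and the literal escape (0xFF followed by the raw byte) is invented for this sketch, not SMAZ's actual encoding.

// Minimal codebook substitution sketch. The codebook contents and the
// literal escape (0xFF, byte) are invented for illustration.
immutable string[] codebook = ["std.", "core.", "unittest", "__T", "Result"];

ubyte[] codebookCompress(string s)
{
    ubyte[] output;
    size_t i = 0;
    while (i < s.length)
    {
        bool matched = false;
        foreach (idx, entry; codebook)
        {
            if (i + entry.length <= s.length && s[i .. i + entry.length] == entry)
            {
                output ~= cast(ubyte) idx;   // one byte stands in for the whole entry
                i += entry.length;
                matched = true;
                break;
            }
        }
        if (!matched)
        {
            // no codebook hit: escape a single literal byte (a real
            // implementation would batch runs of literals instead)
            output ~= 0xFF;
            output ~= cast(ubyte) s[i];
            ++i;
        }
    }
    return output;
}

Decompression is just the reverse walk, expanding code bytes back into their codebook entries.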
On Sunday, 22 May 2016 at 10:40:46 UTC, qznc wrote:On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: It links to SMAZ: https://github.com/antirez/smaz It uses a codebook primed for english text. Since this is for name mangling, priming for Phobos might be a good idea. E.g. put "std.", "core.", "etc." into the codebook.Curiously, pre-making a few primer dictionaries like that per language can give similar performance with zlib and gzip. I recall doing that back in 2009 when I had XML pages to deal with: it would try compressing against all my sample pages, pick the best primer, and use that. Decompression was then simply a matter of finding the primer with the matching CRC32, and away it went! Priming something similar for the current id_compress wouldn't hurt either. 'unittest' I see quite a bit, and entries for common templates would probably give minor compression benefits too.
May 22 2016
On Sun, 22 May 2016 10:55:40 +0000, Era Scarecrow <rtcvb32 yahoo.com> wrote:On Sunday, 22 May 2016 at 10:40:46 UTC, qznc wrote:LZ4 can seemingly use a dictionary, too. -- MarcoOn Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: It links to SMAZ: https://github.com/antirez/smaz It uses a codebook primed for english text. Since this is for name mangling, priming for Phobos might be a good idea. E.g. put "std.", "core.", "etc." into the codebook.Curiously, pre-making a few primer dictionaries like that per language can give similar performance with zlib and gzip.
May 22 2016
On 5/22/16 6:40 AM, qznc wrote:On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:A primed dictionary is a great idea. -- AndreiThe current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?A quick search led to this SO thread: https://stackoverflow.com/questions/1138345/an-efficient-compression-algorithm-for-short-text-strings It links to SMAZ: https://github.com/antirez/smaz It uses a codebook primed for english text. Since this is for name mangling, priming for Phobos might be a good idea. E.g. put "std.", "core.", "etc." into the codebook.
May 22 2016
On Sat, 21 May 2016 14:12:15 -0700, Walter Bright <newshound2 digitalmars.com> wrote:The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?I can write something 2 times more compact and 16 times faster (in microbenchmarks) for the 10 times nested "testexpansion" test case from https://issues.dlang.org/show_bug.cgi?id=15831#c0 using BSD-licensed lz4 compression. Does it have to be C++ code or can it be D code? -- Marco
May 22 2016
⬅ Download proof of concept This is a proof of concept micro-benchmark compressing the 37_640 bytes symbol from the bug report linked above with both id_compress from the dmd backend and lz4 (without dictionary). The lz4 result is additionally transformed to 7-bits similar to base-64. Original size: 37640 bytes id_compress : 5243 ms for 10_000 iterations, 800 bytes lz4 : 71 ms for 10_000 iterations, 389 bytes That roughly corresponds to a 2x higher compression ratio at 70x faster compression speed. -- Marco
May 22 2016
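The "transformed to 7-bits" step is needed because raw LZ4 output is arbitrary bytes, while a mangled name has to survive inside an object-file symbol. One simple way to do such a transform is base-128 packing, sketched below in D; this particular packing is an assumption for illustration and not necessarily what the posted proof of concept does.

// Pack arbitrary bytes into bytes that only use the low 7 bits
// ("base-128"); every 7 input bytes become 8 output bytes.
ubyte[] to7Bit(const(ubyte)[] data)
{
    ubyte[] output;
    uint acc = 0;   // bit accumulator
    int bits = 0;   // number of valid bits currently in acc
    foreach (b; data)
    {
        acc = (acc << 8) | b;
        bits += 8;
        while (bits >= 7)
        {
            bits -= 7;
            output ~= cast(ubyte)((acc >> bits) & 0x7F);
        }
    }
    if (bits > 0)   // flush the remaining bits, left-aligned in a final byte
        output ~= cast(ubyte)((acc << (7 - bits)) & 0x7F);
    return output;
}

Decoding is the mirror image, and the cost is an 8/7 expansion, roughly 14% on top of the compressed size.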
On Sunday, 22 May 2016 at 14:05:23 UTC, Marco Leise wrote:⬅ Download proof of concept This is a proof of concept micro-benchmark compressing the 37_640 bytes symbol from the bug report linked above with both id_compress from the dmd backend and lz4 (without dictionary). The lz4 result is additionally transformed to 7-bits similar to base-64. Original size: 37640 bytes id_compress : 5243 ms for 10_000 iterations, 800 bytes lz4 : 71 ms for 10_000 iterations, 389 bytes That roughly corresponds to a 2x higher compression ratio at 70x faster compression speed.Please submit a PR.
May 22 2016
On Sun, 22 May 2016 14:06:52 +0000, Stefan Koch <uplink.coder googlemail.com> wrote:On Sunday, 22 May 2016 at 14:05:23 UTC, Marco Leise wrote:There are a few open questions, for me at least. Are there other proposals with good properties? Are there any objections against the benchmarking method or the license? Can the implementation be in D? If not, should we link against the system liblz4.so/a or copy the C code into the backend? Is a pre-filled dictionary worthwhile? If so, which words should go in it? The latter is important because changes to the Dlang mangling need to be reflected in other projects outside our direct control, too. -- Marco⬅ Download proof of concept This is a proof of concept micro-benchmark compressing the 37_640 bytes symbol from the bug report linked above with both id_compress from the dmd backend and lz4 (without dictionary). The lz4 result is additionally transformed to 7-bits similar to base-64. Original size: 37640 bytes id_compress : 5243 ms for 10_000 iterations, 800 bytes lz4 : 71 ms for 10_000 iterations, 389 bytes That roughly corresponds to a 2x higher compression ratio at 70x faster compression speed. Please submit a PR.
May 22 2016
On Sunday, 22 May 2016 at 16:06:07 UTC, Marco Leise wrote:There are a few open questions, for me at least. Are there other proposals with good properties? Are there any objections against the benchmarking method or the license? Can the implementation be in D? If not, should we link against the system liblz4.so/a or copy the C code into the backend? Is a pre-filled dictionary worthwhile? If so, which words should go in it? The latter is important because changes to the Dlang mangling need to be reflected in other projects outside our direct control, too.Here are my answers; please take them with a grain of ignorance. I do not think prefilling the dictionary will do much good, since LZ is all about finding prefixes anyway. I am for copying the source code into our source tree; we don't want to be dependent on the LZ4 library being available on the user's system. I think re-implementing it in D is also fine as long as the function is extern(C). However, I agree with Martin's point: it is a waste of resources to re-implement something that already has a perfectly fine implementation. The only reason I re-implemented the LZ4 decompression was to use it at CTFE.
May 22 2016
On Sun, 22 May 2016 17:19:38 +0000, Stefan Koch <uplink.coder googlemail.com> wrote:Here are my answers; please take them with a grain of ignorance. […]Maybe you are right about the dictionary. I haven't put much thought into it. Anyway, I was looking for input from the core devs who are affected by this.However, I agree with Martin's point: it is a waste of resources to re-implement something that already has a perfectly fine implementation.I already loosely copied and fixed up the compressor in the attached proof of concept. Thanks to the two languages' similarities it was a matter of an hour or so. If you provide the decompressor, we're set. :) -- Marco
May 22 2016
On Sunday, 22 May 2016 at 18:18:46 UTC, Marco Leise wrote:If you provide the decompressor, we're set. :)I am already working on a non-CTFE-able version that does not use slices, and therefore should be faster. But decompression is only needed for demangling.
May 22 2016
On Sunday, 22 May 2016 at 16:06:07 UTC, Marco Leise wrote:There are a few open questions, for me at least. Are there other proposals with good properties? Are there any objections against the benchmarking method or the license? Can the implementation be in D? If not, should we link against the system liblz4.so/a or copy the C code into the backend? Is a pre-filled dictionary worthwhile? If so, which words should go in it? The latter is important because changes to the Dlang mangling need to be reflected in other projects outside our direct control, too.How it's being benchmarked does make me question it a bit. I avoided multi-threading, so my results are as they would run on 1 CPU/core, while adding multi-threading probably isn't too hard to do (at least in D). Then there's the question of how many cores, and other details: a compressor designed to run in parallel will naturally be in its own element there. This of course assumes we are allowed to parallelize in the first place; the compiler may already be using those cores for other tasks (scanning other files, optimizing code, CTFE, etc.), at which point speed gained in compression may not actually translate to an overall speedup if you are stealing CPU cycles. As for a primed dictionary, I'm still trying to figure out what would be in it myself. Commonly used templates and keywords that appear in symbols are so far my best guess. I'd avoid library names unless they'd appear in the symbols (so 'std.core' might not mean much, while 'unittestL' does). Then again, I don't have a good sample of what I'd actually encounter live; maybe every major library import would be a good thing to have.
May 22 2016
On 5/22/2016 9:06 AM, Marco Leise wrote:Are there any objections against the benchmarking method or the license?BSD may cause problems.Can the implementation be in D?Not yet.If not, should we link against the system liblz4.so/aNo.or copy the C code into the backend?Yes.Is a pre-filled dictionary worthwhile?The trouble with that is the demangler would need the same dictionary, right? If so, that would be problematic.
May 22 2016
On Sun, 22 May 2016 13:02:37 -0700, Walter Bright <newshound2 digitalmars.com> wrote:On 5/22/2016 9:06 AM, Marco Leise wrote:Can you elaborate on how copying the source overrules your above license concerns? I would create a directory "lz4" for the sources, compile them like we compile zlib now, and install dmd with a backend license stating that it uses lz4 licensed under BSD?Are there any objections against the benchmarking method or the license?BSD may cause problems.Can the implementation be in D?Not yet.If not, should we link against the system liblz4.so/aNo.or copy the C code into the backend?Yes.Right, it's probably more trouble than it's worth. -- MarcoIs a pre-filled dictionary worthwhile?The trouble with that is the demangler would need the same dictionary, right? If so, that would be problematic.
May 22 2016
On 5/22/2016 10:52 PM, Marco Leise wrote:Can you elaborate on how copying the source overrules your above license concerns?It does not. It just avoids having to link in a library that may or may not exist on every machine we want to create demanglers on.I would create a directory "lz4" for the sources, compile them like we compile zlib now and install dmd with a backend license stating that it uses lz4 licensed under BSD?How many lines of code are we talking about here? The post you made makes it look like just a handful - why go to all the trouble of creating more directories and build complexity?
May 22 2016
The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html It doesn't look too difficult. If we implement our own LZ4 compressor based on that, from scratch, we can Boost-license it.
May 22 2016
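For anyone sizing up that reimplementation, the block format is indeed small: each sequence is a token byte (literal length in the high nibble, match length in the low nibble, both extensible with 255-terminated extra bytes), the literal bytes themselves, then a 2-byte little-endian offset into the already-decoded output and a match of at least 4 bytes. A rough, unoptimized D sketch of a block decompressor, with no bounds hardening and not derived from the reference code, looks like this:

// Decode one LZ4 block (no frame header handling, no input validation).
ubyte[] lz4DecompressBlock(const(ubyte)[] src)
{
    ubyte[] dst;
    size_t i = 0;
    while (i < src.length)
    {
        ubyte token = src[i++];
        // literal length: high nibble, extended while extension bytes are 255
        size_t litLen = token >> 4;
        if (litLen == 15)
        {
            ubyte b;
            do { b = src[i++]; litLen += b; } while (b == 255);
        }
        dst ~= src[i .. i + litLen];
        i += litLen;
        if (i >= src.length) break;        // last sequence has literals only
        // 2-byte little-endian offset back into the decoded output
        size_t offset = src[i] | (src[i + 1] << 8);
        i += 2;
        // match length: low nibble + 4 (minimum match), extended the same way
        size_t matchLen = (token & 0x0F) + 4;
        if ((token & 0x0F) == 15)
        {
            ubyte b;
            do { b = src[i++]; matchLen += b; } while (b == 255);
        }
        // copy byte by byte, since the match may overlap the bytes being written
        size_t pos = dst.length - offset;
        foreach (_; 0 .. matchLen)
            dst ~= dst[pos++];
    }
    return dst;
}

The real effort is on the compressor side, where the match finder and its hash table live.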
On Sun, 22 May 2016 23:42:33 -0700, Walter Bright <newshound2 digitalmars.com> wrote:The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html It doesn't look too difficult. If we implement our own LZ4 compressor based on that, from scratch, we can Boost-license it.Ok, any volunteers? I'm not personally looking forward to reimplementing lz4 from scratch right now. As Stefan Koch said, we should be able to use the existing optimized code, like gcc uses gmp. -- Marco
May 23 2016
On Sun, 22 May 2016 23:42:33 -0700, Walter Bright <newshound2 digitalmars.com> wrote:The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html It doesn't look too difficult. If we implement our own LZ4 compressor based on that, from scratch, we can Boost-license it.That is right. It's pretty simple. On Monday, 23 May 2016 at 07:30:00 UTC, Marco Leise wrote:Ok, any volunteers?Well, I am not a compression expert, but I am already working on optimizing the decompressor. The method for achieving perfect compression is outlined here: https://github.com/Cyan4973/lz4/issues/183
May 23 2016
On 5/23/2016 5:04 AM, Stefan Koch wrote:On Sun, 22 May 2016 23:42:33 -0700, Walter Bright <newshound2 digitalmars.com> wrote:The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html It doesn't look too difficult. If we implement our own LZ4 compressor based on that, from scratch, we can Boost-license it.That is right. It's pretty simple.Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8MB identifier strings.
May 23 2016
On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote:Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8MB identifier strings.This is only partially true. The 64k limit does not apply to the input string; it only applies to the dictionary. It would only hit if we found 64k of identifier without any repetition.
May 23 2016
On Monday, 23 May 2016 at 16:00:20 UTC, Stefan Koch wrote:On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote:Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8MB identifier strings.This is only partially true. The 64k limit does not apply to the input string; it only applies to the dictionary. It would only hit if we found 64k of identifier without any repetition.If you want speed: 16k (aka half of the L1D).
May 23 2016
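The reason the window size keeps coming up is that the match finder is the hot loop: a table of recent positions has to stay cache-resident, and the window bounds how far back an offset may reach. Below is a bare-bones greedy match finder in D, written only to illustrate that trade-off; the hash function, table size, and window value are arbitrary choices, not LZ4's.

enum WINDOW = 16 * 1024;    // how far back a match offset may reach (arbitrary)
enum HASH_BITS = 12;        // 4096 uint entries = 16 KB, small enough for L1

size_t hash4(const(ubyte)[] s, size_t i)
{
    // hash the next 4 bytes; the multiplier is the usual Fibonacci-hash constant
    uint v = s[i] | (s[i + 1] << 8) | (s[i + 2] << 16) | (cast(uint) s[i + 3] << 24);
    return (v * 2654435761u) >> (32 - HASH_BITS);
}

// Looks for a match for s[i..$] within the window; matchLen stays 0 if none.
void findMatch(const(ubyte)[] s, size_t i, ref uint[1 << HASH_BITS] table,
               out size_t matchPos, out size_t matchLen)
{
    matchLen = 0;
    if (i + 4 > s.length) return;
    size_t h = hash4(s, i);
    size_t cand = table[h];
    table[h] = cast(uint) i;           // remember the current position
    if (cand < i && i - cand <= WINDOW)
    {
        size_t len = 0;
        while (i + len < s.length && s[cand + len] == s[i + len])
            ++len;
        if (len >= 4) { matchPos = cand; matchLen = len; }
    }
}

With a 16k window and a 16 KB position table the search stays inside L1, which is the point being made above; widening the window buys compression ratio at the price of cache misses.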
On 5/23/2016 9:00 AM, Stefan Koch wrote:On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote:The starting line of the compression function posted here tests for it and aborts if larger.Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8Mb identifier strings.This is only partially true. The 64k limit does not apply to the input string.
May 23 2016
On Monday, 23 May 2016 at 20:36:04 UTC, Walter Bright wrote:On 5/23/2016 9:00 AM, Stefan Koch wrote:So Walter, big question: How big before the compressor kicks in? What's the minimum size of the input string the compressor should expect? I found something VERY interesting. I have 4 algorithms I'm benchmarking (the original and 3 edited compression routines). Most of them perform with similar speed (a 15-25% boost), but when I tested them on raw D code of about 20k (as a big example), an obvious winner among the 4 emerged (performing 5x faster than the original id_compress). This gain in speed wasn't obvious with a smaller string (1k or so), so assuming I'm getting good results this is very promising. id_compress: 1379ms (1379331μs), 22595 -> 8458 bytes; big_compress: 1186ms (1186608μs), 22595 -> 6784 bytes; id_reduced: 626ms (626750μs), 22595 -> 9618 bytes; min_compress: 262ms (262340μs), 22595 -> 9266 bytes. I assume speed is more important than a little loss in compression.The 64k limit does not apply to the input string.The starting line of the compression function posted here tests for it and aborts if larger.
May 24 2016
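For anyone wanting to reproduce numbers like these, a small timing harness around core.time.MonoTime is enough. The sketch below assumes the candidate compressors are supplied as callables; the names used when calling it (min_compress and friends) are placeholders for whatever is being compared, and the iteration count is arbitrary.

import core.time : MonoTime;
import std.stdio : writefln;

// Times one compressor candidate over `iterations` runs on `input`.
// `compressor` is any callable taking the raw bytes and returning the
// compressed bytes; the candidates themselves are not defined here.
void bench(string name, ubyte[] delegate(const(ubyte)[]) compressor,
           const(ubyte)[] input, int iterations = 10_000)
{
    ubyte[] last;
    auto start = MonoTime.currTime;
    foreach (_; 0 .. iterations)
        last = compressor(input);
    auto elapsed = MonoTime.currTime - start;
    writefln("%s: %s ms, %s -> %s bytes",
             name, elapsed.total!"msecs", input.length, last.length);
}

A call would look like bench("min_compress", s => minCompress(s), symbol), where minCompress stands in for whichever candidate routine is under test.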
On 5/24/2016 4:55 AM, Era Scarecrow wrote:So Walter, big question: How big before the compressor kicks in? What's the minimum size of the input string the compressor should expect?It's currently set at 64 in the PR. But that number is just picked out of the air.I assume speed is more important than a little loss in compression.Yes, but it's a judgement call.
May 24 2016
On Tuesday, 24 May 2016 at 20:07:02 UTC, Walter Bright wrote:On 5/24/2016 4:55 AM, Era Scarecrow wrote:64 bytes? Hmm I had the impression it was going to be far larger...So Walter, big question: How big before the compressor kicks in? What's the minimum size of the input string the compressor should expect?It's currently set at 64 in the PR. But that number is just picked out of the air.
May 24 2016
On Tuesday, 24 May 2016 at 20:25:18 UTC, Era Scarecrow wrote:64 bytes? Hmm I had the impression it was going to be far larger...Well, it's been a while since I last touched this topic (or the code), so I'll post it. There are 3 variants written. 0) compress.c - the control code we are comparing against. 1) compress_reduced.c, a drop-in replacement. Shouldn't be any issue using it, although it doesn't compress as well or as fast as it could. 2) compress_min.d, might need some API reworking, but other than that it is really fast in my tests. 3) compress_big.d, much more memory usage, but compresses similarly to compress_min and does much better compression on large inputs that are closer to normal text, like source code. For sheer speed compress_min is the best choice, although it may not compress as well. Note: wideint.d is referenced for a compression test; you can change the script to use something else, or copy something in and just fill the file. https://github.com/rtcvb32/Side-Projects/commit/452532904ca2d43af09bced31854eadc0395f406
Oct 17 2016
On Mon, 23 May 2016 12:04:48 +0000, Stefan Koch <uplink.coder googlemail.com> wrote:The method for achieving perfect compression is outlined here: https://github.com/Cyan4973/lz4/issues/183Nice, if it can keep the compression speed up. -- Marco
May 23 2016
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.cYes, LZO, LZ4, or snappy would be good choices. They are real-time compression algorithms, i.e. it's faster to compress and write to main memory than w/o compression. Please don't write anything on your own. There is a meta library for those algorithms http://blosc.org/, for which I added Deimos headers http://code.dlang.org/packages/d-blosc.
May 22 2016
On Sunday, 22 May 2016 at 12:12:09 UTC, Martin Nowak wrote:Yes, LZO, LZ4, or snappy would be good choices. They are real-time compression algorithms, i.e. it's faster to compress and write to main memory than w/o compression.Yes, the LZ compressors are very good for this use case.Please don't write anything on your own. There is a meta library for those algorithms http://blosc.org/, for which I added Deimos headers http://code.dlang.org/packages/d-blosc.Agreed. The existing implementations have been fine-tuned for performance, and we should leverage them.
May 22 2016
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?Mandatory goldmine on compression: http://fastcompression.blogspot.fr/
May 22 2016