
digitalmars.D - Need a Faster Compressor

reply Walter Bright <newshound2 digitalmars.com> writes:
The current one is effective, but slow:

   https://github.com/dlang/dmd/blob/master/src/backend/compress.c

Anyone want to make a stab at making it faster? Changing the format is fair 
game, as well as down and dirty assembler if that's what it takes.

So really, how good are you at fast code?
May 21 2016
next sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 So really, how good are you at fast code?
I recall being able to get a marginal boost in speed by assigning the first 1-2 bytes in preassigned buckets (256 & 65536 respectively), so you have an immediate 1-2 byte matches (skipping the chafe so to speak). Unfortunately memory-wise it's highly inefficient and requires 2 passes over the data, and probably doesn't play well with cache. I assume this is related to compressing symbols thread? I mentioned possibly considering the LZO library.
May 21 2016
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 2:27 PM, Era Scarecrow wrote:
 I assume this is related to compressing symbols thread?
Yes.
 I mentioned possibly considering the LZO library.
Give it a try, see if it pans out. Another possibility: the lookback dictionary is fixed at 1023 characters. Experimenting and benchmarking can indicate whether it should be larger or smaller. Perhaps 512 will yield just as good compression on real world identifier strings. Or try another scheme entirely, like Huffman coding.
May 21 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Saturday, 21 May 2016 at 22:01:40 UTC, Walter Bright wrote:
 Give it a try, see if it pans out.

 Other possibilities are the lookback dictionary is fixed at 
 1023 characters. Experimenting and benchmarking can indicate if 
 it should be more or less. Perhaps 512 will yield just as good 
 compression on real world identifier strings.

 Or try another scheme entirely, like huffman coding.
I'm going to try for a lookback and scan which uses quite a bit of memory. On the other hand you can reuse the memory over and over again, so that isn't an issue, although it will take up a megabyte. Also written entirely in D, so... I think I'll go with a similar setup to yours for the encoding (3 bytes). Should be nearly identical in output.
May 21 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 5:52 PM, Era Scarecrow wrote:
  I'm going to try for a lookback and scan which uses quite a bit of memory. On
 the other hand you can reuse the memory over and over again, so that isn't an
 issue. Although it will take up a Meg in memory. Also written entirely in D,
so...

  I think I'll go with a similar setup you have for the encoding (3 bytes).
 Should be nearly identical in output.
Just a note: large lookup tables can have cold cache performance problems.
May 21 2016
next sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote:
 On 5/21/2016 5:52 PM, Era Scarecrow wrote:
  I'm going to try for a lookback and scan which uses quite a 
 bit of memory.
Just a note: large lookup tables can have cold cache performance problems.
I know, if it doesn't fit in 64k then you can't rely on the cache...
May 21 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 01:46:23 UTC, Era Scarecrow wrote:
  I know, if it doesn't fit in 64k then you can't rely on the 
 cache...
Finished. It's blazing fast and in my test example it even compresses as well as or better than gzip/zlib (although maybe that's because it's headerless). Now I just need to stress test it against some really large input and check for failures. Regardless, it has compression and decompression functions, along with benchmarking against the compression algorithm I'm trying to beat.
May 21 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 04:35:15 UTC, Era Scarecrow wrote:
  Now I just need to stress test it against some really large 
 input and check for failures. Regardless it has a compression 
 and decompression functions, along with benchmarking it vs the 
 compression algorithm I'm trying to beat.
With 1 Million iterations:

new_compress: TickDuration(311404604)
id_compress:  TickDuration(385806589)

Approx 20% increase in speed (if I'm reading and did this right). I won't take this as the final verdict, but it is promising at least... Need a larger test sample to benchmark both functions against. Sample data is "E.s!(E.s!(E.s!(E.s!(E.s!(int).s(int).Result).s(E.s!(int)." etc. from the bug listing: https://issues.dlang.org/show_bug.cgi?id=15831#c4
May 21 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 11:26 PM, Era Scarecrow wrote:
 On Sunday, 22 May 2016 at 04:35:15 UTC, Era Scarecrow wrote:
  Now I just need to stress test it against some really large input and check
 for failures. Regardless it has a compression and decompression functions,
 along with benchmarking it vs the compression algorithm I'm trying to beat.
With 1 Million iterations:

new_compress: TickDuration(311404604)
id_compress:  TickDuration(385806589)

Approx 20% increase in speed (if I'm reading and did this right). I won't take this as the final verdict, but it is promising at least... Need a larger test sample to benchmark both functions against.
It is promising. Need more!
May 22 2016
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2016 1:50 AM, Walter Bright wrote:
 It is promising. Need more!
I did figure out an improvement to the existing algorithm whereby it could skip forward by the length of the existing match rather than test every character (when it's looking for a longer match). But I have no clue how much of a speedup that would be.
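For comparison, a related common trick (the one zlib's longest_match uses, not necessarily the same as the skip described above) is to reject a candidate by checking only the byte at offset bestLen before doing the full compare. A rough sketch, assuming the candidate position is earlier than the current one:

// Returns the match length at `cand`, or 0 if it cannot beat `bestLen`.
size_t tryCandidate(const(char)[] id, size_t cur, size_t cand, size_t bestLen)
{
    // To improve on bestLen the candidate must at least match at offset bestLen,
    // so check that single byte first and skip the full comparison otherwise.
    if (cur + bestLen >= id.length || id[cand + bestLen] != id[cur + bestLen])
        return 0;
    size_t n = 0;
    while (cur + n < id.length && id[cand + n] == id[cur + n])
        ++n;
    return n;
}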
May 22 2016
prev sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote:
 On 5/21/2016 11:26 PM, Era Scarecrow wrote:
 With 1 Million iterations:

 new_compress: TickDuration(311404604)
 id_compress   TickDuration(385806589)

  Approx 20% increase in speed (if i'm reading and did this 
 right).
It is promising. Need more!
Well, there's other good news. While I was working with about 2MB for the data size in general, it looks like it can be reduced to about 8k-16k and run entirely on the stack. It doesn't appear to gain any speed, or any speed gained is lost with better memory management. Although based on how things look, id_compress might perform better with a max window of 255 and a max match of 127. I'll give the original one a tweak based on my own findings and see if that turns out to be true.
May 22 2016
next sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 09:07:07 UTC, Era Scarecrow wrote:
 On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote:
 On 5/21/2016 11:26 PM, Era Scarecrow wrote:
 With 1 Million iterations:

 new_compress: TickDuration(311404604)
 id_compress   TickDuration(385806589)

  Approx 20% increase in speed (if i'm reading and did this 
 right).
It is promising. Need more!
Although based on how things look, the id_compress might perform better with a max window of 255 and max match of 127. I'll give the original one a tweak based on my own findings and see if it turns out true.
Oh noes! Making the tiny change to id_compress and it's faster!!! Now I can't take over the world!!!

new_compress:         TickDuration(306321939)  21% faster
modified id_compress: TickDuration(269275629)  31% faster
May 22 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 09:19:54 UTC, Era Scarecrow wrote:
 On 5/21/2016 11:26 PM, Era Scarecrow wrote:
 With 1 Million iterations:

         id_compress: TickDuration(385806589)  
 (original/baseline)
modified id_compress: TickDuration(269275629) 31% faster
And shrinking the lookback to a mere 127 makes it faster still, but compression starts taking a hit. Can't shrink it any more anyway.

modified id_compress2: TickDuration(230528074)  41% faster!!
May 22 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 22 May 2016 at 09:43:29 UTC, Era Scarecrow wrote:
 And shrinking the lookback to a mere 127 increases it faster 
 but compression starts taking a hit. Can't shrink it anymore 
 anyways.

 modified id_compress2: TickDuration(230528074)  41% faster!!
Nice results.
May 22 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 09:48:32 UTC, Stefan Koch wrote:
 Nice results.
This is with one sample to test against. I need a much bigger sample, but I'm not sure how to generate/find one to give it a full run for its money. Not to mention there's a huge memory leak while doing the tests, since I'm trying to keep malloc/free out of the timed code as much as possible. Still, with a 30%-40% boost it might be good enough, and it needs minimal code changes. Question for Walter (or Andrei): Would it be terrible to have lower characters as part of the identifier string? I'm referring to characters under 32 (control characters, especially \r, \n and \t).
May 22 2016
next sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 22 May 2016 09:57:20 +0000,
Era Scarecrow <rtcvb32 yahoo.com> wrote:

 On Sunday, 22 May 2016 at 09:48:32 UTC, Stefan Koch wrote:
 Nice results.  
This is with one sample to test against. I need a much bigger sample, but I'm not sure how to generate/find it to give it a full run for it's money. Not to mention there's a huge memory leak while doing the tests since I'm trying not to have malloc/free in the code results as much as possible. Although with a 30%-40% boost it might be enough to be good enough, and has minimal code changes. Question Walter (or Andrei): Would it be terrible to have lower characters as part of the identifier string? I'm referring to characters under 32 (control characters, especially the \r, \n and \t).
That would be my question, too. What is the allowed alphabet? Control characters might break output of debuggers, and spaces would also look confusing. So I guess everything in the range [33..127] (95 symbols)? -- Marco
May 22 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 10:12:11 UTC, Marco Leise wrote:
 Question Walter (or Andrei): Would it be terrible to have 
 lower characters as part of the identifier string? I'm 
 referring to characters under 32 (control characters, 
 especially the \r, \n and \t).
That would be my question, too. What is the allowed alphabet? Control characters might break output of debuggers, and spaces would also look confusing. So I guess everything in the range [33..127] (95 symbols)?
Actually, if compressed text (which uses symbols 128-255) is okay, then that gives us closer to 223 symbols to work with, not 95. Curiously, the extra 95 symbols are actually just enough to keep compression up and bring the speed gain to about 37%, with what looks like no disadvantages. Then again I still need a larger set to test things on, but it's very promising!
May 22 2016
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2016 3:32 AM, Era Scarecrow wrote:
  Curiously the extra 95 symbols actually is just enough to keep compression up
 and bring speed to about 37% speed gain, with it looks like no disadvantages.
 Then again I still need a larger set to test things on, but it's very
promising!
You're doing the gods' work, Era!
May 22 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 19:21:13 UTC, Walter Bright wrote:
 On 5/22/2016 3:32 AM, Era Scarecrow wrote:
 Curiously the extra 95 symbols actually is just enough to keep 
 compression up and bring speed to about 37% speed gain, with 
 it looks like no disadvantages.
You're doing the gods' work, Era!
Maybe... I have to wonder if, from the looks of things, LZ4 will take over instead. Have to see how it performs (and if we can integrate it). Meanwhile the main advantage of id_compress (as I'm tinkering with it) is that it makes no changes to the signature/behavior, so taking advantage of its boost in speed can be immediate (although it will no doubt have a bit weaker compression, and demangling needs to be rewritten for it). Honestly the majority of the speed gain with id_compress comes simply from having a much smaller window (1023 down to 220), which also gives a smaller footprint for the compressed output (3 bytes down to 2). By comparison, the speed gain from my new algorithm (about 20%) is pretty constant, and it retains a HUGE window (pre-scans 8k, has a 2k window) that it can find matches in, so it will win on compression. If LZ4 is half as good as the initial results suggest, I'd go with it instead.
May 22 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 19:44:08 UTC, Era Scarecrow wrote:
 ...
Well here's the rundown of some numbers. min_compress uses a tiny window, big_compress was my original algorithm but modified to use 16k total for a window. reduced_id_compress is the original except reduced to a 220 window and 2 byte constant output. Along with the compressed outputs of each.

min_compress:        [TickDuration(46410084)] 0.779836
big_compress:        [TickDuration(47998202)] 0.806545
orig_id_compress:    [TickDuration(59519257)] baseline
reduced_id_compress: [TickDuration(44033192)] 0.739894

1001 (original size)
72 testexpansion.s!(æεó▌æ╗int).så°Resulà≡Ñ╪¢╨╘ÿ¼É ↑─↑╜►╘fñv ÿ╜ ↑│↑Ä .foo()
73 testexpansion.s!(ÅæεÅó▌Åæ╗int).sÅå°ResulÅà≡ÅÑ╪Å¢╨Å╘ÿżÉ₧├ÿÄ╜É╝▓ÿëåâ.foo()
67 tes╤xpansion.s!(ÇææÇóóÇææint)∙⌡ResulÇàÅÇѺǢ»Ç╘τǼ∩ë├τü╜∩¢▓τ².foo()
78 testexpansion.s!(æ2óCæ2int).så(Resulà0ÑH¢P╘ê¼É╘íñÖ├ê┤ÿσ║ñ¬├ê¼É╘íñÖ├êÉ1å).foo()

min_compress:        [TickDuration(29210832)] 0.82391
big_compress:        [TickDuration(31058664)] 0.87601
orig_id_compress:    [TickDuration(35466130)] baseline
reduced_id_compress: [TickDuration(25032532)] 0.705977

629 (original size)
52 E.s!(à·è⌡àδint).så°Resulà≡ÖΣÅ▄╝╝ö┤ líd _Ög læ◄.foo()
61 E.s!(Åà·Åè⌡Åàδint).sÅå°ResulÅà≡ÅÖΣÅÅ▄Å╝╝Åö┤₧ç∞ÄÖΣ¡ó╠ïå╗.foo()
52 E.s!(ΣÇèèΣint)∙⌡ResulÇàÅÇÖ¢ÇÅúÇ╝├Çö╦ëçôüÖ¢Æó│².foo()
52 E.s!(à&è+à&int).så(Resulà0Ö<ÅD╝döl ┤í╝ ┴Ö╣ ┤æ9.foo()
May 22 2016
parent ZombineDev <petar.p.kirov gmail.com> writes:
On Monday, 23 May 2016 at 01:46:40 UTC, Era Scarecrow wrote:
 On Sunday, 22 May 2016 at 19:44:08 UTC, Era Scarecrow wrote:
 ...
 [...]
If you want, you can give the test case for issue 16039 a try. It produces 300-400MB binary, so it should be a nice test case :P
May 23 2016
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2016 3:32 AM, Era Scarecrow wrote:
 [...]
My idea for speeding things up is every time a new character c is matched in pattern[], a bit is set in mask:

    ulong mask;
    ...
    mask |= 1UL << (c & 63);

Then, when scanning for longer matches, test:

    if (!((1UL << (id[i + matchlen - 1] & 63)) & mask))
        i += matchlen;

because we know that id[i + matchlen - 1] cannot match any of the characters in the current match. The two expressions look complex, but they should compile to single instructions on x64.
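As a standalone illustration of how that membership test lets a scan skip ahead (a hedged sketch; findWithSkip and its names are made up, and since there are only 64 buckets, unrelated characters can share a bit, which at worst costs a missed skip, never a missed match):

size_t[] findWithSkip(const(char)[] text, const(char)[] pat)
{
    size_t[] hits;
    if (pat.length == 0 || text.length < pat.length)
        return hits;

    ulong mask;
    foreach (char c; pat)
        mask |= 1UL << (c & 63);                 // one bit per character class in pat

    size_t i = 0;
    while (i + pat.length <= text.length)
    {
        char last = text[i + pat.length - 1];
        if (!((1UL << (last & 63)) & mask))
        {
            // `last` occurs nowhere in pat, and every start in [i, i + pat.length)
            // would have to cover it, so jump over the whole window.
            i += pat.length;
            continue;
        }
        if (text[i .. i + pat.length] == pat)
            hits ~= i;
        ++i;
    }
    return hits;
}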
May 22 2016
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2016 2:57 AM, Era Scarecrow wrote:
  Question Walter (or Andrei): Would it be terrible to have lower characters as
 part of the identifier string? I'm referring to characters under 32 (control
 characters, especially the \r, \n and \t).
Currently there's a bug in id_compress() where it doesn't take into account UTF-8 sequences, as a subset of them are allowed as identifiers. I rue the day I allowed that in D, but we're kinda stuck with it. I kinda hate disallowing control characters, because then the compressor gets a bit specific.
May 22 2016
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2016 2:07 AM, Era Scarecrow wrote:
 On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote:
 On 5/21/2016 11:26 PM, Era Scarecrow wrote:
 With 1 Million iterations:

 new_compress: TickDuration(311404604)
 id_compress   TickDuration(385806589)

  Approx 20% increase in speed (if i'm reading and did this right).
It is promising. Need more!
Well there's other good news. While I was working with about 2Mb for the data size in general, it can be reduced it looks like to about 8k-16k, and run entirely on the stack. It doesn't appear to gain any speed, or any speed gained is lost with better memory management. Although based on how things look, the id_compress might perform better with a max window of 255 and max match of 127. I'll give the original one a tweak based on my own findings and see if it turns out true.
There's also https://github.com/dlang/dmd/pull/5799 which should make things faster.
May 22 2016
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote:
 Just a note: large lookup tables can have cold cache 
 performance problems.
16k lookup tables are the gold standard.
May 22 2016
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2016 5:50 AM, deadalnix wrote:
 On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote:
 Just a note: large lookup tables can have cold cache performance problems.
16k lookup table are the gold standard.
I like 64 bit vectors because the lookup and test can be done in one instruction, and the vector cached in a register!
May 22 2016
prev sibling parent reply Wyatt <wyatt.epp gmail.com> writes:
On Saturday, 21 May 2016 at 21:27:37 UTC, Era Scarecrow wrote:
 I assume this is related to compressing symbols thread? I 
 mentioned possibly considering the LZO library.
Maybe consider lz4 instead? Tends to be a bit faster, and it's BSD instead of GPL. https://cyan4973.github.io/lz4/ -Wyatt
May 23 2016
parent Wyatt <wyatt.epp gmail.com> writes:
On Monday, 23 May 2016 at 14:47:31 UTC, Wyatt wrote:
 Maybe consider lz4 instead?
Disregard that: I see it's come up already.
May 23 2016
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 21.05.2016 23:12, Walter Bright wrote:
 The current one is effective, but slow:

    https://github.com/dlang/dmd/blob/master/src/backend/compress.c

 Anyone want to make a stab at making it faster? Changing the format is
 fair game, as well as down and dirty assembler if that's what it takes.

 So really, how good are you at fast code?
Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?
May 21 2016
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 21.05.2016 23:37, Timon Gehr wrote:
 On 21.05.2016 23:12, Walter Bright wrote:
 The current one is effective, but slow:

    https://github.com/dlang/dmd/blob/master/src/backend/compress.c

 Anyone want to make a stab at making it faster? Changing the format is
 fair game, as well as down and dirty assembler if that's what it takes.

 So really, how good are you at fast code?
Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?
(Those are worst case bounds, not sure what's better on average on D symbol strings.)
May 21 2016
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 2:37 PM, Timon Gehr wrote:
 Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?
I don't understand the terms you use, but as to the "why" it is based on what I knew about LZ77 compression. I don't pretend to be an expert on compression, and this is what I churned out. As mentioned elsewhere, the C++ mangling scheme has a primitive and ineffective LZ77 scheme built in, so I wouldn't waste time on that. A quick google search on finding the longest match in a string did not turn up anything obvious. If you want to throw your hat (i.e. expertise) into the ring and post a faster compressor, please do so!
May 21 2016
next sibling parent reply Guillaume Boucher <guillaume.boucher.d gmail.com> writes:
On Saturday, 21 May 2016 at 22:07:27 UTC, Walter Bright wrote:
 As mentioned elsewhere, the C++ mangling scheme has a primitive 
 and ineffective LZ77 scheme built in, so I wouldn't waste time 
 on that.
Sorry if I didn't memorize everything in this forum from the last 20 years; can you give a link to some reasoning? The only thing I found was some statements in https://github.com/dlang/dmd/pull/5793:
 Note that the VC++ and g++ name mangling schemes for C++ each 
 use a similar, but primitive (and fairly ineffective) form of 
 compression. It's like people who invent their own crypto 
 algorithms.
The name mangling in the Itanium C++ ABI is similar to LZ77 but heavily optimized for the typical use case in symbol names. I don't really see how the general LZ77 would be better. It's almost as if using information about the actual data leads to a better compression scheme.
 This is what C++ does, and results are pretty poor compression. 
 Scanning to determine the backward references would consume the 
 same order of time, anyway.
It uses a constant amount of time if implemented correctly, which is much faster.
May 21 2016
next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 22:41:49 UTC, Guillaume Boucher wrote:
  It's almost as if using information about the actual data 
 leads to a better compression scheme.
Exactly. It always is. Since you know where to look for redundancy instead of blindly comparing bytes.
May 21 2016
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 3:41 PM, Guillaume Boucher wrote:
 Sorry if I didn't memorize everything in this forum from the last 20 years, can
 you give a link to some reasoning?
DMC++ matches the Microsoft name mangling scheme, which includes such compression. It proved hopelessly inadequate, which is why I implemented compress.c in the first place (it was for the DMC++ compiler). (And frankly, I don't see how an ad-hoc scheme like that could hope to compare with a real compression algorithm.)
May 21 2016
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 22.05.2016 00:58, Walter Bright wrote:
 On 5/21/2016 3:41 PM, Guillaume Boucher wrote:
 Sorry if I didn't memorize everything in this forum from the last 20
 years, can
 you give a link to some reasoning?
DMC++ matches the Microsoft name mangling scheme, which includes such compression. It proved hopelessly inadequate, which is why I implemented compress.c in the first place (it was for the DMC++ compiler). (And frankly, I don't see how an ad-hoc scheme like that could hope to compare with a real compression algorithm.)
You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields

?foo Result ?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z testexpansion YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z testexpansion YA U1?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z 2 YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z Z testexpansion YA U1?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z testexpansion YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z 2 YA U1?1???$s UResult ?1???$s UResult ?1???$s H testexpansion YA H Z testexpansion YA U1?1???$s H 2 YA H Z Z 2 YA U1?1???$s UResult ?1???$s H testexpansion YA H Z 2 YA U1?1???$s H 2 YA H Z Z Z Z Z QAEXXZ

i.e. 936 characters. I think this is due to the very bad limitation of just back referencing the first 10 types.

The gcc mangling allows arbitrary back references and is closer to what some have proposed. It yields

_ZZN13testexpansion1sIZNS0_IZNS0_IZNS0_IZNS0_IiEEDaT_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_EN6Result3fooEv

which is only 39 characters longer than the compressed version of the D symbol. It uses a much smaller character set, though.

This is my translation of the test case to C++14:

namespace testexpansion
{
    template<class T> struct S
    {
        void foo() {}
    };

    template<class T> auto s(T t)
    {
#ifdef bad
        struct Result
        {
            void foo() {}
        };
        return Result();
#else
        return S<T>();
#endif
    }

    void xmain()
    {
        auto x = s(s(s(s(s(1)))));
        x.foo();
    }
}
May 22 2016
next sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 17:44:22 UTC, Rainer Schuetze wrote:
 You are right about the symbols using the VC mangling. The test 
 case "1.s.s.s.s.s" in the bugreport translated to C++ yields

 ?foo Result ?1???$s UResult ?1???$s UResult ?1???$s <snip>

 i.e. 936 characters. I think this is due to the very bad 
 limitation of just back referencing the first 10 types.
Unless ? is a proper/valid character there, those are far more likely to be failed UTF-8 decoding. Care to double check?
May 22 2016
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 22.05.2016 19:56, Era Scarecrow wrote:
 On Sunday, 22 May 2016 at 17:44:22 UTC, Rainer Schuetze wrote:
 You are right about the symbols using the VC mangling. The test case
 "1.s.s.s.s.s" in the bugreport translated to C++ yields

 ?foo Result ?1???$s UResult ?1???$s UResult ?1???$s <snip>

 i.e. 936 characters. I think this is due to the very bad limitation of
 just back referencing the first 10 types.
Unless ? is the proper/valid character those are far more likely to be failed UTF-8 decoding. Care to double check?
? is an allowed character in VC++ and, for example, permits unambiguous demangling. The gcc style using only alphanumeric characters and '_' can also just be plain C symbols.
May 22 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 18:04:25 UTC, Rainer Schuetze wrote:
 On 22.05.2016 19:56, Era Scarecrow wrote:
  Unless ? is the proper/valid character those are far more 
 likely to be failed UTF-8 decoding. Care to double check?
? is an allowed character in VC++ and for example permits unambiguous demangling. The gcc style using only alpha numeric characters and '_' can also just be plain C symbols.
Not quite what I asked. Is the sample provided accurate? Is having 3 ?s in a row what should actually be there, and not just a ? being emitted because the tool doesn't know how to decode something?
May 22 2016
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 22.05.2016 20:12, Era Scarecrow wrote:
 On Sunday, 22 May 2016 at 18:04:25 UTC, Rainer Schuetze wrote:
 On 22.05.2016 19:56, Era Scarecrow wrote:
  Unless ? is the proper/valid character those are far more likely to
 be failed UTF-8 decoding. Care to double check?
? is an allowed character in VC++ and for example permits unambiguous demangling. The gcc style using only alpha numeric characters and '_' can also just be plain C symbols.
Not quite what I asked. Is the sample provided accurate? Is having 3 ?s in a row what should be there not just spitting a ? because it doesn't know how to decode it?
The 3 consecutive ?s are also in the object file, so they are actually part of the mangled symbol. VC++ uses only ASCII characters for mangling (which D should do, too, IMO).
May 22 2016
parent Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 22 May 2016 20:22:06 +0200,
Rainer Schuetze <r.sagitario gmx.de> wrote:

 The 3 consecutive ?s are also in the object file, so they are actually 
 part of the mangled symbol. VC++ uses only ASCII characters for mangling 
 (which D should do, too, IMO).
In general I agree. The extended characters may be swallowed and the representation not copyable. In the worst case we may uncover bugs in debugging software. On the other hand, base-64 adds ~17% to the symbol length compared to base-128 and the linker in particular should only be interested in a terminating \0. -- Marco
May 22 2016
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2016 10:44 AM, Rainer Schuetze wrote:
 You are right about the symbols using the VC mangling. The test case
 "1.s.s.s.s.s" in the bugreport translated to C++ yields

 ?foo Result ?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s H testexpansion  YA H Z  testexpansion  YA U1?1???$s H 2 YA H Z  Z  testexpansion  YA U1?1???$s UResult ?1???$s H testexpansion  YA H Z  2 YA U1?1???$s H 2 YA H Z  Z  Z  testexpansion  YA U1?1???$s UResult ?1???$s UResult ?1???$s H testexpansion  YA H Z  testexpansion  YA U1?1???$s H 2 YA H Z  Z  2 YA U1?1???$s UResult ?1???$s H testexpansion  YA H Z  2 YA U1?1???$s H 2 YA H Z  Z  Z  Z  testexpansion  YA U1?1???$s UResult ?1???$s UResult ?1???$s UResult ?1???$s H testexpansion  YA H Z  testexpansion  YA U1?1???$s H 2 YA H Z  Z  testexpansion  YA U1?1???$s UResult ?1???$s H testexpansion  YA H Z  2 YA U1?1???$s H 2 YA H Z  Z  Z  2 YA U1?1???$s UResult ?1???$s UResult ?1???$s H testexpansion  YA H Z  testexpansion  YA U1?1???$s H 2 YA H Z  Z  2 YA U1?1???$s UResult ?1???$s H testexpansion  YA H Z  2 YA U1?1???$s H 2 YA H Z  Z  Z  Z  Z QAEXXZ
Haha, thus showing why people should never invent their own ad-hoc crypto nor compression algorithms :-)
 _ZZN13testexpansion1sIZNS0_IZNS0_IZNS0_IZNS0_IiEEDaT_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_EN6Result3fooEv
Note "Result" appearing 5 times. It's still crappy compression. Ironically, this design predates the much worse VC++ one.
 which is only 39 characters longer than the compressed version of the D symbol.
 It uses a much smaller character set, though.
The much smaller character set can be done, if desired, by using base64 encoding. The compressor is careful to not emit 0 bytes, as those definitely would screw up bintools, but the rest seem unperturbed by binary strings.
May 22 2016
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 22.05.2016 00:07, Walter Bright wrote:
 On 5/21/2016 2:37 PM, Timon Gehr wrote:
 Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?
I don't understand the terms you use, but as to the "why" it is based on what I knew about LZ77 compression. I don't pretend to be an expert on compression, and this is what I churned out. As mentioned elsewhere, the C++ mangling scheme has a primitive and ineffective LZ77 scheme built in, so I wouldn't waste time on that. A quick google search on finding the longest match in a string did not turn up anything obvious. ...
E.g. the Knuth-Morris-Pratt (KMP) string search algorithm can be modified to compute longest match instead of full match (as it just efficiently builds and runs a finite state machine whose state is the length of the longest match ending at the current position).

https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

Implementation:

auto longestMatch(string x, string y)
{
    if (y.length > x.length) y = y[0 .. x.length];
    // KMP failure function for y
    auto f = new size_t[y.length + 1];
    for (size_t i = 0, j = f[0] = -1; i < y.length; f[++i] = ++j)
        while (~j && y[i] != y[j]) j = f[j];
    // run the automaton over x, remembering the longest state reached
    auto r = x[0 .. 0];
    for (size_t i = 0, j = 0; i < x.length && j < y.length; ++i, ++j)
    {
        while (~j && x[i] != y[j]) j = f[j];
        if (j + 1 > r.length) r = x[i - j .. i + 1];
    }
    return r;
}

This returns a slice of x representing the leftmost longest match with y. Running time is O(x.length). (In practice, if this should be really fast, the allocation for 'f' should probably be shared among multiple calls.) (This probably only improves running time in case there are sufficiently many sufficiently long matches.)

But this just improves longest_match. It should be possible to bring down the running time of the entire compression algorithm significantly using a suffix tree (the resulting running time bound is linear in the input size and independent of the window size).
 If you want to throw your hat (i.e. expertise) into the ring and post a
 faster compressor, please do so!
As far as I understand, the use case for this is compressing mangled names? I don't think that generating huge mangled names explicitly just to compress them afterwards is a viable strategy. This process throws away a lot of structure information during mangling that could be useful for fast and effective compression. Instead, compression should be performed while generating the mangled string. As far as I understand, mangled names only grow unmanageably large because the same symbols are mangled into them multiple times? Isn't it enough to create references to previously embedded mangled symbols (i-th symbol already mangled) while generating the mangled string?
May 23 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/23/2016 12:32 PM, Timon Gehr wrote:
 Instead, compression should be performed while generating the mangled string.
The trouble with that is the mangled string is formed from component pieces. Those component pieces may have common substrings with each other, which won't be detected until they are merged, when you're back to dealing with the string as a whole.
 Isn't it enough to create
 references to previously embedded mangled symbols (i-th symbol already mangled)
 while generating the mangled string?
That's been covered upthread, and VC++/g++ do that. But as shown, it is inadequate. There's a lot more redundancy than just the identifiers.
May 23 2016
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 23.05.2016 22:34, Walter Bright wrote:
 On 5/23/2016 12:32 PM, Timon Gehr wrote:
 Instead, compression should be performed while generating the mangled
 string.
The trouble with that is the mangled string is formed from component pieces.
Then don't do that. I.e. re-mangle recursively from scratch for each mangled name and allow sharing parts between unrelated components within that mangled name. (That's similar to how the compiler stores the structure in memory, which also avoids the exponential blowup.)
 Those component pieces may have common substrings with each
 other, which won't be detected until they are merged,
 when you're back to dealing with the string as a whole.


 Isn't it enough to create
 references to previously embedded mangled symbols (i-th symbol already
 mangled)
 while generating the mangled string?
That's been covered upthread, and VC++/g++ do that. But as shown, it is inadequate. There's a lot more redundancy than just the identifiers.
Yes, but I think that should be covered (up to constants) by what I suggested above? You can't have exponential growth before compression. Any scheme that does not prevent running time exponential in template instantiation depth is also inadequate in the long run.
May 23 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/23/2016 2:17 PM, Timon Gehr wrote:
 Then don't do that. I.e. re-mangle recursively from scratch for each mangled
 name and allow sharing parts between unrelated components within that mangled
 name.
How is that essentially different from running a compression pass? The only real problem with the compressor was the compression speed, and the LZ4 test shows this is solvable.
 (That's similar to how the compiler stores the structure in memory, which
 also avoids the exponential blowup.)
No, it doesn't.
 You can't have exponential growth before compression.
The compiler is designed so that the internal names are pretty much write-only, uniquely hashed by their address. It's not spending time running strlen()'s on them.
 Any scheme that does not prevent running time exponential in template
instantiation depth is also
 inadequate in the long run.
Most algorithms seem to have pathological worst cases, but as long as they are rare, it doesn't impede their utility.
May 23 2016
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
Just a heads up on the LZ4.
I have spent roughly 3 hours optimizing my decompressor.
And while I had stunning success, a speed-up of about 400%,
I am still about 600x slower than the C variant.

It is still a mystery to me why that is :)
since the generated code is both smaller and works almost
without spills.
May 23 2016
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/23/2016 4:49 PM, Stefan Koch wrote:
 Just a heads up on the LZ4.
 I have spent roughly 3 hours optimizing my decompresser.
 And while I had stunning success, a speed-up of about 400%.
 I am still about 600x slower then the C variant.

 It is still a mystery to me why that is :)
 Since the generated code both smaller and works almost without spills.
I'm not sure why you're working on a decompressor. The speed needs to be in the compressor.
May 23 2016
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 24.05.2016 01:02, Walter Bright wrote:
 On 5/23/2016 2:17 PM, Timon Gehr wrote:
 Then don't do that. I.e. re-mangle recursively from scratch for each
 mangled
 name and allow sharing parts between unrelated components within that
 mangled
 name.
How is that essentially different from running a compression pass?
Speed. The title of this thread is "Need a faster compressor".
 The
 only real problem with the compressor was the compression speed, and the
 LZ4 test shows this is solvable.
 ...
No. Just increase the recursion depth by a small number of levels to completely kill the speedups gained by using a faster compression algorithm.
  > (That's similar to how the compiler stores the structure in memory,
 which
 also avoids the exponential blowup.)
No, it doesn't. ...
Yes, it does. The compiler does not use exponential space to store the AST.
 You can't have exponential growth before compression.
The compiler is designed so that the internal names are pretty much write-only, uniquely hashed by their address. It's not spending time running strlen()'s on them. ...
It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings.
 Any scheme that does not prevent running time exponential in template
 instantiation depth is also
 inadequate in the long run.
Most algorithms seem to have pathological worst cases, but as long as they are rare, it doesn't impede their utility.
The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first place if mangled names don't grow large in practice?
May 24 2016
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote:
 On 24.05.2016 01:02, Walter Bright wrote:
 [...]
Speed. The title of this thread is "Need a faster compressor".
 [...]
No. Just increase the recursion depth by a small number of levels to completely kill the speedups gained by using a faster compression algorithm.
 [...]
Yes, it does. The compiler does not use exponential space to store the AST.
 [...]
It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings.
[...]
The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first place if mangled names don't grow large in practice?
I completely agree! However, such a thing will only work if dmd is used as a pre-linker and if we can stabilize the hash, i.e. every run of the compiler generates the same hash.
May 24 2016
parent deadalnix <deadalnix gmail.com> writes:
On Tuesday, 24 May 2016 at 18:30:36 UTC, Stefan Koch wrote:
 On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote:
 On 24.05.2016 01:02, Walter Bright wrote:
 [...]
Speed. The title of this thread is "Need a faster compressor".
 [...]
No. Just increase the recursion depth by a small number of levels to completely kill the speedups gained by using a faster compression algorithm.
 [...]
Yes, it does. The compiler does not use exponential space to store the AST.
 [...]
It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings.
[...]
The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first place if mangled names don't grow large in practice?
I completely agree! However such a thing will only work if dmd is used as a pre-linker and If we can stablize the hash. I.E. every run of the compiler generates the same hash.
KAMOULOX!
May 24 2016
prev sibling next sibling parent reply Anon <anon anon.anon> writes:
On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote:
 It's _exponential_ growth. We don't even want to spend the time 
 and memory required to generate the strings.

 The reason we have this discussion is that the worst case isn't 
 rare enough to make this argument. Why compress in the first 
 place if mangled names don't grow large in practice?
Since Walter is evidently still having a hard time understanding this, I've done a few more pathological cases, comparing LZ4 to Back Referencing Named Types. For BRNT I manually replaced struct names in the mangling with N<number> numeric identifiers for all but one of their appearances, to simulate what could actually be done by the compiler. No other mangling changes (e.g., for eponymous templates or hidden types) were applied.

auto foo(T)(T x)
{
    static struct Vold { T payload; }
    return Vold(x);
}

At a chain of length 10: foo(5).foo.foo.foo.foo.foo.foo.foo.foo.foo

current: Generate a 57_581 byte symbol
lz4 -9:  Generate a 57_581 byte symbol, then compress it to 412 bytes
lzma -9: Generate a 57_581 byte symbol, then compress it to 239 bytes
BRNT:    Generate a 332 byte symbol

Upping it to twelve levels: foo(5).foo.foo.foo.foo.foo.foo.foo.foo.foo.foo.foo

current: Generate a 230_489 byte symbol
lz4 -9:  Generate a 230_489 byte symbol, then compress it to 1128 bytes
lzma -9: Generate a 230_489 byte symbol, then compress it to 294 bytes
BRNT:    Generate a 393 byte symbol

Upping it to fourteen levels: I'm too lazy to do more manual BRNT, so beyond this point its numbers are estimated based on the previously established fact that BRNT symbols grow linearly.

current: Generate a 922_127 byte symbol
lz4 -9:  Generate a 922_127 byte symbol, then compress it to 4012 bytes
lzma -9: Generate a 922_127 byte symbol, then compress it to 422 bytes
BRNT:    Generate a ~457 byte symbol

Upping it to sixteen levels:

current: Generate a 3_688_679 byte symbol
lz4 -9:  Generate a 3_688_679 byte symbol, then compress it to 15535 bytes
lzma -9: Generate a 3_688_679 byte symbol, then compress it to 840 bytes
BRNT:    Generate a ~527 byte symbol

I want to let that sink in: in the general case, BRNT beats even **LZMA**.

As if winning the compactness war while still being a symbol linkers won't have problems with wasn't enough, this approach also avoids generating giant symbols in the first place. Compression and/or hashing cannot do that. If D really wants to, it can compress symbols, but it *needs* to fix this problem properly *first*.
May 24 2016
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2016 12:16 PM, Anon wrote:
 As if winning the compactness war while still being a symbol linkers won't have
 problems with wasn't enough, this approach also avoids generating giant symbols
 in the first place. Compression and/or hashing cannot do that. If D really
wants
 to, it can compress symbols, but it *needs* to fix this problem properly
*first*.
Because identifiers are generated from substrings that don't know about each other, to do your algorithm a scan of the combined string will still be necessary. In any case, you should be able to implement your algorithm as a replacement for id_compress(). How about giving it a go?
May 24 2016
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 24 May 2016 19:16:07 +0000,
Anon <anon anon.anon> wrote:

 Upping it to sixteen levels:
 
 current: Generate a 3_688_679 byte symbol
 lz4 -9: Generate a 3_688_679 byte symbol, then compress it to 
 15535 bytes
 lzma -9: Generate a 3_688_679 byte symbol, then compress it to 
 840 bytes
 BRNT: Generate a ~527 byte symbol
 
 
 
 I want to let that sink in: in the general case, BRNT beats even 
 **LZMA**.
Let's go this route. It seems to be a scalable solution without any big memory operations. -- Marco
May 25 2016
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2016 9:22 AM, Timon Gehr wrote:
 Yes, it does. The compiler does not use exponential space to store the AST.
BTW, all types in dmd have a 'deco' string which is the AST linearized as a string. This string is also used to build the mangled names. All the deco's are put into a hashtable, and then all types can be uniquely identified by their address in the hashtable. This reduces type comparisons to a single pointer comparison.
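In sketch form, the interning trick looks roughly like this (illustrative only, not dmd's actual code):

string[string] decoTable;   // each distinct deco string stored exactly once

string intern(string deco)
{
    if (auto p = deco in decoTable)
        return *p;                    // reuse the canonical copy
    decoTable[deco] = deco;
    return deco;
}

bool sameType(string a, string b)
{
    // both came from intern(), so equal types share the same storage
    return a.ptr is b.ptr;
}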
May 24 2016
parent reply Guillaume Boucher <guillaume.boucher.d gmail.com> writes:
On Tuesday, 24 May 2016 at 20:16:32 UTC, Walter Bright wrote:
 On 5/24/2016 9:22 AM, Timon Gehr wrote:
 Yes, it does. The compiler does not use exponential space to 
 store the AST.
BTW, all types in dmd have a 'deco' string which is the AST linearized as a string. This string is also used to build the mangled names. All the deco's are put into a hashtable, and then all types can be uniquely identified by their address in the hashtable. This reduces type comparisons to a single pointer comparison.
There's no reason not to use the compression in the deco name. Just make sure the references are relative and you're set. No exponential space to store the AST. Please, if you change the ABI, do it right the first time. Some arbitrary compression algorithm doesn't solve the problem.
May 24 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2016 1:27 PM, Guillaume Boucher wrote:
 There's no reason not to use the compression in the deco name.  Just make sure
 the references are relative and you're set.
Not totally, since you'd like to refer to types that are common across deco names.
May 24 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote:
 On 5/24/2016 1:27 PM, Guillaume Boucher wrote:
 There's no reason not to use the compression in the deco name.
  Just make sure
 the references are relative and you're set.
Not totally, since you'd like to refer to types that are common across deco names.
A serious question: do these templates even need a mangle? Is anyone ever going to call them from a different module? I don't think you could do that because those are voldemort types. Those symbols are only temporary, aren't they?
May 24 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2016 2:06 PM, Stefan Koch wrote:
 On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote:
 On 5/24/2016 1:27 PM, Guillaume Boucher wrote:
 There's no reason not to use the compression in the deco name.
  Just make sure
 the references are relative and you're set.
Not totally, since you'd like to refer to types that are common across deco names.
On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote: A serious question, do these templates even need a mangle ? Is anyone ever going to call them from a different module ? I don't think you could do that because those are voldemort types. Those symbols are only temporary aren't they ?
The types do exist in the compiler, and are used in template name generation, generating types, etc.
May 24 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 24 May 2016 at 21:26:58 UTC, Walter Bright wrote:
 On 5/24/2016 2:06 PM, Stefan Koch wrote:
 A serious question, do these templates even need a mangle ?
 Is anyone ever going to call them from a different module ?
 I don't think you could do that because those are voldemort 
 types.

 Those symbols are only temporary aren't they ?
The types do exist in the compiler, and are used in template name generation, generating types, etc.
But would one ever have to be able to reconstruct the mangle? Templates cannot be exported, nor can instantiations. Or can they?
May 24 2016
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2016 2:34 PM, Stefan Koch wrote:
 But would one ever have to be able to reconstruct the mangle ?
Symbolic debugging, and dealing with various linker errors.
 templates cannot be exported nor can instanciations.
 Or can they ?
Of course they can!
May 24 2016
prev sibling parent Guillaume Boucher <guillaume.boucher.d gmail.com> writes:
On Monday, 23 May 2016 at 20:34:19 UTC, Walter Bright wrote:
 On 5/23/2016 12:32 PM, Timon Gehr wrote:
 Isn't it enough to create
 references to previously embedded mangled symbols (i-th symbol 
 already mangled)
 while generating the mangled string?
That's been covered upthread, and VC++/g++ do that. But as shown, it is inadequate. There's a lot more redundancy than just the identifiers.
Yes, not all redundancy is covered. However, I think all non-pathologic blow-ups are avoided.

Let's look at the properties of the Itanium C++ ABI mangling:

1. Guaranteed linear size name for recursive types that have exponential size names.
2. Compression speed can be linear in the size of the compressed name (i.e. no need to create an exponential garbage name just to compress it).
3. Decompression is easy, possible without tools, and its speed is linear in the size of the compressed name.
4. The name retains some redundancy (in itself and when compared to other mangled names), well suited for a follow-up compression.

Compare this to applying some arbitrary compression algorithm to a name of exponential size:

1. Exponential size names remain exponential size. Yes, up until the window size, some compression algorithms guarantee that, but they cannot guarantee it in the general case.
2. Compression needs the whole name first.
3. Decompression is only possible with tools.
4. Multiple mangled names cannot be compressed well.
May 24 2016
prev sibling next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 The current one is effective, but slow:

   
 https://github.com/dlang/dmd/blob/master/src/backend/compress.c

 Anyone want to make a stab at making it faster? Changing the 
 format is fair game, as well as down and dirty assembler if 
 that's what it takes.

 So really, how good are you at fast code?
Is LZ4 acceptable? If so, the current C implementation is hard to beat. And maybe it's best to just copy it in.
May 21 2016
parent Lionello Lunesu <lionello lunesu.remove.com> writes:
On 22/5/2016 05:51, Stefan Koch wrote:
 On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 The current one is effective, but slow:

 https://github.com/dlang/dmd/blob/master/src/backend/compress.c

 Anyone want to make a stab at making it faster? Changing the format is
 fair game, as well as down and dirty assembler if that's what it takes.

 So really, how good are you at fast code?
Is LZ4 acceptable ? If so the current C implementation is hard to beat. And maybe it's best to just copy it in.
Agree. I've had the best results with LZ4. We can either link to the C code, or port it to D. It's very trivial. L.
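For reference, linking against the C lz4 API from D is about this much code (a rough sketch, assuming liblz4 is available to link against; error handling omitted):

extern (C) nothrow @nogc
{
    int LZ4_compressBound(int inputSize);
    int LZ4_compress_default(const(char)* src, char* dst, int srcSize, int dstCapacity);
}

char[] lz4Compress(const(char)[] id)
{
    auto dst = new char[LZ4_compressBound(cast(int) id.length)];
    const n = LZ4_compress_default(id.ptr, dst.ptr,
                                   cast(int) id.length, cast(int) dst.length);
    assert(n > 0);
    return dst[0 .. n];
}

The output is raw binary, so for a mangled name it would still need the kind of 7-bit re-encoding mentioned elsewhere in the thread to avoid embedded 0 bytes.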
May 21 2016
prev sibling next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 The current one is effective, but slow:

   
 https://github.com/dlang/dmd/blob/master/src/backend/compress.c

 Anyone want to make a stab at making it faster? Changing the 
 format is fair game, as well as down and dirty assembler if 
 that's what it takes.

 So really, how good are you at fast code?
Could you post a link to the code you use for benchmarking?
May 21 2016
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2016 3:13 PM, Stefan Koch wrote:
 Could you post a link to code you use for benchmarking ?
In the comments: https://github.com/dlang/dmd/pull/5793
May 21 2016
prev sibling next sibling parent reply qznc <qznc web.de> writes:
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 The current one is effective, but slow:

   
 https://github.com/dlang/dmd/blob/master/src/backend/compress.c

 Anyone want to make a stab at making it faster? Changing the 
 format is fair game, as well as down and dirty assembler if 
 that's what it takes.

 So really, how good are you at fast code?
A quick search led to this SO thread:

https://stackoverflow.com/questions/1138345/an-efficient-compression-algorithm-for-short-text-strings

It links to SMAZ:

https://github.com/antirez/smaz

It uses a codebook primed for English text. Since this is for name mangling, priming for Phobos might be a good idea. E.g. put "std.", "core.", "etc." into the codebook.
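A toy sketch of what such a primed codebook pass could look like (a hypothetical scheme, not SMAZ itself; the reserved byte values and word list are made up):

immutable string[] codebook = ["std.", "core.", "etc.", "unittest"];

char[] primeEncode(const(char)[] id)
{
    char[] result;
    outer: for (size_t i = 0; i < id.length; )
    {
        foreach (k, word; codebook)
        {
            if (id.length - i >= word.length && id[i .. i + word.length] == word)
            {
                result ~= cast(char)(0xF0 + k);   // one reserved byte per codebook entry
                i += word.length;
                continue outer;
            }
        }
        result ~= id[i++];                         // no codebook hit, copy the byte
    }
    return result;
}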
May 22 2016
next sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 10:40:46 UTC, qznc wrote:
 On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 It links to SMAZ:

 https://github.com/antirez/smaz

 It uses a codebook primed for english text. Since this is for 
 name mangling, priming for Phobos might be a good idea. E.g. 
 put "std.", "core.", "etc." into the codebook.
Curiously, premaking a few primer dictionaries like that for languages can give similar performance with zlib and gzip. I recall doing that back in 2009 when I had XML pages to deal with: it would compress against all my pages, find the best dictionary, and use that. Then decompression was simply a matter of finding the dictionary with the matching CRC32 and away it went! Priming something similar for the current id_compress wouldn't hurt early on either. 'unittest' I see quite a bit; entries for common templates would probably help too, for minor compression benefits.
May 22 2016
parent Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 22 May 2016 10:55:40 +0000,
Era Scarecrow <rtcvb32 yahoo.com> wrote:

 On Sunday, 22 May 2016 at 10:40:46 UTC, qznc wrote:
 On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 It links to SMAZ:

 https://github.com/antirez/smaz

 It uses a codebook primed for english text. Since this is for 
 name mangling, priming for Phobos might be a good idea. E.g. 
 put "std.", "core.", "etc." into the codebook.  
Curiously premaking a few primers as such for languages can give similar performance with zlib and gzip.
LZ4 can seemingly use a dictionary, too. -- Marco
May 22 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/22/16 6:40 AM, qznc wrote:
 On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 The current one is effective, but slow:

 https://github.com/dlang/dmd/blob/master/src/backend/compress.c

 Anyone want to make a stab at making it faster? Changing the format is
 fair game, as well as down and dirty assembler if that's what it takes.

 So really, how good are you at fast code?
A quick search led to this SO thread: https://stackoverflow.com/questions/1138345/an-efficient-compression-algorithm-for-short-text-strings It links to SMAZ: https://github.com/antirez/smaz It uses a codebook primed for english text. Since this is for name mangling, priming for Phobos might be a good idea. E.g. put "std.", "core.", "etc." into the codebook.
A primed dictionary is a great idea. -- Andrei
May 22 2016
prev sibling next sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 21 May 2016 14:12:15 -0700,
Walter Bright <newshound2 digitalmars.com> wrote:

 The current one is effective, but slow:
 
    https://github.com/dlang/dmd/blob/master/src/backend/compress.c
 
 Anyone want to make a stab at making it faster? Changing the format is fair 
 game, as well as down and dirty assembler if that's what it takes.
 
 So really, how good are you at fast code?
I can write something that compresses 2 times more compactly and 16 times faster (in microbenchmarks) for the 10-times-nested "testexpansion" test case from https://issues.dlang.org/show_bug.cgi?id=15831#c0, using BSD-licensed lz4 compression. Does it have to be C++ code or can it be D code? -- Marco
May 22 2016
parent reply Marco Leise <Marco.Leise gmx.de> writes:

⬅ Download proof of concept

This is a proof of concept micro-benchmark compressing the
37_640 bytes symbol from the bug report linked above with both
id_compress from the dmd backend and lz4 (without dictionary).
The lz4 result is additionally transformed to 7-bits similar
to base-64.

Original size: 37640 bytes

id_compress : 5243 ms for 10_000 iterations, 800 bytes
lz4         :   71 ms for 10_000 iterations, 389 bytes

That roughly corresponds to a 2x higher compression ratio at
70x faster compression speed.
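(For illustration only -- not the actual proof-of-concept code -- the 7-bit/base-64-style step can be as simple as packing 6 bits per output character into an identifier-friendly alphabet. The alphabet below is an assumption; '$' in particular may need replacing depending on what the object format accepts.)

immutable char[64] idAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_$";

// Pack arbitrary compressed bytes into characters that survive in a symbol
// name. The demangler needs the inverse mapping (and the byte count).
char[] toIdentifierChars(const(ubyte)[] data)
{
    char[] result;
    uint acc;   // bit accumulator
    int bits;   // number of valid bits currently in acc
    foreach (b; data)
    {
        acc = (acc << 8) | b;
        bits += 8;
        while (bits >= 6)
        {
            bits -= 6;
            result ~= idAlphabet[(acc >> bits) & 0x3F];
        }
    }
    if (bits > 0)   // flush the remaining bits, left-aligned into one character
        result ~= idAlphabet[(acc << (6 - bits)) & 0x3F];
    return result;
}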

--
Marco
May 22 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 22 May 2016 at 14:05:23 UTC, Marco Leise wrote:
 ⬅ Download proof of concept

 This is a proof of concept micro-benchmark compressing the 
 37_640 bytes symbol from the bug report linked above with both 
 id_compress from the dmd backend and lz4 (without dictionary). 
 The lz4 result is additionally transformed to 7-bits similar to 
 base-64.

 Original size: 37640 bytes

 id_compress : 5243 ms for 10_000 iterations, 800 bytes
 lz4         :   71 ms for 10_000 iterations, 389 bytes

 That roughly corresponds to a 2x higher compression ratio at 
 70x faster compression speed.
Please submit a PR.
May 22 2016
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sun, 22 May 2016 14:06:52 +0000
schrieb Stefan Koch <uplink.coder googlemail.com>:

 On Sunday, 22 May 2016 at 14:05:23 UTC, Marco Leise wrote:
 ⬅ Download proof of concept

 This is a proof of concept micro-benchmark compressing the
 37_640 bytes symbol from the bug report linked above with both
 id_compress from the dmd backend and lz4 (without dictionary).
 The lz4 result is additionally transformed to 7-bits similar to
 base-64.

 Original size: 37640 bytes

 id_compress : 5243 ms for 10_000 iterations, 800 bytes
 lz4         :   71 ms for 10_000 iterations, 389 bytes

 That roughly corresponds to a 2x higher compression ratio at
 70x faster compression speed.
 Please submit a PR.
There are a few open questions, for me at least. Are there other proposals with good properties? Are there any objections against the benchmarking method or the license? Can the implementation be in D? If not, should we link against the system liblz4.so/a or copy the C code into the backend? Is a pre-filled dictionary worthwhile? If so, which words should go in it? The latter is important because changes to the Dlang mangling need to be reflected in other projects outside our direct control, too. -- Marco
May 22 2016
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 22 May 2016 at 16:06:07 UTC, Marco Leise wrote:

 There are a few open questions, for me at least. Are there 
 other proposals with good properties? Are there any objections 
 against the benchmarking method or the license? Can the 
 implementation be in D? If not, should we link against the 
 system liblz4.so/a or copy the C code into the backend? Is a 
 pre-filled dictionary worthwhile? If so, which words should go 
 in it? The latter is important because changes to the Dlang 
 mangling need to be reflected in other projects outside our 
 direct control, too.
Here are my answers, please take them with a grain of ignorance.

I do not think prefilling the dictionary will do much good, since LZ is all about finding prefixes anyway.

I am for copying the source code into our source tree; we don't want to depend on an LZ4 library being available on the user's system. I think re-implementing it in D is also fine, as long as the function is extern(C). However, I agree with Martin's point: it is a waste of resources to re-implement something that already has a perfectly fine implementation. The only reason I re-implemented the LZ4 decompression was to use it at CTFE.
May 22 2016
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sun, 22 May 2016 17:19:38 +0000
schrieb Stefan Koch <uplink.coder googlemail.com>:

 Here are my answers please take them with a grain of ignorance.
 […]
Maybe you are right about the dictionary. I haven't put much thought into it. Anyways, I was looking for input from the core devs who are affected by this.
 However I agree with Martins point: It is a waste of resources
 re-implementing something that already has a perfectly fine
 implementation.
I already loosely copied and fixed up the compressor in the attached proof of concept. Thanks to the two languages' similarities it was a matter of an hour or so. If you provide the decompressor, we're set. :) -- Marco
May 22 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 22 May 2016 at 18:18:46 UTC, Marco Leise wrote:
 If you provide the decompressor, we're set. :)
I am already working on a non-CTFE-able version that does not use slices and should therefore be faster. But decompression is only needed for demangling.
May 22 2016
prev sibling next sibling parent Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 22 May 2016 at 16:06:07 UTC, Marco Leise wrote:
 There are a few open questions, for me at least. Are there 
 other proposals with good properties? Are there any objections 
 against the benchmarking method or the license? Can the 
 implementation be in D? If not, should we link against the 
 system liblz4.so/a or copy the C code into the backend? Is a 
 pre-filled dictionary worthwhile? If so, which words should go 
 in it? The latter is important because changes to the Dlang 
 mangling need to be reflected in other projects outside our 
 direct control, too.
How it's being benchmarked does make me question it a bit. I avoided multi-threading, so my results are as they would run on one CPU/core, while adding multi-threading probably isn't too hard to do (at least in D). Then there's the question of how many cores, and other details; a compressor designed to run in parallel will naturally be in its own element there. This all assumes we are allowed to use those cores in the first place: the compiler may already be using them for other tasks (scanning other files, optimizing code, CTFE, etc.), at which point speed gained in compression may not translate into an overall speedup if you are stealing CPU cycles.

As for a primed dictionary, I'm trying to figure out what would go in it myself. Commonly used templates and keywords that appear in symbols are so far my best guess. I'd avoid library names unless they actually appear in the symbols (so 'std.core' might not mean much, while 'unittestL' does). Then again, I don't have a good sample of what I'd actually encounter live; maybe every major library import would be a good thing to have.
May 22 2016
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2016 9:06 AM, Marco Leise wrote:
 Are there any objections
 against the benchmarking method or the license?
BSD may cause problems.
 Can the implementation be in D?
Not yet.
 If not, should we link against the
 system liblz4.so/a
No.
 or copy the C code into the backend?
Yes.
 Is a pre-filled dictionary worthwhile?
The trouble with that is the demangler would need the same dictionary, right? If so, that would be problematic.
May 22 2016
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sun, 22 May 2016 13:02:37 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 5/22/2016 9:06 AM, Marco Leise wrote:
 Are there any objections
 against the benchmarking method or the license?  
BSD may cause problems.
 Can the implementation be in D?  
Not yet.
 If not, should we link against the
 system liblz4.so/a  
No.
 or copy the C code into the backend?  
Yes.
Can you elaborate on how copying the source overrules your above license concerns? I would create a directory "lz4" for the sources, compile them like we compile zlib now and install dmd with a backend license stating that it uses lz4 licensed under BSD?
 Is a pre-filled dictionary worthwhile?  
The trouble with that is the demangler would need the same dictionary, right? If so, that would be problematic.
Right, it's probably more trouble than worth it. -- Marco
May 22 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2016 10:52 PM, Marco Leise wrote:
 Can you elaborate on how copying the source overrules your
 above license concerns?
It does not. It just avoids having to link in a library that may or may not exist on every machine we want to create demanglers on.
 I would create a directory "lz4" for the sources, compile
 them like we compile zlib now and install dmd with a backend
 license stating that it uses lz4 licensed under BSD?
How many lines of code are we talking about here? The post you made makes it look like just a handful - why go to all the trouble of creating more directories and build complexity?
May 22 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html

It doesn't look too difficult. If we implement our own LZ4 compressor based on 
that, from scratch, we can Boost-license it.
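For scale, here is a rough sketch of a decoder for that block format (it trusts its input and does no bounds checking; the real thing needs both):

// One LZ4 block: a series of sequences. Each sequence is a token byte
// (high nibble = literal length, low nibble = match length - 4), optional
// length-extension bytes, the literals, then a 2-byte little-endian offset
// pointing back into already-decoded output. The last sequence has only literals.
ubyte[] lz4DecodeBlock(const(ubyte)[] src, size_t decodedSize)
{
    auto dst = new ubyte[decodedSize];
    size_t s, d;

    size_t readLength(size_t base)
    {
        size_t len = base;
        if (base == 15)                       // 15 means more length bytes follow
        {
            ubyte b;
            do { b = src[s++]; len += b; } while (b == 255);
        }
        return len;
    }

    while (d < decodedSize)
    {
        const token = src[s++];

        auto litLen = readLength(token >> 4); // copy the literal run
        dst[d .. d + litLen] = src[s .. s + litLen];
        s += litLen;  d += litLen;
        if (d >= decodedSize)
            break;                            // final sequence: literals only

        const offset = src[s] | (src[s + 1] << 8);
        s += 2;
        auto matchLen = readLength(token & 0xF) + 4;  // minimum match is 4
        foreach (_; 0 .. matchLen)            // byte-wise: the match may overlap itself
        {
            dst[d] = dst[d - offset];
            ++d;
        }
    }
    return dst;
}

The compressor is where the real work (match finding) lives, but the format itself stays this small.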
May 22 2016
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sun, 22 May 2016 23:42:33 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html
 
 It doesn't look too difficult. If we implement our own LZ4 compressor based on 
 that, from scratch, we can boost license it.
Ok, any volunteers? I'm not personally looking forward to reimplementing lz4 from scratch right now. As Stefan Koch said, we should be able to use the existing optimized code, like gcc uses gmp. -- Marco
May 23 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
Am Sun, 22 May 2016 23:42:33 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:
 The file format: 
 http://cyan4973.github.io/lz4/lz4_Block_format.html
 
 It doesn't look too difficult. If we implement our own LZ4 
 compressor based on that, from scratch, we can boost license it.
That is right. It's pretty simple.

On Monday, 23 May 2016 at 07:30:00 UTC, Marco Leise wrote:
 Ok, any volunteers?
Well, I am not a compression expert, but since I am already working on optimizing the decompressor, I can give it a try.

The method for achieving perfect compression is outlined here: https://github.com/Cyan4973/lz4/issues/183
May 23 2016
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/23/2016 5:04 AM, Stefan Koch wrote:
 Am Sun, 22 May 2016 23:42:33 -0700
 schrieb Walter Bright <newshound2 digitalmars.com>:
 The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html

 It doesn't look too difficult. If we implement our own LZ4 compressor based on
 that, from scratch, we can boost license it.
That is right. It's pretty simple.
Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8Mb identifier strings.
May 23 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote:
 Also, the LZ4 compressor posted here has a 64K string limit, 
 which won't work for D because there are reported 8Mb 
 identifier strings.
This is only partially true. The 64k limit does not apply to the input string; it only applies to the dictionary (the lookback window). It would only be hit if we found 64k of identifier without any repetition.
May 23 2016
next sibling parent deadalnix <deadalnix gmail.com> writes:
On Monday, 23 May 2016 at 16:00:20 UTC, Stefan Koch wrote:
 On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote:
 Also, the LZ4 compressor posted here has a 64K string limit, 
 which won't work for D because there are reported 8Mb 
 identifier strings.
This is only partially true. The 64k limit does not apply to the input string. It does only apply to the dictionary. It would only hit if we find 64k of identifier without repetition.
If you want speed: 16k (aka half of the L1D).
May 23 2016
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/23/2016 9:00 AM, Stefan Koch wrote:
 On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote:
 Also, the LZ4 compressor posted here has a 64K string limit, which won't work
 for D because there are reported 8Mb identifier strings.
This is only partially true. The 64k limit does not apply to the input string.
The starting line of the compression function posted here tests for it and aborts if larger.
May 23 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Monday, 23 May 2016 at 20:36:04 UTC, Walter Bright wrote:
 On 5/23/2016 9:00 AM, Stefan Koch wrote:
 The 64k limit does not apply to the input string.
The starting line of the compression function posted here tests for it and aborts if larger.
So Walter, big question: How big before the compressor kicks in? What's the minimum size of the input string the compressor should expect?

Found something VERY interesting. I have 4 algorithms I'm benchmarking (the original and 3 edited compression routines). Most of them perform at similar speed (a 15-25% boost), but when I tested them compressing 20k of raw D code (as a large example), an obvious winner among the 4 suddenly emerged, performing 5x faster than the original id_compress. This gain in speed wasn't obvious with a smaller string (1k or so), so assuming I'm getting good results this is very promising.

id_compress:  1379ms (1379331μs), 22595 -> 8458 bytes
big_compress: 1186ms (1186608μs), 22595 -> 6784 bytes
id_reduced:    626ms (626750μs),  22595 -> 9618 bytes
min_compress:  262ms (262340μs),  22595 -> 9266 bytes

I assume speed is more important than a little loss in compression.
May 24 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2016 4:55 AM, Era Scarecrow wrote:
  So Walter, big question: How big before the compressor kicks in? What's the
 minimum size of the input string the compressor should expect?
It's currently set at 64 in the PR. But that number is just picked out of the air.
  I assume speed is more important than a little loss in compression.
Yes, but it's a judgement call.
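In code the guard amounts to something like this sketch (the names are illustrative, not the PR's):

// Only pay for compression when the mangled name is long enough to
// plausibly win; 64 is the cut-off currently picked in the PR.
const(char)[] maybeCompress(alias compressFn)(const(char)[] id)
{
    enum minLengthWorthCompressing = 64;
    if (id.length < minLengthWorthCompressing)
        return id;                        // short names pass through untouched
    auto packed = compressFn(id);
    return packed.length < id.length ? packed : id;
}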
May 24 2016
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Tuesday, 24 May 2016 at 20:07:02 UTC, Walter Bright wrote:
 On 5/24/2016 4:55 AM, Era Scarecrow wrote:
 So Walter, big question: How big before the compressor kicks 
 in? What's the minimum size of the input string the compressor 
 should expect?
It's currently set at 64 in the PR. But that number is just picked out of the air.
64 bytes? Hmm I had the impression it was going to be far larger...
May 24 2016
parent Era Scarecrow <rtcvb32 yahoo.com> writes:
On Tuesday, 24 May 2016 at 20:25:18 UTC, Era Scarecrow wrote:
  64 bytes? Hmm I had the impression it was going to be far 
 larger...
Well, it's been a while since I last touched this topic (or the code), so I'll post it. There are three variants written, plus the control:

0) compress.c - the control code we are comparing against.
1) compress_reduced.c - a drop-in replacement. Shouldn't be any issue with using it, although it doesn't compress as well or as fast as it could.
2) compress_min.d - might need some API reworking, but other than that it is real fast by my tests.
3) compress_big.d - much more memory usage, but compresses similarly to compress_min, and does much better compression on large objects that are closer to normal text, like source code.

For sheer speed compress_min is the best choice, although it may not compress as well.

Note: wideint.d is referenced for a compression test; you can change the script to use something else, or copy something in and just fill the file.

https://github.com/rtcvb32/Side-Projects/commit/452532904ca2d43af09bced31854eadc0395f406
Oct 17 2016
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Mon, 23 May 2016 12:04:48 +0000
schrieb Stefan Koch <uplink.coder googlemail.com>:

 The method for archiving perfect compression it outlined here:
 https://github.com/Cyan4973/lz4/issues/183
Nice, if it can keep the compression speed up. -- Marco
May 23 2016
prev sibling next sibling parent reply Martin Nowak <code dawg.eu> writes:
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 The current one is effective, but slow:

   
 https://github.com/dlang/dmd/blob/master/src/backend/compress.c
Yes, LZO, LZ4, or snappy would be good choices. They are real-time compression algorithms, i.e. it's faster to compress and write to main memory than to write uncompressed.

Please don't write anything on your own. There is a meta-library for those algorithms, http://blosc.org/, for which I added Deimos headers: http://code.dlang.org/packages/d-blosc.
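For reference, pushing one symbol through blosc could look roughly like this sketch (the C prototypes are copied inline for illustration -- in practice they come from the d-blosc bindings -- and the helper name and fallback behaviour are made up):

extern (C)
{
    void blosc_init();
    void blosc_destroy();
    int blosc_set_compressor(const(char)* compname);
    int blosc_compress(int clevel, int doshuffle, size_t typesize,
                       size_t nbytes, const(void)* src, void* dest, size_t destsize);
}
enum BLOSC_MAX_OVERHEAD = 16;   // headroom required by blosc.h

// Compress a symbol with blosc's LZ4 backend; fall back to the raw bytes
// if compression doesn't help (blosc_compress returns 0 or negative).
ubyte[] bloscPack(const(ubyte)[] symbol)
{
    blosc_init();
    scope (exit) blosc_destroy();
    blosc_set_compressor("lz4");

    auto dst = new ubyte[symbol.length + BLOSC_MAX_OVERHEAD];
    const n = blosc_compress(1 /*clevel*/, 0 /*no shuffle*/, 1 /*typesize*/,
                             symbol.length, symbol.ptr, dst.ptr, dst.length);
    return n > 0 ? dst[0 .. n] : symbol.dup;
}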
May 22 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 22 May 2016 at 12:12:09 UTC, Martin Nowak wrote:

 Yes, LZO, LZ4, or snappy would be good choices. They are 
 real-time compression algorithms, i.e. it's faster to compress 
 and write to main memory than w/o compression.
Yes, the LZ compressors are very good for this use case.
 Please don't write anything on your own. There is a meta 
 library for those algorithms http://blosc.org/, for which I 
 added Deimos headers http://code.dlang.org/packages/d-blosc.
Agreed. The existing implementations have been fine-tuned for performance, and we should use them as leverage.
May 22 2016
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote:
 The current one is effective, but slow:

   
 https://github.com/dlang/dmd/blob/master/src/backend/compress.c

 Anyone want to make a stab at making it faster? Changing the 
 format is fair game, as well as down and dirty assembler if 
 that's what it takes.

 So really, how good are you at fast code?
Mandatory goldmine on compression: http://fastcompression.blogspot.fr/
May 22 2016