digitalmars.D.learn - Make shared static this() encoding table compilable
- zhad3 (24/24) Mar 14 2022 Hey everyone, I am in need of some help. I have written this
- Basile B. (4/20) Mar 14 2022 That's a compiler bug of type "ICE", the compiler crashes.
- zhad3 (5/8) Mar 16 2022 Thank you and sorry for the late reply, I have been quite busy.
- bauss (13/37) Mar 14 2022 I think it's a memory issue and it's unlikely to be solved.
- Ali Çehreli (13/14) Mar 14 2022 I could not reproduce the issue but it takes close to 1 minute for 'dmd
- Ali Çehreli (47/53) Mar 14 2022 Yes, better but not much: 37 seconds vs. 50+ seconds on my system.
- zhad3 (24/35) Mar 16 2022 Thank you for this. This works although I could not reach the
- zhad3 (5/17) Mar 16 2022 Thank you, that's unfortunate if true. I don't know if the
- rikki cattermole (5/5) Mar 14 2022 The recommended solution by Unicode is to use Trie tables for Look Up
- zhad3 (4/9) Mar 16 2022 Thank you for this hint. I'll have to check whether this is
- Salih Dincer (68/77) Mar 14 2022 OMG, I gasp at my computer screen and waited for minutes. :)
- zhad3 (4/12) Mar 16 2022 Thank you, but wouldn't using the DMD backend make it so that it
- Salih Dincer (3/21) Mar 16 2022 No need for all files. Just aarray.d is enough.
- Patrick Schluter (6/30) Mar 17 2022 Why not use a simple static array (not an associative array).
- Patrick Schluter (46/83) Mar 17 2022 Something akin to
- Patrick Schluter (5/15) Mar 17 2022 Takes 165 ms to compile with dmd 2.094.2 -O on [godbolt] with the
- Patrick Schluter (4/20) Mar 17 2022 Upps, remove the ] at the end of the link to [godbolt]
Hey everyone, I am in need of some help. I have written this Windows CP949 encoding table https://github.com/zhad3/zencoding/blob/main/windows949/source/zencoding/windows949/table.d which is used to convert CP949 to UTF-16.

After some research about how to initialize immutable associative arrays, I found that people suggested using `shared static this()`. So far this has worked for me, but I recently discovered that DMD cannot compile this in release mode with optimizations.

`dub build --build=release` or `dmd` with `-release -O` fails:
```
code windows949
function zencoding.windows949.fromWindows949!(immutable(ubyte)[]).fromWindows949
code table
function zencoding.windows949.table._sharedStaticCtor_L29_C1
dmd failed with exit code -11.
```
I usually compile my projects using LDC, where this works fine, but I don't want to force others to use LDC because of this one problem.

Hence I'd like to ask how to change the code so that it compiles on DMD in release mode (with optimizations).

I thought about using a computational algorithm instead of an encoding table, but sadly I could not find any references in that regard. Encoding tables seem to be the standard.
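For readers unfamiliar with the idiom mentioned above, here is a minimal sketch of initializing an immutable associative array in `shared static this()`. The name `demoTable` and the three entries are illustrative stand-ins, not the contents of the real `table.d`; the real table has thousands of entries, which is where the DMD problem shows up.

```d
import std.stdio : writefln;

// Hypothetical three-entry stand-in for the real CP949 table.
immutable ushort[ushort] demoTable;

// Runs once at program startup, before main(). Inside the module
// constructor the immutable variable may still be assigned.
shared static this()
{
    demoTable = [
        ushort(0x8141): ushort(0xAC02),
        ushort(0x8142): ushort(0xAC03),
        ushort(0x8143): ushort(0xAC05),
    ];
}

void main()
{
    // `in` yields a pointer to the value, or null if the key is absent.
    if (auto p = ushort(0x8142) in demoTable)
        writefln("0x%04X", *p);
}
```

With only a handful of entries this compiles quickly; the sketch shows the shape of the code, not the failure.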
Mar 14 2022
On Monday, 14 March 2022 at 09:40:00 UTC, zhad3 wrote:
> [...]
> `dub build --build=release` or `dmd` with `-release -O` fails:
> ```
> code windows949
> function zencoding.windows949.fromWindows949!(immutable(ubyte)[]).fromWindows949
> code table
> function zencoding.windows949.table._sharedStaticCtor_L29_C1
> dmd failed with exit code -11.
> ```

That's a compiler bug of type "ICE" (internal compiler error): the compiler itself crashes. Try reducing it to a simple module that does not use Phobos and report it to Bugzilla.
Mar 14 2022
On Monday, 14 March 2022 at 10:07:52 UTC, Basile B. wrote:
> That's a compiler bug of type "ICE", the compiler crashes. Try reducing to a simple module that does not use phobos and report to bugzilla.

Thank you and sorry for the late reply, I have been quite busy.

You can test this already with just the `table.d` file: `dmd -release -O table.d` fails on my computer with `DMD64 D Compiler v2.098.1`.
Mar 16 2022
On Monday, 14 March 2022 at 09:40:00 UTC, zhad3 wrote:
> [...]
> So far this worked for me, but I recently discovered that DMD cannot compile this in release mode with optimizations.
> [...]

I think it's a memory issue and it's unlikely to be solved.

I saw a similar issue a while ago where it worked with everything but DMD.

Someone can correct me, but if I remember correctly it's because DMD emits instructions for each value (or something like that) in the static array and thus runs out of memory before any optimization can happen, whereas LDC etc. don't have that issue.

I can't remember exactly how it is, but I think it's something along those lines.

I don't think there really is a workaround as of now, and there probably never will be.
Mar 14 2022
On 3/14/22 03:23, bauss wrote:
> I think it's a memory issue and it's unlikely to be solved.

I could not reproduce the issue but it takes close to 1 minute for 'dmd -O'. Something is definitely wrong there. :)

A workaround could be the -lowmem switch:

    dmd -O -lowmem ...

But still, I would find a different method for the compilation time alone. I would experiment with two arrays holding corresponding keys and values separately:

    ushort[] keys = /* ... */;
    ushort[] values = /* ... */;

And then building the AA from those. Hopefully, -O works better for that case.

Ali
Mar 14 2022
On 3/14/22 11:36, Ali Çehreli wrote:
> I would experiment with two arrays holding corresponding keys and values separately:
>
>     ushort[] keys = /* ... */;
>     ushort[] values = /* ... */;
>
> And then building the AA from those. Hopefully, -O works better for that case.

Yes, better but not much: 37 seconds vs. 50+ seconds on my system.

Even though I am pretty sure the OP has access to the keys and the values separately, if it helps, I used the following code to separate the keys and values:

    import std.stdio;
    import std.algorithm;
    import std.range;

    // Note: cp949_table must be in scope here, i.e. this is compiled
    // together with the original table module.
    void main() {
      auto f = File("deleteme.d", "w");

      // Nested range format: eight "0xXXXX," items per output line.
      enum lineFormat = "%-( %-( 0x%04X,%|%)\n%)";

      auto keys = cp949_table.keys.sort;
      auto values = keys.map!(key => cp949_table[key]);

      f.writefln!("ushort[] keys = [\n" ~ lineFormat ~ "\n];")(keys.chunks(8));
      f.writefln!("ushort[] values = [\n" ~ lineFormat ~ "\n];")(values.chunks(8));
    }

That program will produce a deleteme.d. Then I copy-pasted the generated keys and values into the following code:

    pure auto make_cp949_table() {
      ushort[] keys = [
        0x8141, 0x8142, 0x8143, 0x8144, 0x8145, 0x8146, 0x8147, 0x8148,
        // ...
        0xFDF7, 0xFDF8, 0xFDF9, 0xFDFA, 0xFDFB, 0xFDFC, 0xFDFD, 0xFDFE,
      ];

      ushort[] values = [
        0xAC02, 0xAC03, 0xAC05, 0xAC06, 0xAC0B, 0xAC0C, 0xAC0D, 0xAC0E,
        // ...
        0x7199, 0x71B9, 0x71BA, 0x72A7, 0x79A7, 0x7A00, 0x7FB2, 0x8A70,
      ];

      /* The following failed with segmentation fault during compilation:
      import std.array : assocArray;
      return assocArray(keys, values);
      */

      import std.range : zip;

      ushort[ushort] result;
      foreach (t; zip(keys, values)) {
        result[t[0]] = t[1];
      }
      return result;
    }

    shared static this() {
      cp949_table = make_cp949_table();
    }

Yeah, dmd's -O performance with those tables is still very poor.

Ali
Mar 14 2022
On Monday, 14 March 2022 at 19:05:41 UTC, Ali Çehreli wrote:
> Yes, better but not much: 37 seconds vs. 50+ seconds on my system.
>
> Even though I am pretty sure the OP has access to the keys and the values separately, if it helps, I used the following code to separate the keys and values:
>
> [snip]
>
>     shared static this() {
>       cp949_table = make_cp949_table();
>     }
>
> Yeah, dmd's -O performance with those tables is still very poor.
>
> Ali

Thank you for this. This works, although I could not reach the same speed as you. I wonder if people with less memory than me (16 GB) will still be able to compile it.

DMD (DMD64 D Compiler v2.098.1)
```
$ time dub test --compiler=dmd --build=release

real    2m11,438s
user    2m11,134s
sys     0m0,153s
```

LDC (LDC - the LLVM D compiler (1.26.0): based on DMD v2.096.1 and LLVM 7.0.1)
```
$ time dub test --compiler=ldc2 --build=release

real    0m18,466s
user    0m18,099s
sys     0m0,151s
```

But this is definitely better than failing with an error. I have not yet tried the -lowmem flag. I'll try that later.

I think I'll use your solution for now, as it works, and hopefully people won't have to recompile everything so often. Usually this should just get compiled once and that's it :)
Mar 16 2022
On Monday, 14 March 2022 at 10:23:18 UTC, bauss wrote:
> I think it's a memory issue and it's unlikely to be solved.
> [...]
> I don't think there really is a workaround as of now and probably never will be.

Thank you, that's unfortunate if true. I don't know if the solution provided by Ali just mitigates the problem (in the sense that less memory is being used) but for now it works on my machine(tm).
Mar 16 2022
The solution recommended by Unicode is to use trie tables for lookup tables (LUTs).

https://en.wikipedia.org/wiki/Trie

You can generate these as read-only global arrays, and they are very fast for this.
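To make the suggestion concrete, here is a rough sketch of a two-stage lookup table, which can be seen as a trie of depth two flattened into read-only arrays. It is not taken from the thread or from any Unicode reference implementation. The names `stage1`, `stage2`, `noBlock` and `lookup` are made up for illustration, only the lead byte 0x81 is populated, and the sketch assumes that 0 never occurs as a mapped value, so 0 can mean "no mapping". A real table would be emitted by a generator program as plain array literals.

```d
import std.stdio : writefln;

// Marker for lead bytes that have no block (illustrative assumption).
enum ubyte noBlock = 0xFF;

// Stage 1: lead byte -> block number. Built at compile time via CTFE.
immutable ubyte[256] stage1 = () {
    ubyte[256] t = noBlock;   // every lead byte starts out unmapped
    t[0x81] = 0;              // lead byte 0x81 uses block 0
    return t;
}();

// Stage 2: one block of 256 entries per populated lead byte.
// Entries left at 0 mean "no mapping" (assumption of this sketch).
immutable ushort[256][1] stage2 = () {
    ushort[256][1] b;
    b[0][0x41] = 0xAC02;
    b[0][0x42] = 0xAC03;
    b[0][0x43] = 0xAC05;
    return b;
}();

// Two array reads per lookup; returns 0 when the key is unmapped.
ushort lookup(ushort key) pure nothrow @nogc @safe
{
    immutable block = stage1[key >> 8];
    if (block == noBlock)
        return 0;
    return stage2[block][key & 0xFF];
}

void main()
{
    writefln("0x%04X", lookup(0x8142));
}
```

Lead bytes that map nothing cost only one byte in `stage1`, so sparse regions of the key space do not need a full block.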
Mar 14 2022
On Monday, 14 March 2022 at 22:20:42 UTC, rikki cattermole wrote:
> The recommended solution by Unicode is to use Trie tables for Look Up Tables (LUTs).
>
> https://en.wikipedia.org/wiki/Trie
>
> You can generate these as read only global arrays and are very fast for this.

Thank you for this hint. I'll have to check whether this is applicable in this context. For example, this lookup table appears to be a bijection.
Mar 16 2022
On Monday, 14 March 2022 at 09:40:00 UTC, zhad3 wrote:[...] I usually compile my projects using LDC where this works fine, but I don't want to force others to use LDC because of this one problem. Hence I'd like to ask on how to change the code so that it compiles on DMD in release mode (with optimizations). I thought about having a computational algorithm instead of an encoding table but sadly I could not find any references in that regard. Apparently encoding tables seem to be the standard.OMG, I gasp at my computer screen and waited for minutes. :) When you edit the code at the back-end level, you can use system resources in the best way. I think you should start with [dlang.dmd.backend.aarray](https://github.com/dlang/dmd/blob/master/src/dmd/backend/aarray.d) If we use the following codes with Ali's code by separate the keys and values, it compiles fast on DMD and works correctly: ```d import std.stdio; import dmd.backend.aarray; import zencoding.windows949; struct Make_CP949Table(T) { private AArray!(Tinfo!T, T) aa; this(T[] keys, T[] values) { foreach (i, T value; values) { T * set = aa.get(&keys[i]); *set = value; } aa.rehash(); } T* opBinaryRight(string op)(T index) if (op == "in") { T* key = aa.get(&index); if(*key > 0) return key; return null; } T get(T key) { return *aa.get(&key); } size_t length() { return aa.nodes; } } Make_CP949Table!ushort cp949_table; shared static this() { cp949_table = Make_CP949Table!ushort(keys, values); } // Ali had already prepared these for you ----------------^ void main() { const(ubyte[]) cp949 = [ 0x64, 0x61, 0x74, 0x61, 0x5C, 0x69, 0x6D, 0x66, 0x5C, 0xB1, 0xB8, 0xC6, 0xE4, 0xC4, 0xDA, 0x5F, 0xC5, 0xA9, 0xB7, 0xE7, 0xBC, 0xBC, 0xC0, 0xCC, 0xB4, 0xF5, 0x5F, 0xB3, 0xB2, 0x2E, 0x69, 0x6D, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]; const(ushort[]) utf16 = [ 0x64, 0x61, 0x74, 0x61, 0x5C, 0x69, 0x6D, 0x66, 0x5C, 0xAD6C, 0xD398, 0xCF54, 0x5F, 0xD06C, 0xB8E8, 0xC138, 0xC774, 0xB354, 0x5F, 0xB0A8, 0x2E, 0x69, 0x6D, 0x66]; cp949.fromWindows949.writeln; // data\imf\구페코_크루세이더_남.imf } ``` SDB 79
Mar 14 2022
On Tuesday, 15 March 2022 at 03:01:05 UTC, Salih Dincer wrote:
> OMG, I gasp at my computer screen and waited for minutes. :)
>
> When you edit the code at the back-end level, you can use system resources in the best way. I think you should start with [dlang.dmd.backend.aarray](https://github.com/dlang/dmd/blob/master/src/dmd/backend/aarray.d)
>
> If we use the following codes with Ali's code by separate the keys and values, it compiles fast on DMD and works correctly:
>
> [snip]
>
> SDB 79

Thank you, but wouldn't using the DMD backend make it so that it won't compile with LDC? I am not that knowledgeable in compiler internals so I don't know.
Mar 16 2022
On Wednesday, 16 March 2022 at 18:40:35 UTC, zhad3 wrote:
> On Tuesday, 15 March 2022 at 03:01:05 UTC, Salih Dincer wrote:
>> [...]
>
> Thank you, but wouldn't using the DMD backend make it so that it won't compile with LDC? I am not that knowledgeable in compiler internals so I don't know.

No need for all files. Just aarray.d is enough.

SDB 79
Mar 16 2022
On Monday, 14 March 2022 at 09:40:00 UTC, zhad3 wrote:
> [...]
> Hence I'd like to ask on how to change the code so that it compiles on DMD in release mode (with optimizations).
> [...]

Why not use a simple static array (not an associative array), where the values are indexed by `key - min(keys)`? Even with the holes in the keys (i.e. keys that have no corresponding values), it will be smaller than the constructed associative array. The lookup is also faster.
Mar 17 2022
On Thursday, 17 March 2022 at 11:36:40 UTC, Patrick Schluter wrote:
> On Monday, 14 March 2022 at 09:40:00 UTC, zhad3 wrote:
>> [...]
>
> Why not use a simple static array (not an associative array). Where the values are indexed on `key - min(keys)`. Even with the holes in the keys (i.e. keys that do not have corresponding values) it will be smaller that the constructed associative array? The lookup is also faster.

Something akin to
```d
auto lookup(ushort key)
{
    return cp949[key-0x8141];
}

immutable ushort[0xFDFE-0x8141+1] cp949 = [
    0x8141-0x8141: 0xAC02,
    0x8142-0x8141: 0xAC03,
    0x8143-0x8141: 0xAC05,
    0x8144-0x8141: 0xAC06,
    0x8145-0x8141: 0xAC0B,
    0x8146-0x8141: 0xAC0C,
    0x8147-0x8141: 0xAC0D,
    0x8148-0x8141: 0xAC0E,
    0x8149-0x8141: 0xAC0F,
    0x814A-0x8141: 0xAC18,
    0x814B-0x8141: 0xAC1E,
    0x814C-0x8141: 0xAC1F,
    0x814D-0x8141: 0xAC21,
    0x814E-0x8141: 0xAC22,
    0x814F-0x8141: 0xAC23,
    0x8150-0x8141: 0xAC25,
    0x8151-0x8141: 0xAC26,
    0x8152-0x8141: 0xAC27,
    0x8153-0x8141: 0xAC28,
    0x8154-0x8141: 0xAC29,
    0x8155-0x8141: 0xAC2A,
    0x8156-0x8141: 0xAC2B,
    0x8157-0x8141: 0xAC2E,
    0x8158-0x8141: 0xAC32,
    0x8159-0x8141: 0xAC33,
    0x815A-0x8141: 0xAC34,
    0x8161-0x8141: 0xAC35,
    0x8162-0x8141: 0xAC36,
    0x8163-0x8141: 0xAC37,
    // ...
    0xFDFA-0x8141: 0x72A7,
    0xFDFB-0x8141: 0x79A7,
    0xFDFC-0x8141: 0x7A00,
    0xFDFD-0x8141: 0x7FB2,
    0xFDFE-0x8141: 0x8A70,
];
```
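One thing the offset-indexed array leaves open is what happens for keys outside the range or in a hole. The following is a small sketch of a guarded lookup, not part of the original post. It uses a ten-slot excerpt (keys 0x8159 to 0x8162, taken from the entries quoted in the thread, with the hole at 0x815B to 0x8160), and the names `firstKey`, `lastKey`, `excerpt` and `tryLookup` are hypothetical. It also assumes that 0 is never a valid mapped value, so a default-initialized slot can stand for "no mapping".

```d
import std.stdio : writefln;

// Hypothetical excerpt bounds; the full table would span 0x8141 .. 0xFDFE.
enum ushort firstKey = 0x8159;
enum ushort lastKey  = 0x8162;

// Indexed static-array initializer: slots not listed stay at 0.
immutable ushort[lastKey - firstKey + 1] excerpt = [
    0x8159 - firstKey: 0xAC33,
    0x815A - firstKey: 0xAC34,
    // 0x815B .. 0x8160 have no mapping and remain 0
    0x8161 - firstKey: 0xAC35,
    0x8162 - firstKey: 0xAC36,
];

// Returns true and sets `value` only when `key` has a mapping.
// Assumption: 0 never appears as a real mapped value.
bool tryLookup(ushort key, out ushort value) pure nothrow @nogc @safe
{
    if (key < firstKey || key > lastKey)
        return false;
    value = excerpt[key - firstKey];
    return value != 0;
}

void main()
{
    ushort v;
    if (tryLookup(0x8161, v))
        writefln("0x%04X", v);
}
```

The range check replaces the associative array's `in` test, and the zero sentinel covers the holes without needing a second table.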
Mar 17 2022
On Thursday, 17 March 2022 at 12:11:19 UTC, Patrick Schluter wrote:
> On Thursday, 17 March 2022 at 11:36:40 UTC, Patrick Schluter wrote:
>> [...]
>
> Something akin to
> [...]

Takes 165 ms to compile with dmd 2.094.2 -O on [godbolt] with the whole table generated from the Unicode link.

[godbolt]: https://godbolt.org/z/hEzP7rKnn]
Mar 17 2022
On Thursday, 17 March 2022 at 12:19:36 UTC, Patrick Schluter wrote:
> Takes 165 ms to compile with dmd 2.094.2 -O on [godbolt] with the whole table generated from the Unicode link.
>
> [godbolt]: https://godbolt.org/z/hEzP7rKnn]

Upps, remove the ] at the end of the link to [godbolt].

[godbolt]: https://godbolt.org/z/hEzP7rKnn
Mar 17 2022