digitalmars.D.announce - Ecoji-d v1.0.0 is released - Base1024 using emojis
- Anton Fediushin (31/31) Mar 14 2018 ๐, I'm glad to announce that ecoji-d - pure D implementation of
- bauss (3/35) Mar 15 2018 Fun, but seems pretty useless in practice.
- Anton Fediushin (36/37) Mar 15 2018 I disagree. Ecoji (base1024) has bigger character set meaning
- bauss (13/50) Mar 16 2018 If you care about size of data then you're not going to encode
- Anton Fediushin (7/13) Mar 18 2018 Well, encoding is not *mine*, only D implementation is. What do
- bauss (5/14) Mar 18 2018 Yes, but that makes your example pointless, because having to
- Rainer Schuetze (3/6) Mar 16 2018 If you can compress random data to 52% of the original data, you should
- Manu (6/8) Mar 17 2018 This doesn't make sense. For every 10 bits, you're emitting 32 bits...
- Cym13 (59/69) Mar 18 2018 Randomness isn't compressible. The fact that ecoji-d compresses
- Anton Fediushin (4/5) Mar 18 2018 Indeed, there's an error somewhere. For some reason it stops
- Faux Amis (2/16) Mar 17 2018 Useful feature: Easy manual verification.
- Abdulhaq (3/6) Mar 18 2018 Congratulations, it's a nice bit of fun.
I'm glad to announce that ecoji-d - pure D implementation of ecoji encoding version 1️⃣.0️⃣.0️⃣ is finally released.

What is ecoji? Ecoji encodes data as base1024 with an emoji character set. It can be used instead of boring and old base64 🤮🤮🤮.

Encoding example:
---
$ echo "Base64 is so 1999, isn't there something better?" | ecoji-d
๐๐ฉ๐ฆ๐๐๐๐ฏ๐๐๐ฝ๐๐๐ฑ๐ฅ๐๐ฑ๐๐ญ๐ฎ๐ต๐ข๐ฅ๐ญ๐ธ๐๐ฒ๐ฆ๐ถ๐ข๐ฅ๐ฎ๐บ๐๐ธ๐ฎ๐ผ๐ฆ๐๐ฅด๐
---
And decoding:
---
$ echo -n "๐๐ฉ๐ฆ๐๐๐๐ฏ๐๐๐ฝ๐๐๐ฑ๐ฅ๐๐ฑ๐๐ญ๐ฎ๐ต๐ข๐ฅ๐ญ๐ธ๐๐ฒ๐ฆ๐ถ๐ข๐ฅ๐ฎ๐บ๐๐ธ๐ฎ๐ ผ๐ฆ๐๐ฅด๐" | ecoji-d -d
Base64 is so 1999, isn't there something better?
---
Ecoji-d's features:
- Range interface
- Lazy encoding/decoding
- Low memory usage
- @safe and pure when possible
- Many tests
- Can be used as a library and as a CLI utility

API consists of just 2️⃣ functions:
- `encode`, which does encoding
- `decode`, which does decoding

Links:
- DUB package page: http://code.dlang.org/packages/ecoji-d
- GitHub repository: https://github.com/ohdatboi/ecoji-d
- GitHub repository of the reference Go implementation: https://github.com/keith-turner/ecoji
Mar 14 2018
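The announcement above describes ecoji as base1024, which means each output symbol carries 10 bits. The following is only a rough sketch of that bit packing, not the ecoji-d API: the function name `to_base1024_indices` is hypothetical, the grouping of five input bytes into four big-endian 10-bit values is an assumption, and the mapping from index to emoji (as well as the padding markers a real encoder needs) is left out.

```python
def to_base1024_indices(data: bytes) -> list:
    """Split data into 10-bit values (0..1023), five input bytes at a time."""
    indices = []
    for start in range(0, len(data), 5):
        chunk = data[start:start + 5]
        # Zero-pad a short final chunk so it still fills 40 bits.
        # A real encoder must also record how much padding was added;
        # that bookkeeping is omitted in this sketch.
        value = int.from_bytes(chunk.ljust(5, b"\x00"), "big")
        # 40 bits -> four 10-bit indices, most significant first.
        indices.extend((value >> shift) & 0x3FF for shift in (30, 20, 10, 0))
    return indices


print(to_base1024_indices(b"abcde"))  # [389, 550, 217, 101]
```

Each index would then select one symbol from a 1024-entry emoji table, which is why five input bytes come out as four emoji.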
On Wednesday, 14 March 2018 at 17:30:18 UTC, Anton Fediushin wrote:
> I'm glad to announce that ecoji-d - pure D implementation of ecoji encoding version 1️⃣.0️⃣.0️⃣ is finally released [...]

Fun, but seems pretty useless in practice.
Mar 15 2018
On Thursday, 15 March 2018 at 09:32:50 UTC, bauss wrote:
> Fun, but seems pretty useless in practice.

I disagree. Ecoji (base1024) has a bigger character set, meaning that it can encode more information per emoji than base64 can encode per character. For example, ecoji-encoded "abcde" looks like this: "๐๐ธ๐ฆ๐ญ", and the base64-encoded one looks like this: "YWJjZGU=".

Even though each emoji is 4 bytes long, there is a noticeable difference in size when we are talking about larger chunks of data:
---
$ dd if=/dev/urandom bs=4K count=16K of=test.raw
16384+0 records in
16384+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 1.90423 s, 35.2 MB/s
$ dd if=test.raw | ./ecoji-d | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 6.7699 s, 9.9 MB/s
$ dd if=test.raw | base64 | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 0.750174 s, 89.5 MB/s
---
And if we move to real-world scenarios, where web pages are gzipped most of the time:
---
$ dd if=test.raw | gzip -c | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 5.49022 s, 12.2 MB/s
$ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s
$ dd if=test.raw | base64 | gzip -c | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 10.3381 s, 6.5 MB/s
---
So yeah, ecoji is better than base64 in everything but speed. Speed will be improved. Later.
Mar 15 2018
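The "abcde" comparison in the post above counts characters, while `wc -c` counts bytes. A small sketch can make the two measures explicit. It assumes four output symbols per five input bytes and 4 bytes of UTF-8 per emoji (the figure given in the post); it does not call ecoji-d itself.

```python
import base64

payload = b"abcde"

b64 = base64.b64encode(payload)   # b"YWJjZGU="
b64_chars = len(b64)              # ASCII, so characters == bytes
b64_bytes = len(b64)

# Assumption: 4 emoji per 5 input bytes, 4 UTF-8 bytes per emoji.
emoji_count = 4 * -(-len(payload) // 5)   # ceiling division
emoji_bytes = 4 * emoji_count

print(f"base64: {b64_chars} chars, {b64_bytes} bytes")   # 8 chars, 8 bytes
print(f"ecoji:  {emoji_count} chars, {emoji_bytes} bytes")  # 4 chars, 16 bytes
```

Under these assumptions the emoji form is shorter in characters but longer in bytes, so which one looks "smaller" depends on what is being counted.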
On Thursday, 15 March 2018 at 18:45:51 UTC, Anton Fediushin wrote:
> On Thursday, 15 March 2018 at 09:32:50 UTC, bauss wrote:
>> Fun, but seems pretty useless in practice.
>
> I disagree. [...]

If you care about the size of data then you're not going to encode anyway. Same goes for speed.

Besides, your encoding isn't going to work with actual web pages anyway, because your encoder doesn't have browser support. Sure, you can encode your data and gzip it, but once it reaches the browser and it unzips it, then what? The browser doesn't know what to do with the data. You can't even use base64 for HTTP headers.

At most it could be used for email clients, since they do support "Content-Transfer-Encoding", but browsers don't. They only support "Content-Encoding", which at most can be compressions such as gzip.
Mar 16 2018
On Friday, 16 March 2018 at 08:25:30 UTC, bauss wrote:
> Besides your encoding isn't going to work with actual web-pages anyway, because your encoder doesn't have browser support.

Well, the encoding is not *mine*, only the D implementation is. What do you mean by "browser support"? Indeed, ecoji-d cannot be used on the client side, but since the algorithm is simple and the code is publicly available, anyone can implement decoding in JavaScript or any other language.

> Sure you can encode your data and gzip it, but once it reaches the browser and it unzips it, then what? The browser doesn't know what to do with the data. You can't even use base64 for http headers.

Then you use a client-side decoder, of course!
Mar 18 2018
On Sunday, 18 March 2018 at 12:51:23 UTC, Anton Fediushin wrote:
> On Friday, 16 March 2018 at 08:25:30 UTC, bauss wrote:
>> Besides your encoding isn't going to work with actual web-pages anyway, because your encoder doesn't have browser support.
>
> Well, encoding is not *mine*, only D implementation is. What do you mean by "browser support"? Indeed, ecoji-d cannot be used on the client side, but since algorithm is simple and code is publically available anyone can implement decoding in JavaScript or any other language.

Yes, but that makes your example pointless, because having to decode in JavaScript is not exactly something that anybody in their right mind would ever do with a webpage or anything like that anyway.
Mar 18 2018
On 15/03/2018 19:45, Anton Fediushin wrote:
> $ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s

If you can compress random data to 52% of the original data, you should repeat this step until there is a single byte left.
Mar 16 2018
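The objection above rests on the fact that random bytes have no redundancy for a compressor to remove. A quick way to see this is a sketch using Python's `zlib` (DEFLATE, the same algorithm gzip uses) as a stand-in for the `gzip -c` step in the thread; the 1 MiB size is an arbitrary choice for illustration.

```python
import os
import zlib

raw = os.urandom(1 << 20)         # 1 MiB of random bytes
packed = zlib.compress(raw, 9)    # maximum compression level

ratio = len(packed) / len(raw)
print(f"compressed/original = {ratio:.4f}")

# The round trip must give back exactly what went in.
assert zlib.decompress(packed) == raw
```

The ratio should come out at roughly 1.0 or slightly above, because the container adds a little overhead. A result far below that for random input would suggest that data went missing somewhere in the pipeline rather than that real compression took place.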
On 15 March 2018 at 11:45, Anton Fediushin via Digitalmars-d-announce <digitalmars-d-announce puremagic.com> wrote:
> Even though each emoji is 4 bytes long, there is a noticeable difference in size when we are talking about larger chunks of data:

This doesn't make sense. For every 10 bits, you're emitting 32 bits... you're more than tripling the size of the data. Base64 takes 6 bits and emits 8 bits, which is a third larger. 1.333x is smaller than 3.2x. O_o
Mar 17 2018
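The arithmetic in the post above can be written out as a short calculation. This is a back-of-the-envelope sketch: it assumes every emoji takes 4 bytes in UTF-8 and ignores padding and the line breaks that the `base64` command inserts, so the projected sizes are approximate.

```python
from fractions import Fraction

# Output bits emitted per input bit.
base64_ratio = Fraction(8, 6)    # 6 input bits -> one 8-bit ASCII character
ecoji_ratio = Fraction(32, 10)   # 10 input bits -> one 4-byte UTF-8 emoji

size = 67108864  # bytes in the 64 MiB test file used in the thread

for name, ratio in (("base64", base64_ratio), ("ecoji", ecoji_ratio)):
    projected = int(size * ratio)
    print(f"{name}: x{float(ratio):.3f} -> about {projected} bytes")
```

For the 64 MiB file this gives roughly 89 MB for base64 and roughly 215 MB for the emoji encoding, before any gzip step.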
On Thursday, 15 March 2018 at 18:45:51 UTC, Anton Fediushin wrote:
> $ dd if=test.raw | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 5.49022 s, 12.2 MB/s
> $ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s
> $ dd if=test.raw | base64 | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 10.3381 s, 6.5 MB/s

Randomness isn't compressible. The fact that ecoji-d compresses anything above 1% shows only that there is a bug in your library:
```
$ dd if=/dev/urandom bs=4K count=16K of=test.raw
16384+0 records in
16384+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.373423 s, 180 MB/s
$ dd if=test.raw | ./ecoji-d | gzip -c | gzip -cd | ./ecoji-d -d > test2.raw
131072+0 records in
131072+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 24.9523 s, 2.7 MB/s
$ wc -c test.raw test2.raw
67108864 test.raw
11185155 test2.raw
```
So definitely not the same files before and after compression/decompression. However, the beginning is the same:
```
$ xxd test.raw | head
00000010: a05f c801 bf01 13c1 04a2 556a 6d79 a09c  ._........Ujmy..
00000020: 8032 523e 851d 419a b0d3 0c4f e7ba 93e1  .2R>..A....O....
00000030: 9fdc 7c55 2645 f6e7 3f9e f5db bc92 1e29  ..|U&E..?......)
00000040: 457a a3b9 c274 3b08 6bde 486a 1798 f281  Ez...t;.k.Hj....
00000050: 9d91 e97a f13f db8b 5d0c 114a 27be 2154  ...z.?..]..J'.!T
00000060: a9a2 3a17 36e4 9181 64f2 35b6 aa91 064d  ..:.6...d.5....M
00000070: 863a ddbd 8776 f87d 3eb2 634f 12dc 6e7f  .:...v.}>.cO..n.
00000080: 46c9 bc95 2620 b315 e84d 9ee4 8651 d172  F...& ...M...Q.r
00000090: 836d 7bf8 9e1c 09c3 0e10 b787 7e06 bc39  .m{.........~..9
$ xxd test2.raw | head
00000010: a05f c801 bf01 13c1 04a2 556a 6d79 a09c  ._........Ujmy..
00000020: 8032 523e 851d 419a b0d3 0c4f e7ba 93e1  .2R>..A....O....
00000030: 9fdc 7c55 2645 f6e7 3f9e f5db bc92 1e29  ..|U&E..?......)
00000040: 457a a3b9 c274 3b08 6bde 486a 1798 f281  Ez...t;.k.Hj....
00000050: 9d91 e97a f13f db8b 5d0c 114a 27be 2154  ...z.?..]..J'.!T
00000060: a9a2 3a17 36e4 9181 64f2 35b6 aa91 064d  ..:.6...d.5....M
00000070: 863a ddbd 8776 f87d 3eb2 634f 12dc 6e7f  .:...v.}>.cO..n.
00000080: 46c9 bc95 2620 b315 e84d 9ee4 8651 d172  F...& ...M...Q.r
00000090: 836d 7bf8 9e1c 09c3 0e10 b787 7e06 bc39  .m{.........~..9
```
So I think ecoji-d just truncates its input at some point.
Mar 18 2018
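The check performed by hand in the post above (compare sizes, then compare the leading bytes) can be automated. The helper below is a hypothetical sketch, not part of ecoji-d: `first_mismatch` is an invented name, and it compares two in-memory buffers, so a file as large as the 64 MiB test input would be better compared in chunks.

```python
def first_mismatch(original: bytes, restored: bytes):
    """Return the first offset where the buffers differ, or None if equal."""
    shared = min(len(original), len(restored))
    for i in range(shared):
        if original[i] != restored[i]:
            return i
    # All shared bytes match, so any difference is a length difference.
    return None if len(original) == len(restored) else shared


original = bytes(range(256)) * 4       # 1024 bytes of sample data
truncated = original[:700]             # simulates a decoder that stops early

print(first_mismatch(original, original))    # None
print(first_mismatch(original, truncated))   # 700
```

A result equal to the length of the shorter buffer is the signature of truncation: everything that was written is correct, but the output stops early. On the command line, `cmp test.raw test2.raw` performs a comparable check.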
On Sunday, 18 March 2018 at 11:25:45 UTC, Cym13 wrote:
> So I think ecoji-d just truncates its input at some point.

Indeed, there's an error somewhere. For some reason it stops after 7457792 bytes. I'll create an issue for that and will look into this later.
Mar 18 2018
On 2018-03-14 18:30, Anton Fediushin wrote:
> [...]
> Encoding example:
> ---
> $ echo "Base64 is so 1999, isn't there something better?" | ecoji-d
> ๐๐ฉ๐ฆ๐๐๐๐ฏ๐๐๐ฝ๐๐๐ฑ๐ฅ๐๐ฑ๐๐ญ๐ฎ๐ต๐ข๐ฅ๐ญ๐ธ๐๐ฒ๐ฆ๐ถ๐ข๐ฅ๐ฎ๐บ๐๐ธ๐ฎ ผ๐ฆ๐๐ฅด๐

Useful feature: Easy manual verification.
Mar 17 2018
On Wednesday, 14 March 2018 at 17:30:18 UTC, Anton Fediushin wrote:
> I'm glad to announce that ecoji-d - pure D implementation of ecoji encoding version 1️⃣.0️⃣.0️⃣ is finally released [...]

Congratulations, it's a nice bit of fun.
Mar 18 2018