digitalmars.D - kill the commas! (phobos code cleanup)
- ketmar via Digitalmars-d (9/9) Sep 03 2014 another code cleanup: removing "comma-separated expressions" from
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/4) Sep 03 2014 :-) Way to go! What's next? The exclamaition points?
- ketmar via Digitalmars-d (6/8) Sep 03 2014 no, "!" are nice. using "!" is like telling everyone "see, im SOOO
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (9/14) Sep 03 2014 So I have to annihilate the exclamation points from templates
- ketmar via Digitalmars-d (4/5) Sep 03 2014 ah, i'm not using unicode. my favorite editor (mcedit) is not good with
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (7/10) Sep 03 2014 That sucks! Now I had to do it myself. (I think you should
- Timon Gehr (3/4) Sep 03 2014 What about π? This is already valid code: auto π=3;
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (4/5) Sep 03 2014 Well, it is not for the main branch. I think it is better to have
- "Casper =?UTF-8?B?RsOmcmdlbWFuZCI=?= <shorttail hotmail.com> (4/11) Sep 03 2014 Maybe this would be a nice place to push tau into the library?
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (5/7) Sep 03 2014 Hm, haven't thought about that before, but I am usually
- ketmar via Digitalmars-d (6/9) Sep 03 2014 no-no-no-no! utf of any size is boring. i want my strings to be
- Marco Leise (16/27) Sep 06 2014 But there lies greatness in the unification of all locales
- ketmar via Digitalmars-d (7/8) Sep 06 2014 sure i am! ;-)
- monarch_dodra (9/20) Sep 06 2014 That sounds so much better than UTF-32.
- ketmar via Digitalmars-d (12/19) Sep 06 2014 why, in the name of hell, do you need UTF-32?! doesn't
- Marco Leise (20/43) Sep 06 2014 Dude! This is handled the same way sound fonts for Midi did
- ketmar via Digitalmars-d (6/11) Sep 06 2014 and hurts my eyes. i have a little background in typography, and mixing
- Marco Leise (31/43) Sep 06 2014 Japanese and Latin are already so far apart that the font
- ketmar via Digitalmars-d (12/19) Sep 06 2014 i prefer to not read the text i cannot understand. there is zero
- Marco Leise (10/32) Sep 06 2014 So because you see no use for Unicode (which is hard to
- ketmar via Digitalmars-d (10/14) Sep 07 2014 ah, i hate so-called "nls" too. and smart programs that tries to mess
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (8/15) Sep 07 2014 Whatever your motivation is, I'd say utf-8 is a blessing, and I
- ketmar via Digitalmars-d (5/6) Sep 07 2014 "efficient" and "utf-8" can't play together. in C i must scan the whole
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (16/17) Sep 07 2014 For western text strings utf-8 is much better due to cache
- ketmar via Digitalmars-d (16/30) Sep 07 2014 that's what i call efficiency! using SIMD for string indexing!
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (9/16) Sep 07 2014 Uhm, why? For a parser you are generally better off padding the
- ketmar via Digitalmars-d (13/21) Sep 07 2014 why do i need a useless argument (position in a string), if i have the
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (11/17) Sep 07 2014 If speed does not matter then you don't need a system level
- ketmar via Digitalmars-d (7/17) Sep 07 2014 D is not just system-level. and i love D templates and CTFE.
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/4) Sep 07 2014 Because you need to fetch data over a network.
- Marco Leise (21/35) Sep 07 2014 I'm not so convinced that many people would be happy with
- ketmar via Digitalmars-d (10/17) Sep 07 2014 no, you aren't right. ;-) i'm not a native English speaker (as you can
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (6/9) Sep 07 2014 Which is a good idea. If you also require SSE alignment (or cache
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/4) Sep 07 2014 Actually, latest gen AVX-512 can work on 64 bytes per instruction…
- Idan Arye (10/19) Sep 06 2014 Does it really matter? I don't really care if some UTF8 encoded
- Nick Treleaven (3/5) Sep 03 2014 It's better to just disallow them where they are bug prone:
- ketmar via Digitalmars-d (4/5) Sep 03 2014 that is exactly what i mean. `for ()` can be an exception though, to
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (9/16) Sep 03 2014 The conclusion was to make the comma operator return void when
- ketmar via Digitalmars-d (31/33) Sep 03 2014 i'm pretty sure that this can be done with a little hack in
- Daniel Murphy (5/8) Sep 03 2014 It can be done there, but it would not be 'correct'. The ast should att...
- ketmar via Digitalmars-d (5/9) Sep 03 2014 i can't see any sense in increasing compiler complexity for the feature
- Daniel Murphy (6/9) Sep 03 2014 Nothing gets removed quickly (full deprecation cycle is usually > 18
- ketmar via Digitalmars-d (5/7) Sep 03 2014 ah, really. it's more risk in introducing new flag to CommaExp and code
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (6/16) Sep 03 2014 The funny thing is, I had to split `CommaExp` into two classes
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (2/2) Sep 03 2014 I changed the PR as suggested by Daniel. Here is the new version:
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (3/15) Sep 03 2014 https://github.com/D-Programming-Language/phobos/pull/2485
another code cleanup: removing "comma-separated expressions" from phobos. https://issues.dlang.org/show_bug.cgi?id=13419 there is only one poisoned file in phobos and druntime: std.uni. there are some "return a, b;" abominations. KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS! (found with my "warning on comma-separated expressions" patch, so i can guarantee that nothing is left) KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!
Sep 03 2014
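For readers who haven't met the construct: here is a sketch of the kind of cleanup the patch performs. The `buildTable`/`cleaned` functions are hypothetical, not the actual std.uni code; they only illustrate the comma-expression pattern being removed and its straightforward rewrite.

```d
int[] buildTable() { return [1, 2, 3]; }   // stand-in for real setup work

// The old form used a comma expression in `return`, evaluating both
// operands and yielding only the last one:
//
//     return table = buildTable(), table.length;   // the "abomination"
//
// The cleanup simply splits it into ordinary statements:
size_t cleaned()
{
    auto table = buildTable();
    return table.length;   // same result, no side effect hidden in a comma
}

void main()
{
    assert(cleaned() == 3);
}
```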
On Wednesday, 3 September 2014 at 12:32:40 UTC, ketmar via Digitalmars-d wrote:KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!:-) Way to go! What's next? The exclamation points?
Sep 03 2014
On Wed, 03 Sep 2014 12:43:50 +0000 via Digitalmars-d <digitalmars-d puremagic.com> wrote:no, "!" are nice. using "!" is like telling everyone "see, im SOOO cool!" anyway, "comma-separated expressions" must be eliminated. ok, let 'em appear in `for()` for a while, but nowhere else.KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!:-) Way to go! What's next? The exclamation points?
Sep 03 2014
On Wednesday, 3 September 2014 at 12:52:07 UTC, ketmar via Digitalmars-d wrote:no, "!" are nice. using "!" is like telling everyone "see, im SOOO cool!"So I have to annihilate the exclamation points from templates myself. I'd like to have a unicode alternative such as: temp‹a,b› =>temp!(a,b) temp«a» => temp!("a") and while you are at it, please implement √x and √(x+y) as well. :)anyway, "comma-separated expressions" must be eliminated. ok, let 'em appear in `for()` for a while, but nowhere else.Yeah, I agree. They are only useful in C macros.
Sep 03 2014
On Wed, 03 Sep 2014 13:25:49 +0000 via Digitalmars-d <digitalmars-d puremagic.com> wrote:I'd like to have a unicode alternativeah, i'm not using unicode. my favorite editor (mcedit) is not good with unicode anyway and my locale is not utf. ;-)
Sep 03 2014
On Wednesday, 3 September 2014 at 13:35:48 UTC, ketmar via Digitalmars-d wrote:ah, i'm not using unicode. my favorite editor (mcedit) is not good with unicode anyway and my locale is not utf. ;-)That sucks! Now I had to do it myself. (I think you should upgrade to a decent editor on a decent OS and save me some unicode-work…;) Implementing "tmpl‹a,b›" and "tmpl«str»" was tedious, but I believe √ and π can be done in a heartbeat.
Sep 03 2014
On 09/03/2014 11:38 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:[...] I believe √ and π can be done in a heartbeat.What about π? This is already valid code: auto π=3;
Sep 03 2014
On Wednesday, 3 September 2014 at 21:45:16 UTC, Timon Gehr wrote:What about π? This is already valid code: auto π=3;Well, it is not for the main branch. I think it is better to have pi as a symbol so you can get rid of trigonometric-functions and get higher precision? Dunno. I see no point in redefining pi…
Sep 03 2014
On Wednesday, 3 September 2014 at 22:00:36 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 3 September 2014 at 21:45:16 UTC, Timon Gehr wrote:Maybe this would be a nice place to push tau into the library? http://www.tauday.com/What about π? This is already valid code: auto π=3;Well, it is not for the main branch. I think it is better to have pi as a symbol so you can get rid of trigonometric-functions and get higher precision? Dunno. I see no point in redefining pi…
Sep 03 2014
On Wednesday, 3 September 2014 at 22:03:50 UTC, Casper Færgemand wrote:Maybe this would be a nice place to push tau into the library? http://www.tauday.com/Hm, haven't thought about that before, but I am usually interested in 2π, true. Maybe it is possible to special case "2π" too. Hm.
Sep 03 2014
On Wed, 03 Sep 2014 21:38:55 +0000 via Digitalmars-d <digitalmars-d puremagic.com> wrote:That sucks! Now I had to do it myself. (I think you should upgrade to a decent editor on a decent OS and save me some unicode-work…;)no-no-no-no! utf of any size is boring. i want my strings to be indexable without any hidden function calls! ;-) i even did "native-encoded strings" patch (n"hello!") and made lexer don't complain about bad utf in comments. i love my one-byte locale!
Sep 03 2014
Am Thu, 4 Sep 2014 00:55:47 +0300 schrieb ketmar via Digitalmars-d <digitalmars-d puremagic.com>:On Wed, 03 Sep 2014 21:38:55 +0000 via Digitalmars-d <digitalmars-d puremagic.com> wrote:But there lies greatness in the unification of all locales into just one. All the need for encodings in HTTP that more often than not were incorrectly declared making browsers guess. Text files that looked like gibberish because they came from DOS or were written in another language that you even happen to speak, but now cannot decipher the byte mess. Or do you remember the mess that happened to file names with accented characters if they were sufficiently often copied between file systems? I'm all for performance, but different encodings on each computing platform and language just didn't work in the globalized world. You are a relic :) -- MarcoThat sucks! Now I had to do it myself. (I think you should upgrade to a decent editor on a decent OS and save me some unicode-work…;)no-no-no-no! utf of any size is boring. i want my strings to be indexable without any hidden function calls! ;-) i even did "native-encoded strings" patch (n"hello!") and made lexer don't complain about bad utf in comments. i love my one-byte locale!
Sep 06 2014
On Sat, 6 Sep 2014 11:48:53 +0200 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:You are a relic :)sure i am! ;-) i'm just waiting for 32-bit bytes. and while bytes are 8-bit, i'll use ebcdic^w one of available one-byte encodings. ;-) btw: are there fonts that can display all unicode? i doubt it (ok, maybe one). so we designed the thing that we can't really use. ;-)
Sep 06 2014
On Saturday, 6 September 2014 at 10:20:23 UTC, ketmar via Digitalmars-d wrote:On Sat, 6 Sep 2014 11:48:53 +0200 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:That sounds so much better than UTF-32.You are a relic :)sure i am! ;-) i'm just waiting for 32-bit bytes. and white bytes are 8-bit, i'll use ebcdic^w one of available one-byte encodings. ;-)btw: are there fonts that can display all unicode? i doubt it (ok, maybe one).Fonts are encoding agnostic, your point is irrelevant.so we designed the thing that can't really use. ;-)We can and do: unicode is the only thing that could process text that could come from any client on earth, without choking on any character. This is all done without the need for font-display, which is on the burden of the final client, and their respective local needs.
Sep 06 2014
On Sat, 06 Sep 2014 11:05:13 +0000 monarch_dodra via Digitalmars-d <digitalmars-d puremagic.com> wrote:That sounds so much better than UTF-32.why, in the name of hell, do you need UTF-32?! doesn't 0x10000000000000000 chars enough for everyone?!so where can i download font collection with fonts contains all unicode chars?btw: are there fonts that can display all unicode? i doubt it (ok, maybe one).Fonts are encoding agnostic, your point is irrelevant.This is all done without the need for font-displaythank you, but i don't need any text i can't display (and therefore read). i bet you don't need to process Thai, for example -- 'cause this requires much more than just character encoding convention. and bytes are encoding-agnostic.which is on the burden of the final client, and their respective local needs.hm... text processing software developed on systems which can't display processing text? wow! i want two!
Sep 06 2014
Am Sat, 6 Sep 2014 14:52:19 +0300 schrieb ketmar via Digitalmars-d <digitalmars-d puremagic.com>:On Sat, 06 Sep 2014 11:05:13 +0000 monarch_dodra via Digitalmars-d <digitalmars-d puremagic.com> wrote:Dude! This is handled the same way sound fonts for Midi did it. You can mix and match fonts to create the complete experience. If your version of "Arial" doesn't come with Thai symbols, you just install _any_ Asian font which includes those and it will automatically be used in places where your favorite font lacks symbols. Read this Wikipedia article from 2005 on it: http://en.wikipedia.org/wiki/Fallback_font In practice it is a solved problem, as you can see in your browser when you load a web site with mixed writing systems. If all else fails, there is usually something like this in place: http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=UnicodeBMPFallbackFont E.g. Missing symbols are replaced by a square with the hexadecimal code point. So the missing symbol can at least be identified correctly (and a matching font installed). -- MarcoThat sounds so much better than UTF-32.why, in the name of hell, do you need UTF-32?! doesn't 0x10000000000000000 chars enough for everyone?!so where can i download font collection with fonts contains all unicode chars?btw: are there fonts that can display all unicode? i doubt it (ok, maybe one).Fonts are encoding agnostic, your point is irrelevant.This is all done without the need for font-displaythank you, but i don't need any text i can't display (and therefore read). i bet you don't need to process Thai, for example -- 'cause this requires much more than just character encoding convention. and bytes are encoding-agnostic.which is on the burden of the final client, and their respective local needs.hm... text processing software developed on systems which can't display processing text? wow! i want two!
Sep 06 2014
On Sat, 6 Sep 2014 14:52:50 +0200 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:In practice it is a solved problem, as you can see in your browser when you load a web site with mixed writing systems.and hurts my eyes. i have a little background in typography, and mixing different fonts makes my eyes bleed.E.g. Missing symbols are replaced by a square with the hexadecimal code point. So the missing symbol can at least be identified correctly (and a matching font installed).this can't help me reading texts. really, i'm not a computer, i don't remember which unicode number corresponds to which symbol.
Sep 06 2014
Am Sat, 6 Sep 2014 15:52:09 +0300 schrieb ketmar via Digitalmars-d <digitalmars-d puremagic.com>:On Sat, 6 Sep 2014 14:52:50 +0200 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:Japanese and Latin are already so far apart that the font doesn't make much of a difference anymore, so long as it has similar size and hinting options. As for mixing writing systems there are of course dozens of use cases. Presenting an English website with links to localized versions labeled with each language's name, programs dealing with mathematical/technical symbols can use regular text allowing for easy copy&paste, instead of resorting to bitmaps, e.g. for logical OR or Greek variables. And to make your eyes bleed even more here is a Cyrillic Wikipedia article on Mao Tse-Tung, using traditional and simplified versions of his name in Chinese and the two transliterations to Latin according to Pinyin and Wade-Giles: https://ru.wikipedia.org/wiki/Мао_ЦзэдунIn practice it is a solved problem, as you can see in your browser when you load a web site with mixed writing systems.and hurts my eyes. i have a little background in typography, and mixing different fonts makes my eyes bleed.Yes, but why do you prefer garbled symbols incorrectly mapped to your native encoding or even invalid characters silently removed ? Do you understand that with the symbols displayed as code points you still have all the information even if it doesn't look readable immediately ? It offers you new options: * You can copy and paste the text into an online translator to get an idea of what the text says. * You can enter the code into a tool that tells you which script it is from and then look for a font that contains that script to get an acceptable display. -- MarcoE.g. Missing symbols are replaced by a square with the hexadecimal code point. So the missing symbol can at least be identified correctly (and a matching font installed).this can't help me reading texts. really, i'm not a computer, i don't remember which unicode number corresponds to which symbol.
Sep 06 2014
On Sat, 6 Sep 2014 16:38:50 +0200 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:Yes, but why do you prefer garbled symbols incorrectly mapped to your native encoding or even invalid characters silently removed ?i prefer to not read the text i cannot understand. there is zero information in Chinese, or Thai, or even Spanish for me. those texts looks (for me) like gibberish anyway. so i don't care if they are displayed correctly or not. that's why i using one-byte encoding and happy with it.Do you understand that with the symbols displayed as code points you still have all the information even if it doesn't look readable immediately ?no, i don't understand this. for me Chinese glyph and abstract painting is the same. and simple box, for that matter.It offers you new options:only one: trying to paste URL to google translate and then trying to make sense from GT output. and i don't care what encoding was used for page in this case.
Sep 06 2014
Am Sat, 6 Sep 2014 17:51:23 +0300 schrieb ketmar via Digitalmars-d <digitalmars-d puremagic.com>:On Sat, 6 Sep 2014 16:38:50 +0200 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:So because you see no use for Unicode (which is hard to believe considering all the places where localized strings may be used), everyone has to keep supporting hacks to guess text encodings or NFC normalize and convert strings to the system locale that go to the terminal. Thanks for the extra work :p -- MarcoYes, but why do you prefer garbled symbols incorrectly mapped to your native encoding or even invalid characters silently removed ?i prefer to not read the text i cannot understand. there is zero information in Chinese, or Thai, or even Spanish for me. those texts looks (for me) like gibberish anyway. so i don't care if they are displayed correctly or not. that's why i using one-byte encoding and happy with it.Do you understand that with the symbols displayed as code points you still have all the information even if it doesn't look readable immediately ?no, i don't understand this. for me Chinese glyph and abstract painting is the same. and simple box, for that matter.It offers you new options:only one: trying to paste URL to google translate and then trying to make sense from GT output. and i don't care what encoding was used for page in this case.
Sep 06 2014
On Sun, 7 Sep 2014 01:57:12 +0200 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:So because you see no use for Unicode (which is hard to believe considering all the places where localized strings may be used), everyone has to keep supporting hacks to guessah, i hate so-called "nls" too. and smart programs that tries to mess with my input bytes. i'm giving you byte -- you store it. that's all. no utf-8, no checks, no translations. byte in -- same byte out.Thanks for the extrawork :pbe my guest. ;-) but there is no need in extra work actually. using ASCII and English for program UI will work in any encoding. and any byte with high bit set should not be interpreted in any way. it's much easier than utf, you see? ;-)
Sep 07 2014
On Sunday, 7 September 2014 at 08:35:13 UTC, ketmar via Digitalmars-d wrote:but there is no need in extra work actually. using ASCII and English for program UI will work in any encoding. and any byte with high bit set should not be interpreted in any way. it's much easier than utf, you see? ;-)Whatever your motivation is, I'd say utf-8 is a blessing, and I personally see no reason for supporting any other encoding (not even utf-16 or utf-32). utf-8 combined with unique ref-counted immutable short strings is quite acceptable IMO (you can compare non-equality by address only). D needs an efficient implementation of it, that's all.
Sep 07 2014
On Sun, 07 Sep 2014 10:21:48 +0000 via Digitalmars-d <digitalmars-d puremagic.com> wrote:D needs an efficient implementation of it, that's all."efficient" and "utf-8" can't play together. in C i must scan the whole string to get its length, but with utf-8 i must scan the string just to index nth symbol! ucs-4 (aka dchar/dstring) is ok though.
Sep 07 2014
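ketmar's complaint can be made concrete with Phobos' std.utf. A sketch (the helper `nthCodePoint` is hypothetical, written for this illustration) of why indexing the nth symbol in UTF-8 is a linear scan, while indexing a dstring (ucs-4) is a constant-time array access:

```d
import std.utf : stride, decode, toUTF32;

// Indexing the nth code point in UTF-8 means walking from the start,
// skipping one variable-length sequence at a time.
dchar nthCodePoint(string s, size_t n)
{
    size_t i = 0;
    foreach (_; 0 .. n)
        i += stride(s, i);   // each code point occupies 1 to 4 bytes
    return decode(s, i);     // decode() reads the code point at i
}

void main()
{
    string s = "héllo";      // 'é' occupies two bytes in UTF-8
    dstring d = s.toUTF32;   // ucs-4: one array slot per code point

    assert(s.length == 6);   // byte count, not symbol count
    assert(d.length == 5);
    assert(nthCodePoint(s, 1) == d[1]);   // same symbol, O(n) vs O(1)
}
```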
On Sunday, 7 September 2014 at 10:29:41 UTC, ketmar via Digitalmars-d wrote:index nth symbol! ucs-4 (aka dchar/dstring) is ok though.For western text strings utf-8 is much better due to cache efficiency. You can speed it up using SSE or dedicated datastructures. The point of having unique immutable strings is that they compare by reference only and that you can have auxiliary datastructures that classify them if needed. I think the D approach to strings is unpleasant. You should not have slices of strings, only slices of ubyte arrays. If you want real speedups for streams of symbols you have to move into the landscape of huffman-encoding, tries, dedicated datastructures… Having uniform string support in libraries (i.e. only supporting utf-8) is a clear advantage IMO, that will allow for APIs that are SSE backed and performant.
Sep 07 2014
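The "compare by reference only" idea is essentially string interning. A minimal sketch (the `intern` helper is hypothetical, not a Phobos API), ignoring ref-counting and any compacting-GC concerns:

```d
// Minimal string interning: each distinct content is stored once, so
// equality can be tested by pointer identity instead of byte comparison.
string[string] pool;

string intern(string s)
{
    if (auto cached = s in pool)
        return *cached;   // already pooled: hand out the unique copy
    pool[s] = s;
    return s;
}

void main()
{
    auto a = intern("hello".idup);   // two separately allocated buffers...
    auto b = intern("hello".idup);
    assert(a.ptr == b.ptr);          // ...collapse to one unique string
    assert(a == b);                  // so identity implies equality
}
```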
On Sun, 07 Sep 2014 10:45:22 +0000 via Digitalmars-d <digitalmars-d puremagic.com> wrote:For western text strings utf-8 is much better due to cache efficiency. You can speed it up using SSE or dedicated datastructures.that's what i call efficiency! using SIMD for string indexing!The point of having unique immutable strings is that they compare by reference only and that you can have auxiliary datastructures that classify them if needed.and this will fail with compacting gc. heh.I think the D approach to strings is unpleasant. You should not have slices of strings, only slices of ubyte arrays.oh, no, thanks. casting strings back and forth for slicing is not fun. and writing parsers using string slicing is fun.If you want real speedups for streams of symbols you have to move into the landscape of huffman-encoding, tries, dedicated datastructures…or just ditch utf-8 and use ucs-4. this will speedup the most frequent string operations: correct indexing and slicing.Having uniform string support in libraries (i.e. only supporting utf-8) is a clear advantage IMO, that will allow for APIs that are SSE backed and performant.utf-8 was not invented as encoding for internal string representation. it's merely for data interchange. i myself believe that language should not do any encoding/decoding on given string without explicit asking. i.e. `foreach (dchar ch; s)` must be the same as `foreach (char ch; s)` when s is `string`. for any decoding i must use `foreach (ch; s.byUtf8Char)`. the whole "let's use utf-8 as internal string representation" was a mistake. and i'm not talking about D here.
Sep 07 2014
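The `byUtf8Char` range ketmar wants doesn't exist under that name; today's Phobos spells the explicit forms std.utf.byCodeUnit and byUTF. The implicit decoding he objects to can be observed directly:

```d
import std.utf : byCodeUnit;
import std.range : walkLength;

void main()
{
    string s = "é";   // one code point, two UTF-8 code units

    size_t units, points;
    foreach (char c; s)  ++units;    // raw bytes: no decoding at all
    foreach (dchar c; s) ++points;   // implicit UTF-8 decoding per step

    assert(units == 2 && points == 1);
    assert(s.byCodeUnit.walkLength == 2);   // explicit opt-out of decoding
}
```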
On Sunday, 7 September 2014 at 11:31:01 UTC, ketmar via Digitalmars-d wrote:oh, no, thanks. casting strings back and forth for slicing is not fun. and writing parsers using string slicing is fun.Uhm, why? For a parser you are generally better off padding the end with guards/sentinels and only move a pointer.or just ditch utf-8 and use ucs-4. this will speedup the most frequently string operations: correct indexing and slicing.I almost never use indexing of strings. I tend to use either comparisons, regexps, splitting into phrases or matching on the head or tail.the whole "let's use utf-8 as internal string representation" was a mistake. and i'm not talking about D here.Not if you want efficient I/O and want to conserve memory (which is what you want in a server).
Sep 07 2014
On Sun, 07 Sep 2014 14:08:30 +0000 via Digitalmars-d <digitalmars-d puremagic.com> wrote:Uhm, why? For a parser you are generally better off padding the end with guards/sentinels and only move a pointer.why do i need a useless argument (position in a string), if i have the string itself, which can be easily sliced? it's not C, i don't need to keep pointer to string head to free() it later.I tend to use either comparisons, regexps, splitting into phrases or matching on the head or tail.regexps and splitting needs indexing, for example. at least for some engines.the whole "let's use utf-8 as internal string representation" was a mistake. and i'm not talking about D here.Not if you want efficient I/O and want to conserve memory (which is what you want in a server).if my server is just a data storage, i'll pack my data before storing. and if i need to actually *process* my data, i'd prefer not to use variable-length characters. memory is cheap nowadays and what is limiting is network speed. ah, and network throughput. as for servers -- i can use two, or three or n for that matter. smart sharding rocks, hardware is cheap.
Sep 07 2014
On Sunday, 7 September 2014 at 14:43:21 UTC, ketmar via Digitalmars-d wrote:variable-length characters. memory is cheap nowdays and what is limiting is network speed. ah, and network throughput. as for servers -- i can use two, or three or n for that matter. smart sharding rocks, hardware is cheap.If speed does not matter then you don't need a system level more convenience. Memory is not so cheap on servers, you also need to take into account that any pressure on memory will increase network traffic because you push data out of the in-memory caches. Server prices on Amazon: t2.micro 1 core 1GiB $51 ~$77 per year t2.small 1 core 2GiB $102 ~$137 per year
Sep 07 2014
On Sun, 07 Sep 2014 15:21:05 +0000 via Digitalmars-d <digitalmars-d puremagic.com> wrote:If speed does not matter then you don't need a system level more convenience.D is not just system-level. and i love D templates and CTFE.Memory is not so cheap on serversif i need such powerful servers for my task, memory cost will not be an issue.you also need to take into account that any pressure on memory will increase network traffic because you push data out of the in-memory caches.ahem... why invalidating caches will increase traffic?Server prices on Amazon: t2.micro 1 core 1GiB $51 ~$77 per year t2.small 1 core 2GiB $102 ~$137 per yearvery cheap even for individual.
Sep 07 2014
On Sunday, 7 September 2014 at 16:09:11 UTC, ketmar via Digitalmars-d wrote:ahem... why invalidating caches will increase traffic?Because you need to fetch data over a network.
Sep 07 2014
On Sunday, 7 September 2014 at 10:29:41 UTC, ketmar via Digitalmars-d wrote:but there is no need in extra work actually. using ASCII and English for program UI will work in any encoding.I'm not so convinced that many people would be happy with reduction of their alphabet to ASCII. Some for aesthetics and some for political reasons. Cyrillic, Arabic or Japanese just wouldn't look right anymore. But I figure, your system is 100% English anyways and you have no use for NLS ? :Dindex nth symbol! ucs-4 (aka dchar/dstring) is ok though.Now you mentally map UCS-4 onto your 1-byte encoding and try to see it as the same, just 4 times larger and think that C style indexing solves all use cases. But it doesn't. While Latin places letters in a sequence which you could cut off anywhere, Korean uses blocks containing multiple consonants and vowels. For truncation of text you would be interested in the whole block or "grapheme" not a single vowel/consonant. Am Sun, 07 Sep 2014 10:45:22 +0000 schrieb "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>:[...] I think the D approach to strings is unpleasant. You should not have slices of strings, only slices of ubyte arrays.Rust does that for at least OS paths.If you want real speedups for streams of symbols you have to move into the landscape of huffman-encoding, tries, dedicated datastructures… Having uniform string support in libraries (i.e. only supporting utf-8) is a clear advantage IMO, that will allow for APIs that are SSE backed and performant.-- Marco
Sep 07 2014
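Marco's Korean example generalizes: even ucs-4 indexing operates on code points, not on user-perceived characters. Phobos' std.uni can show the difference:

```d
import std.uni : byGrapheme;
import std.range : walkLength;

void main()
{
    // 'e' followed by a combining acute accent: two code points that
    // render as the single character "é".
    dstring s = "e\u0301";

    assert(s.length == 2);                 // ucs-4 index counts code points
    assert(s.byGrapheme.walkLength == 1);  // one user-perceived character

    // Truncating at s[0 .. 1] would strip the accent -- exactly the
    // "cut off anywhere" problem that plain array indexing does not solve.
}
```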
On Sun, 7 Sep 2014 13:51:23 +0200 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:But I figure, your system is 100% English anyways and you have no use for NLS ? :Dno, you aren't right. ;-) i'm not a native English speaker (as you can see from my postings) and my system is bilingual (trilingual, actually). yet i'm using one-byte encoding.But it doesn't. While Latin places letters in a sequence which you could cut off anywhere, Korean uses blocks containing multiple consonants and vowels. For truncation of text you would be interested in the whole block or "grapheme" not a single vowel/consonant.that's problem of Korean language, not mine. ;-) and yes, i know about accents and other unicode stupidness, including that "rtl switch". bwah, just ignore 'em altogether. additionally, ignore everything that is out of first two unicode planes, so we can stick with 0..0xffff range.
Sep 07 2014
On Sunday, 7 September 2014 at 11:42:51 UTC, Marco Leise wrote:Which is a good idea. If you also require SSE alignment (or cache line padding) then you can get efficient operations on it. Modern SIMD can act on 32 bytes in parallel, so libraries that aim for single-char semantics are bound to be under-performing. Which is no good for system level programming.I think the D approach to strings is unpleasant. You should not have slices of strings, only slices of ubyte arrays.Rust does that for at least OS paths.
Sep 07 2014
On Sunday, 7 September 2014 at 14:25:51 UTC, Ola Fosheim Grøstad wrote:Modern SIMD can act on 32 bytes in parallel, so libraries thatActually, latest gen AVX-512 can work on 64 bytes per instruction…
Sep 07 2014
On Saturday, 6 September 2014 at 12:52:19 UTC, ketmar via Digitalmars-d wrote:On Sat, 6 Sep 2014 14:52:50 +0200 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:Does it really matter? I don't really care if some UTF8 encoded text can't be displayed properly because it's written in some Ancient Mayan language and I don't have the right font installed. Even if I did install that font and manage to get it to display correctly - I can't read that language! Learning a language is much harder than installing a font, so it's a pretty safe assumption that if you can read that language you can install the font for it...E.g. Missing symbols are replaced by a square with the hexadecimal code point. So the missing symbol can at least be identified correctly (and a matching font installed).this can't help me reading texts. really, i'm not a computer, i don't remember which unicode number corresponds to which symbol.
Sep 06 2014
On 03/09/2014 13:51, ketmar via Digitalmars-d wrote:anyway, "comma-separated expressions" must be eliminated. ok, let 'em appear in `for()` for a while, but nowhere else.It's better to just disallow them where they are bug prone: http://forum.dlang.org/thread/lgsel0$29fd$1 digitalmars.com
Sep 03 2014
On Wed, 03 Sep 2014 15:33:02 +0100 Nick Treleaven via Digitalmars-d <digitalmars-d puremagic.com> wrote:It's better to just disallow them where they are bug prone:that is exactly what i mean. `for ()` can be an exception though, to ease porting C/C++ code.
Sep 03 2014
On Wednesday, 3 September 2014 at 14:47:21 UTC, ketmar via Digitalmars-d wrote:On Wed, 03 Sep 2014 15:33:02 +0100 Nick Treleaven via Digitalmars-d <digitalmars-d puremagic.com> wrote:The conclusion was to make the comma operator return void when compiled with `-w`. I started implementing it here: https://github.com/schuetzm/dmd/commit/33f4af3aee29cb2ced610d57f65c350fab306918 Unfortunately I don't understand DMD's internals enough to finish it, see my last comment on the PR: https://github.com/D-Programming-Language/dmd/pull/3399#issuecomment-41612282 Maybe it is a good starting point for you.It's better to just disallow them where they are bug prone:that is exactly what i mean. `for ()` can be an exception though, to ease porting C/C++ code.
Sep 03 2014
On Wed, 03 Sep 2014 15:20:55 +0000 via Digitalmars-d <digitalmars-d puremagic.com> wrote:Unfortunately I don't understand DMD's internals enough to finish iti'm pretty sure that this can be done with a little hack in parseExpression(): just append 'cast(void)0' to the list of expressions there. something like this:

    Expression *Parser::parseExpression(bool allowCommas)
    {
        Expression *e;
        Expression *e2;
        Loc loc = token.loc;
        bool addVoid = false;

        //printf("Parser::parseExpression() loc = %d\n", loc.linnum);
        e = parseAssignExp();
        while (token.value == TOKcomma)
        {
            if (!allowCommas)
            {
                warning(token.loc, "commas in expression are deprecated");
                addVoid = true;
            }
            nextToken();
            e2 = parseAssignExp();
            e = new CommaExp(loc, e, e2);
            loc = token.loc;
        }
        if (addVoid)
            e = new CommaExp(loc, e, new CastExp(loc, new IntegerExp(loc, 0, Type::tint32), Type::tvoid));
        return e;
    }

no need to introduce new types and hack the semantic analyzer. ;-)
Sep 03 2014
"ketmar via Digitalmars-d" wrote in message news:mailman.363.1409759971.5783.digitalmars-d puremagic.com...i'm pretty sure that this can be done with a little hack in parseExpression(): just append 'cast(void)0' to the list of expressions there.It can be done there, but it would not be 'correct'. The AST should attempt to match the original code as closely as possible, so that differences are minimized in .di generation and other things like that.
Sep 03 2014
On Thu, 4 Sep 2014 02:11:50 +1000 Daniel Murphy via Digitalmars-d <digitalmars-d puremagic.com> wrote:It can be done there, but it would not be 'correct'. The AST should attempt to match the original code as closely as possible, so that differences are minimized in .di generation and other things like that.i can't see any sense in increasing compiler complexity for a feature that will be removed anyway. let it be "hacky", so anyone who stomps on it will be tempted to remove it altogether.
Sep 03 2014
"ketmar via Digitalmars-d" wrote in message news:mailman.366.1409761518.5783.digitalmars-d puremagic.com...i can't see any sense in increasing compiler complexity for a feature that will be removed anyway. let it be "hacky", so anyone who stomps on it will be tempted to remove it altogether.Nothing gets removed quickly (a full deprecation cycle is usually > 18 months), and hacky solutions are not acceptable in the mainline compiler as the risk of causing new bugs is too high. You increase compiler complexity either way, hacky or not.
Sep 03 2014
On Thu, 4 Sep 2014 02:39:31 +1000 Daniel Murphy via Digitalmars-d <digitalmars-d puremagic.com> wrote:hacky solutions are not acceptable in the mainline compiler as the risk of causing new bugs is too high.ah, really. there's more risk in introducing a new flag to CommaExp and code to process it than in just adding `e = new CastExp(loc, e, Type::tvoid);`, which uses already tested code. silly me.
Sep 03 2014
On Wednesday, 3 September 2014 at 16:11:39 UTC, Daniel Murphy wrote:"ketmar via Digitalmars-d" wrote in message news:mailman.363.1409759971.5783.digitalmars-d puremagic.com...The funny thing is, I had to split `CommaExp` into two classes not because of the new behaviour, but because the frontend lowers things into comma expressions everywhere. If it didn't do this, the additional class wouldn't even be necessary.i'm pretty sure that this can be done with a little hack in parseExpression(): just append 'cast(void)0' to the list of expressions there.It can be done there, but it would not be 'correct'. The ast should attempt to match the original code as closely as possible, so that differences are minimized in di generation and other things like that.
Sep 03 2014
I changed the PR as suggested by Daniel. Here is the new version: https://github.com/D-Programming-Language/dmd/pull/3943
Sep 03 2014
On Wednesday, 3 September 2014 at 12:32:40 UTC, ketmar via Digitalmars-d wrote:another code cleanup: removing "comma-separated expressions" from phobos. https://issues.dlang.org/show_bug.cgi?id=13419 there is only one poisoned file in phobos and druntime: std.uni. there are some "return a, b;" abominations. KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS! (found with my "warning on comma-separated expressions" patch, so i can guarantee that nothing is left) KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!https://github.com/D-Programming-Language/phobos/pull/2485
Sep 03 2014