
digitalmars.D - kill the commas! (phobos code cleanup)

reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
another code cleanup: removing "comma-separated expressions" from
phobos.

https://issues.dlang.org/show_bug.cgi?id=13419

there is only one poisoned file in phobos and druntime: std.uni. there
are some "return a, b;" abominations.

KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!

(found with my "warning on comma-separated expressions" patch, so i can
guarantee that nothing is left)

KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!
Sep 03 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 12:32:40 UTC, ketmar via 
Digitalmars-d wrote:
 KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!
:-) Way to go! What's next? The exclamation points?
Sep 03 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 12:43:50 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!
:-) Way to go! What's next? The exclamation points?
no, "!" are nice. using "!" is like telling everyone "see, i'm SOOO cool!"

anyway, "comma-separated expressions" must be eliminated. ok, let 'em appear in `for()` for a while, but nowhere else.
Sep 03 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 12:52:07 UTC, ketmar via 
Digitalmars-d wrote:
 no, "!" are nice. using "!" is like telling everyone "see, im 
 SOOO cool!"
So I have to annihilate the exclamation points from templates myself. I'd like to have a unicode alternative such as:

temp‹a,b› => temp!(a,b)
temp«a» => temp!("a")

and while you are at it, please implement √x and √(x+y) as well. :)
 anyway, "comma-separated expressions" must be eliminated. ok, 
 let
 'em appear in `for()` for a while, but nowhere else.
Yeah, I agree. They are only useful in C macros.
Sep 03 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 13:25:49 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I'd like to have a unicode alternative
ah, i'm not using unicode. my favorite editor (mcedit) is not good with unicode anyway and my locale is not utf. ;-)
Sep 03 2014
parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 13:35:48 UTC, ketmar via 
Digitalmars-d wrote:
 ah, i'm not using unicode. my favorite editor (mcedit) is not 
 good with
 unicode anyway and my locale is not utf. ;-)
That sucks! Now I had to do it myself. (I think you should upgrade to a decent editor on a decent OS and save me some unicode-work…;) Implementing "tmpl‹a,b›" and "tmpl«str»" was tedious, but I believe √ and π can be done in a heartbeat.
Sep 03 2014
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/03/2014 11:38 PM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 [...] I believe √ and π can be done in a heartbeat.
What about π? This is already valid code:

auto π=3;
Sep 03 2014
parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 21:45:16 UTC, Timon Gehr wrote:
 What about π? This is already valid code: auto π=3;
Well, it is not for the main branch. I think it is better to have pi as a symbol so you can get rid of trigonometric functions and get higher precision? Dunno. I see no point in redefining pi…
Sep 03 2014
parent reply "Casper Færgemand" <shorttail hotmail.com> writes:
On Wednesday, 3 September 2014 at 22:00:36 UTC, Ola Fosheim 
Grøstad wrote:
 On Wednesday, 3 September 2014 at 21:45:16 UTC, Timon Gehr 
 wrote:
 What about π? This is already valid code: auto π=3;
Well, it is not for the main branch. I think it is better to have pi as a symbol so you can get rid of trigonometric functions and get higher precision? Dunno. I see no point in redefining pi…
Maybe this would be a nice place to push tau into the library? http://www.tauday.com/
Sep 03 2014
parent "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 22:03:50 UTC, Casper Færgemand 
wrote:
 Maybe this would be a nice place to push tau into the library?
 http://www.tauday.com/
Hm, haven't thought about that before, but I am usually interested in 2π, true. Maybe it is possible to special case "2π" too. Hm.
Sep 03 2014
prev sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 21:38:55 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 That sucks! Now I had to do it myself. (I think you should
 upgrade to a decent editor on a decent OS and save me some
 unicode-work…;)
no-no-no-no! utf of any size is boring. i want my strings to be indexable without any hidden function calls! ;-) i even did a "native-encoded strings" patch (n"hello!") and made the lexer not complain about bad utf in comments. i love my one-byte locale!
Sep 03 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Thu, 4 Sep 2014 00:55:47 +0300,
ketmar via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wed, 03 Sep 2014 21:38:55 +0000
 via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 That sucks! Now I had to do it myself. (I think you should
 upgrade to a decent editor on a decent OS and save me some
 unicode-work…;)
no-no-no-no! utf of any size is boring. i want my strings to be indexable without any hidden function calls! ;-) i even did a "native-encoded strings" patch (n"hello!") and made the lexer not complain about bad utf in comments. i love my one-byte locale!
But there lies greatness in the unification of all locales into just one. Gone is all the need for encoding declarations in HTTP, which more often than not were incorrect, making browsers guess; gone are the text files that looked like gibberish because they came from DOS or were written in another language that you even happen to speak, but now cannot decipher from the byte mess. Or do you remember the mess that happened to file names with accented characters when they were copied sufficiently often between file systems? I'm all for performance, but different encodings on each computing platform and language just didn't work in the globalized world. You are a relic :)

--
Marco
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 6 Sep 2014 11:48:53 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 You are a relic :)
sure i am! ;-) i'm just waiting for 32-bit bytes. and while bytes are 8-bit, i'll use ebcdic^w one of the available one-byte encodings. ;-)

btw: are there fonts that can display all of unicode? i doubt it (ok, maybe one). so we designed a thing that we can't really use. ;-)
Sep 06 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 6 September 2014 at 10:20:23 UTC, ketmar via 
Digitalmars-d wrote:
 On Sat, 6 Sep 2014 11:48:53 +0200
 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> 
 wrote:

 You are a relic :)
sure i am! ;-) i'm just waiting for 32-bit bytes. and while bytes are 8-bit, i'll use ebcdic^w one of the available one-byte encodings. ;-)
That sounds so much better than UTF-32.
 btw: are there fonts that can display all unicode?
 i doubt it (ok, maybe one).
Fonts are encoding agnostic, your point is irrelevant.
  so we designed the thing that can't really use. ;-)
We can and do: unicode is the only thing that can represent text coming from any client on earth without choking on any character. This is all done without any need for font display, which is the burden of the final client and their respective local needs.
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 06 Sep 2014 11:05:13 +0000
monarch_dodra via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 That sounds so much better than UTF-32.
why, in the name of hell, do you need UTF-32?! aren't 0x10000000000000000 chars enough for everyone?!
 btw: are there fonts that can display all unicode?
 i doubt it (ok, maybe one).
Fonts are encoding agnostic, your point is irrelevant.
so where can i download a font collection with fonts containing all unicode chars?
 This is all done without the need for font-display
thank you, but i don't need any text i can't display (and therefore read). i bet you don't need to process Thai, for example -- 'cause this requires much more than just a character encoding convention. and bytes are encoding-agnostic.
 which is on the burden of the final client, and their respective=20
 local needs.
hm... text processing software developed on systems which can't display the text being processed? wow! i want two!
Sep 06 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 6 Sep 2014 14:52:19 +0300
schrieb ketmar via Digitalmars-d <digitalmars-d puremagic.com>:

 On Sat, 06 Sep 2014 11:05:13 +0000
 monarch_dodra via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 That sounds so much better than UTF-32.
why, in the name of hell, do you need UTF-32?! aren't 0x10000000000000000 chars enough for everyone?!
 btw: are there fonts that can display all unicode?
 i doubt it (ok, maybe one).
Fonts are encoding agnostic, your point is irrelevant.
so where can i download a font collection with fonts containing all unicode chars?
 This is all done without the need for font-display
thank you, but i don't need any text i can't display (and therefore read). i bet you don't need to process Thai, for example -- 'cause this requires much more than just a character encoding convention. and bytes are encoding-agnostic.
 which is on the burden of the final client, and their respective
 local needs.
hm... text processing software developed on systems which can't display the text being processed? wow! i want two!
Dude! This is handled the same way sound fonts for Midi did it. You can mix and match fonts to create the complete experience. If your version of "Arial" doesn't come with Thai symbols, you just install _any_ Asian font which includes those and it will automatically be used in places where your favorite font lacks symbols. Read this Wikipedia article from 2005 on it:

http://en.wikipedia.org/wiki/Fallback_font

In practice it is a solved problem, as you can see in your browser when you load a web site with mixed writing systems. If all else fails, there is usually something like this in place:

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=UnicodeBMPFallbackFont

E.g. missing symbols are replaced by a square with the hexadecimal code point. So the missing symbol can at least be identified correctly (and a matching font installed).

--
Marco
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 6 Sep 2014 14:52:50 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 In practice it is a solved problem, as you can see in your
 browser when you load a web site with mixed writing systems.
and hurts my eyes. i have a little background in typography, and mixing different fonts makes my eyes bleed.
 E.g. Missing symbols are replaced by a square with the
 hexadecimal code point. So the missing symbol can at least be
 identified correctly (and a matching font installed).
this can't help me read texts. really, i'm not a computer, i don't remember which unicode number corresponds to which symbol.
Sep 06 2014
next sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 6 Sep 2014 15:52:09 +0300,
ketmar via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Sat, 6 Sep 2014 14:52:50 +0200
 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 In practice it is a solved problem, as you can see in your
 browser when you load a web site with mixed writing systems.
and hurts my eyes. i have a little background in typography, and mixing different fonts makes my eyes bleed.
Japanese and Latin are already so far apart that the font doesn't make much of a difference anymore, so long as it has similar size and hinting options. As for mixing writing systems there are of course dozens of use cases: presenting an English website with links to localized versions labeled with each language's name, or programs dealing with mathematical/technical symbols that can use regular text allowing for easy copy&paste, instead of resorting to bitmaps, e.g. for logical OR or Greek variables.

And to make your eyes bleed even more here is a Cyrillic Wikipedia article on Mao Tse-Tung, using traditional and simplified versions of his name in Chinese and the two transliterations to Latin according to Pinyin and Wade-Giles:

https://ru.wikipedia.org/wiki/Мао_Цзэдун
 E.g. Missing symbols are replaced by a square with the
 hexadecimal code point. So the missing symbol can at least be
 identified correctly (and a matching font installed).
this can't help me reading texts. really, i'm not a computer, i don't remember which unicode number corresponds to which symbol.
Yes, but why do you prefer garbled symbols incorrectly mapped to your native encoding, or even invalid characters silently removed? Do you understand that with the symbols displayed as code points you still have all the information, even if it doesn't look readable immediately? It offers you new options:

* You can copy and paste the text into an online translator to get an idea of what the text says.
* You can enter the code into a tool that tells you which script it is from and then look for a font that contains that script to get an acceptable display.

--
Marco
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 6 Sep 2014 16:38:50 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Yes, but why do you prefer garbled symbols incorrectly mapped
 to your native encoding or even invalid characters silently
 removed ?
i prefer not to read text i cannot understand. there is zero information in Chinese, or Thai, or even Spanish for me. those texts look (for me) like gibberish anyway, so i don't care if they are displayed correctly or not. that's why i'm using a one-byte encoding and am happy with it.
 Do you understand that with the symbols displayed as code
 points you still have all the information even if it doesn't
 look readable immediately ?
no, i don't understand this. for me a Chinese glyph and an abstract painting are the same. and a simple box, for that matter.
 It offers you new options:
only one: trying to paste the URL into google translate and then trying to make sense of the GT output. and i don't care what encoding was used for the page in this case.
Sep 06 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 6 Sep 2014 17:51:23 +0300
schrieb ketmar via Digitalmars-d <digitalmars-d puremagic.com>:

 On Sat, 6 Sep 2014 16:38:50 +0200
 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 Yes, but why do you prefer garbled symbols incorrectly mapped
 to your native encoding or even invalid characters silently
 removed?
i prefer not to read text i cannot understand. there is zero information in Chinese, or Thai, or even Spanish for me. those texts look (for me) like gibberish anyway, so i don't care if they are displayed correctly or not. that's why i'm using a one-byte encoding and am happy with it.
 Do you understand that with the symbols displayed as code
 points you still have all the information even if it doesn't
 look readable immediately?
no, i don't understand this. for me a Chinese glyph and an abstract painting are the same. and a simple box, for that matter.
 It offers you new options:
only one: trying to paste the URL into google translate and then trying to make sense of the GT output. and i don't care what encoding was used for the page in this case.
So because you see no use for Unicode (which is hard to believe considering all the places where localized strings may be used), everyone has to keep supporting hacks to guess text encodings, or to NFC-normalize and convert strings that go to the terminal to the system locale. Thanks for the extra work :p

--
Marco
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 7 Sep 2014 01:57:12 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 So because you see no use for Unicode (which is hard to
 believe considering all the places where localized strings
 may be used), everyone has to keep supporting hacks to guess
ah, i hate so-called "nls" too. and smart programs that try to mess with my input bytes. i'm giving you a byte -- you store it. that's all. no utf-8, no checks, no translations. byte in -- same byte out.
 Thanks for the extra work :p
be my guest. ;-) but there is no need for extra work actually. using ASCII and English for the program UI will work in any encoding. and any byte with the high bit set should not be interpreted in any way. it's much easier than utf, you see? ;-)
Sep 07 2014
parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 08:35:13 UTC, ketmar via 
Digitalmars-d wrote:
 but there is no need in extra work actually. using ASCII and 
 English
 for program UI will work in any encoding. and any byte with 
 high bit
 set should not be interpreted in any way. it's much easier than 
 utf,
 you see? ;-)
Whatever your motivation is, I'd say utf-8 is a blessing, and I personally see no reason for supporting any other encoding (not even utf-16 or utf-32). utf-8 combined with unique ref-counted immutable short strings is quite acceptable IMO (you can compare non-equality by address only). D needs an efficient implementation of it, that's all.
Sep 07 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 07 Sep 2014 10:21:48 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 D needs an efficient implementation of it, that's all.
"efficient" and "utf-8" can't play together. in C i must scan the whole string to get its length, but with utf-8 i must scan the string just to index the nth symbol! ucs-4 (aka dchar/dstring) is ok though.
Sep 07 2014
parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 10:29:41 UTC, ketmar via 
Digitalmars-d wrote:
 index nth symbol! ucs-4 (aka dchar/dstring) is ok though.
For western text strings utf-8 is much better due to cache efficiency. You can speed it up using SSE or dedicated data structures. The point of having unique immutable strings is that they compare by reference only and that you can have auxiliary data structures that classify them if needed.

I think the D approach to strings is unpleasant. You should not have slices of strings, only slices of ubyte arrays. If you want real speedups for streams of symbols you have to move into the landscape of Huffman encoding, tries, dedicated data structures…

Having uniform string support in libraries (i.e. only supporting utf-8) is a clear advantage IMO, that will allow for APIs that are SSE-backed and performant.
Sep 07 2014
next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 07 Sep 2014 10:45:22 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 For western text strings utf-8 is much better due to cache
 efficiency. You can speed it up using SSE or dedicated
 datastructures.
that's what i call efficiency! using SIMD for string indexing!
 The point of having unique immutable strings is that they compare
 by reference only and that you can have auxiliary datastructures
 that classify them if needed.
and this will fail with compacting gc. heh.
 I think the D approach to strings is unpleasant. You should not
 have slices of strings, only slices of ubyte arrays.
oh, no, thanks. casting strings back and forth for slicing is not fun. and writing parsers using string slicing is fun.
 If you want real speedups for streams of symbols you have to move
 into the landscape of huffman-encoding, tries, dedicated
 datastructures…
or just ditch utf-8 and use ucs-4. this will speed up the most frequent string operations: correct indexing and slicing.
 Having uniform string support in libraries (i.e. only supporting
 utf-8) is a clear advantage IMO, that will allow for APIs that
 are SSE backed and performant.
utf-8 was not invented as an encoding for internal string representation. it's merely for data interchange. i myself believe that a language should not do any encoding/decoding on a given string without explicit asking. i.e. `foreach (dchar ch; s)` must be the same as `foreach (char ch; s)` when s is `string`. for any decoding i must use `foreach (ch; s.byUtf8Char)`.

the whole "let's use utf-8 as internal string representation" was a mistake. and i'm not talking about D here.
Sep 07 2014
parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 11:31:01 UTC, ketmar via 
Digitalmars-d wrote:
 oh, no, thanks. casting strings back and forth for slicing is 
 not fun.
 and writing parsers using string slicing is fun.
Uhm, why? For a parser you are generally better off padding the end with guards/sentinels and only moving a pointer.
 or just ditch utf-8 and use ucs-4. this will speedup the most
 frequently string operations: correct indexing and slicing.
I almost never use indexing of strings. I tend to use either comparisons, regexps, splitting into phrases or matching on the head or tail.
 the whole "let's use utf-8 as internal string representation" 
 was a mistake. and i'm not talking about D here.
Not if you want efficient I/O and want to conserve memory (which is what you want in a server).
Sep 07 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 07 Sep 2014 14:08:30 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Uhm, why? For a parser you are generally better off padding the
 end with guards/sentinels and only moving a pointer.
why do i need a useless argument (the position in a string) if i have the string itself, which can be easily sliced? it's not C, i don't need to keep a pointer to the string head to free() it later.
 I tend to use either comparisons, regexps, splitting into phrases
 or matching on the head or tail.
regexps and splitting need indexing, for example. at least for some engines.
 the whole "let's use utf-8 as internal string representation"
 was a mistake. and i'm not talking about D here.
Not if you want efficient I/O and want to conserve memory (which is what you want in a server).
if my server is just data storage, i'll pack my data before storing. and if i need to actually *process* my data, i'd prefer not to use variable-length characters. memory is cheap nowadays and what is limiting is network speed. ah, and network throughput. as for servers -- i can use two, or three, or n for that matter. smart sharding rocks, hardware is cheap.
Sep 07 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Sunday, 7 September 2014 at 14:43:21 UTC, ketmar via 
Digitalmars-d wrote:
 variable-length characters. memory is cheap nowadays and what is
 limiting is network speed. ah, and network throughput. as for
 servers -- i can use two, or three, or n for that matter. smart
 sharding rocks, hardware is cheap.
If speed does not matter then you don't need a system level more convenience.

Memory is not so cheap on servers; you also need to take into account that any pressure on memory will increase network traffic because you push data out of the in-memory caches.

Server prices on Amazon:
t2.micro 1 core 1GiB $51 ~$77 per year
t2.small 1 core 2GiB $102 ~$137 per year
Sep 07 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 07 Sep 2014 15:21:05 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 If speed does not matter then you don't need a system level
 more convenience.
D is not just system-level. and i love D templates and CTFE.
 Memory is not so cheap on servers
if i need such powerful servers for my task, memory cost will not be an issue.
 you also need to take into
 account that any pressure on memory will increase network traffic
 because you push data out of the in-memory caches.
ahem... why would invalidating caches increase traffic?
 Server prices on Amazon:
 t2.micro 1 core 1GiB $51 ~$77 per year
 t2.small 1 core 2GiB $102 ~$137 per year
very cheap, even for an individual.
Sep 07 2014
parent "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 16:09:11 UTC, ketmar via 
Digitalmars-d wrote:
 ahem... why would invalidating caches increase traffic?
Because you need to fetch data over a network.
Sep 07 2014
prev sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sunday, 7 September 2014 at 10:29:41 UTC, ketmar via 
Digitalmars-d wrote:
 but there is no need in extra work actually. using ASCII
 and English for program UI will work in any encoding.
I'm not so convinced that many people would be happy with the reduction of their alphabet to ASCII. Some for aesthetic and some for political reasons. Cyrillic, Arabic or Japanese just wouldn't look right anymore. But I figure your system is 100% English anyway and you have no use for NLS? :D
 index nth symbol! ucs-4 (aka dchar/dstring) is ok though.
Now you mentally map UCS-4 onto your 1-byte encoding and try to see it as the same, just 4 times larger, and think that C style indexing solves all use cases. But it doesn't. While Latin places letters in a sequence which you could cut off anywhere, Korean uses blocks containing multiple consonants and vowels. For truncation of text you would be interested in the whole block or "grapheme", not a single vowel/consonant.

On Sun, 07 Sep 2014 10:45:22 +0000, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:
 [...]
 I think the D approach to strings is unpleasant. You should not=20
 have slices of strings, only slices of ubyte arrays.
Rust does that for at least OS paths.
 If you want real speedups for streams of symbols you have to move
 into the landscape of huffman-encoding, tries, dedicated
 datastructures…

 Having uniform string support in libraries (i.e. only supporting
 utf-8) is a clear advantage IMO, that will allow for APIs that
 are SSE backed and performant.
--
Marco
Sep 07 2014
next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 7 Sep 2014 13:51:23 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 But I figure your system is 100%
 English anyway and you have no use for NLS? :D
no, you aren't right. ;-) i'm not a native English speaker (as you can see from my postings) and my system is bilingual (trilingual, actually). yet i'm using one-byte encoding.
 But it doesn't. While Latin places letters in a sequence which
 you could cut off anywhere, Korean uses blocks containing
 multiple consonants and vowels. For truncation of text you
 would be interested in the whole block or "grapheme" not a
 single vowel/consonant.
that's a problem of the Korean language, not mine. ;-) and yes, i know about accents and other unicode stupidity, including that "rtl switch". bwah, just ignore 'em altogether. additionally, ignore everything that is out of the first two unicode planes, so we can stick with the 0..0xffff range.
Sep 07 2014
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 11:42:51 UTC, Marco Leise wrote:
 I think the D approach to strings is unpleasant. You should 
 not have slices of strings, only slices of ubyte arrays.
Rust does that for at least OS paths.
Which is a good idea. If you also require SSE alignment (or cache a padding) then you can get efficient operations on it. Modern SIMD can act on 32 bytes in parallel, so libraries that aim for single-char semantics are bound to be under-performing. Which is no good for system level programming.
Sep 07 2014
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Sunday, 7 September 2014 at 14:25:51 UTC, Ola Fosheim Grøstad 
wrote:
 Modern SIMD can act on 32 bytes in parallel, so libraries that
Actually, latest gen AVX-512 can work on 64 bytes per instruction…
Sep 07 2014
prev sibling parent "Idan Arye" <GenericNPC gmail.com> writes:
On Saturday, 6 September 2014 at 12:52:19 UTC, ketmar via 
Digitalmars-d wrote:
 On Sat, 6 Sep 2014 14:52:50 +0200
 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> 
 wrote:
 E.g. Missing symbols are replaced by a square with the
 hexadecimal code point. So the missing symbol can at least be
 identified correctly (and a matching font installed).
this can't help me read texts. really, i'm not a computer, i don't remember which unicode number corresponds to which symbol.
Does it really matter? I don't really care if some UTF8 encoded text can't be displayed properly because it's written in some Ancient Mayan language and I don't have the right font installed. Even if I did install that font and manage to get it to display correctly - I can't read that language! Learning a language is much harder than installing a font, so it's a pretty safe assumption that if you can read that language you can install the font for it...
Sep 06 2014
prev sibling parent reply Nick Treleaven <ntrel-public yahoo.co.uk> writes:
On 03/09/2014 13:51, ketmar via Digitalmars-d wrote:
 anyway, "comma-separated expressions" must be eliminated. ok, let
 'em appear in `for()` for a while, but nowhere else.
It's better to just disallow them where they are bug prone: http://forum.dlang.org/thread/lgsel0$29fd$1 digitalmars.com
Sep 03 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 15:33:02 +0100
Nick Treleaven via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 It's better to just disallow them where they are bug prone:
that is exactly what i mean. `for ()` can be an exception though, to ease porting C/C++ code.
Sep 03 2014
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Wednesday, 3 September 2014 at 14:47:21 UTC, ketmar via 
Digitalmars-d wrote:
 On Wed, 03 Sep 2014 15:33:02 +0100
 Nick Treleaven via Digitalmars-d <digitalmars-d puremagic.com> 
 wrote:

 It's better to just disallow them where they are bug prone:
that is exactly what i mean. `for ()` can be an exception though, to ease porting C/C++ code.
The conclusion was to make the comma operator return void when compiled with `-w`. I started implementing it here:

https://github.com/schuetzm/dmd/commit/33f4af3aee29cb2ced610d57f65c350fab306918

Unfortunately I don't understand DMD's internals enough to finish it, see my last comment on the PR:

https://github.com/D-Programming-Language/dmd/pull/3399#issuecomment-41612282

Maybe it is a good starting point for you.
Sep 03 2014
next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 15:20:55 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Unfortunately I don't understand DMD's internals enough to finish
 it
i'm pretty sure that this can be done with a little hack in parseExpression(): just append 'cast(void)0' to the list of expressions there. something like this:

Expression *Parser::parseExpression(bool allowCommas)
{
    Expression *e;
    Expression *e2;
    Loc loc = token.loc;
    bool addVoid = false;

    //printf("Parser::parseExpression() loc = %d\n", loc.linnum);
    e = parseAssignExp();
    while (token.value == TOKcomma)
    {
        if (!allowCommas)
        {
            warning(token.loc, "commas in expression are deprecated");
            addVoid = true;
        }
        nextToken();
        e2 = parseAssignExp();
        e = new CommaExp(loc, e, e2);
        loc = token.loc;
    }
    if (addVoid)
        e = new CommaExp(loc, e, new CastExp(loc, new IntegerExp(loc, 0, Type::tint32), Type::tvoid));
    return e;
}

no need to introduce new types and hack the semantic analyzer. ;-)
Sep 03 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"ketmar via Digitalmars-d"  wrote in message 
news:mailman.363.1409759971.5783.digitalmars-d puremagic.com...

 i'm pretty sure that this can be done with a little hack in
 parseExpression(): just append 'cast(void)0' to the list of expressions
 there.
It can be done there, but it would not be 'correct'. The AST should attempt to match the original code as closely as possible, so that differences are minimized in .di generation and other things like that.
Sep 03 2014
next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thu, 4 Sep 2014 02:11:50 +1000
Daniel Murphy via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 It can be done there, but it would not be 'correct'.  The ast should
 attempt to match the original code as closely as possible, so that
 differences are minimized in di generation and other things like
 that.
i can't see any sense in increasing compiler complexity for a feature
that will be removed anyway. let it be "hacky", so anyone who stomps on
it will be tempted to remove it altogether.
Sep 03 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"ketmar via Digitalmars-d"  wrote in message 
news:mailman.366.1409761518.5783.digitalmars-d puremagic.com...

 i can't see any sense in increasing compiler complexity for the feature
 that will be removed anyway. let it be "hacky", so anyone who stomp on
 it will be tempted to remove it altogether.
Nothing gets removed quickly (the full deprecation cycle is usually > 18 
months), and hacky solutions are not acceptable in the mainline compiler, 
as the risk of causing new bugs is too high. You increase compiler 
complexity either way, hacky or not.
Sep 03 2014
parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thu, 4 Sep 2014 02:39:31 +1000
Daniel Murphy via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 hacky solution are not acceptable in the mainline
 compiler as the risk of causing new bugs is too high.
ah, really. there's more risk in introducing a new flag to CommaExp and
code to process it than in just adding `e = new CastExp(loc, e,
Type::tvoid);`, which uses already tested code. silly me.
Sep 03 2014
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Wednesday, 3 September 2014 at 16:11:39 UTC, Daniel Murphy 
wrote:
 "ketmar via Digitalmars-d"  wrote in message 
 news:mailman.363.1409759971.5783.digitalmars-d puremagic.com...

 i'm pretty sure that this can be done with a little hack in
 parseExpression(): just append 'cast(void)0' to the list of 
 expressions
 there.
It can be done there, but it would not be 'correct'. The ast should attempt to match the original code as closely as possible, so that differences are minimized in di generation and other things like that.
The funny thing is, I had to split `CommaExp` into two classes not because of the new behaviour, but because the frontend lowers things into comma expressions everywhere. If it didn't do this, the additional class wouldn't even be necessary.
Sep 03 2014
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
I changed the PR as suggested by Daniel. Here is the new version:

https://github.com/D-Programming-Language/dmd/pull/3943
Sep 03 2014
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Wednesday, 3 September 2014 at 12:32:40 UTC, ketmar via 
Digitalmars-d wrote:
 another code cleanup: removing "comma-separated expressions" 
 from
 phobos.

 https://issues.dlang.org/show_bug.cgi?id=13419

 there is only one poisoned file in phobos and druntime: 
 std.uni. there
 are some "return a, b;" abominations.

 KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!

 (found with my "warning on comma-separated expressions" patch, 
 so i can
 guarantee that nothing is left)

 KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!
https://github.com/D-Programming-Language/phobos/pull/2485
Sep 03 2014