
digitalmars.D - kill the commas! (phobos code cleanup)

reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
another code cleanup: removing "comma-separated expressions" from
phobos.

https://issues.dlang.org/show_bug.cgi?id=13419

there is only one poisoned file in phobos and druntime: std.uni. there
are some "return a, b;" abominations.

KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!

(found with my "warning on comma-separated expressions" patch, so i can
guarantee that nothing is left)

KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!
Sep 03 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 12:32:40 UTC, ketmar via 
Digitalmars-d wrote:
 KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!
:-) Way to go! What's next? The exclamation points?
Sep 03 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 12:43:50 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!
:-) Way to go! What's next? The exclamation points?
no, "!" are nice. using "!" is like telling everyone "see, i'm SOOO cool!"

anyway, "comma-separated expressions" must be eliminated. ok, let 'em appear in `for()` for a while, but nowhere else.
Sep 03 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 12:52:07 UTC, ketmar via 
Digitalmars-d wrote:
 no, "!" are nice. using "!" is like telling everyone "see, im 
 SOOO cool!"
So I have to annihilate the exclamation points from templates myself. I'd like to have a unicode alternative such as:

temp‹a,b› => temp!(a,b)
temp«a» => temp!("a")

and while you are at it, please implement √x and √(x+y) as well. :)
 anyway, "comma-separated expressions" must be eliminated. ok, 
 let
 'em appear in `for()` for a while, but nowhere else.
Yeah, I agree. They are only useful in C macros.
Sep 03 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 13:25:49 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I'd like to have a unicode alternative
ah, i'm not using unicode. my favorite editor (mcedit) is not good with unicode anyway and my locale is not utf. ;-)
Sep 03 2014
parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 13:35:48 UTC, ketmar via 
Digitalmars-d wrote:
 ah, i'm not using unicode. my favorite editor (mcedit) is not 
 good with
 unicode anyway and my locale is not utf. ;-)
That sucks! Now I had to do it myself. (I think you should upgrade to a decent editor on a decent OS and save me some unicode-work…;) Implementing "tmpl‹a,b›" and "tmpl«str»" was tedious, but I believe √ and π can be done in a heartbeat.
Sep 03 2014
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/03/2014 11:38 PM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 [...] I believe √ and π can be done in a heartbeat.
What about π? This is already valid code:

auto π=3;
Sep 03 2014
parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 21:45:16 UTC, Timon Gehr wrote:
 What about π? This is already valid code: auto π=3;
Well, it is not for the main branch. I think it is better to have pi as a symbol so you can get rid of trigonometric functions and get higher precision? Dunno. I see no point in redefining pi…
Sep 03 2014
parent reply "Casper Færgemand" <shorttail hotmail.com> writes:
On Wednesday, 3 September 2014 at 22:00:36 UTC, Ola Fosheim 
Grøstad wrote:
 On Wednesday, 3 September 2014 at 21:45:16 UTC, Timon Gehr 
 wrote:
 What about π? This is already valid code: auto π=3;
Well, it is not for the main branch. I think it is better to have pi as a symbol so you can get rid of trigonometric functions and get higher precision? Dunno. I see no point in redefining pi…
Maybe this would be a nice place to push tau into the library? http://www.tauday.com/
Sep 03 2014
parent "Ola Fosheim Grøstad" writes:
On Wednesday, 3 September 2014 at 22:03:50 UTC, Casper Færgemand 
wrote:
 Maybe this would be a nice place to push tau into the library?
 http://www.tauday.com/
Hm, haven't thought about that before, but I am usually interested in 2π, true. Maybe it is possible to special case "2π" too. Hm.
Sep 03 2014
prev sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 21:38:55 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 That sucks! Now I had to do it myself. (I think you should
 upgrade to a decent editor on a decent OS and save me some
 unicode-work…;)
no-no-no-no! utf of any size is boring. i want my strings to be indexable without any hidden function calls! ;-) i even did a "native-encoded strings" patch (n"hello!") and made the lexer not complain about bad utf in comments. i love my one-byte locale!
Sep 03 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Thu, 4 Sep 2014 00:55:47 +0300,
ketmar via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wed, 03 Sep 2014 21:38:55 +0000
 via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 That sucks! Now I had to do it myself. (I think you should
 upgrade to a decent editor on a decent OS and save me some
 unicode-work…;)
no-no-no-no! utf of any size is boring. i want my strings to be indexable without any hidden function calls! ;-) i even did a "native-encoded strings" patch (n"hello!") and made the lexer not complain about bad utf in comments. i love my one-byte locale!
But there lies greatness in the unification of all locales into just one. Gone is all the need for encoding declarations in HTTP, which more often than not were incorrect, making browsers guess; gone are the text files that looked like gibberish because they came from DOS or were written in another language that you even happen to speak, but now cannot decipher from the byte mess. Or do you remember the mess that happened to file names with accented characters when they were copied sufficiently often between file systems? I'm all for performance, but different encodings on each computing platform and language just didn't work in the globalized world. You are a relic :)

--
Marco
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 6 Sep 2014 11:48:53 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 You are a relic :)
sure i am! ;-) i'm just waiting for 32-bit bytes. and while bytes are 8-bit, i'll use ebcdic^w one of the available one-byte encodings. ;-)

btw: are there fonts that can display all of unicode? i doubt it (ok, maybe one). so we designed a thing that we can't really use. ;-)
Sep 06 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 6 September 2014 at 10:20:23 UTC, ketmar via 
Digitalmars-d wrote:
 On Sat, 6 Sep 2014 11:48:53 +0200
 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> 
 wrote:

 You are a relic :)
sure i am! ;-) i'm just waiting for 32-bit bytes. and while bytes are 8-bit, i'll use ebcdic^w one of the available one-byte encodings. ;-)
That sounds so much better than UTF-32.
 btw: are there fonts that can display all unicode?
 i doubt it (ok, maybe one).
Fonts are encoding agnostic, your point is irrelevant.
  so we designed the thing that can't really use. ;-)
We can and do: unicode is the only thing that can represent text coming from any client on earth without choking on any character. This is all done without any need for font display, which is the burden of the final client and their respective local needs.
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 06 Sep 2014 11:05:13 +0000
monarch_dodra via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 That sounds so much better than UTF-32.
why, in the name of hell, do you need UTF-32?! aren't 0x10000000000000000 chars enough for everyone?!
 btw: are there fonts that can display all unicode?
 i doubt it (ok, maybe one).
Fonts are encoding agnostic, your point is irrelevant.
so where can i download a font collection with fonts containing all unicode chars?
 This is all done without the need for font-display
thank you, but i don't need any text i can't display (and therefore read). i bet you don't need to process Thai, for example -- 'cause this requires much more than just a character encoding convention. and bytes are encoding-agnostic.
 which is on the burden of the final client, and their respective=20
 local needs.
hm... text processing software developed on systems which can't display the text being processed? wow! i want two!
Sep 06 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 6 Sep 2014 14:52:19 +0300
schrieb ketmar via Digitalmars-d <digitalmars-d puremagic.com>:

 On Sat, 06 Sep 2014 11:05:13 +0000
 monarch_dodra via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 That sounds so much better than UTF-32.
why, in the name of hell, do you need UTF-32?! aren't 0x10000000000000000 chars enough for everyone?!
 btw: are there fonts that can display all unicode?
 i doubt it (ok, maybe one).
Fonts are encoding agnostic, your point is irrelevant.
so where can i download a font collection with fonts containing all unicode chars?
 This is all done without the need for font-display
thank you, but i don't need any text i can't display (and therefore read). i bet you don't need to process Thai, for example -- 'cause this requires much more than just a character encoding convention. and bytes are encoding-agnostic.
 which is on the burden of the final client, and their respective
 local needs.
hm... text processing software developed on systems which can't display the text being processed? wow! i want two!
Dude! This is handled the same way sound fonts for Midi did it. You can mix and match fonts to create the complete experience. If your version of "Arial" doesn't come with Thai symbols, you just install _any_ Asian font which includes those and it will automatically be used in places where your favorite font lacks symbols. Read this Wikipedia article from 2005 on it:

http://en.wikipedia.org/wiki/Fallback_font

In practice it is a solved problem, as you can see in your browser when you load a web site with mixed writing systems. If all else fails, there is usually something like this in place:

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=UnicodeBMPFallbackFont

E.g. missing symbols are replaced by a square with the hexadecimal code point. So the missing symbol can at least be identified correctly (and a matching font installed).

--
Marco
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 6 Sep 2014 14:52:50 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 In practice it is a solved problem, as you can see in your
 browser when you load a web site with mixed writing systems.
and hurts my eyes. i have a little background in typography, and mixing different fonts makes my eyes bleed.
 E.g. Missing symbols are replaced by a square with the
 hexadecimal code point. So the missing symbol can at least be
 identified correctly (and a matching font installed).
this can't help me read texts. really, i'm not a computer, i don't remember which unicode number corresponds to which symbol.
Sep 06 2014
next sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 6 Sep 2014 15:52:09 +0300,
ketmar via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Sat, 6 Sep 2014 14:52:50 +0200
 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 In practice it is a solved problem, as you can see in your
 browser when you load a web site with mixed writing systems.
and hurts my eyes. i have a little background in typography, and mixing different fonts makes my eyes bleed.
Japanese and Latin are already so far apart that the font doesn't make much of a difference anymore, so long as it has similar size and hinting options. As for mixing writing systems there are of course dozens of use cases: presenting an English website with links to localized versions labeled with each language's name, or programs dealing with mathematical/technical symbols that can use regular text allowing for easy copy&paste, instead of resorting to bitmaps, e.g. for logical OR or Greek variables.

And to make your eyes bleed even more here is a Cyrillic Wikipedia article on Mao Tse-Tung, using traditional and simplified versions of his name in Chinese and the two transliterations to Latin according to Pinyin and Wade-Giles:

https://ru.wikipedia.org/wiki/Мао_Цзэдун
 E.g. Missing symbols are replaced by a square with the
 hexadecimal code point. So the missing symbol can at least be
 identified correctly (and a matching font installed).
this can't help me reading texts. really, i'm not a computer, i don't remember which unicode number corresponds to which symbol.
Yes, but why do you prefer garbled symbols incorrectly mapped to your native encoding, or even invalid characters silently removed? Do you understand that with the symbols displayed as code points you still have all the information, even if it doesn't look readable immediately? It offers you new options:

* You can copy and paste the text into an online translator to get an idea of what the text says.
* You can enter the code into a tool that tells you which script it is from and then look for a font that contains that script to get an acceptable display.

--
Marco
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 6 Sep 2014 16:38:50 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Yes, but why do you prefer garbled symbols incorrectly mapped
 to your native encoding or even invalid characters silently
 removed ?
i prefer not to read text i cannot understand. there is zero information in Chinese, or Thai, or even Spanish for me. those texts look (for me) like gibberish anyway, so i don't care if they are displayed correctly or not. that's why i'm using a one-byte encoding and am happy with it.
 Do you understand that with the symbols displayed as code
 points you still have all the information even if it doesn't
 look readable immediately ?
no, i don't understand this. for me a Chinese glyph and an abstract painting are the same. and a simple box, for that matter.
 It offers you new options:
only one: trying to paste the URL into google translate and then trying to make sense of the GT output. and i don't care what encoding was used for the page in this case.
Sep 06 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 6 Sep 2014 17:51:23 +0300
schrieb ketmar via Digitalmars-d <digitalmars-d puremagic.com>:

 On Sat, 6 Sep 2014 16:38:50 +0200
 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 Yes, but why do you prefer garbled symbols incorrectly mapped
 to your native encoding or even invalid characters silently
 removed?
i prefer not to read text i cannot understand. there is zero information in Chinese, or Thai, or even Spanish for me. those texts look (for me) like gibberish anyway, so i don't care if they are displayed correctly or not. that's why i'm using a one-byte encoding and am happy with it.
 Do you understand that with the symbols displayed as code
 points you still have all the information even if it doesn't
 look readable immediately?
no, i don't understand this. for me a Chinese glyph and an abstract painting are the same. and a simple box, for that matter.
 It offers you new options:
only one: trying to paste the URL into google translate and then trying to make sense of the GT output. and i don't care what encoding was used for the page in this case.
So because you see no use for Unicode (which is hard to believe considering all the places where localized strings may be used), everyone has to keep supporting hacks to guess text encodings, or to NFC-normalize and convert strings that go to the terminal to the system locale. Thanks for the extra work :p

--
Marco
Sep 06 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 7 Sep 2014 01:57:12 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 So because you see no use for Unicode (which is hard to
 believe considering all the places where localized strings
 may be used), everyone has to keep supporting hacks to guess
ah, i hate so-called "nls" too. and smart programs that try to mess with my input bytes. i'm giving you a byte -- you store it. that's all. no utf-8, no checks, no translations. byte in -- same byte out.
 Thanks for the extra work :p
be my guest. ;-) but there is no need for extra work actually. using ASCII and English for the program UI will work in any encoding. and any byte with the high bit set should not be interpreted in any way. it's much easier than utf, you see? ;-)
Sep 07 2014
parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 08:35:13 UTC, ketmar via 
Digitalmars-d wrote:
 but there is no need in extra work actually. using ASCII and 
 English
 for program UI will work in any encoding. and any byte with 
 high bit
 set should not be interpreted in any way. it's much easier than 
 utf,
 you see? ;-)
Whatever your motivation is, I'd say utf-8 is a blessing, and I personally see no reason for supporting any other encoding (not even utf-16 or utf-32). utf-8 combined with unique ref-counted immutable short strings is quite acceptable IMO (you can compare non-equality by address only). D needs an efficient implementation of it, that's all.
Sep 07 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 07 Sep 2014 10:21:48 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 D needs an efficient implementation of it, that's all.
"efficient" and "utf-8" can't play together. in C i must scan the whole string to get its length, but with utf-8 i must scan the string just to index the nth symbol! ucs-4 (aka dchar/dstring) is ok though.
Sep 07 2014
parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 10:29:41 UTC, ketmar via 
Digitalmars-d wrote:
 index nth symbol! ucs-4 (aka dchar/dstring) is ok though.
For western text strings utf-8 is much better due to cache efficiency. You can speed it up using SSE or dedicated data structures. The point of having unique immutable strings is that they compare by reference only and that you can have auxiliary data structures that classify them if needed.

I think the D approach to strings is unpleasant. You should not have slices of strings, only slices of ubyte arrays. If you want real speedups for streams of symbols you have to move into the landscape of Huffman encoding, tries, dedicated data structures…

Having uniform string support in libraries (i.e. only supporting utf-8) is a clear advantage IMO, that will allow for APIs that are SSE-backed and performant.
Sep 07 2014
next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 07 Sep 2014 10:45:22 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 For western text strings utf-8 is much better due to cache
 efficiency. You can speed it up using SSE or dedicated
 datastructures.
that's what i call efficiency! using SIMD for string indexing!
 The point of having unique immutable strings is that they compare
 by reference only and that you can have auxiliary datastructures
 that classify them if needed.
and this will fail with compacting gc. heh.
 I think the D approach to strings is unpleasant. You should not
 have slices of strings, only slices of ubyte arrays.
oh, no, thanks. casting strings back and forth for slicing is not fun. and writing parsers using string slicing is fun.
 If you want real speedups for streams of symbols you have to move
 into the landscape of huffman-encoding, tries, dedicated
 datastructures…
or just ditch utf-8 and use ucs-4. this will speed up the most frequent string operations: correct indexing and slicing.
 Having uniform string support in libraries (i.e. only supporting
 utf-8) is a clear advantage IMO, that will allow for APIs that
 are SSE backed and performant.
utf-8 was not invented as an encoding for internal string representation. it's merely for data interchange. i myself believe that a language should not do any encoding/decoding on a given string without explicit asking. i.e. `foreach (dchar ch; s)` must be the same as `foreach (char ch; s)` when s is `string`. for any decoding i must use `foreach (ch; s.byUtf8Char)`.

the whole "let's use utf-8 as internal string representation" was a mistake. and i'm not talking about D here.
Sep 07 2014
parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 11:31:01 UTC, ketmar via 
Digitalmars-d wrote:
 oh, no, thanks. casting strings back and forth for slicing is 
 not fun.
 and writing parsers using string slicing is fun.
Uhm, why? For a parser you are generally better off padding the end with guards/sentinels and only moving a pointer.
 or just ditch utf-8 and use ucs-4. this will speedup the most
 frequently string operations: correct indexing and slicing.
I almost never use indexing of strings. I tend to use either comparisons, regexps, splitting into phrases or matching on the head or tail.
 the whole "let's use utf-8 as internal string representation" 
 was a mistake. and i'm not talking about D here.
Not if you want efficient I/O and want to conserve memory (which is what you want in a server).
Sep 07 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 07 Sep 2014 14:08:30 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Uhm, why? For a parser you are generally better off padding the
 end with guards/sentinels and only moving a pointer.
why do i need a useless argument (the position in a string) if i have the string itself, which can be easily sliced? it's not C, i don't need to keep a pointer to the string head to free() it later.
 I tend to use either comparisons, regexps, splitting into phrases
 or matching on the head or tail.
regexps and splitting need indexing, for example. at least for some engines.
 the whole "let's use utf-8 as internal string representation"
 was a mistake. and i'm not talking about D here.
Not if you want efficient I/O and want to conserve memory (which is what you want in a server).
if my server is just data storage, i'll pack my data before storing. and if i need to actually *process* my data, i'd prefer not to use variable-length characters. memory is cheap nowadays and what is limiting is network speed. ah, and network throughput. as for servers -- i can use two, or three, or n for that matter. smart sharding rocks, hardware is cheap.
Sep 07 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Sunday, 7 September 2014 at 14:43:21 UTC, ketmar via 
Digitalmars-d wrote:
 variable-length characters. memory is cheap nowadays and what is
 limiting is network speed. ah, and network throughput. as for
 servers -- i can use two, or three, or n for that matter. smart
 sharding rocks, hardware is cheap.
If speed does not matter then you don't need a system level more convenience.

Memory is not so cheap on servers; you also need to take into account that any pressure on memory will increase network traffic because you push data out of the in-memory caches.

Server prices on Amazon:
t2.micro 1 core 1GiB $51 ~$77 per year
t2.small 1 core 2GiB $102 ~$137 per year
Sep 07 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 07 Sep 2014 15:21:05 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 If speed does not matter then you don't need a system level
 more convenience.
D is not just system-level. and i love D templates and CTFE.
 Memory is not so cheap on servers
if i need such powerful servers for my task, memory cost will not be an issue.
 you also need to take into
 account that any pressure on memory will increase network traffic
 because you push data out of the in-memory caches.
ahem... why would invalidating caches increase traffic?
 Server prices on Amazon:
 t2.micro 1 core 1GiB $51 ~$77 per year
 t2.small 1 core 2GiB $102 ~$137 per year
very cheap, even for an individual.
Sep 07 2014
parent "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 16:09:11 UTC, ketmar via 
Digitalmars-d wrote:
 ahem... why would invalidating caches increase traffic?
Because you need to fetch data over a network.
Sep 07 2014
prev sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sunday, 7 September 2014 at 10:29:41 UTC, ketmar via 
Digitalmars-d wrote:
 but there is no need in extra work actually. using ASCII
 and English for program UI will work in any encoding.
I'm not so convinced that many people would be happy with the reduction of their alphabet to ASCII. Some for aesthetic and some for political reasons. Cyrillic, Arabic or Japanese just wouldn't look right anymore. But I figure your system is 100% English anyway and you have no use for NLS? :D
 index nth symbol! ucs-4 (aka dchar/dstring) is ok though.
Now you mentally map UCS-4 onto your 1-byte encoding and try to see it as the same, just 4 times larger, and think that C style indexing solves all use cases. But it doesn't. While Latin places letters in a sequence which you could cut off anywhere, Korean uses blocks containing multiple consonants and vowels. For truncation of text you would be interested in the whole block or "grapheme", not a single vowel/consonant.

On Sun, 07 Sep 2014 10:45:22 +0000, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:
 [...]
 I think the D approach to strings is unpleasant. You should not=20
 have slices of strings, only slices of ubyte arrays.
Rust does that for at least OS paths.
 If you want real speedups for streams of symbols you have to move
 into the landscape of huffman-encoding, tries, dedicated
 datastructures…

 Having uniform string support in libraries (i.e. only supporting
 utf-8) is a clear advantage IMO, that will allow for APIs that
 are SSE backed and performant.
--
Marco
Sep 07 2014
next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 7 Sep 2014 13:51:23 +0200
Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 But I figure your system is 100%
 English anyway and you have no use for NLS? :D
no, you aren't right. ;-) i'm not a native English speaker (as you can see from my postings) and my system is bilingual (trilingual, actually). yet i'm using one-byte encoding.
 But it doesn't. While Latin places letters in a sequence which
 you could cut off anywhere, Korean uses blocks containing
 multiple consonants and vowels. For truncation of text you
 would be interested in the whole block or "grapheme" not a
 single vowel/consonant.
that's a problem of the Korean language, not mine. ;-) and yes, i know about accents and other unicode stupidity, including that "rtl switch". bwah, just ignore 'em altogether. additionally, ignore everything that is out of the first two unicode planes, so we can stick with the 0..0xffff range.
Sep 07 2014
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 7 September 2014 at 11:42:51 UTC, Marco Leise wrote:
 I think the D approach to strings is unpleasant. You should 
 not have slices of strings, only slices of ubyte arrays.
Rust does that for at least OS paths.
Which is a good idea. If you also require SSE alignment (or cache a padding) then you can get efficient operations on it. Modern SIMD can act on 32 bytes in parallel, so libraries that aim for single-char semantics are bound to be under-performing. Which is no good for system level programming.
Sep 07 2014
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Sunday, 7 September 2014 at 14:25:51 UTC, Ola Fosheim Grøstad 
wrote:
 Modern SIMD can act on 32 bytes in parallel, so libraries that
Actually, latest gen AVX-512 can work on 64 bytes per instruction…
Sep 07 2014
prev sibling parent "Idan Arye" <GenericNPC gmail.com> writes:
On Saturday, 6 September 2014 at 12:52:19 UTC, ketmar via 
Digitalmars-d wrote:
 On Sat, 6 Sep 2014 14:52:50 +0200
 Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> 
 wrote:
 E.g. Missing symbols are replaced by a square with the
 hexadecimal code point. So the missing symbol can at least be
 identified correctly (and a matching font installed).
this can't help me read texts. really, i'm not a computer, i don't remember which unicode number corresponds to which symbol.
Does it really matter? I don't really care if some UTF8 encoded text can't be displayed properly because it's written in some Ancient Mayan language and I don't have the right font installed. Even if I did install that font and manage to get it to display correctly - I can't read that language! Learning a language is much harder than installing a font, so it's a pretty safe assumption that if you can read that language you can install the font for it...
Sep 06 2014
prev sibling parent reply Nick Treleaven <ntrel-public yahoo.co.uk> writes:
On 03/09/2014 13:51, ketmar via Digitalmars-d wrote:
 anyway, "comma-separated expressions" must be eliminated. ok, let
 'em appear in `for()` for a while, but nowhere else.
It's better to just disallow them where they are bug prone: http://forum.dlang.org/thread/lgsel0$29fd$1 digitalmars.com
Sep 03 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 15:33:02 +0100
Nick Treleaven via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 It's better to just disallow them where they are bug prone:
that is exactly what i mean. `for ()` can be an exception though, to ease porting C/C++ code.
Sep 03 2014
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Wednesday, 3 September 2014 at 14:47:21 UTC, ketmar via 
Digitalmars-d wrote:
 On Wed, 03 Sep 2014 15:33:02 +0100
 Nick Treleaven via Digitalmars-d <digitalmars-d puremagic.com> 
 wrote:

 It's better to just disallow them where they are bug prone:
that is exactly what i mean. `for ()` can be an exception though, to ease porting C/C++ code.
The conclusion was to make the comma operator return void when compiled with `-w`. I started implementing it here:

https://github.com/schuetzm/dmd/commit/33f4af3aee29cb2ced610d57f65c350fab306918

Unfortunately I don't understand DMD's internals enough to finish it, see my last comment on the PR:

https://github.com/D-Programming-Language/dmd/pull/3399#issuecomment-41612282

Maybe it is a good starting point for you.
Sep 03 2014
next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 03 Sep 2014 15:20:55 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Unfortunately I don't understand DMD's internals enough to finish
 it
i'm pretty sure that this can be done with a little hack in parseExpression(): just append 'cast(void)0' to the list of expressions there. something like this:

Expression *Parser::parseExpression(bool allowCommas)
{
    Expression *e;
    Expression *e2;
    Loc loc = token.loc;
    bool addVoid = false;

    //printf("Parser::parseExpression() loc = %d\n", loc.linnum);
    e = parseAssignExp();
    while (token.value == TOKcomma)
    {
        if (!allowCommas)
        {
            warning(token.loc, "commas in expression are deprecated");
            addVoid = true;
        }
        nextToken();
        e2 = parseAssignExp();
        e = new CommaExp(loc, e, e2);
        loc = token.loc;
    }
    if (addVoid)
        e = new CommaExp(loc, e, new CastExp(loc, new IntegerExp(loc, 0, Type::tint32), Type::tvoid));
    return e;
}

no need to introduce new types and hack the semantic analyzer. ;-)
Sep 03 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"ketmar via Digitalmars-d"  wrote in message 
news:mailman.363.1409759971.5783.digitalmars-d puremagic.com...

 i'm pretty sure that this can be done with a little hack in
 parseExpression(): just append 'cast(void)0' to the list of expressions
 there.
It can be done there, but it would not be 'correct'. The AST should attempt to match the original code as closely as possible, so that differences are minimized in .di generation and other things like that.
Sep 03 2014
next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thu, 4 Sep 2014 02:11:50 +1000
Daniel Murphy via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 It can be done there, but it would not be 'correct'.  The ast should
 attempt to match the original code as closely as possible, so that
 differences are minimized in di generation and other things like
 that.
i can't see any sense in increasing compiler complexity for a feature
that will be removed anyway. let it be "hacky", so anyone who stomps on
it will be tempted to remove it altogether.
Sep 03 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"ketmar via Digitalmars-d"  wrote in message 
news:mailman.366.1409761518.5783.digitalmars-d puremagic.com...

 i can't see any sense in increasing compiler complexity for the feature
 that will be removed anyway. let it be "hacky", so anyone who stomp on
 it will be tempted to remove it altogether.
Nothing gets removed quickly (the full deprecation cycle is usually > 18 
months), and hacky solutions are not acceptable in the mainline compiler, 
as the risk of causing new bugs is too high. You increase compiler 
complexity either way, hacky or not.
Sep 03 2014
parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thu, 4 Sep 2014 02:39:31 +1000
Daniel Murphy via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 hacky solution are not acceptable in the mainline
 compiler as the risk of causing new bugs is too high.
ah, really. there's more risk in introducing a new flag to CommaExp and
code to process it than in just adding `e = new CastExp(loc, e,
Type::tvoid);`, which uses already tested code. silly me.
Sep 03 2014
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Wednesday, 3 September 2014 at 16:11:39 UTC, Daniel Murphy 
wrote:
 "ketmar via Digitalmars-d"  wrote in message 
 news:mailman.363.1409759971.5783.digitalmars-d puremagic.com...

 i'm pretty sure that this can be done with a little hack in
 parseExpression(): just append 'cast(void)0' to the list of 
 expressions
 there.
It can be done there, but it would not be 'correct'. The ast should attempt to match the original code as closely as possible, so that differences are minimized in di generation and other things like that.
The funny thing is, I had to split `CommaExp` into two classes not because of the new behaviour, but because the frontend lowers things into comma expressions everywhere. If it didn't do this, the additional class wouldn't even be necessary.
Sep 03 2014
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
I changed the PR as suggested by Daniel. Here is the new version:

https://github.com/D-Programming-Language/dmd/pull/3943
Sep 03 2014
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Wednesday, 3 September 2014 at 12:32:40 UTC, ketmar via 
Digitalmars-d wrote:
 another code cleanup: removing "comma-separated expressions" 
 from
 phobos.

 https://issues.dlang.org/show_bug.cgi?id=13419

 there is only one poisoned file in phobos and druntime: 
 std.uni. there
 are some "return a, b;" abominations.

 KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!

 (found with my "warning on comma-separated expressions" patch, 
 so i can
 guarantee that nothing is left)

 KILL THE COMMAS! KILL THE COMMAS! KILL THE COMMAS!
https://github.com/D-Programming-Language/phobos/pull/2485
Sep 03 2014