
digitalmars.D - Morale of a story: ~ is the right choice for concat operator

reply Dukc <ajieskola gmail.com> writes:

I needed to convert an integer to a string, formatted in hexadecimal. It may be that I should have used some library function for that, but I decided to roll my own anyway, in my general utility class:

public static string FormatHexadecimal(int what)
{   if (what == 0) return "0";
	string result = "";
	bool signed = what < 0;
	if (signed) what = -what;

	for (;what != 0;what >>= 4)
	{   int digit = what & 0x0000000F;
		result = (digit < 10? '0' + (char)digit: 'A' + (char)(digit - 10)) + result;
	}

	return signed? "-" + result: result;
}

Looks correct, right? Yes.


But it does not work, because + is overloaded in the language I was using: depending on context it means either addition or string concatenation. If you add two characters, it interprets it as a concatenation that results in a string with two characters. The correct way to do what I tried is:

public static string FormatHexadecimal(int what)
{   if (what == 0) return "0";
     string result = "";
     bool signed = what < 0;
     if (signed) what = -what;

     for (;what != 0;what >>= 4)
     {   int digit = what & 0x0000000F;
         result = (char)(digit < 10? (int)'0' + digit: (int)'A' + (digit - 10)) + result;
     }

     return signed? "-" + result: result;
}

You can imagine my confusion when the first version returned far too long and incorrect strings. Now, if I were programming in D, this would not have happened: using + always means addition. If one wants to concatenate, ~ is used instead.
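
For comparison, here is roughly what the same function could look like in D (a quick, untested sketch of mine): + on the digit stays plain integer arithmetic, and every concatenation has to be spelled with ~.

string formatHexadecimal(int what)
{   if (what == 0) return "0";
    string result = "";
    bool signed = what < 0;
    if (signed) what = -what;

    for (; what != 0; what >>= 4)
    {   int digit = what & 0x0000000F;
        // '+' here is plain integer arithmetic; the cast yields a char,
        // and '~' is unambiguously concatenation.
        result = cast(char)(digit < 10 ? '0' + digit : 'A' + (digit - 10)) ~ result;
    }

    return signed ? "-" ~ result : result;
}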

So, ~ may be a bit confusing for newcomers, but there is a solid 
reason why it's used instead of +, and it's because they have a 
fundamentally different meaning. Good work, whoever chose that 
meaning!
May 25 2018
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/25/2018 1:27 AM, Dukc wrote:
 So, ~ may be a bit confusing for newcomers, but there is a solid reason why it's
 used instead of +, and it's because they have a fundamentally different meaning.
 Good work, whoever chose that meaning!
This ambiguity bug with + has been causing well-known problems since Algol. A *really* long time. Yet it gets constantly welded into new languages.
May 25 2018
parent reply Dukc <ajieskola gmail.com> writes:
On Friday, 25 May 2018 at 21:06:17 UTC, Walter Bright wrote:
 This ambiguity bug with + has been causing well-known problems 
 since Algol. A *really* long time. Yet it gets constantly 
 welded into new languages.
Yeah. I could understand that choice for a language that tries to be simple for beginners above everything else. But for the others, it just did not occur to them.
May 25 2018
parent reply IntegratedDimensions <IntegratedDimensions gmail.com> writes:
On Friday, 25 May 2018 at 22:07:22 UTC, Dukc wrote:
 On Friday, 25 May 2018 at 21:06:17 UTC, Walter Bright wrote:
 This ambiguity bug with + has been causing well-known problems 
 since Algol. A *really* long time. Yet it gets constantly 
 welded into new languages.
 Yeah. I could understand that choice for a language that tries to be simple for beginners above everything else. But for the others, it just did not occur to them.
It is not a problem of the language but a problem of the programmer. A programmer should always know the types he is working with and the functional semantics used. While it obviously has the potential to cause more problems, it is not a huge deal in general. I might have been caught by that "bug" once or twice, but it's usually an obvious fix. If you are moving from one language to another, or haven't programmed in one much, you will have these types of problems, but they go away with experience. To degrade the language based on that is wrong.

Languages should not be designed around noobs, because then the programmers of that language stay noobs. Think BASIC. If all you did was program in BASIC, you would be considered a novice programmer by today's standards. Even if you were an expert BASIC programmer, when you moved to a modern language you would be confused. For you to say that those languages are inferior because they don't do things like BASIC would be wrong; it is your unfamiliarity with the language and newer programming concepts that is the problem.

A language will never solve all your problems as a programmer, else it would write the programs for us.
May 25 2018
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, May 25, 2018 22:23:07 IntegratedDimensions via Digitalmars-d 
wrote:
 On Friday, 25 May 2018 at 22:07:22 UTC, Dukc wrote:
 On Friday, 25 May 2018 at 21:06:17 UTC, Walter Bright wrote:
 This ambiguity bug with + has been causing well-known problems
 since Algol. A *really* long time. Yet it gets constantly
 welded into new languages.
 Yeah. I could understand that choice for a language that tries to be simple for beginners above everything else. But for the others, it just did not occur to them.
 It is not a problem of the language but a problem of the programmer. [...] A language will never solve all your problems as a programmer, else it would write the programs for us.
Personally, I don't think that I've ever made the mistake of screwing up + and concatenating instead of adding or vice versa. And at the end of the day, the programmer does need to know the tools that they're using and use them correctly. That being said, the language (and other tools used for programming) can often be designed in a way that reduces mistakes - and all programmers make mistakes. e.g. in D, implicit fallthrough in case statements is now illegal if the case statement is non-empty. e.g.

switch(i)
{
    case 0: // legal fallthrough
    case 1:
    {
        foo(bar());
        break;
    }
    case 2:
    {
        doStuff(something());
        // illegal fallthrough
    }
    default:
        return 17;
}

Instead, the programmer must put a control flow statement there such as break or goto. e.g.

switch(i)
{
    case 0: // legal fallthrough
    case 1:
    {
        foo(bar());
        break;
    }
    case 2:
    {
        doStuff(something());
        goto case; // now explicitly goes to the next case statement
    }
    default:
        return 17;
}

Sure, it can be argued that this should be unnecessary and that the programmer should just get it right, but it's not an altogether uncommon bug to screw up case statements and inadvertently fall through to the next one when you meant to put a break or some other control statement there. Originally, implicit fallthrough was perfectly legal in D just like it is in C or C++. However, when it was made illegal, it caught quite a few bugs in existing programs - including at companies using D. This change to the language fixed bugs and almost certainly saved people time and money.

Designing a good programming language is a bit of an art. It's not always easy to decide when the language should be picky about something and when it should let the programmer shoot themselves in the foot, but there are plenty of cases where having the language be picky catches bugs that programmers would otherwise make all the time, because we're not perfect.

That's part of why we have @safe in D. It disallows all kinds of perfectly legitimate code, because it's stuff that's easy for the programmer to screw up and often hard for them to get right, and having large sections of the program restricted in what is allowed prevents all kinds of bugs. Then in the cases where the programmer actually needs to do the unsafe stuff, they write @system code, manually verify that it's correct, and mark it as @trusted so that it can be called from @safe code. Then, when they run into a memory corruption issue later, they have a relatively small portion of the program that they need to inspect.

A well-designed language enables the programmer to do their job correctly and efficiently while protecting them from stupid mistakes where reasonably possible. Using ~ instead of + costs us almost nothing while preventing potential bugs. It's quickly learned when you first start using D, and then the code is clear about whether something is intended to be addition or concatenation without the programmer having to study it closely, and there are cases like what the OP described where it actually allows the compiler to catch bugs. It's a simple design decision with almost no cost that prevents bugs. That's the kind of thing that we generally consider to be a win around here.

- Jonathan M Davis
May 25 2018
next sibling parent reply IntegratedDimensions <IntegratedDimensions gmail.com> writes:
On Friday, 25 May 2018 at 23:05:51 UTC, Jonathan M Davis wrote:
 On Friday, May 25, 2018 22:23:07 IntegratedDimensions via 
 Digitalmars-d wrote:
 On Friday, 25 May 2018 at 22:07:22 UTC, Dukc wrote:
 On Friday, 25 May 2018 at 21:06:17 UTC, Walter Bright wrote:
 This ambiguity bug with + has been causing well-known 
 problems since Algol. A *really* long time. Yet it gets 
 constantly welded into new languages.
 Yeah. I could understand that choice for a language that tries to be simple for beginners above everything else. But for the others, it just did not occur to them.
 It is not a problem of the language but a problem of the programmer. [...]
 Personally, I don't think that I've ever made the mistake of screwing up + and concatenating instead of adding or vice versa. [...] It's a simple design decision with almost no cost that prevents bugs. That's the kind of thing that we generally consider to be a win around here. - Jonathan M Davis
I don't deny that there are some things that can make it more difficult. But there is a difference between a fundamentally unsound semantic and something that is just difficult for a typical programmer to get right every time. The difficult things become easier with time. It is not a valid assessment to claim a language is inferior in some way simply because someone unfamiliar with it runs into problems. No language is perfectly well designed and someone will always have problems in some way. Obviously there are many things that D has done that have screwed the pooch. My point is simply that familiarity with a language is necessary to be able to properly criticize superficial differences.

What happens is that when you regulate behavior, that behavior becomes amplified. So, you think that by forcing programmers to use a break, goto, or return at the end of a case you avoid some errors, but really what it does is make programmers less aware of the problems. They become less effective programmers in the long run.

If you have a child, for example, and you always throw them in the pool with maximum amount of safety gear, they never really learn to swim. You are trying to protect them from a *potential* problem but cause a real problem instead. It is impossible to swim correctly with "floaties" because they change the physics and also offer a false sense of security. So, for a potential problem that has other solutions (such as proper education) you have created a false sense of security and an unreal environment that creates a whole new set of problems, and you get further away from the whole purpose. For the kid, the purpose of floaties is not to use them when you swim. They provably take away from the experience. Kids that can swim do not use them for a reason. Hence they are a handicap, and forcing that handicap on someone permanently prevents them from growing.

The mentality that you think you can police everything and that you also have the experience and knowledge to protect everyone is ignorant and provably catastrophic in the long run. Take nuclear weapons. They are the "floaties"... a false sense of security that will eventually destroy humanity, because it is just a matter of time before they are used. The fact is that people cannot and never will be able to properly design things that provide the perfect balance. When people err too much on the side of regulation, control, and prediction, they always create more problems than the lack of them would. Just because everyone does it does not mean it is right.
May 25 2018
next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, May 25, 2018 23:57:03 IntegratedDimensions via Digitalmars-d 
wrote:
 On Friday, 25 May 2018 at 23:05:51 UTC, Jonathan M Davis wrote:
 So, you think that by forcing programmers to use a break, goto, or return at the
 end of a case you avoid some errors, but really what it does is make programmers
 less aware of the problems. They become less effective programmers in the long run.
I don't see how it makes anyone less effective. It catches a programming mistake, and if you want to purposefully fall through to the next statement, then use goto case;. No expressiveness is lost, and there's no problem that the programmer is not left aware of.
 The mentality that you think you can police everything and that
 you also have the experience and knowledge to protect everyone is
 ignorant and provably catastrophic in the long run.
We can't protect everyone from everything. And what we've done here (or what primarily Walter has done here) is to make some simple constructs that have proven over time to cause bugs illegal. In each case, there's a simple alternative that really doesn't cost you anything more. So, a simple mistake is prevented without you losing expressiveness in the language and without hiding problems. The programmer still has to know what they're doing, and they can still do exactly the same things that they could do before. It's just that one class of mistake just became illegal in the language, so you won't have that particular bug. I really don't understand why you think that that's a bad thing.

It would be one thing if D prevented you from doing something the simple way and forced you to bend over backwards in order to be able to do it, but that's not how we generally do things in D. In some cases, you do have to tell the compiler that you know what you're doing and don't want the hand-holding (e.g. with @safe vs @system), but in general, the stuff that's made illegal is stuff that's going to cause problems, and the alternatives are pretty much just as simple as what's being prevented.

You seem to be saying the programming equivalent of arguing that knives don't need handles (just bare blades) and that anyone who wants to use a knife should learn how to hold the blade properly without cutting themselves. No, we can't protect programmers from everything (and shouldn't try), but that's no reason to give up on designing language features in a way that minimizes stupid mistakes - especially when the result is just as expressive and doesn't actually restrict the programmer. If you want a language that doesn't protect you from anything, then C is going to be a much better fit for you than D.

- Jonathan M Davis
May 25 2018
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Friday, 25 May 2018 at 23:57:03 UTC, IntegratedDimensions 
wrote:
 So, you think that by forcing programmers to use a break, goto, or return at the
 end of a case you avoid some errors, but really what it does is make programmers
 less aware of the problems. They become less effective programmers in the long run.

 If you have a child, for example, and you always throw them in 
 the pool with maximum amount of safety gear they never really 
 learn to swim. You are trying to protect them from a 
 *potential* problem but cause a real problem instead. It is 
 impossible to swim correctly with "floaties" because they 
 change the physics and also offer a false sense of security.
You're confusing two things here. Yes, if we never use void[] casts, pointers, goto statements etc., or at least study how they work, we're going to be helpless. But D does not prevent using them, it just makes you aware when you do.

For sure you won't learn to swim correctly with plastic pillows only, but that does not mean they have to be made so that they could break at any moment. It just means you have to take them off sometimes.
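
Roughly what I mean, as a small sketch (the function names are made up by me):

@safe void careful()
{
    // No raw pointer tricks allowed in here; the compiler makes you opt out first.
}

@system void lowLevel()
{
    int x = 42;
    int* p = &x;
    int[] ints = p[0 .. 1];          // slicing a raw pointer: allowed, but only outside @safe
    void[] raw = cast(void[]) ints;  // the kind of void[] cast mentioned above, explicit and visible
}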
May 26 2018
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 25 May 2018 at 23:05:51 UTC, Jonathan M Davis wrote:
 Sure, it can be argued that this should be unnecessary and that 
 the programmer should just get it right, but it's not an 
 altogether uncommon bug to screw up case statements and 
 inadvertently fall through to the next one when you meant to 
 put a break or some other control statement there. Originally, 
 implicit fallthrough was perfectly legal in D just like it is 
 in C or C++. However, when it was made illegal, it caught quite 
 a few bugs in existing programs - including at companies using 
 D. This change to the language fixed bugs and almost certainly 
 saved people time and money.
And that the issue is real in C is also illustrated by the fact that gcc warns about implicit fallthrough since version 7. One has to add at least a comment to suppress the warning (btw, the implementation of the heuristic that analyses the comments is more or less broken; I had to file my first bug report to gcc about it).
May 26 2018
prev sibling next sibling parent reply rumbu <rumbu rumbu.ro> writes:
On Friday, 25 May 2018 at 08:27:30 UTC, Dukc wrote:
[...]
 result = (digit < 10? '0' + (char)digit: 'A' + (char)(digit -
[...]
 Looks correct, right? Yes.

[...]

 So, ~ may be a bit confusing for newcomers, but there is a 
 solid reason why it's used instead of +, and it's because they 
 have a fundamentally different meaning. Good work, whoever 
 chose that meaning!
Sorry, but the mistake here is the fact that you wrongly assume C# semantics. Adding chars to an existing string will result in a string, as in the language specification. The same '+' operator works also with multicast delegates, and I doubt that you'll expect something else than a multicast delegate as a result.
May 26 2018
parent reply Dukc <ajieskola gmail.com> writes:
On Saturday, 26 May 2018 at 09:01:29 UTC, rumbu wrote:
 Sorry, but the mistake here is the fact that you wrongly assume C# semantics. [...]

Yes it is. But that does not make differentiating concat and addition in language design any less worthwhile. In car crashes, the mistake is usually made by a driver, but I know no-one who says safety belts aren't worthwhile.
 Adding chars to an existing string will result in a string as 
 in the language specification.
In fact I didn't make the mistake there. What surprised me was that adding two INDIVIDUAL chars results in a string. When op+ is strictly for mathematical summing, there are no ambiguities.
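
To illustrate the difference with a quick D sketch of mine (nothing official, just the semantics I mean):

void main()
{
    import std.stdio : writeln;

    char a = 'a', b = 'b';
    writeln(a + b);        // 195: '+' on two chars is plain integer arithmetic in D
    writeln("" ~ a ~ b);   // "ab": '~' is the only way to concatenate
    // writeln(a + "x");   // does not even compile: no '+' between a number and a string
}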
May 26 2018
parent reply rumbu <rumbu rumbu.ro> writes:
On Saturday, 26 May 2018 at 10:08:35 UTC, Dukc wrote:
 On Saturday, 26 May 2018 at 09:01:29 UTC, rumbu wrote:
 Sorry, but the mistake here is the fact that you wrongly assume C# semantics. [...]

 Yes it is. But that does not make differentiating concat and addition in language design any less worthwhile. In car crashes, the mistake is usually made by a driver, but I know no-one who says safety belts aren't worthwhile.
 Adding chars to an existing string will result in a string as 
 in the language specification.
 In fact I didn't make the mistake there. What surprised me was that adding two INDIVIDUAL chars results in a string. When op+ is strictly for mathematical summing, there are no ambiguities.
In C#, adding two chars results in an int. Adding an int to a string will box the int and ToString() will be called on the resulting object => the result is a string. So 'a' + 'b' + "sssss" = 195 + "sssss" = 195.ToString() + "sssss" = "195sssss". Therefore your first example will work correctly if you convert the int result back to char: (char)('a' + 'b') + "sssss" will render the correct result. This is a conversion problem, not an operator problem: you wrongly assumed that adding two chars will result in a char, not an int. In the hypothetical code 'a' + 'b' ~ "sssss", the result is also "195sssss".
May 26 2018
parent Dukc <ajieskola gmail.com> writes:
On Saturday, 26 May 2018 at 12:37:15 UTC, rumbu wrote:
 Therefore your first example will work correctly if you convert 
 the int result back to char: (char)('a' + 'b') + "sssss" will 
 render the correct result.


 This is a conversion problem, not an operator problem: you wrongly 
 assumed that adding two chars will result in a char, not an 
 int. In the hypothetical code 'a' + 'b' ~ "sssss", the result is also 
 "195sssss".
I had to go back and check. Yes, it appears I screwed up here. Sorry. Sigh, this is one of the cases where I wish I could edit my posts.
May 26 2018
prev sibling parent reply Nick Treleaven <nick geany.org> writes:
On Friday, 25 May 2018 at 08:27:30 UTC, Dukc wrote:
 If you add two characters, it interprets it as a concatenation 
 that results in a string with two charactes.
...
 Now, if I were programming in D, this would not have happened. 
 Using + always means an addition.
I don't think it makes sense to allow adding two characters - the second operand should be an integer type. Why have `byte` in the language if `char` works like an integer? Ideally ops like addition would allow one operand to be a character type, but require the other operand to be an integer - that is a useful operation, unlike adding '+' to 'Z'.
May 26 2018
parent Dukc <ajieskola gmail.com> writes:
On Saturday, 26 May 2018 at 11:04:44 UTC, Nick Treleaven wrote:
 I don't think it makes sense to allow adding two characters - 
 the second operand should be an integer type.
So it would behave like pointer arithmetic. Sounds sound. Not for D, because of the C semantic similarity requirement, but for some other aspiring language.
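
For illustration, a rough sketch of mine (not a proposal in code form): pointers already allow "pointer + integer" but reject "pointer + pointer", and the idea would treat char the same way with respect to +.

void main()
{
    int[4] arr;
    int* p = arr.ptr;
    auto q = p + 2;      // fine: pointer + integer
    // auto r = p + q;   // rejected: adding two pointers makes no sense
    char c = 'A';
    auto d = c + 2;      // char + integer: useful, and fine today
    auto e = c + 'Z';    // char + char: compiles today, but is what the proposal would reject
}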
May 26 2018