
digitalmars.D - Morale of a story: ~ is the right choice for concat operator

reply Dukc <ajieskola gmail.com> writes:

I needed to convert an integer to a string, formatted in hexadecimal. It may be that I should have used some library function for that, but I decided to roll my own anyway, in my general utility class:

public static string FormatHexadecimal(int what)
{   if (what == 0) return "0";
	string result = "";
	bool signed = what < 0;
	if (signed) what = -what;

	for (;what != 0;what >>= 4)
	{   int digit = what & 0x0000000F;
		result = (digit < 10? '0' + (char)digit: 'A' + (char)(digit - 10)) + result;
	}

	return signed? "-" + result: result;
}

Looks correct, right? Yes.


But it does not work, because + is overloaded in the language I was using: depending on context it means either addition or string concatenation. If you add two characters, it interprets it as a concatenation that results in a string with two characters. The correct way to do what I tried is:

public static string FormatHexadecimal(int what)
{   if (what == 0) return "0";
     string result = "";
     bool signed = what < 0;
     if (signed) what = -what;

     for (;what != 0;what >>= 4)
     {   int digit = what & 0x0000000F;
         result = (char)(digit < 10? (int)'0' + digit: (int)'A' + (digit - 10)) + result;
     }

     return signed? "-" + result: result;
}

You can imagine my confusion when the first version returned far too long and incorrect strings. Now, if I were programming in D, this would not have happened: using + always means addition. If one wants to concatenate, ~ is used instead.
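
For comparison, here is roughly what the same function could look like in D (a quick, untested sketch of mine): + on the digit stays plain integer arithmetic, and every concatenation has to be spelled with ~.

string formatHexadecimal(int what)
{   if (what == 0) return "0";
    string result = "";
    bool signed = what < 0;
    if (signed) what = -what;

    for (; what != 0; what >>= 4)
    {   int digit = what & 0x0000000F;
        // '+' here is plain integer arithmetic; the cast yields a char,
        // and '~' is unambiguously concatenation.
        result = cast(char)(digit < 10 ? '0' + digit : 'A' + (digit - 10)) ~ result;
    }

    return signed ? "-" ~ result : result;
}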

So, ~ may be a bit confusing for newcomers, but there is a solid 
reason why it's used instead of +, and it's because they have a 
fundamentally different meaning. Good work, whoever chose that 
meaning!
May 25 2018
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/25/2018 1:27 AM, Dukc wrote:
 So, ~ may be a bit confusing for newcomers, but there is a solid reason why it's
 used instead of +, and it's because they have a fundamentally different meaning.
 Good work, whoever chose that meaning!
This ambiguity bug with + has been causing well-known problems since Algol. A *really* long time. Yet it gets constantly welded into new languages.
May 25 2018
parent reply Dukc <ajieskola gmail.com> writes:
On Friday, 25 May 2018 at 21:06:17 UTC, Walter Bright wrote:
 This ambiguity bug with + has been causing well-known problems 
 since Algol. A *really* long time. Yet it gets constantly 
 welded into new languages.
Yeah. I could understand that choice for a language that tries to be simple for beginners above everything else. But for the others, it just did not occur to them.
May 25 2018
parent reply IntegratedDimensions <IntegratedDimensions gmail.com> writes:
On Friday, 25 May 2018 at 22:07:22 UTC, Dukc wrote:
 On Friday, 25 May 2018 at 21:06:17 UTC, Walter Bright wrote:
 This ambiguity bug with + has been causing well-known problems 
 since Algol. A *really* long time. Yet it gets constantly 
 welded into new languages.
 Yeah. I could understand that choice for a language that tries to be simple for beginners above everything else. But for the others, it just did not occur to them.
It is not a problem of the language but a problem of the programmer. A programmer should always know the types he is working with and the functional semantics used. While it obviously has the potential to cause more problems, it is not a huge deal in general. I might have been caught by that "bug" once or twice, but it's usually an obvious fix. If you are moving from one language to another, or haven't programmed in one much, you will have these types of problems, but they go away with experience. To degrade the language based on that is wrong.

Languages should not be designed around noobs, because then the programmers of that language stay noobs. Think BASIC. If all you did was program in BASIC, you would be considered a novice programmer by today's standards. Even if you were an expert BASIC programmer, when you moved to a modern language you would be confused. For you to say that those languages are inferior because they don't do things like BASIC would be wrong; it is your unfamiliarity with the language and newer programming concepts that is the problem.

A language will never solve all your problems as a programmer, else it would write the programs for us.
May 25 2018
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, May 25, 2018 22:23:07 IntegratedDimensions via Digitalmars-d 
wrote:
 On Friday, 25 May 2018 at 22:07:22 UTC, Dukc wrote:
 On Friday, 25 May 2018 at 21:06:17 UTC, Walter Bright wrote:
 This ambiguity bug with + has been causing well-known problems
 since Algol. A *really* long time. Yet it gets constantly
 welded into new languages.
 Yeah. I could understand that choice for a language that tries to be simple for beginners above everything else. But for the others, it just did not occur to them.
 It is not a problem of the language but a problem of the programmer. [...] A language will never solve all your problems as a programmer, else it would write the programs for us.
Personally, I don't think that I've ever made the mistake of screwing up + and concatenating instead of adding or vice versa. And at the end of the day, the programmer does need to know the tools that they're using and use them correctly. That being said, the language (and other tools used for programming) can often be designed in a way that reduces mistakes - and all programmers make mistakes. e.g. in D, implicit fallthrough in case statements is now illegal if the case statement is non-empty. e.g.

switch(i)
{
    case 0: // legal fallthrough
    case 1:
    {
        foo(bar());
        break;
    }
    case 2:
    {
        doStuff(something());
        // illegal fallthrough
    }
    default:
        return 17;
}

Instead, the programmer must put a control flow statement there such as break or goto. e.g.

switch(i)
{
    case 0: // legal fallthrough
    case 1:
    {
        foo(bar());
        break;
    }
    case 2:
    {
        doStuff(something());
        goto case; // now explicitly goes to the next case statement
    }
    default:
        return 17;
}

Sure, it can be argued that this should be unnecessary and that the programmer should just get it right, but it's not an altogether uncommon bug to screw up case statements and inadvertently fall through to the next one when you meant to put a break or some other control statement there. Originally, implicit fallthrough was perfectly legal in D just like it is in C or C++. However, when it was made illegal, it caught quite a few bugs in existing programs - including at companies using D. This change to the language fixed bugs and almost certainly saved people time and money.

Designing a good programming language is a bit of an art. It's not always easy to decide when the language should be picky about something and when it should let the programmer shoot themselves in the foot, but there are plenty of cases where having the language be picky catches bugs that programmers would otherwise make all the time, because we're not perfect.

That's part of why we have @safe in D. It disallows all kinds of perfectly legitimate code, because it's stuff that's easy for the programmer to screw up and often hard for them to get right, and having large sections of the program restricted in what is allowed prevents all kinds of bugs. Then in the cases where the programmer actually needs to do the unsafe stuff, they write @system code, manually verify that it's correct, and mark it as @trusted so that it can be called from @safe code. Then, when they run into a memory corruption issue later, they have a relatively small portion of the program that they need to inspect.

A well-designed language enables the programmer to do their job correctly and efficiently while protecting them from stupid mistakes where reasonably possible. Using ~ instead of + costs us almost nothing while preventing potential bugs. It's quickly learned when you first start using D, and then the code is clear about whether something is intended to be addition or concatenation without the programmer having to study it closely, and there are cases like what the OP described where it actually allows the compiler to catch bugs. It's a simple design decision with almost no cost that prevents bugs. That's the kind of thing that we generally consider to be a win around here.

- Jonathan M Davis
May 25 2018
next sibling parent reply IntegratedDimensions <IntegratedDimensions gmail.com> writes:
On Friday, 25 May 2018 at 23:05:51 UTC, Jonathan M Davis wrote:
 On Friday, May 25, 2018 22:23:07 IntegratedDimensions via 
 Digitalmars-d wrote:
 On Friday, 25 May 2018 at 22:07:22 UTC, Dukc wrote:
 On Friday, 25 May 2018 at 21:06:17 UTC, Walter Bright wrote:
 This ambiguity bug with + has been causing well-known 
 problems since Algol. A *really* long time. Yet it gets 
 constantly welded into new languages.
 Yeah. I could understand that choice for a language that tries to be simple for beginners above everything else. But for the others, it just did not occur to them.
 It is not a problem of the language but a problem of the programmer. [...]
 Personally, I don't think that I've ever made the mistake of screwing up + and concatenating instead of adding or vice versa. [...] It's a simple design decision with almost no cost that prevents bugs. That's the kind of thing that we generally consider to be a win around here. - Jonathan M Davis
I don't deny that there are some things that can make it more difficult. But there is a difference between a fundamentally unsound semantic and something that is just difficult for a typical programmer to get right every time. The difficult things become easier with time. It is not a valid assessment to claim a language is inferior in some way simply because someone unfamiliar with it runs into problems. No language is perfectly well designed and someone will always have problems in some way. Obviously there are many things that D has done that have screwed the pooch. My point is simply that familiarity with a language is necessary to be able to properly criticize superficial differences.

What happens is that when you regulate behavior, that behavior becomes amplified. So, you think that by forcing programmers to use a break, goto, or return at the end of a case you avoid some errors, but really what it does is make programmers less aware of the problems. They become less effective programmers in the long run.

If you have a child, for example, and you always throw them in the pool with maximum amount of safety gear, they never really learn to swim. You are trying to protect them from a *potential* problem but cause a real problem instead. It is impossible to swim correctly with "floaties" because they change the physics and also offer a false sense of security. So, for a potential problem that has other solutions (such as proper education) you have created a false sense of security and an unreal environment that creates a whole new set of problems, and you get further away from the whole purpose. For the kid, the purpose of floaties is not to use them when you swim. They provably take away from the experience. Kids that can swim do not use them for a reason. Hence they are a handicap, and forcing that handicap on someone permanently prevents them from growing.

The mentality that you think you can police everything and that you also have the experience and knowledge to protect everyone is ignorant and provably catastrophic in the long run. Take nuclear weapons. They are the "floaties"... a false sense of security that will eventually destroy humanity, because it is just a matter of time before they are used. The fact is that people cannot and never will be able to properly design things that provide the perfect balance. When people err too much on the side of regulation, control, and prediction, they always create more problems than the lack of them would. Just because everyone does it does not mean it is right.
May 25 2018
next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, May 25, 2018 23:57:03 IntegratedDimensions via Digitalmars-d 
wrote:
 On Friday, 25 May 2018 at 23:05:51 UTC, Jonathan M Davis wrote:
 So, you think that by forcing programmers to use a break, goto, or return at the
 end of a case you avoid some errors, but really what it does is make programmers
 less aware of the problems. They become less effective programmers in the long run.
I don't see how it makes anyone less effective. It catches a programming mistake, and if you want to purposefully fall through to the next statement, then use goto case;. No expressiveness is lost, and there's no problem that the programmer is not left aware of.
 The mentality that you think you can police everything and that
 you also have the experience and knowledge to protect everyone is
 ignorant and provably catastrophic in the long run.
We can't protect everyone from everything. And what we've done here (or what primarily Walter has done here) is to make some simple constructs that have proven over time to cause bugs illegal. In each case, there's a simple alternative that really doesn't cost you anything more. So, a simple mistake is prevented without you losing expressiveness in the language and without hiding problems. The programmer still has to know what they're doing, and they can still do exactly the same things that they could do before. It's just that one class of mistake just became illegal in the language, so you won't have that particular bug. I really don't understand why you think that that's a bad thing.

It would be one thing if D prevented you from doing something the simple way and forced you to bend over backwards in order to be able to do it, but that's not how we generally do things in D. In some cases, you do have to tell the compiler that you know what you're doing and don't want the hand-holding (e.g. with @safe vs @system), but in general, the stuff that's made illegal is stuff that's going to cause problems, and the alternatives are pretty much just as simple as what's being prevented.

You seem to be saying the programming equivalent of arguing that knives don't need handles (just bare blades) and that anyone who wants to use a knife should learn how to hold the blade properly without cutting themselves. No, we can't protect programmers from everything (and shouldn't try), but that's no reason to give up on designing language features in a way that minimizes stupid mistakes - especially when the result is just as expressive and doesn't actually restrict the programmer. If you want a language that doesn't protect you from anything, then C is going to be a much better fit for you than D.

- Jonathan M Davis
May 25 2018
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Friday, 25 May 2018 at 23:57:03 UTC, IntegratedDimensions 
wrote:
 So, you think that by forcing programmers to use a break, goto, or return at the
 end of a case you avoid some errors, but really what it does is make programmers
 less aware of the problems. They become less effective programmers in the long run.

 If you have a child, for example, and you always throw them in 
 the pool with maximum amount of safety gear they never really 
 learn to swim. You are trying to protect them from a 
 *potential* problem but cause a real problem instead. It is 
 impossible to swim correctly with "floaties" because they 
 change the physics and also offer a false sense of security.
You're confusing two things here. Yes, if we never use void[] casts, pointers, goto statements etc., or at least study how they work, we're going to be helpless. But D does not prevent using them, it just makes you aware when you do.

For sure you won't learn to swim correctly with plastic pillows only, but that does not mean they have to be made so that they could break at any moment. It just means you have to take them off sometimes.
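
Roughly what I mean, as a small sketch (the function names are made up by me):

@safe void careful()
{
    // No raw pointer tricks allowed in here; the compiler makes you opt out first.
}

@system void lowLevel()
{
    int x = 42;
    int* p = &x;
    int[] ints = p[0 .. 1];          // slicing a raw pointer: allowed, but only outside @safe
    void[] raw = cast(void[]) ints;  // the kind of void[] cast mentioned above, explicit and visible
}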
May 26 2018
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 25 May 2018 at 23:05:51 UTC, Jonathan M Davis wrote:
 Sure, it can be argued that this should be unnecessary and that 
 the programmer should just get it right, but it's not an 
 altogether uncommon bug to screw up case statements and 
 inadvertently fall through to the next one when you meant to 
 put a break or some other control statement there. Originally, 
 implicit fallthrough was perfectly legal in D just like it is 
 in C or C++. However, when it was made illegal, it caught quite 
 a few bugs in existing programs - including at companies using 
 D. This change to the language fixed bugs and almost certainly 
 saved people time and money.
And that the issue is real in C is also illustrated by the fact that gcc warns about implicit fallthrough since version 7. One has to add at least a comment to suppress the warning (btw, the implementation of the heuristic that analyses the comments is more or less broken; I had to file my first bug report to gcc about it).
May 26 2018
prev sibling next sibling parent reply rumbu <rumbu rumbu.ro> writes:
On Friday, 25 May 2018 at 08:27:30 UTC, Dukc wrote:
[...]
 result = (digit < 10? '0' + (char)digit: 'A' + (char)(digit -
[...]
 Looks correct, right? Yes.

[...]

 So, ~ may be a bit confusing for newcomers, but there is a 
 solid reason why it's used instead of +, and it's because they 
 have a fundamentally different meaning. Good work, whoever 
 chose that meaning!
Sorry, but the mistake here is the fact that you wrongly assume C# semantics. Adding chars to an existing string will result in a string, as in the language specification. The same '+' operator works also with multicast delegates, and I doubt that you'll expect something else than a multicast delegate as a result.
May 26 2018
parent reply Dukc <ajieskola gmail.com> writes:
On Saturday, 26 May 2018 at 09:01:29 UTC, rumbu wrote:
 Sorry, but the mistake here is the fact that you wrongly assume C# semantics. [...]

Yes it is. But that does not make differentiating concat and addition in language design any less worthwhile. In car crashes, the mistake is usually made by a driver, but I know no-one who says safety belts aren't worthwhile.
 Adding chars to an existing string will result in a string as 
 in the language specification.
In fact I didn't make the mistake there. What surprised me was that adding two INDIVIDUAL chars results in a string. When op+ is strictly for mathematical summing, there are no ambiguities.
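
To illustrate the difference with a quick D sketch of mine (nothing official, just the semantics I mean):

void main()
{
    import std.stdio : writeln;

    char a = 'a', b = 'b';
    writeln(a + b);        // 195: '+' on two chars is plain integer arithmetic in D
    writeln("" ~ a ~ b);   // "ab": '~' is the only way to concatenate
    // writeln(a + "x");   // does not even compile: no '+' between a number and a string
}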
May 26 2018
parent reply rumbu <rumbu rumbu.ro> writes:
On Saturday, 26 May 2018 at 10:08:35 UTC, Dukc wrote:
 On Saturday, 26 May 2018 at 09:01:29 UTC, rumbu wrote:
 Sorry, but the mistake here is the fact that you wrongly assume C# semantics. [...]

 Yes it is. But that does not make differentiating concat and addition in language design any less worthwhile. In car crashes, the mistake is usually made by a driver, but I know no-one who says safety belts aren't worthwhile.
 Adding chars to an existing string will result in a string as 
 in the language specification.
 In fact I didn't make the mistake there. What surprised me was that adding two INDIVIDUAL chars results in a string. When op+ is strictly for mathematical summing, there are no ambiguities.
In C#, adding two chars results in an int. Adding an int to a string will box the int and ToString() will be called on the resulting object => the result is a string. So 'a' + 'b' + "sssss" = 195 + "sssss" = 195.ToString() + "sssss" = "195sssss". Therefore your first example will work correctly if you convert the int result back to char: (char)('a' + 'b') + "sssss" will render the correct result. This is a conversion problem, not an operator problem: you wrongly assumed that adding two chars will result in a char, not an int. In the hypothetical code 'a' + 'b' ~ "sssss", the result is also "195sssss".
May 26 2018
parent Dukc <ajieskola gmail.com> writes:
On Saturday, 26 May 2018 at 12:37:15 UTC, rumbu wrote:
 Therefore your first example will work correctly if you convert 
 the int result back to char: (char)('a' + 'b') + "sssss" will 
 render the correct result.


 This is a conversion problem, not an operator problem: you wrongly 
 assumed that adding two chars will result in a char, not an 
 int. In the hypothetical code 'a' + 'b' ~ "sssss", the result is also 
 "195sssss".
I had to go back and check. Yes, it appears I screwed up here. Sorry. Sigh, this is one of the cases where I wish I could edit my posts.
May 26 2018
prev sibling parent reply Nick Treleaven <nick geany.org> writes:
On Friday, 25 May 2018 at 08:27:30 UTC, Dukc wrote:
 If you add two characters, it interprets it as a concatenation 
 that results in a string with two charactes.
...
 Now, if I were programming in D, this would not have happened. 
 Using + always means an addition.
I don't think it makes sense to allow adding two characters - the second operand should be an integer type. Why have `byte` in the language if `char` works like an integer? Ideally ops like addition would allow one operand to be a character type, but require the other operand to be an integer - that is a useful operation, unlike adding '+' to 'Z'.
May 26 2018
parent Dukc <ajieskola gmail.com> writes:
On Saturday, 26 May 2018 at 11:04:44 UTC, Nick Treleaven wrote:
 I don't think it makes sense to allow adding two characters - 
 the second operand should be an integer type.
So it would behave like pointer arithmetic. Sounds sound. Not for D, because of the C semantic similarity requirement, but for some other aspiring language.
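
For illustration, a rough sketch of mine (not a proposal in code form): pointers already allow "pointer + integer" but reject "pointer + pointer", and the idea would treat char the same way with respect to +.

void main()
{
    int[4] arr;
    int* p = arr.ptr;
    auto q = p + 2;      // fine: pointer + integer
    // auto r = p + q;   // rejected: adding two pointers makes no sense
    char c = 'A';
    auto d = c + 2;      // char + integer: useful, and fine today
    auto e = c + 'Z';    // char + char: compiles today, but is what the proposal would reject
}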
May 26 2018