digitalmars.D - Comparing D vs C++ (wierd behaviour of C++)

Daniel Kozak (40/40) Jul 24 2018 I am not C++ expert so this seems wierd to me:

Ecstatic Coder (5/45) Jul 24 2018 As the C++ char are signed by default, when you accumulate

Patrick Schluter (9/63) Jul 24 2018 That's not exactly what happens here. There's no 64 bit buffer.

Ecstatic Coder (10/67) Jul 24 2018 Sure about that ? ;)

Patrick Schluter (16/77) Jul 24 2018 Yes, there are no "buffers" only register and a place on the

Patrick Schluter (9/43) Jul 24 2018 what I forgot to mention, for the compiler the type deduction for

Ecstatic Coder (6/12) Jul 24 2018 Btw you are actually confirming what I said.

Patrick Schluter (5/18) Jul 24 2018 We're nitpicking over vocabulary. For me buffer != register.

Ecstatic Coder (3/25) Jul 24 2018 Pick the word you prefer, the i value is stored in a 64 bits

Patrick Schluter (13/53) Jul 24 2018 int promotion rule. char is signed. The 256 are signed. When the
Caspar Kielwein (11/34) Jul 24 2018 Signedness of char in C++ is platform dependent.

Daniel Kozak <kozzi11 gmail.com> writes:

I am not a C++ expert, so this seems weird to me:

#include <iostream>
#include <string>

using namespace std;

int main(int argc, char **argv)
{
	char c = 0xFF;
	std::string sData = {c,c,c,c};
	unsigned int i = (((((sData[0]&0xFF)*256
					+ (sData[1]&0xFF))*256)
					+ (sData[2]&0xFF))*256
					+ (sData[3]&0xFF));
					
	if (i != 0xFFFFFFFF) { // it is true why?
		// this print 18446744073709551615 wow
		std::cout << "WTF: " << i  << std::endl;
	}	    	
	return 0;
}

compiled with:
g++ -O2 -Wall  -o "test" "test.cxx"
when compiled with -O0 it works as expected

Vs. D:

import std.stdio;

void main(string[] args)
{
	char c = 0xFF;
	string sData = [c,c,c,c];
	uint i = (((((sData[0]&0xFF)*256
					+ (sData[1]&0xFF))*256)
					+ (sData[2]&0xFF))*256
					+ (sData[3]&0xFF));
	if (i != 0xFFFFFFFF) { // is false - make sense
		writefln("WTF: %d", i);
	}			
}

compiled with:
dmd -release -inline -boundscheck=off -w -of"test" "test.d"

So is this a code-gen bug on the C++ side, or is there something wrong 
with that code?

Jul 24 2018

Ecstatic Coder <ecstatic.coder gmail.com> writes:

On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]

As C++ chars are signed by default, when you accumulate 
several shifted 8-bit -1 values into a char result and then store it in 
a 64-bit unsigned buffer, you get -1 in 64 bits: 
18446744073709551615.

Jul 24 2018

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]

 As the C++ char are signed by default, when you accumulate 
 several shifted 8 bit -1 into a char result and then store it 
 in a 64 bit unsigned buffer, you get -1 in 64 bits : 
 18446744073709551615.

That's not exactly what happens here. There's no 64-bit buffer. 
It's signed overflow, which is undefined behavior in C and C++.
He gets different results with and without optimization because, 
without optimization, the result of the calculation is spilled to 
the unsigned int i and then reloaded for the print call. This 
save and reload truncates the value to the 32 bits that i can hold. In the 
optimized version, the compiler removes the spill and the 
overflowed value contained in the register is printed as is.

Jul 24 2018

Ecstatic Coder <ecstatic.coder gmail.com> writes:

On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter wrote:
 On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]

 As the C++ char are signed by default, when you accumulate 
 several shifted 8 bit -1 into a char result and then store it 
 in a 64 bit unsigned buffer, you get -1 in 64 bits : 
 18446744073709551615.

 That's not exactly what happens here. There's no 64 bit buffer.

Sure about that? ;)

As "i" is printed as 18446744073709551615 when put into cout, I 
don't see how it couldn't be stored as a uint64...

It's actually -1 stored as a uint64.

This kind of optimizer problem is a classic when mixing signed 
and unsigned values in such bit-shifting expressions.

This is why you should always cast the signed input values to the 
unsigned result type right from the start, before you begin to 
mix/shift them.

Jul 24 2018

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Tuesday, 24 July 2018 at 19:24:05 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter 
 wrote:
 On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]

 As the C++ char are signed by default, when you accumulate 
 several shifted 8 bit -1 into a char result and then store it 
 in a 64 bit unsigned buffer, you get -1 in 64 bits : 
 18446744073709551615.

 That's not exactly what happens here. There's no 64 bit buffer.

 Sure about that ? ;)

Yes, there are no "buffers", only registers and a place on the 
stack for the variable i.

As I said, it's undefined behaviour, so anything goes. I just checked 
on godbolt what code is generated: https://godbolt.org/g/wxqfmM
With -O0 this happens:
Lines 41 to 77 of the listing are the instructions that perform the 
calculation. At line 78, 
mov DWORD PTR [rbp-40], eax writes 32 bits to the 
space reserved for i.
At line 85, mov eax, DWORD PTR [rbp-40] reloads that value into 
eax, which zeroes the upper half of RAX, so RAX contains 
0x0000_0000_FFFF_FFFF.

In the -O2 version it's even simpler. The calculation is done at 
compile time and the end result, -1, is passed directly to the output. 
The test is even removed. Everything happens in the compiler.

Jul 24 2018

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Tuesday, 24 July 2018 at 20:59:22 UTC, Patrick Schluter wrote:
 On Tuesday, 24 July 2018 at 19:24:05 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter 
 wrote:
 On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder 
 wrote:
 On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]

 As the C++ char are signed by default, when you accumulate 
 several shifted 8 bit -1 into a char result and then store 
 it in a 64 bit unsigned buffer, you get -1 in 64 bits : 
 18446744073709551615.

 That's not exactly what happens here. There's no 64 bit 
 buffer.

 Sure about that ? ;)

 Yes, there are no "buffers" only register and a place on the 
 stack for the variable i.

 As said it's undefined behaviour so anything goes. I just 
 checked on godbolt what code is generated. 
 https://godbolt.org/g/wxqfmM
 So with -O0 this happens:
 From line 41 to line 77 the instruction to make the 
 calculation. At line 78
 mov DWORD PTR [rbp-40], eax which is writing out 32 bits to 
 reserved space of i.
 At line 85  mov eax, DWORD PTR [rbp-40] reloads that value in 
 eax, this annuls the high part of RAX => RAX contains 
 0x0000_0000_FFFF_FFFF

What I forgot to mention: in the unoptimized build, the compiler does 
the type deduction for the << operator with the variable i, so it 
chooses the right overload, the one for unsigned int. In the optimized 
code, since the calculation is done during compilation and there is no 
spill to the variable, the type deduction for cout's << operator is 
done with that internal promoted temporary value, which it deduces 
as long (funnily, declaring i as volatile doesn't change that, 
even though the value is then spilled to the stack).

 On the -O2 version it's even simpler. The calculation is done 
 at compile time and the endresult -1 is put directly to the 
 output. The test is even removed. Everything happens in the 
 compiler.

Jul 24 2018

Ecstatic Coder <ecstatic.coder gmail.com> writes:

 He gets different results with and without optimization because 
 without optimization the result of the calculation is spilled 
 to the i unsigned int and then reloaded for the print call. 
 This save and reload truncated the value to its real value. In 
 the optimized version, the compiler removed the spill and the 
 overflowed value contained in the register is printed as is.

Btw, you are actually confirming what I said.

if (i != 0xFFFFFFFF) ...

In the optimized version, when the 64-bit "i" value is compared 
to a 32-bit constant, the two compare as different...

Proof that the value is stored in a **64**-bit register, not a 
32-bit one...

Jul 24 2018

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Tuesday, 24 July 2018 at 19:39:10 UTC, Ecstatic Coder wrote:
 He gets different results with and without optimization 
 because without optimization the result of the calculation is 
 spilled to the i unsigned int and then reloaded for the print 
 call. This save and reload truncated the value to its real 
 value. In the optimized version, the compiler removed the 
 spill and the overflowed value contained in the register is 
 printed as is.

 Btw you are actually confirming what I said.

 if (i != 0xFFFFFFFF) ...

 In the optimized version, when the 64 bits "i" value is 
 compared to a 32 bits constant, the test fails...

 Proof that the value is stored in a **64** bits register, not 
 32...

We're nitpicking over vocabulary. For me, buffer != register. 
In my mental model a buffer is something in memory (or a piece of 
hardware like the store buffer between the registers and the cache), 
but I would never call a register a buffer.

Jul 24 2018

Ecstatic Coder <ecstatic.coder gmail.com> writes:

On Tuesday, 24 July 2018 at 21:03:00 UTC, Patrick Schluter wrote:
 On Tuesday, 24 July 2018 at 19:39:10 UTC, Ecstatic Coder wrote:
 He gets different results with and without optimization 
 because without optimization the result of the calculation is 
 spilled to the i unsigned int and then reloaded for the print 
 call. This save and reload truncated the value to its real 
 value. In the optimized version, the compiler removed the 
 spill and the overflowed value contained in the register is 
 printed as is.

 Btw you are actually confirming what I said.

 if (i != 0xFFFFFFFF) ...

 In the optimized version, when the 64 bits "i" value is 
 compared to a 32 bits constant, the test fails...

 Proof that the value is stored in a **64** bits register, not 
 32...

 We're nitpicking over vocabulary. For me buffer != register. 
 Buffer is something in memory in my mental model (or is 
 hardware like the store buffer between register and the cache) 
 but never would I denominate a register as a buffer.

Pick the word you prefer: the i value is stored in a 64-bit 
"place", hence the weird behavior.

Jul 24 2018

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]

It's the int promotion rule. char is signed. The 256 literals are signed. 
When the result goes above INT_MAX it overflows (i.e. we're in UB 
territory) and the result can be anything. The registers of the 
CPU are 64 bits wide, so it sign-extends the calculation, and as 
the optimization removes the truncating memory write and reload, 
the value of the complete register is then printed by cout <<.

Conclusion: typical C(++) undefined behavior due to signed 
overflow.
Fix: use 256u, 
and always compile with -ftrapv. In your case it would have 
caught the overflow.

In D, signed overflow is not UB so everything works as planned.

 compiled with:
 dmd -release -inline -boundscheck=off -w -of"test" "test.d"

 So it is code gen bug on c++ side, or there is something wrong 
 with that code.

Jul 24 2018

Caspar Kielwein <Caspar Kielwein.de> writes:

On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 I am not C++ expert so this seems wierd to me:
 (...)
 int main(int argc, char **argv)
 {
 	char c = 0xFF;
 	std::string sData = {c,c,c,c};
 	unsigned int i = (((((sData[0]&0xFF)*256
 					+ (sData[1]&0xFF))*256)
 					+ (sData[2]&0xFF))*256
 					+ (sData[3]&0xFF));
 					
 	if (i != 0xFFFFFFFF) { // it is true why?
 		// this print 18446744073709551615 wow
 		std::cout << "WTF: " << i  << std::endl;
 	}	    	
 	return 0;
 }

 compiled with:
 g++ -O2 -Wall  -o "test" "test.cxx"
 when compiled with -O0 it works as expected

 Vs. D: ....
 So it is code gen bug on c++ side, or there is something wrong 
 with that code.

Signedness of char in C++ is platform dependent.
See https://en.cppreference.com/w/cpp/language/types "char"
You seem to be running into "signed overflow is undefined 
behaviour" shenanigans.

With all optimizations enabled, clang gives a different result than gcc.
https://godbolt.org/g/Dz5djj

Generally, use unsigned char (or std::byte) when char means 
"memory".
And prefer a std::vector<unsigned char> to std::string in these 
cases as well.

Jul 24 2018
