
digitalmars.D - Comparing D vs C++ (weird behaviour of C++)

Daniel Kozak <kozzi11 gmail.com> writes:
I am not a C++ expert, so this seems weird to me:

#include <iostream>
#include <string>

using namespace std;

int main(int argc, char **argv)
{
	char c = 0xFF;
	std::string sData = {c,c,c,c};
	unsigned int i = (((((sData[0]&0xFF)*256
					+ (sData[1]&0xFF))*256)
					+ (sData[2]&0xFF))*256
					+ (sData[3]&0xFF));
					
	if (i != 0xFFFFFFFF) { // it is true, why?
		// this prints 18446744073709551615, wow
		std::cout << "WTF: " << i  << std::endl;
	}	    	
	return 0;
}

Compiled with:
g++ -O2 -Wall  -o "test" "test.cxx"
When compiled with -O0, it works as expected.

Vs. D:

import std.stdio;

void main(string[] args)
{
	char c = 0xFF;
	string sData = [c,c,c,c];
	uint i = (((((sData[0]&0xFF)*256
					+ (sData[1]&0xFF))*256)
					+ (sData[2]&0xFF))*256
					+ (sData[3]&0xFF));
	if (i != 0xFFFFFFFF) { // is false, makes sense
		writefln("WTF: %d", i);
	}			
}

Compiled with:
dmd -release -inline -boundscheck=off -w -of"test" "test.d"

So is it a codegen bug on the C++ side, or is there something wrong
with that code?
Jul 24 2018
Ecstatic Coder <ecstatic.coder gmail.com> writes:
On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]
As C++ chars are signed by default, when you accumulate several shifted 8-bit -1 values into a char result and then store it in a 64-bit unsigned buffer, you get -1 in 64 bits: 18446744073709551615.
Jul 24 2018
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]
 As C++ chars are signed by default, when you accumulate several shifted 8-bit -1 values into a char result and then store it in a 64-bit unsigned buffer, you get -1 in 64 bits: 18446744073709551615.
That's not exactly what happens here. There's no 64-bit buffer. It's signed overflow, which is undefined behavior in C and C++. He gets different results with and without optimization because, without optimization, the result of the calculation is spilled to the unsigned int i and then reloaded for the print call. This store and reload truncates the value to its real 32-bit value. In the optimized version, the compiler removed the spill, and the overflowed value held in the register is printed as is.
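To see exactly where the signed overflow happens, here is a minimal C++ sketch (our own, not code from the thread). It evaluates the same expression shape in std::int64_t, where nothing can overflow, so each intermediate can be compared with INT_MAX. The helper names are hypothetical, and the overflow check assumes the usual 32-bit int.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper: same shape as the original expression,
// ((((a*256 + b)*256) + c)*256 + d), but evaluated in std::int64_t so
// that no intermediate overflows. a..d stand for the masked bytes
// (sData[k] & 0xFF), each in 0..255.
constexpr std::int64_t combine_exact(std::int64_t a, std::int64_t b,
                                     std::int64_t c, std::int64_t d) {
    return (((a * 256 + b) * 256) + c) * 256 + d;
}

// The last "* 256" is the step that leaves int range: with all bytes
// equal to 0xFF it computes 16777215 * 256.
constexpr std::int64_t last_product(std::int64_t a, std::int64_t b,
                                    std::int64_t c) {
    return (((a * 256 + b) * 256) + c) * 256;
}

// True when v does not fit in int, i.e. when computing it in int (as
// the original C++ code does) would be signed overflow.
constexpr bool exceeds_int(std::int64_t v) {
    return v > INT_MAX || v < INT_MIN;
}
```

With all four bytes equal to 0xFF, the last multiply is 16777215 * 256 = 4294967040, already above INT_MAX (2147483647), so the original int computation overflows before the final `+ 255` ever happens. The exact result, 4294967295, is 0xFFFFFFFF.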
Jul 24 2018
Ecstatic Coder <ecstatic.coder gmail.com> writes:
On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter wrote:
 On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]
 As C++ chars are signed by default, when you accumulate several shifted 8-bit -1 values into a char result and then store it in a 64-bit unsigned buffer, you get -1 in 64 bits: 18446744073709551615.
 That's not exactly what happens here. There's no 64-bit buffer.
Sure about that? ;) As "i" is printed as 18446744073709551615 when sent to cout, I don't see how it couldn't be stored as a uint64... It's actually -1 stored as a uint64. This kind of optimizer problem is classic when mixing signed and unsigned values in such bit-shifting expressions. This is why you should always cast the signed input values to the unsigned result type right from the start, before mixing/shifting them.
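The "convert to the unsigned result type first, then shift" advice can be sketched in C++ as follows. read_be32 is a hypothetical helper, not code from the thread; it assumes the bytes are big-endian, as in the original expression, and that the string holds at least four bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode the first four bytes of s as a big-endian
// 32-bit value, converting each byte to the unsigned result type before
// any shifting or combining.
inline std::uint32_t read_be32(const std::string& s) {
    assert(s.size() >= 4);  // precondition: at least four bytes
    std::uint32_t v = 0;
    for (std::size_t k = 0; k < 4; ++k) {
        // static_cast<unsigned char> maps a negative char (e.g. -1 when
        // char is signed) to 0..255; widening to uint32_t before the
        // shift keeps every operation in unsigned arithmetic.
        const std::uint32_t byte = static_cast<unsigned char>(s[k]);
        v = (v << 8) | byte;
    }
    return v;
}
```

For `std::string(4, '\xFF')` this returns 0xFFFFFFFF, whether char is signed or unsigned on the platform, because no signed intermediate is ever formed.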
Jul 24 2018
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 24 July 2018 at 19:24:05 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter wrote:
 On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]
 As C++ chars are signed by default, when you accumulate several shifted 8-bit -1 values into a char result and then store it in a 64-bit unsigned buffer, you get -1 in 64 bits: 18446744073709551615.
 That's not exactly what happens here. There's no 64-bit buffer.
 Sure about that? ;)
Yes, there are no "buffers", only registers and a place on the stack for the variable i. As I said, it's undefined behaviour, so anything goes. I just checked on godbolt what code is generated (https://godbolt.org/g/wxqfmM). With -O0, this happens: lines 41 to 77 are the instructions that do the calculation. At line 78, mov DWORD PTR [rbp-40], eax writes 32 bits out to the space reserved for i. At line 85, mov eax, DWORD PTR [rbp-40] reloads that value into eax, which zeroes the upper half of RAX => RAX contains 0x0000_0000_FFFF_FFFF. In the -O2 version it's even simpler: the calculation is done at compile time and the end result, -1, is passed directly to the output. The test is even removed. Everything happens in the compiler.
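The two numbers in this thread are directly related: 18446744073709551615 is 2^64 - 1, i.e. -1 seen as a 64-bit unsigned value, and a 32-bit store followed by a 32-bit reload keeps only its low half, 0xFFFFFFFF. Below is a small C++ model of that store/reload using casts. It is an illustration only, not the instructions the compiler actually emits, and the function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of the -O0 sequence described above. Storing EAX to
// i's 4-byte stack slot keeps only the low 32 bits of RAX, and the 32-bit
// reload (mov eax, DWORD PTR [rbp-40]) zero-extends into the full RAX.
constexpr std::uint64_t store_and_reload32(std::uint64_t rax) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(rax));
}

// -1 viewed as a 64-bit unsigned value is 2^64 - 1, the number the
// -O2 build printed.
constexpr std::uint64_t minus_one_u64 = static_cast<std::uint64_t>(-1);
static_assert(minus_one_u64 == 18446744073709551615ULL, "2^64 - 1");
```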
Jul 24 2018
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 24 July 2018 at 20:59:22 UTC, Patrick Schluter wrote:
 On Tuesday, 24 July 2018 at 19:24:05 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter wrote:
 On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
 On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]
 As C++ chars are signed by default, when you accumulate several shifted 8-bit -1 values into a char result and then store it in a 64-bit unsigned buffer, you get -1 in 64 bits: 18446744073709551615.
 That's not exactly what happens here. There's no 64-bit buffer.
 Sure about that? ;)
 Yes, there are no "buffers", only registers and a place on the stack for the variable i. As I said, it's undefined behaviour, so anything goes. I just checked on godbolt what code is generated (https://godbolt.org/g/wxqfmM). With -O0, this happens: lines 41 to 77 are the instructions that do the calculation. At line 78, mov DWORD PTR [rbp-40], eax writes 32 bits out to the space reserved for i. At line 85, mov eax, DWORD PTR [rbp-40] reloads that value into eax, which zeroes the upper half of RAX => RAX contains 0x0000_0000_FFFF_FFFF
What I forgot to mention: for the compiler, the type deduction for the << operator is done with the variable i, so it chooses the right template for unsigned int. In the optimized code, as the calculation is done during compilation and there is no spill to the variable, the type deduction for cout's << operator is done with that internal promoted temporary value, and it deduces it as long (funnily, declaring i as volatile doesn't change that, even though the value is then spilled to the stack).
 In the -O2 version it's even simpler. The calculation is done
 at compile time and the end result, -1, is passed directly to the
 output. The test is even removed. Everything happens in the
 compiler.
Jul 24 2018
Ecstatic Coder <ecstatic.coder gmail.com> writes:
 He gets different results with and without optimization because 
 without optimization the result of the calculation is spilled 
 to the i unsigned int and then reloaded for the print call. 
 This save and reload truncated the value to its real value. In 
 the optimized version, the compiler removed the spill and the 
 overflowed value contained in the register is printed as is.
Btw, you are actually confirming what I said. if (i != 0xFFFFFFFF) ... In the optimized version, when the 64-bit "i" value is compared to a 32-bit constant, the test fails... Proof that the value is stored in a **64**-bit register, not a 32-bit one...
Jul 24 2018
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 24 July 2018 at 19:39:10 UTC, Ecstatic Coder wrote:
 He gets different results with and without optimization 
 because without optimization the result of the calculation is 
 spilled to the i unsigned int and then reloaded for the print 
 call. This save and reload truncated the value to its real 
 value. In the optimized version, the compiler removed the 
 spill and the overflowed value contained in the register is 
 printed as is.
 Btw, you are actually confirming what I said. if (i != 0xFFFFFFFF) ... In the optimized version, when the 64-bit "i" value is compared to a 32-bit constant, the test fails... Proof that the value is stored in a **64**-bit register, not a 32-bit one...
We're nitpicking over vocabulary. For me, buffer != register. In my mental model, a buffer is something in memory (or hardware, like the store buffer between the registers and the cache), but I would never call a register a buffer.
Jul 24 2018
Ecstatic Coder <ecstatic.coder gmail.com> writes:
On Tuesday, 24 July 2018 at 21:03:00 UTC, Patrick Schluter wrote:
 On Tuesday, 24 July 2018 at 19:39:10 UTC, Ecstatic Coder wrote:
 He gets different results with and without optimization 
 because without optimization the result of the calculation is 
 spilled to the i unsigned int and then reloaded for the print 
 call. This save and reload truncated the value to its real 
 value. In the optimized version, the compiler removed the 
 spill and the overflowed value contained in the register is 
 printed as is.
 Btw, you are actually confirming what I said. if (i != 0xFFFFFFFF) ... In the optimized version, when the 64-bit "i" value is compared to a 32-bit constant, the test fails... Proof that the value is stored in a **64**-bit register, not a 32-bit one...
 We're nitpicking over vocabulary. For me, buffer != register. In my mental model, a buffer is something in memory (or hardware, like the store buffer between the registers and the cache), but I would never call a register a buffer.
Pick the word you prefer: the i value is stored in a 64-bit "place", hence the weird behavior.
Jul 24 2018
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 [...]
Integer promotion rules: char is signed, and the 256s are signed ints. When the result goes above INT_MAX, it overflows (i.e. we're in UB territory) and the result can be anything. The CPU registers are 64 bits wide, so the compiler sign-extends the calculation, and since the optimization removes the truncating memory write and reload, the value of the complete register is then printed by cout <<. Conclusion: typical C(++) undefined behavior due to signed overflow. Fix: use 256u, and always compile with -ftrapv; in your case it would have caught the overflow. In D, signed overflow is not UB, so everything works as planned.
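Patrick's suggested fix, sketched in C++: the original expression with each 256 replaced by 256u. When an int operand meets an unsigned int operand, the int is converted to unsigned int, so every step is computed in unsigned arithmetic, which wraps modulo 2^32 instead of having undefined overflow. The helper name combine_256u is our own, and it assumes sData holds at least four bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the thread's expression, unchanged except that
// 256 is now 256u. Each (sData[k] & 0xFF) is still an int in 0..255,
// but multiplying or adding it with an unsigned int makes the whole
// computation unsigned, so there is no signed overflow anywhere.
inline unsigned int combine_256u(const std::string& sData) {
    assert(sData.size() >= 4);  // precondition: at least four bytes
    return (((((sData[0] & 0xFF) * 256u
              + (sData[1] & 0xFF)) * 256u)
              + (sData[2] & 0xFF)) * 256u
              + (sData[3] & 0xFF));
}
```

For `std::string(4, '\xFF')` this yields 0xFFFFFFFF, the same value the D version produces, at -O0 and -O2 alike.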
Jul 24 2018
Caspar Kielwein <Caspar Kielwein.de> writes:
On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
 I am not C++ expert so this seems wierd to me:
 (...)
 compiled with:
 g++ -O2 -Wall  -o "test" "test.cxx"
 when compiled with -O0 it works as expected

 Vs. D: ....
 So it is code gen bug on c++ side, or there is something wrong 
 with that code.
The signedness of char in C++ is platform-dependent; see https://en.cppreference.com/w/cpp/language/types under "char". You seem to be running into "signed overflow is undefined behaviour" shenanigans. With all optimizations enabled, clang gives a different result than gcc: https://godbolt.org/g/Dz5djj Generally, use unsigned char (or std::byte) when char means "memory", and prefer std::vector<unsigned char> to std::string in these cases as well.
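A sketch of this advice in C++, with raw bytes kept in std::vector<unsigned char>. One caveat worth keeping in mind (our note, not from the thread): unsigned char operands are still promoted to int, so a byte multiplied by 256 is computed in int. The sketch therefore widens each byte to std::uint32_t before any arithmetic. decode_be32 is a hypothetical helper that assumes big-endian order and at least four bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// unsigned char alone does not make the arithmetic unsigned: it is
// promoted to int, so (unsigned char) * 256 has type int.
static_assert(std::is_same<decltype(static_cast<unsigned char>(0) * 256),
                           int>::value,
              "unsigned char is promoted to int");

// Hypothetical helper: big-endian decode of bytes[0..3], widening each
// byte to std::uint32_t first so every multiply and add is unsigned.
inline std::uint32_t decode_be32(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() >= 4);  // precondition: at least four bytes
    std::uint32_t v = 0;
    for (std::size_t k = 0; k < 4; ++k) {
        v = v * 256u + static_cast<std::uint32_t>(bytes[k]);
    }
    return v;
}
```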
Jul 24 2018