digitalmars.D - size_t for length on x64 will make app slower than on x86?

"FrankLike" <1150015857 qq.com> writes:

Many old projects need to move from x86 to x64, but the 'length' type 
is size_t, which changes size on x64, so a lot of work has to be done. But I 
found some information that may be helpful for D:
http://www.dotnetperls.com/array-length.
It tests Length and LongLength and finds that LongLength is 
slower than Length:

   0.64 ns   Length
   2.55 ns   LongLength

I love D, so I don't want my app to be slower on x64 than on x86.

I hope this changes in 2.067.

Thank you all.

Nov 16 2014

Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 16 November 2014 13:39, FrankLike via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 Many old projects need move from x86 to x64,but the 'length' type is
 size_t,it will change on x64,so a lot of work must to do.but I find some
 info which is help for d:
 http://www.dotnetperls.com/array-length.
 it means:
   test length and longlength, and found 'test longlength' is  slower than
 'test length'.

   0.64 ns   Length
   2.55 ns   LongLength

 I love D.So I don't want my app on x64 slower than on x86.

 Hope change in 2.067.

 Thank you all.




That's benchmarking C#, not D. :)

In D it's a field (there's no overhead).

Nov 16 2014

"Maxim Fomin" <maxim-fomin outlook.com> writes:

On Sunday, 16 November 2014 at 13:39:24 UTC, FrankLike wrote:
 Many old projects need move from x86 to x64,but the 'length' 
 type is size_t,it will change on x64,so a lot of work must to 
 do.but I find some info which is help for d:
 http://www.dotnetperls.com/array-length.
 it means:
   test length and longlength, and found 'test longlength' is  
 slower than 'test length'.

   0.64 ns   Length
   2.55 ns   LongLength

 I love D.So I don't want my app on x64 slower than on x86.

 Hope change in 2.067.

 Thank you all.

It means where you have uint x = arr.length you should have had 
size_t x = arr.length from the very beginning.
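A minimal sketch of the options when a length has to end up in a 32-bit variable. The helper name lengthAsUint is hypothetical and only for illustration; std.conv.to is the standard checked conversion.

```d
import std.conv : to;

// Hypothetical helper: narrow a length to uint with a runtime range check.
uint lengthAsUint(T)(const T[] arr)
{
	return arr.length.to!uint; // throws ConvOverflowException if it does not fit
}

void main()
{
	int[] arr = [1, 2, 3];

	size_t n = arr.length;                  // compiles with both -m32 and -m64
	// uint bad = arr.length;               // compiles with -m32, rejected with -m64
	uint checked = lengthAsUint(arr);       // explicit, checked narrowing
	uint unchecked = cast(uint) arr.length; // explicit, silently truncates

	assert(n == 3 && checked == 3 && unchecked == 3);
}
```

Code that used size_t from the start needs none of these conversions when it moves to x64.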

Nov 16 2014

"FrankLike" <1150015857 qq.com> writes:

 It means where you have uint x = arr.length you should have had 
 size_t x = arr.length from the very beginning.

I tested it:

module aatest;
import std.stdio;
import std.datetime;
import std.conv;
size_t[string] aa;

void ada()
{
	for(size_t i=0;i<1000000;i++)
	{
		aa[to!string(i)] =i;
	}
}
void main()
{
	StopWatch sw;
	sw.start();
	ada();
	sw.stop();
	writeln("\n time is :" , sw.peek().msecs/1000.0," secs");

}
Compiled with dmd -m64 aatest.d, and with dmd aatest.d -ofaa32.exe
Result:
m64: 0.553 secs
m32: 0.5 secs

Thank you all.

Nov 16 2014

"Flamencofantasy" <Flamencofantasy gmail.com> writes:

I am not sure your test is significant; calling to!string and 
inserting into an AA is likely orders of magnitude slower than 
the overhead of shuffling a 64 bit value vs a 32 bit value.




On Sunday, 16 November 2014 at 16:02:20 UTC, FrankLike wrote:
 It means where you have uint x = arr.length you should have 
 had size_t x = arr.length from the very beginning.

 I test it :

 module aatest;
 import std.stdio;
 import std.datetime;
 import std.conv;
 size_t[string] aa;

 void ada()
 {
 	for(size_t i=0;i<1000000;i++)
 	{
 		aa[to!string(i)] =i;
 	}
 }
 void main()
 {
 	StopWatch sw;
 	sw.start();
 	ada();
 	sw.stop();
 	writeln("\n time is :" , sw.peek().msecs/1000.0," secs");

 }
 dmd -m64 aatest.d ,and dmd   aatest.d -ofaa32.exe
 Result:
 m64 :0.553 secs;
 m32:0.5 secs;

 Thank you all.

Nov 16 2014

"Matthias Bentrup" <matthias.bentrup googlemail.com> writes:

On Sunday, 16 November 2014 at 16:02:20 UTC, FrankLike wrote:
 It means where you have uint x = arr.length you should have 
 had size_t x = arr.length from the very beginning.

 I test it :

 module aatest;
 import std.stdio;
 import std.datetime;
 import std.conv;
 size_t[string] aa;

 void ada()
 {
 	for(size_t i=0;i<1000000;i++)
 	{
 		aa[to!string(i)] =i;
 	}
 }
 void main()
 {
 	StopWatch sw;
 	sw.start();
 	ada();
 	sw.stop();
 	writeln("\n time is :" , sw.peek().msecs/1000.0," secs");

 }
 dmd -m64 aatest.d ,and dmd   aatest.d -ofaa32.exe
 Result:
 m64 :0.553 secs;
 m32:0.5 secs;

 Thank you all.

I ran your test program through a profiler, and it spends >40% of 
the time in garbage collection. So I think the slightly longer 
run time is due to the 64 bit GC being a bit slower than the 32 
bit GC.

Nov 17 2014

"FrankLike" <1150015857 qq.com> writes:

I tested it:

module aasize_t;
import std.stdio;
import std.datetime;
import std.conv;
import std.string;

size_t[string] aa;

void gettime()
{
	for(size_t i=0;i<3000000;i++)
	{
		aa[to!string(i)] = i;
	}
}
void main()
{  	writeln("size_t.max",size_t.max);
     gettime();
     void getlen(){auto alne = aa.length;}
	auto r = benchmark!(getlen)(10000);
	auto f0Result = to!Duration(r[0]); // time f0 took to run 10,000 times
	writeln("\n size_t time is :",f0Result);
	StopWatch sw;
	sw.start();
	gettime();
	sw.stop();
	writeln("\n size_t time is sw:",sw.peek.msecs," msecs");
}
---------- and another is uint[string] aa

dmd -m64 aauint.d
dmd -m64 aasize_t.d
dmd aaint.d -ofaauint32.exe
dmd aasize_t.d -ofaasize_t32.exe

 del *.obj

aaint
aasize_t

aaint32
aasize_t32
 pause

Last Result:

They take almost the same time and use almost the same memory, but 
uint (or int) is more practical to use for length.

Nov 17 2014

"Freddy" <Hexagonalstar64 gmail.com> writes:

On Monday, 17 November 2014 at 15:28:52 UTC, FrankLike wrote:
 I test it:

 module aasize_t;
 import std.stdio;
 import std.datetime;
 import std.conv;
 import std.string;

 size_t[string] aa;

 void gettime()
 {
 	for(size_t i=0;i<3000000;i++)
 	{
 		aa[to!string(i)] = i;
 	}
 }
 void main()
 {  	writeln("size_t.max",size_t.max);
     gettime();
     void getlen(){auto alne = aa.length;}
 	auto r = benchmark!(getlen)(10000);
 	auto f0Result = to!Duration(r[0]); // time f0 took to run 
 10,000 times
 	writeln("\n size_t time is :",f0Result);
 	StopWatch sw;
 	sw.start();
 	gettime();
 	sw.stop();
 	writeln("\n size_t time is sw:",sw.peek.msecs," msecs");
 }
 ----------and anoter is uint[string] aa

 dmd -m64 aauint.d
 dmd -m64 aasize_t.d
 dmd aaint.d -ofaauint32.exe
 dmd aasize_t.d -ofaasize_t32.exe

  del *.obj

 aaint
 aasize_t

 aaint32
 aasize_t32
  pause

 Last Result:

 They take the almost same time,and usage memory. but uint(or 
 int) is more practical for length to use.

Don't profile without optimization.
Add "-O -inline -release -boundscheck=off" to your dmd arguments.

Nov 17 2014

"FrankLike" <1150015857 qq.com> writes:

 Don't profile with out optimzation.
 Add "-O -inline -release -boundscheck=off" to your dmd 
 arguments.

I mean that for projects moved from x86 to x64, 'cast(int)length' is 
better than 'size_t i=(something).length'.

Nov 17 2014

"Matthias Bentrup" <matthias.bentrup googlemail.com> writes:

On Tuesday, 18 November 2014 at 07:04:50 UTC, FrankLike wrote:
 Don't profile with out optimzation.
 Add "-O -inline -release -boundscheck=off" to your dmd 
 arguments.

 I mean projects moved from x86 to x64, 'cast(int)length ' is 
 better than 'size_t i=(something).length '.

I think the reason for the existence of size_t is that the C 
designers thought that the second way is better than the first 
way.

Nov 18 2014

"FrankLike" <1150015857 qq.com> writes:

 I mean projects moved from x86 to x64, 'cast(int)length ' is 
 better than 'size_t i=(something).length '.

 I think the reason for the existence of size_t, is that the C 
 designers thought that the second way is better than the first 
 way.

But now 'int' is enough, neither too big nor too small.

If you do this:

   string[] a = ["abc","def","ghk"... ]; // assuming a's length is 1,000,000

  for(int i=0;i<a.length;i++)
   {
   	somework();
   }

that's enough! 'int' is easy to write and not wasteful.

Most importantly, it makes it easy to migrate code from x86 to x64.

Nov 18 2014

Marco Leise <Marco.Leise gmx.de> writes:

Am Tue, 18 Nov 2014 12:22:58 +0000
schrieb "FrankLike" <1150015857 qq.com>:

 
 I mean projects moved from x86 to x64, 'cast(int)length ' is 
 better than 'size_t i=(something).length '.

 I think the reason for the existence of size_t, is that the C 
 designers thought that the second way is better than the first 
 way.

 
 But now 'int' is enough, not huge and  not small.
 
 if you do this:
 
    string[] a ={"abc","def","ghk"... };//Assuming a's length is 
 1,000,000
 
   for(int i=0;i<a.length;i++)
    {
    	somework();
    }
 
 it's enough! 'int' easy to write,not Waste.
 
 Most important is easy to migrate code from x86 to x64.

Somehow I always wrote that as

foreach (i; 0 .. a.length)
{
    somework();
}

and benefited from the fact that the compiler only needs to
evaluate a.length once as opposed to the for(...) case.
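A small sketch of that difference, assuming nothing beyond the language rules: the bounds of a foreach range are evaluated once, while a for condition is re-evaluated on every iteration. The names bound, callsInFor and callsInForeach are made up for this illustration, and a counting function stands in for a.length so the number of evaluations becomes visible.

```d
size_t calls; // counts how often the upper bound is evaluated

size_t bound(size_t n)
{
	++calls;
	return n;
}

size_t callsInFor()
{
	calls = 0;
	for (size_t i = 0; i < bound(3); i++) {}
	return calls; // one evaluation per condition check
}

size_t callsInForeach()
{
	calls = 0;
	foreach (i; 0 .. bound(3)) {}
	return calls; // bound evaluated once, before the loop starts
}

void main()
{
	import std.stdio : writeln;
	writeln("for: ", callsInFor(), ", foreach: ", callsInForeach());
}
```

For a plain array length an optimizer may well hoist the load out of a for loop too, so in practice the difference is mostly about not having to rely on that. The index i is also inferred as size_t here, so the same loop compiles unchanged with -m32 and -m64.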

-- 
Marco

Nov 18 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Marco Leise:

 foreach (i; 0 .. a.length)
 {
     somework();
 }

Better:

foreach (immutable _; 0 .. a.length)
{
     somework();
}

Unfortunately this syntax is not yet supported, for unknown 
reasons:

foreach (; 0 .. a.length)
{
     somework();
}

Bye,
bearophile

Nov 18 2014

Marco Leise <Marco.Leise gmx.de> writes:

Am Tue, 18 Nov 2014 19:33:42 +0000
schrieb "bearophile" <bearophileHUGS lycos.com>:

 Marco Leise:
 
 foreach (i; 0 .. a.length)
 {
     somework();
 }

 
 Better:
 
 foreach (immutable _; 0 .. a.length)
 {
      somework();
 }
 
 Unfortunately this syntax is not yet supported, for unknown 
 reasons:
 
 foreach (; 0 .. a.length)
 {
      somework();
 }
 
 Bye,
 bearophile

I know, _ doesn't cut it for 2D operations (2 loops) though or
you end up with _ and __ or _1 and _2.

-- 
Marco

Nov 18 2014

ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Tue, 18 Nov 2014 19:33:42 +0000
bearophile via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Unfortunately this syntax is not yet supported, for unknown 
 reasons:

 foreach (; 0 .. a.length)
 {
      somework();
 }

the same as for `foreach (auto n; ...)` -- "cosmetic changes are not
necessary".

Nov 18 2014

"Xinok" <xinok live.com> writes:

On Sunday, 16 November 2014 at 13:39:24 UTC, FrankLike wrote:
 Many old projects need move from x86 to x64,but the 'length' 
 type is size_t,it will change on x64,so a lot of work must to 
 do.but I find some info which is help for d:
 http://www.dotnetperls.com/array-length.
 it means:
   test length and longlength, and found 'test longlength' is  
 slower than 'test length'.

   0.64 ns   Length
   2.55 ns   LongLength

 I love D.So I don't want my app on x64 slower than on x86.

 Hope change in 2.067.

 Thank you all.

We're missing too many details regarding how he ran his 
benchmark. If he compiled and ran his code as 32-bit, that could 
explain the discrepancy.

Nov 16 2014

"Flamencofantasy" <Flamencofantasy gmail.com> writes:

That's correct. Moving 64 bit values on a 32 bit machine results 
in at least 2 machine instructions.


In the disassembly below, Length is a plain load, whereas LongLength is a function call even in release mode.


var length = array.Length;
000007FE8E453AF2  mov         rax,qword ptr [rsp+20h]
000007FE8E453AF7  mov         rax,qword ptr [rax+8]
000007FE8E453AFB  mov         dword ptr [rsp+30h],eax


var longLength = array.LongLength;
000007FE8E453B56  mov         rax,qword ptr [rsp+20h]
000007FE8E453B5B  cmp         byte ptr [rax],0
000007FE8E453B5E  mov         rcx,qword ptr [rsp+20h]
000007FE8E453B63  call        000007FEEE082AB4
000007FE8E453B68  mov         qword ptr [rsp+68h],rax
000007FE8E453B6D  mov         rax,qword ptr [rsp+68h]
000007FE8E453B72  mov         qword ptr [rsp+40h],rax



On Sunday, 16 November 2014 at 16:03:30 UTC, Xinok wrote:
 On Sunday, 16 November 2014 at 13:39:24 UTC, FrankLike wrote:
 Many old projects need move from x86 to x64,but the 'length' 
 type is size_t,it will change on x64,so a lot of work must to 
 do.but I find some info which is help for d:
 http://www.dotnetperls.com/array-length.
 it means:
  test length and longlength, and found 'test longlength' is  
 slower than 'test length'.

  0.64 ns   Length
  2.55 ns   LongLength

 I love D.So I don't want my app on x64 slower than on x86.

 Hope change in 2.067.

 Thank you all.

 We're missing too many details regarding how he ran his 
 benchmark. If he compiled and ran his code as 32-bit, that 
 could explain the discrepancy.

Nov 16 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 11/16/2014 8:20 AM, Flamencofantasy wrote:

 LongLength is a function call even in release mode.

-release does not turn on function inlining. Use -inline for that.

Nov 16 2014

"ponce" <contact gam3sfrommars.fr> writes:

On Sunday, 16 November 2014 at 13:39:24 UTC, FrankLike wrote:
 Many old projects need move from x86 to x64,but the 'length' 
 type is size_t,it will change on x64,so a lot of work must to 
 do.but I find some info which is help for d:
 http://www.dotnetperls.com/array-length.
 it means:
   test length and longlength, and found 'test longlength' is  
 slower than 'test length'.

   0.64 ns   Length
   2.55 ns   LongLength

 I love D.So I don't want my app on x64 slower than on x86.

 Hope change in 2.067.

 Thank you all.

At least on x86, I would recommend casting size_t to "int" almost 
every time for speed.

- signed overflow is undefined behaviour and optimizers can take 
advantage of it.
- 64-bit instructions on x86 take more bytes to encode. i-cache 
and instruction decoding suffer.
- 32-bit instructions on x86 fill the upper range with zeroes, 
so that false dependencies are eliminated.

For these reasons 32-bit ops on x86 are more often than not 
faster than "native"-sized ints, contrary to what intuition would 
suggest. For better or worse, int has been made the fastest integer 
type by chip-makers.

Nov 17 2014

Marco Leise <Marco.Leise gmx.de> writes:

Am Sun, 16 Nov 2014 13:39:22 +0000
schrieb "FrankLike" <1150015857 qq.com>:

 Many old projects need move from x86 to x64,but the 'length' type 
 is size_t,it will change on x64,so a lot of work must to do.but I 
 find some info which is help for d:
 http://www.dotnetperls.com/array-length.
 it means:
    test length and longlength, and found 'test longlength' is  
 slower than 'test length'.

    0.64 ns   Length
    2.55 ns   LongLength

 I love D.So I don't want my app on x64 slower than on x86.

 Hope change in 2.067.

 Thank you all.

No, you will not get 'int' instead of 'size_t' in 2.067
because a dubious benchmark showed you it is faster. In fact when you
write the code like this and use 1000 times more iterations to
get a reading at all, it looks like this:

--------------------------
import std.stdio;
import std.datetime;

alias ℕ = size_t;

void ada()
{
	foreach (ℕ i; 0 .. 1_000_000_000) {}
}

void main()
{
	StopWatch sw;
	sw.start();
	ada();
	sw.stop();
	writefln("time is: %s secs", sw.peek().msecs/1000.0);
}
------------------------

And prints 0.461 secs for both
dmd -m32 -boundscheck=off -release -inline -O
and
dmd -m64 -boundscheck=off -release -inline -O
on my laptop.

When I change 'ada' to:

ℕ ada()
{
	ℕ v;
	foreach (ℕ i; 0 .. 1_000_000_000)
	{
		v = i+i;
	}
	return v;
}

the -m64 version becomes a lot slower (0.731 secs) compared to
the -m32 version (which stays at 0.461 secs). That does not
have to do with size_t though: If I change the definition of ℕ
to uint or int in the 64-bit version it stays slow. It is just
a difference in the generated code for the loop that makes the
64-bit version generally 270 ms slower.

Now to get some more interesting numbers let's choose an
operation that is inherently O(n) with regard to bit-width:
division

ℕ ada()
{
	ℕ v;
	foreach (ℕ i; 1 .. 1_000_000_001)
	{
		v = i/i;
	}
	return v;
}

Results:
alias ℕ = ulong: 17.07 secs
alias ℕ = uint:   5.80 secs
alias ℕ = int:    5.53 secs

The differences for uint and int are compiler dependent. With
LDC uint is faster than int by a similar amount.
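To try all three widths in one run instead of editing the alias, the loop can be made a template. This is only a sketch under some assumptions: it uses std.datetime.stopwatch from newer Phobos releases rather than the StopWatch in std.datetime used above, the name report is made up for the illustration, and the iteration count is kept small, so it will not reproduce the numbers above.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Same division loop as above, parameterized on the integer type T.
T ada(T)(T iterations)
{
	T v;
	foreach (T i; 1 .. iterations + 1)
	{
		v = i / i;
	}
	return v;
}

// Hypothetical helper: time one instantiation and print the result.
void report(T)(T iterations)
{
	auto sw = StopWatch(AutoStart.yes);
	auto result = ada!T(iterations);
	sw.stop();
	writefln("%s: %s msecs (result %s)", T.stringof,
		sw.peek.total!"msecs", result);
}

void main()
{
	report!ulong(1_000_000);
	report!uint(1_000_000);
	report!int(1_000_000);
}
```

Build it with the same flags as above for -m32 and -m64, and raise the iteration count until the timings are stable on your machine.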

-- 
Marco

Nov 18 2014
