digitalmars.D.announce - GC vs. Manual Memory Management Real World Comparison

Benjamin Thaut (20/20) Sep 05 2012 I rewrote a 3d game I created during my studies with D 2.0 to manual

Alex Rønne Petersen (12/32) Sep 05 2012 Is source code available anywhere?

Benjamin Thaut (13/20) Sep 05 2012 The source code is not available yet, as it is in a repository at my

bearophile (5/8) Sep 05 2012 Maybe a compiler-enforced annotation for functions and modules is
Alex Rønne Petersen (7/30) Sep 05 2012 Sure, I just want to point out that it's a problem with the language (GC...

Benjamin Thaut (15/17) Sep 05 2012 That's exactly the point I want to make with this post. More effort should be...

Benjamin Thaut (9/16) Sep 05 2012 Should be:
Alex Rønne Petersen (9/28) Sep 05 2012 Very true. I've often thought we should ship a GC-less druntime in the

Benjamin Thaut (37/39) Sep 05 2012 Everything is in object_.d:

Peter Alexander (3/10) Sep 05 2012 Wow.

Benjamin Thaut (8/18) Sep 05 2012 I already have a fix for this.

Iain Buclaw (14/45) Sep 05 2012 This got fixed. Said code is now:
Iain Buclaw (12/60) Sep 05 2012 Oops, let me correct myself.

Benjamin Thaut (5/67) Sep 05 2012 Still, comparing two type info objects will result in one or multiple

Andrei Alexandrescu (4/6) Sep 05 2012 Could you please submit a patch for that? Thanks!

Piotr Szturmaj (3/7) Sep 05 2012 There's one proposed solution to this problem:

SomeDude (5/15) Sep 10 2012 It's a bad solution, IMHO. Monitoring druntime and hunting down

bearophile (6/9) Sep 11 2012 Why do you think such a hunt is better than letting the compiler

Iain Buclaw (10/15) Sep 11 2012 It's not difficult to implement, as the compiler only needs to warn that t...
SomeDude (21/30) Sep 11 2012 My problem is that you litter your codebase with @nogc everywhere. In

Felix Hufnagel (31/61) Sep 12 2012 class Foo
Paulo Pinto (2/36) Sep 13 2012 This is partially what happens in C++/CLI and C++/CX.

Rob T (7/14) Oct 23 2012 The compiler option warning about undesirable heap allocations

Alex Rønne Petersen (6/26) Sep 05 2012 BTW, your blog post appears to have comparison misspelled.

anonymous (4/42) Sep 05 2012 Also "development".

Benjamin Thaut (9/10) Sep 05 2012 The GDC druntime does have a different folder structure, which makes it

Andrei Alexandrescu (4/5) Sep 05 2012 Smile, you're on reddit:
bearophile (9/12) Sep 05 2012 Regarding your issues list, most of them are fixable, like the

Iain Buclaw (15/19) Sep 05 2012 I have no clue what the issue with invariant handlers is... Calls to

bearophile (40/47) Sep 05 2012 Iain Buclaw:

Iain Buclaw (10/53) Sep 05 2012 I think it was mostly due to that you can't tell the difference

bearophile (14/20) Sep 05 2012 I use fixed size arrays all the time in D. Heap-allocated arrays

bearophile (6/10) Sep 05 2012 Also, the lack of variable length stack allocated arrays in D

Benjamin Thaut (14/24) Sep 05 2012 Well, as overloading new and delete is deprecated, and the new which is
Sean Kelly (5/13) Sep 05 2012 It sounds like your code has escaping references? I think the presence ...

Benjamin Thaut (4/4) Sep 05 2012 My "standard" library is now available on github:
Johannes Pfau (10/37) Sep 05 2012 Would be great if some of the code could be merged into phobos,

Benjamin Thaut (8/16) Sep 05 2012 I personally really like my composite template, which allows for direct

Nathan M. Swan (3/25) Sep 05 2012 Did you try GC.disable/enable?
Walter Bright (3/6) Sep 05 2012 I'd like it if you could add some instrumentation to see what accounts f...

Iain Buclaw (6/14) Sep 05 2012 I'd say they are identical, but I don't really look at what goes on
Andrej Mitrovic (5/7) Sep 05 2012 Speaking of which, I'd like to see if the Unilink linker would make
bearophile (6/9) Sep 05 2012 Maybe that performance difference comes from the sum of some

Walter Bright (11/16) Sep 05 2012 We can trade guesses all day, and not get anywhere. Instrumentation and

bearophile (11/12) Sep 06 2012 In that case I think I didn't specify what subsystem of the D

Peter Alexander (4/13) Sep 06 2012 In addition to Walter's response, it is very rare for advanced

Sean Cavanaugh (14/17) Sep 06 2012 I love trying to explain to people our debug builds are too slow because...

Benjamin Thaut (12/19) Sep 06 2012 The code is identical, I did not change anything in the GC code. So it

Jacob Carlborg (6/23) Sep 06 2012 I don't know what Windows has but on Mac OS X there's this application:
ponce (2/9) Sep 06 2012 You don't necessarily need to recompile anything with a sampling

Benjamin Thaut (10/19) Sep 06 2012 I just tried profiling it with Very Sleepy but basically it only tells

ponce (3/18) Sep 06 2012 You might try AMD Code Analyst, it will highlight the bottleneck
Walter Bright (2/5) Sep 06 2012 Even so, that in itself is a good clue.

Sven Torvinger (22/30) Sep 06 2012 My bet is on cross-module inlining of bitop.btr failing...

Iain Buclaw (10/42) Sep 06 2012 You would be wrong. btr is a compiler intrinsic, so it is *always* inli...

Walter Bright (2/5) Sep 07 2012 Would it be easy to give that a try, and see what happens?

Iain Buclaw (6/12) Sep 07 2012 Sure, can do. Give me something to work against, and I will be able

Walter Bright (2/15) Sep 07 2012 Well, gdc with and without it!

Sean Kelly (6/25) Sep 06 2012 version.

Jacob Carlborg (4/5) Sep 06 2012 He's using only Windows as far as I understand, GDC MinGW.

Sean Kelly (6/11) Sep 07 2012 Well sure, but MinGW is weird. I'd expect the Windows flag to be set for...

Benjamin Thaut (11/32) Sep 07 2012 I did build druntime and phobos with -release -noboundscheck -inline -O

Andrei Alexandrescu (7/17) Sep 07 2012 You mentioned some issues in Phobos with memory allocation, that you had...

Benjamin Thaut (53/59) Sep 07 2012 Let me give a bit more details about what I did and why.

ponce (7/9) Sep 07 2012 You make some good points about what happen under the hood.
Jens Mueller (7/18) Sep 07 2012 Interesting.
Benjamin Thaut (5/5) Sep 09 2012 The full source code for the non-GC version is now available on github. The...
Benjamin Thaut (9/9) Oct 23 2012 Here a small update:

Rob T (20/30) Oct 23 2012 That's a very significant difference in performance that should

Paulo Pinto (6/39) Oct 24 2012 Having dealt with systems programming in languages with GC

Rob T (20/25) Oct 24 2012 Well, performance is only part of the GC equation. There's

Paulo Pinto (20/48) Oct 24 2012 I do understand that.

Rob T (17/21) Oct 24 2012 Probably no one in here is thinking that we should not have a GC.

Jakob Ovrum (7/12) Oct 24 2012 You can very much link to C and C++ code, or have C and C++ code
Jakob Ovrum (7/12) Oct 24 2012 You can very much link to C and C++ code, or have C and C++ code

Paulo Pinto (9/21) Oct 24 2012 I am speaking without knowing if such thing already exists.
Rob T (14/20) Oct 25 2012 My understanding of dynamic linking and the runtime is based on

Jakob Ovrum (6/19) Oct 25 2012 You are right that compiling the runtime itself (druntime and

Rob T (19/24) Oct 25 2012 Yes I can build my own D shared libs, both as static PIC (.a) and

Jakob Ovrum (27/45) Oct 25 2012 Sorry, I keep forgetting that this is needed on non-Windows
Jakob Ovrum (27/45) Oct 25 2012 Sorry, I keep forgetting that this is needed on non-Windows

Rob T (3/6) Oct 25 2012 What is the GC proxy system, and how do I make use of it?

Jakob Ovrum (14/20) Oct 25 2012 There's a function Runtime.loadLibrary in core.runtime that is

bearophile (20/20) Oct 26 2012 I use this GC thread to show a little GC-related benchmark.

Rob T (2/6) Oct 26 2012 Is this happening with dmd 2.060 as released?

bearophile (6/7) Oct 26 2012 I'm using 2.061alpha git head, but I guess the situation is the

Rob T (13/20) Oct 26 2012 I tried it with dmd 2.60 (released), and gdc 4.7 branch. I tried

Rob T (16/40) Oct 26 2012 OK my bad, partially.
Rob T (11/35) Oct 26 2012 I made a mistake. The clear and destroy operations require that a

bearophile (4/5) Oct 27 2012 And setting trades.length to zero and then using GC.free() on its
bearophile (5/5) Oct 27 2012 And with the usual optimizations (struct splitting) coming from

Benjamin Thaut <code benjamin-thaut.de> writes:

I rewrote a 3D game I created during my studies with D 2.0 to use manual 
memory management. When I'm not studying, I work in the 3D engine 
department at Havok. As I needed to practice manual memory management and 
had wanted to get rid of the GC in D for quite some time, I went through 
all this effort to create a GC-free version of my game.

The results are:

     DMD GC Version: 71 FPS, 14.0 ms frametime
     GDC GC Version: 128.6 FPS, 7.72 ms frametime
     DMD MMM Version: 142.8 FPS, 7.02 ms frametime

GC collection times:

     DMD GC Version: 8.9 ms
     GDC GC Version: 4.1 ms

As you can see, the manually managed version is twice as fast as the 
garbage-collected one. Even the highly optimized build created with GDC is 
still slower than the manual memory management version.

You can find the full article at:

http://3d.benjamin-thaut.de/?p=20#more-20


Feedback is welcome.

Kind Regards
Benjamin Thaut

Sep 05 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 05-09-2012 13:03, Benjamin Thaut wrote:
 I rewrote a 3D game I created during my studies with D 2.0 to use manual
 memory management. When I'm not studying, I work in the 3D engine
 department at Havok. As I needed to practice manual memory management and
 had wanted to get rid of the GC in D for quite some time, I went through
 all this effort to create a GC-free version of my game.

 The results are:

      DMD GC Version: 71 FPS, 14.0 ms frametime
      GDC GC Version: 128.6 FPS, 7.72 ms frametime
      DMD MMM Version: 142.8 FPS, 7.02 ms frametime

 GC collection times:

      DMD GC Version: 8.9 ms
      GDC GC Version: 4.1 ms

 As you can see, the manually managed version is twice as fast as the
 garbage-collected one. Even the highly optimized build created with GDC is
 still slower than the manual memory management version.

 You can find the full article at:

 http://3d.benjamin-thaut.de/?p=20#more-20


 Feedback is welcome.

 Kind Regards
 Benjamin Thaut

Is source code available anywhere?

Also, I have to point out that programming for a garbage-collected 
runtime is very different from doing manual memory management. The same 
patterns don't apply, and you optimize in different ways. For instance, 
when using a GC, it is highly advisable to allocate up front and 
use object pooling, and, most importantly, not to allocate at all during 
your render loop.
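A minimal D sketch of the "allocate up front, then recycle" pattern, assuming a fixed capacity known at startup. `ObjectPool` and `Particle` are hypothetical names, not code from the game discussed in this thread. Only the constructor allocates from the GC; `acquire()` and `release()` merely reuse preallocated slots, so the render loop itself gives the collector nothing to do.

```d
// Hypothetical fixed-capacity pool: all storage is allocated once, up front,
// so acquire() and release() only reuse preallocated slots during the frame.
struct Particle
{
    float x = 0, y = 0, vx = 0, vy = 0;
}

struct ObjectPool(T)
{
    private T[] items;         // preallocated objects
    private size_t[] freeList; // stack of indices of unused slots
    private size_t freeCount;

    this(size_t capacity)
    {
        // The only GC allocations happen here, before the render loop starts.
        items = new T[capacity];
        freeList = new size_t[capacity];
        foreach (i; 0 .. capacity)
            freeList[i] = capacity - 1 - i; // hand out slot 0 first
        freeCount = capacity;
    }

    // Returns a pointer to a reset object, or null if the pool is exhausted.
    T* acquire() nothrow
    {
        if (freeCount == 0)
            return null;
        immutable idx = freeList[--freeCount];
        items[idx] = T.init;
        return &items[idx];
    }

    // Gives a slot back; `p` must have been returned by acquire() of this pool.
    void release(T* p) nothrow
    {
        freeList[freeCount++] = cast(size_t)(p - items.ptr);
    }

    size_t available() const nothrow
    {
        return freeCount;
    }
}
```

Returning null when the pool runs dry, instead of growing it, keeps the frame allocation-free; how large each pool should be is a tuning decision the sketch leaves open.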

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 05.09.2012 13:10, Alex Rønne Petersen wrote:
 Is source code available anywhere?

 Also, I have to point out that programming for a garbage-collected
 runtime is very different from doing manual memory management. The same
 patterns don't apply, and you optimize in different ways. For instance,
 when using a GC, it is highly advisable to allocate up front and
 use object pooling, and, most importantly, not to allocate at all during
 your render loop.

The source code is not available yet, as it is in a repository at my 
university, but I can zip it and upload the current version if people 
want it. However, it currently only supports Windows and does not have any 
setup instructions yet.

I use object pooling in both versions, as in game development you 
usually don't allocate during the frame. But in the GC version you still 
have the problem that way too many parts of the language allocate, and you 
don't even notice it when using the GC.

Just to clarify, I have been doing 3D engine development for about 7 years 
now, so I'm not a newcomer to the subject.

Kind Regards
Benjamin Thaut

Sep 05 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Benjamin Thaut:

 But in the GC version you still have the problem that way too 
 many parts of the language allocate, and you don't even notice 
 it when using the GC.

Maybe a compiler-enforced annotation for functions and modules 
could solve this problem in D.
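For reference, an annotation of this kind was later added to the language as the `@nogc` attribute (D 2.066 onward). A minimal sketch, assuming such a compiler; `sumFirst` and `label` are hypothetical functions used only for illustration:

```d
// Requires a D compiler with @nogc support (2.066 or newer).
// The compiler rejects any GC allocation inside a @nogc function.
int sumFirst(const(int)[] values, size_t n) @nogc nothrow pure @safe
{
    int total = 0;
    foreach (v; values[0 .. n]) // slicing reuses existing memory, no allocation
        total += v;
    return total;
}

// Uncommenting this hypothetical function makes compilation fail, because
// `~` on arrays builds a new array on the GC heap:
//
// string label(string name) @nogc
// {
//     return "const(" ~ name ~ ")";
// }
```

The check is transitive only in the sense that a `@nogc` function may call only other `@nogc` functions, so the annotation has to spread through the call chain.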

Bye,
bearophile

Sep 05 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 05-09-2012 13:19, Benjamin Thaut wrote:
 On 05.09.2012 13:10, Alex Rønne Petersen wrote:
 Is source code available anywhere?

 Also, I have to point out that programming for a garbage-collected
 runtime is very different from doing manual memory management. The same
 patterns don't apply, and you optimize in different ways. For instance,
 when using a GC, it is highly advisable to allocate up front and
 use object pooling, and, most importantly, not to allocate at all during
 your render loop.

 The source code is not available yet, as it is in a repository at my
 university, but I can zip it and upload the current version if people
 want it. However, it currently only supports Windows and does not have any
 setup instructions yet.

 I use object pooling in both versions, as in game development you
 usually don't allocate during the frame. But in the GC version you still
 have the problem that way too many parts of the language allocate, and you
 don't even notice it when using the GC.

 Just to clarify, I have been doing 3D engine development for about 7 years
 now, so I'm not a newcomer to the subject.

 Kind Regards
 Benjamin Thaut

Sure, I just want to point out that it's a problem with the language (GC 
allocations being very non-obvious) as opposed to the nature of GC.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 05.09.2012 14:00, Alex Rønne Petersen wrote:
 Sure, I just want to point out that it's a problem with the language (GC
 allocations being very non-obvious) as opposed to the nature of GC.

That's exactly the point I want to make with this post. More effort should be 
put into the parts of D that currently allocate but absolutely don't 
have to. Also, using D without a GC is not quite as easy as the 
homepage's statement "You can use D without a GC" makes it sound.

My favorite hidden allocation so far is:

class A {}
class B : A{}

A a = new A();
B b = new B();

if(a == b) //this will allocate
{
}

Kind Regards
Benjamin Thaut

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 05.09.2012 14:07, Benjamin Thaut wrote:
 class A {}
 class B : A{}

 A a = new A();
 B b = new B();

 if(a == b) //this will allocate
 {
 }

Should be:

class A {}
class B : A{}

const(A) a = new A();
const(B) b = new B();

if(a == b) //this will allocate
{
}

Sep 05 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 05-09-2012 14:07, Benjamin Thaut wrote:
 On 05.09.2012 14:00, Alex Rønne Petersen wrote:
 Sure, I just want to point out that it's a problem with the language (GC
 allocations being very non-obvious) as opposed to the nature of GC.

 That's exactly the point I want to make with this post. More effort should be
 put into the parts of D that currently allocate but absolutely don't
 have to. Also, using D without a GC is not quite as easy as the
 homepage's statement "You can use D without a GC" makes it sound.

Very true. I've often thought we should ship a GC-less druntime in the 
normal distribution.

 My favorite hidden allocation so far is:

 class A {}
 class B : A{}

 A a = new A();
 B b = new B();

 if(a == b) //this will allocate
 {
 }

Where's the catch? From looking in druntime, I don't see where the 
allocation could occur.

 Kind Regards
 Benjamin Thaut

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 05.09.2012 14:14, Alex Rønne Petersen wrote:
 Where's the catch? From looking in druntime, I don't see where the
 allocation could occur.

Everything is in object_.d:

     equals_t opEquals(Object lhs, Object rhs)
     {
         if (lhs is rhs)
             return true;
         if (lhs is null || rhs is null)
             return false;
         if (typeid(lhs) == typeid(rhs))
             return lhs.opEquals(rhs);
         return lhs.opEquals(rhs) &&
                rhs.opEquals(lhs);
     }

This triggers a comparison of the TypeInfo objects in
if (typeid(lhs) == typeid(rhs))

which, after a few more function calls, invokes opEquals of TypeInfo:

     override equals_t opEquals(Object o)
     {
         /* TypeInfo instances are singletons, but duplicates can exist
          * across DLL's. Therefore, comparing for a name match is
          * sufficient.
          */
         if (this is o)
             return true;
         TypeInfo ti = cast(TypeInfo)o;
         return ti && this.toString() == ti.toString();
     }

Then because they are const, TypeInfo_Const.toString() will be called:

     override string toString()
     {
         return cast(string) ("const(" ~ base.toString() ~ ")");
     }

which allocates because of the array concatenation.

But this only happens if the objects are not of the same type and one of 
them has a type qualifier such as const.
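What allocates is building the string "const(" ~ base.toString() ~ ")" just to compare it. As a minimal sketch of the alternative, assuming the qualified name always has the form qualifier(base), the pieces can be compared in place on slices. `equalsQualified` is a hypothetical helper, not druntime code:

```d
// Hypothetical helper: checks whether `name` equals
// qualifier ~ "(" ~ baseName ~ ")" without building that string.
// Slice comparisons do not allocate, so the function can be @nogc.
bool equalsQualified(string name, string qualifier, string baseName)
    @nogc nothrow pure @safe
{
    // The length check runs first, so every index below is in bounds.
    return name.length == qualifier.length + baseName.length + 2
        && name[0 .. qualifier.length] == qualifier
        && name[qualifier.length] == '('
        && name[qualifier.length + 1 .. $ - 1] == baseName
        && name[$ - 1] == ')';
}
```

The same idea works for any comparison that would otherwise concatenate first: check lengths, then compare each fragment against the corresponding slice.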

Kind Regards
Benjamin Thaut

Sep 05 2012

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Wednesday, 5 September 2012 at 12:27:05 UTC, Benjamin Thaut 
wrote:
 Then because they are const, TypeInfo_Const.toString() will be 
 called:

     override string toString()
     {
         return cast(string) ("const(" ~ base.toString() ~ ")");
     }

 which allocates because of the array concatenation.

Wow.

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 05.09.2012 14:34, Peter Alexander wrote:
 On Wednesday, 5 September 2012 at 12:27:05 UTC, Benjamin Thaut wrote:
 Then because they are const, TypeInfo_Const.toString() will be called:

     override string toString()
     {
         return cast(string) ("const(" ~ base.toString() ~ ")");
     }

 which allocates because of the array concatenation.

 Wow.

I already have a fix for this.

https://github.com/Ingrater/druntime/commit/74713f7af496fd50fe4cfe60b3d9906b87efbdb6
https://github.com/Ingrater/druntime/commit/05c440b0322d39cf98425f50172c468c6659efb8

If I find a good description of how to do pull requests, I might be able to 
submit one.

Kind Regards
Benjamin Thaut

Sep 05 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 5 September 2012 13:27, Benjamin Thaut <code benjamin-thaut.de> wrote:
 On 05.09.2012 14:14, Alex Rønne Petersen wrote:

 Where's the catch? From looking in druntime, I don't see where the
 allocation could occur.

 Everything is in object_.d:

     equals_t opEquals(Object lhs, Object rhs)
     {
         if (lhs is rhs)
             return true;
         if (lhs is null || rhs is null)
             return false;
         if (typeid(lhs) == typeid(rhs))
             return lhs.opEquals(rhs);
         return lhs.opEquals(rhs) &&
                rhs.opEquals(lhs);
     }

 Will trigger a comparison of the TypeInfo objects with
 if (typeid(lhs) == typeid(rhs))

 Which will after some function calls trigger opEquals of TypeInfo

     override equals_t opEquals(Object o)
     {
         /* TypeInfo instances are singletons, but duplicates can exist
          * across DLL's. Therefore, comparing for a name match is
          * sufficient.
          */
         if (this is o)
             return true;
         TypeInfo ti = cast(TypeInfo)o;
         return ti && this.toString() == ti.toString();
     }

This got fixed.  Said code is now:

override equals_t opEquals(Object o)
{
    if (this is o)
        return true;
    auto c = cast(const TypeInfo_Class)o;
    return c && this.info.name == c.info.name;
}

Causing no hidden allocation.


Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

Sep 05 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 5 September 2012 14:04, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 5 September 2012 13:27, Benjamin Thaut <code benjamin-thaut.de> wrote:
 On 05.09.2012 14:14, Alex Rønne Petersen wrote:

 Where's the catch? From looking in druntime, I don't see where the
 allocation could occur.

 Everything is in object_.d:

     equals_t opEquals(Object lhs, Object rhs)
     {
         if (lhs is rhs)
             return true;
         if (lhs is null || rhs is null)
             return false;
         if (typeid(lhs) == typeid(rhs))
             return lhs.opEquals(rhs);
         return lhs.opEquals(rhs) &&
                rhs.opEquals(lhs);
     }

 Will trigger a comparison of the TypeInfo objects with
 if (typeid(lhs) == typeid(rhs))

 Which will after some function calls trigger opEquals of TypeInfo

     override equals_t opEquals(Object o)
     {
         /* TypeInfo instances are singletons, but duplicates can exist
          * across DLL's. Therefore, comparing for a name match is
          * sufficient.
          */
         if (this is o)
             return true;
         TypeInfo ti = cast(TypeInfo)o;
         return ti && this.toString() == ti.toString();
     }

 This got fixed.  Said code is now:

 override equals_t opEquals(Object o)
 {
     if (this is o)
         return true;
     auto c = cast(const TypeInfo_Class)o;
     return c && this.info.name == c.info.name;
 }

 Causing no hidden allocation.

Oops, let me correct myself.

This was hacked at to call the *correct* opEquals method above.


bool opEquals(const Object lhs, const Object rhs)
{
    // A hack for the moment.
    return opEquals(cast()lhs, cast()rhs);
}


Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 05.09.2012 15:07, Iain Buclaw wrote:
 On 5 September 2012 14:04, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 5 September 2012 13:27, Benjamin Thaut <code benjamin-thaut.de> wrote:
 On 05.09.2012 14:14, Alex Rønne Petersen wrote:

 Where's the catch? From looking in druntime, I don't see where the
 allocation could occur.

 Everything is in object_.d:

      equals_t opEquals(Object lhs, Object rhs)
      {
          if (lhs is rhs)
              return true;
          if (lhs is null || rhs is null)
              return false;
          if (typeid(lhs) == typeid(rhs))
              return lhs.opEquals(rhs);
          return lhs.opEquals(rhs) &&
                 rhs.opEquals(lhs);
      }

 Will trigger a comparison of the TypeInfo objects with
 if (typeid(lhs) == typeid(rhs))

 Which will after some function calls trigger opEquals of TypeInfo

      override equals_t opEquals(Object o)
      {
          /* TypeInfo instances are singletons, but duplicates can exist
           * across DLL's. Therefore, comparing for a name match is
           * sufficient.
           */
          if (this is o)
              return true;
          TypeInfo ti = cast(TypeInfo)o;
          return ti && this.toString() == ti.toString();
      }

 This got fixed.  Said code is now:

 override equals_t opEquals(Object o)
 {
      if (this is o)
          return true;
      auto c = cast(const TypeInfo_Class)o;
      return c && this.info.name == c.info.name;
 }

 Causing no hidden allocation.

 Oops, let me correct myself.

 This was hacked at to call the *correct* opEquals method above.


 bool opEquals(const Object lhs, const Object rhs)
 {
      // A hack for the moment.
      return opEquals(cast()lhs, cast()rhs);
 }


 Regards

Still, comparing two type info objects will result in one or multiple 
allocations most of the time.

Kind Regards
Benjamin Thaut

Sep 05 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/5/12 4:59 PM, Benjamin Thaut wrote:
 Still, comparing two type info objects will result in one or multiple
 allocations most of the time.

Could you please submit a patch for that? Thanks!

Andrei

P.S. Very nice work. Congrats!

Sep 05 2012

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Benjamin Thaut wrote:
 I use object pooling in both versions, as in game development you
 usually don't allocate during the frame. But in the GC version you still
 have the problem that way too many parts of the language allocate, and you
 don't even notice it when using the GC.

There's one proposed solution to this problem: 
http://forum.dlang.org/thread/k1rlhn$19du$1 digitalmars.com

Sep 05 2012

"SomeDude" <lovelydear mailmetrash.com> writes:

On Wednesday, 5 September 2012 at 12:28:43 UTC, Piotr Szturmaj 
wrote:
 Benjamin Thaut wrote:
 I use object pooling in both versions, as in game development 
 you usually don't allocate during the frame. But in the GC 
 version you still have the problem that way too many parts of 
 the language allocate, and you don't even notice it when using 
 the GC.

 There's one proposed solution to this problem: 
 http://forum.dlang.org/thread/k1rlhn$19du$1 digitalmars.com

It's a bad solution, IMHO. Monitoring druntime and hunting down 
every part that allocates until our codebase is correct, as 
Benjamin Thaut did, is a much better solution.

Sep 10 2012

"bearophile" <bearophileHUGS lycos.com> writes:

SomeDude:

 It's a bad solution, IMHO. Monitoring druntime and hunting down 
 every part that allocates until our codebase is correct, as 
 Benjamin Thaut did, is a much better solution.

Why do you think such a hunt is better than letting the compiler 
tell you what parts of your program have the side effects you 
want to avoid?

Bye,
bearophile

Sep 11 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

It's not difficult to implement, as the compiler only needs to warn that the
emission of /certain/ library calls /may/ cause heap allocations.

Regards.
----
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

On 11 Sep 2012 11:31, "bearophile" <bearophileHUGS lycos.com> wrote:
 SomeDude:

 It's a bad solution, IMHO. Monitoring druntime and hunting down every part
 that allocates until our codebase is correct, as Benjamin Thaut did, is a
 much better solution.

 Why do you think such a hunt is better than letting the compiler tell you
 what parts of your program have the side effects you want to avoid?

 Bye,
 bearophile

Sep 11 2012

"SomeDude" <lovelydear mailmetrash.com> writes:

On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:
 SomeDude:

 It's a bad solution, IMHO. Monitoring druntime and hunting down 
 every part that allocates until our codebase is correct, as 
 Benjamin Thaut did, is a much better solution.

 Why do you think such a hunt is better than letting the compiler 
 tell you what parts of your program have the side effects you 
 want to avoid?

 Bye,
 bearophile

My problem is that you litter your codebase with @nogc everywhere. In a 
similar fashion, the nothrow keyword, for instance, has to be 
added just about everywhere, and I find it very ugly on its 
own. Basically, with this scheme, you have to annotate every 
single method you write for each and every guarantee (nothrow, 
nogc, nosideeffect, noshared, whatever you fancy) you want to 
ensure. This doesn't scale well at all.

I would find it okay to use a @noalloc annotation as a shortcut 
for a compiler switch or an external tool to detect allocations 
in some part of code. (As a digression, I tend to think of D 
@annotations as compiler or tooling switches. One could imagine a 
general scheme where one associates an @annotation with a 
compiler/tool switch whose effect is limited to the annotated 
scope.)
I suppose the tool has to build the full call tree starting with 
the @nogc method until it reaches the leaves or finds calls to 
new or malloc; you would have to do that for every single @nogc 
annotation, which could be very slow, unless you trust the 
developer that his code indeed doesn't allocate, which means he 
effectively needs to litter his codebase with @nogc annotations.
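The whole-call-tree check described above amounts to a reachability search. A minimal sketch, assuming the call graph is already available as a map from function name to callees; `mayAllocate` and the string-based graph are hypothetical, and a real tool would work on the compiler's semantic information instead:

```d
// Hypothetical sketch: starting from `fn`, walk every reachable callee and
// report whether any of them is a known allocation primitive.
bool mayAllocate(string fn, const string[][string] calls,
                 const bool[string] allocators)
{
    bool[string] visited; // guards against recursion cycles

    bool visit(string f)
    {
        if (f in allocators)
            return true;
        if (f in visited)
            return false;
        visited[f] = true;
        if (auto callees = f in calls)
            foreach (callee; *callees)
                if (visit(callee))
                    return true;
        return false;
    }

    return visit(fn);
}
```

Each annotated function costs at most one traversal of the graph reachable from it; caching results per function would avoid repeating the walk for every annotation.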

Sep 11 2012

"Felix Hufnagel" <suicide xited.de> writes:

class Foo
{
    @safe nothrow:
    void method_is_nothrow() {}
    void method_is_also_nothrow() {}
}


or

class Foo
{
    @safe nothrow
    {
        void method_is_nothrow() {}
        void method_is_also_nothrow() {}
    }
}

No need to add it to every single method by hand...



On 12.09.2012, 04:38, SomeDude <lovelydear mailmetrash.com> wrote:

 On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:
 SomeDude:

 It's a bad solution, IMHO. Monitoring druntime and hunting down every
 part that allocates until our codebase is correct, as Benjamin Thaut
 did, is a much better solution.

 Why do you think such a hunt is better than letting the compiler tell you
 what parts of your program have the side effects you want to avoid?

 Bye,
 bearophile

 My problem is that you litter your codebase with @nogc everywhere. In a
 similar fashion, the nothrow keyword, for instance, has to be added just
 about everywhere, and I find it very ugly on its own. Basically, with
 this scheme, you have to annotate every single method you write for each
 and every guarantee (nothrow, nogc, nosideeffect, noshared, whatever you
 fancy) you want to ensure. This doesn't scale well at all.

 I would find it okay to use a @noalloc annotation as a shortcut for a
 compiler switch or an external tool to detect allocations in some part
 of code. (As a digression, I tend to think of D @annotations as compiler
 or tooling switches. One could imagine a general scheme where one
 associates an @annotation with a compiler/tool switch whose effect is
 limited to the annotated scope.)
 I suppose the tool has to build the full call tree starting with the
 @nogc method until it reaches the leaves or finds calls to new or
 malloc; you would have to do that for every single @nogc annotation,
 which could be very slow, unless you trust the developer that his
 code indeed doesn't allocate, which means he effectively needs to litter
 his codebase with @nogc annotations.


-- 
Created with Opera's revolutionary e-mail module: http://www.opera.com/mail/

Sep 12 2012

"Paulo Pinto" <pjmlp progtools.org> writes:

On Wednesday, 12 September 2012 at 02:37:52 UTC, SomeDude wrote:
 On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:
 SomeDude:

 It's a bad solution, IMHO. Monitoring druntime and hunting down 
 every part that allocates until our codebase is correct, as 
 Benjamin Thaut did, is a much better solution.

 Why do you think such a hunt is better than letting the compiler 
 tell you what parts of your program have the side effects you 
 want to avoid?

 Bye,
 bearophile

 My problem is that you litter your codebase with @nogc everywhere. In
 a similar fashion, the nothrow keyword, for instance, has to be 
 added just about everywhere, and I find it very ugly on its 
 own. Basically, with this scheme, you have to annotate every 
 single method you write for each and every guarantee (nothrow, 
 nogc, nosideeffect, noshared, whatever you fancy) you want to 
 ensure. This doesn't scale well at all.

 I would find it okay to use a @noalloc annotation as a shortcut 
 for a compiler switch or an external tool to detect 
 allocations in some part of code. (As a digression, I tend to 
 think of D @annotations as compiler or tooling switches. One could 
 imagine a general scheme where one associates an @annotation 
 with a compiler/tool switch whose effect is limited to the 
 annotated scope.)
 I suppose the tool has to build the full call tree starting 
 with the @nogc method until it reaches the leaves or finds 
 calls to new or malloc; you would have to do that for every 
 single @nogc annotation, which could be very slow, unless you 
 trust the developer that his code indeed doesn't allocate, 
 which means he effectively needs to litter his codebase with 
 @nogc annotations.

This is partially what happens in C++/CLI and C++/CX.

Sep 13 2012

"Rob T" <rob ucora.com> writes:

On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:
 SomeDude:

 It's a bad solution, IMHO. Monitoring druntime and hunting down 
 every part that allocates until our codebase is correct, as 
 Benjamin Thaut did, is a much better solution.

 Why do you think such a hunt is better than letting the compiler 
 tell you what parts of your program have the side effects you 
 want to avoid?

A compiler option warning about undesirable heap allocations
would allow all such allocations to be identified much more
easily, without missing anything. This is a general solution to
a general problem where a programmer wishes to avoid heap
allocations for whatever reason.

--rt

Oct 23 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 05-09-2012 13:03, Benjamin Thaut wrote:
 I rewrote a 3d game I created during my studies with D 2.0 to manual
 memory management. If I'm not studying, I'm working in the 3d Engine
 department of Havok. As I needed to practice manual memory management and
 had wanted to get rid of the GC in D for quite some time, I went through
 all this effort to create a GC-free version of my game.

 The results are:

      DMD GC Version: 71 FPS, 14.0 ms frametime
      GDC GC Version: 128.6 FPS, 7.72 ms frametime
      DMD MMM Version: 142.8 FPS, 7.02 ms frametime

 GC collection times:

      DMD GC Version: 8.9 ms
      GDC GC Version: 4.1 ms

 As you can see, the manually managed version is twice as fast as the garbage
 collected one. Even the highly optimized version created with GDC is
 still slower than the manually managed one.

 You can find the full article at:

 http://3d.benjamin-thaut.de/?p=20#more-20


 Feedback is welcome.

 Kind Regards
 Benjamin Thaut

BTW, your blog post appears to have comparison misspelled.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Sep 05 2012

"anonymous" <anonymous nobody.alone> writes:

On Wednesday, 5 September 2012 at 12:22:52 UTC, Alex Rønne 
Petersen wrote:
 BTW, your blog post appears to have comparison misspelled.

Also "development".

It was interesting to read. What about a GDC MMM version?

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 05.09.2012 16:07, anonymous wrote:
 It was interesting to read it. What about GDC MMM?

The GDC druntime has a different folder structure, which makes it a lot
more time-consuming to add in the changes. Also, it is not possible to
rebuild Phobos or druntime with the binary release of GDC MinGW; you need
the complete GDC MinGW build setup to do that. As this is not documented
very well and is quite a lot of work, I didn't go through that additional
effort.

Kind Regards
Benjamin Thaut

Sep 05 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/5/12 1:03 PM, Benjamin Thaut wrote:
 http://3d.benjamin-thaut.de/?p=20#more-20

Smile, you're on reddit:

http://www.reddit.com/r/programming/comments/ze4cx/real_world_comparison_gc_vs_manual_memory/


Andrei

Sep 05 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Benjamin Thaut:

 http://3d.benjamin-thaut.de/?p=20#more-20

Regarding your issues list, most of them are fixable, like the
one regarding array literals, and even the one regarding the
invariant handler.

But I didn't know about this, and I don't know how and if this is 
fixable:

"The new statement will not free any memory if the constructor
throws an exception."

Insights welcome.

Bye,
bearophile

Sep 05 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 5 September 2012 15:57, bearophile <bearophileHUGS lycos.com> wrote:
 Benjamin Thaut:

 http://3d.benjamin-thaut.de/?p=20#more-20


 Regarding your issues list, most of them are fixable, like the one regarding
 array literals, and even the one regarding the invariant handler.

I have no clue what the issue with invariant handlers is... Calls to
them are not emitted in release code, and if you think they are, then
you've probably built either your application or the library you are
using incorrectly.

Array literals are not so easy to fix. I once thought that it would
be optimal to make it a stack initialisation, given that all values are
known at compile time. This in fact caused many strange SEGVs in quite
a few of my programs (most are parsers / interpreters, so things that
go *heavily* nested into themselves, and it was under these
circumstances that array literals on the stack would become corrupt in
one way or another, causing *huge* errors in perfectly sound code).



-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

Sep 05 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Iain Buclaw:

Most of the array allocation cases we are talking about are like:

void main() {
   int[3] a = [1, 2, 3]; // fixed size array
}


That currently produces, with DMD:

__Dmain:
L0:     sub ESP, 010h
         mov EAX, offset FLAT:_D12TypeInfo_xAi6__initZ
         push EBX
         push 0Ch
         push 3
         push EAX
         call near ptr __d_arrayliteralTX
         add ESP, 8
         mov EBX, EAX
         mov dword ptr [EAX], 1
         mov ECX, EBX
         push EBX
         lea EDX, 010h[ESP]
         mov dword ptr 4[EBX], 2
         mov dword ptr 8[EBX], 3
         push EDX
         call near ptr _memcpy
         add ESP, 0Ch
         xor EAX, EAX
         pop EBX
         add ESP, 010h
         ret



There is also the case for dynamic arrays:

void main() {
   int[] a = [1, 2, 3];
   // use a here
}

But this is a harder problem, to leave for later.
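A sketch for the fixed-size case, assuming the initial values are compile-time constants: keep them in a `static immutable` fixed-size array and copy it into the local, so the initialization is a plain value copy rather than an array literal. `sumOfDefaults` is a hypothetical name, and whether a particular compiler release still emits `__d_arrayliteralTX` for `int[3] a = [1, 2, 3];` should be checked in its own output, as was done above.

```d
/// Hypothetical example: start from constant defaults without an array
/// literal on the right-hand side of the local initialization.
int sumOfDefaults()
{
    // Initialized at compile time and stored once in static data.
    static immutable int[3] defaults = [1, 2, 3];

    // Fixed-size arrays are value types: this copies 3 ints onto the stack.
    int[3] a = defaults;
    a[0] += 10; // modifies only the local copy, never `defaults`

    int total = 0;
    foreach (x; a)
        total += x;
    return total;
}
```

Because `a` is a copy, calling the function repeatedly always starts from the same defaults.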


 this in fact caused many strange SEGVs in quite
 a few of my programs (most are parsers / interpreters, so things that
 go down *heavy* nested into itself, and it was under these
 circumstances that array literals on the stack would go corrupt in one
 way or another causing *huge* errors in perfectly sound code).

Do you know the cause of such corruptions? Maybe they are caused
by other compiler bugs...

And what to do regarding those exceptions in constructors? :-)

Bye,
bearophile

Sep 05 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 5 September 2012 16:31, bearophile <bearophileHUGS lycos.com> wrote:
 Iain Buclaw:

 Most of the array allocation cases we are talking about are like:

 void main() {
   int[3] a = [1, 2, 3]; // fixed size array
 }


 That currently produces, with DMD:

 __Dmain:
 L0:     sub ESP, 010h
         mov EAX, offset FLAT:_D12TypeInfo_xAi6__initZ
         push EBX
         push 0Ch
         push 3
         push EAX
         call near ptr __d_arrayliteralTX
         add ESP, 8
         mov EBX, EAX
         mov dword ptr [EAX], 1
         mov ECX, EBX
         push EBX
         lea EDX, 010h[ESP]
         mov dword ptr 4[EBX], 2
         mov dword ptr 8[EBX], 3
         push EDX
         call near ptr _memcpy
         add ESP, 0Ch
         xor EAX, EAX
         pop EBX
         add ESP, 010h
         ret



 There is also the case for dynamic arrays:

 void main() {
   int[] a = [1, 2, 3];
   // use a here
 }

 But this is a harder problem, to leave for later.



 this in fact caused many strange SEGVs in quite
 a few of my programs (most are parsers / interpreters, so things that
 go down *heavy* nested into itself, and it was under these
 circumstances that array literals on the stack would go corrupt in one
 way or another causing *huge* errors in perfectly sound code).


 Do you know the cause of such corruptions? Maybe they are caused by other
 compiler bugs...

 And what to do regarding those exceptions in constructors? :-)

I think it was mostly due to the fact that you can't tell the difference
between array literals that are to be assigned to dynamic arrays and those
assigned to static arrays (as far as I can tell). I do believe that the
issues concerned dynamic arrays causing SEGVs, and not static ones (I don't
recall ever needing to use a static array :-).


Regards
-- 
Iain Buclaw

Sep 05 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Iain Buclaw:

 I think it was mostly due to that you can't tell the difference
 between array literals that are to be assigned to either 
 dynamic or static arrays (as far as I can tell).

I see.


 I do believe that the issues
 surrounded dynamic arrays causing SEGVs, and not static  (I 
 don't recall ever needing the use of a static array :-).

I use fixed-size arrays all the time in D. Heap-allocated arrays
are overused in D. They produce garbage, and in a lot of cases they
are not needed. Using them a lot is sometimes a bad habit (if you
are writing script-like programs they are OK), one that is also
encouraged by fixed-size arrays being almost second-class citizens
in Phobos (and in druntime: using them as AA keys causes performance
troubles).

If you take a look at the Ada language, you see how much
static/stack-allocated arrays are used. In high-performance code
they help, and I'd like D programmers and Phobos devs to give
them a little more consideration.

Bye,
bearophile

Sep 05 2012

"bearophile" <bearophileHUGS lycos.com> writes:

 If you take a look at Ada language you see how much 
 static/stack-allocated arrays are used. In high performance 
 code they help, and I'd like D programmers and Phobos devs to 
 give them a little more consideration.

Also, the lack of variable-length stack-allocated arrays in D
forces you either to over-allocate, wasting stack space, or to
use alloca(), which is bug-prone and makes things awkward if
you need a multidimensional array.
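A minimal sketch of the `alloca()` route, assuming `core.stdc.stdlib.alloca` is available on the target: the raw block is viewed as a slice, used, and left to die with the stack frame. `sumOfSquares` is a hypothetical example; the comments mark what makes this bug-prone.

```d
import core.stdc.stdlib : alloca;

/// Hypothetical example: a variable-length scratch buffer on the stack.
long sumOfSquares(size_t n)
{
    if (n == 0)
        return 0;

    // The block lives in *this* function's stack frame and is released when
    // sumOfSquares returns; calling alloca from a helper function would hand
    // back memory from a frame that is already gone.
    auto p = cast(long*) alloca(n * long.sizeof);
    long[] scratch = p[0 .. n];   // view the raw block as a D slice

    foreach (i, ref x; scratch)
    {
        immutable v = cast(long)(i + 1);
        x = v * v;
    }

    long total = 0;
    foreach (x; scratch)
        total += x;
    return total;   // never return or store `scratch` itself: it would dangle
}
```

A multidimensional buffer would have to be one flat block like this, with the row/column index arithmetic done by hand.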

Bye,
bearophile

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 05.09.2012 16:57, bearophile wrote:
 Benjamin Thaut:

 http://3d.benjamin-thaut.de/?p=20#more-20

 Regarding your issues list, most of them are fixable, like the one
 regarding array literals, and even the one regarding the invariant handler.

 But I didn't know about this, and I don't know how and if this is fixable:

 "The new statement will not free any memory if the constructor throws an
 exception."

 Insights welcome.

 Bye,
 bearophile

Well, as overloading new and delete is deprecated, and the new that is
part of the language only works together with a GC, I don't think
anything will be done about this.

It's not a big problem in D, because you can't create arrays of objects
in a way that calls multiple constructors at the same time (which is
the biggest issue with exceptions and constructors in C++). Also, due to
memory pre-initialization, the object will always be in a meaningful
state, which helps with exception handling too. My replacement just
calls the constructor, and if an exception is thrown, the destructor is
called and the memory is freed; then the new statement returns null.
Works flawlessly so far.
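A sketch of a replacement along the lines described, assuming `malloc`/`free` as the backing allocator and `std.conv.emplace` for construction. `makeManual` and `disposeManual` are hypothetical names, not the thBase API. The sketch does not register the block with the GC, so objects holding GC-managed references would additionally need `GC.addRange`/`GC.removeRange`.

```d
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

/// Hypothetical replacement for `new T(args)` that takes memory from malloc.
/// If the constructor throws, the object is destroyed, its memory freed,
/// and null is returned.
T makeManual(T, Args...)(Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    void* mem = malloc(size);
    if (mem is null)
        return null;
    try
    {
        // emplace copies T's initial state into the block, then runs the constructor.
        return emplace!T(mem[0 .. size], args);
    }
    catch (Exception)
    {
        // The block already holds a pre-initialized object (valid vtable),
        // so running the destructor on it is well defined.
        destroy(cast(T) mem);
        free(mem);
        return null;
    }
}

/// Hypothetical matching replacement for `delete`: destructor, then free.
void disposeManual(T)(T obj) if (is(T == class))
{
    if (obj is null)
        return;
    destroy(obj);
    free(cast(void*) obj);
}
```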

Kind Regards
Benjamin Thaut

Sep 05 2012

Sean Kelly <sean invisibleduck.org> writes:

On Sep 5, 2012, at 8:08 AM, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 Array literals are not so easy to fix.  I once thought that it would
 be optimal to make it a stack initialisation given that all values are
 known at compile time, this in fact caused many strange SEGVs in quite
 a few of my programs (most are parsers / interpreters, so things that
 go down *heavy* nested into itself, and it was under these
 circumstances that array literals on the stack would go corrupt in one
 way or another causing *huge* errors in perfectly sound code).

It sounds like your code has escaping references? I think the presence
of a GC tends to eliminate a lot of thought about data ownership. This
is usually beneficial in that maintaining ownership rules tends to be a
huge pain, but then it also tends to avoid issues like this.
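To illustrate the kind of escaping reference suspected here, a hypothetical sketch: returning a slice of a stack array hands the caller memory that dies with the function. Two ways to make ownership explicit are shown: copy to the GC heap, or let the caller own the storage. The function names are made up for the example.

```d
// BUG (kept as a comment): returns a slice of this function's stack frame,
// which is reused by later calls once badDigits returns.
// int[] badDigits() { int[3] buf = [1, 2, 3]; return buf[]; }

/// Fix 1: copy to a GC-owned array; the caller gets an independent copy.
int[] gcDigits()
{
    int[3] buf = [1, 2, 3];
    return buf.dup;
}

/// Fix 2: the caller owns the storage and passes it in; nothing escapes.
void fillDigits(int[] dest)
{
    assert(dest.length >= 3, "caller's buffer is too small");
    foreach (i; 0 .. 3)
        dest[i] = cast(int)(i + 1);
}
```

The second form is the one that stays GC-free, at the cost of making the caller responsible for the buffer's lifetime.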

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

My "standard" library is now available on GitHub:

https://github.com/Ingrater/thBase

Kind Regards
Benjamin Thaut

Sep 05 2012

Johannes Pfau <nospam example.com> writes:

On Wed, 05 Sep 2012 13:03:37 +0200,
Benjamin Thaut <code benjamin-thaut.de> wrote:

 Feedback is welcome.

It would be great if some of the code could be merged into Phobos,
especially the memory tracker. Other things would be great in Phobos too:
memory or object pools, an emplace wrapper which accepts a custom alloc
function to replace new (and something similar for delete), etc. We
really need a module for manual memory management (std.mmm?).
And functions which currently use the GC to allocate should get
overloads which take buffers (or, better, support custom allocators, but
that needs an allocator design first).
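As an illustration of the "overloads which take buffers" idea, here is a hypothetical routine (`formatUint` is not a Phobos function) that writes into caller-supplied storage and returns the slice it used, instead of allocating a new string on the GC heap.

```d
/// Hypothetical example: format `value` in decimal into `buf` and return
/// the part of `buf` that was used. No GC allocation takes place.
char[] formatUint(uint value, char[] buf)
{
    // uint.max has 10 decimal digits.
    assert(buf.length >= 10, "buffer too small");
    size_t pos = buf.length;
    do
    {
        // Fill from the end so no reversal is needed.
        buf[--pos] = cast(char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    return buf[pos .. $];
}
```

The returned slice aliases `buf`, so the next call with the same buffer overwrites it; that is the price of not allocating.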

Sep 05 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 05.09.2012 19:31, Johannes Pfau wrote:
 Would be great if some of the code could be merged into phobos,
 especially the memory tracker. But also things like memory or object
 pools would be great in phobos, an emplace wrapper which accepts a
 custom alloc function to replace new (and something similar for delete),
 etc. We really need a module for manual memory management (std.mmm?).
 And functions which currently use the GC to allocate should get
 overloads which take buffers (Or better support custom allocators, but
 that needs an allocator design first).

I personally really like my composite template, which allows for direct
composition of one class instance into another. It does not introduce
additional indirections, and the compiler will remind you if you forget
to initialize it.

https://github.com/Ingrater/druntime/blob/master/src/core/allocator.d#L670
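The linked template is not reproduced here. As a rough, hypothetical sketch of the idea (not the actual code at that link), a struct can reserve pointer-aligned storage for a class instance inside the owning object and construct it there with `emplace`; `@disable this()` is one way to get a compile-time reminder when an owner forgets to initialize the field. The sketch assumes `T`'s constructor does not publish `this`, since the struct may be moved bitwise before it settles in its owner.

```d
import std.conv : emplace;

/// Hypothetical sketch: one class instance stored inline in its owner.
struct Composite(T) if (is(T == class))
{
    // void* words: pointer-aligned, and treated as possible pointers by the GC.
    private enum words =
        (__traits(classInstanceSize, T) + (void*).sizeof - 1) / (void*).sizeof;
    private void*[words] store = void;
    private bool live;

    @disable this();      // owners must initialize the field in their constructor
    @disable this(this);  // copying would duplicate the embedded object

    this(Args...)(Args args)
    {
        emplace!T(cast(void[]) store[], args);
        live = true;
    }

    ~this()
    {
        if (live)
            destroy(get());
    }

    /// The embedded instance; no separate heap block behind it.
    T get()
    {
        assert(live, "Composite used before construction");
        return cast(T) cast(void*) store.ptr;
    }
}
```

Leaving `engine` unassigned in `Car`'s constructor in the usage below would be a compile error, because `Composite` has no default construction.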

Kind Regards
Benjamin Thaut

Sep 05 2012

"Nathan M. Swan" <nathanmswan gmail.com> writes:

On Wednesday, 5 September 2012 at 11:03:03 UTC, Benjamin Thaut 
wrote:
 The results are:

     DMD GC Version: 71 FPS, 14.0 ms frametime
     GDC GC Version: 128.6 FPS, 7.72 ms frametime
     DMD MMM Version: 142.8 FPS, 7.02 ms frametime

 GC collection times:

     DMD GC Version: 8.9 ms
     GDC GC Version: 4.1 ms

Did you try GC.disable/enable?
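For reference, a minimal sketch of what that could look like with `core.memory.GC`: automatic collections are suppressed while frames run, and the program collects at a frame boundary of its choosing. `runFrames` and the collect-every-N-frames policy are hypothetical. Per the `GC.disable` documentation, the runtime may still collect when it deems it necessary (for example when out of memory).

```d
import core.memory : GC;

/// Hypothetical frame loop: suppress automatic collections while frames run
/// and collect only every `collectEvery` frames (0 = never).
void runFrames(size_t frameCount, size_t collectEvery,
               scope void delegate() renderFrame)
{
    GC.disable();             // allocations still succeed; automatic collections are held off
    scope (exit) GC.enable(); // restore normal behaviour however we leave
    foreach (i; 0 .. frameCount)
    {
        renderFrame();
        if (collectEvery != 0 && (i + 1) % collectEvery == 0)
        {
            // Deliberate collection at a frame boundary.
            GC.enable();
            GC.collect();
            GC.disable();
        }
    }
}
```

This only moves the pause to a predictable point; it doesn't make a full collection any cheaper.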

Sep 05 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
 GC collection times:

      DMD GC Version: 8.9 ms
      GDC GC Version: 4.1 ms

I'd like it if you could add some instrumentation to see what accounts for the 
time difference. I presume they both use the same D source code.

Sep 05 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 6 September 2012 00:10, Walter Bright <newshound2 digitalmars.com> wrote:
 On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
 GC collection times:

      DMD GC Version: 8.9 ms
      GDC GC Version: 4.1 ms


 I'd like it if you could add some instrumentation to see what accounts for
 the time difference. I presume they both use the same D source code.

I'd say they are identical, but I don't really look at what goes on
over on the MinGW port.


-- 
Iain Buclaw

Sep 05 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 9/6/12, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 I'd say they are identical, but I don't really look at what goes on
 over on the MinGW port.

Speaking of which, I'd like to see if the Unilink linker would make
any difference as well. It's known to make smaller binaries than
Optlink. I think Unilink could be tested with MinGW if it supports
whatever GDC outputs, to compare against LD.

Sep 05 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Walter Bright:

 I'd like it if you could add some instrumentation to see what 
 accounts for the time difference. I presume they both use the 
 same D source code.

Maybe that performance difference comes from the sum of some 
metric tons of different little optimizations done by the GCC 
back-end.

Bye,
bearophile

Sep 05 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 9/5/2012 5:01 PM, bearophile wrote:
 Walter Bright:

 I'd like it if you could add some instrumentation to see what accounts for the
 time difference. I presume they both use the same D source code.

 Maybe that performance difference comes from the sum of some metric tons of
 different little optimizations done by the GCC back-end.

We can trade guesses all day, and not get anywhere. Instrumentation and 
measurement is needed.

I've investigated many similar things, and the truth usually turned out to be
something nobody guessed or assumed. I recall the benchmark you posted where you
guessed that dmd's integer code generation was woefully deficient. Examining the
actual output showed that there wasn't a dime's worth of difference in the code
generated from dmd vs gcc.

The problem turned out to be the long division runtime library function. Fixing 
that brought the timings to parity.

No code gen changes whatsoever were needed.

Sep 05 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Walter Bright:

 No code gen changes whatsoever were needed.

In that case I don't think I specified which subsystem of the D
compiler was not "good enough"; I just showed a performance
difference. The division was slow, regardless of the cause. That
is what matters to the end C/D programmer, not whether the
cause is a badly written division routine or a bad/missing
optimization stage.

And regarding divisions, currently they are not optimized by dmd 
if divisors are small (like 10) and statically known.
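For context, the usual strength reduction for a statically known divisor replaces the division with a multiply and a shift. A sketch for unsigned 32-bit division by 10 (`div10` is a hypothetical name, and this makes no claim about what any particular dmd release emits): 0xCCCCCCCD is ceil(2^35 / 10), and the 64-bit product of two 32-bit values cannot overflow.

```d
/// Unsigned 32-bit division by the constant 10 without a divide instruction.
uint div10(uint x)
{
    // (x * ceil(2^35 / 10)) >> 35 == x / 10 for every 32-bit unsigned x.
    return cast(uint)((cast(ulong) x * 0xCCCC_CCCDUL) >> 35);
}
```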

Bye,
bearophile

Sep 06 2012

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Thursday, 6 September 2012 at 00:00:31 UTC, bearophile wrote:
 Walter Bright:

 I'd like it if you could add some instrumentation to see what 
 accounts for the time difference. I presume they both use the 
 same D source code.

 Maybe that performance difference comes from the sum of some 
 metric tons of different little optimizations done by the GCC 
 back-end.

 Bye,
 bearophile

In addition to Walter's response, it is very rare for advanced
compiler optimisations to make a >2x difference on any non-trivial
code. Not impossible, but it's definitely suspicious.

Sep 06 2012

Sean Cavanaugh <WorksOnMyMachine gmail.com> writes:

On 9/6/2012 4:30 AM, Peter Alexander wrote:
 In addition to Walter's response, it is very rare for advanced compiler
 optimisations to make >2x difference on any non-trivial code. Not
 impossible, but it's definitely suspicious.

I love trying to explain to people that our debug builds are too slow
because they have instrumented too much of the code and haven't disabled
any of it. A lot of people are pushed into debugging release builds as a
result, which is pretty silly.

Now there are some pathological cases:
   - non-inlined constructors can sometimes kill you in 3d vector math
     type libraries
   - 128-bit SIMD intrinsics with Microsoft's compiler in debug builds
     make horrifically slow code: each operation has its results written
     to memory and then reloaded for the next 'instruction'. I believe
     it's two orders of magnitude slower (the extra instructions, plus
     pegging the read and write ports of the CPU, hurt quite a lot too).
     These tend to be tight functions, so they can be optimized
     selectively in debug builds . . .

Sep 06 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 06.09.2012 01:10, Walter Bright wrote:
 On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
 GC collection times:

 DMD GC Version: 8.9 ms
 GDC GC Version: 4.1 ms

 I'd like it if you could add some instrumentation to see what accounts
 for the time difference. I presume they both use the same D source code.

The code is identical; I did not change anything in the GC code, so it
uses whatever code comes with the MinGW GDC 2.058 release.

The problem with instrumentation is that I cannot recompile druntime
for the MinGW GDC, as this is not possible with the binary release of
MinGW GDC, and I did not go through the effort to set up the whole build.
I'm open to suggestions, though, on how I could profile the GC without
recompiling druntime. If someone else wants to profile this, I can also
provide precompiled builds of both versions.

-- 
Kind Regards
Benjamin Thaut

Sep 06 2012

Jacob Carlborg <doob me.com> writes:

On 2012-09-06 14:12, Benjamin Thaut wrote:
 On 06.09.2012 01:10, Walter Bright wrote:
 On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
 GC collection times:

 DMD GC Version: 8.9 ms
 GDC GC Version: 4.1 ms

 I'd like it if you could add some instrumentation to see what accounts
 for the time difference. I presume they both use the same D source code.

 The code is identical, I did not change anything in the GC code. So it
 uses whatever code comes with the MinGW GDC 2.058 release.

 The problem with instrumentation is that I cannot recompile druntime
 for the MinGW GDC, as this is not possible with the binary release of
 MinGW GDC and I did not go through the effort to set up the whole build.
 I'm open to suggestions though how I could profile the GC without
 recompiling druntime. If someone else wants to profile this, I can also
 provide precompiled versions of both versions.

I don't know what Windows has but on Mac OS X there's this application:

https://developer.apple.com/library/mac/#documentation/developertools/conceptual/InstrumentsUserGuide/Introduction/Introduction.html

It lets you instrument any running application.

-- 
/Jacob Carlborg

Sep 06 2012

"ponce" <spam spam.org> writes:

 The problem with instrumentation is that I cannot recompile
 druntime for the MinGW GDC, as this is not possible with the
 binary release of MinGW GDC and I did not go through the effort
 to set up the whole build.
 I'm open to suggestions though how I could profile the GC 
 without recompiling druntime. If someone else wants to profile 
 this, I can also provide precompiled versions of both versions.

You don't necessarily need to recompile anything with a sampling
profiler like AMD Code Analyst or Very Sleepy

Sep 06 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 06.09.2012 15:30, ponce wrote:
 The problem with instrumentation is that I cannot recompile
 druntime for the MinGW GDC, as this is not possible with the binary
 release of MinGW GDC and I did not go through the effort to set up the
 whole build.
 I'm open to suggestions though how I could profile the GC without
 recompiling druntime. If someone else wants to profile this, I can
 also provide precompiled versions of both versions.

 You don't necessarily need to recompile anything with a sampling
 profiler like AMD Code Analyst or Very Sleepy

I just tried profiling it with Very Sleepy, but for both versions it
basically only tells me that most of the time is spent in gcx.fullcollect;
the GDC version just spends less time in gcx.fullcollect than the DMD
version.

As I cannot rebuild druntime with GDC, it will be quite hard to get
detailed profiling results.

I'm open to suggestions.

Kind Regards
Benjamin Thaut

Sep 06 2012

"ponce" <spam spam.org> writes:

 I just tried profiling it with Very Sleepy but basically it
 only tells me for both versions that most of the time is spent
 in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect
 than the DMD version.

 As I cannot rebuild druntime with GDC it will be quite hard to
 get detailed profiling results.

 I'm open to suggestions.


You might try AMD Code Analyst; it will highlight the bottleneck
in the assembly listing. Then use a disassembler like IDA to get
a feel for what the bottleneck could be.

Sep 06 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 9/6/2012 10:50 AM, Benjamin Thaut wrote:
 I just tried profiling it with Very Sleepy but basically it only tells me for
 both versions that most of the time is spent in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect than the DMD
 version.

Even so, that in itself is a good clue.

Sep 06 2012

"Sven Torvinger" <Sven torvinger.se> writes:

On Thursday, 6 September 2012 at 20:44:29 UTC, Walter Bright 
wrote:
 On 9/6/2012 10:50 AM, Benjamin Thaut wrote:
 I just tried profiling it with Very Sleepy but basically it only tells me for
 both versions that most of the time is spent in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect than the DMD
 version.

 Even so, that in itself is a good clue.

My bet is on cross-module inlining of bitop.btr failing...

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gcbits.d

version (DigitalMars)
{
    version = bitops;
}
else version (GNU)
{
    // use the unoptimized version
}
else version (D_InlineAsm_X86)
{
    version = Asm86;
}

wordtype testClear(size_t i)
{
    version (bitops)
    {
        return core.bitop.btr(data + 1, i);   // this is faster!
    }
    // ... (rest of the function not quoted)
}
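For readers unfamiliar with `btr`: it tests a bit and clears it, returning a nonzero value if the bit was set. A sketch of a portable plain-D equivalent, the kind of fallback a non-intrinsic path would have to use (this is not the actual gcbits.d code; `testClearPortable` is a hypothetical name), checked against `core.bitop.btr`:

```d
import core.bitop : btr;

/// Portable test-and-clear over an array of machine words:
/// returns nonzero if bit `i` was set, and clears it either way.
size_t testClearPortable(size_t* words, size_t i)
{
    enum bitsPerWord = size_t.sizeof * 8;
    immutable size_t mask = cast(size_t) 1 << (i % bitsPerWord);
    size_t* w = words + i / bitsPerWord;
    immutable size_t wasSet = *w & mask;
    *w &= ~mask;
    return wasSet;
}
```

Whether the portable form or the intrinsic is faster in the collector is exactly what the measurements requested here would show.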

Sep 06 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 7 September 2012 07:28, Sven Torvinger <Sven torvinger.se> wrote:
 On Thursday, 6 September 2012 at 20:44:29 UTC, Walter Bright wrote:
 On 9/6/2012 10:50 AM, Benjamin Thaut wrote:
 I just tried profiling it with Very Sleepy but basically it only tells me
 for both versions that most of the time is spent in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect than the
 DMD version.


 Even so, that in itself is a good clue.


 My bet is on cross-module inlining of bitop.btr failing...

 https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gcbits.d

 version (DigitalMars)
 {
     version = bitops;
 }
 else version (GNU)
 {
     // use the unoptimized version
 }
 else version (D_InlineAsm_X86)
 {
     version = Asm86;
 }

 wordtype testClear(size_t i)
 {
   version (bitops)
   {
     return core.bitop.btr(data + 1, i);   // this is faster!
   }

You would be wrong.  btr is a compiler intrinsic, so it is *always* inlined!

I'm with Walter here: I would very much like to see hard
evidence of your claims. :-)


On a side note, though, GDC has bt, btr, bts, etc. as
intrinsics in its compiler front-end, so it would be no problem
switching to version = bitops for version GNU.
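A sketch of the suggested change to the version selection quoted above, assuming nothing else in gcbits.d needs to change; the `useBitopIntrinsics` constant is a hypothetical addition just to make the selected path observable:

```d
// Proposed selection: treat GDC like DMD, since both provide the
// core.bitop functions (bt, btr, bts, ...) as compiler intrinsics.
version (DigitalMars)
{
    version = bitops;
}
else version (GNU)
{
    version = bitops;
}
else version (D_InlineAsm_X86)
{
    version = Asm86;
}

// Hypothetical: true when the core.bitop path would be compiled in.
version (bitops)
    enum bool useBitopIntrinsics = true;
else
    enum bool useBitopIntrinsics = false;
```

Building the same benchmark with GDC before and after this change is the "with and without it" comparison asked for below.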


-- 
Iain Buclaw

Sep 06 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 9/6/2012 11:47 PM, Iain Buclaw wrote:
 On a side note of that though, GDC has bt, btr, bts, etc, as
 intrinsics to its compiler front-end.  So it would be no problem
 switching to version = bitops for version GNU.

Would it be easy to give that a try, and see what happens?

Sep 07 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 7 September 2012 10:31, Walter Bright <newshound2 digitalmars.com> wrote:
 On 9/6/2012 11:47 PM, Iain Buclaw wrote:
 On a side note of that though, GDC has bt, btr, bts, etc, as
 intrinsics to its compiler front-end.  So it would be no problem
 switching to version = bitops for version GNU.


 Would it be easy to give that a try, and see what happens?

Sure, can do.  Give me something to work against, and I will be able
to produce the difference.


-- 
Iain Buclaw

Sep 07 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 9/7/2012 2:52 AM, Iain Buclaw wrote:
 On 7 September 2012 10:31, Walter Bright <newshound2 digitalmars.com> wrote:
 On 9/6/2012 11:47 PM, Iain Buclaw wrote:
 On a side note of that though, GDC has bt, btr, bts, etc, as
 intrinsics to its compiler front-end.  So it would be no problem
 switching to version = bitops for version GNU.


 Would it be easy to give that a try, and see what happens?

 Sure, can do.  Give me something to work against, and I will be able
 to produce the difference.

Well, gdc with and without it!

Sep 07 2012

Sean Kelly <sean invisibleduck.org> writes:

On Sep 6, 2012, at 10:50 AM, Benjamin Thaut <code benjamin-thaut.de> wrote:

 On 06.09.2012 15:30, ponce wrote:
 The problem with instrumentation is that I cannot recompile
 druntime for the MinGW GDC, as this is not possible with the binary
 release of MinGW GDC and I did not go through the effort to set up the
 whole build.
 I'm open to suggestions though how I could profile the GC without
 recompiling druntime. If someone else wants to profile this, I can
 also provide precompiled versions of both versions.

 You don't necessarily need to recompile anything with a sampling
 profiler like AMD Code Analyst or Very Sleepy

 I just tried profiling it with Very Sleepy but basically it only tells me for
 both versions that most of the time is spent in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect than the DMD
 version.

 As I cannot rebuild druntime with GDC it will be quite hard to get detailed
 profiling results.

 I'm open to suggestions.

What version flags are set by GDC vs. DMD in your target apps? The way "stop
the world" is done on Linux vs. Windows is different, for example.

Sep 06 2012

Jacob Carlborg <doob me.com> writes:

On 2012-09-07 01:53, Sean Kelly wrote:

 What version flags are set by GDC vs. DMD in your target apps?  The way "stop
the world" is done on Linux vs. Windows is different, for example.

He's using only Windows as far as I understand, GDC MinGW.

-- 
/Jacob Carlborg

Sep 06 2012

Sean Kelly <sean invisibleduck.org> writes:

On Sep 6, 2012, at 10:57 PM, Jacob Carlborg <doob me.com> wrote:

 On 2012-09-07 01:53, Sean Kelly wrote:

 What version flags are set by GDC vs. DMD in your target apps? The way
 "stop the world" is done on Linux vs. Windows is different, for example.

 He's using only Windows as far as I understand, GDC MinGW.

Well sure, but MinGW is weird. I'd expect the Windows flag to be set for
MinGW and both the Windows and Posix flags set for Cygwin, but it seemed
worth asking. If Windows and Posix are both set, the Windows method will be
used for "stop the world".
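A small illustration of the precedence described here, assuming the selection is an `else version` chain like the one below (the names are illustrative, not druntime's): the first matching branch wins, so Windows is chosen whenever it is defined, even if Posix is defined too.

```d
// Illustrative only: the first matching version branch is compiled in.
version (Windows)
    enum string stopTheWorldMethod = "Windows";
else version (Posix)
    enum string stopTheWorldMethod = "Posix";
else
    static assert(0, "no stop-the-world method for this platform");
```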

Sep 07 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 07.09.2012 01:53, schrieb Sean Kelly:
 On Sep 6, 2012, at 10:50 AM, Benjamin Thaut <code benjamin-thaut.de> wrote:

 Am 06.09.2012 15:30, schrieb ponce:
 The problem with instrumentation is that I cannot recompile
 druntime for the MinGW GDC, as this is not possible with the binary
 release of MinGW GDC and I did not go through the effort to set up the
 whole build.
 I'm open to suggestions though how I could profile the GC without
 recompiling druntime. If someone else wants to profile this, I can
 also provide precompiled versions of both versions.

 You don't necessarily need to recompile anything with a sampling
 profiler like AMD Code Analyst or Very Sleepy

 I just tried profiling it with Very Sleepy but basically it only tells me for
both versions that most of the time is spent in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect than the DMD
version.

 As I cannot rebuild druntime with GDC it will be quite hard to get detailed
profiling results.

 I'm open for suggestions.

 What version flags are set by GDC vs. DMD in your target apps?  The way "stop
the world" is done on Linux vs. Windows is different, for example.

I built druntime and phobos with -release -noboundscheck -inline -O 
for DMD.
For MinGW GDC I just used whatever version of druntime and phobos came 
precompiled with it, so I can't tell you which flags were used to 
compile it. But I can tell you that Cygwin is not required to run or 
compile, so I think it's not using any Posix stuff.


I'm going to upload a zip-package with the source for the GC version 
soon, but I have to deal with some licence stuff first.

Kind Regards
Benjamin Thaut

Sep 07 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/7/12 6:31 PM, Benjamin Thaut wrote:
 I did build druntime and phobos with -release -noboundscheck -inline -O
 for DMD.
 For MinGW GDC I just used whatever version of druntime and phobos came
 precompiled with it, so I can't tell you which flags have been used to
 compile that. But I can tell you that cygwin is not required to run or
 compile, so I think its not using any posix stuff.


 I'm going to upload a zip-package with the source for the GC version
 soon, but I have to deal with some licence stuff first.

 Kind Regards
 Benjamin Thaut

You mentioned some issues in Phobos with memory allocation, that you had 
to replace with your own code. It would be awesome if you could post 
more about that, and possibly post a few pull requests where directly 
applicable.

Thanks,

Andrei

Sep 07 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 07.09.2012 18:36, schrieb Andrei Alexandrescu:
 You mentioned some issues in Phobos with memory allocation, that you had
 to replace with your own code. It would be awesome if you could post
 more about that, and possibly post a few pull requests where directly
 applicable.

 Thanks,

 Andrei

Let me give a bit more details about what I did and why.

Druntime:

I added a reference counting mechanism (core.refcounted in my druntime 
branch).
I created a reference counted array which is as close to the native D 
array as currently possible (compiler bugs, type system issues, etc.), 
also in core.refcounted. However, it does not replace the default string 
or array type in all cases, because that would lead to reference counting 
in unnecessary places. The focus is to use reference counting only where 
absolutely necessary. I'm still using the standard string type as an 
"only valid for the current scope" kind of string.
I created an allocator base interface which is used by everything that 
allocates, and I created replacement templates for new and delete, 
located in core.allocator.
I created a new hashmap container which is cache friendly and does not 
leak memory, located in core.hashmap.
I created a memory tracking allocator in core.allocator which can be 
turned on and off with a version statement (as it has to run before and 
after module ctors, dtors, etc.).

I changed all parts of druntime that do string processing to use the 
reference counted array, so they no longer leak. I made the Thread class 
reference counted so it no longer leaks. I fixed the type info 
comparison and numerous other issues. Of all these changes only the type 
info fix will be easily convertible into the default druntime, because it 
does not depend on any of my other stuff. I will do a merge request for 
this fix as soon as I find some time.
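For readers unfamiliar with the approach, a minimal reference counted array might look like the sketch below. It is an illustration under stated assumptions, not the core.refcounted code from the branch: the name RCArray is hypothetical, storage comes from the C heap via malloc/free so the GC never manages it, copies share one payload through the postblit, and the last destructor frees the block. It assumes T is a plain value type without a destructor and ignores thread safety.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical minimal reference-counted array (not core.refcounted).
// Assumes T is a plain value type without a destructor; not thread-safe.
struct RCArray(T)
{
    // Header stored at the start of the malloc'd block, followed by
    // `length` elements of T.
    private static struct Payload
    {
        size_t refs;
        size_t length;
    }

    private Payload* payload;

    this(size_t n)
    {
        auto p = cast(Payload*) malloc(Payload.sizeof + n * T.sizeof);
        assert(p !is null, "malloc failed");
        p.refs = 1;
        p.length = n;
        payload = p;
        foreach (ref e; data())
            e = T.init;
    }

    // Postblit: a copy shares the payload, so bump the count.
    this(this)
    {
        if (payload !is null)
            ++payload.refs;
    }

    // The last owner frees the C-heap block; the GC is never involved.
    ~this()
    {
        if (payload !is null && --payload.refs == 0)
            free(payload);
        payload = null;
    }

    // Borrowed slice: only valid while some RCArray copy is alive,
    // much like the "only valid for the current scope" strings above.
    T[] data()
    {
        if (payload is null)
            return null;
        auto first = cast(T*)(cast(ubyte*) payload + Payload.sizeof);
        return first[0 .. payload.length];
    }

    size_t refCount() const
    {
        return payload is null ? 0 : payload.refs;
    }
}
```

Copying an RCArray only bumps a counter, so passing it around never touches the GC heap.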

Phobos:
I threw away most of phobos because it didn't match my requirements.
The only modules I kept are
std.traits, std.random, std.math, std.typetuple, std.uni

The parts of these modules that I use have been changed so they don't 
leak memory. Mostly this comes down to using reference counted strings for 
exception error message generation.

I did require the option to specify an allocator for any function that 
allocates, either by template argument, by function parameter or both, 
depending on the case. As custom allocators cannot be pure, this is a 
major issue with phobos, because adding allocators to the functions 
would make them impure instantly. I know about the C-linkage pure hack, 
but it's really a hack and it does not work for templates.
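A rough sketch of what passing an allocator as a function parameter can look like, together with replacement templates for new and delete. All names here (Allocator, TrackingMallocator, make, dispose, Vec3) are hypothetical and this is not the design from the branch; std.conv.emplace is the Phobos function that constructs the struct in raw memory. The interface methods are not marked pure, so pure code cannot call them, which is exactly the limitation described above.

```d
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

// Hypothetical allocator interface (names invented for this sketch).
interface Allocator
{
    void[] allocate(size_t size);
    void deallocate(void[] block);
}

// C-heap allocator that also counts live bytes, a crude leak tracker.
final class TrackingMallocator : Allocator
{
    size_t liveBytes;

    void[] allocate(size_t size)
    {
        auto p = malloc(size);
        if (p is null)
            return null;
        liveBytes += size;
        return p[0 .. size];
    }

    void deallocate(void[] block)
    {
        if (block.ptr is null)
            return;
        liveBytes -= block.length;
        free(block.ptr);
    }
}

// Replacement for `new T(args)` that takes the allocator as a parameter.
T* make(T, Args...)(Allocator alloc, auto ref Args args)
    if (is(T == struct))
{
    auto mem = alloc.allocate(T.sizeof);
    assert(mem !is null, "allocation failed");
    return emplace!T(mem, args);
}

// Replacement for `delete p`: run the destructor, then return the memory.
void dispose(T)(Allocator alloc, T* p)
    if (is(T == struct))
{
    if (p is null)
        return;
    destroy(*p);
    alloc.deallocate((cast(void*) p)[0 .. T.sizeof]);
}

// Example payload type for the usage below.
struct Vec3
{
    float x, y, z;
}
```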

So I think most of my changes are not directly applicable because:

- You most likely won't like the way I implemented reference counting
- You might not like my allocator design
- My standard library goes more in the C++ direction and is not as 
easily usable as phobos (performance comes first for me, and usability 
second)
- All my changes heavily depend on some of the functionality I added to 
druntime.
- The necessary changes to phobos would break a lot of code, because 
some function attributes like pure couldn't be used any more, as 
a result of language limitations.

Kind Regards
Benjamin Thaut

Sep 07 2012

"ponce" <spam spam.org> writes:

 You can find the full article at:

 http://3d.benjamin-thaut.de/?p=20#more-20

You make some good points about what happens under the hood.

Especially:
- homogeneous variadic function calls allocate
- comparisons of const objects allocate
- useless calls to druntime invariant handlers

I removed some homogeneous variadic function calls from my own
code.
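A small sketch of the variadic issue: a typesafe variadic parameter (int[] xs...) packs the arguments into an array, which some compiler versions allocate on the GC heap, whereas a template variadic passes each argument as its own parameter and builds no array at all. The function names are hypothetical. Marking the array parameter scope tells the compiler it does not escape, which may let it use the stack instead; newer DMD releases also offer a -vgc switch that lists GC allocations.

```d
// Typesafe variadic: the arguments are packed into an int[] for the call.
// Depending on the compiler version that array may be GC-allocated; `scope`
// promises it does not escape, which may allow stack placement instead.
int sumArray(scope int[] xs...)
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

// Template variadic: each argument is a separate parameter, so no array
// is built (at the cost of one instantiation per argument list).
int sumTemplate(Args...)(Args args)
{
    int total = 0;
    foreach (a; args)
        total += a;
    return total;
}
```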

Sep 07 2012

Jens Mueller <jens.k.mueller gmx.de> writes:

Benjamin Thaut wrote:
 I rewrote a 3d game I created during my studies with D 2.0 to manual
 memory management. If I'm not studying I'm working in the 3d Engine
 department of Havok. As I needed to practice manual memory management
 and wanted to get rid of the GC in D for quite some time, I went
 through all this effort to create a GC free version of my game.
 
 The results are:
 
     DMD GC Version: 71 FPS, 14.0 ms frametime
     GDC GC Version: 128.6 FPS, 7.72 ms frametime
     DMD MMM Version: 142.8 FPS, 7.02 ms frametime

Interesting.
What about measuring a GDC MMM version? I wonder what the GC
overhead is. With DMD it's a factor of two. Maybe that factor is lower with GDC.

I would be interested in some numbers regarding memory overhead, to get
a more complete picture of the impact on resources when using the GC.

Jens

Sep 07 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

The full source code for the non-GC version is now available on github. The 
GC version will follow soon.

https://github.com/Ingrater/Spacecraft

Kind Regards
Benjamin Thaut

Sep 09 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

Here is a small update:

I found a piece of code that manually slowed down the simulation in 
case it got too fast. This code never kicked in with the GC version, 
because it never reached the margin. The manual memory managed version 
however did reach the margin and was slowed down. With this piece of 
code removed the manual memory managed version runs at 5 ms, which is 200 
FPS and thus nearly 3 times as fast as the GC version.

Kind Regards
Benjamin Thaut

Oct 23 2012

"Rob T" <rob ucora.com> writes:

On Tuesday, 23 October 2012 at 16:30:41 UTC, Benjamin Thaut wrote:
 Here is a small update:

 I found a piece of code that manually slowed down the 
 simulation in case it got too fast. This code never kicked in 
 with the GC version, because it never reached the margin. The 
 manual memory managed version however did reach the margin and 
 was slowed down. With this piece of code removed the manual 
 memory managed version runs at 5 ms, which is 200 FPS and thus 
 nearly 3 times as fast as the GC version.

 Kind Regards
 Benjamin Thaut

That's a very significant difference in performance that should 
not be taken lightly. I don't really see a general solution to 
the GC problem other than to design things such that a D 
programmer has a truly practical ability to not use the GC at 
all and ensure that it does not sneak back in. IMHO it 
was a mistake to assume that D should depend on a GC to the 
degree that has taken place.

The GC is also the reason why D has a few other significant 
technical problems not related to performance, such as the inability 
to link D code to C/C++ code if the GC is required on the D side, 
and the inability to build dynamic libraries and runtime loadable 
plugins that link to the runtime system. The GC apparently does 
not work correctly in these situations. Although the problem is 
solvable, how this was allowed to happen in the first place is 
difficult to understand.

I'd be a much happier D programmer if I could guarantee where 
and when the GC is used; therefore the GC should be 100% optional 
in practice, not just in theory.

--rt

Oct 23 2012

"Paulo Pinto" <pjmlp progtools.org> writes:

On Tuesday, 23 October 2012 at 22:31:03 UTC, Rob T wrote:
 On Tuesday, 23 October 2012 at 16:30:41 UTC, Benjamin Thaut 
 wrote:
 Here is a small update:

 I found a piece of code that manually slowed down the 
 simulation in case it got too fast. This code never kicked in 
 with the GC version, because it never reached the margin. The 
 manual memory managed version however did reach the margin and 
 was slowed down. With this piece of code removed the manual 
 memory managed version runs at 5 ms, which is 200 FPS and thus 
 nearly 3 times as fast as the GC version.

 Kind Regards
 Benjamin Thaut

 That's a very significant difference in performance that should 
 not be taken lightly. I don't really see a general solution to 
 the GC problem other than to design things such that a D 
 programmer has a truly practical ability to not use the GC at 
 all and ensure that it does not sneak back in. IMHO it 
 was a mistake to assume that D should depend on a GC to the 
 degree that has taken place.

 The GC is also the reason why D has a few other significant 
 technical problems not related to performance, such as the 
 inability to link D code to C/C++ code if the GC is required on 
 the D side, and the inability to build dynamic libraries and runtime 
 loadable plugins that link to the runtime system. The GC 
 apparently does not work correctly in these situations. Although 
 the problem is solvable, how this was allowed to happen in the 
 first place is difficult to understand.

 I'd be a much happier D programmer if I could guarantee 
 where and when the GC is used; therefore the GC should be 100% 
 optional in practice, not just in theory.

 --rt


Having dealt with systems programming in languages with GC 
(Native Oberon, Modula-3), I wonder how much an optional GC would 
really matter, if D's GC had better performance.

--
Paulo

Oct 24 2012

"Rob T" <rob ucora.com> writes:

On Wednesday, 24 October 2012 at 12:21:03 UTC, Paulo Pinto wrote:
 Having dealt with systems programming in languages with GC 
 (Native Oberon, Modula-3), I wonder how much an optional GC 
 would really matter, if D's GC had better performance.

 --
 Paulo

Well, performance is only part of the GC equation. There's 
determinism, knowing when the GC is invoked and the ability to 
control it, and the increased complexity introduced by a GC, which 
tends to grow considerably when improving the GC's performance 
and the ability to manage it manually. All this means there's a lot 
more potential for things going wrong, and this cycle of fixing 
the fix may never end.

The cost of clinging onto a GC may be too high to be worth 
relying on it as heavily as is being done, and effectively forcing a 
GC on programmers is the wrong approach because not everyone has 
requirements that call for its use. When I say "forcing", 
look at what had to be done to fix the performance of the game in 
question: what was done to get rid of the GC was a super-human 
effort, and that is simply not a practical solution by any stretch 
of the imagination.

A GC is both good and bad, not good for everyone and not bad for 
everyone, with shades of gray in between, so it has to be made 
fully optional, with good manual control, and easily so.

--rt

Oct 24 2012

"Paulo Pinto" <pjmlp progtools.org> writes:

On Wednesday, 24 October 2012 at 18:26:48 UTC, Rob T wrote:
 On Wednesday, 24 October 2012 at 12:21:03 UTC, Paulo Pinto 
 wrote:
 Having dealt with systems programming in languages with GC 
 (Native Oberon, Modula-3), I wonder how much an optional GC 
 would really matter, if D's GC had better performance.

 --
 Paulo

 Well, performance is only part of the GC equation. There's 
 determinism, knowing when the GC is invoked and the ability to 
 control it, and the increased complexity introduced by a GC, which 
 tends to grow considerably when improving the GC's 
 performance and the ability to manage it manually. All this means 
 there's a lot more potential for things going wrong, and this 
 cycle of fixing the fix may never end.

 The cost of clinging onto a GC may be too high to be worth 
 relying on it as heavily as is being done, and effectively forcing 
 a GC on programmers is the wrong approach because not everyone 
 has requirements that call for its use. When I say 
 "forcing", look at what had to be done to fix the performance 
 of the game in question: what was done to get rid of the GC was 
 a super-human effort, and that is simply not a practical 
 solution by any stretch of the imagination.

 A GC is both good and bad, not good for everyone and not bad 
 for everyone, with shades of gray in between, so it has to be 
 made fully optional, with good manual control, and easily so.

 --rt

I do understand that.

But on the other hand there are operating systems fully developed 
in such languages, like Blue Bottle,

http://www.ocp.inf.ethz.ch/wiki/Documentation/WindowManager

Or the real time system developed at ETHZ to control robot 
helicopters,

http://static.usenix.org/events/vee05/full_papers/p35-kirsch.pdf

I surely tremble at the thought of a full GC collection in plane 
software. On the other hand I am old enough to remember the 
complaints that C was too slow and one needed to write everything 
in Assembly to have full control of the application code.

Followed by complaints that C++ was too slow and one should use C 
structs with embedded pointers to have full control over the memory 
layout of the object table, instead of strange compiler-generated VMT 
tables.

So I always take the assertions that manual memory management is 
a must with a grain of salt.

--
Paulo

Oct 24 2012

"Rob T" <rob ucora.com> writes:

On Wednesday, 24 October 2012 at 21:02:34 UTC, Paulo Pinto wrote:
 So I always take the assertions that manual memory management 
 is a must with a grain of salt.

 --
 Paulo

Probably no one in here is thinking that we should not have a GC.

I'm sure that many applications will benefit from a GC, but I'm 
also certain that not all applications require a GC, and it's a 
mistake to assume everyone will be happy to have one, as was 
illustrated in the OP.

In my case, I'm not too concerned about performance, or pauses in 
the execution, but I do require dynamic loadable libraries, and I 
do want to link D code to existing C/C++ code, but in order to do 
these things, I cannot use the GC because I'm told that it will 
not work under these situations.

It may be theoretically possible to build a near perfect GC that 
will work well even for RT applications, and will work for 
dynamically loadable libraries, etc., but while waiting for one to 
materialize in D, what are we supposed to do when the current GC 
is unsuitable?

--rt

Oct 24 2012

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Wednesday, 24 October 2012 at 23:05:29 UTC, Rob T wrote:
 In my case, I'm not too concerned about performance, or pauses 
 in the execution, but I do require dynamic loadable libraries, 
 and I do want to link D code to existing C/C++ code, but in 
 order to do these things, I cannot use the GC because I'm told 
 that it will not work under these situations.

You can very much link to C and C++ code, or have C and C++ code 
link to your D code, while still using the GC, you just have to 
be careful when you send GC memory to external code.

You can even share the same GC between dynamic libraries and the 
host application  (if both are D and use GC, of course) using the 
GC proxy system.

Oct 24 2012

"Paulo Pinto" <pjmlp progtools.org> writes:

On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:
 On Wednesday, 24 October 2012 at 23:05:29 UTC, Rob T wrote:
 In my case, I'm not too concerned about performance, or pauses 
 in the execution, but I do require dynamic loadable libraries, 
 and I do want to link D code to existing C/C++ code, but in 
 order to do these things, I cannot use the GC because I'm told 
 that it will not work under these situations.

 You can very much link to C and C++ code, or have C and C++ 
 code link to your D code, while still using the GC, you just 
 have to be careful when you send GC memory to external code.

 You can even share the same GC between dynamic libraries and 
 the host application  (if both are D and use GC, of course) 
 using the GC proxy system.

I am speaking without knowing if such a thing already exists.

Maybe someone who knows the best way to do so could write an 
article about best practices for using C and C++ code together in 
D applications.

That way we could point people to it, in a similar vein to the 
wonderful article about templates.

--
Paulo

Oct 24 2012

"Rob T" <rob ucora.com> writes:

On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:
 You can very much link to C and C++ code, or have C and C++ 
 code link to your D code, while still using the GC, you just 
 have to be careful when you send GC memory to external code.

 You can even share the same GC between dynamic libraries and 
 the host application  (if both are D and use GC, of course) 
 using the GC proxy system.

My understanding of dynamic linking and the runtime is based on 
this thread

http://www.digitalmars.com/d/archives/digitalmars/D/dynamic_library_building_and_loading_176983.html

The runtime is not compiled to be sharable, so you cannot link it 
to shared libs by default. However, hacking the gdc build system 
allowed me to compile the runtime into a sharable state, and all 
seemed well.

However, based on the input from that thread, my understanding 
was that the GC would be unreliable at best.

I suppose I could do some tests on it, but tests can only confirm 
so much. I'd also have to decipher the runtime source code to see 
what the heck it is doing or not.

--rt

Oct 25 2012

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Thursday, 25 October 2012 at 08:34:15 UTC, Rob T wrote:
 My understanding of dynamic linking and the runtime is based on 
 this thread

 http://www.digitalmars.com/d/archives/digitalmars/D/dynamic_library_building_and_loading_176983.html

 The runtime is not compiled to be sharable, so you cannot link 
 it to shared libs by default. However, hacking the gdc build 
 system allowed me to compile the runtime into a sharable state, 
 and all seemed well.

 However, based on the input from that thread, my understanding 
 was that the GC would be unreliable at best.

 I suppose I could do some tests on it, but tests can only 
 confirm so much. I'd also have to decipher the runtime source 
 code to see what the heck it is doing or not.

 --rt

You are right that compiling the runtime itself (druntime and 
Phobos) as a shared library is not yet fully realized, but that 
doesn't stop you from compiling your own libraries and 
applications as shared libraries even if they statically link to 
the runtime (which is the current default behaviour).

Oct 25 2012

"Rob T" <rob ucora.com> writes:

On Thursday, 25 October 2012 at 08:50:19 UTC, Jakob Ovrum wrote:
 You are right that compiling the runtime itself (druntime and 
 Phobos) as a shared library is not yet fully realized, but that 
 doesn't stop you from compiling your own libraries and 
 applications as shared libraries even if they statically link 
 to the runtime (which is the current default behaviour).

Yes, I can build my own D shared libs, both as static PIC (.a) and 
dynamically loadable (.so). However, I cannot statically link my 
shared libs to druntime + phobos as-is. The only way I can do 
that is to also compile druntime + phobos into PIC, which can be 
done as a static PIC lib.

So what you are saying is that I can statically link PIC compiled 
druntime to my own shared lib, but I cannot build druntime as a 
dynamically loadable shared lib? I can see why that may work, if 
each shared lib has its own private copy of the GC. Correct?

I recall that druntime may have some ASM code that will not work 
when compiled to PIC. I think gdc removed the offending ASM code, 
but it may still be present in the dmd version, but I don't know 
for sure.

Another question is if I can link a dynamic loadable D lib to 
C/C++ code or not? Yes I can do it, and it seems to work, but I 
was told that the GC will not necessarily work. Am I 
misunderstanding this part?

--rt

Oct 25 2012

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Thursday, 25 October 2012 at 17:17:01 UTC, Rob T wrote:
 Yes, I can build my own D shared libs, both as static PIC (.a) 
 and dynamically loadable (.so). However, I cannot statically 
 link my shared libs to druntime + phobos as-is. The only way I 
 can do that is to also compile druntime + phobos into PIC, 
 which can be done as a static PIC lib.

Sorry, I keep forgetting that this is needed on non-Windows 
systems.

 So what you are saying is that I can statically link PIC 
 compiled druntime to my own shared lib, but I cannot build 
 druntime as a dynamically loadable shared lib? I can see why 
 that may work, if each shared lib has its own private copy of 
 the GC. Correct?

Yes, this is possible.

Sending references to GC memory between the D modules then has 
the same rules as when sending it to non-D code, unless the host 
(the loader module) uses druntime to load the other modules, in 
which case it can in principle share the same GC with them.

 I recall that druntime may have some ASM code that will not 
 work when compiled to PIC. I think gdc removed the offending 
 ASM code, but it may still be present in the dmd version, but I 
 don't know for sure.

I think it was relatively recently that DMD could also compile 
the runtime as PIC, but I might be remembering wrong.

 Another question is if I can link a dynamic loadable D lib to 
 C/C++ code or not? Yes I can do it, and it seems to work, but I 
 was told that the GC will not necessarily work. Am I 
 misunderstanding this part?

The GC will work the same as usual inside the D code, but you 
have to manually keep track of references you send outside the 
scope of the GC, such as references to GC memory put on the C 
heap.

This can be done with the GC.addRoot/removeRoot and 
GC.addRange/removeRange functions found in core.memory, or by 
retaining the references in global, TLS or GC memory.

It's good practice to do this for all GC references sent to 
external code, as you don't know where the reference may end up.

Of course, you have other options. You don't have to send 
references to GC memory to external code, you can always copy the 
data over to a different buffer, such as one on the C heap (i.e. 
malloc()).

If the caller (in the case of a return value) or the callee (in 
the case of a function argument) expects to be able to call 
free() on the memory referenced, then you must do it this way 
regardless.
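A minimal sketch of the two options above, using GC.addRoot/GC.removeRoot from core.memory and a malloc'd copy. CSideHolder is a hypothetical stand-in for foreign code that keeps a pointer in memory the GC does not scan, and the function names are invented for this example.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memcpy;

// Hypothetical stand-in for a C library that stores a pointer in its own
// malloc'd memory, which the D GC does not scan.
struct CSideHolder
{
    int* kept;
}

// Option 1: pass GC memory out, but register it as a root so the collector
// treats it as reachable while the foreign side holds the pointer.
CSideHolder* shareWithRoot(int value)
{
    int* gcMem = new int;
    *gcMem = value;
    GC.addRoot(gcMem);

    auto h = cast(CSideHolder*) malloc(CSideHolder.sizeof);
    assert(h !is null, "malloc failed");
    h.kept = gcMem;
    return h;
}

// Undo option 1 once the foreign side is done with the pointer.
void releaseShared(CSideHolder* h)
{
    GC.removeRoot(h.kept); // the GC may now collect the int normally
    free(h);
}

// Option 2: copy the data to the C heap; the receiver owns the copy and
// may free() it. Required when the foreign code expects to call free().
int* copyToCHeap(const(int)[] src)
{
    auto dst = cast(int*) malloc(src.length * int.sizeof);
    if (dst !is null && src.length > 0)
        memcpy(dst, src.ptr, src.length * int.sizeof);
    return dst;
}
```

If the foreign-side block itself holds several GC pointers, GC.addRange(h, CSideHolder.sizeof), paired with GC.removeRange, would instead ask the GC to scan that whole malloc'd block.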

Oct 25 2012

"Rob T" <rob ucora.com> writes:

On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:
 You can even share the same GC between dynamic libraries and 
 the host application  (if both are D and use GC, of course) 
 using the GC proxy system.

What is the GC proxy system, and how do I make use of it?

--rt

Oct 25 2012

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Thursday, 25 October 2012 at 17:20:40 UTC, Rob T wrote:
 On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:
 You can even share the same GC between dynamic libraries and 
 the host application  (if both are D and use GC, of course) 
 using the GC proxy system.

 What is the GC proxy system, and how do I make use of it?

 --rt

There's a function Runtime.loadLibrary in core.runtime that is 
supposed to load a shared library and get the symbol named 
`gc_setProxy` using the platform's dynamic library loading 
routines, then use that to share the host GC with the loaded 
library.

I say "is supposed to" because I checked the code and it's 
currently a throwing stub on POSIX systems, it's only implemented 
for Windows (the source of the function can be found in 
rt_loadLibrary in rt/dmain2.d of druntime).

When it comes to gc_setProxy, GDC exports this symbol by default 
on Windows, while DMD doesn't. I don't know why this is the case. 
I haven't built shared libraries on other OSes before, so I don't 
know how GDC and DMD behave there.

Oct 25 2012

"bearophile" <bearophileHUGS lycos.com> writes:

I use this GC thread to show a little GC-related benchmark.

A little Reddit thread about using memory more compactly in Java:

http://www.reddit.com/r/programming/comments/120xvf/compact_offheap_structurestuples_in_java/

The related blog post:
http://mechanical-sympathy.blogspot.it/2012/10/compact-off-heap-structurestuples-in.html

So I have written a D version. In my test I have reduced the 
amount of memory allocated (NUM_RECORDS = 10_000_000):
http://codepad.org/IhHjqUua

With this lower memory usage the D version is more than twice 
as fast as the compact Java version that uses the same 
NUM_RECORDS (0.5 seconds against 1.2 seconds each loop after the 
first two ones).

In D I have improved the loops, and I have used an align() and a 
minimallyInitializedArray; this is not too bad.

But in the main() I have also had to use a deprecated "delete", 
because otherwise the GC doesn't deallocate the arrays and the 
program burns all the memory (setting the array to null and using 
GC.collect() isn't enough). This is not good.
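One way to release a large GC array explicitly, without the deprecated delete, is GC.free from core.memory, which hands a GC block back immediately. This is a sketch under the assumption that nothing else references the buffer when it is freed; processBatch and its workload are hypothetical, and whether this fully cures the memory growth in the benchmark above is not established.

```d
import core.memory : GC;
import std.array : minimallyInitializedArray;

// Hypothetical workload: fill a large scratch buffer, reduce it, then
// release the buffer explicitly instead of waiting for a collection.
long processBatch(size_t n)
{
    auto buf = minimallyInitializedArray!(int[])(n);
    foreach (i, ref x; buf)
        x = cast(int)(i % 7);

    long sum = 0;
    foreach (x; buf)
        sum += x;

    // Explicit release of the GC block, roughly what `delete buf;` did for
    // plain data. Only safe because no other reference to buf survives.
    GC.free(buf.ptr);
    buf = null;
    return sum;
}
```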

Bye,
bearophile

Oct 26 2012

"Rob T" <rob ucora.com> writes:

On Friday, 26 October 2012 at 14:21:51 UTC, bearophile wrote:
 But in the main() I have also had to use a deprecated "delete", 
 because otherwise the GC doesn't deallocate the arrays and the 
 program burns all the memory (setting the array to null and 
 using GC.collect() isn't enough). This is not good.

Is this happening with dmd 2.060 as released?

Oct 26 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Rob T:

 Is this happening with dmd 2.060 as released?

I'm using 2.061alpha git head, but I guess the situation is the 
same with dmd 2.060. The code is linked in my post, so trying it 
is easy, it's one small module.

Bye,
bearophile

Oct 26 2012

"Rob T" <rob ucora.com> writes:

On Friday, 26 October 2012 at 23:10:48 UTC, bearophile wrote:
 Rob T:

 Is this happening with dmd 2.060 as released?

 I'm using 2.061alpha git head, but I guess the situation is the 
 same with dmd 2.060. The code is linked in my post, so trying 
 it is easy, it's one small module.

 Bye,
 bearophile

I tried it with dmd 2.060 (released) and the gdc 4.7 branch. I tried 
to check if memory was being freed by creating a struct destructor 
for JavaMemoryTrade, but that did not work as expected, leading 
me down the confusing and inconsistent path of figuring out why 
destructors do not get called when memory is freed.

Long story short, I could not force a struct to execute its 
destructor if it was allocated on the heap unless I used delete. 
I tried destroy and clear, as well as GC.collect and GC.free(); 
nothing else worked.

Memory heap management as well as struct destructors appear to be 
seriously broken.

--rt

Oct 26 2012

"Rob T" <rob ucora.com> writes:

On Saturday, 27 October 2012 at 01:03:57 UTC, Rob T wrote:
 On Friday, 26 October 2012 at 23:10:48 UTC, bearophile wrote:
 Rob T:

 Is this happening with dmd 2.060 as released?

 I'm using 2.061alpha git head, but I guess the situation is 
 the same with dmd 2.060. The code is linked in my post, so 
 trying it is easy, it's one small module.

 Bye,
 bearophile

 I tried it with dmd 2.060 (released) and the gdc 4.7 branch. I
 tried to check whether memory was being freed by creating a
 struct destructor for JavaMemoryTrade, but that did not work as
 expected, leading me down the confusing and inconsistent path
 of figuring out why destructors do not get called when memory
 is freed.

 Long story short, I could not force a struct to execute its 
 destructor if it was allocated on the heap unless I used 
 delete. I tried destroy and clear, as well as GC.collect and
 GC.free(); nothing else worked.

 Memory heap management as well as struct destructors appear to 
 be seriously broken.

 --rt

OK my bad, partially.

The destructor of a heap-allocated struct is not called by clear or
destroy unless the struct pointer is manually dereferenced. I got
confused because class references behave differently from pointers
to heap-allocated structs. I cannot be the first person to make
this mistake, and it must happen all the time. Automatically
dereferencing a struct pointer when accessing its members may be
nice, but it makes struct pointers look exactly like class
references, which leads to mistakes.
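Below is a minimal sketch of the behavior described above. It assumes a reasonably recent D compiler and druntime, so the 2012 releases discussed in this thread may differ in detail. `Tracked` and `destroyPointerVsPointee` are hypothetical names. Passing the pointer variable to `destroy` treats the pointer itself as the object: it is reset to its `.init` value (`null`) and no destructor runs. Passing `*p` hands `destroy` the struct itself, so `~this()` runs.

```d
import std.stdio : writeln;

// Hypothetical struct used only to count destructor runs.
struct Tracked
{
    static int dtorCalls; // thread-local static counter
    int value;
    ~this() { ++dtorCalls; }
}

/// Returns the destructor count after destroy(pointer) and after destroy(*pointer).
int[2] destroyPointerVsPointee()
{
    Tracked.dtorCalls = 0;
    Tracked* p = new Tracked(42);

    Tracked* q = p;
    destroy(q);                        // q is reset to null; ~this() is NOT called
    assert(q is null);
    int afterPointer = Tracked.dtorCalls;

    destroy(*p);                       // ~this() runs, then *p is reset to Tracked.init
    int afterPointee = Tracked.dtorCalls;
    return [afterPointer, afterPointee];
}

void main()
{
    auto counts = destroyPointerVsPointee();
    writeln("after destroy(q): ", counts[0], ", after destroy(*p): ", counts[1]);
}
```

With a class, `destroy(obj)` finalizes the object that the reference points to. With a struct pointer, you have to write the dereference yourself.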

I do understand the difference between classes and structs, but
why don't clear and destroy give me a compiler error, or at least
a warning, when passed a struct pointer? Is there any valid
purpose to clearing or destroying a pointer that is not
dereferenced? It seems like a bug to me.

--rt

Oct 26 2012

"Rob T" <rob ucora.com> writes:

On Saturday, 27 October 2012 at 01:03:57 UTC, Rob T wrote:
 On Friday, 26 October 2012 at 23:10:48 UTC, bearophile wrote:
 Rob T:

 Is this happening with dmd 2.060 as released?

 I'm using 2.061alpha git head, but I guess the situation is 
 the same with dmd 2.060. The code is linked in my post, so 
 trying it is easy, it's one small module.

 Bye,
 bearophile

 I tried it with dmd 2.060 (released) and the gdc 4.7 branch. I
 tried to check whether memory was being freed by creating a
 struct destructor for JavaMemoryTrade, but that did not work as
 expected, leading me down the confusing and inconsistent path
 of figuring out why destructors do not get called when memory
 is freed.

 Long story short, I could not force a struct to execute its 
 destructor if it was allocated on the heap unless I used 
 delete. I tried destroy and clear, as well as GC.collect and
 GC.free(); nothing else worked.

 Memory heap management as well as struct destructors appear to 
 be seriously broken.

 --rt

I made a mistake. The clear and destroy operations require that a
pointer to a struct be manually dereferenced. What I don't
understand is why the compiler allows you to pass a -not-
dereferenced pointer to clear and destroy; this looks like a bug
to me. It should either work just like a class reference does, or
it should refuse to compile.
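Both options can be approximated in library code. The sketch below uses a hypothetical wrapper, `strictDestroy`, which is not part of druntime. It dereferences struct pointers before calling `destroy`, so the destructor runs as it would for a class reference. It also uses a template constraint that rejects pointers to anything other than a struct at compile time. The only library pieces it relies on are `isPointer` and `PointerTarget` from `std.traits`. `S` is a hypothetical test struct.

```d
import std.traits : isPointer, PointerTarget;

/// Hypothetical helper (not in druntime): destroys the pointee of a struct
/// pointer, forwards non-pointers to destroy(), and rejects other pointers.
void strictDestroy(T)(ref T obj)
    if (!isPointer!T || is(PointerTarget!T == struct))
{
    static if (isPointer!T)
    {
        assert(obj !is null, "strictDestroy: null pointer");
        destroy(*obj);   // runs the struct destructor, then resets *obj to .init
    }
    else
    {
        destroy(obj);    // classes and structs held by value: plain destroy()
    }
}

// Hypothetical struct that counts destructor runs.
struct S
{
    static int dtorCalls;
    ~this() { ++dtorCalls; }
}

void main()
{
    S* p = new S;
    immutable before = S.dtorCalls;
    strictDestroy(p);                  // destructor runs through the pointer
    assert(S.dtorCalls == before + 1);

    int* ip = new int;
    // Rejected at compile time by the template constraint:
    static assert(!__traits(compiles, strictDestroy(ip)));
}
```

A constraint is used instead of a `static assert` inside the body so that `__traits(compiles, ...)` and overload resolution see the rejection cleanly.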

I'm sure you've heard this many times before, but I have to say
that it's very confusing when struct pointers behave like class
references most of the time, but not always.

--rt

Oct 26 2012

"bearophile" <bearophileHUGS lycos.com> writes:

 But in the main() I have also had to use a deprecated "delete",

And setting trades.length to zero and then using GC.free() on its 
ptr gives the same good result.
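Here is a sketch of that manual release. It assumes the array was allocated by the GC with `new`, so that `ptr` is the start of its memory block. `releaseArray` and `Trade` are hypothetical names; `Trade` stands in for the thread's `JavaMemoryTrade`. The sketch frees first and then nulls the slice. `core.memory.GC.free` returns the block to the collector immediately. According to its documentation, it ignores `null` and interior pointers and does not run finalizers, which fits the destructor behavior Rob T saw. Any other slice still pointing into the block is left dangling, so this is only safe when nothing else references the array.

```d
import core.memory : GC;

// Hypothetical element type standing in for JavaMemoryTrade.
struct Trade
{
    double price;
    long quantity;
}

/// Hypothetical helper: hands a GC-allocated array back to the collector
/// right away instead of waiting for a collection to find it dead.
/// Only safe if no other slice refers to the same memory block.
void releaseArray(T)(ref T[] arr)
{
    // GC.free is a no-op for null or interior pointers and never runs destructors.
    GC.free(arr.ptr);
    arr = null;   // do not leave a dangling slice behind
}

void main()
{
    foreach (round; 0 .. 10)
    {
        auto trades = new Trade[](1_000_000); // about 16 MB per round
        trades[0].price = round;              // touch the memory
        releaseArray(trades);
        assert(trades is null);
    }
}
```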

Bye,
bearophile

Oct 27 2012

"bearophile" <bearophileHUGS lycos.com> writes:

And with the usual optimizations (struct splitting) that come from
taking a look at the access patterns, the D code gets faster:

http://codepad.org/SnxnpcAB
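The optimized program is only available at the codepad link above, so the sketch below does not reproduce it. It assumes "struct splitting" here means the usual hot/cold split and uses hypothetical field names. Fields read in the inner loop stay in a compact `TradeHot` array. Rarely touched fields move to a parallel `TradeCold` array indexed the same way, so each cache line the loop loads carries more useful data.

```d
import std.stdio : writeln;

// Before: one record per trade. A loop that reads only price and quantity
// still pulls the cold fields through the cache (56 bytes per element).
struct TradeFat
{
    double price;
    long quantity;
    char[32] note;   // cold: rarely read
    long timestamp;  // cold: rarely read
}

// After: hot fields in one compact array (16 bytes per element) and cold
// fields in a parallel array; index i refers to the same trade in both.
struct TradeHot  { double price; long quantity; }
struct TradeCold { char[32] note; long timestamp; }

static assert(TradeFat.sizeof == 56);
static assert(TradeHot.sizeof == 16);

/// Sums price * quantity; works for either layout.
double totalValue(T)(const(T)[] trades)
{
    double sum = 0;
    foreach (ref t; trades)
        sum += t.price * t.quantity;   // touches only the hot fields
    return sum;
}

void main()
{
    enum n = 1_000;
    auto fat  = new TradeFat[](n);
    auto hot  = new TradeHot[](n);
    auto cold = new TradeCold[](n);
    foreach (i; 0 .. n)
    {
        fat[i].price = 1.5;
        fat[i].quantity = 2;
        fat[i].timestamp = i;
        hot[i] = TradeHot(1.5, 2);
        cold[i].timestamp = i;         // cold data kept, just out of the hot loop
    }
    writeln(totalValue(fat), " ", totalValue(hot));
}
```

Both layouts compute the same result. The split version simply streams 16 bytes per trade through the hot loop instead of 56.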

Bye,
bearophile

Oct 27 2012
