digitalmars.D - top time wasters in DMD, as reported by gprof
- Walter Bright (63/63) Jun 23 2013 Flat profile:
- Timothee Cour (4/78) Jun 23 2013 can you give more context ? what was the command line to get this?
- Walter Bright (4/6) Jun 23 2013 Compile on Linux with -pg. Run with:
- Timothee Cour (8/16) Jun 23 2013 more profiling numbers from Iain Buclaw, for gdc:
- Iain Buclaw (9/32) Jun 23 2013 With that report, all modules are compiled with roughly the equivalent
- dennis luehring (5/68) Jun 24 2013 how does that look using msvc compiling the dmd compiler
- Dicebot (3/5) Jun 24 2013 Considering keyword "gprof", chances are extremely high it was
- dennis luehring (5/10) Jun 24 2013 i know - my question was - how does that look using msvc...
- Richard Webb (11/12) Jun 24 2013 I just did a very quick test using the latest DMD source:
- dennis luehring (5/19) Jun 24 2013 so it could be std library implementation related - can DMC use the msvc...
- SomeDude (2/6) Jun 25 2013 Is there still a free version of the VS compiler ?
- SomeDude (2/9) Jun 25 2013 Ah yes, Visual Studio Express 2012 is free.
- Jonathan M Davis (5/13) Jun 25 2013 Yes.
- dennis luehring (3/10) Jun 26 2013 always the latest (currently) vs2012 as express edition (only the MFC
- dennis luehring (2/16) Jun 24 2013 i got similar speed improvements using vs2010
- dennis luehring (3/9) Jun 24 2013 i've done VS2012 + Intel VTune Amp XE 2013 profiling - see the attached
- dennis luehring (3/12) Jun 24 2013 sorry it was VS2010 (and VTune Trial)
- dennis luehring (5/14) Jun 24 2013 the AMD CodeXL results are also different - both VTune and CodeXL fully...
- Kiith-Sa (31/52) Jun 25 2013 GProf tends to be pretty useless for actual profiling in my
- Walter Bright (5/8) Jun 25 2013 One thing gprof does give is accurate counts, and accurate fan in / fan ...
- Andrej Mitrovic (8/9) Jun 24 2013 The recent speed improvements in git-head have improved both the
- Adam Wilson (15/24) Jun 24 2013 I asked Walter about that recently, he says that with all the focus on D...
- Walter Bright (2/6) Jun 24 2013 It's also possible that a faster malloc() is responsible for much of it.
- Walter Bright (3/5) Jun 24 2013 The profile report was done by gcc/gprof.
- dennis luehring (3/8) Jun 24 2013 different optimization strategies can shuffle the results totaly - or do...
- John Colvin (3/8) Jun 25 2013 I'm confused.... Different optimisers often produce radically
- dennis luehring (3/13) Jun 26 2013 maybe he talk about the dmd -release parameter but even that should
- Martin Nowak (7/7) Jun 24 2013 For some reason gprof nerver terminates on my system. So here is the
- dennis luehring (3/10) Jun 24 2013 what is the sense of doing profiling in BUILD=debug? or is the optimizer...
- Walter Bright (2/3) Jun 24 2013 Non-debug builds may strip out the symbols, rendering the profiling repo...
- Martin Nowak (5/7) Jun 24 2013 That is compiling phobos in debug (non-optimized) mode not dmd. As I
- Martin Nowak (2/4) Jun 24 2013 http://codepad.org/gWrGvm40
- Marco Leise (6/12) Jun 24 2013 Interesting. So to troll a bit, do I see it right, that dmd is
- Iain Buclaw (7/17) Jun 25 2013 The D front end does nothing *but* allocate memory... and sometimes
- Andrei Alexandrescu (4/21) Jun 25 2013 Then maybe it should use its own malloc that uses the bump-the-pointer
- Iain Buclaw (10/38) Jun 25 2013 I meant it in the most light hearted way possible. Though it's no
- Peter Alexander (7/32) Jun 26 2013 The problem is not that malloc is slow, but malloc is being
- Wyatt (8/19) Jun 25 2013 Maybe not DMD, so much as GNU ld? Whatever the case, I'm not
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 5.19      0.08     0.08   945494     0.00     0.00  _aaGetRvalue(AA*, void*)
 3.90      0.14     0.06   926415     0.00     0.00  Lexer::scan(Token*)
 3.25      0.19     0.05   397097     0.00     0.00  calcHash(char const*, unsigned long)
 2.60      0.23     0.04   373646     0.00     0.00  StringTable::search(char const*, unsigned lon
 1.95      0.26     0.03    33255     0.00     0.00  TemplateInstance::semantic(Scope*, Array<Expr
 1.30      0.28     0.02   968985     0.00     0.00  ScopeDsymbol::search(Loc, Identifier*, int)
 1.30      0.30     0.02   424079     0.00     0.00  calccodsize
 1.30      0.32     0.02   234547     0.00     0.00  Parser::parseShiftExp()
 1.30      0.34     0.02   168933     0.00     0.00  AliasDeclaration::toAlias()
 1.30      0.36     0.02   136371     0.00     0.00  _aaGet(AA**, void*)
 1.30      0.38     0.02   132102     0.00     0.00  TypeStruct::hasPointers()
 1.30      0.40     0.02    32490     0.00     0.00  TemplateDeclaration::matchWithInstance(Templa
 1.30      0.42     0.02    21666     0.00     0.00  FuncDeclaration::functionSemantic()
 1.30      0.44     0.02    17983     0.00     0.00  VarDeclaration::syntaxCopy(Dsymbol*)
 1.30      0.46     0.02    12908     0.00     0.00  TemplateDeclaration::deduceFunctionTemplateMa
 1.30      0.48     0.02     6890     0.00     0.00  assignaddrc
 1.30      0.50     0.02     6563     0.00     0.00  ecom(elem**)
 1.30      0.52     0.02     5675     0.00     0.00  TypeInfo_Idxstr::getHash(void*)
 1.30      0.54     0.02     3780     0.01     0.01  TemplateDeclaration::semantic(Scope*)
 1.30      0.56     0.02     2648     0.01     0.01  operator delete(void*)
 0.97      0.58     0.02    42592     0.00     0.00  IdentifierExp::semantic(Scope*)
 0.97      0.59     0.02    36403     0.00     0.00  BinExp::syntaxCopy()
 0.65      0.60     0.01  2668529     0.00     0.00  operator new(unsigned long)
 0.65      0.61     0.01  2575503     0.00     0.00  Dsymbol::isTemplateMixin()
 0.65      0.62     0.01  2310757     0.00     0.00  Type::toBasetype()
 0.65      0.63     0.01  2096689     0.00     0.00  OutBuffer::reserve(unsigned long)
 0.65      0.64     0.01  1030635     0.00     0.00  Dsymbol::isTemplateInstance()
 0.65      0.65     0.01   708615     0.00     0.00  Module::search(Loc, Identifier*, int)
 0.65      0.66     0.01   683509     0.00     0.00  VarDeclaration::isDataseg()
 0.65      0.67     0.01   662180     0.00     0.00  Module::isModule()
 0.65      0.68     0.01   567921     0.00     0.00  isType(RootObject*)
 0.65      0.69     0.01   536331     0.00     0.00  Mem::realloc(void*, unsigned long)
 0.65      0.70     0.01   454836     0.00     0.00  Parameter::foreach(Array<Parameter>*, int (*)
 0.65      0.71     0.01   375732     0.00     0.00  Type::merge()
 0.65      0.72     0.01   353615     0.00     0.00  Mem::free(void*)
 0.65      0.73     0.01   348546     0.00     0.00  isDsymbol(RootObject*)
 0.65      0.74     0.01   339336     0.00     0.00  exceptionOrCantInterpret(Expression*)
 0.65      0.75     0.01   311493     0.00     0.00  isTuple(RootObject*)
 0.65      0.76     0.01   300069     0.00     0.00  TypeBasic::isscalar()
 0.65      0.77     0.01   270438     0.00     0.00  VarDeclaration::hasPointers()
 0.65      0.78     0.01   248163     0.00     0.00  Identifier::equals(RootObject*)
 0.65      0.79     0.01   234806     0.00     0.00  Parser::parseAddExp()
Jun 23 2013
can you give more context ? what was the command line to get this? I thought lexing was not a bottleneck, yet it seems to be in second place?

On Sun, Jun 23, 2013 at 2:28 PM, Walter Bright <newshound2 digitalmars.com> wrote:
> [gprof flat profile snipped - same as quoted at the top of the thread]
Jun 23 2013
On 6/23/2013 8:39 PM, Timothee Cour wrote:
> can you give more context ? what was the command line to get this?

Compile on Linux with -pg. Run with:

    dmd -main -unittest std/algorithm

> I thought lexing was not a bottleneck, yet it seems to be in second place?

The fun with profilers is they tell you things you didn't think were true.
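Spelled out as commands, that workflow looks roughly like this. A sketch only: how -pg gets injected into dmd's own build depends on your makefile, and the final g++ line is conceptual, not the real build command.

```shell
# 1. Build dmd itself with gcc's -pg flag; it must appear on both the
#    compile and the link command lines for profiling to be wired in.
#    Conceptually (the real build goes through the dmd makefile):
g++ -pg -O2 *.c -o dmd

# 2. Run the workload Walter used; the instrumented binary writes its
#    sample data to ./gmon.out on exit.
./dmd -main -unittest std/algorithm

# 3. Read the samples back; the flat profile is what was posted above.
gprof ./dmd gmon.out | head -60
```

Note that -pg here is gcc's profiling flag applied to building dmd itself; it is not an option of dmd.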
Jun 23 2013
On Sun, Jun 23, 2013 at 8:55 PM, Walter Bright <newshound2 digitalmars.com> wrote:
> Compile on Linux with -pg. Run with: dmd -main -unittest std/algorithm
> [...]
> The fun with profilers is they tell you things you didn't think were true.

more profiling numbers from Iain Buclaw, for gdc:
http://forum.dlang.org/post/mailman.1469.1339580395.24740.digitalmars-d puremagic.com
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf

That came up in an old proposal I made a year ago: "AST files instead of DI interface files for faster compilation and easier distribution".
Jun 23 2013
On 24 June 2013 05:53, Timothee Cour <thelastmammoth gmail.com> wrote:
> more profiling numbers from Iain Buclaw, for gdc:
> http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/
> http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf

With that report, all modules are compiled with roughly the equivalent of dmd's -O -g -release. So std.algorithm is shown to take 0.52 seconds to compile. However, with -unittest turned on, I know it takes at least a minute for gdc to compile that module (up there with std.datetime) - and it consumes about 2GB of memory. ;-)

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 23 2013
how does that look using msvc compiling the dmd compiler? as it turns out, msvc makes dmd much faster

btw: the memory profile tool I mentioned is also available as a performance profiler: http://www.softwareverify.com/cpp-profiler.php

On 23.06.2013 23:28, Walter Bright wrote:
> [gprof flat profile snipped - same as quoted at the top of the thread]
Jun 24 2013
On Monday, 24 June 2013 at 13:21:23 UTC, dennis luehring wrote:
> how does that look using msvc compiling the dmd compiler? as it turns out, msvc makes dmd much faster

Considering the keyword "gprof", chances are extremely high it was run with gcc on some not-so-MS platform.
Jun 24 2013
On 24.06.2013 16:09, Dicebot wrote:
> Considering the keyword "gprof", chances are extremely high it was run with gcc on some not-so-MS platform.

I know - my question was: how does that look using msvc... because the msvc build tends to be faster. It would be a nice comparison to see whether a Windows build (dmc and msvc) shows similar hotspots under a Windows-based profiler (needed because of msvc).
Jun 24 2013
On Monday, 24 June 2013 at 15:04:37 UTC, dennis luehring wrote:
> i know - my question was - how does that look using msvc...

I just did a very quick test using the latest DMD source, with the command line:

    -release -unittest -c D:\DTesting\dmd.2.063\src\phobos\std\algorithm.d

DMD built with DMC takes ~49 seconds to complete, but DMD built with VC2008 only takes ~12 seconds. (Need to get a proper VC build done to test it properly.) It looks like the DMC build spends far more time allocating memory, even though peak memory usage is only slightly lower in the VC build?
Jun 24 2013
On 24.06.2013 18:15, Richard Webb wrote:
> DMD built with DMC takes ~49 seconds to complete, but DMD built with VC2008 only takes ~12 seconds.

So it could be related to the std library implementation - can DMC use the msvc libs (just for the comparison)? And you should also try msvc 2010 - or better, 2012 (it still gets speedier code out).
Jun 24 2013
On Monday, 24 June 2013 at 16:46:51 UTC, dennis luehring wrote:
> so it could be related to the std library implementation - can DMC use the msvc libs (just for the comparison)?

Is there still a free version of the VS compiler?
Jun 25 2013
On Wednesday, 26 June 2013 at 05:39:00 UTC, SomeDude wrote:
> Is there still a free version of the VS compiler?

Ah yes, Visual Studio Express 2012 is free.
Jun 25 2013
On Wednesday, June 26, 2013 07:38:51 SomeDude wrote:
> Is there still a free version of the VS compiler?

Yes. http://www.microsoft.com/visualstudio/eng/downloads

Though it's obviously going to be a stripped down version.

- Jonathan M Davis
Jun 25 2013
On 26.06.2013 07:38, SomeDude wrote:
> Is there still a free version of the VS compiler?

Always the latest - currently VS2012 - as an Express edition (only the MFC library is missing).
Jun 26 2013
On 24.06.2013 18:15, Richard Webb wrote:
> DMD built with DMC takes ~49 seconds to complete, but DMD built with VC2008 only takes ~12 seconds.

I got similar speed improvements using VS2010.
Jun 24 2013
On 24.06.2013 18:15, Richard Webb wrote:
> Looks like the DMC build spends far more time allocating memory, even though the peak memory usage is only slightly lower in the VS version?

I've done VS2012 + Intel VTune Amp XE 2013 profiling - see the attached zipped csv file.
Jun 24 2013
On 25.06.2013 07:51, dennis luehring wrote:
> I've done VS2012 + Intel VTune Amp XE 2013 profiling - see the attached zipped csv file.

Sorry, it was VS2010 (and a VTune trial). The results from VTune seem to differ from Walter's.
Jun 24 2013
On 25.06.2013 07:51, dennis luehring wrote:
> I've done VS2012 + Intel VTune Amp XE 2013 profiling - see the attached zipped csv file.

The AMD CodeXL results are also different - both VTune and CodeXL are fully integrated into VS2010 and were run with the "same" settings.

btw, nice to read: http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html
Jun 24 2013
On Tuesday, 25 June 2013 at 06:21:09 UTC, dennis luehring wrote:
> btw nice to read: http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html

GProf tends to be pretty useless for actual profiling in my experience. I think the best way is to use a sampling profiler such as 'perf' (part of the linux project; on a recent debian/ubuntu/mint, type 'perf' into a console to get info about which package to install; docs at https://perf.wiki.kernel.org/index.php/Tutorial), 'oprofile' (pretty much the same feature set as perf, sometimes hard to set up), or the VTune mentioned here. Never expect gprof to give you reliable data as to how much time which function takes.

Callgrind/kcachegrind is also pretty good if your code doesn't spend a lot of time on i/o, system calls, etc. (as the main code is running in a slow VM, anything not running in that VM will seem to run much faster).

Furthermore, _neither_ of these requires compiling with special flags. As for debug symbols, it's best to enable optimizations together with debug symbols. Optimizations are not a big issue - even if some functions were inlined, these tools give you per-line and per-instruction results. Not to mention cache hits/misses, branches, branch mispredictions, and - if you use CPU-specific event IDs - whatever else your CPU can record. AND it doesn't affect the performance of the profiled code measurably, unless you set an insanely high sample rate. And if this sounds difficult to configure, most of these tools (perf at the very least) have very sane defaults that give way more useful results than gprof.

TLDR: gprof is horrible. Never use it for profiling. There are approximately 5 billion better tools that give more detailed results _and_ are easier to use.

I seriously need to write a blog post/article about this.
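The perf workflow described above boils down to three commands. A sketch: it assumes perf is installed, the kernel supports perf events, and dmd was built with symbols (-g) so the report can resolve names.

```shell
# Sample the whole run at the default frequency, recording call graphs;
# no special build flags are needed, unlike gprof's -pg.
perf record -g ./dmd -main -unittest std/algorithm

# Interactive per-function report; drill down for per-line and
# per-instruction annotation.
perf report

# Hardware-counter view of the same run: cache misses and branch
# mispredictions, which gprof cannot show at all.
perf stat -e cache-misses,branch-misses ./dmd -main -unittest std/algorithm
```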
Jun 25 2013
On 6/25/2013 2:27 AM, Kiith-Sa wrote:
> TLDR: gprof is horrible. Never use it for profiling. There are approximately 5 billion better tools that give more detailed results _and_ are easier to use.

One thing gprof does give is accurate counts, and accurate fan in / fan out data. The fan data is very useful in finding out who is calling who and how many times.

> I seriously need to write a blog post/article about this.

Please do so!
Jun 25 2013
On 6/24/13, dennis luehring <dl.soluz gmx.net> wrote:
> i know - my question was - how does that look using msvc...

The recent speed improvements in git-head have improved both the DMC-based and MSVC-based DMD executables. They shaved off from 700 msecs on small projects up to 4 seconds on large projects for clean builds (for me).

That being said, the MSVC-built compiler is usually twice as fast as the DMC-built one. It goes to show that whatever is in that DMC backend is long outdated.
Jun 24 2013
On Mon, 24 Jun 2013 11:44:24 -0700, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
> That being said, the MSVC-built compiler is usually twice as fast as the DMC-built one. It goes to show that whatever is in that DMC backend is long outdated.

I asked Walter about that recently; he says that with all the focus on D there hasn't been enough time to seriously develop the backend in a LONG time. IIRC he said it's been something like 15 years since it's seen any serious development work...

NOTE: It came up because I was asking about getting things like auto-vectorization and Haswell's new Transactional Synchronization Extensions and the related instructions into the backend.

--
Adam Wilson
IRC: LightBender
Project Coordinator
The Horizon Project
http://www.thehorizonproject.org/
Jun 24 2013
On 6/24/2013 12:27 PM, Adam Wilson wrote:
> I asked Walter about that recently; he says that with all the focus on D there hasn't been enough time to seriously develop the backend in a LONG time. IIRC he said it's been something like 15 years since it's seen any serious development work...

It's also possible that a faster malloc() is responsible for much of it.
Jun 24 2013
On 6/24/2013 6:19 AM, dennis luehring wrote:
> how does that look using msvc compiling the dmd compiler? as it turns out, msvc makes dmd much faster

The profile report was done by gcc/gprof. And besides, better compilers shouldn't change profile results.
Jun 24 2013
On 24.06.2013 20:01, Walter Bright wrote:
> And besides, better compilers shouldn't change profile results.

Different optimization strategies can shuffle the results totally - or are we talking about different things?
Jun 24 2013
On Monday, 24 June 2013 at 18:01:11 UTC, Walter Bright wrote:
> The profile report was done by gcc/gprof. And besides, better compilers shouldn't change profile results.

I'm confused... Different optimisers often produce radically different profile results. Or am I misunderstanding you?
Jun 25 2013
On 26.06.2013 08:47, John Colvin wrote:
> I'm confused... Different optimisers often produce radically different profile results. Or am I misunderstanding you?

Maybe he's talking about the dmd -release parameter, but even that should produce different results (or does the same code take the same amount of time?)
Jun 26 2013
For some reason gprof never terminates on my system. So here is the same data recorded with oprofile's operf. It also includes kernel, libc and ld figures.

http://codepad.org/zL3C6eQd - whole phobos as a shared library with unittests and BUILD=debug
http://codepad.org/lITSMbhD - BUILD=release didn't compile, with some error about .offsetof
Jun 24 2013
On 24.06.2013 16:52, Martin Nowak wrote:
> So here is the same data recorded with oprofile's operf. [...] whole phobos as a shared library with unittests and BUILD=debug

What is the sense of doing profiling with BUILD=debug? Or is the optimizer still active?
Jun 24 2013
On 6/24/2013 8:57 AM, dennis luehring wrote:
> what is the sense of doing profiling in BUILD=debug?

Non-debug builds may strip out the symbols, rendering the profiling report useless.
Jun 24 2013
On 06/24/2013 05:57 PM, dennis luehring wrote:
> what is the sense of doing profiling in BUILD=debug? or is the optimizer still active?

That is compiling phobos in debug (non-optimized) mode, not dmd. As I said, building phobos in release didn't succeed for whatever reason. I can try to install kernel debuginfo; that 12% might contain some useful information.
Jun 24 2013
On 06/24/2013 08:43 PM, Martin Nowak wrote:
> I can try to install kernel debuginfo; that 12% might contain some useful information.

http://codepad.org/gWrGvm40
Jun 24 2013
On Mon, 24 Jun 2013 21:01:36 +0200, Martin Nowak <code dawg.eu> wrote:
> http://codepad.org/gWrGvm40

Interesting. So to troll a bit: do I see it right that dmd is mostly a Unicode conversion and memory allocation tool?

--
Marco
Jun 24 2013
On 25 June 2013 07:46, Marco Leise <Marco.Leise gmx.de> wrote:
> Interesting. So to troll a bit: do I see it right that dmd is mostly a Unicode conversion and memory allocation tool?

The D front end does nothing *but* allocate memory... and sometimes from all this allocation (if your computer doesn't die) a compiled program is produced.

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 25 2013
On 6/25/13 2:13 AM, Iain Buclaw wrote:
> The D front end does nothing *but* allocate memory... and sometimes from all this allocation (if your computer doesn't die) a compiled program is produced.

Then maybe it should use its own malloc that uses the bump-the-pointer approach from large chunks allocated with malloc.

Andrei
Jun 25 2013
On 25 June 2013 17:56, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Then maybe it should use its own malloc that uses the bump-the-pointer approach from large chunks allocated with malloc.

I meant it in the most light-hearted way possible. Though it's no secret that heavily templated code, coupled with string mixins (eg: Neat is a good example, which consumes at least 3GB of memory to compile one module IIRC), can see memory allocations soar. I do believe feep preloads the boehm-gc when compiling to mitigate this. :o)

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 25 2013
On Tuesday, 25 June 2013 at 16:56:55 UTC, Andrei Alexandrescu wrote:
> Then maybe it should use its own malloc that uses the bump-the-pointer approach from large chunks allocated with malloc.

The problem is not that malloc is slow, but that malloc is being called too much. A hierarchical profile would be more useful here, to see exactly what's allocating all that memory.

I'm also intrigued as to why so much string activity is happening (vfprintf!? num_to_str!?)
Jun 26 2013
On Tuesday, 25 June 2013 at 06:46:54 UTC, Marco Leise wrote:
> Interesting. So to troll a bit: do I see it right that dmd is mostly a Unicode conversion and memory allocation tool?

Maybe not DMD so much as GNU ld? Whatever the case, I'm not surprised to see their iconv that high up in the list; in my experience it's horrifically slow.

Ah, and a fun question: does that number change significantly when you modify your locale variables? (I think it should be enough to just export LC_CTYPE="C".)

-Wyatt
Jun 25 2013