
digitalmars.D - D on next-gen consoles and for game development

reply "Brad Anderson" <eco gnuk.net> writes:
While there hasn't been anything official, I think it's a safe 
bet to say that D is being used for a major title, Remedy's 
Quantum Break, featured prominently during the announcement of 
Xbox One. Quantum Break doesn't come out until 2014 so the 
timeline seems about right (Remedy doesn't appear to work on more 
than one game at a time from what I can tell).


That's pretty huge news.


Now I'm wondering what can be done to foster this newly acquired 
credibility in games.  By far the biggest issue I hear about when 
it comes to people working on games in D is the garbage 
collector.  You can work around the GC without too much 
difficulty as Manu's experience shared in his DConf talk shows 
but a lot of people new to D don't know how to do that.  We could 
also use some tools and guides to help people identify and avoid 
GC use when necessary.

@nogc comes to mind (I believe Andrei mentioned it during one of 
the talks released). [1][2]

Johannes Pfau's work-in-progress -vgc command line option [3] 
would be another great tool to help people identify GC 
allocations.  This or something similar could also be used to 
document throughout Phobos where GC allocations can happen (and 
help eliminate them where it makes sense to).
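To make the problem concrete, here is the kind of innocent-looking code such a tool needs to flag (an illustrative sketch; the function names are invented, and the exact -vgc output format is still being worked out in the pull request above):

// Each of these lines quietly allocates from the GC heap --
// exactly what a -vgc-style switch should point out.
int delegate(int) makeAdder(int base)
{
    return (int i) => base + i;  // escaping closure: context goes on the GC heap
}

void hidden(string name)
{
    int[] a = [1, 2, 3];         // array literal: GC allocation
    a ~= 4;                      // appending may reallocate via the GC
    string s = "hello, " ~ name; // runtime concatenation allocates
}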

There was a lot of interesting stuff in Benjamin Thaut's article 
about GC versus manual memory management in a game [4] and the 
discussion about it on the forums [5].  A lot of this collective 
knowledge built up on manual memory management techniques 
specific to D should probably be formalized and added to the 
official documentation.  There is a Memory Management [6] page in 
the documentation but it appears to be rather dated at this point 
and not particularly applicable to modern D2 (no mention of 
emplace or scoped and it talks about using delete and scope 
classes).

Game development is one place D can really get a foothold but all 
too often the GC is held over D's head because people taking 
their first look at D don't know how to avoid using it and often 
don't realize you can avoid using it entirely. This is easily the 
most common issue I see raised in the #d IRC channel by newcomers 
to D with a C or C++ background (many of whom are interested in 
game dev but concerned the GC will kill their game's performance).


1: http://d.puremagic.com/issues/show_bug.cgi?id=5219
2: http://wiki.dlang.org/DIP18
3: https://github.com/D-Programming-Language/dmd/pull/1886
4: http://3d.benjamin-thaut.de/?p=20#more-20
5: http://forum.dlang.org/post/k27bh7$t7f$1 digitalmars.com
6: http://dlang.org/memory.html
May 23 2013
Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 05/23/2013 08:13 PM, Brad Anderson wrote:
 Now I'm wondering what can be done to foster this newly acquired
 credibility in games.  By far the biggest issue I hear about when it comes
 to people working on games in D is the garbage collector.  You can work
 around the GC without too much difficulty as Manu's experience shared in
 his DConf talk shows but a lot of people new to D don't know how to do
 that.  We could also use some tools and guides to help people identify and
 avoid GC use when necessary.
As a starting point, do we have a list of the Phobos functions that allocate using GC when there's no need to? That's a concern of Manu's that it ought to be possible to address relatively swiftly if the information is to hand.
May 23 2013
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 23 May 2013 at 18:22:54 UTC, Joseph Rushton Wakeling 
wrote:
 On 05/23/2013 08:13 PM, Brad Anderson wrote:
 Now I'm wondering what can be done to foster this newly 
 acquired credibility in
 games.  By far the biggest issue I hear about when it comes to 
 people working on
 games in D is the garbage collector.  You can work around the 
 GC without too
 much difficulty as Manu's experience shared in his DConf talk 
 shows but a lot of
 people new to D don't know how to do that.  We could also use 
 some tools and
 guides to help people identify and avoid GC use when necessary.
 As a starting point, do we have a list of the Phobos functions that
 allocate using GC when there's no need to? That's a concern of Manu's
 that it ought to be possible to address relatively swiftly if the
 information is to hand.
I think that's where Johannes Pfau's -vgc can come in and help. The Phobos
unit tests have pretty good coverage, so building those with -vgc would, in
theory, point out the vast majority of places in Phobos that use the GC.
May 23 2013
prev sibling parent reply "Don" <turnyourkidsintocash nospam.com> writes:
On Thursday, 23 May 2013 at 18:22:54 UTC, Joseph Rushton Wakeling 
wrote:
 On 05/23/2013 08:13 PM, Brad Anderson wrote:
 Now I'm wondering what can be done to foster this newly 
 acquired credibility in
 games.  By far the biggest issue I hear about when it comes to 
 people working on
 games in D is the garbage collector.  You can work around the 
 GC without too
 much difficulty as Manu's experience shared in his DConf talk 
 shows but a lot of
 people new to D don't know how to do that.  We could also use 
 some tools and
 guides to help people identify and avoid GC use when necessary.
It's worth noting that our code at Sociomantic faces *exactly* the same issues. We cannot use Phobos because of its reliance on the GC. Essentially, we want to have the option of avoiding GC usage in every single function.
 As a starting point, do we have a list of the Phobos functions 
 that allocate
 using GC when there's no need to?  That's a concern of Manu's 
 that it ought to
 be possible to address relatively swiftly if the information is 
 to hand.
That is only part of the problem with Phobos. The bigger problem is with
the functions that DO need to allocate memory. In Tango, and in our code,
all such functions accept a buffer to store the results in. So that, even
though they need to allocate memory, if you call the function a thousand
times, it only allocates memory once, and keeps reusing the buffer.

I'm not sure how feasible it is to add that afterwards. I hope it can be
done without changing all the APIs, but I fear it might not be.

But anyway, after fixing the obvious Phobos offenders, another huge step
would be to get TempAlloc into druntime and used wherever possible in
Phobos.
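To illustrate the buffer-accepting pattern Don describes, here is a minimal sketch using Phobos's real format/sformat pair (the describe function itself is invented for the example; it is not Tango's or Sociomantic's actual API):

import std.format : format, sformat;

// Allocating flavor: convenient, but creates GC garbage on every call.
string describe(int id)
{
    return format("entity %s", id);
}

// Buffer-accepting flavor: the caller owns the storage and reuses it.
char[] describe(int id, char[] buf)
{
    return sformat(buf, "entity %s", id);
}

void example()
{
    char[64] buf;                     // one stack buffer for all iterations
    foreach (id; 0 .. 1000)
    {
        auto s = describe(id, buf[]); // a thousand calls, zero GC allocations
        // ... use s before the next iteration overwrites buf ...
    }
}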
May 24 2013
next sibling parent "Dicebot" <m.strashun gmail.com> writes:
On Friday, 24 May 2013 at 07:57:42 UTC, Don wrote:
 It's worth noting that our code at Sociomantic faces *exactly* 
 the same issues.
It is worth noting that _anyone_ trying to write code with either soft or hard real-time requirements faces exactly the same issues ;)
May 24 2013
Manu <turkeyman gmail.com> writes:
On 24 May 2013 17:57, Don <turnyourkidsintocash nospam.com> wrote:

 On Thursday, 23 May 2013 at 18:22:54 UTC, Joseph Rushton Wakeling wrote:

 On 05/23/2013 08:13 PM, Brad Anderson wrote:

 Now I'm wondering what can be done to foster this newly acquired
 credibility in games.  By far the biggest issue I hear about when it
 comes to people working on games in D is the garbage collector.  You can
 work around the GC without too much difficulty as Manu's experience
 shared in his DConf talk shows but a lot of people new to D don't know
 how to do that.  We could also use some tools and guides to help people
 identify and avoid GC use when necessary.

 It's worth noting that our code at Sociomantic faces *exactly* the same
 issues. We cannot use Phobos because of its reliance on the GC.
 Essentially, we want to have the option of avoiding GC usage in every
 single function.

 As a starting point, do we have a list of the Phobos functions that
 allocate using GC when there's no need to?  That's a concern of Manu's
 that it ought to be possible to address relatively swiftly if the
 information is to hand.

 That is only part of the problem with Phobos. The bigger problem is with
 the functions that DO need to allocate memory. In Tango, and in our code,
 all such functions accept a buffer to store the results in. So that, even
 though they need to allocate memory, if you call the function a thousand
 times, it only allocates memory once, and keeps reusing the buffer.
 I'm not sure how feasible it is to add that afterwards. I hope it can be
 done without changing all the APIs, but I fear it might not be.

Yeah, I've often wanted APIs in that fashion too.
I wonder if it would be worth creating overloads of allocating functions
that receive an output buffer argument, rather than return an allocated
buffer... Too messy?

 But anyway, after fixing the obvious Phobos offenders, another huge step
 would be to get TempAlloc into druntime and used wherever possible in
 Phobos.

How does that work?

One pattern I've used a lot is, since we have a regular 60hz timeslice and
a fairly regular pattern from frame to frame, we use a temp heap which
pushes allocations on the end like a stack, then wipe it clean at the
start of the next frame.
Great for any small allocations that last no longer than a single frame.
It's fast (collection is instant), and it also combats memory
fragmentation, which is also critically important when working on memory
limited systems with no virtual memory/page file.
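For illustration, a minimal sketch of such a per-frame temp heap (all names invented; a real implementation would also need to decide on growth handling and thread safety):

import core.stdc.stdlib : malloc, free;

// A per-frame bump allocator: allocation is a pointer increment, and
// "collection" is resetting a single index at the start of the next frame.
struct FrameHeap
{
    private ubyte* store;
    private size_t capacity;
    private size_t top;

    @disable this(this); // no copies: this struct owns its memory

    this(size_t capacity)
    {
        this.store = cast(ubyte*) malloc(capacity);
        this.capacity = capacity;
    }

    ~this() { free(store); }

    // Push a new allocation onto the end, like a stack.
    void[] alloc(size_t bytes)
    {
        enum size_t alignment = 16;
        const aligned = (top + alignment - 1) & ~(alignment - 1);
        assert(aligned + bytes <= capacity, "frame heap exhausted");
        top = aligned + bytes;
        return store[aligned .. top];
    }

    // Call once per frame: everything allocated last frame is reclaimed
    // instantly, with zero fragmentation.
    void reset() { top = 0; }
}

Anything allocated from it must not outlive the frame; longer-lived data has to go through the regular heap.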
May 24 2013
Timon Gehr <timon.gehr gmx.ch> writes:
On 05/24/2013 04:33 PM, Manu wrote:
     But anyway, after fixing the obvious Phobos offenders, another huge
     step would be to get TempAlloc into druntime and used wherever
     possible in Phobos.


 How does that work?

 One pattern I've used a lot is, since we have a regular 60hz timeslice
 and a fairly regular pattern from frame to frame, we use a temp heap
 which pushes allocations on the end like a stack, then wipe it clean at
 the start of the next frame.
 Great for any small allocations that last no longer than a single frame.
 It's fast (collection is instant), and it also combats memory
 fragmentation, which is also critically important when working on memory
 limited systems with no virtual memory/page file.
Yes, that is basically it.
https://github.com/dsimcha/TempAlloc/blob/master/std/allocators/region.d
May 25 2013
prev sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Saturday, May 25, 2013 00:33:10 Manu wrote:
 Yeah, I've often wanted API's in that fashion too.
 I wonder if it would be worth creating overloads of allocating functions
 that receive an output buffer argument, rather than return an allocated
 buffer...
 Too messy?
We already have stuff like format vs formattedWrite where one allocates and
the other takes an output range. We should adopt that practice in general.
Where possible, it should probably be done with an overload of the function,
but where that's not possible, we can simply create a new function with a
similar name.

Then any function which could allocate has the option of writing to an
output range instead (which could be a delegate or an array or whatever)
and avoid the allocation - though I'm not sure that arrays as output ranges
currently handle running out of space very well, so we might need to figure
something out there to properly deal with the case where there isn't enough
room in the output range (arguably, output ranges need a bit of work in
general though).

Regardless, the main question with regards to messiness is whether we can
get away with creating overloads for existing functions which allocate or
whether we'd be forced to create new ones (possibly using a naming scheme
similar to how we have InPlace, only which indicates that it takes an
output range or doesn't allocate or whatever).

- Jonathan M Davis
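For reference, the existing pair Jonathan mentions can be used like this (a small sketch; FixedSink is an invented output range, while format and formattedWrite are real std.format functions):

import std.format : format, formattedWrite;

// An invented fixed-capacity output range; 'put' is the primitive that
// formattedWrite drives.
struct FixedSink
{
    char[64] data;
    size_t len;
    void put(char c)
    {
        assert(len < data.length, "sink full"); // the overflow question above
        data[len++] = c;
    }
}

void example()
{
    // Allocating flavor: returns a fresh GC-allocated string.
    string s = format("pos=(%s, %s)", 3, 4);

    // Output-range flavor: writes into caller-controlled storage instead.
    FixedSink sink;
    formattedWrite(&sink, "pos=(%s, %s)", 3, 4); // no GC allocation
    auto text = sink.data[0 .. sink.len];
}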
May 24 2013
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Friday, 24 May 2013 at 19:44:23 UTC, Jonathan M Davis wrote:
 We already have stuff like format vs formattedWrite where one 
 allocates and the
 other takes an output range. We should adopt that practice in 
 general. Where
 possible, it should probably be done with an overload of the 
 function, but
 where that's not possible, we can simply create a new function 
 with a similar
 name.
Sounds good to me. Should the overloads return the output range or void?
May 24 2013
next sibling parent "Diggory" <diggsey googlemail.com> writes:
On Saturday, 25 May 2013 at 02:41:00 UTC, Brad Anderson wrote:
 On Friday, 24 May 2013 at 19:44:23 UTC, Jonathan M Davis wrote:
 We already have stuff like format vs formattedWrite where one 
 allocates and the
 other takes an output range. We should adopt that practice in 
 general. Where
 possible, it should probably be done with an overload of the 
 function, but
 where that's not possible, we can simply create a new function 
 with a similar
 name.
 Sounds good to me. Should the overloads return the output range or void?

If it returned the output range it would be possible to make another
function which returns a temporary output range and then easily chain
together function calls:

CallWindowsApiW(mystr.writeUTF16z(tempBuffer()))

No GC allocation but not an unpleasant syntax either.
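A sketch of why returning the range enables that chaining (every name here is hypothetical, mirroring Diggory's example):

// Returning the sink from each writer lets calls nest without temporaries.
struct TempBuffer
{
    char[256] data;
    size_t len;
    const(char)[] result() const { return data[0 .. len]; }
}

// Hypothetical writer: appends, then returns the same buffer so another
// call can consume it directly.
ref TempBuffer write(return ref TempBuffer buf, const(char)[] s)
{
    buf.data[buf.len .. buf.len + s.length] = s[];
    buf.len += s.length;
    return buf;
}

void example()
{
    TempBuffer tmp;
    auto text = tmp.write("hello, ").write("world").result;
    assert(text == "hello, world"); // no GC allocation anywhere
}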
May 24 2013
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, May 25, 2013 04:40:58 Brad Anderson wrote:
 On Friday, 24 May 2013 at 19:44:23 UTC, Jonathan M Davis wrote:
 We already have stuff like format vs formattedWrite where one
 allocates and the
 other takes an output range. We should adopt that practice in
 general. Where
 possible, it should probably be done with an overload of the
 function, but
 where that's not possible, we can simply create a new function
 with a similar
 name.
 Sounds good to me. Should the overloads return the output range or void?

Right now, all of the functions that we have like that don't return the
output range, but I don't know that it would be a bad idea if they did.

- Jonathan M Davis
May 24 2013
prev sibling next sibling parent reply "Szymon Gatner" <noemail gmail.com> writes:
On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
 While there hasn't been anything official, I think it's a safe 
 bet to say that D is being used for a major title, Remedy's 
 Quantum Break, featured prominently during the announcement of
May I ask where this intel comes from? Do you have any more details on how D is used in the project?
May 23 2013
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 23 May 2013 at 18:43:01 UTC, Szymon Gatner wrote:
 On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
 While there hasn't been anything official, I think it's a safe 
 bet to say that D is being used for a major title, Remedy's 
 Quantum Break, featured prominently during the announcement of
May I ask where this intel comes from? Do you have any more details on how D is used in the project?
You can watch Manu's talk from DConf here:
http://www.youtube.com/watch?v=FKceA691Wcg

tl;dw: They are using it as a rapid-turnaround scripting language for their
C++ engine.
May 23 2013
parent reply "Szymon Gatner" <noemail gmail.com> writes:
On Thursday, 23 May 2013 at 18:50:11 UTC, Brad Anderson wrote:
 On Thursday, 23 May 2013 at 18:43:01 UTC, Szymon Gatner wrote:
 On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
 While there hasn't been anything official, I think it's a 
 safe bet to say that D is being used for a major title, 
 Remedy's Quantum Break, featured prominently during the 
 announcement of
May I ask where this intel comes from? Do you have any more details on how D is used in the project?
 You can watch Manu's talk from DConf here:
 http://www.youtube.com/watch?v=FKceA691Wcg

 tl;dw: They are using it as a rapid-turnaround scripting language for
 their C++ engine.
Ah, I did watch it. Didn't realize Manu works at Remedy. Being a small
indie game dev, I totally agree on the industry needing salvation from C++.
I've been watching D closely for a few years now, but until the compiler is
more stable (though this is less and less of a problem) and there is decent
ARM support, I still can't allow myself to switch. And the day of the
switch will be a glorious one.
May 23 2013
Manu <turkeyman gmail.com> writes:
On 24 May 2013 05:02, Szymon Gatner <noemail gmail.com> wrote:

 On Thursday, 23 May 2013 at 18:50:11 UTC, Brad Anderson wrote:

 On Thursday, 23 May 2013 at 18:43:01 UTC, Szymon Gatner wrote:

 On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:

 While there hasn't been anything official, I think it's a safe bet to
 say that D is being used for a major title, Remedy's Quantum Break,
 featured prominently during the announcement of
May I ask where this intel comes from? Do you have any more details on how D is used in the project?
 You can watch Manu's talk from DConf here:
 http://www.youtube.com/watch?v=FKceA691Wcg

 tl;dw: They are using it as a rapid-turnaround scripting language for
 their C++ engine.
 Ah, I did watch it. Didn't realize Manu works at Remedy. Being a small
 indie game dev, I totally agree on the industry needing salvation from
 C++. I've been watching D closely for a few years now, but until the
 compiler is more stable (though this is less and less of a problem) and
 there is decent ARM support, I still can't allow myself to switch. And
 the day of the switch will be a glorious one.
I really hope D on ARM gets some more attention in the near future. The day it can be used on Android will be a very significant breakthrough!
May 23 2013
Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 05/24/2013 01:25 AM, Manu wrote:
 I really hope D on ARM gets some more attention in the near future. The day it
 can be used on Android will be a very significant breakthrough!
GDC is close to being fully usable on ARM, no? And as I recall the only (albeit major) problem you had with GDC was the delay between bugfixes landing in the D frontend and carrying over to GDC. So, the solution here might be the work to properly generalize the frontend so that it will plug-and-play on top of any of the available backends.
May 23 2013
Manu <turkeyman gmail.com> writes:
On 24 May 2013 09:44, Joseph Rushton Wakeling
<joseph.wakeling webdrake.net> wrote:

 On 05/24/2013 01:25 AM, Manu wrote:
 I really hope D on ARM gets some more attention in the near future. The
 day it can be used on Android will be a very significant breakthrough!

 GDC is close to being fully usable on ARM, no?  And as I recall the only
 (albeit major) problem you had with GDC was the delay between bugfixes
 landing in the D frontend and carrying over to GDC.  So, the solution
 here might be the work to properly generalize the frontend so that it
 will plug-and-play on top of any of the available backends.

Well the compiler seems fine actually. It generates good ARM code in my
experience, ditto for PPC, MIPS, SH4 (those are all I have tested).

Druntime needs to be ported to Bionic. People have made a start, but I
recall mention of some complications that need some work?
iOS needs extern(ObjC), but it's a fairly standard posix underneath, so
should be less work on the runtime.

Systems like WiiU/Wii/PS3/XBox360, etc. all need runtimes, and those will
probably not be developed by the D community.
It would land on a general gamedev's shoulders to do those, so I would
suggest the approach here would be to make a step-by-step guide to porting
druntime. Make the process as simple as possible for individuals wanting
to support other 'niche' platforms...
May 23 2013
parent reply "Joseph Rushton Wakeling" <joseph.wakeling webdrake.net> writes:
On Friday, 24 May 2013 at 00:06:05 UTC, Manu wrote:
 Systems like WiiU/Wii/PS3/XBox360, etc all need runtimes, and 
 those will
 probably not be developed by the D community.
 It would land on a general gamedev's shoulders to do those, so 
 I would
 suggest the approach here would be to make a step-by-step 
 guide to porting
 druntime. Make the process as simple as possible for 
 individuals wanting to
 support other 'niche' platforms...
Do you think we could expect those ports to be given back to the community when they get written? Or is it more likely that game studios will keep their ports to themselves?
May 23 2013
Manu <turkeyman gmail.com> writes:
On 24 May 2013 10:59, Joseph Rushton Wakeling
<joseph.wakeling webdrake.net> wrote:

 On Friday, 24 May 2013 at 00:06:05 UTC, Manu wrote:

 Systems like WiiU/Wii/PS3/XBox360, etc all need runtimes, and those will
 probably not be developed by the D community.
 It would land on a general gamedev's shoulders to do those, so I would
 suggest the approach here would be to make a step-by-step guide to
 porting
 druntime. Make the process as simple as possible for individuals wanting
 to
 support other 'niche' platforms...
Do you think we could expect those ports to be given back to the community when they get written? Or is it more likely that game studios will keep their ports to themselves?
I'd like to think they'd be made available. I certainly would. But you can never predict what the suits up top will tell you that you have to do.
May 23 2013
Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Fri, 24 May 2013 02:59:44 +0200
"Joseph Rushton Wakeling" <joseph.wakeling webdrake.net> wrote:

 On Friday, 24 May 2013 at 00:06:05 UTC, Manu wrote:
 Systems like WiiU/Wii/PS3/XBox360, etc all need runtimes, and 
 those will
 probably not be developed by the D community.
 It would land on a general gamedev's shoulders to do those, so 
 I would
 suggest the approach here would be to make a step-buy-step 
 guide to porting
 druntime. Make the process as simple as possible for 
 individuals wanting to
 support other 'niche' platforms...
Do you think we could expect those ports to be given back to the community when they get written? Or is it more likely that game studios will keep their ports to themselves?
It would be prohibited by console manufacturers' NDAs/developer-licenses.
If you're an official licensed console developer, you can't provide any
console-specific code or technical specs to anyone who isn't also covered
by the same licensed developer agreement. I'm sure it could be released to,
or shared with, other licensed developers (might be paperwork involved, I
dunno), but not to the general community.

Licensed console dev is fairly cloak-and-dagger (minus the dagger, perhaps).
Game console manufacturers keep a tight enough grip on their systems to
make even Apple blush. For such a thing to be released back to the
"community" it would have to come from the homebrew scene (which AIUI could
then be used by licensed devs too... or at least that was my understanding
with GBA, so my info may be out-of-date). An officially licensed developer
would lose their license, or get sued, or something.
May 23 2013
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, May 23, 2013 at 08:22:43PM +0200, Joseph Rushton Wakeling wrote:
 On 05/23/2013 08:13 PM, Brad Anderson wrote:
 Now I'm wondering what can be done to foster this newly acquired
 credibility in games.  By far the biggest issue I hear about when it
 comes to people working on games in D is the garbage collector.  You
 can work around the GC without too much difficulty as Manu's
 experience shared in his DConf talk shows but a lot of people new to
 D don't know how to do that.  We could also use some tools and
 guides to help people identify and avoid GC use when necessary.
 As a starting point, do we have a list of the Phobos functions that
 allocate using GC when there's no need to? That's a concern of Manu's
 that it ought to be possible to address relatively swiftly if the
 information is to hand.

I listened to Manu's talk yesterday, and I agree with what he said: Phobos
functions that don't *need* to allocate, shouldn't. Andrei was also
enthusiastic about std.algorithm being almost completely allocation-free.
Maybe we should file bugs (enhancement requests?) for all such Phobos
functions?

On the other hand, perhaps functions that *need* to allocate should be
labelled as such (esp. in the Phobos docs), so that users know what
they're getting into.

T

-- 
My program has no bugs! Only unintentional features...
May 23 2013
Jacob Carlborg <doob me.com> writes:
On 2013-05-23 20:43, H. S. Teoh wrote:

 On the other hand, perhaps functions that *need* to allocate should be
 labelled as such (esp. in the Phobos docs), so that users know what
 they're getting into.
Perhaps using a UDA.

-- 
/Jacob Carlborg
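Something like this hypothetical marker, say (no such attribute exists in druntime/Phobos today; hasUDA is a real std.traits helper):

import std.array : join;
import std.traits : hasUDA;

// Hypothetical marker: "this function may allocate from the GC heap".
struct allocates {}

@allocates string joinWords(string[] words)
{
    return words.join(" "); // GC-allocates the result
}

int sum(int[] xs)
{
    int total;
    foreach (x; xs) total += x;
    return total;           // allocation-free
}

// Docs or tooling could then surface the marker:
static assert( hasUDA!(joinWords, allocates));
static assert(!hasUDA!(sum, allocates));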
May 23 2013
prev sibling next sibling parent reply "Kiith-Sa" <kiithsacmp gmail.com> writes:
On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
 [... full quote of the original post snipped ...]
Without official confirmation, I think it's rather early to assume D's
being used in Quantum Break. D might compile on the new consoles, but what
about druntime/phobos/etc.?

That said, I support this idea. When I get time I'll try looking at Phobos
to see if there is some low-hanging fruit with regards to GC usage and
submit pull requests (I didn't make any non-doc contribution to Phobos yet,
but I have a general idea of how its source looks).

I also think that many people overreact about the GC too much. @nogc is
certainly a good idea, but I think strategically using malloc,
disabling/reenabling the GC, using GC.free, and even just using standard GC
features *while taking care to avoid unnecessary allocations* is vastly
better than outright removing the GC.

It'd be good to have an easy-to-use way to manually allocate classes/structs
in Phobos (higher-level than emplace, something close in usability to C++
new/delete), preferably with a way to override the allocation mechanism (I
assume the fabled "allocators" have something to do with this? Maybe we'll
get them once DNF is released... ... ...)
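A rough sketch of those techniques as they exist today (GC.disable/GC.enable, malloc/free, and emplace are real; the Enemy class is invented, and its destructor is deliberately not run here):

import core.memory : GC;
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

class Enemy { int hp = 100; }

void update()
{
    GC.disable();             // no collection pauses inside the hot path
    scope(exit) GC.enable();

    // Manually place a class instance outside the GC heap.
    enum size = __traits(classInstanceSize, Enemy);
    void* mem = malloc(size);
    scope(exit) free(mem);    // note: Enemy's destructor is not run here
    Enemy e = emplace!Enemy(mem[0 .. size]);

    e.hp -= 10;
    // If Enemy held pointers into GC memory, the block would also need
    // GC.addRange so the collector could see those references.
}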
May 23 2013
Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Thu, 23 May 2013 21:37:26 +0200
"Kiith-Sa" <kiithsacmp gmail.com> wrote:

 On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
 [... full quote of the original post snipped ...]
 Without official confirmation, I think it's rather early to assume D's
 being used in Quantum Break. D might compile on the new consoles, but
 what about druntime/phobos/etc?

I'd like to hear an official confirmation (or denial) at this point, too
(assuming Remedy is at a point where they're comfortable making a statement
on the matter - and after all, it would make sense if they're still keeping
open the possibility of backing out of D for whatever they're using it on
by release if they end up needing to do so, even if such a possibility is
very unlikely).

However, I do think it's a safe bet: Like Brad said, Remedy is a relatively
small dev company that doesn't have a history of working on multiple AAA
titles simultaneously. They *are* known to have one other mystery title
besides Quantum Break in development, but it's for iOS - so it's not a AAA
title, and it's definitely not x86, so that one can definitely be ruled out
(unless Manu was messing with us to keep it super-secret ;) ).

As far as I'm concerned, the whole "Quantum Break uses D" thing *is*
technically a rumor, and I think it's probably best to keep it framed that
way out of respect for Manu and his employer. But it's a very convincing
rumor that I do believe.
May 23 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, May 23, 2013 21:37:26 Kiith-Sa wrote:
 It'd be good to have an easy-to-use way to manually allocate
 classes/structs in Phobos (higher-level than emplace, something
 close in usability to C++ new/delete), preferably with a way to
 override the allocation mechanism (I assume the fabled
 "allocators" have something to do with this? Maybe we'll get them
 once DNF is released... ... ...)

Presumably, we'll get that with custom allocators. So, it's probably just a
question of how long it'll take to sort those out.

- Jonathan M Davis
May 23 2013
Manu <turkeyman gmail.com> writes:
On 24 May 2013 05:37, Kiith-Sa <kiithsacmp gmail.com> wrote:

 On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:

 [... full quote of the original post snipped ...]
 Without official confirmation, I think it's rather early to assume D's
 being used in Quantum Break. D might compile on the new consoles, but
 what about druntime/phobos/etc.?

 That said, I support this idea. When I get time I'll try looking at
 Phobos to see if there is some low-hanging fruit with regards to GC usage
 and submit pull requests (I didn't make any non-doc contribution to
 Phobos yet, but I have a general idea of how its source looks).

 I also think that many people overreact about the GC too much. @nogc is
 certainly a good idea, but I think strategically using malloc,
 disabling/reenabling the GC, using GC.free and even just using standard
 GC features *while taking care to avoid unnecessary allocations* is
 vastly better than outright removing the GC.
Just to be clear, while I've heard many have, I've NEVER argued for removing
the GC. I think that's a hallmark of a modern language. I want to use the
GC in games, but it needs to have performance characteristics that are
applicable to realtime and embedded use.
Those are:
1. Can't stop the world.
2. Needs tight controls, enable/disable, and the allocators interface so
alternative memory sources can be used in many places.
3. Needs to (somehow) run incrementally. I'm happy to budget a few hundred
µs per frame, but not a millisecond every 10 frames, or 1 second every 1000.
    It can have 1-2% of overall frame time each frame, but it can't have
10-100% of random frames here and there. This results in framerate spikes.

The GC itself can be much less efficient than the existing GC if it wants;
it's only important that it can be halted at fine-grained intervals, and
that it will eventually complete its collect cycle over the long-term.
I know that an incremental GC like this is very complex, but I've never
heard of any real experiments, so maybe it's not impossible?

 It'd be good to have an easy-to-use way to manually allocate
 classes/structs in Phobos (higher-level than emplace, something close in
 usability to C++ new/delete), preferably with a way to override the
 allocation mechanism (I assume the fabled "allocators" have something to
 do with this? Maybe we'll get them once DNF is released... ... ...)
May 23 2013
Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 05/24/2013 01:34 AM, Manu wrote:
 Just to be clear, while I've heard many have, I've NEVER argued for
 removing the GC. I think that's a hallmark of a modern language. I want
 to use the GC in games, but it needs to have performance characteristics
 that are applicable to realtime and embedded use.
 Those are:
 1. Can't stop the world.
 2. Needs tight controls, enable/disable, and the allocators interface so
 alternative memory sources can be used in many places.
 3. Needs to (somehow) run incrementally. I'm happy to budget a few
 hundred µs per frame, but not a millisecond every 10 frames, or 1 second
 every 1000.
     It can have 1-2% of overall frame time each frame, but it can't have
 10-100% of random frames here and there. This results in framerate
 spikes.

 The GC itself can be much less efficient than the existing GC if it
 wants; it's only important that it can be halted at fine-grained
 intervals, and that it will eventually complete its collect cycle over
 the long-term.
 I know that an incremental GC like this is very complex, but I've never
 heard of any real experiments, so maybe it's not impossible?

Maybe someone else can point to an example, but I can't think of any
language prior to D that has both the precision and speed to be useful for
games and embedded programming, and that also has GC built in. So it seems
to me that this might well be an entirely new problem, as no other GC
language or library has had the motivation to create something that
satisfies these use parameters.

This also seems to suggest that an ideal solution might be to have several
different GC strategies, the choice of which could be made at compile time
depending on what's most suitable for the application in question.
May 23 2013
Jacob Carlborg <doob me.com> writes:
On 2013-05-24 01:51, Joseph Rushton Wakeling wrote:

 This also seems to suggest that an ideal solution might be to have several
 different GC strategies, the choice of which could be made at compile time
 depending on what's most suitable for the application in question.
You can already swap the GC implementation at link time.

-- 
/Jacob Carlborg
May 24 2013
next sibling parent "Dicebot" <m.strashun gmail.com> writes:
On Friday, 24 May 2013 at 08:01:35 UTC, Jacob Carlborg wrote:
 On 2013-05-24 01:51, Joseph Rushton Wakeling wrote:

 This also seems to suggest that an ideal solution might be to 
 have several
 different GC strategies, the choice of which could be made at 
 compile time
 depending on what's most suitable for the application in 
 question.
You can already swap the GC implementation at link time.
Yep, exactly. A hard-wired GC is not the problem. The lack of alternative GCs is the problem. The lack of tools to reliably control and avoid GC calls at all is the problem.
May 24 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Friday, 24 May 2013 at 08:01:35 UTC, Jacob Carlborg wrote:
 On 2013-05-24 01:51, Joseph Rushton Wakeling wrote:

 This also seems to suggest that an ideal solution might be to 
 have several
 different GC strategies, the choice of which could be made at 
 compile time
 depending on what's most suitable for the application in 
 question.
You can already swap the GC implementation at link time.
Granted, only if the GC fits in the model. Which means no barriers, for instance.
May 24 2013
Manu <turkeyman gmail.com> writes:
On 24 May 2013 18:01, Jacob Carlborg <doob me.com> wrote:

 On 2013-05-24 01:51, Joseph Rushton Wakeling wrote:

  This also seems to suggest that an ideal solution might be to have several
 different GC strategies, the choice of which could be made at compile time
 depending on what's most suitable for the application in question.
 You can already swap the GC implementation at link time.

Sure, but there's not an established suite of options to choose from.
How do I select the incremental GC option? :)
May 24 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, May 24, 2013 at 09:34:41AM +1000, Manu wrote:
[...]
 Just to be clear, while I've heard many have, I've NEVER argued for
 removing the GC. I think that's a hallmark of a modern language. I
 want to use the GC in games, but it needs to have performance
 characteristics that are applicable to realtime and embedded use.
 Those are:
 1. Can't stop the world.
 2. Needs tight controls, enable/disable, and the allocators interface
 so alternative memory sources can be used in mane places.
 3. Needs to (somehow) run incrementally. I'm happy to budget a few
 hundred µs per frame, but not a millisecond every 10 frames, or 1
 second every 1000.
     It can have 1-2% of overall frame time each frame, but it can't
 have 10-100% of random frames here and there. This results in
 framerate spikes.
Makes sense, so basically the GC should not cause jittery framerates, but should distribute its workload across frames so that the framerate is more-or-less constant?
 The GC its self can be much less efficient than the existing GC if it
 want's, it's only important that it can be halted at fine grained
 intervals, and that it will eventually complete its collect cycle over
 the long-term.

 I know that an incremental GC like this is very complex, but I've
 never heard of any real experiments, so maybe it's not impossible?
Is there a hard upper limit to how much time the GC can take per frame? Is
it acceptable to use, say, a millisecond every frame as long as it's
*every* frame and not every 10 frames (which causes jitter)?

For me, I'm also interested in incremental GCs -- for time-sensitive
applications (even if it's just soft realtime, not hard), long
stop-the-world pauses are really disruptive. I'd rather have the option of
a somewhat larger memory footprint and a less efficient GC (in terms of
rate of memory reclamation) if it can be incremental, rather than a very
efficient GC that introduces big pauses every now and then. I'm even
willing to settle for lower framerates if it means I don't have to deal
with framerate spikes that makes the result jittery and unpleasant.

T

-- 
"I suspect the best way to deal with procrastination is to put off the
procrastination itself until later. I've been meaning to try this, but
haven't gotten around to it yet." -- swr
May 24 2013
Manu <turkeyman gmail.com> writes:
On 25 May 2013 00:58, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:

 On Fri, May 24, 2013 at 09:34:41AM +1000, Manu wrote:
 [...]
 Just to be clear, while I've heard many have, I've NEVER argued for
 removing the GC. I think that's a hallmark of a modern language. I
 want to use the GC in games, but it needs to have performance
 characteristics that are applicable to realtime and embedded use.
 Those are:
 1. Can't stop the world.
 2. Needs tight controls, enable/disable, and the allocators interface
 so alternative memory sources can be used in many places.
 3. Needs to (somehow) run incrementally. I'm happy to budget a few
 hundred µs per frame, but not a millisecond every 10 frames, or 1
 second every 1000.
     It can have 1-2% of overall frame time each frame, but it can't
 have 10-100% of random frames here and there. This results in
 framerate spikes.
 Makes sense, so basically the GC should not cause jittery framerates,
 but should distribute its workload across frames so that the framerate
 is more-or-less constant?

Precisely.

 The GC itself can be much less efficient than the existing GC if it
 wants; it's only important that it can be halted at fine-grained
 intervals, and that it will eventually complete its collect cycle over
 the long-term.

 I know that an incremental GC like this is very complex, but I've
 never heard of any real experiments, so maybe it's not impossible?

 Is there a hard upper limit to how much time the GC can take per frame?
 Is it acceptable to use, say, a millisecond every frame as long as it's
 *every* frame and not every 10 frames (which causes jitter)?

Errr, well, 1ms is about 7% of the frame; that's quite a long time.
I'd be feeling pretty uneasy about any library that claimed to want 7% of
the whole game time, and didn't offer any visual/gameplay benefits...
Maybe if the GC happened to render some sweet water effects, or perform
some awesome cloth physics or something while it was at it ;)
I'd say 7% is too much for many developers.

I think a 2% sacrifice for simplifying memory management would probably
get through without much argument.
That's ~300µs... a few hundred microseconds seems reasonable. Maybe a
little more if targeting 30fps.
If it stuck to that strictly, I'd possibly even grant it permission to
stop the world...

 For me, I'm also interested in incremental GCs -- for time-sensitive
 applications (even if it's just soft realtime, not hard), long
 stop-the-world pauses are really disruptive. I'd rather have the option
 of a somewhat larger memory footprint and a less efficient GC (in terms
 of rate of memory reclamation) if it can be incremental, rather than a
 very efficient GC that introduces big pauses every now and then. I'm
 even willing to settle for lower framerates if it means I don't have to
 deal with framerate spikes that makes the result jittery and unpleasant.

One important detail to consider for realtime usage is that it's very
unconventional to allocate at runtime at all...
Perhaps a couple of short-lived temp buffers each frame, and the occasional
change in resources as you progress through a world (which are probably not
allocated in GC memory anyway).
Surely the relatively high temporal consistency of the heap across cycles
can be leveraged here somehow to help?
May 24 2013
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Friday, 24 May 2013 at 15:17:00 UTC, Manu wrote:
 Errr, well, 1ms is about 7% of the frame, that's quite a long 
 time.
 I'd be feeling pretty uneasy about any library that claimed to 
 want 7% of
 the whole game time, and didn't offer any visual/gameplay 
 benefits...
 Maybe if the GC happened to render some sweet water effects, or 
 perform
 some awesome cloth physics or something while it was at it ;)
 I'd say 7% is too much for many developers.

 I think 2% sacrifice for simplifying memory management would 
 probably get
 through without much argument.
 That's ~300µs... a few hundred microseconds seems reasonable. 
 Maybe a
 little more if targeting 30fps.
 If it stuck to that strictly, I'd possibly even grant it 
 permission to stop
 the world...
That is kind of biased, as you'll generally win on other aspects. You don't
free anymore, you don't need to count references (which can become quite
costly in multithreaded code), etc...

Generally, I think what is needed for games is a concurrent GC. This incurs
a memory usage overhead (floating garbage), and a tax on pointer writes,
but eliminates pauses.

That is an easy way to export a part of the load in another thread,
improving concurrency in the application with little effort.

With real time constraints, a memory overhead is better than a pause.
 One important detail to consider for realtime usage, is that 
 it's very
 unconventional to allocate at runtime at all...
 Perhaps a couple of short lived temp buffers each frame, and 
 the occasional
 change in resources as you progress through a world (which are 
 probably not
 allocated in GC memory anyway).
 Surely the relatively high temporal consistency of the heap 
 across cycles
 can be leveraged here somehow to help?
That is good because it means not a lot of floating garbage.
May 24 2013
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, May 24, 2013 at 07:55:44PM +0200, deadalnix wrote:
 On Friday, 24 May 2013 at 15:17:00 UTC, Manu wrote:
Errr, well, 1ms is about 7% of the frame, that's quite a long time.
I'd be feeling pretty uneasy about any library that claimed to want
7% of the whole game time, and didn't offer any visual/gameplay
benefits...  Maybe if the GC happened to render some sweet water
effects, or perform some awesome cloth physics or something while it
was at it ;) I'd say 7% is too much for many developers.
OK.
I think 2% sacrifice for simplifying memory management would probably
 get through without much argument.  That's ~300µs... a few hundred
microseconds seems reasonable.  Maybe a little more if targeting
30fps.  If it stuck to that strictly, I'd possibly even grant it
permission to stop the world...
Makes sense. So basically some kind of incremental algorithm is in order.
 That is kind of biased, as you'll generally win on other aspects.
 You don't free anymore, you don't need to count references (which can
 become quite costly in multithreaded code), etc...
 
 Generally, I think what is needed for games is a concurrent GC. This
 incurs a memory usage overhead (floating garbage), and a tax on
 pointer writes, but eliminates pauses.
 
 That is an easy way to export a part of the load in another thread,
 improving concurrency in the application with little effort.
Wouldn't that require compiler support? Unless you're willing to forego nice slicing syntax and use custom types for all references / pointers.
 With real time constraint, a memory overhead is better than a pause.
 
One important detail to consider for realtime usage, is that it's
very unconventional to allocate at runtime at all...  Perhaps a
couple of short lived temp buffers each frame, and the occasional
change in resources as you progress through a world (which are
probably not allocated in GC memory anyway).  Surely the relatively
high temporal consistency of the heap across cycles can be leveraged
here somehow to help?
That is good because it means not a lot of floating garbage.
Isn't the usual solution here to use a memory pool that gets deallocated in
one shot at the end of the cycle? So during a frame, you'd create a pool,
allocate all short-lived objects on it, and at the end free the entire pool
in one shot (which could just be a no-op if you recycle the pool memory for
the temp objects in the next frame). Long-lived objects, of course, will
have to live in the heap, and since they usually aren't in GC memory
anyway, it wouldn't matter.

A naive, hackish implementation might be a function to reset all GC memory
to a clean slate. So basically, you treat the entire GC memory as your
pool, and you allocate at will during a single frame; then at the end of
the frame, you reset the GC, which is equivalent to collecting every object
from GC memory except it can probably be done much faster than a real
collection cycle. Anything that needs to live past a single frame will have
to be allocated via malloc/free. So this way, you don't need any collection
cycle at all.

Of course, this may interact badly with certain language constructs: if any
reference to GC objects lingers past a frame, you may break language
guarantees (e.g. immutable array gets reused, violating immutability when
you dereference the stale array pointer in the next frame). But if the
per-frame code has no escaping GC references, this problem won't occur.
Maybe if the per-frame code is marked pure? It doesn't work if you need to
malloc/free, though (as those are inherently impure -- the pointers need to
survive past the current frame). Can UDAs be used somehow to enforce no
escaping GC references but allow non-GC references to persist past the
frame?

T

-- 
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
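Sketching Teoh's per-frame pool usage with the bump-allocator style shown earlier in the thread (FrameHeap is the invented type from that sketch, not a real library):

// Per-frame pool usage: all short-lived allocations die together at the
// end of the cycle; only the reset of one index is paid, no scanning.
void runFrames(ref FrameHeap pool, int frames)
{
    foreach (_; 0 .. frames)
    {
        scope(exit) pool.reset(); // frees the whole frame's garbage in one shot

        auto scratch = cast(float[]) pool.alloc(1024 * float.sizeof);
        // ... fill and use scratch within this frame only ...

        // Keeping a reference to 'scratch' beyond this iteration would be
        // exactly the stale-reference hazard described above.
    }
}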
May 24 2013
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:
On 25 May 2013 03:55, deadalnix <deadalnix gmail.com> wrote:

 On Friday, 24 May 2013 at 15:17:00 UTC, Manu wrote:

 Errr, well, 1ms is about 7% of the frame, that's quite a long time.
 I'd be feeling pretty uneasy about any library that claimed to want 7% of
 the whole game time, and didn't offer any visual/gameplay benefits...
 Maybe if the GC happened to render some sweet water effects, or perform
 some awesome cloth physics or something while it was at it ;)
 I'd say 7% is too much for many developers.

 I think a 2% sacrifice for simplifying memory management would probably
 get through without much argument.
 That's ~300µs... a few hundred microseconds seems reasonable. Maybe a
 little more if targeting 30fps.
 If it stuck to that strictly, I'd possibly even grant it permission to
 stop the world...

 That is kind of biased, as you'll generally win on other aspects. You
 don't free anymore, you don't need to count references (which can become
 quite costly in multithreaded code), etc...

Freeing is a no-realtime-cost operation, since memory management is usually
scheduled for between-scenes, or passed to other threads.
And I've never heard of a major title that uses smart pointers, and assigns
them around the place at runtime.
I'm accustomed to memory management having a virtually zero cost at runtime.
So I don't think it's biased at all (in the sense you say), I think I'm
being quite reasonable.

 Generally, I think what is needed for games is a concurrent GC. This
 incurs a memory usage overhead (floating garbage), and a tax on pointer
 writes, but eliminates pauses.

How much floating garbage? This might be acceptable... I don't know enough
about it.

 That is an easy way to export a part of the load in another thread,
 improving concurrency in the application with little effort.

Are you saying a concurrent GC would operate exclusively in another thread?
How does it scan the stack of all other threads?

 With real time constraints, a memory overhead is better than a pause.

I wouldn't necessarily agree. Depends on the magnitude of each.
What sort of magnitude are we talking?
If you had 64mb of ram, and no virtual memory, would you be happy to
sacrifice 20% of it? 5% of it?

 One important detail to consider for realtime usage, is that it's very
 unconventional to allocate at runtime at all...
 Perhaps a couple of short lived temp buffers each frame, and the
 occasional change in resources as you progress through a world (which are
 probably not allocated in GC memory anyway).
 Surely the relatively high temporal consistency of the heap across cycles
 can be leveraged here somehow to help?

 That is good because it means not a lot of floating garbage.

Right. But what's the overhead of a scan process (that's almost entirely
redundant work)?
May 24 2013
parent "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 25 May 2013 at 01:26:19 UTC, Manu wrote:
 Freeing is a no-realtime-cost operation, since memory 
 management is usually
 scheduled for between-scenes, or passed to other threads.
 And I've never heard of a major title that uses smart pointers, 
 and assigns
 them around the place at runtime.
 I'm accustomed to memory management having a virtually zero 
 cost at runtime.
 So I don't think it's biased at all (in the sense you say), I 
 think I'm
 being quite reasonable.
Same goes for the GC: if you don't allocate, it won't trigger.
 How much floating garbage? This might be acceptable... I don't 
 know enough
 about it.
It's about how much garbage you produce while the GC is collecting. That won't be collected before the next cycle. You say you don't generate a lot of garbage, so the cost should be pretty low.
 That is an easy way to export a part of the load in another thread,
 improving concurrency in the application with little effort.
Are you saying a concurrent GC would operate exclusively in another thread? How does it scan the stack of all other threads? With real time constraint, a memory overhead is better than a pause.
Yes, it implies a pause to scan stacks/registers, but then the thread can live its life and the heap gets scanned/collected. You never need to stop the world.
 I wouldn't necessarily agree. Depends on the magnitude of each.
 What sort of magnitude are we talking?
 If you had 64mb of ram, and no virtual memory, would you be 
 happy to
 sacrifice 20% of it? 5% of it?
There are many different variations here, each with pros and cons. Hard to give hard numbers. In non-VM code, you have basically 2 choices:
- A tax on every pointer write: check a flag to know whether some operation is needed; if the flag is true, you mark the old value as a root for the GC.
- Pay only while collecting, using page protection (seems like a better option for you, as you'll not be collecting that much). The cost is way higher when collecting, but it is free when you aren't.
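A rough sketch of the first option in D -- the flag, the helper, and the collector hook are all hypothetical:

    __gshared bool collecting;        // hypothetical: set by the GC during a cycle

    void markAsRoot(void* oldTarget)  // hypothetical hook into the collector
    {
        // record oldTarget so the in-flight collection keeps it alive
    }

    // The per-write tax is one flag test; real work happens only mid-collection.
    void writePointer(T)(ref T* slot, T* newValue)
    {
        if (collecting)
            markAsRoot(slot);         // slot still holds the old value here
        slot = newValue;
    }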
 Right. But what's the overhead of a scan process (that's almost 
 entirely
 redundant work)?
Roughly proportional to the live set of objects you have. It is triggered when your heap grows past a certain limit.
May 24 2013
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 25 May 2013 05:05, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:

 On Fri, May 24, 2013 at 07:55:44PM +0200, deadalnix wrote:
 On Friday, 24 May 2013 at 15:17:00 UTC, Manu wrote:
One important detail to consider for realtime usage, is that it's
very unconventional to allocate at runtime at all...  Perhaps a
couple of short lived temp buffers each frame, and the occasional
change in resources as you progress through a world (which are
probably not allocated in GC memory anyway).  Surely the relatively
high temporal consistency of the heap across cycles can be leveraged
here somehow to help?
That is good because it means not a lot of floating garbage.
Isn't the usual solution here to use a memory pool that gets deallocated in one shot at the end of the cycle? So during a frame, you'd create a pool, allocate all short-lived objects on it, and at the end free the entire pool in one shot (which could just be a no-op if you recycle the pool memory for the temp objects in the next frame). Long-lived objects, of course, will have to live in the heap, and since they usually aren't in GC memory anyway, it wouldn't matter.
This totally depends on the task. Almost every task will have its own solution. I think there are 3 common approaches though:

1. Just don't allocate. Seriously, you don't need dynamic memory anywhere near as much as you think you do. Get creative!

2. Use a pool like you say.

3. Use a scratch buffer of some sort. Allocate from this buffer linearly, and wipe it clean each frame. Similar to a pool but supporting irregularly sized allocations.

 A naïve, hackish implementation might be a function to reset all GC
 memory to a clean slate. So basically, you treat the entire GC memory as
 your pool, and you allocate at will during a single frame; then at the
 end of the frame, you reset the GC, which is equivalent to collecting
 every object from GC memory except it can probably be done much faster
 than a real collection cycle. Anything that needs to live past a single
 frame will have to be allocated via malloc/free. So this way, you don't
 need any collection cycle at all.
Problem with implementing that pattern in the GC is that it's global now. You can no longer choose the solution per problem. How do you allocate something with a long life? malloc? What do non-realtime threads do?

 Of course, this may interact badly with certain language constructs: if
 any reference to GC objects lingers past a frame, you may break language
 guarantees (e.g. immutable array gets reused, violating immutability
 when you dereference the stale array pointer in the next frame). But if
 the per-frame code has no escaping GC references, this problem won't
 occur. Maybe if the per-frame code is marked pure? It doesn't work if
 you need to malloc/free, though (as those are inherently impure -- the
 pointers need to survive past the current frame). Can UDAs be used
 somehow to enforce no escaping GC references but allow non-GC references
 to persist past the frame?


 T

 --
 People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
May 24 2013
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 25 May 2013 11:26, Manu <turkeyman gmail.com> wrote:

 On 25 May 2013 03:55, deadalnix <deadalnix gmail.com> wrote:

 With real time constraint, a memory overhead is better than a pause.
I wouldn't necessarily agree. Depends on the magnitude of each. What sort of magnitude are we talking? If you had 64mb of ram, and no virtual memory, would you be happy to sacrifice 20% of it? 5% of it?
Actually, I don't think I've made this point clearly before, but it is of critical importance. The single biggest threat when considering unexpected memory allocation, a la that in phobos, is NOT performance, it is non-determinism. Granted, this is the biggest problem with using a GC on embedded hardware in general.

So let's say I need to keep some free memory overhead, so that I don't run out of memory when a collect hasn't happened recently... How much overhead do I need? I can't afford much/any, so precisely how much do I need? Understand, I have no virtual-memory manager, it won't page, it's not a performance problem, it will just crash if I mis-calculate this value. And does the amount of overhead required change throughout development? How often do I need to re-calibrate?

What about memory fragmentation? Functions that perform many small short-lived allocations have a tendency to fragment the heap. This is probably the most critical reason why phobos functions can't allocate internally.

General realtime code may have some small flexibility, but embedded use has hard limits. So we need to know where allocations are coming from for reasons of determinism. We need to be able to tightly control these factors to make confident use of a GC.

The more I think about it, the more I wonder if ref-counting is just better for strictly embedded use across the board...? Does D actually have a ref-counted GC? Surely it wouldn't be particularly hard? Requires compiler support though I suppose.
May 24 2013
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 25 May 2013 at 01:56:42 UTC, Manu wrote:
 Understand, I have no virtual-memory manager, it won't page, 
 it's not a
 performance problem, it will just crash if I mis-calculate this 
 value.
So the GC is kind of out.
May 24 2013
parent reply Manu <turkeyman gmail.com> writes:
On 25 May 2013 15:00, deadalnix <deadalnix gmail.com> wrote:

 On Saturday, 25 May 2013 at 01:56:42 UTC, Manu wrote:

 Understand, I have no virtual-memory manager, it won't page, it's not a
 performance problem, it will just crash if I mis-calculate this value.
So the GC is kind of out.
Yeah, I'm wondering if that's just a basic truth for embedded. Can D implement a ref-counting GC? That would probably still be okay, since collection is immediate. Modern consoles and portables have plenty of memory; can use a GC, but simpler/embedded platforms probably just can't. An alternative solution still needs to be offered for that sort of hardware.
May 24 2013
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 25 May 2013 at 05:18:12 UTC, Manu wrote:
 On 25 May 2013 15:00, deadalnix <deadalnix gmail.com> wrote:

 On Saturday, 25 May 2013 at 01:56:42 UTC, Manu wrote:

 Understand, I have no virtual-memory manager, it won't page, 
 it's not a
 performance problem, it will just crash if I mis-calculate 
 this value.
So the GC is kind of out.
Yeah, I'm wondering if that's just a basic truth for embedded. Can D implement a ref-counting GC? That would probably still be okay, since collection is immediate.
This is technically possible, but you said you make few allocations. So with the tax on pointer writes or the reference counting, you'll pay a lot to collect very little garbage. I'm not sure the tradeoff is worthwhile.

Paradoxically, when you create little garbage, GCs are really good, as they don't need to trigger often. But if you need to add a tax on each reference write/copy, you'll probably pay more tax than you get out of it.
 Modern consoles and portables have plenty of memory; can use a 
 GC, but
 simpler/embedded platforms probably just can't. An alternative 
 solution
 still needs to be offered for that sort of hardware.
May 24 2013
next sibling parent reply Manu <turkeyman gmail.com> writes:
On 25 May 2013 15:29, deadalnix <deadalnix gmail.com> wrote:

 On Saturday, 25 May 2013 at 05:18:12 UTC, Manu wrote:

 On 25 May 2013 15:00, deadalnix <deadalnix gmail.com> wrote:

  On Saturday, 25 May 2013 at 01:56:42 UTC, Manu wrote:
  Understand, I have no virtual-memory manager, it won't page, it's not a
 performance problem, it will just crash if I mis-calculate this value.
So the GC is kind of out.
Yeah, I'm wondering if that's just a basic truth for embedded. Can D implement a ref-counting GC? That would probably still be okay, since collection is immediate.
This is technically possible, but you said you make few allocations. So with the tax on pointer writes or the reference counting, you'll pay a lot to collect very little garbage. I'm not sure the tradeoff is worthwhile.
But it would be deterministic, and if the allocations are few, the cost
should be negligible.

 Paradoxically, when you create little garbage, GCs are really good, as they
 don't need to trigger often. But if you need to add a tax on each reference
 write/copy, you'll probably pay more tax than you get out of it.
They're still non-deterministic though. And unless (even if?) they're precise, they might leak. What does ObjC do? It seems to work okay on embedded hardware (although not particularly memory-constrained hardware). Didn't ObjC recently reject GC in favour of refcounting?
May 24 2013
next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On 25.05.2013 07:52, Manu wrote:
 On 25 May 2013 15:29, deadalnix <deadalnix gmail.com> wrote:

     On Saturday, 25 May 2013 at 05:18:12 UTC, Manu wrote:

          On 25 May 2013 15:00, deadalnix <deadalnix gmail.com> wrote:

             On Saturday, 25 May 2013 at 01:56:42 UTC, Manu wrote:

                 Understand, I have no virtual-memory manager, it won't
                 page, it's not a
                 performance problem, it will just crash if I
                 mis-calculate this value.


             So the GC is kind of out.


         Yeah, I'm wondering if that's just a basic truth for embedded.
         Can D implement a ref-counting GC? That would probably still be
         okay, since
         collection is immediate.


     This is technically possible, but you said you make few allocations.
     So with the tax on pointer write or the reference counting, you'll
     pay a lot to collect very few garbages. I'm not sure the tradeoff is
     worthwhile.


 But it would be deterministic, and if the allocations are few, the cost
 should be negligible.


     Paradoxically, when you create few garbage, GC are really goos as
     they don't need to trigger often. But if you need to add a tax on
     each reference write/copy, you'll probably pay more tax than you get
     out of it.


 They're still non-deterministic though. And unless (even if?) they're
 precise, they might leak.

 What does ObjC do? It seems to work okay on embedded hardware (although
 not particularly memory-constrained hardware).
 Didn't ObjC recently reject GC in favour of refcounting?
Yes, but it was mainly for not being able to have a stable working GC able to cope with the Objective-C code available in the wild. It had quite a few issues.

Objective-C reference counting requires compiler and runtime support. Basically it is based on how Cocoa does reference counting, but instead of requiring the developers to manually write the [retain], [release] and [autorelease] messages, the compiler is able to infer them based on Cocoa memory access patterns. Additionally it makes use of dataflow analysis to remove superfluous use of those calls.

There is a WWDC talk on iTunes where they explain that. I can look for it if there is interest.

Microsoft did the same thing with their C++/CX language extensions and COM for WinRT.

--
Paulo
May 25 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 25 May 2013 at 05:52:23 UTC, Manu wrote:
 But it would be deterministic, and if the allocations are few, 
 the cost
 should be negligible.
You'll pay a tax on pointer writes, not on allocations! It won't be negligible!
 They're still non-deterministic though. And unless (even if?) 
 they're
 precise, they might leak.
Not if they are precise. But this is another topic.
 What does ObjC do? It seems to work okay on embedded hardware 
 (although not
 particularly memory-constrained hardware).
 Didn't ObjC recently reject GC in favour of refcounting?
ObjC is a horrible three-headed monster in that regard, and I don't think this is the way to go.
May 25 2013
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 25 May 2013 01:52:10 -0400, Manu <turkeyman gmail.com> wrote:

 What does ObjC do? It seems to work okay on embedded hardware (although not
 particularly memory-constrained hardware).
 Didn't ObjC recently reject GC in favour of refcounting?
Having used ObjC for the last year or so working on iOS, it is a very nice memory management model.

Essentially, all objects (and only objects) are ref-counted automatically by the compiler. In code, whenever you assign or pass a pointer to an object, the compiler automatically inserts retains and releases extremely conservatively. Then, the optimizer comes along and factors out extra retains and releases, if it can prove they are unnecessary.

What I really like about this is, unlike a library-based solution where every assignment to a 'smart pointer' incurs a release/retain, the compiler knows what this means and will factor them out, removing almost all of them. It's as if you inserted the retains and releases in the most optimized way possible, and it's all for free. Also, I believe the compiler is then free to reorder retains and releases since it understands how they work. Of course, a retain/release is an atomic operation, and requires memory barriers, so the CPU/cache cannot reorder, but the compiler still can.

I asked David Nadlinger at the conference whether we could leverage this power in LDC, since LLVM is the compiler back-end used by Apple, but he said all those optimization passes are in the Objective-C front-end.

It would be cool/useful to have compiler-native reference counting. The only issue is, Objective-C is quite object-heavy, and it's statically checkable whether a pointer is an Object pointer or not. In D, you would have to conservatively use retains/releases on every pointer, since any memory block could be ref-counted. But just like Objective-C most of them could be factored out. Add in that D has the shared-ness of the pointer built into the type system, and you may have something that is extremely effective.

-Steve
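For contrast, a bare-bones sketch of the library approach in D -- every copy and destruction pays a count update, and there is no compiler pass to cancel the matching pairs (illustrative only, not Phobos code; single-threaded, so no atomics):

    import core.stdc.stdlib : malloc, free;

    struct RefCounted(T)
    {
        private T* payload;
        private size_t* count;

        this(T* p)
        {
            payload = p;
            count = cast(size_t*) malloc(size_t.sizeof);
            *count = 1;
        }

        this(this)      // postblit: the "retain" paid on every copy
        {
            if (count) ++*count;
        }

        ~this()         // the "release" paid on every destruction
        {
            if (count && --*count == 0)
            {
                free(count);
                // freeing/destroying the payload would go here
            }
        }
    }

An ARC-style compiler emits the same increments and decrements, but because it understands them it can delete the redundant pairs; a library type like this pays all of them.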
May 28 2013
next sibling parent reply "David Nadlinger" <see klickverbot.at> writes:
On Tuesday, 28 May 2013 at 13:33:39 UTC, Steven Schveighoffer 
wrote:
 I asked David Nadlinger at the conference whether we could 
 leverage this power in LDC, since LLVM is the compiler back-end 
 used by Apple, but he said all those optimization passes are in 
 the Objective-C front-end.
Hm, apparently I was imprecise or I slightly misunderstood your question: The actual optimizations _are_ done in LLVM, and are part of its source tree (see lib/Transforms/ObjCARC). What I meant to say is that they are tied to the ObjC runtime function calls emitted by Clang – there is no notion of a "reference counted pointer" on the LLVM level.

Thus, we could definitely base a similar implementation for D on this, which would recognize D runtime calls (potentially accompanied by D-specific LLVM metadata) instead of Objective-C ones. It's just that there would be quite a bit of adjusting involved, as the ObjC ARC implementation isn't designed to be language-agnostic.

David
May 28 2013
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 May 2013 09:50:42 -0400, David Nadlinger <see klickverbot.at>  
wrote:

 On Tuesday, 28 May 2013 at 13:33:39 UTC, Steven Schveighoffer wrote:
 I asked David Nadlinger at the conference whether we could leverage  
 this power in LDC, since LLVM is the compiler back-end used by Apple,  
 but he said all those optimization passes are in the Objective-C  
 front-end.
Hm, apparently I was imprecise or I slightly misunderstood your question:
More like I am compiler-ignorant and didn't understand/properly remember your answer :) Thanks for clarifying. -Steve
May 28 2013
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:
On 28 May 2013 23:33, Steven Schveighoffer <schveiguy yahoo.com> wrote:

 On Sat, 25 May 2013 01:52:10 -0400, Manu <turkeyman gmail.com> wrote:

  What does ObjC do? It seems to work okay on embedded hardware (although
 not
 particularly memory-constrained hardware).
 Didn't ObjC recently reject GC in favour of refcounting?
Having used ObjC for the last year or so working on iOS, it is a very nice memory management model. Essentially, all objects (and only objects) are ref-counted automatically by the compiler. In code, whenever you assign or pass a pointer to an object, the compiler automatically inserts retains and releases extremely conservatively. Then, the optimizer comes along and factors out extra retains and releases, if it can prove they are necessary. What I really like about this is, unlike a library-based solution where every assignment to a 'smart pointer' incurs a release/retain, the compiler knows what this means and will factor them out, removing almost all of them. It's as if you inserted the retains and releases in the most optimized way possible, and it's all for free. Also, I believe the compiler is then free to reorder retains and releases since it understands how they work. Of course, a retain/release is an atomic operation, and requires memory barriers, so the CPU/cache cannot reorder, but the compiler still can.
Right. This is almost precisely how I imagined it would be working. I wonder what it would take to have this as a GC strategy in D? I'm more and more thinking this would be the best approach for realtime software. It's deterministic, and while being safe like a GC, the programmer retains absolute control. Also, things are destroyed when you expect (again, the deterministic thing). I think this GC strategy will open D for use on much more embedded hardware. I asked David Nadlinger at the conference whether we could leverage this
 power in LDC, since LLVM is the compiler back-end used by Apple, but he
 said all those optimization passes are in the Objective-C front-end.
Yeah, this would require D front-end work I'm sure. It would be cool/useful to have compiler-native reference counting. The
 only issue is, Objective-C is quite object-heavy, and it's statically
 checkable whether a pointer is an Object pointer or not.  In D, you would
 have to conservatively use retains/releases on every pointer, since any
 memory block could be ref-counted.  But just like Objective-C most of them
 could be factored out.  Add in that D has the shared-ness of the pointer
 built into the type system, and you may have something that is extremely
 effective.
Yep, I can imagine it would work really well, if the front-end implemented the logic to factor out redundant inc/dec ref's.
May 28 2013
parent reply "David Nadlinger" <see klickverbot.at> writes:
On Tuesday, 28 May 2013 at 13:56:03 UTC, Manu wrote:
 Yep, I can imagine it would work really well, if the front-end 
 implemented
 the logic to factor out redundant inc/dec ref's.
It isn't the best idea to do this sort of optimization (entirely) in the front-end, because you really want to be able to aggressively optimize away such redundant operations after inlining at what previously were function boundaries.

But then again, if you use AST-based inlining like DMD does, it might just work… ;)

David
May 28 2013
parent Manu <turkeyman gmail.com> writes:
On 29 May 2013 00:01, David Nadlinger <see klickverbot.at> wrote:

 On Tuesday, 28 May 2013 at 13:56:03 UTC, Manu wrote:

 Yep, I can imagine it would work really well, if the front-end implemented
 the logic to factor out redundant inc/dec ref's.

 It isn't the best idea to do this sort of optimization (entirely) in the
 front-end, because you really want to be able to aggressively optimize away
 such redundant operations after inlining at what previously were function
 boundaries.

 But then again, if you use AST-based inlining like DMD does, it might just
 work… ;)
Can you comment on the complexity of implementing this sort of garbage collection? (not really garbage collection, but serves the same purpose)
May 28 2013
prev sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On 28.05.2013 15:33, Steven Schveighoffer wrote:
 On Sat, 25 May 2013 01:52:10 -0400, Manu <turkeyman gmail.com> wrote:

 What does ObjC do? It seems to work okay on embedded hardware
 (although not
 particularly memory-constrained hardware).
 Didn't ObjC recently reject GC in favour of refcounting?
Having used ObjC for the last year or so working on iOS, it is a very nice memory management model. Essentially, all objects (and only objects) are ref-counted automatically by the compiler. In code, whenever you assign or pass a pointer to an object, the compiler automatically inserts retains and releases extremely conservatively. Then, the optimizer comes along and factors out extra retains and releases, if it can prove they are necessary. What I really like about this is, unlike a library-based solution where every assignment to a 'smart pointer' incurs a release/retain, the compiler knows what this means and will factor them out, removing almost all of them. It's as if you inserted the retains and releases in the most optimized way possible, and it's all for free. Also, I believe the compiler is then free to reorder retains and releases since it understands how they work. Of course, a retain/release is an atomic operation, and requires memory barriers, so the CPU/cache cannot reorder, but the compiler still can. ...
I imagine Microsoft also does a similar thing with their C++/CX language extensions (WinRT handles).
May 28 2013
parent reply Manu <turkeyman gmail.com> writes:
On 29 May 2013 03:27, Paulo Pinto <pjmlp progtools.org> wrote:

 On 28.05.2013 15:33, Steven Schveighoffer wrote:

 On Sat, 25 May 2013 01:52:10 -0400, Manu <turkeyman gmail.com> wrote:

  What does ObjC do? It seems to work okay on embedded hardware
 (although not
 particularly memory-constrained hardware).
 Didn't ObjC recently reject GC in favour of refcounting?
Having used ObjC for the last year or so working on iOS, it is a very nice memory management model. Essentially, all objects (and only objects) are ref-counted automatically by the compiler. In code, whenever you assign or pass a pointer to an object, the compiler automatically inserts retains and releases extremely conservatively. Then, the optimizer comes along and factors out extra retains and releases, if it can prove they are necessary. What I really like about this is, unlike a library-based solution where every assignment to a 'smart pointer' incurs a release/retain, the compiler knows what this means and will factor them out, removing almost all of them. It's as if you inserted the retains and releases in the most optimized way possible, and it's all for free. Also, I believe the compiler is then free to reorder retains and releases since it understands how they work. Of course, a retain/release is an atomic operation, and requires memory barriers, so the CPU/cache cannot reorder, but the compiler still can. ...
I imagine Microsoft also does a similar thing with their C++/CX language extensions (WinRT handles).
Yeah certainly. It's ref counted, not garbage collected.

And Android's V8 uses a "generational incremental collector" (see
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Generational_GC_.28ephemeral_GC.29
and
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Stop-the-world_vs._incremental_vs._concurrent)...
That'd be nice!

ObjC and WinRT are both used successfully on embedded hardware, I'm really wondering if this is the way to go for embedded in D.
V8 uses an incremental collector (somehow?), which I've been saying is basically mandatory for embedded/realtime use. Apparently Google agree. Clearly others have already had this quarrel, their resolutions are worth consideration.
Implementing a ref-counted GC would probably be much simpler than V8's mythical incremental collector that probably relies on Java restrictions to operate?
May 28 2013
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 May 2013 20:40:03 -0400, Manu <turkeyman gmail.com> wrote:


 ObjC and WinRT are both used successfully on embedded hardware, I'm  
 really
 wondering if this is the way to go for embedded in D.
 V8 uses an incremental collector (somehow?), which I've been saying is
 basically mandatory for embedded/realtime use. Apparently Google agree.
 Clearly others have already had this quarrel, their resolutions are worth
 consideration.
An interesting thing to note, Apple tried garbage collection with Obj-C, but only on MacOS, and it's now been deprecated since automatic reference counting was introduced [1]. It never was on iOS. So that is a telling omission I think. -Steve [1] https://en.wikipedia.org/wiki/Objective-C#Garbage_collection
May 28 2013
next sibling parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 29 May 2013 at 00:46:18 UTC, Steven Schveighoffer 
wrote:
 On Tue, 28 May 2013 20:40:03 -0400, Manu <turkeyman gmail.com> 
 wrote:


 ObjC and WinRT are both used successfully on embedded 
 hardware, I'm really
 wondering if this is the way to go for embedded in D.
 V8 uses an incremental collector (somehow?), which I've been 
 saying is
 basically mandatory for embedded/realtime use. Apparently 
 Google agree.
 Clearly others have already had this quarrel, their 
 resolutions are worth
 consideration.
An interesting thing to note, Apple tried garbage collection with Obj-C, but only on MacOS, and it's now been deprecated since automatic reference counting was introduced [1]. It never was on iOS. So that is a telling omission I think. -Steve [1] https://en.wikipedia.org/wiki/Objective-C#Garbage_collection
The main reason was that the GC never worked properly given the C underpinnings of Objective-C.

Too many libraries failed to work properly with GC enabled, plus you needed to fill your code with GC-friendly annotations.

So I imagine Apple tried to find a compromise that would work better in a language with C "safety".

Even that is only supported at the Objective-C language level and it requires both compiler support and that objects inherit from NSObject as the top-most class, as far as I am aware.

Anyway it is way better than pure manual memory management.

--
Paulo
May 29 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-05-29 09:05, Paulo Pinto wrote:

 The main reason was that the GC never worked properly given the C
 underpinnings of Objective-C.

 Too many libraries failed to work properly with GC enabled, plus you
 needed to fill your code with GC friendly annotations.

 So I imagine Apple tried to find a compromises that would work better in
 a language with C "safety".

 Even that is only supported at the Objective-C language level and it
 requires both compiler support and that objects inherit from NSObject as
 top most class, as far as I am aware.

 Anyway it is way better than pure manual memory management.
I'm pretty it works for their CoreFoundation framework which is a C library. NSObject, NSString and other classes are built on top of CoreFoundation. -- /Jacob Carlborg
May 29 2013
parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-05-29 09:46:20 +0000, Jacob Carlborg <doob me.com> said:

 On 2013-05-29 09:05, Paulo Pinto wrote:
 
 The main reason was that the GC never worked properly given the C
 underpinnings of Objective-C.
 
 Too many libraries failed to work properly with GC enabled, plus you
 needed to fill your code with GC friendly annotations.
 
 So I imagine Apple tried to find a compromises that would work better in
 a language with C "safety".
 
 Even that is only supported at the Objective-C language level and it
 requires both compiler support and that objects inherit from NSObject as
 top most class, as far as I am aware.
 
 Anyway it is way better than pure manual memory management.
I'm pretty it works for their CoreFoundation framework which is a C library. NSObject, NSString and other classes are built on top of CoreFoundation.
It does for CF types which are toll-free bridged, if you mark them to be GC managed while casting. http://developer.apple.com/library/ios/#documentation/CoreFoundation/Conceptual/CFDesignConcepts/Articles/tollFreeBridgedTypes.html For instance, CFString and NSString are just different APIs for the same underlying object, so you can cast between them. But CoreFoundation itself won't use the GC if you don't involve Objective-C APIs. The interesting thing is that objects managed by the now deprecated Objective-C GC also have a reference count, and won't be candidate for garbage collection until the reference count reaches zero. You can use CFRetain/CFRelease on GC-managed Objective-C objects if you want, it's not a noop. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca/
May 29 2013
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 29.05.2013 02:46, Steven Schveighoffer wrote:
 On Tue, 28 May 2013 20:40:03 -0400, Manu <turkeyman gmail.com> wrote:


 ObjC and WinRT are both used successfully on embedded hardware, I'm
 really
 wondering if this is the way to go for embedded in D.
 V8 uses an incremental collector (somehow?), which I've been saying is
 basically mandatory for embedded/realtime use. Apparently Google agree.
 Clearly others have already had this quarrel, their resolutions are worth
 consideration.
An interesting thing to note, Apple tried garbage collection with Obj-C, but only on MacOS, and it's now been deprecated since automatic reference counting was introduced [1]. It never was on iOS. So that is a telling omission I think. -Steve [1] https://en.wikipedia.org/wiki/Objective-C#Garbage_collection
Please note that you have to deal with circular references manually in Objective-C, introducing two types of pointers, strong and weak. I don't think this is optimal. If you want to deal with circular references automatically you again need some other kind of other garbage collection running. A problem with the naive approach of atomic reference counting a counter inside the object (as usually done in COM interfaces, I don't know how it is done in Objective-C) is that it is not thread-safe to modify a pointer without locking (or a CAS2 operation that you don't have on popular processors). You can avoid that using deferred reference counting (logging pointer changes to some thread local buffer), but that introduces back a garbage collection step with possibly massive destruction. This step might be done concurrently, but that adds another layer of complexity to finding circles. Another issue might be that incrementing a reference of an object when taking an interior pointer (like you do when using slices) can be pretty expensive because you usually have to find the base of the object to access the counter. I won't dismiss RC garbage collection as impossible, but doing it efficiently and concurrently is not so easy.
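A rough sketch of that deferred scheme in D -- the Entry layout and the increment/decrement/reclaim hooks are all hypothetical:

    // Pointer writes only append to a thread-local log; counts are
    // reconciled later, off the hot path. (Non-shared statics are
    // thread-local by default in D.)
    size_t increment(void* p) { /* hypothetical: return ++count(p) */ return 1; }
    size_t decrement(void* p) { /* hypothetical: return --count(p) */ return 1; }
    void reclaim(void* p) { /* hypothetical: free the object */ }

    struct RCLog
    {
        static struct Entry { void* oldTarget; void* newTarget; }

        static Entry[4096] buffer;
        static size_t used;

        static void logWrite(void* oldTarget, void* newTarget)
        {
            if (used == buffer.length)
                flush();                  // reconcile when the log fills up
            buffer[used++] = Entry(oldTarget, newTarget);
        }

        static void flush()               // the deferred (non-deterministic) part
        {
            foreach (ref e; buffer[0 .. used])
            {
                if (e.newTarget) increment(e.newTarget);
                if (e.oldTarget && decrement(e.oldTarget) == 0)
                    reclaim(e.oldTarget); // may cascade into further frees
            }
            used = 0;
        }
    }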
May 29 2013
next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 29 May 2013 at 07:18:49 UTC, Rainer Schuetze wrote:
 On 29.05.2013 02:46, Steven Schveighoffer wrote:
 On Tue, 28 May 2013 20:40:03 -0400, Manu <turkeyman gmail.com> 
 wrote:


 ObjC and WinRT are both used successfully on embedded 
 hardware, I'm
 really
 wondering if this is the way to go for embedded in D.
 V8 uses an incremental collector (somehow?), which I've been 
 saying is
 basically mandatory for embedded/realtime use. Apparently 
 Google agree.
 Clearly others have already had this quarrel, their 
 resolutions are worth
 consideration.
An interesting thing to note, Apple tried garbage collection with Obj-C, but only on MacOS, and it's now been deprecated since automatic reference counting was introduced [1]. It never was on iOS. So that is a telling omission I think. -Steve [1] https://en.wikipedia.org/wiki/Objective-C#Garbage_collection
Please note that you have to deal with circular references manually in Objective-C, introducing two types of pointers, strong and weak. I don't think this is optimal. If you want to deal with circular references automatically you again need some other kind of other garbage collection running. A problem with the naive approach of atomic reference counting a counter inside the object (as usually done in COM interfaces, I don't know how it is done in Objective-C) is that it is not thread-safe to modify a pointer without locking (or a CAS2 operation that you don't have on popular processors). You can avoid that using deferred reference counting (logging pointer changes to some thread local buffer), but that introduces back a garbage collection step with possibly massive destruction. This step might be done concurrently, but that adds another layer of complexity to finding circles. Another issue might be that incrementing a reference of an object when taking an interior pointer (like you do when using slices) can be pretty expensive because you usually have to find the base of the object to access the counter. I won't dismiss RC garbage collection as impossible, but doing it efficiently and concurrently is not so easy.
There is a nice document where it is described alongside all restrictions, https://developer.apple.com/library/mac/#releasenotes/ObjectiveC/RN-TransitioningToARC/Introduction/Introduction.html#//apple_ref/doc/uid/TP40011226
May 29 2013
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 29 May 2013 17:18, Rainer Schuetze <r.sagitario gmx.de> wrote:

 On 29.05.2013 02:46, Steven Schveighoffer wrote:

 On Tue, 28 May 2013 20:40:03 -0400, Manu <turkeyman gmail.com> wrote:


  ObjC and WinRT are both used successfully on embedded hardware, I'm
 really
 wondering if this is the way to go for embedded in D.
 V8 uses an incremental collector (somehow?), which I've been saying is
 basically mandatory for embedded/realtime use. Apparently Google agree.
 Clearly others have already had this quarrel, their resolutions are worth
 consideration.
An interesting thing to note, Apple tried garbage collection with Obj-C, but only on MacOS, and it's now been deprecated since automatic reference counting was introduced [1]. It never was on iOS. So that is a telling omission I think. -Steve [1] https://en.wikipedia.org/wiki/**Objective-C#Garbage_collection<https://en.wikipedia.org/wiki/Objective-C#Garbage_collection>
Please note that you have to deal with circular references manually in Objective-C, introducing two types of pointers, strong and weak. I don't think this is optimal. If you want to deal with circular references automatically you again need some other kind of other garbage collection running. A problem with the naive approach of atomic reference counting a counter inside the object (as usually done in COM interfaces, I don't know how it is done in Objective-C) is that it is not thread-safe to modify a pointer without locking (or a CAS2 operation that you don't have on popular processors). You can avoid that using deferred reference counting (logging pointer changes to some thread local buffer), but that introduces back a garbage collection step with possibly massive destruction. This step might be done concurrently, but that adds another layer of complexity to finding circles. Another issue might be that incrementing a reference of an object when taking an interior pointer (like you do when using slices) can be pretty expensive because you usually have to find the base of the object to access the counter. I won't dismiss RC garbage collection as impossible, but doing it efficiently and concurrently is not so easy.
What do you think is easier, or perhaps even POSSIBLE in D? A good RC approach, or a V8 quality concurrent+incremental GC? I get the feeling either would be acceptable, but I still kinda like idea of the determinism an RC collector offers. I reckon this should probably be the next big ticket for D. The long-standing shared library problems seem to be being addressed.
May 29 2013
next sibling parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-05-29 08:06:15 +0000, Manu <turkeyman gmail.com> said:

 What do you think is easier, or perhaps even POSSIBLE in D?
 A good RC approach, or a V8 quality concurrent+incremental GC?
 I get the feeling either would be acceptable, but I still kinda like idea
 of the determinism an RC collector offers.
Given that both require calling a function of some sort on pointer assignment, I'd say they're pretty much equivalent in implementation effort. One thing the compiler should do with RC that might require some effort is cancel out redundant increment/decrement pairs inside functions, and also offer some kind of weak pointer to deal with cycles. On the GC side, well, you have to write the new GC.

Also, with RC, you have to be careful not to create cycles with closures. Those are often hard to spot absent an explicit list of captured variables.

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/
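To visualize the pairing optimization (RCObject, use, and the commented retain/release calls are all hypothetical):

    struct RCObject { /* hypothetical ref-counted handle */ }
    void use(RCObject o) {}

    void caller(RCObject o)
    {
        // A naive ARC-style lowering would emit:
        //     retain(o);    // for the copy passed to use()
        //     use(o);
        //     release(o);   // that copy dies immediately afterwards
        // The pair provably cancels, so the optimizer keeps only:
        use(o);
    }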
May 29 2013
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 29.05.2013 10:06, Manu wrote:
 What do you think is easier, or perhaps even POSSIBLE in D?
 A good RC approach, or a V8 quality concurrent+incremental GC?
I think none of them is feasible without write barriers on pointer modifications in heap memory. That means extra code needs to be generated for each pointer modification (if the compiler cannot optimize it away, as LLVM seems to be doing in the case of Objective-C). As an alternative, Leandro's concurrent GC implements them with hardware support by COW, though at a pretty large granularity (page size). I'm not sure if this approach can be sensibly combined with RC or incremental collection.
 I get the feeling either would be acceptable, but I still kinda like
 idea of the determinism an RC collector offers.
If you want it to be safe and efficient, it needs to use deferred reference counting, and this ain't so deterministic anymore. The good thing about it is that you usually don't have to scan the whole heap to find candidates for reclamation.
 I reckon this should probably be the next big ticket for D. The
 long-standing shared library problems seem to be being addressed.
The GC proposed by Leandro looks very promising, though it needs support by the hardware and the OS. I think we should see how far we can get with this approach.
May 30 2013
parent reply Manu <turkeyman gmail.com> writes:
On 30 May 2013 19:50, Rainer Schuetze <r.sagitario gmx.de> wrote:

 On 29.05.2013 10:06, Manu wrote:

 What do you think is easier, or perhaps even POSSIBLE in D?
 A good RC approach, or a V8 quality concurrent+incremental GC?
I think none of them is feasible without write-barriers on pointer modifications in heap memory. That means extra code needs to be generated for each pointer modification (if the compiler cannot optimize it away as LLVM seems to be doing in case of Objectve-C). As an alternative, Leandros concurrent GC implements them with hardware support by COW, though at a pretty large granularity (page size). I'm not sure if this approach can be sensibly combined with RC or incremental collection.
I'm talking about embedded hardware. No virtualisation, tight memory limit, no significant OS. Is it possible? I get the feeling either would be acceptable, but I still kinda like
 idea of the determinism an RC collector offers.
If you want it to be safe and efficient, it needs to use deferred reference counting, and this ain't so deterministic anymore. The good thing about it is that you usually don't have to scan the whole heap to find candidates for reclamation.
Well, it's a bit more deterministic, at least you could depend on the deferred free happening within a frame let's say, rather than at some un-knowable future time when the GC feels like performing a collect... That said, I'd be interested to try it without a deferred free. Performance impact depends on the amount of temporaries/frees... I don't imagine it would impact much/at-all since there is so little memory allocation or pointer assignments in realtime software. People use horrific C++ smart pointer templates successfully, without any compiler support at all. It works because the frequency of pointer assignments is so low. RC is key to avoid scanning the whole heap, which completely destroys your dcache. I reckon this should probably be the next big ticket for D. The
 long-standing shared library problems seem to be being addressed.
The GC proposed by Leandro looks very promising, though it needs support by the hardware and the OS. I think we should see how far we can get with this approach.
His GC looked good, clearly works better for the sociomantic guys, but I can't imagine it, or anything like it, will ever work on embedded platforms? No hardware/OS support... is it possible to emulate the required features?
May 30 2013
next sibling parent reply "Dicebot" <m.strashun gmail.com> writes:
On Thursday, 30 May 2013 at 11:17:08 UTC, Manu wrote:
 His GC looked good, clearly works better for the sociomantic 
 guys, but I
 can't imagine it, or anything like it, will ever work on 
 embedded platforms?
 No hardware/OS support... is it possible to emulate the 
 requires features?
Well, anything that is done by the OS can also be done by the program itself ;) I am more curious - is it possible to have a sane design for both cases within one code base?
May 30 2013
parent reply Manu <turkeyman gmail.com> writes:
On 30 May 2013 21:20, Dicebot <m.strashun gmail.com> wrote:

 On Thursday, 30 May 2013 at 11:17:08 UTC, Manu wrote:

 His GC looked good, clearly works better for the sociomantic guys, but I
 can't imagine it, or anything like it, will ever work on embedded
 platforms?
 No hardware/OS support... is it possible to emulate the requires features?
Well, anything that is done by OS can also be done by program itself ;) I am more curious - is it possible to have a sane design for both cases within one code base?
Which 'both' cases?
May 30 2013
parent reply "Dicebot" <m.strashun gmail.com> writes:
On Thursday, 30 May 2013 at 11:31:53 UTC, Manu wrote:
 Which 'both' cases?
"OS support for fork+CoW" vs "no support, own implementation"
May 30 2013
parent reply "Diggory" <diggsey googlemail.com> writes:
On Thursday, 30 May 2013 at 11:34:20 UTC, Dicebot wrote:
 On Thursday, 30 May 2013 at 11:31:53 UTC, Manu wrote:
 Which 'both' cases?
"OS support for fork+CoW" vs "no support, own implementation"
If you can modify the DMD compiler to output a special sequence of instructions whenever you assign to a pointer type then you can do a concurrent/incremental GC with minimal OS or hardware support.
May 30 2013
parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-05-30 12:04:09 +0000, "Diggory" <diggsey googlemail.com> said:

 If you can modify the DMD compiler to output a special sequence of 
 instructions whenever you assign to a pointer type then you can do a 
 concurrent/incremental GC with minimal OS or hardware support.
This also happens to be the same requirement for automatic reference counting. I thought about implementing that for my D/Objective-C compiler (which has been stalled for a while).

The job isn't that big: just replace any pointer assignment/initialization by a call to a template function (that can be inlined) doing the assignment, and it becomes very easy to implement such things by tweaking a template in druntime.

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/
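A sketch of what such a druntime customization point might look like -- __ptrStore and the commented-out barrier bodies are hypothetical; no such template exists today:

    // The compiler would lower every pointer store `lhs = rhs;` into
    // `__ptrStore(lhs, rhs);`, which inlines back to a plain store by default.
    pragma(inline, true)
    void __ptrStore(T)(ref T* lhs, T* rhs)
    {
        // ARC flavour:            retain(rhs); release(lhs);
        // concurrent-GC flavour:  if (collecting) rememberOld(lhs);
        lhs = rhs;
    }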
May 30 2013
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 30.05.2013 13:16, Manu wrote:
 On 30 May 2013 19:50, Rainer Schuetze <r.sagitario gmx.de
 <mailto:r.sagitario gmx.de>> wrote:



     On 29.05.2013 10:06, Manu wrote:


         What do you think is easier, or perhaps even POSSIBLE in D?
         A good RC approach, or a V8 quality concurrent+incremental GC?


     I think none of them is feasible without write-barriers on pointer
     modifications in heap memory. That means extra code needs to be
     generated for each pointer modification (if the compiler cannot
     optimize it away as LLVM seems to be doing in case of Objectve-C).
     As an alternative, Leandros concurrent GC implements them with
     hardware support by COW, though at a pretty large granularity (page
     size). I'm not sure if this approach can be sensibly combined with
     RC or incremental collection.


 I'm talking about embedded hardware. No virtualisation, tight memory
 limit, no significant OS. Is it possible?

         I get the feeling either would be acceptable, but I still kinda like
         idea of the determinism an RC collector offers.


     If you want it to be safe and efficient, it needs to use deferred
     reference counting, and this ain't so deterministic anymore. The
     good thing about it is that you usually don't have to scan the whole
     heap to find candidates for reclamation.


 Well, it's a bit more deterministic, at least you could depend on the
 deferred free happening within a frame let's say, rather than at some
 un-knowable future time when the GC feels like performing a collect...

 That said, I'd be interested to try it without a deferred free.
 Performance impact depends on the amount of temporaries/frees... I don't
 imagine it would impact much/at-all since there is so little memory
 allocation or pointer assignments in realtime software.
 People use horrific C++ smart pointer templates successfully, without
 any compiler support at all. It works because the frequency of pointer
 assignments is so low.
 RC is key to avoid scanning the whole heap, which completely destroys
 your dcache.

         I reckon this should probably be the next big ticket for D. The
         long-standing shared library problems seem to be being addressed.


     The GC proposed by Leandro looks very promising, though it needs
     support by the hardware and the OS. I think we should see how far we
     can get with this approach.


 His GC looked good, clearly works better for the sociomantic guys, but I
 can't imagine it, or anything like it, will ever work on embedded platforms?
 No hardware/OS support... is it possible to emulate the requires features?
I suspected embedded systems would not have enough support for COW. I think the only way to emulate it would be with write barriers, and then you can do better than emulating page protection.

The way Michel Fortin proposed to implement it (lowering pointer writes to some druntime-defined template) is also how I imagine it. A template argument that specifies whether the compiler knows that it is a stack access would be nice as well.

One possible complication: memory block operations would have to treat pointer fields differently somehow.
May 30 2013
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
 One possible complication: memory block operations would have to treat
 pointer fields differently somehow.
Would they? Shouldn't it be possible to make this part of the post-blit constructor? Kind Regards Benjamin Thaut
May 30 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 30.05.2013 22:59, Benjamin Thaut wrote:
 One possible complication: memory block operations would have to treat
 pointer fields differently somehow.
Would they? Shouldn't it be possible to make this part of the post-blit constructor?
Not in general, e.g. reference counting needs to know the state before and after the copy.
May 30 2013
parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-05-31 06:02:20 +0000, Rainer Schuetze <r.sagitario gmx.de> said:

 On 30.05.2013 22:59, Benjamin Thaut wrote:
 One possible complication: memory block operations would have to treat
 pointer fields differently somehow.
Would they? Shouldn't it be possible to make this part of the post-blit constructor?
Not in general, e.g. reference counting needs to know the state before and after the copy.
No. Reference counting would work with post-blit: you have the pointer, you just need to increment the reference count once. Also, if you're moving instead of copying there's no post-blit called but there's also no need to change the reference count so it's fine. What wouldn't work with post-blit (I think) is a concurrent GC, as the GC will likely want to be notified when pointers are moved. Post-blit doesn't help there, and the compiler currently assumes it can move things around without calling any function. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca/
May 31 2013
parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 31.05.2013 12:54, Michel Fortin wrote:
 On 2013-05-31 06:02:20 +0000, Rainer Schuetze <r.sagitario gmx.de> said:

 On 30.05.2013 22:59, Benjamin Thaut wrote:
 One possible complication: memory block operations would have to treat
 pointer fields differently somehow.
Would they? Shouldn't it be possible to make this part of the post-blit constructor?
Not in general, e.g. reference counting needs to know the state before and after the copy.
No. Reference counting would work with post-blit: you have the pointer, you just need to increment the reference count once. Also, if you're moving instead of copying there's no post-blit called but there's also no need to change the reference count so it's fine.
I was thinking about struct assignment through copying and then calling the postblit constructor, not copy construction. But I forgot about the swap semantics involved. If I interpret the disassembly correctly, the assignment in

    S s1, s2;
    s2 = s1;

translates to

    S tmp1, tmp2;
    memcpy(&tmp1, &s1);
    tmp1.__postblit;      // user defined this(this)
    s2.opAssign(tmp1);    // makes a copy of tmp1 on the stack
    // opAssign does:
    memcpy(&tmp2, &s2);
    memcpy(&s2, &tmp1);
    tmp1.__dtor;

There are a number of additional copies of the original structs, but the number of constructor/destructor calls is balanced. That should work for reference counting.
 What wouldn't work with post-blit (I think) is a concurrent GC, as the
 GC will likely want to be notified when pointers are moved. Post-blit
 doesn't help there, and the compiler currently assumes it can move
 things around without calling any function.
It would not allow to create a write barrier that needs to atomically change the pointer at a given location, or at least to record the old value before overwriting it with the new value. But that might not exclude concurrency, for example a concurrent GC with deferred reference counting.
May 31 2013
prev sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 29 May 2013 at 00:40:16 UTC, Manu wrote:
 On 29 May 2013 03:27, Paulo Pinto <pjmlp progtools.org> wrote:

 Am 28.05.2013 15:33, schrieb Steven Schveighoffer:

 On Sat, 25 May 2013 01:52:10 -0400, Manu 
 <turkeyman gmail.com> wrote:

  What does ObjC do? It seems to work okay on embedded hardware
 (although not
 particularly memory-constrained hardware).
 Didn't ObjC recently reject GC in favour of refcounting?
Having used ObjC for the last year or so working on iOS, it is a very nice memory management model. Essentially, all objects (and only objects) are ref-counted automatically by the compiler. In code, whenever you assign or pass a pointer to an object, the compiler automatically inserts retains and releases extremely conservatively. Then, the optimizer comes along and factors out extra retains and releases, if it can prove they are necessary. What I really like about this is, unlike a library-based solution where every assignment to a 'smart pointer' incurs a release/retain, the compiler knows what this means and will factor them out, removing almost all of them. It's as if you inserted the retains and releases in the most optimized way possible, and it's all for free. Also, I believe the compiler is then free to reorder retains and releases since it understands how they work. Of course, a retain/release is an atomic operation, and requires memory barriers, so the CPU/cache cannot reorder, but the compiler still can. ...
I imagine Microsoft also does a similar thing with their C++/CX language extensions (WinRT handles).
Yeah certainly. It's ref counted, not garbage collected. And Android's V8 uses a "generational<http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Generational_GC_.28ephemeral_GC.29> incremental<http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Stop-the-world_vs._incremental_vs._concurrent> collector"... That'd be nice! ObjC and WinRT are both used successfully on embedded hardware, I'm really wondering if this is the way to go for embedded in D. V8 uses an incremental collector (somehow?), which I've been saying is basically mandatory for embedded/realtime use. Apparently Google agree. Clearly others have already had this quarrel, their resolutions are worth consideration. Implementing a ref-counted GC would probably be much simpler than V8's mythical incremental collector that probably relies on Java restrictions to operate?
Actually what I was implying was the cleverness of the compiler to remove unnecessary increment/decrement operations via dataflows, similar to what Clang does. Otherwise you pay too much for performance impact specially if multiple threads access the same objects. An incremental real time GC wins hands down in such scenarios. Google IO is always a nice source of information on how V8 works, https://developers.google.com/events/io/sessions/324431687 https://developers.google.com/events/io/sessions/324908972 -- Paulo
May 29 2013
prev sibling parent "Patrick Down" <patrick.down gmail.com> writes:
On Saturday, 25 May 2013 at 05:29:31 UTC, deadalnix wrote:

 This is technically possible, but you said you make few 
 allocations. So with the tax on pointer write or the reference 
 counting, you'll pay a lot to collect very few garbages. I'm 
 not sure the tradeoff is worthwhile.
Incidentally, I ran across this paper that talks about a reference counted garbage collector that claims to address this issue. Might be of interest to this group.

http://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon03Pure.pdf

From the paper:

 There are two primary problems with reference counting, namely: (1) run-time overhead of incrementing and decrementing the reference count each time a pointer is copied, particularly on the stack; and (2) inability to detect cycles and consequent necessity of including a second garbage collection technique to deal with cyclic garbage. In this paper we present new algorithms that address these problems and describe a new multiprocessor garbage collector based on these techniques that achieves maximum measured pause times of 2.6 milliseconds over a set of eleven benchmark programs that perform significant amounts of memory allocation.
May 25 2013
prev sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Sat, 25 May 2013 01:16:47 +1000
Manu <turkeyman gmail.com> wrote:
 Errr, well, 1ms is about 7% of the frame, that's quite a long time.
 I'd be feeling pretty uneasy about any library that claimed to want
 7% of the whole game time, and didn't offer any visual/gameplay
 benefits... Maybe if the GC happened to render some sweet water
 effects, or perform some awesome cloth physics or something while it
 was at it ;)
Heh, I think that'd be Nobel Prize territory. "Side Effect Oriented Development" - it'd be like old-school optimization, but it maintains safety and developer sanity. :)
 I think 2% sacrifice for simplifying memory management would probably
 get through without much argument.
 That's ~300µs... a few hundred microseconds seems reasonable. Maybe a
 little more if targeting 30fps.
 If it stuck to that strictly, I'd possibly even grant it permission
 to stop the world...
Perhaps a naive idea, but would running the GC in a fiber be a feasible approach? Every time the GC fiber is activated, it checks the time, and then it has various points where it yields if the elapsed time passes a threshold value.

I see two problems though:

1. The state of GC-controlled heaps can change while the GC fiber is yielded. I don't know how much that could screw things up, or if the issue is even solvable.

2. Does a fiber context-switch take too long? If so, what about a stackless fiber? Ex: http://dunkels.com/adam/pt/
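The time-budgeted yielding part, at least, is easy to express with core.thread today. A minimal sketch with a dummy workload standing in for the collector - this is not a GC, just the slicing mechanism:

import core.thread : Fiber;
import core.time : Duration, msecs;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Incremental dummy workload: yields whenever the slice budget is spent.
void incrementalWork(int[] items, Duration budget)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (ref item; items)
    {
        item *= 2;              // stand-in for "process one object"
        if (sw.peek > budget)
        {
            Fiber.yield();      // hand control back to the caller
            sw.reset();         // fresh budget for the next slice
        }
    }
}

void main()
{
    auto items = new int[100_000];
    auto worker = new Fiber({ incrementalWork(items, 1.msecs); });
    while (worker.state != Fiber.State.TERM)
        worker.call();          // one bounded slice, e.g. once per frame
}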
May 24 2013
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 05/24/2013 01:51 AM, Joseph Rushton Wakeling wrote:
 Maybe someone else can point to an example, but I can't think of any language
 prior to D that has both the precision and speed to be useful for games and
 embedded programming, and that also has GC built in.
 
 So it seems to me that this might well be an entirely new problem, as no other
 GC language or library has had the motivation to create something that
satisfies
 these use parameters.
Don't have the experience to judge it, but someone made a remark about Nimrod that might be relevant here: http://www.reddit.com/r/programming/comments/1fc9jt/dmd_2063_the_d_programming_language_reference/ca968xg
May 31 2013
prev sibling next sibling parent "Flamaros" <flamaros.xavier gmail.com> writes:
On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
 [...]
As a game developer I would really enjoy being able to develop our games in D, and for the kind of games we do, the major issue isn't the GC but portability and linking with third-party libraries (mostly for our internal tools).

We essentially work on point & click games: https://www.facebook.com/pages/Koalabs-Studio/380167978739812?ref=stream

A lot of game companies target many architectures like ARM, x86, or PowerPC,... For our internal tools we essentially use Qt, but I haven't tried QtD for the moment. I don't get the chance to work on D during my work time at the moment.
May 23 2013
prev sibling next sibling parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
On 23.05.2013 20:13, Brad Anderson wrote:
  nogc comes to mind (I believe Andrei mentioned it during one of the
 talks released). [1][2]

 1: http://d.puremagic.com/issues/show_bug.cgi?id=5219
 2: http://wiki.dlang.org/DIP18
When I started learning D 2 years ago, I read on the D webpage that D allows manual memory management and that it's possible to disable the GC. My first thought was that the standard library was written without using the GC. This could be a kind of lowest-common-denominator solution to the problem. Later, I found it was only my wishful thinking. So, you can disable the GC, but then you can't reliably use the standard library.

The lowest common denominator has its own weaknesses, mainly that it sometimes sacrifices performance, as some algorithms may perform better using managed slices, for example, than using manually managed memory.

A @nogc attribute could be used not only to block the GC; it could be used to select between GC and non-GC code with the help of the overloading mechanism. So the programmer, instead of writing one function, would actually write two functions, one for the GC and one for manual memory management - only if they need separate code. This will surely double the effort for some functions, but certainly not for the whole library. The majority of code doesn't need separate functions, mainly because it's non-allocating code. But some, like containers, would surely need these two "branches". It would be inconvenient to do twice the work when writing programs, but I'm not so sure it is when writing a library. And this is just because it's a _library_, usually written once and then reused. I think that the double effort is not that discouraging in this case. So, I'd kindly suggest to at least think about this. I'm proposing that @nogc functions could be overloaded similarly to immutable/const functions.

The other idea is to divide threads into two thread groups: managed and unmanaged. This is like running two programs together, one written in D and one written in C++. If we can run managed and unmanaged processes separately, why not run two analogous "subprocesses" inside one process? @nogc could help with that. Obviously, such thread groups must NOT share any mutable data. They could communicate with some sort of IPC, or perhaps ITC - std.concurrency comes to mind (a rough sketch follows below). This kind of separation _inside one process_ could help many applications. Imagine a real-time sound application, with a managed GUI and an unmanaged real-time sound loop. I know, everything is possible now, but I'd rather wait for a safe and clean solution - the one in the D style :)
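The message-passing half of that thread-group idea can already be sketched with std.concurrency. Note that the spawned thread below is still fully registered with the runtime (it would take runtime support to actually exempt it from pauses), so this only illustrates the no-shared-mutable-data communication pattern:

import core.stdc.stdlib : free, malloc;
import std.concurrency : ownerTid, receiveOnly, send, spawn;

// The would-be "unmanaged" side: allocates only via malloc/free and
// hands its result across the boundary by value.
void unmanagedWorker(int n)
{
    auto buf = cast(double*) malloc(n * double.sizeof);
    assert(buf !is null);
    scope (exit) free(buf);

    double sum = 0;
    foreach (i; 0 .. n)
    {
        buf[i] = i * 0.5;
        sum += buf[i];
    }
    ownerTid.send(sum);   // a plain value crosses; no heap is shared
}

void main()
{
    spawn(&unmanagedWorker, 1000);   // the "managed" side keeps using the GC
    auto total = receiveOnly!double();
    assert(total > 0);
}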
May 23 2013
prev sibling next sibling parent reply "QAston" <qaston gmail.com> writes:
On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
 There was a lot of interesting stuff in Benjamin Thaut's 
 article about GC versus manual memory management in a game [4] 
 and the discussion about it on the forums [5].  A lot of this 
 collective knowledge built up on manual memory management 
 techniques specific to D should probably be formalized and 
 added to the official documentation.  There is a Memory 
 Management [6] page in the documentation but it appears to be 
 rather dated at this point and not particularly applicable to 
 modern D2 (no mention of emplace or scoped and it talks about 
 using delete and scope classes).

 Game development is one place D can really get a foothold but 
 all too often the GC is held over D's head because people 
 taking their first look at D don't know how to avoid using it 
 and often don't realize you can avoid using it entirely. This 
 is easily the most common issue raised by newcomers to D with a 
 C or C++ background that I see in the #d IRC channel (many of 
 which are interested in game dev but concerned the GC will kill 
 their game's performance).
I think that Phobos should have some support for manual memory management. I don't mean clearing out the GC usage there, as it's fairly obvious. I rather think about something like unique_ptr/shared_ptr in the std. I think unique_ptr can't be implemented without rvalue refs, and the C++ solutions may not fit here. Anyway, right now it's not so straightforward how to live without the GC, so a standard solution would be really helpful.

Also, it should be visible that D can really deal with manual memory management conveniently - when I checked out Dlang the first time, I felt very disappointed that the "delete" operator is deprecated. "So - they advertise one can code without GC, yet they seem to deprecate the operator" - false claims discourage people from using new languages.
May 23 2013
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 23 May 2013 16:02:05 -0400, QAston <qaston gmail.com> wrote:

 Also, it should be visible in C++/D that D can really deal with manual  
 memory management conveniently - when I checked out Dlang first time I  
 felt very disappointed that "delete" operator is deprecated. "So - they  
 advertise one can code without GC, yet they seem to deprecate the  
 operator" - false claims discourage people from using new languages.
While I'm not specifically addressing the ability or not to disable the GC (I agree D has problems there), deprecating the delete operator does NOT preclude manual memory management.

The problem with delete is that it conflates destruction with deallocation. Yes, when you deallocate you want to destroy, but manual deallocation is a very dangerous operation. Most of the time, you want to destroy WITHOUT deallocating (this is for cases where you are relying on the GC).

Then I think Andrei also had a gripe that D had a whole keyword dedicated to an unsafe operation.

You can still destroy and deallocate with destroy() and GC.free().

-Steve
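Concretely, the foot-gun version looks something like this (a sketch; the GC.free is only safe if no other references to the object remain):

import core.memory : GC;

class Connection
{
    bool open = true;
    ~this() { open = false; }   // stands in for releasing a real resource
}

void main()
{
    auto c = new Connection;

    destroy(c);               // runs the destructor now; the memory is still
                              // owned by the GC, so other references don't
                              // point at freed memory, just a "dead" object

    GC.free(cast(void*) c);   // explicit deallocation: dangerous if anything
                              // else still points at the object
    c = null;
}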
May 23 2013
parent reply "QAston" <qaston gmail.com> writes:
On Thursday, 23 May 2013 at 20:07:08 UTC, Steven Schveighoffer 
wrote:
 While I'm not specifically addressing the ability or not to 
 disable the GC (I agree D has problems there), deprecating the 
 delete operator does NOT preclude manual memory management.

 The problem with delete is it conflates destruction with 
 deallocation.  Yes, when you deallocate, you want to destroy, 
 but manual deallocation is a very dangerous operation.  Most of 
 the time, you want to destroy WITHOUT deallocating (this is for 
 cases where you are relying on the GC).

 Then I think Andrei also had a gripe that D had a whole keyword 
 dedicated to an unsafe operation.

 You can still destroy and deallocate with destroy() and 
 GC.free().

 -Steve
Yes, I know the rationale behind deprecating delete and I agree with it. But from a newcomer's point of view this looks misleading - not everyone has enough patience (or hatred towards C++) to lurk inside mailing lists, and the official website shows the deprecated way of doing things: http://dlang.org/memory.html . IMO a manual memory management howto should be in a visible place - to dispel the myths the language suffers from. Maybe even place in to the malloc-howto in Efficiency paragraph of main website.
May 23 2013
next sibling parent "QAston" <qaston gmail.com> writes:
On Thursday, 23 May 2013 at 20:15:51 UTC, QAston wrote:
 Maybe even place in to the malloc-howto in Efficiency paragraph 
 of main website.
Sorry, that should be: Maybe even place the malloc-howto in the Efficiency paragraph of the main website.
May 23 2013
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, May 23, 2013 at 10:15:50PM +0200, QAston wrote:
 On Thursday, 23 May 2013 at 20:07:08 UTC, Steven Schveighoffer
 wrote:
While I'm not specifically addressing the ability or not to
disable the GC (I agree D has problems there), deprecating the
delete operator does NOT preclude manual memory management.

The problem with delete is it conflates destruction with
deallocation.  Yes, when you deallocate, you want to destroy, but
manual deallocation is a very dangerous operation.  Most of the
time, you want to destroy WITHOUT deallocating (this is for cases
where you are relying on the GC).

Then I think Andrei also had a gripe that D had a whole keyword
dedicated to an unsafe operation.

You can still destroy and deallocate with destroy() and GC.free().

-Steve
Yes, I know the rationale behind deprecating delete and I agree with it. But from a newcomer's point of view this looks misleading - not everyone has enough patience (or hatred towards C++) to lurk inside mailing lists, and the official website shows the deprecated way of doing things: http://dlang.org/memory.html . IMO a manual memory management howto should be in a visible place - to dispel the myths the language suffers from. Maybe even place in to the malloc-howto in Efficiency paragraph of main website.
Please file a bug on the bugtracker to update memory.html to reflect current usage. Misleading (or outdated) documentation is often worse than no documentation. T -- Lawyer: (n.) An innocence-vending machine, the effectiveness of which depends on how much money is inserted.
May 23 2013
parent 1100110 <0b1100110 gmail.com> writes:
On 05/23/2013 03:21 PM, H. S. Teoh wrote:
 On Thu, May 23, 2013 at 10:15:50PM +0200, QAston wrote:
 On Thursday, 23 May 2013 at 20:07:08 UTC, Steven Schveighoffer
 wrote:
 While I'm not specifically addressing the ability or not to
 disable the GC (I agree D has problems there), deprecating the
 delete operator does NOT preclude manual memory management.

 The problem with delete is it conflates destruction with
 deallocation.  Yes, when you deallocate, you want to destroy, but
 manual deallocation is a very dangerous operation.  Most of the
 time, you want to destroy WITHOUT deallocating (this is for cases
 where you are relying on the GC).

 Then I think Andrei also had a gripe that D had a whole keyword
 dedicated to an unsafe operation.

 You can still destroy and deallocate with destroy() and GC.free().

 -Steve
Yes, I know the rationale behind deprecating delete and I agree with it. But from a newcomer's point of view this looks misleading - not everyone has enough patience (or hatred towards C++) to lurk inside mailing lists, and the official website shows the deprecated way of doing things: http://dlang.org/memory.html . IMO a manual memory management howto should be in a visible place - to dispel the myths the language suffers from. Maybe even place in to the malloc-howto in Efficiency paragraph of main website.
Please file a bug on the bugtracker to update memory.html to reflect current usage. Misleading (or outdated) documentation is often worse than no documentation.

T
Agreed. Even if it's just a "Warning: deprecated" note, it would be much better.
May 24 2013
prev sibling next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 23 May 2013 at 20:02:06 UTC, QAston wrote:
 I think that Phobos should have some support for manual memory 
 management. I don't mean clearing out the gc usage there, as 
 it's fairly obvious. I rather think about something like 
 unique_ptr/shared_ptr in the std. I think unique_ptr can't be 
 implemented without rval refs, also C++ sollutions may not fit 
 here. Anyways, now it's not so straightforward how to live 
 without gc so standard sollution would be really helpful.
There is std.typecons.Unique and std.typecons.RefCounted. Unique is more cumbersome than unique_ptr, but it should work, though I've never tried to use it. Proper rvalue references would be a nice improvement here. RefCounted doesn't support classes yet, simply because nobody has taken the time to add support for them. It'd be nice to just be able to say shared_ptr = RefCounted, unique_ptr = Unique when somebody asks about smart pointers in D, though.

std.typecons.scoped is also useful but a bit buggy/cumbersome. jA_cOp (IRC handle) is working on improving it. Manu tried his hand at implementing his own version for fun (which came up because we were engaged in yet another GC argument with someone coming from C++).
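Basic usage of the two looks roughly like this (a sketch; RefCounted keeps its payload on the C heap via malloc/free, and Unique's exact API has shifted between releases):

import std.typecons : RefCounted, Unique;

struct Texture
{
    int id;
    ~this() { /* e.g. release the GPU handle here */ }
}

void main()
{
    // Shared ownership: the payload is destroyed deterministically when
    // the last copy goes out of scope - no GC involvement for the payload.
    auto a = RefCounted!Texture(42);
    {
        auto b = a;              // count: 2
        assert(b.id == 42);      // alias this forwards to the payload
    }                            // count back to 1
    assert(a.id == 42);

    // Single ownership: cannot be copied, only released to a new owner.
    Unique!Texture u = new Texture(7);
    auto v = u.release;          // explicit ownership transfer; u is empty
}                                // last RefCounted copy dies: ~Texture runs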
May 23 2013
parent "QAston" <qaston gmail.com> writes:
On Thursday, 23 May 2013 at 20:51:42 UTC, Brad Anderson wrote:
 On Thursday, 23 May 2013 at 20:02:06 UTC, QAston wrote:
 I think that Phobos should have some support for manual memory 
 management. I don't mean clearing out the gc usage there, as 
 it's fairly obvious. I rather think about something like 
 unique_ptr/shared_ptr in the std. I think unique_ptr can't be 
 implemented without rval refs, also C++ sollutions may not fit 
 here. Anyways, now it's not so straightforward how to live 
 without gc so standard sollution would be really helpful.
There is std.typecons.Unique and std.typecons.RefCounted. Unique is more cumbersome than unique_ptr but it should work though I've never tried to use it. Proper rvalue references would be a nice improvement here. RefCounted doesn't support classes yet simply because nobody has taken the time to add support for them. It'd be nice to just be able to say shared_ptr = RefCounted, unique_ptr = Unique when somebody asks about smart pointers in D though. std.typecons.scoped is also useful but a bit buggy/cumbersome. jA_cOp (IRC handle) is working on improving it. Manu tried his hand at implementing his own version for fun (which came up because we were engaged in yet another GC argument with someone coming from C++).
Thank you very much for the reply - I didn't realize those were already there.
May 23 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, May 23, 2013 22:02:05 QAston wrote:
 I think that Phobos should have some support for manual memory
 management. I don't mean clearing out the gc usage there, as it's
 fairly obvious. I rather think about something like
 unique_ptr/shared_ptr in the std. I think unique_ptr can't be
 implemented without rval refs, also C++ sollutions may not fit
 here. Anyways, now it's not so straightforward how to live
 without gc so standard sollution would be really helpful.
We have std.typecons.RefCounted, which is basically a shared pointer.
 Also, it should be visible in C++/D that D can really deal with
 manual memory management conveniently - when I checked out Dlang
 first time I felt very disappointed that "delete" operator is
 deprecated. "So - they advertise one can code without GC, yet
 they seem to deprecate the operator" - false claims discourage
 people from using new languages.
delete is only used for GC memory, and manual memory management should really be done with malloc and free rather than explicitly freeing GC memory. But if you really want to risk blowing your foot off, you can always use destroy to destroy an object in GC memory and core.memory.GC.free to free it. Also, once we get custom allocators, it should be easier to manually manage memory (e.g. I would assume that it would properly abstract doing malloc and then emplacing the object in that memory so that you do something like allocator!MyObject(args) rather than having to deal with emplace directly). - Jonathan M Davis
May 23 2013
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 24 May 2013 09:02, Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Thursday, May 23, 2013 22:02:05 QAston wrote:
 I think that Phobos should have some support for manual memory
 management. I don't mean clearing out the gc usage there, as it's
 fairly obvious. I rather think about something like
 unique_ptr/shared_ptr in the std. I think unique_ptr can't be
 implemented without rval refs, also C++ sollutions may not fit
 here. Anyways, now it's not so straightforward how to live
 without gc so standard sollution would be really helpful.
We have std.typecons.RefCounted, which is basically a shared pointer.
I've always steered away from things like this because it creates a double-indirection. I have thought of making a similar RefCounted template, but where the refCount is stored in a hash table, and the pointer is used to index the table. This means the refCount doesn't pollute the class/structure being ref-counted, or avoids a double-indirection on general access. It will be slightly slower to inc/decrement, but that's a controlled operation. I would use a system like this for probably 80% of resources.
 Also, it should be visible in C++/D that D can really deal with
 manual memory management conveniently - when I checked out Dlang
 first time I felt very disappointed that "delete" operator is
 deprecated. "So - they advertise one can code without GC, yet
 they seem to deprecate the operator" - false claims discourage
 people from using new languages.
delete is only used for GC memory, and manual memory management should really be done with malloc and free rather than explicitly freeing GC memory. But if you really want to risk blowing your foot off, you can always use destroy to destroy an object in GC memory and core.memory.GC.free to free it. Also, once we get custom allocators, it should be easier to manually manage memory (e.g. I would assume that it would properly abstract doing malloc and then emplacing the object in that memory so that you do something like allocator!MyObject(args) rather than having to deal with emplace directly).
Custom allocators will probably be very useful, but if there's one thing STL has taught me, it's hard to use them effectively, and in practise, nobody ever uses them. One problem is the implicit allocation functions (array concatenation, AA's, etc). How to force those to allocate somewhere else for the scope?
May 23 2013
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/23/13 7:42 PM, Manu wrote:
 I've always steered away from things like this because it creates a
 double-indirection.
There's no double indirection for the payload.
 I have thought of making a similar RefCounted template, but where the
 refCount is stored in a hash table, and the pointer is used to index the
 table.
 This means the refCount doesn't pollute the class/structure being
 ref-counted, or avoids a double-indirection on general access.
But that's worse than non-intrusive refcounting, and way worse than intrusive refcounting (which should be the elective method for classes).
 Custom allocators will probably be very useful, but if there's one thing
 STL has taught me, it's hard to use them effectively, and in practise,
 nobody ever uses them.
Agreed.
 One problem is the implicit allocation functions (array concatenation,
 AA's, etc). How to force those to allocate somewhere else for the scope?
I have some ideas. Andrei
May 23 2013
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Friday, 24 May 2013 at 00:44:14 UTC, Andrei Alexandrescu wrote:
 Custom allocators will probably be very useful, but if there's 
 one thing
 STL has taught me, it's hard to use them effectively, and in 
 practise,
 nobody ever uses them.
Agreed.
To benefit from a custom allocator, you need to have a very specific use case. Generic allocators are pretty good in most cases.
May 23 2013
parent reply Sean Cavanaugh <WorksOnMyMachine gmail.com> writes:
On 5/24/2013 12:25 AM, deadalnix wrote:
 On Friday, 24 May 2013 at 00:44:14 UTC, Andrei Alexandrescu wrote:
 Custom allocators will probably be very useful, but if there's one thing
 STL has taught me, it's hard to use them effectively, and in practise,
 nobody ever uses them.
Agreed.
To benefit from a custom allocator, you need to be under a very specific use case. Generic allocator are pretty good in most cases.
Most general allocators choke on multi-threaded code, so a large part of customizing allocations is to get rid of lock contention. While STL containers can have basic allocator templates assigned to them, if you really need performance you typically need to control all the different kinds of allocations a container does.

For example, a std::unordered_set allocates a ton of linked-list nodes to keep iterators stable across inserts and removes, but the actual data payload is another separate allocation, as is some kind of root data structure to hold the hash tables. In STL land this is all allocated through a single allocator object, making it very difficult (nearly impossible in a clean way) to allocate the payload data with some kind of fixed-size block allocator and allocate the metadata and linked-list nodes with a different allocator. Some people would complain this exposes implementation details of a class, but the class is a template; it should be able to be configured to work the way you need it to.

class tHashMapNodeDefaultAllocator
{
public:
    static void* allocateMemory(size_t size, size_t alignment)
    {
        return mAlloc(size, alignment);
    }

    static void freeMemory(void* pointer) NOEXCEPT
    {
        mFree(pointer);
    }
};

template <typename DefaultKeyType, typename DefaultValueType>
class tHashMapConfiguration
{
public:
    typedef tHashClass<DefaultKeyType> HashClass;
    typedef tEqualsClass<DefaultKeyType> EqualClass;
    typedef tHashMapNodeDefaultAllocator NodeAllocator;
    typedef tDynamicArrayConfiguration<tHashMapNode<DefaultKeyType, DefaultValueType>> NodeArrayConfiguration;
};

template <typename KeyType, typename ValueType, typename HashMapConfiguration = tHashMapConfiguration<KeyType, ValueType>>
class tHashMap
{
};

// the tHashMap also has an array inside, so there is a way to configure that too:

class tDynamicArrayDefaultAllocator
{
public:
    static void* allocateMemory(size_t size, size_t alignment)
    {
        return mAlloc(size, alignment);
    }

    static void freeMemory(void* pointer) NOEXCEPT
    {
        mFree(pointer);
    }
};

class tDynamicArrayDefaultStrategy
{
public:
    static size_t nextAllocationSize(size_t currentSize, size_t objectSize, size_t numNewItemsRequested)
    {
        // return some size to grow the array by when the capacity is reached
        return currentSize + numNewItemsRequested * 2;
    }
};

template <typename DefaultObjectType>
class tDynamicArrayConfiguration
{
public:
    typedef tDynamicArrayDefaultStrategy DynamicArrayStrategy;
    typedef tDynamicArrayDefaultAllocator DynamicArrayAllocator;
};
May 23 2013
parent "deadalnix" <deadalnix gmail.com> writes:
On Friday, 24 May 2013 at 05:49:18 UTC, Sean Cavanaugh wrote:
 Most general allocators choke on multi-threaded code, so a 
 large part of customizing allocations is to get rid lock 
 contention.
It is safe to assume that the future is multithreaded and that general allocators won't choke on that for long. They already exist; you probably don't need (and don't want, if you are not affected by NIH syndrome) to roll your own here.
May 23 2013
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-05-23 23:42:10 +0000, Manu <turkeyman gmail.com> said:

 I have thought of making a similar RefCounted template, but where the
 refCount is stored in a hash table, and the pointer is used to index the
 table.
 This means the refCount doesn't pollute the class/structure being
 ref-counted, or avoids a double-indirection on general access.
 It will be slightly slower to inc/decrement, but that's a controlled
 operation.
 I would use a system like this for probably 80% of resources.
I just want to note that this is exactly how reference counts are handled in Apple's Objective-C implementation, with a spin-lock protecting the table. Actually, on OS X (but not on iOS) there are 4 tables (if I remember well), and which table to use is determined by bits 4 & 5 of the pointer. It probably helps when you have more cores.

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/
May 23 2013
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Thursday, 23 May 2013 at 23:42:22 UTC, Manu wrote:
 I've always steered away from things like this because it 
 creates a
 double-indirection.
 I have thought of making a similar RefCounted template, but 
 where the
 refCount is stored in a hash table, and the pointer is used to 
 index the
 table.
 This means the refCount doesn't pollute the class/structure 
 being
 ref-counted, or avoids a double-indirection on general access.
 It will be slightly slower to inc/decrement, but that's a 
 controlled
 operation.
 I would use a system like this for probably 80% of resources.
Reference counting also tends to create a die-in-masses effect (objects tend to die in clusters) and freeze the program for a while. I'm not sure it is that much better (better than the current D GC for sure, but I'm not sure it is better than a good GC). It probably depends on the usage pattern.
May 23 2013
parent Manu <turkeyman gmail.com> writes:
On 24 May 2013 15:21, deadalnix <deadalnix gmail.com> wrote:

 On Thursday, 23 May 2013 at 23:42:22 UTC, Manu wrote:

 I've always steered away from things like this because it creates a
 double-indirection.
 I have thought of making a similar RefCounted template, but where the
 refCount is stored in a hash table, and the pointer is used to index the
 table.
 This means the refCount doesn't pollute the class/structure being
 ref-counted, or avoids a double-indirection on general access.
 It will be slightly slower to inc/decrement, but that's a controlled
 operation.
 I would use a system like this for probably 80% of resources.
Reference counting also tends to create a die-in-masses effect (objects tend to die in clusters) and freeze the program for a while. I'm not sure it is that much better (better than the current D GC for sure, but I'm not sure it is better than a good GC). It probably depends on the usage pattern.
In my experience that's fine. In realtime code, you tend not to allocate/deallocate at runtime, unless it's some short-lived temps, which tend not to cluster the way you describe... When you eventually do free some big resources, causing a clustered free, you will probably have done it at an appropriate time, where you intended such a thing to happen.
May 23 2013
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
23-May-2013 22:13, Brad Anderson wrote:
 [...]

  nogc comes to mind (I believe Andrei mentioned it during one of the
 talks released). [1][2]
I have a simple and future-proof proposal:

1. Agree on what containers should look like (the API level is fine, and std.container has it). Postpone allocators, or consider them baked into the container.

2. Then, for any function that has to allocate something (typically an array), add a compile-time parameter: the container to use. Obviously there has to be a constraint on what kinds of operations it must provide.

3. std.algorithm and std.range become usable. We can then extend this policy beyond.

Some examples to boot:

1. std.array.array - an incredibly nice tool that turns any range into an array. Let's make a construct function that does the same for any container:

auto arr = array(iota(0, 10).map....)
--->
auto arr = construct!(Array!int)(iota(0, 10).map...)

by repeatedly calling insertAny in general, and doing better things depending on the primitives available (like reserving space beforehand for array-like types). BTW users can use alias FTW:

alias toArray = construct!(Array!int); // Yay!

2. schwartzSort - allocates an array internally. We just need to pass it the right replacement type for the array:

schwartzSort!(Array!int)(...) - no GC required now.

Ditto for levenshteinDistance etc. There could be some limitations on how far such an approach can go with introducing new overloads. The alternative is new functions with some suffix/prefix.

--
Dmitry Olshansky
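A minimal sketch of that construct function under those assumptions - it uses only an insertBack primitive (std.container's arrays have one) rather than dispatching on everything available, and it skips the reserve optimisation:

import std.range.primitives : isInputRange;

// Build any container from any range via a single insertion primitive.
template construct(C)
{
    C construct(R)(R r)
    if (isInputRange!R)
    {
        C c;
        foreach (e; r)
            c.insertBack(e);   // assumed primitive; a real version would
                               // pick the best one the container offers
        return c;
    }
}

unittest
{
    import std.algorithm.iteration : map;
    import std.container.array : Array;
    import std.range : iota;

    auto arr = construct!(Array!int)(iota(0, 10).map!(x => x * x));
    assert(arr.length == 10 && arr[3] == 9);

    alias toArray = construct!(Array!int);   // the alias trick from above
    auto arr2 = toArray(iota(0, 5));
    assert(arr2[4] == 4);
}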
May 23 2013
prev sibling next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On 23.05.2013 20:13, Brad Anderson wrote:
 [...]
With the increased usage of Windows Phone 8 (I know, I know) and MonoGame, I am afraid of D losing that train, even with Remedy's good example.

--
Paulo
May 23 2013
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 05/23/2013 08:43 PM, H. S. Teoh wrote:
 I listened to Manu's talk yesterday, and I agree with what he said, that
 Phobos functions that don't *need* to allocate, shouldn't. Andrei was
 also enthusiastic about std.algorithm being almost completely
 allocation-free. Maybe we should file bugs (enhancement requests?) for
 all such Phobos functions?
I'm also in agreement with Manu. There may well already be bugs for some of them -- e.g. there is one for toUpperInPlace which he referred to, and the source of the allocation is clear and is even responsible for other bugs: http://d.puremagic.com/issues/show_bug.cgi?id=9629 I asked for a list because, even if all the cases are registered as bugs, it's not necessarily easy to find them. So, we need to either tag all the bugs so they can be found easily, or make a list somewhere.
May 23 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-05-23 23:42, Joseph Rushton Wakeling wrote:

 I'm also in agreement with Manu.  There may well already be bugs for some of
 them -- e.g. there is one for toUpperInPlace which he referred to, and the
 source of the allocation is clear and is even responsible for other bugs:
 http://d.puremagic.com/issues/show_bug.cgi?id=9629
toUpper/toLower cannot be made in-place if they should handle all of Unicode. Some characters will change their length when converted to/from uppercase. Examples of these are the German double S and some Turkish I's.

--
/Jacob Carlborg
May 24 2013
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
24-May-2013 13:49, Jacob Carlborg wrote:
 On 2013-05-23 23:42, Joseph Rushton Wakeling wrote:

 I'm also in agreement with Manu.  There may well already be bugs for
 some of
 them -- e.g. there is one for toUpperInPlace which he referred to, and
 the
 source of the allocation is clear and is even responsible for other bugs:
 http://d.puremagic.com/issues/show_bug.cgi?id=9629
toUpper/lower cannot be made in place if it should handle all Unicode. Some characters will change their length when convert to/from uppercase. Examples of these are the German double S and some Turkish I.
Yes! Now we're getting somewhere. The function was a mistake to begin with. -- Dmitry Olshansky
May 24 2013
prev sibling next sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Friday, 24 May 2013 at 09:49:40 UTC, Jacob Carlborg wrote:
 toUpper/lower cannot be made in place if it should handle all 
 Unicode. Some characters will change their length when convert 
 to/from uppercase. Examples of these are the German double S 
 and some Turkish I.
In that case it should only allocate when needed. Most strings are ASCII and will not change size.
May 24 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-05-24 12:01, Peter Alexander wrote:

 In that case it should only allocate when needed. Most strings are ASCII
 and will not change size.
What I mean is that something called "InPlace" doesn't go hand in hand with something that allocates. There's always std.ascii. -- /Jacob Carlborg
May 24 2013
parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Friday, 24 May 2013 at 12:29:43 UTC, Jacob Carlborg wrote:
 On 2013-05-24 12:01, Peter Alexander wrote:

 In that case it should only allocate when needed. Most strings 
 are ASCII
 and will not change size.
What I mean is that something called "InPlace" doesn't go hand in hand with something that allocates. There's always std.ascii.
Ah right, I see your point. My bad.
May 24 2013
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 05/24/2013 11:49 AM, Jacob Carlborg wrote:
 toUpper/lower cannot be made in place if it should handle all Unicode. Some
 characters will change their length when convert to/from uppercase. Examples of
 these are the German double S and some Turkish I.
Surely it's possible to check in-place whether the character length changes, and ensure in-place replacement without any allocation if it doesn't; a sketch follows below.

(To be honest, it feels a bit like a design flaw in Unicode that character length can change between lower- and uppercase.)
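A sketch of that check, using std.uni's simple 1:1 character mappings - overwrite in place while the UTF-8 length is preserved, and fall back to the allocating path only when a length actually changes:

import std.uni : toUpper;
import std.utf : decode, encode;

void toUpperMostlyInPlace(ref char[] s)
{
    size_t i = 0;
    while (i < s.length)
    {
        size_t j = i;
        immutable dchar c = decode(s, j);   // j advances past the code point
        immutable dchar u = toUpper(c);     // simple 1:1 mapping
        char[4] buf;
        immutable n = encode(buf, u);
        if (n != j - i)
        {
            // Rare case: the mapping changes the encoded length, so the
            // tail has to be rebuilt with an allocation after all.
            import std.conv : to;
            s = s[0 .. i] ~ toUpper(s[i .. $].idup).to!(char[]);
            return;
        }
        s[i .. j] = buf[0 .. n];            // same length: overwrite in place
        i = j;
    }
}

unittest
{
    auto s = "hello, world".dup;
    toUpperMostlyInPlace(s);
    assert(s == "HELLO, WORLD");   // pure ASCII path: no allocation
}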
May 24 2013
parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Friday, 24 May 2013 at 13:37:36 UTC, Joseph Rushton Wakeling 
wrote:
 (To be honest, feels a bit of a design flaw in Unicode that 
 character length can
 change between lower- and uppercase.)
Unfortunately it's either that or lose compatibility with ASCII. Lower case dotted-i needs to be one byte for ASCII, and upper case dotted-i isn't ASCII, so it needs to be more than one byte. P.S. it's a problem with UTF-8, not Unicode.
May 24 2013
parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On 2013-05-24, 16:24, Peter Alexander wrote:

 On Friday, 24 May 2013 at 13:37:36 UTC, Joseph Rushton Wakeling wrote:
 (To be honest, feels a bit of a design flaw in Unicode that character
 length can change between lower- and uppercase.)

 Unfortunately it's either that or lose compatibility with ASCII. Lower
 case dotted-i needs to be one byte for ASCII, and upper case dotted-i
 isn't ASCII, so it needs to be more than one byte.

One could certainly have two different lowercase dotted I's, with one
mapping to I and the other to İ, and their unicode values close to the
upper-case versions.

--
Simen
May 24 2013
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 24 May 2013 19:49, Jacob Carlborg <doob me.com> wrote:

 On 2013-05-23 23:42, Joseph Rushton Wakeling wrote:

  I'm also in agreement with Manu. There may well already be bugs for some of
  them -- e.g. there is one for toUpperInPlace which he referred to, and the
  source of the allocation is clear and is even responsible for other bugs:
  http://d.puremagic.com/issues/show_bug.cgi?id=9629

 toUpper/lower cannot be made in place if it should handle all Unicode.
 Some characters will change their length when converted to/from uppercase.
 Examples of these are the German double S and some Turkish I.

ß and SS are both actually 2 bytes, so it works in UTF-8 at least! ;)
May 24 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
24-May-2013 18:38, Manu wrote:
 On 24 May 2013 19:49, Jacob Carlborg <doob me.com <mailto:doob me.com>>
 wrote:

     On 2013-05-23 23:42, Joseph Rushton Wakeling wrote:

         I'm also in agreement with Manu.  There may well already be bugs
         for some of
         them -- e.g. there is one for toUpperInPlace which he referred
         to, and the
         source of the allocation is clear and is even responsible for
         other bugs:
          http://d.puremagic.com/issues/show_bug.cgi?id=9629


     toUpper/lower cannot be made in place if it should handle all
     Unicode. Some characters will change their length when convert
     to/from uppercase. Examples of these are the German double S and
     some Turkish I.


 ß and SS are both actually 2 bytes, so it works in UTF-8 at least! ;)
Okay, here you go - a UTF-8 table of cased sin :)

Codepoint - upper-case - lower-case
0x01e9e : 0x000df - 3 : 2
0x0023a : 0x02c65 - 2 : 3
0x0023e : 0x02c66 - 2 : 3
0x02c7e : 0x0023f - 3 : 2
0x02c7f : 0x00240 - 3 : 2
0x02c6f : 0x00250 - 3 : 2
0x02c6d : 0x00251 - 3 : 2
0x02c70 : 0x00252 - 3 : 2
0x0a78d : 0x00265 - 3 : 2
0x0a7aa : 0x00266 - 3 : 2
0x02c62 : 0x0026b - 3 : 2
0x02c6e : 0x00271 - 3 : 2
0x02c64 : 0x0027d - 3 : 2
0x01e9e : 0x000df - 3 : 2
0x02c62 : 0x0026b - 3 : 2
0x02c64 : 0x0027d - 3 : 2
0x0023a : 0x02c65 - 2 : 3
0x0023e : 0x02c66 - 2 : 3
0x02c6d : 0x00251 - 3 : 2
0x02c6e : 0x00271 - 3 : 2
0x02c6f : 0x00250 - 3 : 2
0x02c70 : 0x00252 - 3 : 2
0x02c7e : 0x0023f - 3 : 2
0x02c7f : 0x00240 - 3 : 2
0x0a78d : 0x00265 - 3 : 2
0x0a7aa : 0x00266 - 3 : 2

(each pair appears twice because both the upper- and lower-case member of the pair is visited)

And this is only with 1:1 mapping. Generated by:

void main()
{
    import std.uni, std.utf, std.stdio;
    char[4] buf;
    foreach (dchar ch; unicode.Cased_Letter.byCodepoint)
    {
        dchar upper = toUpper(ch);
        dchar lower = toLower(ch);
        auto uLen = encode(buf, upper);
        auto lLen = encode(buf, lower);
        if (uLen != lLen)
            writefln("0x%05x : 0x%05x - %d : %d", upper, lower, uLen, lLen);
    }
}

--
Dmitry Olshansky
May 24 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, May 24, 2013 09:42:10 Manu wrote:
 On 24 May 2013 09:02, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 On Thursday, May 23, 2013 22:02:05 QAston wrote:
 I think that Phobos should have some support for manual memory
 management. I don't mean clearing out the gc usage there, as it's
 fairly obvious. I rather think about something like
 unique_ptr/shared_ptr in the std. I think unique_ptr can't be
 implemented without rval refs, also C++ sollutions may not fit
 here. Anyways, now it's not so straightforward how to live
 without gc so standard sollution would be really helpful.
We have std.typecons.RefCounted, which is basically a shared pointer.
I've always steered away from things like this because it creates a double-indirection. I have thought of making a similar RefCounted template, but where the refCount is stored in a hash table, and the pointer is used to index the table. This means the refCount doesn't pollute the class/structure being ref-counted, or avoids a double-indirection on general access. It will be slightly slower to inc/decrement, but that's a controlled operation. I would use a system like this for probably 80% of resources.
We use smart pointers where I work, and they're a godsend for avoiding memory problems. We almost never have any, whereas the idiots who designed the older software used manual refcounting everywhere, and they had tons of memory problems. But while we need to be performant, we don't need to be performant on quite the level that you do. So maybe it's more of a problem in your environment.
 Also, it should be visible in C++/D that D can really deal with
 
 manual memory management conveniently - when I checked out Dlang
 first time I felt very disappointed that "delete" operator is
 deprecated. "So - they advertise one can code without GC, yet
 they seem to deprecate the operator" - false claims discourage
 people from using new languages.
delete is only used for GC memory, and manual memory management should really be done with malloc and free rather than explicitly freeing GC memory. But if you really want to risk blowing your foot off, you can always use destroy to destroy an object in GC memory and core.memory.GC.free to free it. Also, once we get custom allocators, it should be easier to manually manage memory (e.g. I would assume that it would properly abstract doing malloc and then emplacing the object in that memory so that you do something like allocator!MyObject(args) rather than having to deal with emplace directly).
Custom allocators will probably be very useful, but if there's one thing STL has taught me, it's hard to use them effectively, and in practise, nobody ever uses them.
Well, as Andrei said, they're hard, which is why they aren't done yet. Another thing to think about with regards to C++ is the fact that its new and delete don't have anything to do with a GC, so it has a built-in, nice way of allocating memory which is managed manually, whereas in D we're forced to use emplace, which is a lot more of a hassle. Even simply having something like allocator.make!MyObj(args) and allocator.free(args) would really help out. There's no question, though, that they get hairier when you start having to worry about containers and internal allocations and the like. It's a tough problem.
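A sketch of what such a wrapper could look like for structs, built from the pieces that exist today (malloc plus std.conv.emplace; the make/dispose names are hypothetical, and classes would additionally need __traits(classInstanceSize) and the void[] overload of emplace):

import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// malloc the storage, construct with emplace, destroy + free on the way out.
T* make(T, Args...)(Args args)
{
    auto p = cast(T*) malloc(T.sizeof);
    assert(p !is null, "out of memory");
    return emplace(p, args);
}

void dispose(T)(T* p)
{
    destroy(*p);   // run the destructor
    free(p);       // hand the memory back to the C heap
}

struct Vec { float x, y, z; }

void main()
{
    auto v = make!Vec(1, 2, 3);
    scope (exit) dispose(v);
    assert(v.y == 2);
}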
 One problem is the implicit allocation functions (array concatenation,
 AA's, etc). How to force those to allocate somewhere else for the scope?
I would fully expect that they use the GC and only the GC, as they're language constructs, and custom allocators are going to be library constructs. The allocators may provide clean ways to do stuff like concatenating arrays using their API rather than ~, but if you really want to manipulate arrays with slicing and concatenation and whatnot without the GC, I think that you're pretty much going to have to create a new type to handle them, which is very doable, but it does mean not using the built-in arrays as much, which does kind of suck.

But for most programs, I would expect that simply managing the GC more intelligently for stuff that has to be GC allocated would solve the problem nicely. Kiith-Sa and others have managed quite well at getting the GC to work efficiently by managing when it's enabled and gets the chance to run and whatnot. You have extremely stringent requirements that may cause problems with that (though Kiith-Sa was doing a game of some variety IIRC), but pretty much the only way to make it so that built-in stuff that allocates doesn't use the GC is to use your own version of druntime. Kiith-Sa had a good post on how to go about dealing with the GC in performant code a while back:

http://forum.dlang.org/post/vbsajlgotanuhmmpnspf forum.dlang.org

Regardless, we're not going to get away from some language features requiring the GC, but they're also not features that exist in C++, so if you really can't use them, you still haven't lost anything over C++ (as much as it may still suck to not be able to use them), and there are still plenty of other great features that you can take advantage of.

- Jonathan M Davis
May 23 2013
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:
On 24 May 2013 09:57, Jonathan M Davis <jmdavisProg gmx.com> wrote:

 [...]
/agree, except the issue I raised: when ~ is used in Phobos, that function is now off-limits. And there's no way to know which functions they are...

May 23 2013
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 24 May 2013 01:11:17 +0100, Manu <turkeyman gmail.com> wrote:
 /agree, except the issue I raised: when ~ is used in Phobos, that
 function is now off-limits. And there's no way to know which functions
 they are...
It's not the allocation caused by ~ which is the issue though, is it; it's the collection it might trigger, right?

So what you really need are 3 main things:

1. A way to prevent the GC collecting until a given point(*).
2. A way to limit the GC collection time.
3. For Phobos functions to be optimised to not allocate, or to use alloca where possible.

(*) Until the collection point the GC would ask the OS for more memory (a new pool or page) or fail and throw an Error. Much like in Leandro's concurrent GC talk/example where he talks about eager allocation.

The idea is this... Let's imagine you can mark a thread as not stopped by the pause-the-world. Let's imagine it still does allocations which we want to collect at some stage. How would this work?

1. The GC would remove the thread stack and global space from its list of roots scanned by normal collections. It would not pause the thread on normal collections.

2. (*) above would be in effect: the first allocation in the thread would cause the GC to create a thread-local pool; this pool would not be shared by other threads (no locking required, not scanned by normal GC collections). This pool could be pre-allocated by a new GC primitive "GC.allocLocalPool();" for efficiency. Allocation would come from this thread-local pool, or trigger a new pool allocation - so minimal locking should be required.

3. The thread would call a new GC primitive at the point where collection was desired, i.e. "GC.localCollect(size_t maxMicroSecs);". This collection would be special: it would not stop the thread, but would occur inline. It would only scan the thread-local pool, and would do so with an enforced upper bound on collection time.

4. There are going to be issues around 'shared' /mutable/ data, e.g.
 - the non-paused thread accessing it (esp. during collection)
 - if the thread allocated 'shared' data
I am hoping that if the thread main function is marked as notpaused (or similar), then the compiler can statically verify neither of these occurs and produce a compile-time error.

So, that's the idea. I don't know the current GC all that well, so I've probably missed something crucial. I doubt this idea is revolutionary, and it is perhaps debatable whether the complexity is worth the effort, and also whether it actually makes placing an upper bound on the collection any easier.

Thoughts?

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
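For comparison, point 1 above (without the bounded pause of point 2) is already expressible with the current druntime API - hold collections off during the frame and collect only at a boundary you choose:

import core.memory : GC;

void frameLoop(scope void delegate() updateFrame, scope bool delegate() running)
{
    GC.disable();              // allocations still succeed; collections are
    scope (exit) GC.enable();  // deferred (barring out-of-memory conditions)

    while (running())
    {
        updateFrame();       // may allocate; no collection mid-frame

        GC.enable();
        GC.collect();        // unbounded pause, but at a point we choose
        GC.disable();
    }
}

void main()
{
    int frames = 0;
    frameLoop({ auto tmp = new int[64]; },   // stand-in per-frame work
              () => ++frames < 3);
}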
May 24 2013
next sibling parent reply "Dicebot" <m.strashun gmail.com> writes:
On Friday, 24 May 2013 at 10:24:13 UTC, Regan Heath wrote:
 It's not the allocation caused by ~ which is the issue though 
 is it, it's the collection it might trigger, right?
Depends. When it comes to real-time software you can't say without studying the specific task requirements. Stop-the-world collection is a complete disaster but, for example, a concurrent collector like the one Leandro has shown can satisfy soft real-time requirements - but only if the heap size managed by the GC stays reasonably low, thus the need to verify that you don't allocate in unexpected ways.
 So what you really need are 3 main things:

 1. A way to prevent the GC collecting until a given point(*).
You can do that now. It does not help if the world is stopped and/or you can't limit the collection time.
 2. A way to limit the GC collection time.
Or run it concurrently at low priority. That will do for a lot of _soft_ real-time.
 3. For phobos functions to be optimised to not allocate or to 
 use alloca where possible.
A really important one, as it helps not only game dev and soft real-time servers, but also hardcore embedded.
May 24 2013
parent "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 24 May 2013 11:38:40 +0100, Dicebot <m.strashun gmail.com> wrote:

 On Friday, 24 May 2013 at 10:24:13 UTC, Regan Heath wrote:
 It's not the allocation caused by ~ which is the issue though is it,  
 it's the collection it might trigger, right?
Depends. When it comes to real-time software you can't say without studying the specific task requirements. Stop-the-world collection is a complete disaster but, for example, a concurrent collector like the one Leandro has shown can satisfy soft real-time requirements - but only if the heap size managed by the GC stays reasonably low, thus the need to verify that you don't allocate in unexpected ways.
 So what you really need are 3 main things:

 1. A way to prevent the GC collecting until a given point(*).
You can do that now. It does not help if the world is stopped and/or you can't limit the collection time.
If you disable collection and the GC then runs out of memory, what happens? Does it simply ask the OS for more? I assumed, from Leandro's talk, that it would block on the GC lock until a collection completed, or simply fail if collection was disabled.

Also, the key to the idea I gave was to control collection only in the real-time thread/part of the application.
 2. A way to limit the GC collection time.
Or run it concurrently at low priority. That will do for a lot of _soft_ real-time.
I don't think Manu is doing _soft_ real-time; he wants a hard guarantee it will not exceed 100us (or similar). Concurrent collection may be a possible solution as well, but if you think about it, my idea is basically a second isolated collector running in a real-time context concurrently.
 3. For phobos functions to be optimised to not allocate or to use  
 alloca where possible.
A really important one, as it helps not only game dev and soft real-time servers, but also hardcore embedded.
Sure, it's desirable to be more efficient, but it's no longer essential if the allocations no longer cost you anything in the real-time thread - that's the point.

What do you think of the idea of making marked threads exempt from normal GC processing and isolating their allocations to a single page/pool in order to control and reduce collection times?

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
May 24 2013
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:
On 24 May 2013 20:24, Regan Heath <regan netmail.co.nz> wrote:

 On Fri, 24 May 2013 01:11:17 +0100, Manu <turkeyman gmail.com> wrote:

 /agree, except the issue I raised, when ~ is used in phobos.
 That means that function is now off-limits. And there's no way to know
 which functions they are...
It's not the allocation caused by ~ which is the issue though, is it; it's the collection it might trigger, right?
Yes, but the unpredictability is the real concern. It's hard to control something that you don't know about. If the phobos function can avoid the allocation, then why not avoid it?

 So what you really need are 3 main things:
 1. A way to prevent the GC collecting until a given point(*).
 2. A way to limit the GC collection time.
 3. For phobos functions to be optimised to not allocate or to use alloca
 where possible.


I think we can already do this.

The incremental (+ precise) GC idea - I think this would be the silver bullet
for games!



Yes, I think effort to improve this would be universally appreciated.


(*) Until the collection point the GC would ask the OS for more memory (a
 new pool or page) or fail and throw an Error.  Much like in Leandro's
 concurrent GC talk/example where he talks about eager allocation.
Bear in mind, most embedded hardware does not have virtual memory, and often has a fairly small hard memory limit. If we are trying to manually sequence allocations and collects - scheduling collects for when you change scenes on a black screen, for instance - then you can't have random phobos functions littering small allocations all over the place.
 this is..

 Lets imagine you can mark a thread as not stopped by the pause-the-world.
  Lets imagine it still does allocations which we want to collect at some
 stage.  How would this work..

 1. The GC would remove the thread stack and global space from it's list of
 roots scanned by normal collections.  It would not pause it on normal
 collections.

 2. (*) above would be in effect, the first allocation in the thread would
 cause the GC to create a thread local pool, this pool would not be shared
 by other threads (no locking required, not scanned by normal GC
 collections).  This pool could be pre-allocated by a new GC primitive
 "GC.allocLocalPool();" for efficiency.  Allocation would come from this
 thread-local pool, or trigger a new pool allocation - so minimal locking
 should be required.

 3. The thread would call a new GC primitive at the point where collection
 was desired i.e. "GC.localCollect(size_t maxMicroSecs);".  This collection
 would be special, it would not stop the thread, but would occur inline.  It
 would only scan the thread local pool and would do so with an enforced
 upper bound collection time.

 4. There are going to be issues around 'shared' /mutable/ data, e.g.

  - The non-paused thread accessing it (esp during collection)
  - If the thread allocated 'shared' data

 I am hoping that if the thread main function is marked as  notpaused (or
 similar) then the compiler can statically verify neither of these occur and
 produce a compile time error.

 So, that's the idea.  I don't know the current GC all that well so I've
 probably missed something crucial.  I doubt this idea is revolutionary and
 it is perhaps debatable whether the complexity is worth the effort, also
 whether it actually makes placing an upper bound on the collection any
 easier.

 Thoughts?
It sounds kinda complex... but I'm not qualified to comment.
May 24 2013
parent "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 24 May 2013 15:50:43 +0100, Manu <turkeyman gmail.com> wrote:
 On 24 May 2013 20:24, Regan Heath <regan netmail.co.nz> wrote:

 It sounds kinda complex... but I'm not qualified to comment.
Yeah, there is complexity. It all boils down to whether it is possible, using modern GC techniques (precise, incremental, etc.), to perform a collection in the ~300us you require. If a full collection cannot be done in that time, perhaps a smaller subset can - that is where I was heading with this idea.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
May 24 2013
prev sibling parent reply Manu <turkeyman gmail.com> writes:
I might just add that there are some other important targets as well in the
vein of this discussion.

DLLs *still* don't work properly. druntime/phobos still don't really work
as DLLs.
They are getting some attention, but it's been a really long-standing and
seriously major issue. Shared libraries are like, important!


May 24 2013
parent Benjamin Thaut <code benjamin-thaut.de> writes:
On 24.05.2013 17:02, Manu wrote:
 I might just add that there are some other important targets as well in
 the vein of this discussion.

 DLLs *still* don't work properly. druntime/phobos still don't really
 work as DLLs.
 They are getting some attention, but it's been a really long-standing
 and seriously major issue. Shared libraries are like, important!
Fully agree there. See http://d.puremagic.com/issues/show_bug.cgi?id=9816

Kind Regards
Benjamin Thaut
May 24 2013
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, May 24, 2013 10:11:17 Manu wrote:
 /agree, except the issue I raised, when ~ is used in phobos.
 That means that function is now off-limits. And there's no way to know
 which functions they are...
Yes, we need to look at that. I actually don't think that ~ gets used much (primarily because so much of Phobos uses ranges, which don't have ~), but it's something that we need to look out for and address.

The suggestion of an @nogc attribute which at least guarantees that new isn't used would be nice for that, since it could potentially both guarantee it and document it. But that wouldn't work with templated functions for the most part, since the types being used with them might allocate, though we could presumably add attribute inference for that, so that the functions which call them can be marked with @nogc and be able to know that the functions that they're calling obey it.

My guess is that the functions most likely to allocate are those which specifically take strings or arrays, as they _can_ use ~, so they probably need to be examined first; but in some cases, they're also the type of function which may _have_ to allocate, depending on what they're doing. Probably the right approach is to track down all of those that are allocating, make it so that any which can avoid the allocation do, and then create overloads which take an output range or somesuch for those that have to allocate, so that preallocated memory and the like can be used for them (a sketch of that overload pattern follows this post). And if we actually have any which can't possibly do anything but allocate, they should be clearly documented as such.

All around though, figuring out how to minimize GC usage in Phobos and enforce that is an open problem which is still very much up for discussion (particularly since it's quite easy to introduce inadvertent allocations with some stuff). But with everything else that we've had to worry about, optimizations like that haven't been as high a priority as they're going to need to be long term. At some point, we're probably going to need to benchmark stuff more aggressively and optimize Phobos in general more, because it's the standard library. And eliminating unnecessary memory allocations definitely goes along with that.

- Jonathan M Davis
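A minimal sketch of such an overload pair. joinWithDash is an invented example; appender, put and isOutputRange are real Phobos facilities:

    import std.array : appender;
    import std.range : isOutputRange, put;

    // Convenience overload: allocates and returns a new string.
    string joinWithDash(string a, string b)
    {
        auto buf = appender!string();
        joinWithDash(buf, a, b);
        return buf.data;
    }

    // Core overload: writes into whatever output range the caller provides,
    // so preallocated buffers can be used and no allocation is forced here.
    void joinWithDash(Output)(ref Output sink, string a, string b)
        if (isOutputRange!(Output, char))
    {
        put(sink, a);
        put(sink, '-');
        put(sink, b);
    }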
May 23 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Thu, 23 May 2013 20:21:47 -0400, "Jonathan M Davis" <jmdavisProg gmx.com> wrote:

 At some point, we're probably going to need to 
 benchmark stuff more agressively and optimize Phobos in general more, because 
 it's the standard library. And eliminating unnecessary memory allocations 
 definitely goes along with that.
 
 - Jonathan M Davis
On a related note, a while back I benchmarked the naive Phobos approach to creating a Windows API (wchar) string from a D string against using alloca to convert the string on a piece of stack memory, like this: http://dpaste.1azy.net/b60d37d4

IIRC the alloca version was 13(!) times faster for ~100 chars of English text and 5 times faster for some multi-byte characters. I think this approach is too hackish for Phobos, but it demonstrates that there is much room.

-- 
Marco
May 23 2013
parent reply Manu <turkeyman gmail.com> writes:
On 24 May 2013 14:11, Marco Leise <Marco.Leise gmx.de> wrote:

 On Thu, 23 May 2013 20:21:47 -0400, "Jonathan M Davis" <jmdavisProg gmx.com> wrote:

 At some point, we're probably going to need to
 benchmark stuff more agressively and optimize Phobos in general more,
because
 it's the standard library. And eliminating unnecessary memory allocations
 definitely goes along with that.

 - Jonathan M Davis
On a related note, a while back I benchmarked the naive Phobos approach to create a Windows API (wchar) string from a D string with using alloca to convert the string on a piece of stack memory like this: http://dpaste.1azy.net/b60d37d4 IIRC it was 13(!) times faster for ~100 chars of English text and 5 times for some multi-byte characters. I think this approach is too hackish for Phobos, but it demonstrates that there is much room.
I don't think it's hack-ish at all; that's precisely what the stack is there for. It would be awesome for people to use alloca in places where it makes sense, especially in cases where the function is a leaf or leaf-stem (ie, if there is no possibility of recursion) - there, using the stack should be encouraged. For safety, obviously phobos should do something like:

  void[] buffer = bytes < reasonable_anticipated_buffer_size ? alloca(bytes) : new void[bytes];

toStringz is a very common source of allocations. This alloca approach would be great in those cases, filenames in particular. (A self-contained version of this pattern is sketched below.)
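Filling that out as a compilable sketch - the threshold and function name are invented, and note that alloca'd memory only lives until the enclosing function returns, which is why the pattern sits at the call site rather than inside a helper:

    import core.stdc.stdlib : alloca;

    void processBytes(size_t bytes)
    {
        enum stackLimit = 4096; // assumed threshold; tune per call site

        // Small requests come from the stack, large ones from the GC heap.
        void* p = bytes <= stackLimit ? alloca(bytes) : (new void[bytes]).ptr;
        void[] buffer = p[0 .. bytes];

        // ... fill and use buffer; the alloca branch vanishes on return ...
    }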
May 23 2013
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Friday, 24 May 2013 at 05:02:33 UTC, Manu wrote:
 On 24 May 2013 14:11, Marco Leise <Marco.Leise gmx.de> wrote:
 I don't think it's hack-ish at all, that's precisely what the 
 stack is
 there for. It would be awesome for people to use alloca in 
 places that it
 makes sense.
 Especially in cases where the function is a leaf or leaf-stem 
 (ie, if there
 is no possibility of recursion), then using the stack should be 
 encouraged.
 For safety, obviously phobos should do something like:
   void[] buffer = bytes < reasonable_anticipated_buffer_size ?
 alloca(bytes) : new void[bytes];
That is probably something that could be handled in the optimizer in many cases.
May 23 2013
next sibling parent Manu <turkeyman gmail.com> writes:
On 24 May 2013 15:29, deadalnix <deadalnix gmail.com> wrote:

 On Friday, 24 May 2013 at 05:02:33 UTC, Manu wrote:

 On 24 May 2013 14:11, Marco Leise <Marco.Leise gmx.de> wrote:
 I don't think it's hack-ish at all, that's precisely what the stack is
 there for. It would be awesome for people to use alloca in places that it
 makes sense.
 Especially in cases where the function is a leaf or leaf-stem (ie, if
 there
 is no possibility of recursion), then using the stack should be
 encouraged.
 For safety, obviously phobos should do something like:
   void[] buffer = bytes < reasonable_anticipated_buffer_size ?
 alloca(bytes) : new void[bytes];
That is probably something that could be handled in the optimizer in many cases.
The optimiser probably can't predict whether the function may recurse, and as such, the amount of memory it's reasonable to take from the stack is hard to predict... It could possibly do so for leaf functions only, but then most of the opportunities aren't in leaf functions. I'd say the majority of phobos allocations are created when passing strings through to library/system calls.
May 23 2013
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, May 24, 2013 15:37:39 Manu wrote:
 I'd say a majority of phobos
 allocations are created when passing strings through to library/system
 calls.
That does sound probable, as toStringz will often (and unpredictably) result in allocations, and it does seem like a prime location for at least attempting to use a static array instead as you suggested. But if toStringz _wouldn't_ result in an allocation, then copying to a static array would be inadvisable, so we're probably going to need a function which does toStringz's test so that it can be used outside of toStringz. - Jonathan M Davis
May 23 2013
prev sibling parent Manu <turkeyman gmail.com> writes:
On 24 May 2013 15:44, Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Friday, May 24, 2013 15:37:39 Manu wrote:
 I'd say a majority of phobos
 allocations are created when passing strings through to library/system
 calls.
That does sound probable, as toStringz will often (and unpredictably) result in allocations, and it does seem like a prime location for at least attempting to use a static array instead as you suggested. But if toStringz _wouldn't_ result in an allocation, then copying to a static array would be inadvisable, so we're probably going to need a function which does toStringz's test so that it can be used outside of toStringz.
Yeah: an alloca-based cstring helper which performs the zero-terminated check, and then, if the string is not terminated and is short enough, does alloca and copy, else, if too long, new. I'm sure that would be a handy little template, and it would improve phobos a lot. (A rough sketch follows.)
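A rough sketch of what such a helper could look like. All names are invented; it uses a fixed-size buffer inside a caller-held struct rather than alloca, since alloca'd memory dies when the function that called it returns, and it skips the already-terminated check for brevity:

    struct TempCString
    {
        @disable this(this);          // the buffer must not be copied around

        private char[256] buf = void; // stack scratch space
        private const(char)* p;

        this(const(char)[] s)
        {
            if (s.length < buf.length)   // leaves room for the terminator
            {
                buf[0 .. s.length] = s[];
                buf[s.length] = '\0';
                p = buf.ptr;
            }
            else                         // too long: fall back to the GC
            {
                auto heap = new char[s.length + 1];
                heap[0 .. s.length] = s[];
                heap[s.length] = '\0';
                p = heap.ptr;
            }
        }

        const(char)* ptr() const { return p; }
    }

    // usage: auto tmp = TempCString(filename); someCFunction(tmp.ptr);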
May 23 2013
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
24-May-2013 09:02, Manu wrote:
 On 24 May 2013 14:11, Marco Leise <Marco.Leise gmx.de
 <mailto:Marco.Leise gmx.de>> wrote:

     Am Thu, 23 May 2013 20:21:47 -0400
     schrieb "Jonathan M Davis" <jmdavisProg gmx.com
     <mailto:jmdavisProg gmx.com>>:

      > At some point, we're probably going to need to
      > benchmark stuff more agressively and optimize Phobos in general
     more, because
      > it's the standard library. And eliminating unnecessary memory
     allocations
      > definitely goes along with that.
      >
      > - Jonathan M Davis

     On a related note, a while back I benchmarked the naive Phobos
     approach to create a Windows API (wchar) string from a D
     string with using alloca to convert the string on a piece of
     stack memory like this: http://dpaste.1azy.net/b60d37d4
     IIRC it was 13(!) times faster for ~100 chars of English text
     and 5 times for some multi-byte characters.
     I think this approach is too hackish for Phobos, but it
     demonstrates that there is much room.


 I don't think it's hack-ish at all, that's precisely what the stack is
 there for. It would be awesome for people to use alloca in places that
 it makes sense.
 Especially in cases where the function is a leaf or leaf-stem (ie, if
 there is no possibility of recursion), then using the stack should be
 encouraged.
 For safety, obviously phobos should do something like:
    void[] buffer = bytes < reasonable_anticipated_buffer_size ?
 alloca(bytes) : new void[bytes];

 toStringz is a very common source of allocations. This alloca approach
 would be great in those cases, filenames in particular.
Alternatively, just make a TLS buffer the scratchpad and use that everywhere (sketched below).

-- 
Dmitry Olshansky
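A minimal sketch of the idea - module-level variables in D are thread-local by default, so each thread gets its own scratch buffer with no locking (names assumed):

    private char[] scratch; // TLS: one instance per thread

    char[] scratchBuffer(size_t n)
    {
        if (scratch.length < n)
            scratch.length = n; // grows occasionally, then gets reused
        // beware: two simultaneous users in one thread would alias each other
        return scratch[0 .. n];
    }

The trade-off versus the stack, as Dmitry notes further down, is that such a buffer can be passed across function boundaries and can grow arbitrarily large.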
May 24 2013
next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Friday, 24 May 2013 at 09:40:03 UTC, Dmitry Olshansky wrote:
 Alternatively just make a TLS buffer as scratchpad and use that 
 everywhere.
I believe that's what TempAlloc is for.
May 24 2013
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 24 May 2013 19:40, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 [...]
 Alternatively just make a TLS buffer as scratchpad and use that
 everywhere.

How is that any different than just using the stack in practice?
May 24 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
24-May-2013 18:35, Manu wrote:
 [...]

 How is that any different than just using the stack in practice?
Can pass across function boundaries up/down. Can grow arbitrarily large without blowing up.

-- 
Dmitry Olshansky
May 24 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-05-24 07:02, Manu wrote:

 I don't think it's hack-ish at all, that's precisely what the stack is
 there for. It would be awesome for people to use alloca in places that
 it makes sense.
 Especially in cases where the function is a leaf or leaf-stem (ie, if
 there is no possibility of recursion), then using the stack should be
 encouraged.
 For safety, obviously phobos should do something like:
    void[] buffer = bytes < reasonable_anticipated_buffer_size ?
 alloca(bytes) : new void[bytes];

 toStringz is a very common source of allocations. This alloca approach
 would be great in those cases, filenames in particular.
Basically every function in Tango that operates on some kind of array takes an array and an optional buffer (also an array). If the buffer is too small, it will allocate using the GC; if not, it won't allocate and builds the result in place (sketched below). That worked great with D1, where strings weren't immutable.

-- 
/Jacob Carlborg
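An invented example of that signature style in D2 terms - use the caller's buffer when it is big enough, otherwise fall back to a GC allocation:

    char[] asciiUpper(const(char)[] input, char[] buffer = null)
    {
        auto result = buffer.length >= input.length
            ? buffer[0 .. input.length]   // caller's buffer: no allocation
            : new char[input.length];     // too small (or null): GC fallback

        foreach (i, c; input)
            result[i] = (c >= 'a' && c <= 'z') ? cast(char)(c - 32) : c;

        return result;
    }

Usage: char[64] tmp; auto s = asciiUpper(name, tmp[]); - no allocation as long as name fits in tmp. As Jacob says, D2's immutable strings make this pattern less pleasant, since the result often needs a copy or a cast to become a string.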
May 24 2013
prev sibling next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
 Johannes Pfau's work in progress -vgc command line option [3] 
 would be another great tool that would help people identify GC 
 allocations.  This or something similar could also be used to 
 document throughout phobos when GC allocations can happen (and 
 help eliminate it where it makes sense to).
I have yet to look at any of these entries, but I went ahead and built phobos with Johannes' -vgc and put the output into a spreadsheet: http://goo.gl/HP78r (google spreadsheet)

I'm not exactly sure whether this catches templates or not. This wasn't a unittest build, just building phobos. I did try to build the unittests with -vgc, but it runs out of memory trying to build std/algorithm.d. There is substantially more -vgc output when building the unit tests, though.

Obviously a lot of these aren't going anywhere, but there are probably some interesting things to be found wading through this.
May 23 2013
prev sibling next sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Besides my studies I'm working at Havok, and the biggest problems would most likely be (in order of importance):

- Compiler / druntime for all 9 platforms we have to support simply do not exist.

- Full Visual Studio integration is needed, including really good code completion and a very nice debugging experience for all platforms. VisualD is quite nice, and debugging using the Visual Studio debugger works quite well, but it's a real pita that you have to patch dmd and compile it from source so you can debug in x64 on windows.

- SIMD: core.simd is just not there yet. The last time I looked, really basic stuff like unaligned loads was missing.

- The GC. A no-go; a GC-free (non-leaking) version of the runtime should be provided.

- Better windows support. All of the development we do happens on windows, and most of D's community does not care about windows support. I'm curious how long it will take until D gets proper DLL support.

Kind Regards
Benjamin Thaut
May 24 2013
parent reply Manu <turkeyman gmail.com> writes:
On 25 May 2013 04:20, Benjamin Thaut <code benjamin-thaut.de> wrote:

 Besides my studies I'm working at Havok, and the biggest problems would most likely be (in order of importance):

 - Compiler / druntime for all 9 platforms we have to support simply do not exist.
Yup.
 - Full Visual Studio integration is needed, including really good code
 completion and a very nice debugging experience for all platforms. VisualD
 is quite nice, and debugging using the Visual Studio debugger works quite
 well, but it's a real pita that you have to patch dmd and compile it from
 source so you can debug in x64 on windows.
Win64 works for me out of the box... ?
 - SIMD: core.simd is just not there yet. The last time I looked, really
 basic stuff like unaligned loads was missing.
I'm working on std.simd (slowly >_<) .. It'll get there.
 - The GC. A no-go; a GC-free (non-leaking) version of the runtime should
 be provided.
See, I have spent a decade on core tech/engine code meticulously worrying about memory allocation. I don't think a GC is an outright no-go. But we certainly don't have a GC that fits the bill.
 - Better windows support. All of the development we do happens on windows,
 and most of D's community does not care about windows support. I'm curious
 how long it will take until D gets proper DLL support.
As with everyone in games! We need DLLs urgently.
May 24 2013
next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On 25.05.2013 03:29, Manu wrote:
 On 25 May 2013 04:20, Benjamin Thaut <code benjamin-thaut.de> wrote:
 [...]
 See, I have spent a decade on core tech/engine code meticulously
 worrying about memory allocation. I don't think a GC is an outright no-go.
 But we certainly don't have a GC that fits the bill.
Given that Android, Windows Phone 7/8 and the PS Vita have system languages with GC, it does not seem to bother those developers. Yes, I know that most AAA studios are actually bypassing them and using C and C++ directly, but already having indie developers using D would be a great win. One needs to start somewhere.
     - Better windows support. All of the development we do happens on
     windows and most of D's community does not care about windows
     support. I'm curious how long it will take until D gets proper
     DLL support.
Yeah, this is partially why I missed the train on game development. I was too focused on FOSS issues, instead of focusing on making a game.

-- 
Paulo
May 25 2013
prev sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 25.05.2013 03:29, Manu wrote:
 Win64 works for me out of the box... ?
For me dmd produces type names like modulename.typename.subtypename, which cause internal errors within the visual studio debugger in some cases. Also, debugging of static / global variables is not possible (even when __gshared) because they are also formatted like modulename.variablename.

Kind Regards
Benjamin Thaut
May 25 2013
next sibling parent Manu <turkeyman gmail.com> writes:
On 25 May 2013 21:03, Benjamin Thaut <code benjamin-thaut.de> wrote:

 On 25.05.2013 03:29, Manu wrote:

 Win64 works for me out of the box... ?
For me dmd produces type names like modulename.typename.subtypename, which cause internal errors within the visual studio debugger in some cases. Also, debugging of static / global variables is not possible (even when __gshared) because they are also formatted like modulename.variablename.
True, sadly there are holes in the debug experience, which are pretty important to have fixed at some point.
May 25 2013
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
On 5/25/13 6:28 PM, Manu wrote:
 On 25 May 2013 21:03, Benjamin Thaut <code benjamin-thaut.de> wrote:

     On 25.05.2013 03:29, Manu wrote:



         Win64 works for me out of the box... ?


     For me dmd produces type names like
     modulename.typename.subtypename, which cause internal errors
     within the visual studio debugger in some cases. Also, debugging of
     static / global variables is not possible (even when __gshared)
     because they are also formatted like modulename.variablename.


 True, sadly there are holes in the debug experience, which are pretty
 important to have fixed at some point.
Bugzilla links?
May 25 2013
prev sibling parent "Rob T" <alanb ucora.com> writes:
On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
  nogc comes to mind (I believe Andrei mentioned it during one 
 of the talks released). [1][2]
I would love to have something like @nogc to guarantee there are no hidden or misplaced allocations in a section of code, or optionally throughout the entire application. Not only is the GC a cause of concern for game devs, it is also a concern for general systems development.

For example, I have a simple virtual network device driver that I'd like to re-write from C++ to D. It does not need a GC at all; all memory is preallocated in advance of use during initialization, and it does not need anything from Phobos. If I could easily cut the GC out, even from the executable binary, that would be great, provided that I was certain no allocations were going on by mistake. Yes, I know I can get rid of the GC, but there should be an elegant solution for doing it that guarantees I am not using features of the language that require the GC.

Keep in mind that even if the GC were improved, there will still be plenty of systems applications that do not require the GC at all. So while improving the GC is a huge deal in itself, it is still not a general solution for those who do not need a GC and want to be certain they are not allocating by mistake.

--rt
May 24 2013
May 24 2013