digitalmars.D - [xmlp] the recent garbage collector performance improvements
- Richard Webb (19/19) Feb 01 2012 Last night I tried loading a ~20 megabyte xml file using xmlp (via the
- dsimcha (14/39) Feb 01 2012 Interesting. I'm glad my improvements seem to matter in the real
- Martin Nowak (2/2) Feb 01 2012 XML is a prime example of a shallow tree structure,
- Richard Webb (21/24) Feb 01 2012 The 'test' is just
- Marco Leise (9/14) Feb 01 2012 Speaking of which, why not also compare the memory consumption (peak
- bearophile (4/13) Feb 01 2012 Not too much time ago Python devs have added an heuristic to the Python ...
- dsimcha (5/22) Feb 01 2012 I actually tried to add something like this a while back but I
- dsimcha (4/31) Feb 01 2012 Wait a minute, since when do we even have a std.xml2? I've never
- a (5/53) Feb 01 2012 It looks like someone wrote it a year ago, but it was never added
- Richard Webb (3/7) Feb 02 2012 Thats the one -> http://www.dsource.org/projects/xmlp/
- Jesse Phillips (9/12) Feb 02 2012 As others point out it is xmlp, I've been using it for the last
- Jonathan M Davis (6/20) Feb 02 2012 In theory, Tomek Sowiński was working on a replacement for std.xml whic...
- Richard Webb (10/12) Feb 02 2012 The GC hit is related to the number of dom nodes that exist at one i
- H. S. Teoh (24/24) Feb 02 2012 Speaking of GC improvements, I googled around a bit and found this
- Robert Jacques (2/25) Feb 02 2012 So to answer your question, yes, someone has made one of these types of ...
- H. S. Teoh (24/32) Feb 02 2012 To me, the final goal should be to make the GC swappable, that is,
- Jacob Carlborg (4/34) Feb 03 2012 The GC is already swappable during linking.
- H. S. Teoh (7/8) Feb 03 2012 [...]
- Jesse Phillips (5/13) Feb 01 2012 Interesting, I've been using XmlVisitor on two 30meg and one
- Richard Webb (8/12) Feb 01 2012 For reference, the file i was testing with has ~50000 root nodes, each
- dsimcha (8/12) Feb 01 2012 Sounds about right. For very small allocations sweeping time
- H. S. Teoh (10/23) Feb 01 2012 Out of curiosity, is there a way to optimize for the "many small
- David Nadlinger (4/9) Feb 01 2012 The easiest way would be to use a specialized allocator for this – I'm...
- bearophile (7/9) Feb 01 2012 dsimcha has done a good work on the GC. After this change I have seen th...
- dsimcha (10/20) Feb 01 2012 My RegionAllocator is probably the best thing for this if the
- Robert Jacques (2/25) Feb 01 2012 An XML parser would probably want some kind of stack segment growth sche...
- dsimcha (10/12) Feb 02 2012 I had considered putting that in RegionAllocator but I was
- Manu (7/12) Feb 02 2012 I don't know about the implications of your decision, but comment makes ...
- dsimcha (13/34) Feb 02 2012 I'm not saying that embedded isn't important. It's just that for
- Manu (10/40) Feb 02 2012 Okay, this reasoning seems sound to me. I wonder though, is it
- Jonathan M Davis (11/25) Feb 02 2012 PCs are not endangered in the least. It's just that we're getting an inc...
- Andrej Mitrovic (4/5) Feb 02 2012 Kind of like when we got rid of cars and trains and ships once we
- dsimcha (8/14) Feb 02 2012 Agreed. I just recently got my first smartphone and I love it.
- H. S. Teoh (13/28) Feb 02 2012 [...]
- a (1/6) Feb 02 2012 This doesn't sound very practical.
- Richard Webb (3/6) Feb 02 2012 Nah, what you really need are those holographic displays they use on CSI...
- Timon Gehr (3/29) Feb 02 2012 Chances are that region allocator can run on mobile phones just fine at
- Sean Kelly (7/17) Feb 02 2012 Long term I suspect the typical desktop experience will just be a phone ...
- F i L (7/7) Feb 02 2012 People and, more importantly, professionals will always want the
- H. S. Teoh (15/18) Feb 02 2012 [...]
- Manu (36/53) Feb 02 2012 You're obviously a nerd though, hanging out in a place like this :)
- Manu (6/8) Feb 02 2012 'Long term'? I think you're being a bit pessimistic. It's already here!
- Manu (49/76) Feb 02 2012 They're restrictive devices with weaker hardware, yet people seem to
- Richard Webb (11/11) Feb 02 2012 Out of interest, i just tried loading the same file with std.xml, and
- Michael Rynn (66/67) Feb 25 2012 With the xml package (xmlp) , and the linked node DOM, the GC is likely
Last night I tried loading a ~20 megabyte XML file using xmlp (via the DocumentBuilder.LoadFile function) and a recent dmd build, and found that it took ~48 seconds to complete, which is rather poor. I ran it through a profiler, and that said that almost all the runtime was spent inside the garbage collector. I then tried the same test using the latest Git versions of dmd/druntime (with pull request 108 merged in), and that took less than 10 seconds. This is a rather nice improvement, though still somewhat on the slow side.

Some profiler numbers, if anyone is interested:

Old version:
  Gcxfullcollect: 31.14 seconds, 69.26% runtime.
  Gcxmark: 4.84 seconds, 10.77% runtime.
  Gcxfindpool: 2.10 seconds, 4.67% runtime.

New version:
  Gcxmark: 11.67 seconds, 50.77% runtime.
  Gcxfindpool: 3.58 seconds, 15.55% runtime.
  Gcxfullcollect: 1.69 seconds, 7.37% runtime.

(Assuming that Sleepy is giving me accurate numbers. The new version is definitely faster, though.)
Feb 01 2012
Interesting. I'm glad my improvements seem to matter in the real world, though I'm thoroughly impressed with the amount of speedup. Even the small allocation benchmark that I was optimizing only sped up by ~50% from 2.057 to 2.058 overall and ~2x in collection time. I'd be very interested if you could make a small, self-contained test program to use as a benchmark. GC performance is one of D's biggest weak spots, so it would probably be a good bit of marketing to show that the performance is substantially better than it used to be, even if it's not great yet. Over the past year I've been working on and off at speeding it up. It's now at least ~2x faster than it was last year at this time on every benchmark I've tried, and up to several hundred times faster in the extreme case of huge allocations.

On Wednesday, 1 February 2012 at 18:33:58 UTC, Richard Webb wrote:
> Last night I tried loading a ~20 megabyte xml file using xmlp (via the DocumentBuilder.LoadFile function) and a recent dmd build, and found that it took ~48 seconds to complete, which is rather poor. I tried running it through a profiler, and that said that almost all the runtime was spent inside the garbage collector.
> [...]
> (Assuming that Sleepy is giving me accurate numbers. The new version is definitely faster though.)
Feb 01 2012
XML is a prime example of a shallow tree structure, so it's the prime target for the recent optimization.
Feb 01 2012
On 01/02/2012 19:35, dsimcha wrote:
> I'd be very interested if you could make a small, self-contained test program to use as a benchmark.

The 'test' is just

/////////////////
import std.xml2;

void main()
{
    string xmlPath = r"test.xml";
    auto document = DocumentBuilder.LoadFile(xmlPath, false, false);
}
/////////////////

It's xmlp that does all the work (and takes all the time). I'll see about generating a simple test file, but basically:

50000 top level nodes
each one has 6 child nodes
each node has a single attribute, and the child nodes each have a short text value.

Parsing the file with DMD 2.057 takes ~25 seconds
Parsing the file with DMD 2.058 (Git) takes ~6.1 seconds
Parsing the file with DMD 2.058, with the GC disabled during the LoadFile call, takes ~2.2 seconds.

For comparison, MSXML6 takes 1.6 seconds to load the same file.
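For anyone wanting to reproduce this, a throwaway generator matching that description could look like the sketch below. The element names, attribute, and text values are invented here; only the shape (50000 top-level nodes, 6 short-text children each, one attribute per node) follows the description above.

```d
// Hypothetical generator for a test file of the shape described above.
// Element and attribute names are made up for illustration.
import std.stdio;

void main()
{
    auto f = File("test.xml", "w");
    f.writeln(`<?xml version="1.0" encoding="UTF-8"?>`);
    f.writeln(`<root>`);
    foreach (i; 0 .. 50_000)
    {
        f.writefln(`  <node id="%s">`, i);       // one attribute per node
        foreach (j; 0 .. 6)                       // six short text children
            f.writefln(`    <child id="%s">value%s</child>`, j, j);
        f.writeln(`  </node>`);
    }
    f.writeln(`</root>`);
}
```

The resulting file is on the order of 10-15 MB, roughly the size class being benchmarked.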
Feb 01 2012
On 02.02.2012 01:41, Richard Webb <webby beardmouse.org.uk> wrote:
> Parsing the file with DMD 2.057 takes ~25 seconds
> Parsing the file with DMD 2.058(Git) takes ~6.1 seconds
> Parsing the file with DMD 2.058, with the GC disabled during the LoadFile call, takes ~2.2 seconds.
> For comparison, MSXML6 takes 1.6 seconds to load the same file.

Speaking of which, why not also compare the memory consumption (peak working set size)? Memory vs. CPU is the typical trade-off, so it might be interesting from that point, but I also wonder what the overhead for GC-managed memory is, assuming that MSXML6 uses only ref counting. And if it is a whole lot more (like >+50%), what methods could apply that 'waste' less space. This API function should get the job done: GetProcessMemoryInfo
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683219(v=vs.85).aspx
Feb 01 2012
Richard Webb:
> Parsing the file with DMD 2.057 takes ~25 seconds
> Parsing the file with DMD 2.058(Git) takes ~6.1 seconds
> Parsing the file with DMD 2.058, with the GC disabled during the LoadFile call, takes ~2.2 seconds.
> For comparison, MSXML6 takes 1.6 seconds to load the same file.

Not too long ago the Python devs added a heuristic to the Python GC (which is a reference counter + cycle breaker): it "switches off" if it detects the program is allocating many items in a short time. Is it possible to add something similar to the D GC?

Bye,
bearophile
Feb 01 2012
On Thursday, 2 February 2012 at 01:27:44 UTC, bearophile wrote:
> Not too much time ago Python devs have added an heuristic to the Python GC (that is a reference counter + cycle breaker), it "switches off" if it detects the program is allocating many items in a short time. Is it possible to add something similar to the D GC?

I actually tried to add something like this a while back but I couldn't find a heuristic that worked reasonably well. The idea was just to create a timeout where the GC can't run for x milliseconds after it just ran.
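That timeout idea can be sketched in a few lines. Note this is an illustration, not druntime code: `CollectThrottle`, `minInterval`, and the 50 ms value are all invented for the example; `core.time.MonoTime` is used as the monotonic clock.

```d
// Sketch of the "GC can't run for x milliseconds after it just ran" idea.
// CollectThrottle is an illustrative name, not an actual druntime type.
import core.time : Duration, MonoTime, msecs;

struct CollectThrottle
{
    Duration minInterval = 50.msecs; // arbitrary example interval
    MonoTime lastRun;

    // Ask this before triggering a collection; it records the time
    // whenever it says yes, so back-to-back collections get suppressed.
    bool shouldCollect()
    {
        auto now = MonoTime.currTime;
        if (lastRun != MonoTime.init && now - lastRun < minInterval)
            return false;
        lastRun = now;
        return true;
    }
}
```

The hard part, as noted above, is not the mechanism but picking an interval that helps allocation-heavy phases without letting the heap balloon elsewhere.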
Feb 01 2012
Wait a minute, since when do we even have a std.xml2? I've never heard of it and it's not in the Phobos source tree (I just checked).

On Thursday, 2 February 2012 at 00:41:31 UTC, Richard Webb wrote:
> The 'test' is just
> [...]
> It's xmlp that does all the work (and takes all the time).
> [...]
Feb 01 2012
It looks like someone wrote it a year ago, but it was never added to Phobos: http://www.digitalmars.com/d/archives/digitalmars/D/announce/std.xml2_candidate_19804.html

On Thursday, 2 February 2012 at 04:39:11 UTC, dsimcha wrote:
> Wait a minute, since when do we even have a std.xml2? I've never heard of it and it's not in the Phobos source tree (I just checked).
Feb 01 2012
On 02/02/2012 04:53, a wrote:
> It looks like someone wrote it a year ago, but it was never added to phobos: http://www.digitalmars.com/d/archives/digitalmars/D/announce/std.xml2_candidate_19804.html

That's the one -> http://www.dsource.org/projects/xmlp/
(the modules are under std.)
Feb 02 2012
On Thursday, 2 February 2012 at 04:39:11 UTC, dsimcha wrote:
> Wait a minute, since when do we even have a std.xml2? I've never heard of it and it's not in the Phobos source tree (I just checked).

As others point out, it is xmlp. I've been using it for the last year or two, and recently saw an improvement in performance (as mentioned). I don't know how comparable it is, and Richard seems to see a slowdown from the GC; for me, disabling the GC during load doesn't change load time, but I'm not using the document loader. I hope Michael will put some effort into getting it ready for review. But still, people should use it and give him some incentive to do the work.
Feb 02 2012
On Thursday, February 02, 2012 18:11:34 Jesse Phillips wrote:
> As others point out it is xmlp, I've been using it for the last year or two, and recently seen an improvement on performance (as mentioned).
> [...]
> I hope Michael will put some effort into getting it ready for review. But still, people should use it and get him some incentive to do the work.

In theory, Tomek Sowiński was working on a new XML module that was the likely replacement for std.xml, but he seems to have disappeared... His last post was about an XML writer, back in June. So I don't know what the current situation with that is.

- Jonathan M Davis
Feb 02 2012
On 02/02/2012 17:11, Jesse Phillips wrote:
> for me disabling the GC during load doesn't change load time, but I'm not using the document loader.

The GC hit is related to the number of DOM nodes that exist at one time, I think -> the visitor approach doesn't allocate the whole DOM tree, so there are far fewer items (and less allocated memory for the GC to scan). For comparison, parsing my test file using XmlVisitor takes less than 3 seconds (over twice as fast as the DOM version).

I looked into it a bit and found this: http://www.dsource.org/projects/xmlp/ticket/10
Seems that it's calling GC.qmalloc 350000+ times, mostly for 16-byte blocks, when normalizing attributes. This doesn't seem hugely clever :-(
Feb 02 2012
Speaking of GC improvements, I googled around a bit and found this thread from 2 years ago: http://www.digitalmars.com/d/archives/digitalmars/D/Is_there_a_modern_GC_for_D_105967.html Especially interesting is this paper: Nonintrusive Cloning Garbage Collector with Stock Operating System Support. http://www.cs.purdue.edu/homes/grr/snapshot-gc.ps Has anything been done along these lines since that time? Seems like this particular GC algorithm is exactly the kind we need for D. It's a conservative mark-and-sweep algo with a very low overhead(*), mark phase concurrent with mutator thread(s), and lazy incremental sweeping at allocation time. Synchronization is automatically done by default OS kernel-space mechanisms (copy on write memory pages). More to the point, how easy/hard is it to switch between GCs in the current D implementation(s)? I think it would be helpful if these kinds of experimental GCs were available in addition to the current default GC, and people can play around with them and find out which one(s) are the cream of the crop. Otherwise we're just bottlenecked at a small number of people who can actually play with GC algos for D -- which means improvements will be slow. (*) True on *nix systems anyway, but judging from comments in that thread, Windoze also was acquiring equivalent functionality -- and this being 2 years ago, I'm assuming it's available now. --T
Feb 02 2012
On Thu, 02 Feb 2012 19:31:50 -0600, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> Speaking of GC improvements, I googled around a bit and found this thread from 2 years ago:
> http://www.digitalmars.com/d/archives/digitalmars/D/Is_there_a_modern_GC_for_D_105967.html
> [...]
> More to the point, how easy/hard is it to switch between GCs in the current D implementation(s)?
> [...]

So to answer your question: yes, someone has made one of these types of GC for D, called CDGC. No, it doesn't look like Windows will support this anytime soon. And cloning GCs don't solve the problems of large heaps or soft/hard realtime issues like game render threads, and they actually make the embarrassingly long pauses (when they happen) longer. Now, CDGC, as I understand it, is better written than our current GC and would probably improve things, but I don't see it as the final end goal.
Feb 02 2012
On Thu, Feb 02, 2012 at 08:48:20PM -0600, Robert Jacques wrote:
> So to answer your question, yes, someone has made one of these types of GC for D called CDGC.
> [...]
> Now, CDGC, as I understand it, is better written than our current GC and would probably improve things, but I don't see it as the final end goal.

To me, the final goal should be to make the GC swappable, that is, there's a way to tell the compiler which GC you want to use. After I got convinced about the benefits of GC by the article on garbage collection in DPLO, I started reading up a bit about different GC algorithms, and my conclusion is that there is no such thing as a one-size-fits-all GC. Every GC algo out there is sensitive to the kind of allocations the program makes, and the memory/speed constraints you're running under. So the real solution to GC performance problems is to make it swappable.

Today, for instance, I found a book that talks about a hard real-time GC, which is very different from the paper I mentioned earlier. The point is, if you want hard real-time guarantees, it should be possible to tell the compiler to use a GC that gives you that; if you're OK with soft real-time, then you can tell the compiler to use something like CDGC. And if you're writing a batch-processing program you can use a GC with a long STW pause but better overall throughput.

Anyway, I found the blog of the guy who implemented CDGC, and it seems that it did quite well performance-wise at the end. Apparently it's in an experimental D2 branch; anyone tried to use it recently?

T

--
Questions are the beginning of intelligence, but the fear of God is the beginning of wisdom.
Feb 02 2012
On 2012-02-03 05:01, H. S. Teoh wrote:
> To me, the final goal should be to make the GC swappable, that is, there's a way to tell the compiler which GC you want to use.
> [...]

The GC is already swappable during linking.

--
/Jacob Carlborg
Feb 03 2012
On Fri, Feb 03, 2012 at 09:30:36AM +0100, Jacob Carlborg wrote:
> The GC is already swappable during linking.
[...]

Is there documentation for this?

T

--
No! I'm not in denial!
Feb 03 2012
On Wednesday, 1 February 2012 at 18:33:58 UTC, Richard Webb wrote:
> Last night I tried loading a ~20 megabyte xml file using xmlp (via the DocumentBuilder.LoadFile function) and a recent dmd build, and found that it took ~48 seconds to complete, which is rather poor. I tried running it through a profiler, and that said that almost all the runtime was spent inside the garbage collector.

Interesting. I've been using XmlVisitor on two 30-meg files and one 10-meg file, with a load time for all files of 30 secs; this is actually an improvement from an older version of xmlp, which took 60 secs. Not sure if DocumentBuilder would be much different.
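The "disabling the GC during load" variant discussed in this thread is just a bracket around the allocation-heavy call, using druntime's core.memory. A minimal self-contained sketch (parseFile here is a stand-in workload, not the xmlp API):

```d
// Bracket an allocation-heavy phase so the GC can't collect mid-parse.
// `parseFile` is a placeholder for e.g. DocumentBuilder.LoadFile; it just
// makes lots of ~16-byte allocations, like the attribute-normalization case.
import core.memory : GC;

int parseFile(string path)
{
    int allocs;
    foreach (i; 0 .. 10_000)
    {
        auto node = new int[](4); // small block, roughly 16 bytes of payload
        node[0] = i;
        ++allocs;
    }
    return allocs;
}

void main()
{
    GC.disable();            // no collections while building the tree
    scope(exit) GC.enable(); // re-enable even if parsing throws
    auto n = parseFile("test.xml");
    assert(n == 10_000);
}
```

This trades peak memory for speed: nothing is reclaimed until the bracket ends, which is exactly the trade-off the timings above illustrate.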
Feb 01 2012
On 01/02/2012 20:02, Jesse Phillips wrote:
> Interesting, I've been using XmlVisitor on two 30meg and one 10meg, load time for all files being 30secs, this is actually an improvement from an older version of xmlp which took 60secs. Not sure if DocumentBuilder would be much different.

For reference, the file I was testing with has ~50000 root nodes, each of which has several children. The number of nodes seems to have a much larger effect on the speed than the amount of data. I tried it with an old version of KXML and that seems somewhat faster, although the latest version doesn't seem to build with 2.058 and it's a more basic library.
Feb 01 2012
On Wednesday, 1 February 2012 at 22:53:11 UTC, Richard Webb wrote:
> For reference, the file i was testing with has ~50000 root nodes, each of which has several children. The number of nodes seems to have a much larger effect on the speed that the amount of data.

Sounds about right. For very small allocations, sweeping time dominates the total GC time. You can see the breakdown at https://github.com/dsimcha/druntime/wiki/GC-Optimizations-Round-2 . The Tree1 benchmark is the very small allocation benchmark. Sweeping takes time linear in the number of memory blocks allocated and, for blocks <1 page, constant time in the size of the blocks.
Feb 01 2012
On Thu, Feb 02, 2012 at 12:09:01AM +0100, dsimcha wrote:
> Sounds about right. For very small allocations sweeping time dominates the total GC time.
> [...]

Out of curiosity, is there a way to optimize for the "many small allocations" case? E.g., if a function allocates, as temporary storage, a tree with a large number of nodes, which becomes garbage when it returns. Perhaps a way to sweep the entire space used by the tree in one go? Not sure if such a thing is possible.

T

--
Tell me and I forget. Teach me and I remember. Involve me and I understand. -- Benjamin Franklin
Feb 01 2012
On 2/2/12 12:44 AM, H. S. Teoh wrote:
> Out of curiosity, is there a way to optimize for the "many small allocations" case? E.g., if a function allocates, as temporary storage, a tree with a large number of nodes, which becomes garbage when it returns. Perhaps a way to sweep the entire space used by the tree in one go?

The easiest way would be to use a specialized allocator for this – I'm thinking of David Simcha's proposed RegionAllocator (formerly TempAlloc).

David
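The core idea behind a region allocator fits in a few lines. This is an illustrative toy, not dsimcha's actual RegionAllocator (all names here are invented): allocations bump an offset into a pre-allocated buffer, and freeing the whole tree is a single reset instead of a per-node sweep.

```d
// Toy bump-pointer region, for illustration only (not the real
// RegionAllocator API): many cheap allocations, one O(1) release.
struct Region
{
    ubyte[] buffer;
    size_t used;

    this(size_t capacity) { buffer = new ubyte[](capacity); }

    // O(1) allocation: round up for alignment and bump the offset.
    void[] allocate(size_t n)
    {
        enum size_t alignment = 16;
        auto start = (used + alignment - 1) & ~(alignment - 1);
        assert(start + n <= buffer.length, "region exhausted");
        used = start + n;
        return cast(void[]) buffer[start .. used];
    }

    // Frees every allocation at once -- no per-node bookkeeping.
    void release() { used = 0; }
}
```

This is exactly why it suits the temporary-tree case asked about above: the tree's lifetime is the region's lifetime, so the "sweep" is one assignment.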
Feb 01 2012
dsimcha has done good work on the GC. After this change I have seen that some of my GC-heavy programs are just a bit slower, while most of them are significantly faster, so overall it's a good improvement. In my opinion this is one of the best things that DMD 2.058 brings (short lambda syntax is another one of them; I am appreciating it a lot. Sometimes syntax sugar matters).

Still, even with dsimcha's improvements, I think the current D GC is almost a toy compared to some of the more advanced GCs that you are able to find around. So is it right to work on small (but useful) incremental improvements on the current GC, instead of working on a fully redesigned D GC (a large job)? I don't know. But surely even incremental improvements are welcome, because there are (or there were) enough low-hanging fruits to pick.

I don't know how the current D GC performance scales to heaps of 5 or 30 GB, composed of mostly small objects. From what I've seen this is a hard problem even for almost-state-of-the-art GCs.

David Nadlinger:
> The easiest way would be to use a specialized allocator for this – I'm thinking of David Simcha's proposed RegionAllocator (formerly TempAlloc).

I'd like a simple way to define a hierarchy of such allocations (and when you free a node of the tree, the whole sub-tree gets deallocated).

Bye,
bearophile
Feb 01 2012
On Wednesday, 1 February 2012 at 23:43:24 UTC, H. S. Teoh wrote:
> Out of curiosity, is there a way to optimize for the "many small allocations" case? E.g., if a function allocates, as temporary storage, a tree with a large number of nodes, which becomes garbage when it returns. Perhaps a way to sweep the entire space used by the tree in one go? Not sure if such a thing is possible.

My RegionAllocator is probably the best thing for this if the lifetime is deterministic as you describe. I rewrote the Tree1 benchmark using RegionAllocator a while back just for comparison. D Tree1 + RegionAllocator had comparable speed to a Java version of Tree1 run under HotSpot. (About 6 seconds on my box vs. in the low 30s for Tree1 with the 2.058 GC.) If all the objects are going to die at the same time but not at a deterministic time, you could just allocate a big block from the GC and place class instances in it using emplace().
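That last suggestion can be sketched as follows. The `Node` class and the counts are invented for illustration; `std.conv.emplace` is the real Phobos function that constructs a class instance in caller-supplied memory:

```d
// Sketch of "one big GC block + emplace()" instead of many small
// allocations: the GC sees one block to sweep instead of `count` tiny ones.
import std.conv : emplace;

class Node
{
    int value;
    this(int v) { value = v; }
}

// Round each slot up to 16 bytes so every instance stays aligned.
enum slot = (__traits(classInstanceSize, Node) + 15) & ~15;

void main()
{
    enum count = 1000;

    // A single GC allocation holding all the instances.
    auto buf = new void[](slot * count);

    auto nodes = new Node[](count);
    foreach (i; 0 .. count)
        nodes[i] = emplace!Node(buf[i * slot .. (i + 1) * slot], i);

    assert(nodes[42].value == 42);
}
```

The caveat, as stated above, is that none of the instances can be reclaimed individually; the whole block lives until every reference into it is gone.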
Feb 01 2012
On Wed, 01 Feb 2012 18:40:20 -0600, dsimcha <dsimcha yahoo.com> wrote:
> My RegionAllocator is probably the best thing for this if the lifetime is deterministic as you describe.
> [...]

An XML parser would probably want some kind of stack segment growth schedule, which, IIRC, isn't supported by RegionAllocator.
Feb 01 2012
On Thursday, 2 February 2012 at 04:38:49 UTC, Robert Jacques wrote:
> An XML parser would probably want some kind of stack segment growth schedule, which, IIRC isn't supported by RegionAllocator.

I had considered putting that in RegionAllocator, but I was skeptical of the benefit, at least assuming we're targeting PCs and not embedded devices. The default segment size is 4MB. Trying to make the initial size any smaller won't save much memory. Four megabytes is also big enough that new segments would be allocated so infrequently that the cost would be negligible. I concluded that the added complexity wasn't justified.
Feb 02 2012
On 2 February 2012 17:40, dsimcha <dsimcha yahoo.com> wrote:
> at least assuming we're targeting PCs and not embedded devices.

I don't know about the implications of your decision, but that comment makes me feel uneasy. I don't know how you can possibly make that assumption. Have you looked around at the devices people actually use these days? PCs are an endangered and dying species... I couldn't imagine a worse assumption if it influences the application of D on different systems.
Feb 02 2012
On Thursday, 2 February 2012 at 18:06:24 UTC, Manu wrote:
> I don't know how you can possibly make that assumption? Have you looked around at the devices people actually use these days? PC's are an endangered and dying species... I couldn't imagine a worse assumption if it influences the application of D on different systems.

I'm not saying that embedded isn't important. It's just that for low-level stuff like memory management it requires a completely different mindset. RegionAllocator is meant to be fast and simple at the expense of space efficiency. In embedded you'd probably want completely different tradeoffs. Depending on how deeply embedded, space efficiency might be the most important thing. I don't know exactly what tradeoffs you'd want, though, since I don't do embedded development. My guess is that you'd want something completely different, not RegionAllocator plus a few tweaks that would complicate it for PC use. Therefore, I designed RegionAllocator for PCs with no consideration for embedded environments.
Feb 02 2012
On 2 February 2012 21:15, dsimcha <dsimcha yahoo.com> wrote:On Thursday, 2 February 2012 at 18:06:24 UTC, Manu wrote:Okay, this reasoning seems sound to me. I wonder though, is it easy/possible/compatible to plug a new back end in for embedded systems? It should definitely conserve memory at all costs, but above all, it needs to be fast. Short term embedded platforms will surely have some lenience with memory size, but demand high performance. What sort of overheads are we talking about for these allocators? I wonder what the minimum footprint of a D app would be, ie. exe size + minimum memory allocation for the runtime/etc?On 2 February 2012 17:40, dsimcha <dsimcha yahoo.com> wrote: On Thursday, 2 February 2012 at 04:38:49 UTC, Robert Jacques wrote:I'm not saying that embedded isn't important. It's just that for low level stuff like memory management it requires a completely different mindset. RegionAllocator is meant to be fast and simple at the expense of space efficiency. In embedded you'd probably want completely different tradeoffs. Depending on how deeply embedded, space efficiency might be the most important thing. I don't know exactly what tradeoffs you'd want, though, since I don't do embedded development. My guess is that you'd want something completely different, not RegionAllocator plus a few tweaks that would complicate it for PC use. Therefore, I designed RegionAllocator for PCs with no consideration for embedded environments.An XML parser would probably want some kind of stack segment growthI don't know about the implications of your decision, but comment makes me feel uneasy. I don't know how you can possibly make that assumption? Have you looked around at the devices people actually use these days? PC's are an endangered and dying species... 
I couldn't imagine a worse assumption if it influences the application of D on different systems.schedule, which, IIRC isn't supported by RegionAllocator.at least assuming we're targeting PCs and not embedded devices.
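For readers wondering what a "stack segment growth schedule" for a region allocator might look like, here is a rough illustration in D. All names here are hypothetical — this is *not* the actual RegionAllocator API, just a sketch of a region that bump-allocates out of segments, grows new segments geometrically, and frees everything in one call:

```d
// Hypothetical region allocator sketch: bump-pointer allocation from
// segments that grow geometrically (the "growth schedule"), with the
// whole region freed at once instead of per-object.
import core.stdc.stdlib : malloc, free;

struct GrowingRegion
{
    private ubyte*[] segments;   // every segment ever allocated
    private ubyte* cur;          // bump pointer into the active segment
    private size_t remaining;    // bytes left in the active segment
    private size_t segSize = 4096;

    void* allocate(size_t n)
    {
        enum size_t mask = 15;
        n = (n + mask) & ~mask;          // 16-byte alignment
        if (n > remaining)
            grow(n);
        auto p = cur;
        cur += n;
        remaining -= n;
        return p;
    }

    private void grow(size_t atLeast)
    {
        while (segSize < atLeast)
            segSize *= 2;
        auto seg = cast(ubyte*) malloc(segSize);
        segments ~= seg;
        cur = seg;
        remaining = segSize;
        segSize *= 2;                    // geometric growth schedule
    }

    void release()                       // tear down the whole region
    {
        foreach (seg; segments)
            free(seg);
        segments = null;
        cur = null;
        remaining = 0;
    }
}
```

An XML DOM built entirely inside such a region could be discarded in one `release()` call, which is presumably why it keeps coming up in this thread.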
Feb 02 2012
On Thursday, February 02, 2012 20:06:14 Manu wrote:On 2 February 2012 17:40, dsimcha <dsimcha yahoo.com> wrote:PCs are not endangered in the least. It's just that we're getting an increase in other devices (particularly smart phones and tablets). PCs are _heavily_ used, and there's no way that smart phones or tablets could replace them. They do different stuff. It _is_ true that applications are increasingly being written for non-PCs, but PCs definitely aren't dying off. Also, how much do you really treat smart phones or tablets like embedded devices rather than PCs? They're certainly more like PCs than the embedded devices of yore. True, they have stricter performance requirements, but they're nowhere near as restrictive as they used to be. - Jonathan M DavisOn Thursday, 2 February 2012 at 04:38:49 UTC, Robert Jacques wrote:I don't know about the implications of your decision, but comment makes me feel uneasy. I don't know how you can possibly make that assumption? Have you looked around at the devices people actually use these days? PC's are an endangered and dying species... I couldn't imagine a worse assumption if it influences the application of D on different systems.An XML parser would probably want some kind of stack segment growth schedule, which, IIRC isn't supported by RegionAllocator.at least assuming we're targeting PCs and not embedded devices.
Feb 02 2012
On 2/2/12, Manu <turkeyman gmail.com> wrote:PC's are an endangered and dying species...Kind of like when we got rid of cars and trains and ships once we started making jumbo jets. Oh wait, that didn't happen.
Feb 02 2012
On Thursday, 2 February 2012 at 18:55:02 UTC, Andrej Mitrovic wrote:On 2/2/12, Manu <turkeyman gmail.com> wrote:Agreed. I just recently got my first smartphone and I love it. I see it as a complement to a PC, though, not as a substitute. It's great for when I'm on the go, but when I'm at home or at work I like a bigger screen, a full keyboard, a faster processor, more memory, etc. Of course smartphones will get more powerful but I doubt any will ever have dual 22 inch monitors.PC's are an endangered and dying species...Kind of like when we got rid of cars and trains and ships once we started making jumbo jets. Oh wait, that didn't happen.
Feb 02 2012
On Thu, Feb 02, 2012 at 08:20:55PM +0100, dsimcha wrote:On Thursday, 2 February 2012 at 18:55:02 UTC, Andrej Mitrovic wrote:[...] Funny you should mention that, a number of years ago I got to know a guy who worked in a lab that was researching organic polymer-based electronics, which lets you build things like screens and keyboards that can be rolled up or folded and put into your pocket. It was still experimental technology at the time, though, and I'm not expecting it to be publicly available anytime soon. But the day may come when your smartphone *can* literally have a 22" monitor (that folds into a pocket sized card). T -- Famous last words: I wonder what will happen if I do *this*...On 2/2/12, Manu <turkeyman gmail.com> wrote:Agreed. I just recently got my first smartphone and I love it. I see it as a complement to a PC, though, not as a substitute. It's great for when I'm on the go, but when I'm at home or at work I like a bigger screen, a full keyboard, a faster processor, more memory, etc. Of course smartphones will get more powerful but I doubt any will ever have dual 22 inch monitors.PC's are an endangered and dying species...Kind of like when we got rid of cars and trains and ships once we started making jumbo jets. Oh wait, that didn't happen.
Feb 02 2012
But the day may come when your smartphone *can* literally have a 22" monitor (that folds into a pocket sized card). TThis doesn't sound very practical.
Feb 02 2012
On 02/02/2012 19:32, H. S. Teoh wrote:But the day may come when your smartphone *can* literally have a 22" monitor (that folds into a pocket sized card).Nah, what you really need are those holographic displays they use on CSI Miami. No need for physical displays at all ;-)
Feb 02 2012
On 02/02/2012 08:32 PM, H. S. Teoh wrote:On Thu, Feb 02, 2012 at 08:20:55PM +0100, dsimcha wrote:Chances are that region allocator can run on mobile phones just fine at the time that happens. ;)On Thursday, 2 February 2012 at 18:55:02 UTC, Andrej Mitrovic wrote:[...] Funny you should mention that, a number of years ago I got to know a guy who worked in a lab that was researching organic polymer-based electronics, which lets you build things like screens and keyboards that can be rolled up or folded and put into your pocket. It was still experimental technology at the time, though, and I'm not expecting it to be publicly available anytime soon. But the day may come when your smartphone *can* literally have a 22" monitor (that folds into a pocket sized card). TOn 2/2/12, Manu<turkeyman gmail.com> wrote:Agreed. I just recently got my first smartphone and I love it. I see it as a complement to a PC, though, not as a substitute. It's great for when I'm on the go, but when I'm at home or at work I like a bigger screen, a full keyboard, a faster processor, more memory, etc. Of course smartphones will get more powerful but I doubt any will ever have dual 22 inch monitors.PC's are an endangered and dying species...Kind of like when we got rid of cars and trains and ships once we started making jumbo jets. Oh wait, that didn't happen.
Feb 02 2012
Long term I suspect the typical desktop experience will just be a phone or other mobile device talking wirelessly to input and display peripherals. On Feb 2, 2012, at 11:20 AM, "dsimcha" <dsimcha yahoo.com> wrote:On Thursday, 2 February 2012 at 18:55:02 UTC, Andrej Mitrovic wrote:On 2/2/12, Manu <turkeyman gmail.com> wrote:Agreed. I just recently got my first smartphone and I love it. I see it as a complement to a PC, though, not as a substitute. It's great for when I'm on the go, but when I'm at home or at work I like a bigger screen, a full keyboard, a faster processor, more memory, etc. Of course smartphones will get more powerful but I doubt any will ever have dual 22 inch monitors.PC's are an endangered and dying species...Kind of like when we got rid of cars and trains and ships once we started making jumbo jets. Oh wait, that didn't happen.
Feb 02 2012
People and, more importantly, professionals will always want the most power they can get to do their work. Whether that power remains on the client system or is offloaded to the home/office server or cloud is the big question. I think the future will see an increase of local servers which intelligently allocate resources to transport-friendly modular client devices in both homes and businesses.
Feb 02 2012
On Thu, Feb 02, 2012 at 01:22:54PM -0800, Sean Kelly wrote:Long term I suspect the typical desktop experience will just be a phone or other mobile device talking wirelessly to input and display peripherals.[...] Funny, it comes full-circle to the good ole days of dumb X terminals that only handle input/display, and leave the actual computing to the server. Except now the server is no longer an oversized ceiling-high machine in the server room, but a tablet you can fit into your pocket. :) However, due to the inconvenience of needing to be near a bulky monitor and keyboard, I suspect rather that the mobile device will come with a built-in projector and foldable keyboard. The latter already exists; the former allows the convenience of usability anywhere there's a large flat surface. T -- Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.
Feb 02 2012
On 2 February 2012 21:20, dsimcha <dsimcha yahoo.com> wrote:On Thursday, 2 February 2012 at 18:55:02 UTC, Andrej Mitrovic wrote:You're obviously a nerd though, hanging out in a place like this :) There is no small number of people who use their PC's purely for communication, facebook, emails, skype, etc. and there are well documented trends showing more and more people are performing those tasks exclusively from their phones/portable devices. They are now, already, ex-pc-users. That trend is accelerating (I wish I could remember where the slides wereOn 2/2/12, Manu <turkeyman gmail.com> wrote:Agreed. I just recently got my first smartphone and I love it. I see it as a complement to a PC, though, not as a substitute. It's great for when I'm on the go, but when I'm at home or at work I like a bigger screen, a full keyboard, a faster processor, more memory, etc. Of course smartphones will get more powerful but I doubt any will ever have dual 22 inch monitors.PC's are an endangered and dying species...Kind of like when we got rid of cars and trains and ships once we started making jumbo jets. Oh wait, that didn't happen._<)My girlfriend for instance, I bought an iPad to do some dev on... she confiscated it one day after I showed her the eBook app, which she loved, and promptly threw away her whole hard copy library... Funnily though, and completely unconsciously on her part, I haven't seen her open her computer more than once or twice since that time. It has everything on it she ever used her computer for, it's more portable and convenient, fits in her bag (takes it to class), and it even has a recipe app and conveniently sits on the bench while she's baking :) I have had the same discussion in the office, and I'm not the only one who has seen a similar transition first hand. Most people aren't computer nerds... 
they don't give 2 shits about computers, it's just a tool, and the second something more practical or convenient comes along for their purposes, they'll use that instead. That time has already passed, the trend is accelerating quickly, the snowball is rolling, even REAL nerds are starting to slip. (I must confess, I'm seriously tempted by a transformer prime...) I think the MOST interesting part about this trend though, is that many of these people who used their laptop/pc *exclusively* for communication have now discovered these mobile app stores, and they're being exposed to apps that do all sorts of things. Far more than they ever did on their PC's of days past. Even games; people who would say they have no interest in video games at all are being shown to be picking up simple mobile/tablet games at an alarming rate. This is a huge and fast growing software market, almost all the software is written from scratch (read: developers *could* choose D if it were available), and D should be a part of it! The language landscape in this space is very turbulent at the moment, but it will settle, and D needs to be in it before it does, or it will miss the boat.
Feb 02 2012
On 2 February 2012 23:22, Sean Kelly <sean invisibleduck.org> wrote:Long term I suspect the typical desktop experience will just be a phone or other mobile device talking wirelessly to input and display peripherals.'Long term'? I think you're being a bit pessimistic. It's already here! http://eee.asus.com/eeepad/transformer-prime/features/ (do want!) I give it... maybe 2-3 years before it's standard (a very significant portion of the hardware landscape). I don't think that's very long... certainly not a lot of time for D to get its shit together! ;)
Feb 02 2012
On 2 February 2012 20:13, Jonathan M Davis <jmdavisProg gmx.com> wrote:On Thursday, February 02, 2012 20:06:14 Manu wrote:They're restrictive devices with weaker hardware, yet people seem to generally expect a roughly equivalent experience from them as they get from their PC. A modern PC's software can be written in any language the programmer likes, and expect that the result will generally run well. Performance is almost a 'solved' problem on PC's for most applications. I really just argue that D, a so called systems language [this is surely D's primary offering/niche, or am I mistaken? It basically competes with C/C++, and on roughly the same terms], needs to constantly consider (and potentially even favour) the performance requirements of these weaker systems. They are the devices with the highest demand for efficiency placed on them. For productivity applications, PC's will remain dominant, sure, but consider the number of productivity applications vs the number of entertainment products/games/apps/toys. The scope doesn't even compare. Also consider that productivity software is rarely written from scratch. Most productivity software people use is already written, only updated now, and isn't going to restart in D any time soon. Games/apps/toys, in terms of number of pieces of software developed, and number of developers writing such software (read: **NEW** software, where the choice of using a new language like D is a *real* possibility), completely blows the PC developer base SOOO far out of the water it's barely quantifiable, probably thousands to 1. That developer base are not working on x86 platforms, they're targeting games consoles, phones, tablets... TV's will be big very soon. They are a large community, grown very tired of C/C++, want a language that's not shit, and that is exactly as efficient as C/C++ when used correctly. D fits this bill like a glove, and also has the advantage of being compatible with their existing libraries. 
Embedded hardware toyed with the notion of using managed languages for a little while, not even offering C toolchains at first (Android, Windows Phone), but that failed, and both revoked that policy. The bar is rising fast on those platforms. Everyone producing high end products required native development to remain competitive. There's really not a lot of modern native language competition out there, D is positioned well, and with this fact in mind, D's performance must be as comparable as possible to C/C++. In fact, D has potential for many performance *advantages* over C/C++, but the GC is a key concern for me personally... and if possible to tune it for these systems, it should be. They require it more, and PC honestly won't care that much :) Apologies for once again derailing an unrelated topic! :) I really just can't handle seeing this sort of presumption constantly popping up all over the news group. This PC-centric mentality has to go if D is to have any commercial success. The commercial software industry moved on years ago, it's not on PC anymore... the D community needs to become intimately aware of that fact, whether they like it or not. (presuming of course that the intent here is for D to be successful :)On 2 February 2012 17:40, dsimcha <dsimcha yahoo.com> wrote:On Thursday, 2 February 2012 at 04:38:49 UTC, Robert Jacques wrote:An XML parser would probably want some kind of stack segment growth schedule, which, IIRC isn't supported by RegionAllocator.at least assuming we're targeting PCs and not embedded devices.I don't know about the implications of your decision, but that comment makes me feel uneasy. I don't know how you can possibly make that assumption? Have you looked around at the devices people actually use these days? PC's are an endangered and dying species... I couldn't imagine a worse assumption if it influences the application of D on different systems.PCs are not endangered in the least. 
It's just that we're getting an increase in other devices (particularly smart phones and tablets). PCs are _heavily_ used, and there's no way that smart phones or tablets could replace them. They do different stuff. It _is_ true that applications are increasingly being written for non-PCs, but PCs definitely aren't dying off. Also, how much do you really treat smart phones or tablets like embedded devices rather than PCs? They're certainly more like PCs than the embedded devices of yore. True, they have stricter performance requirements, but they're nowhere near as restrictive as they used to be.
Feb 02 2012
Out of interest, I just tried loading the same file with std.xml, and the performance there is pretty similar in each version, possibly slightly slower in 2.058 (~21-22 seconds in each case). Disabling the GC during the load gets 9 seconds, though task manager reports a peak memory usage of almost 600 megabytes in that case! It looks like most of the time here is spent in Gcxmark, whereas with xmlp it was in Gcxfullcollect (and fullcollect is the one that is faster in 2.058). The profiler makes it look like things are spending more time in Gcxmark than they were before. Is that the case? I'll try to have a go with Tango when I get some more time.
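The "disabling the GC during the load" measurement above corresponds to a pattern like the following. The `GC.disable`/`GC.enable`/`GC.collect` calls are the real `core.memory` API; the wrapper name and the parse call are made up for illustration:

```d
// Sketch: suppress collections for the duration of a large parse, then
// re-enable and do one collection afterwards. Trades peak memory (the
// ~600 MB figure above) for wall-clock parse time.
import core.memory : GC;

T withGCDisabled(T)(scope T delegate() dg)
{
    GC.disable();            // no collections fire while dg runs
    scope(exit)
    {
        GC.enable();
        GC.collect();        // pay the collection cost once, at the end
    }
    return dg();
}
```

Usage would be something like `auto doc = withGCDisabled(() => parseXmlFile("big.xml"));`, where `parseXmlFile` stands in for whichever parser is under test.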
Feb 02 2012
On Thu, 02 Feb 2012 15:44:56 +0000, Richard Webb wrote:With the xml package (xmlp), and the linked node DOM, the GC is likely to fail cleanup. I divided the generated test file, with its 2-layer elements, into 5, 50, 500, 5000 sized files. I put in a mixin on the Node class to do static counts of Node objects created and deleted. I also set the document to be read multiple times, calling GC.collect between each one, and timed it, and the increase in read time, and the increase in GC.collect time, was strong evidence that the GC was unable to clean it up, even though, to my naive code inspection, the previous document instance was no longer reachable from GC "roots", ie, should have been directly replaced on the stack. The linkdom rather too faithfully copies the Java DOM model, with double-linked pointers. Elements have an "owner document" (I felt like ditching this). All element children derive from ChildNode (Element, Text, CDATA and Comment), and have a pointer to the parent_ node (could ditch this, maybe for CharData). Anyway, it's a pretty linked-up conventional model. Java's GC has evidently got around the problems of collecting XML documents. When I looked at C code for libxml2, for instance, all the Nodes have that sort of interlocked linkage. I tried once to see how far I could get (on Windows) calling libxml2 from D. I did not get very far, and gave up after crashes and a headache. On the GC cleanup stats, the 5-node version of the generated test was cleanable, but at 50,000, well, the poor GC could not delete a single Node object. I've only done this with dmd2_32 on Windows. 64 bit may be different. It's also the GitHub release of 2.058 - 162, and may not be in a releasable state for GC. The original std.xml seemed to fare best on GC cleanup, but still developed troubles at each larger document size. std.xmlp.arraydom was similar. Both of these have no back pointers to parents, and use an array to hold children. 
I thought of just nulling the backpointers, and wrote a version of an explode method. I tried this with the node version doing, or not doing, a delete. With a big document, it currently has to do a delete, or the GC will not clean up everything. I turned the GC stats counters into a mixin template, and put them on the std.xml1, and on the std.arraydom, and any intermediate objects from the parsing that I wanted to confirm were not hanging around. For too many of them, I found it necessary to chase them down and force a delete (I now prefer explode, because delete implies a complete cleanup, and for me, explode means to split into little non-referencing pieces that the GC can manage). This means I use too many pointer-linked structures for the GC. It may be just as difficult for the RedBlack tree in std.container; I haven't checked. For me one lesson at least is: do not assume a set of interlocked classes, with circular references, can always be isolated in a single full GC collection and properly deleted. For internal temporaries that will never be seen by external code, it seems a good idea to explode them. For large complex tree structures, it might take less CPU time to explode them than for this particular version of the GC to work it out with less information. Do explode again. After exploding, repeated runs and bigger loads ran optimally. So I read with interest other posts on mechanisms of GC and how to integrate with the compiler to tag classes and stack layout, to know pointers precisely. I particularly liked the suggestion of a mixin on each class for GC layout. What could be done, even ad hoc, is to do GC stats on various kinds of complex self-referring classes, to examine the effectiveness of GC collection. This could even be a compiler flag, or version. It was useful for me to spot where I could reuse a class rather than construct a new one. A lot of D applications may be run once, and GC doesn't matter so much. 
But for long-running games and servers, verified GC behaviour will bring more confidence. Aggressive measurement may allow better detection of where and when the GC fails, and allow better assessment of the effects of GC code changes. GC is a tough problem, and a hard one to base a whole language around.
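The "explode" idea and the stats mixin described in the post above might look roughly like this. The names (`Node`, `explode`, `GCStats`) follow the post loosely; the actual xmlp code is not reproduced here, and the counter placement is a guess at what the poster describes:

```d
// Sketch of a doubly-linked DOM node with an "explode" cleanup: break the
// parent/child back-references so the (conservative) GC sees only small,
// unlinked pieces instead of one big interlocked graph. A mixin template
// counts constructions and explosions, as the post describes doing.
mixin template GCStats()
{
    static size_t created;    // incremented in the constructor
    static size_t exploded;   // incremented in explode()
}

class Node
{
    mixin GCStats;

    Node parent;       // back-pointer that keeps the whole tree reachable
    Node[] children;

    this() { created++; }

    /// Split the subtree into non-referencing pieces the GC can reclaim.
    void explode()
    {
        foreach (c; children)
            c.explode();
        children = null;   // drop downward links
        parent = null;     // drop the back-pointer
        exploded++;
    }
}
```

Once a document is no longer needed, calling `explode()` on its root severs every cycle, so even a collector that struggles with large interlocked graphs can reclaim the nodes piecemeal — which matches the poster's observation that repeated runs behaved well after exploding.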
Feb 25 2012