digitalmars.D - Java moves to copying for substrings
- Jesse Phillips (11/11) Nov 18 2013 Somewhat interesting, Java has chosen to make substring result in
- monarch_dodra (12/14) Nov 18 2013 I think it is pretty important to remember that slicing, while
- deadalnix (4/19) Nov 19 2013 The problem with Java's string is that you don't have enough
- Walter Bright (4/11) Nov 19 2013 And D gives you that choice. You can "slice & hold" or "slice & dup". Yo...
- bearophile (11/23) Nov 19 2013 I presume in Java slices weren't very common, unlike in D. So I
- bearophile (4/5) Nov 19 2013 Please ignore this part. Some answers of that Reddit thread show
- Jonathan M Davis (13/30) Nov 19 2013 Yikes. Maybe that's a good idea for Java for some reason, but I'd consid...
- Brian Schott (8/22) Nov 19 2013 I ended up doing this in DCD. The lexing step sliced the source
- Walter Bright (5/8) Nov 19 2013 D doesn't need to change its approach at all because it offers both opti...
- Jonathan M Davis (6/8) Nov 19 2013 Good point.
Somewhat interesting, Java has chosen to make substring result in a copy of the string data rather than returning a window of the underlying chars.

http://www.reddit.com/r/programming/comments/1qw73v/til_oracle_changed_the_internal_string/

"reduce the size of String instances. [...] This was the trigger."
"avoid memory leakage caused by retained substrings holding the entire character array."

So apparently substrings were considered a common cause of memory leaks. I got the impression most of the comments agreed the result is good, but changing the complexity is bad.

I'm not advocating such a change for D.
Nov 18 2013
On Tuesday, 19 November 2013 at 05:38:14 UTC, Jesse Phillips wrote:
> So apparently substrings were considered a common cause of memory leaks.

I think it is pretty important to remember that slicing, while giving you a small view, still holds the entire array. I think there is nothing wrong with adding a ".dup" every now and then, after slicing something.

As a matter of fact, I've been playing around with transcoding strings (UTF-8/16/32). I start by allocating a large buffer to write into. When I'm done, I look at the buffer's usage, and if it's too low, I return a dup of the buffer slice, allowing the GC to reclaim the original buffer. Not only does this take up less memory, but overall, I actually get better run-times too (!)
Nov 18 2013
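The buffer approach described in the post above could look roughly like the following in D. This is only a sketch under stated assumptions: the helper name `trimToFit` and the 50% usage threshold are hypothetical and not taken from monarch_dodra's code.

```d
/// Hypothetical helper: returns the used part of `buffer`.
/// If less than half of the buffer is in use (an assumed threshold),
/// the used part is copied so the GC can reclaim the large block
/// once nothing else refers to it.
char[] trimToFit(char[] buffer, size_t used)
{
    auto slice = buffer[0 .. used];
    if (used * 2 < buffer.length)
        return slice.dup; // fresh allocation sized to the data
    return slice;         // still refers to the original block
}

void main()
{
    auto big = new char[](1024);
    big[0 .. 5] = "hello";

    auto small = trimToFit(big, 5);
    assert(small == "hello");
    assert(small.ptr !is big.ptr); // copied: no longer pins `big`

    auto large = trimToFit(big, 1000);
    assert(large.ptr is big.ptr);  // mostly full: the slice is kept
}
```

Whether the copy pays off depends on how long the result lives and how much of the buffer goes unused, so the threshold is something to tune with a profiler rather than a fixed rule.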
On Tuesday, 19 November 2013 at 07:36:29 UTC, monarch_dodra wrote:
> On Tuesday, 19 November 2013 at 05:38:14 UTC, Jesse Phillips wrote:
>> So apparently substrings were considered a common cause of memory leaks.
>
> I think it is pretty important to remember that slicing, while giving you a small view, still holds the entire array. I think there is nothing wrong with adding a ".dup" every now and then, after slicing something.
>
> As a matter of fact, I've been playing around with transcoding strings (UTF-8/16/32). I start by allocating a large buffer to write into. When I'm done, I look at the buffer's usage, and if it's too low, I return a dup of the buffer slice, allowing the GC to reclaim the original buffer. Not only does this take up less memory, but overall, I actually get better run-times too (!)

The problem with Java's string is that you don't have enough control to do this. Instead of giving extra control, because it is obviously necessary, they dumb down the thing even more.
Nov 19 2013
On 11/18/2013 11:36 PM, monarch_dodra wrote:
> On Tuesday, 19 November 2013 at 05:38:14 UTC, Jesse Phillips wrote:
>> So apparently substrings were considered a common cause of memory leaks.
>
> I think it is pretty important to remember that slicing, while giving you a small view, still holds the entire array. I think there is nothing wrong with adding a ".dup" every now and then, after slicing something.

And D gives you that choice. You can "slice & hold" or "slice & dup". Your choice, as the circumstances dictate. With Java, there is no choice, and they are stuck with one size fits all.
Nov 19 2013
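To make the two options named in the post above concrete, here is a minimal D sketch. The function names `holdPrefix` and `copyPrefix` are made up for illustration. Because `string` is immutable in D, the copying variant uses `.idup` rather than `.dup`.

```d
/// "Slice & hold": the result shares memory with `s`, so it keeps
/// the whole underlying block alive for as long as it is referenced.
string holdPrefix(string s, size_t n)
{
    return s[0 .. n];
}

/// "Slice & dup": the result is an independent copy, so `s` can be
/// collected once nothing else refers to it.
string copyPrefix(string s, size_t n)
{
    return s[0 .. n].idup;
}

void main()
{
    string source = "key=value; imagine a much larger document here";

    auto held = holdPrefix(source, 3);
    auto copied = copyPrefix(source, 3);

    assert(held == "key" && copied == "key");
    assert(held.ptr is source.ptr);    // same memory as the source
    assert(copied.ptr !is source.ptr); // separate allocation
}
```

The slice costs no allocation but pins the source. The copy costs an allocation proportional to its length but releases the source.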
Jesse Phillips:
> Somewhat interesting, Java has chosen to make substring result in a copy of the string data rather than returning a window of the underlying chars.

I presume in Java slices weren't very common, unlike in D. So I think this is the right design choice for Java (also because those Java strings are too large, with four instance fields), but D is better designed as it is.

On the other hand the idea of putting the hash code inside the string in D was not discussed enough :-)

From the discussion, Dmd associative arrays were designed like this:

>In Java 8 an improved solution devised by Doug Lea is used. In this solution colliding but Comparable Map keys are placed in a tree rather than a linked listed. Performance degenerates to O(log n) for the collisions but this is usually small unless someone is creating keys which intentionally collide or has a very, very bad hashcode implementation, ie. "return 3".<

>I am reminded of a denial of service attack that used intentionally colliding request field names/values to attack web servers and bringing down servers to their knees.<

Bye,
bearophile
Nov 19 2013
> I presume in Java slices weren't very common,

Please ignore this part. Some answers of that Reddit thread show that some people slice a lot in Java too :-)

Bye,
bearophile
Nov 19 2013
On Tuesday, November 19, 2013 06:38:12 Jesse Phillips wrote:
> Somewhat interesting, Java has chosen to make substring result in a copy of the string data rather than returning a window of the underlying chars.
>
> http://www.reddit.com/r/programming/comments/1qw73v/til_oracle_changed_the_internal_string/
>
> "reduce the size of String instances. [...] This was the trigger."
> "avoid memory leakage caused by retained substrings holding the entire character array."
>
> So apparently substrings were considered a common cause of memory leaks. I got the impression most of the comments agreed the result is good, but changing the complexity is bad.
>
> I'm not advocating such a change for D.

Yikes. Maybe that's a good idea for Java for some reason, but I'd consider slicing strings to be a _huge_ strength of D. Still, Java's situation is rather different, because all of the slicing stuff is more of an implementation detail than a core feature like it is in D.

It _is_ true however that if you're not careful about it, you can end up with a lot of slices that keep whole blocks of memory from being collected when they don't really need to refer to that memory anymore. So, depending on what profiling shows, some applications may need to make adjustments to avoid having slices keep too much extraneous memory from being collected. So, it's something to keep in mind, but I definitely don't think that we should be changing our approach at all.

- Jonathan M Davis
Nov 19 2013
On Tuesday, 19 November 2013 at 10:38:06 UTC, Jonathan M Davis wrote:
> It _is_ true however that if you're not careful about it, you can end up with a lot of slices that keep whole blocks of memory from being collected when they don't really need to refer to that memory anymore. So, depending on what profiling shows, some applications may need to make adjustments to avoid having slices keep too much extraneous memory from being collected. So, it's something to keep in mind, but I definitely don't think that we should be changing our approach at all.
>
> - Jonathan M Davis

I ended up doing this in DCD. The lexing step sliced the source code, so when caching autocompletion information for all of Phobos, Druntime, etc, the memory usage would get fairly large. Adding a few ".dup" calls in the phase that converts the AST to the autocompletion cache structures greatly reduced the memory usage.
Nov 19 2013
On 11/19/2013 2:37 AM, Jonathan M Davis wrote:
> So, it's something to keep in mind, but I definitely don't think that we should be changing our approach at all.

D doesn't need to change its approach at all because it offers both options - the user can choose.

Note that the article says that some Java apps improved with this change, and some degraded. There is no correct answer.
Nov 19 2013
On Tuesday, November 19, 2013 13:54:37 Walter Bright wrote:
> D doesn't need to change its approach at all because it offers both options - the user can choose.

Good point.

While I wouldn't say that D's string handling is perfect, it's by far the best that I've ever dealt with, and I think that it's one of its great strengths and easily one of the things that I miss the most when I program in C++.

- Jonathan M Davis
Nov 19 2013