
digitalmars.D - Re: LRU cache for ~=

reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 I just wrote this to Sean and Walter and subsequently discussed it with 
 Walter. Walter thinks this should work. Does anyone have the time and 
 inclination to test this out? It would involve hacking into druntime's 
 implementation of ~= (I'm not sure what the function name is). I'd 
 really appreciate this; I'm overloaded as it is.
 
 ==================
 
 In wake of the recent demise of T[new], I was thinking of finding ways 
 of making ~= efficient for T[].
 
 Currently ~= is slow because accessing GC.sizeOf(void*) acquires a 
 global lock and generally must figure out a lot of things about the 
 pointer to make a decision.
 
 Also, ~= is dangerous because it allows slices to stomp over other slices.
 
 I was thinking of solving these issues by keeping an LRU (Least Recently 
 Used) cache inside the implementation of ~=. The LRU would only have a 
 few entries (4-8) and would store the parameters of the last ~= calls, 
 and their cached capacities.
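The proposed cache might look something like this (a sketch only; all names are invented for illustration and are not druntime's actual implementation):

```d
// Sketch only: field and symbol names are invented for illustration.
struct AppendCacheEntry
{
    void*  ptr;      // start of the memory block backing the slice
    size_t length;   // slice length recorded at the last ~=
    size_t capacity; // cached block capacity, so ~= can skip GC.sizeOf

}

// "a few entries (4-8)", evicted least-recently-used
__gshared AppendCacheEntry[8] appendCache;
```

On each ~=, the implementation would look up (ptr, length) in the cache; on a hit with spare capacity it extends in place, otherwise it falls back to querying the GC and inserts a fresh entry, evicting the least recently used one.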

Shouldn't it be MRU (Most Recently Used)? I assume ~= will be disallowed for shared arrays? Will all arrays overallocate when created in order to allow efficient appending? I feel like bolting ~= onto slices gives mediocre performance and a lot of gotchas. Some users may expect slices to be reference-copied arrays and want to see appended data in other slices. Some may get away with mutating the array in the MRU or a slice and seeing the changes, only to get bitten later. A separate type really makes more sense to me...
Oct 19 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Andrei Alexandrescu Wrote:
 
 I just wrote this to Sean and Walter and subsequently discussed it
 with Walter. Walter thinks this should work. Does anyone have the
 time and inclination to test this out? It would involve hacking
 into druntime's implementation of ~= (I'm not sure what the
 function name is). I'd really appreciate this; I'm overloaded as it
 is.
 
 ==================
 
 In wake of the recent demise of T[new], I was thinking of finding
 ways of making ~= efficient for T[].
 
 Currently ~= is slow because accessing GC.sizeOf(void*) acquires a
  global lock and generally must figure out a lot of things about
 the pointer to make a decision.
 
 Also, ~= is dangerous because it allows slices to stomp over other
 slices.
 
 I was thinking of solving these issues by keeping an LRU (Least
 Recently Used) cache inside the implementation of ~=. The LRU would
 only have a few entries (4-8) and would store the parameters of the
 last ~= calls, and their cached capacities.

Shouldn't it be MRU (Most Recently Used)?

Yes. LRU would be the eviction strategy.
 I assume ~= will be disallowed for shared arrays?

I think so.
 Will all arrays overallocate when created in order to allow efficient
 appending?

Upon the first append.
 I feel like bolting on ~= to slices gives mediocre performance and a
 lot of gotchas.

You may be right about the gotchas, but you better produce some numbers wrt performance.
 Some users may expect slices to be reference copied
 arrays and want to see appended data in other slices. Some may get
 away with mutating the array in the MRU or a slice and seeing the
 changes, only to get bitten later. A separate type really makes more
 sense to me...

Making ~= work well for slices does not preclude creating a distinct library type. Andrei
Oct 19 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Making ~= work well for slices does not preclude creating a distinct
 library type.
 Andrei

Yes, I see LRU/MRU/whatever as a kludge to allow convenience without being unsafe or egregiously inefficient. The fact that it is a kludge is not a criticism because there is clearly no easy answer, and thus a kludge is genuinely necessary. However, I think there needs to be a separate array builder type for "heavy duty" appending operations. In TDPL I would just say that concatenating slices can be inefficient and that people should use array builder for heavy duty appending, length changing, etc., w/o getting into the details.
Oct 19 2009
next sibling parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2009-10-20 00:13:05 +0200, dsimcha <dsimcha yahoo.com> said:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Making ~= work well for slices does not preclude creating a distinct
 library type.
 Andrei

Yes, I see LRU/MRU/whatever as a kludge to allow convenience without being unsafe or egregiously inefficient. The fact that it is a kludge is not a criticism because there is clearly no easy answer, and thus a kludge is genuinely necessary. However, I think there needs to be a separate array builder type for "heavy duty" appending operations. In TDPL I would just say that concatenating slices can be inefficient and that people should use array builder for heavy duty appending, length changing, etc., w/o getting into the details.

I fully agree
Oct 19 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Making ~= work well for slices does not preclude creating a distinct
 library type.
 Andrei

Yes, I see LRU/MRU/whatever as a kludge to allow convenience without being unsafe or egregiously inefficient. The fact that it is a kludge is not a criticism because there is clearly no easy answer, and thus a kludge is genuinely necessary. However, I think there needs to be a separate array builder type for "heavy duty" appending operations. In TDPL I would just say that concatenating slices can be inefficient and that people should use array builder for heavy duty appending, length changing, etc., w/o getting into the details.

But how about the opposite view. What if the *previous* implementation of GC and ~= were a kludge that led to egregious inefficiency and revolting unsafety, a kludge that is now getting fixed. What I'm saying is that with the cache in place we'll have slices that are safe and efficient. Right now they are not safe and not efficient. I can hardly find reasons to characterize the new state of affairs as kludgey. One surprising (but safe) behavior that remains with slices is this:

void fun(int[] a) {
   a[0] = 0;
   a ~= 42;
   a[0] = 42;
}

The caller may or may not see 42 in the first slot after the call.

Andrei
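To make the surprise concrete, here is a hypothetical caller, assuming ~= behaves as described (it may either extend the block in place or reallocate and copy):

```d
import std.stdio;

void fun(int[] a) {
   a[0] = 0;
   a ~= 42;   // may extend a's block in place, or reallocate and copy
   a[0] = 42; // hits the caller's block only if no reallocation happened
}

void main() {
   int[] arr = [1, 2, 3];
   fun(arr);
   // In-place append: arr[0] is 42. Reallocation: arr[0] stays 0.
   writeln(arr[0]);
}
```

The caller cannot tell from the call site which of the two outcomes occurred; that is exactly the nondeterminism being discussed.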
Oct 19 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Making ~= work well for slices does not preclude creating a distinct
 library type.
 Andrei

Yes, I see LRU/MRU/whatever as a kludge to allow convenience without being unsafe or egregiously inefficient. The fact that it is a kludge is not a criticism because there is clearly no easy answer, and thus a kludge is genuinely necessary. However, I think there needs to be a separate array builder type for "heavy duty" appending operations. In TDPL I would just say that concatenating slices can be inefficient and that people should use array builder for heavy duty appending, length changing, etc., w/o getting into the details.

But how about the opposite view. What if the *previous* implementation of GC and ~= were a kludge that led to egregious inefficiency and revolting unsafety, a kludge that is now getting fixed. What I'm saying is that with the cache in place we'll have slices that are safe and efficient. Right now they are not safe and not efficient. I can hardly find reasons to characterize the new state of affairs as kludgey. One surprising (but safe) behavior that remains with slices is this:

void fun(int[] a) {
   a[0] = 0;
   a ~= 42;
   a[0] = 42;
}

The caller may or may not see 42 in the first slot after the call.

Andrei

Started playing w/ the implementation a little and I see a problem. What about the garbage collector? There are two possibilities:

1. The pointer gets stored as a pointer. The array never gets freed until it gets evicted from the cache. If it's a huge array, this is very bad.

2. The pointer gets cast to a size_t so it doesn't look like a reference. Then, we can have something like:

string foo;
foreach(i; 0..5) {
    foo ~= 'a';
}
foo = null;
GC.collect;  // Now we have (foo.ptr, 5) in our LRU cache, but foo has been GC'd.

string bar = "123456789";  // bar gets the memory reclaimed from foo.
string baz = bar[0..5];    // Same length, ptr as foo.
baz ~= '1';                // Finds stale information from foo in cache.
writeln(bar);              // Prints [1 2 3 4 5 1 7 8 9].

The only possible solutions I see would be to have the GC know everything about the LRU cache and evict stale entries (probably slows down GC a lot, a huge PITA to implement, couples things that shouldn't be tightly coupled), or clear the cache every time GC is run (probably would make appending so slow as to defeat the purpose of having the cache).
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 Started playing w/ the implementation a little and I see a problem.  What about
 the garbage collector?  There are two possibilities:

 The only possible solutions I see would be to have the GC know everything about the LRU cache and evict stale entries (probably slows down GC a lot, a huge PITA to implement, couples things that shouldn't be tightly coupled), or clear the cache every time GC is run (probably would make appending so slow as to defeat the purpose of having the cache).

I think GC.collect may simply evict the entire cache. The collection cycle costs so much, the marginal cost of losing cached information is lost in the noise. Andrei
Oct 19 2009
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:
 
 I think GC.collect may simply evict the entire cache. The collection 
 cycle costs so much, the marginal cost of losing cached information is 
 lost in the noise.

Right. The GC already has an LRU cache of size 1 (ie. one LRU entry) for GC.sizeOf() and GC.query(), and these values are cleared on a collection by necessity, since pages can be repurposed if they've been completely emptied.
Oct 19 2009
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 Started playing w/ the implementation a little and I see a problem.  What about
 the garbage collector?  There are two possibilities:

 The only possible solutions I see would be to have the GC know everything about the LRU cache and evict stale entries (probably slows down GC a lot, a huge PITA to implement, couples things that shouldn't be tightly coupled), or clear the cache every time GC is run (probably would make appending so slow as to defeat the purpose of having the cache).

I think GC.collect may simply evict the entire cache. The collection cycle costs so much, the marginal cost of losing cached information is lost in the noise. Andrei

But then you have to copy the whole array again, likely triggering another GC if the array is large. Then things really get ugly as, for all practical purposes, you've completely done away with the cache.
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 Started playing w/ the implementation a little and I see a problem.  What about
 the garbage collector?  There are two possibilities:

 The only possible solutions I see would be to have the GC know everything about the LRU cache and evict stale entries (probably slows down GC a lot, a huge PITA to implement, couples things that shouldn't be tightly coupled), or clear the cache every time GC is run (probably would make appending so slow as to defeat the purpose of having the cache).

I think GC.collect may simply evict the entire cache. The collection cycle costs so much, the marginal cost of losing cached information is lost in the noise. Andrei

But then you have to copy the whole array again, likely triggering another GC if the array is large. Then things really get ugly as, for all practical purposes, you've completely done away with the cache.

This happens whether or not a cache is in use. Andrei
Oct 19 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 Started playing w/ the implementation a little and I see a problem.  What about
 the garbage collector?  There are two possibilities:

 The only possible solutions I see would be to have the GC know everything about the LRU cache and evict stale entries (probably slows down GC a lot, a huge PITA to implement, couples things that shouldn't be tightly coupled), or clear the cache every time GC is run (probably would make appending so slow as to defeat the purpose of having the cache).

I think GC.collect may simply evict the entire cache. The collection cycle costs so much, the marginal cost of losing cached information is lost in the noise. Andrei

But then you have to copy the whole array again, likely triggering another GC if the array is large. Then things really get ugly as, for all practical purposes, you've completely done away with the cache.

This happens whether or not a cache is in use. Andrei

But the array isn't guaranteed to get reallocated immediately after *every* GC run. If you're appending to a huge array, the GC will likely run several times while you're appending, leading to several unnecessary reallocations. Each of those unnecessary reallocations will increase the memory footprint of your program, possibly triggering another GC run and wiping out your cache again in short order, until, for sufficiently large arrays,

a ~= b;

is almost equivalent to

a = a ~ b;
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 Started playing w/ the implementation a little and I see a problem.  What about
 the garbage collector?  There are two possibilities:

 The only possible solutions I see would be to have the GC know everything about the LRU cache and evict stale entries (probably slows down GC a lot, a huge PITA to implement, couples things that shouldn't be tightly coupled), or clear the cache every time GC is run (probably would make appending so slow as to defeat the purpose of having the cache).

I think GC.collect may simply evict the entire cache. The collection cycle costs so much, the marginal cost of losing cached information is lost in the noise. Andrei

But then you have to copy the whole array again, likely triggering another GC if the array is large. Then things really get ugly as, for all practical purposes, you've completely done away with the cache.

This happens whether or not a cache is in use. Andrei

But the array isn't guaranteed to get reallocated immediately after *every* GC run. If you're appending to a huge array, the GC will likely run several times while you're appending, leading to several unnecessary reallocations.

I don't think I understand this.

1. Request for an append comes that runs out of memory
2. GC runs and clears memory
3. Array is reallocated and the capacity cached.

No?
 Each of
 those unnecessary reallocations will increase the memory footprint of your
 program, possibly triggering another GC run and wiping out your cache again in
 short order, until, for sufficiently large arrays,
 
 a ~= b;
 
 is almost equivalent to
 
 a = a ~ b;

I don't understand how the cache makes that all worse. Andrei
Oct 19 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 Started playing w/ the implementation a little and I see a problem. What about the garbage collector? There are two possibilities:

 The only possible solutions I see would be to have the GC know everything about the LRU cache and evict stale entries (probably slows down GC a lot, a huge PITA to implement, couples things that shouldn't be tightly coupled), or clear the cache every time GC is run (probably would make appending so slow as to defeat the purpose of having the cache).

I think GC.collect may simply evict the entire cache. The collection cycle costs so much, the marginal cost of losing cached information is lost in the noise. Andrei

But then you have to copy the whole array again, likely triggering another GC if the array is large. Then things really get ugly as, for all practical purposes, you've completely done away with the cache.

This happens whether or not a cache is in use. Andrei

But the array isn't guaranteed to get reallocated immediately after *every* GC run. If you're appending to a huge array, the GC will likely run several times while you're appending, leading to several unnecessary reallocations.

I don't think I understand this.

1. Request for an append comes that runs out of memory
2. GC runs and clears memory
3. Array is reallocated and the capacity cached.

No?

This is entirely correct.
 Each of
 those unnecessary reallocations will increase the memory footprint of your
 program, possibly triggering another GC run and wiping out your cache again in
 short order, until, for sufficiently large arrays,

 a ~= b;

 is almost equivalent to

 a = a ~ b;

I don't understand how the cache makes that all worse. Andrei

The cache doesn't make anything *worse* than with no cache. The only point I'm trying to make is that, for large arrays, if the GC clears the cache every time it runs, things would start to get *almost as bad as* having no cache because the copy operations become expensive and the GC may run frequently.
Oct 19 2009
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:
 
 What I'm saying is that with the cache in place we'll have slices that 
 are safe and efficient. Right now they are not safe and not efficient. I 
 can hardly find reasons to characterize the new state of affairs as kludgey.

I'm not sure I agree that they'd be safe. Given only a pointer to the head of a block there's no way to know whether it represents the array or a slice of the array from [0..n].
Oct 19 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 
 What I'm saying is that with the cache in place we'll have slices
 that are safe and efficient. Right now they are not safe and not
 efficient. I can hardly find reasons to characterize the new state
 of affairs as kludgey.

I'm not sure I agree that they'd be safe. Given only a pointer to the head of a block there's no way to know whether it represents the array or a slice of the array from [0..n].

That's the beauty of the cache: with the cache in place, you may actually know that the memory beyond the end of a cached slice is seen by nobody else. Andrei
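The invariant Andrei describes could be sketched like this (a hypothetical helper, not actual druntime code; the entry fields are invented):

```d
// Hypothetical sketch: the cache records (ptr, length, capacity) at each ~=.
// A slice whose ptr and length exactly match a cache entry is known to end
// where the block's used region ends, so the bytes in [length .. capacity)
// have never been exposed through any other slice and may be appended over.
struct Entry { void* ptr; size_t length, capacity; }

bool safeToExtendInPlace(T)(T[] a, ref Entry e)
{
    return a.ptr == e.ptr
        && a.length == e.length   // a is the slice that did the last append
        && a.length < e.capacity; // and there is spare room in the block
}
```

A pointer to the head of a block alone cannot prove this, which is Sean's objection; the cached (ptr, length) pair is what makes the proof possible.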
Oct 19 2009
prev sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Andrei Alexandrescu wrote:
 One surprising (but safe) behavior that remains with slices is this:
 
 void fun(int[] a) {
    a[0] = 0;
    a ~= 42;
    a[0] = 42;
 }
 
 The caller may or may not see 42 in the first slot after the call.

Your definition of "safe" is clearly not aligned with mine. -- Rainer Deyke - rainerd eldwood.com
Oct 19 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 One surprising (but safe) behavior that remains with slices is this:

 void fun(int[] a) {
    a[0] = 0;
    a ~= 42;
    a[0] = 42;
 }

 The caller may or may not see 42 in the first slot after the call.

Your definition of "safe" is clearly not aligned with mine.

What's yours? Andrei
Oct 19 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Mon, Oct 19, 2009 at 5:07 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 One surprising (but safe) behavior that remains with slices is this:

 void fun(int[] a) {
   a[0] = 0;
   a ~= 42;
   a[0] = 42;
 }

 The caller may or may not see 42 in the first slot after the call.

 Your definition of "safe" is clearly not aligned with mine.

 What's yours?

Probably something like "safe from easy-to-write hard-to-debug programming errors". I agree that it would stink if all these changes were made and *still* one of the top gotchas with arrays/slices remained. It also makes me think slices and appending just don't belong together. Appending to a view ... just say no.

How to reconcile this viewpoint with that of people who find ~= all too darn convenient? Andrei
Oct 19 2009
prev sibling next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 Your definition of "safe" is clearly not aligned with mine.

What's yours?

Safety is not an absolute, but a question of degree. The harder it is to write incorrect code, the safer the language. One key element of this is deterministic behavior. If you rely on the whim of the runtime to determine if two slices refer to the same data, then it becomes much harder to reason about or test the code. -- Rainer Deyke - rainerd eldwood.com
Oct 19 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 Your definition of "safe" is clearly not aligned with mine.


Safety is not an absolute, but a question of degree. The harder it is to write incorrect code, the safer the language.

Well you got a point. I think I will from here on talk about "soundness" instead of "safety", because the former is absolute.
 One key element of this is deterministic behavior.  If you rely on the
 whim of the runtime to determine if two slices refer to the same data,
 then it becomes much harder to reason about or test the code.

I agree. Andrei
Oct 19 2009
prev sibling parent Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 One surprising (but safe) behavior that remains with slices is this:

 void fun(int[] a) {
    a[0] = 0;
    a ~= 42;
    a[0] = 42;
 }

 The caller may or may not see 42 in the first slot after the call.

Your definition of "safe" is clearly not aligned with mine.

What's yours?

"SafeD is easy to learn and it keeps the programmers away from undefined behaviors." -- safed.html. The behaviour you quoted is undefined behaviour, therefore it's not safe according to the only SafeD definition in the spec.
Oct 20 2009
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Denis Koroskin wrote:
 Safe as in SafeD (i.e. no memory corruption) :)

Right. The problem with other definitions of safe is that they are too ill-defined.
Oct 19 2009
next sibling parent reply Brad Roberts <braddr bellevue.puremagic.com> writes:
On Mon, 19 Oct 2009, Walter Bright wrote:

 Denis Koroskin wrote:
 Safe as in SafeD (i.e. no memory corruption) :)

Right. The problem with other definitions of safe is that they are too ill-defined.

There's SafeD, which has a fairly formal definition. The other side of it is the general principle of D, which is that the right way should be the easy and obvious way. Slices and arrays have issues with that principle.
Oct 19 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Brad Roberts wrote:
 On Mon, 19 Oct 2009, Walter Bright wrote:
 
 Denis Koroskin wrote:
 Safe as in SafeD (i.e. no memory corruption) :)

Right. The problem with other definitions of safe is that they are too ill-defined.

There's SafeD, which has a fairly formal definition.

But a fairly generic name, which confuses people repeatedly. I'm not the first to recommend that the name be changed. It does more harm than good.
Oct 20 2009
prev sibling parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2009-10-20 03:41:59 +0200, Walter Bright <newshound1 digitalmars.com> said:

 Denis Koroskin wrote:
 Safe as in SafeD (i.e. no memory corruption) :)

Right. The problem with other definitions of safe is that they are too ill-defined.

No (or as little as possible) undefined behaviour comes to mind. Initialization of values in D follows that. Slice appending does not (and will remain so even with the LRU cache). The LRU cache improves the current status quo, but is still a kludge: undefined behaviour in some corner cases, undefined optimization behaviour... It improves things, but cannot be trusted in general, thus a clean library type is still needed imho. Fawzi
Oct 20 2009
prev sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Denis Koroskin wrote:
 On Tue, 20 Oct 2009 03:00:57 +0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 Andrei Alexandrescu wrote:
 One surprising (but safe) behavior that remains with slices is this:

 void fun(int[] a) {
    a[0] = 0;
    a ~= 42;
    a[0] = 42;
 }

 The caller may or may not see 42 in the first slot after the call.

Your definition of "safe" is clearly not aligned with mine.

Safe as in SafeD (i.e. no memory corruption) :)

If the caller wasn't expecting the array to be modified, then that's a textbook case of memory corruption. If the caller /was/ expecting the array to be modified, then it's the opposite, which isn't much better. -- Rainer Deyke - rainerd eldwood.com
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Denis Koroskin wrote:
 On Tue, 20 Oct 2009 03:00:57 +0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 Andrei Alexandrescu wrote:
 One surprising (but safe) behavior that remains with slices is this:

 void fun(int[] a) {
    a[0] = 0;
    a ~= 42;
    a[0] = 42;
 }

 The caller may or may not see 42 in the first slot after the call.



If the caller wasn't expecting the array to be modified, then that's a textbook case of memory corruption.

[citation needed] Andrei
Oct 19 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 If the caller wasn't expecting the array to be modified, then that's a
 textbook case of memory corruption.

[citation needed]

I guess we need to define memory corruption first. "Memory corruption is when a piece of code erroneously overwrites memory." That applies here. Do you have a better definition? -- Rainer Deyke - rainerd eldwood.com
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 If the caller wasn't expecting the array to be modified, then that's a
 textbook case of memory corruption.


I guess we need to define memory corruption first. "Memory corruption is when a piece of code erroneously overwrites memory."

Where's that quote from? That definition is too vague to be of any use. Even an int that's written with a value when the spec meant to write another value is erroneously written and would be, by your definition, memory corruption. It is not.
 That applies
 here.  Do you have a better definition?

I do, but since you mentioned a textbook case of memory corruption, I was curious which textbook that would come from. That's why I asked for a _citation_. A vague useless ad-hoc definition is easy to put together. Andrei
Oct 19 2009
parent Rainer Deyke <rainerd eldwood.com> writes:
Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 I guess we need to define memory corruption first.  "Memory corruption
 is when a piece of code erroneously overwrites memory."

Where's that quote from?

It's my own attempt to define memory corruption. It's equivalent to the definition in Wikipedia: "Memory corruption happens when the contents of a memory location are unintentionally modified due to programming errors."
 [...] Do you have a better definition?

I do,

Then post it.
 but since you mentioned a textbook case of memory corruption, I
 was curious which textbook that would come from.

My use of the word "textbook" was idiomatic, not literal. From http://www.answers.com/topic/textbook: text·book (tĕkst'bʊk') [...] adj. Being a characteristic example of its kind; classic: a textbook case of schizophrenia. -- Rainer Deyke - rainerd eldwood.com
Oct 19 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 20 Oct 2009 03:00:57 +0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Andrei Alexandrescu wrote:
 One surprising (but safe) behavior that remains with slices is this:

 void fun(int[] a) {
    a[0] = 0;
    a ~= 42;
    a[0] = 42;
 }

 The caller may or may not see 42 in the first slot after the call.

Your definition of "safe" is clearly not aligned with mine.

Safe as in SafeD (i.e. no memory corruption) :)
Oct 19 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Mon, Oct 19, 2009 at 5:07 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 One surprising (but safe) behavior that remains with slices is this:

 void fun(int[] a) {
    a[0] = 0;
    a ~= 42;
    a[0] = 42;
 }

 The caller may or may not see 42 in the first slot after the call.

Your definition of "safe" is clearly not aligned with mine.

What's yours?

Probably something like "safe from easy-to-write hard-to-debug programming errors". I agree that it would stink if all these changes were made and *still* one of the top gotchas with arrays/slices remained. It also makes me think slices and appending just don't belong together. Appending to a view ... just say no. --bb
Oct 19 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Oct 2009 22:37:26 -0400, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 Started playing w/ the implementation a little and I see a problem. What about the garbage collector? There are two possibilities:

 The only possible solutions I see would be to have the GC know everything about the LRU cache and evict stale entries (probably slows down GC a lot, a huge PITA to implement, couples things that shouldn't be tightly coupled), or clear the cache every time GC is run (probably would make appending so slow as to defeat the purpose of having the cache).

 I think GC.collect may simply evict the entire cache. The collection cycle costs so much, the marginal cost of losing cached information is lost in the noise. Andrei

 But then you have to copy the whole array again, likely triggering another GC if the array is large. Then things really get ugly as, for all practical purposes, you've completely done away with the cache.

 This happens whether or not a cache is in use. Andrei

 But the array isn't guaranteed to get reallocated immediately after *every* GC run. If you're appending to a huge array, the GC will likely run several times while you're appending, leading to several unnecessary reallocations.

 I don't think I understand this.
 1. Request for an append comes that runs out of memory
 2. GC runs and clears memory
 3. Array is reallocated and the capacity cached.
 No?

 This is entirely correct.

 Each of those unnecessary reallocations will increase the memory footprint of your program, possibly triggering another GC run and wiping out your cache again in short order, until, for sufficiently large arrays,

 a ~= b;

 is almost equivalent to

 a = a ~ b;

 I don't understand how the cache makes that all worse. Andrei

 The cache doesn't make anything *worse* than with no cache. The only point I'm trying to make is that, for large arrays, if the GC clears the cache every time it runs, things would start to get *almost as bad as* having no cache because the copy operations become expensive and the GC may run frequently.

The cache can't be "cleared" every time, or else you might as well only keep one LRU entry:

int[] twos, threes;
for(int i = 1; i < 10000; i++)
{
    twos ~= i * 2;
    threes ~= i * 3;
}

At some point, twos or threes needs an allocation triggering a collection, and that clears the cache, making the other array need an allocation, clearing the cache, etc. I'd think you only want to clear the entries affected by the collection.

-Steve
Oct 20 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 20 Oct 2009 10:05:42 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 [snip]
The cache can't be "cleared" every time, or else you might as well only keep one LRU entry:

int[] twos, threes;
for(int i = 1; i < 10000; i++)
{
    twos ~= i * 2;
    threes ~= i * 3;
}

At some point, twos or threes needs an allocation triggering a collection, and that clears the cache, making the other array need an allocation, clearing the cache, etc. I'd think you only want to clear the entries affected by the collection.

-Steve

If it was free and simple to only clear the affected entries, sure. But doing so requires (very heavy?) modification of the GC in order to track and check changes. It also reduces collection performance. I think that if GC allocations added entries to the LRU, and therefore the information in the LRU were never stale, you could avoid clearing the LRU. But this requires the LRU to be part of the GC.
Oct 20 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 20 Oct 2009 10:48:31 -0400, Robert Jacques <sandford jhu.edu>  
wrote:

 On Tue, 20 Oct 2009 10:05:42 -0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 I'd think you only want to clear the entries affected by the collection.

If it was free and simple to only clear the affected entries, sure. But doing so requires (very heavy?) modification of the GC in order to track and check changes.

Why? All you have to do is check whether a block is referenced in the LRU while freeing the block. I don't even think it would be that performance critical. Using my vastly novice assumptions about how the GC collection cycle works:

step 1, mark all blocks that are not referenced by any roots.
step 2, check which blocks are referenced by the LRU, if they are, then remove them from the LRU.
step 3, recycle free blocks.
 But this requires the LRU to be part of the GC.

I think we're already in that boat. If the LRU isn't attached to the GC, then ~= becomes a locking operation even if the GC is thread-local, which makes no sense. -Steve
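The sweep-time hook Steve describes can be given a minimal sketch, reading his step 2 as "evict LRU entries whose blocks failed the mark phase". The flat marked array, the index-based LRU, and the name evict_stale_lru are invented for this toy model; a real GC's block bookkeeping is far more involved.

```c
#include <stdbool.h>

/* Toy model of the sweep phase described above.  A "block" is just a
 * mark bit here; the LRU holds indices of blocks whose capacity is
 * cached, with -1 meaning an empty slot. */
enum { NBLOCKS = 4, NLRU = 2 };

bool marked[NBLOCKS];        /* step 1's result: true = still reachable */
int  lru[NLRU] = { -1, -1 }; /* block indices cached by ~=              */

/* Step 2: evict only LRU entries whose block is about to be freed. */
void evict_stale_lru(void) {
    for (int i = 0; i < NLRU; i++)
        if (lru[i] >= 0 && !marked[lru[i]])
            lru[i] = -1;
}

/* Step 3 (recycling unmarked blocks) is a no-op in this toy model. */
```

The cost is one pass over a handful of LRU slots per collection, which supports the "not that performance critical" claim.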
Oct 20 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 20 Oct 2009 11:24:21 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Tue, 20 Oct 2009 10:48:31 -0400, Robert Jacques <sandford jhu.edu>  
 wrote:

 On Tue, 20 Oct 2009 10:05:42 -0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 I'd think you only want to clear the entries affected by the  
 collection.

If it was free and simple to only clear the affected entries, sure. But doing so requires (very heavy?) modification of the GC in order to track and check changes.

Why? All you have to do is check whether a block is referenced in the LRU while freeing the block. I don't even think it would be that performance critical. Using my vastly novice assumptions about how the GC collection cycle works:

step 1, mark all blocks that are not referenced by any roots.
step 2, check which blocks are referenced by the LRU, if they are, then remove them from the LRU.
step 3, recycle free blocks.

I agree, but my mind hadn't gotten there yet. (It was thinking of the overhead of generational/concurrent collections, for some strange reason)
 But this requires the LRU to be part of the GC.

I think we're already in that boat. If the LRU isn't attached to the GC, then ~= becomes a locking operation even if the GC is thread-local, which makes no sense. -Steve

Of course, Andrei just stated the cache should be thread-local (and probably in the function, not the GC) which throws a spanner into the works.
Oct 20 2009
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Jason House Wrote:
 
 Will all arrays overallocate when created in order to allow efficient
appending?

They effectively do anyway, since the GC block sizes are in powers of 2 and the runtime relies on the GC to determine block capacity. In effect, it's simply the GC that's over-allocating instead of the runtime itself.
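Sean's observation can be made concrete with a toy model: if the allocator only hands out power-of-two bin sizes, the slack between the requested length and the bin size is free room to append into. The names gc_block_size and spare_capacity and the 16-byte minimum bin are assumptions for illustration, not the actual GC's size classes.

```c
#include <stddef.h>

/* Round a request up to the next power-of-two bin, modeling the claim
 * that the GC's block sizes are powers of 2.  Invented name. */
size_t gc_block_size(size_t requested) {
    size_t size = 16;            /* assumed minimum bin size */
    while (size < requested)
        size *= 2;
    return size;
}

/* Slack an in-place append can grow into without reallocating. */
size_t spare_capacity(size_t length_in_bytes) {
    return gc_block_size(length_in_bytes) - length_in_bytes;
}
```

So a 100-byte array sits in a 128-byte block: the "over-allocation" already exists, and the capacity cache merely remembers it cheaply.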
Oct 19 2009
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 19 de octubre a las 17:24 me escribiste:
 dsimcha wrote:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
Making ~= work well for slices does not preclude creating a distinct
library type.
Andrei

Yes, I see LRU/MRU/whatever as a kludge to allow convenience without being unsafe or egregiously inefficient. The fact that it is a kludge is not a criticism because there is clearly no easy answer, and thus a kludge is genuinely necessary. However, I think there needs to be a separate array builder type for "heavy duty" appending operations. In TDPL I would just say that concatenating slices can be inefficient and that people should use array builder for heavy duty appending, length changing, etc., w/o getting into the details.

But how about the opposite view. What if the *previous* implementation of GC and ~= were a kludge that led to egregious inefficiency and revolting unsafety, a kludge that is now getting fixed. What I'm saying is that with the cache in place we'll have slices that are safe and efficient. Right now they are not safe and not efficient. I can hardly find reasons to characterize the new state of affairs as kludgey. One surprising (but safe) behavior that remains with slices is this:

void fun(int[] a) {
    a[0] = 0;
    a ~= 42;
    a[0] = 42;
}

The caller may or may not see 42 in the first slot after the call.
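The gotcha in Andrei's fun() example can be modeled in C by passing a (pointer, length, capacity) struct by value, roughly what passing a D slice does (the capacity field is an addition for the sketch). The Slice type and the doubling policy are invented; the point is only that once the callee reallocates, its later writes land in a block the caller never sees.

```c
#include <stdlib.h>
#include <string.h>

/* A "slice" passed by value: the caller keeps its own copy of ptr. */
typedef struct { int *ptr; size_t len, cap; } Slice;

void fun(Slice a) {
    a.ptr[0] = 0;                 /* caller sees this write */
    if (a.len == a.cap) {         /* append must reallocate */
        int *p = malloc((a.cap ? a.cap * 2 : 4) * sizeof(int));
        memcpy(p, a.ptr, a.len * sizeof(int));
        a.ptr = p;                /* caller's ptr still aims at the old block
                                     (old block kept alive by the caller;
                                     new one leaked in this sketch) */
    }
    a.ptr[a.len++] = 42;          /* the "~= 42" */
    a.ptr[0] = 42;                /* lands in the old OR the new block */
}
```

With spare capacity the caller sees 42 in the first slot; when the append reallocates, the caller still sees 0, mirroring the "may or may not" behavior.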

And for me this is enough to remove appending from slicing and have a proper array type. There is no reason to keep that buggy behaviour.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
DONAN UN PENE EN NICARAGUA  -- Crónica TV
Oct 19 2009
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 19 de octubre a las 20:12 me escribiste:
 Bill Baxter wrote:
On Mon, Oct 19, 2009 at 5:07 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Rainer Deyke wrote:
Andrei Alexandrescu wrote:
One surprising (but safe) behavior that remains with slices is this:

void fun(int[] a) {
  a[0] = 0;
  a ~= 42;
  a[0] = 42;
}

The caller may or may not see 42 in the first slot after the call.



Probably something like "safe from easy-to-write hard-to-debug programming errors". I agree that it would stink if all these changes were made and *still* one of the top gotchas with arrays/slices remained. It also makes me think slices and appending just don't belong together. Appending to a view ... just say no.

How to reconcile this viewpoint with that of people who find ~= all too darn convenient?

Use a proper array type, not a view (slice).

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
No recuerdo las flores, no conozco el viento
Aquí donde envejecen juntos presos y carceleros
Los días no pasan, el tiempo no pasa
Y vivimos contando el tiempo
Y moriremos contando el tiempo
Oct 19 2009