digitalmars.D - Round VII. COW in the city, myths.
- Andrew Fedoniouk (75/75) Jul 17 2005 Here is real test of two functions doing
- Holger (7/82) Jul 17 2005 Seems to be an interesting test.
- Andrew Fedoniouk (84/84) Jul 17 2005 Here it is:
- Holger (7/84) Jul 17 2005 How did you determine, that GC didn't kick in?
- Andrew Fedoniouk (42/42) Jul 17 2005 Function below should be changed to
- Ben Hinkle (7/11) Jul 17 2005 agreed. tolower when fed enough upper-case data will spend time figuring...
- Mike Capp (6/7) Jul 17 2005 I don't know about Java, but I believe the major reason why COW for std:...
- Andrew Fedoniouk (7/14) Jul 17 2005 Yes, I heard those rumors too.
- Andrew Fedoniouk (30/30) Jul 17 2005 ... function which is doing inplace tolower. Obviously.
- Chris Sauls (8/20) Jul 17 2005 Is this any better than:
- Andrew Fedoniouk (7/26) Jul 18 2005 Thanks, Chris,
- Nick (12/17) Jul 18 2005 Err.. no. COW is intended for when you are not supposed to change the or...
- Andrew Fedoniouk (15/32) Jul 18 2005 True.
- Nick (13/20) Jul 19 2005 That is of course true. In all cases where you know *with certainty* tha...
- Derek Parnell (8/13) Jul 19 2005
- Ben Hinkle (8/9) Jul 19 2005 I'm not sure what is so interesting about what I wrote. It seems like an...
- Derek Parnell (8/19) Jul 19 2005 The phrase "D has ... COW" implies that COW is built-in and automatic; t...
- Ben Hinkle (8/26) Jul 19 2005 oh, ok. Naturally I didn't mean to imply that. I assumed we all knew D's...
- Andrew Fedoniouk (8/36) Jul 19 2005 Sure I understand what you mean.
- Ben Hinkle (10/42) Jul 20 2005 Agreed. But I can't name another ... computer programming language and
- Andrew Fedoniouk (9/54) Jul 20 2005 Ok. Poetical enough :)
- Ben Hinkle (5/9) Jul 20 2005 I'm not sure what you mean by props. It brings to mind "stage props" or
- Andrew Fedoniouk (32/41) Jul 20 2005 Usually effective COW implementations use reference
- Ben Hinkle (12/55) Jul 20 2005 If one wants to program in D as one would in C one can - just that the
- Walter (7/9) Jul 21 2005 These methods work reasonably well in a single threaded environment. In
- Jan-Eric Duden (5/10) Jul 21 2005 How about for example the Win32 function InterlockedIncrement for ref
- Ben Hinkle (11/22) Jul 21 2005 That would indeed synchronize the inc/dec but it would still have proble...
- Jan-Eric Duden (4/35) Jul 21 2005 I'm not sure, but may be with some careful thinking one can find some
- Uwe Salomon (5/7) Jul 21 2005 And there are machine instructions for this problem. The Qt library uses...
- Ben Hinkle (7/14) Jul 21 2005 There are machine instructions to test and set (or increment) a memory
- Ben Hinkle (4/20) Jul 21 2005 Actually looking more closely an expression like str[i]='a' isn't
- Uwe Salomon (4/6) Jul 22 2005 Yes. The refcounting itself is thread-safe. But that's enough if you don...
- Regan Heath (4/8) Jul 22 2005 IMO that's all we need in D to do COW with surety, an indication that we...
- Ben Hinkle (16/22) Jul 22 2005 Agreed. I was thinking of code like
- Uwe Salomon (6/11) Jul 22 2005 Yes, of course. As a char[] is not implicitly shared, you cannot expect ...
- Ben Hinkle (4/13) Jul 22 2005 These are QStrings everywhere - no char[]'s. Use "at" or any modifying
- Uwe Salomon (5/17) Jul 22 2005 Ok i see. Well, in this case the above example will return the same valu...
- Ben Hinkle (26/44) Jul 22 2005 The "s" is global reference. Here's what I'm arguing happens:
- Uwe Salomon (4/9) Jul 22 2005 You are right, of course. Well, quirky tricks always come with some
- Andrew Fedoniouk (9/19) Jul 21 2005 No doubts.
- Walter (5/11) Jul 21 2005 Special design strategies can be very effective on an application by
- Andrew Fedoniouk (12/24) Jul 21 2005 Sure.
- Ben Hinkle (6/12) Jul 18 2005 Who knows what the future holds but personally I hope phobos doesn't blo...
- AJG (26/30) Jul 18 2005 I think, if the recent proposal to make in/out/inout mandatory at the ca...
- Andrew Fedoniouk (7/42) Jul 18 2005 inout is one more level of indirection and obviously is not needed
- Ben Hinkle (10/18) Jul 18 2005 I modified the parameters slightly to
- Ben Hinkle (6/27) Jul 18 2005 oops - I made a mistake. The numbers posted above were for upper-case
- Walter (19/21) Jul 21 2005 It's an interesting benchmark, but I don't agree it is realistic. The
- Andrew Fedoniouk (14/39) Jul 21 2005 Agreed.
- Sean Kelly (5/12) Jul 21 2005 For C++, why not just use:
- Andrew Fedoniouk (5/19) Jul 21 2005 Reason is to have them both.
Here is real test of two functions doing the same: char[] tolower_cow(char[] s) char[] tolower(char[] s) tolower_cow - simplified copy of Phobos tolower implementing COW. tolower - my version. no-COW - always allocates string up front. Results (less time - better) : 100000 strings: 9968ms - COW 9750ms - no-COW 50000 strings: 2062ms - COW 2015ms - no-COW 25000 strings: 546ms - COW 484ms - no-COW 10000 strings: 140ms - COW 125ms - no-COW Conclusion: In given pretty realistic example 1) COW is always slower 2) In cases when no GC involved COW is 12% slower then no-COW. To Ben: probably this is the reason why Java does not use COW? All above does not mean that COW is wrong. It is useful in some particular situations. But far not always. Andrew. Source of the test - attached. begin 666 cow.d M<W1A=&EC(&EN="!R86YD;VU3965D(#T ,3(S-#L-"B <F5T=7)N(" H<F%N M*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ* T*("H 0V]N=F5R M(" 9F]R("AI(#T ,#L M:&%R(&, /2!S6VE=.PT*"0EI9B H)T$G(#P M"0D)"6-H86YG960 /2 Q.PT*"0D)?0T*"0D)<EMI72 ](&, *R H8V%S="AC M<F,H8VAA<B!C*0T*"7L-" D)<F5T=7)N(" G02< /#T 8R F)B!C(#P]("=: M("LK<#L- M=')I;F<H*0T*>PT*("!I;G0 ;&5N(#T M/3T ,"D ;&5N(#T ,3L-"B 8VAA<B!T6UT /2!N97< 8VAA<EML96Y=.R - M6VE=(#T M.PT*"0T*"6-H87);72!S=')I;F=S6VYU;5]S=')I;F=S73L-" EF;W(H(&EN M="!I(#T ,#L :2 \(&YU;5]S=')I;F=S.R K*VD *0T*"2 ('-T<FEN9W-; M<G0H*3L-" D-" EF;W(H(&EN="!I(#T ,#L :2 \(&YU;5]S=')I;F=S.R K M*VD *0T*"2 ("\O<W1R:6YG<UMI72 ]('1O;&]W97)?8V]W*'-T<FEN9W-; M:5TI.PT*"2 ('-T<FEN9W-;:5T /2!T;VQO=V5R*'-T<FEN9W-;:5TI.PT* M(" (&-O=6YT97(N<W1O<" I.PT*(" (%!R;V-E<W-4:6UE<T-O=6YT97(N M:6YT97)V86Q?='EP92 ;7,Q(#T 8V]U;G1E<BYM:6QL:7-E8V]N9',H*3L- M/2 P.R!I(#P <VEZ93L *RMI*0T*"2\O("!W<FET968H(FD])61<;B(L9&%T ` end
Jul 17 2005
Seems to be an interesting test. Unfortunately, i'm unable to download the attached source. I'm using Webbrowser (IE) to access this NG. How does one download attachments here? TIA, Holger In article <dbeli8$16e6$1 digitaldaemon.com>, Andrew Fedoniouk says...Here is real test of two functions doing the same: char[] tolower_cow(char[] s) char[] tolower(char[] s) tolower_cow - simplified copy of Phobos tolower implementing COW. tolower - my version. no-COW - always allocates string up front. Results (less time - better) : 100000 strings: 9968ms - COW 9750ms - no-COW 50000 strings: 2062ms - COW 2015ms - no-COW 25000 strings: 546ms - COW 484ms - no-COW 10000 strings: 140ms - COW 125ms - no-COW Conclusion: In given pretty realistic example 1) COW is always slower 2) In cases when no GC involved COW is 12% slower then no-COW. To Ben: probably this is the reason why Java does not use COW? All above does not mean that COW is wrong. It is useful in some particular situations. But far not always. Andrew. Source of the test - attached. begin 666 cow.d M<W1A=&EC(&EN="!R86YD;VU3965D(#T ,3(S-#L-"B <F5T=7)N(" H<F%N M*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ* T*("H 0V]N=F5R M(" 9F]R("AI(#T ,#L M:&%R(&, /2!S6VE=.PT*"0EI9B H)T$G(#P M"0D)"6-H86YG960 /2 Q.PT*"0D)?0T*"0D)<EMI72 ](&, *R H8V%S="AC M<F,H8VAA<B!C*0T*"7L-" D)<F5T=7)N(" G02< /#T 8R F)B!C(#P]("=: M("LK<#L- M=')I;F<H*0T*>PT*("!I;G0 ;&5N(#T M/3T ,"D ;&5N(#T ,3L-"B 8VAA<B!T6UT /2!N97< 8VAA<EML96Y=.R - M6VE=(#T M.PT*"0T*"6-H87);72!S=')I;F=S6VYU;5]S=')I;F=S73L-" EF;W(H(&EN M="!I(#T ,#L :2 \(&YU;5]S=')I;F=S.R K*VD *0T*"2 ('-T<FEN9W-; M<G0H*3L-" D-" EF;W(H(&EN="!I(#T ,#L :2 \(&YU;5]S=')I;F=S.R K M*VD *0T*"2 ("\O<W1R:6YG<UMI72 ]('1O;&]W97)?8V]W*'-T<FEN9W-; M:5TI.PT*"2 ('-T<FEN9W-;:5T /2!T;VQO=V5R*'-T<FEN9W-;:5TI.PT* M(" (&-O=6YT97(N<W1O<" I.PT*(" (%!R;V-E<W-4:6UE<T-O=6YT97(N M:6YT97)V86Q?='EP92 ;7,Q(#T 8V]U;G1E<BYM:6QL:7-E8V]N9',H*3L- M/2 P.R!I(#P <VEZ93L *RMI*0T*"2\O("!W<FET968H(FD])61<;B(L9&%T ` end
Jul 17 2005
Here it is: import std.stdio; import std.perf; import std.ctype; public int random() { static int randomSeed = 1234; return ((randomSeed = randomSeed * 214013 + 2531011) >> 16) & 0x7FFFFFFF; } public int random(int max) { return random() % max; } /************************************ * Convert string to lower case. */ char[] tolower_cow(char[] s) { int changed; int i; char[] r = s; changed = 0; for (i = 0; i < s.length; i++) { char c = s[i]; if ('A' <= c && c <= 'Z') { if (!changed) { r = s.dup; changed = 1; } r[i] = c + (cast(char)'a' - 'A'); } } return r; } char[] tolower(char[] s) { char tolowerc(char c) { return ('A' <= c && c <= 'Z')? c + (cast(char)'a' - 'A') : c; } char[] dst = s.dup; char* p = dst.ptr; char* p_end = p + dst.length; while( p < p_end ) { *p = tolowerc(*p); ++p; } return dst; } char[] random_string() { int len = random(2048); if(len == 0) len = 1; char t[] = new char[len]; for( int i = 0; i < len; ++i ) { t[i] = random(1)? 'a' + random('z' - 'a'): 'A' + random('Z' - 'A'); } return t; } int main(char[][] args) { ProcessTimesCounter counter = new ProcessTimesCounter(); const int num_strings = 25000; char[] strings[num_strings]; for( int i = 0; i < num_strings; ++i ) strings[i] = random_string(); counter.start(); for( int i = 0; i < num_strings; ++i ) //strings[i] = tolower_cow(strings[i]); strings[i] = tolower(strings[i]); counter.stop(); ProcessTimesCounter.interval_type ms1 = counter.milliseconds(); writef("ms=%d\n",ms1); //for(int i = 0; i < size; ++i) // writef("i=%d\n",data[i]); return 0; }
Jul 17 2005
Thanx a lot, quite interestind indeed. Just one question:2) In cases when no GC involved COW is 12% slower then no-COW.How did you determine, that GC didn't kick in? Number of tolower() calls simply? Cheers, Holger In article <dbeli8$16e6$1 digitaldaemon.com>, Andrew Fedoniouk says...Here is real test of two functions doing the same: char[] tolower_cow(char[] s) char[] tolower(char[] s) tolower_cow - simplified copy of Phobos tolower implementing COW. tolower - my version. no-COW - always allocates string up front. Results (less time - better) : 100000 strings: 9968ms - COW 9750ms - no-COW 50000 strings: 2062ms - COW 2015ms - no-COW 25000 strings: 546ms - COW 484ms - no-COW 10000 strings: 140ms - COW 125ms - no-COW Conclusion: In given pretty realistic example 1) COW is always slower 2) In cases when no GC involved COW is 12% slower then no-COW. To Ben: probably this is the reason why Java does not use COW? All above does not mean that COW is wrong. It is useful in some particular situations. But far not always. Andrew. Source of the test - attached. begin 666 cow.d M<W1A=&EC(&EN="!R86YD;VU3965D(#T ,3(S-#L-"B <F5T=7)N(" H<F%N M*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ*BHJ* T*("H 0V]N=F5R M(" 9F]R("AI(#T ,#L M:&%R(&, /2!S6VE=.PT*"0EI9B H)T$G(#P M"0D)"6-H86YG960 /2 Q.PT*"0D)?0T*"0D)<EMI72 ](&, *R H8V%S="AC M<F,H8VAA<B!C*0T*"7L-" D)<F5T=7)N(" G02< /#T 8R F)B!C(#P]("=: M("LK<#L- M=')I;F<H*0T*>PT*("!I;G0 ;&5N(#T M/3T ,"D ;&5N(#T ,3L-"B 8VAA<B!T6UT /2!N97< 8VAA<EML96Y=.R - M6VE=(#T M.PT*"0T*"6-H87);72!S=')I;F=S6VYU;5]S=')I;F=S73L-" EF;W(H(&EN M="!I(#T ,#L :2 \(&YU;5]S=')I;F=S.R K*VD *0T*"2 ('-T<FEN9W-; M<G0H*3L-" D-" EF;W(H(&EN="!I(#T ,#L :2 \(&YU;5]S=')I;F=S.R K M*VD *0T*"2 ("\O<W1R:6YG<UMI72 ]('1O;&]W97)?8V]W*'-T<FEN9W-; M:5TI.PT*"2 ('-T<FEN9W-;:5T /2!T;VQO=V5R*'-T<FEN9W-;:5TI.PT* M(" (&-O=6YT97(N<W1O<" I.PT*(" (%!R;V-E<W-4:6UE<T-O=6YT97(N M:6YT97)V86Q?='EP92 ;7,Q(#T 8V]U;G1E<BYM:6QL:7-E8V]N9',H*3L- M/2 P.R!I(#P <VEZ93L *RMI*0T*"2\O("!W<FET968H(FD])61<;B(L9&%T ` end
Jul 17 2005
Function below should be changed to char[] random_string() { int len = random(2048); if(len == 0) len = 1; char t[] = new char[len]; for( int i = 0; i < len; ++i ) { t[i] = random(2)? // WAS: random(1) 'a' + random('z' - 'a'): 'A' + random('Z' - 'A'); } return t; } Old version were always allocating string in upper case. Interesting fact that this does not change big picture: COW is slower in the given example. COW is really better when all strings are in lower case so no reallocation happens in cow version: char[] random_string() { int len = random(2048); if(len == 0) len = 1; char t[] = new char[len]; for( int i = 0; i < len; ++i ) { t[i] = 'a' + random('z' - 'a'): } return t; } In this case COW is significantly better: 10000 strings: 15ms (COW) versus 125ms (no-COW) No-COW here is loosing time in D heap allocation. In Java/.NET where memory allocation costs pretty much "nothing" no-COW version will be faster than COW. "nothing" in quotes means it does have a hidden cost - time spent in garbage collection. All above means that strictly speaking effectivenes of COW depends - in given example on frequency of upper / lower strings in the input set. Andrew.
Jul 17 2005
All above means that strictly speaking effectivenes of COW depends - in given example on frequency of upper / lower strings in the input set.agreed. tolower when fed enough upper-case data will spend time figuring out if it should dup when it would be faster to just always dup. Personally I think one can only conclude that there's no silver bullet that works in all situations :-) When used for programs that deal with ordinary sentances where most words are lower case I would expect that COW comes out ahead and that toUpper would be slower.
Jul 17 2005
In article <dbeli8$16e6$1 digitaldaemon.com>, Andrew Fedoniouk says...To Ben: probably this is the reason why Java does not use COW?I don't know about Java, but I believe the major reason why COW for std::string has been dropped by most C++ implementations is the synchronization overhead in multithreaded contexts. cheers Mike
Jul 17 2005
"Mike Capp" <mike.capp gmail.com> wrote in message news:dbeu34$1efi$1 digitaldaemon.com...In article <dbeli8$16e6$1 digitaldaemon.com>, Andrew Fedoniouk says...Yes, I heard those rumors too. The problem is that STL is anyway not thread safe strictly speaking. So... I am personally using home-brewed string implementation. It is auto-COW and capable to use Doug's Lea MSPACEs - thread can have local string memory space - fast).To Ben: probably this is the reason why Java does not use COW?I don't know about Java, but I believe the major reason why COW for std::string has been dropped by most C++ implementations is the synchronization overhead in multithreaded contexts.
Jul 17 2005
... function which is doing inplace tolower. Obviously. 100000 strings: Best for COW case (all strings are in lower case - no allocations): 170ms - COW tolower 156ms - inplace tolower Worst COW case (all strings are in uppar case - dup always): 11250ms - COW tolower 234ms - inplace tolower inplace behaves *48* times better then COW tolower In real data (50/50) *24* times better. Conclusions: 1) COW everywhere and anytime could degrade your app dramaticly. 2) Phobos shall include three versions of such functions. void tolower( char[] str ) Andrew. -------------- void tolower_inplace(char[] s) { char* p =s.ptr; char* p_end = p + s.length; while( p < p_end ) { int c = *p; if('A' <= c && c <= 'Z') *p = c + (cast(char)'a' - 'A'); ++p; } }
Jul 17 2005
Andrew Fedoniouk wrote:void tolower_inplace(char[] s) { char* p =s.ptr; char* p_end = p + s.length; while( p < p_end ) { int c = *p; if('A' <= c && c <= 'Z') *p = c + (cast(char)'a' - 'A'); ++p; } }Is this any better than: -- Chris Sauls
Jul 17 2005
"Chris Sauls" <ibisbasenji gmail.com> wrote in message news:dbfi82$295v$1 digitaldaemon.com...Andrew Fedoniouk wrote:Thanks, Chris, but I don't see it as improvement as *p = c + 32; and *p = c | 32; I suspect, will be executed with the same number of instructions and result.void tolower_inplace(char[] s) { char* p =s.ptr; char* p_end = p + s.length; while( p < p_end ) { int c = *p; if('A' <= c && c <= 'Z') *p = c + (cast(char)'a' - 'A'); ++p; } }Is this any better than:
Jul 18 2005
Other than in, say, readability... ;) --AJG.Thanks, Chris, but I don't see it as improvementvoid tolower_inplace(char[] s) { char* p =s.ptr; char* p_end = p + s.length; while( p < p_end ) { int c = *p; if('A' <= c && c <= 'Z') *p = c + (cast(char)'a' - 'A'); ++p; } }Is this any better than:
Jul 18 2005
"Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbgk82$52q$1 digitaldaemon.com...but I don't see it as improvement as *p = c + 32; and *p = c | 32; I suspect, will be executed with the same number of instructions and result.You suspect correctly. On modern CPUs, there's no difference.
Jul 21 2005
In article <dbf67r$1sn1$1 digitaldaemon.com>, Andrew Fedoniouk says...... function which is doing inplace tolower. Obviously. 100000 strings: Best for COW case (all strings are in lower case - no allocations): 170ms - COW tolower 156ms - inplace tolowerErr.. no. COW is intended for when you are not supposed to change the original string. When you are not the owner of the data, you would _ALWAYS_ have to make a copy before calling the inplace version. So change your best-case code to this: // COW: char[] str = tolower_COW(original); // Inplace: char[] str = original.dup; tolower_inplace(str); and then see what happens. Nick
Jul 18 2005
"Nick" <Nick_member pathlink.com> wrote in message news:dbftkp$2k3t$1 digitaldaemon.com...In article <dbf67r$1sn1$1 digitaldaemon.com>, Andrew Fedoniouk says...True. This example shows just one simple thing: COW is not always such tasty as stated by COW fellowship here. COW is a technic which may help but if used blindly can degrade you app significantly. To be short: COW cannot be used as a universal principle. And phrases like "D has GC and COW" (C) Ben, scares me. I would like to highlight again - COW has practically nothing common with readonly arrays and slices problem discussed in all other Rounds. Andrew.... function which is doing inplace tolower. Obviously. 100000 strings: Best for COW case (all strings are in lower case - no allocations): 170ms - COW tolower 156ms - inplace tolowerErr.. no. COW is intended for when you are not supposed to change the original string. When you are not the owner of the data, you would _ALWAYS_ have to make a copy before calling the inplace version. So change your best-case code to this:
Jul 18 2005
In article <dbgpv6$afj$1 digitaldaemon.com>, Andrew Fedoniouk says...This example shows just one simple thing: COW is not always such tasty as stated by COW fellowship here. COW is a technic which may help but if used blindly can degrade you app significantly. To be short: COW cannot be used as a universal principle.That is of course true. In all cases where you know *with certainty* that you are the only owner, COW will always underperform compared to in-place modifications. But COW _is_ the most efficient method in the general case, where data may or may not be shared. I would guess both cases (shared data vs. single-owner data) are pretty common, so yes, maybe phobos should have two versions of each function. But I think overloading the same function names, eg. tolower(), for both cases probably isn't a good idea, though. How about using the prefix to- for the COW cases, and make- for the in-place case, as a general principle? Eg. char[] newStr = tolower(oldStr); // COW - oldStr is never changed makeLower(oldStr); // Convert in-place Nick
Jul 19 2005
On Tue, 19 Jul 2005 20:51:05 +0000 (UTC), Nick wrote:How about using the prefix to- for the COW cases, and make- for the in-place case, as a general principle? Eg. char[] newStr = tolower(oldStr); // COW - oldStr is never changed makeLower(oldStr); // Convert in-placeNice idea. Since COW is just a convention in D, then this naming convention also makes sense. -- Derek Parnell Melbourne, Australia 20/07/2005 8:05:14 AM
Jul 19 2005
Forgive me for prolonging this thread but I feel like I have to respond. I would let it drop if it weren't so important to get at a solution.And phrases like "D has GC and COW" (C) Ben, scares me.I'm not sure what is so interesting about what I wrote. It seems like an uncontroversial statement to me. See http://www.digitalmars.com/d/garbage.html and http://www.digitalmars.com/d/memory.html#copyonwrite for what I was referring to. Do you want to get into what is scary exactly?
Jul 19 2005
On Tue, 19 Jul 2005 20:28:03 -0400, Ben Hinkle wrote:Forgive me for prolonging this thread but I feel like I have to respond. I would let it drop if it weren't so important to get at a solution.The phrase "D has ... COW" implies that COW is built-in and automatic; that the coder doesn't have to do anything special to get it to operate. This is the 'scary' or misleading impression. -- Derek Melbourne, Australia 20/07/2005 10:48:09 AMAnd phrases like "D has GC and COW" (C) Ben, scares me.I'm not sure what is so interesting about what I wrote. It seems like an uncontroversial statement to me. See http://www.digitalmars.com/d/garbage.html and http://www.digitalmars.com/d/memory.html#copyonwrite for what I was referring to. Do you want to get into what is scary exactly?
Jul 19 2005
"Derek Parnell" <derek psych.ward> wrote in message news:15fhron8l28hr.ar386310lxno$.dlg 40tude.net...On Tue, 19 Jul 2005 20:28:03 -0400, Ben Hinkle wrote:oh, ok. Naturally I didn't mean to imply that. I assumed we all knew D's COW wasn't built-in. Funny how the last time (or at least a recent time) COW came up http://www.digitalmars.com/d/archives/digitalmars/D/15924.html is also was a long thread with a few mis-communications. I guess my messages are write-only :-)Forgive me for prolonging this thread but I feel like I have to respond. I would let it drop if it weren't so important to get at a solution.The phrase "D has ... COW" implies that COW is built-in and automatic; that the coder doesn't have to do anything special to get it to operate. This is the 'scary' or misleading impression.And phrases like "D has GC and COW" (C) Ben, scares me.I'm not sure what is so interesting about what I wrote. It seems like an uncontroversial statement to me. See http://www.digitalmars.com/d/garbage.html and http://www.digitalmars.com/d/memory.html#copyonwrite for what I was referring to. Do you want to get into what is scary exactly?
Jul 19 2005
"Ben Hinkle" <ben.hinkle gmail.com> wrote in message news:dbkabn$15u9$1 digitaldaemon.com..."Derek Parnell" <derek psych.ward> wrote in message news:15fhron8l28hr.ar386310lxno$.dlg 40tude.net...Sure I understand what you mean. But any language has COW - it is just one of many design patterns. I don't understand why D is it so ummm ... let's say ... palpitating.On Tue, 19 Jul 2005 20:28:03 -0400, Ben Hinkle wrote:oh, ok. Naturally I didn't mean to imply that. I assumed we all knew D's COW wasn't built-in.Forgive me for prolonging this thread but I feel like I have to respond. I would let it drop if it weren't so important to get at a solution.The phrase "D has ... COW" implies that COW is built-in and automatic; that the coder doesn't have to do anything special to get it to operate. This is the 'scary' or misleading impression.And phrases like "D has GC and COW" (C) Ben, scares me.I'm not sure what is so interesting about what I wrote. It seems like an uncontroversial statement to me. See http://www.digitalmars.com/d/garbage.html and http://www.digitalmars.com/d/memory.html#copyonwrite for what I was referring to. Do you want to get into what is scary exactly?Funny how the last time (or at least a recent time) COW came up http://www.digitalmars.com/d/archives/digitalmars/D/15924.html is also was a long thread with a few mis-communications. I guess my messages are write-only :-)Yep. "Harry Potter. Reborn of the Horse" I guess we will see a lot of those... Andrew.
Jul 19 2005
"Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbki8s$1ct2$1 digitaldaemon.com..."Ben Hinkle" <ben.hinkle gmail.com> wrote in message news:dbkabn$15u9$1 digitaldaemon.com...Agreed. But I can't name another ... computer programming language and standard library runtime system ... that prefers COW over all the others. Let me make another analogy (the crowd groans): people learn how to speak French, English, Japanese, etc but when in France it is preferred to speak French. By analogy when in D speak COW unless there's a situation that warrents something else."Derek Parnell" <derek psych.ward> wrote in message news:15fhron8l28hr.ar386310lxno$.dlg 40tude.net...Sure I understand what you mean. But any language has COW - it is just one of many design patterns.On Tue, 19 Jul 2005 20:28:03 -0400, Ben Hinkle wrote:oh, ok. Naturally I didn't mean to imply that. I assumed we all knew D's COW wasn't built-in.Forgive me for prolonging this thread but I feel like I have to respond. I would let it drop if it weren't so important to get at a solution.The phrase "D has ... COW" implies that COW is built-in and automatic; that the coder doesn't have to do anything special to get it to operate. This is the 'scary' or misleading impression.And phrases like "D has GC and COW" (C) Ben, scares me.I'm not sure what is so interesting about what I wrote. It seems like an uncontroversial statement to me. See http://www.digitalmars.com/d/garbage.html and http://www.digitalmars.com/d/memory.html#copyonwrite for what I was referring to. Do you want to get into what is scary exactly?I don't understand why D is it so ummm ... let's say ... palpitating.I don't know what you mean by palpitating - you mean it gives you palpitations or that COW is palpating.
Jul 20 2005
"Ben Hinkle" <ben.hinkle gmail.com> wrote in message news:dble7p$30j4$1 digitaldaemon.com..."Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbki8s$1ct2$1 digitaldaemon.com...Ok. Poetical enough :) Can I get a "Lingua Franca" somewhere here?"Ben Hinkle" <ben.hinkle gmail.com> wrote in message news:dbkabn$15u9$1 digitaldaemon.com...Agreed. But I can't name another ... computer programming language and standard library runtime system ... that prefers COW over all the others. Let me make another analogy (the crowd groans): people learn how to speak French, English, Japanese, etc but when in France it is preferred to speak French. By analogy when in D speak COW unless there's a situation that warrents something else."Derek Parnell" <derek psych.ward> wrote in message news:15fhron8l28hr.ar386310lxno$.dlg 40tude.net...Sure I understand what you mean. But any language has COW - it is just one of many design patterns.On Tue, 19 Jul 2005 20:28:03 -0400, Ben Hinkle wrote:oh, ok. Naturally I didn't mean to imply that. I assumed we all knew D's COW wasn't built-in.Forgive me for prolonging this thread but I feel like I have to respond. I would let it drop if it weren't so important to get at a solution.The phrase "D has ... COW" implies that COW is built-in and automatic; that the coder doesn't have to do anything special to get it to operate. This is the 'scary' or misleading impression.And phrases like "D has GC and COW" (C) Ben, scares me.I'm not sure what is so interesting about what I wrote. It seems like an uncontroversial statement to me. See http://www.digitalmars.com/d/garbage.html and http://www.digitalmars.com/d/memory.html#copyonwrite for what I was referring to. Do you want to get into what is scary exactly?Sorry for my english. Intention was to say "COW is such crucial in/for D". Simple question: Is it possible to not to write into what is not designed to be written/changed? Why D needs such COW props?I don't understand why D is it so ummm ... let's say ... palpitating.I don't know what you mean by palpitating - you mean it gives you palpitations or that COW is palpating.
Jul 20 2005
Simple question: Is it possible to not to write into what is not designed to be written/changed?I'm not sure what you mean. Are you asking if D already has C/C++ const somewhere? I thought that was the point of the proposals.Why D needs such COW props?I'm not sure what you mean by props. It brings to mind "stage props" or something that stands in for a real object - like a knife in a play is not a real knife. Is that what you mean?
Jul 20 2005
"Ben Hinkle" <bhinkle mathworks.com> wrote in message news:dbmb0a$pd1$1 digitaldaemon.com...Usually effective COW implementations use reference counting to do copy only when it is really needed. Surprisingly but in case of D string package I think it is better to do not use COW at all. See: Caller of toupper or replace knows better if it owns the string or not. If it is an owner of the string then toupper don't need to allocate anything. If caller is not owner then it should create temp buffer for that. Caller may decide to allocate such buffer on stack eliminating need of any heap operations. In case of heap caller shall delete this buffer explicitly and do not left garbage after itself. As a rule operations like toupper used in various sort of parsers which are reading streams in token buffers. These token buffers are preallocated strings and in most cases allow such manipulations / modifications. In any case it is better to give a choice in the library: void makelower(char[]) But, again, to do not use COW here. Not too much gain in D anyway. COW perfectly works for strings using reference counting but in D COW used so blindly is not optimal.Simple question: Is it possible to not to write into what is not designed to be written/changed?I'm not sure what you mean. Are you asking if D already has C/C++ const somewhere? I thought that was the point of the proposals.I mean this: prop (n.) 1. An object placed beneath or against a structure to keep it from falling or shaking; a support. 2. One that serves as a means of support or assistance. Andrew.Why D needs such COW props?I'm not sure what you mean by props. It brings to mind "stage props" or something that stands in for a real object - like a knife in a play is not a real knife. Is that what you mean?
Jul 20 2005
"Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbmd3g$r3n$1 digitaldaemon.com..."Ben Hinkle" <bhinkle mathworks.com> wrote in message news:dbmb0a$pd1$1 digitaldaemon.com...If one wants to program in D as one would in C one can - just that the standard library is designed for COW. Personally I prefer the benefits of COW (simplicity, maintenance, usability, good performance) and I'll do the memory micromanaging after profiling to see if the performance needs tweaking. Should phobos get an inplace upper/lower? Walter can decide if he wants to or not. A user-supplied inplace upper/lower case converter would be a nice download for those who want it. Creating a GC-free version of phobos is what some people are doing on dsource so that might have some of the functionality you're looking for.Usually effective COW implementations use reference counting to do copy only when it is really needed. Surprisingly but in case of D string package I think it is better to do not use COW at all. See: Caller of toupper or replace knows better if it owns the string or not. If it is an owner of the string then toupper don't need to allocate anything. If caller is not owner then it should create temp buffer for that. Caller may decide to allocate such buffer on stack eliminating need of any heap operations. In case of heap caller shall delete this buffer explicitly and do not left garbage after itself. As a rule operations like toupper used in various sort of parsers which are reading streams in token buffers. These token buffers are preallocated strings and in most cases allow such manipulations / modifications. In any case it is better to give a choice in the library: void makelower(char[]) But, again, to do not use COW here. Not too much gain in D anyway. COW perfectly works for strings using reference counting but in D COW used so blindly is not optimal.Simple question: Is it possible to not to write into what is not designed to be written/changed?I'm not sure what you mean. Are you asking if D already has C/C++ const somewhere? I thought that was the point of the proposals.ok. I'm still not sure what you meant then but it probably isn't important.I mean this: prop (n.) 1. An object placed beneath or against a structure to keep it from falling or shaking; a support. 2. One that serves as a means of support or assistance.Why D needs such COW props?I'm not sure what you mean by props. It brings to mind "stage props" or something that stands in for a real object - like a knife in a play is not a real knife. Is that what you mean?
Jul 20 2005
"Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbmd3g$r3n$1 digitaldaemon.com...Usually effective COW implementations use reference counting to do copy only when it is really needed.These methods work reasonably well in a single threaded environment. In multithreaded environments, however, ref-counted COW behaves poorly. This is because the ref counting has to be protected with locks, mutexes, etc. I've written projects using mark/sweep that beat the pants off of ref-counting code written by others for multi-threaded environments.
Jul 21 2005
Walter wrote:"Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbmd3g$r3n$1 digitaldaemon.com... These methods work reasonably well in a single threaded environment. In multithreaded environments, however, ref-counted COW behaves poorly. This is because the ref counting has to be protected with locks, mutexes, etc.How about for example the Win32 function InterlockedIncrement for ref counting? I think it doesn't use locks, mutexes and so forth. Jan-Eric
Jul 21 2005
"Jan-Eric Duden" <jeduden whisset.com> wrote in message news:42DFAB33.7040407 whisset.com...Walter wrote:That would indeed synchronize the inc/dec but it would still have problems that after reading the count it can be changed. For example the pseudo-code if (count > 1) duplicate write value fails if the count is incremented to 2 after the if condition is tested but before the write happens. So to be safe one needs more synchronization than just the count inc/dec - the entire operation of checking the count and reading/writing needs to be atomic."Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbmd3g$r3n$1 digitaldaemon.com... These methods work reasonably well in a single threaded environment. In multithreaded environments, however, ref-counted COW behaves poorly. This is because the ref counting has to be protected with locks, mutexes, etc.How about for example the Win32 function InterlockedIncrement for ref counting? I think it doesn't use locks, mutexes and so forth. Jan-Eric
Jul 21 2005
Ben Hinkle wrote:"Jan-Eric Duden" <jeduden whisset.com> wrote in message news:42DFAB33.7040407 whisset.com...I'm not sure, but may be with some careful thinking one can find some shortcuts here. A feeling in my gut says it shouldn't be that expensive. ;) Jan-EricWalter wrote:That would indeed synchronize the inc/dec but it would still have problems that after reading the count it can be changed. For example the pseudo-code if (count > 1) duplicate write value fails if the count is incremented to 2 after the if condition is tested but before the write happens. So to be safe one needs more synchronization than just the count inc/dec - the entire operation of checking the count and reading/writing needs to be atomic."Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbmd3g$r3n$1 digitaldaemon.com... These methods work reasonably well in a single threaded environment. In multithreaded environments, however, ref-counted COW behaves poorly. This is because the ref counting has to be protected with locks, mutexes, etc.How about for example the Win32 function InterlockedIncrement for ref counting? I think it doesn't use locks, mutexes and so forth. Jan-Eric
Jul 21 2005
the entire operation of checking the count and reading/writing needs to be atomic.And there are machine instructions for this problem. The Qt library uses this "atomic reference counting" for its containers. These are a few lines of assembler for every target platform. Ciao uwe
Jul 21 2005
"Uwe Salomon" <post uwesalomon.de> wrote in message news:op.st9qmlt06yjbe6 sandmann.maerchenwald.net...There are machine instructions to test and set (or increment) a memory location but I was referring to testing the count and setting some array content which would require using two memory locations. Glancing over the QString it looks like RAII is used to avoid the problem - though I haven't looked at the details.the entire operation of checking the count and reading/writing needs to be atomic.And there are machine instructions for this problem. The Qt library uses this "atomic reference counting" for its containers. These are a few lines of assembler for every target platform. Ciao uwe
Jul 21 2005
"Ben Hinkle" <ben.hinkle gmail.com> wrote in message news:dbp9bj$fc$1 digitaldaemon.com..."Uwe Salomon" <post uwesalomon.de> wrote in message news:op.st9qmlt06yjbe6 sandmann.maerchenwald.net...Actually looking more closely an expression like str[i]='a' isn't thread-safe. The thread safety only applies to the memory management.There are machine instructions to test and set (or increment) a memory location but I was referring to testing the count and setting some array content which would require using two memory locations. Glancing over the QString it looks like RAII is used to avoid the problem - though I haven't looked at the details.the entire operation of checking the count and reading/writing needs to be atomic.And there are machine instructions for this problem. The Qt library uses this "atomic reference counting" for its containers. These are a few lines of assembler for every target platform. Ciao uwe
Jul 21 2005
Actually looking more closely an expression like str[i]='a' isn't thread-safe. The thread safety only applies to the memory management.Yes. The refcounting itself is thread-safe. But that's enough if you don't modify strings that are referenced more than once. Ciao uwe
Jul 22 2005
On Fri, 22 Jul 2005 09:55:40 +0200, Uwe Salomon <post uwesalomon.de> wrote:IMO that's all we need in D to do COW with surety, an indication that we are the owner of the string, the only reference holder, etc. ReganActually looking more closely an expression like str[i]='a' isn't thread-safe. The thread safety only applies to the memory management.Yes. The refcounting itself is thread-safe. But that's enough if you don't modify strings that are referenced more than once.
Jul 22 2005
"Uwe Salomon" <post uwesalomon.de> wrote in message news:op.sua1m2y56yjbe6 sandmann.maerchenwald.net...Agreed. I was thinking of code like QString s; thread 1: s[0] = 'a'; thread 2: QString y(s); y[0]; y[0]; which can result in the y[0] returning different values in thread 2. Granted the code has a race condition anyway so the best one could hope for is that the y[0] return either their initial value or 'a' consitently. The code QString s; thread 1: QString x(s); x[0] = 'a'; thread 2: QString y(s); y[0]; y[0]; is fine since the copy-on-write would detect x reference and dup before changing the value. I guess my (nit-picking) beef is that the implicit sharing is visible to multi-threaded users so that it isn't completely implicit.Actually looking more closely an expression like str[i]='a' isn't thread-safe. The thread safety only applies to the memory management.Yes. The refcounting itself is thread-safe. But that's enough if you don't modify strings that are referenced more than once. Ciao uwe
Jul 22 2005
Agreed. I was thinking of code like QString s; thread 1: s[0] = 'a'; thread 2: QString y(s); y[0]; y[0]; which can result in the y[0] returning different values in thread 2.Yes, of course. As a char[] is not implicitly shared, you cannot expect thread-safety here. You have to work with QStrings throughout to play safe. But, personally, i don't like this programming paradigm. I normally think in terms of "ownership", and dup when the owner changes. Ciao uwe
Jul 22 2005
"Uwe Salomon" <post uwesalomon.de> wrote in message news:op.subinjm26yjbe6 sandmann.maerchenwald.net...These are QStrings everywhere - no char[]'s. Use "at" or any modifying operator if operator[] looks too much like char[].Agreed. I was thinking of code like QString s; thread 1: s[0] = 'a'; thread 2: QString y(s); y[0]; y[0]; which can result in the y[0] returning different values in thread 2.Yes, of course. As a char[] is not implicitly shared, you cannot expect thread-safety here. You have to work with QStrings throughout to play safe. But, personally, i don't like this programming paradigm. I normally think in terms of "ownership", and dup when the owner changes.
Jul 22 2005
Ok i see. Well, in this case the above example will return the same values even in a multithreaded environment. That's because s[0] = 'a' takes a copy of the string (there are 2 references on it). Ciao uweThese are QStrings everywhere - no char[]'s. Use "at" or any modifying operator if operator[] looks too much like char[].QString s; thread 1: s[0] = 'a'; thread 2: QString y(s); y[0]; y[0]; which can result in the y[0] returning different values in thread 2.Yes, of course. As a char[] is not implicitly shared, you cannot expect thread-safety here. You have to work with QStrings throughout to play safe. But, personally, i don't like this programming paradigm. I normally think in terms of "ownership", and dup when the owner changes.
Jul 22 2005
"Uwe Salomon" <post uwesalomon.de> wrote in message news:op.subsrszr6yjbe6 sandmann.maerchenwald.net...The "s" is global reference. Here's what I'm arguing happens: 1) s is constructed and filled with some data. It is the sole owner of the data so the ref count is 1. 2) Thread 1 enters the s[0] = 'a' statement. The operator[] makes and returns a QCharRef (which contains a field str of type QString& not a QString so it does not increment the ref count for s). 3) Thread 1 starts to evaluate the assignment. The QCharRef assignment operator has the following code if (str.count > 1) expand(); // dup if multiple refs str.data[i] = val; so since the ref count is 1 it does not duplicate the data. Now between the if statement and the std.data[i]=val suppose a thread context switch happens and 4) Thread 2 enters the constructor QString y(s) which increments the ref count and shares the data pointer with s. 5) Thread 2 evaluates the y[0] statement which reads the shared array item 0. Note that reading a string does not check the ref count. 6) Thread 1 switches back in and evaluates the assignment str.data[i] = val 7) Thread 2 switches and evaluates the y[0] which now picks up the new value in the shared data. So I'm arguing it can be pretty subtle just what is allowed with QString and what isn't. The documentation for QString is correct and the case I've described above is not contradicting their documentation since the threading switches involve a shared reference "s".Ok i see. Well, in this case the above example will return the same values even in a multithreaded environment. That's because s[0] = 'a' takes a copy of the string (there are 2 references on it). Ciao uweThese are QStrings everywhere - no char[]'s. Use "at" or any modifying operator if operator[] looks too much like char[].QString s; thread 1: s[0] = 'a'; thread 2: QString y(s); y[0]; y[0]; which can result in the y[0] returning different values in thread 2.Yes, of course. As a char[] is not implicitly shared, you cannot expect thread-safety here. You have to work with QStrings throughout to play safe. But, personally, i don't like this programming paradigm. I normally think in terms of "ownership", and dup when the owner changes.
Jul 22 2005
So I'm arguing it can be pretty subtle just what is allowed with QString and what isn't. The documentation for QString is correct and the case I've described above is not contradicting their documentation since the threading switches involve a shared reference "s".You are right, of course. Well, quirky tricks always come with some caveats... Ciao uwe
Jul 22 2005
"Walter" <newshound digitalmars.com> wrote in message news:dbnn3j$1raf$1 digitaldaemon.com..."Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbmd3g$r3n$1 digitaldaemon.com...No doubts. I am personally using COW strings in multithreading environments without any locking. Sharing strings as any other dynamic data between threads needs special design strategies. I mean - string is too atomic object to lock each instance of it. Andrew.Usually effective COW implementations use reference counting to do copy only when it is really needed.These methods work reasonably well in a single threaded environment. In multithreaded environments, however, ref-counted COW behaves poorly. This is because the ref counting has to be protected with locks, mutexes, etc. I've written projects using mark/sweep that beat the pants off of ref-counting code written by others for multi-threaded environments.
Jul 21 2005
"Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dboh94$2ed8$1 digitaldaemon.com...No doubts. I am personally using COW strings in multithreading environments without any locking. Sharing strings as any other dynamic data between threads needs special design strategies. I mean - string is too atomic object to lock each instance of it.Special design strategies can be very effective on an application by application basis, but almost by definition won't work for a general purpose library.
Jul 21 2005
"Walter" <newshound digitalmars.com> wrote in message news:dbon46$2jgs$1 digitaldaemon.com..."Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dboh94$2ed8$1 digitaldaemon.com...Sure. Walter, I am not convincing anybody to use refcounted strings in D. They are good for specific cases and task domains. Under the GC it is just possible to use immutable character arrays as strings. Not always optimal (for specific tasks) but possible. Without GC refcounted or auto string as a solution is more valuable. But again string is an immutable character array - it is a type. Let's don't mix this problem with "immutable in" - this is from other opera.No doubts. I am personally using COW strings in multithreading environments without any locking. Sharing strings as any other dynamic data between threads needs special design strategies. I mean - string is too atomic object to lock each instance of it.Special design strategies can be very effective on an application by application basis, but almost by definition won't work for a general purpose library.
Jul 21 2005
Conclusions: 1) COW everywhere and anytime could degrade your app dramaticly.No-one was suggesting inplace is slower than COW. One can rephrase this conclusion as "allocating memory can dramatically slow down your app".2) Phobos shall include three versions of such functions. void tolower( char[] str )Who knows what the future holds but personally I hope phobos doesn't bloat with three versions of any COW function. If one determines that in an application a particular function is a bottleneck code up one tailered for the application and use that instead of the standard one.
Jul 18 2005
Hi,2) Phobos shall include three versions of such functions. void tolower( char[] str )I think, if the recent proposal to make in/out/inout mandatory at the call site is taken seriously (which I think would be a _good_ thing), then phobos could probably get away with 2 versions: string tolower(in /* readonly */ string str); // Returns a lowered _dup_ of the string. // If no changes made, it can be made to return the same reference. string tolower(inout string str); // Changes the string in place, and returns the same reference. No duping ever. I think this kind of overloading would be fairly useful, if only for this purpose. I don't think it would bloat, because the first one could be easily implemented in terms of the second one: And: Cheers, --AJG.
Jul 18 2005
"AJG" <AJG_member pathlink.com> wrote in message news:dbgc5t$2vsa$1 digitaldaemon.com...Hi,inout is one more level of indirection and obviously is not needed here. BTW: "foreach (inout char c; str) if ('A' <= c && c <= 'Z') c |= 32;" such cycle will not work for UTF-8 - use with care.2) Phobos shall include three versions of such functions. void tolower( char[] str )I think, if the recent proposal to make in/out/inout mandatory at the call site is taken seriously (which I think would be a _good_ thing), then phobos could probably get away with 2 versions: string tolower(in /* readonly */ string str); // Returns a lowered _dup_ of the string. // If no changes made, it can be made to return the same reference. string tolower(inout string str); // Changes the string in place, and returns the same reference. No duping ever. I think this kind of overloading would be fairly useful, if only for this purpose. I don't think it would bloat, because the first one could be easily implemented in terms of the second one: And:Cheers, --AJG.
Jul 18 2005
"Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbeli8$16e6$1 digitaldaemon.com...Here is real test of two functions doing the same: char[] tolower_cow(char[] s) char[] tolower(char[] s) tolower_cow - simplified copy of Phobos tolower implementing COW. tolower - my version. no-COW - always allocates string up front.I modified the parameters slightly to 1) allocate strings of length between 1 and 20 instead of 1 and 2048 2) eye-balling English text I changed the probability of upper case to 1 in 40 instead of 1 in 2. With those parameters cow gives ~3187ms and dup gives ~3218ms. So with strings that are roughly the length of a word or two with upper-case roughly what happens in regular text I can't tell a difference between cow and always duping. YMMV
Jul 18 2005
"Ben Hinkle" <bhinkle mathworks.com> wrote in message news:dbgb5g$2utt$1 digitaldaemon.com..."Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbeli8$16e6$1 digitaldaemon.com...oops - I made a mistake. The numbers posted above were for upper-case 40-to-1 not 1-to-40. I didn't pay enough attention to the ?: logic. When I get the ratio right the cow version becomes ~610ms so it is significantly faster than always-dup. still YMMV.Here is real test of two functions doing the same: char[] tolower_cow(char[] s) char[] tolower(char[] s) tolower_cow - simplified copy of Phobos tolower implementing COW. tolower - my version. no-COW - always allocates string up front.I modified the parameters slightly to 1) allocate strings of length between 1 and 20 instead of 1 and 2048 2) eye-balling English text I changed the probability of upper case to 1 in 40 instead of 1 in 2. With those parameters cow gives ~3187ms and dup gives ~3218ms. So with strings that are roughly the length of a word or two with upper-case roughly what happens in regular text I can't tell a difference between cow and always duping. YMMV
Jul 18 2005
"Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbeli8$16e6$1 digitaldaemon.com...Conclusion: In given pretty realistic exampleIt's an interesting benchmark, but I don't agree it is realistic. The strings presented by the test program to the tolower_cow() function will need to .dup the string essentially 100% of the time (as the string length is evenly distributed between 1 and 2048, and each character will be uppercase 50% of the time). Duping the string with a test 100% of the time will, inevitably, be slower than simply duping 100% of the time and skipping the test. COW is an effective optimization only when the data doesn't need to change a significant percentage of the time. Real text tends to be 100% lower case most of the time (just look at this post, for example). So, I suggest feeding the program some real text, such as splitting into words the complete text of "Alice in Wonderland" which is linked to from www.digitalmars.com/d/cppstrings.html. It'll be fun to see what the relative timings are for that. Of course, as always, when tuning for speed one must tune to the actual data expected. std.string.tolower(), being a library function, doesn't have the luxury of knowing this in advance, so we just make the best guess we can and go from there.
Jul 21 2005
"Walter" <newshound digitalmars.com> wrote in message news:dbnmh1$1qu3$1 digitaldaemon.com..."Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:dbeli8$16e6$1 digitaldaemon.com...Agreed. This is just a demonstration of the fact that COW cannot be used as a "silver bullet".Conclusion: In given pretty realistic exampleIt's an interesting benchmark, but I don't agree it is realistic. The strings presented by the test program to the tolower_cow() function will need to .dup the string essentially 100% of the time (as the string length is evenly distributed between 1 and 2048, and each character will be uppercase 50% of the time). Duping the string with a test 100% of the time will, inevitably, be slower than simply duping 100% of the time and skipping the test. COW is an effective optimization only when the data doesn't need to change a significant percentage of the time.Real text tends to be 100% lower case most of the time (just look at this post, for example). So, I suggest feeding the program some real text, such as splitting into words the complete text of "Alice in Wonderland" which is linked to from www.digitalmars.com/d/cppstrings.html. It'll be fun to see what the relative timings are for that. Of course, as always, when tuning for speed one must tune to the actual data expected. std.string.tolower(), being a library function, doesn't have the luxury of knowing this in advance, so we just make the best guess we can and go from there.In those C RTLs which have strlwr functions it is implemented as in-place function. In C++ libraries it usually implemented as: void strlwr(string &s); void strupr(string &s); string strlwr(const string &s); string strupr(const string &s); And for me this makes much more sense than then Phobos "dcowed" version - gives me more choices for optimisations.
Jul 21 2005
In article <dbp0i4$2rph$1 digitaldaemon.com>, Andrew Fedoniouk says...In those C RTLs which have strlwr functions it is implemented as in-place function. In C++ libraries it usually implemented as: void strlwr(string &s); void strupr(string &s); string strlwr(const string &s); string strupr(const string &s);For C++, why not just use: std::transform(str.begin(),str.end(),str.begin(),tolower); Is there really any reason to wrap this in a function? Sean
Jul 21 2005
"Sean Kelly" <sean f4.ca> wrote in message news:dbp4ou$2usq$1 digitaldaemon.com...In article <dbp0i4$2rph$1 digitaldaemon.com>, Andrew Fedoniouk says...Reason is to have them both. void strlwr(string &s); string strlwr(const string &s);In those C RTLs which have strlwr functions it is implemented as in-place function. In C++ libraries it usually implemented as: void strlwr(string &s); void strupr(string &s); string strlwr(const string &s); string strupr(const string &s);For C++, why not just use: std::transform(str.begin(),str.end(),str.begin(),tolower); Is there really any reason to wrap this in a function?
Jul 21 2005