digitalmars.D.learn - sliced().array compatibility with parallel?
- Jay Norwood (34/34) Jan 09 2016 I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice":
- Ilya Yaroshenko (4/9) Jan 09 2016 It is a bug (Slice or Parallel?). Please file this issue.
- Jay Norwood (2/5) Jan 09 2016 Ok, thanks, I'll submit it.
- Jay Norwood (4/4) Jan 09 2016 for example,
- Ilya Yaroshenko (54/89) Jan 09 2016 This is a bug in std.parallelism :-)
- Jay Norwood (43/44) Jan 09 2016 ok, thanks. I'm using your code and reduced it a bit. Looks
- Russel Winder via Digitalmars-d-learn (17/17) Jan 10 2016 On Sun, 2016-01-10 at 01:46 +0000, Jay Norwood via Dig...
- Jay Norwood (2/12) Jan 10 2016 I saw it mentioned in another post, and tried it. Works.
- Ilya Yaroshenko (6/11) Jan 09 2016 Oh... there is no bug.
- Jay Norwood (2/14) Jan 09 2016 ok, thanks. That works. I'll go back to trying ndslice now.
- Jay Norwood (41/42) Jan 09 2016 The parallel time for this case is about a 2x speed-up on my
- Ilya (3/8) Jan 09 2016 I will add significantly faster pairwise summation based on SIMD
- Jay Norwood (12/14) Jan 10 2016 Wow! A lot of overhead in the debug build. I checked the
- Marc Schütz (3/15) Jan 10 2016 I'd say, if `shared` is required, but it compiles without, then
- Jay Norwood (8/10) Jan 10 2016 Yeah, probably so. Interestingly, without 'shared' and using a
I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice": "~>0.8.8". If I convert the 2D slice with .array(), should that first dimension then be compatible with parallel foreach? I find that without using parallel, all the means get computed, but with parallel, only about half of them are computed in this example. The others remain NaN, examined in the debugger in Visual D.

import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;

enum testCount = 1;
double[1000] means;
double[] data;

void f1() {
    import std.parallelism;
    auto sl = data.sliced(1000, 100_000);
    auto sla = sl.array();
    foreach (i, vec; parallel(sla)) {
        double v = vec.sum(0.0);
        means[i] = v / 100_000;
    }
}

void main() {
    data = new double[100_000_000];
    for (int i = 0; i < 100_000_000; i++) {
        data[i] = i / 100_000_000.0;
    }
    auto r = benchmark!(f1)(testCount);
    auto f0Result = to!Duration(r[0] / testCount);
    f0Result.writeln;
    writeln(means[0]);
}
Jan 09 2016
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
> I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice": "~>0.8.8". If I convert the 2D slice with .array(), should that first dimension then be compatible with parallel foreach?
> [...]

It is a bug (Slice or Parallel?). Please file this issue. Slice should work with parallel, and an array of slices should work with parallel.
Jan 09 2016
On Sunday, 10 January 2016 at 00:41:35 UTC, Ilya Yaroshenko wrote:
> It is a bug (Slice or Parallel?). Please file this issue. Slice should work with parallel, and an array of slices should work with parallel.

Ok, thanks, I'll submit it.
Jan 09 2016
For example, means[63] through means[251] are consistently all NaN when using parallel in this test, but are all computed double values when parallel is not used.
Jan 09 2016
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
> I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice": "~>0.8.8". If I convert the 2D slice with .array(), should that first dimension then be compatible with parallel foreach? I find that without using parallel, all the means get computed, but with parallel, only about half of them are computed in this example. The others remain NaN, examined in the debugger in Visual D.
>
> import std.range : iota;
> import std.array : array;
> import std.algorithm;
> import std.datetime;
> import std.conv : to;
> import std.stdio;
> import std.experimental.ndslice;
>
> enum testCount = 1;
> double[1000] means;
> double[] data;
>
> void f1() {
>     import std.parallelism;
>     auto sl = data.sliced(1000, 100_000);
>     auto sla = sl.array();
>     foreach (i, vec; parallel(sla)) {
>         double v = vec.sum(0.0);
>         means[i] = v / 100_000;
>     }
> }
>
> void main() {
>     data = new double[100_000_000];
>     for (int i = 0; i < 100_000_000; i++) {
>         data[i] = i / 100_000_000.0;
>     }
>     auto r = benchmark!(f1)(testCount);
>     auto f0Result = to!Duration(r[0] / testCount);
>     f0Result.writeln;
>     writeln(means[0]);
> }

This is a bug in std.parallelism :-)

Proof:

import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import mir.ndslice;
import std.parallelism;

enum testCount = 1;
double[1000] means;
double[] data;

void f1() {
    //auto sl = data.sliced(1000, 100_000);
    //auto sla = sl.array();
    auto sla = new double[][1000];
    foreach (i, ref e; sla) {
        e = data[i * 100_000 .. (i + 1) * 100_000];
    }
    foreach (i, vec; parallel(sla)) {
        double v = vec.sum;
        means[i] = v / vec.length;
    }
}

void main() {
    data = new double[100_000_000];
    foreach (i, ref e; data) {
        e = i / 100_000_000.0;
    }
    auto r = benchmark!(f1)(testCount);
    auto f0Result = to!Duration(r[0] / testCount);
    f0Result.writeln;
    writeln(means);
}

Prints:

[0.000499995, 0.0015, 0.0025, 0.0035, 0.00449999, 0.00549999, 0.00649999, 0.00749999, 0.00849999, 0.00949999, 0.0105, 0.0115, 0.0125, 0.0135, 0.0145, 0.0155, 0.0165, 0.0175, 0.0185, 0.0195, 0.0205, 0.0215, 0.0225, 0.0235, 0.0245, 0.0255, 0.0265, 0.0275, 0.0285, 0.0295, 0.0305, 0.0315, 0.0325, 0.0335, 0.0345, 0.0355, 0.0365, 0.0375, 0.0385, 0.0395, 0.0405, 0.0415, 0.0425, 0.0435, 0.0445, 0.0455, 0.0465, 0.0475, 0.0485, 0.0495, 0.0505, 0.0515, 0.0525, 0.0535, 0.0545, 0.0555, 0.0565, 0.0575, 0.0585, 0.0595, 0.0605, 0.0615, 0.0625, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan ....
Jan 09 2016
On Sunday, 10 January 2016 at 00:47:29 UTC, Ilya Yaroshenko wrote:
> This is a bug in std.parallelism :-)

ok, thanks. I'm using your code and reduced it a bit. Looks like it has some interaction with executing vec.sum. If I substitute a simple assign of a double value, then all the values are updated in the parallel version also.

import std.algorithm;

double[1000] dvp;
double[1000] dv2;
double[] data;

void f1() {
    import std.parallelism;
    auto sla = new double[][1000];
    foreach (i, ref e; sla) {
        e = data[i * 100_000 .. (i + 1) * 100_000];
    }

    // calculate sums in parallel
    foreach (i, vec; parallel(sla)) {
        dvp[i] = vec.sum;
    }

    // calculate same values non-parallel
    foreach (i, vec; sla) {
        dv2[i] = vec.sum;
    }
}

int main() {
    data = new double[100_000_000];
    for (int i = 0; i < 100_000_000; i++) {
        data[i] = i / 100_000_000.0;
    }
    f1();

    // processed non-parallel works ok
    foreach (dv; dv2) {
        if (dv != dv) { // test for NaN
            return 1;
        }
    }

    // calculated parallel leaves out processing of many values
    foreach (dv; dvp) {
        if (dv != dv) { // test for NaN
            return 1;
        }
    }
    return 0;
}
Jan 09 2016
On Sun, 2016-01-10 at 01:46 +0000, Jay Norwood via Digitalmars-d-learn wrote:
>
[…]
>     // processed non-parallel works ok
>     foreach( dv; dv2){
>         if(dv != dv){ // test for NaN
>             return 1;
>         }
>     }
>
>     // calculated parallel leaves out processing of many values
>     foreach( dv; dvp){
>         if(dv != dv){ // test for NaN
>             return 1;
>         }
>     }
>     return(0);
> }

I am not convinced these "Tests for NaN" actually test for NaN. I believe you have to use isNaN(dv).

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Jan 10 2016
On Sunday, 10 January 2016 at 12:11:39 UTC, Russel Winder wrote:
>> foreach( dv; dvp){
>>     if(dv != dv){ // test for NaN
>>         return 1;
>>     }
>> }
>> return(0);
>
> I am not convinced these "Tests for NaN" actually test for NaN. I believe you have to use isNaN(dv).

I saw it mentioned in another post, and tried it. Works.
Jan 10 2016
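[Editor's note: Russel's caution is easy to check. Under IEEE 754, which D's double follows, NaN is the only value that compares unequal to itself, so Jay's `dv != dv` test and `std.math.isNaN` agree. A minimal sketch, not code from the thread:]

```d
import std.math : isNaN;

void main()
{
    double nan;       // D default-initializes doubles to NaN
    double x = 1.5;

    // NaN is the only value for which v != v holds under IEEE 754,
    // so the self-inequality test is equivalent to isNaN for doubles.
    assert(nan != nan);
    assert(nan.isNaN);
    assert(!(x != x));
    assert(!x.isNaN);
}
```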
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
> I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice": "~>0.8.8". If I convert the 2D slice with .array(), should that first dimension then be compatible with parallel foreach?
> [...]

Oh... there is no bug. means must be shared =) :

----
shared double[1000] means;
----
Jan 09 2016
On Sunday, 10 January 2016 at 01:16:43 UTC, Ilya Yaroshenko wrote:
> On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
>> I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice": "~>0.8.8". If I convert the 2D slice with .array(), should that first dimension then be compatible with parallel foreach?
>> [...]
> Oh... there is no bug. means must be shared =) :
>
> ----
> shared double[1000] means;
> ----

ok, thanks. That works. I'll go back to trying ndslice now.
Jan 09 2016
On Sunday, 10 January 2016 at 01:54:18 UTC, Jay Norwood wrote:
> ok, thanks. That works. I'll go back to trying ndslice now.

The parallel time for this case is about a 2x speed-up on my corei5 laptop, debug build in windows32, dmd.

D:\ec_mars_ddt\workspace\nd8>nd8.exe
parallel time msec:2495
non_parallel msec:5093

===========

import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;

shared double[1000] means;
double[] data;

void f1() {
    import std.parallelism;
    auto sl = data.sliced(1000, 100_000);
    foreach (i, vec; parallel(sl)) {
        means[i] = vec.sum / 100_000;
    }
}

void f2() {
    auto sl = data.sliced(1000, 100_000);
    foreach (i, vec; sl.array) {
        means[i] = vec.sum / 100_000;
    }
}

void main() {
    data = new double[100_000_000];
    for (int i = 0; i < 100_000_000; i++) {
        data[i] = i / 100_000_000.0;
    }
    StopWatch sw1, sw2;
    sw1.start();
    f1();
    auto r1 = sw1.peek().msecs;
    sw2.start();
    f2();
    auto r2 = sw2.peek().msecs;
    writeln("parallel time msec:", r1);
    writeln("non_parallel msec:", r2);
}
Jan 09 2016
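[Editor's note: std.parallelism can also compute such sums without writing to a shared output array at all: taskPool.reduce splits the range across the pool's worker threads and combines the partial sums. A sketch of that approach, not code from the thread:]

```d
import std.parallelism : taskPool;

void main()
{
    auto data = new double[1_000_000];
    foreach (ref e; data)
        e = 1.0;

    // reduce hands each worker thread a chunk of the range, sums it,
    // then combines the per-thread partial sums into one result,
    // so no shared global is needed.
    double total = taskPool.reduce!"a + b"(0.0, data);
    double mean = total / data.length;

    assert(total == 1_000_000.0); // exact: integer-valued doubles < 2^53
    assert(mean == 1.0);
}
```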
On Sunday, 10 January 2016 at 02:43:05 UTC, Jay Norwood wrote:
> The parallel time for this case is about a 2x speed-up on my corei5 laptop, debug build in windows32, dmd.
> [...]

I will add significantly faster pairwise summation based on SIMD instructions into the future std.las. --Ilya
Jan 09 2016
On Sunday, 10 January 2016 at 03:23:14 UTC, Ilya wrote:
> I will add significantly faster pairwise summation based on SIMD instructions into the future std.las. --Ilya

Wow! A lot of overhead in the debug build. I checked that the computed values are the same. This is on my laptop corei5.

dub -b release-nobounds --force
parallel time msec:448
non_parallel msec:767

dub -b debug --force
parallel time msec:2465
non_parallel msec:4962

On my corei7 desktop, release-nobounds:
parallel time msec:161
non_parallel msec:571
Jan 10 2016
On Sunday, 10 January 2016 at 01:16:43 UTC, Ilya Yaroshenko wrote:
> On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
>> I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice": "~>0.8.8". If I convert the 2D slice with .array(), should that first dimension then be compatible with parallel foreach?
>> [...]
> Oh... there is no bug. means must be shared =) :
>
> ----
> shared double[1000] means;
> ----

I'd say, if `shared` is required, but it compiles without, then it's still a bug.
Jan 10 2016
On Sunday, 10 January 2016 at 11:21:53 UTC, Marc Schütz wrote:
> I'd say, if `shared` is required, but it compiles without, then it's still a bug.

Yeah, probably so. Interestingly, without 'shared', a simple assignment from a constant (means[i] = 1.0;) instead of assignment from the sum() evaluation results in all the values being initialized, so not marking it shared doesn't protect it from being written from the other thread. Anyway, the shared declaration doesn't seem to slow the execution, and it does make sense to me that it should be marked shared.
Jan 10 2016
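[Editor's note: the behavior discussed above comes from D's default thread-local storage for module-level variables. Without `shared`, each std.parallelism worker thread may write its own thread-local copy of the array, leaving the copy that `main` reads untouched (still NaN). A minimal sketch of the fixed pattern, not code from the thread:]

```d
import std.parallelism : parallel;

// Without `shared`, this module-level array would be thread-local:
// each worker thread gets its own copy, and writes from the pool's
// threads would never reach the copy main reads.
shared double[4] results;

void main()
{
    auto chunks = new double[][4];
    foreach (i, ref c; chunks)
        c = [i + 0.0, i + 1.0];

    // Each index is written by exactly one task, so no two threads
    // race on the same element; `shared` makes the storage visible
    // to all of them.
    foreach (i, c; parallel(chunks))
        results[i] = (c[0] + c[1]) / 2.0;

    assert(results[0] == 0.5);
    assert(results[3] == 3.5);
}
```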