digitalmars.D.learn - foreach (i; taskPool.parallel(0..2_000_000)
- Paul (7/7) Apr 01 2023 Thanks in advance for any assistance.
- Steven Schveighoffer (6/14) Apr 01 2023 ```d
- Paul (1/1) Apr 01 2023 Thanks Steve.
- Paul (5/10) Apr 01 2023 Is there a way to verify that it split up the work in to
- =?UTF-8?Q?Ali_=c3=87ehreli?= (31/33) Apr 01 2023 It is hard to see the difference unless there is actual work in the loop...
- Salih Dincer (71/77) Apr 01 2023 I always use the Rowland Sequence for such experiments. At least
- Salih Dincer (25/26) Apr 02 2023 **Edit:** I saw, I saw :)
- Salih Dincer (53/72) Apr 04 2023 I don't understand what `foreach()` does :)
- =?UTF-8?Q?Ali_=c3=87ehreli?= (10/12) Apr 04 2023 Hm. I forgot whether 'parallel' works only with 'foreach'. But there are...
- Steven Schveighoffer (12/14) Apr 04 2023 parallel is a shortcut to `TaskPool.parallel`, which is indeed a
- Salih Dincer (26/38) Apr 04 2023 I tried, thanks but it goes into infinite loop. For example, the
- Steven Schveighoffer (5/42) Apr 04 2023 Keep in mind that `arr` and `range` are thread-local, and so will be
- Salih Dincer (108/149) Apr 05 2023 I guess the reason it goes into an infinite loop is because gcd()
- Paul (4/10) Apr 01 2023 Is there a way to tell if the parallelism actually divided up the
- Steven Schveighoffer (9/21) Apr 02 2023 It's important to note that parallel doesn't iterate the range in
- Paul (7/16) Apr 03 2023 **?!?**
- Steven Schveighoffer (25/37) Apr 03 2023 So for example, if you have:
- Paul (7/13) Apr 03 2023 **No**
- Steven Schveighoffer (3/16) Apr 03 2023 Yeah, please post.
- Paul (146/147) Apr 03 2023 ```d
- Steven Schveighoffer (25/73) Apr 03 2023 So I just quoted your main loop.
- Paul (7/11) Apr 04 2023 Well Steven just making the change you said reduced the execution
- H. S. Teoh (13/19) Apr 04 2023 Best practices for arrays in hot loops:
- Paul (7/19) Apr 05 2023 Thanks for sharing Teoh! Very helpful.
- Steven Schveighoffer (20/36) Apr 05 2023 No, random access is access out of sequence. Those two lines are pretty
- H. S. Teoh (37/50) Apr 05 2023 [...]
- Paul (7/10) Apr 05 2023 Yes I understand, basically, what's going on in hardware. I just
- H. S. Teoh (12/19) Apr 05 2023 D ranges are conceptually sequential, but the actual underlying memory
- Paul (2/10) Apr 06 2023 Very helpful Teoh. Thanks again.
Thanks in advance for any assistance. As the subject line suggests can I do something like? : ```d foreach (i; taskPool.parallel(0..2_000_000)) ``` Obviously this exact syntax doesn't work but I think it expresses the gist of my challenge.
Apr 01 2023
On 4/1/23 2:25 PM, Paul wrote:Thanks in advance for any assistance. As the subject line suggests can I do something like? : ```d foreach (i; taskPool.parallel(0..2_000_000)) ``` Obviously this exact syntax doesn't work but I think it expresses the gist of my challenge.```d import std.range; foreach(; iota(0, 2_000_000).parallel) ``` -Steve
Apr 01 2023
```d import std.range; foreach(; iota(0, 2_000_000).parallel) ``` -SteveIs there a way to verify that it split up the work in to tasks/threads ...? The example you gave me works...compiles w/o errors but the execution time is the same as the non-parallel version. They both take about 6 secs to execute. totalCPUs tells me I have 8 CPUs available.
Apr 01 2023
On 4/1/23 15:30, Paul wrote:Is there a way to verify that it split up the work in to tasks/threads ...?It is hard to see the difference unless there is actual work in the loop that takes time. You can add a Thread.sleep call. (Commented-out in the following program.) Another option is to monitor a task manager like 'top' on unix based systems. It should multiple threads for the same program. However, I will do something unspeakably wrong and take advantage of undefined behavior below. :) Since iteration count is an even number, the 'sum' variable should come out as 0 in the end. With .parallel it doesn't because multiple threads are stepping on each other's toes (values): import std; void main() { long sum; foreach(i; iota(0, 2_000_000).parallel) { // import core.thread; // Thread.sleep(1.msecs); if (i % 2) { ++sum; } else { --sum; } } if (sum == 0) { writeln("We highly likely worked serially."); } else { writefln!"We highly likely worked in parallel because %s != 0."(sum); } } If you remove .parallel, 'sum' will always be 0. Ali
Apr 01 2023
On Saturday, 1 April 2023 at 22:48:46 UTC, Ali Çehreli wrote:On 4/1/23 15:30, Paul wrote:I always use the Rowland Sequence for such experiments. At least it's better than the Fibonacci Range: ```d struct RowlandSequence { import std.numeric : gcd; import std.format : format; import std.conv : text; long b, r, a = 3; enum empty = false; string[] front() { string result = format("%s, %s", b, r); return [text(a), result]; } void popFront() { long result = 1; while(result == 1) { result = gcd(r++, b); b += result; } a = result; } } enum BP { f = 1, b = 7, r = 2, a = 1, /* f = 109, b = 186837516, r = 62279173, //*/ s = 5 } void main() { RowlandSequence rs; long start, skip; with(BP) { rs = RowlandSequence(b, r); start = f; skip = s; } rs.popFront(); import std.stdio, std.parallelism; import std.range : take; auto rsFirst128 = rs.take(128); foreach(r; rsFirst128.parallel) { if(r[0].length > skip) { start.writeln(": ", r); } start++; } } /* PRINTS: 46: ["121403", "364209, 121404"] 48: ["242807", "728421, 242808"] 68: ["486041", "1458123, 486042"] 74: ["972533", "2917599, 972534"] 78: ["1945649", "5836947, 1945650"] 82: ["3891467", "11674401, 3891468"] 90: ["7783541", "23350623, 7783542"] 93: ["15567089", "46701267, 15567090"] 102: ["31139561", "93418683, 31139562"] 108: ["62279171", "186837513, 62279172"] */ ``` The operation is simple, again multiplication, addition, subtraction and module, i.e. So four operations but enough to overrun the CPU! I haven't seen rsFirst256 until now because I don't have a fast enough processor. Maybe you'll see it, but the first 108 is fast anyway. **PS:** Decrease value of the `skip` to see the entire sequence. In cases where your processor power is not enough, you can create skip points. Check out BP... SDB 79Is there a way to verify that it split up the work in totasks/threads...?It is hard to see the difference unless there is actual work in the loop that takes time.
Apr 01 2023
On Sunday, 2 April 2023 at 04:34:40 UTC, Salih Dincer wrote:I haven't seen rsFirst256 until now...**Edit:** I saw, I saw :) I am struck with consternation! I've never seen these results before. Interesting, there is such a thing as parallel threading :) Here are my skipPoints: ```d enum BP : long { //f, a, r, b = 7, /* <- beginning f = 113, r = 62279227, b = 186837678, // f = 146, r = 249134971, b = 747404910, // f = 161, r = 498270808, b = 1494812421, // f = 178, r = 1993083484, b = 5979250449, // f = 210, r = 3986167363, b = 11958502086, //*/ s = 5 } /* PRINTS: eLab pico:~/Projeler$ ./RownlandSequence_v2 122: ["124559610, 373678827"] 128: ["249120240, 747360717"] */ ``` It looks like there are 5 total skipPoints until 256 where it loops for a long time. (while passing 1's). SDB 79
Apr 02 2023
On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote:So for example, if you have: ```d foreach(i; iota(0, 2_000_000).parallel) { runExpensiveTask(i); } ``` The foreach is run on the main thread, gets a `0`, then hands off to a task thread `runExpensiveTask(0)`. Then it gets a `1`, and hands off to a task thread `runExpensiveTask(1)`, etc. The iteration is not expensive, and is not done in parallel. On the other hand, what you *shouldn't* do is: ```d foreach(i; iota(0, 2_000_000).map!(x => runExpensiveTask(x)).parallel) { } ``` as this will run the expensive task *before* running any tasks.I don't understand what `foreach()` does :) ```d import std.range, std.algorithm : map; import std.stdio, std.parallelism; //import sdb.sequences : RowlandSequence_v2;/* struct RowlandSequence_v2 { import std.numeric : gcd; long b, r, a = 3; enum empty = false; auto front() => a; void popFront() { long result = 1; while(result == 1) { result = gcd(r++, b); b += result; } a = result; } }//*/ enum BP : long { // s, f, r, b = 7, /* <- beginning s = 178, r = 1993083484, b = 5979250449,//*/ len = 190 } void main() { with(BP) { long[len] arr; auto range = RowlandSequence_v2(b, r); s.iota(len) .map!((a){ range.popFront(); return arr[a] = range.front(); } ) .parallel .writeln; } } /* PRINTS: ParallelForeach!(MapResult!(__lambda3, Result))(std.parallelism.TaskPool, [5, 3, 73, 157, 7, 5, 3, 13, 3986167223, 3, 7, 73], 1) */ ``` Is it necessary to enclose the code in `foreach()`? I invite Ali to tell me! Please explain why parallel isn't running. "Ben anlamıyor, foreach ne yapıyor 😀 Kodu `foreach()` içine almak şart mı? Ali'yi davet ediyorum, bana anlatması için! Paralel() niye çalışmıyor, lütfen açıklayın hocam. Mümkünse Türkçe!" in Turkish. SDB 79
Apr 04 2023
On 4/4/23 02:24, Salih Dincer wrote:I don't understand what `foreach()` does :)Hm. I forgot whether 'parallel' works only with 'foreach'. But there are various other algorithms in std.parallelism that may be more useful with range algorithm chains: https://dlang.org/phobos/std_parallelism.htmlin TurkishI don't have time to experiment more at this time but I have the following chapters, which includes some of those other algorithms as well: http://ddili.org/ders/d/kosut_islemler.html http://ddili.org/ders/d.en/parallelism.html Ali
Apr 04 2023
On 4/4/23 5:24 AM, Salih Dincer wrote:Is it necessary to enclose the code in `foreach()`? I invite Ali to tell me! Please explain why parallel isn't running.parallel is a shortcut to `TaskPool.parallel`, which is indeed a foreach-only construct, it does not return a range. I think what you want is `TaskPool.map`: ```d // untested, just looking at the taskPool.map!(/* your map function here */) (s.iota(len)).writeln; ``` Can't use pipelining with it, because it is a member function. https://dlang.org/phobos/std_parallelism.html#.TaskPool.map -Steve
Apr 04 2023
On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer wrote:parallel is a shortcut to `TaskPool.parallel`, which is indeed a foreach-only construct, it does not return a range. I think what you want is `TaskPool.map`: ```d // untested, just looking at the taskPool.map!(/* your map function here */) (s.iota(len)).writeln; ```I tried, thanks but it goes into infinite loop. For example, the first 50 of the sequence should have been printed to the screen immediately without waiting. ```d long[50] arr; RowlandSequence_v2 range; auto next(long a) { range.popFront(); return arr[a] = range.front(); } void main() { range = RowlandSequence_v2(7, 2); taskPool.map!next(iota(50))/* s.iota(50) .map!next .parallel//*/ .writeln; } ``` On Tuesday, 4 April 2023 at 13:18:01 UTC, Ali Çehreli wrote:I don't have time to experiment more at this time but I have the following chapters, which includes some of those other algorithms as well: http://ddili.org/ders/d/kosut_islemler.htmlI read it, thanks... SDB 79
Apr 04 2023
On 4/4/23 11:34 AM, Salih Dincer wrote:On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer wrote:Keep in mind that `arr` and `range` are thread-local, and so will be different states for different tasks. Though I don't really know what you are doing there. -Steveparallel is a shortcut to `TaskPool.parallel`, which is indeed a foreach-only construct, it does not return a range. I think what you want is `TaskPool.map`: ```d // untested, just looking at the taskPool.map!(/* your map function here */)   (s.iota(len)).writeln; ```I tried, thanks but it goes into infinite loop. For example, the first 50 of the sequence should have been printed to the screen immediately without waiting. ```d long[50] arr; RowlandSequence_v2 range; auto next(long a) {  range.popFront();  return arr[a] = range.front(); } void main() {    range = RowlandSequence_v2(7, 2);    taskPool.map!next(iota(50))/*    s.iota(50)     .map!next     .parallel//*/     .writeln; } ```
Apr 04 2023
On Tuesday, 4 April 2023 at 16:22:29 UTC, Steven Schveighoffer wrote:On 4/4/23 11:34 AM, Salih Dincer wrote:I guess the reason it goes into an infinite loop is because gcd() a recursive function (gcd). This is the only solution I can think of about this: ```d import std.range, std.algorithm : map; import std.stdio, std.parallelism; //import std.numeric : gcd; /* struct Vector { long x, y, result; alias result this; } Vector gcd(long a, long b) { if(b == 0) return Vector(1, 0, a); auto pre = gcd(b, a % b); auto tmp = (a / b) * pre.y; return Vector(pre.y, pre.x - tmp, pre.result); }//*/ struct RowlandSequence_v3 { long b, r, n, a = 3, limit; bool empty() { return n == limit; } auto front() { return a; } void popFront() { long result = 1; while(result == 1) { result = gcd(r++, b); b += result; } a = result; } long gcd(long a, long b) { long c; while(b != 0) { c = a % b; a = b; b = c; } return a; } } auto next(ref RowlandSequence_v3 range) { with(range) { if(empty) return [n, front]; popFront(); return [n++, front]; } } long[179] arr; void main() { // initialization: RowlandSequence_v3[4] r = [ RowlandSequence_v3(7 , 2, 0, 3, 112), RowlandSequence_v3(186837678, 62279227, 112, 3, 145), RowlandSequence_v3(747404910, 249134971, 145, 6257, 160), RowlandSequence_v3(1494812421, 498270808, 160, 11, 177) ]; auto tasks = [ task(&next, r[0]), task(&next, r[1]), task(&next, r[2]), task(&next, r[3]) ]; // quad parallel operation: foreach(_; 0..r[0].limit) { foreach(p, ref task; tasks) { task.executeInNewThread; auto t = task.workForce; arr[t[0]] = t[1]; } } // prints... foreach(x, n; arr) { switch(x + 1) { case 112, 145, 160: n.writeln; break; default: n.write(", "); } } } /* PRINTS: user debian:~/Documents$ dmd -O rowlandSequence.d -release user debian:~/Documents$ time ./rowlandSequence 5, 3, 11, 3, 23, 3, 47, 3, 5, 3, 101, 3, 7, 11, 3, 13, 233, 3, 467, 3, 5, 3, 941, 3, 7, 1889, 3, 3779, 3, 7559, 3, 13, 15131, 3, 53, 3, 7, 30323, 3, 60647, 3, 5, 3, 101, 3, 121403, 3, 242807, 3, 5, 3, 19, 7, 5, 3, 47, 3, 37, 5, 3, 17, 3, 199, 53, 3, 29, 3, 486041, 3, 7, 421, 23, 3, 972533, 3, 577, 7, 1945649, 3, 163, 7, 3891467, 3, 5, 3, 127, 443, 3, 31, 7783541, 3, 7, 15567089, 3, 19, 29, 3, 5323, 7, 5, 3, 31139561, 3, 41, 3, 5, 3, 62279171, 3, 7, 83, 3 29, 3, 1103, 3, 5, 3, 13, 7, 124559609, 3, 107, 3, 911, 3, 249120239, 3, 11, 3, 7, 61, 37, 179, 3, 31, 19051, 7, 3793, 23, 3, 5, 3, 6257, 3 3, 11, 3, 13, 5, 3, 739, 37, 5, 3, 498270791, 3, 19, 11, 3 3, 3, 5, 3, 996541661, 3, 7, 37, 5, 3, 67, 1993083437, 3, 5, 3, 83, 3, 3, 0, real 7m54.093s user 7m54.062s sys 0m0.024s */ ``` However, parallel processing for 4-digit sequence elements is not promising at least for the Rowland Sequence. SDB 79On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer wrote:Keep in mind that `arr` and `range` are thread-local, and so will be different states for different tasks.parallel is a shortcut to `TaskPool.parallel`, which is indeed a foreach-only construct, it does not return a range. I think what you want is `TaskPool.map`: ```d // untested, just looking at the taskPool.map!(/* your map function here */)   (s.iota(len)).writeln; ```I tried, thanks but it goes into infinite loop. For example, the first 50 of the sequence should have been printed to the screen immediately without waiting. ```d long[50] arr; RowlandSequence_v2 range; auto next(long a) {  range.popFront();  return arr[a] = range.front(); } void main() {    range = RowlandSequence_v2(7, 2);    taskPool.map!next(iota(50))/*    s.iota(50)     .map!next     .parallel//*/     .writeln; } ```
Apr 05 2023
On Saturday, 1 April 2023 at 18:30:32 UTC, Steven Schveighoffer wrote:On 4/1/23 2:25 PM, Paul wrote: ```d import std.range; foreach(; iota(0, 2_000_000).parallel) ``` -SteveIs there a way to tell if the parallelism actually divided up the work? Both versions of my program run in the same time ~6 secs.
Apr 01 2023
On 4/1/23 6:32 PM, Paul wrote:On Saturday, 1 April 2023 at 18:30:32 UTC, Steven Schveighoffer wrote:It's important to note that parallel doesn't iterate the range in parallel, it just runs the body in parallel limited by your CPU count. If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually). If you can disclose more about what you are trying to do, it would be helpful. Also make sure you have more than one logical CPU. -SteveOn 4/1/23 2:25 PM, Paul wrote: ```d import std.range; foreach(i; iota(0, 2_000_000).parallel) ```Is there a way to tell if the parallelism actually divided up the work? Both versions of my program run in the same time ~6 secs.
Apr 02 2023
On Sunday, 2 April 2023 at 15:32:05 UTC, Steven Schveighoffer wrote:It's important to note that parallel doesn't iterate the range in parallel, it just runs the body in parallel limited by your CPU count.**?!?**If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually).**Ok I did have some debug writelns I commented out.**If you can disclose more about what you are trying to do, it would be helpful.**This seems like it would be a lot of code and explaining but let me think about how to summarize.**Also make sure you have more than one logical CPU.**I have 8.**
Apr 03 2023
On 4/3/23 6:02 PM, Paul wrote:On Sunday, 2 April 2023 at 15:32:05 UTC, Steven Schveighoffer wrote:So for example, if you have: ```d foreach(i; iota(0, 2_000_000).parallel) { runExpensiveTask(i); } ``` The foreach is run on the main thread, gets a `0`, then hands off to a task thread `runExpensiveTask(0)`. Then it gets a `1`, and hands off to a task thread `runExpensiveTask(1)`, etc. The iteration is not expensive, and is not done in parallel. On the other hand, what you *shouldn't* do is: ```d foreach(i; iota(0, 2_000_000).map!(x => runExpensiveTask(x)).parallel) { } ``` as this will run the expensive task *before* running any tasks.It's important to note that parallel doesn't iterate the range in parallel, it just runs the body in parallel limited by your CPU count.**?!?**And did it help? Another thing that takes a global lock is memory allocation.If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually).**Ok I did have some debug writelns I commented out.**It's dependent on the work being done, but you should see a roughly 8x speedup as long as the overhead of distributing tasks is not significant compared to the work being done. -SteveAlso make sure you have more than one logical CPU.**I have 8.**
Apr 03 2023
On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote:**No** My program is about 140 lines Steven. Its just one of the Advent of Code challenges. Could I past the whole program here and see what you think? Thanks for your assistance...much appreciated.And did it help?If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually).**Ok I did have some debug writelns I commented out.**
Apr 03 2023
On 4/3/23 6:56 PM, Paul wrote:On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote:Yeah, please post. -Steve**No** My program is about 140 lines Steven. Its just one of the Advent of Code challenges. Could I past the whole program here and see what you think?And did it help?If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually).**Ok I did have some debug writelns I commented out.**
Apr 03 2023
On Monday, 3 April 2023 at 23:13:58 UTC, Steven Schveighoffer wrote:Yeah, please post.```d module aoc2215b2; import std.stdio; import std.file: readText; import std.conv: to; import std.math: abs; import std.traits; import std.parallelism; import std.range; import core.time: MonoTime; // Timed main() vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv void main(string[] args) { auto progStartTime = MonoTime.currTime; //----------------------------------------------------------------------------- string filename = args.length > 1 ? args[1] : "aoc2215a.data"; CommPair[] commPair; ulong x,y; // read file that has data sets in the form of x,y coordinate pairs // for each sensor-beacon pair. Create on array of structs to hold // this information. loadFileDataIntoArrayOfStructs(commPair, filename); foreach(int lineOfInterest;parallel(iota(0,4_000_001))) { Span[] span; // A section of line-of-interest coordinates where no other beacons are present. const spanReserve = span.reserve(50); createSpansOfNoBeacons(lineOfInterest,commPair,span); // if spans overlap, combine them into a single span and mark // the other spans !inUse. combineOverlappingSpans(span); // look for a line that doesn't have 4,000,001 locations accounted for if(beaconFreeLocations(span) < 4_000_001) { // find the location that is not accounted for foreach(ulong i;0..4_000_000) { bool found = false; foreach(sp;span) { if(i >= sp.xLow && i <= sp.xHigh) { found = true; break; } } if(!found) { x = i; y = lineOfInterest; break; } } } } writeln(x," ",y); //----------------------------------------------------------------------------- auto progEndTime = MonoTime.currTime; writeln(progEndTime - progStartTime); } // Timed main() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ struct CommPair { int sx,sy,bx,by; int manhattanDistance; } void loadFileDataIntoArrayOfStructs(ref CommPair[] commPair, string filename) { import std.regex; auto s = readText(filename); auto ctr = ctRegex!(`x=(-*\d+), y=(-*\d+):.*x=(-*\d+), y=(-*\d+)`); CommPair cptemp; foreach (c; matchAll(s, ctr)) { cptemp.sx = to!int(c[1]); cptemp.sy = to!int(c[2]); cptemp.bx = to!int(c[3]); cptemp.by = to!int(c[4]); cptemp.manhattanDistance = abs(cptemp.sx-cptemp.bx) + abs(cptemp.sy-cptemp.by); commPair ~= cptemp; } } struct Span { int xLow, xHigh; bool inUse = true; } void createSpansOfNoBeacons(int lineOfInterest, CommPair[] commPair,ref Span[] span) { foreach(size_t i,cp;commPair) { int distanceToLineOfInterest = abs(cp.sy - lineOfInterest); if(cp.manhattanDistance >= distanceToLineOfInterest) { int xLow = cp.sx - (cp.manhattanDistance - distanceToLineOfInterest); int xHigh = cp.sx + (cp.manhattanDistance - distanceToLineOfInterest); span ~= Span(xLow,xHigh); } } } void combineOverlappingSpans(ref Span[] span) { bool combinedSpansThisCycle = true; while(combinedSpansThisCycle) { combinedSpansThisCycle = false; for(size_t i=0; i < span.length-1; i++) { if(!span[i].inUse) continue; for(size_t j=i+1; j < span.length; j++) { if(!span[j].inUse) continue; // if one span overlaps with the other, combine them into one span if(spanIContainedInSpanJ(span[i],span[j]) || spanJContainedInSpanI(span[i],span[j])) { span[i].xLow = span[i].xLow < span[j].xLow ? span[i].xLow : span[j].xLow; span[i].xHigh = span[i].xHigh > span[j].xHigh ? span[i].xHigh : span[j].xHigh; span[j].inUse = false; combinedSpansThisCycle = true; // after combining two spans, perform bounds checking // 15 part b limits the search between 0 and 4,000,000 span[i].xLow = span[i].xLow < 0 ? 0 : span[i].xLow; span[i].xHigh = span[i].xHigh > 4_000_000 ? 4_000_000 : span[i].xHigh; } } } } } bool spanIContainedInSpanJ (Span spanI, Span spanJ) { return (spanI.xLow >= spanJ.xLow && spanI.xLow <= spanJ.xHigh) || (spanI.xHigh>= spanJ.xLow && spanI.xHigh<= spanJ.xHigh); } bool spanJContainedInSpanI (Span spanI, Span spanJ) { return (spanJ.xLow >= spanI.xLow && spanJ.xLow <= spanI.xHigh) || (spanJ.xHigh>= spanI.xLow && spanJ.xHigh<= spanI.xHigh); } int beaconFreeLocations(ref Span[] span) { int noBeaconCount = 0; foreach(sp;span) if(sp.inUse) noBeaconCount += sp.xHigh - sp.xLow + 1; return noBeaconCount; } ```
Apr 03 2023
On 4/3/23 7:22 PM, Paul wrote:```d // Timed main() vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv void main(string[] args) { auto progStartTime = MonoTime.currTime; //-----------------------------------------------------------------------------     string filename = args.length > 1 ? args[1] : "aoc2215a.data";     CommPair[] commPair;     ulong x,y;     // read file that has data sets in the form of x,y coordinate pairs     // for each sensor-beacon pair. Create on array of structs to hold     // this information.     loadFileDataIntoArrayOfStructs(commPair, filename);     foreach(int lineOfInterest;parallel(iota(0,4_000_001))) {        Span[] span; // A section of line-of-interest coordinates where no other beacons are present.        const spanReserve = span.reserve(50);        createSpansOfNoBeacons(lineOfInterest,commPair,span);        // if spans overlap, combine them into a single span and mark        // the other spans !inUse.        combineOverlappingSpans(span);        // look for a line that doesn't have 4,000,001 locations accounted for        if(beaconFreeLocations(span) < 4_000_001) {            // find the location that is not accounted for            foreach(ulong i;0..4_000_000) {                bool found = false;                foreach(sp;span) {                    if(i >= sp.xLow && i <= sp.xHigh) {                        found = true;                        break;                    }                }                if(!found) {                    x = i; y = lineOfInterest;                    break;                }            }        }     } writeln(x," ",y); ```So I just quoted your main loop. I am assuming that this O(n^2) algorithm doesn't actually run for all iterations, because that wouldn't be feasible (16 trillion iterations is a lot). This means that I'm assuming a lot of cases do not run the second loop. Everything you do besides prune the second loop is mostly allocating an array of `Span` types. This means most of the parallel loops are allocating, and doing nothing else. As I said earlier, allocations need a global lock of the GC. What you need to do probably, is to avoid these allocations per loop. The easiest thing I can think of is to store the Span array as a static array of the largest array you need (i.e. the length of `commPair`), and then slice it instead of appending. So what you need is inside `createSpansOfNoBeacons`, take as a reference a `ref Span[MAX_SPANS]`, and have it return a `Span[]` that is a slice of that which was "alocated". See if this helps. FWIW, I did the AoC 2022 as well, and I never needed parallel execution. Looking at my solution comment in reddit, this one I sort of punted by knowing I could exit as soon as the answer is found (my solution runs in 2.5s on my input). But I recommend (once you are done), reading this post, it is a really cool way to look at it: https://www.reddit.com/r/adventofcode/comments/zmcn64/2022_day_15_solutions/j0hl19a/?context=8&depth=9 -Steve
Apr 03 2023
On Monday, 3 April 2023 at 23:50:48 UTC, Steven Schveighoffer wrote:So what you need is inside `createSpansOfNoBeacons`, take as a reference a `ref Span[MAX_SPANS]`, and have it return a `Span[]` that is a slice of that which was "alocated". See if this helps.Well Steven just making the change you said reduced the execution time from ~6-7 secs to ~3 secs. Then, including the 'parallel' in the foreach statement took it down to ~1 sec. Boy lesson learned in appending-to and zeroing dynamic arrays in a hot loop!
Apr 04 2023
On Tue, Apr 04, 2023 at 09:35:29PM +0000, Paul via Digitalmars-d-learn wrote: [...]Well Steven just making the change you said reduced the execution time from ~6-7 secs to ~3 secs. Then, including the 'parallel' in the foreach statement took it down to ~1 sec. Boy lesson learned in appending-to and zeroing dynamic arrays in a hot loop!Best practices for arrays in hot loops: - Avoid appending if possible; instead, pre-allocate outside the loop. - Where possible, reuse existing arrays instead of discarding old ones and allocating new ones. - Use slices where possible instead of making copies of subarrays (this esp. applies to strings). - Where possible, prefer sequential access over random access (take advantage of the CPU cache hierarchy). T -- Famous last words: I *think* this will work...
Apr 04 2023
On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote:Best practices for arrays in hot loops: - Avoid appending if possible; instead, pre-allocate outside the loop. - Where possible, reuse existing arrays instead of discarding old ones and allocating new ones. - Use slices where possible instead of making copies of subarrays (this esp. applies to strings). - Where possible, prefer sequential access over random access (take advantage of the CPU cache hierarchy).Thanks for sharing Teoh! Very helpful. would this be random access? for(size_t i; i<arr.length; i++) using indices? ...and this be sequential foreach(a;arr) ? or would they have to be completely different kinds of containers? a dlang 'range' vs arr[]?
Apr 05 2023
On 4/5/23 6:34 PM, Paul wrote:On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote:No, random access is access out of sequence. Those two lines are pretty much equivalent, and even a naive compiler is going to produce exactly the same generated code from both of them. A classic example is processing a 2d array: ```d for(int i = 0; i < arr[0].length; ++i) for(int j = 0; j < arr.length; ++j) arr[j][i]++; // vs for(int j = 0; j < arr.length; ++j) for(int i = 0; i < arr[0].length; ++i) arr[j][i]++; ``` The first accesses elements *by column*, which means that the array data is accessed non-linearly in memory. To be fair, both are "linear" in terms of algorithm, but one is going to be faster because of cache coherency (you are accessing sequential *hardware addresses*). -SteveBest practices for arrays in hot loops: - Avoid appending if possible; instead, pre-allocate outside the loop. - Where possible, reuse existing arrays instead of discarding old ones  and allocating new ones. - Use slices where possible instead of making copies of subarrays (this  esp. applies to strings). - Where possible, prefer sequential access over random access (take  advantage of the CPU cache hierarchy).Thanks for sharing Teoh! Very helpful. would this be random access? for(size_t i; i<arr.length; i++) using indices? ...and this be sequential foreach(a;arr) ?
Apr 05 2023
On Wed, Apr 05, 2023 at 10:34:22PM +0000, Paul via Digitalmars-d-learn wrote:On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote:[...]Best practices for arrays in hot loops:[...] The exact syntactic construct you use is not important; under the hood, for(i; i<arr.length; i++) and foreach(a;arr) are exactly the same thing. What's important is the actual pattern of memory access. Looping from the first element of an array to the last element is sequential access; doing binary search on an array is random access (because it's unpredictable where the next access will be). Traversing a linked list is also random access, because in general individual nodes will be stored in different places in memory. So doing a hashtable lookup. Basically, modern CPUs have a cache hierarchy and memory prefetching controlled by the access predictor. When the predictor sees memory being accessed sequentially, it can speed up future accesses by preloading subsequent blocks of memory into cache lines so that when the CPU next tries to read from that location it's already in the cache and it doesn't have to wait for a fetch cycle from RAM, which is slow. The bad thing about things like hashtables is because it's unpredictable where the next access will be, so the CPU's predictor won't be able to preload the next access for you. So you'll incur a fetch cycle from RAM every time. Of course, this doesn't mean that hashtable (or other data structure) lookups are necessarily bad. If the alternative is linear scan through very large arrays, then hashtables will speed up your inner loop in spite of the cache misses. So it depends on your exact use case. But if your arrays are not overly large, scanning through the entire array may sometimes be faster than doing many hashtable lookups. Some modern hashtable implementations are also adapted to take advantage of the cache hierarchy, so it may not be as bad as a naïve implementation. Nevertheless, the general principle is that locality (adjacent memory accesses are close to each other) is good, and should generally be preferred over strategies that require jumping around in memory. So your data structures and algorithms should be designed in a way that takes advantage of linear access where possible. T -- In theory, there is no difference between theory and practice.- Where possible, prefer sequential access over random access (take advantage of the CPU cache hierarchy).Thanks for sharing Teoh! Very helpful. would this be random access? for(size_t i; i<arr.length; i++) using indices? ...and this be sequential foreach(a;arr) ? or would they have to be completely different kinds of containers? a dlang 'range' vs arr[]?
Apr 05 2023
On Wednesday, 5 April 2023 at 23:06:54 UTC, H. S. Teoh wrote:So your data structures and algorithms should be designed in a way that takes advantage of linear access where possible. TYes I understand, basically, what's going on in hardware. I just wasn't sure if the access type was linked to the container type. It seems obvious now, since you've both made it clear, that it also depends on how I'm accessing my container. Having said all of this, isn't a D 'range' fundamentally a sequential access container (i.e popFront) ?
Apr 05 2023
On Thu, Apr 06, 2023 at 01:20:28AM +0000, Paul via Digitalmars-d-learn wrote: [...]Yes I understand, basically, what's going on in hardware. I just wasn't sure if the access type was linked to the container type. It seems obvious now, since you've both made it clear, that it also depends on how I'm accessing my container. Having said all of this, isn't a D 'range' fundamentally a sequential access container (i.e popFront) ?D ranges are conceptually sequential, but the actual underlying memory access patterns depends on the concrete type at runtime. An array's elements are stored sequentially in memory, and arrays are ranges. But a linked-list can also have a range interface, yet its elements may be stored in non-consecutive memory locations. So the concrete type matters here; the range API only gives you conceptual sequentiality, it does not guarantee physically sequential memory access. T -- Many open minds should be closed for repairs. -- K5 user
Apr 05 2023
On Thursday, 6 April 2023 at 01:44:15 UTC, H. S. Teoh wrote:D ranges are conceptually sequential, but the actual underlying memory access patterns depends on the concrete type at runtime. An array's elements are stored sequentially in memory, and arrays are ranges. But a linked-list can also have a range interface, yet its elements may be stored in non-consecutive memory locations. So the concrete type matters here; the range API only gives you conceptual sequentiality, it does not guarantee physically sequential memory access.Very helpful Teoh. Thanks again.
Apr 06 2023