
digitalmars.D.learn - How can I make a program which uses all cores and 100% of cpu power?

reply Murilo <murilomiranda92 hotmail.com> writes:
I have started working with neural networks and for that I need a 
lot of computing power but the programs I make only use around 
30% of the cpu, or at least that is what Task Manager tells me. 
How can I make it use all 4 cores of my AMD FX-4300 and how can I 
make it use 100% of it?
Oct 10 2019
next sibling parent Daniel Kozak <kozzi11 gmail.com> writes:
On Fri, Oct 11, 2019 at 2:45 AM Murilo via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:
 I have started working with neural networks and for that I need a
 lot of computing power but the programs I make only use around
 30% of the cpu, or at least that is what Task Manager tells me.
 How can I make it use all 4 cores of my AMD FX-4300 and how can I
 make it use 100% of it?
You should use at least as many threads as you have cores, so in your case 4 or even more. Then you should buy a new CPU if you really need a lot of computing power :). Another issue can be blocking IO, which leaves your threads idle, so they can stress your CPU.
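[Editor's note: a minimal sketch of this advice, not part of the original post. std.parallelism's parallel() runs loop iterations on a pool sized to totalCPUs by default, so a CPU-bound loop keeps every core busy; the loop bound here is arbitrary.]

```d
import std.parallelism : parallel, totalCPUs;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    auto sums = new ulong[](totalCPUs);

    // parallel() distributes the iterations over a pool of totalCPUs
    // worker threads, so each core gets a busy loop of its own.
    foreach (i; parallel(totalCPUs.iota))
    {
        ulong s = 0;
        foreach (j; 0 .. 10_000_000)
            s += j;
        sums[i] = s;   // each thread writes a distinct index
    }

    writeln("workers finished: ", sums.length);
}
```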
Oct 10 2019
prev sibling next sibling parent Daniel Kozak <kozzi11 gmail.com> writes:
On Fri, Oct 11, 2019 at 6:58 AM Daniel Kozak <kozzi11 gmail.com> wrote:
  so can stress your CPU.
can't
Oct 10 2019
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 10/10/2019 05:41 PM, Murilo wrote:
 I have started working with neural networks and for that I need a lot of
 computing power but the programs I make only use around 30% of the cpu,
 or at least that is what Task Manager tells me. How can I make it use
 all 4 cores of my AMD FX-4300 and how can I make it use 100% of it?
Your threads must allocate as little memory as possible, because memory allocation can trigger garbage collection, and garbage collection stops all threads (except the one that's performing the collection).

We studied the effects of different allocation schemes during our last local D meetup[1]. The following program has two similar worker threads. One allocates in an inner scope; the other uses a static Appender and clears its state as needed. The program sets 'w' to 'worker' inside main(). Change it to 'worker2' to see a huge difference: on my 4-core laptop it's 100% versus 400% CPU usage.

import std.random;
import std.range;
import std.algorithm;
import std.array;        // for Appender
import std.concurrency;
import std.parallelism;

enum inner_N = 100;

void worker() {
    ulong result;
    while (true) {
        int[] arr;
        foreach (j; 0 .. inner_N) {
            arr ~= uniform(0, 2);
        }
        result += arr.sum;
    }
}

void worker2() {
    ulong result;
    static Appender!(int[]) arr;
    while (true) {
        arr.clear();
        foreach (j; 0 .. inner_N) {
            arr ~= uniform(0, 2);
        }
        result += arr.data.sum;
    }
}

void main() {
    // Replace with 'worker2' to see the speedup
    alias w = worker;
    auto workers = totalCPUs.iota.map!(_ => spawn(&w)).array;
    w();
}

The static Appender is thread-safe because each thread gets its own copy, data being thread-local by default in D. However, that doesn't mean the functions are reentrant: if they were called recursively, perhaps indirectly, the subsequent executions would corrupt the previous executions' Appender states.

Ali

[1] https://www.meetup.com/D-Lang-Silicon-Valley/events/kmqcvqyzmbzb/

Are you someone in the Bay Area but do not come to our meetups? We've been eating your falafel wraps! ;)
Oct 10 2019
parent Murilo <murilomiranda92 hotmail.com> writes:
On Friday, 11 October 2019 at 06:18:03 UTC, Ali Çehreli wrote:
 Your threads must allocate as little memory as possible because 
 memory allocation can trigger garbage collection and garbage 
 collection stops all threads (except the one that's performing 
 collection).
 We studied the effects of different allocation schemes during 
 our last local D meetup[1]. The following program has two 
 similar worker threads. One allocates in an inner scope, the 
 other one uses a static Appender and clears its state as needed.
 The static Appender is thread-safe because each thread gets 
 their own copy due to data being thread-local by default in D. 
 However, it doesn't mean that the functions are reentrant: If 
 they get called recursively perhaps indirectly, then the 
 subsequent executions would corrupt previous executions' 
 Appender states.
Thanks for the information, it was very helpful.
Dec 05 2019
prev sibling parent reply Russel Winder <russel winder.org.uk> writes:
On Fri, 2019-10-11 at 00:41 +0000, Murilo via Digitalmars-d-learn wrote:
 I have started working with neural networks and for that I need a 
 lot of computing power but the programs I make only use around 
 30% of the cpu, or at least that is what Task Manager tells me. 
 How can I make it use all 4 cores of my AMD FX-4300 and how can I 
 make it use 100% of it?
Why do you want to get CPU utilisation to 100%? I would have thought you'd want to get the neural net to be as fast as possible; this does not necessarily imply that all CPU cycles must be used.

A neural net is, at its heart, a set of communicating nodes. This is as much an I/O-bound model as it is a compute-bound one – nodes are generally waiting for input as much as they are computing a value. The obvious solution architecture for a small computer is to create a task per node on a thread pool, with a few more threads in the pool than you have processors, and hope that you can organise the communication between tasks so as to avoid cache misses. This can be tricky when using multi-core processors. It gets even worse when you have hyperthreads – many organisations doing CPU-bound computations switch off hyperthreads as they cause more problems than they solve.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
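[Editor's note: a minimal sketch of the task-per-node idea using std.parallelism's task/taskPool, not part of the original post. The "node" function here is illustrative; a real network would have many nodes and real weights.]

```d
import std.parallelism : task, taskPool;
import std.stdio : writeln;

// A hypothetical "node": receives its inputs, then computes a value.
double nodeCompute(double a, double b)
{
    return (a + b) / 2;   // stand-in for a neuron's real work
}

void main()
{
    // One task per node, scheduled onto the shared thread pool
    // (which holds totalCPUs - 1 worker threads by default).
    auto t1 = task!nodeCompute(1.0, 3.0);
    auto t2 = task!nodeCompute(5.0, 7.0);
    taskPool.put(t1);
    taskPool.put(t2);

    // yieldForce blocks until the task's result is available,
    // mirroring a downstream node waiting on its inputs.
    auto result = nodeCompute(t1.yieldForce, t2.yieldForce);
    writeln("output node: ", result);
}
```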
Oct 10 2019
parent Murilo <murilomiranda92 hotmail.com> writes:
On Friday, 11 October 2019 at 06:57:46 UTC, Russel Winder wrote:
 A neural net is, at its heart, a set of communicating nodes. 
 This is as much an I/O-bound model as it is a compute-bound one 
 – nodes are generally waiting for input as much as they are 
 computing a value. The obvious solution architecture for a 
 small computer is to create a task per node on a thread pool, 
 with a few more threads in the pool than you have processors, 
 and hope that you can organise the communication between tasks 
 so as to avoid cache misses. This can be tricky when using 
 multi-core processors. It gets even worse when you have 
 hyperthreads – many organisations doing CPU-bound computations 
 switch off hyperthreads as they cause more problems than they 
 solve.
Thanks, that helped a lot. But I already figured out a new training algorithm that is a lot faster, so there's no need to use parallelism anymore.
Dec 05 2019