digitalmars.D.learn - Modify thread-local storage from parent thread

Kai Meyer (14/14) Aug 08 2011 I am playing with threading, and I am doing something like this:

Steven Schveighoffer (16/30) Aug 08 2011 I'd have to see where bytes is created, if it's created in the same

Kai Meyer (8/39) Aug 09 2011 Well, bytes is in a loop, so casting to immutable wouldn't do it. The

Steven Schveighoffer (12/59) Aug 09 2011 OK, there are other options. First, you could keep a "pool" of buffers,...

Kai Meyer (22/83) Aug 09 2011 These are concepts that I'm only familiar with. I think I would like to

Steven Schveighoffer (20/96) Aug 09 2011 shared is just like const, you use a cast to mark something as shared.

Ali Çehreli (28/44) Aug 09 2011 I don't know what copies happen behind the scenes in the following code,...

Ali Çehreli (27/65) Aug 09 2011 The following is a program that uses std.concurrency. In this case the

Kai Meyer <kai unixlords.com> writes:

I am playing with threading, and I am doing something like this:
         file.rawRead(bytes);
         auto tmpTask = task!do_something(bytes.idup);
         task_pool.put(tmpTask);
Is there a way to avoid the idup (or can somebody explain why idup here 
is not expensive?)

If the logic above is expressed as:
Read bytes into an array
Create a thread (task) to execute a function that takes a copy of 'bytes'
Execute the thread

I wonder if I could:
Create a thread (task)
Read bytes directly into the task's thread-local storage
Execute the thread

Aug 08 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 08 Aug 2011 14:17:28 -0400, Kai Meyer <kai unixlords.com> wrote:

> I am playing with threading, and I am doing something like this:
>          file.rawRead(bytes);
>          auto tmpTask = task!do_something(bytes.idup);
>          task_pool.put(tmpTask);
> Is there a way to avoid the idup (or can somebody explain why idup here
> is not expensive?)

I'd have to see where bytes is created. If it's created in the same  
context, just casting to immutable is allowed, as long as you never use  
the mutable reference again.
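
A minimal sketch of the cast-to-immutable idea, under the assumption that each block gets its own freshly allocated buffer. The helper name `makeBlock` and the fill step (a stand-in for `file.rawRead(bytes)`) are hypothetical and not from the original code:

```d
// Hypothetical helper: builds a block and returns it as immutable without copying.
immutable(ubyte)[] makeBlock(size_t n, ubyte fill)
{
    auto bytes = new ubyte[n];      // fresh buffer, no other references exist yet
    bytes[] = fill;                 // stand-in for file.rawRead(bytes)

    // Reinterpret the same memory as immutable; unlike idup, nothing is copied.
    auto frozen = cast(immutable(ubyte)[]) bytes;
    bytes = null;                   // never touch the mutable reference again
    return frozen;
}

void main()
{
    auto block = makeBlock(4, 7);
    immutable(ubyte)[] expected = [7, 7, 7, 7];
    assert(block == expected);
}
```

The cast is only sound because no mutable reference survives it; reusing `bytes` for the next read would break the immutability guarantee.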

> If the logic above is expressed as:
> Read bytes into an array
> Create a thread (task) to execute a function that takes a copy of 'bytes'
> Execute the thread
>
> I wonder if I could:
> Create a thread (task)
> Read bytes directly into the task's thread-local storage
> Execute the thread

This *might* be possible.  However, in many cases, the OS is responsible  
for creating the TLS when the thread starts, so you have to wait until the  
thread is actually running to access it (not an expert on this, but I  
think this is the case for everything but OSX?)

So you would have to create the thread, have it pause while you fill its  
TLS, then resume it.

But I think this is clearly a weird approach to this problem.  Finding a  
way to reliably pass the data to the sub-thread seems more appropriate.

BTW, I've dealt with having to access other threads' TLS.  It's not  
pretty, and I don't recommend using it except in specialized situations  
(mine was adding a GC hook).

-Steve

Aug 08 2011

Kai Meyer <kai unixlords.com> writes:

Well, bytes is in a loop, so casting to immutable wouldn't do it. The 
idea is to read a block of bytes and hand it off to a worker thread 
to operate on that set of bytes. Everything is working; I'm just trying 
to avoid having to allocate that block of bytes for the read, and then 
allocate it again to pass it off to the worker thread. If I could 
get away with one allocation, I'd be happier.
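
One way to get down to a single allocation per block, sketched under the assumption that a fresh buffer per loop iteration is acceptable: allocate inside the loop and transfer ownership with `std.exception.assumeUnique`, which casts to immutable and nulls the mutable slice. The names `doSomething` and `processBlocks`, and the fill step standing in for `file.rawRead(bytes)`, are hypothetical:

```d
import std.exception : assumeUnique;
import std.parallelism : task, taskPool;

// Hypothetical worker: sums the block so the result is easy to check.
uint doSomething(immutable(ubyte)[] block)
{
    uint sum = 0;
    foreach (b; block)
        sum += b;
    return sum;
}

// Produces `count` blocks of `size` bytes and processes each one in a task.
uint[] processBlocks(size_t count, size_t size)
{
    alias BlockTask = typeof(task!doSomething((immutable(ubyte)[]).init));
    BlockTask[] tasks;

    foreach (i; 0 .. count)
    {
        auto bytes = new ubyte[size];   // the only allocation for this block
        bytes[] = cast(ubyte) i;        // stand-in for file.rawRead(bytes)

        // Hand the buffer over without copying; bytes is null afterwards.
        auto t = task!doSomething(assumeUnique(bytes));
        taskPool.put(t);
        tasks ~= t;
    }

    uint[] results;
    foreach (t; tasks)
        results ~= t.yieldForce;        // wait for (or run) each task
    return results;
}

void main()
{
    import std.stdio : writeln;
    writeln(processBlocks(3, 4));
}
```

This still allocates once per block; reusing buffers across iterations needs some form of buffer pool instead.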

-Kai Meyer

Aug 09 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

OK, there are other options.  First, you could keep a "pool" of buffers,  
which are marked as shared.  When you want to run a task, get one of those  
buffers, fill it, then pass the buffer to the task thread to process.   
Make sure the task thread puts the buffer back into the pool when it's  
done.  I'd recommend casting the buffer to unshared while inside the task  
thread to save some cycles.  This is probably the option I'd go with.

Second, you can have the task thread give you its TLS buffer to read data  
into (you need to do some casting to get this around the type system).   
Note that in order for it truly to be stored in TLS, the buffer has to be  
a fixed-size array.

-Steve

Aug 09 2011

Kai Meyer <kai unixlords.com> writes:

These are concepts that I'm only passingly familiar with. I think I would 
like to try the "pool" of buffers. I can't say I know how to mark buffers 
as shared, though. Could you modify this for me, to show me an example?

import std.parallelism;
ubyte[8][] pool; // Dynamic array of (array of 8 bytes)
void thread_func()
{
     //my_pool[0] = 50;
}
void main(string[] args)
{
     uint threads = 2;
     pool.length = threads;
     TaskPool taskpool = new TaskPool(threads);
     foreach(i; 0..threads)
     {
         auto tmpTask = task!thread_func();
         taskpool.put(tmpTask);
     }
     taskpool.stop();
}

Aug 09 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

shared is just like const: you use a cast to mark something as shared.

It can also be a storage class.  So for example, you can simply mark your  
pool as shared, and all threads can see it.

I'm not very familiar with std.parallelism, so I don't know how to pass  
the buffer (or its pool index) to the task thread.

What you have to be careful about is that you somehow mark the pool  
buffers as being "used" by the thread.

I'd recommend something like this:

struct buffer
{
    bool inUse;
    ubyte[8] buf; // same element type as the ubyte[8] buffers in your code
}

Then use this as your pool:

// this is now not in TLS, it's accessible from all threads.
shared buffer[] pool;

Someone more familiar with std.parallelism can probably find a way to do  
this with parallel foreach.
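
A hedged sketch of how such a pool could be wired up with std.parallelism. It is not taken from the thread: the names `Buffer`, `acquire`, `worker`, `processBlocks`, and `total` are hypothetical, the pool is a fixed-size array of two buffers, the payload is assumed to be `ubyte[8]` to match the buffers in the question, and the fill step stands in for `file.rawRead`. The `inUse` flag is claimed with `core.atomic.cas` so two threads cannot take the same buffer:

```d
import core.atomic : atomicLoad, atomicOp, atomicStore, cas;
import core.thread : Thread;
import std.parallelism : TaskPool, task;

struct Buffer
{
    bool inUse;     // claimed by exactly one thread at a time
    ubyte[8] buf;   // assumed payload type, matching the question's buffers
}

shared Buffer[2] pool;   // shared storage class: one copy visible to all threads
shared uint total;       // hypothetical accumulator for the workers' results

// Atomically claims a free buffer; returns its index, or -1 if all are busy.
ptrdiff_t acquire()
{
    foreach (i, ref b; pool)
        if (cas(&b.inUse, false, true))
            return i;
    return -1;
}

// Hypothetical worker: sums its buffer, then puts it back into the pool.
void worker(size_t i)
{
    // Cast away shared while this task is the sole owner of the buffer.
    auto buf = cast(ubyte[8]*) &pool[i].buf;
    uint sum = 0;
    foreach (b; *buf)
        sum += b;
    atomicOp!"+="(total, sum);
    atomicStore(pool[i].inUse, false);      // release the buffer
}

uint processBlocks(uint blocks)
{
    atomicStore(total, 0u);
    auto workers = new TaskPool(2);
    foreach (n; 0 .. blocks)
    {
        ptrdiff_t i;
        while ((i = acquire()) < 0)
            Thread.yield();                 // every buffer is busy, wait for a worker
        auto buf = cast(ubyte[8]*) &pool[i].buf;
        (*buf)[] = cast(ubyte) n;           // stand-in for file.rawRead((*buf)[])
        workers.put(task!worker(cast(size_t) i));
    }
    workers.finish(true);                   // block until all queued tasks have run
    return atomicLoad(total);
}

void main()
{
    assert(processBlocks(4) == 48);         // 8 * (0 + 1 + 2 + 3)
}
```

A dedicated `TaskPool` with two workers is used here so that the spin-wait in the reader always has a worker thread available to free a buffer.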

-Steve

Aug 09 2011

Ali Çehreli <acehreli yahoo.com> writes:

On Mon, 08 Aug 2011 12:17:28 -0600, Kai Meyer wrote:

> I am playing with threading, and I am doing something like this:
>          file.rawRead(bytes);
>          auto tmpTask = task!do_something(bytes.idup);
>          task_pool.put(tmpTask);
> Is there a way to avoid the idup (or can somebody explain why idup here
> is not expensive?)
>
> If the logic above is expressed as:
> Read bytes into an array
> Create a thread (task) to execute a function that takes a copy of 'bytes'
> Execute the thread
>
> I wonder if I could:
> Create a thread (task)
> Read bytes directly into the task's thread-local storage
> Execute the thread

I don't know what copies happen behind the scenes in the following code, 
but std.parallelism is great when threads don't need to interact with each 
other:

import std.stdio;
import std.parallelism;

void main()
{
    ubyte[8][10] buffers;

    foreach (i, ref buffer; parallel(buffers[])) {
        ubyte value = cast(ubyte)i;
        workWith(value, buffer);
    }

    writeln(buffers);
}

void workWith(ubyte value, ref ubyte[8] buffer)
{
    foreach (ref b; buffer) {
        b = value;
    }
}

Notes:

- I had to give buffers[] to parallel() as it calls popFront() which my 
constant-size array can't provide. (Yes, I could have used a dynamic 
array.)

- Note the three refs that I used; two of those are because 
constant-size arrays are value types.

Ali

Aug 09 2011

Ali Çehreli <acehreli yahoo.com> writes:

The following is a program that uses std.concurrency. In this case the 
threads communicate with each other:

import std.stdio;
import std.concurrency;

void main()
{
    shared(ubyte[8])[10] buffers;

    /* Spawn the threads */
    foreach (i, ref buffer; buffers) {
        spawn(&worker, thisTid, i, &buffer);
    }

    /* Collect the results */
    foreach (i; 0 .. buffers.length) {
        size_t id = receiveOnly!size_t();
        writefln("thread %s is done", id);
    }

    writeln(buffers);
}

// The pointer type must stay shared to match the &buffer passed to spawn()
void worker(Tid owner, size_t myId, shared(ubyte[8])* buffer)
{
    foreach (ref b; *buffer) {
        b = cast(ubyte)myId;
    }

    owner.send(myId);
}

Ali

Aug 09 2011
