
digitalmars.D.learn - Passing large or complex data structures to threads

reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
Hello all,

Are there any recommended strategies for passing large or complex data
structures (particularly reference types) to threads?

For the purpose of this discussion we can assume that it's read-only data, so if
we're talking about just an array (albeit perhaps a large one) I guess just
passing an .idup copy would be best.  However, the practical situation I have is
a data structure of the form,

	Tuple!(size_t, size_t)[][]

... which I _could_ .idup, but it's a little bit of a hassle to do so, so I'm
wondering if there are alternative ways or suggestions.
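(By .idup-ing I mean something like the following untested sketch, where deepIdup is just a helper name I'm inventing:)

import std.typecons;

alias Elem = Tuple!(size_t, size_t);

// Tuples of size_t contain no indirections, so each inner row can be
// .idup'd directly; the hassle is just doing it row by row.
immutable(Elem)[][] deepIdup(const Elem[][] data)
{
    immutable(Elem)[][] result;
    foreach (row; data)
        result ~= row.idup;
    return result;
}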

The actual way I found to "solve" this problem (for now) was that, since the
data in question is loaded from a file, I just got each thread to load the file
separately.  However, this runs into a different problem -- the function(s)
required to load and interpret the file may vary, and may take different
input, which means manually rewriting the thread function whenever it needs
to be changed (a tolerable but annoying solution).  I guess I could solve
this by passing the thread a delegate, or maybe employ mixins ... ?
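(The delegate idea would be something like this untested sketch -- 'worker' and 'loadData' are made-up names, and note that std.concurrency.spawn wants a function pointer or shared/immutable arguments rather than an arbitrary delegate:)

import std.concurrency;
import std.typecons;

alias Data = Tuple!(size_t, size_t)[][];

// Hypothetical loader passed in as a function pointer, so each thread
// can load and interpret the file itself without hard-coding the logic
// in the thread function.
void worker(string filename, immutable(Data) function(string) loader)
{
    immutable(Data) data = loader(filename);
    // ... work on data ...
}

// usage: spawn(&worker, "input.dat", &loadData);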

Anyway, I thought I'd throw the question open in case others can suggest better
ideas!

Thanks & best wishes,

     -- Joe
May 24 2013
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 05/24/2013 06:26 AM, Joseph Rushton Wakeling wrote:

 Are there any recommended strategies for passing large or complex data
 structures (particularly reference types) to threads?
std.concurrency works with shared data.
 For the purpose of this discussion we can assume that it's read-only 
 data
The following simple example uses mutable data but it should work with 
'const' too.

import std.stdio;
import std.concurrency;
import std.typecons;
import core.thread;

alias DataElement = Tuple!(size_t, size_t);
alias DataRow = DataElement[];
alias Data = DataRow[];

enum size_t totalRows = 4;

void func(Tid owner, shared(Data) data, size_t rowId)
{
    foreach (ref element; data[rowId]) {
        element[0] *= 10;
        element[1] *= 10;
    }
}

shared(Data) makeData()
{
    shared(Data) data;

    foreach (size_t row; 0 .. totalRows) {
        shared(DataRow) dataRow;

        foreach (size_t col; 0 .. 10) {
            dataRow ~= tuple(row, col);
        }

        data ~= dataRow;
    }

    return data;
}

void main()
{
    shared(Data) data = makeData();
    writeln("before: ", data);

    foreach (rowId, row; data) {
        // Instead of 'data' and 'rowId', the child could take
        // its own row (not tested)
        spawn(&func, thisTid, data, rowId);
    }

    thread_joinAll();

    writeln("after : ", data);
}

Ali
May 24 2013
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 05/24/2013 05:59 PM, Ali Çehreli wrote:
 The following simple example uses mutable data but it should work 
 with 'const' too.
Limiting ourselves to read-only, won't there still be a slowdown caused by multiple threads trying to access the same data? The particular case I have will involve continuous reading from the data concerned.
May 26 2013
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 26 May 2013 at 12:08:41 UTC, Joseph Rushton Wakeling 
wrote:
 On 05/24/2013 05:59 PM, Ali Çehreli wrote:
 The following simple example uses mutable data but it should 
 work with 'const' too.
 Limiting ourselves to read-only, won't there still be a slowdown 
 caused by multiple threads trying to access the same data? The 
 particular case I have will involve continuous reading from the 
 data concerned.
Not necessarily. It really depends on the memory access patterns of the 
algorithms, the number of threads, the size of the data, the 
number/size/hierarchy of CPU caches, and the number of CPUs.

Hard and fast rule: if your threads are reading data that is distant 
(i.e. one thread reading from around the beginning of a long array, the 
second reading from the end) then the fact that they happen to be the 
same data "object" is irrelevant.

Also, remember that in the short term the CPUs are all keeping their 
own independent copies of the relevant parts of the data in their 
caches anyway.
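For example (untested sketch), giving each child only its own immutable 
row keeps the threads' working sets well apart in memory:

import std.concurrency;
import std.typecons;
import core.thread;

alias Elem = Tuple!(size_t, size_t);

// Each thread receives only the row it works on; immutable slices
// are freely sendable via std.concurrency.
void worker(immutable(Elem)[] row)
{
    size_t sum;
    foreach (e; row)
        sum += e[0] + e[1];
    // this thread only ever reads its own row
}

void main()
{
    immutable(Elem)[][] data;
    foreach (size_t r; 0 .. 4) {
        Elem[] row;
        foreach (size_t c; 0 .. 10)
            row ~= tuple(r, c);
        data ~= row.idup;
    }

    foreach (row; data)
        spawn(&worker, row);

    thread_joinAll();
}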
May 26 2013