digitalmars.D - Threads and static initialization.

reply wrzosk <dprogr gmail.com> writes:
I believe that when a new thread is created, all static data is 
initialized inside the new thread. What worries me is that many 
'mini' threads will each initialize all of the application's static data. 
This may be somewhat time-consuming.
Maybe there should be a way to define a 'Pure' thread that doesn't 
touch any static data, so that static constructors could be skipped 
for it.

What do you think?
Dec 17 2010
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 17 December 2010 13:47:05 wrzosk wrote:
 I believe that when new thread is being created, all static data is
 initialized inside new thread. What worries me is the fact, that many
 'mini' threads will initialize all static data inside application. This
 may be somewhat time consuming.
 Maybe there should be a possibility to define 'Pure' thread that doesnt
 touch any static data, and in turn it could leave static constructors
 untouched.
 
 What do you think
That seems like it would be _really_ hard to do - probably impossible. The compiler would have to know which functions, types, and variables a thread accessed. I don't believe that the compiler really knows anything about threads. It doesn't know which thread calls a particular function or accesses a particular variable. It knows about thread-local storage vs shared storage, and the type system restricts conversions between the two, but to do what you're suggesting, the compiler would have to know which thread is calling which function and disallow - at compile time - certain threads from calling certain functions. As it is, any thread can call anything and the compiler makes no attempt at tracking any of that. It just has restrictions with regards to thread-local and shared.

So, while it certainly seems like a good idea, I don't see how it would really be possible. It might be possible to make it so that a thread could be created which did not initialize any module or static variables (be they at class, struct, or function scope), but I don't see how the compiler could enforce that none of those variables were used. It would have to be up to the programmer to only call pure functions.

What _might_ be possible would be if you had a way of starting a thread which took a function (or overrode one) and that function _had_ to be strongly pure. Then it could skip running static constructors, because it would be impossible for any static variables to be accessed. However, the only way that you'd then get access to anything that thread did would be from the return value of the function that started it, which not only raises the issue of how you'd get the return value given the asynchronous nature of threads, but would also restrict the usage of such threads to the point that they'd be practically useless.

I think that you raise a valid concern, but I don't think that that's the way to handle it.
Restricting static constructors and global or static variables in practice will help, and using immutable more will help. But if we want a means of making threads more lightweight, we probably need to look at a different way of doing it than you're suggesting.

This does make me wonder about static constructors and immutable though. It's currently possible to assign to both immutable and mutable module-scope variables in a static constructor. Such static constructors _must_ run for every thread or the thread-local portions are going to be wrong, but you _can't_ run them for immutable variables or you aren't sharing them between threads (or you're reassigning them each time that a new thread is created - I think that there's currently a bug on that). Perhaps static constructors which assign to immutable variables should have to be immutable, and mutable variables would have to be assigned in non-immutable static constructors. Then the immutable static constructors only get run once with the main thread, whereas the non-immutable ones get run for every thread.

- Jonathan M Davis
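[Editor's sketch] The once-per-program versus once-per-thread split described above can be illustrated with the two kinds of module constructor that D provides, `shared static this()` and `static this()`. This is a minimal sketch, not code from the thread: the names `tableSize`, `scratch`, and `tlsCtorRuns` are made up for illustration, and it assumes threads are started through `core.thread`.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;
import std.stdio : writeln;

immutable int tableSize;   // immutable globals are shared between threads
int scratch;               // plain module-level variable: one copy per thread (TLS)
shared int tlsCtorRuns;    // hypothetical counter, only here to make the runs visible

// Runs once for the whole program, before any per-thread constructor.
shared static this()
{
    tableSize = 64;
}

// Runs once in every thread that is started through the D runtime.
static this()
{
    scratch = 1;
    atomicOp!"+="(tlsCtorRuns, 1);
}

void main()
{
    // The worker gets its own `scratch`, initialized by its own run of `static this()`.
    auto worker = new Thread({ scratch += 1; });
    worker.start();
    worker.join();

    writeln("per-thread constructor runs: ", atomicLoad(tlsCtorRuns));
    writeln("tableSize = ", tableSize, ", scratch in main thread = ", scratch);
}
```

Under these assumptions the counter should report one run for the main thread plus one for the worker, while `tableSize` is assigned only once.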
Dec 17 2010
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sat, 18 Dec 2010 00:33:43 +0200, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 That seems like it would be _really_ hard to do - probably impossible.
This is about TLS, right? If so, can't we just add a way to create a thread without TLS, so that any attempts to access it will cause an access violation? Sounds simple and useful enough. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 17 2010
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 17 December 2010 15:17:14 Vladimir Panteleev wrote:
 On Sat, 18 Dec 2010 00:33:43 +0200, Jonathan M Davis <jmdavisProg gmx.com>
 
 wrote:
 That seems like it would be _really_ hard to do - probably impossible.
This is about TLS, right? If so, can't we just add a way to create a thread without TLS, so that any attempts to access it will cause an access violation? Sounds simple and useful enough.
Except that virtually _everything_ in D is in TLS. Only shared variables and some immutable variables aren't. So, unless you use a set of functions which uses shared variables for _everything_, that would be rather useless. - Jonathan M Davis
Dec 17 2010
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sat, 18 Dec 2010 01:27:01 +0200, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 Except that virtually _everything_ in D is in TLS. Only shared variables  
 and
 some immutable variables aren't. So, unless you use a set of functions  
 which
 uses shared variables for _everything_, that would be rather useless.
But isn't that what OP's asking? Also, I wouldn't say it's useless... By _everything_ you mean global and class/function static variables, right? I can imagine lots of examples of using threads without having to access those variables. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 17 2010
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 17 December 2010 16:44:19 Vladimir Panteleev wrote:
 On Sat, 18 Dec 2010 01:27:01 +0200, Jonathan M Davis <jmdavisProg gmx.com>
 
 wrote:
 Except that virtually _everything_ in D is in TLS. Only shared variables
 and
 some immutable variables aren't. So, unless you use a set of functions
 which
 uses shared variables for _everything_, that would be rather useless.
But isn't that what OP's asking? Also, I wouldn't say it's useless... By _everything_ you mean global and class/function static variables, right? I can imagine lots of examples of using threads without having to access those variables.
And how about every other variable? If it's not shared or immutable, it's in TLS. Do you want to mark every single variable with shared? I believe that that's pretty much what you'd have to do. It's _rare_ in D that a variable is shared. You'd basically have to create a set of functions explicitly intended to be used with such special threads. - Jonathan M Davis
Dec 17 2010
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sat, 18 Dec 2010 03:06:26 +0200, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 And how about every other variable?
I'm sorry, I'm not following you. What other variables?

* Globals and class/struct/function statics are in TLS
* Explicitly shared vars are in the data segment
* Locals are in the stack or registers (no problem here)
* Everything else (as referenced by the above three) is in the heap

-- Best regards, Vladimir mailto:vladimir thecybershadow.net
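[Editor's sketch] The storage classes in the list above can be observed directly. The following is a minimal sketch under stated assumptions: the names `tlsCounter`, `sharedCounter`, `Observed`, and `run` are hypothetical, and the thread is created with `core.thread.Thread`, which does not enforce `shared` on captured data.

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;
import std.stdio : writeln;

int tlsCounter;            // module-level, not shared: lives in TLS
shared int sharedCounter;  // explicitly shared: one instance for all threads

struct Observed
{
    int seenInThread;  // value of tlsCounter as seen by the new thread
    int mainTls;       // value of tlsCounter in the calling thread afterwards
    int sharedValue;   // value of sharedCounter afterwards
    int heapValue;     // first element of the heap array afterwards
}

Observed run()
{
    tlsCounter = 10;
    int[] heapData = new int[](3);   // array payload is allocated on the GC heap
    int seen = -1;

    auto t = new Thread({
        seen = tlsCounter;           // the new thread's own copy, not the caller's
        tlsCounter = 5;              // changes only this thread's copy
        atomicStore(sharedCounter, 5);
        heapData[0] = 42;            // same heap memory the caller sees
    });
    t.start();
    t.join();

    return Observed(seen, tlsCounter, atomicLoad(sharedCounter), heapData[0]);
}

void main()
{
    writeln(run());
}
```

The new thread should see the default value of `tlsCounter` rather than 10, while the write to the heap array and to the shared variable is visible to the caller after `join`.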
Dec 17 2010
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 17 December 2010 19:52:19 Vladimir Panteleev wrote:
 On Sat, 18 Dec 2010 03:06:26 +0200, Jonathan M Davis <jmdavisProg gmx.com>
 
 wrote:
 And how about every other variable?
I'm sorry, I'm not following you. What other variables? * Globals and class/struct/function statics are in TLS * Explicitly shared vars are in the data segment * Locals are in the stack or registers (no problem here) * Everything else (as referenced by the above three) is in the heap
Value types which are on the stack are going to be okay. That's true. You're right. They wouldn't be in TLS. However, anything that involves the heap wouldn't be okay, and that's a _lot_ of variables. Any and all references and pointers - including dynamic arrays - would be in TLS unless you marked them as shared. So, you'd have to use shared all over the place except in very simple cases or cases where you went out of your way to avoid using the heap.

D is designed to avoid using shared memory except in cases where data is immutable. So, if you try to set up your program so that it uses shared memory primarily, then you're going to have problems. And not calling the static constructors on thread creation would mean using shared memory for everything which uses the heap. You couldn't even create local variables which are class objects using TLS in such a case, because they might have a static constructor which then would never have been called.

Really, I don't think that trying to avoid calling static constructors is going to work very well. It may very well be a good reason to minimize what's done in static constructors, but skipping them entirely would be very difficult to pull off safely.

- Jonathan M Davis
Dec 17 2010
next sibling parent reply Pelle Månsson <pelle.mansson gmail.com> writes:
On 12/18/2010 07:53 AM, Jonathan M Davis wrote:
 On Friday 17 December 2010 19:52:19 Vladimir Panteleev wrote:
 On Sat, 18 Dec 2010 03:06:26 +0200, Jonathan M Davis<jmdavisProg gmx.com>

 wrote:
 And how about every other variable?
I'm sorry, I'm not following you. What other variables? * Globals and class/struct/function statics are in TLS * Explicitly shared vars are in the data segment * Locals are in the stack or registers (no problem here) * Everything else (as referenced by the above three) is in the heap
Value types which on the stack are going to be okay. That's true. You're right. They wouldn't be in TLS. However, anything that involves the heap wouldn't be okay, and that's a _lot_ of variables. Any and all references and pointers - inluding dynamic arrays - would be in TLS unless you marked them as shared. So, you'd have to use shared all over the place except in very simple cases or cases where you went out of your way to avoid using the heap. D is designed to avoid using shared memory except in cases where data is immutable. So, if you try to set up your program so that it uses shared memory primarily, then you're going to have problems. And not calling the static constructors on thread creation would mean using shared memory for everything which uses the heap. You couldn't even create local variables which are class objects using TLS in such a case, because they might have a static constructor which then would never have been called. Really, I don't think that trying to avoid calling static constructors is going to work very well. It may very well be a good reason to minimize what's done in static constructors, but skipping them entirely would be very difficult to pull off safely. - Jonathan M Davis
The heap is the heap is the heap. You can have local variables on the heap which are not shared. I think you are overstating the need for shared, probably due to some misunderstanding.

You could not have classes/structs with static members, or call functions with static variables. Everything else should work, probably.

The spawned thread could use the parent thread's immutable globals, to avoid the need to construct them in the spawned thread's TLS. I don't know if this is actually possible :-)
Dec 17 2010
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 17 December 2010 23:56:37 Pelle Månsson wrote:
 On 12/18/2010 07:53 AM, Jonathan M Davis wrote:
 On Friday 17 December 2010 19:52:19 Vladimir Panteleev wrote:
 On Sat, 18 Dec 2010 03:06:26 +0200, Jonathan M Davis <jmdavisProg gmx.com>

 wrote:
 And how about every other variable?
I'm sorry, I'm not following you. What other variables? * Globals and class/struct/function statics are in TLS * Explicitly shared vars are in the data segment * Locals are in the stack or registers (no problem here) * Everything else (as referenced by the above three) is in the heap
Value types which on the stack are going to be okay. That's true. You're right. They wouldn't be in TLS. However, anything that involves the heap wouldn't be okay, and that's a _lot_ of variables. Any and all references and pointers - inluding dynamic arrays - would be in TLS unless you marked them as shared. So, you'd have to use shared all over the place except in very simple cases or cases where you went out of your way to avoid using the heap. D is designed to avoid using shared memory except in cases where data is immutable. So, if you try to set up your program so that it uses shared memory primarily, then you're going to have problems. And not calling the static constructors on thread creation would mean using shared memory for everything which uses the heap. You couldn't even create local variables which are class objects using TLS in such a case, because they might have a static constructor which then would never have been called. Really, I don't think that trying to avoid calling static constructors is going to work very well. It may very well be a good reason to minimize what's done in static constructors, but skipping them entirely would be very difficult to pull off safely. - Jonathan M Davis
The heap is the heap is the heap. You can have local variables on the heap which are not shared. I think you are overstating the need for shared, probably some misunderstanding. You could not have classes/structs with static members, or call functions with static variables. Everything else should work, probably. The spawned thread could use the parent thread immutable globals, to avoid the need to construct them in the spawned tls. I don't know if this is actually possible :-)
The problem is that the OP wants the static constructors to be skipped. If they're skipped, anything and everything which could be affected by that can't be used. That pretty much means not using TLS, since the compiler isn't going to be able to track down which variables in TLS will or won't be affected by it. So, you're stuck using shared memory only. _That_ is where the problem comes in.

As for immutable globals, in theory at least (I believe there's an open bug on the issue at the moment), global immutables are all shared. If all that's done correctly, in theory, you'd just create the immutable globals either at compile time or with static constructors (possibly shared or immutable ones) which are just run once and don't have to be run per thread. So, immutable globals shouldn't pose a problem. However, mutable ones would - including those for classes - because they'd need their static constructors run on thread creation in order to be properly initialized.

- Jonathan M Davis
Dec 18 2010
parent reply Pelle Månsson <pelle.mansson gmail.com> writes:
On 12/18/2010 10:00 AM, Jonathan M Davis wrote:
 The problem is that the OP wants the static constructors to be skipped. If
 they're skipped, anything and everything which could be affected by that can't
be
 used. That pretty much means not using TLS, since the compiler isn't going to
be
 able to track down which variables in TLS will or won't be affected by it. So,
 you're stuck using shared memory only. _That_ is where the problem comes in.
Exactly, not using TLS. You can still use the heap, as it is not thread local. Meaning you can create non-shared anything all you like, as long as you're not using TLS.
Dec 18 2010
parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 18 Dec 2010 03:27:22 -0700, Pelle Månsson  
<pelle.mansson gmail.com> wrote:

 On 12/18/2010 10:00 AM, Jonathan M Davis wrote:
 The problem is that the OP wants the static constructors to be skipped.  
 If
 they're skipped, anything and everything which could be affected by  
 that can't be
 used. That pretty much means not using TLS, since the compiler isn't  
 going to be
 able to track down which variables in TLS will or won't be affected by  
 it. So,
 you're stuck using shared memory only. _That_ is where the problem  
 comes in.
Exactly, not using TLS. You can still use the heap, as it is not thread local. Meaning you can create non-shared anything all you like, as long as you're not using TLS.
Except that the 'heap' internally uses TLS. The GC does need and use TLS.
Dec 18 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-18 15:57:50 -0500, "Robert Jacques" <sandford jhu.edu> said:

 On Sat, 18 Dec 2010 03:27:22 -0700, Pelle Månsson  
 <pelle.mansson gmail.com> wrote:
 
 On 12/18/2010 10:00 AM, Jonathan M Davis wrote:
 The problem is that the OP wants the static constructors to be skipped.  If
 they're skipped, anything and everything which could be affected by  
 that can't be
 used. That pretty much means not using TLS, since the compiler isn't  
 going to be
 able to track down which variables in TLS will or won't be affected by  it. So,
 you're stuck using shared memory only. _That_ is where the problem  comes in.
Exactly, not using TLS. You can still use the heap, as it is not thread local. Meaning you can create non-shared anything all you like, as long as you're not using TLS.
Except that the 'heap' internally uses TLS. The GC does need and use TLS.
Using D's TLS for the GC is an implementation choice, not a requirement. If someone wants to optimize 'spawn' for pure functions by skipping D's TLS initialization, he can make the GC and the array appending cache work with that. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 18 2010
parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 18 Dec 2010 15:04:38 -0700, Michel Fortin  
<michel.fortin michelf.com> wrote:
 On 2010-12-18 15:57:50 -0500, "Robert Jacques" <sandford jhu.edu> said:

 On Sat, 18 Dec 2010 03:27:22 -0700, Pelle Månsson   
 <pelle.mansson gmail.com> wrote:

 On 12/18/2010 10:00 AM, Jonathan M Davis wrote:
 The problem is that the OP wants the static constructors to be  
 skipped.  If
 they're skipped, anything and everything which could be affected by   
 that can't be
 used. That pretty much means not using TLS, since the compiler isn't   
 going to be
 able to track down which variables in TLS will or won't be affected  
 by  it. So,
 you're stuck using shared memory only. _That_ is where the problem   
 comes in.
Exactly, not using TLS. You can still use the heap, as it is not thread local. Meaning you can create non-shared anything all you like, as long as you're not using TLS.
Except that the 'heap' internally uses TLS. The GC does need and use TLS.
Using D's TLS for the GC is an implementation choice, not a requirement. If someone wants to optimize 'spawn' for pure functions by skipping D's TLS initialization, he can make the GC and the array appending cache work with that.
Not really, as _every_ modern GC requires TLS. And we're talking about a performance optimization here: not supporting modern GCs in order to remove TLS initialization would be a penny-wise, pound-foolish move. Besides, 'mini' threads shouldn't be created using OS threads; that's what thread pools, fibers and tasks are for.
Dec 18 2010
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-18 01:53:04 -0500, Jonathan M Davis <jmdavisProg gmx.com> said:

 However, anything that involves the heap wouldn't be okay, and that's a 
 _lot_ of
 variables. Any and all references and pointers - inluding dynamic 
 arrays - would
 be in TLS unless you marked them as shared.
Things stored in the heap are *not* stored in thread-local storage. TLS is for thread-local global and static variables. Having TLS variables uninitialized shouldn't be a problem when playing with the heap. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 18 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
wrzosk <dprogr gmail.com> wrote:

 I believe that when new thread is being created, all static data is  
 initialized inside new thread. What worries me is the fact, that many  
 'mini' threads will initialize all static data inside application. This  
 may be somewhat time consuming.
 Maybe there should be a possibility to define 'Pure' thread that doesnt  
 touch any static data, and in turn it could leave static constructors  
 untouched.

 What do you think
Unless you are spawning lots of threads and at arbitrary times, this is unlikely to be much of a problem. If it is, you likely should use a thread pool instead. Lastly, if you absolutely must, it is possible to spawn a thread using methods other than D's built-in thread-spawning functions, and those would not have their static constructors run. On the downside, you lose any guarantees the compiler could give you, as just about anything that is not a local variable is in TLS. -- Simen
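[Editor's sketch] The thread-pool suggestion can be tried with Phobos' `std.parallelism`. This is a minimal sketch rather than anything posted in the thread; `square` and `sumSquaresInPool` are made-up names. The point is that the pool's worker threads are created once, so any per-thread static constructors run once per worker and not once per task.

```d
import std.parallelism : task, taskPool;
import std.stdio : writeln;

int square(int x)
{
    return x * x;
}

// Submits two small jobs to the shared pool and combines their results.
int sumSquaresInPool()
{
    auto first = task!square(6);
    auto second = task!square(7);

    // Hand the tasks to the already-running worker threads.
    taskPool.put(first);
    taskPool.put(second);

    // yieldForce waits for the task if needed and returns its result.
    return first.yieldForce + second.yieldForce;
}

void main()
{
    writeln(sumSquaresInPool());
}
```

Each 'mini' job here is just a queue entry for an existing worker, which is usually much cheaper than starting an OS thread per job.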
Dec 17 2010
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-17 16:47:05 -0500, wrzosk <dprogr gmail.com> said:

 I believe that when new thread is being created, all static data is 
 initialized inside new thread. What worries me is the fact, that many 
 'mini' threads will initialize all static data inside application. This 
 may be somewhat time consuming.
 Maybe there should be a possibility to define 'Pure' thread that doesnt 
 touch any static data, and in turn it could leave static constructors 
 untouched.
 
 What do you think
You mean something like this:

    pure void doSomething(Tid parent)
    {
        int result = 1 + 1;
        parent.sendMessage(result);
    }

    void main()
    {
        spawn(&doSomething);
        ... wait for message ...
    }

Perhaps 'spawn' could do this when you feed it with a pure function.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
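[Editor's sketch] For comparison, here is a version of the same idea that should compile against `std.concurrency` as it exists in Phobos. It is a sketch under stated assumptions: it uses `send` and `receiveOnly` in place of the hypothetical `sendMessage`, passes `thisTid` explicitly, and drops the `pure` attribute because, as far as I know, `send` is not declared pure.

```d
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;
import std.stdio : writeln;

// Not marked pure: it calls send(), which the sketch assumes is impure.
void doSomething(Tid parent)
{
    int result = 1 + 1;
    parent.send(result);
}

void main()
{
    // Give the child the parent's Tid so it knows where to send the result.
    spawn(&doSomething, thisTid);

    // Block until the child's int message arrives.
    auto result = receiveOnly!int();
    writeln(result);
}
```

With this version the spawned thread still goes through normal thread start-up, so any per-thread static constructors run in it as usual.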
Dec 17 2010
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 17 December 2010 16:02:27 Michel Fortin wrote:
 On 2010-12-17 16:47:05 -0500, wrzosk <dprogr gmail.com> said:
 I believe that when new thread is being created, all static data is
 initialized inside new thread. What worries me is the fact, that many
 'mini' threads will initialize all static data inside application. This
 may be somewhat time consuming.
 Maybe there should be a possibility to define 'Pure' thread that doesnt
 touch any static data, and in turn it could leave static constructors
 untouched.
 
 What do you think
You mean something like this: pure void doSomething(Tid parent) { int result = 1 + 1; parent.sendMessage(result); } void main() { spawn(&doSomething); ... wait for message ... } Perhaps 'spawn' could do this when you feed it with a pure function.
That would only work if send() can be pure, and I really doubt that it can be. If it _can_, then that might be a good optimization, but I'm betting that it can't be done. Someone more familiar with how send() works would have to say on that though, since I'm not intimately familiar with it. - Jonathan M Davis
Dec 17 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-17 19:15:07 -0500, Jonathan M Davis <jmdavisProg gmx.com> said:

 On Friday 17 December 2010 16:02:27 Michel Fortin wrote:
 On 2010-12-17 16:47:05 -0500, wrzosk <dprogr gmail.com> said:
 I believe that when new thread is being created, all static data is
 initialized inside new thread. What worries me is the fact, that many
 'mini' threads will initialize all static data inside application. This
 may be somewhat time consuming.
 Maybe there should be a possibility to define 'Pure' thread that doesnt
 touch any static data, and in turn it could leave static constructors
 untouched.
 
 What do you think
You mean something like this: pure void doSomething(Tid parent) { int result = 1 + 1; parent.sendMessage(result); } void main() { spawn(&doSomething); ... wait for message ... } Perhaps 'spawn' could do this when you feed it with a pure function.
That would only work if send() can be pure, and I really doubt that it can be. If it _can_, then that might be a good optimization, but I'm betting that it can't be done. Someone more familiar with how send() work would have to say on that though, since I'm not intimately familiar with how send() works.
That's an interesting question. There isn't much difference between sendMessage and appending a value to an array. A pure function can append things to a mutable array it gets as a parameter, so why couldn't it append to another thread's message queue it gets as a parameter?

Your question is probably more about whether manipulating synchronization primitives should be pure or not. Things like locking a mutex or waiting on a condition. I don't see why they should not be. Consider a strongly pure function that creates its own synchronization primitive for internal usage; how is that going to affect the rest of the program, considering that no other part of the program has access to the same instances of those primitives?

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
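[Editor's sketch] The array-append analogy can be written out directly. This is a minimal sketch with made-up names (`enqueue`, `sumOfSquares`); it shows a pure function mutating an array it receives by reference, and a second pure function that uses such a "queue" purely internally.

```d
import std.stdio : writeln;

// Weakly pure: it may mutate what it reaches through its parameters,
// but it cannot touch global or static state.
void enqueue(ref int[] queue, int value) pure
{
    queue ~= value;
}

// Takes only a value parameter, so callers can treat it as strongly pure,
// even though it builds and mutates a private queue internally.
int sumOfSquares(int n) pure
{
    int[] queue;
    foreach (i; 1 .. n + 1)
        enqueue(queue, i * i);

    int total = 0;
    foreach (v; queue)
        total += v;
    return total;
}

void main()
{
    writeln(sumOfSquares(3));
}
```

Whether the same reasoning extends to a real cross-thread message queue is exactly the open question in the post; the sketch only covers the single-threaded array case.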
Dec 17 2010