digitalmars.D - Thread Attributes

"Jonathan Marler" <johnnymarler gmail.com> writes:

I had an idea this morning and wanted to post it to see what 
people think.

I know we have a lot of attributes already, but I'm wondering if 
people think adding a thread attribute could be useful: 
something that says a variable, function, or class/struct can 
only be accessed by code that has been tagged with the same 
thread name.  Something like this:

// This variable is allocated as a true shared global
// with a fixed location in memory since it can only
// be accessed by one thread.
@thread:main
int mainThreadGlobal;

@thread:main
int main(string[] args)
{
   // Start the worker thread at some point
}

@thread:worker
void workerLoop()
{
   // do some work; cannot access mainThreadGlobal
}

With this information the compiler could help verify thread 
safety at compile time.
This idea is far from refined, as I just thought of it this 
morning, but I had a couple of thoughts.
One problem I foresaw was how to handle passing callback 
functions into other libraries.
If the callback function is tagged with a thread name, then maybe 
you have to call that function on the same
thread that the callback is tagged with?
For example:

FindLibrary:
     // No thread attribute because it is a library function
     void find(string haystack, string needle, void function(int offset))
     {
       // logic...
     }

MyProgram:

     @thread:worker
     uint[] offsets;

     @thread:worker
     void foundOffset(int offset)
     {
       offsets ~= offset;
     }

     @thread:worker
     void callFind(string data)
     {
       find(data, "importantstring", &foundOffset);
     }

     void main(string[] args)
     {
         // start thread for callFind
     }


Now let's say that we didn't tag callFind with the @thread:worker 
attribute.
If the compiler has the source code of the find function, it 
could throw an error when it sees that find calls a callback 
that is tagged with a specific thread.
If the source code of find is not available, the compiler could 
assume that find calls the callback on its own thread, and just 
throw an error whenever you pass a callback function into 
another function that isn't tagged with the same thread you are 
currently executing on.

When something is not tagged with a thread (which would likely 
include any kind of library/API function, or any code in a 
single-threaded application), then no checking is done.
But the @thread attribute would guarantee that any function 
tagged with that attribute can only be called by a function 
tagged with the same attribute.


The other thought I had was handling synchronization.

Let's say you have a function that you don't mind being called by 
other threads but you still want it to be synchronized.
You could add a synchronized attribute that takes an object:

Object mySyncObject;

@sync(mySyncObject)
void doSomething()
{
   // I don't need to synchronize on mySyncObject here because
   // the compiler will verify that anyone who calls this function
   // has already synchronized on it.
   // Therefore I can assume I am already synchronized on
   // mySyncObject without actually doing it...yay! :)
}
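
For comparison, here is a minimal sketch of how the same convention has to be written in D today, where taking the lock is the caller's job and nothing verifies it. The names `counter`, `caller` and `hammer` are made up for illustration and are not part of the proposal.

```d
import core.thread : Thread;

__gshared Object mySyncObject; // lock object, created once at startup
__gshared int counter;         // protected by mySyncObject (by convention only)

shared static this()
{
    mySyncObject = new Object;
}

// Convention: every caller must already hold mySyncObject's monitor.
// Today the compiler cannot check this; the proposed attribute would.
void doSomething()
{
    ++counter;
}

void caller()
{
    synchronized (mySyncObject)
    {
        doSomething();
    }
}

// Starts `threads` threads that each invoke caller() `calls` times
// and returns the final counter value.
int hammer(int threads, int calls)
{
    counter = 0;
    Thread[] pool;
    foreach (i; 0 .. threads)
    {
        pool ~= new Thread({
            foreach (j; 0 .. calls)
                caller();
        });
    }
    foreach (t; pool) t.start();
    foreach (t; pool) t.join();
    return counter;
}
```

If some code path called doSomething() without the synchronized block, this would still compile, which is exactly the gap the attribute is meant to close.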


So what do people think?  Like I said, I just thought of this and 
haven't had time to think about more corner cases, so feel free 
to nitpick :)

Jul 10 2014

Jacob Carlborg <doob me.com> writes:

On 10/07/14 20:12, Jonathan Marler wrote:
> I had an idea this morning and wanted to post it to see what people think.
>
> I know we have a lot of attributes already but I'm wondering if people
> think adding a thread attribute could be useful. Something that says a
> variable or function or class/struct can only be accessed by code that
> has been tagged with the same thread name.  Something like this.
>
> // This variable is allocated as a true shared global
> // with a fixed location in memory since it can only
> // be accessed by one thread.
> @thread:main
> int mainThreadGlobal;
>
> @thread:main
> int main(string[] args)
> {
>    // Start the worker thread at some point
> }
>
> @thread:worker
> void workerLoop()
> {
>    // do some work; cannot access mainThreadGlobal
> }

I'm not sure I understand, but if the variable can only be accessed from 
a single thread, why not make it thread local?

> [SNIP]
> So what do people think?  Like I said, I just thought of this and haven't
> had time to think about more corner cases, so feel free to nitpick :)

BTW, both of these features sound like a job for AST macros.

-- 
/Jacob Carlborg

Jul 10 2014

"Jonathan Marler" <johnnymarler gmail.com> writes:

I'm not sure how AST macros would assist in thread safety the way 
that this feature would.  Maybe you could elaborate?

To explain a little more, when you put a @thread:name or 
@sync(object) attribute on something, the compiler will guarantee 
that no @safe D code will ever use that code or data unless it is 
either on the given thread or can guarantee at compile time that 
it has synchronized on the given object.

You mentioned making the variable thread local.  So if I'm 
understanding, you're saying just make it a regular global 
variable (which is thread local by default in D).  However, the 
point is that if you tell the compiler that it can only be 
accessed by a single thread, then it doesn't need to be thread 
local.  Real global variables are preferred over thread-local 
ones for performance/memory reasons.  Their address is known at 
compile time and you don't need to allocate a new instance for 
every thread.  The only reason for thread-local variables is to 
alleviate problems with multithreaded applications, but using an 
attribute like this would give someone the benefit of a real 
global variable without exposing it to other threads, which 
fixes the synchronization issue.

D has its own way of handling multithreaded applications, but I 
still have applications that use the old idioms to get lightning 
performance and minimize memory usage.  A feature like this could 
solve a lot of the problems the old idioms have.  There are many 
times that I write a function and I have to make a mental note 
(or a comment) that this function should only ever be called by a 
certain thread, or that this function should only be called by 
code that has locked on a certain object.  It would be wonderful 
if the compiler could guarantee that for me.
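
To make the distinction concrete, here is a small sketch of how current D treats the two kinds of module-level variable. The variable names and the helper `observeAfterWorker` are illustrative assumptions only.

```d
import core.thread : Thread;

int perThread;            // thread-local by default: each thread gets its own copy
__gshared int trueGlobal; // a single instance for the whole process, unchecked

// Lets a worker thread write both variables, then reports the values
// that the calling thread sees afterwards.
int[2] observeAfterWorker()
{
    perThread = 1;
    trueGlobal = 1;

    auto worker = new Thread({
        perThread = 42;   // changes only the worker's own copy
        trueGlobal = 42;  // changes the one shared instance
    });
    worker.start();
    worker.join();

    return [perThread, trueGlobal];
}
```

The calling thread still sees its own `perThread`, while `trueGlobal` reflects the worker's write. The proposed attribute aims for the storage behavior of the second variable together with a compile-time guarantee that only one thread touches it.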

Jul 11 2014

Jacob Carlborg <doob me.com> writes:

On 2014-07-11 19:07, Jonathan Marler wrote:
> I'm not sure how AST macros would assist in thread safety the way that
> this feature would.  Maybe you could elaborate?

Looking at the first example:

@thread:main
int mainThreadGlobal;

@thread:main
int main(string[] args)
{
   // Start the worker thread at some point
}

@thread:worker
void workerLoop()
{
   // do some work; cannot access mainThreadGlobal
}

This would be implemented as a declaration macro [1], something like this:

macro thread (Context context, Ast!(string) name, Declaration decl)
{
     // Variables are simply tagged with the thread name.
     if (decl.isVariable)
         decl.attributes ~= Thread(name);

     // Callables may only touch variables tagged with the same name.
     else if (decl.isCallable)
     {
         foreach (var ; decl.accessedVariables)
         {
             if (auto attr = var.getAttribute!(Thread))
                 if (attr.name != name)
                     context.compiler.error("Cannot access variable with thread name "
                         ~ attr.name ~ " from callable with thread name " ~ name);
         }
     }

     return decl;
}

Usage:

@thread("main") int mainThreadGlobal;
@thread("worker") void workerLoop ();
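
AST macros are only a proposal, so as a rough approximation here is a sketch of a weaker check that can be written with user-defined attributes and `std.traits.getUDAs` in current D. Everything in it (`OnThread`, `threadOf`, `checkedCall`) is a hypothetical name, and unlike the macro it only checks calls that are routed through `checkedCall`; it cannot inspect which variables a function body accesses.

```d
import std.traits : getUDAs;

// Hypothetical tag type standing in for the proposed attribute.
struct OnThread { string name; }

// Compile-time lookup of the thread name a symbol is tagged with.
enum threadOf(alias sym) = getUDAs!(sym, OnThread)[0].name;

@OnThread("worker") __gshared int[] offsets;

@OnThread("worker") void foundOffset(int offset) { offsets ~= offset; }

@OnThread("main") void mainOnly() {}

// Checked call: compiles only if caller and callee carry the same tag.
auto checkedCall(alias caller, alias callee, Args...)(Args args)
{
    static assert(threadOf!caller == threadOf!callee,
        "cannot call a function tagged '" ~ threadOf!callee ~
        "' from one tagged '" ~ threadOf!caller ~ "'");
    return callee(args);
}

@OnThread("worker") void callFind()
{
    checkedCall!(callFind, foundOffset)(3); // same tag, so this compiles
}
```

Writing `checkedCall!(mainOnly, foundOffset)(1)` is rejected at compile time because the tags differ.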

> To explain a little more, when you put a @thread:name or @sync(object)
> attribute on something, the compiler will guarantee that no @safe D code
> will ever use that code or data unless it is either on the given thread
> or can guarantee at compile time that it has synchronized on the given
> object.
>
> You mentioned making the variable thread local.  So if I'm
> understanding, you're saying just make it a regular global variable.
> However, the point is that if you tell the compiler that it can only be
> accessed by a single thread then it doesn't need to be thread local.
> Real global variables are preferred over thread local for
> performance/memory reasons.  Their address is known at compile time and
> you don't need to allocate a new instance for every thread.  The only
> reason for thread local variables is to alleviate problems with
> multithreaded applications, but using an attribute like this would allow
> someone to have the benefit of a real global variable without exposing
> it to other threads fixing the synchronization issue.

Makes sense.

[1] http://wiki.dlang.org/DIP50#Declaration_macros

-- 
/Jacob Carlborg

Jul 13 2014

"Jonathan Marler" <johnnymarler gmail.com> writes:

On Sunday, 13 July 2014 at 10:45:29 UTC, Jacob Carlborg wrote:
> This would be implemented as a declaration macro [1], something 
> like this:
>
> macro thread (Context, Ast!(string) name, Declaration decl)
> {
>     if (decl.isVariable)
>         decl.attributes ~= Thread(name);
>
>     else if (decl.isCallable)
>     {
>         foreach (var ; decl.accessedVariables)
>         {
>             if (auto attr = var.getAttribute!(Thread))
>                 if (attr.name != name)
>                     context.compiler.error("Cannot access 
> variable with thread name " ~ attr.name ~ " from callable with 
> thread name " ~ name);
>         }
>     }
>
>     return decl;
> }
>
> Usage:
>
> @thread("main") int mainThreadGlobal;
> @thread("worker") void workerLoop ();

Ah, I see now.  It looks like AST macros are going to open up a 
lot of new paradigms; I'm excited to see how they progress and 
what they can do.  It doesn't get us all the way there in this 
example, but it's a very good alternative without having to add 
anything to the compiler. Your macro would help the developer 
ensure that particular variables and functions are only touched 
by the appropriate code.  If the feature only did this I would 
use a different name such as "restrict" or something.  However, 
in order for the compiler to perform thread optimizations based 
on this information it would have to be a part of the language.

It also doesn't handle the synchronized case. For that you would 
need knowledge of the execution paths of your functions to 
determine what parts are locked and what parts are not (unless of 
course AST macros could support that; I would be very pleasantly 
surprised if they did).

This AST macro is very intriguing though.  It's like you're 
writing code that can analyze your code as you develop it.  This 
is a really cool idea.

As I've had more time to think on this, here's one potential 
consequence I've thought of.

*********** Deprecating usage of the __gshared hack ***********

Here are some cases where a developer may think __gshared usage 
is justified:
   1. In a single-threaded application,
   2. When the developer knows that it's only possible for one 
thread to access that global at a time, or
   3. When the developer has decided that using some type of 
locking scheme to access the globals is the correct design.

In a single-threaded application you could add the thread 
attribute with the same name to every single function and 
variable, but this would be very unnecessary. Instead the 
developer could use a pragma to tell the compiler to make sure 
the application is single threaded. Since adding this feature 
would require the compiler to know when threads are started 
anyway, the compiler could determine that an application was 
single threaded on its own.  The pragma would just be a sanity 
check so that the developer would be notified when their code 
has changed to break the initial design.

In the second case, if you know that only one thread will ever 
access the global variable(s), then you may be willing to take 
the risk of making the variable(s) __gshared and just remember 
not to break your own rule by using them on another thread.  
This feature would allow the compiler to verify this at compile 
time, taking pressure off the developer.

In the third case, the developer has designed the code to use the 
global(s) so long as they lock on the appropriate object first.  
This is a huge risk because the safety of the code depends on the 
developer remembering to check that every access of the global(s) 
has locked on the appropriate object.  Adding the @sync(object) 
attribute would allow the compiler to verify this at compile time 
for the developer, and the compiler would again have no need to 
make the globals thread local.

There's one more odd corner case I thought of.  Suppose you know 
that only one thread will access the global data, but this thread 
could change over the course of the program's life (so it is 
impossible to know at compile time which thread it is).  In this 
case you would use the @sync(object) design pattern and just have 
the appropriate thread lock on the object when it is started.  
Instead of locking before every access you would lock the object 
as soon as the thread is started and unlock it when the thread 
dies.  It would also be beneficial if the thread could throw an 
exception if the object is already locked.  This is a different 
way to use locked objects: instead of using them to synchronize 
small accesses to shared data, the lock is used to create a 
"slot" so that only one thread can be in the slot at a time.

***********************************************************************************
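
A minimal sketch of the "slot" idea in current D, under the assumption that an atomic flag is an acceptable stand-in for locking an object (druntime's `Mutex` is recursive, so a plain flag makes the "already occupied" case easier to detect). The names `slotTaken`, `enterSlot`, `leaveSlot` and `workerBody` are hypothetical.

```d
import core.atomic : atomicStore, cas;

shared bool slotTaken; // hypothetical flag guarding the global data

// Claims the slot for the calling thread; throws if it is already held.
void enterSlot()
{
    if (!cas(&slotTaken, false, true))
        throw new Exception("slot is already occupied by another thread");
}

// Releases the slot so that a later thread may claim it.
void leaveSlot()
{
    atomicStore(slotTaken, false);
}

// A thread entry point would hold the slot for its whole lifetime.
void workerBody()
{
    enterSlot();
    scope (exit) leaveSlot();
    // ... access the globals protected by the slot here ...
}
```

The flag does not record which thread holds the slot; it only enforces that there is at most one holder at a time, which is all the pattern above asks for.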

Jul 14 2014

Jacob Carlborg <doob me.com> writes:

On 14/07/14 23:26, Jonathan Marler wrote:

> Ah I see now.  It looks like AST macros are going to open up a lot of new
> paradigms, I'm excited to see how they progress and what they can do.
> It doesn't get us all the way there in this example but it's a very good
> alternative without having to add anything to the compiler. Your macro
> would help the developer ensure that particular variables and functions
> are only touched by the appropriate code.  If the feature only did this
> I would use a different name such as "restrict" or something.  However,
> in order for the compiler to perform thread-optimizations based on this
> information it would have to be a part of the language.

Yeah, probably. I think the problem is that threads are not so tightly 
coupled with the language. They're mostly implemented in the runtime.

Perhaps it would be possible if you could attach a thread name globally, 
to indicate the current thread. When the @thread macro is used on a 
function it would check if a new thread is created. If it is, it would 
attach a name to the current thread variable, somewhere in the AST.

When the @thread macro is used on a variable, it would check the context 
and get all callers (if possible). It would get the thread name of the 
caller and see if it matches the current global thread name. Otherwise 
it would issue an error.

I have no idea if this is possible. If it is, it sounds quite complicated.

It would be easy to verify at runtime at least.
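
As a rough illustration of such a runtime check, the sketch below compares the name of the current `core.thread.Thread` against the expected one. The helper names `assertOnThread` and `runOn` are hypothetical, and the check relies on whoever starts the thread setting its `name`.

```d
import core.thread : Thread;

// Hypothetical helper: throws unless the calling thread has the expected name.
void assertOnThread(string expected)
{
    auto actual = Thread.getThis().name;
    if (actual != expected)
        throw new Exception("expected thread '" ~ expected
            ~ "' but running on '" ~ actual ~ "'");
}

__gshared int[] offsets; // intended for the "worker" thread only

void foundOffset(int offset)
{
    assertOnThread("worker");
    offsets ~= offset;
}

// Calls foundOffset on a new thread with the given name and reports
// whether the runtime check passed.
bool runOn(string threadName)
{
    bool ok;
    auto t = new Thread({
        try { foundOffset(7); ok = true; }
        catch (Exception) { ok = false; }
    });
    t.name = threadName;
    t.start();
    t.join();
    return ok;
}
```

Only a thread that was given the name "worker" gets past the check; any other thread is rejected at run time rather than at compile time.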

> It also doesn't handle the synchronized case. For that you would need
> knowledge of the execution paths of your functions to determine what
> parts are locked and what parts are not (Unless of course AST macros
> could support that, I would be very pleasantly surprised if they did).

You could do something similar to the above and get the callers of a 
function with the @sync macro/attribute. The AST of a synced object 
(mySyncObject) would need some way to indicate whether it's currently 
synced or not.

> This AST macro is very intriguing though.  It's like you're writing code
> that can analyze your code as you develop it.  This is a really cool idea.
>
> As I've had more time to think on this here's one of the potential
> consequences I've thought of.
>
> *********** Deprecating usage of the __gshared hack ***********
>
> Here's some cases that a developer may think __gshared usage is justified.
>    1. In a single threaded application
>    2. When the developer knows that it's only possible for one thread to
> access that global at a time or,
>    3. When the developer is using some type of locking scheme to access
> the globals is the correct design.
>
> In a single threaded application then you could add the thread attribute
> with the same name to every single function and variable, but this would
> be very unnecessary. Instead the developer could use a pragma to tell
> the compiler to make sure it is single threaded, but since adding this
> feature would require the compiler to know when threads are started
> anyway, the compiler could determine that an application was single
> threaded on its own.  The pragma would just be a sanity check so that
> the developer would be notified when their code has changed to break the
> initial design.

That would be a lot easier to do in the compiler. Or just remove it from 
the runtime.

-- 
/Jacob Carlborg

Jul 15 2014

Timon Gehr <timon.gehr gmx.ch> writes:

On 07/10/2014 08:12 PM, Jonathan Marler wrote:
> So what do people think?

How do you make sure there is at most one thread of each kind?

Jul 11 2014

"Jonathan Marler" <johnnymarler gmail.com> writes:

On Friday, 11 July 2014 at 18:56:07 UTC, Timon Gehr wrote:
> On 07/10/2014 08:12 PM, Jonathan Marler wrote:
>> So what do people think?
>
> How do you make sure there is at most one thread of each kind?

Good question. First, since the language doesn't start threads 
itself (as Go does) but instead uses a library, the compiler 
would likely need to be modified to understand semantically 
when a line of code is starting a thread (I'm assuming it 
doesn't already).  If this feature were interesting enough I'm 
sure Walter would have an opinion on the right way to accomplish 
this.

Then how do you make sure that every named thread is only started 
once?  The ideal situation would be to verify this at compile 
time.  This is possible in some situations.  If it is not 
possible to verify this at compile time, then the compiler could 
generate a synchronized global pointer to every named thread to 
prevent each one from getting started more than once.  However, 
one thought that comes to mind is that if the developer cannot 
change the code so that it can be verified at compile time that 
the thread is only started once, then maybe their code is poorly 
designed or they are using this feature incorrectly.

This is just a random thought I had after writing this, but maybe 
if you could somehow tell the compiler that a section of code 
will only ever be executed once, it would help in this analysis, 
e.g. with @executeonce.  The main function would obviously only 
be executed once, so any function that executes once would need 
to be called at most once, and you could directly trace where it 
is called from the main thread.  However, I'm not sure how useful 
this feature would be in the general case...I would have to think 
on it more.

Jul 12 2014
