digitalmars.D - Common Issue in Shared Code

About a month or so ago, I started trying to convert a codebase I've been
working on into a multithreaded system, and I've been hitting this sort of
thing over and over:
--------
// used as a field and as a local variable all over the codebase
struct Data {
    int a, b, c;
    int total() {
        return a + b + c;
    }
}

// has a Data as one of its members but never escapes a pointer to it
class Bob {
private:
    Data _dat;
public:
    int currentTotal() {
        return _dat.total();
    }
}
--------
Now, as part of my multithreaded refactor, I need to make Bob synchronized,
but that means the Data field inside it becomes shared, which means I can no
longer call the non-shared total() method from currentTotal().
To fix this, I could make Data synchronized as well, but Data is used all
over the codebase, most of the time as a local variable inside a function.
In my particular case, I see this a lot with a struct that represents a
location, which is just 2 bytes in my codebase, so adding a monitor would
more than double its size, and the locking overhead would be completely
unnecessary.
If I don't want to make it synchronized, I could cast away shared
everywhere I use it as a field, which looks ugly and is confusing when I
read the code later.
If I don't want to cast away shared, I could make Data itself shared and
assume that the owner will make sure it's not shared improperly, but at
that point I've given up all the help the type system could provide.
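
To make the second option concrete, here is a minimal sketch of the cast workaround. It assumes a compiler that treats the fields of a synchronized class as shared inside its methods; the initial field values are arbitrary and only there for illustration.

```d
// Same Data as above: a plain value type with no indirections.
struct Data {
    int a, b, c;
    int total() {
        return a + b + c;
    }
}

// Bob is now synchronized, so inside its methods _dat is seen as shared(Data).
synchronized class Bob {
    // Arbitrary starting values, for illustration only.
    private Data _dat = Data(1, 2, 3);

    int currentTotal() {
        // return _dat.total();  // rejected: total() is not a shared method
        // Workaround: cast away shared. This is only safe because the
        // method holds Bob's monitor and the pointer does not escape.
        return (cast(Data*) &_dat).total();
    }
}
```

Every method that touches _dat needs the same cast, which is the ugliness I'm complaining about.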

For reference, according to TDPL:
--------
For synchronized methods:
"Maybe not very intuitively, the temporary nature of synchronized entails
the rule that no address of a field can escape a synchronized address. If
that happened, some other portion of the code could access some data beyond
the temporary protection conferred by method-level synchronization."

For synchronized classes:
• All numeric types are not shared (they have no tail) so they can be
manipulated normally.
• Array fields declared with type T[] receive type shared(T)[]; that
is, the head (the slice limits) is not shared and the tail (the contents of
the array) remains shared.
• Pointer fields declared with type T* receive type shared(T)*; that is,
the head (the pointer itself) is not shared and the tail (the pointed-to
data) remains shared.
• Class fields declared with type T receive type shared(T). Classes are
automatically by-reference, so they're "all tail."
These rules apply on top of the no-escape rule described in the previous
section.
One direct consequence is that operations affecting direct fields of the
object can be freely reordered and optimized inside the method, as if
sharing has been temporarily suspended for them - which is exactly what
synchronized does.
--------

At first glance, it seems like the first rule should extend to structs
(that is, it should cover value types in general rather than only numeric
types), but it can't, because a struct could contain a reference to another
object, and that reference must remain transitively shared. Typing a struct
field as shared if it contains a reference and unshared otherwise would just
be confusing. Still, this use case is one that the language does not
currently address in a satisfying way.

When I flag a type as shared, all instances of it are forced to become
shared, but the compiler assumes that the programmer has properly
synchronized things such that sharing instances of the type is safe. Why,
then, can I not force the compiler to assume I've properly synchronized
things for a field of a class? In this case, the effect would be the
opposite - the field wouldn't be flagged as shared, but supposing we had
such a keyword, it would act as a much more limited version of the "shared"
keyword because I'm only forcing the compiler to assume I've done things
properly within the context of a class.
The keyword would have to be restricted such that it could only be applied
to private fields, and the compiler would continue to enforce (as much as
is reasonable) that the address of the field does not escape.
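
No such keyword exists, so I can't show it directly. The closest approximation I can think of with today's language is to confine the cast to a single private accessor, sketched below. The names dat() and bump() are made up for this example, and nothing here is checked by the compiler: it is still on me to make sure the reference never leaves Bob's methods.

```d
struct Data {
    int a, b, c;
    int total() {
        return a + b + c;
    }
}

synchronized class Bob {
    // Arbitrary starting values, for illustration only.
    private Data _dat = Data(1, 2, 3);

    // Hypothetical helper: the one place where shared is cast away.
    // Sound only if the returned reference is never stored or handed
    // out beyond Bob's own methods; the compiler does not verify that.
    private ref Data dat() {
        return *cast(Data*) &_dat;
    }

    int currentTotal() {
        return dat().total();
    }

    // Mutation goes through the same accessor.
    void bump() {
        dat().a += 1;
    }
}
```

This keeps the casts out of the individual methods, but it is a convention rather than a guarantee, which is why I'd like the language to support it.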

I believe that this case of data sharing will appear and frustrate
programmers in almost any multithreaded program, and that finding a
satisfying solution to allow the language to provide as many guarantees as
possible is worthwhile.

Any thoughts?
Nov 20 2011