c++.stlsoft - c_str_data() ??

reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
Hey everybody, it's Dr Nic!

(Sorry, been spending too much time watching The Simpsons of late. <g>)

I've been meaning to bring this up for a while, and Pablo's recent work on
using string_view for returning slices of
strings has brought it back into focus.

Those of you who've been peering inside the STLSoft components to find out just
how they can be so fabulously compatible
with almost anything you can shake a stick at will have noticed a whole lot of
use of string access shims, in particular
c_str_ptr() and, to a lesser extent, c_str_len().

(Anyone not wholly persuaded by Shims can check them in Chapter 20 of Imperfect
C++ (http://imperfectcplusplus.com) or
in my August 2003 CUJ article "Generalised String Manipulation"
(http://www.cuj.com/documents/s=8681/cuj0308wilson/).
There's also a pretty pokey definition available at
http://www.synesis.com.au/articles.html#whitepapers.)

Anyway, the main motivation for c_str_data() would be to provide a
not-necessarily-nul-terminated pointer to a contiguous array of bytes of a
given string (or string-able type) instance. This would mean that when one
intends to use the c-string form of a particular object _without_ relying on
the nul-termination, there is an opportunity for non-trivial optimisation.

For example, the memory_database class in the Open-RJ/STL mapping has a
generalised ctor:

class memory_database
    : public database_base
{
    . . .
    template <typename S>
    explicit memory_database(S const &contents, unsigned flags = 0)
        : parent_class_type(create_database_(::stlsoft::c_str_ptr(contents),
                                             ::stlsoft::c_str_len(contents),
                                             flags))
    {}
    . . .
};


Since c_str_len() is being used, we may assume (and obviously one should check,
in the general case!) that
create_database_() does not rely on the nul-terminator. Hence, this could be
rewritten as:


class memory_database
    : public database_base
{
    . . .
    template <typename S>
    explicit memory_database(S const &contents, unsigned flags = 0)
        : parent_class_type(create_database_(::stlsoft::c_str_data(contents),
                                             ::stlsoft::c_str_len(contents),
                                             flags))
    {}
    . . .
};

Now, for most string-able entities, this will make no difference. An MFC
CString will still return the base of its nul-terminated allocation. An ATL
CWindow will still have to return a temporary shim_string instance.

But for other types, such as the new string_view, and the Win32 Security API
type LSA_UNICODE_STRING, this would remove
the need to generate nul-terminated storage.
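
To make the saving concrete, here is a minimal sketch of the three shims for
two types. Everything in namespace sketch is hypothetical and written only for
illustration: slice stands in for string_view, shim_string for the temporary
that owns a nul-terminated copy, and count_lines() for any consumer that
honours the length. None of it is the actual STLSoft code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

namespace sketch
{
    // Hypothetical non-owning slice, standing in for string_view.
    // The characters in [base, base + len) are NOT nul-terminated.
    struct slice
    {
        char const  *base;
        std::size_t len;
    };

    // Hypothetical temporary that owns a nul-terminated copy.
    class shim_string
    {
    public:
        shim_string(char const *p, std::size_t n) : buf_(p, n) {}
        operator char const *() const { return buf_.c_str(); }
    private:
        std::string buf_;
    };

    // std::string is already nul-terminated: ptr and data cost the same.
    inline char const  *c_str_ptr(std::string const &s)  { return s.c_str(); }
    inline char const  *c_str_data(std::string const &s) { return s.data(); }
    inline std::size_t  c_str_len(std::string const &s)  { return s.size(); }

    // slice: c_str_ptr() must copy to add the terminator; c_str_data() need not.
    inline shim_string  c_str_ptr(slice const &s)  { return shim_string(s.base, s.len); }
    inline char const  *c_str_data(slice const &s) { return s.base; }
    inline std::size_t  c_str_len(slice const &s)  { return s.len; }

    // Hypothetical consumer that never reads past p + n.
    inline std::size_t count_lines(char const *p, std::size_t n)
    {
        std::size_t lines = 0;
        for(std::size_t i = 0; i != n; ++i)
        {
            if('\n' == p[i]) { ++lines; }
        }
        return lines;
    }

    // Generic caller in the style of the memory_database ctor.
    template <typename S>
    std::size_t count_lines_in(S const &s)
    {
        return count_lines(c_str_data(s), c_str_len(s));
    }

    inline void example()
    {
        std::string whole("a\nb\nc");
        slice       part = { whole.data(), 3 }; // views "a\nb"

        assert(c_str_data(part) == whole.data());   // no copy made
        assert(3 == std::strlen(c_str_ptr(part)));  // copy, nul-terminated
        assert(2 == count_lines_in(whole));
        assert(1 == count_lines_in(part));
    }
}
```

Note that the shim_string temporary lives only until the end of the full
expression, so the result of c_str_ptr(part) has to be consumed in the same
expression, as it is in the strlen() call above.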

So, the picture is rosy in so far as things that already have nul-termination
(e.g. CString), or have no intrinsic
c-string of their own (e.g. HWND) are not affected, but those that have storage
which is not nul-terminated would result
in more efficient code. The downside is another c_str_XXX() to be aware of, but
since most people understand the
concepts of the standard String models' c_str() and data() (and length())
functions, this is pretty readily grokable.

What I'm interested in is whether anyone sees a downside? (I vaguely recall
having thought of one on a bike ride a few weeks ago, but someone passed me
and I had to chase 'em down. <g>) One thing that did occur to me is whether
there might be any circumstances where one might not be able to define a
c_str_data(). (I can't think of one, since the characteristics of the return
value of c_str() / c_str_ptr() answer the requirements of the return value of
data() / c_str_data(), but I may well have missed something.)
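
The parenthetical argument can be sketched as a fallback: wherever a
c_str_ptr() exists, a c_str_data() can be defined by forwarding to it, since a
nul-terminated pointer is also a valid not-necessarily-terminated one. The
names legacy_name and the sketch namespace are hypothetical, the code assumes
C++11 for the trailing return type, and this is not how STLSoft defines its
shims.

```cpp
#include <cassert>

namespace sketch
{
    // Hypothetical type that only offers a nul-terminated shim.
    struct legacy_name
    {
        char text[16];
    };

    inline char const *c_str_ptr(legacy_name const &n) { return n.text; }

    // Fallback: whatever c_str_ptr() returns also satisfies c_str_data().
    template <typename S>
    auto c_str_data(S const &s) -> decltype(c_str_ptr(s))
    {
        return c_str_ptr(s);
    }

    inline void example()
    {
        legacy_name n = { "abc" };

        assert(c_str_data(n) == n.text);    // same storage as c_str_ptr()
        assert('\0' == c_str_data(n)[3]);   // terminator present, though not required
    }
}
```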

Thoughts?

Cheers

Matthew
Mar 31 2005
parent "Matthew" <admin.hat stlsoft.dot.org> writes:
I remembered the argument against: it might (in fact it should!) incline
library writers to use c_str_data()+c_str_len() instead of c_str_ptr()
(alone). For cases where the 'string' is something for which a temporary shim
string instance must be synthesised (e.g. ACE's ACE_INET_Addr), it'd result in
two conversions instead of one. Naturally, such cases are likely to be rare.
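
A rough sketch of that cost, counting conversions for a type with no intrinsic
string form. The address type, to_text(), the conversions() counter and the
two callers are all hypothetical stand-ins invented for this illustration
(address is only loosely modelled on the ACE_INET_Addr case), and
std::to_string assumes C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

namespace sketch
{
    // Hypothetical address type with no intrinsic string form.
    struct address
    {
        unsigned host[4];
        unsigned port;
    };

    // Counts how many times a string form has been synthesised.
    inline unsigned &conversions()
    {
        static unsigned n = 0;
        return n;
    }

    inline std::string to_text(address const &a)
    {
        ++conversions();
        std::string s;
        for(int i = 0; i != 4; ++i)
        {
            s += std::to_string(a.host[i]);
            s += ((3 != i) ? '.' : ':');
        }
        s += std::to_string(a.port);
        return s;
    }

    // Hypothetical temporary that owns the synthesised string.
    class shim_string
    {
    public:
        explicit shim_string(std::string const &s) : buf_(s) {}
        operator char const *() const { return buf_.c_str(); }
    private:
        std::string buf_;
    };

    // Every shim has to synthesise the text afresh.
    inline shim_string c_str_ptr(address const &a)  { return shim_string(to_text(a)); }
    inline shim_string c_str_data(address const &a) { return shim_string(to_text(a)); }
    inline std::size_t c_str_len(address const &a)  { return to_text(a).size(); }

    inline std::size_t use_counted(char const *p, std::size_t n)
    {
        return std::string(p, n).size();
    }

    // One shim call: one conversion.
    template <typename S>
    std::size_t via_ptr(S const &s)
    {
        return std::strlen(c_str_ptr(s));
    }

    // Two shim calls: two conversions.
    template <typename S>
    std::size_t via_data_and_len(S const &s)
    {
        return use_counted(c_str_data(s), c_str_len(s));
    }

    inline void example()
    {
        address a = { { 127, 0, 0, 1 }, 80 };   // "127.0.0.1:80"

        conversions() = 0;
        assert(12 == via_ptr(a));
        assert(1 == conversions());

        conversions() = 0;
        assert(12 == via_data_and_len(a));
        assert(2 == conversions());
    }
}
```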

Anyway, I've added it in tonight, and it'll appear in beta 7. Each case where
c_str_data() is now used was already using 
c_str_ptr()+c_str_len(), so it's either the same or a gain in every extant
case. Time will tell how it pans out in 
future use.

Cheers

Matthew
Apr 05 2005