
D - Local functions

"Matthew Wilson" <dmd synesis.com.au> writes:
Just trimmed down the implementation of a function, in which I'd had to
duplicate similar behaviour in multiple places to achieve optimal
performance, by using a local function.

Totally love it!
Mar 27 2003
"Walter" <walter digitalmars.com> writes:
"Matthew Wilson" <dmd synesis.com.au> wrote in message
news:b5uk7n$16tj$1 digitaldaemon.com...
> Just trimmed down the implementation of a function, in which I'd had to
> duplicate similar behaviour in multiple places to achieve optimal
> performance, by using a local function.
>
> Totally love it!

Local functions are something we've all learned to live without in C/C++, and we have developed all kinds of idioms and kludges to compensate for their absence. The neat thing is when one discovers that local functions are the natural solution, and those idioms and kludges are not!
Mar 27 2003
"Matthew Wilson" <dmd synesis.com.au> writes:
Well it's another of those things that has me much more convinced than I was
last year that D has a real commercial future.

Now, if we can just sort out this nonsense about equality and identity ...

Mar 27 2003
"Carlos Santander B." <carlos8294 msn.com> writes:
"Matthew Wilson" <dmd synesis.com.au> wrote in message
news:b5uk7n$16tj$1 digitaldaemon.com...
| Just trimmed down the implementation of a function, in which I'd had to
| duplicate similar behaviour in multiple places to achieve optimal
| performance, by using a local function.
|
| Totally love it!

This might sound dumb, but what exactly is a local function? A nested
function? A private function? Un-dumb me, please :D

—————————————————————————
Carlos Santander


---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.465 / Virus Database: 263 - Release Date: 2003-03-25
Mar 27 2003
"Matthew Wilson" <dmd synesis.com.au> writes:
Walter and I have been faffing around with some D library performance stuff

C, C++, D and Java.

I found that the performance of string.split() was preposterously bad, and
wrote two much faster, and more flexible, versions

  // character delimiter, e.g. ';'
  char[][] tokenise(char[] source, char delimiter, bit bElideBlanks, bit bZeroTerminate);
  // string delimiter, e.g. "\r\n"
  char[][] tokenise(char[] source, char[] delimiter, bit bElideBlanks, bit bZeroTerminate);

Walter then took up the gauntlet and fixed up string.split() such that it
performed faster for character delimiters than my tokenise method. (The
string-delimiter version of tokenise() calls string.split() and then trims
the blanks in situ if requested. Hence string.split() performs slightly
better than tokenise() for string delimiters, though the difference is
very slight.)

In turn, I've addressed some issues in my implementation and made tokenise()
for char delimiters faster again. The ball is currently in Walter's court,
but I'm having to seal the article results today, so whatever happens I've
won in print! (Don't worry, this nonsense contest won't be featuring outside
this forum ;)

Anyway, one of the ways in which I increased the speed of the original
version of tokenise() was to split the loop into separate versions for
bElideBlanks = true and bElideBlanks = false. Consider the implementation:

char[][] tokenise(char[] source, char delimiter, bit bElideBlanks, bit bZeroTerminate)
{
 int   i;
 int   cDelimiters = 128;
 char[][] tokens  = new char[][cDelimiters];
 int   start;
 int   begin;
 int   cTokens;

 if(bElideBlanks)
 {
  for(start = 0, begin = 0, cTokens = 0; begin < source.length; ++begin)
  {
   if(source[begin] == delimiter)
   {
    if(start < begin)
    {
     if(!(cTokens < tokens.length))
     {
      tokens.length = tokens.length * 2;
     }

     tokens[cTokens++] = source[start .. begin];
    }

    start = begin + 1;
   }
  }

  if(start < begin)
  {
   if(!(cTokens < tokens.length))
   {
    tokens.length = tokens.length * 2;
   }

   tokens[cTokens++] = source[start .. begin];
  }
 }
 else
 {
  for(start = 0, begin = 0, cTokens = 0; begin < source.length; ++begin)
  {
   if(source[begin] == delimiter)
   {
    if(!(cTokens < tokens.length))
    {
     tokens.length = tokens.length * 2;
    }

    tokens[cTokens++] = source[start .. begin];

    start = begin + 1;
   }
  }

  if(!(cTokens < tokens.length))
  {
   tokens.length = tokens.length * 2;
  }

  tokens[cTokens++] = source[start .. begin];
 }

 tokens.length = cTokens;

 if(bZeroTerminate)
 {
  for(i = 0; i < tokens.length; ++i)
  {
   tokens[i] ~= (char)0;
  }
 }

 return tokens;
}

char[][] tokenize(char[] source, char delimiter, bit bElideBlanks, bit bZeroTerminate)
{
 return tokenise(source, delimiter, bElideBlanks, bZeroTerminate);
}

As you can see, there's a lot of duplicated code, which is always to be
avoided where possible. All that

  if(!(cTokens < tokens.length))
  {
   tokens.length = tokens.length * 2;
  }

is just asking for bugs to creep in. But by creating a local function
ensure_length() I can trim the function length, and even more importantly
ensure that the array-growing algorithm is managed in one place. The whole
function was thus rewritten:

char[][] tokenise(char[] source, char delimiter, bit bElideBlanks, bit bZeroTerminate)
{
 int   i;
 int   cDelimiters = 128;
 char[][] tokens  = new char[][cDelimiters];
 int   start;
 int   begin;
 int   cTokens;

 /// Ensures that the tokens array is big enough  *** Carlos, this is the local function ***
 void ensure_length()
 {
  if(!(cTokens < tokens.length))
  {
   tokens.length = tokens.length * 2;
  }
 }

 if(bElideBlanks)
 {
  for(start = 0, begin = 0, cTokens = 0; begin < source.length; ++begin)
  {
   if(source[begin] == delimiter)
   {
    if(start < begin)
    {
     ensure_length();

     tokens[cTokens++] = source[start .. begin];
    }

    start = begin + 1;
   }
  }

  if(start < begin)
  {
   ensure_length();

   tokens[cTokens++] = source[start .. begin];
  }
 }
 else
 {
  for(start = 0, begin = 0, cTokens = 0; begin < source.length; ++begin)
  {
   if(source[begin] == delimiter)
   {
    ensure_length();

    tokens[cTokens++] = source[start .. begin];

    start = begin + 1;
   }
  }

  ensure_length();

  tokens[cTokens++] = source[start .. begin];
 }

 tokens.length = cTokens;

 if(bZeroTerminate)
 {
  for(i = 0; i < tokens.length; ++i)
  {
   tokens[i] ~= (char)0;
  }
 }

 return tokens;
}

There's still scope for more local functions, with

    tokens[cTokens++] = source[start .. begin];

but I simply haven't got to that yet. I probably will before I submit it to
Walter for Phobosisation.
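
One way that further step might look, as a sketch rather than the version that goes to Phobos: a second local function, given the made-up name add_token() here, wraps the append and calls ensure_length() itself. The sketch assumes a present-day D compiler, so bool, size_t, cast(char) and new char[][](128) stand in for the bit, int, (char)0 and new char[][cDelimiters] of the listing above.

```d
char[][] tokenise(char[] source, char delimiter, bool bElideBlanks, bool bZeroTerminate)
{
    char[][] tokens = new char[][](128);
    size_t   start;     // index of the first character of the current token
    size_t   begin;     // current scan position
    size_t   cTokens;   // number of tokens stored so far

    // Ensures that the tokens array is big enough (as in the post)
    void ensure_length()
    {
        if(!(cTokens < tokens.length))
        {
            tokens.length = tokens.length * 2;
        }
    }

    // Hypothetical second local function: stores source[start .. begin]
    // as the next token, growing the array first if necessary
    void add_token()
    {
        ensure_length();
        tokens[cTokens++] = source[start .. begin];
    }

    if(bElideBlanks)
    {
        for(begin = 0; begin < source.length; ++begin)
        {
            if(source[begin] == delimiter)
            {
                if(start < begin)
                {
                    add_token();
                }
                start = begin + 1;
            }
        }

        if(start < begin)
        {
            add_token();
        }
    }
    else
    {
        for(begin = 0; begin < source.length; ++begin)
        {
            if(source[begin] == delimiter)
            {
                add_token();
                start = begin + 1;
            }
        }

        add_token();
    }

    tokens.length = cTokens;

    if(bZeroTerminate)
    {
        foreach(ref token; tokens)
        {
            token ~= cast(char)0;
        }
    }

    return tokens;
}
```

Called as tokenise("a;;b".dup, ';', true, false) this gives the two tokens "a" and "b"; with bElideBlanks false it gives three, the middle one empty.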

Hope that helps. :)

Matthew


Mar 27 2003
"Carlos Santander B." <carlos8294 msn.com> writes:
"Matthew Wilson" <dmd synesis.com.au> wrote in message
news:b5vr97$2e7s$1 digitaldaemon.com...
| ...
|  /// Ensures that the tokens array is big enough  *** Carlos, this is the
| local function ***
|  void ensure_length()
|  {
|   if(!(cTokens < tokens.length))
|   {
|    tokens.length = tokens.length * 2;
|   }
|  }
| ...
| Hope that helps. :)
|
| Matthew
|

Well, that was obviously much more information than I was asking for, but yes,
it helped. Thanks.
Of course, you could've just said "local functions=nested functions"... lol
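
For the short version in code, a minimal sketch (assuming a current D compiler; the names sum_to and add are made up for illustration): a local function is just a function declared inside another function, and it can read and write the enclosing function's variables.

```d
int sum_to(int n)
{
    int total = 0;

    // Local (nested) function: visible only inside sum_to(),
    // and able to modify sum_to()'s own variable 'total'
    void add(int value)
    {
        total += value;
    }

    for(int i = 1; i <= n; ++i)
    {
        add(i);
    }

    return total;
}
```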

—————————————————————————
Carlos Santander


Mar 27 2003
"Walter" <walter digitalmars.com> writes:
"Carlos Santander B." <carlos8294 msn.com> wrote in message
news:b5vsa9$2f6c$1 digitaldaemon.com...
> Well, that was obviously much more information than I was asking for, but yes,
> it helped. Thanks.
> Of course, you could've just said "local functions=nested functions"... lol

True, but it is a nice example of what they are good for!
Mar 27 2003
"Carlos Santander B." <carlos8294 msn.com> writes:
"Walter" <walter digitalmars.com> wrote in message
news:b5vsn3$2fgb$4 digitaldaemon.com...
| "Carlos Santander B." <carlos8294 msn.com> wrote in message
| news:b5vsa9$2f6c$1 digitaldaemon.com...
| > Well, that was obviously much more information than I was asking for, but yes,
| > it helped. Thanks.
| > Of course, you could've just said "local functions=nested functions"... lol
|
| True, but it is a nice example of what they are good for!

Yes, it is.

—————————————————————————
Carlos Santander


Mar 27 2003
"Matthew Wilson" <dmd synesis.com.au> writes:
Aye

Maybe we'll slip that into some documentary form in the future ... ?
Mar 27 2003
"Walter" <walter digitalmars.com> writes:
"Matthew Wilson" <dmd synesis.com.au> wrote in message
news:b5vte2$2g2u$1 digitaldaemon.com...
> Aye
>
> Maybe we'll slip that into some documentary form in the future ... ?

It's a bit long for that. If documentation examples aren't really, really short, people just skip over them. For examples of what I mean by really, really short, see www.digitalmars.com/d/pretod.html, www.digitalmars.com/d/ctod.html, www.digitalmars.com/d/cpptod.html.
Mar 28 2003
"Matthew Wilson" <dmd synesis.com.au> writes:
Gotcha
Mar 28 2003
"Matthew Wilson" <dmd synesis.com.au> writes:
I'm not known for my terseness. It gets me in all kinds of trouble with my
publishers. "Less is more!" they cry. "Phooey! More is more!" I cry (to
myself, while I duly cut away at my lovely text).

In any case, you're more than welcome.

:)
Mar 27 2003
Mark T <Mark_member pathlink.com> writes:
> char[][] tokenise(char[] source, char delimiter, bit bElideBlanks, bit bZeroTerminate)
> [snip]
>  if(bElideBlanks)
>  {

<rant>
Can't we leave Hungarian notation back in the 1980s where it belongs? I'm only commenting because you are writing an article for publication, and all the newbies will think that this is the proper style for D. Yuk.

There is a guy at work who can't define a C++ class without a capital "C" in front of the name (of course he first learned C++ with Visual C++). For us older folks who used the Borland C++ compiler (when the Visual C++ compiler was crap), we should start all our class names with a "T" :)

The lowercase "b" is completely unnecessary for understanding the code and just makes it more unreadable.
</rant>

Sorry for the rant.
Mar 28 2003
"Matthew Wilson" <dmd synesis.com.au> writes:
Wow. Here you're opening a very large box. :)

I'll attempt, against my wont, to be succinct.

I totally eschew the use of Hungarian to denote type. For example.

void(char const *pcszName, int iIndex); // C
void(string strName, int iIndex); // C++
void(char[] strName, int iIndex); // D

This is hokey and pointless, and often a great source of error. For example,
when one changes the int to a long, iIndex stops being redundant and
starts being a lie. Very bad. Don't do it. Ever!

They should all read as

void(char const *name, int index); // C
void(string name, int index); // C++
void(char[] name, int index); // D

However, where you're taking issue is in how I depart from the accepted norm
of the day which is to decorate nothing. I disagree with this proposition,
and think that sensible variable name decoration can be of great benefit.

With member variables

class String
{
  this(char[] data, int length)
  {
       // Nasties here. Do we qualify with "this."?
       this.data = data;
       this.length = length;
       // Yuck!
  }

private:
  char[]    data;
  int       length; // I know data has its own length. This is just an example!!
}


My answer is to prefix member variables with m_. Hence

class String
{
  this(char[] data, int length)
  {
       m_data = data;
       m_length = length;
  }

private:
  char[]    m_data;
  int          m_length;
}

I know UNIX heads always append the ever so readable '_', as in

  this(char[] data, int length)
  {
       data_ = data;
       length_ = length;
  }

but this is just truculence. We may rightly lambast M$ for a great many
ills, but they have actually done a couple of good things. (In fact, I often
have a laugh about MFC in thinking that all those many million man days of
effort have resulted in one useful thing, the humble "m_". By using it, you
can make an ironic testament to the power of wizard-assisted collective
delusion.)

Ok, but you were objecting to (bElideBlanks). This is another "departure" of
mine from the current mode. As I said, I object to the use of Hungarian to
denote type. However, I strongly believe in its (judicious) use to denote
purpose.

Hence, you would see something like the following

char** tokenise(char const *source, char const *delimiter, int bElideBlanks); // C
vector<string> tokenise(string source, string delimiter, bool bElideBlanks); // C++
char[][] tokenise(char[] source, char[] delimiter, bit bElideBlanks); // D

In whatever language, I choose to denote bElideBlanks as a boolean,
precisely because it is boolean behaviour that I'm after. Whether the actual
type is int (C), bool (C++) or bit (D), I am after boolean behaviour, so I
denote it with a b.

I guess you could argue that it's redundant in the C++, but that's kind of
the counter-Hungarian argument from the other end. I believe that purpose is
primary, and type secondary, so just as I do not add a prefix on the basis
of type, nor do I fail to prefix on the same basis. Thus it is consistent,
and a nice side effect is that porting is a breeze. This is an important
point. Remember that the most compelling criticism of Hungarian is that it
kills portability. My own little dialect aids portability.

Having lots of experience in writing C/C++ libs that have to work with
different character encoding conventions, I also make use of the prefixes
cb - count of bytes - and cch - count of characters (these can be short,
int, size_t, long, whatever. The type doesn't matter), since it is often a
very important distinction. There are others, but I think you get the point.
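
To make the cb/cch distinction concrete, here is a small sketch for a current D compiler, using UTF-16 text, where the two counts differ. The function name describe and its out parameters are invented for the example; only the cb and cch prefixes come from the convention described above.

```d
// Reports both counts for a UTF-16 buffer. The prefixes say what each
// number means, whatever integer type happens to hold it.
void describe(const(wchar)[] text, out size_t cchText, out size_t cbText)
{
    cchText = text.length;                 // count of characters (UTF-16 code units)
    cbText  = text.length * wchar.sizeof;  // count of bytes those characters occupy
}
```

Passing a cch value where a cb value is expected allocates or copies half as much as intended, which is exactly the kind of mistake the prefixes are meant to make visible at the call site.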

- Classic Hungarian notation is redundant because it just tells you the type
you can see/deduce anyway. My variant does not purport to tell you anything
you already know (save for a few cases), rather it aids meaning by giving
you information you may not have (e.g. bit bElideBlanks is a boolean
true/false rather than a bit 1/0)
- Classic Hungarian notation is an accident waiting to happen, and a
significant portability block. Mine does not present the accident at all,
and is an aid to portability.
- Classic Hungarian is as ugly as hell. Mine probably is as well, but I'm
used to it, so sue me. :)

Naturally, I'm not 100% consistent in the application of these principles,
as I'm only human. Where I do falter, it's usually to omit the decoration,
rather than to add it.

In summary, apart from the uglification of code, my scheme is of benefit
where Hungarian is a detriment. You may not like it. That's cool. I'm not
asking anyone else to use it. But I am able to go back to code that is 5 or
more years old, readily understand and change it, or port it to a new
language (as I have recently done to D) without any hassles.

Hope that clears it up. (Bet you wish you'd never asked!)

Matthew
Mar 28 2003
Burton Radons <loth users.sourceforge.net> writes:
Matthew Wilson wrote:
> Wow. Here you're opening a very large box. :)
>
> I'll attempt, against my wont, to be succinct.
>
> I totally eschew the use of Hungarian to denote type. For example.
>
> void(char const *pcszName, int iIndex); // C
> void(string strName, int iIndex); // C++
> void(char[] strName, int iIndex); // D
>
> This is hokey and pointless, and often a great source of error. For example,
> when one changes the int to a long, iIndex stops being redundant and
> starts being a lie. Very bad. Don't do it. Ever!
>
> They should all read as
>
> void(char const *name, int index); // C
> void(string name, int index); // C++
> void(char[] name, int index); // D
>
> However, where you're taking issue is in how I depart from the accepted norm
> of the day which is to decorate nothing. I disagree with this proposition,
> and think that sensible variable name decoration can be of great benefit.

But you're writing for Phobos. The style guide says, in total: "Just say no." I don't think there's an "unless" in there. Walter's let some code in, some of mine, which didn't use his style to the letter, and that was a mistake. I wouldn't do that for a library for personal use, much less a language runtime. It's up to Walter whether he'll include code which uses both Hungarian and proper English spellings (I use proper English myself, so I'm not biased against it - it's just what Phobos uses), but I hope he doesn't.
Mar 29 2003
"Matthew Wilson" <dmd synesis.com.au> writes:
I hadn't thought about it in that light. I assumed that it would change
radically were it to be put in Phobos, and therefore hadn't specifically
considered whether the bElideBlanks would be changed to elideBlanks (or is
that elide_blanks? I hope not) or not. It's Walter's party, so to some
extent one must be content to go with the flow - you don't bring a knife to
a gunfight. Save the energy for the big fights (like identity-ignorant
equivalence checking).
Mar 29 2003