www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - [OT] Is this more readable, or just way too verbose?

reply simendsjo <simen.endsjo pandavre.com> writes:
I took splitlines from std.string, which is a simple, short method.

S[] splitlines(S)(S s)
{
     size_t istart;
     auto result = Appender!(S[])();

     foreach (i; 0 .. s.length)
     {
         immutable c = s[i];
         if (c == '\r' || c == '\n')
         {
             result.put(s[istart .. i]);
             istart = i + 1;
             if (c == '\r' && i + 1 < s.length && s[i + 1] == '\n')
             {
                 i++;
                 istart++;
             }
         }
     }
     if (istart != s.length)
     {
         result.put(s[istart .. $]);
     }

     return result.data;
}


I guess it takes less than 30 seconds to fully understand this method. 
Then I rap.. I mean refactored it to this:

S[] mysplitlines(S)(S s)
{
     const CR = '\r';
     const LF = '\n';

     size_t istart;

     auto result = Appender!(S[])();

     foreach (i; 0 .. s.length)
     {
         immutable c = s[i];

         immutable isCR = (c == CR);
         immutable isLF = (c == LF);

         if (isCR || isLF)
         {
             auto line = s[istart .. i];
             result.put(line);

             istart = i + 1;

             // Might be CRLF. In that case we need to consume LF too
             if (isCR)
             {
                 immutable hasMoreCharacters = (i + 1 < s.length);
                 immutable nextIsLF = hasMoreCharacters && (s[i + 1] == LF);
                 immutable isCRLF = isCR && nextIsLF;

                 if (isCRLF)
                 {
                     i++;
                     istart++;
                 }
             }
         }
     }

     immutable lineNotEmpty = (istart != s.length);
     if (lineNotEmpty)
     {
         auto lastLine = s[istart .. $];
         result.put(lastLine);
     }

     return result.data;
}


Yes, I'm reading some books now and don't have much experience :)

It went from a small 26 line, very readable function to 48 lines (85% 
increase!) with many more temporary variables..

So... Do you think this kind of code is (more) readable, or just way too 
verbose and doing more harm than good?

And will the compiler generate slower code, or should the optimizer be 
able to inline the temporaries?
Aug 09 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
simendsjo:
 And will the compiler generate slower code, or should the optimizer be 
 able to inline the temporaries?
Write a good enough benchmark and run it. Bye, bearophile
Aug 09 2010
prev sibling parent reply Lutger <lutger.blijdestijn gmail.com> writes:
simendsjo wrote:

 I took splitlines from std.string, which is a simple, short method.
 
 S[] splitlines(S)(S s)
 {
      size_t istart;
      auto result = Appender!(S[])();
 
      foreach (i; 0 .. s.length)
      {
          immutable c = s[i];
          if (c == '\r' || c == '\n')
          {
              result.put(s[istart .. i]);
              istart = i + 1;
              if (c == '\r' && i + 1 < s.length && s[i + 1] == '\n')
              {
                  i++;
                  istart++;
              }
          }
      }
      if (istart != s.length)
      {
          result.put(s[istart .. $]);
      }
 
      return result.data;
 }
 
 
 I guess it takes less than 30 seconds to fully understand this method.
 Then I rap.. I mean refactored it to this:
 
 S[] mysplitlines(S)(S s)
 {
      const CR = '\r';
      const LF = '\n';
 
      size_t istart;
 
      auto result = Appender!(S[])();
 
      foreach (i; 0 .. s.length)
      {
          immutable c = s[i];
 
          immutable isCR = (c == CR);
          immutable isLF = (c == LF);
 
          if (isCR || isLF)
          {
              auto line = s[istart .. i];
              result.put(line);
 
              istart = i + 1;
 
              // Might be CRLF. In that case we need to consume LF too
              if (isCR)
              {
                  immutable hasMoreCharacters = (i + 1 < s.length);
                  immutable nextIsLF = hasMoreCharacters && (s[i + 1] == LF);
                  immutable isCRLF = isCR && nextIsLF;
 
                  if (isCRLF)
                  {
                      i++;
                      istart++;
                  }
              }
          }
      }
 
      immutable lineNotEmpty = (istart != s.length);
      if (lineNotEmpty)
      {
          auto lastLine = s[istart .. $];
          result.put(lastLine);
      }
 
      return result.data;
 }
 
 
 Yes, I'm reading some books now and don't have much experience :)
 
 It went from a small 26 line, very readable function to 48 lines (85%
 increase!) with many more temporary variables..
 
 So... Do you think this kind of code is (more) readable, or just way too
 verbose and doing more harm than good?
The CR and LF constants are a bit too much, probably because they don't really abstract over the literals which I can actually parse faster. The isCR and isLF are nice however. Taking it a step further: bool canSplit = inPattern(c,"\r\n"); if (canSplit) { ... You have increased the nesting of ifs to 3 inside a for-loop. Personally I don't read deep nesting very well. To go for readability I would use a small function for the entire expression: if( s[i..$].startsWithCRLF() ) // same as startsWithCRLF(s[i..$]) { i++; istart++; } or use std.algorithm: if ( s[i..$].startsWith("\r\n") )
 And will the compiler generate slower code, or should the optimizer be
 able to inline the temporaries?
I don't think it matters much, but you can only tell by testing. There is a benchmark function (called benchmark iirc) somewhere in phobos to start with and dmd has a builtin profiler too.
Aug 09 2010
parent reply simendsjo <simen.endsjo pandavre.com> writes:
Lutger wrote:
 simendsjo wrote:
 
(...)
 
 The CR and LF constants are a bit too much, probably because they don't really 
 abstract over the literals which I can actually parse faster. The isCR and
isLF 
 are nice however. Taking it a step further:
 
 bool canSplit = inPattern(c,"\r\n");
 if (canSplit)
 {
   ...
 
 You have increased the nesting of ifs to 3 inside a for-loop.Personally I
don't 
 read deep nesting very well. To go for readability I would use a small
function 
 for the entire expression: 
 
 if( s[i..$].startsWithCRLF() ) // same as startsWithCRLF(s[i..$])
 {
   i++;
   istart++;
 }
 
 or use std.algorithm: if ( s[i..$].startsWith("\r\n") )
  
(...) Nice. I didn't increase the if nesting though. Something like this then? S[] mysplitlines(S)(S s) { size_t istart; auto result = Appender!(S[])(); foreach (i; 0 .. s.length) { immutable c = s[i]; immutable isEOL = inPattern(c, "\r\n"); if (isEOL) { auto beforeEOL = s[istart .. i]; result.put(beforeEOL); auto rest = s[i .. $]; immutable isCRLF = rest.startsWith("\r\n"); istart = i + 1; // consume first EOL character if (isCRLF) // skip \n too { i++; istart++; } } } // The last line might not end with EOL immutable lineNotEmpty = (istart != s.length); if (lineNotEmpty) { auto lastLine = s[istart .. $]; result.put(lastLine); } return result.data; }
Aug 10 2010
parent Lutger <lutger.blijdestijn gmail.com> writes:
simendsjo wrote:

 Lutger wrote:
...
  I didn't increase the if nesting though.
I count 2 nested if-statements inside of the foreach loop in the original, you have 3 nested if-statements.
 Something like this then?
Looks good to me, yes.
Aug 10 2010