digitalmars.D - Code layout for range-intensive D code

"bearophile" <bearophileHUGS lycos.com> writes:

The introduction of UFCS in D offers new ways to format D code, 
especially when your code uses many high order functions. What is 
a good layout of the D code in such situations? I have tried 
several alternative layouts, and in the end I found to appreciate 
a layout similar to the one used in [...]

[...] of extreme example :-)

A textual matrix of bits like this is the input of a little 
nonogram puzzle:

0 1 1 1 1 0
1 0 0 1 1 1
1 0 1 1 1 1
1 1 1 1 1 1
0 1 1 1 1 0


A program has to produce an output like this, in the first part 
of the output it looks at the columns and counts the lengths of 
the groups of "1", and in the second part of the output it does 
the same on the rows:

3
1 2
1 3
5
5
3

4
1 3
1 4
6
4


This is a possible solution program:


import std.stdio, std.algorithm, std.string, std.range, std.conv;

void main() {

     auto t = "table.txt"
              .File()
              .byLine()
              .map!(r => r.removechars("^01".dup))()
              .array();

     const transposed = t[0]
                        .length
                        .iota()
                        .map!(i => t.transversal(i).array())()
                        .array();

     (t ~ [(char[]).init] ~ transposed)
     .map!(r => r
                .group()
                .filter!(p => p[0] == '1')()
                .map!(p => p[1].text())()
                .join(" ")
          )()
     .join("\n")
     .writeln();
}


(Note: the second argument of removechars is "^01".dup because 
removechars is a bit stupid: it requires the same type for 
both arguments, and the 'r' given by byLine() is a char[]. 
The code repeats the string->char[] conversion for every line 
because the typical inputs for this program are small; hoisting 
the .dup out of the map would be a premature optimization.)


As you see you have to break the lines, because the processing 
chains often become too long for single lines.
At first I put the dots at the end of the lines, but later I 
found that putting the dots at their start is better: it 
reminds me that we are still inside a processing chain.
Putting a single operation on each line (instead of two or three) 
helps readability, while still allowing a bit of single-line 
nesting like in ".map!(i => t.transversal(i).array())()".
And aligning the dots and the first part vertically helps the 
eye find what chain we are in. In the last part of the program 
you can see a nested chain too, inside a map.




[...] to use this kind of layout for this kind of 
higher-order-function-heavy code.

I have found that breaking the chains and giving variable names 
to the intermediate parts of those processing chains doesn't help 
the readability a lot, and the names for those intermediate 
temporary variables tend to be dull and repetitive. On the other 
hand putting just one processing step on each row gives space for 
a short comment on each row, where you think you need it:
[...]
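
A minimal sketch of what such per-row comments can look like, reusing the inner chain from the program above. The helper name runLengths and the in-memory rows (standing in for the cleaned lines of table.txt) are assumptions made for this illustration, not part of the original program:

```d
import std.algorithm : filter, group, map;
import std.array : join;
import std.conv : text;
import std.stdio : writeln;

// Hypothetical helper: the lengths of the runs of '1' in one row.
string runLengths(string row) {
    return row
           .group()                     // (value, run length) pairs
           .filter!(p => p[0] == '1')   // keep only the runs of '1'
           .map!(p => p[1].text())      // each run length as text
           .join(" ");                  // e.g. "1 3"
}

void main() {
    // Assumed sample data, taken from the rows of the matrix above.
    auto rows = ["011110", "100111", "101111", "111111", "011110"];

    rows
    .map!(r => r.runLengths())   // one summary per row
    .join("\n")                  // one row per output line
    .writeln();
}
```

With one step per line, each comment sits next to the operation it describes and the chain can still be read top to bottom.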









Bye,
bearophile

Jun 09 2012

Denis Shelomovskij <verylonglogin.reg gmail.com> writes:

09.06.2012 14:43, bearophile wrote:
 The introduction of UFCS in D offers new ways to format D code,
 especially when your code uses many high order functions.

I have to mention that one shouldn't write range-intensive D code for 
now. It's too risky to use high-level functions in D because they can 
cause all kinds of memory corruption, at least because of "Issue 7965 
- Invalid outer function scope pointer in some cases":
http://d.puremagic.com/issues/show_bug.cgi?id=7965

I'm not sure about other such issues, but that one alone makes 
high-level functions almost unusable because of the danger.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij

Jun 09 2012

"Nick Sabalausky" <SeeWebsiteToContactMe semitwist.com> writes:

"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:lkplokawtyisvwhvfsew forum.dlang.org...
 The introduction of UFCS in D offers new ways to format D code, especially 
 when your code uses many high order functions. What is a good layout of 
 the D code in such situations? I have tried several alternative layouts, 
 and in the end I found to appreciate a layout similar to the one used in 


[...]
     auto t = "table.txt"
              .File()
              .byLine()
              .map!(r => r.removechars("^01".dup))()
              .array();

     const transposed = t[0]
                        .length
                        .iota()
                        .map!(i => t.transversal(i).array())()
                        .array();

     (t ~ [(char[]).init] ~ transposed)
     .map!(r => r
                .group()
                .filter!(p => p[0] == '1')()
                .map!(p => p[1].text())()
                .join(" ")
          )()
     .join("\n")
     .writeln();
 }

That's basically what I do, except I always indent by exactly one tab.

Jun 10 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 09.06.2012 14:43, bearophile wrote:
[snip]
 import std.stdio, std.algorithm, std.string, std.range, std.conv;

 void main() {

 auto t = "table.txt"
 .File()
 .byLine()
 .map!(r => r.removechars("^01".dup))()
 .array();

 const transposed = t[0]
 .length
 .iota()

//hopefully I'm not alone in that this:
iota(t[0].length)
	.map!(...)
	...
is so much more readable.
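
For comparison, a small sketch of the two spellings side by side, assuming a non-empty rectangular int[][] matrix. The function names transposeUfcs and transposeCall are made up for this illustration. Both build the same result, so the choice is purely about readability:

```d
import std.algorithm : map;
import std.array : array;
import std.range : iota, transversal;
import std.stdio : writeln;

// UFCS all the way down: the chain starts from the first row.
int[][] transposeUfcs(int[][] t) {
    return t[0]
           .length
           .iota()
           .map!(i => t.transversal(i).array())
           .array();
}

// Plain call for the source range, UFCS only for the processing steps.
int[][] transposeCall(int[][] t) {
    return iota(t[0].length)
           .map!(i => t.transversal(i).array())
           .array();
}

void main() {
    auto m = [[1, 2, 3], [4, 5, 6]];
    writeln(transposeUfcs(m));
    writeln(transposeCall(m));
}
```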



-- 
Dmitry Olshansky

Jun 10 2012

"ixid" <nuaccount gmail.com> writes:

Having to append .array() all the time is rather annoying. I 
can't help but feel that there's a better solution than this. Are 
lazy Result methods really the default way of doing things? I'd 
rather have eager versions.

Jun 10 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 11.06.2012 1:49, ixid wrote:
 Having to append .array() all the time is rather annoying. I can't help
 but feel that there's a better solution than this. Are lazy Result
 methods really the default way of doing things? I'd rather have eager
 versions.

And then try to get the lazy version back. Now that would be harder than 
appending .array(), wouldn't it? ;)

-- 
Dmitry Olshansky

Jun 10 2012

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Sunday, 10 June 2012 at 21:49:14 UTC, ixid wrote:
 Having to append .array() all the time is rather annoying. I 
 can't help but feel that there's a better solution than this. 
 Are lazy Result methods really the default way of doing things? 
 I'd rather have eager versions.

Problem with eager versions is that you end up doing multiple 
memory allocations and passes when combining range manipulators. 
e.g.

r.take(n).filter!(blah).map!(whatever).array()

Only allocates once, and is only one linear pass over the memory.

If they were all eager, 'take' would create a new array, then 
'filter' would pass over that and create a new array, then 'map' 
would pass over that and create yet another array. Not very 
efficient.
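
A rough sketch of the difference, with concrete predicates standing in for 'blah' and 'whatever'. The function names lazyChain and eagerChain are assumptions for this illustration; eagerChain only imitates what all-eager versions would amount to by calling .array() after every step:

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : take;
import std.stdio : writeln;

int[] lazyChain(int[] r, size_t n) {
    // No intermediate arrays: only the final .array() allocates.
    return r.take(n)
            .filter!(x => x % 2 == 0)
            .map!(x => x * x)
            .array();
}

int[] eagerChain(int[] r, size_t n) {
    // One freshly allocated array and one full pass per step.
    auto taken    = r.take(n).array();
    auto filtered = taken.filter!(x => x % 2 == 0).array();
    return filtered.map!(x => x * x).array();
}

void main() {
    auto data = [1, 2, 3, 4, 5, 6];
    writeln(lazyChain(data, 4));
    writeln(eagerChain(data, 4));
}
```

Both functions return the same elements; the lazy one just gets there with a single allocation.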

Jun 10 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, June 10, 2012 23:49:12 ixid wrote:
 Having to append .array() all the time is rather annoying. I
 can't help but feel that there's a better solution than this. Are
 lazy Result methods really the default way of doing things? I'd
 rather have eager versions.

Eager versions are only good if you don't want to pass the result to another 
function. And since Phobos (and plenty of user code) uses ranges all over the 
place, you'd end up allocating memory all over the place, whereas right now, 
you only allocate it when you explicitly call a function which allocates 
memory to hold the data - e.g. array. Something like

foreach(e; map!"to!string(a)"(filter!"a <= 50"(arr)))
{
    doSomething(e);
    if(someCondition)
        break;
}

would end up allocating for both the result of filter and map, when it doesn't 
actually need to allocate for _either_ of them. Right now, it can process them 
lazily, only filtering and mapping the elements that actually get 
processed. By using lazy ranges, you avoid both unnecessary allocations and 
having to process all of the elements in a range if you don't need to.

On top of all of that, if a function doesn't return a lazy range, you _can't_ 
use it with infinite ranges, so if range-based functions all eagerly operated 
on ranges, infinite ranges would be useless.
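
As a small sketch of the infinite-range point: the chain below starts from an endless sequence and only becomes finite at take. The helper name firstMultiplesOfThree and the choice of std.range.recurrence as the infinite source are assumptions made for this illustration:

```d
import std.algorithm : filter, map;
import std.array : array;
import std.conv : to;
import std.range : recurrence, take;
import std.stdio : writeln;

// Hypothetical helper: the first `count` multiples of 3, as strings.
string[] firstMultiplesOfThree(size_t count) {
    return recurrence!"a[n-1] + 1"(1)   // 1, 2, 3, ... with no end
           .filter!(x => x % 3 == 0)    // still infinite, still lazy
           .map!(x => x.to!string)      // converted only on demand
           .take(count)                 // now finite
           .array();                    // the result array is built only here
}

void main() {
    writeln(firstMultiplesOfThree(4));
}
```

If filter or map were eager, the first step would never finish, because it would try to consume the whole sequence.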

- Jonathan M Davis

Jun 10 2012

"ixid" <nuaccount gmail.com> writes:

Thank you. May I ask though, is the argument against 
automatically appending .array() when a single or chain of lazy 
functions are used to set a variable or set of variables just 
syntactic salt against accidentally doing it eagerly?

Jun 10 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, June 11, 2012 02:11:24 ixid wrote:
 Thank you. May I ask though, is the argument against
 automatically appending .array() when a single or chain of lazy
 functions are used to set a variable or set of variables just
 syntactic salt against accidentally doing it eagerly?

D doesn't do much of anything like that automatically, Aside from the fact tha 
foreach supports the ranges, _nothing_ in the compiler supports them. They're 
entirely a library artifact. So, the compiler isn't going to do _anything_ to 
them automatically. And besides, it's not necessarily the case that you want  
the result of a range-based function to be an array. What if I want to keep 
the result of map on the stack?

auto a = map!"to!string(a)"(arr);

I could pass it to multiple functions later without having to allocate 
anything. Forcing it to be an array would stop that. And it would be downright 
nasty for that to happen when dealing with containers, because a number of 
containers require that you pass them the _exact_ range type that you give 
them (e.g. remove does this). Converting those ranges to arrays just because 
you assigned them to a variable would be a _big_ problem.
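
A minimal sketch of keeping a lazy result in a variable and handing it to several functions, assuming a non-empty int[] as input. The Summary struct and the summarize function are hypothetical names for this illustration:

```d
import std.algorithm : map, maxElement, sum;
import std.range : walkLength;
import std.stdio : writeln;

struct Summary { int total; int largest; size_t count; }

Summary summarize(int[] arr) {
    // `doubled` is a small struct wrapping `arr`, not a new array.
    auto doubled = arr.map!(x => x * 2);

    // Each call receives its own copy of that struct and walks it lazily.
    return Summary(doubled.sum, doubled.maxElement, doubled.walkLength);
}

void main() {
    writeln(summarize([3, 1, 4, 1, 5]));
}
```

Nothing here calls .array(), so the mapped values are recomputed on each pass rather than stored.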

And as for infinite ranges again, if they automatically were converted to 
arrays when assigned to variables, then you couldn't have variables for them 
at all, because they _have_ to be eager. Take a random number generator for 
instance. If all ranges were automatically converted to arrays when assigned 
to variables, then you couldn't do

auto generator = rndGen();

You'd be stuck in an infinite loop if you tried.

Having to use std.array.array to explicitly convert a range to an array really 
doesn't cost you much, and trying to do it automatically would be incredibly 
error-prone and costly. And really, the more that you use range-based 
functions, the less that you need to convert ranges to arrays.

- Jonathan M Davis

Jun 10 2012

"ixid" <nuaccount gmail.com> writes:

I must say that this is what I love about the D community- 
getting a very thorough answer to a rather basic and newbie 
question from someone who knows what they're talking about.

Jun 10 2012

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Mon, Jun 11, 2012 at 2:36 AM, ixid <nuaccount gmail.com> wrote:
 I must say that this is what I love about the D community- getting a very
 thorough answer to a rather basic and newbie question from someone who knows
 what they're talking about.

auto answers = questions.filter!(isBasic)
                        .filter!(fromNewbie)
                        .map!(jonathanMDavis)
                        .array();

Hey, you can even get lazy answers :)

Jun 11 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, June 10, 2012 17:31:37 Jonathan M Davis wrote:
 On Monday, June 11, 2012 02:11:24 ixid wrote:
 Thank you. May I ask though, is the argument against
 automatically appending .array() when a single or chain of lazy
 functions are used to set a variable or set of variables just
 syntactic salt against accidentally doing it eagerly?


Ouch, I obviously need to do a better job of re-reading my posts.

 D doesn't do much of anything like that automatically, Aside from the fact
 tha foreach supports the ranges, _nothing_ in the compiler supports them.

"D doesn't do much of anything like that automatically_._ Aside from the fact 
_that_ foreach supports ranges, nothing in the compiler supports them."

 They're entirely a library artifact. So, the compiler isn't going to do
 _anything_ to them automatically. And besides, it's not necessarily the
 case that you want the result of a range-based function to be an array.
 What if I want to keep the result of map on the stack?
 
 auto a = map!"to!string(a)"(arr);
 
 I could pass it to multiple functions later without having to allocate
 anything. Forcing it to be an array would stop that. And it would be
 downright nasty for that to happen when dealing with containers, because a
 number of containers require that you pass them the _exact_ range type that
 you give them (e.g. remove does this).

"a number of containers require that you pass them the exact range type that 
_they_ give _you_..."

 Converting those ranges to arrays
 just because you assigned them to a variable would be a _big_ problem.
 
 And as for infinite ranges again, if they automatically were converted to
 arrays when assigned to variables, then you couldn't have variables for them
 at all, because they _have_ to be eager.

"because they have to be _lazy_."

- Jonathan M Davis

Jun 10 2012

"bearophile" <bearophileHUGS lycos.com> writes:

ixid:

 Having to append .array() all the time is rather annoying. I 
 can't help but feel that there's a better solution than this. 
 Are lazy Result methods really the default way of doing things? 
 I'd rather have eager versions.

In most cases having a lazy range is the right default. On the 
other hand in D many times you can't use lazy ranges, you need 
arrays, so the ".array()" is quite common after a map or filter. 
So I propose to add to Phobos two small functions named 
amap/afilter that produce arrays:
http://d.puremagic.com/issues/show_bug.cgi?id=5756
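
A minimal sketch of what such helpers could look like, assuming they simply wrap map/filter followed by array. The names amap and afilter come from the proposal, but these definitions are an illustration only and are not part of Phobos:

```d
import std.algorithm : filter, map;
import std.array : array;
import std.stdio : writeln;

// Hypothetical eager map: applies `fun` and returns a new array.
auto amap(alias fun, Range)(Range r) {
    return r.map!fun.array();
}

// Hypothetical eager filter: keeps matching elements in a new array.
auto afilter(alias pred, Range)(Range r) {
    return r.filter!pred.array();
}

void main() {
    auto xs = [1, 2, 3, 4];
    writeln(xs.amap!(x => x * 10));
    writeln(xs.afilter!(x => x % 2 == 0));
}
```

Chaining two of these allocates an array at each step, so they fit best at the end of a chain where an array is needed anyway.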

If you think those are redundant, then first of all take a look 
here:
http://d.puremagic.com/issues/show_bug.cgi?id=8155

Bye,
bearophile

Jun 11 2012
