## digitalmars.D.learn - Function to print a diamond shape

• Ali Çehreli (16/16) Mar 20 2014 This is a somewhat common little exercise: Write a function that takes
• Justin Whear (2/22) Mar 20 2014 What's the appropriate output for an even number?
• Chris Williams (6/8) Mar 20 2014 Well one of the more convoluted methods that I can think of would
• Brad Anderson (16/33) Mar 20 2014 I'm not entirely happy with it but:
• monarch_dodra (10/12) Mar 20 2014 I'd be interested in seeing a solution using "iota", and the
• Timon Gehr (5/21) Mar 20 2014 import std.stdio, std.range, std.algorithm, std.math;
• bearophile (135/150) Mar 20 2014 Some of my solutions (using each() in the last two is easy):
• Jay Norwood (20/23) Mar 20 2014 I like that replicate but easier for me to keep track of the
• Jay Norwood (27/27) Mar 21 2014 This one calculates, then outputs subranges of the ba and sa char
• Ali Çehreli (34/36) Mar 20 2014 I have learned a lot, especially the following two:
• Sergei Nosov (9/26) Mar 21 2014 Probably, the most boring way is
• Andrea Fontana (2/10) Mar 21 2014 A single foreach(i; 0..N*N) is more boring!
• Vladimir Panteleev (2/32) Mar 21 2014
• Sergei Nosov (3/38) Mar 21 2014 Beat me. Yours is even more boring. =)
• Jay Norwood (12/12) Mar 22 2014 The computation times of different methods can differ a lot.
• Jay Norwood (16/16) Mar 22 2014 I decided to redirect stdout to nul and print the stopwatch
• Ali Çehreli (4/5) Mar 22 2014 Cool. stderr should work too:
• Jay Norwood (6/6) Mar 23 2014 Hmmm, looks like stderr.writefln requires format specs, else it
• Jay Norwood (64/64) Mar 23 2014 I converted the solution examples to functions, wrote a test to
• Jay Norwood (13/13) Mar 23 2014 A problem with the previous brad measurement is that his solution
• bearophile (6/12) Mar 23 2014 The task didn't ask for a computationally efficient solution :-)
• Jay Norwood (5/10) Mar 23 2014 Yes, this is just for my own education. My builds are using the
• Jay Norwood (30/30) Mar 23 2014 These were the times on ubuntu 64 bit dmd. I added diamondShape,
• monarch_dodra (23/35) Mar 24 2014 So it's about speed now? Then I submit this:
• Jay Norwood (12/12) Mar 24 2014 Very nice example. I'll test on ubuntu later.
• Luís Marques (26/32) Mar 23 2014 I used this to benchmark H. S. Teoh's calendar formatter:
• bearophile (32/33) Mar 24 2014 if you like similar puzzles, here is another:
• Jay Norwood (40/40) Mar 24 2014 not through yet with the diamond. This one is a little faster.
• Jay Norwood (37/37) Mar 24 2014 These were times on ubuntu. I may have printed debug build times
• monarch_dodra (41/81) Mar 25 2014 Interesting. I'd have thought the "extra copy" would be an
• Jay Norwood (2/42) Mar 25 2014 ok. I'll try it. I was happy the appender was pretty fast.
• Jay Norwood (14/14) Mar 25 2014 These are times on ubuntu. printDiamond3 was slower than
• monarch_dodra (9/11) Mar 25 2014 Hum... Too bad :/
• Jay Norwood (7/9) Mar 25 2014 Yes, I'm pretty happy to see the appender works well. The
• Jay Norwood (26/26) Mar 25 2014 This is a first attempt at using parallel, but no improvement in
• Jay Norwood (2/3) Mar 25 2014 oops. scratch that one. I tested a pointer to the wrong function.
• Jay Norwood (23/23) Mar 25 2014 This corrects the parallel example range in the second foreach.
• Jay Norwood (62/64) Apr 20 2014 I installed ubuntu 14.04 64 bit, and measured some of these
• monarch_dodra (32/55) Apr 21 2014 With this slightly tweaked solution, I can get times of roughly
• Jay Norwood (49/56) Apr 21 2014 Yes your solution is the fastest yet. Also, its times are
• Jay Norwood (52/52) Apr 22 2014 Wow, joiner is much slower than join. Such a small choice can
• monarch_dodra (11/15) Apr 22 2014 Yeah, that's because join actually works on "RoR, R", rather than
• Jay Norwood (7/18) Apr 23 2014 Ok, thanks. I re-tried joiner with both parameters being ranges,
• monarch_dodra (5/23) Apr 22 2014 I'm not sure what you mean? "data" returns the managed array, but
• bearophile (4/9) Mar 28 2014 It used to work, but with the latest changes I think I have
Ali Çehreli <acehreli yahoo.com> writes:
```This is a somewhat common little exercise: Write a function that takes
the size of a diamond and produces a diamond of that size.

When printed, here is the output for size 11:

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

What interesting, boring, efficient, slow, etc. ways are there?

Ali
```
Mar 20 2014
Justin Whear <justin economicmodeling.com> writes:
```On Thu, 20 Mar 2014 14:25:02 -0700, Ali Çehreli wrote:

This is a somewhat common little exercise: Write a function that takes
the size of a diamond and produces a diamond of that size.

When printed, here is the output for size 11:

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

What interesting, boring, efficient, slow, etc. ways are there?

Ali

What's the appropriate output for an even number?
```
Mar 20 2014
Ali Çehreli <acehreli yahoo.com> writes:
```On 03/20/2014 02:30 PM, Justin Whear wrote:

What's the appropriate output for an even number?

Great question! :) Size must be odd. I have this in my function:

enforce(size % 2,
format("Size cannot be an even number. (%s)", size));

Ali
```
Mar 20 2014
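Below is a minimal sketch of a complete function built around Ali's `enforce` check. It assumes the caller wants the rows back as strings instead of printed. The name `diamondLines` is hypothetical, and its rows carry no trailing spaces.

```d
import std.array : replicate;
import std.exception : assertThrown, enforce;
import std.format : format;
import std.stdio : writefln;

/// Hypothetical helper: returns the rows of a diamond of the given odd size.
string[] diamondLines(size_t size)
{
    // Same check as in the post: an even size has no single widest row.
    enforce(size % 2, format("Size cannot be an even number. (%s)", size));

    immutable half = size / 2;
    string[] rows;
    foreach (row; 0 .. size)
    {
        // The distance from the middle row decides the indentation.
        immutable d = row > half ? row - half : half - row;
        rows ~= " ".replicate(d) ~ "*".replicate(size - 2 * d);
    }
    return rows;
}

void main()
{
    assertThrown(diamondLines(4));          // even sizes are rejected
    writefln("%-(%s\n%)", diamondLines(11)); // one row per line
}
```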
"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:
```On Thursday, 20 March 2014 at 21:25:03 UTC, Ali Çehreli wrote:
What interesting, boring, efficient, slow, etc. ways are there?

Ali

Well one of the more convoluted methods that I can think of would
be to define a square as a set of four vectors, rotate 45
degrees, and then create a rasterizer that checks for the
presence of the rect at sequential points, and plots those to the
console.
```
Mar 20 2014
Ali Çehreli <acehreli yahoo.com> writes:
```On 03/20/2014 02:52 PM, Chris Williams wrote:
On Thursday, 20 March 2014 at 21:25:03 UTC, Ali Çehreli wrote:
What interesting, boring, efficient, slow, etc. ways are there?

Ali

Well one of the more convoluted methods that I can think of would be to
define a square as a set of four vectors, rotate 45 degrees, and then
create a rasterizer that checks for the presence of the rect at
sequential points, and plots those to the console.

A slightly convoluted solution that I've come up with considers the
diamond as three pieces:

1) Top triangle

2) The widest line

3) The bottom triangle, which happens to be the .retro of the first part

auto bottomHalf = topHalf.retro;

auto diamond = chain(topHalf, widestLine, bottomHalf).joiner("\n");

Ali
```
Mar 20 2014
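Ali's three-piece description can be filled in roughly as shown below. This is a sketch, not Ali's actual code: `threePieceDiamond` is a made-up name, and the function assumes an odd size and uses `std.string.center` for the padding. The bottom half really is just `topHalf.retro`, because `map` over `iota` stays bidirectional.

```d
import std.algorithm : joiner, map;
import std.array : replicate;
import std.range : chain, iota, only, retro;
import std.stdio : writeln;
import std.string : center;

/// Hypothetical name; lazily produces the whole diamond as one character range.
auto threePieceDiamond(size_t size)
{
    // 1) Top triangle: widths 1, 3, ..., size - 2, each centred in `size` columns.
    auto topHalf = iota(1, size, 2).map!(w => "*".replicate(w).center(size));
    // 2) The widest line.
    auto widestLine = "*".replicate(size);
    // 3) The bottom triangle is the top one walked backwards.
    auto bottomHalf = topHalf.retro;

    return chain(topHalf, only(widestLine), bottomHalf).joiner("\n");
}

void main()
{
    writeln(threePieceDiamond(11));
}
```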
"Brad Anderson" writes:
```On Thursday, 20 March 2014 at 21:25:03 UTC, Ali Çehreli wrote:
This is a somewhat common little exercise: Write a function
that takes the size of a diamond and produces a diamond of that
size.

When printed, here is the output for size 11:

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

What interesting, boring, efficient, slow, etc. ways are there?

Ali

I'm not entirely happy with it but:

void main()
{
import std.algorithm, std.range, std.stdio, std.conv;

enum length = 5;
auto rng =
chain(iota(length), iota(length, -1, -1))
.map!((a => " ".repeat(length-a)),
(a => "#".repeat(a*2+1)))
.map!(a => chain(a[0].joiner, a[1].joiner, "\n"))
.joiner;

writeln(rng);
}

Had some trouble with the result coming out as integers instead of
something string-like.
```
Mar 20 2014
Ali Çehreli <acehreli yahoo.com> writes:
```On 03/20/2014 03:03 PM, Brad Anderson wrote:

I'm not entirely happy with it but:

I am not happy with my attempt either. :)

void main()
{
import std.algorithm, std.range, std.stdio, std.conv;

enum length = 5;
auto rng =
chain(iota(length), iota(length, -1, -1))

Ooh. I like that. That would have never occurred to me. :)

.map!((a => " ".repeat(length-a)),
(a => "#".repeat(a*2+1)))
.map!(a => chain(a[0].joiner, a[1].joiner, "\n"))
.joiner;

writeln(rng);
}

Does that compile for you? Failed for me with v2.066-devel-d0f461a:

./phobos/std/typetuple.d(550): Error: template instance F!(__lambda1)
cannot use local '__lambda1' as parameter to non-global template
AppliedReturnType(alias f)
./phobos/std/typetuple.d(556): Error: template instance
deneme.main.staticMap!(AppliedReturnType, __lambda1) error instantiating
./phobos/std/algorithm.d(404):        instantiated from here:
staticMap!(AppliedReturnType, __lambda1, __lambda2)
deneme.d(161788):        instantiated from here: map!(Result)

That is pointing at this line:

.map!((a => " ".repeat(length-a)),

A regression?

Had some trouble with the result coming out as integers instead of
something string-like.

I had the same problem at one point. I will try to understand when that
happens.

Ali
```
Mar 20 2014
"Brad Anderson" writes:
```On Thursday, 20 March 2014 at 22:46:53 UTC, Ali Çehreli wrote:
On 03/20/2014 03:03 PM, Brad Anderson wrote:

I'm not entirely happy with it but:

I am not happy with my attempt either. :)

void main()
{
import std.algorithm, std.range, std.stdio, std.conv;

enum length = 5;
auto rng =
chain(iota(length), iota(length, -1, -1))

Ooh. I like that. That would have never occurred to me. :)

It felt kind of clumsy when I ended up with it. I don't think it
shows my intent very well (repeat the range in reverse). I wish
Phobos had something like a mirror() range (i.e. chain(rng,
rng.retro())).

.map!((a => " ".repeat(length-a)),
(a => "#".repeat(a*2+1)))
.map!(a => chain(a[0].joiner, a[1].joiner, "\n"))
.joiner;

writeln(rng);
}

Does that compile for you? Failed for me with
v2.066-devel-d0f461a:
[snip]
A regression?

I did it on dpaste which is using 2.065 so I suspect regression.

http://dpaste.dzfl.pl/71c331960cb0

Had some trouble with the result coming out as integers instead of
something string-like.

I had the same problem at one point. I will try to understand
when that happens.

Ali

I was getting the integers when I was using character literals
with repeat() rather than string literals.
```
Mar 20 2014
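Brad's wished-for `mirror()` is easy to sketch as a small helper. Note that `mirror` is hypothetical; it is not a Phobos function. It also differs from the `chain(rng, rng.retro())` Brad describes: this version drops the first element of the reversed half so the widest row is not repeated. With it, the diamond can take the final size instead of the half length.

```d
import std.algorithm : map;
import std.array : replicate;
import std.range : chain, dropOne, iota, retro;
import std.stdio : writefln;

/// Hypothetical helper (not in Phobos): the range followed by its reverse,
/// without repeating the element at the turning point.
auto mirror(R)(R r)
{
    return chain(r, r.retro.dropOne);
}

/// Brad-style rows, but parameterised by the final (odd) size.
auto mirroredDiamond(size_t size)
{
    immutable half = size / 2;
    return iota(half + 1).mirror
        .map!(k => " ".replicate(half - k) ~ "*".replicate(2 * k + 1));
}

void main()
{
    writefln("%-(%s\n%)", mirroredDiamond(11));
}
```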
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Thursday, 20 March 2014 at 21:25:03 UTC, Ali Çehreli wrote:
What interesting, boring, efficient, slow, etc. ways are there?

Ali

I'd be interested in seeing a solution using "iota", and the
currently proposed "each" or "tee". A quick prototype to draw a
triangle would be:

iota(0, n).each!(a=>q{%(*%)}.writefln(a.iota()))();
or
iota(0, n).each!(a=>'*'.repeat(a).writeln())();

Adapting that to do a diamond should be straightforward? It
would be a good benchmark of functional vs imperative code, and
the usability of "each".
```
Mar 20 2014
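Here is one way monarch_dodra's `iota` + `each` prototype might stretch to the full diamond. `each` has since been added to `std.algorithm`. The function name `eachDiamond` is hypothetical, and it collects the rows in an `Appender` instead of writing them directly, so the result can be inspected.

```d
import std.algorithm : each;
import std.array : appender;
import std.range : chain, iota, repeat;
import std.stdio : write;

/// Hypothetical name: builds the diamond text with iota + each.
string eachDiamond(int size)
{
    auto app = appender!string();
    // Row widths 1, 3, ..., size, ..., 3, 1.
    chain(iota(1, size + 1, 2), iota(size - 2, 0, -2))
        .each!((int w) {
            app.put(' '.repeat((size - w) / 2)); // indentation
            app.put('*'.repeat(w));              // the stars themselves
            app.put('\n');
        });
    return app.data;
}

void main()
{
    write(eachDiamond(11));
}
```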
Timon Gehr <timon.gehr gmx.ch> writes:
```On 03/20/2014 10:25 PM, Ali Çehreli wrote:
This is a somewhat common little exercise: Write a function that takes
the size of a diamond and produces a diamond of that size.

When printed, here is the output for size 11:

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

What interesting, boring, efficient, slow, etc. ways are there?

Ali

import std.stdio, std.range, std.algorithm, std.math;

enum s=11;
writef("%(%s\n%)", (i=>i.map!(a=>i.map!(b=>"* "[a+b>s/2])))
(iota(-s/2,s/2+1).map!abs));
```
Mar 20 2014
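Timon's one-liner marks a cell as a star when the absolute row offset plus the absolute column offset from the centre is at most `s/2`. The sketch below spells that out as a named function that returns one string per row; `manhattanDiamond` is a hypothetical name. The expression `"* "[cond]` relies on `false` indexing the `'*'` and `true` indexing the space.

```d
import std.algorithm : map;
import std.array : array;
import std.conv : text;
import std.math : abs;
import std.range : iota;
import std.stdio : writefln;

/// Hypothetical name: Timon's distance test, returning the rows.
string[] manhattanDiamond(int s)
{
    // Absolute offsets from the centre: s/2, ..., 1, 0, 1, ..., s/2.
    auto offsets = iota(-s / 2, s / 2 + 1).map!(x => abs(x));
    // A cell is a star when rowOffset + colOffset <= s/2.
    return offsets.map!(a => offsets.map!(b => "* "[a + b > s / 2]).text).array;
}

void main()
{
    writefln("%-(%s\n%)", manhattanDiamond(11));
}
```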
Ali Çehreli <acehreli yahoo.com> writes:
```On 03/20/2014 03:48 PM, Timon Gehr wrote:
On 03/20/2014 10:25 PM, Ali Çehreli wrote:
This is a somewhat common little exercise: Write a function that takes
the size of a diamond and produces a diamond of that size.

When printed, here is the output for size 11:

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

What interesting, boring, efficient, slow, etc. ways are there?

Ali

import std.stdio, std.range, std.algorithm, std.math;

enum s=11;
writef("%(%s\n%)", (i=>i.map!(a=>i.map!(b=>"* "[a+b>s/2])))
(iota(-s/2,s/2+1).map!abs));

Sweet! :)

"* "[a+b>s/2]    // loving it

Ali
```
Mar 20 2014
"bearophile" <bearophileHUGS lycos.com> writes:
```Ali Çehreli:

This is a somewhat common little exercise: Write a function
that takes the size of a diamond and produces a diamond of that
size.

When printed, here is the output for size 11:

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

Some of my solutions (using each() in the last two is easy):

import std.stdio, std.array, std.string, std.range,
std.algorithm, std.math;

void printDiamond1(in uint n) {
immutable k = (n % 2 == 1) ? 1 : 2;

for (int i = k; i <= n; i += 2)
writeln("*".replicate(i).center(n));

for (int i = n - 2; i >= k; i -= 2)
writeln("*".replicate(i).center(n));
}

void printDiamond2(in int n) {
iota(!(n % 2), n)
.map!(i => "*"
.replicate((n % 2) + ((n / 2) - abs(i - (n / 2)))
* 2)
.center(n))
.join("\n")
.writeln;
}

void printDiamond3(in int n) {
writefln("%-(%s\n%)",
iota(!(n % 2), n)
.map!(i => "*"
.replicate((n % 2) + ((n / 2) - abs(i -
(n / 2))) * 2)
.center(n)));
}

void main() {
foreach (immutable i; 0 .. 15) {
printDiamond3(i);
writeln;
}
}

Output:

*

**

 *
***
 *

 **
****
 **

  *
 ***
*****
 ***
  *

  **
 ****
******
 ****
  **

   *
  ***
 *****
*******
 *****
  ***
   *

   **
  ****
 ******
********
 ******
  ****
   **

    *
   ***
  *****
 *******
*********
 *******
  *****
   ***
    *

    **
   ****
  ******
 ********
**********
 ********
  ******
   ****
    **

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

     **
    ****
   ******
  ********
 **********
************
 **********
  ********
   ******
    ****
     **

      *
     ***
    *****
   *******
  *********
 ***********
*************
 ***********
  *********
   *******
    *****
     ***
      *

      **
     ****
    ******
   ********
  **********
 ************
**************
 ************
  **********
   ********
    ******
     ****
      **

Bye,
bearophile
```
Mar 20 2014
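bearophile's versions also accept even sizes by starting from two stars instead of one. Below is an eager sketch of `printDiamond1`'s logic that returns the rows instead of printing them; the name `diamondAnyParity` is made up for illustration. The shrinking loop uses a signed counter, so `n - 2` cannot wrap around for tiny sizes.

```d
import std.array : replicate;
import std.stdio : writefln;
import std.string : center;

/// Hypothetical name: printDiamond1's rows, collected instead of printed.
string[] diamondAnyParity(uint n)
{
    string[] rows;
    // The narrowest row has 1 star for odd n and 2 stars for even n.
    immutable uint k = (n % 2 == 1) ? 1 : 2;
    // Growing half, up to and including the widest row of n stars.
    for (uint w = k; w <= n; w += 2)
        rows ~= "*".replicate(w).center(n);
    // Shrinking half; signed so that n - 2 does not wrap for n < 2.
    for (int w = cast(int) n - 2; w >= cast(int) k; w -= 2)
        rows ~= "*".replicate(w).center(n);
    return rows;
}

void main()
{
    writefln("%-(%s\n%)", diamondAnyParity(6));
}
```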
"Jay Norwood" <jayn prismnet.com> writes:
```On Friday, 21 March 2014 at 00:31:58 UTC, bearophile wrote:
This is a somewhat common little exercise: Write a function

Bye,
bearophile

I like that replicate, but it's easier for me to keep track of the
counts if I work from the center.

int[] blanks;
blanks.length = n;
int[] stars;
stars.length = n;

int c = n/2; // center of diamond
int cp1 = c+1;
blanks[c]=0;
stars[c]=n;

// calculate stars and blanks in each row
for(int i=1; i<cp1; i++){
blanks[c-i] = blanks[c+i] = i;
stars[c-i] = stars[c+i] = n - (i*2);
}

for (int i=0; i<n; i++){
write(" ".replicate(blanks[i]));
writeln("*".replicate(stars[i]));
}
```
Mar 20 2014
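Jay's two tables have simple closed forms. Row `i` is `abs(i - c)` rows away from the centre `c = n/2`, and that distance is its number of blanks. The same row has `n - 2*abs(i - c)` stars. The sketch below computes the tables that way; it assumes odd `n`, and the helper names are hypothetical.

```d
import std.algorithm : map;
import std.array : array, replicate;
import std.math : abs;
import std.range : iota;
import std.stdio : writeln;

/// Blanks per row: the distance of row i from the centre row c = n/2.
int[] blanksTable(int n)
{
    immutable c = n / 2;
    return iota(n).map!(i => abs(i - c)).array;
}

/// Stars per row: whatever is left of the width after blanks on both sides.
int[] starsTable(int n)
{
    return blanksTable(n).map!(b => n - 2 * b).array;
}

void main()
{
    enum n = 11;
    auto blanks = blanksTable(n);
    auto stars = starsTable(n);
    foreach (i; 0 .. n)
        writeln(" ".replicate(blanks[i]), "*".replicate(stars[i]));
}
```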
"Jay Norwood" <jayn prismnet.com> writes:
```This one calculates, then outputs subranges of the ba and sa char
arrays.

int n = 11;
int[] blanks;
blanks.length = n;
int[] stars;
stars.length = n;
char[] ba;
ba.length = n;
ba[] = ' '; // fill full ba array
char[] sa;
sa.length = n;
sa[] = '*'; // fill full sa array

int c = n/2; // center of diamond
int cp1 = c+1;
blanks[c]=0;
stars[c]=n;

// calculate stars and blanks in each row
for(int i=1; i<cp1; i++){
blanks[c-i] = blanks[c+i] = i;
stars[c-i] = stars[c+i] = n - (i*2);
}

// output subranges of the ba and sa char arrays
for (int i=0; i<n; i++){
write(ba[$-blanks[i]..$]);
writeln(sa[$-stars[i]..$]);
}
```
Mar 21 2014
Ali Çehreli <acehreli yahoo.com> writes:
```On 03/20/2014 02:25 PM, Ali Çehreli wrote:

Write a function that takes
the size of a diamond and produces a diamond of that size.

I have learned a lot, especially the following two:

1) chain'ing iotas is an effective way of producing non-monotonic number
intervals (and more).

2) There is std.string.center. :)

Also considering readability, here is my favorite so far:

auto diamondShape(size_t N, dchar fillChar = '*')
{
import std.range : chain, iota, repeat;
import std.algorithm : map;
import std.conv : text;
import std.string : center, format;
import std.exception : enforce;

enforce(N % 2, format("Size must be an odd number. (%s)", N));

return
chain(iota(1, N, 2),
iota(N, 0, -2))
.map!(i => fillChar.repeat(i))
.map!(s => s.text)
.map!(s => s.center(N));
}

unittest
{
import std.exception : assertThrown;
import std.algorithm : equal;

assertThrown(diamondShape(4));
assert(diamondShape(3, 'o').equal([ " o ", "ooo", " o " ]));
}

void main()
{
import std.stdio : writefln;

writefln("%-(%s\n%)", diamondShape(11));
}

Ali
```
Mar 20 2014
"Sergei Nosov" <sergei.nosov gmail.com> writes:
```On Thursday, 20 March 2014 at 21:25:03 UTC, Ali Çehreli wrote:
This is a somewhat common little exercise: Write a function
that takes the size of a diamond and produces a diamond of that
size.

When printed, here is the output for size 11:

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

What interesting, boring, efficient, slow, etc. ways are there?

Ali

Probably, the most boring way is

foreach(i; 0..N)
{
foreach(j; 0..N)
write(" *"[i + j >= N/2 && i + j < 3*N/2 && i - j <= N/2
&& j - i <= N/2]);
writeln;
}
```
Mar 21 2014
"Andrea Fontana" <nospam example.com> writes:
```On Friday, 21 March 2014 at 12:32:58 UTC, Sergei Nosov wrote:
Probably, the most boring way is

foreach(i; 0..N)
{
foreach(j; 0..N)
write(" *"[i + j >= N/2 && i + j < 3*N/2 && i - j <=
N/2 && j - i <= N/2]);
writeln;
}

A single foreach(i; 0..N*N) is more boring!
```
Mar 21 2014
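Andrea's single-loop variant can be sketched by deriving the row and column from one index over the `N*N` cells. It uses the same distance test as Vladimir's `abs` version. The function name is hypothetical, and the function returns the text, including trailing spaces, instead of printing it.

```d
import std.array : appender;
import std.math : abs;
import std.stdio : write;

/// Hypothetical name: one loop over all n*n cells.
string singleLoopDiamond(int n)
{
    auto app = appender!string();
    immutable h = n / 2;
    foreach (i; 0 .. n * n)
    {
        immutable row = i / n, col = i % n;
        // Star when the Manhattan distance from the centre is at most n/2.
        app.put(abs(row - h) + abs(col - h) <= h ? '*' : ' ');
        if (col == n - 1)
            app.put('\n'); // end of a row
    }
    return app.data;
}

void main()
{
    write(singleLoopDiamond(11));
}
```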
"Vladimir Panteleev" writes:
```On Friday, 21 March 2014 at 12:32:58 UTC, Sergei Nosov wrote:
On Thursday, 20 March 2014 at 21:25:03 UTC, Ali Çehreli wrote:
This is a somewhat common little exercise: Write a function
that takes the size of a diamond and produces a diamond of
that size.

When printed, here is the output for size 11:

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

What interesting, boring, efficient, slow, etc. ways are there?

Ali

Probably, the most boring way is

foreach(i; 0..N)
{
foreach(j; 0..N)
write(" *"[i + j >= N/2 && i + j < 3*N/2 && i - j <=
N/2 && j - i <= N/2]);

write(" *"[abs(i-N/2) + abs(j-N/2) <= N/2]);

writeln;
}

```
Mar 21 2014
"Sergei Nosov" <sergei.nosov gmail.com> writes:
```On Friday, 21 March 2014 at 13:59:27 UTC, Vladimir Panteleev
wrote:
On Friday, 21 March 2014 at 12:32:58 UTC, Sergei Nosov wrote:
On Thursday, 20 March 2014 at 21:25:03 UTC, Ali Çehreli wrote:
This is a somewhat common little exercise: Write a function
that takes the size of a diamond and produces a diamond of
that size.

When printed, here is the output for size 11:

     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *

What interesting, boring, efficient, slow, etc. ways are
there?

Ali

Probably, the most boring way is

foreach(i; 0..N)
{
foreach(j; 0..N)
write(" *"[i + j >= N/2 && i + j < 3*N/2 && i - j <=
N/2 && j - i <= N/2]);

write(" *"[abs(i-N/2) + abs(j-N/2) <= N/2]);

writeln;
}

Beat me. Yours is even more boring. =)
```
Mar 21 2014
"Jay Norwood" <jayn prismnet.com> writes:
```The computation times of different methods can differ a lot.
How do you suggest to measure this effectively without the
overhead of the write and writeln output?   Would a count of
100001 and stubs like below be reasonable, or would there be
something else that would  prevent the optimizer from getting too
aggressive?

void writelnx(T...)(T args)
{
}
void writex(T...)(T args)
{
}
```
Mar 22 2014
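This approach is a sketch, not something proposed in the thread. One way to take console output out of the measurement is to write the diamond into an output range: an `Appender` while benchmarking, `stdout.lockingTextWriter` when printing. `diamondTo` is a hypothetical name. Reporting the produced length is meant to keep the work observable, though whether that fully defeats a given optimizer is not guaranteed.

```d
import std.array : appender;
import std.range.primitives : put;
import std.stdio : stderr, stdout;

/// Hypothetical variant: writes into any output range instead of calling write.
void diamondTo(Sink)(ref Sink sink, size_t size)
{
    immutable half = size / 2;
    foreach (row; 0 .. size)
    {
        immutable d = row > half ? row - half : half - row; // distance from the middle row
        foreach (_; 0 .. d)
            put(sink, ' ');
        foreach (_; 0 .. size - 2 * d)
            put(sink, '*');
        put(sink, '\n');
    }
}

void main()
{
    // Benchmark-style use: render into memory and report something about the result.
    auto app = appender!string();
    diamondTo(app, 1001);
    stderr.writeln("rendered ", app.data.length, " bytes");

    // The same code can still print, given a writer for stdout.
    auto w = stdout.lockingTextWriter();
    diamondTo(w, 11);
}
```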
"Jay Norwood" <jayn prismnet.com> writes:
```I decided to redirect stdout to nul and print the stopwatch
messages to stderr.
So, basically like this.

import std.stdio;
import std.datetime;
import std.cstream;

StopWatch sw;
sw.start();

// measured code

sw.stop();
derr.writefln("time: ", sw.peek().msecs, "[ms]");

Then, the Windows results comparing two versions for n=2001 show
that one form is about 3x faster when stdout is redirected to nul.

D:\diamond\diamond\diamond\Release>diamond 1>nul
time: 15[ms]
time: 42[ms]
```
Mar 22 2014
Ali Çehreli <acehreli yahoo.com> writes:
```On 03/22/2014 06:03 PM, Jay Norwood wrote:

derr.writefln("time: ", sw.peek().msecs, "[ms]");

Cool. stderr should work too:

stderr.writefln(/* ... */);

Ali
```
Mar 22 2014
"Jay Norwood" <jayn prismnet.com> writes:
```Hmmm, looks like stderr.writefln requires format specs, else it
omits the additional parameters. (not so on derr.writefln)

stderr.writefln("time: %s%s",sw.peek().msecs, "[ms]");

D:\diamond\diamond\diamond\Release>diamond 1>nul
time: 16[ms]
time: 44[ms]
```
Mar 23 2014
"Jay Norwood" <jayn prismnet.com> writes:
```I converted the solution examples to functions, wrote a test to
measure each 100 times with a diamond of size 1001.  These are
release build times.  timon's crashed so I took it out.  Maybe I
made a mistake copying ... have to go back and look.

D:\diamond\diamond\diamond\Release>diamond 1>nul
printDiamond1: time: 1166[ms]
printDiamond2: time: 1659[ms]
printDiamond3: time: 631[ms]
jay1: time: 466[ms]
sergei: time: 11944[ms]
jay2: time: 414[ms]

These are the measurement functions:

void measure( void function(in int a) func, int times, int
diamondsz, string name ){
StopWatch sw;
sw.start();
for (int i=0; i<times; i++){
func(diamondsz);
}
sw.stop;
stderr.writeln(name, ": time: ", sw.peek().msecs, "[ms]");
}

void measureu( void function(in uint a) func, int times, uint
diamondsz, string name ){
StopWatch sw;
sw.start();
for (int i=0; i<times; i++){
func(diamondsz);
}
sw.stop;
stderr.writeln(name, ": time: ", sw.peek().msecs, "[ms]");
}

int main(string[] argv)
{
int times = 100;
int dsz = 1001;
uint dszu = 1001;
//measure (&timon,times,dsz,"timon");
measureu (&printDiamond1,times,dszu,"printDiamond1");
measure (&printDiamond2,times,dsz,"printDiamond2");
measure (&printDiamond3,times,dsz,"printDiamond3");
measure (&jay1,times,dsz,"jay1");
measure (&sergei,times,dsz,"sergei");
measure (&jay2,times,dsz,"jay2");

return 0;

}

All the functions are like this:
void brad(in int length){
import std.algorithm, std.range, std.stdio, std.conv;

auto rng =
chain(iota(length), iota(length, -1, -1))
.map!((a => " ".repeat(length-a)),
(a => "#".repeat(a*2+1)))
.map!(a => chain(a[0].joiner, a[1].joiner, "\n"))
.joiner;

writeln(rng);
}

void timon(in int s){
import std.stdio, std.range, std.algorithm, std.math;

writef("%(%s\n%)", (i=>i.map!(a=>i.map!(b=>"* "[a+b>s/2])))
(iota(-s/2,s/2+1).map!abs));
}
```
Mar 23 2014
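The `measure`/`measureu` copies differ only in the parameter type, so a single template with an alias parameter could replace both. The sketch assumes a current compiler: `std.cstream` has since been removed from Phobos, and `StopWatch` now lives in `std.datetime.stopwatch`. `printDiamondSimple` is just a hypothetical stand-in for the functions being compared.

```d
import std.array : replicate;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : stderr, write;

/// One template in place of measure/measureu: `fun` is passed as an alias,
/// so it may take int, uint or size_t, whatever `size` converts to.
long measure(alias fun, T)(T size, int times, string name)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. times)
        fun(size);
    sw.stop();
    immutable ms = sw.peek.total!"msecs";
    stderr.writefln("%s: time: %s[ms]", name, ms); // one %s per argument
    return ms;
}

/// Hypothetical stand-in for the diamond functions under test.
void printDiamondSimple(in uint n)
{
    immutable h = n / 2;
    foreach (row; 0 .. n)
    {
        immutable d = row > h ? row - h : h - row;
        write(" ".replicate(d), "*".replicate(n - 2 * d), '\n');
    }
}

void main()
{
    measure!printDiamondSimple(11u, 3, "printDiamondSimple");
}
```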
"Jay Norwood" <jayn prismnet.com> writes:
A problem with the previous brad measurement is that his solution
creates a diamond of size 2n+1 for an input of n.  Correcting the
size passed to brad's function and re-running, I get the results below.
So, depending on the implementation, the computation overhead of the
various solutions can differ by as much as 40x.

D:\diamond\diamond\diamond\Release>diamond 1>nul
printDiamond1: time: 1154[ms]
printDiamond2: time: 1637[ms]
printDiamond3: time: 622[ms]
jay1: time: 475[ms]
sergei: time: 11939[ms]
jay2: time: 413[ms]
```
Mar 23 2014
"bearophile" <bearophileHUGS lycos.com> writes:
```Jay Norwood:

A problem with the previous brad measurement is that his
solution creates a diamond of size 2n+1 for an input of n.
Correcting the size input for brad's function call, and
re-running, I get this.  So the various solutions can have
overhead computation time of 40x difference, depending on the
implementation.

The task didn't ask for a computationally efficient solution :-)
So you are measuring something that was not optimized for. So
there's a lot of variance.

Bye,
bearophile
```
Mar 23 2014
"Jay Norwood" <jayn prismnet.com> writes:
```On Sunday, 23 March 2014 at 17:30:20 UTC, bearophile wrote:

:-) So you are measuring something that was not optimized for.
So there's a lot of variance.

Bye,
bearophile

Yes, this is just for my own education.  My builds are using the
dmd compiler on Windows, and some posts indicate I should expect
better optimization currently with the ldc compiler... so maybe
I'll get on a Linux box and retest with ldc.
```
Mar 23 2014
"Jay Norwood" <jayn prismnet.com> writes:
```These were the times on ubuntu 64 bit dmd.  I added diamondShape,
which is slightly modified to be consistent with the others ..
just removing the second parameter and doing the writeln calls
within the function, as the others have been done.  This is still

Also,  I posted the test code on dpaste.com/hold/1753517

printDiamond1: time: 482[ms]
printDiamond2: time: 944[ms]
printDiamond3: time: 490[ms]
jay1: time: 62[ms]
sergei: time: 4154[ms]
jay2: time: 30[ms]
diamondShape: time: 3384[ms]

void diamondShape(in int N)
{
import std.range : chain, iota, repeat;
import std.algorithm : map;
import std.conv : text;
import std.string : center, format;
import std.exception : enforce;
dchar fillChar = '*';
enforce(N % 2, format("Size must be an odd number. (%s)", N));

foreach(ln;
chain(iota(1, N, 2),
iota(N, 0, -2))
.map!(i => fillChar.repeat(i))
.map!(s => s.text)
.map!(s => s.center(N))) writeln(ln);
}
```
Mar 23 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Sunday, 23 March 2014 at 18:28:18 UTC, Jay Norwood wrote:
On Sunday, 23 March 2014 at 17:30:20 UTC, bearophile wrote:

:-) So you are measuring something that was not optimized for.
So there's lot of variance.

Bye,
bearophile

Yes, this is just for my own education.   My builds are using
the dmd compiler on windows, and some  posts indicate I should
expect better optimization currently with the ldc compiler...
so maybe I'll get on a linux box and retest with ldc.

So it's about speed now? Then I submit this:

//----
void printDiamond(size_t N)
{
char[32] rawSpace = void;
char[64] rawStars = void;
char* pSpace = rawSpace.ptr;
char* pStars = rawStars.ptr;
if (N > 64)
{
pSpace = new char[](N/2).ptr;
pStars = new char[](N).ptr;
}
pSpace[0 .. N/2] = ' ';
pStars[0 ..   N] = '*';

N/=2;
foreach         (n ; 0 .. N + 1)
writeln(pSpace[0 .. N - n], pStars[0 .. 2*n+1]);
foreach_reverse (n ; 0 .. N)
writeln(pSpace[0 .. N - n], pStars[0 .. 2*n+1]);
}
//----
```
Mar 24 2014
"Jay Norwood" <jayn prismnet.com> writes:
```Very nice example.   I'll test on ubuntu later.

On windows ...

D:\diamond\diamond\diamond\Release>diamond 1> nul
printDiamond1: time: 1139[ms]
printDiamond2: time: 1656[ms]
printDiamond3: time: 663[ms]
jay1: time: 455[ms]
sergei: time: 11673[ms]
jay2: time: 411[ms]
diamondShape: time: 4399[ms]
printDiamond: time: 185[ms]
```
Mar 24 2014
"Luís Marques" <luis luismarques.eu> writes:
```On Saturday, 22 March 2014 at 14:41:48 UTC, Jay Norwood wrote:
The computation times of different methods can differ a lot.
How do you suggest to measure this effectively without the
overhead of the write and writeln output?   Would a count of
100001 and stubs like below be reasonable, or would there be
something else that would  prevent the optimizer from getting
too aggressive?

I used this to benchmark H. S. Teoh's calendar formatter:

version(benchmark)
{
int main(string[] args)
{
enum MonthsPerRow = 3;
auto t = benchmark!(function() {
foreach(formattedYear; iota(1800, 2000).map!(year
=> formatYear(year, MonthsPerRow)))
{
foreach(_; formattedYear){};
}
})(30);
writeln(t[0].msecs * 0.001);
return 0;
}
}

While the optimizer could probably remove all of that, it
doesn't. I also tested it against other options like walkLength,
and this ended up being the better choice.

(BTW, using joiner instead of join I was able to more than double
the performance:
https://github.com/luismarques/dcal/tree/benchmark . Once the
pipeline is made lazy end to end that will probably have even
more impact.)
```
Mar 23 2014
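The difference Luís describes is eager versus lazy joining. `join` allocates the complete string up front, while `joiner` returns a range that yields the same characters on demand. The small sketch below shows that both produce the same text; the function names are hypothetical. Which one is faster seems to depend on how the result is consumed, so it is worth measuring case by case.

```d
import std.algorithm : equal, joiner;
import std.array : join;
import std.stdio : writeln;

/// Eager: builds one new string containing every row and separator.
string joinedEagerly(string[] rows)
{
    return rows.join("\n");
}

/// Lazy: returns a range that yields the same characters when iterated.
auto joinedLazily(string[] rows)
{
    return rows.joiner("\n");
}

void main()
{
    auto rows = [" * ", "***", " * "];
    assert(joinedLazily(rows).equal(joinedEagerly(rows)));
    writeln(joinedEagerly(rows));
    writeln(joinedLazily(rows));
}
```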
"bearophile" <bearophile HUGS lycos.com> writes:
```On Thursday, 20 March 2014 at 21:25:03 UTC, Ali Çehreli wrote:
This is a somewhat common little exercise:

if you like similar puzzles, here is another:

Write a program that expects a 10-by-10 matrix from standard
input. The program should compute the sum of each row and each column
and print the highest of these numbers to standard output.

An example input:

01 34 46 31 55 21 16 88 87 87
32 40 82 40 43 96 08 82 41 86
30 16 24 18 04 54 65 96 38 48
32 00 99 90 24 75 89 41 04 01
11 80 31 83 08 93 37 96 27 64
09 81 28 41 48 23 68 55 86 72
64 61 14 55 33 39 40 18 57 59
49 34 50 81 85 12 22 54 80 76
18 45 50 26 81 95 25 14 46 75
22 52 37 50 37 40 16 71 52 17

Expected output:

615

The purpose is to write a "golfing" program, that is, the shortest one.

My current D solution is about 170 bytes (UNIX newlines):

void main(){
import std.stdio,std.range,std.algorithm,std.conv;
m.map!sum.chain(m.transposed.map!sum).reduce!max.write;
}

I am now trying to use std.file.slurp, but its documentation is
insufficient.

A cryptic Python solution (not mine), 73 characters:

m=[map(int,_().split())for _ in[raw_input]*10]
_(max(map(sum,m+zip(*m))))

Bye,
bearophile
```
Mar 24 2014
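For comparison with the golfed version, here is a readable (deliberately not short) sketch of the puzzle. It assumes whitespace-separated integers, one row per line. It parses the example from the post instead of reading standard input, so it runs as is. The function names are hypothetical.

```d
import std.algorithm : filter, map, max, sum;
import std.array : array, split;
import std.conv : to;
import std.stdio : writeln;
import std.string : splitLines;

/// Parses whitespace-separated integers, one matrix row per non-empty line.
int[][] parseMatrix(string text)
{
    return text.splitLines
        .filter!(line => line.split.length > 0)
        .map!(line => line.split.map!(to!int).array)
        .array;
}

/// Largest sum over all rows and all columns (assumes a non-empty rectangular matrix).
long maxRowOrColumnSum(int[][] m)
{
    long best = long.min;
    foreach (row; m)
        best = max(best, row.sum);
    foreach (c; 0 .. m[0].length)
        best = max(best, m.map!(row => row[c]).sum);
    return best;
}

enum exampleInput = `
01 34 46 31 55 21 16 88 87 87
32 40 82 40 43 96 08 82 41 86
30 16 24 18 04 54 65 96 38 48
32 00 99 90 24 75 89 41 04 01
11 80 31 83 08 93 37 96 27 64
09 81 28 41 48 23 68 55 86 72
64 61 14 55 33 39 40 18 57 59
49 34 50 81 85 12 22 54 80 76
18 45 50 26 81 95 25 14 46 75
22 52 37 50 37 40 16 71 52 17
`;

void main()
{
    writeln(maxRowOrColumnSum(parseMatrix(exampleInput)));
}
```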
"Jay Norwood" <jayn prismnet.com> writes:
```not through yet with the diamond.  This one is a little faster.
Appending the newline to the stars and calculating the slice
backward from the end would save a w.put for the newlines ...
probably faster.  I keep looking for a way to create a dynamic
array of a specific size, filled with the init value I provide.
Does it exist?

D:\diamond\diamond\diamond\Release>diamond 1>nul
printDiamond1: time: 1140[ms]
printDiamond2: time: 1631[ms]
printDiamond3: time: 633[ms]
jay1: time: 459[ms]
sergei: time: 11886[ms]
jay2: time: 415[ms]
diamondShape: time: 4553[ms]
printDiamond: time: 187[ms]
printDiamonde2a: time: 139[ms]

void printDiamonde2a(in uint N)
{
size_t N2 = N/2;
char[] pSpace = uninitializedArray!(char[])(N2);
pSpace[] = ' ';

char[] pStars = uninitializedArray!(char[])(N);
pStars[] = '*';

char[] pNewLine = uninitializedArray!(char[])(2);
pNewLine[] = '\n';

auto w = appender!(char[])();
w.reserve(N*4);

foreach (n ; 0 .. N2 + 1){
w.put(pSpace[0 .. N2 - n]);
w.put(pStars[0 .. 2*n+1]);
w.put(pNewLine[1]);
}

foreach_reverse (n ; 0 .. N2){
w.put(pSpace[0 .. N2 - n]);
w.put(pStars[0 .. 2*n+1]);
w.put(pNewLine[1]);
}
write(w.data);
}
```
Mar 24 2014
"Jay Norwood" <jayn prismnet.com> writes:
These were times on Ubuntu. I may have printed debug build times
previously, but these are dmd release builds.  I gave up trying to
figure out how to build ldc on ubuntu.  The dmd one click
installer is much appreciated.

printDiamond1: time: 380[ms]
printDiamond2: time: 728[ms]
printDiamond3: time: 378[ms]
jay1: time: 62[ms]
sergei: time: 3965[ms]
jay2: time: 27[ms]
diamondShape: time: 2778[ms]
printDiamond: time: 19[ms]
printDiamonde: time: 19[ms]
printDiamonde2b: time: 16[ms]

This version appends the newline to the stars buffer to get rid of
the extra w.put in the loops.

void printDiamonde2b(in uint N)
{
uint N2 = N/2;
char[] pSpace = uninitializedArray!(char[])(N2);
pSpace[] = ' ';

char[] pStars = uninitializedArray!(char[])(N+1);
pStars[] = '*';

pStars[$-1] = '\n';

auto w = appender!(char[])();
w.reserve(N*3);

foreach (n ; 0 .. N2 + 1){
w.put(pSpace[0 .. N2 - n]);
w.put(pStars[$-2*n-2 .. $]);
}

foreach_reverse (n ; 0 .. N2){
w.put(pSpace[0 .. N2 - n]);
w.put(pStars[$-2*n-2 .. $]);
}

write(w.data);
}
```
Mar 24 2014
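The trailing-newline trick in `printDiamonde2b` works because the last `2*n + 2` characters of the star buffer are exactly `2*n + 1` stars followed by `'\n'`. Below is a self-contained sketch of the same idea that returns the text instead of writing it. It assumes an odd size, and `tailSliceDiamond` is a hypothetical name.

```d
import std.array : appender, uninitializedArray;
import std.stdio : write;

/// Hypothetical name; assumes an odd size.
string tailSliceDiamond(uint size)
{
    assert(size % 2 == 1, "odd sizes only");
    immutable half = size / 2;

    auto spaces = uninitializedArray!(char[])(half);
    spaces[] = ' ';
    // size stars followed by one newline: any tail of this buffer ends a row.
    auto stars = uninitializedArray!(char[])(size + 1);
    stars[] = '*';
    stars[$ - 1] = '\n';

    auto w = appender!string();
    void row(uint n)
    {
        w.put(spaces[0 .. half - n]);
        w.put(stars[$ - (2 * n + 2) .. $]); // 2n + 1 stars plus '\n' in one put
    }
    foreach (n; 0 .. half + 1)
        row(n);
    foreach_reverse (n; 0 .. half)
        row(n);
    return w.data;
}

void main()
{
    write(tailSliceDiamond(11));
}
```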
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Tuesday, 25 March 2014 at 02:25:57 UTC, Jay Norwood wrote:
not through yet with the diamond.  This one is a little faster.
Appending the newline to the stars and calculating the slice
backward from the end would save a w.put for the newlines ...
probably faster.  I keep looking for a way to create a dynamic
array of a specific size, filled with the init value I provide.
Does it exist?

D:\diamond\diamond\diamond\Release>diamond 1>nul
printDiamond1: time: 1140[ms]
printDiamond2: time: 1631[ms]
printDiamond3: time: 633[ms]
jay1: time: 459[ms]
sergei: time: 11886[ms]
jay2: time: 415[ms]
diamondShape: time: 4553[ms]
printDiamond: time: 187[ms]
printDiamonde2a: time: 139[ms]

void printDiamonde2a(in uint N)
{
size_t N2 = N/2;
char[] pSpace = uninitializedArray!(char[])(N2);
pSpace[] = ' ';

char[] pStars = uninitializedArray!(char[])(N);
pStars[] = '*';

char[] pNewLine = uninitializedArray!(char[])(2);
pNewLine[] = '\n';

auto w = appender!(char[])();
w.reserve(N*4);

foreach (n ; 0 .. N2 + 1){
w.put(pSpace[0 .. N2 - n]);
w.put(pStars[0 .. 2*n+1]);
w.put(pNewLine[1]);
}

foreach_reverse (n ; 0 .. N2){
w.put(pSpace[0 .. N2 - n]);
w.put(pStars[0 .. 2*n+1]);
w.put(pNewLine[1]);
}
write(w.data);
}

Interesting. I'd have thought the "extra copy" would be an
overall slowdown, but I guess that's not the case.

I also tried your strategy of adding '\n' to the buffer, but I
was getting some bad output on Windows. I'm not sure why "\n\n"
works though. On *nix, I'd have also expected a double line feed.
Did you check the actual output?

Appender is better than "~=", but it's not actually that good
either. Try this:

//----
void printDiamond3(size_t N)
{
import core.memory;
char* p = cast(char*)GC.malloc(N*N+16);
p[0..N*N+16]='*';

auto pp = p;
N/=2;
enum code = q{
pp[0 .. N - n] = ' ';
pp+=(1+N+n);
version(Windows)
{
pp[0 .. 2] = "\r\n";
pp+=2;
}
else
{
pp[0] = '\n';
++pp;
}
};
foreach        (n; 0 .. N + 1) {mixin(code);}
foreach_reverse(n; 0 .. N    ) {mixin(code);}
write(p[0 .. pp-p]);
}
//----

This makes just 1 allocation of roughly the right size. It also
eagerly fills the entire array with '*', since I *figure* that's
faster than a lot of different writes.

I could be mistaken about that though, but I imagine the
pre-allocation and not using Appender is definitely a boost.
```
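Jay's question quoted above (a dynamic array of a given size, filled with a chosen value) can be answered with the same two steps the thread's code already uses: allocate, then fill with one slice assignment. A minimal sketch, wrapped in a hypothetical helper named filledArray:

```d
import std.array : uninitializedArray;

// Hypothetical helper: allocate n elements without default-initializing
// them, then set every element to value with a single slice assignment.
T[] filledArray(T)(size_t n, T value)
{
    auto a = uninitializedArray!(T[])(n);
    a[] = value;
    return a;
}
```

With it, `auto pSpace = filledArray(N2, ' ');` replaces the two-line pattern. `auto a = new char[](n); a[] = ' ';` also works, but it first default-initializes every element (char.init is 0xFF), which uninitializedArray skips.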
Mar 25 2014
"Jay Norwood" <jayn prismnet.com> writes:
``` Interesting. I'd have thought the "extra copy" would be an
overall slowdown, but I guess that's not the case.

I also tried your strategy of adding '\n' to the buffer, but I
was getting some bad output on windows. I'm not sure why "\n\n"
works though. On *nix, I'd have also expected a double line
feed. Did you check the actual output?

I checked the output. pNewLine[1] selects only one newline, so only one is written.

Appender is better than "~=", but it's not actually that good
either. Try this:

ok. I'll try it.  I was happy the appender was pretty fast.
```
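The point about pNewLine in printDiamonde2a: indexing a char[] yields a single char, while slicing yields a char[]. A small sketch (newlinePutLengths is a hypothetical name) that counts what each form appends to an Appender:

```d
import std.array : appender, uninitializedArray;

// Returns how many characters each form of put() appends.
size_t[2] newlinePutLengths()
{
    auto pNewLine = uninitializedArray!(char[])(2);
    pNewLine[] = '\n';

    auto one = appender!(char[])();
    one.put(pNewLine[1]);        // indexing: a single char, one newline

    auto two = appender!(char[])();
    two.put(pNewLine[0 .. 2]);   // slicing: a char[] holding two newlines

    return [one.data.length, two.data.length];
}
```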
Mar 25 2014
"Jay Norwood" <jayn prismnet.com> writes:
```These are times on Ubuntu. printDiamond3 was slower than
printDiamond.

printDiamond1: time: 373[ms]
printDiamond2: time: 722[ms]
printDiamond3: time: 384[ms]
jay1: time: 62[ms]
sergei: time: 3918[ms]
jay2: time: 28[ms]
diamondShape: time: 2725[ms]
printDiamond: time: 19[ms]
printDiamonde2a: time: 18[ms]
printDiamonde2b: time: 14[ms]
printDiamond3: time: 26[ms]
```
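The timing tables in this thread come from a harness that is not reproduced in these posts. A minimal sketch of one, assuming the current Phobos module std.datetime.stopwatch and a hypothetical builder diamondText that returns the text instead of printing it, so console or file I/O is excluded (unlike the `1>/dev/null` runs above):

```d
import core.time : Duration;
import std.array : appender;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Hypothetical builder to time: returns the diamond instead of writing it.
char[] diamondText(uint N)
{
    immutable uint N2 = N / 2;
    auto buf = appender!(char[])();
    foreach (r; 0 .. N)
    {
        immutable uint n = r <= N2 ? r : N - 1 - r; // mirror the bottom half
        foreach (_; 0 .. N2 - n) buf.put(' ');
        foreach (_; 0 .. 2 * n + 1) buf.put('*');
        buf.put('\n');
    }
    return buf.data;
}

// Calls build(N) `runs` times and returns the total elapsed time.
Duration timeIt(char[] function(uint) build, uint N, uint runs)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. runs)
        build(N);
    sw.stop();
    return sw.peek();
}
```

`writeln(timeIt(&diamondText, 1001, 100).total!"msecs", "[ms]");` prints a line in the same style as the tables, though the numbers are not comparable with the ones posted here.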
Mar 25 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Tuesday, 25 March 2014 at 12:30:37 UTC, Jay Norwood wrote:
These are times on ubuntu. printDiamond3 was slower than
printDiamond.

I was able to improve my first "printDiamond" by having a single
slice that contains spaces then stars, and calling writeln on slices of it.

It gave (on my Windows box) speeds comparable to your printDiamond3,
but not enough of a speed difference to warrant posting new code.

Thanks for the benches. This was fun :)

I love how D can achieve *great* performance, while still looking
```
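monarch_dodra's improvement is described only in prose: one slice holding spaces then stars, with each row written as a sub-slice of it. A sketch of that idea, not his actual code, under stated assumptions: the names putDiamond and diamondString are hypothetical, odd N is assumed, and output goes to any output range instead of writeln directly so the result can be checked.

```d
import std.array : appender;
import std.range.primitives : put;

// One buffer of N2 spaces followed by N stars. Row n is the slice
// buf[n .. N2 + 2*n + 1]: N2-n spaces followed by 2*n+1 stars.
void putDiamond(Sink)(ref Sink sink, uint N)
{
    immutable uint N2 = N / 2;
    auto buf = new char[](N2 + N);
    buf[0 .. N2] = ' ';
    buf[N2 .. $] = '*';

    foreach (r; 0 .. N)
    {
        immutable uint n = r <= N2 ? r : N - 1 - r; // mirror the bottom half
        put(sink, buf[n .. N2 + 2 * n + 1]);         // one slice per row
        put(sink, '\n');
    }
}

// Convenience wrapper that collects the output in memory.
char[] diamondString(uint N)
{
    auto app = appender!(char[])();
    putDiamond(app, N);
    return app.data;
}
```

To write straight to the console, pass a File writer: `auto w = stdout.lockingTextWriter(); putDiamond(w, 5);`.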
Mar 25 2014
"Jay Norwood" <jayn prismnet.com> writes:
```On Tuesday, 25 March 2014 at 15:31:12 UTC, monarch_dodra wrote:
I love how D can achieve *great* performance, while still

Yes, I'm pretty happy to see the appender works well. The
parallel library also seems to work very well in my few
experiences with it.

Maybe it would be useful to see how to use the parallel API to
implement this, and whether it can make a scalable impact on the
execution time.
```
Mar 25 2014
"Jay Norwood" <jayn prismnet.com> writes:
```This is a first attempt at using parallel, but there is no improvement in
speed on a Core i7. It is about 3x slower than the prior
versions. Probably the join was not a good idea. Also, there is no
foreach_reverse for parallel, so it requires extra
calculations for the reverse index.

void printDiamonde2cpa(in uint N)
{
size_t N2 = N/2;
char p[] = uninitializedArray!(char[])(N2+N);
p[0..N2] = ' ';
p[N2..$] = '*';
char nl[] = uninitializedArray!(char[])(1);
nl[] = '\n';

char[][] wc = minimallyInitializedArray!(char[][])(N);

auto w = appender!(char[])();

foreach (n, ref elem ; taskPool.parallel(wc[0..N2+1],100)){
elem = p[n .. N2+2*n+1];
}

foreach (rn, ref elem ; taskPool.parallel(wc[0..N2],100)){
int n = N2 - rn - 1;
elem = p[n .. N2+2*n+1];
}
auto wj = join(wc,nl);
w.put(wj);

writeln(w.data);
}
```
Mar 25 2014
"Jay Norwood" <jayn prismnet.com> writes:
```On Wednesday, 26 March 2014 at 04:47:48 UTC, Jay Norwood wrote:
This is a first attempt at using parallel, but no improvement

oops.  scratch that one. I tested a pointer to the wrong function.
```
Mar 25 2014
"Jay Norwood" <jayn prismnet.com> writes:
```This corrects the parallel example range in the second foreach.
Still slow.

void printDiamonde2cpa(in uint N)
{
size_t N2 = N/2;
char p[] = uninitializedArray!(char[])(N2+N);
p[0..N2] = ' ';
p[N2..$] = '*';
char nl[] = uninitializedArray!(char[])(1);
nl[] = '\n';

char[][] wc = minimallyInitializedArray!(char[][])(N);

auto w = appender!(char[])();

foreach (n, ref elem ; taskPool.parallel(wc[0..N2+1],100)){
elem = p[n .. N2+2*n+1];
}

foreach (rn, ref elem ; taskPool.parallel(wc[N2+1..N],100)){
int n = N2 - rn - 1;
elem = p[n .. N2+2*n+1];
}
auto wj = join(wc,nl);
w.put(wj);

writeln(w.data);
}
```
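Both parallel attempts above build a table of slices and then join it, which adds a full copy. An alternative sketch, not from the thread and not timed here: compute each row's start offset with a sequential prefix sum, then fill disjoint regions of one buffer from a parallel foreach. It assumes std.parallelism's `parallel` convenience function, odd N, and hypothetical names (diamondParallel, rowHalf). The bottom-half mapping is the same as `n = N2 - rn - 1` in printDiamonde2cpa.

```d
import std.parallelism : parallel;
import std.range : iota;

// Sketch: sequential prefix sum of row lengths, then a parallel fill of
// disjoint regions of one preallocated buffer. Odd N assumed.
char[] diamondParallel(size_t N)
{
    immutable size_t N2 = N / 2;

    // Half-width of row r: rows below the middle mirror the rows above it.
    size_t rowHalf(size_t r) { return r <= N2 ? r : N - 1 - r; }

    // Row r holds N2-n spaces, 2*n+1 stars and '\n': N2 + n + 2 chars.
    auto offsets = new size_t[](N + 1);   // offsets[0] starts at 0
    foreach (r; 0 .. N)
        offsets[r + 1] = offsets[r] + N2 + rowHalf(r) + 2;

    auto buf = new char[](offsets[N]);
    foreach (r; parallel(iota(N)))
    {
        immutable n = rowHalf(r);
        auto row = buf[offsets[r] .. offsets[r + 1]]; // this row's region only
        row[0 .. N2 - n] = ' ';
        row[N2 - n .. $ - 1] = '*';
        row[$ - 1] = '\n';
    }
    return buf;
}
```

Because every iteration writes only inside its own `offsets[r] .. offsets[r + 1]` region, no locking is needed. Whether the threads pay off at these sizes is an open question, given Jay's results above.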
Mar 25 2014
"Jay Norwood" <jayn prismnet.com> writes:
```On Tuesday, 25 March 2014 at 08:42:30 UTC, monarch_dodra wrote:
Interesting. I'd have thought the "extra copy" would be an
overall slowdown, but I guess that's not the case.

I installed Ubuntu 14.04 64-bit and measured some of these
examples using gdc, ldc and dmd on a Core i3 box. The examples
that wouldn't build had something to do with
array.replicate and range.replicate conflicting in the libraries
for the gdc and ldc builds, which were based on 2.064.2.

This is the ldc2 (0.13.0 alpha)(2.064.2) result:
jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ ./main 1>/dev/null
sergei: time: 2441[ms]
jay2: time: 26[ms]
diamondShape: time: 679[ms]
printDiamond: time: 19[ms]
printDiamonde2a: time: 9[ms]
printDiamonde2b: time: 8[ms]
printDiamond3: time: 14[ms]

This is the gdc(2.064.2) result:
jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ ./a.out 1>/dev/null
sergei: time: 2828[ms]
jay2: time: 26[ms]
diamondShape: time: 776[ms]
printDiamond: time: 19[ms]
printDiamonde2a: time: 13[ms]
printDiamonde2b: time: 13[ms]
printDiamond3: time: 51[ms]

This is the dmd(2.065) result:
jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ ./main 1>/dev/null
sergei: time: 3480[ms]
jay2: time: 29[ms]
diamondShape: time: 2462[ms]
printDiamond: time: 23[ms]
printDiamonde2a: time: 13[ms]
printDiamonde2b: time: 10[ms]
printDiamond3: time: 23[ms]

So this printDiamonde2b example had the fastest time of the
solutions, and had similar times on all three builds. The ldc2
compiler build is performing best in most examples on Ubuntu.

void printDiamonde2b(in uint N)
{
uint N2 = N/2;
char pSpace[] = uninitializedArray!(char[])(N2);
pSpace[] = ' ';

char pStars[] = uninitializedArray!(char[])(N+1);
pStars[] = '*';

pStars[$-1] = '\n';

auto w = appender!(char[])();
w.reserve(N*3);

foreach (n ; 0 .. N2 + 1){
w.put(pSpace[0 .. N2 - n]);
w.put(pStars[$-2*n-2 .. $]);
}

foreach_reverse (n ; 0 .. N2){
w.put(pSpace[0 .. N2 - n]);
w.put(pStars[$-2*n-2 .. $]);
}

write(w.data);
}
```
Apr 20 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Monday, 21 April 2014 at 00:11:14 UTC, Jay Norwood wrote:
So this printDiamonde2b example had the fastest time of the
solutions, and had similar times on all three builds. The ldc2
compiler build is performing best in most examples on ubuntu.

With this slightly tweaked solution, I get times roughly
50% to 100% faster on my dmd-linux box:

//----
void printDiamonde2monarch(in uint N)
{
uint N2 = N/2;

char[] pBuf = uninitializedArray!(char[])(N + N2);
pBuf[ 0 .. N2] = ' ';
pBuf[N2 ..  $] = '*';

auto slice = uninitializedArray!(char[])(3*N2*N2 + 4*N);

size_t i;
foreach (n ; 0 .. N2 + 1){
auto w = 1 + N2 + n;
slice[i .. i + w] = pBuf[n .. w + n];
slice[(i+=w)++]='\n';
}

foreach_reverse (n ; 0 .. N2){
auto w = 1 + N2 + n;
slice[i .. i + w] = pBuf[n .. w + n];
slice[(i+=w)++]='\n';
}

write(slice[0 .. i]);
}
//----

The two "key" points here: first, avoid using appender.
Second, instead of having two buffers ("    " and "******\n") and
doing two "slice copies", have only 1 buffer ("    *****") and
do 1 slice copy plus a single '\n' write. At this point, I'm
not sure how we could go any faster, short of using
alloca...

How does this hold up on your environment?
```
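printDiamonde2monarch sizes its output slice as `3*N2*N2 + 4*N`, an upper bound. For odd N = 2*N2 + 1, each row holds N2 - n spaces, 2*n + 1 stars and one '\n', i.e. N2 + n + 2 characters. Summing over all N rows gives exactly 3*N2*N2 + 5*N2 + 2 characters, so the bound leaves 3*N2 + 2 spare. A small sketch checking the closed form against a brute-force sum (both function names are hypothetical):

```d
// Closed form for odd N = 2*N2 + 1: rows of N2 + n + 2 chars, summed over
// the top half (n = 0 .. N2) and the mirrored bottom half (n = 0 .. N2-1).
size_t diamondLength(size_t N)
{
    immutable size_t N2 = N / 2;
    return 3 * N2 * N2 + 5 * N2 + 2;
}

// Brute-force check: add up the length of every row.
size_t diamondLengthBySum(size_t N)
{
    immutable size_t N2 = N / 2;
    size_t total;
    foreach (r; 0 .. N)
    {
        immutable n = r <= N2 ? r : N - 1 - r; // mirror the bottom half
        total += N2 + n + 2;                   // spaces + stars + '\n'
    }
    return total;
}
```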
Apr 21 2014
"Jay Norwood" <jayn prismnet.com> writes:
```On Monday, 21 April 2014 at 08:26:49 UTC, monarch_dodra wrote:
The two "key" points here: first, avoid using appender.
Second, instead of having two buffers ("    " and "******\n") and
doing two "slice copies", have only 1 buffer ("    *****") and
do 1 slice copy plus a single '\n' write. At this point, I'm
not sure how we could go any faster, short of using
alloca...

How does this hold up on your environment?

Yes, your solution is the fastest yet. Also, its times are
similar for all three compilers. The range of execution times
varied for different solutions from over 108 seconds down to 64
msec.

I see that RefAppender's data() returns the managed array. Can
write() handle that? It seems that would be more efficient than
duplicating the character buffer ... or perhaps writing directly
to an OutBuffer, and then sending that to write() would avoid the
duplication?

jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ gdc -O2 main.d
jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ ./a.out 1>/dev/null
sergei: time: 28596[ms]
jay2: time: 258[ms]
diamondShape: time: 7512[ms]
printDiamond: time: 200[ms]
printDiamonde2a: time: 140[ms]
printDiamonde2b: time: 137[ms]
printDiamond3: time: 503[ms]
printDiamonde2monarch: time: 86[ms]
jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ dmd -release main.d
jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ ./main 1>/dev/null
sergei: time: 33949[ms]
jay2: time: 282[ms]
diamondShape: time: 24567[ms]
printDiamond: time: 230[ms]
printDiamonde2a: time: 132[ms]
printDiamonde2b: time: 106[ms]
printDiamond3: time: 222[ms]
printDiamonde2monarch: time: 66[ms]
jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ ~/ldc/bin/ldc2 -O2 main.d
jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ ./main 1>/dev/null
sergei: time: 24841[ms]
jay2: time: 259[ms]
diamondShape: time: 6797[ms]
printDiamond: time: 194[ms]
printDiamonde2a: time: 91[ms]
printDiamonde2b: time: 87[ms]
printDiamond3: time: 145[ms]
printDiamonde2monarch: time: 64[ms]
jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$
```
Apr 21 2014
"Jay Norwood" <jayn prismnet.com> writes:
```Wow, joiner is much slower than join. Such a small choice can
make this big of a difference. Not at all expected, since the
lazy calls, I thought, were considered to be more efficient.
This is with ldc2 -O2.

jay jay-ubuntu:~/ec_ddt/workspace/diamond/source$ ./main 1>/dev/null
sergei: time: 24629[ms]
jay2: time: 259[ms]
diamondShape: time: 6701[ms]
printDiamond: time: 194[ms]
printDiamonde2a: time: 95[ms]
printDiamonde2b: time: 92[ms]
printDiamond3: time: 144[ms]
printDiamonde2monarch: time: 67[ms]
printDiamonde2cJoin: time: 96[ms]
printDiamonde2cJoiner: time: 16115[ms]

void printDiamonde2cJoin(in uint N)
{
int n,l;
size_t N2 = N/2;
size_t NM1 = N-1;
char p[] = uninitializedArray!(char[])(N2+N);
p[0..N2] = ' ';
p[N2..$] = '*';
char nl[] = uninitializedArray!(char[])(1);
nl[] = '\n';

char wc[][] = minimallyInitializedArray!(char[][])(N);

for(n=0,l=0; n<N2; n++,l+=2){
wc[n] = wc[NM1-n] = p[n .. N2+l+1];
}

wc[N2] = p[N2..$];
auto wj = join(wc,nl);
write(wj);
write('\n');
}

void printDiamonde2cJoiner(in uint N)
{
int n,l;
size_t N2 = N/2;
size_t NM1 = N-1;
char p[] = uninitializedArray!(char[])(N2+N);
p[0..N2] = ' ';
p[N2..$] = '*';

char wc[][] = minimallyInitializedArray!(char[][])(N);

for(n=0,l=0; n<N2; n++,l+=2){
wc[n] = wc[NM1-n] = p[n .. N2+l+1];
}

wc[N2] = p[N2..$];
write(joiner(wc,"\n"));
write('\n');
}
```
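In printDiamonde2cJoin the assignment `wc[n] = wc[NM1-n] = p[...]` stores the same slice twice, so the row table points into one buffer and allocates nothing per row; only join makes a copy. A sketch that builds the table the same way (diamondRowTable is a hypothetical name; odd N assumed) and checks that mirrored rows are identical slices:

```d
// Build the row table as printDiamonde2cJoin does: every row is a slice of
// one buffer p, and rows n and N-1-n are the very same slice.
char[][] diamondRowTable(size_t N)
{
    immutable size_t N2 = N / 2;
    auto p = new char[](N2 + N);   // N2 spaces followed by N stars
    p[0 .. N2] = ' ';
    p[N2 .. $] = '*';

    auto wc = new char[][](N);
    foreach (n; 0 .. N2)
        wc[n] = wc[N - 1 - n] = p[n .. N2 + 2 * n + 1];
    wc[N2] = p[N2 .. $];           // middle row: all N stars
    return wc;
}
```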
Apr 22 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Tuesday, 22 April 2014 at 11:41:41 UTC, Jay Norwood wrote:
Wow,  joiner is much slower than join.  Such a small choice can
make this big of a difference.  Not at all expected, since the
lazy calls, I thought, were considered to be more efficient.
This is with ldc2 -O2.

Yeah, that's because joiner actually works on "RoR, R", rather than
"R, E". This means if you feed it a "string[], string", then it
will actually iterate over individual *characters*. Not only
that, but since you are using char[], it will decode them too.

"join" is faster for 2 reasons:
1) It detects you want to join arrays, so it doesn't have to
iterate over them: it just glues them together a slice at a time
2) No UTF decoding.

I kind of wish we had a faster joiner, but I think it would have
```
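monarch_dodra's explanation can be checked at compile time: with string rows, the elements joiner yields are dchar, meaning each char is decoded on the way through, whereas join returns a plain string. A sketch (joinRows, joinerRows and sameText are hypothetical names):

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner;
import std.array : join;
import std.range.primitives : ElementType;

// Eager: join builds one new string from the rows and separators.
string joinRows(string[] rows)
{
    return join(rows, "\n");
}

// Lazy: joiner yields one element at a time.
auto joinerRows(string[] rows)
{
    return joiner(rows, "\n");
}

// For string rows, joiner's elements are decoded dchars.
static assert(is(ElementType!(typeof(joiner((string[]).init, "\n"))) == dchar));

// Both describe the same text; only the per-character cost differs.
bool sameText(string[] rows)
{
    return equal(joinerRows(rows), joinRows(rows));
}
```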
Apr 22 2014
"Jay Norwood" <jayn prismnet.com> writes:
```On Tuesday, 22 April 2014 at 15:25:04 UTC, monarch_dodra wrote:
Yeah, that's because joiner actually works on "RoR, R", rather
than "R, E". This means if you feed it a "string[], string",
then it will actually iterate over individual *characters*. Not
only that, but since you are using char[], it will decode them
too.

"join" is faster for 2 reasons:
1) It detects you want to join arrays, so it doesn't have to
iterate over them: it just glues them together a slice at a time
2) No UTF decoding.

I kind of wish we had a faster joiner, but I think it would

Ok, thanks.  I re-tried joiner with both parameters being ranges,
but there was no improvement in execution speed.  I thought

char nl[] = uninitializedArray!(char[])(1);
nl[] = '\n';

write(joiner(wc,nl));
```
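Switching the separator to a char[] leaves the rows themselves as char arrays, and Phobos still decodes those element by element, which is consistent with the unchanged timing. One option not tried in the thread is std.utf.byCodeUnit, which presents a string as raw code units. A sketch under that assumption (joinerCodeUnits and matchesJoin are hypothetical names). It only removes the decoding; joiner still hands out one char at a time, so it may well remain slower than join, and nothing here was timed.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner, map;
import std.array : join;
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Present every row and the separator as ranges of raw chars, so joiner
// walks code units instead of decoding UTF-8 into dchar.
auto joinerCodeUnits(string[] rows)
{
    return rows.map!(r => r.byCodeUnit).joiner("\n".byCodeUnit);
}

static assert(is(Unqual!(ElementType!(typeof(joinerCodeUnits(null)))) == char));

// Same text as join, compared code unit by code unit.
bool matchesJoin(string[] rows)
{
    return equal(joinerCodeUnits(rows), join(rows, "\n").byCodeUnit);
}
```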
Apr 23 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Tuesday, 22 April 2014 at 05:05:30 UTC, Jay Norwood wrote:
On Monday, 21 April 2014 at 08:26:49 UTC, monarch_dodra wrote:
The two "key" points here: first, avoid using appender.
Second, instead of having two buffers ("    " and "******\n") and
doing two "slice copies", have only 1 buffer ("    *****") and
do 1 slice copy plus a single '\n' write. At this point, I'm
not sure how we could go any faster, short of using
alloca...

How does this hold up on your environment?

Yes your solution is the fastest yet.  Also, its times are
similar for all three compilers.   The range of execution times
varied for different solutions from over 108 seconds down to 64
msec.

I see that RefAppender's data() returns the managed array.  Can
write() handle that?  It seems that would be more efficient
than duplicating the  character buffer ...

I'm not sure what you mean. "data" returns the managed array, but
no duplication ever actually happens: the array lives on the GC heap,
and the only thing that is copied is the slice itself.

or perhaps writing directly to an OutBuffer, and then sending
that to write() would avoid the duplication?

appender *is* the outbuffer :)
```
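monarch_dodra's point that Appender.data returns the managed array without copying can be seen directly: write through the returned slice and read it back through the appender. A sketch (dataAliasesBuffer is a hypothetical name):

```d
import std.array : appender;

// Appender.data returns a slice of the appender's own buffer; no copy is
// made, so a write through that slice is visible through the appender.
bool dataAliasesBuffer()
{
    auto app = appender!(char[])();
    app.put("***\n");
    char[] view = app.data;
    view[0] = '+';
    return app.data[0] == '+' && view.ptr is app.data.ptr;
}
```

So `write(w.data)` in the examples above passes a slice of the appender's buffer to write, not a duplicate.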
Apr 22 2014
"bearophile" <bearophileHUGS lycos.com> writes:
``` void main(){
import std.stdio,std.range,std.algorithm,std.conv;