digitalmars.D.learn - Speed of horizontal flip

tchaloupka (84/84) Apr 01 2015 Hi,

bearophile (15/16) Apr 01 2015 If you have to perform performance benchmarks then use ldc or gdc.

tchaloupka (20/24) Apr 01 2015 I tried it on my slower linux box (i5-2500K vs i7-2600K) without
Dominikus Dittes Scherkl (5/12) Apr 02 2015 This very text should be placed somewhere prominent at the D

John Colvin (46/46) Apr 01 2015 On Wednesday, 1 April 2015 at 13:52:06 UTC, tchaloupka wrote:

tchaloupka (3/9) Apr 01 2015 Yes thats right, load, flip and save are all performed by GDI+ so

thedeemon (2/2) Apr 01 2015 std.algorithm.reverse uses ranges, and shamefully DMD is really

John Colvin (13/15) Apr 02 2015 The specialisation of reverse selected for slices does not use

Rikki Cattermole (23/104) Apr 02 2015 Assuming I've done it correctly, Devisualization.Image takes around 8ms

Rikki Cattermole (3/126) Apr 02 2015 My bad, forgot I decreased test image resolution to 256x256. I'm totally...

John Colvin (17/168) Apr 02 2015 Have you considered just being able to grab an object with

Rikki Cattermole (4/158) Apr 02 2015 I've got it down to ~ 12ms using dmd now. But if the image was much

John Colvin (6/198) Apr 02 2015 That would be an insanely large image. If it was square it would

Rikki Cattermole (6/177) Apr 02 2015 Most image editing software could definitely not handle it. I would be

Vladimir Panteleev (3/16) Apr 05 2015 My implementation of flip takes 0ms ;)

"tchaloupka" <chalucha gmail.com> writes:

Hi,
I have a bunch of square r16 and png images which I need to flip
horizontally.

My flip method looks like this:
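void hFlip(T)(T[] data, int w)
{
    import std.algorithm : reverse;
    import std.datetime : StopWatch;
    import std.stdio : writeln;

    StopWatch sw;
    sw.start();

    // The images are square, so w is both the row length and the row count.
    foreach(int i; 0..w)
    {
      auto row = data[i*w..(i+1)*w];
      row.reverse();
    }

    sw.stop();
    writeln("Img flipped in: ", sw.peek().msecs, "[ms]");
}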

With the simple r16 file format it's pretty fast, but with RGB PNG
files (2048x2048) I noticed it's somewhat slow, so I tried to
compare it with C# (System.Drawing):


PNG load - 90ms
PNG flip - 10ms
PNG save - 380ms

D using dlib (http://code.dlang.org/packages/dlib):
PNG load - 500ms
PNG flip - 30ms
PNG save - 950ms

D using imageformats
(http://code.dlang.org/packages/imageformats):
PNG load - 230ms
PNG flip - 30ms
PNG save - 1100ms

I used dmd-2.067 with -release -inline -O

debugging and even with that it is much faster.

I know that System.Drawing is using Windows GDI+, that can be
used with D too, but not on linux.
If we ignore the PNG loading and saving (I didn't try libpng
yet), even the flip method itself is 3 times slower - I don't know D
well enough to be sure whether there is a more efficient way to do
the flip. I like how the slices can be used here.


possible from a system level programming language this can be
somewhat disappointing to see that pure D version is about 3
times slower.

Am I doing something utterly wrong?
Note that this example is not critical for me, it's just a simple
hobby script I use to move and flip some images - I can wait. But
I post it to see if this can be taken somewhat closer to what can
be expected from a system level programming language.

dlib:
auto im = loadPNG(name);
hFlip(cast(ubyte[3][])im.data, cast(int)im.width);
savePNG(im, newName);

imageformats:
auto im = read_image(name);
hFlip(cast(ubyte[3][])im.pixels, cast(int)im.w);
write_image(newName, im.w, im.h, im.pixels);


C#:
static void Main(string[] args)
{
    var files = Directory.GetFiles(args[0]);

    foreach (var f in files)
    {
        var sw = Stopwatch.StartNew();
        var img = Image.FromFile(f);

        Debug.WriteLine("Img loaded in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
        sw.Restart();

        img.RotateFlip(RotateFlipType.RotateNoneFlipX);
        Debug.WriteLine("Img flipped in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
        sw.Restart();

        img.Save(Path.Combine(args[0], "test_" + Path.GetFileName(f)));
        Debug.WriteLine("Img saved in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
        sw.Stop();
    }
}

Apr 01 2015

"bearophile" <bearophileHUGS lycos.com> writes:

tchaloupka:

 Am I doing something utterly wrong?

If you have to perform performance benchmarks then use ldc or gdc.

Also disable bounds checks with your compilation switches.

Sometimes reverse() is not efficient, I think, it should be 
improved. Try to replace it with a little function written by you.

Add the usual pure/nothrow/@nogc/@safe annotations where you can 
(they don't increase speed much, usually).

And you refer to flip as "method", so if you are using classes 
don't forget to make the method final.

Profile the code and look for the performance bottlenecks.

You can even replace the *w multiplications with an increment of 
an index each loop, but this time saving is dwarfed by the 
reverse().
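
A minimal sketch of the advice above, not code from the thread: flipRows is a made-up name, the row swap is written by hand instead of calling reverse(), the i*w multiplications are replaced by a running offset, and the attributes are spelled out. It assumes data holds whole rows of w elements each.

```d
// Sketch only: flipRows is a hypothetical name, not from the thread.
// Assumes data consists of complete rows of w elements.
void flipRows(T)(T[] data, size_t w) pure nothrow @nogc @safe
{
    if (w == 0)
        return;

    // Advance a running offset instead of computing i*w for every row.
    for (size_t start = 0; start + w <= data.length; start += w)
    {
        auto row = data[start .. start + w];

        // Swap symmetric elements, working inwards from both ends.
        foreach (i; 0 .. w / 2)
        {
            T tmp = row[i];
            row[i] = row[w - 1 - i];
            row[w - 1 - i] = tmp;
        }
    }
}
```

Whether this beats reverse() depends on the compiler and switches, so it is worth timing it with an optimising build (for example ldc2 -O3 -release) before drawing conclusions.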

Bye,
bearophile

Apr 01 2015

"tchaloupka" <chalucha gmail.com> writes:

On Wednesday, 1 April 2015 at 14:00:52 UTC, bearophile wrote:
 tchaloupka:

 Am I doing something utterly wrong?

 If you have to perform performance benchmarks then use ldc or 
 gdc.

I tried it on my slower Linux box (i5-2500K vs i7-2600K) without 
any changes, with these results:


Img loaded in 108[ms]
Img flipped in 22[ms]
Img saved in 492[ms]

dmd-2.067:
Png loaded in: 150[ms]
Img flipped in: 28[ms]
Png saved in: 765[ms]

gdc-4.8.3:
Png loaded in: 121[ms]
Img flipped in: 4[ms]
Png saved in: 686[ms]

ldc2-0_15:
Png loaded in: 106[ms]
Img flipped in: 4[ms]
Png saved in: 610[ms]

I'm ok with that, thx.

Apr 01 2015

"Dominikus Dittes Scherkl" writes:

On Wednesday, 1 April 2015 at 14:00:52 UTC, bearophile wrote:
 If you have to perform performance benchmarks then use ldc or 
 gdc.

 Also disable bound tests with your compilation switches.

 Add the usual pure/nothrow/@nogc/@safe annotations where you 
 can (they don't increase speed much, usually).

 if you are using classes don't forget to make the method final.

 Profile the code and look for the performance bottlenecks.

This very text should be placed somewhere prominent on the D
homepage if we don't want to constantly disappoint people who
come with the impression that D should be at the same speed level
as C/C++ but whose test programs aren't.

Apr 02 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Wednesday, 1 April 2015 at 13:52:06 UTC, tchaloupka wrote:
<snip>

I'm pretty sure that the flipping happens in GDI+ as well. You 

the work is C and/or C++, quite possibly carefully optimised over 
many years by Microsoft.


could easily just set the iteration scheme and return (like 
numpy.ndarray.T does, if you're familiar with python).

dmd does not produce particularly fast code. ldc and gdc are much 
better at that.

Sadly, std.algorithm.reverse is perhaps not as fast as it could be 
for arrays of static arrays, at least in theory. Try this, though I 
hope that with a properly optimised build from ldc/gdc it won't 
make any difference:

// Hand-written reverse for a row of RGB pixels: swap from both ends inwards.
void reverse(ubyte[3][] r)
{
    immutable last = r.length-1;
    immutable steps = r.length/2; // the middle element (if any) stays put
    foreach(immutable i; 0 .. steps)
    {
        immutable tmp = r[i];
        r[i] = r[last - i];
        r[last - i] = tmp;
    }
}

unittest
{
    ubyte[3] a = [1,2,3];
    ubyte[3] b = [7,6,5];

    // Even length.
    auto c = [a,b];
    c.reverse();
    assert(c == [b,a]);

    ubyte[3] d = [9,4,6];

    // Odd length.
    auto e = [a,b,d];
    e.reverse();
    assert(e == [d,b,a]);

    // Reversing twice restores the original order.
    auto f = e.dup;
    e.reverse;
    e.reverse;
    assert(f == e);
}

Apr 01 2015

"tchaloupka" <chalucha gmail.com> writes:

On Wednesday, 1 April 2015 at 16:08:14 UTC, John Colvin wrote:
 On Wednesday, 1 April 2015 at 13:52:06 UTC, tchaloupka wrote:
 <snip>

 I'm pretty sure that the flipping happens in GDI+ as well. You 

 the work is C and/or C++, quite possibly carefully optimised 
 over many years by microsoft.

Yes, that's right: load, flip and save are all performed by GDI+ so

Apr 01 2015

"thedeemon" <dlang thedeemon.com> writes:

std.algorithm.reverse uses ranges, and shamefully DMD is really 
bad at optimizing away range-induced costs.

Apr 01 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Thursday, 2 April 2015 at 05:21:08 UTC, thedeemon wrote:
 std.algorithm.reverse uses ranges, and shamefully DMD is really 
 bad at optimizing away range-induced costs.

The specialisation of reverse selected for slices does not use 
the range interface, it's all just indexing. The only overheads 
come from:

a) function calls, if the inliner isn't doing its job (which it 
really should be in these cases).

b) a check for aliasing in swapAt, which is only done for ranges 
of static arrays. Again, should be optimised away in this case, 
but it's possible DMD doesn't manage it. Either way, it's a 
trivially predictable branch and should be effectively free at 
the CPU level.

Once you've got past those, it's just the straight loop I posted 
before.
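
To make the two overheads concrete, here is a rough sketch of what an index-based reverse with a guarded swap looks like. This is illustrative only and is not the Phobos source: swapPixels and reverseRow are hypothetical names, and the real swapAt may perform its check differently.

```d
// Illustrative only: swapPixels is a hypothetical helper, not Phobos code.
void swapPixels(ref ubyte[3] a, ref ubyte[3] b) pure nothrow @nogc
{
    // Aliasing check: if both references name the same element, do nothing.
    // In a reverse loop this branch is never taken, so it predicts perfectly.
    if (&a is &b)
        return;

    immutable tmp = a;
    a = b;
    b = tmp;
}

// Index-based reverse of a slice; no range primitives are involved.
void reverseRow(ubyte[3][] r) pure nothrow @nogc
{
    foreach (i; 0 .. r.length / 2)
        swapPixels(r[i], r[r.length - 1 - i]);
}
```

If the inliner does its job, the call to swapPixels disappears and what remains is the plain loop plus one cheap comparison per swap.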

Apr 02 2015

Rikki Cattermole <alphaglosined gmail.com> writes:

On 2/04/2015 2:52 a.m., tchaloupka wrote:
 <snip>

Assuming I've done it correctly, Devisualization.Image takes around 8ms 
in debug mode to flip horizontally using dmd. But 3ms for release.

module test;

void main() {
    import devisualization.image;
    import devisualization.image.mutable;
    import devisualization.util.core.linegraph;

    import std.stdio;

    writeln("===============\nREAD\n===============");
    Image img = imageFromFile("test/large.png");
    img = new MutableImage(img);

    import std.datetime : StopWatch;

    StopWatch sw;
    sw.start();

    // Repeat the flip 1000 times and report the average per flip.
    foreach(i; 0 .. 1000) {
        img.flipHorizontal;
    }

    sw.stop();

    writeln("Img flipped in: ", sw.peek().msecs / 1000, "[ms]");
}
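
One thing to watch in a benchmark like this: msecs / 1000 is an integer division, so any average below one millisecond is reported as 0. A hedged sketch of the same averaging idea with a floating-point result follows. It assumes the newer std.datetime.stopwatch module (which did not exist in 2.067), and averageMsecs is a hypothetical helper name.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;

// Hypothetical helper: average wall-clock time of one call to work(),
// in milliseconds, measured over the given number of runs.
double averageMsecs(scope void delegate() work, uint runs)
{
    assert(runs > 0);

    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. runs)
        work();
    sw.stop();

    // Go through microseconds and divide in floating point so that
    // sub-millisecond averages are not truncated to zero.
    return sw.peek.total!"usecs" / 1000.0 / runs;
}
```

It would be called with the operation wrapped in a delegate, for example averageMsecs(() { img.flipHorizontal; }, 1000).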

I was planning on doing this earlier. But I discovered that a PR I 
pulled, which fixed things for 2.067, broke reading of chunk types.

Apr 02 2015

Rikki Cattermole <alphaglosined gmail.com> writes:

On 2/04/2015 10:47 p.m., Rikki Cattermole wrote:
 <snip>

 Assuming I've done it correctly, Devisualization.Image takes around 8ms
 in debug mode to flip horizontally using dmd. But 3ms for release.

 <snip>

My bad, forgot I decreased test image resolution to 256x256. I'm totally 
out of the running. I have some serious work to do by the looks.

Apr 02 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Thursday, 2 April 2015 at 09:55:15 UTC, Rikki Cattermole wrote:
 <snip>

 My bad, forgot I decreased test image resolution to 256x256. 
 I'm totally out of the running. I have some serious work to do 
 by the looks.

Have you considered just being able to grab an object with a 
changed iteration order instead of actually doing the flip? The 
same goes for transposes and 90° rotations. Sure, sometimes you 
do need to actually rearrange the memory, and in a subset of those 
cases you need it to be done fast, but a lot of the time you're 
better off* just using a different iteration scheme (which, for 
ranges, should probably be part of the type to avoid checking the 
scheme every iteration).

*for speed and memory reasons. Need to keep the original and the 
transpose? No need for any duplicates.

Note that this is what numpy does with transposes. The .T and 
.transpose methods of ndarray don't actually modify the data, 
they just return a view with a different memory order**, whereas 
making a contiguous copy actually moves memory around.

**using a runtime flag, which is ok for them because internal 
iteration lets you only branch once on it.
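
A minimal sketch of the "changed iteration order" idea in D, under stated assumptions: HFlipView and hFlipView are hypothetical names (not part of any library mentioned in this thread), and pixels are assumed to be stored row-major in a single slice. The flip is baked into the type, so no flag is checked per access and no memory is moved.

```d
// Hypothetical view type: presents a row-major image as if it had been
// flipped horizontally, without copying or rearranging any pixels.
struct HFlipView(T)
{
    T[] data;               // underlying pixels, shared with the original
    size_t width, height;

    // Pixel (x, y) of the flipped image is pixel (width - 1 - x, y)
    // of the original.
    ref T opIndex(size_t x, size_t y)
    {
        assert(x < width && y < height);
        return data[y * width + (width - 1 - x)];
    }
}

// Convenience constructor; assumes data holds whole rows of `width` pixels.
auto hFlipView(T)(T[] data, size_t width)
{
    assert(width > 0 && data.length % width == 0);
    return HFlipView!T(data, width, data.length / width);
}
```

Because the view shares storage with the original, writing through it also changes the original image. That is the price of not duplicating anything.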

Apr 02 2015

Rikki Cattermole <alphaglosined gmail.com> writes:

On 3/04/2015 12:29 a.m., John Colvin wrote:
 <snip>

 Have you considered just being able to grab an object with changed
 iteration order instead of actually doing the flip? The same goes for
 transposes and 90° rotations. Sure, sometimes you do need to actually
 rearrange the memory, and in a subset of those cases you need it to be
 done fast, but a lot of the time you're better off* just using a
 different iteration scheme (which, for ranges, should probably be part
 of the type to avoid checking the scheme every iteration).

 *for speed and memory reasons. Need to keep the original and the
 transpose? No need for any duplicates.

 Note that this is what numpy does with transposes. The .T and .transpose
 methods of ndarray don't actually modify the data, they just set the
 memory order** whereas the transpose function actually moves memory around.

 **using a runtime flag, which is ok for them because internal iteration
 lets you only branch once on it.

I've got it down to ~12ms using dmd now. But if the image were much 
bigger (let's say a height of ushort.max), I wouldn't be able to use a 
little trick. But this is only because I'm using multithreading.

Apr 02 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Thursday, 2 April 2015 at 11:49:44 UTC, Rikki Cattermole wrote:
 <snip>

 I've got it down to ~12ms using dmd now. But if the image were 
 much bigger (let's say a height of ushort.max), I wouldn't be 
 able to use a little trick. But this is only because I'm using 
 multithreading.

That would be an insanely large image. If it was square it would 
be a 4GiB image. I think it's safe to say that someone with 
images that large will be looking for quite specialised solutions 
and wouldn't be disappointed if things aren't optimally fast 
off-the-shelf!
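
For reference, the size estimate can be checked at compile time. The bytes-per-pixel figures here are assumptions: the 4GiB figure corresponds to one byte per pixel, and a 24-bit RGB image like the PNGs in this thread would need three times as much.

```d
// Back-of-the-envelope check of the size estimate. One and three bytes per
// pixel are assumptions (8-bit greyscale and 24-bit RGB respectively).
enum ulong side = ushort.max;            // 65_535
enum ulong pixels = side * side;         // 4_294_836_225
enum ulong GiB = 1024UL * 1024 * 1024;

static assert(pixels == 4_294_836_225UL);

// At 1 byte per pixel: just under 4 GiB.
static assert(pixels > 3 * GiB && pixels < 4 * GiB);

// At 3 bytes per pixel (RGB): just under 12 GiB.
static assert(pixels * 3 > 11 * GiB && pixels * 3 < 12 * GiB);
```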

Apr 02 2015

Rikki Cattermole <alphaglosined gmail.com> writes:

On 3/04/2015 4:27 a.m., John Colvin wrote:
 On Thursday, 2 April 2015 at 11:49:44 UTC, Rikki Cattermole wrote:
 On 3/04/2015 12:29 a.m., John Colvin wrote:
 On Thursday, 2 April 2015 at 09:55:15 UTC, Rikki Cattermole wrote:
 On 2/04/2015 10:47 p.m., Rikki Cattermole wrote:
 On 2/04/2015 2:52 a.m., tchaloupka wrote:
 Hi,
 I have a bunch of square r16 and png images which I need to flip
 horizontally.

 My flip method looks like this:
 void hFlip(T)(T[] data, int w)
 {
   import std.datetime : StopWatch;

   StopWatch sw;
   sw.start();

   foreach(int i; 0..w)
   {
     auto row = data[i*w..(i+1)*w];
     row.reverse();
   }

   sw.stop();
   writeln("Img flipped in: ", sw.peek().msecs, "[ms]");
 }

 With the simple r16 file format it's pretty fast, but with RGB PNG
 files (2048x2048) I noticed it's somewhat slow so I tried to



 PNG load - 90ms
 PNG flip - 10ms
 PNG save - 380ms

 D using dlib (http://code.dlang.org/packages/dlib):
 PNG load - 500ms
 PNG flip - 30ms
 PNG save - 950ms

 D using imageformats
 (http://code.dlang.org/packages/imageformats):
 PNG load - 230ms
 PNG flip - 30ms
 PNG save - 1100ms

 I used dmd-2.067 with -release -inline -O

 debugging and even with that it is much faster.

 I know that System.Drawing is using Windows GDI+, that can be
 used with D too, but not on linux.
 If we ignore the PNG loading and saving (I haven't tried libpng
 yet), even the flip method itself is 3 times slower - I don't know D
 enough to be sure if there isn't some more efficient way to make
 the flip. I like how the slices can be used here.


 possible from a system level programming language this can be
 somewhat disappointing to see that pure D version is about 3
 times slower.

 Am I doing something utterly wrong?
 Note that this example is not critical for me, it's just a simple
 hobby script I use to move and flip some images - I can wait. But
 I post it to see if this can be taken somewhat closer to what can
 be expected from a system level programming language.

 dlib:
 auto im = loadPNG(name);
 hFlip(cast(ubyte[3][])im.data, cast(int)im.width);
 savePNG(im, newName);

 imageformats:
 auto im = read_image(name);
 hFlip(cast(ubyte[3][])im.pixels, cast(int)im.w);
 write_image(newName, im.w, im.h, im.pixels);


 static void Main(string[] args)
 {
     var files = Directory.GetFiles(args[0]);

     foreach (var f in files)
     {
         var sw = Stopwatch.StartNew();
         var img = Image.FromFile(f);

         Debug.WriteLine("Img loaded in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
         sw.Restart();

         img.RotateFlip(RotateFlipType.RotateNoneFlipX);
         Debug.WriteLine("Img flipped in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
         sw.Restart();

         img.Save(Path.Combine(args[0], "test_" + Path.GetFileName(f)));
         Debug.WriteLine("Img saved in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
         sw.Stop();
     }
 }


 Assuming I've done it correctly, Devisualization.Image takes around 8ms
 in debug mode to flip horizontally using dmd. But 3ms for release.

 module test;

 void main() {
    import devisualization.image;
    import devisualization.image.mutable;
    import devisualization.util.core.linegraph;

    import std.stdio;

    writeln("===============\nREAD\n===============");
    Image img = imageFromFile("test/large.png");
    img = new MutableImage(img);

    import std.datetime : StopWatch;

    StopWatch sw;
    sw.start();

    foreach(i; 0 .. 1000) {
        img.flipHorizontal;
    }

    sw.stop();

    writeln("Img flipped in: ", sw.peek().msecs / 1000, "[ms]");
 }

 I was planning on doing this earlier. But I discovered that a PR I pulled,
 which fixed things for 2.067, broke chunk type reading.

 My bad, forgot I decreased test image resolution to 256x256. I'm
 totally out of the running. I have some serious work to do by the
 looks.

 Have you considered just being able to grab an object with changed
 iteration order instead of actually doing the flip? The same goes for
 transposes and 90º rotations. Sure, sometimes you do need to actually
 rearrange the memory and in a subset of those cases you need it to be
 done fast, but a lot of the time you're better off* just using a
 different iteration scheme (which, for ranges, should probably be part
 of the type to avoid checking the scheme every iteration).

 *for speed and memory reasons. Need to keep the original and the
 transpose? No need for any duplicates.

 Note that this is what numpy does with transposes. The .T and .transpose
 methods of ndarray don't actually modify the data, they just set the
 memory order** whereas the transpose function actually moves memory
 around.

 **using a runtime flag, which is ok for them because internal iteration
 lets you only branch once on it.

 I've got it down to ~12ms using dmd now. But if the image were much
 bigger (let's say a height of ushort.max), I wouldn't be able to use a
 little trick. But this is only because I'm using multithreading.

 That would be an insanely large image. If it was square it would be a
 4GiB image. I think it's safe to say that someone with images that large
 will be looking for quite specialised solutions and wouldn't be
 disappointed if things aren't optimally fast off-the-shelf!

Most image editing software could definitely not handle it. I would be 
very surprised if e.g. libpng can even read such a file. Although I'm 
pretty sure mine can ;)

Worst case scenario, for more than ushort.max, I think it'll be a couple 
hundred ms.

Apr 02 2015

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Wednesday, 1 April 2015 at 13:52:06 UTC, tchaloupka wrote:

 PNG load - 90ms
 PNG flip - 10ms
 PNG save - 380ms

 D using dlib (http://code.dlang.org/packages/dlib):
 PNG load - 500ms
 PNG flip - 30ms
 PNG save - 950ms

 D using imageformats
 (http://code.dlang.org/packages/imageformats):
 PNG load - 230ms
 PNG flip - 30ms
 PNG save - 1100ms

My implementation of flip takes 0ms ;)

http://blog.thecybershadow.net/2014/03/21/functional-image-processing-in-d/
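
To illustrate the general idea of a flip that costs nothing up front, here is a minimal sketch of a lazy view that mirrors the x coordinate on access instead of moving pixels. It is not the code from the linked post; the names HFlipView and hflipView are hypothetical and used only for this example.

```d
import std.stdio : writeln;

/// Hypothetical lazy view: mirrors x on every access, moves no memory.
struct HFlipView(T)
{
    T[] data;     // row-major pixels, shared with the source image (no copy)
    size_t w, h;  // width and height in pixels

    /// Read or write pixel (x, y) of the horizontally flipped image.
    ref T opIndex(size_t x, size_t y)
    {
        return data[y * w + (w - 1 - x)];
    }
}

/// Helper so the element type is inferred from the slice.
auto hflipView(T)(T[] data, size_t w, size_t h)
{
    assert(data.length == w * h);
    return HFlipView!T(data, w, h);
}

void main()
{
    int[] pixels = [1, 2, 3,
                    4, 5, 6];
    auto v = hflipView(pixels, 3, 2);
    writeln(v[0, 0], " ", v[2, 1]); // prints: 3 4
}
```

Creating the view is constant time, but every pixel access pays for the index arithmetic, so whether this beats an in-place reverse depends on how the result is consumed afterwards.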

Apr 05 2015
