digitalmars.D.learn - Is this code "D-ish" enough?


"lafoldes" <lafoldes geemail.kom> writes:

Hi, this is one of my attempts to code in D:

Given a binary file, which contains pairs of a 4 byte integer and 
a zero-terminated string. The number of pairs is known. The task 
is to read the pairs, sort them by the integer ("id") part, and 
write out the result with line numbers to the console.

I tried to do this using the component paradigm, without extra 
classes, helper functions, etc. I've ended up with this code:

{
   uint count = 1000;

   auto f = File("binary.dat", "rb");

   uint id[1];
   int lineNo = 0;

   repeat(0, count).
   map!(t => tuple(f.rawRead(id)[0], f.readln('\0'))).
   array.
   sort!("a[0] < b[0]").
   map!(t => format("%6d %08x  %s\n", lineNo++, t[0], t[1])).
   copy(stdout.lockingTextWriter);

   stdout.flush();
}

Is this code "D-ish" enough?

There are things I don't like here:

- the dummy repeat at the beginning of the component chain. The 
only purpose of it is to produce the needed number of items. 
Moreover repeat doesn't produce a sortable range, so the array is 
needed as well.
- I don't know how to do this when the number of items is not 
known. How to repeat until EOF?
- The variables id, and lineNo outside the chain.
- rawRead() needs an array, even if there is only one item to 
read.
- How to avoid the flush() at the end?

What do you think?

Aug 07 2013

Justin Whear <justin economicmodeling.com> writes:

On Wed, 07 Aug 2013 20:25:33 +0200, lafoldes wrote:

 
 There are things I don't like here:
 
 - the dummy repeat at the beginning of the component chain. The only
 purpose of it is to produce the needed number of items. Moreover repeat
 doesn't produce sortable range, so the array is needed as well.
 - I don't know how to do this when the number of items is not known. How
 to repeat until EOF?
 - The variables id, and lineNo outside the chain.
 - rawRead() needs an array, even if there is only one item to read.
 - How to avoid the flush() at the end?
 
 What do you think?

Is 0 present as an id in the file?  If not, you could treat the file as a 
range of bytes, use lazy split('\0') on it, then use the peek function 
from std.bitmanip to split each "line" into an integer-string tuple.

Aug 07 2013

"lafoldes" <lafoldes geemail.kom> writes:

On Wednesday, 7 August 2013 at 18:36:24 UTC, Justin Whear wrote:
 On Wed, 07 Aug 2013 20:25:33 +0200, lafoldes wrote:

 
 There are things I don't like here:
 
 - the dummy repeat at the beginning of the component chain. The only
 purpose of it is to produce the needed number of items. Moreover repeat
 doesn't produce sortable range, so the array is needed as well.
 - I don't know how to do this when the number of items is not known. How
 to repeat until EOF?
 - The variables id, and lineNo outside the chain.
 - rawRead() needs an array, even if there is only one item to read.
 - How to avoid the flush() at the end?
 
 What do you think?

 Is 0 present as an id in the file?  If not, you could treat the file as a
 range of bytes, use lazy split('\0') on it, then use the peek function
 from std.bitmanip to split each "line" into an integer-string tuple.

Unfortunately id is not a string, it is an integer in binary 
form. So byte 0 is present...

Aug 07 2013

Justin Whear <justin economicmodeling.com> writes:

The key problem here is that you have an unstructured range of bytes and 
you want to provide a function to iteratively structure it.  The 
splitting and chunking functionality in Phobos doesn't fit because the 
records are not of fixed size nor do they contain a usable delimiter.  
Unfortunately, I can't think of anything currently in Phobos that makes 
this easy.

My take on this was to define a range which induces structure over 
another by consuming it with a user function; for lack of a better term I 
called it "restructure".  It works much like map, except instead of 
passing an element to the user function, it passes the range itself and 
the function is expected to consume some amount of it.

Here's the code rewritten (without your formatting map) with the 
restructure function:

import std.stdio,
	std.range,
	std.algorithm,
	std.file,
	std.typecons;
import std.bitmanip : readAs = read;

void main(string[] args)
{
	string readZeroTerminatedString(R)(ref R range)
	{
		return (cast(char[])range.until!('\0').array).idup;
	}

	// Read file as range of bytes
	File("binsplit.bin", "rb")
		.byChunk(1024 * 4)
		.joiner

		// Induce our record structure over that range
		.restructure!(r => tuple(r.readAs!int, r.readZeroTerminatedString))

		// Buffer the records in an array (sort needs random access),
		// then sort them by the first field (id)
		.array
		.sort!((a, b) => a[0] < b[0])

		// Write to stdout
		.copy(stdout.lockingTextWriter);
}

And here's a quick and dirty outline of the restructure function itself.  
Unfortunately, it doesn't actually compile due to needing to take the 
source range by ref.  Also, the user function needs to take the range by 
ref so that it can be consumed.

auto restructure(alias Fun, R)(ref R range)
	if (isInputRange!R 
		//TODO test Fun: must be an unary function taking R
	)
{
	struct Result
	{
		alias F = typeof(Fun(range));
		private R range;
		private F _front;

		@property bool empty() { return range.empty; }
		@property F front() { return _front; }
		void popFront()
		{
			_front = Fun(range);
		}
	}

	return Result(range);
}
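
One way the idea above could be made to compile is sketched below. This is an assumption-laden variant, not the code from the post and not a Phobos function: it takes the source by value, stores it inside the result, and primes the first element up front. The consumer `sumPair` is a hypothetical stand-in for a record parser; it takes the source by ref and consumes two elements per call.

```d
import std.array : array;
import std.range.primitives : empty, isInputRange;

/// Hypothetical working variant of `restructure` (not part of Phobos).
/// The source is stored by value; `fun` must accept it by ref and
/// consume whatever makes up one record.
auto restructure(alias fun, R)(R input)
if (isInputRange!R)
{
    alias F = typeof(fun(input));

    struct Result
    {
        private R source;
        private F current;
        private bool done;

        @property bool empty() { return done; }
        @property F front() { return current; }

        void popFront()
        {
            if (source.empty)
                done = true;           // nothing left to structure
            else
                current = fun(source); // fun advances `source` itself
        }
    }

    auto result = Result(input);
    result.popFront(); // prime the first element
    return result;
}

/// Example consumer: takes two ints off the front and returns their sum.
int sumPair(ref int[] r)
{
    immutable s = r[0] + r[1];
    r = r[2 .. $];
    return s;
}

void main()
{
    import std.stdio : writeln;

    // Pairwise sums of the input, collected into an array for printing.
    writeln([1, 2, 3, 4, 5, 6].restructure!sumPair.array);
}
```

The result is still a one-pass input range, so it has to be collected with `.array` before it can be sorted.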

Aug 07 2013

Ali Çehreli <acehreli yahoo.com> writes:

On 08/07/2013 11:25 AM, lafoldes wrote:

    repeat(0, count).
    map!(t => tuple(f.rawRead(id)[0], f.readln('\0'))).
    array.
    sort!("a[0] < b[0]").
    map!(t => format("%6d %08x  %s\n", lineNo++, t[0], t[1])).
    copy(stdout.lockingTextWriter);

Without giving much thought to your specific problem, I just want to 
mention H. S. Teoh's recent article:

   http://wiki.dlang.org/Component_programming_with_ranges

The problem in the article has to deal with data that is not trivially 
structured.

Ali

Aug 07 2013

"Tobias Pankrath" <tobias pankrath.net> writes:

On Wednesday, 7 August 2013 at 18:25:37 UTC, lafoldes wrote:
 What do you think?

You could use sequence!("n") to get rid of repeat(0, count) and 
lineNo. This will still be strangely backwards. To get a general 
solution you'll need to base the iteration on the number of 
entries in your file and not on the number of lines you think 
that are in that file.

While the solution proposed by Whear does not work because of ref 
issues, you could do something with recurrence and a state that 
is a tuple (state', elem), but I don't think that's the way to go 
here.

Just write your own range that reads the file and yields (int, 
string).
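
A minimal sketch of such a range, under some assumptions that are not stated in the thread: the whole file fits in memory (for a real file the bytes could come from `cast(const(ubyte)[]) std.file.read("binary.dat")`), the ids are stored little-endian, and the strings are valid UTF-8. The names `Record`, `RecordRange` and `sortedRecords` are hypothetical.

```d
import std.algorithm.searching : countUntil;
import std.algorithm.sorting : sort;
import std.array : array;
import std.bitmanip : littleEndianToNative;
import std.exception : enforce;
import std.string : representation;

struct Record
{
    uint id;
    string str;
}

/// Input range over a buffer of (4-byte id, zero-terminated string) pairs.
struct RecordRange
{
    private const(ubyte)[] data;
    private Record current;
    private bool done;

    this(const(ubyte)[] data)
    {
        this.data = data;
        popFront(); // prime the first record
    }

    @property bool empty() { return done; }
    @property Record front() { return current; }

    void popFront()
    {
        if (data.length == 0)
        {
            done = true;
            return;
        }
        enforce(data.length >= uint.sizeof, "truncated id");
        ubyte[uint.sizeof] raw;
        raw[] = data[0 .. uint.sizeof];
        data = data[uint.sizeof .. $];

        // The string runs up to (not including) the next zero byte.
        immutable end = data.countUntil(0);
        enforce(end >= 0, "missing string terminator");
        current = Record(littleEndianToNative!uint(raw),
                         (cast(const(char)[]) data[0 .. end]).idup);
        data = data[end + 1 .. $];
    }
}

/// Collects all records (sorting needs random access) and sorts by id.
Record[] sortedRecords(const(ubyte)[] bytes)
{
    auto records = RecordRange(bytes).array;
    records.sort!((a, b) => a.id < b.id);
    return records;
}

void main()
{
    import std.stdio : writefln;

    // Two sample records, ids 2 and 1, in place of a real file.
    immutable(ubyte)[] sample =
        ("\x02\x00\x00\x00" ~ "beta\x00" ~
         "\x01\x00\x00\x00" ~ "alpha\x00").representation;

    foreach (lineNo, rec; sortedRecords(sample))
        writefln("%6d %08x  %s", lineNo, rec.id, rec.str);
}
```

Iterating until the buffer is empty means no record count has to be known in advance.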

Aug 08 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Aug 07, 2013 at 08:25:33PM +0200, lafoldes wrote:
 Hi, this is one of my attempts to code in D:
 
 Given a binary file, which contains pairs of a 4 byte integer and a
 zero-terminated string. The number of pairs is known. The task is to
 read the pairs, sort them by the integer ("id") part, and write out
 the result with line numbers to the console.
 
 I tried to do this using the component paradigm, without extra
 classes, helper functions, etc. I've ended up with this code:
 
 {
   uint count = 1000;
 
   auto f = File("binary.dat", "rb");
 
   uint id[1];
   int lineNo = 0;
 
   repeat(0, count).
   map!(t => tuple(f.rawRead(id)[0], f.readln('\0'))).
   array.
   sort!("a[0] < b[0]").
   map!(t => format("%6d %08x  %s\n", lineNo++, t[0], t[1])).
   copy(stdout.lockingTextWriter);
 
   stdout.flush();
 }
 
 Is this code "D-ish" enough?
 
 There are things I don't like here:
 
 - the dummy repeat at the beginning of the component chain. The only
 purpose of it is to produce the needed number of items.

This is a bad idea. You should be constructing a range that spans until
EOF, not some arbitrary count.


 Moreover repeat doesn't produce sortable range, so the array is needed
 as well.

You can't sort a one-pass sequence of values. It's only natural to
require storage (in an array, or some other data structure) in order to
be sortable.


 - I don't know how to do this when the number of items is not known.
 How to repeat until EOF?

You should write a range that consumes exactly the amount of data you
need from the stream. First of all, you should recognize that your input
file has a different structure than just a mere sequence of bytes or
pages. For maximum readability/maintainability, you should make this
explicit by defining a structure to contain this data:

	struct Record {
		uint id;
		string str;	// File.readln returns string by default
	}

Next, you should write a range that takes a File and returns a range of
Records. Maybe something like this:

	// Warning: untested code
	auto getRecords(File f) {
		static struct Result {
			File f;
			this(File _f) {
				f = _f;
				readNext(); // get things going
			}
			@property bool empty() { return f.eof; }
			Record front;
			void popFront() { readNext(); }
			private void readNext() {
				union U {
					uint id;
					ubyte[uint.sizeof] raw;
				}
				U u;
				f.rawRead(u.raw);
				auto str = f.readln('\0');
				front = Record(u.id, str);
			}
		}
		return Result(f);
	}

Phobos isn't *quite* at the point where you don't have to write custom
code. :)

Once you have this, your code becomes:

	{
		File("binary.dat", "rb")
			.getRecords()
			.array	// this is necessary! you can't sort a one-pass range
			.sort!((a,b) => a.id < b.id)
			.map!(t => format("%08x  %s\n", t.id, t.str))
			.copy(stdout.lockingTextWriter);

		stdout.flush();	// this is probably also necessary
	}

If you want line numbers, you can use zip to pair up each record with a
line number:

	// Warning: untested code
	{
		File("binary.dat", "rb")
			.getRecords()
			.array	// this is necessary! you can't sort a one-pass range
			.sort!((a,b) => a.id < b.id)
			.zip(sequence!"n"(0))
			.map!(t => format("%6d %08x  %s\n", t[1], t[0].id, t[0].str))
			.copy(stdout.lockingTextWriter);

		stdout.flush();	// this is probably also necessary
	}
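
As an alternative to zip plus sequence, newer Phobos releases provide `std.range.enumerate`, which pairs each element with its index. A small sketch, assuming the records are already in memory as `Tuple!(uint, string)` values; the function name `numberedLines` is made up for this example:

```d
import std.algorithm : map, sort;
import std.array : array;
import std.format : format;
import std.range : enumerate;
import std.typecons : Tuple, tuple;

/// Sorts the records by id (in place) and formats them with line numbers.
string[] numberedLines(Tuple!(uint, string)[] records)
{
    return records
        .sort!((a, b) => a[0] < b[0])
        .enumerate // yields tuples with .index and .value
        .map!(t => format("%6d %08x  %s", t.index, t.value[0], t.value[1]))
        .array;
}

void main()
{
    import std.stdio : writeln;

    foreach (line; numberedLines([tuple(42u, "beta"), tuple(7u, "alpha")]))
        writeln(line);
}
```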


 - The variables id, and lineNo outside the chain.

Yeah, those are bad. In my example code above, I got rid of them.


 - rawRead() needs an array, even if there is only one item to read.

This is a std.stdio limitation. But it can be worked around using a
union as I did above.
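
Another workaround, sketched here as an assumption rather than something from the thread, is to slice a pointer to the variable, which gives rawRead the one-element array it wants without a separate buffer. The helper name `readOne` and the scratch file name are hypothetical; the value is read in native byte order, as with the union.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

/// Hypothetical helper: reads one native-endian uint from `f`.
/// Returns false when no complete value could be read (e.g. at EOF).
bool readOne(ref File f, out uint value)
{
    // (&value)[0 .. 1] views the single variable as a one-element uint[],
    // which is the buffer shape rawRead expects.
    return f.rawRead((&value)[0 .. 1]).length == 1;
}

void main()
{
    // Write two values to a scratch file so the example is self-contained.
    auto path = buildPath(tempDir, "rawread_one_example.bin");
    uint[2] data = [7, 42];
    write(path, data[]);
    scope (exit) remove(path);

    auto f = File(path, "rb");
    uint id;
    while (readOne(f, id))
        writeln(id);
}
```

The same check on the returned slice length also gives a natural "repeat until EOF" loop condition.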


 - How to avoid the flush() at the end?

[...]

Why would you want to? You do have to flush stdout if you want output to
be written immediately, because it's a buffered output stream.
Forgetting to flush() is OK if your program exits shortly after, since
the runtime exit code will flush any unflushed buffers. But doing it
explicitly is probably better, and necessary if your program isn't going
to exit and you want the output flushed right away.


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be
algorithms.

Aug 08 2013
