digitalmars.D.learn - Is this code "D-ish" enough?

"lafoldes" <lafoldes geemail.kom> writes:
Hi, this is one of my attempts to code in D:

I have a binary file that contains pairs of a 4-byte integer and 
a zero-terminated string. The number of pairs is known. The task 
is to read the pairs, sort them by the integer ("id") part, and 
write out the result with line numbers to the console.

I tried to do this using the component paradigm, without extra 
classes, helper functions, etc. I ended up with this code:

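import std.algorithm, std.array, std.format, std.range, std.stdio, std.typecons;

void main()
{
   uint count = 1000;

   auto f = File("binary.dat", "rb");

   uint[1] id;   // rawRead() fills an array, so even one uint needs one
   int lineNo = 0;

   repeat(0, count).
   map!(t => tuple(f.rawRead(id)[0], f.readln('\0'))).
   array.
   sort!("a[0] < b[0]").
   map!(t => format("%6d %08x  %s\n", lineNo++, t[0], t[1])).
   copy(stdout.lockingTextWriter);

   stdout.flush();
}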

Is this code "D-ish" enough?

There are things I don't like here:

- the dummy repeat at the beginning of the component chain. The 
only purpose of it is to produce the needed number of items. 
Moreover, repeat doesn't produce a sortable range, so the array is 
needed as well.
- I don't know how to do this when the number of items is not 
known. How to repeat until EOF?
- The variables id and lineNo live outside the chain.
- rawRead() needs an array, even if there is only one item to 
read.
- How to avoid the flush() at the end?

What do you think?
Aug 07 2013
Justin Whear <justin economicmodeling.com> writes:
On Wed, 07 Aug 2013 20:25:33 +0200, lafoldes wrote:

 
Is 0 present as an id in the file? If not, you could treat the file as a range of bytes, use lazy split('\0') on it, then use the peek function from std.bitmanip to split each "line" into an integer-string tuple.
Aug 07 2013
"lafoldes" <lafoldes geemail.kom> writes:
On Wednesday, 7 August 2013 at 18:36:24 UTC, Justin Whear wrote:
> On Wed, 07 Aug 2013 20:25:33 +0200, lafoldes wrote:

 
> Is 0 present as an id in the file? If not, you could treat the file as a range of bytes, use lazy split('\0') on it, then use the peek function from std.bitmanip to split each "line" into an integer-string tuple.

Unfortunately, the id is not a string; it is an integer in binary form, so zero bytes do occur in it...
Aug 07 2013
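To see why splitting on '\0' breaks down, it helps to look at the bytes of a single record. This is a sketch under stated assumptions: the id is taken to be stored little-endian (the thread never says which byte order the file uses), and the record is built in memory by a hypothetical helper, `makeRecord`. `peek` from std.bitmanip decodes the id correctly, but even a small id such as 256 already contains three zero bytes, so a split on 0 cuts the record into fragments.

```d
import std.algorithm.iteration : splitter;
import std.bitmanip : nativeToLittleEndian, peek;
import std.range.primitives : walkLength;
import std.system : Endian;

// Hypothetical helper: encode one record as described in the thread,
// a 4-byte id followed by a zero-terminated string.
// Assumption: the id is stored little-endian.
const(ubyte)[] makeRecord(uint id, string text)
{
    ubyte[4] idBytes = nativeToLittleEndian(id);
    return idBytes[] ~ cast(const(ubyte)[]) text ~ cast(ubyte) 0;
}

void main()
{
    auto rec = makeRecord(256, "abc");

    // peek decodes the id from the front of the buffer without consuming it.
    assert(rec.peek!(uint, Endian.littleEndian) == 256);

    // 256 is encoded as 00 01 00 00, so splitting on 0 yields five
    // fragments ([], [01], [], "abc", []) instead of one id/string pair.
    assert(rec.splitter(cast(ubyte) 0).walkLength == 5);
}
```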
Justin Whear <justin economicmodeling.com> writes:
The key problem here is that you have an unstructured range of bytes and 
you want to provide a function to iteratively structure it. The 
splitting and chunking functionality in Phobos doesn't fit because the 
records are neither fixed-size nor separated by a usable delimiter. 
Unfortunately, I can't think of anything currently in Phobos that makes 
this easy.

My take on this was to define a range which induces structure over 
another by consuming it with a user function; for lack of a better term I 
called it "restructure".  It works much like map, except instead of 
passing an element to the user function, it passes the range itself and 
the function is expected to consume some amount of it.

Here's the code rewritten (without your formatting map) with the 
restructure function:

import std.stdio,
	std.range,
	std.algorithm,
	std.array,
	std.file,
	std.typecons;
import std.bitmanip : readAs = read;

void main(string[] args)
{
	string readZeroTerminatedString(R)(ref R range)
	{
		return (cast(char[])range.until('\0').array).idup;
	}

	// Read file as range of bytes
	File("binsplit.bin", "rb")
		.byChunk(1024 * 4)
		.joiner

		// Induce our record structure over that range
		.restructure!(r => tuple(r.readAs!int, readZeroTerminatedString(r)))

		// sort needs a random-access range, so buffer the records first
		.array

		// Sort the records by the first field (id)
		.sort!((a, b) => a[0] < b[0])

		// (formatting map omitted; the records must be formatted before copy)

		// Write to stdout
		.copy(stdout.lockingTextWriter);
}

And here's a quick and dirty outline of the restructure function itself.  
Unfortunately, it doesn't actually compile due to needing to take the 
source range by ref.  Also, the user function needs to take the range by 
ref so that it can be consumed.

auto restructure(alias Fun, R)(ref R range)
	if (isInputRange!R 
		//TODO test Fun: must be a unary function taking R
	)
{
	struct Result
	{
		alias F = typeof(Fun(range));
		private R range;
		private F _front;

		bool empty() @property { return range.empty; }
		F front() @property { return _front; }
		void popFront()
		{
			_front = Fun(range);
		}
	}

	return Result(range);
}
Aug 07 2013
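For comparison, here is one way the restructure idea from the outline above can be made to compile. It is a sketch, not Justin's code: `Restructure`, `restructure` and `readRecord` are hypothetical names, the source is an in-memory `const(ubyte)[]` rather than a file, and ids are assumed to be little-endian. The changes that matter are that the result owns its copy of the source and passes that copy to `fun` by `ref`, that the constructor primes the first element, and that `empty` is a flag set when `popFront` finds the source exhausted (in the outline, `empty` checks the source after the last record has already been read into `_front`, so that record would be dropped).

```d
import std.algorithm.sorting : sort;
import std.array : array;
import std.bitmanip : nativeToLittleEndian, read;
import std.range.primitives : empty, isInputRange;
import std.system : Endian;
import std.typecons : Tuple, tuple;

// Sketch of "restructure": the result stores its own copy of the source
// range and hands it to fun by ref, so fun consumes one record per call.
struct Restructure(alias fun, R, E)
if (isInputRange!R)
{
    private R source;
    private E current;
    private bool exhausted;

    this(R source)
    {
        this.source = source;
        popFront(); // prime front before the first call to front
    }

    @property bool empty() const { return exhausted; }
    @property E front() { return current; }

    void popFront()
    {
        // Check the source *before* reading, so the last record is kept.
        if (source.empty) { exhausted = true; return; }
        current = fun(source);
    }
}

auto restructure(alias fun, R)(R source)
{
    alias E = typeof(fun(source)); // source is an lvalue here, so ref works
    return Restructure!(fun, R, E)(source);
}

// Hypothetical record reader: consumes a little-endian uint id and a
// zero-terminated string (including its terminator) from the buffer.
Tuple!(uint, string) readRecord(ref const(ubyte)[] bytes)
{
    uint id = bytes.read!(uint, Endian.littleEndian);
    size_t n = 0;
    while (n < bytes.length && bytes[n] != 0)
        ++n;
    string text = (cast(const(char)[]) bytes[0 .. n]).idup;
    bytes = bytes[(n < bytes.length ? n + 1 : n) .. $];
    return tuple(id, text);
}

void main()
{
    // Two records, deliberately out of order.
    ubyte[4] seven = nativeToLittleEndian(7u);
    ubyte[4] three = nativeToLittleEndian(3u);
    const(ubyte)[] data = seven[] ~ cast(const(ubyte)[]) "seven\0"
        ~ three[] ~ cast(const(ubyte)[]) "three\0";

    auto records = data.restructure!readRecord.array; // buffer, then sort
    records.sort!((a, b) => a[0] < b[0]);
    assert(records == [tuple(3u, "three"), tuple(7u, "seven")]);
}
```

With a `File` as the source, the same shape applies, but the byte range handed to `fun` must advance the underlying stream when it is consumed through the `ref`.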
Ali Çehreli <acehreli yahoo.com> writes:
On 08/07/2013 11:25 AM, lafoldes wrote:

>     repeat(0, count).
>     map!(t => tuple(f.rawRead(id)[0], f.readln('\0'))).
>     array.
>     sort!("a[0] < b[0]").
>     map!(t => format("%6d %08x  %s\n", lineNo++, t[0], t[1])).
>     copy(stdout.lockingTextWriter);
Without giving much thought to your specific problem, I just want to mention H. S. Teoh's recent article:

  http://wiki.dlang.org/Component_programming_with_ranges

The problem in the article has to deal with data that is not trivially structured.

Ali
Aug 07 2013
"Tobias Pankrath" <tobias pankrath.net> writes:
On Wednesday, 7 August 2013 at 18:25:37 UTC, lafoldes wrote:
> What do you think?
You could use sequence!("n") to get rid of repeat(0, count) and lineNo. This will still be strangely backwards, though: to get a general solution you'll need to base the iteration on the number of entries actually in your file, not on the number of lines you think are in that file.

While the solution proposed by Whear does not work because of the ref issues, you could do something with recurrence and a state that is a tuple (state', elem), but I don't think that's the way to go here. Just write your own range that reads the file and yields (int, string).
Aug 08 2013
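The `sequence!("n")` suggestion can be sketched for the line-number part of the chain. This is a hedged example, not Tobias's code: the records are an in-memory array of `Tuple!(uint, string)` instead of data read from a file, and `numberedLines` is a hypothetical name. `zip` stops at the shorter range, so pairing the sorted records with the infinite `sequence!"n"(0)` yields exactly one number per record, and no mutable `lineNo` is captured from outside the chain.

```d
import std.algorithm.iteration : map;
import std.algorithm.sorting : sort;
import std.array : array;
import std.format : format;
import std.range : sequence, zip;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// Sort (id, text) records and number the output lines with sequence!"n".
string[] numberedLines(Tuple!(uint, string)[] records)
{
    return records
        .dup                            // sort a copy, not the caller's array
        .sort!((a, b) => a[0] < b[0])
        .zip(sequence!"n"(0))           // pair each record with 0, 1, 2, ...
        .map!(t => format("%6d %08x  %s", t[1], t[0][0], t[0][1]))
        .array;
}

void main()
{
    foreach (line; numberedLines([tuple(7u, "seven"), tuple(3u, "three")]))
        writeln(line);
}
```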
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Aug 07, 2013 at 08:25:33PM +0200, lafoldes wrote:
> There are things I don't like here:
>
> - the dummy repeat at the beginning of the component chain. The only
> purpose of it is to produce the needed number of items.
This is a bad idea. You should be constructing a range that spans until EOF, not some arbitrary count.
> Moreover, repeat doesn't produce a sortable range, so the array is needed
> as well.
You can't sort a one-pass sequence of values. It's only natural to require storage (in an array, or some other data structure) in order to be sortable.
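To make the one-pass point concrete, the sketch below checks the range traits at compile time. A `File` read with `byLine` is an input range without `save` or indexing, so `sort` cannot work on it in place; buffering with `array` produces random-access storage that `sort` accepts. The `OnePass` struct is a hypothetical stand-in for a record reader, used only so the example needs no file.

```d
import std.algorithm.sorting : sort;
import std.array : array;
import std.range.primitives : isForwardRange, isInputRange, isRandomAccessRange;
import std.stdio : File;

// A File viewed line by line is a one-pass input range: no save(), no indexing.
alias Lines = typeof(File.init.byLine());
static assert(isInputRange!Lines);
static assert(!isForwardRange!Lines);

// Hypothetical one-pass source standing in for a record reader.
struct OnePass
{
    int[] data;
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}
static assert(isInputRange!OnePass && !isRandomAccessRange!OnePass);

// .array buffers every element into an array, which sort() accepts.
int[] sortedCopy(OnePass r)
{
    return r.array.sort().release;
}

void main()
{
    assert(sortedCopy(OnePass([3, 1, 2])) == [1, 2, 3]);
}
```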
> - I don't know how to do this when the number of items is not known.
> How to repeat until EOF?
You should write a range that consumes exactly the amount of data you need from the stream. First of all, you should recognize that your input file has a different structure than just a mere sequence of bytes or pages. For maximum readability/maintainability, you should make this explicit by defining a structure to contain this data:

	import std.algorithm, std.array, std.format, std.range, std.stdio, std.string;

	struct Record {
		uint id;
		string str;	// readln returns string, not char[]
	}

Next, you should write a range that takes a File and returns a range of Records. Maybe something like this:

	// Warning: untested code
	auto getRecords(File f) {
		static struct Result {
			File f;
			this(File _f) {
				f = _f;
				readNext(); // get things going
			}
			@property bool empty() { return f.eof; }
			Record front;
			void popFront() { readNext(); }

			private void readNext() {
				// The union lets rawRead fill the uint through a ubyte array.
				union U {
					uint id;
					ubyte[uint.sizeof] raw;
				}
				U u;
				f.rawRead(u.raw[]);
				// readln keeps the '\0' terminator; chomp strips it.
				auto str = f.readln('\0').chomp("\0");
				front = Record(u.id, str);
			}
		}
		return Result(f);
	}

Phobos isn't *quite* at the point where you don't have to write custom code. :)

Once you have this, your code becomes:

	{
		File("binary.dat", "rb")
			.getRecords()
			.array	// this is necessary! you can't sort a one-pass range
			.sort!((a,b) => a.id < b.id)
			.map!(t => format("%08x  %s\n", t.id, t.str))
			.copy(stdout.lockingTextWriter);
		stdout.flush();	// this is probably also necessary
	}

If you want line numbers, you can use zip to pair up each record with a line number:

	// Warning: untested code
	{
		File("binary.dat", "rb")
			.getRecords()
			.array	// this is necessary! you can't sort a one-pass range
			.sort!((a,b) => a.id < b.id)
			.zip(sequence!"n"(0))
			.map!(t => format("%6d %08x  %s\n", t[1], t[0].id, t[0].str))
			.copy(stdout.lockingTextWriter);
		stdout.flush();	// this is probably also necessary
	}
> - The variables id and lineNo live outside the chain.
Yeah, those are bad. In my example code above, I got rid of them.
> - rawRead() needs an array, even if there is only one item to read.
This is a std.stdio limitation. But it can be worked around using a union as I did above.
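Another way around the rawRead limitation, besides the union, is to hide the one-element array inside a small generic helper. This is a sketch: `readValue` is a hypothetical name, not a std.stdio function, values are read in the machine's native byte order, and the demo writes a scratch file under `tempDir()`. Copies of a `File` share the same underlying handle, so passing it by value still advances the shared read position.

```d
import std.exception : enforce;
import std.file : remove, tempDir;
import std.path : buildPath;
import std.stdio : File;

// Hypothetical helper: read exactly one T from a binary file. The
// one-element buffer is local, so callers need neither an array nor a union.
T readValue(T)(File f)
{
    T[1] buf;
    enforce(f.rawRead(buf[]).length == 1, "unexpected end of file");
    return buf[0];
}

void main()
{
    auto path = buildPath(tempDir(), "readvalue_demo.bin");
    scope (exit) remove(path);

    {   // Write two uints in native byte order; the File closes at scope end.
        auto w = File(path, "wb");
        uint[2] values = [42, 7];
        w.rawWrite(values[]);
    }

    auto f = File(path, "rb");
    assert(readValue!uint(f) == 42);
    assert(readValue!uint(f) == 7);
}
```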
> - How to avoid the flush() at the end?
> [...]

Why would you want to? You do have to flush stdout if you want output to be written immediately, because it's a buffered output stream. Forgetting to flush() is OK if your program exits shortly after, since the runtime exit code will flush any unflushed buffers. But doing it explicitly is probably better, and necessary if your program isn't going to exit and you want the output flushed right away.


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.
Aug 08 2013