
digitalmars.D.learn - Record separator is being lost after string cast

reply "Kadir Erdem Demir" <kerdemdemir hotmail.com> writes:
I am opening a .gz file and reading it chunk by chunk for 
uncompressing it.

The data in the uncompressed file looks like aRSbRScRSd: there is a 
record separator (ASCII code 30) between each pair of records (the 
records in my dummy example are a, b, c, d).

     File file = File("mylog.gz", "r");
     auto uc = new UnCompress();
     foreach (ubyte[] curChunk; file.byChunk(4096*1024))
     {
         auto uncompressed = cast(string)uc.uncompress(curChunk);
         writeln(uncompressed);
         auto stringRange = uncompressed.splitLines();
         foreach (string line; stringRange)
         {
             // ... do something with line
         }
     }

The result of the code above is: abcd. Unfortunately the record 
separators (ASCII 30) are missing.

By examining the data, I realized the record separators go missing 
after I cast ubyte[] to string.

Now I have two questions :

Urgent one (my boss is already a little disturbed that I started the 
task with D, so I need to solve this): What should I change in the 
code to keep the record separators?

Second one: How can I write the code above without for loops? I 
want to read the gz file line by line.

A more general and understandable code for first question :

     ubyte[] temp = [ 65, 30, 66, 30, 67];
     writeln(temp);
     string tempStr = cast(string) temp;
     writeln (tempStr);

Result is : ABC which is not desired.

Thanks
Kadir Erdem
Feb 04 2015
next sibling parent "Kagamin" <spam here.lot> writes:
Looks like RS is an unprintable character, that's why you don't 
see it in console.
Feb 04 2015
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
You can use C functions in D too:

import core.stdc.stdio;
ubyte[] temp = [ 65, 30, 66, 30, 67, 0];
puts(cast(char*)temp.ptr);
Feb 04 2015
prev sibling parent reply ketmar <ketmar ketmar.no-ip.org> writes:
On Wed, 04 Feb 2015 08:13:28 +0000, Kadir Erdem Demir wrote:

 A more general and understandable code for first question :

      ubyte[] temp = [ 65, 30, 66, 30, 67];
      writeln(temp);
      string tempStr = cast(string) temp;
      writeln (tempStr);

 Result is : ABC which is not desired.
nothing is lost in the program. what you see is a quirk in tty output: 
'\x1e' is an unprintable character, so you simply cannot see it. redirect 
the output to a file and open that file in any hex editor -- and you will 
find your separators intact. don't believe what you see! ;-)
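
The same check can be done without a hex editor by printing the bytes of the string instead of the characters. This is only a minimal sketch: `hexDump` is a hypothetical helper name, while `representation` (std.string) and the `%( %)` range format (std.format) come from Phobos.

```d
import std.format : format;
import std.stdio : writeln;
import std.string : representation;

// Hypothetical helper: render every byte of a string as two hex digits.
string hexDump(string s)
{
    // representation() exposes the string as immutable(ubyte)[] without copying.
    return format("%(%02x %)", s.representation);
}

void main()
{
    ubyte[] temp = [65, 30, 66, 30, 67];
    string tempStr = cast(string) temp;

    // The cast only reinterprets the bytes, so the length is unchanged.
    assert(tempStr.length == temp.length);

    // Prints the bytes in hex; 1e is the record separator (decimal 30).
    writeln(hexDump(tempStr));
}
```

If the `1e` bytes show up between the letters, the separators survived the cast and only the terminal is hiding them.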
Feb 04 2015
parent reply "Kadir Erdem Demir" <kerdemdemir hotmail.com> writes:
 don't believe what you see! ;-)
I am sorry to make a busy community busier with false alarms. When I 
wrote the output to a file, I saw that the record separator really 
exists.

I hope my second question is a valid one. How can I write the code 
below better? How can I reduce the number of foreach statements?

     File file = File("mylog.gz", "r");
     auto uc = new UnCompress();
     foreach (ubyte[] curChunk; file.byChunk(4096*1024))
     {
         auto uncompressed = cast(string)uc.uncompress(curChunk);
         writeln(uncompressed);
         auto stringRange = uncompressed.splitLines();
         foreach (string line; stringRange)
         {
             // ... do something with line
         }
     }

Thanks a lot for replies
Kadir Erdem
Feb 04 2015
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
On Wed, 04 Feb 2015 09:28:27 +0000, Kadir Erdem Demir wrote:

 I am sorry to make a busy community busier with false alarms.
don't mind it. ;-) "D.learn" is for *any* questions about the language, 
no matter how strange they may seem.
 How can I write the code below better? How can I reduce the number of
 foreach statements?
actually, your loop seems to be not good anyway, as it may easily read 
only part of a line. sadly, there is no streaming interface to gz files, 
so your best bet is to read the whole file in memory, then unpack it all 
at once, and then process it. just be sure that you have enough RAM. 
something like this:

     import std.stdio;
     import std.string;
     import std.zlib;

     void main () {
       char[] unpacked;
       // read the whole file and unpack it
       {
         auto fl = File("test.txt.gz", "rb");
         auto packed = new ubyte[](cast(size_t)fl.size);
         fl.rawRead(packed);
         auto up = new UnCompress();
         unpacked ~= cast(char[])up.uncompress(packed);
         unpacked ~= cast(char[])up.flush();
       }
       // process the unpacked text line by line
       foreach (s; unpacked.splitLines) {
         writeln(s);
       }
     }
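
Since the records in the question are delimited by RS (ASCII 30) rather than by newlines, the last loop may need to split on that byte instead: `splitLines` only breaks on line terminators. Here is a minimal sketch, assuming the unpacked data is already in memory as a string. `splitRecords` and `recordSeparator` are hypothetical names, not part of Phobos; `splitter` and `array` are.

```d
import std.algorithm : splitter;
import std.array : array;
import std.stdio : writeln;
import std.string : splitLines;

// ASCII 30, the record separator used in the question's data.
enum char recordSeparator = '\x1e';

// Hypothetical helper: split unpacked text into records at every RS byte.
string[] splitRecords(string data)
{
    return data.splitter(recordSeparator).array;
}

void main()
{
    string data = "a\x1eb\x1ec\x1ed";

    // RS is not a line terminator, so splitLines sees a single line here.
    assert(data.splitLines.length == 1);

    auto records = splitRecords(data);
    assert(records == ["a", "b", "c", "d"]);

    foreach (record; records)
        writeln(record);
}
```

`splitter` is lazy, so dropping the `.array` call and iterating over the result directly avoids building the intermediate array when the records are only visited once.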
Feb 04 2015
parent "Kadir Erdem Demir" <kerdemdemir hotmail.com> writes:
Thanks a lot,

I will follow your advise and implement this part same as your 
example.

Regards
Kadir Erdem
Feb 04 2015