
digitalmars.D.learn - Any library with string encoding/decoding support?

"ilya-stromberg" <ilya-stromberg-2009 yandex.ru> writes:
Do you know of any library with string encoding/decoding support? I 
need more encodings than `std.encoding` provides.
Jan 20 2014
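
For reference, here is a minimal sketch of what `std.encoding` itself covers, using its `transcode` function to turn Windows-1252 bytes into UTF-8. The helper name `cp1252ToUtf8` and the sample bytes are made up for illustration; only a handful of encodings (ASCII, ISO-8859-1, Windows-1252 and the UTF family, among a few others) are available this way, which is the limitation the question is about.

```d
import std.encoding : transcode, Windows1252String;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): Windows-1252 bytes -> UTF-8 string.
string cp1252ToUtf8(immutable(ubyte)[] raw)
{
    // Reinterpret the raw bytes as a Windows-1252 string (no copy is made).
    auto src = cast(Windows1252String) raw;

    string utf8;
    transcode(src, utf8); // std.encoding does the actual conversion
    return utf8;
}

void main()
{
    // "café" encoded as Windows-1252: 0xE9 is 'é'.
    immutable(ubyte)[] raw = [0x63, 0x61, 0x66, 0xE9];
    writeln(cp1252ToUtf8(raw));
}
```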
"Adam D. Ruppe" <destructionator gmail.com> writes:
On Monday, 20 January 2014 at 08:33:09 UTC, ilya-stromberg wrote:
 Do you know any library with string encoding/decoding support? 
 I need more encodings than provides `std.encoding`.
I wrote one that does a little more decoding, but it has no encoding support at all. (I wrote it for my web scraper and email reader, so all I cared about was getting the text to UTF-8.)

https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff/blob/master/characterencodings.d

auto s = convertToUtf8(your_raw_data, "current_encoding");

If you want something full-featured, GNU iconv isn't hard to use from D:

import core.stdc.errno;
import std.string : toStringz;

extern(C) {
    alias void* iconv_t;
    iconv_t iconv_open(const char* tocode, const char* fromcode);
    int iconv_close(iconv_t cd);
    pragma(lib, "iconv");
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft, char** outbuf, size_t* outbytesleft);
}

auto i = iconv_open("UTF-8", toStringz("CP1252"));
if(i == cast(void*) -1)
    throw new Exception("iconv open failed");
scope(exit) iconv_close(i);

// get input pointer and length ready
// (this assumes content is an array holding the raw input bytes)
char* input = cast(char*) content.ptr;
size_t inputLength = content.length;

// Allocate an output buffer with 4x the size of the input buffer.
// Keep the output buffer around as a slice and get a pointer to it for the lib.
auto startingOutputBuffer = new char[content.length * 4];
char* outputBuffer = startingOutputBuffer.ptr;
size_t outputLength = startingOutputBuffer.length;

while(inputLength) {
    auto ret = iconv(i, &input, &inputLength, &outputBuffer, &outputLength);
    if(ret == cast(size_t) -1) {
        // check errno. errno == 84 (EILSEQ on Linux) means wrong charset
        throw new Exception("iconv conversion failed");
    }
}

// the number of bytes remaining in the output buffer is in outputLength,
// so the converted size is the original buffer size minus the remaining size
outputLength = (content.length * 4) - outputLength;

// then slice it to get the result
string convertedContent = startingOutputBuffer[0 .. outputLength].idup;

Note that I think iconv is GPL licensed.
Jan 20 2014
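
The iconv steps above can also be folded into one function. The following is only a sketch under some assumptions: the name `toUtf8` is made up, the bindings are declared by hand, and it assumes a glibc system where iconv is part of libc (with a separate libiconv you would need to link it, e.g. with `-L-liconv`, and the symbol names may differ). Instead of a fixed 4x buffer it grows the output buffer whenever iconv reports `E2BIG`.

```d
import core.stdc.errno : errno, E2BIG;
import std.exception : enforce;
import std.string : toStringz;

// Hand-written bindings; assumes glibc, where iconv lives in libc itself.
extern(C) {
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

// Hypothetical helper: convert `input` from the encoding `fromCode` to UTF-8.
string toUtf8(const(ubyte)[] input, string fromCode)
{
    auto cd = iconv_open("UTF-8", fromCode.toStringz);
    enforce(cd != cast(iconv_t) -1, "iconv_open failed for " ~ fromCode);
    scope(exit) iconv_close(cd);

    auto output = new char[input.length * 2 + 16];
    char* inPtr = cast(char*) input.ptr; // iconv does not modify the input
    size_t inLeft = input.length;
    size_t used = 0;                     // bytes of output written so far

    while (inLeft > 0) {
        // Recompute the pointer each round: growing the array may move it.
        char* outPtr = output.ptr + used;
        size_t outLeft = output.length - used;
        immutable ret = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        used = output.length - outLeft;

        if (ret == cast(size_t) -1) {
            // E2BIG means the output buffer is full: grow it and continue.
            // Any other error means the input does not match fromCode.
            enforce(errno == E2BIG, "iconv failed: input is not valid " ~ fromCode);
            output.length = output.length * 2;
        }
    }
    return output[0 .. used].idup;
}
```

Usage would then look like `auto s = toUtf8(rawBytes, "CP1252");`, where `rawBytes` stands for whatever byte array you read from the file or socket. Which encoding names are accepted depends on the iconv implementation installed.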
"MGW" <mgw yandex.ru> writes:
On Monday, 20 January 2014 at 08:33:09 UTC, ilya-stromberg wrote:
 Do you know any library with string encoding/decoding support? 
 I need more encodings than provides `std.encoding`.
A library for working with Qt:
https://github.com/MGWL/QtE-Qt_for_Dlang_and_Forth

It works with Qt and its QTextCodec class.

---------------------------------------
// Compile:
// ------------------------------
// Linux: dmd ex1.d qte.d -L-ldl
// Windows: dmd ex1.d qte.d
// ------------------------------
import qte;            // Work with Qt
import core.runtime;   // Startup parameters
import std.stdio;      // writeln();

int main(string[] args) {
    QApplication app;   // Application
    QTextCodec UTF_8;
    QTextCodec WIN_1251;
    QTextCodec IBM866;
    QString tmpQs;
    QByteArray ba;
    QLabel label;

    // Test load: if started with '--debug', show warning messages while loading QtE
    bool fDebug;
    fDebug = false;
    foreach (arg; args[0 .. args.length]) {
        if (arg=="--debug") fDebug = true;
    }

    // Load GUI. fDebug: false = disable warnings, true = enable warnings
    int rez = LoadQt( dll.Core | dll.Gui | dll.QtE, fDebug);
    if (rez==1) return 1;

    // Init Qt. Last parameter: true = GUI, false = console app
    app = new QApplication;
    (app.adrQApplication())(cast(void*)app.bufObj, &Runtime.cArgs.argc, Runtime.cArgs.argv, true);

    // Init the internal encodings. All codecs are Qt QTextCodec objects
    tmpQs = new QString();
    UTF_8 = new QTextCodec("UTF-8");            // Linux
    WIN_1251 = new QTextCodec("Windows-1251");  // Windows
    IBM866 = new QTextCodec("IBM 866");         // DOS

    // Create a "Hello from QtE.d" string in Russian
    tmpQs.toUnicode(cast(char*)("<h2>Привет из <font color=red size=5>QtE.d</font></h2>".ptr), UTF_8);

    // QLabel
    label = new QLabel(null);
    label.setText(tmpQs);
    label.setAlignment(QtE.AlignmentFlag.AlignCenter); // Set text and alignment
    label.resize(300, 130);                            // Label size

    // Example: DOS console
    ba = new QByteArray(cast(char*)("Привет из QtE.d - обратите внимание на перекодировку в DOS".ptr)); // this is in UTF-8
    tmpQs.toUnicode(cast(char*)ba.data(), UTF_8); // to Unicode
    version(Windows) { // DOS console window on Windows
        tmpQs.fromUnicode(cast(char*)ba.data(), IBM866);
    }
    version(linux) { // Linux works in UTF-8
        tmpQs.fromUnicode(cast(char*)ba.data(), UTF_8);
    }
    printf("%s", ba.data());

    label.show();
    return app.exec();
}
Jan 20 2014
"FreeSlave" <freeslave93 gmail.com> writes:
iconv as a library is under the LGPL; iconv as a utility is under the GPL. 
Note that iconv is not portable even across Linux systems, since different 
distros may ship different implementations.

Qt is not a good fit here because it's unstable with D. It's also a 
redundant dependency. And as far as I know, Qt itself uses 
platform-dependent functions to work with encodings: iconv on Linux and 
Windows-specific functions on Windows.
Jan 21 2014