digitalmars.D.learn - Make Simple Things Hard to Figure out

default0 <Kevin.Labschek gmx.de> writes:

Hi

So today I tried setting up vibe.d and see how that all works out.
Doing the initial setup was easy enough (dub is amazingly 
convenient!) and I had a "Hello World" server up and running in 
about 10 minutes. Sweet.
After that, I started looking into vibe.d's URLRouter - also easy 
enough, and documented well enough to get the gist of it without 
having to spend a long time doing anything.

After that, it all went downhill for me.
First off, I'm very new to D, and I'm also very new to lots of the 
concepts D implements/exposes (low-level stuff, most of the 
paradigms, etc). This is mostly a description of what I did to 
attempt solving my problems and how that did or did not work out. 
Maybe this can help guide decisions on what things to clarify and 
where.

The thing I was trying to do was dead simple: Receive a base64 
encoded text via a query parameter.
After digging around the HTTPRequest vibe exposes, I quickly 
found the query-dictionary, so to get my base64 encoded text, all 
I had to do was query["data"]. Easy, convenient.
To decode the base64 there is something in the std-lib. Awesome. 
So all I need to do is Base64URL.decode(query["data"]) and I'm 
done. Or so I thought. Naturally, the decode function returns a 
ubyte[], so I need to somehow decode the ubyte[] to a char[] 
(since I'm using a frontend-library that encodes base64 text as 
utf8).
My first instinct was to use google.
The first thing that came up was unsurprisingly std.utf.
After skimming through the functions I couldn't really make any 
function out that would accept a simple ubyte[] (or range of 
ubyte) and output a simple string or char[] (or range of chars). 
Disappointing. Maybe I missed something?
There is a decode function, but I couldn't quite figure out what 
it did or how I was supposed to use it, if it did what I wanted 
it to - no examples.

After that I moved on to std.string. It only had one function 
that seemed somewhat interesting - assumeUTF. After reading 
through the docs, it failed my criteria since it had no 
validation - as its name states, it simply assumes that whatever 
you give it is correctly encoded. I didn't expect much here 
anyways, it would have been an odd place to put this 
functionality.

On to the third package that seemed related to my problem: 
std.encoding.
The function that seemed most obvious to do what I wanted to do 
was called "decode".
Well... it decodes a single code point. Really inconvenient. It 
then goes on to state that it supersedes std.utf.decode, but I 
don't remember reading any notice in std.utf.decode that it 
actually was superseded and I shouldn't even really bother trying 
to learn about it, weird but okay. It also helpfully notes that 
codePoints() is more convenient than it. So let's look at that.
Alright, the example shows that codePoints() wants a string or a 
range of chars. I only have a range of bytes, and I would like to 
validate it, not type-system-breaking-cast it. Doesn't seem like 
this function is helpful, but maybe I'm missing something.

Scrolling a bit further, there is an EncodingScheme class. It has 
a neat function, isValid. So after reading a bit on it, what I 
ended up with was: 
EncodingScheme.create("UTF-8").isValid(decodedBase64) followed by 
a type-system-ignoring cast from ubyte[] to char[] (since I now 
know it is valid so this cast is fine). All in all, including the 
explicit error handling required by isValid this has taken about 
an hour of research and 7 lines of code.
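
For readers who want to see the shape of that approach, here is a minimal sketch. It assumes the decoded bytes are already at hand; `bytesToText` is a hypothetical helper name and not part of Phobos, and the exception type is just a placeholder.

```d
import std.encoding : EncodingScheme;

/// Hypothetical helper (not in Phobos): returns the bytes as text if they
/// are valid UTF-8, otherwise throws a plain Exception.
string bytesToText(immutable(ubyte)[] bytes)
{
    auto scheme = EncodingScheme.create("utf-8");
    if (!scheme.isValid(bytes))
        throw new Exception("input is not valid UTF-8");
    // The bytes were validated above, so reinterpreting them as chars is safe.
    return cast(string) bytes;
}

void main()
{
    immutable(ubyte)[] ok = [0x48, 0x69]; // the bytes of "Hi"
    assert(bytesToText(ok) == "Hi");

    immutable(ubyte)[] bad = [0xFF, 0xFE]; // 0xFF never occurs in UTF-8
    bool threw = false;
    try
    {
        auto ignored = bytesToText(bad);
    }
    catch (Exception)
    {
        threw = true;
    }
    assert(threw);
}
```

The cast is only acceptable because it comes after the validity check; on its own it would validate nothing.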

In C#, the following would've done the trick, throwing an exception on 
failure:
Encoding.UTF8.GetString(Convert.FromBase64String("base64"))

Looking back the things that really slowed me down here were:
-The lack of an answer on StackOverflow to this very specific 
problem (otherwise this would have been the job of 5 minutes, if 
even)
-The lack of examples for specific functions in the documentation
-The relative difficulty navigating the documentation (often 
times 10 or more functions on a single page) as well as very 
densely written documentation (I often find myself reading 
sentences twice or more just to extract all information from 
them, since single sentences often contain multiple important 
facts)

As this isn't really a question for Learn I'm not sure if it fits 
here. This is more of a "This is how I went about trying to learn 
X. These are the problems I encountered. Ideas to improve?" but I 
guess I might as well post it here.

So with that in mind, any ideas to improve the situation (that do 
not require 500 man-decades of work)?

Dec 21 2015

thedeemon <dlang thedeemon.com> writes:

On Monday, 21 December 2015 at 13:51:57 UTC, default0 wrote:
 As this isn't really a question for Learn I'm not sure if it 
 fits here. This is more of a "This is how I went about trying 
 to learn X. These are the problems I encountered. Ideas to 
 improve?" but I guess I might as well post it here.

Thanks for sharing! Obviously the Phobos documentation could and 
should be improved. People are working on it, but unlike some 
other languages, there's nobody in the D community working full time 
on documentation and articles. And the D user base is still quite 
small, so Stack Overflow is not exactly overflowing with D-related 
answers.

Out of curiosity I looked into "D Cookbook" to check if it 
contains your particular case but the only mention of Base64 
there is about encoding some data into Base64, not the other way 
around.

Dec 21 2015

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 21 December 2015 at 15:49:14 UTC, thedeemon wrote:
 Out of curiosity I looked into "D Cookbook" to check if it 
 contains your particular case but the only mention of Base64 
 there is about encoding some data into Base64, not the other 
 way around.

Hmm, yeah, I didn't want to have any section in there that was 
just "call this one function" but indeed the worry about utf 
encoding can magnify the problem. I did write about this a bit in 
chapter one on strings, but probably still not quite what you'd 
want...

Dec 21 2015

default0 <Kevin.Labschek gmx.de> writes:

On Monday, 21 December 2015 at 15:49:14 UTC, thedeemon wrote:
 On Monday, 21 December 2015 at 13:51:57 UTC, default0 wrote:
 As this isn't really a question for Learn I'm not sure if it 
 fits here. This is more of a "This is how I went about trying 
 to learn X. These are the problems I encountered. Ideas to 
 improve?" but I guess I might as well post it here.

 Thanks for sharing! Obviously Phobos documentation could and 
 should be improved, people are working on it but unlike some 
 other languages there's nobody in D community working full time 
 on documentation and articles. And D user base is still quite 
 small, so Stack Overflow is not quite overflown with D-related 
 answers.

 Out of curiosity I looked into "D Cookbook" to check if it 
 contains your particular case but the only mention of Base64 
 there is about encoding some data into Base64, not the other 
 way around.

Well if I post this as a question to SO and link it here, would 
you mind answering it? Maybe we should make this a general 
scheme: If someone has trouble learning something, just ask the 
question directly on SO and have someone answer it there. SO is 
way easier to google than this learn forum or the documentation 
on dlang. And every single question that gets answered may be 
helpful to others trying to do the same :)

Dec 21 2015

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 21 December 2015 at 15:55:13 UTC, default0 wrote:
 Well if I post this as a question to SO and link it here, would 
 you mind answering it? Maybe we should make this a general 
 scheme: If someone has trouble learning something, just ask the 
 question directly on SO and have someone answer it there. SO is 
 way easier to google than this learn forum or the documentation 
 on dlang. And every single question that gets answered may be 
 helpful to others trying to do the same :)

Yeah, I think that's a good idea. Even if you already know 
the answer, you can post the question there and answer it at the same 
time to provide an archive for future searching.

Dec 21 2015

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 21 December 2015 at 13:51:57 UTC, default0 wrote:
 The thing I was trying to do was dead simple: Receive a base64 
 encoded text via a query parameter.

So when I read this, I thought you might have missed another 
little fact... there's more than one base64.

Yup, normal Base64 encoding uses + and / as characters, which are 
special in URLs, so often (but not always!), base64 url encoding 
uses - and _ instead.

This isn't D specific, it is just part of the confusing mess that 
is the real world of computer data.

Normal base64 does work in urls, as long as it is properly url 
encoded. (Got enough encoding yet?!)
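
To make that concrete, here is a small sketch of pushing normal base64 through URL encoding and back. The two input bytes are an arbitrary choice that happens to produce both + and / in the output; std.uri's encodeComponent and decodeComponent do the percent-encoding.

```d
import std.base64 : Base64;
import std.uri : decodeComponent, encodeComponent;

void main()
{
    // Arbitrary sample bytes, picked so the base64 text contains + and /.
    ubyte[] data = [0xFB, 0xFF];

    string b64 = Base64.encode(data).idup;
    assert(b64 == "+/8=");

    // Percent-encode the characters that are special in a URL.
    string escaped = encodeComponent(b64);
    assert(escaped == "%2B%2F8%3D");

    // The receiving side undoes the URL encoding first, then the base64.
    assert(decodeComponent(escaped) == b64);
    assert(Base64.decode(decodeComponent(escaped)) == data);
}
```

A web framework normally undoes the URL encoding of query parameters for you, so only the base64 step is usually left to do by hand.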

Anywho, if you are consuming this from some other source, make 
sure you are using the same kind of base64 as they are.

import std.base64;

// for normal base64
ubyte[] bytes = Base64.decode(your_string);

// or, for the url-optimized variant of base64
// (named differently so both lines can live in the same scope)
ubyte[] urlBytes = Base64URL.decode(your_string);
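
As a rough illustration of how the two variants differ, the sketch below encodes the same two (arbitrarily chosen) bytes with both and then feeds URL-style text to the normal decoder, which rejects it.

```d
import std.base64 : Base64, Base64Exception, Base64URL;

void main()
{
    // Arbitrary sample bytes that map onto the last two alphabet entries.
    ubyte[] data = [0xFB, 0xFF];

    assert(Base64.encode(data) == "+/8=");    // normal alphabet
    assert(Base64URL.encode(data) == "-_8="); // url-safe alphabet

    // Mixing the variants up is an error, not a silent mis-decode.
    bool rejected = false;
    try
    {
        auto ignored = Base64.decode("-_8=");
    }
    catch (Base64Exception)
    {
        rejected = true;
    }
    assert(rejected);
}
```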

 My first instinct was to use google.

Tip I tell people at work too: yes, look for it yourself, but if 
you don't see an answer within a few minutes, go ahead and ask us, 
drop a quick question in the chatroom. D has one on IRC freenode 
called #d.

We won't necessarily even see your question, and we might not know 
the answer, so keep trying to figure it out yourself, but you might 
be able to save a lot of time by just picking our brains.

 There is a decode function, but I couldn't quite figure out 
 what it did or how I was supposed to use it, if it did what I 
 wanted it to - no examples.

std.utf.decode will take a few chars (or wchars) and decode them 
into a single dchar.

Take the character “ for example, the double curly quote that 
Microsoft Word likes to put in when you type " on your keyboard.

“ has several different encodings as bytes.

http://www.fileformat.info/info/unicode/char/201c/index.htm

UTF-8 (hex) 	0xE2 0x80 0x9C (e2809c)
UTF-16 (hex) 	0x201C (201c)
UTF-32 (hex) 	0x0000201C (201c)


UTF-8 is char in D. That curly quote takes up three chars:

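import std.utf : decode;

// the three UTF-8 code units of the curly quote
char[] curlyQuote = [0xE2, 0x80, 0x9C];
size_t idx = 0; // decode advances this past the chars it consumed
dchar curlyQuoteAsDchar = decode(curlyQuote, idx);

assert(curlyQuoteAsDchar == '\u201c');
assert(idx == 3);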



The std.utf module mostly works on this level, chars to dchars 
and back.
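
A short sketch of both directions, assuming nothing beyond std.utf itself: decode walks a string one code point at a time and advances the index, and encode turns a dchar back into its chars. The sample string is made up for the example.

```d
import std.utf : decode, encode;

void main()
{
    // 'a', the curly quote U+201C, then 'b': 1 + 3 + 1 chars in UTF-8.
    string text = "a\u201Cb";
    assert(text.length == 5);

    size_t idx = 0;
    dchar first = decode(text, idx);
    assert(first == 'a' && idx == 1);
    dchar second = decode(text, idx);
    assert(second == '\u201C' && idx == 4); // consumed three chars
    dchar third = decode(text, idx);
    assert(third == 'b' && idx == 5);

    // And back: one dchar becomes up to four chars.
    char[4] buf;
    size_t n = encode(buf, '\u201C');
    assert(n == 3);
    assert(buf[0 .. n] == "\xE2\x80\x9C");
}
```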

There's one big exception though... the validate function.

http://dlang.org/phobos/std_utf.html#validate

That works on a whole string and validates the whole sequence of 
chars as being valid utf8, throwing an exception if it isn't. 
(Weird behavior btw, I think I would have preferred `isValid` 
returning bool, or `validate` taking bytes and returning chars - 
which would be exactly what you wanted - but it returns void and 
throws instead :( )
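
If you prefer the bool-returning flavor, it is only a few lines on top of validate. This is a sketch; `isValidUtf8` is a hypothetical name, not something Phobos provides.

```d
import std.utf : UTFException, validate;

/// Hypothetical wrapper (not in Phobos): true if `text` is well-formed UTF-8.
bool isValidUtf8(const(char)[] text)
{
    try
    {
        validate(text); // returns void, throws UTFException on bad input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    assert(isValidUtf8("plain ascii"));
    assert(isValidUtf8("\xE2\x80\x9C")); // the complete curly quote

    // The same sequence with its last char cut off is not valid.
    char[] truncated = ['\xE2', '\x80'];
    assert(!isValidUtf8(truncated));
}
```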


This stuff btw is pretty confusing, there's an awful lot to know 
about text encoding, so don't feel bad if it makes very little 
sense to you. I spent like four pages in my book introducing 
unicode as part of the discussion on D strings... and still, that 
left out a lot of things too...

 After that I moved on to std.string. It only had one function 
 that seemed somewhat interesting - assumeUTF. After reading 
 through the docs, it failed my criteria since it had no 
 validation - as its name states, it simply assumes that 
 whatever you give it is correctly encoded. I didn't expect much 
 here anyways, it would have been an odd place to put this 
 functionality.

Ooooh you're close though.

If you did

---
import std.base64, std.string, std.utf;

auto utf = assumeUTF(Base64.decode(it));
validate(utf);
---

you'd probably get what you wanted...
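
Wrapped up as a function, that could look roughly like the sketch below. `decodeBase64Text` is a hypothetical name, and Base64URL is assumed because that is what the original post used. An explicit cast stands in for assumeUTF here (it is the same reinterpretation of the bytes), so that validate is the only place bad UTF-8 gets reported.

```d
import std.base64 : Base64URL;
import std.utf : validate;

/// Hypothetical helper: Base64URL text in, validated UTF-8 string out.
/// Throws Base64Exception on malformed base64, UTFException on bad UTF-8.
string decodeBase64Text(string encoded)
{
    ubyte[] raw = Base64URL.decode(encoded);
    auto text = cast(char[]) raw; // reinterpret only, no checking yet
    validate(text);               // throws if the bytes are not UTF-8
    // `raw` was freshly allocated and is not referenced anywhere else,
    // so handing it out as immutable is fine.
    return cast(string) text;
}

void main()
{
    assert(decodeBase64Text("SGVsbG8=") == "Hello");
    assert(decodeBase64Text("4oCc") == "\u201C"); // the curly quote
}
```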


 Really inconvenient. It then goes on to state that it 
 supersedes std.utf.decode, but I don't remember reading any 
 notice in std.utf.decode that it actually was superseded and I 
 shouldn't even really bother trying to learn about it, weird 
 but okay.

blargh I had to look at the source to understand what these 
actually did

 EncodingScheme.create("UTF-8").isValid(decodedBase64) followed 
 by a type-system-ignoring cast from ubyte[] to char[] (since I 
 now know it is valid so this cast is fine). All in all, 
 including the explicit error handling required by isValid this 
 has taken about an hour of research and 7 lines of code.

yeah that works too

 So with that in mind, any ideas to improve the situation (that 
 do not require 500 man-decades of work)?

We need a lot more examples, and not just of individual 
functions. Examples on how to bring the functions together to do 
real world tasks.

Dec 21 2015

default0 <Kevin.Labschek gmx.de> writes:

On Monday, 21 December 2015 at 16:20:18 UTC, Adam D. Ruppe wrote:
 On Monday, 21 December 2015 at 13:51:57 UTC, default0 wrote:
 The thing I was trying to do was dead simple: Receive a base64 
 encoded text via a query parameter.

 So when I read this, I thought you might have missed another 
 little fact... there's more than one base64.

I am aware of this and I used Base64URL in my code, as does my 
frontend :-) Glad you pointed it out though, I really did write 
my post as if I missed that fact.

 Yup, normal Base64 encoding uses + and / as characters, which 
 are special in URLs, so often (but not always!), base64 url 
 encoding uses - and _ instead.

 This isn't D specific, it is just part of the confusing mess 
 that is the real world of computer data.

 Normal base64 does work in urls, as long as it is properly url 
 encoded. (Got enough encoding yet?!)

Oh you can keep going, I'm not that easily scared :D

 My first instinct was to use google.

 Tip I tell people at work too: yes, look for it yourself, but 
 if you don't see an answer with a few minutes, go ahead and ask 
 us, drop a quick question in the chatroom. D has one on IRC 
 freenode called #d.

I don't have an IRC client set up since I rarely use it, plus 
IRC is always kind of "out of the way". It's good to know, but 
if you're a beginner trying to learn about the basics of a language, 
standalone tutorials and/or easy-to-understand documentation with 
examples are miles better :-)

 There is a decode function, but I couldn't quite figure out 
 what it did or how I was supposed to use it, if it did what I 
 wanted it to - no examples.

 std.utf.decode will take a few chars and decode them into a 
 single wchar or dchar.

 Take the character “ for example, the double curly quote that 
 Microsoft Word likes to put in when you type " on your keyboard.

 “ has several different encodings as bytes.

 http://www.fileformat.info/info/unicode/char/201c/index.htm

 UTF-8 (hex) 	0xE2 0x80 0x9C (e2809c)
 UTF-16 (hex) 	0x201C (201c)
 UTF-32 (hex) 	0x0000201C (201c)


 UTF-8 is char in D. That curly quote takes up three chars:

 char[] curlyQuote = [0xE2, 0x80, 0x9C];
 size_t idx = 0;
 dchar curlyQuoteAsDchar = decode(curlyQuote[], idx);

 assert(curlyQuoteAsDchar == '\u201c');

Nice explanation, thanks. I wish the documentation could have 
taught me that information as clearly as you did :-)


 There's one big exception though... the validate function.

 http://dlang.org/phobos/std_utf.html#validate

 That works on a whole string and validates the whole sequence 
 of chars as being valid utf8, throwing an exception if it 
 isn't. (Weird behavior btw, I think I would have preferred 
 `isValid` returning bool, or `validate` taking bytes and 
 returning chars - which would be exactly what you wanted - but 
 it returns void and throws instead :( )

Well, a ubyte[] isn't exactly an array of code-points, so just 
calling validate and casting is confusing (even though logical if 
you think about it for a second).
Having an API like bool tryDecode(ubyte[], char[] outBuf), except 
more rangified, and an analogous char[] decode(ubyte[]) (also 
rangified) would be much easier to understand (and, I would argue, 
to use). The task I'm trying to do is explicitly not "casting this 
byte array to code points" but "decode this byte array into code 
points". That an implementation of this functionality may simply 
cast the original array is an implementation detail, so going for 
cast(string)ubytes in the first place is kind of counter-intuitive 
(since I did have some D exposure for a while I managed to figure 
that one out without too much of a hassle though).
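
A non-rangified sketch of what such a pair could look like, built on std.utf.validate. Both `decodeUtf8` and `tryDecodeUtf8` are hypothetical names and signatures; nothing like them is claimed to exist in Phobos, and the out-parameter differs from the outBuf idea above because no copy is made.

```d
import std.utf : UTFException, validate;

/// Hypothetical: view the bytes as UTF-8 text, throwing UTFException if
/// they are not well-formed. The cast is the implementation detail.
const(char)[] decodeUtf8(const(ubyte)[] bytes)
{
    auto text = cast(const(char)[]) bytes;
    validate(text);
    return text;
}

/// Hypothetical non-throwing variant: reports success through the return
/// value and hands the text back through `text`.
bool tryDecodeUtf8(const(ubyte)[] bytes, out const(char)[] text)
{
    try
    {
        text = decodeUtf8(bytes);
        return true;
    }
    catch (UTFException)
    {
        text = null;
        return false;
    }
}

void main()
{
    const(ubyte)[] good = [0xE2, 0x80, 0x9C]; // the curly quote
    const(char)[] text;
    bool ok = tryDecodeUtf8(good, text);
    assert(ok && text == "\u201C");

    const(ubyte)[] bad = [0xFF]; // never valid in UTF-8
    ok = tryDecodeUtf8(bad, text);
    assert(!ok && text is null);
}
```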

 This stuff btw is pretty confusing, there's an awful lot to 
 know about text encoding, so don't feel bad if it makes very 
 little sense to you. I spent like four pages in my book 
 introducing unicode as part of the discussion on D strings... 
 and still, that left out a lot of things too...

Text encoding in general makes sense to me - I don't usually have 
trouble dealing with it. It was just hard to navigate the 
information available on how to write the code to do the 
necessary things in D :-)

 After that I moved on to std.string. It only had one function 
 that seemed somewhat interesting - assumeUTF. After reading 
 through the docs, it failed my criteria since it had no 
 validation - as its name states, it simply assumes that 
 whatever you give it is correctly encoded. I didn't expect 
 much here anyways, it would have been an odd place to put this 
 functionality.

 Ooooh you're close though.

 If you did

 ---
 import std.base64, std.string, std.utf;

 auto utf = assumeUTF(Base64.decode(it));
 validate(utf);
 ---

 you'd probably get what you wanted...

That plus some text explaining the details should be the answer 
to the SO question. I asked it here:
http://stackoverflow.com/questions/34401744/convert-ubyte-to-string-in-d
Would be awesome if you could respond there!

 Really inconvenient. It then goes on to state that it 
 supersedes std.utf.decode, but I don't remember reading any 
 notice in std.utf.decode that it actually was superseded and I 
 shouldn't even really bother trying to learn about it, weird 
 but okay.

 blargh I had to look at the source to understand what these 
 actually did

That sounds painful >_<

 EncodingScheme.create("UTF-8").isValid(decodedBase64) followed 
 by a type-system-ignoring cast from ubyte[] to char[] (since I 
 now know it is valid so this cast is fine). All in all, 
 including the explicit error handling required by isValid this 
 has taken about an hour of research and 7 lines of code.

 yeah that works too

 So with that in mind, any ideas to improve the situation (that 
 do not require 500 man-decades of work)?

 We need a lot more examples, and not just of individual 
 functions. Examples on how to bring the functions together to 
 do real world tasks.

Yup, lots of things in D require composition of different parts 
of std. This is not easy to learn or understand unless you are 
quite familiar with std - or have a heap of examples for lots of 
different tasks somewhere.

Dec 21 2015

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 21 December 2015 at 18:02:55 UTC, default0 wrote:
 I don't have an IRC client set up since I rarely use that, plus 
 an IRC is always kind of "out of the way".

Just click this link:

http://webchat.freenode.net/?channels=d

type in a random name, click the captcha checkbox and go!


I'll come back to the rest later, just want to highlight the 
existence of that webchat link for everyone.

Dec 21 2015
