digitalmars.D.learn - For those ready to take the challenge
- eles (1/1) Jan 09 2015 https://codegolf.stackexchange.com/questions/44278/debunking-stroustrups...
- Justin Whear (3/4) Jan 09 2015 stroustrups-debunking-of-the-myth-c-is-for-large-complicated-pro
- Adam D. Ruppe (31/32) Jan 09 2015 Well, as the author of my dom.d, I think it counts as a first
- Adam D. Ruppe (3/3) Jan 09 2015 Huh, looking at the answers on the website, they're mostly using
- Justin Whear (4/7) Jan 09 2015 Yes, I noticed that. `` isn't a
- Ola Fosheim Grøstad (9/12) Jan 10 2015 Yeah... Surprising, since languages like python includes a HTML
- Tobias Pankrath (3/6) Jan 10 2015 Since it is a comparison of languages it's okay to match the
- Adam D. Ruppe (19/21) Jan 10 2015 I don't think this is really a great comparison of languages
- Tobias Pankrath (13/45) Jan 10 2015 I think he's wrong, because it spoils the comparison. Every
- Adam D. Ruppe (10/17) Jan 10 2015 Yeah, that would be best. BTW interesting line here:
- Paulo Pinto (22/38) Jan 10 2015 I disagree.
- Ola Fosheim Grøstad (10/17) Jan 10 2015 The challenge is completely pointless. Different languages have
- Adam D. Ruppe (61/65) Jan 10 2015 Though, that's still a library thing rather than a language thing.
- Ola Fosheim Grøstad (13/21) Jan 10 2015 It is a language-library-platform thing, things like how
- Adam D. Ruppe (20/26) Jan 10 2015 Of course. It does it both ways:
- Nordlöw (5/24) Jan 10 2015 Both these code examples triggers the same assert()
- Adam D. Ruppe (6/7) Jan 10 2015 Don't use git master :P
- bearophile (4/5) Jan 10 2015 Is the issue in Bugzilla?
- Adam D. Ruppe (4/5) Jan 10 2015 I don't know, bugzilla is extremely difficult to search.
- Adam D. Ruppe (2/3) Jan 10 2015 https://issues.dlang.org/show_bug.cgi?id=13966
- Vladimir Panteleev (6/9) Jan 10 2015 Do use git master. The more people do, the fewer regressions will
- Jesse Phillips (3/4) Jan 09 2015 Link to answer in D:
- Andrei Alexandrescu (2/7) Jan 09 2015 Nailed it. -- Andrei
- Vladimir Panteleev (14/18) Jan 09 2015 I think byLine is not necessary. By default . will not match line
- Daniel Kozak via Digitalmars-d-learn (4/28) Jan 10 2015 Oh here is it, I was looking for each. I think it is allready in a
- MattCoder (7/8) Jan 10 2015 From the link: "Let's show Stroustrup what small and readable
On Fri, 09 Jan 2015 13:50:28 +0000, eles wrote:
> https://codegolf.stackexchange.com/questions/44278/debunking-stroustrups-debunking-of-the-myth-c-is-for-large-complicated-pro

Was excited to give it a try, then remembered...std.xml :(
Jan 09 2015
On Friday, 9 January 2015 at 16:55:30 UTC, Justin Whear wrote:
> Was excited to give it a try, then remembered...std.xml :(

Well, as the author of my dom.d, I think it counts as a first party library when I use it!

---
import arsd.dom;
import std.net.curl;
import std.stdio, std.algorithm;

void main() {
    auto document = new Document(cast(string) get("http://www.stroustrup.com/C++.html"));
    writeln(document.querySelectorAll("a[href]").map!(a=>a.href));
}
---

prints:

[snip ... "http://www.morganstanley.com/", "http://www.cs.columbia.edu/", "http://www.cse.tamu.edu", "index.html", "C++.html", "bs_faq.html", "bs_faq2.html", "C++11FAQ.html", "papers.html", "4th.html", "Tour.html", "programming.html", "dne.html", "bio.html", "interviews.html", "applications.html", "glossary.html", "compilers.html"]

Or perhaps better yet:

---
import arsd.dom;
import std.net.curl;
import std.stdio;

void main() {
    auto document = new Document(cast(string) get("http://www.stroustrup.com/C++.html"));
    foreach(a; document.querySelectorAll("a[href]"))
        writeln(a.href);
}
---

Which puts each one on a separate line.
Jan 09 2015
Huh, looking at the answers on the website, they're mostly using regular expressions. Weaksauce. And wrong - they don't find ALL the links, they find the absolute HTTP urls!
Jan 09 2015
On Fri, 09 Jan 2015 17:18:42 +0000, Adam D. Ruppe wrote:
> Huh, looking at the answers on the website, they're mostly using regular expressions. Weaksauce. And wrong - they don't find ALL the links, they find the absolute HTTP urls!

Yes, I noticed that. `<script src="http://app.js"></script>` isn't a "hyperlink". Wake up sheeple!
Jan 09 2015
On Friday, 9 January 2015 at 17:18:43 UTC, Adam D. Ruppe wrote:
> Huh, looking at the answers on the website, they're mostly using regular expressions. Weaksauce. And wrong - they don't find ALL the links, they find the absolute HTTP urls!

Yeah... Surprising, since languages like Python include an HTML parser in the standard library.

Besides, if you want all resource links you have to do a lot better, since the following attributes can contain resource addresses: href, src, data, cite, xlink:href…

You also need to do entity expansion, since the links can contain HTML entities like "&amp;".

Depressing.
Jan 10 2015
On Friday, 9 January 2015 at 17:18:43 UTC, Adam D. Ruppe wrote:
> Huh, looking at the answers on the website, they're mostly using regular expressions. Weaksauce. And wrong - they don't find ALL the links, they find the absolute HTTP urls!

Since it is a comparison of languages it's okay to match the original behaviour.
Jan 10 2015
On Saturday, 10 January 2015 at 12:34:42 UTC, Tobias Pankrath wrote:
> Since it is a comparison of languages it's okay to match the original behaviour.

I don't think this is really a great comparison of languages either though, because it is gluing together a couple of library tasks. Only a few bits about the actual language are showing through.

In the given regex solutions, C++ has an advantage over C in that the regex structure can be freed automatically in a destructor and a raw string literal can be used here, but that's about all from the language itself. The original one is kinda long because he didn't use an HTTP get library, not because the language couldn't do one.

There are bits where the language can make those libraries nicer too: dom.d uses operator overloading and opDispatch to support things like .attribute and also .attr.X and .style.foo and element["selector"].addClass("foo") and so on, implemented in very, very little code - I didn't have to manually list methods for the collection or properties for the attributes - ...but a library *could* do it that way and get similar results for the end user; the given posts wouldn't show that.
Jan 10 2015
On Saturday, 10 January 2015 at 15:13:27 UTC, Adam D. Ruppe wrote:
> On Saturday, 10 January 2015 at 12:34:42 UTC, Tobias Pankrath wrote:
>> Since it is a comparison of languages it's okay to match the original behaviour.
>
> I don't think this is really a great comparison of languages either though, because it is gluing together a couple of library tasks. Only a few bits about the actual language are showing through.
>
> In the given regex solutions, C++ has an advantage over C in that the regex structure can be freed automatically in a destructor and a raw string literal can be used here, but that's about all from the language itself. The original one is kinda long because he didn't use an HTTP get library, not because the language couldn't do one.
>
> There are bits where the language can make those libraries nicer too: dom.d uses operator overloading and opDispatch to support things like .attribute and also .attr.X and .style.foo and element["selector"].addClass("foo") and so on, implemented in very, very little code - I didn't have to manually list methods for the collection or properties for the attributes - ...but a library *could* do it that way and get similar results for the end user; the given posts wouldn't show that.

I agree, and one of the answers says:

> I think the "no third-party" assumption is a fallacy. And is a specific fallacy that afflicts C++ developers, since it's so hard to make reusable code in C++. When you are developing anything at all, even if it's a small script, you will always make use of whatever pieces of reusable code are available to you.
>
> The thing is, in languages like Perl, Python, Ruby (to name a few), reusing someone else's code is not only easy, but it is how most people actually write code most of the time.

I think he's wrong, because it spoils the comparison. Every answer should delegate those tasks to a library that Stroustrup used as well, e.g. regex matching, string to number conversion and some kind of TCP sockets.

But it must do the same work that his solution does: create and parse the HTTP header and extract the HTML links, probably using regex, but I wouldn't mind another solution.

Everyone can put a libdo_the_stroustroup_thing on dub and then call do_the_stroustroup_thing() in main. To compare what the standard libraries (and libraries easily obtained or quasi standard) offer is another challenge.
Jan 10 2015
On Saturday, 10 January 2015 at 15:52:21 UTC, Tobias Pankrath wrote:
> But it must do the same work that his solution does: create and parse the HTTP header and extract the HTML links, probably using regex, but I wouldn't mind another solution.

Yeah, that would be best. BTW interesting line here:

---
s << "GET " << "http://" + server + "/" + file << " HTTP/1.0\r\n";
s << "Host: " << server << "\r\n";
---

Why + instead of <<? C++'s usage of << is totally blargh to me anyway, but seeing both is even stranger.

Weird language, weird library.

> Everyone can put a libdo_the_stroustroup_thing on dub and then call do_the_stroustroup_thing() in main. To compare what the standard libraries (and libraries easily obtained or quasi standard) offer is another challenge.

Yeah.
Jan 10 2015
On Saturday, 10 January 2015 at 15:52:21 UTC, Tobias Pankrath wrote:
> ...
>> The thing is, in languages like Perl, Python, Ruby (to name a few), reusing someone else's code is not only easy, but it is how most people actually write code most of the time.
>
> I think he's wrong, because it spoils the comparison. Every answer should delegate those tasks to a library that Stroustrup used as well, e.g. regex matching, string to number conversion and some kind of TCP sockets.
>
> But it must do the same work that his solution does: create and parse the HTTP header and extract the HTML links, probably using regex, but I wouldn't mind another solution.
>
> Everyone can put a libdo_the_stroustroup_thing on dub and then call do_the_stroustroup_thing() in main. To compare what the standard libraries (and libraries easily obtained or quasi standard) offer is another challenge.

I disagree.

The great thing about "comes with batteries" runtimes is that I have the guarantee that the desired features exist on all platforms supported by the language.

If the libraries are dumped into a repository, there is always the question of whether a library works across all OSes supported by the language, or even whether the libraries work together at all, especially if they depend on common packages with incompatible versions.

This is the cause of so many string and vector types across C++ libraries, as most of those libraries were developed before C++98 was even done.

Or why the C runtime is nothing more than a light version of UNIX as it was back in 1989, without any worthwhile feature added since then, besides some extra support for numeric types and slightly more secure library functions.

Nowadays, unless I am doing something very OS specific, I hardly care which OS I am using, thanks to such "comes with batteries" runtimes.

--
Paulo
Jan 10 2015
On Saturday, 10 January 2015 at 15:52:21 UTC, Tobias Pankrath wrote:
> I think he's wrong, because it spoils the comparison. Every answer should delegate those tasks to a library that Stroustrup used as well, e.g. regex matching, string to number conversion and some kind of TCP sockets.
>
> But it must do the same work that his solution does: create and parse the HTTP header and extract the HTML links, probably using regex, but I wouldn't mind another solution.

The challenge is completely pointless. Different languages have different ways of hacking together a compact incorrect solution. How to directly translate a C++ hack into another language is a task for people who are drunk.

For the challenge to make sense it would entail parsing all legal HTML5 documents, extracting all resource links, converting them into absolute form and printing them one per line. With no hiccups.
Jan 10 2015
On Saturday, 10 January 2015 at 17:23:31 UTC, Ola Fosheim Grøstad wrote:
> For the challenge to make sense it would entail parsing all legal HTML5 documents, extracting all resource links, converting them into absolute form and printing them one per line. With no hiccups.

Though, that's still a library thing rather than a language thing.

dom.d and the Url struct in cgi.d should be able to do all that, in just a few lines even, but that's just because I've done a *lot* of web scraping with the libs before so I made them work for that.

In fact.... let me do it. I'll use my http2.d instead of cgi.d, actually; it has a similar Url struct, just more focused on client requests.

---
import arsd.dom;
import arsd.http2;
import std.stdio;

void main() {
    auto base = Uri("http://www.stroustrup.com/C++.html");
    // http2 is a newish module of mine that aims to imitate
    // a browser in some ways (without depending on curl btw)
    auto client = new HttpClient();
    auto request = client.navigateTo(base);
    auto document = new Document();

    // and http2 provides an asynchronous api but you can
    // pretend it is sync by just calling waitForCompletion
    auto response = request.waitForCompletion();

    // parseGarbage uses a few tricks to fixup invalid/broken HTML
    // tag soup and auto-detect character encodings, including when
    // it lies about being UTF-8 but is actually Windows-1252
    document.parseGarbage(response.contentText);

    // Uri.basedOn returns a new absolute URI based on something else
    foreach(a; document.querySelectorAll("a[href]"))
        writeln(Uri(a.href).basedOn(base));
}
---

Snippet of the printouts:

[...]
http://www.computerhistory.org
http://www.softwarepreservation.org/projects/c_plus_plus/
http://www.morganstanley.com/
http://www.cs.columbia.edu/
http://www.cse.tamu.edu
http://www.stroustrup.com/index.html
http://www.stroustrup.com/C++.html
http://www.stroustrup.com/bs_faq.html
http://www.stroustrup.com/bs_faq2.html
http://www.stroustrup.com/C++11FAQ.html
http://www.stroustrup.com/papers.html
[...]

The latter are relative links that it resolved against the base, and the first few are absolute. Seems to have worked.

There are other kinds of links than just a[href], but fetching them is as simple as adding them to the selector or looping over them separately:

---
foreach(a; document.querySelectorAll("script[src]"))
    writeln(Uri(a.src).basedOn(base));
---

There are none on that page, and no <link>s either, but it is easy enough to do with the lib.

Looking at the source of that page, I find some invalid HTML and lies about the character set. How did Document.parseGarbage do? Pretty well: outputting the parsed DOM tree shows it auto-corrected the problems I see by eye.
Jan 10 2015
On Saturday, 10 January 2015 at 17:39:17 UTC, Adam D. Ruppe wrote:
> Though, that's still a library thing rather than a language thing.

It is a language-library-platform thing; things like how composable the ecosystem is would be interesting to compare. But it would be unfair to require a minimalistic language to not use third party libraries.

One should probably require that the library used is generic (not a spider-framework), not using FFI, mature and maintained?

> document.parseGarbage(response.contentText);
>
> // Uri.basedOn returns a new absolute URI based on something else
> foreach(a; document.querySelectorAll("a[href]"))
>     writeln(Uri(a.href).basedOn(base));
> }

Nice and clean code; does it expand HTML entities ("&amp;")?

The HTML5 standard has improved on HTML4 by now being explicit on how incorrect documents shall be interpreted in section 8.2. That ought to be sufficient, since that is what web browsers are supposed to do.

http://www.w3.org/TR/html5/syntax.html#html-parser
Jan 10 2015
On Saturday, 10 January 2015 at 19:17:22 UTC, Ola Fosheim Grøstad wrote:
> Nice and clean code; does it expand HTML entities ("&amp;")?

Of course. It does it both ways:

---
<span>a &amp;</span>

span.innerText == "a &"

span.innerText = "a \" b";
assert(span.innerHTML == "a &quot; b");
---

parseGarbage also tries to fix broken entities, so like & standing alone it will translate to &amp; for you. There's also parseStrict, which just throws an exception in cases like that.

That's one thing a lot of XML parsers don't do in the name of speed, but I do since it is pretty rare that I don't want them translated. One thing I did for a speedup though was scan the string for & and if it doesn't find one, return a slice of the original, and if it does, return a new string with the entity translated. Gave a surprisingly big speed boost without costing anything in convenience.

> The HTML5 standard has improved on HTML4 by now being explicit on how incorrect documents shall be interpreted in section 8.2. That ought to be sufficient, since that is what web browsers are supposed to do.
>
> http://www.w3.org/TR/html5/syntax.html#html-parser

Huh, I never read that; my thing just did what looked right to me over hundreds of test pages that were broken in various strange and bizarre ways.
Jan 10 2015
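The fast path described in the post above (scan for `&`, hand back the original slice when none is found) might look roughly like the following. This is a minimal sketch and not the actual code in dom.d: the function name `decodeEntities` and the four-entity subset are assumptions made purely for illustration.

```d
import std.array : replace;
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical helper: decodes a small subset of named HTML entities.
string decodeEntities(string s)
{
    // Fast path: no '&' means there is nothing to decode, so return
    // the input slice itself without allocating a new string.
    if (s.indexOf('&') == -1)
        return s;

    // Slow path: build a new string with the entities translated.
    // "&amp;" is handled last so that "&amp;lt;" decodes to "&lt;"
    // rather than being decoded twice into "<".
    return s.replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&amp;", "&");
}

void main()
{
    string plain = "index.html";
    // Same pointer and length as the input: no copy was made.
    writeln(decodeEntities(plain) is plain);
    writeln(decodeEntities("a &amp; b"));
}
```

The design point is that most attribute values and text nodes contain no entities at all, so the common case costs one linear scan and zero allocations.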
On Friday, 9 January 2015 at 17:15:43 UTC, Adam D. Ruppe wrote:
> import arsd.dom;
> import std.net.curl;
> import std.stdio, std.algorithm;
>
> void main() {
>     auto document = new Document(cast(string) get("http://www.stroustrup.com/C++.html"));
>     writeln(document.querySelectorAll("a[href]").map!(a=>a.href));
> }
>
> Or perhaps better yet:
>
> import arsd.dom;
> import std.net.curl;
> import std.stdio;
>
> void main() {
>     auto document = new Document(cast(string) get("http://www.stroustrup.com/C++.html"));
>     foreach(a; document.querySelectorAll("a[href]"))
>         writeln(a.href);
> }
>
> Which puts each one on a separate line.

Both these code examples trigger the same assert()

dmd: expression.c:3761: size_t StringExp::length(int): Assertion `encSize == 1 || encSize == 2 || encSize == 4' failed.

on dmd git master. Ideas anyone?
Jan 10 2015
On Saturday, 10 January 2015 at 13:22:57 UTC, Nordlöw wrote:
> on dmd git master. Ideas anyone?

Don't use git master :P

Definitely another regression. That line was just pushed to git like two weeks ago and the failing assertion is pretty obviously a pure dmd code bug, it doesn't know the length of char apparently.
Jan 10 2015
Adam D. Ruppe:
> Don't use git master :P

Is the issue in Bugzilla?

Bye,
bearophile
Jan 10 2015
On Saturday, 10 January 2015 at 15:24:45 UTC, bearophile wrote:
> Is the issue in Bugzilla?

I don't know, bugzilla is extremely difficult to search.

I guess I'll post it again and worst case it will be closed as a duplicate.
Jan 10 2015
On Saturday, 10 January 2015 at 15:24:45 UTC, bearophile wrote:
> Is the issue in Bugzilla?

https://issues.dlang.org/show_bug.cgi?id=13966
Jan 10 2015
On Saturday, 10 January 2015 at 14:56:09 UTC, Adam D. Ruppe wrote:
> On Saturday, 10 January 2015 at 13:22:57 UTC, Nordlöw wrote:
>> on dmd git master. Ideas anyone?
>
> Don't use git master :P

Do use git master. The more people do, the fewer regressions will slip into the final release. You can use Dustmite to reduce the code to a simple example, and Digger to find the exact pull request which introduced the regression. (Yes, shameless plug, preaching to the choir, etc.)
Jan 10 2015
On Friday, 9 January 2015 at 13:50:29 UTC, eles wrote:
> https://codegolf.stackexchange.com/questions/44278/debunking-stroustrups-debunking-of-the-myth-c-is-for-large-complicated-pro

Link to answer in D:
http://codegolf.stackexchange.com/a/44417/13362
Jan 09 2015
On 1/9/15 6:10 PM, Jesse Phillips wrote:
> On Friday, 9 January 2015 at 13:50:29 UTC, eles wrote:
>> https://codegolf.stackexchange.com/questions/44278/debunking-stroustrups-debunking-of-the-myth-c-is-for-large-complicated-pro
>
> Link to answer in D:
> http://codegolf.stackexchange.com/a/44417/13362

Nailed it. -- Andrei
Jan 09 2015
On Saturday, 10 January 2015 at 02:10:04 UTC, Jesse Phillips wrote:
> On Friday, 9 January 2015 at 13:50:29 UTC, eles wrote:
>> https://codegolf.stackexchange.com/questions/44278/debunking-stroustrups-debunking-of-the-myth-c-is-for-large-complicated-pro
>
> Link to answer in D:
> http://codegolf.stackexchange.com/a/44417/13362

I think byLine is not necessary. By default . will not match line breaks.

One statement solution:

---
import std.net.curl, std.stdio;
import std.algorithm, std.regex;

void main() {
    get("http://www.stroustrup.com/C++.html")
        .matchAll(`<a.*?href="(.*)"`)
        .map!(m => m[1])
        .each!writeln();
}
---
Jan 09 2015
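A note on the capture group in the pattern above: `(.*)` is greedy, so on a line where another double quote follows the href value it captures up to the last quote rather than the first. The sketch below illustrates the difference with a lazy `(.*?)`. The sample line and the helper name `firstHref` are made up for illustration; whether this matters for the page in question depends on its markup.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: returns the first capture group of `pattern`
/// in `line`, or null when the pattern does not match.
string firstHref(string line, string pattern)
{
    auto m = matchFirst(line, regex(pattern));
    return m.empty ? null : m[1];
}

void main()
{
    // Made-up input with a second quoted attribute after href.
    auto line = `<a href="index.html" class="nav">home</a>`;

    // Greedy group: runs to the last double quote on the line.
    writeln(firstHref(line, `<a.*?href="(.*)"`));

    // Lazy group: stops at the first closing quote.
    writeln(firstHref(line, `<a.*?href="(.*?)"`));
}
```

A character class such as `([^"]*)` would be another way to stop at the closing quote.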
Vladimir Panteleev via Digitalmars-d-learn wrote on Sat, 10 Jan 2015 at 07:42 +0000:
> On Saturday, 10 January 2015 at 02:10:04 UTC, Jesse Phillips wrote:
>> On Friday, 9 January 2015 at 13:50:29 UTC, eles wrote:
>>> https://codegolf.stackexchange.com/questions/44278/debunking-stroustrups-debunking-of-the-myth-c-is-for-large-complicated-pro
>>
>> Link to answer in D:
>> http://codegolf.stackexchange.com/a/44417/13362
>
> I think byLine is not necessary. By default . will not match line breaks.
>
> One statement solution:
>
> import std.net.curl, std.stdio;
> import std.algorithm, std.regex;
>
> void main() {
>     get("http://www.stroustrup.com/C++.html")
>         .matchAll(`<a.*?href="(.*)"`)
>         .map!(m => m[1])
>         .each!writeln();
> }

Oh, here it is. I was looking for each. I thought it was already in Phobos, but I could not find it. Now I know why :D
Jan 10 2015
On Friday, 9 January 2015 at 13:50:29 UTC, eles wrote:
> https://codegolf.stackexchange.com/questions/44278/debunking-stroustrups-debunking-of-the-myth-c-is-for-large-complicated-pro

From the link: "Let's show Stroustrup what small and readable program actually is."

Alright, there are a lot of examples in many languages, but shouldn't those examples handle exceptions like the original code does?

Matheus.
Jan 10 2015