This is the module I'm speaking about:
https://arsd-official.dpldocs.info/arsd.dom.html
So I have this HTML that not even parseGarbage() can deal with:
<a href = "https://hostname.com/?file=foo.png&foo=baa">G!</a>
There are spaces between "href" and "=", and between "=" and "https...", which
make the code below fail:
import arsd.dom;

string html = get(page, client).text;
auto document = new Document();
document.parseGarbage(html);
Element attEle = document.querySelector("span[id=link2]");
Element aEle = attEle.querySelector("a");
string link = aEle.href; // <-- if the href has spaces around "=", this returns "href" rather than the link
Let's say the page HTML looks like this:
<font color="yellow">
<h2>
Hello, dear world!
<span id="link2">
<a href = "https://hostname.com/?file=foo.png&foo=baa">G!</a>
</span>
</h2>
</font>
I know the library author posts on this forum often, and I hope he
sees this and can somehow help
make it work. But if anyone else knows how to fix this, that would be
very welcome too!
On Sunday, 24 June 2018 at 03:46:09 UTC, Dr.No wrote:
> string html = get(page, client).text;
> auto document = new Document();
> document.parseGarbage(html);
> Element attEle = document.querySelector("span[id=link2]");
> Element aEle = attEle.querySelector("a");
> string link = aEle.href; // <-- if the href has spaces around "=", this returns "href" rather than the link
> [...]
> <font color="yellow">
> <h2>
> Hello, dear world!
> <span id="link2">
> <a href = "https://hostname.com/?file=foo.png&foo=baa">G!</a>
> </span>
> </h2>
> </font>
Missing </body>.
Seems to be buggy; the part of the parsed document corresponding to
the "a" element looks like this:
<a "https:=""https:" href="href" />G!
On Sunday, 24 June 2018 at 10:49:51 UTC, Timoses wrote:
> <a href = "https://hostname.com/?file=foo.png&foo=baa">G!</a>
> </span>
> </h2>
> </font>
> Missing </body>.
> Seems to be buggy; the part of the parsed document corresponding to
> the "a" element looks like this:
> <a "https:=""https:" href="href" />G!
It read href as a no-content attribute (like `checked`, which
becomes `checked="checked"` in XHTML style), then ignored the =
as misplaced trash, then did the same with the https.
So the fix is to collapse the whitespace around the =.
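Until the parser itself handles this, one way to apply that idea on your side is to normalize the markup before calling `parseGarbage`. The D sketch below uses a hypothetical helper, `collapseAttrEquals`, which is not part of arsd.dom. It walks the string, tracks whether it is inside a tag and inside a quoted value, and drops whitespace next to `=` only in that context, so text content and attribute values are left alone. It assumes ordinary tags; HTML comments, `<script>` bodies, or a stray `<` in text could be altered.

```d
import std.array : appender;
import std.ascii : isWhite;
import std.stdio : writeln;

// Hypothetical helper, NOT part of arsd.dom: removes whitespace around '='
// inside tags, leaving text content and quoted attribute values untouched.
string collapseAttrEquals(string html)
{
    auto result = appender!string();
    bool inTag = false;  // true between '<' and '>'
    char quote = '\0';   // active quote character inside a tag, or '\0'

    for (size_t i = 0; i < html.length; ++i)
    {
        immutable char c = html[i];

        if (!inTag)
        {
            if (c == '<')
                inTag = true;
            result.put(c);
        }
        else if (quote != '\0')
        {
            // Inside a quoted value: copy verbatim until the closing quote.
            if (c == quote)
                quote = '\0';
            result.put(c);
        }
        else if (c == '"' || c == '\'')
        {
            quote = c;
            result.put(c);
        }
        else if (c == '>')
        {
            inTag = false;
            result.put(c);
        }
        else if (isWhite(c))
        {
            // Look past this run of whitespace; drop it if an '=' follows.
            size_t j = i;
            while (j < html.length && isWhite(html[j]))
                ++j;
            if (j < html.length && html[j] == '=')
                i = j - 1; // the loop's ++i lands on the '='
            else
                result.put(c);
        }
        else if (c == '=')
        {
            result.put(c);
            // Drop any whitespace directly after the '='.
            while (i + 1 < html.length && isWhite(html[i + 1]))
                ++i;
        }
        else
        {
            result.put(c);
        }
    }
    return result.data;
}

void main()
{
    string html = `<span id="link2"><a href = "https://hostname.com/?file=foo.png&foo=baa">G!</a></span>`;
    // prints <span id="link2"><a href="https://hostname.com/?file=foo.png&foo=baa">G!</a></span>
    writeln(collapseAttrEquals(html));
}
```

With such a helper, the call in the original snippet would become `document.parseGarbage(collapseAttrEquals(html));`.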
On Sunday, 24 June 2018 at 03:46:09 UTC, Dr.No wrote:
> I know the library author posts on this forum often, and I hope he
> sees this and can somehow help
Yeah, I'm out this week, but it shouldn't be too hard to add: the
garbage attribute parser can special-case an = surrounded by spaces
and just skip the spaces.
I won't get to it today, but I might be able to tomorrow. Shoot
me a reminder email if I haven't by tomorrow night. The parser code
is unbelievably bad, but the code to change is somewhere around
line 450 if you want to take a stab at it yourself.
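For illustration only, here is a standalone D sketch of that special case. It is not the actual arsd.dom parser code; `parseAttributes` is a hypothetical function that takes the text following a tag name. After reading an attribute name it skips whitespace, and if an `=` follows, it skips whitespace again before reading the value. A name with no `=` is stored with its own name as the value, mirroring the `checked="checked"` behavior described earlier in the thread.

```d
import std.ascii : isWhite;
import std.stdio : writeln;

// Hypothetical sketch, NOT arsd.dom's parser: parses the attribute part of a
// tag (the text after the tag name) into a name -> value map.
string[string] parseAttributes(string s)
{
    string[string] attrs;
    size_t i = 0;

    void skipWhite()
    {
        while (i < s.length && isWhite(s[i]))
            ++i;
    }

    while (true)
    {
        skipWhite();
        if (i >= s.length || s[i] == '>' || s[i] == '/')
            break;

        // Attribute name: everything up to whitespace, '=', '>' or '/'.
        immutable nameStart = i;
        while (i < s.length && !isWhite(s[i]) && s[i] != '='
               && s[i] != '>' && s[i] != '/')
            ++i;
        string name = s[nameStart .. i];

        // The special case: allow whitespace before the '='...
        skipWhite();
        if (i < s.length && s[i] == '=')
        {
            ++i;
            skipWhite(); // ...and after it.
            string value;
            if (i < s.length && (s[i] == '"' || s[i] == '\''))
            {
                immutable q = s[i++];
                immutable valueStart = i;
                while (i < s.length && s[i] != q)
                    ++i;
                value = s[valueStart .. i];
                if (i < s.length)
                    ++i; // skip the closing quote
            }
            else
            {
                immutable valueStart = i;
                while (i < s.length && !isWhite(s[i]) && s[i] != '>')
                    ++i;
                value = s[valueStart .. i];
            }
            attrs[name] = value;
        }
        else
        {
            // No '=': a no-content attribute such as `checked`.
            attrs[name] = name;
        }
    }
    return attrs;
}

void main()
{
    auto attrs = parseAttributes(` href = "https://hostname.com/?file=foo.png&foo=baa" checked>`);
    writeln(attrs["href"]);    // https://hostname.com/?file=foo.png&foo=baa
    writeln(attrs["checked"]); // checked
}
```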
On Sunday, 24 June 2018 at 03:46:09 UTC, Dr.No wrote:
> make it work. But if anyone else knows how to fix this, that would
> be very welcome too!
Try it now.
Thanks to Sandman83 on GitHub.