digitalmars.D - Should std.conv:parse parse html entities?

reply berni44 <dlang d-ecke.de> writes:
Concerning issue 9621 [1]: There are two things that parse 
currently doesn't handle, namely octal numbers and HTML entities. 
While there is no argument against the former (I actually wrote a 
PR to add them), there has been some discussion about the latter, 
because the whole table of those entities (about 3000) would end up 
in the compiled code even when it is not needed at all.

As I don't think I should decide this on my own, I'd like 
to know your opinion: Which is better, adding the entities or 
stating in the docs that they are not supported? What do you think?

[1] https://issues.dlang.org/show_bug.cgi?id=9621
Nov 13
next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, November 13, 2019 5:17:17 AM MST berni44 via Digitalmars-d 
wrote:
 Concerning issue 9621 [1]: There are two things that parse
 currently doesn't handle, namely octal numbers and HTML entities.
 While there is no argument against the former (I actually wrote a
 PR to add them), there has been some discussion about the latter,
 because the whole table of those entities (about 3000) would end up
 in the compiled code even when it is not needed at all.

 As I don't think I should decide this on my own, I'd like
 to know your opinion: Which is better, adding the entities or
 stating in the docs that they are not supported? What do you think?

 [1] https://issues.dlang.org/show_bug.cgi?id=9621
I fail to see why std.conv.to or std.conv.parse should handle either octal literals or HTML entities, and I don't know why anyone would expect them to. HTML entities are the kind of thing that I would expect an HTML parser to handle, not the standard library. The compiler does handle some of them (which, honestly, I think is kind of weird), and that is the only argument I can see for supporting them in std.conv, but it's not like std.conv is designed to parse D code.

Also, IIRC, octal literals were removed from the language, so that's not an argument for adding them to std.conv. They're also not all that commonly needed by anything, AFAIK. parse can already parse integer values of arbitrary bases if you give it an explicit base / radix.

- Jonathan M Davis
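
For readers who have not used the radix overload mentioned above, here is a minimal sketch of it. The input string is made up for illustration; the calls are the radix-taking overloads of std.conv.parse and std.conv.to.

```d
import std.conv : parse, to;

void main()
{
    // parse takes its input by ref and consumes the digits it understands.
    string input = "755 rest";
    int mode = parse!int(input, 8);   // interpret "755" as base 8
    assert(mode == 493);              // 7*64 + 5*8 + 5
    assert(input == " rest");         // the unparsed tail is left behind

    // to converts the whole string and also accepts a radix.
    assert(to!int("ff", 16) == 255);
}
```

So octal input can already be handled when the caller knows the base up front; what issue 9621 asks about is recognizing it without that hint.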
Nov 13
prev sibling next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
Actually, it looks like you can still have octal escape sequences in string literals even though support for octal integer literals was removed. Either way, the compiler translates a string literal containing an octal escape or HTML entity into what it represents rather than leaving it as something to parse. So unless someone is constructing strings that contain these at run time rather than using string literals, there won't even be anything to parse.

Personally, I don't see much reason to support either. What's the use case?

- Jonathan M Davis
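
A small sketch of that distinction, assuming nothing beyond the language itself: escapes in ordinary string literals are resolved by the compiler, whereas a WYSIWYG literal (standing in here for text that arrives at run time) keeps the backslash sequence as plain characters.

```d
void main()
{
    // Escapes in ordinary string literals are resolved by the compiler.
    static assert("\&amp;" == "&");        // named character entity
    static assert("\&euro;" == "\u20AC");  // entity for the euro sign
    static assert("\101" == "A");          // octal escape: octal 101 is 65

    // A WYSIWYG (raw) literal stands in for text read at run time:
    // the backslash and the entity name survive as six characters.
    string fromInput = r"\&amp;";
    assert(fromInput.length == 6);
    assert(fromInput != "&");
}
```

Only the second kind of string would ever reach a run-time parser, which is the case the question about use cases refers to.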
Nov 13
prev sibling next sibling parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Wednesday, 13 November 2019 at 12:17:17 UTC, berni44 wrote:
 Concerning issue 9621 [1]: There are two things that parse 
 currently doesn't handle, namely octal numbers and HTML 
 entities. While there is no argument against the former (I 
 actually wrote a PR to add them), there has been some 
 discussion about the latter, because the whole table of those 
 entities (about 3000) would end up in the compiled code even 
 when it is not needed at all.

 As I don't think I should decide this on my own, I'd like 
 to know your opinion: Which is better, adding the entities or 
 stating in the docs that they are not supported? What do you 
 think?

 [1] https://issues.dlang.org/show_bug.cgi?id=9621
Maybe you could put the table inside a template so it only gets compiled/included when it's used?

template HtmlEntityTable()
{
    const HtmlEntityTable = ...;
}
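
A fuller sketch of this idea, under stated assumptions: the Entity struct, the four-entry table, and lookupEntity are hypothetical stand-ins, not Phobos code. Both the table and the lookup function are templates, so nothing is instantiated unless some caller actually uses the lookup.

```d
// Hypothetical record type for one named entity.
struct Entity
{
    string name;
    dchar value;
}

// Eponymous template: the table only exists once something instantiates it.
template HtmlEntityTable()
{
    // Tiny illustrative subset; the real list has thousands of entries.
    static immutable Entity[] HtmlEntityTable = [
        Entity("amp", '&'),
        Entity("gt", '>'),
        Entity("lt", '<'),
        Entity("quot", '"'),
    ];
}

// Also a template (empty parameter list), so it is compiled only when called.
// Returns dchar.init when the name is unknown.
dchar lookupEntity()(const(char)[] name)
{
    foreach (e; HtmlEntityTable!())
        if (e.name == name)
            return e.value;
    return dchar.init;
}

void main()
{
    assert(lookupEntity("amp") == '&');
    assert(lookupEntity("quot") == '"');
    assert(lookupEntity("nbsp") == dchar.init); // not in the subset
}
```

This helps only if the lookup is not reachable from a function that nearly every program instantiates anyway.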
Nov 13
parent reply berni44 <dlang d-ecke.de> writes:
On Wednesday, 13 November 2019 at 18:31:09 UTC, Jonathan Marler 
wrote:
 Maybe you could put the table inside a template so it only gets 
 compiled/included when it's used?

 template HtmlEntityTable()
 {
     const HtmlEntityTable = ...;
 }
As far as I understood the discussion in the bug report, the problem with that is that you usually won't know in advance whether the table will be needed, yet most parsed strings (which I assume are not available at compile time) presumably do not contain entities.
Nov 13
parent Jonathan Marler <johnnymarler gmail.com> writes:
True, if it's reachable through a high-level generic function, then it would be used most of the time. Sorry, I'm not familiar with which functions would be calling it. I've never really needed a function that escaped valid D strings, so I'm not sure which specific function would be using this.
Nov 13
prev sibling parent reply Suleyman <sahmi.soulaimane gmail.com> writes:
On Wednesday, 13 November 2019 at 12:17:17 UTC, berni44 wrote:
 Concerning issue 9621 [1]: There are two things that parse 
 currently doesn't handle, namely octal numbers and HTML 
 entities. While there is no argument against the former (I 
 actually wrote a PR to add them), there has been some 
 discussion about the latter, because the whole table of those 
 entities (about 3000) would end up in the compiled code even 
 [...]
What is the concern about the table? Is it binary size, runtime performance, or something else?
Nov 14
parent reply berni44 <dlang d-ecke.de> writes:
On Thursday, 14 November 2019 at 18:14:07 UTC, Suleyman wrote:
 What is the concern about the table? Is it binary size, runtime 
 performance, or something else?
I think binary size. Runtime performance shouldn't be a problem, because it should be possible to implement a perfect hash table for that (or another fast lookup method).
Nov 14
parent Jonathan Marler <johnnymarler gmail.com> writes:
A compile-time perfect hash generator sounds like a really nice feature. Someone should get on that.
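
As a rough illustration of what such a generator could look like, here is a toy sketch that uses CTFE to brute-force a seed for which a simple multiplicative hash is collision-free over a fixed key set. Everything in it is an assumption made for the example: the four-entry key list, the slot count of 7, the hash function, and all the names. A real generator for about 3000 entities would need a much more careful construction.

```d
// Hypothetical four-entry subset standing in for the full entity list.
static immutable string[] entityNames = ["amp", "gt", "lt", "quot"];
enum size_t slotCount = 7;

// Simple seeded multiplicative hash (illustrative, not tuned).
uint seededHash(const(char)[] s, uint seed)
{
    uint h = seed;
    foreach (char c; s)
        h = h * 31 + c;
    return h;
}

// Try seeds until every key lands in its own slot.
uint findSeed(const string[] keys, size_t n)
{
    foreach (uint candidate; 1 .. 10_000)
    {
        auto used = new bool[n];
        bool collision = false;
        foreach (k; keys)
        {
            const i = seededHash(k, candidate) % n;
            if (used[i]) { collision = true; break; }
            used[i] = true;
        }
        if (!collision)
            return candidate;
    }
    assert(0, "no collision-free seed found");
}

// Initializing an enum forces the search to run at compile time.
enum uint entitySeed = findSeed(entityNames, slotCount);

string[slotCount] buildSlots()
{
    string[slotCount] slots;
    foreach (k; entityNames)
        slots[seededHash(k, entitySeed) % slotCount] = k;
    return slots;
}

// The slot table is also computed by CTFE and stored as static data.
static immutable string[slotCount] entitySlots = buildSlots();

// One hash and one string comparison per lookup.
bool isKnownEntity(const(char)[] name)
{
    return name.length != 0
        && entitySlots[seededHash(name, entitySeed) % slotCount] == name;
}

void main()
{
    foreach (k; entityNames)
        assert(isKnownEntity(k));
    assert(!isKnownEntity("nbsp"));
}
```

The seed search happens entirely during compilation, so the run-time cost is the single probe in isKnownEntity. The binary-size concern is unaffected, since the table itself still has to be stored.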
Nov 14