digitalmars.D - Notepad++
- Stewart Gordon (18/18) Aug 12 2009 What's the best anybody's managed to get Notepad++ to syntax-highlight
- Sergey Gromov (11/24) Aug 12 2009 Scintilla uses plugins to highlight source. These plugins are written
- Stewart Gordon (17/29) Aug 12 2009 "1. If you have SciTE 1.76 for Windows installed simply replace
- Sergey Gromov (29/61) Aug 12 2009 There are two problems at least:
- Andrei Alexandrescu (5/9) Aug 12 2009 If they use binary interfacing with virtual functions a la COM's binary
- Sergey Gromov (7/16) Aug 12 2009 They don't, unfortunately. Every lexer defines a static instance of a
- Kagamin (2/4) Aug 13 2009 Uh... that's not an option.
- Stewart Gordon (29/55) Aug 13 2009 I can't see how it can be at all complicated to find the beginning and
- Sergey Gromov (28/66) Aug 14 2009 Well, you can write a regexp to handle a simple C string. That is, if
- Stewart Gordon (51/82) Aug 14 2009 So there is a problem if the highlighter works by matching regexps on a
- Sergey Gromov (25/68) Aug 15 2009 Highlighting the whole file every time a character is typed is slow.
- bearophile (4/7) Aug 15 2009 Today the difference isn't much important because CPUs are fast. But on ...
- Stewart Gordon (78/105) Aug 17 2009 Of course. I suppose now that the right strategy is line-by-line with
- Sergey Gromov (12/74) Aug 17 2009 Exactly. There is a 32-bit "style" known for every character, plus
- Stewart Gordon (16/50) Aug 18 2009 Does it keep around in memory the style of every character, or only the
- Sergey Gromov (11/38) Aug 20 2009 It can tell about any character of which style it is. This is to
- Stewart Gordon (16/28) Aug 21 2009 Doesn't quite relate to what I was querying ... but anyway, it's
- Don (3/65) Aug 17 2009 Remember that the whole point of q{} strings was that they should NOT be...
- Sergey Gromov (4/11) Aug 17 2009 You confuse q{} and q"{}" here. The former is a token string which may
- Stewart Gordon (14/18) Aug 14 2009
- Kagamin (2/7) Aug 13 2009 Wrong lexer is used here. Scintilla builtin d lexer supported nested com...
- Jussi Jumppanen (12/14) Aug 12 2009 FWIW Zeus is very similar to TextPad in feature set and the latest
- Kagamin (3/5) Aug 13 2009 I don't see how the lexer is being chosen.
- Kagamin (3/14) Aug 14 2009 At least PN chooses lexer. That's what I meant.
What's the best anybody's managed to get Notepad++ to syntax-highlight D? (I'm on version 5.4.5, if that makes a difference.)

My userDefineLang.xml file is as given here
http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport/NotepadPlus
(note that I've fixed a few errors; I've no idea how they got there).

Notepad++ does a good job of syntax-highlighting PHP files, whose syntactic structure is more complex than that of D. So clearly, Notepad++ is a powerful syntax-highlighter (or Scintilla is, whatever). However, at the moment I can't even seem to get it up to C standard! (Can anybody find a full reference of the userDefineLang.xml format, for that matter?)

Maybe it's just a case in point of some comments here:
http://d.puremagic.com/issues/show_bug.cgi?id=3193

Anyway, attached is the result. Can anybody do better (other than by telling it to treat D as C or some other language instead)?

Or maybe I should just go back to TextPad (which isn't perfect either) and put up with its not supporting Unicode....

Stewart.
Aug 12 2009
Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:
> What's the best anybody's managed to get Notepad++ to syntax-highlight D? (I'm on version 5.4.5, if that makes a difference.)
>
> My userDefineLang.xml file is as given here
> http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport/NotepadPlus
> (note that I've fixed a few errors; I've no idea how they got there).
>
> Notepad++ does a good job of syntax-highlighting PHP files, whose syntactic structure is more complex than that of D. So clearly, Notepad++ is a powerful syntax-highlighter (or Scintilla is, whatever). However, at the moment I can't even seem to get it up to C standard! (Can anybody find a full reference of the userDefineLang.xml format, for that matter?)

Scintilla uses plugins to highlight source. These plugins are written in C++ and have almost full access to the buffer, so the highlighter code may be arbitrarily complex. I actually wrote such a plugin to highlight D a while back:
http://dsource.org/projects/scrapple/browser/trunk/scilexer

It seems like the Notepad++ developers added their own highlighter plugin, which takes userDefineLang.xml as its configuration. Such a configurable plugin is presumably much less flexible than a pure C++ implementation for a particular language. It's very likely that the PHP highlighter is written in C++ and comes bundled with Scintilla.
Aug 12 2009
Sergey Gromov wrote:
> Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:
<snip>
> Scintilla uses plugins to highlight source. These plugins are written in C++ and have almost full access to the buffer so the highlighter code may be arbitrarily complex. I actually wrote such a plugin to highlight D a while back:
> http://dsource.org/projects/scrapple/browser/trunk/scilexer

"1. If you have SciTE 1.76 for Windows installed simply replace SciLexer.dll and d.properties with the supplied files.
2. If you wish to build Scintilla from source:"

Can it be used in Scintilla-based editors besides SciTE short of acquiring the whole Scintilla source and rebuilding it?

For the record, there's a SciLexer.dll in my Notepad++ dir, but no d.properties to be found. The SciLexer.dll reports itself as file version 1.7.8.0, product version 1.78. So maybe the question is of what effect replacing it with a fork of version 1.76 would have. (Do SciTE versions correspond directly to Scintilla versions?)

> It seems like Notepad++ developers added their own highlighter plugin which takes userDefineLang.xml as its configuration. Such a configurable plugin is presumably much less flexible than pure C++ implementation for a particular language. It's very likely that PHP highlighter is written in C++ and comes bundled with Scintilla.

It puzzles me that they didn't make this plugin powerful enough to highlight the language it (and indeed the whole of Notepad++) is written in. Even more so considering the sheer number of C-like languages out there, which people are likely to want to use N++ to write.

Stewart.
Aug 12 2009
Thu, 13 Aug 2009 01:40:47 +0100, Stewart Gordon wrote:
> Sergey Gromov wrote:
>> Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:
> <snip>
>> Scintilla uses plugins to highlight source. These plugins are written in C++ and have almost full access to the buffer so the highlighter code may be arbitrarily complex. I actually wrote such a plugin to highlight D a while back:
>> http://dsource.org/projects/scrapple/browser/trunk/scilexer
> "1. If you have SciTE 1.76 for Windows installed simply replace SciLexer.dll and d.properties with the supplied files.
> 2. If you wish to build Scintilla from source:"
>
> Can it be used in Scintilla-based editors besides SciTE short of acquiring the whole Scintilla source and rebuilding it?

There are at least two problems:

1. SciLexer.dll contains *all* of the built-in lexer modules. Replacing your DLL with another version will remove any extra lexers that a 3rd party put there, like the XML-configurable lexer in the case of Notepad++.
2. Lexers are written in C++ and interface with the rest of Scintilla via C++ classes. Therefore if a field is added or removed anywhere, or if you use a different compiler to build your DLL than the one used to build Scintilla, you'll get a GPF, or worse.

The good news is that Notepad++ is on SourceForge, so the "from source" way is at least possible.

> For the record, there's a SciLexer.dll in my Notepad++ dir, but no d.properties to be found. The SciLexer.dll reports itself as file version 1.7.8.0, product version 1.78. So maybe the question is of what effect replacing it with a fork of version 1.76 would have. (Do SciTE versions correspond directly to Scintilla versions?)

Yes, SciTE versions seem to be in sync with Scintilla versions.

>> It seems like Notepad++ developers added their own highlighter plugin which takes userDefineLang.xml as its configuration. Such a configurable plugin is presumably much less flexible than pure C++ implementation for a particular language. It's very likely that PHP highlighter is written in C++ and comes bundled with Scintilla.
> It puzzles me that they didn't make this plugin powerful enough to highlight the language it (and indeed the whole of Notepad++) is written in. Even more so considering the sheer number of C-like languages out there, which people are likely to want to use N++ to write.

Well, I think it's hard to create a regular expression engine flexible enough to allow arbitrary highlighting. I think the best such engine I've seen was Colorer by Igor Russkih, and even there I wasn't able to express D's WYSIWYG or delimited strings. You need a real programming language for that.

---

I've just had a look at the Notepad++ sources. The Scintilla they use contains Scintilla's built-in D lexer. I think it's just not configured. SciTE uses *.properties files to configure stuff. Notepad++ uses XML files for the same purpose. I think it's all in langs.model.xml.

My current idea is to take d.properties from the corresponding release of SciTE and try to translate it into the langs.model.xml format. I'll probably try it later when I have time.

Of course it would be nice to replace the original D lexer with mine. Or, even better, to ask the Scintilla developers to include my lexer in the official bundle. May be worth a try.
Aug 12 2009
Sergey Gromov wrote:
> 2. Lexers are written in C++ and interface with the rest of Scintilla via C++ classes. Therefore if a field is added or removed anywhere, or if you use a different compiler to build your DLL than that used to build Scintilla, you'll get GPF, or worse.

If they use binary interfacing with virtual functions a la COM's binary standard, then field presence shouldn't matter. Also, most compilers on Windows respect the basic ABI. No?

Andrei
Aug 12 2009
Wed, 12 Aug 2009 21:35:02 -0500, Andrei Alexandrescu wrote:
> Sergey Gromov wrote:
>> 2. Lexers are written in C++ and interface with the rest of Scintilla via C++ classes. Therefore if a field is added or removed anywhere, or if you use a different compiler to build your DLL than that used to build Scintilla, you'll get GPF, or worse.
> If they use binary interfacing with virtual functions a la COM's binary standard, then field presence shouldn't matter.

They don't, unfortunately. Every lexer defines a static instance of a LexerModule class. The coloring function receives a reference to an Accessor class. They're full-blown classes, with fields and stuff.

> Also, most compilers on Windows respect the basic ABI. No?

Even though they don't use inheritance, and therefore most compilers will likely build identical data layouts for them, there is still zero compatibility between different versions of those classes.
Aug 12 2009
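The layout problem described above can be illustrated outside C++. The following is only a sketch using Python's ctypes: the two structures and their field names are hypothetical stand-ins, not the real Scintilla Accessor or LexerModule layouts. It shows why code built against an older layout reads the wrong field once a newer build inserts a member.

```python
import ctypes


class AccessorV1(ctypes.Structure):
    """Hypothetical layout that an older lexer DLL was compiled against."""
    _fields_ = [("start_pos", ctypes.c_int32),
                ("end_pos", ctypes.c_int32)]


class AccessorV2(ctypes.Structure):
    """Hypothetical newer layout: one field was inserted in the middle."""
    _fields_ = [("start_pos", ctypes.c_int32),
                ("code_page", ctypes.c_int32),
                ("end_pos", ctypes.c_int32)]


def read_end_pos_with_old_layout(new_object):
    """Read end_pos the way code compiled against AccessorV1 would."""
    raw = bytes(new_object)  # raw memory of the newer object
    old_view = AccessorV1.from_buffer_copy(raw[:ctypes.sizeof(AccessorV1)])
    return old_view.end_pos  # actually reads the bytes of code_page


new_object = AccessorV2(10, 65001, 99)
print(AccessorV1.end_pos.offset, AccessorV2.end_pos.offset)
print(read_end_pos_with_old_layout(new_object))
```

Here the stale view silently returns a wrong value. With pointers or virtual tables in the structure, the same mismatch would more likely end in a crash.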
Sergey Gromov Wrote:
> Or, even better, to ask Scintilla developers to include my lexer into the official bundle. May be worth a try.

Uh... that's not an option.
Aug 13 2009
Sergey Gromov wrote:
> Thu, 13 Aug 2009 01:40:47 +0100, Stewart Gordon wrote:
<snip>
>> It puzzles me that they didn't make this plugin powerful enough to highlight the language it (and indeed the whole of Notepad++) is written in. Even more so considering the sheer number of C-like languages out there, which people are likely to want to use N++ to write.
> Well I think it's hard to create a regular expression engine flexible enough to allow arbitrary highlighting.

I can't see how it can be at all complicated to find the beginning and end of a C string or character literal. This (Posix?) regexp

"(\\.|[^\\"])*"

works when I try it (though not in the tiny subset of Posix regexps that N++ understands). But that's an aside - you don't need regexps at all to get it working at this basic level, only a rudimentary concept of escape sequences.

> I think the best such engine I've seen was Colorer by Igor Russkih, and even there I wasn't able to express D's WYSIWYG or delimited strings. You need a real programming language for that.

For WYSIWYG strings, all that's needed is a generic highlighter that supports:
- the aforementioned string escapes
- multiple types of string literals distinguished by whether they support string escapes, and not just delimiters

TextPad's syntax highlighting engine manages 2/3 of this without any regexps (or anything to that effect). That said, I've just found that it can do a little bit of what remains: I can make it do `...` but not r"..." at the expense of distinguishing string and character literals.

But token-delimited strings are indeed more complex to deal with. (How many people do we have putting them to practical use at the moment, for that matter?)

> --- I've just had a look at Notepad++ sources. The Scintilla they use contains Scintilla's built-in D lexer. I think it's just not configured.

Sounds as though N++'s developers neglected to keep the configuration files up to date as new languages were added to Scintilla.

> SciTE uses *.properties files to configure stuff. Notepad++ uses XML files for the same purpose. I think it's all in langs.model.xml. My current idea is to take d.properties from the corresponding release of SciTE and try to translate it into the langs.model.xml format. I'll probably try it later when I have time. Of course it would be nice to replace the original D lexer with mine. Or, even better, to ask Scintilla developers to include my lexer into the official bundle. May be worth a try.

You have two good plans there.

Scintilla's definition of a plugin is confusing - normally plugins are things that can be dynamically loaded at runtime, rather than having to compile them in. If only....

Stewart.
Aug 13 2009
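The regexp quoted above can be tried directly. This is a minimal sketch using Python's `re` module rather than a Posix engine, so details such as whether `.` matches a newline are assumptions of this dialect and not claims about what any editor does. The helper name `find_c_strings` is made up for the example.

```python
import re

# The regexp from the post: a quote, then any run of escape pairs or
# characters that are neither backslash nor quote, then a closing quote.
C_STRING = re.compile(r'"(\\.|[^\\"])*"')


def find_c_strings(source):
    """Return every substring of source that the regexp accepts as a string literal."""
    return [match.group(0) for match in C_STRING.finditer(source)]


print(find_c_strings(r'puts("a \"quoted\" word"); x = "b";'))
```

In this dialect `.` does not match a newline, so a backslash at the end of a line is not accepted as an escape pair, while a raw newline inside the quotes is accepted by `[^\\"]`. Whether either behaviour is wanted depends on the language being highlighted.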
Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:
> Sergey Gromov wrote:
>> Well I think it's hard to create a regular expression engine flexible enough to allow arbitrary highlighting.
> I can't see how it can be at all complicated to find the beginning and end of a C string or character literal. This (Posix?) regexp
>
> "(\\.|[^\\"])*"
>
> works as I try (though not in the tiny subset of Posix regexps that N++ understands). But that's an aside - you don't need regexps at all to get it working at this basic level, only a rudimentary concept of escape sequences.

Well, you can write a regexp to handle a simple C string. That is, if your regexp is matched against the whole file, which is usually not the case. Otherwise you'll have troubles with C string:

"foo\
bar"

or D string:

"foo
bar"

Then you want to highlight string escapes and probably format specifiers. Therefore you need not simple regexps but hierarchies of them, and also you need to know where *internals* of the string start and end.

Then you have r"foo" which probably can be handled with regexps.

Then you have q"/foo/" where "/" can be anything. Still can be handled by extended regexps, even though they won't be regular expressions in scientific sense.

Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}. Regexps cannot translate while substituting, so you must create regexps for all possible parens.

And of course q"BLAH whatever BLAH here BLAH", well, probably nice for help texts.

And these are only strings. Try to write regexp which treats .__15 as number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as number(2), operator(..), number(3).

>> I think the best such engine I've seen was Colorer by Igor Russkih, and even there I wasn't able to express D's WYSIWYG or delimited strings. You need a real programming language for that.
> For WYSIWYG strings, all that's needed is a generic highlighter that supports:
> - the aforementioned string escapes
> - multiple types of string literals distinguished by whether they support string escapes, and not just delimiters
>
> TextPad's syntax highlighting engine manages 2/3 of this without any regexps (or anything to that effect). That said, I've just found that it can do a little bit of what remains: I can make it do `...` but not r"..." at the expense of distinguishing string and character literals.
>
> But token-delimited strings are indeed more complex to deal with. (How many people do we have putting them to practical use at the moment, for that matter?)
>
> Scintilla's definition of a plugin is confusing - normally plugins are things that can be dynamically loaded at runtime, rather than having to compile them in. If only....

I'm not sure they call them "plugins". They're lexer modules made so that lexer is relatively easily extendable.
Aug 14 2009
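The point about delimited strings needing "a real programming language" can be made concrete with a few lines of hand-written scanning. This is a sketch under simplifying assumptions: it handles only the q"/foo/" and q"{foo}" forms mentioned above, it ignores the identifier-delimited (heredoc) form entirely, and the function name `scan_delimited_string` is invented for the example.

```python
# Opening brackets and the closers they translate to.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}


def scan_delimited_string(text, pos=0):
    """Return the index just past a q"..." literal starting at pos, or None."""
    if not text.startswith('q"', pos) or pos + 2 >= len(text):
        return None
    opener = text[pos + 2]
    if opener.isalnum() or opener == "_":
        return None  # identifier (heredoc) delimiters are not handled here
    i = pos + 3
    if opener not in PAIRS:
        # Any other character closes with the same character followed by a quote.
        end = text.find(opener + '"', i)
        return None if end < 0 else end + 2
    closer = PAIRS[opener]
    depth = 1
    while i < len(text):
        if text[i] == opener:
            depth += 1
        elif text[i] == closer:
            depth -= 1
            if depth == 0:
                break
        i += 1
    else:
        return None  # ran off the end without closing the bracket
    return i + 2 if text.startswith('"', i + 1) else None


print(scan_delimited_string('q"(a(b)c)" rest'))
print(scan_delimited_string('q"/foo/"'))
```

The bracket case needs a depth counter and a lookup from opener to closer, which is exactly the "translate while substituting" step that a plain regexp cannot express.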
Sergey Gromov wrote:
<snip>
> Well, you can write a regexp to handle a simple C string. That is, if your regexp is matched against the whole file, which is usually not the case. Otherwise you'll have troubles with C string:
>
> "foo\
> bar"
>
> or D string:
>
> "foo
> bar"

So there is a problem if the highlighter works by matching regexps on a line-by-line basis. But matching regexps over a whole file is no harder in principle than matching line-by-line and, when the maximal munch principle is never called to action, it can't be much less efficient. (The only bit of C or D strings that relies on maximal munch is octal escapes.)

> Then you want to highlight string escapes and probably format specifiers. Therefore you need not simple regexps but hierarchies of them, and also you need to know where *internals* of the string start and end.

Let's just concentrate for the moment on the simple process of finding the beginning and end of a string. Here's a snippet of a TextPad syntax file:

    StringsSpanLines = Yes
    StringStart = "
    StringEnd = "
    StringEsc = \

A possible snippet of lexer code to handle this (which for all I know might be near enough how TP does it):

    if (*c == StringStart) {
        beginHighlightString(c);
        /* scan to the closing delimiter, the end of the buffer, or the
           end of the line when strings may not span lines */
        for (++c; *c != StringEnd && *c != '\0'
                && (StringsSpanLines || *c != '\n'); ++c) {
            if (*c == StringEsc) ++c;   /* skip the escaped character */
        }
        endHighlightString(c + 1);
    }

It's simple and it should work. (OK, there are two assumptions made for simplicity: that line breaks are normalised to LF, and that the file is terminated by at least two null bytes in memory, but you get the idea.)

While it doesn't support highlighting of escapes, I can't see this fact as being the reason N++'s developers haven't implemented even this in the generic lexer module. I probably couldn't see it being the reason even if the C lexer did highlight escapes (which it doesn't).

> Then you have r"foo" which probably can be handled with regexps. Then you have q"/foo/" where "/" can be anything. Still can be handled by extended regexps, even though they won't be regular expressions in scientific sense. Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}. Regexps cannot translate while substituting, so you must create regexps for all possible parens.

Yes, these aspects are more complicated. Both TP and N++ (out of the box, anyway) are probably far from being able to lex D2 properly. But they certainly could do better in supporting D1. Still, once N++ gains access to Scintilla's D lexer, things will certainly be better.

> And of course q"BLAH whatever BLAH here BLAH", well, probably nice for help texts. And these are only strings. Try to write regexp which treats .__15 as number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as number(2), operator(..), number(3).
<snip>

We'd need many regexps to handle all possible cases, but a possible set to cover these cases and a few others (listed in a possible order of priority) is:

    \._*[0-9][0-9_]*
    ([1-9][0-9]*)(\.\.)
    [0-9]+\.[0-9]*
    [1-9][0-9]*
    \.\.
    \.
    [a-zA-Z_][a-zA-Z0-9_]*

Note the use of capturing groups to handle the 2..3 case. Each capturing group would match a token, while in the other cases the whole regexp matches a token.

Stewart.
Aug 14 2009
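The prioritised regexp list above is small enough to run as written. The sketch below tries each regexp in order at the current position and, where a regexp has capturing groups, emits one token per group. The token kind names and the `tokenize` function are assumptions added for illustration, and whitespace skipping is not part of the original list.

```python
import re

# The regexps from the post, in the same priority order, each paired with
# the kind of every token it produces (one kind per capturing group).
RULES = [
    (re.compile(r"\._*[0-9][0-9_]*"), ("number",)),
    (re.compile(r"([1-9][0-9]*)(\.\.)"), ("number", "operator")),
    (re.compile(r"[0-9]+\.[0-9]*"), ("number",)),
    (re.compile(r"[1-9][0-9]*"), ("number",)),
    (re.compile(r"\.\."), ("operator",)),
    (re.compile(r"\."), ("operator",)),
    (re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*"), ("ident",)),
]


def tokenize(text):
    """Split text into (kind, lexeme) pairs using the first rule that matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for pattern, kinds in RULES:
            match = pattern.match(text, pos)
            if match is None:
                continue
            pieces = match.groups() if match.groups() else (match.group(0),)
            tokens.extend(zip(kinds, pieces))
            pos = match.end()
            break
        else:
            raise ValueError(f"no rule matches at offset {pos}: {text[pos]!r}")
    return tokens


for sample in (".__15", ".__foo", "2..3"):
    print(sample, tokenize(sample))
```

The three cases from the challenge come out as requested. Literals with suffixes or hexadecimal floats are outside this rule set and would need further entries.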
Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:
> Sergey Gromov wrote:
>> "foo
>> bar"
> So there is a problem if the highlighter works by matching regexps on a line-by-line basis. But matching regexps over a whole file is no harder in principle than matching line-by-line and, when the maximal munch principle is never called to action, it can't be much less efficient. (The only bit of C or D strings that relies on maximal munch is octal escapes.)

Highlighting the whole file every time a character is typed is slow. Scintilla doesn't do that. It provides the lexer with a range of changed lines. The lexer is then free to choose a larger range if it cannot deduce context from the initial range. I tried to ignore this range and re-highlight the whole file in my lexer. The performance was unacceptable.

>> Then you want to highlight string escapes and probably format specifiers. Therefore you need not simple regexps but hierarchies of them, and also you need to know where *internals* of the string start and end.
> Let's just concentrate for the moment on the simple process of finding the beginning and end of a string. Here's a snippet of a TextPad syntax file:
>
>     StringsSpanLines = Yes
>     StringStart = "
>     StringEnd = "
>     StringEsc = \
>
> A possible snippet of lexer code to handle this (which FAIK might be [...]

Sure, TextPad uses a dozen simple hacks specific to lexing programming languages. They're ad-hoc and they're limited to exactly what TextPad's authors thought was important.

Regexps are a different approach. They are more generic but are limited, too, because they're slow and don't nest naturally. Slow means they must try to re-color as few lines as possible. Not nestable means you need to invent some framework around regexps, which is another sort of description language. If you implement the former naively and ignore the latter you'll get what N++ presumably has: not a very powerful system.

It's actually trivial* to implement a lexer for Scintilla which would work exactly as TextPad does, including use of the same configuration files.

* That is, if you know exactly how TextPad works.

>> And these are only strings. Try to write regexp which treats .__15 as number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as number(2), operator(..), number(3).
> <snip>
> We'd need many regexps to handle all possible cases, but a possible set to cover these cases and a few others (listed in a possible order of priority) is:
>
>     \._*[0-9][0-9_]*
>     ([1-9][0-9]*)(\.\.)
>     [0-9]+\.[0-9]*
>     [1-9][0-9]*
>     \.\.
>     \.
>     [a-zA-Z_][a-zA-Z0-9_]*

Basically yes, but they're going to be much more complex. 3Lu...5 is also a range. 0x3e22.f5p6fi is a valid floating-point number.

And still, regexps don't nest. Don't you want to highlight DDoc sections and macros?
Aug 15 2009
Sergey Gromov:
> Sure, TextPad uses a dozen of simple hacks specific to lexing programming languages. They're ad-hoc and they're limited to exactly what TextPad authors thought were important.

Today the difference isn't very important because CPUs are fast. But on Windows with a Pentium 3, Scintilla was very slow. TextPad was fast enough even for very quick fingers. (TextPad may even contain some parts coded in assembly.) TextPad on Windows is very fast :-)

Bye,
bearophile
Aug 15 2009
Sergey Gromov wrote:
> Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:
>> Sergey Gromov wrote:
>>> "foo
>>> bar"
>> So there is a problem if the highlighter works by matching regexps on a line-by-line basis. But matching regexps over a whole file is no harder in principle than matching line-by-line and, when the maximal munch principle is never called to action, it can't be much less efficient. (The only bit of C or D strings that relies on maximal munch is octal escapes.)
> Highlighting the whole file every time a character is typed is slow. Scintilla doesn't do that. It provides the lexer with a range of changed lines. The lexer is then free to choose a larger range if it cannot deduce context from the initial range. I tried to ignore this range and re-highlight the whole file in my lexer. The performance was unacceptable.

Of course. I suppose now that the right strategy is line-by-line with some preservation of state between lines:

- Keep a note of the state at the beginning of each line
- When something is changed, re-highlight those lines that have changed
- Carry on re-highlighting until the state is back in sync with what was there before. If this means going way beyond the visible area of the file, record the state of the next however many lines as unknown (so that it will have another go when/if those lines are later scrolled into view).
- If a range of lines that has just come into view begins in unknown state, it's up to the particular lexer module to start from the first visible line or backtrack as far as it likes to get some context.

Is this anything like how Scintilla works?

<snip>
> It's actually trivial* to implement a lexer for Scintilla which would work exactly as TextPad does, including use of the same configuration files.
>
> * That is, if you know exactly how TextPad works.

It would also be straightforward to improve TextPad's scheme to support an arbitrary number of string/comment types. How about this as an all-in-one replacement for TP's comment and string syntax directives?

    [DelimitedToken1]
    Start = /**
    End = */
    Type = DocComment
    SpanLines = Yes
    Nest = No

    [DelimitedToken2]
    Start = /*!
    End = */
    Type = DocComment
    SpanLines = Yes
    Nest = No

    [DelimitedToken3]
    Start = /*
    End = */
    Type = Comment
    SpanLines = Yes
    Nest = No

    [DelimitedToken4]
    Start = /+
    End = +/
    Type = Comment
    SpanLines = Yes
    Nest = Yes

    [DelimitedToken5]
    Start = //
    Type = Comment
    SpanLines = No
    Nest = No

    [DelimitedToken6]
    Start = r"
    End = "
    Type = String
    SpanLines = Yes
    Nest = No

    [DelimitedToken7]
    Start = `
    End = `
    Type = String
    SpanLines = Yes
    Nest = No

    [DelimitedToken8]
    Start = "
    End = "
    Esc = \
    Type = String
    SpanLines = Yes
    Nest = No

    [DelimitedToken9]
    Start = '
    End = '
    Esc = \
    Type = Char
    SpanLines = No
    Nest = No

There, we have all of D1 covered now, and not a regexp in sight.

<snip>
> Basically yes, but they're going to be much more complex. 3Lu...5 is also a range. 0x3e22.f5p6fi is a valid floating-point number. And still, regexps don't nest. Don't you want to highlight DDoc sections and macros?

That would be nice as well, as would being able to do things with Doxygen comments. But let's not try to run before we can walk.

Stewart.
Aug 17 2009
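The line-by-line strategy with saved state can be sketched in a few dozen lines. This is an illustration of the idea only, not Scintilla's implementation: the only lexical state tracked is "inside a /* */ comment", the "unknown" state for off-screen lines is left out, and the names `lex_line` and `Highlighter` are invented for the example.

```python
def lex_line(line, in_comment):
    """Style one line; return (per-character styles, state at end of line)."""
    styles = []
    i = 0
    while i < len(line):
        if in_comment and line.startswith("*/", i):
            styles += ["comment", "comment"]
            i += 2
            in_comment = False
        elif not in_comment and line.startswith("/*", i):
            styles += ["comment", "comment"]
            i += 2
            in_comment = True
        else:
            styles.append("comment" if in_comment else "code")
            i += 1
    return styles, in_comment


class Highlighter:
    def __init__(self, lines):
        self.lines = list(lines)
        # start_state[n] is the lexer state at the beginning of line n.
        self.start_state = [False] * (len(self.lines) + 1)
        self.styles = [[] for _ in self.lines]
        self.relex_from(0, force_all=True)

    def relex_from(self, first, force_all=False):
        """Re-lex from line `first` until the saved state is back in sync."""
        state = self.start_state[first]
        relexed = 0
        for n in range(first, len(self.lines)):
            self.styles[n], state = lex_line(self.lines[n], state)
            relexed += 1
            in_sync = self.start_state[n + 1] == state
            self.start_state[n + 1] = state
            if in_sync and not force_all:
                break
        return relexed

    def edit_line(self, n, new_text):
        """Replace line n and return how many lines had to be re-lexed."""
        self.lines[n] = new_text
        return self.relex_from(n)


doc = Highlighter(["int a;", "int b;", "int c; */ int d;", "int e;"])
print(doc.edit_line(0, "int a; /*"))  # opening a comment spills onto later lines
print(doc.edit_line(1, "int bb;"))    # an edit inside the comment stays local
```

Opening a comment forces re-lexing until the line where the comment closes, after which the saved start-of-line state matches again and the loop stops. An edit that does not change the end-of-line state touches only its own line.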
Mon, 17 Aug 2009 21:23:56 +0100, Stewart Gordon wrote:
> Sergey Gromov wrote:
>> Highlighting the whole file every time a character is typed is slow. Scintilla doesn't do that. It provides the lexer with a range of changed lines. The lexer is then free to choose a larger range if it cannot deduce context from the initial range. I tried to ignore this range and re-highlight the whole file in my lexer. The performance was unacceptable.
> Of course. I suppose now that the right strategy is line-by-line with some preservation of state between lines:
>
> - Keep a note of the state at the beginning of each line
> - When something is changed, re-highlight those lines that have changed
> - Carry on re-highlighting until the state is back in sync with what was there before. If this means going way beyond the visible area of the file, record the state of the next however many lines as unknown (so that it will have another go when/if those lines are later scrolled into view).
> - If a range of lines that has just come into view begins in unknown state, it's up to the particular lexer module to start from the first visible line or backtrack as far as it likes to get some context.
>
> Is this anything like how Scintilla works?
> <snip>

Exactly. There is a 32-bit "style" known for every character, plus another 32-bit field associated with every line. A lexer is free to use these fields for any purpose, except that the lower byte of a style defines the character's color.

>> It's actually trivial* to implement a lexer for Scintilla which would work exactly as TextPad does, including use of the same configuration files.
>>
>> * That is, if you know exactly how TextPad works.
> It would also be straightforward to improve TextPad's scheme to support an arbitrary number of string/comment types. How about this as an all-in-one replacement for TP's comment and string syntax directives?
>
>     [...]
>
>     [DelimitedToken8]
>     Start = "
>     End = "
>     Esc = \
>     Type = String
>     SpanLines = Yes
>     Nest = No
>
>     [DelimitedToken9]
>     Start = '
>     End = '
>     Esc = \
>     Type = Char
>     SpanLines = No
>     Nest = No
>
> There, we have all of D1 covered now, and not a regexp in sight.
> <snip>

Yes and no, because your ad-hoc format doesn't cover subtle differences between C and D strings. For example, C strings don't support embedded EOLs. Though you may consider this minor.

>> Basically yes, but they're going to be much more complex. 3Lu...5 is also a range. 0x3e22.f5p6fi is a valid floating-point number. And still, regexps don't nest. Don't you want to highlight DDoc sections and macros?
> That would be nice as well, as would being able to do things with Doxygen comments. But let's not try to run before we can walk.

This assumes that TextPad could run at some point. ;) This is exactly where I'm sceptical. I think that when it runs it'll have so many weird rules and settings that it won't be fun anymore. And they won't be powerful enough for anything the authors didn't consider anyway.
Aug 17 2009
Sergey Gromov wrote:
> Mon, 17 Aug 2009 21:23:56 +0100, Stewart Gordon wrote:
<snip>
>> Is this anything like how Scintilla works?
> Exactly. There is a 32-bit "style" known for every character, plus another 32-bit field associated with every line. A lexer is free to use these fields for any purpose, except the lower byte of a style defines the characters' color.

Does it keep around in memory the style of every character, or only the 32-bit field associated with the line so that the lexer can re-style the characters on repaint/scroll?

<snip>
>>     [DelimitedToken9]
>>     Start = '
>>     End = '
>>     Esc = \
>>     Type = Char
>>     SpanLines = No
>>     Nest = No
>>
>> There, we have all of D1 covered now, and not a regexp in sight.
> Yes and no, because your ad-hoc format doesn't cover subtle differences between C and D strings. Like C strings don't support embedded EOLs.

I don't understand. How does SpanLines not achieve this? Then what _does_ SpanLines achieve according to whatever conclusion you've come to?

> Though you may consider this minor.
<snip>
>>> Basically yes, but they're going to be much more complex. 3Lu...5 is also a range. 0x3e22.f5p6fi is a valid floating-point number. And still, regexps don't nest. Don't you want to highlight DDoc sections and macros?
>> That would be nice as well, as would being able to do things with Doxygen comments. But let's not try to run before we can walk.
> This assumes that TextPad could run at some point.

You're right - it turns out TP doesn't get all the D floating point notations right. It appears that TP has hard-coded the syntax of C numeric literals. I must've just not noticed since I had never before changed the number colour from the same as the default text colour.

Maybe we do want regexps for all these floating point notations after all.

> ;) This is exactly where I'm sceptical. I think that when it runs it'll have so many weird rules and settings that it won't be fun anymore. And they won't be powerful enough for anything authors didn't consider anyway.

Maybe someone can come up with something....

Stewart.
Aug 18 2009
Tue, 18 Aug 2009 20:40:37 +0100, Stewart Gordon wrote:
> Sergey Gromov wrote:
>> Exactly. There is a 32-bit "style" known for every character, plus another 32-bit field associated with every line. A lexer is free to use these fields for any purpose, except the lower byte of a style defines the characters' color.
> Does it keep around in memory the style of every character, or only the 32-bit field associated with the line so that the lexer can re-style the characters on repaint/scroll?

It can tell which style any character has. This is so that unchanged lines can be repainted without ever calling a lexer.

> <snip>
>>>     [DelimitedToken9]
>>>     Start = '
>>>     End = '
>>>     Esc = \
>>>     Type = Char
>>>     SpanLines = No
>>>     Nest = No
>>>
>>> There, we have all of D1 covered now, and not a regexp in sight.
>> Yes and no, because your ad-hoc format doesn't cover subtle differences between C and D strings. Like C strings don't support embedded EOLs.
> I don't understand. How does SpanLines not achieve this? Then what _does_ SpanLines achieve according to whatever conclusion you've come to?

Here's a string which is valid in D but is invalid in C:

"foo
bar"

Here's another string which is, on the contrary, valid in C but is invalid in D:

"foo\
bar"

They both "span lines."
Aug 20 2009
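The per-line state discussed above can be sketched roughly as follows. This is not Scintilla's API: `lex_line`, `LineCache` and the two state constants are hypothetical names, and only a D-style double-quoted string with embedded newlines is modelled. The idea is that styles are stored per character so unchanged lines repaint without lexing, while the state saved at each line end tells the lexer where it may stop after an edit.

```python
# Hypothetical sketch: line-by-line restyling with a saved per-line state.
DEFAULT, IN_STRING = 0, 1  # assumed state values, not Scintilla constants

def lex_line(text, state):
    """Style one line; return (styles, state_at_end_of_line)."""
    styles = []
    i = 0
    while i < len(text):
        ch = text[i]
        if state == IN_STRING:
            styles.append(IN_STRING)
            if ch == "\\" and i + 1 < len(text):
                styles.append(IN_STRING)  # escaped character stays in the string
                i += 1
            elif ch == '"':
                state = DEFAULT           # closing quote ends the string
        elif ch == '"':
            styles.append(IN_STRING)
            state = IN_STRING             # opening quote
        else:
            styles.append(DEFAULT)
        i += 1
    return styles, state

class LineCache:
    """Keeps per-character styles and one end-of-line state per line."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.styles = [None] * len(self.lines)
        self.end_state = [DEFAULT] * len(self.lines)
        self.relex_from(0)

    def relex_from(self, start):
        """Re-lex from `start`; stop once a line's end state is unchanged."""
        state = self.end_state[start - 1] if start > 0 else DEFAULT
        relexed = 0
        for n in range(start, len(self.lines)):
            styles, state = lex_line(self.lines[n], state)
            unchanged = self.styles[n] is not None and state == self.end_state[n]
            self.styles[n], self.end_state[n] = styles, state
            relexed += 1
            if unchanged:
                break  # later lines would be styled exactly as before
        return relexed

    def edit(self, n, text):
        """Replace line n and return how many lines had to be re-lexed."""
        self.lines[n] = text
        return self.relex_from(n)
```

Editing a line that leaves its end state alone costs one re-lex; closing a previously open string forces the lexer to continue until the saved states agree again.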
Sergey Gromov wrote:
<snip>
> Here's a string which is valid in D but is invalid in C:
> "foo
> bar"
> Here's another string which is, on the contrary, valid in C but is invalid in D:
> "foo\
> bar"
> They both "span lines."
Doesn't quite relate to what I was querying ... but anyway, it's perfectly straightforward to add another rule like
LineSplice = \
among other possibilities.

You could argue over whether it's worth going to all this effort, if you think the only point is to support C, C++ and D. But really, there are many C-like languages out there with their own slightly different rules, and even the likes of Prolog, SQL and Unix shell scripts with their own variants of C string syntax. I think the scheme I've come up with would be a good way to capture the subtle differences between these languages' string syntaxes, while at the same time being something that the average user wanting to add a new language to the system should be able to get their head around sooner or later.

Stewart.
Aug 21 2009
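A rule-driven scanner along the lines proposed above might look like the sketch below. The field names mirror the Start / End / Esc / SpanLines / LineSplice keys from the thread, but `DelimitedToken` and `match_token` are hypothetical, and the treatment of an escape character directly before a newline when no LineSplice is set is an assumption, not something the thread specifies.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DelimitedToken:
    start: str
    end: str
    esc: Optional[str] = None          # single escape character, if any
    span_lines: bool = False           # may the token contain raw newlines?
    line_splice: Optional[str] = None  # character that splices a line break

def match_token(rule, text, pos=0):
    """Return the index just past the closing delimiter, or None if invalid."""
    if not text.startswith(rule.start, pos):
        return None
    i = pos + len(rule.start)
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt == "\n" and ch == rule.line_splice:
            i += 2                     # spliced line break, token continues
        elif ch == rule.esc:
            if nxt in ("", "\n"):
                return None            # assumption: escape cannot precede a raw newline
            i += 2                     # skip the escaped character
        elif text.startswith(rule.end, i):
            return i + len(rule.end)
        elif ch == "\n" and not rule.span_lines:
            return None                # unterminated on this line
        else:
            i += 1
    return None

# Two rules that differ only in how line breaks are treated.
c_rule = DelimitedToken('"', '"', esc="\\", span_lines=False, line_splice="\\")
d_rule = DelimitedToken('"', '"', esc="\\", span_lines=True)
```

Under these assumptions the two example strings from the thread come out as described: the one with an embedded newline is accepted only by `d_rule`, and the one with a backslash before the newline only by `c_rule`.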
Sergey Gromov wrote:
> Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:
>> Sergey Gromov wrote:
>>> Well I think it's hard to create a regular expression engine flexible enough to allow arbitrary highlighting.
>> I can't see how it can be at all complicated to find the beginning and end of a C string or character literal. This (Posix?) regexp
>> "(\\.|[^\\"])*"
>> works as I try (though not in the tiny subset of Posix regexps that N++ understands). But that's an aside - you don't need regexps at all to get it working at this basic level, only a rudimentary concept of escape sequences.
> Well, you can write a regexp to handle a simple C string. That is, if your regexp is matched against the whole file, which is usually not the case. Otherwise you'll have troubles with C string:
> "foo\
> bar"
> or D string:
> "foo
> bar"
> Then you want to highlight string escapes and probably format specifiers. Therefore you need not simple regexps but hierarchies of them, and also you need to know where *internals* of the string start and end.
> Then you have r"foo" which probably can be handled with regexps.
> Then you have q"/foo/" where "/" can be anything. Still can be handled by extended regexps, even though they won't be regular expressions in scientific sense.
> Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}. Regexps cannot translate while substituting, so you must create regexps for all possible parens.
Remember that the whole point of q{} strings was that they should NOT be highlighted as strings!
>>> I think the best such engine I've seen was Colorer by Igor Russkih, and even there I wasn't able to express D's WYSIWYG or delimited strings. You need a real programming language for that.
>> For WYSIWYG strings, all that's needed is a generic highlighter that supports:
>> - the aforementioned string escapes
>> - multiple types of string literals distinguished by whether they support string escapes, and not just delimiters
>> TextPad's syntax highlighting engine manages 2/3 of this without any regexps (or anything to that effect). That said, I've just found that it can do a little bit of what remains: I can make it do `...` but not r"..." at the expense of distinguishing string and character literals.
>> But token-delimited strings are indeed more complex to deal with. (How many people do we have putting them to practical use at the moment, for that matter?)
Aug 17 2009
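To make the quoted regexp discussion concrete, here is a small sketch using Python's `re` module rather than a Posix engine, so details may differ from what N++ or TextPad do. It tries the regexp `"(\\.|[^\\"])*"` against whole text and against individual lines; `matches_whole` and `matches_any_line` are hypothetical helper names.

```python
import re

# The regexp quoted in the thread, compiled two ways.
C_STRING = re.compile(r'"(\\.|[^\\"])*"')
# With DOTALL, "." also matches a newline, so backslash-newline counts as an escape.
C_STRING_DOTALL = re.compile(r'"(\\.|[^\\"])*"', re.DOTALL)

def matches_whole(pattern, text):
    """True if the entire text is one string literal according to pattern."""
    return pattern.fullmatch(text) is not None

def matches_any_line(pattern, text):
    """True if some single line, taken alone, contains a complete literal."""
    return any(pattern.search(line) for line in text.split("\n"))

simple = r'"a \"quoted\" word"'
embedded_newline = '"foo\nbar"'        # the D-style example
spliced_newline = '"foo\\\nbar"'       # the C-style example
```

Matched against the whole text, the plain pattern accepts the embedded-newline form but not the backslash-newline form, and the DOTALL variant accepts both; matched one line at a time, neither multi-line string is found, which is the difficulty described in the quoted text.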
Mon, 17 Aug 2009 10:37:47 +0200, Don wrote:
> Sergey Gromov wrote:
>> Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}. Regexps cannot translate while substituting, so you must create regexps for all possible parens.
> Remember that the whole point of q{} strings was that they should NOT be highlighted as strings!
You confuse q{} and q"{}" here. The former is a token string which may contain only valid D tokens. The latter is a delimited string with nesting delimiters. Like q"<<a href="#hi">hello</a>>".
Aug 17 2009
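A hand-written scanner copes with the nesting delimiters mentioned above more easily than a regexp does. The sketch below reflects my reading of how q"..." delimited strings behave and is not taken from any real lexer; `scan_delimited` and `PAIRS` are hypothetical names, and heredoc-style identifier delimiters are deliberately left out.

```python
# Opening delimiters that nest, mapped to their closers.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}

def scan_delimited(text, pos=0):
    """Return the index just past a q"..." delimited string at pos, or None."""
    if not text.startswith('q"', pos) or pos + 2 >= len(text):
        return None
    opener = text[pos + 2]
    if opener.isalnum() or opener == "_":
        return None  # identifier (heredoc) delimiters are not modelled here
    closer = PAIRS.get(opener, opener)
    nests = opener in PAIRS
    depth = 1
    i = pos + 3
    while i < len(text):
        ch = text[i]
        if nests and ch == opener:
            depth += 1
        elif ch == closer:
            if nests:
                depth -= 1
                if depth == 0:
                    # The outermost closer must be followed by the closing quote.
                    return i + 2 if text.startswith('"', i + 1) else None
            elif text.startswith('"', i + 1):
                return i + 2  # non-nesting delimiter ends at delimiter + quote
        i += 1
    return None
```

For the example from the post, `scan_delimited('q"<<a href="#hi">hello</a>>"')` consumes the whole literal, because the inner angle brackets balance before the final `>"` is reached.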
Stewart Gordon wrote:
<snip>
> TextPad's syntax highlighting engine manages 2/3 of this without any regexps (or anything to that effect). That said, I've just found that it can do a little bit of what remains: I can make it do `...` but not r"..." at the expense of distinguishing string and character literals.
<snip>

For the record, what I'd done is

StringStart = "
StringEnd = "
StringAlt = '
StringEsc = \
CharStart = `
CharEnd = `
CharEsc =

however, I've just found a bigger problem: only string literals, not char literals, can span lines in TP.

Stewart.
Aug 14 2009
Stewart Gordon Wrote:
> For the record, there's a SciLexer.dll in my Notepad++ dir, but no d.properties to be found. The SciLexer.dll reports itself as file version 1.7.8.0, product version 1.78. So maybe the question is of what effect replacing it with a fork of version 1.76 would have. (Do SciTE versions correspond directly to Scintilla versions?)
Wrong lexer is used here. Scintilla builtin d lexer supported nested comments and escape sequences from version 1.72, but support for multiline strings was added in version 1.79.
Aug 13 2009
Stewart Gordon Wrote:
> Or maybe I should just go back to TextPad (which isn't perfect either) and put up with its not supporting Unicode....
FWIW Zeus is very similar to TextPad in feature set and the latest version also adds support for Unicode/UTF8.

http://www.zeusedit.com/

It will do D syntax highlighting and code folding out of the box. It also comes with a version of ctags.exe made with these changes specifically for the D language:

http://www.zeusedit.com/z300/ctags_src.zip

meaning it can produce tags information for your D source files.

NOTE: Zeus, like TextPad, is shareware.

Jussi Jumppanen
Author: Zeus for Windows
Aug 12 2009
Stewart Gordon Wrote:
> Anyway, attached is the result. Can anybody do better (other than by telling it to treat D as C or some other language instead)?
I don't see how the lexer is being chosen. Programmer's Notepad does it correctly.
Aug 13 2009
Nick Sabalausky Wrote:
>> I don't see how the lexer is being chosen. Programmer's Notepad does it correctly.
> I use Programmer's Notepad. It's good, but it still has some problems:
> http://code.google.com/p/pnotepad/issues/detail?id=480
> (Proper Highlighting for D's Wysiwyg Strings)
> http://code.google.com/p/pnotepad/issues/detail?id=481
> (In D, strings with embedded newlines are not highlighted correctly)
> http://code.google.com/p/pnotepad/issues/detail?id=482
> (Support for D's nested comments)
At least PN chooses lexer. That's what I meant. These issues do not pertain to PN. They're RFEs for Scintilla D lexer and as I said they were fixed in version 1.79. PN developer just plans to upgrade to new Scintilla in PN 3, in fact I compiled scintilla 1.78 with recent D lexer and it works fine. BTW bug 482 is invalid, support for nested comments was there from the start, make sure you don't use C lexer.
Aug 14 2009