
digitalmars.D.announce - So, You Want To Write Your Own Programming Language?

reply Walter Bright <newshound2 digitalmars.com> writes:
http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
Jan 21 2014
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-22 05:29, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
From the article: "Regex is just the wrong tool for lexing and parsing." I wonder why there are so many books about implementing compilers that spend, usually, quite a large chapter on regular expressions? -- /Jacob Carlborg
Jan 22 2014
parent "Uplink_Coder" <someemail someprovider.some> writes:
 On Wednesday, 22 January 2014 at 10:36:31 UTC, Jacob Carlborg 
 wrote:
 I wonder why there are so many books about implementing 
 compilers that spend, usually, quite a large chapter on 
 regular expressions?
I wonder about that too. For anything halfway useful, regex has too many limitations, which you only find out about in a later chapter, or pretty soon in your parser :D
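One concrete limitation, as a minimal sketch: classical regular expressions cannot match arbitrarily nested constructs, because that requires counting. D's nesting comments (`/+ ... +/`) are an example. A hand-written scanner handles them with a depth counter. The function name `skip_nested_comment` is made up for this illustration and is not taken from any real lexer.

```python
def skip_nested_comment(src: str, pos: int) -> int:
    """Return the index just past the nesting comment starting at src[pos].

    Hypothetical helper: '/+' opens a comment, '+/' closes one, and
    comments may nest, so a depth counter is needed.
    """
    if not src.startswith("/+", pos):
        raise ValueError("no nesting comment starts at this position")
    depth = 0
    i = pos
    while i < len(src):
        if src.startswith("/+", i):
            depth += 1          # one level deeper
            i += 2
        elif src.startswith("+/", i):
            depth -= 1          # one level closed
            i += 2
            if depth == 0:
                return i        # outermost comment is finished
        else:
            i += 1
    raise ValueError("unterminated nesting comment")
```

For instance, calling it at the first `/+` of `a /+ x /+ y +/ z +/ b` skips both levels and stops right before ` b`.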
Jan 22 2014
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
Thank you for the simple nice article.
 The poisoning approach. [...] This is the approach we've been 
 using in the D compiler, and are very pleased with the results.
Yet, even in D, most of the error messages after the first few are often not very useful to me. So perhaps I'd like a compiler switch that shows only the first few error messages and then stops the compiler.
 Automated documentation generator. [...] Before Ddoc, the 
 documentation had only a random correlation with the code, and 
 too often, they had nothing to do with each other. After Ddoc, 
 the two were brought in sync.
And now the situation is even better: we have documentation unittests, and the function arguments are verified to be in sync with their ddoc comment. There's probably some room for further improvements.
 One semantic technique that is obvious in hindsight (but took 
 Andrei Alexandrescu to point out to me) is called "lowering."
In Haskell, the GHC compiler goes one step further: it translates all the Haskell code into an intermediate language named Core. Core is not the language of a virtual machine; it's still a functional language, but a simpler one, in which many of the syntactic differences between language constructs are reduced to a much smaller number of mostly functional constructs.
 My general rule is if the explanation for what the function does 
 is more lines than the implementation code, then the function is 
 likely trivia and should be booted out.
In Haskell there's a standard module named Prelude; it's imported by default and defines a lot of general-purpose functions, etc. Most functions in it are only a few lines long (often 2-3 lines, with some up to 10-13 lines). Bonus: the cute idea of a language for students: http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c (On Reddit I seem to see some comments, like structs not allowing constructors?) Bye, bearophile
Jan 22 2014
next sibling parent reply "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Wednesday, 22 January 2014 at 10:38:40 UTC, bearophile wrote:
 In Haskell the GHC compiler goes one step further, it 
 translates all the Haskell code into an intermediate code named 
 Core, that is not the language of a virtual machine, it's still 
 a functional language, but it's simpler, lot of the syntax 
 differences between language constructs is reduced to a much 
 reduced number of mostly functional stuff.
It's the same story with Erlang, as far as I know.
Jan 22 2014
parent Paulo Pinto <pjmlp progtools.org> writes:
On 22.01.2014 14:28, Dejan Lekic wrote:
 On Wednesday, 22 January 2014 at 10:38:40 UTC, bearophile wrote:
 In Haskell the GHC compiler goes one step further, it translates all
 the Haskell code into an intermediate code named Core, that is not the
 language of a virtual machine, it's still a functional language, but
 it's simpler, lot of the syntax differences between language
 constructs is reduced to a much reduced number of mostly functional
 stuff.
 It's the same story with Erlang, as far as I know.
Most likely due to its Prolog influence, which also does it.
Jan 22 2014
prev sibling parent reply "Don" <x nospam.com> writes:
On Wednesday, 22 January 2014 at 10:38:40 UTC, bearophile wrote:
 Walter Bright:

 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
 Thank you for the simple nice article.
 The poisoning approach. [...] This is the approach we've been 
 using in the D compiler, and are very pleased with the results.
 Yet, even in D most of the error messages after the first few ones are often not so useful to me. So perhaps I'd like a compiler switch to show only the first few error messages and then stop the compiler.
Could you give an example? We've tried very hard to avoid useless error messages; there should only be one error message for each bug in the code. Parser errors still generate a cascade of junk, and the "cannot deduce function from argument types" message is still painful -- is that what you mean? Or something else?
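To make the "one error message for each bug" goal concrete, here is a rough sketch of the poisoning idea on a toy expression checker. It is not DMD's code: the `ERROR` marker, the `check` function and the tuple-based expressions are all assumptions made up for this illustration. Once a subexpression has been reported, it evaluates to a poison type, and anything that touches poison stays silent.

```python
ERROR = "<error>"  # hypothetical poison marker, not a real compiler type


def check(expr, env, errors):
    """Return the type of expr, or ERROR if it (or a part of it) is broken.

    expr is an int literal, a variable name (str), or a tuple (op, lhs, rhs).
    """
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):
        if expr not in env:
            errors.append(f"undefined identifier '{expr}'")
            return ERROR            # report once, then poison
        return env[expr]
    op, lhs, rhs = expr
    lt = check(lhs, env, errors)
    rt = check(rhs, env, errors)
    if ERROR in (lt, rt):
        return ERROR                # already reported: propagate silently
    if lt != rt:
        errors.append(f"cannot apply '{op}' to {lt} and {rt}")
        return ERROR
    return lt
```

With `env = {"x": "int", "s": "string"}`, checking `(y + x) + s` reports only the undefined `y`; the type mismatch that would otherwise cascade from it is suppressed.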
Jan 22 2014
parent "bearophile" <bearophileHUGS lycos.com> writes:
Don:

 Could you give an example? We've tried very hard to avoid 
 useless error messages, there should only be one error message 
 for each bug in the code.
 Parser errors still generate a cascade of junk, and the "cannot 
 deduce function from argument types" message is still painful 
 -- is that what you mean? Or something else?
There are situations where I see lots and lots of error messages caused by some detail that breaks the instantiability of some function from std.algorithm. While trying to find you an example, I have found and filed this instead :-) https://d.puremagic.com/issues/show_bug.cgi?id=11971 Bye, bearophile
Jan 22 2014
prev sibling next sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright 
wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
"A good syntax needs redundancy in order to diagnose errors and give good error messages." This is also true of natural languages. The higher the redundancy, the easier it is to guess or reconstruct what a person tried to say (in a noisy environment) or write (if the message gets messed up somehow). Texts in highly inflectional languages (like German) can be "recovered" with higher accuracy than texts in English. If grammatical relations are no longer expressed by inflectional endings (as is often the case in English), the word order is crucial. "The dog bit the man." In Latin and German you can turn the statement around and still know who bit who(m). Over the centuries, natural languages have reduced redundancy, but there are still loads of redundancies e.g. "two cats" (it would be enough to say "two cat", which some languages actually do, see also "a 15 _year_ old girl). Syntax is getting simplified due to the fact that the listener "knows what we mean", e.g. "buy one get one free". I wonder to what extent languages will be simplified one day. But this is a topic for a whole book ...
Jan 22 2014
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Chris:

 "A good syntax needs redundancy in order to diagnose errors and 
 give good error messages."
I'd like to measure this statement experimentally: are error messages in Go and Scala any worse because of the optional use of semicolons? My initial supposition is that the answer is negative. Bye, bearophile
Jan 22 2014
parent "Casper Færgemand" <shorttail hotmail.com> writes:
On Wednesday, 22 January 2014 at 11:59:30 UTC, bearophile wrote:
 I'd like to measure this statement experimentally: are error 
 messages in Go and Scala any worse because of the optional use 
 of semicolons? My initial supposition is that the answer is 
 negative.
Error messages in SML are either really neat or catastrophic.
Jan 22 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2014 3:40 AM, Chris wrote:
 Syntax is getting simplified due to the fact that the listener "knows what we
 mean", e.g. "buy one get one free". I wonder to what extent languages will be
 simplified one day. But this is a topic for a whole book ...
There was this article recently: http://www.onthemedia.org/story/yesterday-internet-solved-20-year-old-mystery/ about how English is so redundant that one can write sentences using just the first letter of each word, and they are still understandable.
Jan 22 2014
parent reply "Chris" <wendlec tcd.ie> writes:
On Wednesday, 22 January 2014 at 18:46:06 UTC, Walter Bright 
wrote:
 On 1/22/2014 3:40 AM, Chris wrote:
 Syntax is getting simplified due to the fact that the listener 
 "knows what we
 mean", e.g. "buy one get one free". I wonder to what extent 
 languages will be
 simplified one day. But this is a topic for a whole book ...
 There was this article recently: http://www.onthemedia.org/story/yesterday-internet-solved-20-year-old-mystery/ about how English is so redundant one can write sentences using just the first letter of each word, and it is actually understandable.
These examples are more about context than redundancy in the grammar. This is very interesting, because the burden is more and more on the listener and less on the speaker. The speaker can omit things relying on the listener's common sense or knowledge of the world (or "you know what I mean" skills). In the beginning, languages were quite complicated (8 or more cases, inflections), but over the centuries things have been simplified, probably due to the fact that humans are experienced enough and can now trust the "interpreter" in the listener's head. A good example are headlines. A classic is "Driver refused license". Now, everybody will assume that it was not the driver who refused the license (default assumption or the _unmarked case_). If it were in fact the driver who refused the license, the headline would have been different, some sort of linguistic flag would have been raised. This goes into the realms of pragmatics, a very interesting discipline. Some of the concepts found in natural languages can also be found in programming languages. I find it extremely interesting how the human mind (not just language) is reflected in programming languages.
Jan 23 2014
next sibling parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 1/23/2014 5:24 AM, Chris wrote:
 I find it extremely interesting how the human
 mind (not just language) is reflected in programming languages.
The way I usually see it is that the human mind HAS to be reflected in programming languages, as that's the whole point. We already knew how to program computers back with manual switches, Altair-style. Every programming tool since then (and *including* Altair-style) has fundamentally been about bridging the gap between the way humans work and the way computers work. That naturally requires that the tool (e.g., a programming language) reflects a lot about the core nature of both humans and computers, because the language's whole job is to interface with both.
Jan 23 2014
parent "Chris" <wendlec tcd.ie> writes:
On Thursday, 23 January 2014 at 20:11:15 UTC, Nick Sabalausky 
wrote:
 On 1/23/2014 5:24 AM, Chris wrote:
 I find it extremely interesting how the human
 mind (not just language) is reflected in programming languages.
 The way I usually see it is that the human mind HAS to be reflected in programming languages, as that's the whole point. We already knew how to program computers back with manual switches, Altair-style. Every programming tool since then (and *including* Altair-style) has fundamentally been about bridging the gap between the way humans work and the way computers work. That naturally requires that the tool (e.g., a programming language) reflects a lot about the core nature of both humans and computers, because the language's whole job is to interface with both.
Yes, there is no other way. Humans cannot create anything that is not based on the human mind. However, it is interesting to see how it is done. Man against machine (or rather man in machine), how to make a computer work the way we work. Even the simplest things like x++; x += 5; are fascinating. It is already reflected in the development of writing systems, long before there was any talk of computers. And it is also interesting to see how different human ways of tackling problems are enshrined in programming languages. E.g. the ever patronizing Python vs C style (";"). One could write a book about it.
Jan 24 2014
prev sibling next sibling parent "Mike James" <foo bar.com> writes:
On Thursday, 23 January 2014 at 10:24:23 UTC, Chris wrote:
 On Wednesday, 22 January 2014 at 18:46:06 UTC, Walter Bright 
 wrote:
 On 1/22/2014 3:40 AM, Chris wrote:
 Syntax is getting simplified due to the fact that the 
 listener "knows what we
 mean", e.g. "buy one get one free". I wonder to what extent 
 languages will be
 simplified one day. But this is a topic for a whole book ...
 There was this article recently: http://www.onthemedia.org/story/yesterday-internet-solved-20-year-old-mystery/ about how English is so redundant one can write sentences using just the first letter of each word, and it is actually understandable.
 These examples are more about context than redundancy in the grammar. This is very interesting, because the burden is more and more on the listener and less on the speaker. The speaker can omit things relying on the listener's common sense or knowledge of the world (or "you know what I mean" skills). In the beginning, languages were quite complicated (8 or more cases, inflections), but over the centuries things have been simplified, probably due to the fact that humans are experienced enough and can now trust the "interpreter" in the listener's head. A good example are headlines. A classic is "Driver refused license". Now, everybody will assume that it was not the driver who refused the license (default assumption or the _unmarked case_). If it were in fact the driver who refused the license, the headline would have been different, some sort of linguistic flag would have been raised. This goes into the realms of pragmatics, a very interesting discipline. Some of the concepts found in natural languages can also be found in programming languages. I find it extremely interesting how the human mind (not just language) is reflected in programming languages.
Headlines are a good source. My favourites are from WW2... MacArthur flies back to front. British push bottles up Germans. -<mike>-
Jan 24 2014
prev sibling parent reply "Kagamin" <spam here.lot> writes:
On Thursday, 23 January 2014 at 10:24:23 UTC, Chris wrote:
 A good example are headlines. A classic is "Driver refused 
 license". Now, everybody will assume that it was not the driver 
 who refused the license (default assumption or the _unmarked 
 case_).
Why isn't it the driver who refused the license?
Jan 27 2014
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Monday, 27 January 2014 at 09:19:25 UTC, Kagamin wrote:
 On Thursday, 23 January 2014 at 10:24:23 UTC, Chris wrote:
 A good example are headlines. A classic is "Driver refused 
 license". Now, everybody will assume that it was not the 
 driver who refused the license (default assumption or the 
 _unmarked case_).
 Why isn't it the driver who refused the license?
It's more likely that the driver was refused a license by the State (for some reason such as "you've been caught drinking and driving 20 times so you're totally banned"). People aren't offered licenses to accept or reject; they must seek them out. It doesn't make sense for someone to walk up (or be given a ride by a friend) to the DMV, wait 30 minutes, and, once they've done all the work to get the license, say "Wait, no, I refuse this after all." So, although "Driver refused license" could mean either "the driver refused to accept the license despite being able to" or "the driver was refused a license by the State (due to some circumstance)", it's massively more likely to be the latter.
Jan 27 2014
parent "Kagamin" <spam here.lot> writes:
On Tuesday, 28 January 2014 at 00:48:48 UTC, Chris Cain wrote:
 It doesn't make sense for someone to walk up (or be given a 
 ride to by a friend) to the DMV wait 30 minutes and once they 
 do all the work to get the license say "Wait, no, I refuse this 
 after all."
Pretty dramatic action, if he, say, did something bad with his car and swore to never drive again.
Jan 31 2014
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright 
wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
Great article!
Jan 22 2014
prev sibling next sibling parent reply "Don" <x nospam.com> writes:
On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright 
wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
Great article. I was surprised that you mentioned lowering positively, though. I think from DMD we have enough experience to say that although lowering sounds good, it's generally a bad idea. It gives you a mostly-working prototype very quickly, but you pay a heavy price for it. It destroys valuable semantic information. You end up with poor quality error messages, and counter-intuitively, you can end up with _more_ special cases (e.g., lowering ref-foreach in DMD means ref local variables can spread everywhere). And it reduces possibilities for the optimizer. In DMD, lowering has caused *major* problems with AAs, foreach, and builtin-functions, and some of the transformations that the inliner makes. It's also caused problems with postincrement and exponentiation. Probably there are other examples. It seems to me that what does make sense is to perform lowering as the final step before passing the code to the backend. If you do it too early, you're shooting yourself in the foot.
Jan 22 2014
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2014 4:53 AM, Don wrote:
 On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
 Great article. I was surprised that you mentioned lowering positively, though. I think from DMD we have enough experience to say that although lowering sounds good, it's generally a bad idea. It gives you a mostly-working prototype very quickly, but you pay a heavy price for it. It destroys valuable semantic information. You end up with poor quality error messages, and counter-intuitively, you can end up with _more_ special cases (e.g., lowering ref-foreach in DMD means ref local variables can spread everywhere). And it reduces possibilities for the optimizer. In DMD, lowering has caused *major* problems with AAs, foreach, and builtin-functions, and some of the transformations that the inliner makes. It's also caused problems with postincrement and exponentiation. Probably there are other examples. It seems to me that what does make sense is to perform lowering as the final step before passing the code to the backend. If you do it too early, you're shooting yourself in the foot.
On the other hand, the lowering of loops to for uncovered numerous bugs, and the lowering of scope to try-finally made it actually implementable and fairly bug-free.
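As a rough illustration of those two lowerings (not how DMD implements them), here is a sketch over a toy tuple-based AST. The node names (`while`, `for`, `scope_exit`, `try_finally`, `block`) and the `lower` function are assumptions invented for this example. A `while` becomes a `for` with no init or step, and a scope(exit)-style statement wraps the rest of its block in try/finally.

```python
def lower(stmt):
    """Rewrite a toy statement tree into a smaller set of node kinds."""
    kind = stmt[0]
    if kind == "while":
        _, cond, body = stmt
        # while (cond) body  ==>  for (; cond; ) body
        return ("for", None, cond, None, lower(body))
    if kind == "for":
        _, init, cond, step, body = stmt
        return ("for", init, cond, step, lower(body))
    if kind == "block":
        return ("block",) + lower_block(stmt[1:])
    return stmt  # every other node kind is left untouched


def lower_block(stmts):
    """Lower a statement sequence, turning scope_exit into try_finally."""
    out = []
    for i, s in enumerate(stmts):
        if s[0] == "scope_exit":
            # everything after the scope_exit becomes the try body
            rest = lower_block(stmts[i + 1:])
            out.append(("try_finally", ("block",) + rest, lower(s[1])))
            return tuple(out)
        out.append(lower(s))
    return tuple(out)
```

After this pass, later stages only need to understand `for` and `try_finally`, which is the appeal; Don's objection is that the original shape of the code is gone by the time an error message has to be produced.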
Jan 22 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/22/14 4:53 AM, Don wrote:
 On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
 Great article. I was surprised that you mentioned lowering positively, though. I think from DMD we have enough experience to say that although lowering sounds good, it's generally a bad idea. It gives you a mostly-working prototype very quickly, but you pay a heavy price for it. It destroys valuable semantic information. You end up with poor quality error messages, and counter-intuitively, you can end up with _more_ special cases (e.g., lowering ref-foreach in DMD means ref local variables can spread everywhere). And it reduces possibilities for the optimizer. In DMD, lowering has caused *major* problems with AAs, foreach, and builtin-functions, and some of the transformations that the inliner makes. It's also caused problems with postincrement and exponentiation. Probably there are other examples. It seems to me that what does make sense is to perform lowering as the final step before passing the code to the backend. If you do it too early, you're shooting yourself in the foot.
There's a lot of value in defining a larger complex language in terms of a much simpler core. This technique has been applied successfully by a variety of languages (Java and Haskell come to mind). For us, I opine that the scope statement would've had a million subtle issues if it weren't defined in terms of try/catch/finally. My understanding is that your concern is related to the stage at which lowering is performed, which I'd agree with. Andrei
Jan 22 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2014 3:21 PM, Andrei Alexandrescu wrote:
 My understanding is that your concern is related to the stage at which lowering
 is performed, which I'd agree with.
I also think we did a slap-dash job of it, not that the concept is wrong.
Jan 22 2014
prev sibling next sibling parent reply "Steve Teale" <steve.teale britseyeview.com> writes:
On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright 
wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
Nice Walter. You're almost as down-to-earth as me. I love what you have achieved.
Jan 24 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/24/2014 9:56 AM, Steve Teale wrote:
 On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
 Nice Walter. You're almost as down-to-earth as me. I love what you have achieved.
Thanks Steve! I've always found you inspiring. (For those who don't know, Steve & I go way, way back to the 1980s. He wrote the iostream implementation for Zortech C++, and was instrumental in the success of Zortech.)
Jan 24 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/21/2014 8:29 PM, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
Just showed up on Hacker News: https://news.ycombinator.com/item?id=7172971
Feb 03 2014
parent reply "Gary Willoughby" <dev nomad.so> writes:
On Tuesday, 4 February 2014 at 07:43:36 UTC, Walter Bright wrote:
 On 1/21/2014 8:29 PM, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
 Just showed up on Hacker News: https://news.ycombinator.com/item?id=7172971
A reply blog article appeared on reddit today: http://genericlanguage.wordpress.com/2014/02/04/advice-on-writing-a-programming-language/
Feb 04 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/4/14, 8:59 AM, Gary Willoughby wrote:
 On Tuesday, 4 February 2014 at 07:43:36 UTC, Walter Bright wrote:
 On 1/21/2014 8:29 PM, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
 Just showed up on Hacker News: https://news.ycombinator.com/item?id=7172971
 A reply blog article appeared on reddit today: http://genericlanguage.wordpress.com/2014/02/04/advice-on-writing-a-programming-language/
http://www.reddit.com/r/programming/comments/1wz8k6/advice_on_writing_a_programming_language/ Andrei
Feb 04 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2014 10:54 AM, Andrei Alexandrescu wrote:
 On 2/4/14, 8:59 AM, Gary Willoughby wrote:
 On Tuesday, 4 February 2014 at 07:43:36 UTC, Walter Bright wrote:
 On 1/21/2014 8:29 PM, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
 Just showed up on Hacker News: https://news.ycombinator.com/item?id=7172971
 A reply blog article appeared on reddit today: http://genericlanguage.wordpress.com/2014/02/04/advice-on-writing-a-programming-language/
 http://www.reddit.com/r/programming/comments/1wz8k6/advice_on_writing_a_programming_language/
And it's the top reddit article at the moment!
Feb 04 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/4/14, 12:52 PM, Walter Bright wrote:
 On 2/4/2014 10:54 AM, Andrei Alexandrescu wrote:
 On 2/4/14, 8:59 AM, Gary Willoughby wrote:
 On Tuesday, 4 February 2014 at 07:43:36 UTC, Walter Bright wrote:
 On 1/21/2014 8:29 PM, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
 Just showed up on Hacker News: https://news.ycombinator.com/item?id=7172971
 A reply blog article appeared on reddit today: http://genericlanguage.wordpress.com/2014/02/04/advice-on-writing-a-programming-language/
 http://www.reddit.com/r/programming/comments/1wz8k6/advice_on_writing_a_programming_language/
 And it's the top reddit article at the moment!
Not for long: http://www.reddit.com/r/programming/comments/1x0mid/scott_meyers_to_keynote_dconf_2014_discounted/ Andrei
Feb 04 2014