Yes, people have definitely asked for a follow-up on compilers and/or static types. There's a little context here: https://news.ycombinator.com/item?id=40956138

I'm not sure if I'm the best person to write a compiler book, or, at least, not yet. I've written a lot of language front ends and interpreters, but I don't have much experience targeting native code. It's something I've wanted to do for a long time, but haven't made the time to make it happen.

Static types are also hard for the reasons the linked comment talks about. I still think about it sometimes, but writing a book is a lot of work and I've been enjoying not writing a book for the past couple of years. :)
That book is an absolute treasure and you should be proud.

At a personal level, this book contained so much that I had always wanted to know, from how hashmaps work, to how to compile in a single pass. And I connected with your note on getting into writing languages partly due to a feeling of being less-than programmers who could create their own language.

One thing I wanted to share: I found a small design change to Clox that eliminates an entire class of hard-to-debug GC bugs. Basically, the change is that garbage collection never occurs during the execution of an instruction. Instead, garbage collection is an instruction which is never emitted by the compiler. When the heap size crosses the garbage collection threshold, we schedule a garbage collection, which occurs after the current instruction is done executing and before the next instruction executes.

To implement this, in the Self-Adjusting Heap [1] section, we set a global variable (let's call it the "resume pointer") to the next instruction, similar to the return pointer in a call frame. Then we set the instruction pointer to point to a global constant which contains only the garbage collection instruction.

The way this works: let's say the current instruction allocates something, and the allocation needs to trigger a GC. To trigger the GC, we set the resume pointer to the next instruction, and the instruction pointer to the global constant garbage collection instruction (let's call this the GCGCI). When the current instruction completes, the VM goes to the instruction pointer to get the next instruction and finds it pointing to the GCGCI. It then executes the garbage collection instruction, which runs the garbage collection. The final step of the garbage collection instruction is to set the instruction pointer to the resume pointer (similar to a return pointer), so that when the VM goes to execute the next instruction it resumes the user program right where it left off.

This adds a bit of performance overhead to garbage collection, but I would guess we more than get it back from the fact that it's no longer necessary to temporarily push objects onto the stack to prevent them from getting GCed mid-instruction. And this eliminates that entire class of GC bugs; code stability and programmer time are worth something!

An alternative implementation might observe that the resume pointer is basically a return pointer, so you could potentially store it on the call stack. I didn't really think through the pros and cons of that, since just adding a global resume pointer solved the problem more simply. However, one thing I have been working on is making my interpreter multithreaded, which means that most of the global variables used by the VM get stuffed into a "Thread" struct. Keeping the size of this thread struct low is a high priority, because that memory usage is a limiting factor in the number of threads you can reasonably run simultaneously, so in this case moving the resume pointer onto the call stack starts to look like a bigger benefit.

[1] https://craftinginterpreters.com/garbage-collection.html#sel...
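To make the mechanism concrete, here is a minimal C sketch of the idea. The names (`resumePointer`, `gcInstruction`, `scheduleGC`) and the stripped-down VM loop are hypothetical, not the parent's actual code or clox's; it only shows the shape of the detour through a GC-only instruction.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opcodes; OP_GC is never emitted by the compiler. */
typedef enum { OP_CONSTANT, OP_ADD, OP_GC, OP_RETURN } OpCode;

/* A one-instruction "chunk" holding only the GC opcode (the GCGCI). */
static uint8_t gcInstruction[] = { OP_GC };

static uint8_t* ip;                 /* instruction pointer */
static uint8_t* resumePointer;      /* where to continue after collecting */
static size_t bytesAllocated = 0;
static size_t nextGC = 256;         /* tiny threshold so the demo collects */

static void collectGarbage(void) { /* mark-sweep elided */ }

/* Instead of collecting in the middle of an instruction, detour the VM
   through the GC-only instruction once the current one finishes. Assumes
   ip already points at the next instruction to execute. */
static void scheduleGC(void) {
  if (ip == gcInstruction) return;  /* already scheduled for this gap */
  resumePointer = ip;
  ip = gcInstruction;
}

static void* reallocate(void* pointer, size_t oldSize, size_t newSize) {
  bytesAllocated += newSize - oldSize;
  if (newSize > oldSize && bytesAllocated > nextGC) {
    scheduleGC();                   /* never collect mid-instruction */
  }
  if (newSize == 0) { free(pointer); return NULL; }
  return realloc(pointer, newSize);
}

static void run(void) {
  for (;;) {
    switch (*ip++) {
      case OP_CONSTANT:
        reallocate(NULL, 0, 512);   /* pretend this instruction allocates */
        break;
      case OP_GC:
        collectGarbage();
        nextGC = bytesAllocated * 2;
        ip = resumePointer;         /* "return" to the user program */
        break;
      case OP_RETURN:
        return;
      default:
        break;
    }
  }
}

int main(void) {
  uint8_t program[] = { OP_CONSTANT, OP_RETURN };
  ip = program;
  run();  /* OP_CONSTANT schedules a GC; it runs before OP_RETURN executes */
  return 0;
}
```

The guard at the top of `scheduleGC` keeps an instruction that allocates more than once from scheduling, and then running, the collection twice in the same gap.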
I read Crafting Interpreters while building Crumb (https://github.com/liam-ilan/crumb). It was indispensable, especially the sections on scope and local variables. The balance between technical implementation and conceptual insights is super helpful, especially when trying to go off the book's set path.

It's inspiring to see technical writing done like this. As an aspiring engineer, this sets a really high standard to aim for - excellent resource.
> Aspiring engineer

You already made it, no need to be humble. You don't need to finish the CS degree to call yourself an engineer :) Great documentation and an awesome project. Good job.
It was my first time making a language, so I built into the language whatever was needed to build something cool with Crumb.

I wanted first-class functions to simplify the parse step (they can be treated like any other value), but I needed a different mechanism to invoke "native code" vs user-defined methods, so there are two different types for that. I needed some kind of compound data type, but I didn't want to deal with side effects from pass by reference, so Crumb implements lists, but they are always pass by value :)

P.S. there's some pretty neat stuff built with Crumb at https://github.com/topics/crumb if anyone's interested!
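For what it's worth, here's a rough sketch in C of how that split can look in a small interpreter's value representation. The names are hypothetical and this is not Crumb's actual source; it only illustrates why native and user-defined functions end up as two distinct value types, and how lists can stay pass-by-value.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Value Value;
typedef struct AstNode AstNode;   /* parsed function body (opaque here) */

/* Built-ins are plain C function pointers... */
typedef Value (*NativeFn)(int argCount, Value* args);

typedef enum { VAL_NUMBER, VAL_LIST, VAL_NATIVE_FN, VAL_USER_FN } ValueType;

struct Value {
  ValueType type;
  union {
    double number;
    struct { Value* items; int count; } list;
    NativeFn native;                          /* native: call straight into C */
    struct { AstNode* body; int arity; } fn;  /* user-defined: a body to evaluate */
  } as;
};

/* Stub evaluator so the sketch stands alone; a real one walks the AST. */
static Value evaluate(AstNode* body) {
  (void)body;
  Value v; v.type = VAL_NUMBER; v.as.number = 0;
  return v;
}

/* Lists are deep-copied whenever they're passed around, so callees can't
   mutate the caller's data: pass by value. */
Value copyValue(Value v) {
  if (v.type == VAL_LIST) {
    Value copy = v;
    copy.as.list.items = malloc(sizeof(Value) * (size_t)v.as.list.count);
    for (int i = 0; i < v.as.list.count; i++) {
      copy.as.list.items[i] = copyValue(v.as.list.items[i]);
    }
    return copy;
  }
  return v;  /* numbers and functions need no deep copy */
}

/* Calling dispatches on the value type: two kinds of callee, two code paths. */
static Value callValue(Value callee, int argCount, Value* args) {
  if (callee.type == VAL_NATIVE_FN) {
    return callee.as.native(argCount, args);
  }
  /* argument binding elided */
  return evaluate(callee.as.fn.body);
}

static Value nativeAdd(int argCount, Value* args) {
  Value v; v.type = VAL_NUMBER;
  v.as.number = (argCount == 2) ? args[0].as.number + args[1].as.number : 0;
  return v;
}

int main(void) {
  Value add; add.type = VAL_NATIVE_FN; add.as.native = nativeAdd;
  Value args[2];
  args[0].type = VAL_NUMBER; args[0].as.number = 1;
  args[1].type = VAL_NUMBER; args[1].as.number = 2;
  printf("%g\n", callValue(add, 2, args).as.number);  /* prints 3 */
  return 0;
}
```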
Since people are talking about other compiler resources, one I have enjoyed so far (though I still need to finish it) is Immo Landwerth's series on writing a compiler in C# that generates IL, along with debug symbols and the like. It is older (about five years, so roughly the .NET Core 3 timeframe, I believe), so it doesn't use current C# syntax, but that shouldn't matter much for most of what you'd be doing, other than a lot of warnings about nullability on types.
https://www.youtube.com/playlist?list=PLRAdsfhKI4OWNOSfS7EUu... |
> really helped me internalize the concepts, and they are useful all over the place, not just in compilers.

What are some of those places? Parsing data formats could be one, I guess.
Yes, but my point is that these problems are as hard as we want to make them. For a simple language implemented in C, maybe it's fine to never free any memory until the process quits.
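As a concrete illustration, here is a hypothetical sketch (not from the book or anyone in this thread) of what that can look like: a bump allocator that deliberately never frees anything.

```c
#include <stdio.h>
#include <stdlib.h>

/* "Never free" in practice: grab big blocks from malloc and hand out slices.
   Nothing is ever released; the OS reclaims it all when the process exits.
   Assumes no single allocation exceeds BLOCK_SIZE. */
#define BLOCK_SIZE (1 << 20)

static char* block = NULL;
static size_t used = BLOCK_SIZE;   /* forces a fresh block on first use */

static void* allocate(size_t size) {
  size = (size + 7) & ~(size_t)7;          /* keep everything 8-byte aligned */
  if (used + size > BLOCK_SIZE) {
    block = malloc(BLOCK_SIZE);            /* old blocks are simply abandoned */
    if (block == NULL) exit(1);
    used = 0;
  }
  void* result = block + used;
  used += size;
  return result;
}

int main(void) {
  /* Every AST node, string, and value in the interpreter could come from
     allocate(); there is no free() anywhere. */
  char* s = allocate(32);
  snprintf(s, 32, "no free() required");
  printf("%s\n", s);
  return 0;
}
```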
Does anyone know of a good resource for creating a statically typed language with stuff like parametric polymorphism and basic type inference?
Modern Compiler Implementation in ML by A. W. Appel.

There are (inferior) versions of the same in C and Java. I'd use them together with the ML version.
I just finished the second half; it's a great book! I recommend doing one or two of the proposed challenges in each chapter to reinforce your understanding of the content.
It's hard to separate out what I know now, that neither I nor the initial designers knew back then, from what I thought we should have done even then.

For example, optional types ended up not working out, but I don't think I knew that then either. But even before we released, I did tell the language team that I thought it would be good to do:

* Non-nullable types. We got there eventually with a ton of engineering effort and migration cost, but this could have been a really strong selling point at launch.

* Types on the right. Think `var x: int` instead of `int x`. It's a little more verbose in some cases, but it makes the language easier to parse and makes it easier to have complex syntax for type annotations like function types, union types, etc.

* Extension members. We added them years later, which is good, but that was too late to affect the design of the core libraries. I suggested that if we had them in 1.0, then the core libraries and collection types could be designed around them.

* `val` instead of `final` for single-assignment variables. It's a small thing, but it means that the syntax for single-assignment variables isn't longer than that for mutable ones, which is good because it doesn't punish users for doing the "right" thing.

* No semicolons. Again, another small thing, but it does help the language feel cleaner and more modern. It's really hard to do this later when the syntax wasn't designed for it.

* Coroutines instead of futures and async/await. Coroutines are definitely hard to compile to JS, so there is a good argument that I'm just wrong. But I really hate what async does to API design (https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...).
I am the opposite. I think pedagogical materials like this should introduce things in the order that makes intuitive sense, and most of the time that means from the easiest to the hardest.
> Doing it the other way is a common pitfall in teaching programming concepts.

100% disagree.

If someone is brand new to programming and you're expecting to teach them the following concepts: a class, methods, static methods, data types, method return types, void return type, arrays, and namespaces just to be able to write a simple "Hello World" app [1], I think your approach is the one that's misguided.

The most common mistake people make in teaching is teaching things in the order they were discovered historically. Not always, but very often that is a distraction. The second most common is teaching from the outside in.

The best way to teach is to explain what you're trying to accomplish and start with a very simple example for people to play with. Then pick one concept and modify it so they can see the results of those changes. Then pick the next concept and modify that so they can see tangibly what it means.

In the hello world example, it would be starting with the program in [1] and focusing just on the constant string "Hello World!". Change it and see what that means; understand what's going on. Then move out to the Console.WriteLine. Make a copy of that line so there are two lines of it, then three, to understand the method call. Then swap in Console.ReadLine to see what a different method call can do. Continue exploring out from that core concept.

The "static" modifier on a method is not where to start teaching someone programming.

It's also why Racket removed a lot of the boilerplate from their core teaching languages, so students don't even have to see the things that are unnecessary at that point in their learning. [2]

[1] https://www.geeksforgeeks.org/java-hello-world-program/

[2] https://docs.racket-lang.org/drracket/htdp-langs.html
> I specifically said that including the boilerplate is a common pitfall, and requires teaching around a bunch of impertinent concepts.

And again, I'm saying I disagree with this. You are saying don't include the boilerplate at all when teaching. I am saying, do include the boilerplate and don't teach it until students are ready to understand it.

As a separate topic, if the language doesn't need boilerplate to make the program work (like Racket), that's spectacular. But if it does, then I'm explicitly saying "show the boilerplate and explain 'you don't need to focus on it now; you will learn it later.'"

It is best to give people a fully working program at all times, even if they don't understand all of it, so they can play around with it, rather than just read along while the author writes about what would have happened if it were fully functioning code they could play with.

And to make it very clear, the example I linked to explicitly includes all of the boilerplate, and I explained exactly the best method to teach a student with that specific concrete source code.

> https://www.geeksforgeeks.org/java-hello-world-program/

Now, all of that said, this isn't really important enough to continue arguing on the internet over. I think we both had good intentions but just weren't communicating clearly. I hope you have a great day.
codecrafters (YC S22, http://codecrafters.io) has a new module called "Building Your Own Interpreter" that works its way through this book. It's great because you work in small chunks, each of which has unit tests to tell you if you got, e.g., scanning string literals correct.

It's currently free while the module is in beta. I spent today working through all of the lexer exercises in Rust (I'm currently learning it), and had a lot of fun.
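For a flavor of what one of those chunks involves, here is a rough, clox-flavored sketch in C of the string-literal step. The names and the tiny driver are illustrative rather than the book's or the course's exact code (and the course has you write it in your own language of choice, e.g. Rust).

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal scanner state, in the spirit of clox's scanner. */
typedef struct {
  const char* start;    /* beginning of the current lexeme */
  const char* current;  /* character currently being looked at */
  int line;
} Scanner;

static bool isAtEnd(Scanner* s) { return *s->current == '\0'; }
static char peek(Scanner* s)    { return *s->current; }
static char advance(Scanner* s) { return *s->current++; }

/* Called after the opening '"' has been consumed. Prints the token on
   success; reports an error for an unterminated string. */
static bool scanString(Scanner* s) {
  while (!isAtEnd(s) && peek(s) != '"') {
    if (peek(s) == '\n') s->line++;   /* string literals may span lines */
    advance(s);
  }
  if (isAtEnd(s)) {
    fprintf(stderr, "[line %d] Error: Unterminated string.\n", s->line);
    return false;
  }
  advance(s);  /* consume the closing '"' */
  printf("STRING %.*s\n", (int)(s->current - s->start), s->start);
  return true;
}

int main(void) {
  const char* source = "\"hello\"";
  Scanner s = { source, source, 1 };
  advance(&s);                       /* consume the opening '"' */
  return scanString(&s) ? 0 : 65;    /* 65: a conventional "data error" exit */
}
```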
Do you have any thoughts about how to read some of the next-step books like Engineering a Compiler or the Dragon Book? I don't normally read large technical books end to end, so I'm curious how others approach it. I am going through Crafting Interpreters right now, and I like how it has well-defined checkpoints that help me know I can move on once I understand the current material. I skimmed Engineering a Compiler and it looks like it has exercises as well; do you recommend all or some of those, or any other methods?

Also, I did some searching for past discussions of Engineering a Compiler and found this interesting comment thread from Bob Nystrom for anyone interested: https://news.ycombinator.com/item?id=21725581
It's had pattern matching since 16; it's been expanded since with some syntactic sugar and, more recently, the introduction of sum types and exhaustiveness to go along with pattern matching.
Java is incredibly readable. I don't care for writing Java either, but reading the first part and implementing it in C# was incredibly easy (and also ensured no copy/pasting).
Definitely. I just went through the first 60% or so and it's great. The language is Java, but it's easy to translate to other languages, which would be a good learning exercise in its own right.
As the author of a POSIX standard utility, I would advise you to only reach for such utilities when portability is the most important thing.

POSIX utilities are not great. Lex and Yacc included.
Interesting; I'll have to look into this, because I'd never heard that before and now I'm curious.

A language with as much going on as Ruby was not one I would have picked as a candidate for yacc.
Yes, OCaml, with its very complex syntax and hundreds of features, uses the OCaml equivalents of Lex/Yacc. It is a myth that one cannot use Lex/Yacc in production.
Python notably switched from hand-written recursive descent to a PEG-based parser generator.

But indeed, last I checked, recursive descent was the most common choice overall.
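For anyone who hasn't seen one, this is roughly what a hand-written recursive descent parser looks like: a hypothetical toy in C (the language of the book's second half), not Python's or CPython's actual code, with one function per grammar rule.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Tiny recursive descent parser/evaluator. Error handling elided.
 *   expression -> term (("+" | "-") term)*
 *   term       -> primary (("*" | "/") primary)*
 *   primary    -> NUMBER | "(" expression ")"
 */
static const char* src;

static double expression(void);

static void skipSpace(void) { while (isspace((unsigned char)*src)) src++; }

static double primary(void) {
  skipSpace();
  if (*src == '(') {
    src++;                           /* consume "(" */
    double value = expression();
    skipSpace();
    if (*src == ')') src++;          /* consume ")" */
    return value;
  }
  char* end;
  double value = strtod(src, &end);  /* parse a number literal */
  src = end;
  return value;
}

static double term(void) {
  double value = primary();
  for (;;) {
    skipSpace();
    if (*src == '*')      { src++; value *= primary(); }
    else if (*src == '/') { src++; value /= primary(); }
    else return value;
  }
}

static double expression(void) {
  double value = term();
  for (;;) {
    skipSpace();
    if (*src == '+')      { src++; value += term(); }
    else if (*src == '-') { src++; value -= term(); }
    else return value;
  }
}

int main(void) {
  src = "1 + 2 * (3 + 4)";
  printf("%g\n", expression());      /* prints 15 */
  return 0;
}
```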
Author here. Seeing all of the positive comments about my book is really warming my heart. I appreciate everyone and I'm glad so many people have enjoyed the book. I put a ton of time and love into it and it's gratifying to see it had the effect I'd hoped for.