I've been working on a BASIC cross-compiler and it supports the Apple 2 family. It also supports compiling Applesoft and Integer BASIC programs to machine code.
It's located here, and I'd appreciate any feedback, suggestions, etc: https://telengard.itch.io/crustybasic
I'm hoping other folks find some use in it.
It's definitely a WIP, but is pretty functional. This is my first time making a compiler and I'm no BASIC/asm expert, but I do software for a living and I've learned a ton from building this. Also, full disclosure... I do use AI as like a "peer programmer". This isn't a purely vibe coded project, but I do have the LLMs help with writing code, unit tests, bouncing ideas off, making A/B tests for perf, etc.
Very interesting. I have downloaded it and will give it a try soon. Quite an ambitious project considering the machines are all so different in architecture. For the Apple II, do you plan on adding support for double high/low res? Mockingboard sound?
The actual Applesoft BASIC in the Apple //e never had any real support for double hires or double lores at all, since it was nearly unchanged from what shipped with the ][+, which didn't have those modes. And all support for the Mockingboard came from add-ons. So as long as crustyBASIC allows the normal kinds of interfaces to external assembler code, like POKE, PEEK, CALL and the & functions, it would already be up to speed with what was always there.
I know that many folks would like to compile the old BASICs like Applesoft. And that is very interesting to me. However, I would like to see a Blitz Basic styled Basic Compiler. Coding comfortably on a PC and compiling it for the 8 bit. No line numbers. Crusty Basic seems to be a bit of a hybrid in more than one way. It has its own dialect, for example. In fact, that is the default usage.
I think it would be nice if double high/low res support was built-in along with Mockingboard support. No external assembler, pokes, peeks, calls ... though it does support these if you are translating Applesoft source. Imagine how accessible DHR and Mockingboard could be in a Basic that supports them directly? We could see a flood of games and programs as a result. And besides, we gotta do something with our Mockingboards, no? Imagine by 2030 dozens of new games that use double high res and Mockingboard? Me personally, I would enjoy participating in writing some of those games.
What you are talking about would be cool in its own way for building new stuff. I think the need that crustyBASIC more likely already meets is that of people who would like to compile existing Applesoft programs into assembler. Sure, there were several vintage Applesoft compilers back in the day, but most of them didn't really compile anything to real machine code; instead they were more or less pseudo byte code compilers. They made Applesoft programs run faster, but not as much as they could have, and with a lot of limitations. Cross-compiling on a modern platform solves some of the limitations. I wrote a cross-assembler in Perl for assembling 65C02 code on modern platforms (mainly Linux and macOS since I don't use Windows). I've also played with Tom Porter's "Idiot Compiler", which can do some very limited compiling of Applesoft to true assembler. I also did Applesoft tokenizer/detokenizer programs in Perl a while back, and I toyed with the idea of expanding those and copying some of Tom's ideas into a cross compiler for Applesoft. But I haven't had the ambition to go for it. I haven't had a chance to download and try out crustyBASIC yet, but it sounds like it may be a better solution in a lot of ways than what I had in mind.
Anyway, if you have any interest in the stuff I did, it is all out there on my GitHub. I haven't messed with any of it in a while.
Compiling is just translating from one program representation to another, but for there to be real advantages, the representation should make optimization possible. Which types of optimization are made possible is the criterion for good compiler design.
BASIC is difficult because it doesn't allow functions to be designed (apart from the USER hack). So many optimizations that would be possible for pure functions are not obviously possible.
Echoing what others have said, developing a compiler that is best for new code will lead to different decisions from one that is best for old code. From a quick look it appears crustyBASIC is intended for new code. With Applesoft in particular, there are some quirky behaviors that you would want to keep when compiling old code. One such thing is that Applesoft parses itself inconsistently (in general) depending on whether it is the tokenizer, the interpreter, or the input routines doing the parsing. When it comes to ampersands and even CALL statements, it is not uncommon for the machine code to parse the subsequent BASIC code and change the state of the program. The syntax could be anything, and the side effects could also be anything. So the compiler would need to keep some or all of the BASIC code in memory, since the third party's parser is expecting it to be there. It is possible that the machine code jumps to another line before returning, messes with stacks, and so on. Maybe by imposing some reasonable limits on what is supported it could be made tractable. But if the target is new code I wonder if it would be simpler for the compiler to provide its own capabilities in place of legacy libraries.
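To make the "keep the BASIC text around" point concrete, here is a rough Python sketch of a pre-pass that flags the statements a compiler might have to preserve verbatim. Everything in it is an assumption for illustration: the function names are made up, the "CALL followed by a comma" test is only a heuristic, and none of this is taken from crustyBASIC.

```python
import re

# Hypothetical pre-pass: find statements whose source text a machine-code
# routine might parse at run time ('&' statements and CALLs with trailing text).
LINE = re.compile(r"^\s*(\d+)\s*(.*)$")

def split_statements(body):
    """Split a line body on ':' outside string literals; REM keeps the rest of the line."""
    parts, current, in_string = [], "", False
    for ch in body:
        if ch == '"':
            in_string = not in_string
        is_rem = current.lstrip().upper().startswith("REM")
        if ch == ":" and not in_string and not is_rem:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    parts.append(current.strip())
    return [p for p in parts if p]

def has_comma_outside_string(stmt):
    in_string = False
    for ch in stmt:
        if ch == '"':
            in_string = not in_string
        elif ch == "," and not in_string:
            return True
    return False

def needs_source_text(stmt):
    """Heuristic only: '&' always, CALL only when something follows the address."""
    if stmt.startswith("&"):
        return True
    return stmt.upper().startswith("CALL") and has_comma_outside_string(stmt)

def statements_to_preserve(program_text):
    """Return (line_number, statement) pairs that may need their text kept."""
    flagged = []
    for raw in program_text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        number, body = int(match.group(1)), match.group(2)
        for stmt in split_statements(body):
            if needs_source_text(stmt):
                flagged.append((number, stmt))
    return flagged
```

A pass like this only says where the risk is. It cannot tell what the machine code actually does with the text, which is why limiting the supported cases still seems necessary.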
Hey, it's Tom Porter.... I haven't looked at these forums in a LONG time. Random chance to see somebody talk about my software. Some notes about Idiot Compiler, and my thoughts on how to proceed building a new version if I ever could get around to it...
First... an interesting way to abuse a compiler is to take BASIC code and make a closed-loop subroutine. Take the BASIC that can't be compiled and process that in the interpreter, then take the few variables and pass them into memory locations (Idiot actually gives you the memory locations of each variable, so you can POKE into them)... then CALL the address of the subroutine that the compiled BASIC went to (Idiot also gives you that list...)... which lets you process that portion lightning fast, then return to BASIC for the parts that it couldn't do. You can do this many times with many routines. Believe it or not, you can do this with TASC as well (and I actually have), as long as the lines are a closed loop and you know the address to call. *(I've done a test of bouncing Idiot Compiler and TASC against each other, although it's not memory efficient.)
Now onto how I would make a new compiler. I would take all the information/code/work from the old compiler and just translate it into a binary, with one subset of code for each routine, keeping track of each routine's address. All the routines are easy to follow in 'the original basic source'. Whenever you wanted to add an instruction, the compiler would just copy/paste it from this binary, changing only the LDA/LDX/LDY or JMP/JSR addresses/values.... This would eliminate all the inefficiencies of the compile process and most likely speed up compilation by a factor of 20 (Idiot is disk based, just like TASC, although it does use a slot #3 RAM drive to help some)...
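For anyone who wants to picture the copy/paste-and-patch idea, here is a minimal Python sketch of it as it might look in a cross-compiler. The template names, the `emit`/`build` helpers and the example addresses are all made up for illustration and are not how Idiot Compiler or crustyBASIC actually work; only the 6502 opcodes (LDA absolute $AD, STA absolute $8D, JSR $20, RTS $60) are real.

```python
# Pre-assembled 6502 snippets. Each entry holds the raw bytes plus the offsets
# of the 16-bit operands that get patched when the snippet is copied out.
TEMPLATES = {
    # LDA src ; STA dst
    "copy_byte": (bytes([0xAD, 0x00, 0x00, 0x8D, 0x00, 0x00]), {"src": 1, "dst": 4}),
    # JSR target
    "call": (bytes([0x20, 0x00, 0x00]), {"target": 1}),
    # RTS
    "return": (bytes([0x60]), {}),
}

def emit(name, **operands):
    """Copy a template and patch its address slots (little-endian, low byte first)."""
    code, slots = TEMPLATES[name]
    out = bytearray(code)
    for slot, offset in slots.items():
        address = operands[slot]
        out[offset] = address & 0xFF
        out[offset + 1] = (address >> 8) & 0xFF
    return bytes(out)

def build(origin, steps):
    """Concatenate patched templates and record where each one starts."""
    image = bytearray()
    addresses = []
    for name, operands in steps:
        addresses.append(origin + len(image))
        image += emit(name, **operands)
    return bytes(image), addresses

# Example: copy one byte, call a routine at $FDED (used here only as a sample target), return.
image, addresses = build(0x0300, [
    ("copy_byte", {"src": 0x0800, "dst": 0x0801}),
    ("call", {"target": 0xFDED}),
    ("return", {}),
])
```

The recorded start addresses are what would let a later pass fix up GOTO/GOSUB targets, since each BASIC statement maps to a known offset in the image.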
I'd love to write a new compiler... it was years ago, and I don't think I have it in me anymore though. The new things to tackle would be how to do strings, and possibly arrays... I'd say that would be tough to do. I think TASC piggybacks off Applesoft for strings, and the manual actually says dealing with strings could be slower than Applesoft alone... they chose the easy way out, it seems.
There are some good ideas there. Try doing a literature search for "abstract interpretation".
It's not a magic bullet, but it can get some of these parser/code interactions confined to the code paths where they affect execution.
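As a very small taste of the idea, here is a Python sketch of a forward analysis over a two-value abstract domain ("clean" vs. "possibly altered by opaque machine code"), which is about the simplest flavor of abstract interpretation there is. The control-flow graph format and the function name are assumptions for illustration; a real analysis would track much richer state than one flag per line.

```python
from collections import deque

def possibly_altered(cfg, opaque):
    """
    cfg:    dict mapping a line number to the line numbers it can flow to.
    opaque: set of line numbers containing '&' or CALL to unknown machine code.
    Returns the lines whose entry state may have been changed by opaque code.
    Lines outside the result can be compiled under the usual assumptions.
    """
    altered = set()
    work = deque(opaque)
    while work:  # iterate to a fixpoint with a worklist
        line = work.popleft()
        for successor in cfg.get(line, []):
            if successor not in altered:
                altered.add(successor)
                work.append(successor)
    return altered

# Example: 10 -> 20 -> 30 -> 40, with a loop from 30 back to 20; line 30 has a '&'.
cfg = {10: [20], 20: [30], 30: [40, 20], 40: []}
result = possibly_altered(cfg, {30})
```

In this example line 10 stays "clean", so only the loop body and what follows it need the conservative treatment.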
Thanks for the responses!
@nucade - definitely can add that. Right now the resolutions supported are the "portable" defaults for hi/low res, but I'm adding the ability to access all of the video modes from crustyBASIC listings and have things initialize internally (like the different APIs, etc). I don't know much about the Mockingboard, but I will learn. Inline ASM is a first class citizen in crustyBASIC. If you are looking for a modern BASIC, the syntax is pretty clean and has a lot of nice features. "Coding comfortably on a PC and compiling it for the 8 bit. No line numbers." That is crustyBASIC in a nutshell, just add that it is portable (mac/win/linux) and (hopefully) is faster than the older basics. I've spent a lot of time on speed/size perf so it delivers there too.
I'm not sure if it's clear, but you can write in Applesoft or Integer BASIC and compile it, or you can use the crustyBASIC syntax which gives access to a lot of APIs and features not in the older dialects. I have a bunch of larger Applesoft and Integer BASIC programs compiling with no changes (Lemonade Stand, Taipan, Little Brickout).
And yeah, this is quite an undertaking. I'm an expert on none of these systems, so I'm learning a lot and leaning on LLMs as a peer to some degree. I'm having a blast and making a compiler to use in retirement!
If you do end up trying it out, would love any feedback or bug reports.
I just released 1.2.0 which doesn't have a ton of Apple changes, but a lot of core language niceties, and I did add page flipping for the A2's soft sprite API. It's much faster now.