LLVM is a compiler framework. Like GCC, it has frontends for different languages. (C and C++ use the Clang frontend, while GCC has its own frontends such as G++ for C++ and GFortran for Fortran.) There is a common optimizer stage for all supported languages, and backends target the different kinds of processors. (x86, AMD64, PPC and ARM are all supported by both GCC and LLVM.) In order to get the Clang C/C++ compiler and other LLVM-based compilers to work with AROS 68k, we'll need an LLVM backend for the 68k processors.
There are also a few advantages of the LLVM framework over GCC's: LLVM has a JIT compilation mode, so it can be used to create a Java virtual machine, while GCJ doesn't support JIT compilation; LLVM's JIT is also used by Mesa to emulate shaders for graphics cards that don't support them; LLVM is written entirely in C++ and so is easier to modify; and lastly, LLVM generates code more quickly than GCC. The disadvantages are that the code LLVM generates is slightly less efficient than what the latest GCC produces, and that a 68k backend already exists for GCC.
The reason AROS should use LLVM is that GCC's developers often don't want to "waste" their time on the 68k backend when they could be working on more modern processors, so they reject patches that would make the backend better. LLVM also has a bitcode file format for its intermediate representation, so it can run code directly from that representation once the endianness and memory alignment issues are resolved.
If by "debug output" you mean error messages, then yes. Clang has much more descriptive error outputs, often tracing the error down to the individual lexemes on the line where the error occurred. If you mean "debug output" as a debugging application, the LLDB debugger only works on Macintoshes at this point and maybe Linux. It's still in an early stage of development. On other systems GDB is still the only debugger for LLVM output.
As an added bonus, LLVM is being designed as a set of libraries so it will integrate well with an IDE. For example, Clang is capable of sharing token information with the host IDE for syntax highlighting.
Regarding the optimization technique they use, some of the LLVM team are not allowed to look at GPL 3 code. This makes trying to update LLVM to match GCC capabilities difficult.
As for making it work on 68k GCC, GCC probably already does that kind of optimization. Any optimizations done by GCC 4.5.x are usually done on the GIMPLE intermediate representation, so the only problem is the backend not producing as finely tuned code during the final code generation stages.
LLVM would make a good addition to the AROS compiler toolbox. My dream is to have a system where programs could for example migrate from running on your x86 desktop to your ARM based smartphone, to your PPC A1/Pegasos.
The x86 backend is generated with LLVM's TableGen utility, so modifying it could prove difficult, especially since they use some of the same patterns for the AMD64 backend as they do for x86.
The way I would propose doing this on x86 is a custom lib-call calling convention that requires the appropriate library base to be passed to the function in %EBX. The call then indexes into the base with a fixed offset to reach the jump table, using whatever other registers are in use for passing parameters. This way, spilling of %EBX is handled automatically by the register allocator, and the previous value of %EBX can be kept in another register if the allocator can spare one.
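To make the idea concrete, here is a minimal C sketch of a call that goes through a library base and a jump table at a fixed index. It is only an illustration under assumptions: `struct LibBase`, `lib_call`, `demo_base` and the `LVO_*` indices are hypothetical names, and the base is passed as an ordinary first argument because plain C cannot pin it to %EBX the way the proposed convention would.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature shared by every entry in the jump table. */
typedef int (*lib_func_t)(void *base, int a, int b);

/* Hypothetical fixed indices into the jump table. */
enum { LVO_ADD = 0, LVO_MUL = 1, LVO_COUNT = 2 };

/* Hypothetical library base: the jump table lives at a fixed place in it. */
struct LibBase {
    const char *name;
    lib_func_t  vectors[LVO_COUNT];
};

static int lib_add(void *base, int a, int b) { (void)base; return a + b; }
static int lib_mul(void *base, int a, int b) { (void)base; return a * b; }

struct LibBase demo_base = { "demo.library", { lib_add, lib_mul } };

/* Caller side of the convention. In the proposal the base would travel
 * in %EBX; here it is an explicit parameter (an assumption of this sketch). */
int lib_call(struct LibBase *base, size_t lvo, int a, int b)
{
    assert(base != NULL && lvo < (size_t)LVO_COUNT);
    return base->vectors[lvo](base, a, b);   /* indexed jump through the table */
}
```

A call such as `lib_call(&demo_base, LVO_ADD, 2, 3)` goes through the table entry for `lib_add`. In a real backend the index would be a constant baked into the call sequence rather than a runtime argument.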
The remaining calling conventions would still be supported. Potential problems include:
- LLVM's intermediate representation supports multiple return values, unlike C, which will require special attention to the stack
- Interrupts will not be able to rely on %EBX being reserved for system use, because a computationally intense set of formulae may cause the register allocator to spill %EBX to the stack and retrieve it later
- If we don't allow %EBX to spill to the stack, we'll have some additional work to do on the x86 backend to reserve %EBX from register selection.
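On the first item in the list above: LLVM IR lets a function return an aggregate such as `{ i32, i32 }`, while in C the nearest equivalent is returning a struct. The sketch below shows that C-side shape; `struct DivResult` and `div_both` are hypothetical names made up for the example. How the struct is physically returned (in registers or through a hidden pointer to caller-provided stack space) depends on the target ABI, which is where the attention to the stack comes in.

```c
#include <assert.h>

/* Hypothetical pair of results; in LLVM IR this could be a { i32, i32 } return. */
struct DivResult {
    int quotient;
    int remainder;
};

/* Returns both results at once. C has no multiple return values,
 * so they are packed into a struct and returned by value. */
struct DivResult div_both(int n, int d)
{
    struct DivResult r;
    assert(d != 0);          /* division by zero is outside this sketch */
    r.quotient  = n / d;
    r.remainder = n % d;
    return r;
}
```

For instance, `div_both(17, 5)` yields a quotient of 3 and a remainder of 2 in one return.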
LLVM currently uses 3 calling conventions internally and allows for several system-specific ones.
As %ebx is a stack pointer, you always have to restore it when leaving a function or when calling an external function in a library. In the branch this is done for the fake cross compiler by adding the -ffixed-ebx option to gcc.
In addition to those calling conventions, system-specific conventions are allowed. For this we'll need a library calling convention so that the libraries' base pointers get loaded in time for use. The same goes for the pure reentrant base-relative calling convention.
Currently %ebx is just an extra stack pointer that grows in the opposite direction to the normal stack pointer on i386. Most of the time it is used to store a base pointer. I implemented an extra stack pointer so that I don't have to add the base pointer to every function call and don't have to fiddle with the normal stack in the stub code for calling the library functions. This way the stub can push the base pointer for a library function onto this stack without the function's source code having to start with a special AROS_LH macro. I use this for the adapted arosc.library so that it doesn't need ETask anymore.
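Roughly, the mechanism described above could be modelled in C like this. It is a sketch under assumptions, not the code from the branch: `base_stack`, `push_base`, `pop_base`, `current_base`, `struct DemoBase` and the stub are all hypothetical names, and an ordinary global pointer stands in for %ebx. The array grows upward to mirror the "opposite direction" of the normal i386 stack.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_STACK_SLOTS 16

/* Hypothetical second stack holding library base pointers. */
static void  *base_stack[BASE_STACK_SLOTS];
static void **base_sp = base_stack;   /* stands in for %ebx; grows upward */

void push_base(void *base)
{
    assert(base_sp < base_stack + BASE_STACK_SLOTS);
    *base_sp++ = base;
}

void *current_base(void)
{
    assert(base_sp > base_stack);
    return base_sp[-1];               /* most recently pushed base */
}

void pop_base(void)
{
    assert(base_sp > base_stack);
    --base_sp;
}

/* Hypothetical library state. */
struct DemoBase { int counter; };

/* Library function written as plain C: no base parameter, no special macro. */
int demo_increment(void)
{
    struct DemoBase *b = current_base();
    return ++b->counter;
}

/* Stub: pushes the base on the extra stack, calls, then restores. */
int stub_demo_increment(struct DemoBase *base)
{
    int result;
    push_base(base);
    result = demo_increment();
    pop_base();
    return result;
}
```

The point of the design is visible in `demo_increment`: its signature is untouched, and the normal argument stack is never adjusted by the stub.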
Also, I'll need to know how this will affect the x86_64 version of AROS. The AMD64 ( ;) ) ABI still needs to be discussed, but is there a reason not to follow i386 and use %ebx in the same way? I'm not very familiar with AMD64 assembly.
Looks like someone's made a start on an M68K LLVM port, in Czechoslovakia. Could you see if you can get their LLVM code changes to me? I would like to support LLVM M68K, and this would make my job much easier. Student Vlamir and Professor.
I was wondering how hard it would be to incorporate an MMU-based 32-bit sandbox mode into AMD64 AROS. That seems to be how PNaCl is implementing its 32-bit LLVM-based sandbox in the Chromium browser plugin for portable apps. The only other option would be to generate 2 bitcodes of every app, one for the 32-bit machines and one for the 64-bit machines. The LLVM Wrapper I've started on Sourceforge.net currently only works with 32-bit apps, and I've only tested it for going from one x86 OS to the next.