An Improved Quake C Compiler

This is a work-in-progress Quake C compiler. There are very few good QC
compilers available to the open source community. There are a lot of
mediocre compilers, but no one wants those. This project aims to fix that:
a proper Quake C compiler that is capable of doing proper optimization.

The compiler is intended to follow modern compiler design principles and
to support modification through extensions: language functionality can be
implemented by the user in a low-level, domain-specific syntax provided
inside the language itself.

The design goals of the compiler are ambitious; it is intended to support
a multitude of things. These are listed in the table below, along with how
complete each one currently is.

+-------------------+-----------------------------+------------------+
| Feature           | What's it for?              | Completeness     |
+-------------------+-----------------------------+------------------+
| Lexical analysis  | Tokenization                |       90%        |
+-------------------+-----------------------------+------------------+
| Tokenization      | Parsing                     |       90%        |
+-------------------+-----------------------------+------------------+
| Parsing / SYA     | AST Generation              |        9%        |
+-------------------+-----------------------------+------------------+
| AST Generation    | IR Generation               |       ??%        |
+-------------------+-----------------------------+------------------+
| IR Generation     | Code Generation             |       ??%        |
+-------------------+-----------------------------+------------------+
| Code Generation   | Binary Generation           |       ??%        |
+-------------------+-----------------------------+------------------+
| Binary Generation | Binary                      |      100%        |
+-------------------+-----------------------------+------------------+

Design tree:
   The compiler is intended to work in the following order:
      Lexical analysis ->
         Tokenization ->
            Parsing:
               Operator precedence:
                  Shunting-yard algorithm
               Inline assembly:
                  Usage of the assembler subsystem:
                     top-down parsing and assembly, no optimization
               Other parsing:
                  Recursive descent
            ->
               Abstract syntax tree generation ->
                  Intermediate representation (SSA):
                     Optimizations:
                        Constant propagation
                        Value range propagation
                        Sparse conditional constant propagation (possibly?)
                           Dead code elimination
                           Constant folding
                        Global value numbering
                        Partial redundancy elimination
                        Strength reduction
                        Common subexpression elimination
                        Peephole optimizations
                        Loop-invariant code motion
                        Inline expansion
                        Constant folding
                        Induction variable recognition and elimination
                        Dead store elimination
                        Jump threading
                  ->
                     Code Generation:
                        Optimizations:
                           Rematerialization
                           Code Factoring
                           Recursion Elimination
                           Loop unrolling
                           Deforestation
                     ->
                        Binary Generation
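
The operator precedence step in the tree above can be illustrated with a
minimal shunting-yard sketch. This is not the parser from parse.c: the
function name `to_rpn`, the single-character operands, and the operator set
(+ - * / and parentheses, all left-associative) are assumptions made only
for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: binding strength of a binary operator, 0 if none. */
static int precedence(char op) {
    switch (op) {
        case '+': case '-': return 1;
        case '*': case '/': return 2;
        default:            return 0;
    }
}

/*
 * Convert an infix expression made of single-character operands (letters
 * or digits) into reverse Polish notation.  Returns 0 on success and -1 on
 * unbalanced parentheses, unknown characters, or lack of space in `out`.
 */
int to_rpn(const char *in, char *out, size_t outsize) {
    char   stack[64];        /* operator stack */
    size_t sp = 0, n = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0) return -1;

    for (; *in; in++) {
        char c = *in;
        if (isspace((unsigned char)c))
            continue;
        if (isalnum((unsigned char)c)) {
            /* operands go straight to the output */
            if (n + 1 >= outsize) return -1;
            out[n++] = c;
        } else if (c == '(') {
            if (sp == sizeof(stack)) return -1;
            stack[sp++] = c;
        } else if (c == ')') {
            /* pop operators until the matching '(' */
            while (sp > 0 && stack[sp - 1] != '(') {
                if (n + 1 >= outsize) return -1;
                out[n++] = stack[--sp];
            }
            if (sp == 0) return -1;   /* no matching '(' */
            sp--;                     /* discard the '(' */
        } else if (precedence(c) > 0) {
            /* pop operators that bind at least as tightly (left-assoc) */
            while (sp > 0 && precedence(stack[sp - 1]) >= precedence(c)) {
                if (n + 1 >= outsize) return -1;
                out[n++] = stack[--sp];
            }
            if (sp == sizeof(stack)) return -1;
            stack[sp++] = c;
        } else {
            return -1;                /* unknown character */
        }
    }
    /* flush whatever operators remain */
    while (sp > 0) {
        if (stack[sp - 1] == '(') return -1;
        if (n + 1 >= outsize) return -1;
        out[n++] = stack[--sp];
    }
    out[n] = '\0';
    return 0;
}
```

With this sketch, `a+b*c` becomes `abc*+` and `(a+b)*c` becomes `ab+c*`; a
real parser would emit AST nodes instead of characters.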

File tree and explanation:
   gmqcc.h
      This is the common header with all definitions, structures, and
      constants for everything.

   error.c
      This is the error subsystem. It handles the output of good, detailed
      error messages (not currently, but it will), with colors and such.
   
   lex.c
      This is the lexer, a very small basic step-seek lexer that can be easily
      changed to add new tokens, very retargetable.
      
   main.c
      This is the core compiler entry point. It handles (or will handle)
      switches to toggle certain compiler features on and off.
      
   parse.c
      This is the parser, which goes over all tokens, generates a parse tree,
      and checks for syntax correctness.
      
   typedef.c
      This is the typedef system. It is a separate file because it's a lot
      more complicated than it sounds. This handles all typedefs, even
      recursive typedefs.
      
   util.c
      These are utilities for the compiler. Some things in here include an
      allocator used for debugging, and some string functions.
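
      One way a debugging allocator like this could work is to count
      allocations and frees so that leaks show up as a non-zero balance.
      The sketch below is only an illustration; the names `dbg_alloc`,
      `dbg_free` and `dbg_live_blocks` are hypothetical and are not taken
      from util.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Running totals; hypothetical names, not taken from util.c. */
static size_t dbg_allocations   = 0;
static size_t dbg_deallocations = 0;

/* Allocate `size` bytes and count the allocation if it succeeded. */
void *dbg_alloc(size_t size) {
    void *ptr = malloc(size);
    if (ptr)
        dbg_allocations++;
    return ptr;
}

/* Free a block obtained from dbg_alloc; NULL is ignored and not counted. */
void dbg_free(void *ptr) {
    if (!ptr)
        return;
    /* more frees than allocations would mean a double free somewhere */
    assert(dbg_deallocations < dbg_allocations);
    dbg_deallocations++;
    free(ptr);
}

/* Number of blocks still outstanding; non-zero at exit indicates a leak. */
size_t dbg_live_blocks(void) {
    return dbg_allocations - dbg_deallocations;
}
```

      Printing `dbg_live_blocks()` just before the compiler exits would be
      enough to notice that something was never freed.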
      
   assembler.c
      This implements support for assembling Quake assembler (which didn't
      actually exist until now; documentation of the Quake assembler is
      below). This also implements (will) inline assembly for the C compiler.
      
   README
      This is the file you're currently reading.
      
   Makefile
      The makefile. When sources are added you should add them to the SRC=
      line, otherwise the build will not pick them up. Trivial stuff: a
      small, easy-to-manage makefile, no need to complicate it.
      Some targets:
         #make gmqcc
            Builds gmqcc, creating a `gmqcc` binary file in the same
            directory as the makefile.
         #make test
            Builds the ir and ast tests, creating `test_ir` and `test_ast`
            binary files in the same directory as the makefile.
         #make test_ir
            Builds the ir test, creating a `test_ir` binary file in the
            same directory as the makefile.
         #make test_ast
            Builds the ast test, creating a `test_ast` binary file in the
            same directory as the makefile.
         #make clean
            Cleans the build files left behind by a previous build, as
            well as all the binary files.
         #make all
            Builds the tests and the compiler binary, all in the same
            directory as the makefile.

////////////////////////////////////////////////////////////////////////
///////////////////// Quake Assembler Documentation ////////////////////
////////////////////////////////////////////////////////////////////////
Quake assembler is quite simple: it's just an annotated version of the binary
produced by any existing QuakeC compiler, but made cleaner to use (so that
the locations of various globals or strings do not need to be known).

Constants:
   Using one of the following valid constant typenames, you can declare
   a constant: {FLOAT,VECTOR,STRING,FUNCTION,FIELD,ENTITY}. Every typename
   is followed by a colon and then the name (white space doesn't matter).
   
   Examples:
      FLOAT: foo 1
      VECTOR: bar 1 2 1
      STRING: hello "hello world"
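
   To make the shape of a declaration concrete, here is a minimal sketch
   that splits one line into typename, name and value text. It is not the
   assembler's real code; the `const_decl` struct, the `parse_const`
   function and the buffer sizes are all assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting one declaration line. */
typedef struct {
    char type[16];    /* e.g. "FLOAT"               */
    char name[64];    /* e.g. "foo"                 */
    char value[128];  /* rest of line, e.g. "1 2 1" */
} const_decl;

static const char *skip_ws(const char *s) {
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* Returns 0 on success, -1 if the line is not "<TYPE>: <name> <value>". */
int parse_const(const char *line, const_decl *out) {
    const char *p, *colon;
    size_t len;

    assert(line != NULL && out != NULL);
    p     = skip_ws(line);
    colon = strchr(p, ':');
    if (!colon)
        return -1;

    /* typename: text before the colon, trailing blanks trimmed */
    len = (size_t)(colon - p);
    while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
        len--;
    if (len == 0 || len >= sizeof(out->type))
        return -1;
    memcpy(out->type, p, len);
    out->type[len] = '\0';

    /* name: the next run of non-whitespace characters */
    p   = skip_ws(colon + 1);
    len = strcspn(p, " \t\r\n");
    if (len == 0 || len >= sizeof(out->name))
        return -1;
    memcpy(out->name, p, len);
    out->name[len] = '\0';

    /* value: the remainder, with trailing whitespace trimmed */
    p   = skip_ws(p + len);
    len = strlen(p);
    while (len > 0 && isspace((unsigned char)p[len - 1]))
        len--;
    if (len == 0 || len >= sizeof(out->value))
        return -1;
    memcpy(out->value, p, len);
    out->value[len] = '\0';
    return 0;
}
```

   For `VECTOR: bar 1 2 1` this yields type `VECTOR`, name `bar` and value
   text `1 2 1`; turning the value text into floats or a string table
   entry would be a separate step.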
      
Comments:
   Commenting assembly requires the use of either # or ; at the start of
   the line that you'd like the assembler to ignore. Comments must be on
   lines of their own; they cannot follow assembly on the same line.
   
   Examples:
      ; this is allowed
      # as is this
      FLOAT: foo 1 ; this is not allowed
      FLOAT: bar 2 # neither is this
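
   Because a comment occupies a whole line, recognizing one only needs a
   look at the first non-blank character. A minimal sketch, assuming that
   leading spaces or tabs before the marker are tolerated (the function
   name `is_comment_line` is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the first non-blank character of the line is '#' or ';'. */
bool is_comment_line(const char *line) {
    assert(line != NULL);
    while (*line == ' ' || *line == '\t')
        line++;
    return *line == '#' || *line == ';';
}
```

   A line such as `FLOAT: foo 1 ; ...` is not a comment line under this
   check, so the trailing text would reach the declaration parser and be
   rejected there.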
   
Functions:
   Creating functions is the same as declaring a constant: simply use
   FUNCTION followed by a colon and the name (white space doesn't matter),
   and start the statements for that function on the line after it.
   
   Examples:
      FLOAT: foo 1
      FLOAT: bar 2
      FUNCTION: test1
         ADD foo, bar, OFS_RETURN
         RETURN
         
      FUNCTION: test2
         CALL0 test1
         DONE
         
Internal:
   The Quake engine provides some internal functions such as print. To
   access these you must first declare them and their names. To do this,
   create a FUNCTION as usual and add a $ followed by the number of the
   engine builtin (negated).
   
   Examples:
      FUNCTION: print $4
      FUNCTION: error $3
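
   As a sketch of how the `$` token might be read: the Quake progs format
   marks a builtin by storing a negative number as the function's first
   statement, so `$4` would be stored as -4. That mapping is my reading of
   "(negated)" above, and the helper name `builtin_number` and its sanity
   cap are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Turn a builtin token such as "$4" into the negated builtin number (-4).
 * Returns 0 if the token is not '$' followed only by decimal digits.
 */
int builtin_number(const char *token) {
    int n = 0;

    assert(token != NULL);
    if (token[0] != '$' || !isdigit((unsigned char)token[1]))
        return 0;
    for (token++; isdigit((unsigned char)*token); token++) {
        n = n * 10 + (*token - '0');
        if (n > 100000)          /* arbitrary cap to avoid int overflow */
            return 0;
    }
    return *token == '\0' ? -n : 0;
}
```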

Misc:
   There are some rules as to what your identifiers can be for functions
   and constants. Identifiers mustn't begin with a numeric digit; they
   cannot include spaces or tabs; they cannot contain symbols; and they
   cannot exceed 32768 characters. Identifiers cannot be all capitals
   either, as all-capital identifiers are reserved by the assembler.
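
   These identifier rules could be checked with something like the sketch
   below. It assumes that underscores do not count as symbols (the reserved
   name OFS_RETURN contains one) and that "all capitals" means at least one
   uppercase letter and no lowercase letters; both are guesses, and the
   name `is_valid_identifier` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define IDENT_MAX 32768  /* length limit stated in the rules */

bool is_valid_identifier(const char *s) {
    size_t len = 0;
    bool has_upper = false, has_lower = false;

    assert(s != NULL);
    /* must be non-empty and must not begin with a digit */
    if (*s == '\0' || isdigit((unsigned char)*s))
        return false;

    for (; *s; s++, len++) {
        unsigned char c = (unsigned char)*s;
        if (isupper(c))
            has_upper = true;
        else if (islower(c))
            has_lower = true;
        else if (!isdigit(c) && c != '_')
            return false;        /* space, tab, or symbol */
    }
    if (len > IDENT_MAX)
        return false;

    /* all-capital identifiers are reserved by the assembler */
    return !(has_upper && !has_lower);
}
```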
   
   Numeric constants cannot contain special notation such as `1e-10`; all
   numeric constants have to be plain numbers. They can, however, contain
   decimal points and signs (+, -).
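
   A minimal sketch of a check for this rule, assuming one optional leading
   sign, at most one decimal point, and at least one digit (so `+.5` passes
   and `.` does not). The name `is_numeric_constant` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

bool is_numeric_constant(const char *s) {
    bool digits = false, point = false;

    assert(s != NULL);
    if (*s == '+' || *s == '-')
        s++;                          /* optional sign */
    for (; *s; s++) {
        if (isdigit((unsigned char)*s))
            digits = true;
        else if (*s == '.' && !point)
            point = true;             /* first decimal point only */
        else
            return false;             /* exponent, second sign, etc. */
    }
    return digits;
}
```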
   
   Constants cannot be assigned the values of other constants; their value
   must be fully expressed in place at the declaration.
   
   No two identifiers can have the same name. This applies even to variables
   allocated inside a function scope (despite their being considered local).
   
   There exists one other keyword that is considered sugar, and that
   is AUTHOR. This keyword allows you to specify the AUTHOR(S) of
   the assembly being assembled. The string given for each usage
   of AUTHOR is written to the end of the string table. Similar to the
   usage of constants and functions, the AUTHOR keyword must be followed
   by a colon.
   
   Examples:
      AUTHOR: "Dale Weiler"
      AUTHOR: "Wolfgang Bumiller"
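
   Writing a string "to the end of the string table" can be pictured as
   appending NUL-terminated text to a byte buffer and remembering its
   offset. The sketch below assumes that layout; the `strtab` type, the
   `strtab_append` function and the fixed capacity are hypothetical and
   not taken from the assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRTAB_MAX 4096   /* arbitrary capacity for this sketch */

typedef struct {
    char   data[STRTAB_MAX];  /* strings stored back to back, NUL-separated */
    size_t used;              /* bytes currently in use                     */
} strtab;

/* Append `s` (including its NUL); returns its offset, or -1 if no room. */
long strtab_append(strtab *t, const char *s) {
    size_t len, at;

    assert(t != NULL && s != NULL);
    assert(t->used <= STRTAB_MAX);
    len = strlen(s) + 1;
    at  = t->used;
    if (len > STRTAB_MAX - t->used)
        return -1;
    memcpy(t->data + at, s, len);
    t->used += len;
    return (long)at;
}
```

   Calling it once per AUTHOR line, after all other strings have been
   added, would place the author names last in the table.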
      
   Colons exist for the sole reason of not having to use spaces after
   keyword usage (however, spaces are allowed). To illustrate, the
   following examples are equivalent.
   
   Example 1:
      FLOAT:foo 1
   Example 2:
      FLOAT: foo 1
   Example 3:
      FLOAT:  foo 1
      
   A variable amount of whitespace is allowed anywhere (as it should be).
   Think of `:` as a delimiter (which is what it's used for during assembly).
   
////////////////////////////////////////////////////////////////////////
/////////////////////// Quake C Documentation //////////////////////////
////////////////////////////////////////////////////////////////////////
TODO ....