gmqcc/README
2012-05-06 17:00:16 -04:00

This is a work-in-progress Quake C compiler. Few good QC compilers are
available to the open-source community. There are plenty of mediocre
compilers, but no one wants those. This project aims to fill that gap with
a proper Quake C compiler that is capable of real optimization.

The compiler is intended to implement modern compiler design principles
and to support modification through extensions. These are provided to the
user through a low-level, syntax-specific language inside the language
itself, which is used to implement language functionality.

The design goals of the compiler are ambitious. It is intended to support
a multitude of things; these, along with their state of completeness, are
listed in the table below.

+-------------------+-----------------------------+------------------+
| Feature           | What's it for?              | Complete Factor  |
+-------------------+-----------------------------+------------------+
. Lexical analysis  . Tokenization                . 90%              .
.-------------------.-----------------------------.------------------.
. Tokenization      . Parsing                     . 90%              .
.-------------------.-----------------------------.------------------.
. Parsing / SYA     . AST Generation              . 09%              .
.-------------------.-----------------------------.------------------.
. AST Generation    . IR Generation               . ??%              .
.-------------------.-----------------------------.------------------.
. IR Generation     . Code Generation             . ??%              .
.-------------------.-----------------------------.------------------.
. Code Generation   . Binary Generation           . ??%              .
.-------------------.-----------------------------.------------------.
. Binary Generation . Binary                      . 100%             .
+-------------------+-----------------------------+------------------+

Design tree:
    The compiler is intended to work in the following order:
        Lexical analysis ->
            Tokenization ->
                Parsing:
                    Operator precedence:
                        Shunting-yard algorithm
                    Inline assembly:
                        Usage of the assembler subsystem:
                            top-down parsing and assembly, no optimization
                    Other parsing:
                        recursive descent
                ->
                    Abstract syntax tree generation ->
                        Intermediate representation (SSA):
                            Optimizations:
                                Constant propagation
                                Value range propagation
                                Sparse conditional constant propagation (possibly?)
                                Dead code elimination
                                Constant folding
                                Global value numbering
                                Partial redundancy elimination
                                Strength reduction
                                Common subexpression elimination
                                Peephole optimizations
                                Loop-invariant code motion
                                Inline expansion
                                Induction variable recognition and elimination
                                Dead store elimination
                                Jump threading
                        ->
                            Code Generation:
                                Optimizations:
                                    Rematerialization
                                    Code factoring
                                    Recursion elimination
                                    Loop unrolling
                                    Deforestation
                            ->
                                Binary Generation
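
To illustrate the operator-precedence step named in the design tree, here is
a minimal shunting-yard sketch in Python. It is not gmqcc's parser: the
pre-split token list and the PRECEDENCE table are assumptions made up for the
example, and every operator is treated as a left-associative binary operator.

```python
# Hypothetical precedence table; not taken from gmqcc.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(tokens):
    """Convert a list of infix tokens to reverse Polish notation."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence (left-associative).
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operand
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return output
```

For example, `to_rpn("a + b * c".split())` yields the tokens `a b c * +`,
which a later stage could turn into an AST by evaluating the stack.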

File tree and explanation:
    gmqcc.h
        This is the common header with all definitions, structures, and
        constants for everything.
    error.c
        This is the error subsystem. It will handle the output of good,
        detailed error messages with colors and such (it does not yet).
    lex.c
        This is the lexer, a very small, basic step-seek lexer that can be
        easily changed to add new tokens; very retargetable.
    main.c
        This is the core compiler entry. It will handle switches to toggle
        certain compiler features on and off (it does not yet).
    parse.c
        This is the parser, which goes over all tokens, generates a parse
        tree, and checks for syntax correctness.
    typedef.c
        This is the typedef system. It is a separate file because it is a
        lot more complicated than it sounds. It handles all typedefs, even
        recursive typedefs.
    util.c
        These are utilities for the compiler. Some things in here include an
        allocator used for debugging and some string functions.
    assembler.c
        This implements support for assembling Quake assembler (which did
        not actually exist until now; documentation of the Quake assembler
        is below). It will also implement inline assembly for the C compiler
        (it does not yet).
    README
        This is the file you're currently reading.
    Makefile
        The makefile. When sources are added, you should add them to the
        SRC= line; otherwise the build will not pick them up. Trivial stuff:
        a small, easy-to-manage makefile, no need to complicate it.

        Some targets:
            #make gmqcc
                Builds gmqcc, creating a `gmqcc` binary file in the same
                directory as the makefile.
            #make test
                Builds the ir and ast tests, creating a `test_ir` and a
                `test_ast` binary file in the same directory as the makefile.
            #make test_ir
                Builds the ir test, creating a `test_ir` binary file in the
                same directory as the makefile.
            #make test_ast
                Builds the ast test, creating a `test_ast` binary file in the
                same directory as the makefile.
            #make clean
                Cleans the build files left behind by a previous build, as
                well as all the binary files.
            #make all
                Builds the tests and the compiler binary, all in the same
                directory as the makefile.

////////////////////////////////////////////////////////////////////////
///////////////////// Quake Assembler Documentation ////////////////////
////////////////////////////////////////////////////////////////////////
Quake assembler is quite simple: it is just an annotated version of the binary
produced by any existing QuakeC compiler, made cleaner to use (so that the
locations of the various globals and strings do not need to be known).

Constants:
    You declare a constant using one of the valid constant typenames
    {FLOAT,VECTOR,STRING,FUNCTION,FIELD,ENTITY}. Every typename is followed
    by a colon and then the name (whitespace doesn't matter).

    Examples:
        FLOAT: foo 1
        VECTOR: bar 1 2 1
        STRING: hello "hello world"
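
The declaration format above can be sketched as a small Python parser. This
is an illustration only, not the code in assembler.c. The function name
`parse_constant` and the VALUE_TYPES set are hypothetical; STRING is included
because the examples use it, and the value formats of FIELD and ENTITY are
not documented here, so the sketch passes them through as raw text.

```python
# Hypothetical set of value-carrying typenames. FUNCTION is left out
# because it starts a function body rather than declaring a value.
VALUE_TYPES = {"FLOAT", "VECTOR", "STRING", "FIELD", "ENTITY"}

def parse_constant(line):
    """Split 'TYPE: name value' into a (type, name, value) tuple."""
    typename, sep, rest = line.partition(":")
    typename = typename.strip()
    if not sep or typename not in VALUE_TYPES:
        raise ValueError("not a constant declaration: %r" % line)
    parts = rest.split(None, 1)
    if len(parts) != 2:
        raise ValueError("expected a name and a value: %r" % line)
    name, raw = parts[0], parts[1].strip()
    if typename == "FLOAT":
        value = float(raw)
    elif typename == "VECTOR":
        value = tuple(float(x) for x in raw.split())
        if len(value) != 3:
            raise ValueError("a vector needs three components")
    elif typename == "STRING":
        if len(raw) < 2 or raw[0] != '"' or raw[-1] != '"':
            raise ValueError("a string must be double-quoted")
        value = raw[1:-1]
    else:
        value = raw  # FIELD / ENTITY: format is an open question here
    return typename, name, value
```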

Comments:
    To comment assembly, start the line with either # or ; and the
    assembler will ignore that line. A comment must be on a line of its
    own; it cannot follow assembly on the same line.

    Examples:
        ; this is allowed
        # as is this
        FLOAT: foo 1 ; this is not allowed
        FLOAT: bar 2 # neither is this
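
A minimal Python sketch of the comment rule follows. The helper names
`is_comment` and `check_line` are hypothetical, and the sketch assumes that a
; or # inside a double-quoted string does not start a comment, which the text
above does not address.

```python
def is_comment(line):
    """True if the first non-blank character is ; or #."""
    return line.lstrip().startswith((";", "#"))

def check_line(line):
    """Return the stripped line, or None for blank and comment lines.

    Raises ValueError for a trailing comment, which is not allowed.
    Assumption: ; or # inside a quoted string is ordinary text.
    """
    stripped = line.strip()
    if not stripped or is_comment(stripped):
        return None
    in_string = False
    for ch in stripped:
        if ch == '"':
            in_string = not in_string
        elif ch in ";#" and not in_string:
            raise ValueError("trailing comments are not allowed: %r" % line)
    return stripped
```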

Functions:
    Creating a function works the same way as declaring a constant: use
    FUNCTION followed by a colon and the name (whitespace doesn't matter),
    then start the statements for that function on the line after it.

    Examples:
        FLOAT: foo 1
        FLOAT: bar 2
        FUNCTION: test1
            ADD foo, bar, OFS_RETURN
            RETURN
        FUNCTION: test2
            CALL0 test1
            DONE
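
As a sketch of how statements attach to the FUNCTION line that precedes them,
the following Python groups each statement under its function. The name
`parse_functions` is hypothetical, and two details are assumptions: operands
are separated by commas as in the example, and other declarations (such as
FLOAT) are simply skipped rather than ending the current function.

```python
def parse_functions(source):
    """Map each function name to its list of (opcode, operands)."""
    functions = {}
    current = None
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, sep, rest = line.partition(":")
        if sep and keyword.strip() == "FUNCTION":
            current = rest.strip()
            functions[current] = []
        elif sep:
            continue  # some other declaration; ignored in this sketch
        elif current is not None:
            opcode, _, operands = line.partition(" ")
            args = [a.strip() for a in operands.split(",") if a.strip()]
            functions[current].append((opcode, args))
        else:
            raise ValueError("statement outside a function: %r" % line)
    return functions
```

Running it on the example above gives two entries, `test1` with an ADD and a
RETURN statement, and `test2` with a CALL0 and a DONE statement.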

Internal:
    The Quake engine provides some internal functions, such as print. To
    access these, you must first declare them and their names. To do this,
    declare a FUNCTION as usual and add a $ followed by the number of the
    engine builtin (negated).

    Examples:
        FUNCTION: print $4
        FUNCTION: error $3
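
The builtin syntax can be sketched in Python as below. The function name
`parse_function_decl` is hypothetical, and the reading of "(negated)" is an
assumption: the sketch takes `$4` to mean builtin number 4 and records it as
-4, in the way compiled QuakeC marks a builtin with a negative entry point.

```python
def parse_function_decl(line):
    """Return (name, builtin) for 'FUNCTION: name [$N]'.

    builtin is None for an ordinary function, or -N for a builtin.
    """
    keyword, sep, rest = line.partition(":")
    if not sep or keyword.strip() != "FUNCTION":
        raise ValueError("not a function declaration: %r" % line)
    parts = rest.split()
    if len(parts) == 1:
        return parts[0], None
    if (len(parts) == 2 and parts[1].startswith("$")
            and parts[1][1:].isdecimal()):
        # Assumption: "$4" names engine builtin 4, stored negated as -4.
        return parts[0], -int(parts[1][1:])
    raise ValueError("malformed function declaration: %r" % line)
```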

Misc:
    There are some rules as to what your identifiers can be for functions
    and constants. Identifiers must not begin with a numeric digit, cannot
    include spaces or tabs, cannot contain symbols, and cannot exceed 32768
    characters. Identifiers cannot be all uppercase either, as all-uppercase
    identifiers are reserved by the assembler.
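
The identifier rules can be checked with a few lines of Python. This is a
sketch, not the assembler's own check: `is_valid_identifier` is a
hypothetical name, and it assumes that the underscore does not count as a
symbol (OFS_RETURN suggests underscores occur in reserved names) and that
identifiers are ASCII.

```python
import re

MAX_IDENT_LEN = 32768
# Assumption: letters, digits and "_" are allowed; anything else is a symbol.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_identifier(name):
    """Apply the identifier rules for functions and constants."""
    if len(name) > MAX_IDENT_LEN:
        return False
    if IDENT_RE.fullmatch(name) is None:
        return False  # leading digit, whitespace, or a symbol
    # All-uppercase identifiers are reserved by the assembler.
    letters = [c for c in name if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return False
    return True
```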

    Numeric constants cannot contain special notation such as `1e-10`. All
    numeric constants have to be plain numbers; they can, however, contain
    decimal points and signs (+, -).
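
A short Python sketch of the numeric rule: an optional sign, digits, and an
optional decimal point, with no exponent. The name `is_valid_number` is
hypothetical, and accepting forms such as `.5` and `5.` is an assumption,
since the text does not say whether they are allowed.

```python
import re

# Optional sign, then digits with an optional decimal point. No exponent.
# Assumption: ".5" and "5." are accepted.
NUMBER_RE = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)")

def is_valid_number(text):
    """True if text is a plain signed decimal number."""
    return NUMBER_RE.fullmatch(text) is not None
```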

    Constants cannot be assigned the values of other constants; their value
    must be fully expressed at the point of declaration.

    No two identifiers can have the same name. This also applies to
    variables allocated inside a function scope (despite their being
    considered local).

    There is one other keyword, which is considered sugar: AUTHOR. This
    keyword allows you to specify the author(s) of the assembly being
    assembled. The string given for each use of AUTHOR is written to the
    end of the string table. As with constants and functions, the AUTHOR
    keyword must be followed by a colon.

    Examples:
        AUTHOR: "Dale Weiler"
        AUTHOR: "Wolfgang Bumiller"
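
To make "written to the end of the string table" concrete, here is a Python
sketch that orders strings so that AUTHOR entries come last. It is a
simplification under stated assumptions: `build_string_table` and `unquote`
are hypothetical names, the table is modelled as a plain list, and only
STRING constants and AUTHOR lines contribute to it.

```python
def unquote(text):
    """Return what lies between the first and last double quote."""
    first, last = text.find('"'), text.rfind('"')
    if first == -1 or first == last:
        raise ValueError("expected a quoted string: %r" % text)
    return text[first + 1:last]

def build_string_table(lines):
    """Return strings in table order: STRING constants, then AUTHORs."""
    strings, authors = [], []
    for line in lines:
        keyword, sep, rest = line.partition(":")
        if not sep:
            continue  # not a declaration
        keyword = keyword.strip()
        if keyword == "AUTHOR":
            authors.append(unquote(rest))
        elif keyword == "STRING":
            strings.append(unquote(rest))
    # AUTHOR strings go at the end, whatever their position in the source.
    return strings + authors
```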

    Colons exist for the sole reason of not having to use spaces after
    a keyword (however, spaces are allowed). To illustrate, the following
    examples are equivalent.

        Example 1:
            FLOAT:foo 1
        Example 2:
            FLOAT: foo 1
        Example 3:
            FLOAT:    foo 1

    Variable amounts of whitespace are allowed anywhere (as it should be).
    Think of `:` as a delimiter (which is what it's used for during assembly).
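
Treating `:` as a delimiter can be sketched in one Python function. The name
`split_declaration` is hypothetical. The sketch splits the remainder on
whitespace, so it suits numeric declarations only; a quoted string containing
spaces would need separate handling.

```python
def split_declaration(line):
    """Split 'KEYWORD : name value' at the first colon into fields."""
    keyword, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("missing ':' in %r" % line)
    # Whitespace around the colon and between fields is insignificant.
    return [keyword.strip()] + rest.split()
```

With this, `FLOAT:foo 1` and `FLOAT: foo 1` both split into the same three
fields, which is why the declarations are interchangeable.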
////////////////////////////////////////////////////////////////////////
/////////////////////// Quake C Documentation //////////////////////////
////////////////////////////////////////////////////////////////////////
TODO ....