mirror of
https://github.com/DarkPlacesEngine/gmqcc.git
synced 2024-12-18 00:11:06 +00:00
This is a work-in-progress Quake C compiler. There are very few good QC
compilers out there on the internet that can be used in the open source
community. There are a lot of mediocre compilers, but no one wants those.
This is the solution for that: for once, a proper Quake C compiler that is
capable of doing proper optimization.

The compiler is intended to implement modern-day compiler design principles
and support modifications through extensions that are provided for the
user through a low-level, syntax-specific language inside the language
itself to implement language functionality.

The design goals of the compiler are very large. It's intended that the
compiler supports a multitude of things; these things, along with their
status of completeness, are represented in the table below.

+-------------------+-----------------------------+------------------+
|     Feature       |  What's it for?             | Complete Factor  |
+-------------------+-----------------------------+------------------+
. Lexical analysis  .  Tokenization               .       90%        .
.-------------------.-----------------------------.------------------.
. Tokenization      .  Parsing                    .       90%        .
.-------------------.-----------------------------.------------------.
. Parsing / SYA     .  AST Generation             .       09%        .
.-------------------.-----------------------------.------------------.
. AST Generation    .  IR Generation              .       ??%        .
.-------------------.-----------------------------.------------------.
. IR Generation     .  Code Generation            .       ??%        .
.-------------------.-----------------------------.------------------.
. Code Generation   .  Binary Generation          .       ??%        .
.-------------------.-----------------------------.------------------.
. Binary Generation .  Binary                     .      100%        .
+-------------------+-----------------------------+------------------+

Design tree:
    The compiler is intended to work in the following order:
        Lexical analysis ->
            Tokenization ->
                Parsing:
                    Operator precedence:
                        Shunting-yard algorithm
                    Inline assembly:
                        Usage of the assembler subsystem:
                            top-down parsing and assembly, no optimization
                    Other parsing:
                        recursive descent
                ->
                    Abstract syntax tree generation ->
                        Intermediate representation (SSA):
                            Optimizations:
                                Constant propagation
                                Value range propagation
                                Sparse conditional constant propagation (possibly?)
                                Dead code elimination
                                Constant folding
                                Global value numbering
                                Partial redundancy elimination
                                Strength reduction
                                Common subexpression elimination
                                Peephole optimizations
                                Loop-invariant code motion
                                Inline expansion
                                Induction variable recognition and elimination
                                Dead store elimination
                                Jump threading
                        ->
                            Code Generation:
                                Optimizations:
                                    Rematerialization
                                    Code factoring
                                    Recursion elimination
                                    Loop unrolling
                                    Deforestation
                            ->
                                Binary Generation

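The design tree names the shunting-yard algorithm for operator precedence.
As a rough illustration of the technique only (this is not the parser in
parse.c), the sketch below evaluates a small infix expression with two
stacks. The names `precedence`, `reduce` and `sya_eval` are hypothetical, the
operator set is limited to + - * / and parentheses, and there is no error
handling.

```c
#include <ctype.h>

/* Hypothetical helper: binding strength of a binary operator. */
static int precedence(char op) {
    switch (op) {
        case '*': case '/': return 2;
        case '+': case '-': return 1;
        default:            return 0; /* '(' never gets reduced by an operator */
    }
}

/* Pop two operands, apply op, push the result back. */
static void reduce(int *vals, int *nvals, char op) {
    int rhs = vals[--*nvals];
    int lhs = vals[--*nvals];
    int out = 0;
    switch (op) {
        case '+': out = lhs + rhs; break;
        case '-': out = lhs - rhs; break;
        case '*': out = lhs * rhs; break;
        case '/': out = lhs / rhs; break;
    }
    vals[(*nvals)++] = out;
}

/*
 * Evaluates a well-formed infix expression made of non-negative integers,
 * + - * / and parentheses. Assumes valid input and fewer than 64 pending
 * operands or operators; nothing is checked.
 */
int sya_eval(const char *s) {
    int  vals[64]; int nvals = 0;   /* operand stack  */
    char ops [64]; int nops  = 0;   /* operator stack */

    for (; *s; s++) {
        if (isspace((unsigned char)*s))
            continue;
        if (isdigit((unsigned char)*s)) {
            int n = 0;
            while (isdigit((unsigned char)*s))
                n = n * 10 + (*s++ - '0');
            s--; /* step back: the for loop advances past the last digit */
            vals[nvals++] = n;
        } else if (*s == '(') {
            ops[nops++] = '(';
        } else if (*s == ')') {
            while (ops[nops - 1] != '(')
                reduce(vals, &nvals, ops[--nops]);
            nops--; /* discard the '(' */
        } else {
            /* left-associative: reduce while the stack top binds as tightly */
            while (nops && precedence(ops[nops - 1]) >= precedence(*s))
                reduce(vals, &nvals, ops[--nops]);
            ops[nops++] = *s;
        }
    }
    while (nops)
        reduce(vals, &nvals, ops[--nops]);
    return vals[0];
}
```

A real parser would emit AST nodes in `reduce` instead of computing values;
the stack discipline stays the same.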
File tree and explanation:
    gmqcc.h
        This is the common header with all definitions, structures, and
        constants for everything.

    error.c
        This is the error subsystem. It handles the output of good, detailed
        error messages (not currently, but it will), with colors and such.

    lex.c
        This is the lexer, a very small, basic step-seek lexer that can be
        easily changed to add new tokens; very retargetable.

    main.c
        This is the core compiler entry. It handles (will handle) switches
        to toggle certain compiler features on and off.

    parse.c
        This is the parser, which goes over all tokens, generates a parse
        tree and checks for syntax correctness.

    typedef.c
        This is the typedef system. It is a separate file because it's a lot
        more complicated than it sounds. It handles all typedefs, even
        recursive typedefs.

    util.c
        These are utilities for the compiler. Some things in here include an
        allocator used for debugging, and some string functions.

    assembler.c
        This implements support for assembling Quake assembler (which didn't
        actually exist until now; documentation of the Quake assembler is
        below). This also implements (will implement) inline assembly for
        the C compiler.

    README
        This is the file you're currently reading.

    Makefile
        The makefile. When sources are added, you should add them to the SRC=
        line, otherwise the build will not pick them up. Trivial stuff: a
        small, easy-to-manage makefile, no need to complicate it.
        Some targets:
            #make gmqcc
                Builds gmqcc, creating a `gmqcc` binary file in the same
                directory as the makefile.
            #make test
                Builds the ir and ast tests, creating a `test_ir` and a
                `test_ast` binary file in the same directory as the makefile.
            #make test_ir
                Builds the ir test, creating a `test_ir` binary file in the
                same directory as the makefile.
            #make test_ast
                Builds the ast test, creating a `test_ast` binary file in the
                same directory as the makefile.
            #make clean
                Cleans the build files left behind by a previous build, as
                well as all the binary files.
            #make all
                Builds the tests and the compiler binary, all in the same
                directory as the makefile.

////////////////////////////////////////////////////////////////////////
///////////////////// Quake Assembler Documentation ////////////////////
////////////////////////////////////////////////////////////////////////
Quake assembler is quite simple: it's just an annotated version of the binary
produced by any existing QuakeC compiler, but made cleaner to use (so that
the locations of various globals or strings are not required to be known).

Constants:
    Using one of the following valid constant typenames, you can declare
    a constant: {FLOAT,VECTOR,STRING,FUNCTION,FIELD,ENTITY}. All typenames
    are followed by a colon and the name (white space doesn't matter).

    Examples:
        FLOAT: foo 1
        VECTOR: bar 1 2 1
        STRING: hello "hello world"

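To make the `TYPENAME: name value` shape concrete, here is a minimal sketch
of how one such constant line could be split into its three parts, treating
the colon as the delimiter and ignoring surrounding whitespace. The struct
`qasm_decl`, the function `qasm_split_decl` and the buffer sizes are
hypothetical and are not taken from assembler.c.

```c
#include <stdio.h>

/* Hypothetical container for one constant declaration. */
typedef struct {
    char type [16];   /* e.g. "FLOAT"           */
    char name [64];   /* e.g. "foo"             */
    char value[128];  /* rest of line, "1 2 1"  */
} qasm_decl;

/*
 * Splits `TYPENAME: name value` into its parts.
 * Returns 1 on success and 0 if the line does not have that shape
 * (for example a FUNCTION line, which carries no value).
 */
int qasm_split_decl(const char *line, qasm_decl *out) {
    /* blanks in the format match any amount of whitespace, including none */
    return sscanf(line, " %15[A-Z] : %63[^ \t\n] %127[^\n]",
                  out->type, out->name, out->value) == 3;
}
```

Because whitespace around the colon is optional in the format string,
`FLOAT:foo 1` and `FLOAT:   foo 1` split into the same three fields.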
Comments:
    Commenting assembly requires the use of either # or ; at the start of
    the line that you'd like to be ignored by the assembler. A comment must
    be on a line of its own; you cannot put a comment on a line that already
    contains assembly.

    Examples:
        ; this is allowed
        # as is this
        FLOAT: foo 1 ; this is not allowed
        FLOAT: bar 2 # neither is this

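The comment rule amounts to a check on the first character of a line. A
small sketch follows; the function name `qasm_is_comment` is hypothetical,
and skipping leading whitespace before the marker is an assumption that the
text above does not state.

```c
#include <ctype.h>

/*
 * Returns 1 if the whole line is a comment, i.e. its first non-blank
 * character is ';' or '#'. Trailing comments are not recognized, which
 * matches the rule that comments cannot share a line with assembly.
 */
int qasm_is_comment(const char *line) {
    while (isspace((unsigned char)*line))
        line++;  /* assumption: leading whitespace is ignored */
    return *line == ';' || *line == '#';
}
```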
Functions:
    Creating functions is the same as declaring a constant: simply use
    FUNCTION followed by a colon and the name (white space doesn't matter),
    and start the statements for that function on the line after it.

    Examples:
        FLOAT: foo 1
        FLOAT: bar 2
        FUNCTION: test1
            ADD foo, bar, OFS_RETURN
            RETURN

        FUNCTION: test2
            CALL0 test1
            DONE

Internal:
    The Quake engine provides some internal functions, such as print. To
    access these, you must first declare them and their names. To do this,
    declare a FUNCTION as usual and append a $ followed by the number of
    the engine builtin (negated).

    Examples:
        FUNCTION: print $4
        FUNCTION: error $3

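As a hedged sketch of what a builtin declaration could turn into: in the
Quake progs format a function whose first statement index is negative refers
to an engine builtin, so `$4` would be stored as -4. Whether assembler.c
does exactly this is an assumption; the function name `qasm_parse_builtin`
and the 64-byte name buffer are hypothetical.

```c
#include <stdio.h>

/*
 * Parses `FUNCTION: name $N`.
 * On success stores the name and the negated builtin number in *entry
 * and returns 1. Returns 0 for ordinary functions (no `$N` part).
 */
int qasm_parse_builtin(const char *line, char name[64], int *entry) {
    unsigned number;
    if (sscanf(line, " FUNCTION : %63[^ \t\n] $%u", name, &number) != 2)
        return 0;
    *entry = -(int)number;  /* builtins are referenced by negative index */
    return 1;
}
```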
Misc:
    There are some rules as to what your identifiers can be for functions
    and constants. Identifiers must not begin with a numeric digit; they
    cannot include spaces or tabs; they cannot contain symbols; and they
    cannot exceed 32768 characters. Identifiers cannot be all capitalized
    either, as all-capitalized identifiers are reserved by the assembler.

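The identifier rules can be read as a single validation pass. The sketch
below is one possible reading, not the assembler's actual check: it treats
anything that is not a letter or digit (including `_`) as a symbol, and it
treats an identifier with no lowercase letter as "all capitalized". The
names `qasm_ident_ok` and `QASM_IDENT_MAX` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

#define QASM_IDENT_MAX 32768 /* length limit quoted in the rules above */

/* Returns 1 if id satisfies the identifier rules, 0 otherwise. */
int qasm_ident_ok(const char *id) {
    size_t len = 0;
    int has_lower = 0;

    /* must be non-empty and must not start with a digit */
    if (*id == '\0' || isdigit((unsigned char)*id))
        return 0;

    for (; id[len]; len++) {
        if (!isalnum((unsigned char)id[len]))
            return 0;            /* rejects spaces, tabs and symbols */
        if (islower((unsigned char)id[len]))
            has_lower = 1;
    }

    /* all-capitalized names are reserved by the assembler */
    return len <= QASM_IDENT_MAX && has_lower;
}
```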
    Numeric constants cannot contain special notation such as `1e-10`. All
    numeric constants have to be plain numbers; they can, however, contain
    decimal points and signs (+, -).

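A sketch of the numeric rule as a validator: an optional sign, digits, and
at most one decimal point, with exponent notation rejected. The function
name `qasm_number_ok` is hypothetical, and accepting forms such as `.5` or
`3.` is an assumption, since the text does not say either way.

```c
#include <ctype.h>

/* Returns 1 for plain numeric constants such as "1", "-0.5" or "+3.25". */
int qasm_number_ok(const char *s) {
    int digits = 0;

    if (*s == '+' || *s == '-')
        s++;                                  /* optional sign        */
    while (isdigit((unsigned char)*s)) {      /* integer part         */
        s++;
        digits++;
    }
    if (*s == '.') {                          /* optional fraction    */
        s++;
        while (isdigit((unsigned char)*s)) {
            s++;
            digits++;
        }
    }
    /* anything left over (such as 'e') makes the constant invalid */
    return digits > 0 && *s == '\0';
}
```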
    Constants cannot be assigned the values of other constants; their value
    must be fully expressed in the declaration itself.

    No two identifiers can have the same name. This also applies to variables
    allocated inside a function scope (despite them being considered local).

    There exists one other keyword that is considered sugar, and that
    is AUTHOR. This keyword allows you to specify the AUTHOR(S) of
    the assembly being assembled. The string given for each usage
    of AUTHOR is written to the end of the string table. Similar to the
    usage of constants and functions, the AUTHOR keyword must be followed
    by a colon.

    Examples:
        AUTHOR: "Dale Weiler"
        AUTHOR: "Wolfgang Bumiller"

    Colons exist for the sole reason of not having to use spaces after
    keyword usage (however, spaces are allowed). To illustrate, the
    following examples are equivalent.

    Example 1:
        FLOAT:foo 1
    Example 2:
        FLOAT: foo 1
    Example 3:
        FLOAT:      foo 1

    Variable amounts of whitespace are allowed anywhere (as it should be).
    Think of `:` as a delimiter (which is what it's used for during assembly).

////////////////////////////////////////////////////////////////////////
/////////////////////// Quake C Documentation //////////////////////////
////////////////////////////////////////////////////////////////////////
TODO ....