mirror of
https://github.com/DarkPlacesEngine/gmqcc.git
synced 2025-01-18 14:21:36 +00:00
An Improved Quake C Compiler
This is a work in progress Quake C compiler. There are very few good QC
compilers out there on the internet that can be used in the opensource
community. There are a lot of mediocre compilers, but no one wants
those. This is the solution for that: for once, a proper Quake C
compiler that is capable of doing proper optimization.

The compiler is intended to implement modern compiler design principles
and to support modification through extensions. These extensions are
exposed to the user through a low-level, syntax-specific language
embedded in the language itself, which is used to implement language
functionality.

The design goals of the compiler are ambitious: it is intended to
support a multitude of features. These features, along with their
completion status, are listed in the table below.

+-------------------+-----------------------------+------------------+
| Feature           | What's it for?              | Complete Factor  |
+-------------------+-----------------------------+------------------+
| Lexical analysis  | Tokenization                | 90%              |
+-------------------+-----------------------------+------------------+
| Tokenization      | Parsing                     | 90%              |
+-------------------+-----------------------------+------------------+
| Parsing / SYA     | AST Generation              | 09%              |
+-------------------+-----------------------------+------------------+
| AST Generation    | IR Generation               | ??%              |
+-------------------+-----------------------------+------------------+
| IR Generation     | Code Generation             | ??%              |
+-------------------+-----------------------------+------------------+
| Code Generation   | Binary Generation           | ??%              |
+-------------------+-----------------------------+------------------+
| Binary Generation | Binary                      | 100%             |
+-------------------+-----------------------------+------------------+

Design tree:
    The compiler is intended to work in the following order:
        Lexical analysis
        -> Tokenization
        -> Parsing:
            Operator precedence:
                Shunting yard algorithm
            Inline assembly:
                Usage of the assembler subsystem:
                    top-down parsing and assembly
                    no optimization
            Other parsing:
                recursive descent
        -> Abstract syntax tree generation
        -> Intermediate representation (SSA):
            Optimizations:
                Constant propagation
                Value range propagation
                Sparse conditional constant propagation (possibly?)
                Dead code elimination
                Constant folding
                Global value numbering
                Partial redundancy elimination
                Strength reduction
                Common subexpression elimination
                Peephole optimizations
                Loop-invariant code motion
                Inline expansion
                Induction variable recognition and elimination
                Dead store elimination
                Jump threading
        -> Code Generation:
            Optimizations:
                Rematerialization
                Code Factoring
                Recursion Elimination
                Loop unrolling
                Deforestation
        -> Binary Generation

File tree and explanation:
    gmqcc.h
        This is the common header with all definitions, structures, and
        constants for everything.

    error.c
        This is the error subsystem. It will handle the output of
        detailed error messages, with colors and such (not implemented
        yet).

    lex.c
        This is the lexer, a very small, basic step-seek lexer that can
        easily be changed to add new tokens. Very retargetable.

    main.c
        This is the core compiler entry. It will handle switches that
        toggle certain compiler features on and off.

    parse.c
        This is the parser, which goes over all tokens, generates a
        parse tree and checks for syntax correctness.

    typedef.c
        This is the typedef system. It is a separate file because it is
        a lot more complicated than it sounds. This handles all
        typedefs, even recursive typedefs.

    util.c
        These are utilities for the compiler. Some things in here
        include an allocator used for debugging, and some string
        functions.
    assembler.c
        This implements support for assembling Quake assembler (which
        did not actually exist until now; documentation of the Quake
        assembler is below). This will also implement inline assembly
        for the C compiler.

    README
        This is the file you're currently reading.

    Makefile
        The makefile. When sources are added you should add them to the
        SRC= line, otherwise the build will not pick them up. Trivial
        stuff: a small, easy to manage makefile, no need to complicate
        it.
        Some targets:
            #make gmqcc
                Builds gmqcc, creating a `gmqcc` binary file in the
                same directory as the makefile.
            #make test
                Builds the ir and ast tests, creating a `test_ir` and a
                `test_ast` binary file in the same directory as the
                makefile.
            #make test_ir
                Builds the ir test, creating a `test_ir` binary file in
                the same directory as the makefile.
            #make test_ast
                Builds the ast test, creating a `test_ast` binary file
                in the same directory as the makefile.
            #make clean
                Cleans the build files left behind by a previous build,
                as well as all the binary files.
            #make all
                Builds the tests and the compiler binary, all in the
                same directory as the makefile.

////////////////////////////////////////////////////////////////////////
///////////////////// Quake Assembler Documentation ////////////////////
////////////////////////////////////////////////////////////////////////
Quake assembler is quite simple: it's just an annotated version of the
binary produced by any existing QuakeC compiler, but made cleaner to
use (so that the locations of various globals or strings are not
required to be known).

Constants:
    You can declare a constant using one of the following valid constant
    typenames: {FLOAT,VECTOR,STRING,FUNCTION,FIELD,ENTITY}. Every
    typename is followed by a colon and then the name (white space
    doesn't matter).

    Examples:
        FLOAT: foo 1
        VECTOR: bar 1 2 1
        STRING: hello "hello world"

Comments:
    Commenting assembly requires the use of either # or ; at the start
    of the line that you'd like to be ignored by the assembler.
    You can only comment out otherwise blank lines, not lines that
    already contain assembly.

    Examples:
        ; this is allowed
        # as is this
        FLOAT: foo 1 ; this is not allowed
        FLOAT: bar 2 # neither is this

Functions:
    Creating a function is the same as declaring a constant: use
    FUNCTION followed by a colon and the name (white space doesn't
    matter), and start the statements for that function on the line
    after it.

    Examples:
        FLOAT: foo 1
        FLOAT: bar 2
        FUNCTION: test1
            ADD foo, bar, OFS_RETURN
            RETURN

        FUNCTION: test2
            CALL0 test1
            DONE

Internal:
    The Quake engine provides some internal functions, such as print. To
    access these you must first declare them and their names. To do
    this, declare a FUNCTION as usual, then add a $ followed by the
    number of the engine builtin (negated).

    Examples:
        FUNCTION: print $4
        FUNCTION: error $3

Misc:
    There are some rules as to what your identifiers can be for
    functions and constants. Identifiers must not begin with a numeric
    digit; they cannot include spaces or tabs; they cannot contain
    symbols; and they cannot exceed 32768 characters. Identifiers cannot
    be all capitalized either, as all capitalized identifiers are
    reserved by the assembler.

    Numeric constants cannot contain special notation such as `1e-10`.
    All numeric constants have to be plain numbers; they can, however,
    contain decimal points and signs (+, -).

    Constants cannot be assigned the values of other constants; their
    value must be fully expressed at the point of declaration.

    No two identifiers can have the same name. This also applies to
    variables allocated inside a function scope (despite them being
    considered local).

    There exists one other keyword that is considered sugar, and that is
    AUTHOR. This keyword allows you to specify the AUTHOR(S) of the
    assembly being assembled. The string given for each usage of AUTHOR
    is written to the end of the string table. Similar to the usage of
    constants and functions, the AUTHOR keyword must be followed by a
    colon.
    Examples:
        AUTHOR: "Dale Weiler"
        AUTHOR: "Wolfgang Bumiller"

    Colons exist for the sole reason of not having to use spaces after
    a keyword (spaces are still allowed, however). To illustrate, the
    following examples are equivalent.

    Example 1:
        FLOAT:foo 1
    Example 2:
        FLOAT: foo 1
    Example 3:
        FLOAT:    foo    1

    A variable amount of whitespace is allowed anywhere (as it should
    be). Think of `:` as a delimiter (which is what it's used for during
    assembly).

////////////////////////////////////////////////////////////////////////
/////////////////////// Quake C Documentation //////////////////////////
////////////////////////////////////////////////////////////////////////
TODO ....
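
Supplement to the Quake assembler documentation above: the following is
a minimal sketch in C of how a declaration line could be split at the
`:` delimiter, and how an identifier could be checked against the rules
in the Misc section. It is not taken from the assembler sources. The
names qasm_split_decl and qasm_ident_valid are hypothetical, and
treating `_` as a legal identifier character is an assumption, since the
rules above do not say whether it counts as a symbol.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Longest identifier permitted by the Misc rules. */
#define QASM_IDENT_MAX 32768

/* Skip spaces and tabs. */
static const char *qasm_skip_ws(const char *p) {
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/*
 * Copy one word (up to whitespace, ':' or end of string) into `out`.
 * Returns the position after the word, or NULL if the word is empty
 * or does not fit into `size` bytes.
 */
static const char *qasm_copy_word(const char *p, char *out, size_t size) {
    size_t n = 0;
    assert(out != NULL && size > 0);
    while (*p && *p != ':' && *p != ' ' && *p != '\t') {
        if (n + 1 >= size)
            return NULL;
        out[n++] = *p++;
    }
    out[n] = '\0';
    return n ? p : NULL;
}

/*
 * Hypothetical helper: split "KEYWORD : name value..." at the colon.
 * Fills `keyword` and `name`, and returns a pointer to the start of
 * the value text, or NULL if the line is malformed.
 */
const char *qasm_split_decl(const char *line, char *keyword, size_t ksize,
                            char *name, size_t nsize) {
    const char *p;
    assert(line != NULL);
    p = qasm_copy_word(qasm_skip_ws(line), keyword, ksize);
    if (!p)
        return NULL;
    p = qasm_skip_ws(p);
    if (*p != ':') /* the colon is the delimiter after the keyword */
        return NULL;
    p = qasm_copy_word(qasm_skip_ws(p + 1), name, nsize);
    if (!p)
        return NULL;
    return qasm_skip_ws(p);
}

/*
 * Hypothetical helper: returns 1 if `name` satisfies the identifier
 * rules, 0 otherwise. Assumption: '_' is allowed.
 */
int qasm_ident_valid(const char *name) {
    size_t len, i;
    int has_upper = 0, has_lower = 0;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0 || len > QASM_IDENT_MAX)
        return 0;
    if (isdigit((unsigned char)name[0]))
        return 0; /* must not begin with a digit */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '_')
            return 0; /* rejects spaces, tabs and symbols */
        if (isupper(c))
            has_upper = 1;
        if (islower(c))
            has_lower = 1;
    }
    /* all capitalized identifiers are reserved by the assembler */
    return !(has_upper && !has_lower);
}
```

With this sketch, splitting `FLOAT:foo 1` or `FLOAT:    foo    1` gives
the keyword FLOAT, the name foo and the value text 1 in both cases,
while qasm_ident_valid rejects FOO (reserved) and 1foo (leading digit).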