Part 58: Fixing Pointer Increments/Decrements
In the last part of our compiler writing journey, I mentioned that there was a problem with pointer increments and decrements. Let’s see what the problem is and how I fixed it.
We saw with the AST operations A_ADD, A_SUBTRACT, A_ASPLUS and A_ASMINUS
where one operand is a pointer and the other is an integer type, we need
to scale the integer value by the size of the type that the pointer
points at. In modify_type() in types.c:
// We can scale only on add and subtract operations
if (op == A_ADD || op == A_SUBTRACT ||
op == A_ASPLUS || op == A_ASMINUS) {
// Left is int type, right is pointer type and the size
// of the original type is >1: scale the left
if (inttype(ltype) && ptrtype(rtype)) {
rsize = genprimsize(value_at(rtype));
if (rsize > 1)
return (mkastunary(A_SCALE, rtype, rctype, tree, NULL, rsize));
else
return (tree); // Size 1, no need to scale
}
}But this scaling doesn’t occur when we use ++ or --, either as
preincrement/decrement or postincrement/decrement operators. Here,
we simply strap an A_PREINC, A_PREDEC, A_POSTINC or A_POSTDEC AST node
to the AST tree that we are operating on, and then leave it to the code generator
to deal with the situation.
Up to now, this got resolved when we call either cgloadglob() or
cgloadlocal() in cg.c to load the value of a global or local variable.
For example:
int cgloadglob(struct symtable *sym, int op) {
...
if (cgprimsize(sym->type) == 8) {
if (op == A_PREINC)
fprintf(Outfile, "\tincq\t%s(rip), %s\n", sym->name, reglist[r]);
if (op == A_POSTINC)
fprintf(Outfile, "\tincq\t%s(rbp), %s\n", sym->st_posn, reglist[r]);
else
fprintf(Outfile, "\tleaq\t%s(rbp), %s\n", sym->st_posn, reglist[r]); break;
case 4: fprintf(Outfile, "\tmovslq\t%d(rbp), %s\n", sym->st_posn, reglist[r]);
}
} else {
switch (sym->size) {
case 1: fprintf(Outfile, "\tmovzbq\t%s(rip), %s\n", sym->name, reglist[r]); break;
case 8: fprintf(Outfile, "\tmovq\t%s(rax\n", reglist[r1]);
fprintf(Outfile, "\tcqo\n");
fprintf(Outfile, "\tidivq\t%s\n", reglist[r2]);
if (op== A_DIVIDE)
fprintf(Outfile, "\tmovq\trdx,%s\n", reglist[r1]);
free_register(r2);
return (r1);
}The tests/input147.c confirms that the above changes work:
#include <stdio.h>
int a;
int main() {
printf("%d\n", 24 % 9);
printf("%d\n", 31 % 11);
a= 24; a %= 9; printf("%d\n",a);
a= 31; a %= 11; printf("%d\n",a);
return(0);
}Why Doesn’t It Link
We are now at the point where our compiler can parse each and every of its
own source code files. But when I try to link them, I get a warning about
missing L0 labels.
After a bit of investigation, it turns out that I wasn’t properly
propagating the end label for loops and switches in genIF() in gen.c.
The fix is on line 49:
// Generate the code for an IF statement
// and an optional ELSE clause.
static int genIF(struct ASTnode *n, int looptoplabel, int loopendlabel) {
...
// Optional ELSE clause: generate the
// false compound statement and the
// end label
if (n->right) {
genAST(n->right, NOLABEL, NOLABEL, loopendlabel, n->op);
genfreeregs(NOREG);
cglabel(Lend);
}
...
}Now that loopendlabel is being propagated, I can do this (in a shell
script I call memake):
#!/bin/sh
make install
rm *.s *.o
for i in cg.c decl.c expr.c gen.c main.c misc.c \
opt.c scan.c stmt.c sym.c tree.c types.c
do echo "./cwj -c $i"; ./cwj -c $i ; ./cwj -S $i
done
cc -o cwj0 cg.o decl.o expr.o gen.o main.o misc.o \
opt.o scan.o stmt.o sym.o tree.o types.o
We end up with a binary, cwj0, which is the result of the compiler
compiling itself.
$ size cwj0
text data bss dec hex filename
106540 3008 48 109596 1ac1c cwj0
$ file cwj0
cwj0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically
linked, interpreter /lib64/l, for GNU/Linux 3.2.0, not stripped
Conclusion and What’s Next
For the pointer increment problem, I definitely had to scratch my head quite
a lot and look at several possible alternate solutions. I did get halfway
through trying to build a new AST tree with an A_SCALE in it. Then I tossed it
all away and went for the change in cgloadvar(). That’s much nicer.
The modulo operators were simple to add (in theory), but annoyingly difficult to get everything synchronised (in practice). There is probably some scope to refactor here to make the synchronisation much easier.
Then, while trying to link all the object files that our compiler had made from its own source code, I found that we were not propagating loop/switch end labels properly.
We’ve now reached the point where our compiler can parse every one of its source code files, generate assembly code for them, and we can link them. We have reached the final stage of our journey, one that is probably going to be the most painful, the WDIW stage: why doesn’t it work?
Here, we don’t have a debugger, we are going to have to look at lots of assembly output. We’ll have to single-step assembly and look at register values.
In the next part of our compiler writing journey, I will start on the WDIW stage. We are going to need some strategies to make our work effective. Next step