Part 58: Fixing Pointer Increments/Decrements

In the last part of our compiler writing journey, I mentioned that there was a problem with pointer increments and decrements. Let’s see what the problem is and how I fixed it.

We saw with the AST operations A_ADD, A_SUBTRACT, A_ASPLUS and A_ASMINUS where one operand is a pointer and the other is an integer type, we need to scale the integer value by the size of the type that the pointer points at. In modify_type() in types.c:

  // We can scale only on add and subtract operations
  if (op == A_ADD || op == A_SUBTRACT ||
      op == A_ASPLUS || op == A_ASMINUS) {
 
    // Left is int type, right is pointer type and the size
    // of the original type is >1: scale the left
    if (inttype(ltype) && ptrtype(rtype)) {
      rsize = genprimsize(value_at(rtype));
      if (rsize > 1)
        return (mkastunary(A_SCALE, rtype, rctype, tree, NULL, rsize));
      else
        return (tree);          // Size 1, no need to scale
    }
  }

But this scaling doesn’t occur when we use ++ or --, either as preincrement/decrement or postincrement/decrement operators. Here, we simply strap an A_PREINC, A_PREDEC, A_POSTINC or A_POSTDEC AST node to the AST tree that we are operating on, and then leave it to the code generator to deal with the situation.

Up to now, this got resolved when we call either cgloadglob() or cgloadlocal() in cg.c to load the value of a global or local variable. For example:

int cgloadglob(struct symtable *sym, int op) {
  ...
  if (cgprimsize(sym->type) == 8) {
    if (op == A_PREINC)
      fprintf(Outfile, "\tincq\t%s(rip), %s\n", sym->name, reglist[r]);
 
    if (op == A_POSTINC)
      fprintf(Outfile, "\tincq\t%s(rbp), %s\n", sym->st_posn, reglist[r]);
    else
      fprintf(Outfile, "\tleaq\t%s(rbp), %s\n", sym->st_posn, reglist[r]); break;
      case 4: fprintf(Outfile, "\tmovslq\t%d(rbp), %s\n", sym->st_posn, reglist[r]);
    }
  } else {
    switch (sym->size) {
      case 1: fprintf(Outfile, "\tmovzbq\t%s(rip), %s\n", sym->name, reglist[r]); break;
      case 8: fprintf(Outfile, "\tmovq\t%s(rax\n", reglist[r1]);
  fprintf(Outfile, "\tcqo\n");
  fprintf(Outfile, "\tidivq\t%s\n", reglist[r2]);
  if (op== A_DIVIDE)
    fprintf(Outfile, "\tmovq\trdx,%s\n", reglist[r1]);
  free_register(r2);
  return (r1);
}

The tests/input147.c confirms that the above changes work:

#include <stdio.h>
 
int a;
 
int main() {
  printf("%d\n", 24 % 9);
  printf("%d\n", 31 % 11);
  a= 24; a %= 9; printf("%d\n",a);
  a= 31; a %= 11; printf("%d\n",a);
  return(0);
}

We are now at the point where our compiler can parse each and every of its own source code files. But when I try to link them, I get a warning about missing L0 labels.

After a bit of investigation, it turns out that I wasn’t properly propagating the end label for loops and switches in genIF() in gen.c. The fix is on line 49:

// Generate the code for an IF statement
// and an optional ELSE clause.
static int genIF(struct ASTnode *n, int looptoplabel, int loopendlabel) {
  ...
  // Optional ELSE clause: generate the
  // false compound statement and the
  // end label
  if (n->right) {
    genAST(n->right, NOLABEL, NOLABEL, loopendlabel, n->op);
    genfreeregs(NOREG);
    cglabel(Lend);
  }
  ...
}

Now that loopendlabel is being propagated, I can do this (in a shell script I call memake):

#!/bin/sh
make install

rm *.s *.o

for i in cg.c decl.c expr.c gen.c main.c misc.c \
        opt.c scan.c stmt.c sym.c tree.c types.c
do echo "./cwj -c $i"; ./cwj -c $i ; ./cwj -S $i
done

cc -o cwj0 cg.o decl.o expr.o gen.o main.o misc.o \
        opt.o scan.o stmt.o sym.o tree.o types.o

We end up with a binary, cwj0, which is the result of the compiler compiling itself.

$ size cwj0
   text    data     bss     dec     hex filename
 106540    3008      48  109596   1ac1c cwj0

$ file cwj0
cwj0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically
      linked, interpreter /lib64/l, for GNU/Linux 3.2.0, not stripped

Conclusion and What’s Next

For the pointer increment problem, I definitely had to scratch my head quite a lot and look at several possible alternate solutions. I did get halfway through trying to build a new AST tree with an A_SCALE in it. Then I tossed it all away and went for the change in cgloadvar(). That’s much nicer.

The modulo operators were simple to add (in theory), but annoyingly difficult to get everything synchronised (in practice). There is probably some scope to refactor here to make the synchronisation much easier.

Then, while trying to link all the object files that our compiler had made from its own source code, I found that we were not propagating loop/switch end labels properly.

We’ve now reached the point where our compiler can parse every one of its source code files, generate assembly code for them, and we can link them. We have reached the final stage of our journey, one that is probably going to be the most painful, the WDIW stage: why doesn’t it work?

Here, we don’t have a debugger, we are going to have to look at lots of assembly output. We’ll have to single-step assembly and look at register values.

In the next part of our compiler writing journey, I will start on the WDIW stage. We are going to need some strategies to make our work effective. Next step