Lately I’ve taken to hacking Affe quite a bit. Just a quick list of a few changes:
- Value types can now be the target of invocations.
- A value type will be automatically boxed when being cast to an interface that it implements.
- Any object type can now be unboxed. This fixes the case where a value type boxed into an interface type could not be unboxed without being cast to System.Object first.
- Any expression can be used as a boolean. Reference types will be compared to null, and numeric values will be compared to 0.
- Added the &&, ||, and ?: operators.
- null is now a keyword.
- Fixed a parser bug that would cause syntax errors when the sequence “(identifier)” was encountered and did not begin a CastExpression.
- The == and != operators can now be applied to reference types.
- OperatorExpression now checks for operator overloads and uses them when present.
- Arrays and types with indexers can now be indexed.
- Added support for single-line (//) comments.
- Added a “local bag” which is used when building locals.
The last item could use some explanation. It’s a size optimization really, and probably unneeded, but I’ve always thought something like this would be neat for mcs to have. This implementation is my proof-of-concept.
There are a lot of things that require extra locals to be defined. For example, if some method or property returns a value type and you’re invoking directly on it (say, somestring.Length.ToString()) then there are two IL sequences that could be used. Assuming the int is on the top of the stack, the first is: box System.Int32; callvirt instance string object::ToString(). This is what seems intuitive, but there’s a reason mcs does not do this. Boxing creates a new object, and that object must now be garbage collected. It’s much more efficient to use the stack for value types. So instead of boxing, a new local is declared, say “temp”, and this IL is used: stloc temp; ldloca temp; call instance string int32::ToString(). No boxed object, no virtual method resolution, no additional strain on the GC, no wasted heap. But now we have another local.
What if you hard-coded ten such calls? That’s right, you get ten locals. The JIT may be smart enough to optimize them out, but the IL is still going to be bloated. That’s where my local bag comes in. When such an expression is compiled in Affe, it asks the compiler state object for a local from the bag, for temporary use. The state will look through its list of unused locals and return the first one it finds of the same type, but will keep it in the unused list (since the caller said it’s temporary). If none can be found then it will create one and add it to the list. (If the local is requested for permanent use instead, it is removed from the list if it was found there; if a new one had to be created, it is not added to the list.)
The upshot of this is that if you make 10 hardcoded invocations against a value type that’s being returned from somewhere (or even a constant) they will all use the same local, assuming they’re all the same value type.
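The lookup rules above can be modeled in a few lines. This is only a sketch in Python, not Affe's actual source: the names Local, LocalBag, acquire, declared and unused are all made up for illustration, and locals are identified by nothing more than a slot index and a type name.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Local:
    """A declared IL local: its slot index and its type name."""
    index: int
    type_name: str


class LocalBag:
    """Hypothetical model of the compiler state's list of unused locals."""

    def __init__(self):
        self._next_index = count()
        self.declared = []  # every local that ends up in the method
        self.unused = []    # locals that may be handed out again

    def acquire(self, type_name, temporary):
        # Reuse the first unused local of the same type, if there is one.
        for local in self.unused:
            if local.type_name == type_name:
                if not temporary:
                    # Permanent use: nobody else may be handed this local.
                    self.unused.remove(local)
                return local
        # Nothing suitable: declare a new local.
        local = Local(next(self._next_index), type_name)
        self.declared.append(local)
        if temporary:
            # Temporary locals stay available for the next request.
            self.unused.append(local)
        return local


bag = LocalBag()
# Ten value-type invocations, each needing a scratch int32 local.
temps = [bag.acquire("System.Int32", temporary=True) for _ in range(10)]
print(len(bag.declared))  # prints 1: all ten requests shared one local
```

A permanent request made afterwards would take that same local out of the unused list, so the next temporary request of that type would have to declare a second one.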
Now consider the case where you have this code in C#: if (foo) { object o; ... } else { object o2; ... }. This is admittedly contrived, but it does the job. We have two scopes with a variable of the same type. With mcs you get two locals. With Affe you get one.
To maintain scoped variables, the Affe compiler state maintains a stack of symbol tables. Each block has its own table, which is pushed before the block is analyzed and popped afterward, then pushed and popped again for the compile pass. When a table is pushed, the state will check it for symbols that correspond to locals and will remove those locals from the local bag if present. Then when the table is popped, all of its locals are added to the bag. So in the example above, the “if” block’s table is pushed during analysis and a new local symbol is added. Then it’s popped, and the local gets put in the bag. When the “else” block’s analysis is run and it asks for a local of type object, the one from the bag is handed to it.
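Here is a rough Python model of that push/pop behavior, run against the if/else example. It is a sketch under assumptions, not Affe's code: CompilerState, SymbolTable, push_scope, pop_scope and declare_local are invented names, and only local-variable symbols are tracked.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Local:
    index: int
    type_name: str


@dataclass
class SymbolTable:
    """One block's symbols; only local-variable symbols are modeled here."""
    local_symbols: dict = field(default_factory=dict)  # name -> Local


class CompilerState:
    def __init__(self):
        self.declared = []  # every local in the method
        self.bag = []       # locals not owned by any active scope
        self.scopes = []    # stack of symbol tables

    def push_scope(self, table):
        # Locals already recorded in this table are in use again.
        for local in table.local_symbols.values():
            if local in self.bag:
                self.bag.remove(local)
        self.scopes.append(table)

    def pop_scope(self):
        # The block is done, so its locals become available to others.
        table = self.scopes.pop()
        for local in table.local_symbols.values():
            if local not in self.bag:
                self.bag.append(local)

    def declare_local(self, name, type_name):
        # Permanent request: take a matching local out of the bag, or
        # declare a new one if the bag has none of this type.
        for local in self.bag:
            if local.type_name == type_name:
                self.bag.remove(local)
                break
        else:
            local = Local(len(self.declared), type_name)
            self.declared.append(local)
        self.scopes[-1].local_symbols[name] = local
        return local


state = CompilerState()
if_table, else_table = SymbolTable(), SymbolTable()

# Analysis pass over: if (foo) { object o; } else { object o2; }
state.push_scope(if_table)
o = state.declare_local("o", "System.Object")
state.pop_scope()
state.push_scope(else_table)
o2 = state.declare_local("o2", "System.Object")
state.pop_scope()

print(o == o2, len(state.declared))  # prints: True 1
```

Sibling blocks share a local because the first block's table has been popped before the second asks. A nested block would not share with its parent, since the parent's local is out of the bag for as long as the parent's table is on the stack.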
It’s not a terribly major optimization, and I expect in Affe’s case it actually takes more time during compilation than it saves at runtime, but I’m curious whether this would be of interest to the mcs hackers. I imagine it would eliminate a fair amount of IL bloat.
The JIT will optimize the sharing of locals when it makes sense: I suggest not wasting time doing this kind of stuff in upper compilers unless they generate hundreds or thousands of locals.