One of the biggest impact changes we made was a hardware read-barrier for GC - a simple instruction that tests invariants of freshly loaded pointers and takes a fast-trap if the test fails.
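To make the idea concrete, here is a minimal software sketch of what such a barrier does on every pointer load. It is only an illustration under my own assumptions: the names `Obj`, `Heap`, `load`, and `_trap` are hypothetical, the invariant tested (marked and not relocated) is one plausible choice rather than the actual hardware test, and a real implementation performs the fast-path check in a single instruction instead of in Python.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Obj:
    """A toy heap object; every name here is hypothetical."""
    name: str
    fields: dict = field(default_factory=dict)
    forwarded_to: Optional["Obj"] = None  # set when the collector moves the object
    marked: bool = False                  # set once the collector has seen it


class Heap:
    def __init__(self):
        self.traps = 0  # counts how often the slow path ran

    def relocate(self, obj):
        """Collector side: copy the object and leave a forwarding pointer."""
        copy = Obj(obj.name, dict(obj.fields), marked=obj.marked)
        obj.forwarded_to = copy
        return copy

    def load(self, holder, slot):
        """Mutator side: every pointer load passes through the barrier."""
        ref = holder.fields[slot]
        # Fast path: the invariant test (assumed here: marked, not relocated).
        if ref is None or (ref.marked and ref.forwarded_to is None):
            return ref
        # Test failed: take the slow path, standing in for the fast trap.
        return self._trap(holder, slot, ref)

    def _trap(self, holder, slot, ref):
        self.traps += 1
        while ref.forwarded_to is not None:  # follow forwarding to the new copy
            ref = ref.forwarded_to
        ref.marked = True
        holder.fields[slot] = ref  # heal the slot so later loads stay on the fast path
        return ref
```

In this sketch the first load of a stale pointer traps once and repairs the slot it came from, so subsequent loads of the same slot pass the test and pay only for the check.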
GC read-barrier enables a fully parallel & concurrent GC; we can sustain 40G/sec allocation on a 400G heap indefinitely, with max-pause times on the order of 10-20msec. This uber-GC is partially made possible because of the read barrier (and partially possible because we 'own' the OS and can play major page-mapping tricks).
Yes, wide tag per pointer. No problem (yet) with running out of classes. Big Java Apps these days seem to have about 2^15 classes.
The new insight I took from this is that effective hardware support for garbage collection does not have to be complicated. The other two quotes provide further evidence for opinions I espouse: cons all you need (10% of the heap per second!!), and object orientation is an inadequate paradigm for writing software, one that is now being stretched to absurdity (roughly 30,000 classes!!).
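For anyone who wants to check the two figures in parentheses, the arithmetic is short. This is just a back-of-the-envelope sketch using the numbers from the quotes; the variable names are mine, and the tag-width line only computes the minimum number of bits needed to distinguish that many classes, not the width the hardware actually uses.

```python
# Figures taken from the quotes above.
alloc_rate_g_per_sec = 40   # sustained allocation rate
heap_size_g = 400           # heap size
class_count = 2 ** 15       # "about 2^15 classes"

# Fraction of the heap allocated each second.
heap_fraction_per_sec = alloc_rate_g_per_sec / heap_size_g

# Seconds needed to allocate a full heap's worth of memory.
seconds_per_heap = heap_size_g / alloc_rate_g_per_sec

# Minimum bits needed to give each class a distinct tag value.
min_tag_bits = (class_count - 1).bit_length()

print(heap_fraction_per_sec)  # 0.1
print(seconds_per_heap)       # 10.0
print(class_count)            # 32768
print(min_tag_bits)           # 15
```

So the collector has to turn over the equivalent of the whole heap roughly every ten seconds, and 2^15 is 32,768, which is where the 30,000 figure comes from.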