The VBO renderer needs to generate a buffer with two quads for each
cell, and then transfer it to the GPU. For large monitors, generating
this buffer can get quite slow. Most of the issues come from
IVertexBuilder (VertexConsumer under MojMap) having a lot of overhead.
By emitting a ByteBuffer directly (and doing so with Unsafe to avoid
bounds checks), we can improve performance 10 fold, going from
3fps/300ms for 120 monitors to 111fps/9ms.
See 41fa95bce4408239bc59e032bfa9fdc7418bcb19 and #1065 for some more
context and other exploratory work. The key thing to note is we _need_ a
separate version of FWFR for emitting to a ByteBuffer, as introducing
polymorphism to it comes with a significant performance hit.