- Make assertions a little more relaxed
- Increase timeouts of computer tests (again :D:).
- Log where we're up to in computer tests, to make tracking stalls a
little easier
- Rewrap everything at 80 columns. To make this tolerable I'm using
IDEA's language fragment support - hence the absurd line lengths.
- Add full stops to all comments.
- Clarify that HTTP rules are applied in-order.
- Parse Co-authored-by lines too. There's several contributors (mostly
via weblate, but a few GH ones too) who weren't credited otherwise.
- Add support for git mailmap, remapping some emails to a canonnical
username. This allows us to remove some duplicates - nobody needs
both SquidDev and "Jonathan Coates."
I'm not making this file public, as it contains email addresses. This
does mean that CI builds will still contain the full list with
duplicates.
- Add a TOC to the Local IPs page.
- Increase the echo delay in our speaker audio page to 1.5s. This
sounds much better and is less clashy than 1s. Also add a
sleep(0) (eww, I know) to fix timeouts on some browsers/computers.
- Move Lua feature compat to a new "reference" section. Still haven't
figured out how to structure these docs - open to any ideas really.
- Mention FFmpeg as an option for converting to DFPWM (closes#1075).
- Allow data-mount to override built-in files. See my comment in #1069.
Previously we would compute the current timeout flags every 128
instructions of the Lua machine. While computing these flags is
_relatively_ cheap (just get the current time), it still all adds up.
Instead we now set the timeout flags from the computer monitor/watchdog
thread. This does mean that the monitor thread needs to wake up more
often[^1] _if the queue is full_, otherwise we can sleep for 100ms as
before.
This does mean that pausing is a little less accurate (can technically
take up 2*period instead). This isn't great, but in practice it seems
fine - I've not noticed any playability issues.
This offers a small (but measurable!) boost to computer performance.
[^1]: We currently sleep for scaledPeriod, but we could choose to do less.
- Reset state while the server is starting rather than once it has
started. Hopefully fixes a weird issue where wireless modems wouldn't
be "connected" on server startup.
- Correctly reset the MainThread metrics on server start/stop.
Geesh, this is nasty. Because ComputerThread is incredibly stateful, and
we want to run tests in isolation, we run each test inside its own
isolated ClassLoader (and thus ComputerThread instance).
Everything else is less nasty, though still a bit ... yuck. We also
define a custom ILuaMachine which just runs lambdas[^1], and some
utilities for starting those.
This is then tied together for four very basic tests. This is sufficient
for the changes I want to make, but might be nice to test some more
comprehensive stuff later on (e.g. timeouts after pausing).
[^1]: Which also means the ILuaMachine implementation can be changed by
other mods[^2], if someone wants to have another stab at LuaJIT :p.
[^2]: In theory. I doubt its possible in practice because so much is
package private.
This is about 5-6x faster than using a String as we don't need to
allocate and copy multiple times. In the grand scheme of things, still
vastly overshadowed by the Lua interpreter, but worth doing.
- Start making the summary lines for modules a little better. Just say
what the module does, rather than "The X API does Y" or "Provides Y".
There's still a lot of work to be done here.
- Bundle prism.js on the page, so we can highlight non-Lua code.
- Copy our local_ips wiki page to the main docs.
This allows us to sync the position to the entity immediately, rather
than the sound jumping about.
Someone has set up rick-rolling pocket computers (<3 to whoever did
this), and the lag on them irritates me enough to fix this.
Fixes#1074
- Bump Forge version to latest RB.
- Generate an 8-bit audio stream again, as we no longer need to be
compatible with MC's existing streams.
No functionality changes, just mildly less hacky.
GlStateManager.glDeleteBuffers clears a buffer before deleting it on
Linux - I assume otherwise there's memory leaks on some drivers? - which
clobbers BufferUploader's cache. Roll our own version which resets the
cache when needed.
Also always reset the cache when deleting/creating a DirectVertexBuffer.
This /significantly/ improves performance of the VBO renderer (3fps to
80fps with 120 constantly changing monitors) and offers some minor FPS
improvements to the TBO renderer.
This also makes the 1.16 rendering code a little more consistent with
the 1.18 code, cleaning it up a little in the process.
Closes#1065 - this is a backport of those changes for 1.16. I will
merge these changes into 1.18, as with everything else (oh boy, that'll
be fun).
Please note this is only tested on my machine right now - any help
testing on other CPU/GPU configurations is much appreciated.
Historically I've been reluctant to do this as people might be running
Optifine for performance rather than shaders, and the VBO renderer was
significantly slower when monitors were changing.
With the recent performance optimisations, the difference isn't as bad.
Given how many people ask/complain about the TBO renderer and shaders, I
think it's worth doing this, even if it's not as granular as I'd like.
Also changes how we do the monitor backend check. We now only check for
compatibility if BEST is selected - if there's an override, we assume
the user knows what they're doing (a bold assumption, if I may say so
myself).
- For TBOs, we now pass cursor position, colour and blink state as
variables to the shader, and use them to overlay the cursor texture
in the right place.
As we no longer need to render the cursor, we can skip the depth
buffer, meaning we have to do one fewer upload+draw cycle.
- For VBOs, we bake the cursor into the main VBO, and switch between
rendering n and n+1 quads. We still need the depth blocker, but can
save one upload+draw cycle when the cursor is visible.
This saves significant time on the TBO renderer - somewhere between 4
and 7ms/frame, which bumps us up from 35 to 47fps on my test world (480
full-sized monitors, changing every tick). [Taken on 1.18, but should be
similar on 1.16]
Like #455, this sets our uniforms via a UBO rather than having separate
ones for each value. There are a couple of small differences:
- Have a UBO for each monitor, rather than sharing one and rewriting it
every monitor. This means we only need to update the buffer when the
monitor changes.
- Use std140 rather than the default layout. This means we don't have
to care about location/stride in the buffer.
Also like #455, this doesn't actually seem to result in any performance
improvements for me. However, it does make it a bit easier to handle a
large number of uniforms.
Also cleans up the generation of the main monitor texture buffer:
- Move buffer generation into a separate method - just ensures that it
shows up separately in profilers.
- Explicitly pass the position when setting bytes, rather than
incrementing the internal one. This saves some memory reads/writes (I
thought Java optimised them out, evidently not!). Saves a few fps
when updating.
- Use DSA when possible. Unclear if it helps at all, but nice to do :).
This takes a non-trivial amount of time on the render thread[^1], so
worth doing.
I don't actually think the allocation is the heavy thing here -
VisualVM says it's toWorldPos being slow. I'm not sure why - possibly
just all the block property lookups? [^2]
[^1]: To be clear, this is with 120 monitors and no other block entities
with custom renderers. so not really representative.
[^2]: I wish I could provide a narrower range, but it varies so much
between me restarting the game. Makes it impossible to benchmark
anything!
The VBO renderer needs to generate a buffer with two quads for each
cell, and then transfer it to the GPU. For large monitors, generating
this buffer can get quite slow. Most of the issues come from
IVertexBuilder (VertexConsumer under MojMap) having a lot of overhead.
By emitting a ByteBuffer directly (and doing so with Unsafe to avoid
bounds checks), we can improve performance 10 fold, going from
3fps/300ms for 120 monitors to 111fps/9ms.
See 41fa95bce4 and #1065 for some more
context and other exploratory work. The key thing to note is we _need_ a
separate version of FWFR for emitting to a ByteBuffer, as introducing
polymorphism to it comes with a significant performance hit.
- Move all RenderType instances into a common class.
Cherry-picked from 41fa95bce4:
- Render GL_QUADS instead of GL_TRIANGLES.
- Remove any "immediate mode" methods from FWFR. Most use-cases can be
replaced with the global MultiBufferSource and a proper RenderType
(which we weren't using correctly before!).
Only the GUI code (WidgetTerminal) needs to use the immediate mode.
- Pre-convert palette colours to bytes, storing both the coloured and
greyscale versions as a byte array.
Cherry-picked from 3eb601e554:
- Pass lightmap variables around the various renderers. Fixes#919 for
1.16!