TimeoutState now introduces a TIMESLICE, which is the maximum period of
time a computer can run before we will look into pausing it.
When we have executed a task for more than this period, and if there are
other computers waiting to execute work, then we will suspend the
machine.
Suspending the machine sets a flag on the ComputerExecutor, and pauses
the "cumulative" time - the time spent handling this particular event.
When resuming the machine, we restart our timer and resume the machine.
Oh goodness, when will it end?
- Computer errors are shown in red.
- Lua machine operations provide whether they succeeded, and an
optional error message (reason bios failed to load, timeout error,
another Lua error), which is then shown to the user.
- Clear the Cobalt "thrown soft abort" flag when resuming, rather than
every n instructions.
- Computers will clear their "should start" flag once the time has
expired, irrespective of whether it turned on or not. Before
computers would immediately restart after shutting down if the flag
had been set much earlier.
Errors within the Lua machine are displayed in a more friendly
When closing a BufferedWriter, we close the underlying writer. As we're
using channels, this is an instance of sun.nio.cs.StreamEncoder. This
will attempt to flush the pending character.
However, if throwing an exception within .write errors, the flush will
fail and so the underlying stream is not closed. This was causing us to
leak file descriptors.
We fix this by introducing ChannelWrappers - this holds the wrapper
object (say, a BufferedWriter) and underlying channel. When closed, we
dispose of the wrapper, and then the channel. You could think of this as
doing a nested try-with-resources, rather than a single one.
Note, this is not related to JDK-6378948 - this occurs in the underlying
stream encoder instead.
- TimeoutState uses nanoseconds rather than milliseconds. While this is
slightly less efficient on Windows, it's a) not the bottleneck of Lua
execution and b) we need a monotonic counter, otherwise we could
fail to terminate computers if the time changes.
- Add an exception handler to all threads.
- Document several classes a little better - I'm not sure how useful
all of these are, but _hopefully_ it'll make the internals a little
more accessible.
- Instead of setting soft/hard timeouts on the ILuaMachine, we instead
provide it with a TimeoutState instance. This holds the current abort
flags, which can then be polled within debug hooks.
This means the Lua machine has to do less state management, but also
allows a more flexible implementation of aborts.
- Soft aborts are now handled by the TimeoutState - we track when the
task was started, and now only need to check we're more than 7s since
then.
Note, these timers work with millisecond granularity, rather than
nano, as this invokes substantially less overhead.
- Instead of having n runners being observed with n managers, we now
have n runners and 1 manager (or Monitor).
The runners are now responsible for pulling work from the queue. When
the start to execute a task, they set the time execution commenced.
The monitor then just checks each runner every 0.1s and handles hard
aborts (or killing the thread if need be).
- Rename unload -> close to be a little more consistent
- Make pollAndResetChanged be atomic, so we don't need to aquire a lock
- Get the computer queue from the task owner, rather than a separate
argument.
The latest version of Cobalt has several major changes, which I'm
looking forward to taking advantage of in the coming months:
- The Lua interpreter has been split up from the actual LuaClosure
instance. It now runs multiple functions within one loop, handling
pushing/popping and resuming method calls correctly.
This means we have a theoretically infinite call depth, as we're no
longer bounded by Java's stack size. In reality, this is limited to
32767 (Short.MAX_VALUE), as that's a mostly equivalent to the limits
PUC Lua exposes.
- The stack is no longer unwound in the event of errors. This both
simplifies error handling (not that CC:T needs to care about that)
but also means one can call debug.traceback on a now-dead coroutine
(which is more useful for debugging than using xpcall).
- Most significantly, coroutines are no longer each run on a dedicated
thread. Instead, yielding or resuming throws an exception to unwind
the Java stack and switches to a different coroutine.
In order to preserve compatability with CC's assumption about LuaJ's
threading model (namely that yielding blocks the thread), we also
provide a yieldBlock method (which CC:T consumes). This suspends the
current thread and switches execution to a new thread (see
SquidDev/Cobalt@b5ddf164f1 for more
details). While this does mean we need to use more than 1 thread,
it's still /substantially/ less than would otherwise be needed.
We've been running these changes on SwitchCraft for a few days now and
haven't seen any issues. One nice thing to observe is that the number of
CC thread has gone down from ~1.9k to ~100 (of those, ~70 are dedicated
to running coroutines). Similarly, the server has gone from generating
~15k threads over its lifetime, to ~3k. While this is still a lot, it's
a substantial improvement.
- Remove a redundant logger
- Provide a getter for the ComputerCraft thread group. This allows us
to monitor child threads within prometheus.
- Replace a deprecated call with a fastutils alternative.
- Keep track of the number of created and destroyed coroutines for each
computer.
- Run coroutines with a thread pool executor, which will keep stale
threads around for 60 seconds. This substantially reduces the
pressure from short-lived coroutines.
- Update to the latest Cobalt version.
Whilst I'm pretty sure this is safe for general use, I'm disabling this
by default for now. I may consider enabling it in the future if no
issues are found.