Introduction
I recently encountered a bug in my application that caused it to hang during startup. The only change made was upgrading ZIO from version 2.0.x to 2.1.0. After some investigation, I discovered the issue was related to how I was forking fibers and a change in the behavior of Reloadable introduced in the new version. Let me explain what happened, because the change caught me by surprise. In retrospect, it all makes sense and works correctly; you just need to be aware of it.
Original Code
This is the original code that was “working” in ZIO 2.0.x:

    import zio.*
    import java.util.concurrent.TimeoutException

    def layer(config: Config): ZLayer[Any, Throwable, Socket] =
      ZLayer.scoped:
        val zio = for
          _         <- ZIO.logDebug(s"Connecting socket: ${config.host}:${config.port}")
          socket    <- makeSocket(config)
          _         <- ZIO.addFinalizer(socket.close *> ZIO.logInfo("Socket closed"))
          // The heartbeat is forked as a child of whichever fiber runs this effect.
          heartbeat <- socket.checkHeartbeat(config.heartbeatTimeout).repeat(Schedule.spaced(1.second)).fork
          _         <- ZIO.addFinalizer(heartbeat.interrupt)
        yield socket
        // timeoutFail wraps the whole initialization, including the fork above.
        zio.timeoutFail(TimeoutException("Socket initialization took more than 5 seconds!"))(5.seconds)

    def reloadable(config: Config): ZLayer[Any, Throwable, Reloadable[Socket]] =
      Reloadable.manual(layer(config))

The layer function creates a ZLayer that initializes a Socket and starts a heartbeat check in a separate fiber. The whole layer times out if it takes more than 5 seconds. The reloadable function makes the socket Reloadable because we need to be able to restart it (e.g., when the heartbeat fails).
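To make the restart part concrete, here is a minimal sketch of how a manual reload can be triggered. It does not use the real Socket: Connection, connectionLayer, and ReloadDemo are hypothetical names introduced only for this illustration, and the sketch assumes ZIO 2.x on Scala 3.

```scala
import zio.*

// Hypothetical stand-in for the Socket service; every acquisition gets a new id.
final case class Connection(id: Int)

object ReloadDemo extends ZIOAppDefault:
  // A scoped layer that creates a fresh Connection each time it is built
  // and prints a message when that Connection is released.
  def connectionLayer(ids: Ref[Int]): ZLayer[Any, Nothing, Connection] =
    ZLayer.scoped(
      ZIO.acquireRelease(ids.updateAndGet(_ + 1).map(Connection(_)))(c => ZIO.debug(s"closed $c"))
    )

  // Reads the current service, triggers a manual reload, then reads it again.
  val beforeAndAfter: IO[Any, (Connection, Connection)] =
    for
      ids    <- Ref.make(0)
      result <- ZIO.scoped(
                  for
                    env       <- Reloadable.manual(connectionLayer(ids)).build
                    reloadable = env.get[Reloadable[Connection]]
                    before    <- reloadable.get
                    _         <- reloadable.reload
                    after     <- reloadable.get
                  yield (before, after)
                )
    yield result

  val run = beforeAndAfter.flatMap((before, after) => ZIO.debug(s"before=$before, after=$after"))
```

Each reload runs the layer's acquisition again, which is why anything forked during acquisition (such as the heartbeat) is affected by how Reloadable treats interruption.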
I put “working” in quotes because the heartbeat was not actually running in this version; I just didn’t notice it. The problem was introduced by adding timeoutFail to the entire initialization. timeoutFail runs the wrapped effect in its own fiber and races it against the specified timeout. That fiber becomes the parent of every fiber forked inside it, and once the effect finishes (the socket is initialized), the parent fiber ends and all of its children, including the heartbeat, are interrupted. It is not very intuitive that adding timeoutFail changes the behavior so drastically, but as I said, it makes sense when you think about it.
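The effect is easy to reproduce in isolation. The following sketch replaces the heartbeat with a hypothetical ticker that increments a counter every 100 ms; TimeoutForkDemo, ticker, and initWithTimeout are names made up for this example, not part of my application.

```scala
import zio.*

object TimeoutForkDemo extends ZIOAppDefault:
  // Bumps the counter every 100 ms; a stand-in for the heartbeat loop.
  def ticker(counter: Ref[Int]): UIO[Nothing] =
    (counter.update(_ + 1) *> ZIO.sleep(100.millis)).forever

  // The fork happens inside the effect wrapped by timeoutFail, so the ticker
  // becomes a child of the fiber that timeoutFail uses for the race.
  def initWithTimeout(counter: Ref[Int]): UIO[Unit] =
    ticker(counter).fork.unit
      .timeoutFail(new RuntimeException("timed out"))(5.seconds)
      .orDie

  // Runs the initialization, waits one second, and reports the tick count.
  val ticksAfterOneSecond: UIO[Int] =
    for
      counter <- Ref.make(0)
      _       <- initWithTimeout(counter)
      _       <- ZIO.sleep(1.second)
      ticks   <- counter.get
    yield ticks

  val run = ticksAfterOneSecond.flatMap(n => ZIO.debug(s"ticks after 1 second: $n"))
```

If the ticker survived, roughly ten ticks would be recorded after one second. Because it is interrupted as soon as the wrapped effect completes, the count should stay at zero or one.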
ZIO Upgrade
ZIO 2.1.0 was recently released, containing a seemingly small change that made acquire in ScopedRef uninterruptible. This change also affects Reloadable because it uses ScopedRef internally. I noticed that an immediate fix was made in 2.1.1, which mentions Reloadable and hanging forever. This seemed very similar to my situation, as my application would also hang forever during socket initialization.
It took me a while to debug and understand what was happening. However, knowing how timeoutFail works and that acquisition was made uninterruptible in Reloadable, it all clicked. The heartbeat is forked inside the Reloadable acquisition, so it runs in an uninterruptible context and cannot be interrupted. At the same time, the timeoutFail bug I already had tries to interrupt it immediately. The interruption waits for a fiber that never stops, and as a result, it all hangs forever.
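The inheritance of interruptibility can be sketched without Reloadable at all. In the example below, a plain uninterruptible region stands in for the Reloadable acquisition, and a three-second sleep stands in for the endless heartbeat so that the program still terminates. InheritedInterruptibilityDemo and interruptedInTime are hypothetical names used only here.

```scala
import zio.*

object InheritedInterruptibilityDemo extends ZIOAppDefault:
  // Forks `task` from inside an uninterruptible region, then reports whether
  // interrupting the child completed within one second.
  def interruptedInTime(task: UIO[Unit]): UIO[Boolean] =
    for
      fiber  <- task.fork.uninterruptible
      result <- fiber.interrupt.timeout(1.second)
    yield result.isDefined

  val run =
    for
      // The child inherits the uninterruptible status of the forking fiber.
      inherited <- interruptedInTime(ZIO.sleep(3.seconds))
      // Marking the task itself as interruptible restores normal behavior.
      restored  <- interruptedInTime(ZIO.sleep(3.seconds).interruptible)
      _         <- ZIO.debug(s"inherited status, interrupted in time: $inherited")
      _         <- ZIO.debug(s"explicitly interruptible, interrupted in time: $restored")
    yield ()
```

With the inherited status, the interrupt call has to wait until the child finishes on its own. With a heartbeat that repeats forever, that wait never ends.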
Fix
Two things must be done to fix the situation. First, we need to make the heartbeat interruptible. That’s easy. Second, we need to fork the heartbeat in a different scope so that it does not get interrupted by timeoutFail. After discussions with my colleagues, I think it’s always a good idea to fork long-running tasks in a specific scope so that the introduction of a combinator like timeoutFail can’t break your application.

    def layer(config: Config): ZLayer[Any, Throwable, Socket] =
      ZLayer.scoped:
        val zio = for
          _         <- ZIO.logDebug(s"Connecting socket: ${config.host}:${config.port}")
          socket    <- makeSocket(config)
          _         <- ZIO.addFinalizer(socket.close *> ZIO.logInfo("Socket closed"))
          // Take the layer's Scope so the heartbeat lives as long as the layer does.
          scope     <- ZIO.service[Scope]
          // Explicitly interruptible, and attached to the scope rather than to the
          // fiber created by timeoutFail. The scope interrupts it when it closes.
          heartbeat <- socket.checkHeartbeat(config.heartbeatTimeout).repeat(Schedule.spaced(1.second)).interruptible.forkIn(scope)
        yield socket
        zio.timeoutFail(TimeoutException("Socket initialization took more than 5 seconds!"))(5.seconds)

    def reloadable(config: Config): ZLayer[Any, Throwable, Reloadable[Socket]] =
      Reloadable.manual(layer(config))

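To see that forking into the scope really changes the fiber's lifetime, here is a small sketch with the same shape as the fixed layer. It uses forkScoped, which forks into the Scope found in the environment and is a shorter way of writing the ZIO.service[Scope] plus forkIn(scope) pair. The ticker again stands in for the heartbeat, and ForkScopedDemo, ticker, initWithTimeout, and observe are hypothetical names.

```scala
import zio.*

object ForkScopedDemo extends ZIOAppDefault:
  // Bumps the counter every 100 ms; a stand-in for the heartbeat loop.
  def ticker(counter: Ref[Int]): UIO[Nothing] =
    (counter.update(_ + 1) *> ZIO.sleep(100.millis)).forever

  // The fork still sits inside timeoutFail, but the fiber is attached
  // to the enclosing Scope instead of the fiber running the race.
  def initWithTimeout(counter: Ref[Int]): URIO[Scope, Unit] =
    ticker(counter).forkScoped.unit
      .timeoutFail(new RuntimeException("timed out"))(5.seconds)
      .orDie

  // Returns (ticks seen while the scope was open, extra ticks after it closed).
  val observe: UIO[(Int, Int)] =
    for
      counter <- Ref.make(0)
      during  <- ZIO.scoped(initWithTimeout(counter) *> ZIO.sleep(1.second) *> counter.get)
      atClose <- counter.get
      _       <- ZIO.sleep(500.millis)
      later   <- counter.get
    yield (during, later - atClose)

  val run = observe.flatMap((during, after) => ZIO.debug(s"ticks while open: $during, after close: $after"))
```

The ticker should keep running for as long as the scope is open and stop once the scope closes, which is the lifetime we want for the heartbeat: tied to the layer, not to the initialization timeout.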
Conclusion
I hope this post helps someone understand the details of how forking and interruption work in ZIO. It was very puzzling for me at first, but I now have a better understanding of ZIO and how to use it correctly.
To reiterate the main takeaways:
- Fork long-running tasks in a specific scope to control when they are interrupted.
- Be aware of the interruptibility of your fibers, especially when using Reloadable.