shell: waiting for a host.

A recurring pattern in my workflow is rebooting a machine and waiting for its network to come back up. The delay falls squarely within "too short to do something else, too long to stare at my screen". A shell function lets me chain the moment the host becomes reachable with my desktop notification bus (or, well, you know, anything). The host counts as ready once a TCP connection to a given port succeeds; the default port is 22, since access is usually through SSH.

# Run $@ until it succeeds, but allow SIGINT to cancel.
_cancellable_until () {(
    cancel="false"
    until "$@" > /dev/null; do :; done &
    trap 'kill $!; cancel="true"' INT
    wait $!
    trap - INT
    if [ $cancel = "true" ]; then
        return 1
    fi
)}

# ping $1 until it is up and port ${2:-22} is open.
waitfor () {
    port=${2:-22}
    _cancellable_until ping -W1 -c1 "$1" || return 1
    echo "Ping ok."
    _cancellable_until nc -w 1 -z "$1" "${port}" || return 1
    echo "Port ${port} is open."
}

# Usage:
$ waitfor srv0 2223 && notify-send "Host 'srv0' is ready."

Implementation

A much simpler implementation could be the following:

waitfor () {
    port=${2:-22}
    until ping -W1 -c1 "$1"; do :; done
    echo "Ping ok."
    until nc -w 1 -z "$1" "${port}"; do :; done
    echo "Port ${port} is open."
}

This is essentially the same. What I dislike about it is how until behaves in shell, and the problem is the same for any loop. When I cancel the call with Ctrl-C, SIGINT may be handled by the command being run rather than by the loop itself. The command then exits with a non-zero status, which until treats like any other failure: it does not break, and simply tries again.
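To make the exit-status side of this concrete, here is a small sketch. The two sh -c children are hypothetical stand-ins, not real probes: the first dies from SIGINT with the default disposition, the second catches SIGINT and exits by itself, as a command that cleans up on Ctrl-C might. It assumes bash or dash, and that SIGINT is not already ignored in the shell running it.

```shell
#!/bin/sh
# Child killed by SIGINT (default disposition): the parent shell
# reports 128 + the signal number.
sh -c 'kill -INT $$'
killed=$?

# Hypothetical child that traps SIGINT and exits on its own terms.
# To the caller this is indistinguishable from an ordinary failure.
sh -c 'trap "exit 1" INT; kill -INT $$'
handled=$?

echo "killed by SIGINT: ${killed}"
echo "handled SIGINT:   ${handled}"
```

In the second case nothing in the status says "the user asked to stop", so an until loop around such a command has no reason to break.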

This comes up whenever a loop retries a failing command until it succeeds. Cancelling such a loop cleanly is what _cancellable_until is for. The function will:

  1. Start the loop in a background process.
  2. Install a trap that kills that background process on SIGINT.
  3. Wait for that background process to finish.
  4. Return 1 if the child was killed using SIGINT.
  5. Launch everything in a sub-shell to hide output noise.
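The success path can be tried without a network. In the sketch below the helper is repeated verbatim so the snippet runs on its own; third_time_lucky and the counter file are hypothetical names made up for the illustration, standing in for a probe that fails a couple of times before succeeding.

```shell
#!/bin/sh
# Helper from the top of the post, repeated so this runs standalone.
_cancellable_until () {(
    cancel="false"
    until "$@" > /dev/null; do :; done &
    trap 'kill $!; cancel="true"' INT
    wait $!
    trap - INT
    if [ $cancel = "true" ]; then
        return 1
    fi
)}

# Hypothetical probe: fails until it has been called three times.
# The attempt count lives in a file because the loop runs in a
# background process and cannot update the caller's variables.
counter=$(mktemp)
echo 0 > "$counter"
third_time_lucky () {
    n=$(( $(cat "$counter") + 1 ))
    echo "$n" > "$counter"
    [ "$n" -ge 3 ]
}

_cancellable_until third_time_lucky
status=$?
attempts=$(cat "$counter")
rm -f "$counter"

echo "exit status: ${status}, attempts: ${attempts}"
```

The helper returns 0 once the probe succeeds. Pressing Ctrl-C during a longer wait should instead take the trap path and return 1, which is what lets waitfor stop at "|| return 1".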

The intended behavior is achieved, but at what cost! All of this just to have a clean way to cancel failure loops. More advanced shells may offer something better, but making waitfor both portable and clean is harder than it should be.