How I learned to never match on os:cmd output
Today I learned (2019-05-14)
A late change in requirements from a customer had me scrambling to switch an
HDFS connector script from a Python program to the standard Hadoop tool hdfs.
The application that was launching the connector script was written in Erlang, and was responsible for uploading some files to an HDFS endpoint, like so:
UploadCmd = lists:flatten(io_lib:format("hdfs put ~p ~p", [Here, There])),
"" = os:cmd(UploadCmd),
This was all fine and dandy when the UploadCmd was implemented in full by me.
When I switched out the Python script for the hdfs command, all my tests
continued to work, and the data was indeed being written successfully to my
local test HDFS node. So off to production it went.
Several hours later I got notified that there were some problems with the new
code. After inspecting the logs it became clear that the hdfs command was
producing unexpected output (WARN: blah blah took longer than expected (..)).
Since the output was no longer empty, the match against "" failed, and the
Erlang program treated the upload operation as failed.
As is the case for reasonable Erlang applications, the writing process would
crash upon a failed match, then restart and attempt to continue where it left
off: by trying to upload Here to There again. Now, this operation kept
legitimately failing, because it had in fact succeeded the first time, and HDFS
would not allow us to overwrite There (unless we added a -f flag to put).
The solution
The quick-and-dirty solution was to wrap the UploadCmd in a script that
captured the exit code, and then printed it out at the end, like so:
sh -c '{UploadCmd}; RES=$?; echo; echo $RES'
Now, your Erlang code can match on the last line of the output and interpret it as an integer exit code. Not the most elegant of solutions, but elegant enough to work around os:cmd/1's blindness to exit codes.
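A minimal sketch of the Erlang side of that workaround could look like the following. The module name cmd_status and the functions run/1 and parse/1 are hypothetical names chosen for illustration, not the code that went to production. It assumes OTP 20 or later (for string:trim/3 and string:split/3) and a command string that contains no single quotes.

```erlang
%% Hypothetical helper module: a sketch, not the original production code.
-module(cmd_status).
-export([run/1, parse/1]).

%% Wrap Cmd so that its exit status is printed as the last line of output,
%% then run it with os:cmd/1 and split the result into {ExitCode, Output}.
%% Assumption: Cmd contains no single quotes, since it is embedded in '...'.
-spec run(string()) -> {integer(), string()}.
run(Cmd) ->
    Wrapped = "sh -c '" ++ Cmd ++ "; RES=$?; echo; echo $RES'",
    parse(os:cmd(Wrapped)).

%% Split raw wrapper output into the exit code (last line) and everything
%% before it. Trailing newlines of the command output are dropped.
%% Crashes with badarg if the last line is not an integer.
-spec parse(string()) -> {integer(), string()}.
parse(Raw) ->
    Trimmed = string:trim(Raw, trailing, "\n"),
    {Body, CodeStr} =
        case string:split(Trimmed, "\n", trailing) of
            [B, C] -> {B, C};
            [C] -> {"", C}
        end,
    {list_to_integer(CodeStr), string:trim(Body, trailing, "\n")}.
```

With something like this in place, the calling code would match on the exit code instead of the output, for example {0, _Output} = cmd_status:run(UploadCmd), so a stray warning no longer fails the match.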
Lesson learned
The UNIX way states that programs should be silent on success and vocal on error. Sadly, many applications don't follow the UNIX way, and the bigger the application at hand, the higher the probability that one of its dependencies will use STDOUT or STDERR as its own personal scratchpad.
My lesson: never rely on os:cmd/1 output in production code, unless the command you're running is fully under your control, and you can be certain that its outputs are completely and exhaustively specified by you.
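For production code that needs the exit status, one alternative to parsing os:cmd/1 output is to run the command through a port with the exit_status option. This is a different technique from the wrapper I used above, sketched here under assumptions: the module name port_cmd and the function run/1 are hypothetical, and the command is expected to be an external program that eventually exits and closes its stdout.

```erlang
%% Hypothetical helper module: a sketch of a port-based alternative.
-module(port_cmd).
-export([run/1]).

%% Start Cmd as an external program and return {ExitStatus, Output}.
%% stderr is merged into stdout so warnings end up in Output, not in
%% the decision about success or failure.
-spec run(string()) -> {non_neg_integer(), binary()}.
run(Cmd) ->
    Port = erlang:open_port({spawn, Cmd},
                            [exit_status, stderr_to_stdout, binary]),
    collect(Port, []).

%% Accumulate output chunks until the port reports the exit status.
collect(Port, Acc) ->
    receive
        {Port, {data, Data}} ->
            collect(Port, [Data | Acc]);
        {Port, {exit_status, Status}} ->
            {Status, iolist_to_binary(lists:reverse(Acc))}
    end.
```

The caller can then match on {0, _Output} = port_cmd:run(UploadCmd) and log the output separately, whatever the tool decides to print.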
I do heavily rely on os:cmd output in test code, and I have no intention of stopping. Early feedback about unexpected output is great in tests.