Figuring out a gen_tcp:recv limitation
2019-02-18
In which a surprisingly pernicious framed payload leads to OTP spelunking.
The setup: sending a string over TCP
Let's say you want to send the ASCII string Fiat lux! to an Erlang process listening on the other side of a TCP connection. Not a big deal, right?
Our sending application is written in Python. Here's what it might look like:
#!/usr/bin/env python3
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("127.0.0.1", 7777))
data_to_send = b"Fiat Lux!"
sock.sendall(data_to_send)
... and here's the receiving Erlang application:
#!/usr/bin/env escript
main(_) ->
    {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}]),
    {ok, Sock} = gen_tcp:accept(L),
    {ok, String} = gen_tcp:recv(Sock, 0),
    io:format("Got string: ~ts~n", [String]),
    erlang:halt(0).
If we start the Erlang receiver (in shell 1), then run the Python sender (in shell 2), we should see the receiver emit the following:
$ ./receive.escript
Got string: Fiat Lux!
$
As you can see, we optimistically sent all our data over TCP from the Python app, and received all that data, intact, on the other side. What's important here is that our Erlang socket is in passive mode, which means that incoming TCP data needs to be recv'd off of the socket. The second argument in gen_tcp:recv(Sock, 0) means that we want to read however many bytes are available to be read from the OS's network stack. In this case all our data was kindly provided to us in one nice chunk.
Success! Our real production application will be dealing with much bigger pieces of data, so it behooves us to test with a larger payload. Let's try a thousand characters.
More data
We update the sender and receiver as follows:
#!/usr/bin/env python3
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("127.0.0.1", 7777))
data_to_send = b'a' * 1000
sock.sendall(data_to_send)
#!/usr/bin/env escript
main(_) ->
    {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}]),
    {ok, Sock} = gen_tcp:accept(L),
    {ok, String} = gen_tcp:recv(Sock, 0),
    io:format("Got string of length: ~p~n", [byte_size(String)]),
    erlang:halt(0).
When we run our experiment, we see that our Erlang process does indeed get all 1000 bytes. Let's add one more zero to the payload.
#!/usr/bin/env python3
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("127.0.0.1", 7777))
data_to_send = b'a' * 10000
sock.sendall(data_to_send)
And we hit our first snag!
Got string of length: 1460
Aha! Our gen_tcp:recv(Sock, 0) call asked the OS to give us whatever bytes it had ready in the TCP buffer, and so that's what we received. TCP is a streaming protocol, and there is no guarantee that a given sequence of bytes received on the socket will correspond to a logical message in our application layer. The low-effort way of handling this issue is by prefixing every logical message on the TCP socket with a known-width integer, representing the length of the message in bytes. "Low-effort" sounds like the kind of thing you put in place when the deadline was yesterday. Onward!
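If you'd like to watch the stream semantics at work without Erlang in the picture, here's a minimal Python sketch. It pushes 10000 bytes through a local socket pair and reads them back in pieces. The helper names (read_until_closed, demo) and the 4096-byte read size are made up for illustration, and how the bytes get split into chunks will vary from machine to machine.

```python
import socket
import threading


def read_until_closed(sock, bufsize=4096):
    """Collect whatever recv() hands us until the peer closes the socket."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)  # returns at most bufsize bytes, often fewer
        if not chunk:               # empty bytes means the peer closed
            break
        chunks.append(chunk)
    return chunks


def demo(payload_size=10000):
    """Send payload_size bytes over a local socket pair and return the chunks read."""
    sender, receiver = socket.socketpair()

    def send():
        sender.sendall(b"a" * payload_size)
        sender.close()

    t = threading.Thread(target=send)
    t.start()
    chunks = read_until_closed(receiver)
    t.join()
    receiver.close()
    return chunks


if __name__ == "__main__":
    chunks = demo()
    print("chunk sizes:", [len(c) for c in chunks])
    print("total bytes:", sum(len(c) for c in chunks))
```

One sendall on the sending side shows up as several reads on the receiving side. Only the total is guaranteed to match, which is exactly why the receiver needs some way to know where a message ends.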
Let's take our initial string as an example. Instead of sending the following sequence of 9 bytes on the wire:
Ascii: F i a t ␣ l u x !
Binary: 70 105 97 116 32 108 117 120 33
We'd first prefix it with a 32-bit integer representing its size in bytes, and then append the binary, giving 13 bytes in total.
Ascii: ␀ ␀ ␀ ␉ F i a t ␣ l u x !
Binary: 0 0 0 9 70 105 97 116 32 108 117 120 33
Now, the first 4 bytes that reach our receiver can be interpreted as the length of the next logical message. We can use this number to tell gen_tcp:recv how many bytes we want to read from the socket.
To encode an integer into 32 bits, we'll use Python's struct module. struct.pack(">I", 9) will do exactly what we want: encode a 32-bit unsigned Integer (9, in this case) in Big-endian (or network) order.
#!/usr/bin/env python3
import socket
import struct
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("127.0.0.1", 7777))
data_to_send = b'a' * 10000
header = struct.pack(">I", len(data_to_send))
sock.sendall(header + data_to_send)
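As a sanity check on the framing itself, here's a small sketch that builds the frame for our original nine-byte string and prints the bytes. The frame helper is a name invented for this example; it just wraps the struct.pack call used above.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Prefix payload with its length as a 4-byte big-endian unsigned integer."""
    return struct.pack(">I", len(payload)) + payload


framed = frame(b"Fiat lux!")
print(len(framed))   # 13
print(list(framed))  # [0, 0, 0, 9, 70, 105, 97, 116, 32, 108, 117, 120, 33]
```

The printed bytes match the 13-byte layout shown earlier: four header bytes, then the nine bytes of the message.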
On the decoding side, we'll break up the receiving into two parts:
1) Read 4 bytes from the socket, interpret these as Header, a 32-bit unsigned int.
2) Read Header bytes off the socket. The receiving Erlang process will 'block' until that much data is read (or until the other side disconnects). The received bytes constitute a logical message.
#!/usr/bin/env escript
main(_) ->
    {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}]),
    {ok, Sock} = gen_tcp:accept(L),
    {ok, <<Header:32>>} = gen_tcp:recv(Sock, 4),
    io:format("Got header: ~p~n", [Header]),
    {ok, String} = gen_tcp:recv(Sock, Header),
    io:format("Got string of length: ~p~n", [byte_size(String)]),
    erlang:halt(0).
When we run our scripts, we'll see the Erlang receiver print the following:
Got header: 10000
Got string of length: 10000
Success! But apparently, our application needs to handle messages much bigger than 10 kilobytes. Let's see how far we can take this approach.
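For comparison, the same two-step read can be sketched in Python. Python's sock.recv(n) returns at most n bytes rather than waiting for exactly n, so the sketch needs a small loop to get the "block until that much data is read" behaviour that gen_tcp:recv(Sock, N) gives us. The names recv_exact and read_frame are hypothetical helpers, not part of any library.

```python
import socket
import struct


def recv_exact(sock, n):
    """Keep reading until exactly n bytes have arrived, or fail if the peer closes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def read_frame(sock):
    """Read a 4-byte big-endian length header, then that many payload bytes."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # Local socket pair standing in for a real TCP connection.
    a, b = socket.socketpair()
    a.sendall(struct.pack(">I", 9) + b"Fiat lux!")
    print(read_frame(b))
    a.close()
    b.close()
```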
Yet more data
Can we do a megabyte? Ten? A hundred? Let's find out, using the following loop for the sender:
#!/usr/bin/env python3
import socket
import struct
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("127.0.0.1", 7777))
for l in [1000, 1000*1000, 10*1000*1000, 100*1000*1000]:
    data_to_send = b'a' * l
    header = struct.pack(">I", len(data_to_send))
    sock.sendall(header + data_to_send)
sock.close()
...and a recursive receive function for the receiver:
#!/usr/bin/env escript
recv(Sock) ->
    {ok, <<Header:32>>} = gen_tcp:recv(Sock, 4),
    io:format("Got header: ~p~n", [Header]),
    {ok, String} = gen_tcp:recv(Sock, Header),
    io:format("Got string of length: ~p~n", [byte_size(String)]),
    recv(Sock).
main(_) ->
    {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}]),
    {ok, Sock} = gen_tcp:accept(L),
    recv(Sock).
Running this will lead to our Erlang process crashing with an interesting message:
Got header: 1000
Got string of length: 1000
Got header: 1000000
Got string of length: 1000000
Got header: 10000000
Got string of length: 10000000
Got header: 100000000
escript: exception error: no match of right hand side value {error,enomem}
enomem looks like a strange kind of error indeed. It happens when we read the header announcing a 100-megabyte payload and then attempt to read that payload off the socket. Let's go spelunking to find out where this error is coming from.
Spelunking for {error, enomem}
First, let's take a look at what gen_tcp:recv does with its arguments. It seems that it checks inet_db to find our socket, and calls recv on that socket.
OK, let's check out inet_db. Looks like it retrieves module information stored via erlang:port_set_data, in the call above.
Grepping for calls to inet_db:register_socket reveals that multiple modules register themselves this way. Among these, we find one of particular interest.
lib/kernel/src/inet_tcp.erl
169: inet_db:register_socket(S, ?MODULE),
177: inet_db:register_socket(S, ?MODULE),
Let's see how inet_tcp.erl implements recv. Hmm, just a pass-through to prim_inet. Let's look there.
It seems here that our Erlang call-chain bottoms out in a call to ctl_cmd, which is itself a wrapper to erlang:port_control, sending control data over into C-land. We'll need to look at our TCP port driver to figure out what comes next.
case ctl_cmd(S, ?TCP_REQ_RECV, [enc_time(Time), ?int32(Length)])
A slight hitch is finding the source code for this driver. Perhaps the macro ?TCP_REQ_RECV can help us find what we're after?
$ rg 'TCP_REQ_RECV'
lib/kernel/src/inet_int.hrl
100:-define(TCP_REQ_RECV, 42).
erts/preloaded/src/prim_inet.erl
584: case ctl_cmd(S, ?TCP_REQ_RECV, [enc_time(Time), ?int32(Length)]) of
erts/emulator/drivers/common/inet_drv.c
735:#define TCP_REQ_RECV 42
10081: case TCP_REQ_RECV: {
10112: if (enq_async(INETP(desc), tbuf, TCP_REQ_RECV) < 0)
A-ha! inet_drv.c, here we come!
Indeed, this C function here, responsible for the actual call to sock_select, will proactively reject recv calls where the requested payload size n is bigger than TCP_MAX_PACKET_SIZE:
if (n > TCP_MAX_PACKET_SIZE)
    return ctl_error(ENOMEM, rbuf, rsize);
and TCP_MAX_PACKET_SIZE itself is defined in the same source file as:
#define TCP_MAX_PACKET_SIZE 0x4000000 /* 64 M */
thereby explaining our weird ENOMEM error.
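A quick bit of arithmetic confirms that this limit lines up with what we saw. The sketch below copies the constant from the #define quoted above and runs the driver's n > TCP_MAX_PACKET_SIZE comparison against the four payload sizes from our sender. The exceeds_limit helper is made up for this example and only mirrors that one comparison, not anything else the driver does.

```python
# Value copied from the #define in inet_drv.c quoted above.
TCP_MAX_PACKET_SIZE = 0x4000000


def exceeds_limit(n):
    """Mirror the driver's check: reject when the requested length is over the cap."""
    return n > TCP_MAX_PACKET_SIZE


print(TCP_MAX_PACKET_SIZE)                       # 67108864
print(TCP_MAX_PACKET_SIZE == 64 * 1024 * 1024)   # True

for size in [1000, 1000 * 1000, 10 * 1000 * 1000, 100 * 1000 * 1000]:
    verdict = "rejected" if exceeds_limit(size) else "ok"
    print(size, verdict)
```

Only the 100,000,000-byte request is over the 67,108,864-byte cap, which matches the point at which our receiver fell over.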
Now, how to solve this conundrum? A possible approach would be to maintain some state in our receiver, optimistically read as much data as possible, and then try to reconstruct the logical messages, perhaps using something like erlang:decode_packet to take care of the book-keeping for us.
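To give a feel for what that bookkeeping involves, here's a rough sketch of a stateful decoder, written in Python for brevity. It is not erlang:decode_packet, just a stand-in for the same idea: buffer whatever chunks arrive, and hand back complete length-prefixed messages as soon as they are available. The FrameDecoder class and its feed method are hypothetical names, and a production version would also want an upper bound on the announced length.

```python
import struct


class FrameDecoder:
    """Accumulate arbitrary chunks and emit complete length-prefixed messages."""

    HEADER_SIZE = 4  # 32-bit big-endian length prefix, as used throughout this post

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add a chunk to the buffer and return the list of messages now complete."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= self.HEADER_SIZE:
            (length,) = struct.unpack_from(">I", self._buf, 0)
            end = self.HEADER_SIZE + length
            if len(self._buf) < end:
                break  # payload not fully here yet; wait for more data
            messages.append(bytes(self._buf[self.HEADER_SIZE:end]))
            del self._buf[:end]  # drop the consumed frame, keep any leftover bytes
        return messages


if __name__ == "__main__":
    stream = struct.pack(">I", 9) + b"Fiat lux!" + struct.pack(">I", 3) + b"abc"
    decoder = FrameDecoder()
    # Feed the stream in awkward 5-byte pieces to mimic arbitrary TCP chunking.
    for i in range(0, len(stream), 5):
        for message in decoder.feed(stream[i:i + 5]):
            print(message)
```

Because the decoder never asks the socket for more than a modest chunk at a time, no single read would run into the driver's size cap.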
Taking a step back — and finding a clean solution
Before we jump to writing more code, let's consider our position. We're trying to read a framed message off of a TCP stream. It's been done thousands of times before. Surely the sage developers whose combined experience is encoded in OTP have thought of an elegant solution to this problem?
It turns out that if you read the very long man entry for inet:setopts, you'll eventually come across this revealing paragraph:
{packet, PacketType}(TCP/IP sockets)
Defines the type of packets to use for a socket. Possible values:
raw | 0
No packaging is done.
1 | 2 | 4
Packets consist of a header specifying the number of bytes in the packet, followed by that number of bytes. The header length can be one, two, or four bytes, and containing an unsigned integer in big-endian byte order. Each send operation generates the header, and the header is stripped off on each receive operation.
The 4-byte header is limited to 2Gb.
Packets consist of a header specifying the number of bytes in the packet, followed by that number of bytes. Yes indeed they do! Let's try it out!
#!/usr/bin/env escript
recv(Sock) ->
    {ok, String} = gen_tcp:recv(Sock, 0),
    io:format("Got string of length: ~p~n", [byte_size(String)]),
    recv(Sock).
main(_) ->
    {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}, {packet, 4}]),
    {ok, Sock} = gen_tcp:accept(L),
    recv(Sock).
And the output is:
Got string of length: 1000
Got string of length: 1000000
Got string of length: 10000000
Got string of length: 100000000
escript: exception error: no match of right hand side value {error,closed}
Problem solved! (The last error is from a recv call on the socket after it has been closed from the Python side). Turns out that our TCP framing pattern is in fact so common, it's been subsumed by OTP as a mere option for gen_tcp sockets!
If you'd like to know why setting this option lets us sidestep the TCP_MAX_PACKET_SIZE check, I encourage you to take a dive into the OTP codebase and find out. It's surprisingly easy to navigate, and full of great code.
And if you ever find yourself fighting a networking problem using brute-force in Erlang, please consider the question: "Perhaps this was solved long ago and the solution lives in OTP?" Chances are, the answer is yes!