Streaming data to the client

Sometimes we want to stream data to the client, perhaps because we don't know, or cannot compute, the size of the data in advance. Whatever the reason, we do not want to keep all the data in memory until it has been shipped to the client. Instead, we use chunked transfer encoding and send the data to the client in chunks. This is done in steps. First, the out/1 return value:

 {streamcontent, MimeType, FirstChunk}

is returned from the out/1 function. This makes the Erlang process within yaws that is processing the page go into a receive loop, waiting for more data. Another process in the Erlang system must then deliver data to the waiting process. Two asynchronous API functions can be used to deliver that data:

yaws_api:stream_chunk_deliver(YawsPid, Data)

and

yaws_api:stream_chunk_end(YawsPid)

The YawsPid argument is the process identifier of the yaws process that is processing the page, that is, the value of self() in the out/1 function of the .yaws file.

A programming example may make this clearer. Let's use a process that reads a random number of bytes from /dev/urandom as the source of the data:



out(_A) ->
    Self = self(),
    spawn(fun() ->
                  %% Pick a random size between 1 and 100000 bytes.
                  %% The rand module seeds itself on first use.
                  Sz = rand:uniform(100000),

                  %% Read random junk
                  S = "dd if=/dev/urandom count=1 bs=" ++
                      integer_to_list(Sz) ++ " 2>/dev/null",
                  P = open_port({spawn, S}, [binary, stream, eof]),

                  rec_loop(Self, P)
          end),

    {streamcontent, "application/octet-stream", <<>>}.


rec_loop(YawsPid, P) ->
    receive
        {P, {data, BinData}} ->
            yaws_api:stream_chunk_deliver(YawsPid, BinData),
            rec_loop(YawsPid, P);
        {P, eof} ->
            port_close(P),
            yaws_api:stream_chunk_end(YawsPid),
            exit(normal)
    end.




The slightly bizarre code above creates a process that reads a random number of bytes from /dev/urandom and sends them to the client, piece by piece.
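For comparison, a smaller producer might look like the following sketch. It is not taken from the yaws distribution: the count/3 and line/1 helpers, the limit of five lines, and the one-second pause are assumptions made purely for illustration. Only the streamcontent return value and the two yaws_api calls come from the description above.

```erlang
out(_A) ->
    YawsPid = self(),
    %% The producer runs in its own process; out/1 returns immediately.
    spawn(fun() -> count(YawsPid, 1, 5) end),
    {streamcontent, "text/plain", <<>>}.

%% Hypothetical helper: deliver the lines N..Max, one per second,
%% then tell yaws that the stream is complete.
count(YawsPid, N, Max) when N > Max ->
    yaws_api:stream_chunk_end(YawsPid);
count(YawsPid, N, Max) ->
    yaws_api:stream_chunk_deliver(YawsPid, line(N)),
    timer:sleep(1000),
    count(YawsPid, N + 1, Max).

%% Hypothetical helper: build one line of output as a binary.
line(N) ->
    iolist_to_binary(io_lib:format("tick ~p~n", [N])).
```

A client fetching this page should see the lines arrive one at a time rather than all at once, which is the visible effect of chunked delivery.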

There is also a version of the API that delivers the data in a blocking fashion. Whenever the producer of the stream is faster than the consumer (the WWW client), we must use this synchronous version so that undelivered data does not pile up in memory. The API function is:

yaws_api:stream_chunk_deliver_blocking(YawsPid, Data)
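As a rough sketch of where the blocking call helps, consider a producer that reads a large file, which is almost always faster than the network. The file path, the 64 kB block size, and the send_file/3 helper are assumptions chosen for illustration. The sketch also assumes that stream_chunk_deliver_blocking/2 returns ok once yaws has accepted the chunk; any other return value simply stops the loop.

```erlang
out(_A) ->
    YawsPid = self(),
    spawn(fun() ->
                  Deliver = fun(Bin) ->
                                    yaws_api:stream_chunk_deliver_blocking(YawsPid, Bin)
                            end,
                  %% "/tmp/large.bin" is a hypothetical path.
                  case file:open("/tmp/large.bin", [read, binary]) of
                      {ok, Fd} ->
                          send_file(Fd, 65536, Deliver),
                          file:close(Fd);
                      {error, _Reason} ->
                          ok
                  end,
                  %% Always end the stream so the yaws process stops waiting.
                  yaws_api:stream_chunk_end(YawsPid)
          end),
    {streamcontent, "application/octet-stream", <<>>}.

%% Hypothetical helper: read Fd in blocks of at most Size bytes and pass
%% each block to Deliver. Stops at end of file, on a read error, or as
%% soon as Deliver returns anything other than ok.
send_file(Fd, Size, Deliver) ->
    case file:read(Fd, Size) of
        {ok, Bin} ->
            case Deliver(Bin) of
                ok -> send_file(Fd, Size, Deliver);
                Other -> Other
            end;
        eof ->
            ok;
        {error, _} = Error ->
            Error
    end.
```

Because each call waits for yaws before the next block is read, at most one block is in flight at a time, whatever the size of the file.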

For applications that want to avoid buffering data in memory but do not want chunked transfer, or for applications that employ long-polling (Comet) techniques, another streamcontent variant lets the application send data directly on the client socket. Such applications should first return the following from their out/1 function:

 {streamcontent_from_pid, MimeType, Pid}

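where the Pid argument is the pid of the application process that will send the data, and MimeType is the MIME type of the data to be sent. When yaws is ready for Pid, it sends one of the following messages to it:

 {ok, YawsPid}
 {discard, YawsPid}

In both messages, YawsPid is the pid of the yaws process handling the request. On receiving {ok, YawsPid}, Pid may start sending data. On receiving {discard, YawsPid}, Pid should not send any data and must end the stream as described further below.

Pid can send data on the socket by calling: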

 yaws_api:stream_process_deliver(Socket, IoList)

where Socket is the client socket (A#arg.clisock in the argument to out/1) and IoList is the data to be sent. For chunked transfer, Pid must call:

 yaws_api:stream_process_deliver_chunk(Socket, IoList)

which tells yaws to use HTTP chunked transfer to send IoList. Applications using chunked transfer in this manner must always remember to end their data transfer by calling:

 yaws_api:stream_process_deliver_final_chunk(Socket, IoList)

where IoList is an iolist of size 0 or more. This creates the final HTTP chunk that the client uses to detect the end of the transfer.

When Pid finishes sending data, or when it receives a {discard, YawsPid} message, it must call:

 yaws_api:stream_process_end(Socket, YawsPid)

This informs yaws that Pid is finished with the socket and will no longer use it directly. If Pid has closed the socket itself, it must instead call:

 yaws_api:stream_process_end(closed, YawsPid)

This informs yaws that Pid has not only finished with the socket, but has also closed it. Yaws will not attempt to use the socket anymore after the application calls this function.
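Putting the chunked calls together, a process using streamcontent_from_pid might look like the sketch below. The parts/0, wait_ready/1, and send_parts/2 helpers and the text they send are assumptions for illustration only. The sketch also assumes it lives in a .yaws file, where the #arg record is available; other code would need to include yaws_api.hrl. No Content-Length header is returned, so yaws is expected to use chunked transfer encoding.

```erlang
out(A) ->
    Sock = A#arg.clisock,
    Pid = spawn(fun() -> wait_ready(Sock) end),
    {streamcontent_from_pid, "text/plain", Pid}.

%% Wait until yaws hands over the socket, then send or discard.
wait_ready(Sock) ->
    receive
        {ok, YawsPid} ->
            send_parts(Sock, parts()),
            yaws_api:stream_process_end(Sock, YawsPid);
        {discard, YawsPid} ->
            %% Nothing is to be sent; just give the socket back.
            yaws_api:stream_process_end(Sock, YawsPid)
    end.

%% Hypothetical payload, one binary per HTTP chunk.
parts() ->
    [<<"first part\n">>, <<"second part\n">>, <<"last part\n">>].

%% Send every part as a chunk; the last one goes out as the final chunk.
send_parts(Sock, []) ->
    yaws_api:stream_process_deliver_final_chunk(Sock, <<>>);
send_parts(Sock, [Last]) ->
    yaws_api:stream_process_deliver_final_chunk(Sock, Last);
send_parts(Sock, [Part | Rest]) ->
    yaws_api:stream_process_deliver_chunk(Sock, Part),
    send_parts(Sock, Rest).
```

Sending the last piece of data through stream_process_deliver_final_chunk/2 avoids a separate empty final chunk, but passing an empty binary, as the first clause does, works just as well.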

Applications using streamcontent_from_pid must set a Content-Length header in their out/1 return value if they want to avoid chunked transfer encoding. Yaws automatically sets the HTTP Transfer-Encoding header to chunked if it doesn't detect a Content-Length header. Alternatively, out/1 can return the {header, {transfer_encoding, erase}} header to disable chunked encoding.
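The two ways of avoiding chunked encoding can be summarized in a small helper. This is only a sketch: stream_reply/3 is a hypothetical function name, not part of yaws_api, and the use of the atom undefined to mean "size not known" is an assumption made here.

```erlang
%% Hypothetical helper: build the out/1 return value for a
%% streamcontent_from_pid response that must not be chunked.
stream_reply(MimeType, Pid, Size) when is_integer(Size), Size >= 0 ->
    %% Size known in advance: announce it with Content-Length.
    [{header, {content_length, Size}},
     {streamcontent_from_pid, MimeType, Pid}];
stream_reply(MimeType, Pid, undefined) ->
    %% Size unknown: remove the Transfer-Encoding header instead.
    [{header, {transfer_encoding, erase}},
     {streamcontent_from_pid, MimeType, Pid}].
```

An out/1 function could then end with, for example, stream_reply("application/octet-stream", Pid, Sz). In the second case the response carries neither a length nor chunk framing, so the application has to make sure the client can still tell where the body ends, for instance by closing the connection.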

Here's an example of using streamcontent_from_pid:



out(A) ->
    %% Pick a random size between 1 and 100000 bytes.
    %% The rand module seeds itself on first use.
    Sz = rand:uniform(100000),

    Pid = spawn(fun() ->
                        %% Read random junk
                        S = "dd if=/dev/urandom count=1 bs=" ++
                            integer_to_list(Sz) ++ " 2>/dev/null",
                        P = open_port({spawn, S}, [binary, stream, eof]),
                        rec_loop(A#arg.clisock, P)
                end),

    [{header, {content_length, Sz}},
     {streamcontent_from_pid, "application/octet-stream", Pid}].


rec_loop(Sock, P) ->
    receive
        {discard, YawsPid} ->
            yaws_api:stream_process_end(Sock, YawsPid);
        {ok, YawsPid} ->
            rec_loop(Sock, YawsPid, P)
    end,
    port_close(P),
    exit(normal).

rec_loop(Sock, YawsPid, P) ->
    receive
        {P, {data, BinData}} ->
            yaws_api:stream_process_deliver(Sock, BinData),
            rec_loop(Sock, YawsPid, P);
        {P, eof} ->
            yaws_api:stream_process_end(Sock, YawsPid)
    end.



The code above creates a process that reads a random number of bytes from /dev/urandom and sends them to the client via the socket.
