Implementing POSIX Sockets for Linux Fast-STREAMS
Design for Linux
Brian F. G. Bidulock1
OpenSS7 Corporation
June 16, 2007
Abstract:
UNIX networking has a rich history. The TCP/IP protocol suite was first
implemented by BBN using Sockets under a DARPA research project on 4.1aBSD and
then incorporated by the CSRG into 4.2BSD (MBKQ97). Lachmann and
Associates (Legent) subsequently implemented one of the first TCP/IP protocol
suites based on the Transport Layer Interface (TLI) (TLI92) and STREAMS
(GC94). Two other predominant TCP/IP implementations on STREAMS
surfaced at about the same time: Wollongong and Mentat.
STREAMS is a facility first presented in a paper by Dennis M. Ritchie in 1984
(Rit84), originally implemented on 4.1BSD and later part of the
Bell Laboratories Eighth Edition UNIX, incorporated into UNIX
System V Release 3 and enhanced in UNIX System V Release 4 and
further in UNIX System V Release 4.2. STREAMS was used in SVR3 for
networking (in the NSU package) and in SVR4 for terminal input-output,
pseudo-terminals, pipes, named pipes (FIFOs), interprocess communication and
networking. Since its release in System V Release 3, STREAMS has been
implemented across a wide range of UNIX, UNIX-like and UNIX-based systems,
making its implementation and use a de facto standard.
STREAMS is a facility that allows for a reconfigurable full duplex
communications path, called a Stream, between a user process and a driver in
the kernel. Kernel protocol modules can be pushed onto and popped from the
Stream between the user process and driver. The Stream can
be reconfigured in this way by a user process. The user process, neighbouring
protocol modules and the driver communicate with each other using a message
passing scheme. This permits a loose coupling between protocol modules,
drivers and user processes, allowing a third-party and loadable kernel module
approach to be taken toward the provisioning of protocol modules on platforms
supporting STREAMS.
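The push and pop reconfiguration described above can be pictured with a small user-space model. This is only a sketch and not the STREAMS API: on a system with STREAMS, a process would issue the I_PUSH and I_POP input-output controls on an open Stream. Every name below (toy_stream, toy_push, toy_pop, toy_put_downstream and the two example modules) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_MODULES 8

/* A module transforms a message in place as it travels downstream. */
typedef void (*toy_module_fn)(char *msg, size_t cap);

/* Toy Stream: modules[depth - 1] sits directly below the Stream head. */
struct toy_stream {
    toy_module_fn modules[TOY_MAX_MODULES];
    size_t depth;
};

/* Push a module directly below the Stream head; 0 on success, -1 if full. */
int toy_push(struct toy_stream *s, toy_module_fn m)
{
    assert(s != NULL && m != NULL);
    if (s->depth == TOY_MAX_MODULES)
        return -1;
    s->modules[s->depth++] = m;
    return 0;
}

/* Pop the module nearest the Stream head; 0 on success, -1 if none. */
int toy_pop(struct toy_stream *s)
{
    assert(s != NULL);
    if (s->depth == 0)
        return -1;
    s->depth--;
    return 0;
}

/* Pass a message downstream: the most recently pushed module sees it first. */
void toy_put_downstream(const struct toy_stream *s, char *msg, size_t cap)
{
    assert(s != NULL && msg != NULL);
    for (size_t i = s->depth; i > 0; i--)
        s->modules[i - 1](msg, cap);
}

/* Example module: convert ASCII lower case to upper case. */
void toy_upper(char *msg, size_t cap)
{
    for (size_t i = 0; i < cap && msg[i] != '\0'; i++)
        if (msg[i] >= 'a' && msg[i] <= 'z')
            msg[i] = (char)(msg[i] - 'a' + 'A');
}

/* Example module: append '!' if the buffer has room. */
void toy_tag(char *msg, size_t cap)
{
    size_t n = strlen(msg);
    if (n + 1 < cap) {
        msg[n] = '!';
        msg[n + 1] = '\0';
    }
}
```

Pushing toy_upper and then toy_tag makes a message pass through toy_tag first and toy_upper second. Popping once removes toy_tag and leaves the rest of the path untouched, which is the loose coupling the text describes.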
On UNIX System V Release 4.2, STREAMS was used for terminal
input-output, pipes, FIFOs (named pipes), and network communications. Modern
UNIX, UNIX-like and UNIX-based systems providing STREAMS normally support some
degree of network communications using STREAMS; however, many do not support
STREAMS-based pipes and FIFOs2 or terminal
input-output3 without system reconfiguration.
UNIX System V Release 4.2 supported four Application Programming
Interfaces (APIs) for accessing the network communications facilities of the
kernel:
- Transport Layer Interface (TLI).
TLI is an acronym for the Transport Layer Interface
(TLI92). The TLI was the non-standard interface provided by
SVR3 and SVR4, later standardized by X/Open as the XTI
described below. This interface operated differently than the XTI in subtle
ways, and is now deprecated.
- X/Open Transport Interface (XTI).
XTI is an acronym for the X/Open Transport Interface
(XTI99). The X/Open Transport Interface is a standardization of
the UNIX System V Release 4 Transport Layer Interface. The
interface consists of an Application Programming Interface implemented as a
shared object library. The shared object library communicates with a
transport provider Stream using a service primitive interface called
the Transport Provider Interface (TPI) (TPI99).
While XTI was implemented directly over STREAMS devices supporting
the Transport Provider Interface (TPI) (TPI99) under SVR4,
several non-traditional approaches exist in implementation.
- Berkeley Sockets.
Sockets uses the BSD interface that was developed by BBN for the TCP/IP
protocol suite under DARPA contract on 4.1aBSD and released in 4.2BSD. BSD
Sockets provides a set of primary API functions that are typically implemented
as system calls. The BSD Sockets interface is non-standard, operates
differently from the POSIX interface in subtle ways, and is now deprecated in
favour of the POSIX/SUS standard Sockets interface.
- POSIX Sockets.
Sockets were standardized by X/Open, later The
Open Group,4 and IEEE in the POSIX
standardization process. They appear in XNS 5.2 (XNS99), SUSv1
(SUS95), SUSv2 (SUS98) and SUSv3 (SUS03). POSIX/SUS
Sockets is now the common application environment for accessing networking,
deprecating the XTI for TCP/IP networking applications.
On systems supporting STREAMS, but not traditionally supporting Sockets (such
as SVR4), there are a number of approaches toward supporting BSD and POSIX
Sockets:
- Compatibility Library.
Under this approach, the oldest of approaches for STREAMS, a compatibility
library (libsocket.o) contains the socket calls as library functions
that internally invoke the TLI or TPI interface to an underlying STREAMS
transport provider. This is the approach originally taken by SVR4
(GC94), but this approach has subsequently been abandoned due to the
difficulties regarding fork(2) and fundamental incompatibilities deriving from
a library-only approach.
- Library and Cooperating STREAMS Module.
Under this approach, a cooperating module (sockmod) is pushed on a
Transport Provider Interface (TPI) stream. The library (socklib) and
cooperating module (sockmod) provide the BSD or POSIX Sockets
API (VS90) (Mar01). This is the intermediate implementation
approach taken by SVR4 and earlier releases of Solaris.
- Library and System Calls.
Under this approach, the BSD or POSIX Sockets API is implemented as system
calls with the sole exception of the socket(3) call. The underlying transport
provider is still a TPI-based STREAMS transport provider; the difference is
that system calls, rather than library calls, are used to implement the
interface (Mar01). This is another intermediate implementation approach taken
by later releases of Solaris.
- System Calls.
Under this approach, even the socket(3) call is moved into the kernel.
Conversion between POSIX/BSD Sockets and TPI service primitives is performed
completely within the kernel. The sock2path(5) configuration file is used to
configure the mapping between STREAMS devices and socket types, domains and
protocols (Mar01).
Sockets were originally developed as part of the BBN DARPA contract for
providing TCP/IP networking for 4BSD UNIX. The TCP/IP networking stack was
first implemented on 4.1aBSD and included in 4.2BSD by the CSRG (MBKQ97).
The Sockets interface had the objective of being a general purpose, network
agnostic, interprocess communications system. The first networking components
implemented using Sockets were TCP/IP (released in 4.2BSD), XNS (released in
4.3BSD) and ISO (released in 4.4BSD).
Although the networking subsystems for BSD Sockets were intended to be general
purpose, only the TCP/IP networking components have received wide use.
Systems that take the BSD approach to networking seldom support
STREAMS.5
For systems traditionally supporting Sockets and then retrofitted to support
the XTI interface, there is one approach toward supporting XTI without
retrofitting the entire networking stack to support STREAMS:6
- XTI Compatibility Library.
Several implementations of XTI on UNIX utilize the concept of an XTI
compatibility library.7 This is purely a shared object library approach to providing XTI.
Under this approach it is possible to use the XTI application programming
interface, but it is not possible to utilize any of the STREAMS capabilities
of an underlying Transport Provider Interface (TPI) stream. This approach is,
unfortunately, also not ABI compliant to the SVID.
- TPI over Sockets.
An alternate approach taken by the Linux iBCS package was to provide a
pseudo-transport provider using a legacy character device to present the
appearance of a STREAMS transport provider. While being ABI compliant to
SVID, under this approach it is still not possible to utilize any of the
STREAMS capabilities of an underlying Transport Provider Interface (TPI)
stream.
- XTI over Sockets.
Several implementations of XTI on BSD-style UNIX utilize the concept of XTI
over Sockets (or TPI over Sockets). Following this approach, a STREAMS
pseudo-device driver is provided that hooks directly into internal socket
system calls to implement the driver, and yet the networking stack remains
fundamentally BSD in style. This approach is ABI compliant to the SVID and
also allows the STREAMS capabilities of the resulting Transport Provider
Interface (TPI) to be used (such as pushing and popping modules). This
approach requires a rather complete STREAMS implementation; however, it does
not require replacement of the BSD-style networking stack with an SVR4-style
stack.
It is interesting that both Sockets and STREAMS were implemented on the same
operating system base (4.1BSD) at about the same time. Also, Dennis Ritchie
implemented STREAMS and was a significant contributor to the initial BSD
releases. BBN implemented the TCP/IP networking stack on 4.1aBSD using
Sockets, both released by the CSRG in 4.2BSD. At about the same time STREAMS
went into System V Release 3 and the Network Services Utility (NSU) provided
the first TLI implementations of TCP/IP networking. Perhaps it is not
surprising that both interfaces provide similar functions:
TLI            | Sockets
t_open()       | socket()
t_bind()       | bind()
t_listen()     | listen()
t_connect()    | connect()
t_rcvconnect() | select()
t_accept()     | accept()
t_unbind()     | bind()
t_close()      | close()
t_snddis()     | close()
t_sndrel()     | shutdown()
t_sndreldata() |
t_rcvrel()     | select()
t_rcvreldata() |
t_rcvuderr()   |
t_snd()        | send()
t_sndv()       | writev()
t_sndudata()   | sendto()
t_sndudata()   | sendmsg()
t_rcv()        | recv()
t_rcvv()       | readv()
t_rcvudata()   | recvfrom()
t_rcvudata()   | recvmsg()
t_optmgmt()    | getsockopt()
t_optmgmt()    | setsockopt()
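To make the correspondence concrete, the following sketch walks one connection through the Sockets column of the table and notes the TLI counterpart beside each call, as the table pairs them. It is a minimal illustration and not code from any implementation discussed here. It assumes a Linux or other POSIX system with AF_UNIX stream sockets and a writable /tmp; the function name sockets_roundtrip and the socket path are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Runs one connection over an AF_UNIX stream socket inside a single process
 * and copies the received bytes (NUL terminated) into buf.
 * Returns 0 on success and -1 on any failure. */
int sockets_roundtrip(const char *msg, char *buf, size_t cap)
{
    struct sockaddr_un addr;
    int lfd = -1, cfd = -1, afd = -1, rc = -1;
    size_t len;
    ssize_t n;

    assert(msg != NULL && buf != NULL && cap > 0);
    len = strlen(msg);

    /* Hypothetical rendezvous path, unique per process. */
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof addr.sun_path,
             "/tmp/tli_map_demo.%ld", (long)getpid());
    unlink(addr.sun_path);

    if ((lfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)              /* t_open()    */
        goto done;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0)     /* t_bind()    */
        goto done;
    if (listen(lfd, 1) < 0)                                       /* t_listen()  */
        goto done;
    if ((cfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)              /* t_open()    */
        goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)  /* t_connect() */
        goto done;
    if ((afd = accept(lfd, NULL, NULL)) < 0)                      /* t_accept()  */
        goto done;
    if (send(cfd, msg, len, 0) != (ssize_t)len)                   /* t_snd()     */
        goto done;
    if (shutdown(cfd, SHUT_WR) < 0)                               /* t_sndrel()  */
        goto done;
    if ((n = recv(afd, buf, cap - 1, 0)) < 0)                     /* t_rcv()     */
        goto done;
    buf[n] = '\0';
    rc = 0;
done:
    if (afd >= 0) close(afd);                                     /* t_close()   */
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    unlink(addr.sun_path);
    return rc;
}
```

A caller would pass a short message and a buffer, for example sockets_roundtrip("hello", buf, sizeof buf), and find the same bytes in buf on success.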
During the POSIX standardization process, networking and the Sockets interface
were given special treatment to ensure that both the BSD and STREAMS
networking approaches were compatible in the common application
environment.8 POSIX has standardized both the XTI and Sockets
programmatic interfaces to networking. STREAMS networking has been POSIX
compliant for many years: the BSD Sockets, POSIX Sockets, TLI and XTI
interfaces were all compliant in the SVR4 release. The Linux Fast-STREAMS
package likewise provides POSIX compliant STREAMS networking.
Therefore, any application using a Socket or Stream in a POSIX compliant
manner will be compatible with both BSD and STREAMS networking.
Figure 1 illustrates the organization of the
classical STREAMS networking stack.
Figure 1: STREAMS Networking
- User to Transport Interface.
The interface between the Stream head and the uppermost module (transport) is
a well-defined Transport Provider Interface (TPI). This is primarily a
service primitive (message passing) interface.
- Transport to Network Interface.
The interface between the transport provider and the network provider is a
well-defined Network Provider Interface (NPI). This is primarily a service
primitive (message passing) interface.
- Network to Data Link Interface.
The interface between the network provider and the data link provider is a
well-defined Data Link Provider Interface (DLPI). This is primarily a service
primitive (message passing) interface.
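A service primitive interface of this kind passes a small fixed header that describes the request, followed by variable-length fields located by offset and length. The sketch below models a connection request in that style. The field names follow the T_conn_req layout of the TPI as commonly published, but the types, the primitive value and all toy_ names are assumptions made for this illustration; consult the TPI specification (TPI99) for the authoritative definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t toy_scalar_t;      /* assumed width of a TPI scalar */
#define TOY_T_CONN_REQ 0           /* assumed primitive number */

/* Header of a connection request primitive (modelled on T_conn_req). */
struct toy_T_conn_req {
    toy_scalar_t PRIM_type;        /* which primitive this message carries */
    toy_scalar_t DEST_length;      /* length of the destination address */
    toy_scalar_t DEST_offset;      /* offset of the address from message start */
    toy_scalar_t OPT_length;       /* length of options (none in this sketch) */
    toy_scalar_t OPT_offset;       /* offset of options */
};

/* Builds header plus address into buf.
 * Returns the total message length, or 0 if buf is too small. */
size_t toy_build_conn_req(void *buf, size_t cap,
                          const void *dest, size_t dest_len)
{
    struct toy_T_conn_req req;

    assert(buf != NULL && dest != NULL);
    if (cap < sizeof req || dest_len > cap - sizeof req)
        return 0;

    req.PRIM_type = TOY_T_CONN_REQ;
    req.DEST_length = (toy_scalar_t)dest_len;
    req.DEST_offset = (toy_scalar_t)sizeof req;  /* address follows the header */
    req.OPT_length = 0;
    req.OPT_offset = 0;

    memcpy(buf, &req, sizeof req);
    memcpy((char *)buf + sizeof req, dest, dest_len);
    return sizeof req + dest_len;
}
```

Because the receiver locates the address purely through DEST_offset and DEST_length, neighbouring modules need to agree only on the message layout and not on each other's internal functions.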
Figure 2 illustrates the organization of the
classical BSD networking stack.
- User to Transport Interface.
The interface between the Socket layer and the uppermost (transport) protocol
layer is a well-defined BSD socket-to-protocol interface.
This is primarily a function pointer call interface.
- Transport to Network Interface.
The interface between protocol layers (transport and network) is a
well-defined BSD protocol-to-protocol interface.
This is primarily a function pointer call interface.
- Network to Data Link Interface.
The interface between the protocol layer and the device layer (network and
interface) is a well-defined BSD protocol-to-iface interface.
This is primarily a function pointer call interface.
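By contrast with message passing, a function pointer call interface has each layer invoke the next layer directly through a pointer held in a per-protocol structure. The following toy model shows the idea only. It is not the BSD protosw or ifnet definition, and the structure, its fields and the bracketed header strings are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One layer in a toy BSD-style stack. */
struct toy_layer {
    const char *hdr;                 /* header this layer prepends */
    const struct toy_layer *lower;   /* next layer down, NULL at the bottom */
    int (*output)(const struct toy_layer *self, char *pkt, size_t cap);
};

/* Prepend this layer's header, then call the lower layer's output function
 * through its function pointer. Returns 0 on success, -1 if pkt is too small. */
int toy_output(const struct toy_layer *self, char *pkt, size_t cap)
{
    size_t hlen, plen;

    assert(self != NULL && pkt != NULL);
    hlen = strlen(self->hdr);
    plen = strlen(pkt);
    if (hlen + plen + 1 > cap)
        return -1;

    memmove(pkt + hlen, pkt, plen + 1);   /* make room, keeping the NUL */
    memcpy(pkt, self->hdr, hlen);

    if (self->lower != NULL)
        return self->lower->output(self->lower, pkt, cap);
    return 0;
}
```

Wiring three such layers together (transport over network over interface) and calling the transport layer's output yields a packet wrapped by each layer in turn. The coupling is tighter than in the STREAMS stack because each layer must hold a pointer to its neighbour's functions.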
This approach builds a library of socket functions that are implemented
internally as calls to the XTI library.
To maintain compatibility with existing Linux sockets, the library
distinguishes between XTI streams and native sockets. Underlying system calls
(or input-output controls) are utilized on native sockets.
This approach builds a library of socket functions that are implemented
internally as service primitives passed to the TPI transport provider.
To maintain compatibility with existing Linux sockets, the library
distinguishes between TPI streams and native sockets. Underlying system calls
(or input-output controls) are utilized on native sockets.
These approaches all involve pushing a module, named sockmod, onto an
open transport provider stream, as illustrated in Figure 3.
This approach pushes the sockmod STREAMS module onto the transport
provider Stream. A library of socket functions is implemented internally as
calls to the cooperating module (VS90).
To maintain compatibility with existing Linux sockets, the library
distinguishes between Sockmod streams and native sockets. Underlying system calls
(or input-output controls) are utilized on native sockets.
This is the approach of earlier Solaris releases (VS90).
This approach pushes the sockmod STREAMS module onto the transport
provider Stream. A library of socket functions is implemented internally as
calls to the cooperating module. Where this approach differs from the Sockmod
approach above is that system calls are implemented directly as input-output
controls issued to the socket module to emulate system calls.
To maintain compatibility with existing Linux sockets, the library
does not need to distinguish between Sockmod streams and native sockets as
both support the same set of underlying input-output controls (i.e. socksys
input-output control calls available for iBCS compatibility). Sockets opened
in this fashion appear as STREAMS devices.
This is an intermediate approach that takes advantage of the socketsys
input-output control calls available in the Linux kernel.
The library can be made compatible with native sockets by passing the socket
call to the normal glibc socket(3) function whenever the domain, type and
protocol are not in the table.
These approaches all either transform the inode associated with the Stream
head into a socket, or attach a Stream head onto a socket. The objective in
these approaches is perhaps to alter the socket(3) system call into a more
specialized library call, but to keep all other socket system calls consistent
with Linux, with no need to replace the other Linux socket system calls. This
approach to system calls is problematic from the standpoint that both a
struct socket and struct sock structure must be allocated.
Linux socket calls expect both structures to be present and complete.
Perhaps the most straightforward way of doing this is to create both a
socket and sock structure in the regular way for the Linux
socket layer and provide a more specialized Stream head that is part of (or
associated with) the sock structure.
Figure 4: Socket Structures
Figure 4 illustrates the hybrid Stream head/Socket
structure arrangement. Some of the particulars of the structures vary from
kernel to kernel. For example, many kernels combine the inode and
socket structures into a single structure. Some kernels will also
combine the sock and stdata structures into a single
structure. Other kernels separate these structures and provide pointers
between the two. Also, Linux Fast-STREAMS currently allocates the
stdata structure and its associated read and write queues as a single
structure.
This approach pushes the sockmod STREAMS module onto the transport
provider stream. The sockmod module transforms the Stream head into
a socket (from the viewpoint of the kernel), by establishing the arrangement
shown in Figure 4.
A library implements the socket(3) system call as a library function to create
sockets in this fashion. The library looks up the socket domain, type and
protocol in a table and determines which transport provider device to open and
then pushes the sockmod module onto the transport provider stream to
transform it into a socket (from the viewpoint of the kernel). All other
socket calls are performed as native sockets system calls.
This approach requires a thin library that implements the socket(3) system
call. The library can be made compatible with native sockets by passing the
socket call to the normal glibc socket(3) function whenever the domain, type
and protocol are not in the table. This is the approach of later Solaris
releases (Mar01).
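The table lookup and native fallback described above might look roughly like the following. This is a hedged sketch rather than the Solaris or Linux Fast-STREAMS code. The table contents, the device paths /dev/tcp and /dev/udp, and every toy_ name are assumptions for illustration, and the step that pushes sockmod is indicated only by a comment.

```c
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* One row of the library's socket-to-device map. */
struct toy_map_entry {
    int domain;
    int type;
    int protocol;
    const char *device;   /* transport provider device to open */
};

/* Hypothetical map; a real library would load this from configuration. */
static const struct toy_map_entry toy_table[] = {
    { AF_INET, SOCK_STREAM, IPPROTO_TCP, "/dev/tcp" },
    { AF_INET, SOCK_DGRAM,  IPPROTO_UDP, "/dev/udp" },
};
#define TOY_TABLE_LEN (sizeof toy_table / sizeof toy_table[0])

/* Returns the device path for an exact match, or NULL if none exists. */
const char *toy_lookup_device(const struct toy_map_entry *table, size_t n,
                              int domain, int type, int protocol)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++)
        if (table[i].domain == domain && table[i].type == type &&
            table[i].protocol == protocol)
            return table[i].device;
    return NULL;
}

/* Library replacement for socket(): Stream if mapped, native otherwise. */
int toy_socket(int domain, int type, int protocol)
{
    const char *dev = toy_lookup_device(toy_table, TOY_TABLE_LEN,
                                        domain, type, protocol);
    if (dev == NULL)
        return socket(domain, type, protocol);   /* not in table: native */

    /* In the table: open the transport provider. A real library would
     * then push sockmod onto the Stream before returning the descriptor. */
    return open(dev, O_RDWR);
}
```

On a system without the mapped devices the open(2) simply fails, so the sketch is safe to run. Only the lookup and the native fallback are exercised in practice.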
This approach uses the soconfig(8) utility and sock2path(5) file (or the
initsock(8) utility and the netconfig(5) file) to configure the system at boot
time. The utility opens the /dev/socksys driver (or an anonymous
device such as /dev/ticlts) and configures the system using
input-output controls. Entries are added to an internal table containing
socket domain, type, protocol and device path. The driver stores internally
the identity of the transport provider (i.e. major device number).
The approach uses a thin library that re-implements the socket(3) system call.
When the socket(3) system call is called, the library opens the
/dev/socksys driver and issues an input-output control passing the
arguments to the socket(3) system call (domain, type, protocol). The
/dev/socksys driver creates a socket inode and a Stream head
according to Figure 4, and attaches the driver of the
type included in the configuration table. This STREAMS file is attached to an
available file descriptor, which is returned from the input-output control.
This file descriptor appears as a real socket to the rest of the system.
Another approach is to have the /dev/socksys driver detach its own
driver queue pair and substitute the transport driver. The
/dev/socksys driver also transforms its Stream head into a socket per
Figure 4 and returns its own file descriptor
from the input-output control call.
All other socket calls are compatible with kernel system calls. If the
domain, type and protocol are not present in the configuration table, the
input-output control call can be passed to the sys_socket() call inside the
kernel and a native socket of the requested type returned. The
/dev/socksys driver itself implements the socket calls and converts
them to TPI and other calls to the driver via the associated Stream head.
This approach requires only a thin library that implements the socket(3)
system call and that is completely compatible with native sockets.
This approach uses the soconfig(8) utility and sock2path(5) file (or the
initsock(8) utility and the netconfig(5) file) to configure the system at boot
time. The utility opens the /dev/socksys driver (or an anonymous
device such as /dev/ticlts) and configures the system using
input-output controls. Entries are added to a table containing socket domain,
type, protocol and device path. The driver registers a socket of the
corresponding domain, type and protocol with the Linux socket system. To
allow conflicting domains, if an existing domain is registered, a hack is used
(bit added to the domain, e.g. AF_INET | AF_HACK). The driver
stores internally the identity of the transport provider (i.e. major device
number). When the Linux kernel sys_socket() system call is invoked, a socket
structure is established and passed to the registered create function. The
create function uses the socket domain, type and protocol to determine the
STREAMS device to create. A Stream head is created and the transport provider
driver attached in the same fashion as the open of a character device, except
that the Stream head is attached to the socket inode. The
/dev/socksys driver implements the remainder of the system calls.
This approach does not require a library: the existing C library socket
functions are sufficient.
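The registration scheme can be sketched as a user-space model. In the Linux kernel a protocol family supplies a create function when it registers, and sys_socket() dispatches to it. The code below imitates only that dispatch and the "hack bit" fallback for a domain that is already taken. The value of TOY_AF_HACK, the table size and all function names are assumptions; the text does not give the real AF_HACK value.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_AF_MAX  64
#define TOY_AF_HACK 0x20   /* hypothetical bit marking a STREAMS-backed twin */

/* Create function supplied by a family; returns a token standing in
 * for the new socket, or -1 on failure. */
typedef int (*toy_create_fn)(int type, int protocol);

static toy_create_fn toy_families[TOY_AF_MAX];

/* Register a family. If the domain is already registered, retry with the
 * hack bit set. Returns the family number actually used, or -1. */
int toy_sock_register(int family, toy_create_fn create)
{
    assert(create != NULL);
    if (family < 0 || family >= TOY_AF_MAX)
        return -1;
    if (toy_families[family] != NULL) {
        family |= TOY_AF_HACK;
        if (family >= TOY_AF_MAX || toy_families[family] != NULL)
            return -1;
    }
    toy_families[family] = create;
    return family;
}

/* Model of sys_socket(): look up the family and call its create function. */
int toy_sys_socket(int family, int type, int protocol)
{
    if (family < 0 || family >= TOY_AF_MAX || toy_families[family] == NULL)
        return -1;
    return toy_families[family](type, protocol);
}

/* Stand-ins for the native and the STREAMS-backed create functions. */
int toy_native_create(int type, int protocol)
{
    (void)type; (void)protocol;
    return 100;
}

int toy_streams_create(int type, int protocol)
{
    (void)type; (void)protocol;
    return 200;
}
```

An application that wants the STREAMS-backed transport would then request the domain with the hack bit set, while unmodified applications continue to reach the native family under the original domain number.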
- GC94
Berny Goodheart and James Cox.
The magic garden explained: the internals of UNIX System V
Release 4, an open systems design.
Prentice Hall, Australia, 1994.
ISBN 0-13-098138-9.
- Mar01
Jim Mario.
Solaris sockets, past and present.
Unix Insider, September 2001.
- MBKQ97
Marshall Kirk McKusick, Keith Bostic, Michael J. Karels, and John S. Quarterman.
The design and implementation of the 4.4BSD operating system.
Addison-Wesley, third edition, November 1997.
ISBN 0-201-54979-4.
- Rit84
Dennis M. Ritchie.
A Stream Input-output System.
AT&T Bell Laboratories Technical Journal, 63(8):1897-1910,
October 1984.
Part 2.
- SUS95
Single UNIX Specification, Version 1.
Open Group Publication, The Open Group, 1995.
http://www.opengroup.org/onlinepubs/.
- SUS98
Single UNIX Specification, Version 2.
Open Group Publication, The Open Group, 1998.
http://www.opengroup.org/onlinepubs/.
- SUS03
Single UNIX Specification, Version 3.
Open Group Publication, The Open Group, 2003.
http://www.opengroup.org/onlinepubs/.
- TLI92
Transport Provider Interface Specification, Revision 1.5.
Technical Specification, UNIX International, Inc., Parsippany,
New Jersey, December 10, 1992.
http://www.openss7.org/docs/tpi.pdf.
- TPI99
Transport Provider Interface (TPI) Specification, Revision 2.0.0,
Draft 2.
Technical Specification, The Open Group, Parsippany, New Jersey,
1999.
http://www.opengroup.org/onlinepubs/.
- VS90
Ian Vessey and Glen Skinner.
Implementing Berkeley Sockets in System V Release 4.
In Proceedings of the Winter 1990 USENIX Conference. USENIX,
1990.
- XNS99
Network Services (XNS), Issue 5.2, Draft 2.0.
Open Group Publication, The Open Group, 1999.
http://www.opengroup.org/onlinepubs/.
- XTI99
X/Open Transport Interface (XTI).
Technical Standard XTI/TLI Revision 1.0, X Programmer's Group, 1999.
http://www.opengroup.org/onlinepubs/.
Footnotes
1. [email protected]
2. AIX, for example.
3. HP-UX, for example.
4. http://www.opengroup.org/
5. Perhaps an exception is early versions of Digital UNIX.
6. This is the approach initially taken by Digital UNIX.
7. One was even available for Linux at one point.
8. For example, when a transport connection indication has
been received with t_listen(3) the transport connection may have already been
established before t_accept(3) or t_snddis(3) are issued. This permits
t_listen(3) to be implemented using accept(3) and also permits the BSD
networking stack to be used (a TPI implementation is capable of deferring
accepting a connection at the protocol level and either accepting the
connection (T_CONN_RES) or refusing the connection attempt
(T_DISCON_REQ)). As another example, binding a socket with bind(3)
to an address containing an address family of AF_UNSPEC has the same
effect as t_unbind(3).
Brian F. G. Bidulock
2008-10-27