Network functionality

sockets access
address families: inet, unix, ...
network storage
network interfaces
network drivers

Linux kernel network functionality extends from the sockets interface, through the protocol implementations, down to the network cards.

⚲ Shell interfaces:

man 8 netstat prints network connections, routing tables, interface statistics and other details
man 8 ip shows and configures routing, network devices, interfaces and tunnels
man 8 ss - socket statistics utility


⚲ API:

sys/socket.h — main user mode sockets header

Basic common and client side interface:

man 2 socket → __sys_socket id creates an endpoint for communication
struct sockaddr id - abstract socket address
man 2 connect → __sys_connect id initiates a connection on a socket
man 2 shutdown shuts down part of a full-duplex connection
man 2 send → __sys_sendto id sends a message on a socket
man 2 recv → __sys_recvfrom id, __sys_recvmsg id receive a message from a socket

Additional server side interface:

man 2 bind → __sys_bind id - binds a sockaddr to a socket
man 2 listen → __sys_listen id - listens for connections on a socket
man 2 accept → __sys_accept4 id - accepts a connection on a socket
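
The client and server calls listed above can be exercised together in a short user-space sketch. It uses Python's standard socket module, whose methods wrap the system calls of the same name; the function names serve_once and echo_roundtrip are illustrative only, and the sketch assumes the loopback address 127.0.0.1 is reachable.

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """Accept one connection and echo data back until the peer stops sending."""
    conn, _peer = server.accept()            # accept(2)
    with conn:
        while True:
            data = conn.recv(4096)           # recv(2); b"" means the peer shut down its side
            if not data:
                break
            conn.sendall(data)               # send(2), repeated until everything is written


def echo_roundtrip(payload: bytes) -> bytes:
    """Send payload to a one-shot echo server on loopback and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:   # socket(2)
        server.bind(("127.0.0.1", 0))        # bind(2); port 0 lets the kernel pick a free port
        server.listen(1)                     # listen(2)
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server,), daemon=True)
        worker.start()

        chunks = []
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))    # connect(2)
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)        # shutdown(2): close only the sending half
            while True:
                data = client.recv(4096)
                if not data:
                    break
                chunks.append(data)
        worker.join(timeout=5)
    return b"".join(chunks)


if __name__ == "__main__":
    print(echo_roundtrip(b"hello"))
```

Calling shutdown with SHUT_WR instead of closing the socket lets the client signal end of data while still reading the echoed reply, which is the "part of a full-duplex connection" case mentioned for man 2 shutdown.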

⚙️ Internals

struct socket id @ linux/net.h inc contains
struct proto_ops id - abstract protocols interface
struct sock id @ net/sock.h inc - network layer representation of sockets

__sys_socket id ↯ call hierarchy:
sock_create id
__sock_create id
security_socket_create id
sock_alloc id
net_proto_family id->create.
for example inet_create id. See Address families for other options.

__sys_connect id ↯ call hierarchy:
move_addr_to_kernel id
audit_sockaddr id
__sys_connect_file id
sock_from_file id
security_socket_connect id
proto_ops id->connect.
for example inet_stream_connect id. See Protocols for other options.
net/socket.c src

📚 References

man 7 socket
linux/socket.h inc
Berkeley sockets

Network storage

🚀 advanced topic


⚲ API:

man 2 sendfile → do_sendfile id

See also Zero-copy between file descriptors
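
As a rough illustration of man 2 sendfile, the sketch below pushes the contents of a temporary file into a socket without copying it through a user-space buffer. It relies on Python's os.sendfile wrapper; the helper name send_file_over_socket is hypothetical, and the single-threaded layout only suits payloads smaller than the socket buffer.

```python
import os
import socket
import tempfile


def send_file_over_socket(data: bytes) -> bytes:
    """Write data to a temporary file, send it with sendfile(2), return what arrives."""
    left, right = socket.socketpair()        # connected pair of stream sockets
    with left, right, tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        offset = 0
        while offset < len(data):
            # os.sendfile(out_fd, in_fd, offset, count) returns the number of bytes sent
            sent = os.sendfile(left.fileno(), f.fileno(), offset, len(data) - offset)
            if sent == 0:
                break
            offset += sent
        left.shutdown(socket.SHUT_WR)        # signal end of data to the reader
        received = bytearray()
        while True:
            chunk = right.recv(65536)
            if not chunk:
                break
            received += chunk
    return bytes(received)


if __name__ == "__main__":
    print(send_file_over_socket(b"network storage"))
```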

Application layer: Network File System
NFS doc
init_nfs_fs id, nfs4_fs_type id, nfs_fs_type id,
init_nfsd id, nfsd_fs_type id
CIFS doc
init_cifs id
cifs_fs_type id, smb3_fs_type id cifs_smb3_do_mount id
target and iSCSI Interfaces Guide doc



⚲ API: man 2 uname, man 2 sethostname, man 2 gethostname, man 2 setdomainname, man 2 getdomainname

utsname id

⚙️ Details

utsname id returns a writable pointer to new_utsname id, taken from the uts_namespace id in the nsproxy id of the current id task_struct id.
CLONE_NEWUTS id, setns id
kernel/utsname.c src
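
From user space, the same UTS data can be read through man 2 uname and man 2 gethostname. A minimal sketch using Python's os.uname and socket.gethostname wrappers follows; the function name uts_summary is illustrative.

```python
import os
import socket


def uts_summary() -> dict:
    """Collect a few UTS fields as seen by the calling process."""
    u = os.uname()                           # uname(2)
    return {
        "sysname": u.sysname,
        "nodename": u.nodename,              # host name stored in the UTS namespace
        "release": u.release,
        "machine": u.machine,
        "hostname": socket.gethostname(),    # gethostname(2)
    }


if __name__ == "__main__":
    for key, value in uts_summary().items():
        print(f"{key}: {value}")
```

Both nodename and hostname come from the UTS namespace of the calling process, so a process started with CLONE_NEWUTS can see values that differ from the rest of the system.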

📚 References:

man 7 namespaces
man 7 network_namespaces
man 7 uts_namespaces

Address families

⚲ API:

man 2 getsockname
man 2 getpeername
The Address Family (AF) domain defines the address format and the address length (socklen_t).
man 3 inet_ntop, man 3 inet_pton (derive socklen_t from AF)
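
A small sketch of how the address family determines the binary address size, using Python's wrappers for inet_pton, inet_ntop and getsockname. The helper names are illustrative; the lengths shown are those of the packed address itself, not of the full sockaddr structure.

```python
import socket


def packed_length(family: int, text: str) -> int:
    """Length in bytes of the binary address that inet_pton produces for this family."""
    return len(socket.inet_pton(family, text))


def roundtrip(family: int, text: str) -> str:
    """Convert text to binary and back, yielding the canonical text form."""
    return socket.inet_ntop(family, socket.inet_pton(family, text))


def bound_address() -> tuple:
    """Bind a UDP socket to loopback and ask the kernel for its address via getsockname(2)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("127.0.0.1", 0))             # port 0: the kernel assigns a free port
        return s.getsockname()


if __name__ == "__main__":
    print(packed_length(socket.AF_INET, "127.0.0.1"))
    print(packed_length(socket.AF_INET6, "::1"))
    print(roundtrip(socket.AF_INET6, "0:0:0:0:0:0:0:1"))
    print(bound_address())
```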

Common AF: AF_UNIX id, AF_INET id, AF_NETLINK id.

PF - the Protocol Family index (PF_MAX id) is actually the same as the Address Family index (AF).

⚙️ Internals of some AF:

man 7 unix → unix_family_ops id - sockets for local IPC
unix_create id
man 7 ip → inet_family_ops id - IPv4
inet_create id
man 7 netlink → netlink_family_ops id - communication between kernel and user space
netlink_create id
man 7 vsock → vsock_family_ops id - communication between VM and hypervisor
vsock_create id
man 7 packet → packet_family_ops id - device level interface
packet_create id
bt_sock_family_ops id - Bluetooth
bt_sock_create id

In total, there are more than 40 AFs (see AF_MAX id)

⚙️ Internals:

sock_register id - registers net_proto_family id. See references to these identifiers to find more than 30 protocol families.
__sock_create id
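
Whether a family is registered on the running kernel can be probed from user space: socket creation fails with EAFNOSUPPORT when no protocol family is registered for the requested index. The sketch below tries a few families that need no special privileges; the CANDIDATES table and the function name probe_families are illustrative choices, not part of any kernel interface.

```python
import errno
import socket

# Family name -> socket type to try; names are looked up on the socket module at run time
CANDIDATES = {
    "AF_UNIX": socket.SOCK_STREAM,
    "AF_INET": socket.SOCK_STREAM,
    "AF_INET6": socket.SOCK_STREAM,
    "AF_NETLINK": socket.SOCK_RAW,
}


def probe_families(candidates=CANDIDATES) -> dict:
    """Return a status string per family: 'ok' or the errno name reported by socket(2)."""
    result = {}
    for name, sock_type in candidates.items():
        family = getattr(socket, name, None)
        if family is None:
            result[name] = "not exposed by this Python build"
            continue
        try:
            socket.socket(family, sock_type).close()
            result[name] = "ok"
        except OSError as exc:
            result[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    return result


if __name__ == "__main__":
    for name, status in probe_families().items():
        print(f"{name}: {status}")
```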

📚 References:

Internet layer
man 7 address_families


Each Protocol Family (PF, same index as Address Family AF) consists of several protocol implementations.

Directory /proc/net contains various files and subdirectories containing information about the networking layer. File /proc/net/protocols lists available and used protocols.
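
A minimal sketch for listing the protocol names from /proc/net/protocols. It assumes only that the file starts with a header line and that the first whitespace-separated column of each following line is the protocol name; the function names are illustrative.

```python
from pathlib import Path


def protocol_names(text: str) -> list:
    """Extract the first column of every non-empty line after the header."""
    lines = text.splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def read_protocols(path: str = "/proc/net/protocols") -> list:
    """Read the proc file and return the registered protocol names."""
    return protocol_names(Path(path).read_text())


if __name__ == "__main__":
    try:
        print(read_protocols())
    except OSError as exc:
        # The file may be absent in restricted environments
        print(f"cannot read protocols: {exc}")
```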

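Within each PF, protocols are classified by socket type (sock_type id), for example stream, datagram and raw. TCP is a stream protocol and UDP is a datagram protocol. Raw sockets use the raw type; ping sockets are datagram sockets (IPPROTO_ICMP) that share inet_sockraw_ops id with raw sockets.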

proto_register id registers struct proto id, a protocol implementation:
In the inet_init id initcall, inetsw_array id pairs proto_ops id with proto id:
inet_stream_ops id & tcp_prot id tcp_sendmsg id ...
inet_dgram_ops id & udp_prot id udp_sendmsg id ...
inet_sockraw_ops id
raw_prot id raw_sendmsg id ...
ping_prot id ping_v4_sendmsg id ...
In af_unix_init id initcall:
unix_family_ops id
unix_create id
unix_stream_ops id unix_stream_sendmsg id ...
unix_dgram_ops id unix_dgram_sendmsg id ...
unix_seqpacket_ops id unix_seqpacket_sendmsg id ...
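
The mapping from socket type to protocol can be observed from user space. The sketch below creates sockets with the protocol argument left at 0 and asks the kernel which protocol it selected, using the SO_PROTOCOL and SO_TYPE socket options; it also shows that an AF_UNIX SOCK_SEQPACKET pair preserves message boundaries. Function names are illustrative.

```python
import socket


def default_protocol(family: int, sock_type: int) -> int:
    """Protocol number the kernel chose for a socket created with protocol 0."""
    with socket.socket(family, sock_type) as s:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_PROTOCOL)


def kernel_socket_type(family: int, sock_type: int) -> int:
    """Socket type as reported back by the kernel."""
    with socket.socket(family, sock_type) as s:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)


def seqpacket_messages() -> list:
    """Send two messages over a SOCK_SEQPACKET pair and read them back one by one."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with a, b:
        a.send(b"one")
        a.send(b"two")
        return [b.recv(100), b.recv(100)]    # each recv returns exactly one message


if __name__ == "__main__":
    print(default_protocol(socket.AF_INET, socket.SOCK_STREAM) == socket.IPPROTO_TCP)
    print(default_protocol(socket.AF_INET, socket.SOCK_DGRAM) == socket.IPPROTO_UDP)
    print(seqpacket_messages())
```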

📚 References:

man 7 tcp
man 7 udp
man 7 raw
Transport layer and TCP


🚀 advanced topic

🗝️ Acronyms:

IB — InfiniBand, an interconnect standard, competes with Ethernet, Fibre Channel
IPoIB — IP network emulation layer over InfiniBand networks
SRP — SCSI RDMA Protocol
ULP — Upper-layer protocols
iSER — iSCSI Extensions for RDMA

⚲ Interfaces:
man 8 rdma
man 7 rdma_cm — RDMA communication manager
include/uapi/rdma src
include/rdma src

⚙️ Internals:

drivers/infiniband src
drivers/infiniband/ulp src — Upper-layer protocols
drivers/infiniband/sw src — software drivers
drivers/infiniband/hw src — hardware device drivers

📚 References:

InfiniBand doc
InfiniBand and RDMA Interfaces doc


🚀 advanced topic

⚲ Interface:

man 8 ebtables-nft
man 8 arptables-nft
man 8 xtables-nft
man 8 iptables
man 8 ip6tables
man 8 ebtables
man 8 arptables
linux/netfilter.h inc
uapi/linux/netfilter inc
net/netfilter inc
net/netns/netfilter.h inc
linux/netfilter inc

⚙️ Internals:

net/netfilter src

📚 References:

Netfilter Sysfs variables doc
Netfilter Conntrack Sysfs variables doc
Netfilter’s flowtable infrastructure doc

Network device interfaces

devm_register_netdev id registers net_device id, net_device_ops
sk_buff id socket buffer (skb)
dev_queue_xmit id queues socket buffers into transmit queue

👁 Example: drivers/net/loopback.c src - the best-known and simplest interface, lo

⚙️ function loopback_xmit id receives skb and passes it back with netif_rx id
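
The loopback path can be exercised from user space by sending a datagram to 127.0.0.1, which on a typical configuration is routed through lo. This is a sketch under that assumption; the function names are illustrative, and socket.if_nameindex is used only to list the interfaces the kernel reports.

```python
import socket


def loopback_datagram(payload: bytes) -> bytes:
    """Send one UDP datagram to a socket bound on 127.0.0.1 and return what is received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))            # the kernel assigns a free port
        rx.settimeout(2.0)                   # avoid blocking forever if delivery fails
        tx.sendto(payload, rx.getsockname())
        data, _sender = rx.recvfrom(65535)
    return data


def interface_names() -> list:
    """Names of the network interfaces known to the kernel, for example 'lo'."""
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    print(interface_names())
    print(loopback_datagram(b"ping over lo"))
```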

📚 References:

linux/netdevice.h inc
linux/skbuff.h inc
Queueing in the Linux Network Stack
net/core/dev.c src
man 7 netdevice

Network drivers

linux/etherdevice.h inc
netif_rx id - before NAPI
input_pkt_queue id
NAPI Driver design
⚲ API:
netif_napi_add id adds napi_struct id
napi_schedule id - called by an IRQ handler to schedule a poll
netif_receive_skb id - used instead of netif_rx; eventually calls ip_rcv id
napi_complete_done id - called from custom napi->poll()
⚙️ Internals:
net_dev_init id
net_rx_action id
napi_poll id calls custom napi->poll()
👁 example
e1000_intr id calls __napi_schedule id
custom napi->poll() e1000e_poll id calls napi_complete_done id
ether_setup id sets up an Ethernet network device
👁 An example of Ethernet driver: e1000_probe id
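
The NAPI flow listed above (interrupt schedules polling, the poll callback processes a bounded batch, completion re-enables interrupts) can be pictured with a toy user-space model. This is not kernel code: the class ToyNapi and all of its methods are invented for illustration, and details such as the overall softirq budget and time limit are left out.

```python
from collections import deque


class ToyNapi:
    """Toy model of interrupt mitigation with budgeted polling."""

    def __init__(self, budget: int = 4):
        self.rx_ring = deque()       # packets the "hardware" has received
        self.budget = budget         # maximum packets handled per poll call
        self.irq_enabled = True
        self.scheduled = False
        self.delivered = []          # packets handed to the "stack"

    def hardware_receive(self, packet) -> None:
        self.rx_ring.append(packet)
        if self.irq_enabled:
            self.interrupt()

    def interrupt(self) -> None:
        # Stand-in for the IRQ handler: mask RX interrupts and schedule polling
        self.irq_enabled = False
        self.scheduled = True

    def poll(self) -> int:
        # Stand-in for the driver's poll callback: process at most `budget` packets
        done = 0
        while self.rx_ring and done < self.budget:
            self.delivered.append(self.rx_ring.popleft())
            done += 1
        if done < self.budget:
            # Ring drained: leave polling mode and unmask interrupts
            self.scheduled = False
            self.irq_enabled = True
        return done

    def softirq(self) -> int:
        # Stand-in for the softirq loop: keep polling while the instance is scheduled
        polls = 0
        while self.scheduled:
            self.poll()
            polls += 1
        return polls


if __name__ == "__main__":
    napi = ToyNapi(budget=4)
    for i in range(10):
        napi.hardware_receive(i)     # only the first packet raises an "interrupt"
    print(napi.softirq(), napi.delivered, napi.irq_enabled)
```

In this model a burst of ten packets costs one interrupt and three poll calls, which is the point of the design: under load the driver stays in polling mode instead of taking an interrupt per packet.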

⚙️ Internals:

drivers/net src
drivers/net/wireless src
drivers/net/ethernet src

📚 References:

Networking doc
Data link layer: Ethernet
GRO - Generic Receive Offload

💾 Historical:

LDD2:Network Drivers
LDD3:Network Drivers
Kernel Analysis: Networking, 2003