Network functionality
| networking |
|---|
| sockets access |
| address families: inet, unix, ... |
| network storage |
| protocols |
| network interfaces |
| network drivers |
Linux kernel network functionality spans from the sockets interface, through the protocols, down to the network cards.
⚲ Shell interfaces:
- man 8 netstat prints network connections, routing tables, interface statistics and other details
- man 8 ip shows and configures routing, network devices, interfaces and tunnels
- man 8 ss - socket statistics utility
Sockets
⚲ API:
sys/socket.h – main user mode sockets header
Basic common and client side interface:
- man 2 socket ↪ __sys_socket id creates an endpoint for communication
- struct sockaddr id - abstract socket address
- man 2 connect ↪ __sys_connect id initiates a connection on a socket
- man 2 shutdown shuts down part of a full-duplex connection
- man 2 send ↪ __sys_sendto id sends a message on a socket
- man 2 recv ↪ __sys_recvfrom id, __sys_recvmsg id receives a message from a socket
Additional server side interface:
- man 2 bind ↪ __sys_bind id - binds a sockaddr to a socket
- man 2 listen ↪ __sys_listen id - listens for connections on a socket
- man 2 accept ↪ __sys_accept4 id - accepts a connection on a socket
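The client and server calls listed above can be exercised together in one process. The following is a minimal sketch, assuming a Linux host with a working loopback interface; `loopback_roundtrip` is a hypothetical helper name, not part of any API.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sends msg over a TCP connection on the loopback interface and copies what
 * the accepting side received into out (NUL-terminated).
 * Returns the number of bytes received, or -1 on error.
 * Hypothetical helper, for illustration only.
 */
ssize_t loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    int lfd, cfd = -1, afd = -1;
    size_t total = 0;
    ssize_t ret = -1;

    if (outlen == 0)
        return -1;

    /* Server side: socket -> bind -> listen. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);               /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto done;

    /* Client side: socket -> connect. The kernel completes the handshake
     * using the listen backlog, so accept() can be called afterwards. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto done;

    /* A short message is assumed, so a single send() is expected to suffice. */
    if (send(cfd, msg, strlen(msg), 0) < 0 || shutdown(cfd, SHUT_WR) < 0)
        goto done;

    /* Read until the peer's half-close (recv returns 0) or the buffer is full. */
    while (total < outlen - 1) {
        ssize_t n = recv(afd, out + total, outlen - 1 - total, 0);
        if (n < 0)
            goto done;
        if (n == 0)
            break;
        total += (size_t)n;
    }
    out[total] = '\0';
    ret = (ssize_t)total;

done:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return ret;
}
```

Binding to port 0 and reading the chosen port back with getsockname avoids hard-coding a port number; shutdown with SHUT_WR closes only the sending direction, which lets the receiver detect the end of the data.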
⚙️ Internals
- struct socket id @ linux/net.h inc contains:
  - struct proto_ops id - abstract protocols interface
  - struct sock id - network layer representation of sockets, net/sock.h inc
- __sys_socket id ↯ call hierarchy:
  - sock_create id
    - __sock_create id
      - security_socket_create id
      - sock_alloc id
      - net_proto_family id->create
        - for example inet_create id. See Address families for other options.
- __sys_connect id ↯ call hierarchy:
  - move_addr_to_kernel id
    - audit_sockaddr id
  - __sys_connect_file id
    - sock_from_file id
    - security_socket_connect id
    - proto_ops id->connect
      - for example inet_stream_connect id. See Protocols for other options.
📚 References
Network storage
🚀 advanced topic
🔧 TODO
⚲ API:
See also Zero-copy between file descriptors
Logistics
Names
⚲ API: man 2 uname, man 2 sethostname, man 2 gethostname, man 2 setdomainname, man 2 getdomainname
⚙️ Details
- utsname id returns a writable pointer to the new_utsname id stored in the uts_namespace id, which is reached through the nsproxy id of the current id task_struct id.
- CLONE_NEWUTS id, setns id
- kernel/utsname.c src
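From user space, the names held in the UTS namespace can be read with uname and gethostname. A minimal sketch, assuming a Linux libc; `hostname_matches_nodename` is a hypothetical helper name.

```c
#include <string.h>
#include <sys/utsname.h>
#include <unistd.h>

/*
 * Reads the host name twice, once through uname() and once through
 * gethostname(), and compares the results.
 * Returns 1 if they match, 0 if they differ, -1 on error.
 * Hypothetical helper, for illustration only.
 */
int hostname_matches_nodename(void)
{
    struct utsname u;
    char host[sizeof(u.nodename)];

    if (uname(&u) < 0 || gethostname(host, sizeof(host)) < 0)
        return -1;
    host[sizeof(host) - 1] = '\0';          /* defensive termination */

    return strcmp(host, u.nodename) == 0;
}
```

Both calls report the name of the caller's UTS namespace, so a process started with CLONE_NEWUTS can see a different name than its parent.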
📚 References:
Address families
⚲ API
- man 2 getsockname
- man 2 getpeername
- The Address Family (AF), also called the domain, defines the address format and the address length (socklen_t).
- man 3 inet_ntop, man 3 inet_pton (derive the address length from the AF)
Common AFs: AF_UNIX id, AF_INET id, AF_NETLINK id.
The Protocol Family (PF) index (PF_MAX id) is in fact the same as the Address Family (AF) index.
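How the address family determines the address format and length can be sketched in user space. This is a minimal example, assuming Linux; `addr_roundtrip` and `unbound_addr_len` are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Parses text as an address of family af with inet_pton() and formats it
 * back with inet_ntop(). Returns the binary address size in bytes
 * (4 for AF_INET, 16 for AF_INET6), or -1 on error.
 * Hypothetical helper, for illustration only.
 */
int addr_roundtrip(int af, const char *text, char *out, socklen_t outlen)
{
    unsigned char raw[sizeof(struct in6_addr)];   /* large enough for both */

    if (af != AF_INET && af != AF_INET6)
        return -1;
    if (inet_pton(af, text, raw) != 1)
        return -1;
    if (inet_ntop(af, raw, out, outlen) == NULL)
        return -1;
    return af == AF_INET ? (int)sizeof(struct in_addr)
                         : (int)sizeof(struct in6_addr);
}

/*
 * Creates an unbound datagram socket of family af and returns the address
 * length that getsockname() reports for it, or -1 on error.
 * Hypothetical helper, for illustration only.
 */
int unbound_addr_len(int af)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);
    int ret = -1;
    int fd = socket(af, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (getsockname(fd, (struct sockaddr *)&ss, &len) == 0 &&
        ss.ss_family == af)
        ret = (int)len;
    close(fd);
    return ret;
}
```

struct sockaddr_storage is used because it is large enough for any family; the length returned by getsockname then shows how much of it the given family actually uses.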
⚙️ Internals of some AF
- man 7 unix ↪ unix_family_ops id - sockets for local IPC
- man 7 ip ↪ inet_family_ops id - IPv4
- man 7 netlink ↪ netlink_family_ops id - communication between kernel and user space
- man 7 vsock ↪ vsock_family_ops id - communication between VM and hypervisor
- man 7 packet ↪ packet_family_ops id - device level interface
- bt_sock_family_ops id - Bluetooth
In total, there are more than 40 AFs (see AF_MAX id).
⚙️ Internals
- sock_register id - registers net_proto_family id. See the references to these identifiers to find more than 30 protocol families.
- __sock_create id
📚 Further reading
- man 8 ip-address – protocol address management
- Internet layer
- man 7 address_families
Protocols
Each Protocol Family (PF, same index as the Address Family, AF) consists of several protocol implementations.
The directory /proc/net contains various files and subdirectories with information about the networking layer. The file /proc/net/protocols lists available and used protocols.
Within each PF, protocols are classified by socket type (sock_type id), for example stream, datagram and raw. TCP is a stream protocol and UDP is a datagram protocol. Raw sockets have the raw type; ping sockets are registered in inetsw_array id with the datagram type but share their proto_ops id with raw sockets.
- proto_register id - registers struct proto id - protocol implementations:
- In inet_init id initcall, inetsw_array id, proto_ops id and proto id :
- In af_unix_init id initcall:
📚 References:
- Transport layer and TCP
RDMA
🚀 advanced topic
🗝️ Acronyms:
- IB — InfiniBand, an interconnect standard, competes with Ethernet, Fibre Channel
- IPoIB — IP network emulation layer over InfiniBand networks
- SRP — SCSI RDMA Protocol
- ULP — Upper-layer protocols
- iSER — iSCSI Extensions for RDMA
⚲ Interfaces:
- https://github.com/linux-rdma/rdma-core
- man 8 rdma
- man 7 rdma_cm — RDMA communication manager
- include/uapi/rdma src
- include/rdma src
⚙️ Internals:
- drivers/infiniband src
- drivers/infiniband/ulp src — Upper-layer protocols
- drivers/infiniband/sw src — software drivers
- drivers/infiniband/hw src — hardware device drivers
📚 References:
Netfilter
🚀 advanced topic
⚲ Interface:
- man 8 ebtables-nft
- man 8 arptables-nft
- man 8 xtables-nft
- man 8 iptables
- man 8 ip6tables
- man 8 ebtables
- man 8 arptables
- ipset
- linux/netfilter.h inc
- uapi/linux/netfilter inc
- net/netfilter inc
- net/netns/netfilter.h inc
- linux/netfilter inc
⚙️ Internals:
📚 References:
Network device interfaces
⚲ Interfaces
ip -brief link show
ls -l /sys/class/net
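The same list of interfaces can be obtained programmatically. A minimal sketch using if_nameindex from net/if.h; `list_interfaces` is a hypothetical helper name.

```c
#include <net/if.h>
#include <stdio.h>

/*
 * Prints "index: name" for every network interface visible to the caller
 * and returns how many were found, or -1 on error.
 * Hypothetical helper, for illustration only.
 */
int list_interfaces(void)
{
    struct if_nameindex *list = if_nameindex();
    int count = 0;

    if (list == NULL)
        return -1;

    /* The array ends with an entry whose if_index is 0 and if_name is NULL. */
    for (struct if_nameindex *i = list;
         !(i->if_index == 0 && i->if_name == NULL); i++) {
        printf("%u: %s\n", i->if_index, i->if_name);
        count++;
    }

    if_freenameindex(list);
    return count;
}
```

On most systems the output includes at least the loopback interface lo, although that depends on the network namespace the process runs in.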
- devm_register_netdev id registers net_device id, net_device_ops
- sk_buff id socket buffer (skb)
- dev_queue_xmit id queues socket buffers into transmit queue
- linux/netdevice.h inc
- linux/skbuff.h inc
👁 Example: drivers/net/loopback.c src - the simplest and best-known interface, lo
⚙️ Internals
- net/core/dev.c src
- loopback_xmit id receives an skb and passes it back with netif_rx id
📚 Further reading
- man 8 ip-link – network device configuration
- man 8 ip-stats – manage and show interface statistics
- man 7 netdevice – low-level access to Linux network devices
- man 7 packet – packet interface on device level
- Queueing in the Linux Network Stack
💾 Historical
Network drivers
- netif_rx id - used before NAPI
- NAPI
  - NAPI Driver design
  - ⚲ API:
    - netif_napi_add id adds napi_struct id
    - napi_schedule id - called by an IRQ handler to schedule a poll
    - netif_receive_skb id - used instead of netif_rx id; eventually calls ip_rcv id
    - napi_complete_done id - called from the driver's custom napi->poll()
  - ⚙️ Internals:
    - net_dev_init id
      - net_rx_action id
        - napi_poll id calls the driver's custom napi->poll()
  - 👁 Example:
    - e1000_intr id calls __napi_schedule id
    - the custom napi->poll() e1000e_poll id calls napi_complete_done id
- ⚲ API:
  - ether_setup id sets up an Ethernet network device
  - 👁 An example of an Ethernet driver: e1000_probe id
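The NAPI flow listed above (IRQ handler schedules a poll, the poll processes packets up to a budget, and completion re-enables interrupts) can be illustrated with a user-space toy model. This is only a sketch of the contract, not kernel code: every name starting with `toy_` is hypothetical, and the comments say which kernel function each part stands in for.

```c
#include <stdbool.h>

/* Toy stand-in for a NIC plus its napi_struct; not a kernel type. */
struct toy_nic {
    int  rx_pending;   /* packets waiting in the device RX ring */
    bool irq_enabled;  /* is the device RX interrupt enabled? */
    bool scheduled;    /* is a poll scheduled? */
    int  delivered;    /* packets handed to the stack so far */
};

/* Models the IRQ handler: mask RX interrupts and schedule a poll
 * (the role of napi_schedule() in a real driver). */
void toy_irq(struct toy_nic *nic)
{
    if (!nic->irq_enabled)
        return;
    nic->irq_enabled = false;
    nic->scheduled = true;
}

/* Models the driver's napi->poll(): handle at most budget packets. */
int toy_poll(struct toy_nic *nic, int budget)
{
    int work = 0;

    while (work < budget && nic->rx_pending > 0) {
        nic->rx_pending--;
        nic->delivered++;      /* stands in for netif_receive_skb() */
        work++;
    }

    if (work < budget) {
        /* Ring drained: stands in for napi_complete_done() followed by
         * re-enabling the device interrupt. */
        nic->scheduled = false;
        nic->irq_enabled = true;
    }
    return work;
}

/* Models the softirq side (the role of net_rx_action()): keep polling
 * while the NIC stays scheduled. Returns the number of poll rounds,
 * or -1 for a non-positive budget. */
int toy_rx_action(struct toy_nic *nic, int budget)
{
    int rounds = 0;

    if (budget <= 0)
        return -1;
    while (nic->scheduled) {
        toy_poll(nic, budget);
        rounds++;
    }
    return rounds;
}
```

In this model, 150 pending packets with a budget of 64 take three poll rounds (64, 64 and 22 packets). Only the last round does less work than the budget, so only then is polling completed and the interrupt re-enabled.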
⚙️ Internals:
📚 References:
- man 8 ethtool – query or control network driver and hardware settings
- Data link layer: Ethernet
- GRO - Generic Receive Offload
- Segmentation Offloads doc
- https://wireless.wiki.kernel.org
💾 Historical:
📖 Further reading about networking
- Networking interfaces doc
- https://lwn.net/Kernel/Index/#Networking
- https://lartc.org/ – Linux Advanced Routing & Traffic Control
- man 8 ip – show / manipulate routing, network devices, interfaces and tunnels
- man 8 tc – show / manipulate traffic control settings
- bcc/ebpf networking tools
- eBPF-based Networking, Security, and Observability
- Retis – tracing packets in the Linux networking stack & friends