Local Area Network design/Introduction to Storage Area Networks
Storage architectures
A company typically needs to store a large amount of data. Three models have been used over time to store and access these data:
- mainframes (historical): data access is centralized on the same machine where they are physically stored;
- client-server model: several clients ask a server machine to retrieve data stored on hard disks;
- peer-to-peer model: data are distributed among all the machines, which are connected to one another, and every machine can ask any other machine for some data.
The client-server and peer-to-peer models compare as follows:
- costs: each machine in a peer-to-peer network does not require high computing power or high storage capacity, in contrast to a server, which has to manage requests from multiple clients at the same time → servers are very expensive;
- scalability: in the peer-to-peer model data can be distributed over a virtually unlimited number of machines, while the computing capacity and the storage capacity of a server are limited;
- robustness: a server is characterized by high reliability, but a fault is more critical to resolve; machines in a peer-to-peer network are instead more subject to faults because they are low-end, less reliable machines, but the software managing the peer-to-peer network, being aware of this weakness, is designed to preserve data integrity, for example by performing automatic backups.
A datacenter is a centralized location where all servers are concentrated. It avoids having too many servers scattered around the company under the control of many different organizations, a situation which causes problems with:
- data access: data may be available, but people needing them may belong to another organization or may not have the required permissions;
- integrity: it is difficult to back up all servers if they are scattered around the company;
- security: it is easy to steal a hard disk from an unprotected server.
In a Direct-Attached Storage (DAS) system, every server has exclusive access to its own set of hard disks:
- internal disks: this is not a suitable solution for servers because, in case of a fault, hard disks have to be physically extracted from inside the machine;
- external disks: disks are connected to the server via SCSI; multiple disk sets can be connected in cascade, as in a bus architecture.
Disks can be put in a dedicated cabinet called JBOD (Just a Bunch Of Disks): the SCSI controller is able to export a virtual drive structure that differs from that of the physical disks, by aggregating or splitting disk capacities and providing advanced services (e.g. RAID).
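As a rough illustration of how a controller might aggregate several physical disks into one virtual drive, the sketch below maps a block address on the virtual drive to a physical disk and a block on that disk. It assumes plain concatenation of disks; the function name `locate_block` and the disk sizes are hypothetical, and real controllers may use other layouts (e.g. RAID striping).

```python
from bisect import bisect_right
from itertools import accumulate


def locate_block(virtual_lba, disk_sizes):
    """Map a block of the virtual drive to (disk index, block on that disk).

    disk_sizes lists the capacity of each physical disk, in blocks
    (hypothetical values); the disks are simply concatenated.
    """
    ends = list(accumulate(disk_sizes))  # cumulative capacity after each disk
    if not 0 <= virtual_lba < ends[-1]:
        raise ValueError("block address outside the virtual drive")
    disk = bisect_right(ends, virtual_lba)  # first disk whose end is beyond the block
    start = ends[disk - 1] if disk else 0   # first virtual block held by that disk
    return disk, virtual_lba - start
```

With three disks of 100, 200 and 50 blocks, the server sees a single 350-block drive, while block 100 of that drive is actually block 0 of the second disk.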
The SCSI standard defines a full protocol stack:
- physical interfaces (e.g. cables and connectors): they physically connect hard disks to servers;
- protocols: they carry out read and write transactions by directly addressing disk blocks according to the Logical Block Addressing (LBA) scheme;
- commands exported to applications: applications perform read and write operations by issuing commands such as READ, WRITE, FORMAT, etc.
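Block-level addressing can be illustrated with a small calculation: under LBA, a disk is a linear array of fixed-size blocks, so a byte range translates into a first block number and a block count. The sketch below assumes a 512-byte block size, and `blocks_for_range` is a hypothetical helper, not a SCSI command.

```python
BLOCK_SIZE = 512  # assumed block size in bytes; other sizes are possible


def blocks_for_range(byte_offset, length, block_size=BLOCK_SIZE):
    """Return (first LBA, number of blocks) covering the given byte range."""
    if byte_offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = byte_offset // block_size
    last = (byte_offset + length - 1) // block_size
    return first, last - first + 1
```

A read of 4 bytes starting at byte 510 straddles a block boundary, so it needs blocks 0 and 1 even though it touches only a few bytes.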
SCSI has the following advantages:
- low latency: it is on the order of milliseconds through a disk and of microseconds through a cache;
- high reliability: the error probability is very low, and data integrity is always guaranteed;
- wide compatibility: it is widely supported by operating systems and is used by many external devices besides disks.
It also has some disadvantages:
- slow error recovery: since errors rarely occur, error recovery mechanisms are not particularly efficient in terms of performance;
- centralized access to disks: only the server can access the disks → if the server fails, the disks can no longer be accessed;
- scalability limitations: at most 16 devices can be connected in cascade, over a maximum length of 25 meters.
A Network-Attached Storage (NAS) system exports file systems, serving logical files instead of disk blocks over the network (usually a LAN).
File systems are shared with network clients: both servers and clients connected to the network can access files.
Typical protocols used to export file systems, both of which work over a TCP/IP network, are:
- NFS: popular on UNIX systems;
- CIFS: used by Windows systems.
A NAS has the following advantages:
- along with the file system, user permissions and access protections (e.g. username and password) can be exported;
- compatibility with network clients: a NAS system has minimal impact on the existing infrastructure, since all operating systems are able to mount a shared disk without additional drivers.
Its disadvantages are:
- compatibility with applications: the raw disk is invisible to the client, so disks cannot be formatted or managed at the block level → software that needs direct access to disk blocks cannot work on remote disks, e.g. operating systems, database management systems, swap files/partitions;
- the NAS appliance requires enough computational power for user permission management and for remapping file-related requests to block-related requests;
- the protocol stack was not designed for NAS: TCP error-recovery mechanisms may introduce a non-negligible performance overhead.
A Storage Area Network (SAN) exports physical disks instead of logical volumes, and allows disk blocks to be addressed according to the LBA scheme, just as if the disk were connected directly to the server via SCSI (as in a DAS system).
Clients can access data through servers, to which they are connected via a Wide or Local Area Network. Typically a datacenter follows a three-tier model:
- web server: it is the front-end exposed to clients;
- application/database server: it can mount a shared-disk file system which converts file-related requests from clients into block-related requests to be sent to remote disks via the SAN;
- hard disks: they are often put in JBODs.
SANs cannot rely exclusively on classical TCP/IP, since TCP error-recovery mechanisms may introduce a non-negligible performance overhead → several protocols have been developed for SANs with the aim of preserving, as much as possible, the high speed, low latency and high reliability typical of SCSI.
All SAN protocols adopt SCSI as the upper layer in their protocol stacks and work below it → this guarantees compatibility with all the existing SCSI-based applications, with minimal impact when migrating from DAS to SAN.
Fibre Channel
The Fibre Channel standard arose from the need for reliable support for optical fiber connections between servers and storage disks, and is designed to replace the physical layer of SCSI. Fibre Channel supports high transfer rates: 1 Gbps, 2 Gbps, 4 Gbps, 8 Gbps, 16 Gbps.
The standard defines three possible topologies for SANs:
- point-to-point: direct connection between a server and a JBOD, like in SCSI;
- arbitrated loop: ring topology, for reliability purposes;
- switched fabric: multiple servers are connected to multiple JBODs through a fabric, that is a mesh network of bridges.
The switched fabric topology is new in the storage world: SCSI only allowed devices to be connected in cascade, as in a bus architecture.
Routing is performed by the Fabric Shortest Path First (FSPF) protocol, which is very similar to the OSPF protocol in IP networks. No spanning tree protocol is required to handle rings in the topology.
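Like OSPF, a link-state protocol such as FSPF lets every bridge compute shortest paths over its view of the fabric. The sketch below shows only that path computation, using Dijkstra's algorithm; the bridge names and link costs are hypothetical, and the exchange of link-state information between bridges is not modeled.

```python
import heapq


def shortest_paths(fabric, source):
    """Return the lowest total link cost from source to every reachable bridge."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry: a cheaper path was already found
        for neighbor, cost in fabric[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical fabric: three bridges connected in a ring, with link costs.
FABRIC = {
    "sw1": {"sw2": 1, "sw3": 4},
    "sw2": {"sw1": 1, "sw3": 1},
    "sw3": {"sw1": 4, "sw2": 1},
}
```

In this example the ring offers two ways from `sw1` to `sw3`; the computation selects the path through `sw2` (total cost 2) rather than the direct link (cost 4), and the ring itself causes no forwarding loop.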
Every port of a Fibre Channel node (server or JBOD) is dynamically assigned a 24-bit address:
| Domain ID | Area ID | Port ID |
where the fields are:
- Domain ID field (8 bits): it identifies the bridge to which the node is connected;
- Area ID field (8 bits): it identifies the group of ports to which the bridge port connected to the node belongs;
- Port ID field (8 bits): it identifies the node port.
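The three 8-bit fields can be packed into, and extracted from, a single 24-bit value with shifts and masks. This is a minimal sketch that takes the Domain ID as the most significant byte, following the field order shown above; the function names are hypothetical.

```python
def pack_address(domain_id, area_id, port_id):
    """Combine the three 8-bit fields into one 24-bit address."""
    for value in (domain_id, area_id, port_id):
        if not 0 <= value <= 0xFF:
            raise ValueError("each field must fit in 8 bits")
    return (domain_id << 16) | (area_id << 8) | port_id


def unpack_address(address):
    """Split a 24-bit address into (Domain ID, Area ID, Port ID)."""
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    return (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF
```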
Every server is connected to the fabric through an interface called Host Bus Adapter (HBA).
Flow control
Fibre Channel enhances SCSI error-recovery mechanisms by introducing hop-by-hop flow control based on a credit mechanism: each port has a number of credits, which is decreased whenever a packet is forwarded and increased whenever an acknowledgment is received → if the number of available credits drops to 0, the port cannot send further packets and has to wait for the next hop to signal, via an acknowledgment, that it is ready to receive more data into its buffer → this mechanism avoids congestion of node buffers and therefore packet losses.
Moreover, the credit mechanism allows resource reservation and guarantees in-order delivery of frames: the destination node does not have to implement a mechanism for packet re-ordering (as in TCP).
However, the credit mechanism has some drawbacks:
- traffic over a link can be blocked for a while due to lack of credits → the maximum number of credits for a port has to be set properly based on the buffer capacity of the port at the other end of the link;
- deadlocks may happen in a mesh network with circular dependencies.
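The credit mechanism can be sketched from the sender's point of view as a simple counter. This is a simplified model, not the actual Fibre Channel implementation: the class and function names are hypothetical, and the initial number of credits stands for the buffer capacity of the port at the other end of the link.

```python
class CreditPort:
    """Sending side of one link using credit-based flow control."""

    def __init__(self, max_credits):
        self.max_credits = max_credits  # buffer slots available at the next hop
        self.credits = max_credits

    def can_send(self):
        return self.credits > 0

    def send_frame(self):
        if not self.can_send():
            raise RuntimeError("no credits left: wait for an acknowledgment")
        self.credits -= 1  # one buffer slot at the next hop is now in use

    def receive_ack(self):
        # The next hop has freed a buffer slot and is ready for more data.
        self.credits = min(self.credits + 1, self.max_credits)


def send_until_blocked(port, frames):
    """Send up to `frames` frames; return how many were sent before blocking."""
    sent = 0
    while sent < frames and port.can_send():
        port.send_frame()
        sent += 1
    return sent
```

A port with 3 credits that has 10 frames queued sends 3 of them and then stalls until an acknowledgment arrives, which illustrates why an undersized credit count leaves the link idle.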
Advanced features
- Virtual SAN (VSAN): the equivalent of VLANs for SANs;
- link aggregation;
- load balancing;
- virtualization: the virtualization features of the SCSI controller can be moved directly to the bridge to which the JBOD is connected.
Parts of this page are based on materials from Wikipedia, the free encyclopedia.
The Fibre Channel over Ethernet (FCoE) technology encapsulates Fibre Channel frames into Ethernet frames via the FCoE adaptation layer, which replaces the physical layer of Fibre Channel → this makes it possible to use 10 Gigabit Ethernet (or faster) networks while preserving the Fibre Channel protocol.
Before FCoE, datacenters used Ethernet for TCP/IP networks and Fibre Channel for SANs. With FCoE, Fibre Channel becomes another network protocol running on Ethernet, alongside traditional IP traffic: FCoE operates directly above Ethernet in the network protocol stack, in contrast to iSCSI which runs on top of TCP and IP:
- advantage: the server no longer needs a Fibre Channel-specific HBA interface, since a single NIC interface can provide connectivity both to the SAN and to the Internet → fewer cables and bridges, and lower power consumption;
- disadvantage: FCoE is not routable at the IP layer, that is, it cannot travel across the Internet outside the SAN.
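The difference in layering can be summarized as data. The sketch below lists, from top to bottom, a simplified view of the stacks described in this section (native Fibre Channel is shown as a single layer); the dictionary and helper names are hypothetical.

```python
# Simplified protocol stacks, listed from the upper layer down.
STACKS = {
    "Fibre Channel": ["SCSI", "Fibre Channel"],
    "FCoE": ["SCSI", "Fibre Channel", "FCoE", "Ethernet"],
    "iSCSI": ["SCSI", "iSCSI", "TCP", "IP", "Ethernet"],
}


def is_ip_routable(protocol):
    """A stack can cross IP routers only if it contains an IP layer."""
    return "IP" in STACKS[protocol]


def layers_below_scsi(protocol):
    """Return the layers that sit underneath SCSI in the given stack."""
    return STACKS[protocol][1:]
```

Every stack has SCSI on top, which is what keeps existing SCSI-based applications working; only iSCSI includes an IP layer, which is why FCoE traffic stays confined to the Ethernet network of the SAN.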
Since, unlike Fibre Channel, the classical Ethernet includes no flow control mechanisms, FCoE required some enhancements to the Ethernet standard to support a priority-based flow control mechanism, to reduce frame loss from congestion.
The basic idea is to adopt the PAUSE packets defined by the 802.3x standard for flow control over Ethernet, with the Ethernet channel between two bridges logically partitioned into lanes (for example, one dedicated to storage traffic and another one dedicated to normal Internet traffic) → a PAUSE packet, instead of blocking all the traffic over the link concerned, blocks only the traffic of a certain lane without affecting the traffic of the other lanes.
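The effect of a per-lane PAUSE can be sketched with a toy model of a link. The class name, lane names and methods below are hypothetical; the model only tracks which lanes are paused and ignores timers and actual frame formats.

```python
class LaneLink:
    """Toy model of an Ethernet link partitioned into independent lanes."""

    def __init__(self, lanes):
        self.paused = {lane: False for lane in lanes}
        self.delivered = []  # (lane, frame) pairs that crossed the link

    def pause(self, lane):
        """The receiver asks the sender to stop traffic on one lane only."""
        self.paused[lane] = True

    def resume(self, lane):
        self.paused[lane] = False

    def send(self, lane, frame):
        """Forward the frame unless its lane is paused; return True if sent."""
        if self.paused[lane]:
            return False  # the frame stays queued at the sender, it is not lost
        self.delivered.append((lane, frame))
        return True
```

Pausing the storage lane while the receiver's buffer drains leaves the lane carrying normal traffic untouched, whereas a plain 802.3x PAUSE would stop both.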
For servers with FCoE technology, top-of-the-rack (TOR) switches are typically preferred to the end-of-the-row (EOR) switches used with Fibre Channel, because FCoE switches are less expensive than Fibre Channel switches:
- end-of-the-row switch: there is a single main switch and every server is connected to it through its own cable → longer cables;
- top-of-the-rack switch: on top of each rack there is a switch, and every server is connected to its rack switch, then all rack switches are connected to the main switch → more switches, but shorter cables.
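The trade-off between the two layouts can be made concrete by counting switches and cables. The sketch below assumes one uplink per rack switch and treats any cable running to the main switch as "long" and any cable staying inside a rack as "short"; the function names and the example figures are hypothetical.

```python
def end_of_row(racks, servers_per_rack):
    """One main switch; every server has its own cable to it."""
    servers = racks * servers_per_rack
    return {"switches": 1, "long_cables": servers, "short_cables": 0}


def top_of_rack(racks, servers_per_rack):
    """One switch per rack plus the main switch; one uplink per rack."""
    servers = racks * servers_per_rack
    return {"switches": racks + 1, "long_cables": racks, "short_cables": servers}
```

For 4 racks of 20 servers, the end-of-the-row layout needs 1 switch and 80 long cables, while the top-of-the-rack layout needs 5 switches but only 4 long cables, the other 80 being short in-rack cables.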
The iSCSI protocol, proposed by Cisco to counter Fibre Channel hegemony, makes it possible to build a SAN by using the most common network technology, namely TCP/IP: SCSI commands are encapsulated into TCP packets via the iSCSI adaptation layer and cross the SAN over an Ethernet network.
iSCSI has the following advantages:
- the server no longer needs a Fibre Channel-specific HBA interface, since a single NIC interface can provide connectivity both to the SAN and to the Internet → fewer cables and bridges, and lower power consumption;
- disks can also be reached by clients via the Internet;
- optical fibers dedicated specifically to the SAN connection do not need to be laid.
Its disadvantages are:
- bridge buffers in the SAN need to be sized so as to minimize packet losses due to buffer overflow, and therefore the performance overhead due to TCP error-recovery mechanisms;
- Ethernet technology is not well known in the storage world, where people are accustomed to Fibre Channel tools → the iSCSI protocol has not been very successful.
A datacenter is subject to the risk of data loss due to natural disasters (like earthquakes, tsunamis, etc.) → in order to improve resiliency (business continuity), the datacenter can be entirely replicated in another location, generally at a distance of a few hundred kilometers. The main datacenter and the backup datacenter could communicate with each other by using Fibre Channel, but connecting them through a dedicated optical fiber would be too expensive due to the long distance.
The Fibre Channel over IP (FCIP) technology allows geographically distributed SANs to be interconnected by using the existing TCP/IP infrastructure, namely the Internet, without the internal devices in the datacenters being aware of the presence of the IP network:
- the main datacenter sends a Fibre Channel frame;
- the edge router encapsulates the Fibre Channel frame into a TCP packet, via the FCIP adaptation layer replacing the physical layer of Fibre Channel, then forwards the TCP packet across the Internet, in a sort of tunnel, to the other edge router;
- the other edge router extracts the Fibre Channel frame and sends it to the backup datacenter;
- the backup datacenter receives the Fibre Channel frame.
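The two edge-router steps above can be sketched as a pair of functions. Because TCP delivers a byte stream rather than separate messages, the receiving router needs some way to find frame boundaries; the 4-byte length prefix used here is an assumption made for illustration and is not the real FCIP header, and the function names are hypothetical.

```python
import struct

LENGTH_PREFIX = "!I"  # assumed framing: 4-byte big-endian length, not real FCIP


def encapsulate(fc_frame):
    """Sending edge router: wrap one Fibre Channel frame for the TCP tunnel."""
    return struct.pack(LENGTH_PREFIX, len(fc_frame)) + fc_frame


def decapsulate(stream):
    """Receiving edge router: recover the frames from the received bytes.

    Assumes the stream holds only complete, length-prefixed frames.
    """
    frames = []
    pos = 0
    prefix_size = struct.calcsize(LENGTH_PREFIX)
    while pos < len(stream):
        (length,) = struct.unpack_from(LENGTH_PREFIX, stream, pos)
        pos += prefix_size
        frames.append(stream[pos:pos + length])
        pos += length
    return frames
```

The frames come out of the tunnel byte-for-byte identical to what went in, which is why the devices inside the two datacenters do not need to know that an IP network sits in between.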
The maximum size of a Fibre Channel frame (about 2.1 KB), however, exceeds the payload size limit of a standard Ethernet frame (1500 bytes), and the overhead of fragmentation would be excessive → Ethernet frames must be extended to about 2.2 KB so that full-size Fibre Channel frames can be encapsulated.
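The size mismatch can be checked with a little arithmetic. The figures below are assumptions not stated in the text: a full-size Fibre Channel frame of 2148 bytes (4-byte start delimiter, 24-byte header, up to 2112 bytes of payload, 4-byte CRC, 4-byte end delimiter) and a standard Ethernet payload limit of 1500 bytes. Encapsulation header overhead is ignored.

```python
import math

# Assumed sizes in bytes: start delimiter + header + payload + CRC + end delimiter.
FC_MAX_FRAME = 4 + 24 + 2112 + 4 + 4
ETHERNET_PAYLOAD = 1500      # standard Ethernet payload limit
EXTENDED_PAYLOAD = 2200      # "about 2.2 KB", as mentioned in the text


def fragments_needed(frame_size, payload_limit):
    """Number of Ethernet frames needed to carry one frame of the given size."""
    return math.ceil(frame_size / payload_limit)
```

Under these assumptions a full-size frame would have to be split across two standard Ethernet frames, while a payload extended to about 2.2 KB carries it in one.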