Building a Beowulf Cluster/Installation, Configuration, and Administration/Networking
We want the computers to work in a local network.
The easiest way to set up the network is with DHCP: the master hands out addresses to the slaves on the basis of the physical (MAC) addresses of their network interfaces. DHCP simplifies the installation of new nodes, because the MAC address and hostname are the only things that differ among the nodes, and the DHCP server on the master can manage a new node through a new entry in its configuration file.
In this example the network uses addresses in 192.168.1.0/24: the slaves are numbered from 192.168.1.1 upward and the master is 192.168.1.250, as in the configuration files that follow.
slave(s):/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
master:/etc/dhcpd.conf
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
option routers 10.5.170.254;
# define individual nodes
subnet 192.168.1.0 netmask 255.255.255.0 {
    group {
        # define all slaves; the master (head node) has a static IP address
        host node0 {
            hardware ethernet 00:1E:8C:30:AC:2A;
            fixed-address 192.168.1.250;
        }
        host node1 {
            hardware ethernet 00:1E:8C:30:B0:A1;
            fixed-address 192.168.1.1;
        }
        # ... Here you can put more nodes. Make a list of the MAC addresses of all your machines and enter them here.
    }
}
# ignore requests from the second network interface
subnet 10.5.170.0 netmask 255.255.255.0 {
    not authoritative;
}
The idea is to give the slaves the names nodei, corresponding to their IP addresses 192.168.1.i.
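The nodei / 192.168.1.i naming rule lends itself to generating the host entries instead of typing them by hand. The following is only a sketch under stated assumptions: the function name dhcp_host_entries is hypothetical, the slaves are numbered in the order their MAC addresses appear in the list, and the output still has to be pasted into the group block of dhcpd.conf manually.

```python
def dhcp_host_entries(macs, subnet="192.168.1."):
    """Return dhcpd.conf host blocks for slaves node1, node2, ...

    macs   -- MAC addresses of the slaves; list position decides the index i
    subnet -- address prefix; slave i gets the address subnet + str(i)
    (hypothetical helper, not part of dhcpd itself)
    """
    blocks = []
    for i, mac in enumerate(macs, start=1):
        blocks.append(
            f"host node{i} {{\n"
            f"    hardware ethernet {mac};\n"
            f"    fixed-address {subnet}{i};\n"
            f"}}"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # MAC address of node1 taken from the example configuration
    print(dhcp_host_entries(["00:1E:8C:30:B0:A1"]))
```

Keeping the MAC list in one place also makes it easy to add a node later: append its address and regenerate the entries.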
Note that the DHCP server provides IP addresses for the other machines, not for itself. Give the master a static IP address (192.168.1.250 here). In Red Hat based distributions this is configured in /etc/sysconfig/network-scripts/ifcfg-eth0 or /etc/sysconfig/network-scripts/ifcfg-eth1.
I configured eth0 for the organization network and eth1 for the cluster intranet. Example files:
/etc/sysconfig/network-scripts/ifcfg-eth0 (corresponds to your organization network settings)
DEVICE=eth0
ONBOOT=yes
...
/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
NETWORK=192.168.1.0
IPADDR=192.168.1.250
TYPE=Ethernet
For the slaves, the interface to the cluster intranet is as follows:
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
DNS1=...
DNS2=...
If you do not run a DNS service on the head node, use the DNS servers of your organization's network.
/etc/hosts
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.1.250 node0
192.168.1.1 node1
# ... add more names of machines here
Note that in /etc/hosts the loopback line (first line) does not contain the machine's own hostname. This avoids problems with message-passing libraries (PVM, MPI).
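Whether a node name has slipped onto a loopback line can be checked mechanically. The sketch below assumes the usual /etc/hosts layout (address first, names after it, # starting a comment); the function name names_on_loopback is hypothetical and only illustrates the check.

```python
import socket


def names_on_loopback(hosts_text):
    """Collect every name that hosts-file text maps to a loopback address."""
    names = set()
    for line in hosts_text.splitlines():
        # drop comments, then split into address and names
        fields = line.split("#", 1)[0].split()
        if fields and (fields[0].startswith("127.") or fields[0] == "::1"):
            names.update(fields[1:])
    return names


if __name__ == "__main__":
    with open("/etc/hosts") as f:
        loopback = names_on_loopback(f.read())
    if socket.gethostname() in loopback:
        print("warning: this machine's hostname is on a loopback line")
```

Run on each node, the script prints a warning only when the machine's own hostname resolves to the loopback address.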
You need to activate IP forwarding on the head node in order to have internet access on all machines. Enable the firewall and include masquerading for the cluster network. You do this by editing the /etc/sysconfig/iptables file or by using a graphical tool, e.g. system-config-firewall on Red Hat based systems. Be careful not to make your firewall too restrictive, as this can cause problems.
In the /etc/sysconfig/network you need to have:
NETWORKING=YES
IPFORWARD=YES
HOSTNAME=nodei
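Once the network services have been restarted, it can be useful to confirm that the kernel on the head node is really forwarding packets. The sketch below reads the Linux file /proc/sys/net/ipv4/ip_forward, which contains 1 when IPv4 forwarding is on; the function name ip_forwarding_enabled is hypothetical, and the check says nothing about whether the masquerading rules are in place.

```python
def ip_forwarding_enabled(path="/proc/sys/net/ipv4/ip_forward"):
    """Return True if the kernel reports IPv4 forwarding as enabled."""
    with open(path) as f:
        return f.read().strip() == "1"


if __name__ == "__main__":
    state = "enabled" if ip_forwarding_enabled() else "disabled"
    print("IPv4 forwarding is", state)
```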
You have to restart the network services and start the DHCP server daemon (dhcpd). To have dhcpd start at boot, on Fedora the ntsysv program lets you browse a list of services and mark the corresponding entry.
You may want to set up your printers on the master and the slaves (you can copy an existing printer configuration recursively from /etc/cups, e.g. from your local office desktop computer).