Building a Beowulf Cluster/Installation, Configuration, and Administration/Networking

We want the computers to work in a local network.

The easiest way to set up the network is DHCP: the master hands out addresses to the slaves on the basis of the physical (MAC) addresses of their network interfaces. DHCP simplifies the installation of new nodes, because the MAC address and hostname are the only things that differ among the nodes, and the DHCP server on the master can manage a new node through one new entry in its configuration file.

In this example the slaves get the addresses 192.168.1.1, 192.168.1.2, and so on, and the master gets 192.168.1.250.

slave(s):/etc/sysconfig/network-scripts/ifcfg-eth0
 
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp

master:/etc/dhcpd.conf
 
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
option routers 10.5.170.254;
# define individual nodes
subnet 192.168.1.0 netmask 255.255.255.0 {
group { 
# define all slaves; the master (head node) has a static IP address
host node0{
hardware ethernet 00:1E:8C:30:AC:2A;
fixed-address 192.168.1.250;
}
host node1{
hardware ethernet 00:1E:8C:30:B0:A1;
fixed-address 192.168.1.1;
}
#... Here you can put more nodes. Make a list of the MAC addresses of all your machines and enter them here.
}
}
# ignore requests from the second network interface
subnet 10.5.170.0 netmask 255.255.255.0 { not authoritative; }


The idea is to give the slaves the names nodei, corresponding to their IP addresses 192.168.1.i.
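Typing one host block per slave by hand gets tedious as the cluster grows. The following is a minimal sketch, not part of the original setup, that prints the host blocks from a list of MAC addresses using the nodei / 192.168.1.i scheme. The function name `dhcp_host_entries` and the second MAC address are made up for illustration.

```python
def dhcp_host_entries(macs, subnet_prefix="192.168.1."):
    """Return dhcpd.conf host blocks for the slaves node1, node2, ...

    macs is a list of MAC address strings. The i-th entry (counting
    from 1) becomes host nodei with the fixed address 192.168.1.i.
    """
    blocks = []
    for i, mac in enumerate(macs, start=1):
        blocks.append(
            f"host node{i}{{\n"
            f"hardware ethernet {mac};\n"
            f"fixed-address {subnet_prefix}{i};\n"
            f"}}"
        )
    return "\n".join(blocks)


# The first MAC address is node1 from the dhcpd.conf listing;
# the second one is hypothetical.
slave_macs = ["00:1E:8C:30:B0:A1", "00:1E:8C:30:B0:A2"]
print(dhcp_host_entries(slave_macs))
```

The output can be pasted inside the group block of /etc/dhcpd.conf. The master is left out on purpose, since its address does not follow the 192.168.1.i pattern.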

Note that the DHCP server provides IP addresses for the other machines, not for itself. Give the master a static IP address (192.168.1.250 here). In Red Hat based distributions this is configured in /etc/sysconfig/network-scripts/ifcfg-eth0 or /etc/sysconfig/network-scripts/ifcfg-eth1.

I configured eth0 for the organization network and eth1 for the cluster intranet. Example files follow; /etc/sysconfig/network-scripts/ifcfg-eth0 holds your organization's network settings.

DEVICE=eth0
ONBOOT=yes
...

/etc/sysconfig/network-scripts/ifcfg-eth1
 
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
NETWORK=192.168.1.0
IPADDR=192.168.1.250
TYPE=Ethernet

For the slaves, the interface to the cluster intranet is as follows:

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
DNS1=...
DNS2=...

If you do not run a DNS service on the head node, use the DNS servers of your organization's network.
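The NETWORK, NETMASK and BROADCAST values in the interface files have to agree with one another and with the addresses used in dhcpd.conf. As a quick sanity check, here is a small sketch with Python's standard ipaddress module. It only does arithmetic on the example values from this page and does not touch any configuration file.

```python
import ipaddress

# Cluster intranet as used in the example files.
net = ipaddress.ip_network("192.168.1.0/255.255.255.0")

print(net.network_address)    # 192.168.1.0   (NETWORK)
print(net.netmask)            # 255.255.255.0 (NETMASK)
print(net.broadcast_address)  # 192.168.1.255 (BROADCAST)

# Master and first slave must both lie inside the cluster subnet.
for addr in ("192.168.1.250", "192.168.1.1"):
    print(addr, ipaddress.ip_address(addr) in net)
```

If one of the membership checks prints False, the address does not belong to the subnet declared for the cluster.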

/etc/hosts
 
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.1.250 node0
192.168.1.1 node1
# ... add more names of machines here

Note that in /etc/hosts the loopback line (the first line) does not contain the machine's hostname; this avoids problems with message-passing software (PVM, MPI).
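The node lines of /etc/hosts follow the same pattern on every machine, so they can be generated instead of typed. The sketch below is an illustration only; the helper name `hosts_lines` is hypothetical, and it assumes the master is node0 at 192.168.1.250 and the slaves are node1 to nodeN at 192.168.1.1 to 192.168.1.N, as in the example above.

```python
def hosts_lines(n_slaves, master_ip="192.168.1.250", prefix="192.168.1."):
    """Return the /etc/hosts lines for the master and n_slaves slaves."""
    lines = [f"{master_ip} node0"]
    for i in range(1, n_slaves + 1):
        lines.append(f"{prefix}{i} node{i}")
    return lines


# Print the entries for a cluster with three slaves.
for line in hosts_lines(3):
    print(line)
```

Append the printed lines below the two loopback lines, and leave the loopback lines themselves unchanged.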

You need to activate IP forwarding on the head node so that all machines have internet access. Enable the firewall and add masquerading on the network interface to your cluster. You do this by editing the /etc/sysconfig/iptables file or with a graphical tool, e.g. system-config-firewall on Red Hat based systems. Be careful not to make the firewall too restrictive, since that can block traffic the cluster needs.

In /etc/sysconfig/network you need to have:

NETWORKING=YES
IPFORWARD=YES
HOSTNAME=nodei
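After the network has been restarted, it can be worth checking on the head node that the kernel really forwards packets. On Linux the current state is exposed in /proc/sys/net/ipv4/ip_forward, which contains 1 when forwarding is on and 0 when it is off. The following is a minimal sketch that reads this file; the function name `forwarding_enabled` is made up for this example.

```python
import os

# Standard Linux location of the IPv4 forwarding switch.
PROC_FILE = "/proc/sys/net/ipv4/ip_forward"


def forwarding_enabled(path=PROC_FILE):
    """Return True if the file at path contains 1 (forwarding is on)."""
    with open(path) as f:
        return f.read().strip() == "1"


if __name__ == "__main__":
    if os.path.exists(PROC_FILE):
        print("IP forwarding enabled:", forwarding_enabled())
    else:
        print(PROC_FILE, "not found; is this a Linux system?")
```

If this reports that forwarding is off on the head node, the slaves will not reach the outside network even when the firewall rules are correct.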

You have to restart the network services and start the DHCP server daemon (dhcpd). To have dhcpd start at boot, in Fedora the ntsysv program lets you browse a list of services and mark the corresponding entry.

You may want to set up your printers on the master and the slaves (you can copy an existing printer configuration recursively from /etc/cups, e.g. from your local office desktop computer).

Useful references

Last modified on 29 July 2009, at 17:58