Last modified on 21 June 2012, at 08:45

Ruby Programming/Standard Library/DRb

Distributed Ruby (DRb) lets one Ruby process call methods on objects in another running Ruby process, in the style of RPC.

What is DRb?

DRb (Distributed Ruby) is remote method calling for Ruby. It's written in Ruby and supplied as part of the standard library. Because it serializes objects using Marshal, which is written in C, it's surprisingly fast. 50 method calls per second is easily achievable.

Let's start with a simple example. Here's server.rb, where we create a single instance of an object (in this case a hash) and share it on TCP port 9000.

  require 'drb'

  myhash = { :counter => 0 }
  def myhash.inc(elem)
    self[elem] = self[elem].succ
  end

  DRb.start_service('druby://localhost:9000', myhash)  # replace localhost with 0.0.0.0 to allow conns from outside
  DRb.thread.join

And here's client.rb:

  require 'drb'
  DRb.start_service
  obj = DRbObject.new(nil, 'druby://localhost:9000')

  puts obj[:counter]
  obj.inc(:counter)
  puts obj[:counter]

  puts "Last access time = #{obj[:lastaccess]}"
  obj[:lastaccess] = Time.now

Start the server in one window (or in the background), and in another window run the client a few times:

  $ ruby client.rb
  0
  1
  Last access time = 
  $ ruby client.rb
  1
  2
  Last access time = Fri Oct 22 22:23:59 BST 2004

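The client can happily be run on a remote machine: just change 'localhost' in client.rb to the hostname or IP address where the server is running, and make sure the server listens on an address that is reachable from outside (see the comment in server.rb).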

Even just this simple example is immensely powerful. The above object could be used as a shared data store for session data on a webserver. Each web page request can look up and store information in this shared object. It works whether the web pages are served via standalone CGI scripts, WEBrick threads, Apache mod_ruby, or fcgi/mod_fastcgi. It even works if you have a cluster of webservers. Furthermore, the session data is not lost if you restart Apache.

So how does it work?

DRb is actually rather sophisticated and elegant in its design, but the fundamental principle is very straightforward.

DRb packages up a method call as an array containing the method name and the arguments, Marshals it into a stream of bytes, and squirts it at the server. The success or failure (exception), plus the return value from the method call, are Marshalled back as the response.
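The round trip described above can be imitated inside a single process with plain Marshal calls. This is only a conceptual sketch: DRb's real wire format differs in detail, and the names store, request_bytes and reply_bytes are made up for illustration.

```ruby
# Conceptual sketch only; this is not DRb's actual wire format.
store = { :counter => 0 }
def store.inc(elem)
  self[elem] = self[elem].succ
end

# "Client" side: package the method name and arguments, then serialize.
request_bytes = Marshal.dump([:inc, [:counter]])

# "Server" side: unpack the request, invoke the method, and package
# success/failure plus the return value (or the exception).
method_name, args = Marshal.load(request_bytes)
reply = begin
  [true, store.__send__(method_name, *args)]
rescue StandardError => e
  [false, e]
end
reply_bytes = Marshal.dump(reply)

# "Client" side again: decode the reply.
succeeded, result = Marshal.load(reply_bytes)
puts "succeeded=#{succeeded} result=#{result}"
```

Note that only the request and the reply pass through Marshal here; the shared object itself never leaves the "server" side.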

Since DRb is written in Ruby, you can look at the code, which contains lots of comments and examples. It will be on your system in a location like /usr/local/lib/ruby/1.8/drb/drb.rb

Security concerns

If you are using a DRb object to store session data, make sure that only the webserver can contact your DRb object, and that it is not directly reachable from the outside world; otherwise unwelcome guests could directly manipulate its contents. You can bind it to localhost (127.0.0.1) if all clients are on the same machine; otherwise you can put it on a separate private network, or use firewall rules or DRb ACLs to block access from unwanted clients. Put these restrictions in place before calling start_service; an ACL, for example, is installed first:

  require 'drb'
  require 'drb/acl'

  acl = ACL.new(%w{deny all
                  allow localhost
                  allow 192.168.1.*})
  DRb.install_acl(acl)

  DRb.start_service('druby://localhost:9000', obj)

Beware that every Object contains methods which could be very dangerous if called by a hostile party. Some of these are private (e.g. exec, system), and DRb prevents these being called, but there are other public methods which are equally dangerous (e.g. send, instance_eval, instance_variable_set). Consider for example obj.instance_eval("`rm -rf /*`")
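The split between private and public methods mentioned above can be checked directly. The following is a small sketch that only inspects method visibility on Object; it does not involve DRb itself.

```ruby
# Kernel#exec and Kernel#system are private instance methods, so they
# cannot be reached through an ordinary obj.method call.
puts Object.private_method_defined?(:exec)
puts Object.private_method_defined?(:system)

# These, however, are public on every ordinary object.
dangerous = [:send, :instance_eval, :instance_variable_set]
dangerous.each do |name|
  puts "#{name}: #{Object.public_method_defined?(name)}"
end
```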

So sharing an object with the whole Internet is a risky business. If you're going to do this then you should run with at least $SAFE=1 (on Ruby versions older than 3.0; from Ruby 3.0 on, $SAFE is an ordinary global variable with no security effect), and you should start your object from a blank slate without these dangerous methods included. You can achieve that like this:

  class BlankSlate
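    # instance_methods returns strings on Ruby 1.8 and symbols on 1.9 and
    # later, so compare by name to keep the safe methods on either version.
    safe_methods = %w{__send__ __id__ inspect respond_to? to_s}
    instance_methods.each do |method|
      undef_method method unless safe_methods.include?(method.to_s)
    end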
  end

  class MyService < BlankSlate
    def increase_count
      @count ||= 0
      @count += 1
    end
  end

  DRb.start_service('druby://localhost:9000', MyService.new)

Note that this example doesn't use initialize() to set @count to 0. If it did, clients would also be able to reset @count.

Here's an alternative implementation from Evil-Ruby - http://rubyforge.org/projects/evil/

  # You can derivate your own Classes from this Class
  # if you want them to have no preset methods.
  #
  #   klass = Class.new(KernellessObject) { def inspect; end }
  #   klass.new.methods # raises NoMethodError
  #
  # Classes that are derived from KernellessObject
  # won't call #initialize from .new by default.
  #
  # It is a good idea to define #inspect for subclasses,
  # because Ruby will go into an endless loop when trying
  # to create an exception message if it is not there.
  class KernellessObject
    class << self
      def to_internal_type; ::Object.to_internal_type; end
  
      def allocate
        obj = ::Object.allocate
        obj.class = self
        return obj
      end
  
      alias :new :allocate
    end
  
    self.superclass = nil
  end

Additionally, rather than sharing your original object, you may wish to build a wrapper object and share that instead. The wrapper object can have a limited set of methods (just the ones you really want to share), validate the parameters of incoming data, and delegate to another object when the data has been sanitised.
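A minimal sketch of such a wrapper follows. All names in it (CounterStore, CounterFront, ALLOWED_NAMES and the 1..100 limit) are hypothetical and chosen only for illustration; in a real setup the CounterFront instance would be the object handed to DRb.start_service.

```ruby
# Hypothetical example: CounterStore is the real object, CounterFront is
# the restricted wrapper that would be shared instead of it.
class CounterStore
  def initialize
    @counts = Hash.new(0)
  end

  def add(name, amount)
    @counts[name] += amount
  end
end

class CounterFront
  ALLOWED_NAMES = [:hits, :logins].freeze

  def initialize(store)
    @store = store
  end

  # The only method intended for remote callers: validate, then delegate.
  def add(name, amount)
    raise ArgumentError, "unknown counter" unless ALLOWED_NAMES.include?(name)
    unless amount.is_a?(Integer) && (1..100).include?(amount)
      raise ArgumentError, "amount must be an Integer in 1..100"
    end
    @store.add(name, amount)
  end
end

front = CounterFront.new(CounterStore.new)
puts front.add(:hits, 2)
```

A wrapper like this still inherits the public methods of Object, so it complements the blank-slate approach rather than replacing it.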

Thread-safety issues

Each incoming method call which hits the object you've shared by DRb is started in a new thread. This is pretty essential if you think about it; there may be many clients, and the server can't control when the clients decide to send method calls to it. DRb does not serialise the requests, so that one client can't block out the other clients.

However, this does mean you have to take the same care with your DRb object as you would in any other threaded application. Consider what happens, for example, if two clients both decided to run

  obj[:counter] = obj[:counter] + 1

at the same time. It might happen that both clients would retrieve obj[:counter] and see the same value (say 100), then independently add 1, and then both write back 101. That's probably not what you want, if :counter is supposed to generate unique sequence numbers.

Even the method myhash.inc shown at the top of this page suffers the same problem, because two clients could decide to call inc(:counter) at the same time, causing two threads on the server to suffer the same race condition. The fix is to protect the increment operation with a Mutex:

  require 'drb'
  require 'thread'

  class MyStore
    def initialize
      @hash = { :counter=>0 }
      @mutex = Mutex.new
    end
    def inc(elem)
      @mutex.synchronize do
        self[elem] = self[elem].succ
      end
    end
    def [](elem)
      @hash[elem]
    end
    def []=(elem,value)
      @hash[elem] = value
    end
  end

  mystore = MyStore.new
  DRb.start_service('druby://localhost:9000', mystore)
  DRb.thread.join
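To convince yourself that the Mutex does its job, the store can be exercised without any network at all. In this sketch, ordinary local threads stand in for the threads DRb would start for incoming calls; the thread and iteration counts are arbitrary.

```ruby
class MyStore
  def initialize
    @hash = { :counter => 0 }
    @mutex = Mutex.new
  end

  def inc(elem)
    @mutex.synchronize do
      self[elem] = self[elem].succ
    end
  end

  def [](elem)
    @hash[elem]
  end

  def []=(elem, value)
    @hash[elem] = value
  end
end

store = MyStore.new

# Ten threads each increment the counter 1000 times.
threads = (1..10).map do
  Thread.new { 1000.times { store.inc(:counter) } }
end
threads.each { |t| t.join }

puts store[:counter]
```

Because every increment runs inside the same Mutex, no update is lost and the final count equals the total number of inc calls.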

Why does the client run 'DRb.start_service'?

A very good question, which leads us on to another interesting aspect of DRb.

In normal operation, DRb will use Marshal to send the arguments to a method call; when they are unmarshalled at the server side, it will have a copy of those objects. The same applies to the result returned from the method; it will be marshalled, sent back, and the client will have a copy of that object.

In many simple cases this copying of objects is not a problem, but there are several cases where it might be:

  • If the server makes a change to the local copy it received, then the client won't see that change
  • The argument or response objects could be extremely large, and you might not want to send them back and forth (such as an object which holds references to other objects, forming a tree)
  • Some types of objects cannot be marshalled at all: they include files, sockets, procs/blocks, objects with singleton methods, and any object which contains those objects indirectly, e.g. in an instance variable.

In these cases, DRb can instead send over a 'proxy object' containing contact details to allow the original object to be called via DRb: that is, the hostname and port where the original object can be found. This is done automatically for any object which cannot be marshalled, or you can force it by including DRbUndumped in your object.
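You can check which objects fall into the "cannot be marshalled" group by asking Marshal directly. The helper name dumpable? below is made up for this sketch; it simply reports whether Marshal.dump raises TypeError.

```ruby
# Returns true if Marshal can serialize obj, false if it raises TypeError.
def dumpable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

plain = { :counter => 0 }

with_singleton = Object.new
def with_singleton.greet
  "hello"
end

callback = proc { |x| x + 1 }

puts dumpable?(plain)            # an ordinary hash
puts dumpable?(with_singleton)   # object with a singleton method
puts dumpable?(callback)         # a proc
puts dumpable?([1, callback])    # contains a proc indirectly
```

Only the first call reports true; the other three objects are the kind for which DRb falls back to a proxy.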

How can we demonstrate this? Well, consider the class defined in the following file, foo.rb

  class Foo
    def initialize(x)
      @x = x
    end
    def inc
      @x = @x.succ
    end
  end

Now, let's have a server which accepts an object and calls 'inc' on it:

  require 'drb'
  require_relative 'foo'  # foo.rb in the same directory (on Ruby 1.8 use: require 'foo')
  
  class Server
    def update(obj)
      obj.inc
    end
  end
  
  server = Server.new
  DRb.start_service('druby://localhost:9001', server)
  DRb.thread.join

Here's the corresponding client:

  require 'drb'
  require_relative 'foo'  # foo.rb in the same directory (on Ruby 1.8 use: require 'foo')

  DRb.start_service
  obj = DRbObject.new(nil, 'druby://localhost:9001')
  a = Foo.new(10)
  b = Foo.new(20)
  p a
  p b
  obj.update(a)
  obj.update(b)
  p a
  p b

Now, here's what happens if we run it:

  $ ruby client2.rb
  #<Foo:0x817e760 @x=10>
  #<Foo:0x817e74c @x=20>
  #<Foo:0x817e760 @x=10>
  #<Foo:0x817e74c @x=20>

Oops. We passed across our objects 'a' and 'b', but because they were copied onto the server, only the server's copies got updated by 'inc'. The objects on the client are unaffected.
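The same effect can be reproduced without a server by doing the Marshal round trip by hand. In this sketch the variable server_copy stands in for what the server receives, and the attr_reader is an addition to Foo made only so the values can be printed.

```ruby
class Foo
  attr_reader :x   # added for this sketch only

  def initialize(x)
    @x = x
  end

  def inc
    @x = @x.succ
  end
end

a = Foo.new(10)

# Roughly what happens to an argument on its way to the server.
server_copy = Marshal.load(Marshal.dump(a))
server_copy.inc

puts a.x            # the client's object is untouched
puts server_copy.x  # only the copy was incremented
```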

Now try modifying the definition of Foo like this:

  class Foo
    include DRbUndumped
    ... same as before

Or alternatively you can modify the client program like this:

  a = Foo.new(10)
  b = Foo.new(20)
  a.extend DRbUndumped
  b.extend DRbUndumped
  ... same as before

And now the result is what we'd hope for:

  $ ruby client2.rb
  #<Foo:0x817e648 @x=10>
  #<Foo:0x817e634 @x=20>
  #<Foo:0x817e648 @x=11>
  #<Foo:0x817e634 @x=21>

So what's happened is, instead of marshalling across an instance of Foo, we have marshalled across the information needed to build a proxy object: it contains the client's hostname, port, and object id which can be used to talk to the original object. When we pass across the proxy object for 'a' to the server, and it calls obj.inc, the 'inc' method call is made back over DRb to the client machine where object 'a' actually lives. You have effectively built a remote 'reference' to the object which can be passed around much like a normal object reference, except it can be handed from machine to machine. Method calls via this reference hit the same object.
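One visible effect of DRbUndumped is that the object refuses to be marshalled at all, which is what leads DRb to send a proxy in its place. The sketch below shows only that refusal, assuming the drb library can be loaded; it does not start a server or build the proxy.

```ruby
require 'drb'

class Foo
  def initialize(x)
    @x = x
  end

  def inc
    @x = @x.succ
  end
end

plain = Foo.new(10)
marked = Foo.new(20)
marked.extend(DRb::DRbUndumped)

# A plain Foo serializes normally, so DRb would send a copy of it.
puts Marshal.load(Marshal.dump(plain)).class

# A DRbUndumped object refuses to be dumped.
refused = false
begin
  Marshal.dump(marked)
rescue TypeError => e
  refused = true
  puts "refused: #{e.message}"
end
```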

Now, this is why the client program needs to run DRb.start_service - even though it's a "client" from our point of view, there might be method call arguments which generate these DRb proxy 'references', at which point the client also becomes a server for those objects.

We didn't specify a host or port here, so DRb chooses any spare TCP port on the system, and the host is whatever the system hostname is according to the 'gethostname' call - e.g. if the machine is called server.example.com then DRb might choose druby://server.example.com:45123

These two-way method calls can be a problem though when there is a firewall between the two machines. You can choose a fixed port on the client side in DRb.start_service instead of having one chosen dynamically; that lets you open up a hole in the firewall for DRb. However, if you are behind a NAT firewall, it almost certainly won't work at all.

Running DRb over SSH

One way to solve the problem with two-way method calls through a firewall is to run DRb over SSH. Not only do you get two-way operation with just a single outbound TCP connection through the firewall; you also have your method calls securely encrypted!

Here's how to set it up.

  1. Choose one port for the client end (say 9000) and one for the server end (say 9001).
  2. Establish an ssh connection with a pair of tunnels: port 9001 at the client side is redirected to port 9001 at the server side, and port 9000 at the server side is redirected to port 9000 at the client side.
    $ ssh -L9001:127.0.0.1:9001 -R9000:127.0.0.1:9000 server.example.com
    
    The -L flag requests that connections to port 9001 at the local (client) side are redirected through the ssh tunnel and reconnected to 127.0.0.1:9001 at the server side. The -R flag requests that connections to port 9000 at the remote (server) side are redirected back down the ssh tunnel and connected to 127.0.0.1:9000 at the client side.
  3. At the server side, do DRb.start_service('druby://127.0.0.1:9001', a) as you would normally, where a is the object being shared.
  4. At the client side, do DRb.start_service('druby://127.0.0.1:9000') instead of just DRb.start_service. This gives us a fixed port number to work from.
  5. At the client side, connect to the remote object as:
    obj = DRbObject.new(nil, 'druby://127.0.0.1:9001')

Voila, you are up and running. You can try the DRbUndumped example from above, with the client behind a NAT firewall. Also notice that the ssh -L and -R options bind to 127.0.0.1 by default, so people on other machines cannot connect to the tunnel endpoints (although of course, other people on the same machine can do so).

You could try using the Net::SSH module to establish the ssh connection and the tunnels, instead of using the command-line ssh client.

Running DRb over SSL

SSL is another way to secure and encrypt your connections (note: SSL and SSH are *not* the same thing!)

Online tutorial: http://segment7.net/projects/ruby/drb/DRbSSL/

Running DRuby through firewalls - ruby-only solution ( http://www.ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/89976 )

Often a client has a firewall installed, so standard DRb cannot make callbacks to it, which makes block, IO and DRbUndumped arguments useless. To make DRb operate as normal in that situation, one can use DRbFire: http://rubyforge.org/projects/drbfire and http://drbfire.rubyforge.org/classes/DRbFire.html

From the DRbFire documentation:

  1. Start with require 'drb/drbfire'.
  2. Use drbfire:// instead of druby:// when specifying the server url.
  3. When calling DRb.start_service on the client, specify the server's uri as the uri (as opposed to the normal usage, which is to specify *no* uri).
  4. Specify the right configuration when calling DRb.start_service, specifically the role to use. Server: DRbFire::ROLE => DRbFire::SERVER and client: DRbFire::ROLE => DRbFire::CLIENT

Simple server:

   require 'drb/drbfire'
   front = ['a', 'b', 'c']
   DRb.start_service('drbfire://some.server.com:5555', front, DRbFire::ROLE => DRbFire::SERVER)
   DRb.thread.join

And a simple client:

   require 'drb/drbfire'
   DRb.start_service('drbfire://some.server.com:5555', nil, DRbFire::ROLE => DRbFire::CLIENT)
   DRbObject.new(nil, 'drbfire://some.server.com:5555').each do |e|
     p e
   end