GEEK: Running a ruby script via crontab - Open Knowledge

Jul. 16th, 2008

09:42 pm - GEEK: Running a ruby script via crontab

So, I’m setting up a Ruby script that screen-scrapes another website, writes the results to a local file, then rsyncs the file to another machine behind a firewall. This post details the problems I ran into while getting cron to run the Ruby script and rsync to the remote host.

First of all, to test that cron is running your crontab at all, you can add this line to it:

* * * * * /bin/echo "foobar" >> /tmp/crontabtest

This should cause “foobar” to be appended to the file /tmp/crontabtest every minute.
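
If you would rather not watch the file by hand, a few lines of Ruby can report whether it was written recently. This is only a sketch: the method name recently_appended? and the 120-second window are illustrative choices, not part of the original setup.

```ruby
# Returns true if the file at `path` was modified within `max_age` seconds.
# `recently_appended?` and the 120-second default are illustrative assumptions.
def recently_appended?(path, max_age = 120, now = Time.now)
  return false unless File.exist?(path)
  (now - File.mtime(path)) <= max_age
end

# Prints true if cron wrote to the test file in the last two minutes.
puts recently_appended?("/tmp/crontabtest")
```

Remember to remove the test line from the crontab afterwards, or the file will keep growing.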

Next, cron doesn’t read your shell’s startup files, so it knows nothing about the environment variables you set there. You have to set them in the crontab file instead. If you can run the script manually from within Terminal, but you get error messages when it runs from your crontab (or nothing happens at all), the environment variables are the most likely culprit.
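
One way to see exactly what differs is to dump the environment from both places and compare them. The sketch below assumes you have captured the output of env in each context (for example with a temporary crontab line that redirects env to a file); the helper names and the sample values are hypothetical.

```ruby
# Parses "KEY=VALUE" lines (as printed by `env`) into a Hash.
def parse_env(text)
  text.each_line.each_with_object({}) do |line, vars|
    key, value = line.chomp.split("=", 2)
    vars[key] = value unless key.nil? || key.empty? || value.nil?
  end
end

# Returns the variables from the interactive shell that are missing
# from, or different in, the cron environment.
def env_differences(shell_env, cron_env)
  shell_env.reject { |key, value| cron_env[key] == value }
end

# Hypothetical sample dumps; in practice, read these from files.
shell_dump = "PATH=/usr/local/bin:/usr/bin\nRUBYOPT=rubygems\nHOME=/Users/crasch\n"
cron_dump  = "PATH=/usr/bin:/bin\nHOME=/Users/crasch\n"

env_differences(parse_env(shell_dump), parse_env(cron_dump)).each do |key, value|
  puts "#{key} differs; the interactive shell has #{value}"
end
```

Any variable this reports is a candidate for a line at the top of the crontab.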

Here’s what the beginning of my crontab looks like:

% crontab -e

SHELL=/bin/sh
PATH=/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin
HOME=/var/log
MAILTO=crasch@myaddress.com
CRASCH_ROOT=/Crasch
SVN_ROOT=/Users/crasch/Projects/trunk
RUBYOPT=rubygems

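Note that the order of the directories in PATH is important to Ruby. The reason isn’t entirely clear to me, but my best guess is that with /usr/local/bin last, a different ruby earlier in the PATH gets picked up and then fails to load the libraries installed under /usr/local. If I put /usr/local/bin at the end, like this: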

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin

I get this error message:

/usr/local/lib/ruby/1.8/thread.rb:5:in `require': No such file to load -- thread.so (LoadError)
    from /usr/local/lib/ruby/1.8/thread.rb:5
    from /usr/local/lib/ruby/site_ruby/1.8/rubygems.rb:85:in `require'
    from /usr/local/lib/ruby/site_ruby/1.8/rubygems.rb:85
    from /usr/local/lib/ruby/site_ruby/1.8/ubygems.rb:10:in `require'
    from /usr/local/lib/ruby/site_ruby/1.8/ubygems.rb:10

If I put /usr/local/bin at the beginning of my PATH:

PATH=/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin

…the script executes fine.
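
To investigate this kind of ordering problem, it can help to list every ruby that the PATH could resolve to, in lookup order. This is a sketch under assumptions: the method name executables_on_path is made up for illustration, and whether a second ruby is really the cause here is a guess.

```ruby
# Lists every executable called `name` found on `path`, in lookup order.
# The first entry is the one a shell (or cron) would run.
def executables_on_path(name, path = ENV["PATH"].to_s)
  path.split(File::PATH_SEPARATOR)
      .reject(&:empty?)
      .map { |dir| File.join(dir, name) }
      .select { |candidate| File.file?(candidate) && File.executable?(candidate) }
end

puts executables_on_path("ruby")
```

Running it once from Terminal and once from cron shows whether the two environments pick the same interpreter.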

Setting the RUBYOPT environment variable is also important if you plan to use any Ruby gems. By setting RUBYOPT to the value rubygems, you tell Ruby to load RubyGems every time it starts up.

If you don’t set it, you’ll likely see this error message:

/Users/crasch/Projects/trunk/Scripts/build/bin/loadpath.rb:24:in `require': no such file to load -- getopt/std (LoadError)
    from /Users/crasch/Projects/trunk/Scripts/build/bin/loadpath.rb:24

You may also need to set these variables:

RUBYLIB=/Crasch/lib/ruby:/usr/local/lib/ruby/site_ruby/1.8:/usr/local/lib/ruby/1.8
GEM_HOME=/usr/local/lib/ruby/gems/1.8
GEM_PATH=/usr/local/lib/ruby/gems/1.8

You can find the GEM_HOME/GEM_PATH with this command:

gem environment gemdir

On my machine, this returns:

/usr/local/lib/ruby/gems/1.8
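
The same information is available from inside Ruby, which is handy when you want to compare what cron sees with what Terminal sees. A minimal sketch using the RubyGems methods Gem.dir and Gem.path:

```ruby
require "rubygems"

# Gem.dir is the directory gems are installed into (GEM_HOME);
# Gem.path is the list of directories searched for gems (GEM_PATH).
puts "GEM_HOME would be: #{Gem.dir}"
puts "GEM_PATH would be: #{Gem.path.join(File::PATH_SEPARATOR)}"
```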

Next, you’ll need to make sure you can rsync to the machine behind the firewall.

I had already set up passwordless ssh authentication between my local machine and both the firewall machine and the target machine. ssh-agent handles private key requests, and I use keychain as a frontend to the ssh-agent.

In order to use my existing ssh-agent process, the cron process needs to know the following environment variables:

SSH_AGENT_PID=338
SSH_AUTH_SOCK=/tmp/ssh-EAw1mq6I1l/agent.337

You can find out the current values of these variables with this command:

env | grep '^SSH_A'

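However, these values change whenever ssh-agent is restarted (such as after a system reboot), so hard-coding them in the crontab won’t last. Instead, I load them from my keychain file just before running the script of interest. Three things to watch for in the crontab line: cron turns an unescaped % into a newline, so each % in the date format has to be written as \%; SHELL is set to /bin/sh, so the portable . is safer than source; and ~ expands to $HOME, which this crontab sets to /var/log, so spell out the full path if your keychain file lives in your real home directory. This entry runs every minute, Monday through Friday:

*      *      *       *       1-5     . ~/.keychain/crasch2-sh; $SVN_ROOT/Scripts/build/bin/loadpath.rb >> $CRASCH_ROOT/Logs/loadpath.`date +\%Y-\%m-\%d-\%H\%M` 2>&1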
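
Because a stale SSH_AUTH_SOCK is easy to miss, the script itself can check for a usable agent socket before it tries to rsync. This is a sketch only; the method name agent_socket_present? is hypothetical, and it checks that the socket file exists, not that the agent holds the right key.

```ruby
# Returns true when SSH_AUTH_SOCK is set and points at an existing socket.
def agent_socket_present?(env = ENV)
  sock = env["SSH_AUTH_SOCK"]
  !sock.nil? && !sock.empty? && File.socket?(sock)
end

unless agent_socket_present?
  warn "SSH_AUTH_SOCK is missing or stale; rsync over ssh will likely fail"
end
```

With MAILTO set in the crontab, the warning on standard error ends up in the mail cron sends, assuming the output is not redirected elsewhere.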

The script should now be able to rsync the file to the remote machine. Here’s a script I used to debug and test my crontab and ruby settings:

#!/usr/bin/env ruby

require 'date'

STARTTIME = Time.now
TODAY = STARTTIME.strftime("%Y-%m-%d")

# prints out ruby's load path
puts $LOAD_PATH

# prints out all the environment variables that Ruby knows about
ENV.to_hash.each {|key, value| puts "#{key}=#{value}"}

# Try loading a common RubyGem
puts "Trying to load getopt..."
require 'getopt/std'
puts "getopt/std loaded"

# Create a sample file (FileUtils.touch avoids shelling out)
require 'fileutils'
filename = "/tmp/test.#{TODAY}"
FileUtils.touch(filename)

# Copy the sample file to the /tmp directory on the target machine
`rsync -avv -e "ssh firewall.machine.com ssh" #{filename} crasch@targetmachine:/tmp >> /Crasch/Logs/test_rsync.#{TODAY} 2>&1`
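
Backticks discard the exit status unless you check $? afterwards, so a failed rsync can go unnoticed. The sketch below shows one way to run the same command with system and report failures. It assumes Ruby 1.9 or later (for the redirection options), and the helper names rsync_command and run_logged are made up for illustration.

```ruby
# Builds the rsync argument list. Passing an Array to system() skips the
# shell, so filenames with spaces need no extra quoting.
def rsync_command(file, destination, jump_host)
  ["rsync", "-avv", "-e", "ssh #{jump_host} ssh", file, destination]
end

# Runs the command, appending stdout and stderr to log_path.
# Returns true on success; warns with the exit status otherwise.
def run_logged(command, log_path)
  ok = system(*command, out: [log_path, "a"], err: [:child, :out])
  warn "#{command.first} failed with status #{$?.exitstatus}" unless ok
  ok
end

# Usage, with the host names from the script above:
#   cmd = rsync_command("/tmp/test.2008-07-16", "crasch@targetmachine:/tmp",
#                       "firewall.machine.com")
#   run_logged(cmd, "/Crasch/Logs/test_rsync.2008-07-16")
```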

Original: craschworks - comments

Comments:

From: mudita
Date: July 18th, 2008 09:07 pm (UTC)
On a slightly related note, do you do any stuff with Rails? Always looking for good Rails programmers around here at Zoom Strategies.

From: crasch
Date: July 22nd, 2008 11:47 pm (UTC)
I use Ruby mostly as a sysadmin tool -- haven't done much Rails programming yet.

From: reichart
Date: July 20th, 2008 08:46 am (UTC)
Easier to do this in REBOL (www.REBOL.com)

From: crasch
Date: July 22nd, 2008 11:48 pm (UTC)
Thanks for the pointer!