crasch (crasch) wrote,

GEEK: Running a ruby script via crontab

So, I’m trying to set up a ruby script that screenscrapes another website, writes the results to a local file, then rsyncs the file to another machine behind a firewall. This post details the problems I ran into while trying to get cron to run the ruby script and rsync to the remote host.

First of all, to test that cron is running at all, you can add this line to your crontab:

* * * * * /bin/echo "foobar" >> /tmp/crontabtest

This should cause “foobar” to be appended to the file /tmp/crontabtest every minute.
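If you’d rather not eyeball the file, a small Ruby check can confirm it is still being updated. This is only a sketch: the helper name recently_touched? and the two-minute threshold are my own choices for illustration, not anything cron provides.

```ruby
# Returns true if the file at `path` exists and was modified within the
# last `max_age` seconds (measured against `now`).
def recently_touched?(path, max_age = 120, now = Time.now)
  File.exist?(path) && (now - File.mtime(path)) <= max_age
end

path = "/tmp/crontabtest"
if recently_touched?(path)
  puts "#{path} was updated recently; cron appears to be running this crontab"
else
  puts "#{path} is missing or stale; cron may not be running this crontab"
end
```

Remember to remove the test line from the crontab afterwards, or the file will keep growing.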

Next, the cron daemon doesn’t know anything about the environment variables set in your shell, so you have to set them in the crontab file. If you can manually run the script from within Terminal, but you get error messages when you run it from your crontab (or nothing happens at all), there’s probably a problem with the environment variables.
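One way to see exactly what differs is to have cron dump its environment to a file and compare that with your shell’s. The sketch below assumes you have temporarily added a crontab line such as `* * * * * /usr/bin/env > /tmp/cron_env`; the file name /tmp/cron_env and the helper names are hypothetical, and values containing newlines are not handled.

```ruby
# Parse `env` output (one KEY=VALUE per line) into a Hash.
def parse_env(text)
  text.each_line.with_object({}) do |line, vars|
    key, value = line.chomp.split("=", 2)
    vars[key] = value unless key.nil? || key.empty? || value.nil?
  end
end

# Return { name => [shell_value, cron_value] } for every variable that is
# missing from cron's environment or has a different value there.
def env_differences(shell_env, cron_env)
  shell_env.each_with_object({}) do |(key, value), diff|
    diff[key] = [value, cron_env[key]] if cron_env[key] != value
  end
end

cron_dump = "/tmp/cron_env" # hypothetical dump written by the cron job above
if File.exist?(cron_dump)
  differences = env_differences(ENV.to_hash, parse_env(File.read(cron_dump)))
  differences.each do |key, (shell_value, cron_value)|
    puts "#{key}: shell=#{shell_value.inspect} cron=#{cron_value.inspect}"
  end
end
```

Run it from Terminal; any variable printed with cron=nil is one that cron does not know about.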

Here’s what the beginning of my crontab looks like:

% crontab -e

SHELL=/bin/sh
PATH=/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin
HOME=/var/log
MAILTO=crasch@myaddress.com
CRASCH_ROOT=/Crasch
SVN_ROOT=/Users/crasch/Projects/trunk
RUBYOPT=rubygems

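Note that the order of the directories in the PATH is important to ruby. My best guess at the reason is that there are two ruby installations on this machine, and whichever one appears first in the PATH is the one that gets run; if the system ruby in /usr/bin is picked up but ends up loading libraries from /usr/local/lib/ruby, compiled extensions such as thread.so won’t match. If I put /usr/local/bin at the end, like this: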

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin

I get this error message:

/usr/local/lib/ruby/1.8/thread.rb:5:in `require': No such file to load -- thread.so (LoadError)
    from /usr/local/lib/ruby/1.8/thread.rb:5
    from /usr/local/lib/ruby/site_ruby/1.8/rubygems.rb:85:in `require'
    from /usr/local/lib/ruby/site_ruby/1.8/rubygems.rb:85
    from /usr/local/lib/ruby/site_ruby/1.8/ubygems.rb:10:in `require'
    from /usr/local/lib/ruby/site_ruby/1.8/ubygems.rb:10

If I put /usr/local/bin at the beginning of my PATH:

PATH=/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin

…the script executes fine.
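To see which ruby a given PATH would pick, you can list every matching executable in search order. This is a rough sketch (the helper name executables_on_path is made up for illustration), and it assumes a reasonably recent Ruby:

```ruby
# Return every executable called `name` found in the directories of `path`,
# in the same order the shell would search them.
def executables_on_path(name, path = ENV["PATH"].to_s)
  path.split(File::PATH_SEPARATOR)
      .reject(&:empty?)
      .map { |dir| File.join(dir, name) }
      .select { |candidate| File.file?(candidate) && File.executable?(candidate) }
end

# The first hit is the one that runs; later ones are shadowed by it.
executables_on_path("ruby").each_with_index do |candidate, index|
  puts "#{candidate} #{index.zero? ? '(used)' : '(shadowed)'}"
end
```

Passing the PATH string from your crontab as the second argument shows what cron would run, rather than what your Terminal session runs.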

Setting the RUBYOPT environment variable is also important if you plan to use any Ruby gems. By setting RUBYOPT to the value rubygems, you tell Ruby to load RubyGems every time it starts up. (This applies to Ruby 1.8, which I’m using here; Ruby 1.9 and later load RubyGems automatically.)

If you don’t set it, you’ll likely see this error message:

/Users/crasch/Projects/trunk/Scripts/build/bin/loadpath.rb:24:in `require': no such file to load -- getopt/std (LoadError)
    from /Users/crasch/Projects/trunk/Scripts/build/bin/loadpath.rb:24

You may also need to set these variables:

RUBYLIB=/Crasch/lib/ruby:/usr/local/lib/ruby/site_ruby/1.8:/usr/local/lib/ruby/1.8
GEM_HOME=/usr/local/lib/ruby/gems/1.8
GEM_PATH=/usr/local/lib/ruby/gems/1.8

You can find the GEM_HOME/GEM_PATH with this command:

gem environment gemdir

On my machine, this returns:

/usr/local/lib/ruby/gems/1.8
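You can also ask RubyGems for the same information from inside Ruby, which is handy when the script is being run by cron and you want to log what it actually sees. A minimal sketch using Gem.dir and Gem.path:

```ruby
require 'rubygems'

# Gem.dir is the directory gems are installed into (what GEM_HOME points at);
# Gem.path is the list of directories searched for gems (GEM_PATH).
puts "GEM_HOME: #{Gem.dir}"
puts "GEM_PATH: #{Gem.path.join(File::PATH_SEPARATOR)}"
```

If the values printed under cron differ from the ones printed in Terminal, that points to a GEM_HOME/GEM_PATH (or PATH) mismatch.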

Next, you’ll need to make sure you can rsync to the machine behind the firewall.

I had already set up passwordless ssh authentication between my local machine and both the firewall machine and the target machine. ssh-agent handles private key requests, and I use keychain as a frontend to the ssh-agent.

In order to use my existing ssh-agent process, the cron process needs to know the following environment variables:

SSH_AGENT_PID=338
SSH_AUTH_SOCK=/tmp/ssh-EAw1mq6I1l/agent.337

You can find out the current values of these variables with this command:

env | grep '^SSH_A'

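However, these variables change whenever the ssh-agent is restarted (such as after a system reboot), so I don’t hard-code them in the crontab. Instead, I source them from my keychain file before executing the script of interest. Three details matter here: cron treats an unescaped % in a command as a newline, so each % in the date format has to be written as \%; since SHELL is /bin/sh, the portable . is safer than source; and because HOME is set to /var/log in the crontab above, ~ would expand to /var/log, so I spell out the full path to the keychain file (my home directory is /Users/crasch):

*      *      *       *       1-5     . /Users/crasch/.keychain/crasch2-sh; $SVN_ROOT/Scripts/build/bin/loadpath.rb >> $CRASCH_ROOT/Logs/loadpath.`date +\%Y-\%m-\%d-\%H\%M` 2>&1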
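If the rsync step fails with authentication errors, it’s worth checking that the keychain file still points at a live agent socket. The sketch below assumes the keychain file contains lines of the form `SSH_AUTH_SOCK=...; export SSH_AUTH_SOCK;`; that format and both helper names are assumptions on my part, so adjust the pattern if your file looks different.

```ruby
# Pull the SSH_AUTH_SOCK and SSH_AGENT_PID assignments out of
# keychain-style shell text and return them as a Hash of strings.
def parse_agent_vars(text)
  text.scan(/\b(SSH_AUTH_SOCK|SSH_AGENT_PID)=([^;\s]+)/).to_h
end

# True if the parsed variables name a socket file that currently exists.
def agent_socket_alive?(vars)
  sock = vars["SSH_AUTH_SOCK"]
  !sock.nil? && File.socket?(sock)
end

# Path taken from the crontab line in this post; change it for your machine.
keychain_file = File.join(ENV.fetch("HOME", ""), ".keychain", "crasch2-sh")
if File.exist?(keychain_file)
  vars = parse_agent_vars(File.read(keychain_file))
  status = agent_socket_alive?(vars) ? "is present" : "is missing or not a socket"
  puts "SSH_AUTH_SOCK=#{vars['SSH_AUTH_SOCK']} #{status}"
end
```

A missing socket usually means the agent was restarted and the keychain file is stale.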

The script should now be able to rsync the file to the remote machine. Here’s a script I used to debug and test my crontab and ruby settings:

#!/usr/bin/env ruby

require 'date'

STARTTIME = Time.now
TODAY = STARTTIME.strftime("%Y-%m-%d")

# prints out ruby's load path
puts $LOAD_PATH

# prints out all the environment variables that Ruby knows about
ENV.to_hash.each {|key, value| puts "#{key}=#{value}"}

# Try loading a common RubyGem
puts "Trying to load getopt..."
require 'getopt/std'
puts "getopt/std loaded"

# Create a sample file
filename = "/tmp/test.#{TODAY}"
`touch "#{filename}"`

# Copy the sample file to the /tmp directory on the target machine
`rsync -avv -e "ssh firewall.machine.com ssh" #{filename} crasch@targetmachine:/tmp >> /Crasch/Logs/test_rsync.#{TODAY} 2>&1`
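One limitation of the backtick calls in the script above is that a failed rsync goes unnoticed unless you read the log. A possible variation, sketched here with Ruby’s standard Open3 library, captures the output and the exit status together; the helper name run_logged is hypothetical, and the rsync call is left as a comment so the snippet doesn’t touch the network.

```ruby
require 'open3'
require 'rbconfig'

# Run a command (given as separate arguments, so no shell quoting is needed)
# and return [combined stdout and stderr, true if it exited with status 0].
def run_logged(*command)
  output, status = Open3.capture2e(*command)
  [output, status.success?]
end

# How the rsync step from the script might be called:
#   output, ok = run_logged("rsync", "-avv", "-e", "ssh firewall.machine.com ssh",
#                           filename, "crasch@targetmachine:/tmp")
#   warn "rsync failed:\n#{output}" unless ok

# Harmless demonstration using the current Ruby interpreter.
output, ok = run_logged(RbConfig.ruby, "-e", "puts 'hello from a child process'")
puts "ok=#{ok} output=#{output}"
```

Since cron mails anything written to standard output or standard error to MAILTO, a warning printed on failure would show up in your inbox.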


Original: craschworks - comments

Tags: cron, crontab, geek, programming, ruby