{ by david linsin }

August 10, 2009

Cron Jobs on Google App Engine

I've been developing for the Google App Engine (GAE) for a couple of months now and there's a lot I want to talk about in future blog posts. This installment is about scheduled tasks on GAE, known as cron jobs, and a couple of pitfalls that you should be aware of.

The configuration of a cron job on GAE is pretty straightforward. You define your job in a file called cron.xml, which goes into the WEB-INF folder of your WAR file:
<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/cron/clean</url>
    <description>job to clean tmp every 5 minutes</description>
    <schedule>every 5 minutes</schedule>
  </cron>
</cronentries>

The file is easy to understand: url denotes where GAE sends a GET request when the cron job is triggered. What you place behind that URL is up to you; it can be a plain Servlet or some RESTful resource. The schedule tag contains an English-like syntax that defines when the URL is requested. In this example, GAE makes an HTTP GET request to http://yourname.appspot.com/cron/clean every 5 minutes.

One of GAE's subtle details is that your application is shut down as soon as no requests have come in for a certain period of time. That's not a problem by itself. However, if you want to access the cached data of your application from within your cron job, you are in trouble: the cron request may be the one that starts a fresh instance, so whatever the application had cached before the shutdown is gone. It's not a severe problem, you just need to be prepared for it.
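Here is a minimal sketch of what "being prepared" could look like, assuming the cache is a plain in-memory map. The class name RestartSafeCache, the get method and simulateRestart are made up for illustration and are not part of any GAE API. The idea is simply that the cron handler never assumes a warm cache and always supplies a way to reload the value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not a GAE class: an in-memory cache that
// tolerates being empty after the application has been restarted.
class RestartSafeCache {
    // Lives only as long as the running instance does.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Returns the cached value, or reloads it through `loader` on a miss.
    static String get(String key, Function<String, String> loader) {
        return CACHE.computeIfAbsent(key, loader);
    }

    // Stands in for an instance shutdown: everything held in memory is lost.
    static void simulateRestart() {
        CACHE.clear();
    }
}
```

A cron handler would then call something like RestartSafeCache.get("report", key -> loadFromDatastore(key)), where loadFromDatastore is again a placeholder for however your application rebuilds the data.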

Another pitfall you might encounter is configuring the URL of the cron job to be secured by SSL. You can define which URLs are supposed to be confidential in your web.xml. As soon as you add your cron's URL, you'll see an error message in your admin console under "Cron Jobs", which says "Too many continues". The reason is that GAE executes cron jobs using http instead of https, which leads to an HTTP 302 response. Browsers can easily interpret that as "don't use http, use https instead", but GAE can't.

This is not a security problem, since the URLs of your cron jobs should only be accessible by administrators anyway. You can simply exclude the cron's URL from your SSL configuration and everything should work fine.
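As a rough sketch, a web.xml fragment along these lines would restrict the cron URLs to administrators while leaving them out of the SSL configuration. It assumes that all cron URLs live under /cron/, as in the example above; the web-resource-name is an arbitrary label I picked. Note that the constraint deliberately has no user-data-constraint with a CONFIDENTIAL transport-guarantee.

```xml
<security-constraint>
  <web-resource-collection>
    <!-- arbitrary label, chosen for this example -->
    <web-resource-name>cron</web-resource-name>
    <url-pattern>/cron/*</url-pattern>
  </web-resource-collection>
  <!-- only administrators of the application may request these URLs -->
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
  <!-- no transport-guarantee here, so plain http requests are not redirected -->
</security-constraint>
```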

There are more little gotchas that I encountered while developing for GAE, and I'm going to blog about them soon.



