{ by david linsin }

April 13, 2009

Playing with Google App Engine

I'm jumping on the hype wagon and checking out Google App Engine (GAE) for Java. After writing a simple sample application using Servlets, Groovlets and JPA, I want to share the bumpy ride I had with you.

If you are interested in the sample application, you can either download it or check it out on GitHub.

Frankly, I only had a bumpy ride because I didn't follow Google's suggestion to use the Eclipse IDE plug-in together with the set of Ant tasks they provide. Instead, I wanted to use IntelliJ and Maven. Another of Google's suggestions I didn't follow is to use Java Data Objects (JDO) as the persistence API. Yes, that's right, Google suggests JDO as the preferred way to store data. If you are working on a somewhat up-to-date Java enterprise application, you are probably using the Java Persistence API (JPA) instead. The difference between the two standards, according to Wikipedia, is:

... JPA, however, is an Object-relational mapping (ORM) standard, while JDO is an Object-relational mapping standard and a transparent object persistence standard. JDO, from an API point of view, is agnostic to the technology of the underlying datastore, whereas JPA is being oriented totally around RDBMS datastores.

Google's underlying persistence layer is called Bigtable. It's designed to scale across distributed systems by being non-relational. That means you can dump your data into Bigtable without brooding over your schema first. As for why Google favors JDO over JPA, I think it's because you are not dealing with an RDBMS, but with Bigtable.

The first thing I did was write a simple hello world JSP and Servlet and package them as a WAR file. Thanks to IntelliJ's built-in enterprise support, I was up and running in no more than 10 minutes. The only difference from a regular enterprise application is the file appengine-web.xml, which contains your application identifier and version. The versioning scheme is quite interesting: you can configure which version users see and simultaneously deploy a new version for testing.

Google provides a couple of command-line tools that help you start a local version of App Engine and upload your application to the cloud. Once you are online, you can access your server logs through a dashboard. Unfortunately, I wasn't able to see the logs instantly after encountering errors. I had to wait a couple of seconds each time, which is pretty annoying and makes local testing much more important.

The first hurdle I had to overcome was integrating JPA. Google uses an implementation called DataNucleus, which relies on post-compile bytecode manipulation of your entities. I had the hardest time integrating the bytecode manipulation with my Maven build. Fortunately, the App Engine Google Group was very helpful and the problem was solved quickly. After a couple of hours, I was able to run my first JPA-enabled application without thinking about deployment or database issues. No problems with getting a server to run or twiddling with database settings - the stuff just runs.

As a developer, you probably don't want to be bothered with infrastructure, which sometimes looks like black magic. I really like the completely transparent way of developing for GAE, but I'm skeptical about whether I'd want to base my business on this kind of hosting. With Amazon's EC2 you still have the configuration of the system under control, and that's probably important if you want to run a business on it. In terms of pricing, I can't really say much. To me it looks almost the same, except that with GAE you have a quota on everything. There's even a daily quota on datastore API calls, which can't be increased even by paying for it. That's a bit of a bummer! However, I think they chose the quotas reasonably, and maybe they will be increased after the public release of GAE for Java.

In my next blog post, I will cover some more details on how to run Groovlets. I also did some simple load testing with Apache JMeter to get a feeling for how GAE scales.

4 comments:

unmaintainable said...

I think most developers underestimate the limitations those new persistence layers (BigTable, SimpleDB, CouchDB, etc.) impose on application design compared to the relational model we're all used to. They're not just trading consistency for availability, you are also quite limited when it comes to querying. For BigTable, Google itself uses its MapReduce clusters for complex queries, but as an external developer, you don't have access to that.

Everybody who considers locking himself into a proprietary hosting solution should check carefully if he needs the scalability Google or Amazon provide and if it's actually possible to cover all use cases with it. I, for example, decided that Django on AppEngine is not what I need.

david said...

I totally understand your reservations about proprietary solutions. There are trade-offs the same way as with every other decision you have to make in our business.

As for the limitations in querying, I have to agree that GQL is limited and that the implemented version of JPA is not at its best either. However, I guess GAE is not the solution you are looking for if you need to crunch big sets of data. For the web apps it is suitable for, doing some coarse-grained queries and reducing/associating your data in your application code (aka application join) is absolutely sufficient.
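
To make the term concrete, here is a minimal sketch of an application join in plain Java. The Author and Post classes and the joinPostsWithAuthors method are made up for illustration and are not part of any GAE API. In a real app, the two lists would come from two separate coarse-grained datastore queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ApplicationJoin {

    // Hypothetical entity: stands in for the result rows of a first query.
    static final class Author {
        final long id;
        final String name;

        Author(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical entity: stands in for the result rows of a second query.
    static final class Post {
        final long authorId;
        final String title;

        Post(long authorId, String title) {
            this.authorId = authorId;
            this.title = title;
        }
    }

    // Joins posts to authors in memory: index one side by its key,
    // then look up each row of the other side. Posts without a
    // matching author are skipped, like an inner join.
    static List<String> joinPostsWithAuthors(List<Author> authors, List<Post> posts) {
        Map<Long, Author> authorsById = new HashMap<>();
        for (Author author : authors) {
            authorsById.put(author.id, author);
        }

        List<String> joined = new ArrayList<>();
        for (Post post : posts) {
            Author author = authorsById.get(post.authorId);
            if (author != null) {
                joined.add(author.name + ": " + post.title);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Author> authors = new ArrayList<>();
        authors.add(new Author(1L, "alice"));
        authors.add(new Author(2L, "bob"));

        List<Post> posts = new ArrayList<>();
        posts.add(new Post(1L, "Playing with GAE"));
        posts.add(new Post(2L, "OSGi on GAE"));

        for (String line : joinPostsWithAuthors(authors, posts)) {
            System.out.println(line);
        }
    }
}
```

The trade-off is that both result sets have to fit in memory, which is why this only works for the small, coarse-grained queries a typical web app issues.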

Maybe you could write a blog post sometime about why you didn't choose GAE for your app.

Moritz Post said...

Nice writeup, david. How did those JMeter tests turn out?

I did some experimenting with OSGi on the GAE (using Equinox), but it is not that straightforward, as the framework wants to write to the local filesystem.

Chris has done a blog post on that: http://eclipsesource.com/blogs/2009/04/10/osgi-on-appengine/

david said...

Moritz > How did those JMeter tests turn out?

Looks pretty good! I didn't do any thorough analysis, but I think GAE scales just as expected. More about it in a couple of days...

