Tuesday, May 17, 2016

Accessing Google Cloud Storage with Go and AppEngine

I spend the last night figuring out how to access Google Cloud Storage (GCS) with Go. This short howto is explains how to create a GCS bucket (for free, no credit cards required) and how to access this GCS bucket from a Go program, an AppEngine instance and the AppEng dev server.

Creating a bucket

To store data, you need a bucket. It is possible to create one for free, using https://console.cloud.google.com/storage/ but it will ask to turn on billing.

Another option is to create a free AppEngine project which comes with a 5GB bucket for free and does not require any billing to be turned on. Whether you want to use AppEngine or not does not matter. Simply go to https://console.cloud.google.com/appengine and create a new project (the drop down button on the top right corner). When asked for a name type "myproject". AppEngine will come back to you with an app-id somewhat like "myproject-1234". Now you should be able to select "Settings" on the left side of the screen. There is a blue button that will finally create a new GCS bucket. The name of the bucket is "myproject-1234.appspot.com".

Now you can go to https://console.cloud.google.com/storage/ and browse the bucket. So far it is empty, but you can already upload and download files manually this way.

Installing the Client Libraries

You need to install the Go client libraries to work with GCS and appengine.

go get -u golang.org/x/oauth2
go get -u google.golang.org/cloud/storage
go get -u google.golang.org/appengine/...

Installing the SDK

Install the SDK from here: https://cloud.google.com/sdk/ and follow the installation instructions on the web page. The SDK is required if you want to work with GCS from your computer. It is not required to use GCS from AppEngine in a production environment. However, it is required to use GCS from the AppEngine dev server. Now initialize the SDK:
gcloud init
This will prompt you to grant access to your google account. You should select "myproject-1234" (or whatever the name of your app is) as the active project.

Accessing GCS on AppEngine

Since the bucket is linked to your AppEngine project, AppEngine knows the name of the bucket. You can programmatically retrieve it like this

import (
        "golang.org/x/net/context"
"google.golang.org/cloud/storage"
        "google.golang.org/appengine"
        "google.golang.org/appengine/file"
)

...
ctx := appengine.NewContext(r)
bucketname, err := file.DefaultBucketName(ctx)
client, err := storage.NewClient(ctx)

Now you can use the client to work on the bucket and its objects. See https://godoc.org/google.golang.org/cloud/storage for more information.

However, this will only work if you deploy the app to AppEngine. It will not work with the dev server. The reasons seems to be that the dev server has neither a GCS locally nor the right credentials to access the GCS

Accessing GCS on AppEngine Dev Server

AppEngine's dev server assigns your app a different default bucket. So we have to convince it to choose our already existing bucket. Start your application like this
dev_appserver.py --default_gcs_bucket_name "myproject-1234.appspot.com" myapp/
The important part is not to use goapp. Instead launch the dev server as shown above as this allows us to pass along the right bucket name.

Accessing GCS this way is slow, but it works for dev stuff somehow. Please note that this solution uses the online GCS. The dev server does not come with a GCS emulator (not yet).

Accessing GCS from a Go Application

You need to grant your program access to your GCS data. See https://developers.google.com/identity/protocols/application-default-credentials for more details. There are several options. The most convenient one for development is to use the following command:
gcloud beta auth application-default login
Now each process running within your account on your local machine will have the credentials to access your GCS data. Accessing the bucket is now as easy as:

client, err := s.NewClient(context.Background())
bucket := client.Bucket("myproject-1234.appspot.com")

Saturday, April 9, 2011

The case of functional programming, excel and the web

When I was a CS student, I thought functional programming (FP) was absolutely useless. Lisp was a nice way of solving problems that took a list of strings as input and produced another list as output. But when it came to GUI programming, operating systems, or low-level network hacking, FP was nowhere to be seen. It just did not fit. The crazy approaches in doing FP GUIs just demonstrated how poorly FP was suited to the task.

I used to smiled when reading an yet another article demonstrating that QuickSort can now be implemented even more elegantly in two lines of code or that it is now as fast as procedural QuickSort thanks to some crazy compiler optimizations. When I wanted QuickSort, I called the library function. Thank you.

Years later I ended up in the lecture hall of Microsoft Research in Cambridge, UK. I was facing a series of talks on FP by some of the brightest minds the field had to offer. I prepared to be bored to death, especially since I could not leave for political reasons. When finally Simon Peyton Jones entered the room he caught my attention. Not because he is the godfather of Haskell, but because he did not wear any shoes. Famous CS people need their own crazy habit. Let it be a long beard like Ken Thomson, ugly glasses like Bill Gates, or a dislike of shoes.

I remember only one statement from these talks. Simon said that Excel is perhaps the most successful functional programming language of all times. This is kind of true. An Excel sheet is a bunch of functions and Excel decides when and how to evaluate these functions. Being the author of the KOffice spreadsheet myself, I realized that I had even implemented a functional language without knowing it.

Ever since I changed my mind. FP is useful, you just have to find the proper use case. I am still convinced that a general purpose programming language is not the right use case and therefore I pretty much ignore LISP, Haskell and all of their friends.

Whenever I see a new problem I wonder whether it might be a fit for functional programming. Several years ago when using template languages for generating HTML on the server side I realized that these are FP as well. A template is just a function that takes some data as input and produces HTML as output. It avoids state and mutable data.

However, templates have a problem with todays real-time push web applications. Modern web applications render their HTML on the client-side. It seems that they have to. Using XMLHTTPRequest the client polls for new data and then some nasty DOM traversal code modifies the HTML to incorporate the new data. Writing applications like these is no fun at all. When the design of the web page needs a change, the DOM traversal code must be adapted. Luckily I don't have to earn my money this way.

Lately I had the idea of combining my two favorite FP languages (templates and Excel) to write modern web apps more easily. The nice thing about spreadsheets is that the interpreter is very good at recomputing the value of all the function upon changes of the input data, i.e. when you change one cell in Excel only the minimum number of formulas is recomputed because Excel knows about the functional dependencies between the cells.
The cool thing is that Excel can do this optimization automatically.

Let's apply this idea to web programming. In Excel the input are cells with data and the output are strings and numbers in other cells. In web programming the output is HTML and the input is some live data coming from a database. A template is a function that maps input to output. So far so trivial.

It becomes more interesting when the input data changes, e.g. someone appended a new record to the database, or an existing record has been updated. Recomputing the entire template and replacing the HTML in the browser would be insanely inefficient. However, by pulling off the Excel trick, we can make it more efficient. The template engine must know the functional dependencies between the database and certain pieces of HTML. When the data changes it can determine which parts of the template need to be updated.

Thus, the server is simply sending minimal HTML patches to the client, telling it about the minimal HTML modifications. All it needs on the client side is a general purpose library that applies these patches to the DOM. On the server side a new template engine is required that works more like Excel, i.e. it can compute the minimal changes to its previous HTML output as input data is being updated. Since this magic is realized by the template engine, programmers are entirely shielded from this ... just as in spreadsheets.