News, ideas and randomness

The Open Elm Project

Posted: October 24th, 2011 | Author: Andrew Gleave | Filed under: Django, Uncategorized, couchdb, mobile app, portfolio | No Comments »

This is a blog post which is well overdue.

In April we launched the Open Elm Project which, in collaboration with the Isle of Man Department of Environment, Food & Agriculture, enables the public to monitor and record the Isle of Man’s Elm tree population and report potential outbreaks of Dutch Elm Disease.

Unlike the UK, the Island has been largely unaffected by Dutch Elm Disease and has a population of ~200,000 trees. Unfortunately, the disease is on the rise and although the Isle of Man Government has done a sterling job of controlling and mitigating its impact, budget constraints have reduced the funds available for regular professional tree surveys, meaning little is known about how quickly or to where the disease is spreading.

Early in the year we approached the Government with a concept: give the public simple tools which they can use to help the fight against the disease and report suspected outbreaks. The idea comprised a website and two apps, for iPhone and Android devices, which can be used to find out information about Dutch Elm Disease and record sightings of diseased (or healthy) trees. To our delight, the Government jumped at the chance to participate in the project.

We built a site which enables people to get information about Elm trees and about Dutch Elm Disease itself, and learn how to spot the signs of the disease. We also built two mobile apps (another first for the IoM Government), which enable people to record sightings while they’re out in the countryside.

Using the apps, users can take a picture of the tree(s), choose whether it requires inspection and submit it for review by the DEFA team. The records are automatically geotagged by the phone’s GPS radio, so the team can see where the tree is to an accuracy of ~10m on the site’s Google Map – much better than a grid reference!

All reports are first reviewed by the DEFA team and are then made public on the site’s report map and in the mobile apps themselves.

From the off, we wanted this project to be the Isle of Man’s first Open Data project and we released all the source code, and have documented how to get direct access to the database – everything about the project is fully open and transparent.

The project was built entirely using Open Source technology (Django, CouchDB, PhoneGap and jQuery Mobile), and the source code is licensed under the GPL. We encourage others who think this type of project could be beneficial to their cause to use the code as they see fit.

The project announcement proved a hit: it was reported by the BBC and by numerous sources in the US, and we’ve had a great uptake for such a new project.

Since the disease is hard to spot during the winter months, we’ll be promoting the project with urgency next spring and hope to get a loyal band of contributors to help preserve these trees.


London Taxi Quotes and Bookings

Posted: August 9th, 2010 | Author: Scott Barnham | Filed under: Django, portfolio | No Comments »

If you’re looking for a taxi in London, check out Tick Tock Taxi, the new taxi booking site recently launched by our friends at mochii.

Instant Fare Quote

Enter the address you’re at and where you want to go and the site gives you an instant quote for a mini-cab. Booking is easy: just enter your name and phone number.


Behind the scenes, there’s a database of cab companies, the nearest of which is notified and will contact you within minutes. It’s a simple and painless way to find the cost and book your taxi. There’s a mobile version under development, too.

Fun Project

We did the web programming for the site using Django. It integrates with Google Maps for the address lookups (geocoding) and to find the distance by road for the fare calculation (using Google Maps directions). It uses an SMS gateway to send text messages to customers and taxi companies.

Tick Tock Taxi was conceived by mochii who also provided the design work. They called us in to do the web development side and we’re happy to be involved.

Do you have an innovative website, web-based or mobile app? Get in contact.


Don’t delete an image file when deleting a Django model instance

Posted: March 2nd, 2010 | Author: Scott Barnham | Filed under: Django | 1 Comment »

If you have a Django model with a FileField or ImageField, when you delete the model instance, the associated file or image is also deleted. In most cases this is desirable and keeps things tidy, but I had a situation recently where the image file should not be deleted when the model was deleted. Here’s a simple way to override the default behaviour.

Custom file storage

Django uses storage classes to determine how files are read and written. Normally, the data is just written as files to disk, but there are other possibilities such as storing on remote servers.

It’s easy to write a custom file storage class to override the behaviour of the default FileSystemStorage. In this case, we only need to change the delete method so it does not delete the file.

In custom.py

from django.core.files import storage

class NoDeleteFileStorage(storage.FileSystemStorage):
    def delete(self, name):
        # Deliberately do nothing so the file stays on disk.
        pass

We can then use the custom file storage by making an instance and passing it to the ImageField.

In models.py

from django.db import models

from custom import NoDeleteFileStorage

ndfs = NoDeleteFileStorage()

class ImageInstance(models.Model):
    image = models.ImageField(storage=ndfs, ...)

It’s as simple as that! Custom file storage has some interesting possibilities. With it you can handle how files are named or integrate with some caching or CDN.
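
As one illustration of controlling how files are named, here is a minimal sketch of a naming helper that puts a short hash of the file content into the name, so a changed file gets a new name (handy in front of a cache or CDN). The function content_hashed_name and its digest_chars parameter are hypothetical and not part of Django; a custom storage class could call something like it from a naming hook such as get_valid_name.

```python
import hashlib
import os


def content_hashed_name(name, content, digest_chars=12):
    """Return `name` with a short hash of `content` before the extension.

    Hypothetical helper for illustration; not part of Django.
    """
    root, ext = os.path.splitext(name)
    # Same bytes always give the same digest, so the name is stable
    # until the content changes.
    digest = hashlib.sha1(content).hexdigest()[:digest_chars]
    return '%s.%s%s' % (root, digest, ext)
```

For example, content_hashed_name('photos/cat.jpg', data) keeps the photos/ directory and the .jpg extension and inserts the hash between them.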


Requiring https for certain paths in Django

Posted: February 6th, 2010 | Author: Scott Barnham | Filed under: Django | 2 Comments »

A while ago I wrote about Securing Django with SSL. Here’s a small addition.

Some paths need https

If you’re using SSL it makes sense for certain parts of the site to require a secure connection. For example, the admin section.

Previously I shared the secure_required decorator which forces requests to use https for specific views. This works ok, but if you know an entire section of the site under a given path (e.g. /admin/) should be secure, it’s a hassle to have to add the decorator to each view.

You can require secure connections over https using webserver config or using Django itself.

Requiring https using Nginx

In your Nginx config file under the section for the unsecure http/port 80 server you can specify a location path and redirect all requests to it to https instead.

server {
    listen 10.10.10.10:80;
    server_name example.com;
...
    location /admin {
        # force admin to use https
        # $1 already starts with "/", so no extra slash after the host
        rewrite (.*) https://example.com$1 permanent;
    }
...
}

Apache and other web servers can have a similar configuration.

If you can configure it in the web server, that’s more efficient because the request can be redirected by the server, without having to contact your Django project. However, it should be fairly rare for requests to be redirected like this so it’s not a big performance issue and sometimes it’s easier to handle things in Django.

Requiring https using Django middleware

In Django it’s easy to write custom middleware which gets called before each request reaches a view.

Here’s a small piece of middleware which checks if the request is over http to a path we want to be secure and if so redirects to the same path but over https.

from django.http import HttpResponsePermanentRedirect
from django.conf import settings

class SecureRequiredMiddleware(object):
    def __init__(self):
        # No paths configured means nothing to enforce; HTTPS_SUPPORT defaults to True.
        self.paths = getattr(settings, 'SECURE_REQUIRED_PATHS', ())
        self.enabled = bool(self.paths) and getattr(settings, 'HTTPS_SUPPORT', True)

    def process_request(self, request):
        if self.enabled and not request.is_secure():
            for path in self.paths:
                if request.get_full_path().startswith(path):
                    request_url = request.build_absolute_uri(request.get_full_path())
                    secure_url = request_url.replace('http://', 'https://')
                    return HttpResponsePermanentRedirect(secure_url)
        return None

In settings.py

MIDDLEWARE_CLASSES = (
...
    'myproject.middleware.SecureRequiredMiddleware',
)

HTTPS_SUPPORT = True
SECURE_REQUIRED_PATHS = (
    '/admin/',
    '/accounts/',
    '/management/',
)

SECURE_REQUIRED_PATHS is a list or tuple of paths that should be secure. Any request to a path which starts with one of these will be required to use https.

HTTPS_SUPPORT is a custom setting to make it easier to use this on your dev server without SSL support. Set it to True in the settings for the live server and False in the settings for the dev server.
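
The decision the middleware makes can be sketched as a plain function, which makes the prefix matching easy to try out without a Django project. This is only an illustration: needs_https_redirect is a hypothetical name, and the arguments stand in for request.get_full_path(), request.is_secure() and the two settings.

```python
def needs_https_redirect(path, is_secure, secure_paths, https_support=True):
    """Return True when a plain-http request for `path` should be redirected."""
    if not https_support or is_secure:
        return False
    # str.startswith accepts a tuple and is True if any prefix matches.
    return path.startswith(tuple(secure_paths))


SECURE_REQUIRED_PATHS = ('/admin/', '/accounts/', '/management/')
```

Note that matching is by prefix, so with these settings a request for /admin (no trailing slash) is not treated as secure-only, while /admin/auth/user/ is.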

So there we go, an easy way to require secure https requests for certain parts of your Django site.


Securing Django with SSL

Posted: February 18th, 2009 | Author: Scott Barnham | Filed under: Django | 2 Comments »

When we built the centralized authentication system for Red Robot Studios we wanted all authentication and account resources to be available solely over https.

This article covers some tips and tricks we discovered while building the app, and how you can use Django to get fine-grained control as to which resources are available securely.

Why bother with security?

We all know that data sent over http is cleartext and can potentially be read on any network between the client and server. But the risk feels pretty minimal and many sites don’t bother using SSL to encrypt sensitive traffic. For online banking and ecommerce, you’d be crazy not to use it, but for other sites, why bother?

The chances of your http requests being snooped upon by an ISP, intermediate networks or your hosting company seem minimal. But one potentially big risk is users accessing your website on an open wireless network.

For example, perhaps your user has an unsecured wireless home or office network or maybe they use wireless networks in coffee shops and airports. It’s really easy in this situation for sensitive requests to be snooped upon.

The data on your website may not be sensitive, but if you use Django’s admin or authentication frameworks, two important bits of information are passed as cleartext.

When a user logs in, their username and password are posted in cleartext. Assuming login is successful, each subsequent request includes a cookie containing the sessionid. The sessionid is just a random string, but if you know the sessionid of a user, it is trivial to hijack the session and have the same access to the website as that user does until they log out.

Encrypting login sessions

If you want to be sure user credentials and sessions cannot be compromised by eavesdroppers, you need to use SSL encryption. Install an SSL certificate on the server so that traffic is encrypted end-to-end between client and server.

You probably don’t want the whole site to be secure because it will be a lot slower and significantly increase the load on your servers. Instead, you can be selective about which parts of the site should use https instead of http. If you want user sessions to be secure, you should make sure that logging in and all parts of the site that require a logged-in user use https.

SSL

Standard SSL certificates are pretty cheap these days – under $20 per year. We got some from RapidSSLOnline. Each secure site needs its own IP address, so if you’re hosting multiple sites using virtual hosting, you’ll need to look into getting some dedicated IPs.

There are lots of guides to installing SSL certificates and configuring web servers such as Apache, Lighttpd and Nginx, so I won’t cover that here.

Making Django sessions secure

Django uses cookies for its sessions. When a cookie is set, you can specify that it be a secure cookie, meaning it is only ever passed over https and not in http requests. We can tell Django to use secure cookies for sessions by adding a setting to settings.py

SESSION_COOKIE_SECURE = True

If you set Django to use secure cookies then try to log in over http you will get the error

“Looks like your browser isn’t configured to accept cookies. Please enable cookies, reload this page, and try again.”

This happens because Django sets the cookie, but it’s a secure cookie, so when the page loads over http, Django can’t see the cookie and so assumes cookies are disabled in your browser.
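
To see what the setting changes at the HTTP level, here is a small sketch that builds a session cookie header with Python’s standard http.cookies module. It is not Django’s own code, and session_cookie_header is a hypothetical helper; it only shows the Secure attribute that tells the browser to send the cookie back over https alone.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id, secure):
    """Build the value of a Set-Cookie header for a session cookie."""
    cookie = SimpleCookie()
    cookie['sessionid'] = session_id
    if secure:
        # Browsers only return a Secure cookie on https requests.
        cookie['sessionid']['secure'] = True
    return cookie['sessionid'].OutputString()
```

With secure=True the header value carries the Secure attribute, which is why a page loaded over http never sees the cookie come back.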

Requiring https for admin

To avoid this cookie warning and make sure you only ever pass your admin credentials over https, you can configure your web server so that any http requests are redirected to https.

For example, in Nginx it would look like:

server {
    server_name example.com;
    location /admin {
        # force admin to use https
        # $1 already starts with "/", so no extra slash after the host
        rewrite (.*) https://example.com$1 permanent;
    }
...
}

In Apache, something like:

<Location /admin>
    RewriteRule (.*) https://example.com/$1 [L,R=301]
    ...
</Location>

Of course, these bits of config should go in the http config, not the https config or you will cause infinite redirects!

Requiring https for certain views

If all the logged-in parts of your site are under certain paths (e.g. /accounts/ and /members/) you can configure your web server in the same way to require https for these locations.

If certain views require https (e.g. /members/bert/ is public but /members/bert/edit/ requires login), you may want to check request.is_secure() in those views. A neat way to do it is with a decorator which can also redirect any http requests to https.

from django.conf import settings
from django.http import HttpResponseRedirect

def secure_required(view_func):
    """Decorator makes sure URL is accessed over https."""
    def _wrapped_view_func(request, *args, **kwargs):
        if not request.is_secure():
            if getattr(settings, 'HTTPS_SUPPORT', True):
                request_url = request.build_absolute_uri(request.get_full_path())
                secure_url = request_url.replace('http://', 'https://')
                return HttpResponseRedirect(secure_url)
        return view_func(request, *args, **kwargs)
    return _wrapped_view_func

Then on your view:

@secure_required
@login_required
def edit_member(request, slug):
    ...
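
One detail worth knowing: swapping the scheme with str.replace rewrites every 'http://' in the URL, including one inside a querystring such as ?next=http://.... A possible alternative, sketched here with Python 3’s urllib.parse, changes only the scheme. The helper name to_https is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return `url` with its scheme changed to https, leaving the rest intact."""
    parts = urlsplit(url)
    if parts.scheme != 'http':
        # Already https (or something else entirely): leave it alone.
        return url
    return urlunsplit(parts._replace(scheme='https'))
```

In the decorator, the result of request.build_absolute_uri(...) could be passed through a helper like this instead of calling replace.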

Moving between http and https pages

It’s normal to use full path URLs like /accounts/login/ and /blog/. Bear in mind that if you are accessing the site over https and follow one of these links, you will also access them over https. If you want to be explicit, you need to specify the protocol and domain in the links, e.g. https://example.com/accounts/login/ and http://example.com/blog/.

For $20 and a bit of config, you can secure logged-in sessions on your site and protect yourself and your users from being compromised by eavesdroppers. There are still plenty of sites where this is overkill, but you can see now how easy it is to secure your Django site with SSL.


Versioned Media and Expires Headers in Django

Posted: December 18th, 2008 | Author: Andrew Gleave | Filed under: Django | 2 Comments »

We try to make our sites as responsive as possible, and as part of our testing, we realised that we should do the right thing and add Expires Headers to our static media. Our web servers are configured so when a client requests an image, stylesheet or JavaScript file, it is returned along with a far-future expires header. This tells the client not to ask for that file again, but to cache it for a month or more.

Encouraging Caching with Expires Headers

Without an expires header, the client will request media files each time it loads a page. Using if-modified-since and etag headers, the server usually doesn’t return the media files, but instead returns a 304 Not Modified response. Not resending the data is good. Not having to deal with the request at all is even better – that’s what expires headers offer.
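
For a concrete picture of what a far-future expiry looks like on the wire, here is a minimal sketch that computes the two relevant response headers. In practice the web server adds these for you; far_future_headers is a hypothetical helper and the timestamp argument is just there to make the output predictable.

```python
from email.utils import formatdate


def far_future_headers(days, now):
    """Return caching headers that expire `days` after `now` (a Unix timestamp)."""
    max_age = days * 24 * 60 * 60
    return {
        # HTTP dates are always expressed in GMT.
        'Expires': formatdate(now + max_age, usegmt=True),
        'Cache-Control': 'max-age=%d' % max_age,
    }
```

A client that receives these headers can reuse its cached copy until the date passes, without contacting the server at all.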

Of course, if you tell clients not to request your stylesheet again for a month, what happens when you change your stylesheet? The client won’t know and won’t get the changes. That’s pretty disastrous. What we need is a way of changing the URL when our media changes so that clients will pick up the new version.

Changing URLs when Content Changes

There are a number of ways to serve your media so you can specify far-future dates in the expires header, but still have the client pick up new versions. We refer to this as versioned media.

One common scheme is to put the modification date of the file in its URL. When the file date changes, the URL changes and clients request the new version. The URL might look something like /media/main.css?200812180930 or /media/main-2008-12-18-0930.css. The former is easier because the querystring is ignored by the web server and the file returned as normal.
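
The querystring variant of the date scheme can be sketched in a few lines. This is an illustration rather than the code we use: stamp_url and mtime_versioned_url are hypothetical names, and formatting the timestamp in UTC is an assumption made so the result does not depend on the server’s timezone.

```python
import os
from datetime import datetime, timezone


def stamp_url(url, mtime):
    """Append a modification time (UTC, to the minute) as a querystring."""
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime('%Y%m%d%H%M')
    return '%s?%s' % (url, stamp)


def mtime_versioned_url(url, file_path):
    """Version `url` using the modification time of the file on disk."""
    return stamp_url(url, os.path.getmtime(file_path))
```

A file last modified at 09:30 UTC on 18 December 2008 would be served as /media/main.css?200812180930, and the URL changes as soon as the file does.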

Using the date is good if you want more granular per-file versioning, but it seems a little messy. We decided to use a version number in the URL instead, e.g. /media/v123/main.css. To make this work we need to put a version number in the templates and have our web server ignore the version number and just serve the file.

Versioned Media Context Processor

Typically, Django-based sites use the MEDIA_URL context processor to include external resources such as JavaScript and images in their templates. We expanded on this idea by having VERSIONED_MEDIA_URL which puts the version number in as well.

Remembering to update a version number would be error prone, so we wanted to transparently support any updates to media. We use Subversion for version control, and figured out that we could use the versioning metadata of our media directory to help us generate a unique path, which would change as the media was updated. That’s exactly what we needed, and would allow us to specify expires headers on all paths which include a version number but still ensure that users would receive new copies of files if any changed.

from django.utils.version import get_svn_revision
from django.conf import settings

VERSIONED_MEDIA_URL = None

def get_versioned_media_url():
    if hasattr(settings, 'MEDIA_VERSION') and settings.MEDIA_VERSION is not None:
        version = 'v%s' % settings.MEDIA_VERSION
    else:
        revision = get_svn_revision(settings.MEDIA_ROOT)
        version = revision.replace('SVN-', 'v')
    return u'%s%s/' % (settings.MEDIA_URL, version)

def versioned_media(request):
    """Adds versioned media url to the context."""
    global VERSIONED_MEDIA_URL
    if not VERSIONED_MEDIA_URL:
        VERSIONED_MEDIA_URL = get_versioned_media_url()
    return {'VERSIONED_MEDIA_URL': VERSIONED_MEDIA_URL}

You can see from the code that we’ve added a MEDIA_VERSION setting which is either manually set, or can be set by a deployment script. We make use of Django’s get_svn_revision function to pick up the version number from our MEDIA_ROOT, then append it to our MEDIA_URL and add the result to the context as VERSIONED_MEDIA_URL.

It’s convenient to update code on your server with a simple svn up, but serving from a working copy may have security issues. Instead, we have a deployment script which updates a working copy then copies the files (excluding .svn directories) to the directories used by the web server. It finds the revision number and writes that to the settings file so our version number is still updated automatically.

Configuring Expires Headers in Nginx

Now that we have a version-specific URL for all our media, we need to configure the webserver to add an Expires Header to any requests which are destined for a versioned URL. We use Nginx, but the theory applies to any webserver.

location /versioned-media {
    internal;
    expires 90d;
    alias   /srv/www/live/thebarbershop/site/media;
}

location /media {
    rewrite /v(?:\d+)/(.*) /versioned-media/$1;
    rewrite /vunknown/(.*) /media/$1;
    root   /srv/www/live/thebarbershop/site;
}

We configure our /media URL so that any request which matches the version string created by our context processor is forwarded to the /versioned-media path, which then applies the expires header and sets the expiry date to 90 days in the future. Any request path without a version number simply gets served without the expires header.
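
To check which URLs end up where, the two rewrite rules can be mimicked in Python. This is only a sketch of the matching logic, assuming Python’s re behaves like Nginx’s regex engine for these simple patterns; rewrite_media_uri is a hypothetical name. The vunknown case covers the version string produced when no Subversion revision can be found.

```python
import re

VERSIONED = re.compile(r'/v(?:\d+)/(.*)')
UNKNOWN = re.compile(r'/vunknown/(.*)')


def rewrite_media_uri(uri):
    """Mimic the two rewrite rules in the /media location, in order."""
    match = VERSIONED.search(uri)
    if match:
        # Numeric version: hand over to the internal location with expires headers.
        uri = '/versioned-media/' + match.group(1)
    match = UNKNOWN.search(uri)
    if match:
        # No revision available: fall back to the plain path without expires headers.
        uri = '/media/' + match.group(1)
    return uri
```

So /media/v123/main.css is served via /versioned-media/main.css, while /media/vunknown/main.css and /media/main.css are both served from /media without the far-future header.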

One drawback: committing a change means that all versioned media URLs change, not just the one for the file that changed. However, we feel this is only a small drawback given the advantages this gives for the common case of a high-traffic site with relatively infrequent changes to the base media.

When you couple adding Expires Headers with other techniques, you can dramatically reduce both the number and size of requests to your application, and give users a more responsive experience.


Handling Subdomains in Django

Posted: December 12th, 2008 | Author: Scott Barnham | Filed under: Django | 1 Comment »

This is the first part of our series on some of the more interesting tech we’ve developed for Red Robot Studios. We’re working from webserver up, so we thought that subdomains would be a good place to start.

Subdomains are useful when you want to host multiple sites with the same code and different data. For example, providing websites for clubs where each club has its own subdomain. In Django, you could have a Club model and some associated models holding data. When a user visits alpha.example.com, you want to show the data from the Club model instance for alpha.

DNS Subdomains and Wildcards

You could add individual subdomains using DNS CNAMEs or A records, but if you want to generate objects in your Django app on the fly, have a look at wildcards. To match any subdomain you add “*” as a subdomain. It looks something like:

*.example.com 14400 IN A 208.77.188.166

This will match www.example.com as well as alpha.example.com, beta.example.com, etc.

Webserver Wildcards

The webserver config needs a similar setting so it knows to respond to any subdomain.

In Nginx, it looks something like:

server {
    listen       208.77.188.166:80;
    server_name  example.com *.example.com;
...

In Apache:

<VirtualHost 208.77.188.166>
    ServerName example.com
    ServerAlias *.example.com
...

If you want to use SSL certificates on your subdomains, you need to get a “wildcard subdomain certificate”. They cost more than a regular certificate, but are necessary to provide a valid certificate for any subdomain on your site. We got ours from RapidSSLOnline.

This should be enough config for your Django app to receive requests for any subdomain. Now your Django app needs to respond appropriately for each subdomain.

Getting the Subdomain in Django

There are a few different ways to do it, but we went with a piece of middleware that gets the subdomain from the request and retrieves a matching model.

We have added two additional settings: DOMAIN_MIDDLEWARE_MODEL and DOMAIN_MIDDLEWARE_INSTANCE_NAME to our settings.py so we can specify the model which the middleware queries, and the name which the instance is given when added to our request instance.

DOMAIN_MIDDLEWARE_MODEL = 'core.Club'
DOMAIN_MIDDLEWARE_INSTANCE_NAME = 'club'

The model is assumed to have a field named “slug”, which the middleware uses to match the subdomain against an instance of the model.

Right, so let’s create our middleware:

import re

from django.conf import settings
from django.contrib.sites.models import Site
from django.core.exceptions import ImproperlyConfigured
from django.db.models import get_model
from django.shortcuts import get_object_or_404

class DomainMiddleware(object):
    """Gets the correct instance of an application-specific model by matching the
    sub-domain of the request."""

    def __init__(self):
        self.site_domain = Site.objects.get_current().domain
        if self.site_domain.startswith('www.'):
            self.site_domain = self.site_domain[4:]
        self.SUBDOMAIN_RE = re.compile(r'^(?:www\.)?(?P<slug>[\w-]+)\.%s' % re.escape(self.site_domain))
        try:
            # Split "app.Model" into exactly two parts.
            app_name, model_name = settings.DOMAIN_MIDDLEWARE_MODEL.split('.', 1)
            self.model = get_model(app_name, model_name)
            self.instance_name = settings.DOMAIN_MIDDLEWARE_INSTANCE_NAME
            assert self.instance_name
        except (AttributeError, AssertionError, ValueError):
            raise ImproperlyConfigured('DomainMiddleware requires DOMAIN_MIDDLEWARE_MODEL and DOMAIN_MIDDLEWARE_INSTANCE_NAME settings')

In our init method we do some basic setup like creating a regex which will match the subdomain slug, and loading in our model using django.db.models.get_model with the app.model args from DOMAIN_MIDDLEWARE_MODEL.

    def process_view(self, request, view_func, view_args, view_kwargs):
        """If domain is not main site, check for subdomain.

        Get the model from the subdomain slug.
        """
        port = request.META.get('SERVER_PORT')
        domain = request.META.get('HTTP_HOST', '').replace(':%s' % port, '')
        if domain.startswith('www.'):
            domain = domain[4:]
        if domain != self.site_domain:
            match = self.SUBDOMAIN_RE.match(domain)
            if match:
                slug = match.group('slug')
                instance = get_object_or_404(self.model, slug=slug)
                # Only set the attribute when a subdomain actually matched.
                setattr(request, self.instance_name, instance)
        return None

In process_view we grab the subdomain from the HTTP_HOST header of the request, and using get_object_or_404 we load the correct instance of the model and set it as an attribute on our request object with the name given in DOMAIN_MIDDLEWARE_INSTANCE_NAME.

When someone goes to alpha.example.com the middleware picks out alpha and gets the Club instance with slug=alpha and adds it to request to be used in views.
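
The host handling can be tried out on its own with a small framework-free sketch. It follows the same steps as the middleware but is not identical: subdomain_slug is a hypothetical function, the port is dropped by splitting on ':' rather than using SERVER_PORT, and the pattern is anchored at the end so that only hosts ending in the site domain match.

```python
import re


def subdomain_slug(host, site_domain):
    """Return the subdomain slug for `host`, or None for the main site."""
    domain = host.split(':')[0]  # drop any port
    if domain.startswith('www.'):
        domain = domain[4:]
    if site_domain.startswith('www.'):
        site_domain = site_domain[4:]
    if domain == site_domain:
        return None
    pattern = r'^(?P<slug>[\w-]+)\.%s$' % re.escape(site_domain)
    match = re.match(pattern, domain)
    return match.group('slug') if match else None
```

Given a site domain of example.com, alpha.example.com yields alpha, while example.com and www.example.com yield None.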

The middleware uses Site.objects.get_current() to get the base URL, so make sure you have Site set up properly or none of your subdomains will match.

The advantage of having the middleware load in the correct instance for you is that your views can simply use the club attribute to access all related data for this club.

from django.views.generic import list_detail

def index(request):
    members = request.club.members.approved().order_by('-creation_date')
    return list_detail.object_list(
        request                 = request,
        queryset                = members,
        template_name           = 'club/index.html',
        template_object_name    = 'member'
    )

This is extremely useful when you want to ensure that your data is correctly filtered: you don’t need to have each view filter based on the subdomain, which is pretty error-prone, and if you get it wrong a user’s data would end up on someone else’s page. This way is a lot simpler.