[typo] Thinking about caching

Piers Cawley pdcawley at bofh.org.uk
Sat Apr 29 07:06:37 EDT 2006


I've been thinking about how to get the benefits of page caching back
without the pain of the same.

It occurred to me that, for a snappy response to 'real' readers, we
could probably get away with just caching the first page of /articles,
/articles/category/whatever, /articles/tag/whatever and so on, along
with their associated feeds and the full articles. It's my gut feeling
that the vast majority of hits are to the first page of an index, or
directly to an article[1]. In order for this to work, we'd have to
ensure that none of our links were ever generated in the form
'/articles/tag/whatever?some_arg=...', but that's merely tedious, not
necessarily a showstopper.

However, there are problems with this approach because we've taken
advantage of fragment caching to introduce goodies like time limited
sidebars and future publication times[2]. One approach to cache
sweeping, suggested by Tobi on IRC, is to have each request process
fire off a sleeper thread, set to wake up at next expiry time (or in
an hour say, whichever is sooner) and zap the cache. Because every
dispatch thread would be doing this, it doesn't really matter if one
of 'em gets killed before time. A double expiry's not really a
problem either: if the cache gets zapped twice, it'll still get
rebuilt.
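
A rough sketch of the sleeper idea in plain Ruby follows. The names
(CacheSleeper, MAX_WAIT, the sweep block) are made up for illustration
and aren't anything in typo; the block stands in for whatever actually
zaps the cache.

```ruby
# Hypothetical helper, not part of typo.
module CacheSleeper
  MAX_WAIT = 3600 # seconds; the "or in an hour, say" ceiling

  # Seconds to sleep: until next_expiry, capped at MAX_WAIT, never negative.
  # With no known expiry time, fall back to the ceiling.
  def self.delay_until(next_expiry, now = Time.now)
    return MAX_WAIT if next_expiry.nil?
    [[next_expiry - now, 0].max, MAX_WAIT].min
  end

  # Start a background thread that sleeps for the computed delay and then
  # calls the sweep block. Returns the Thread so callers can join it.
  def self.schedule(next_expiry, now = Time.now, &sweep)
    delay = delay_until(next_expiry, now)
    Thread.new do
      sleep(delay)
      sweep.call
    end
  end
end
```

Since the sweep is idempotent, nothing here tries to stop two threads
from sweeping at once.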

Another possibility is to stick a bit of javascript in the default
layout that fires off a request to some kind of uncached heartbeat
action, so a page may get served up without touching typo (so the
browser gets a fast response) but typo still gets hit 'at leisure' so
to speak and can do any posted cache sweeps/publication actions in a
postfilter. I like this idea rather less, because it doesn't work for
feeds, and it's the sort of suspicious javascript activity that gets
apps a bad name.

A third option is to have a separate process that gets fired from a
cron script or something and handles any cache sweeping required, but
I don't think that's going to fly for a lot of people using hosting
services. 

Whatever option gets chosen, I reckon there's a case for unifying the
handling of future events. Here's a sketch of a possible approach:

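  class Trigger < ActiveRecord::Base
    belongs_to :pending_item, :polymorphic => true

    # Run the callback before the row is deleted, so that destroy
    # still removes the trigger afterwards.
    before_destroy :run_pending_action

    class << self
      # Queue a callback on item, then fire anything already due.
      def post_action(due_at, item, method='came_due')
        create!(:due_at => due_at, :pending_item => item,
                :trigger_method => method)
        fire
      end

      # Destroy (and so run) every trigger whose due time has passed.
      def fire
        destroy_all ['due_at < ?', Time.now]
        true
      end
    end

    private

    # The column is trigger_method rather than method, because an
    # attribute called method would clash with Object#method.
    def run_pending_action
      pending_item.send(trigger_method)
      true # never let the callback's return value cancel the destroy
    end
  end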

This allows arbitrary ActiveRecord-based objects to post trigger
requests. The application could then declare an after_filter that calls
'Trigger.fire' at the end of every request. Here's an example of how
Article could take advantage of triggers:

  class Article
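    # Runs after the row exists, so the trigger can reference a saved
    # article rather than an unsaved one.
    def after_create
      Trigger.post_action(published_at || created_at, self, 'publish!')
    end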

    after_save :ping_on_publication

    def publish!
      unless published?
        self.published = true
        self.save!
      end
    end

    def published=(publication_state)
      if publication_state && unpublished?
        @just_published = true
      end
      self[:published] = publication_state
    end

    def ping_on_publication
      if @just_published
        send_notifications
      end
    end
  end

(Note that there's no need to do the cache sweeping logic here
because the cache sweeper already handles that.)
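
To see the trigger flow without a database, here's a plain Ruby
stand-in. MemoryTrigger and Post are hypothetical names for this
illustration only, and the explicit 'now' argument is an assumption
added so the timing can be controlled; the real thing would be the
ActiveRecord sketch above.

```ruby
# In-memory stand-in for the Trigger sketch; no ActiveRecord involved.
class MemoryTrigger
  @pending = []

  class << self
    attr_reader :pending

    # Queue a callback on item, then run anything already due.
    def post_action(due_at, item, method = 'came_due', now = Time.now)
      @pending << new(due_at, item, method)
      fire(now)
    end

    # Run and discard every trigger whose due time has passed.
    def fire(now = Time.now)
      due, @pending = @pending.partition { |t| t.due_at < now }
      due.each { |t| t.run }
      true
    end
  end

  attr_reader :due_at

  def initialize(due_at, item, method)
    @due_at = due_at
    @item = item
    @method = method
  end

  # Invoke the named callback on the pending item.
  def run
    @item.send(@method)
  end
end

# Minimal pending item with a publish! callback.
class Post
  attr_reader :published

  def initialize
    @published = false
  end

  def publish!
    @published = true
  end
end

now = Time.at(1_000)
post = Post.new
MemoryTrigger.post_action(now + 60, post, 'publish!', now)
post.published                 # false: the trigger isn't due yet
MemoryTrigger.fire(now + 61)
post.published                 # true: the trigger came due and was removed
```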

We'll have to do some fancy footwork to make sidebars work as
pending_items, but there's virtue in making it happen. For instance,
an aggregation sidebar could check to see if anything had changed in
its target feed and only trigger a cache flush if there was something
new.
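
As a sketch of what that might look like, here's a plain Ruby object
that responds to 'came_due' and only flushes when its feed body has
changed. AggregationSidebar, the fetch callable and the flush counter
are all assumptions for illustration, not typo's sidebar API; a real
version would expire the sidebar's cached fragment instead of counting.

```ruby
require 'digest/md5'

# Hypothetical sidebar used only to illustrate the idea.
class AggregationSidebar
  attr_reader :flushes

  # fetch is any callable that returns the feed body as a String.
  def initialize(fetch)
    @fetch = fetch
    @last_digest = nil
    @flushes = 0
  end

  # Called when a trigger comes due. Flushes only when the feed body
  # differs from the one seen last time; returns whether it flushed.
  def came_due
    digest = Digest::MD5.hexdigest(@fetch.call)
    return false if digest == @last_digest
    @last_digest = digest
    flush_cache
    true
  end

  private

  # Stand-in for expiring the sidebar's cached fragment.
  def flush_cache
    @flushes += 1
  end
end
```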

Note too that, with this interface, the cron option is easy -- the
commandline to handle everything is:

  /typo_installation/script/runner -e production 'Trigger.fire'

Which seems pretty cute to me.

Thoughts? What have I missed?


1. I would really appreciate it if anyone who can be bothered would go
   through their typo logs and quickly check for the relative
   frequency of non search engine crawler hits on any index pages
   after the first. I'm guessing that there's a serious power law
   curve in effect here.

2. Using the 'created_at' field, which should really, really, really
   be published_at or some such -- overloading 'created_at' in this
   way is simply confusing.

-- 
Piers Cawley <pdcawley at bofh.org.uk>
http://www.bofh.org.uk/

