snim2.org

I am a Senior Lecturer in Computer Science at the University of Wolverhampton. My interests generally lie in the area of programming languages and tools, especially for the Internet of Things and other distributed systems.

« Personal blog »

Europython 2014 talk on message passing concurrency and Python


Tagged: concurrency, csp, dynamic languages, manycore, multicore, parallel processing, python

Research diaries and lab notes

The idea of keeping a diary fills me with dread. It conjures up distant memories of receiving leather-bound paper diaries from well-meaning relatives at Christmas and the crushing obligation to write something, anything, every single day, when actually nothing very interesting was going on. The obligation to do something every day is a sure-fire killer of motivation for me. So, as you can imagine, I have never been keen on keeping a regular diary of research notes and results. Not that I haven’t tried. I have a paper notebook that I use to keep track of discussions and obligations from meetings, and at various times I’ve tried to use it as a discipline for writing down ideas and notes from my research work. Somehow, though, it never stuck.

That is, it never stuck until I read this blog post by Mikhail Klassen on the writeLaTeX blog. Mikhail points out that having a digital diary has some compelling advantages. It allows you to keep track of intermediate results and ideas, links to software repositories and BibTeX citations. This means that next time you need to quickly put together a presentation or poster, or you are starting to write a paper, you can pull figures, citations and text directly from your diary. This is especially useful if a lot of your writing has equations and citations that are time-consuming to keep track of. So, keeping a diary means that a lot of the time-consuming tasks involved at the start of writing a paper or presentation just disappear – those costs are amortized with the costs of keeping the diary. This has enormous appeal to me. The time I get for research is not large, and anything I can do to make my work more efficient makes the process a lot less stressful.

So, having looked carefully at Mikhail’s template I was really impressed, but I wanted to tweak a few things. In particular, I changed the layout of the whole diary and based my version on the excellent tufte-latex class, which is inspired by the work of Edward Tufte. I also added a couple of new sections at the top of the diary – Projects and Collaborations and Someday / Maybe. Projects and Collaborations is there to help keep track of ongoing commitments, and as a reminder that those projects need to be regularly progressed or abandoned. Someday / Maybe is there to keep track of vague ideas that sound good but that you aren’t yet committed to acting on. I find it useful to have a list of these, as they can easily be forgotten, and many good ideas which aren’t quite ready for action can be used as student projects or re-purposed. Other ideas can sit around for a long time, but suddenly become useful when a new collaboration comes about, or you find some scientific result or new technology which makes a previously very difficult idea tractable.

Lastly, like Mikhail’s, my template and my own notes are on writeLaTeX, which is a cloud platform for writing LaTeX documents. writeLaTeX (and its cousins ShareLaTeX and Authorea) has some great features, like collaborative real-time document editing, auto-compilation so that you can see a current version of the PDF of your document as you type, a wealth of templates and a friendly near-WYSIWYG editor. writeLaTeX also has a limited sync-with-Dropbox feature for offline work. All of this makes diary entries really simple to write. I just have a writeLaTeX window open in my browser all day and I can write updates and upload new documents as I go along.

Oh, and because I have a pathological aversion to keeping a diary, I call mine “Lab Notes”. Much friendlier!

writeLaTeX.com

Example Lab Notes


Tagged: academic writing, advice, best practice, papers, productivity, research, undergraduate projects, writing

Automate, automate, automate

I’ve recently been working on a new Python project, which started off as a bit of an experiment at the recent PyPy London Sprint. Working on a brand new repository is always nice: a blank slate and a chance to write some really elegant code, without all the crud of a legacy project.

In this case, the infrastructure for the project is pretty involved. I was using the pytest unit testing framework and the rpython toolkit from pypy, both for the first time.

That led to an interesting situation. When I run the unit tests, I want to use the CPython interpreter. This means I can use all the standard library modules that I know well, and can test the basic algorithms I’m writing. When I want to “translate” my code into a binary executable, I use pypy and some of its rlib replacements for the Python standard library modules. When I get a runtime error in the translation, I need to know whether it is related to my use of the rlib libraries or whether my code is just plain wrong, and using CPython helps me to do that.

The problem is that I have to keep switching between different standard libraries and interpreters. Somewhere in my code there is a switch for this:   

DEBUG = True

In testing that switch should be True and in production it should be False, but changing that line manually is a real pain, so I need some scripts to catch when I’ve set the DEBUG flag to the wrong mode.
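
For a concrete picture, this is roughly the kind of thing that switch guards. It is a purely illustrative sketch, with invented module names rather than anything from the real project:

DEBUG = True  # True: test under CPython; False: translate with rpython

if DEBUG:
    import json                    # full standard library module, fine under CPython
else:
    import my_rlib_json as json    # hypothetical rlib-style replacement, used only for translation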

Test automation #1

Here’s my (slightly simplified) first go at automating a test script:

import subprocess

debug_file = ...
framework = 'pytest.py'
try:
    retcode = subprocess.check_output(['grep', 'DEBUG = False', debug_file])
    print 'Please turn ON the DEBUG switch in', debug_file, 'before testing.'
except subprocess.CalledProcessError:
    subprocess.call(('python', framework))

What does this do? First the script calls the UNIX utility grep to check whether the DEBUG flag has been left switched off (i.e. whether the file contains the line DEBUG = False):

retcode = subprocess.check_output(['grep', 'DEBUG = False', debug_file])

If grep finds a match, the flag is wrong for testing and the script prints a warning message:

print 'Please turn ON the DEBUG switch in', debug_file, 'before testing.'

which tells me I have to edit the code. If grep finds no match, it exits with a non-zero status, check_output raises a CalledProcessError, and the script runs the tests:

subprocess.call(('python', framework))

Nice, but I still have to edit the file if the flag is wrong.

Test automation #2

Nicer would be for the script to change the flag for me. Fortunately, this is easily done with the Python fileinput module. Here’s the second version of the full test script (slightly simplified):

import fileinput
import subprocess
import sys

debug_file = ...
debug_on = 'DEBUG = True'
debug_off = 'DEBUG = False'

def replace_all(filename, search_exp, replace_exp):
    """Replace all occurences of search_exp with replace_exp in filename.

    Code by Jason on:

http://stackoverflow.com/questions/39086/search-and-replace-a-line-in-a-file-in-python

    """
    for line in fileinput.input(filename, inplace=1, backup='.bak'):
        if search_exp in line:
            line = line.replace(search_exp, replace_exp)
        sys.stdout.write(line)

def main():
    """Check and correct debug switch. Run testing framework.
    """
    framework = 'pytest.py'
    opts = ''

    try:
        retcode = subprocess.check_output(['grep', debug_off, debug_file])
        print 'Turning ON the DEBUG switch in', debug_file, 'before testing...'
        replace_all(debug_file, debug_off, debug_on)
    except subprocess.CalledProcessError:
        pass
    finally:
        subprocess.call(('python', framework, opts))
    return

if __name__ == '__main__':
    main()

Test automation #3

So, now the flag is tested, set correctly if need be, and the tests are run. But I still have to run the test script! What a waste of typing. So, the next step is simply to call this script from a git pre-commit hook.
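
As a sketch of what that looks like (my own sketch, not from the original post, assuming the test script above is saved as run_tests.py in the repository root, an invented name): the hook is just an executable file at .git/hooks/pre-commit. Git aborts the commit if the hook exits with a non-zero status, so the test script would also need to pass on the exit status of the test run.

#!/usr/bin/env python
# .git/hooks/pre-commit -- must be made executable (chmod +x).
# Run the test script; a non-zero exit status aborts the commit.
import subprocess
import sys

sys.exit(subprocess.call(['python', 'run_tests.py']))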

Code for this post

The full history for this script can be found here and here.


Tagged: git, programming, pytest, python, software, unit testing, workflow

West Midlands Employment Data

At the Government Open Data Hack Day event organised by James Cattell and Gavin Broughton, Andy Pryke, Christophe Ladroue and I had a go at analysing employment statistics for the West Midlands. In particular we were looking for correlations between employment data and other factors, such as census data about age and gender. As with all data mining work, the most difficult and time-consuming job was cleaning the available data before it could be put to good use in an analysis. Christophe wrote a very clear account of the work he did using R to deal with nomis data. You can see a summary of our results in the video below.

… and if you want to explore the data yourself, it is publicly available here:

https://docs.google.com/spreadsheet/ccc?key=0AtT1QPEACWUldE9NTVduRGV


#efdhack2012 26th May 2012

This one’s a little different. Python West Midlands is hosting a hackday to kick off a new open source project for a very interesting little charity called Evidence for Development (EfD). EfD wants to help people make better decisions about aid projects – at local and national level – by putting real data about the real situation in the hands of the people making the decisions.

If you want to know whether your aid programme is making a difference to the right people then you need to model the economy of your target village or district, before and after. Makes sense; simple science, right? The problem is you can’t afford a bunch of western econometricians crawling all over the place (costs too much, takes too long) and anyway their cash-based economic models don’t work that well in a place where cash is only a small part of the economy (grow your own; harvest wild food; get paid in kind or cash or both for day labour; trade crops, labour or other goods; etc, etc). So EfD developed simple economic models that work in this environment, that can be learned and applied by locally trained people and that are built to run on laptops. No reliance on big foundations’ data centres.

Last year EfD, in partnership with Chancellor College of the University of Malawi and the University of Wolverhampton, developed a Python/MySQL app to model local economies that is already in use in several countries in Southern Africa.

This year the challenge is bigger – to build software that can model national and international economies. The model exists and works (it has a great track record of predicting famine effects from annual summary surveys of rural economies). But the only current implementations are proprietary, ill-supported and not extensible. Smells like open source spirit.

So for this hackday we’re going to have with us the two developers who led the IHM development last year (from Chancellor College in Zomba, Malawi) and the developers of the modelling methodologies from EfD (from Barnes and Surrey – exotic, eh?). We’ll have a pretty complete MySQL database schema to work on and we hope to finish the day with a simple demo scenario that downloads reference data about a geographical area (a livelihood zone), produces a spreadsheet template to capture information about that livelihood zone (what they grow there, what they eat, how they make a living), runs some local completeness reports, and uploads the captured data for merging (with other livelihood zone surveys) to allow analysis of a national survey.

I’m not a software developer, can I still contribute?

Yes! Absolutely. There are a number of jobs that can be done without writing any code. We would really appreciate the support of contributors who can build a web presence for these projects, write user and developer documentation, help spread the word, and take on any number of other jobs! If you’re keen to help out, there will definitely be a place for you.

When:

10:30 onwards, 26th May 2012. Please sign up here.

Where:

Thyme Software, Coventry University Technology Park, Puma Way, Coventry, CV1 2TT [map]


Tagged: charity, development, event, evidence_for_development, hackathon, hackday, malawi, mysql, python, pythonwm, software, uk

The great Christmas email experiment of 2011-12

This year I took pretty much all the holiday time I could over Christmas, probably for the first time ever. As an experiment, I let all the emails I received over this period accumulate in my Inbox, with the exception of things like posts to mailing lists, which get automatically filtered, labeled and skip the Inbox. Generally, I try to follow an Inbox Zero policy, which means my Inbox is usually empty and every email I get is either dealt with as soon as I read it or saved in a “Next Action” list to be dealt with later. That policy makes it much easier to carve out large blocks of time for more difficult tasks, like writing lectures, marking or programming, which all require uninterrupted concentration. I think this works pretty decently, and at least I haven’t had to declare email bankruptcy.

So, the point of this experiment was really to see whether my Inbox Zero policy is working as well as I thought and, in particular, whether the bulk of the email I deal with is sensible content that really requires attention.

Of course, the “experiment” as such is a little silly; after all, this is email from a vacation period and out of term time, so the results are heavily skewed. Usually I get a lot more email per day and a lot more relevant, sensible email that needs attention, and the aim is always to maximise the time spent on those emails and minimise the time spent on unnecessary ones.

Starting point

Anyway, enough caveats. My starting point was this:

Inbox: 316

Action list: 50

Before going on vacation I cleared out both the Inbox and the Action List of everything that could be dealt with then. So, the starting point here is all the email accumulated over a short vacation and all the items on my to-do list that couldn’t be finished before the holiday started.

The data

Yesterday I spent a happy (!!) afternoon going through each email and either responding to it, deleting it, reporting it as SPAM or filing it. In a Google Docs spreadsheet I wrote down the sender (anonymised unless the sender was a company), sender type and action for each email or group of emails from the same sender. I say “email”, but actually I mean “email thread”, so one email on my spreadsheet could well mean a thread of many emails from various senders. However, what I’m interested in here is really the aggregate data from the 300 emails, which you can see in this table:

Aggregated data from 300 emails
So, there are two things I’m interested in here:
  1. Where is the email from? Is it from people I need to communicate with, or from companies and others sending “news” and other updates that can be ignored or processed in a more convenient way, such as via an RSS reader? Obviously emails from colleagues (including external collaborators) and students are all important. Other senders vary considerably depending on the content of the email.
  2. How were the emails processed? Emails that were deleted or marked as SPAM are emails I don’t want to receive repeatedly, so are best unsubscribed from. Emails that needed real attention can be filtered to be marked as important if they aren’t already.

Where do emails come from?

330 emails broken down by sender type

So, thinking of this email as signal and noise, the signal here is email from students, colleagues, friends and open source projects. Of course, SOME of the other emails will be important too and will need some action, but this is a rough guide. The total number of “signal” emails, judged by their senders, worked out as 78 out of 316, or around 25%.

Now, 25% to my mind is astonishingly low. Given that most of the email that hits my account gets filtered out and never sees the Inbox in the first place, 25% is really not what I expected to see here. 

What happened to all those emails?

300 email conversations broken down by next action

The other way to look at signal vs noise is how the emails were processed. The signal in this case is the emails that were actioned immediately or saved for working on next week, which was 73 out of 316, or just over 23%. That’s very close to the previous SNR, because the sender of a message is a good predictor of its importance.

Again though, 23% is astonishingly low. The main culprit is web apps and social media apps that send frequent notifications, updates and other fluff. Often when you sign up to these things they subscribe you to all sorts of email alerts automatically, and then it takes effort on your part to change your settings and unsubscribe. A better way to deal with this, if you use GMail, is to use the Gmail plus trick (sign up as yourname+theservice@gmail.com and filter on the To: address), which allows you to filter out all these emails automatically.

A point about unsubscribing from mailing lists 

When you unsubscribe from an email alert you are informing the sender that you no longer wish to be contacted. The very LAST thing you then need is another email saying “Well done! You have unsubscribed”, which you then have to deal with separately. Seriously, this is a terrible way to treat potential customers. Very few of the email alerts I unsubscribed from did this, but those that did really annoyed me. TripIt, Klout, SAA, Costa, the Electoral Reform Society and UCU: consider yourselves mildly whinged at. Hurumph.

End point

Just for the record…

Inbox: 0

Action list: 89

Actioned immediately: 34

The take home…

This stuff is boring common sense. It’s motherhood and apple pie. You know it all already. So you’re doing this already, right?

  • Email is a huge sink of time.
  • Process email in batch mode, once or twice a day. Don’t let incoming emails dictate your work schedule.
  • Unsubscribe from everything you can at the first chance you get. Better still, don’t sign up in the first place.
  • If you use GMail, use the Gmail plus trick.
  • If you sign up to a lot of web apps and different services with logins and passwords, keep confirmation emails in a specific folder or label (I use web-signups) so you can keep track of which services you already have an account for.
  • Filter and label emails automatically whenever you can. Don’t let anything into your Inbox that doesn’t need to be there (looking at you, posts to mailing lists).
  • Learn the keyboard shortcuts on your favourite email client. Use them. Banish the mouse.
  • Deal with emails that can be dealt with immediately, immediately.
  • Keep a “next action” folder of emails that cannot be dealt with immediately. Don’t have them hanging around your Inbox making you feel guilty, nervous and demoralised.
  • Keep a sensible hierarchy of folders or labels to organise your email. Or use something like ActiveInbox.

Tagged: email, productivity

What errors does my Python module define and raise?

On StackOverflow someone asked a while ago whether you can find out what errors a module defines and throws. In Python, a function does not declare that it throws a particular error object, so you need to look inside the module to see what exceptions it defines, or what exceptions it raises. You can do this by reading the docs (RTFM!) but of course they may be out of date, or what have you, so an alternative is to use the Python API to do the looking for you.

Which errors does a module define?

To first find which exceptions a module defines, just write a simple script to go through each object in the module dictionary module.__dict__ and see if its name ends in the word Error or if it is a subclass of Exception:

import inspect

def listexns(mod):
    """Saved as: http://gist.github.com/402861
    """
    module = __import__(mod)
    exns = []
    for name in module.__dict__:
        obj = module.__dict__[name]
        # Collect names that look like errors, and classes that really
        # are subclasses of Exception.
        if (name.endswith('Error') or
                (inspect.isclass(obj) and issubclass(obj, Exception))):
            exns.append(name)
    for name in exns:
        print '%s.%s is an exception type' % (str(mod), name)
    return

If I run this on the shutil module from the standard library I get this:

$ python listexn.py shutil
Looking for exception types in module: shutil
shutil.Error is an exception type
shutil.WindowsError is an exception type
$

That tells you which errors are defined, but not which ones are thrown. Of course, if the module has errors with funny names, or ones that are not subclasses of Exception, then this code will miss them.

What errors are thrown by a module?

To find out what errors a module can throw, we need to walk over the abstract syntax tree generated when the Python interpreter parses the module, and look for every raise statement, then save a list of the names which are raised. The code for this is a little long, but pretty straightforward, so first I’ll state the output:

$ python listexn-raised.py /usr/lib/python2.6/shutil.py
Looking for exception types in: /usr/lib/python2.6/shutil.py
/usr/lib/python2.6/shutil.py:OSError is an exception type
/usr/lib/python2.6/shutil.py:Error is an exception type
$

So now we know that shutil.py defines the errors Error and WindowsError, and raises the exceptions OSError and Error. If we want to be a bit more complete, we could write another method to check every except clause, to also see which exceptions shutil handles.

Here’s the code to walk over the AST. It just uses the compiler.visitor interface to create a walker which implements the visitor pattern from the Gang of Four book:

import compiler
from compiler import visitor


class ExceptionFinder(visitor.ASTVisitor):
    """List all exceptions raised by a module.

    Saved as: http://gist.github.com/402869
    """

    def __init__(self, filename):
        visitor.ASTVisitor.__init__(self)
        self.filename = filename
        self.exns = set()
        return

    def __visitName(self, node):
        """Only called from within a raise statement.
        """
        self.exns.add(node.name)
        return

    def __visitCallFunc(self, node):
        """Only called from within a raise statement.
        """
        self.__visitName(node.node)
        return

    def visitRaise(self, node):
        """Visit a raise statement.

        Cheat the default dispatcher.
        """
        if isinstance(node.expr1, compiler.ast.Name):
            self.__visitName(node.expr1)
        elif isinstance(node.expr1, compiler.ast.CallFunc):
            self.__visitCallFunc(node.expr1)
        return
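
The class above is just the visitor. For completeness, here is a rough sketch of how it might be driven (this driver is my reconstruction rather than code from the original gist): compiler.parseFile builds the AST and compiler.visitor.walk sends the visitor over it.

def find_raised(filename):
    """Parse filename and report every exception name that is raised."""
    print 'Looking for exception types in:', filename
    tree = compiler.parseFile(filename)
    finder = ExceptionFinder(filename)
    visitor.walk(tree, finder)
    for name in finder.exns:
        print '%s:%s is an exception type' % (filename, name)


if __name__ == '__main__':
    import sys
    find_raised(sys.argv[1])

And, as a sketch of the “more complete” check mentioned above (again mine, and assuming a self.handled set is initialised in __init__): compiler.ast.TryExcept stores its except clauses as a handlers list of (expression, target, body) tuples, so an extra visitor method can record which exception names the module handles:

    def visitTryExcept(self, node):
        """Record exception names that appear in except clauses, then
        walk the children so raise statements inside the try block are
        still found.
        """
        for expr, target, body in node.handlers:
            if isinstance(expr, compiler.ast.Name):
                self.handled.add(expr.name)  # self.handled = set() would go in __init__
        self.default(node)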

Tagged: lint, python, static_analysis

First Pachube Hackathon #pachubehack

The start of April saw the first Pachube (pronounced “Patch Bay”) hackathon in London, Lancaster, New York, Eindhoven, Linz and Zurich. Pachube is doing some very interesting and innovative work, quite closely aligned to some of the Internet of Things research we are doing in Wolverhampton.

The London event was great fun, and the nice folks at 01zero-one kept us going through the night with plenty of coffee and pizza — geek fuel! It was good to meet up with some old friends and make some new ones. Our former student from Coventry, and first Nuffield Bursar, Tim Churchard is now a founder at Ignoto Consulting and doing some excellent work in embedded systems design. Great to see one of our graduates doing so well in the current economic climate. It was also interesting to meet Sam the Techie, who I’ve followed on twitter for a while now. Sam is the brains behind the now famous Sukey app, which keeps demonstrators in touch with one another in real time. Sukey is probably one of the few IoT apps that has really been used in earnest by large numbers of end-users and it’s a great example of putting online data to real use. The folks behind Nanode were also there, making good progress on their web-enabled Arduino board, which looks like it’s going to be a great boost to hobbyist hackers. You can sign up to purchase one of the first runs of the board on the London Hackspace wiki.

The hackathon itself produced some really interesting applications. Rainycat came up with a great wearable based on the Arduino Lilypad called Yr in Ma Face! which discourages people from invading your personal space. I’d imagine it would go crazy on the London Tube network.

There was a great RFID app called Display Case which coupled simple real-world avatars with a vertical projector and real-time information:

The Nanode guys made quite a bit of progress and the Arkessa team came up with a nice Arduino hack to produce “banana graphs” with a Lego NXT (warning: charts may not be to scale :P ):

and we had a wifi-enabled umbrella, much like the Senz Umbrella, although possibly more likely to electrocute its users!

My own offering was Marvin the Paranoid Android, which Usman very kindly described as “inspiring”. Marvin is an attempt to create an affective interface out of some ostensibly boring data — the temperature, remaining battery power, and so on from my laptop. Affective interfaces are ones designed to elicit emotions from their users, and to my mind, that’s an important part of making the Internet of Things useful, and more than just a technical concern. Marvin is a chatbot interface, in the style of Marvin the Paranoid Android of Douglas Adams fame. You can find the data that feeds Marvin here and the chat interface itself is hosted here. The AI still needs a lot of work to be really usable, but you can already have suitably patronising and miserable conversations with Marvin, like this one:

 

Puny human: hi marvin

Marvin: Life, loathe it or ignore it, you can’t like it.

Puny human: what is your status

Marvin: Current battery capacity is 57667mWh out of a capacity of 57667mWh that’s 100% Currently at 56.0C. When I get to 107.0 I blow up. Currently working at 118.0% load. Call that reasonable working conditions? I don’t.

 

Rain said to me during the day “if you don’t document things, you lose them”, and that’s something I’m slowly trying to learn. This time around I made a simple slide deck to remind myself how the Marvin system worked; hopefully I’ll slowly get better at documenting this sort of work, although the fact that this blog post has come out so long after the hackathon shows I’ve got a way to go yet!

http://www.slideshare.net/snim2/marvin-the-paranoid-laptop-by-his-owner-snim2

 

 


Tagged: affectivecomputing, affectiveinterface, arduino, chatbot, douglasadams, hackathon, hackday, hacking, internetofthings, iot, marvin, pachubehack, pervasivecomputing, robot, usability, wearablecomputing