Consulting for Entagen; HMMs and BioPython

Excited to be starting a new gig on Monday consulting for Entagen, a Boston-based biomedical consulting firm. Initially I’ll be on a three-person project for a major pharma. The project involves bioinformatics as well as web app development. (Entagen is a Grails shop – I’ve been a fan of Grails since using it at InnoCentive, and have wondered why it hasn’t gotten more traction compared to Ruby on Rails, given the power of the underlying JEE platform.) To provide a business vehicle for the consulting, I’ve created a “sole proprietorship” named “Twenty Geese Software.” The name “Twenty Geese” was inspired by this excerpt from Homer’s Odyssey, in which Odysseus’s wife Penelope has a mysterious dream featuring twenty geese. Happily, the domain name twentygeese.com was available. In the Odyssey, the metaphorical geese come to an unhappy end, but I won’t take that as a bad omen for the business :). “Shining tusk” was another possibility, but I like “twenty geese” better.

As a warmup exercise during some downtime, I’ve dug into Hidden Markov Models (HMMs) and the Viterbi algorithm. HMMs are a versatile tool used for diverse applications such as speech recognition and DNA sequence alignment, and Viterbi finds the most likely sequence of hidden states behind a series of observations. I’m doing a little work now on the HMM feature of the BioPython open source project, e.g., adding support for non-ergodic models (where some state transitions are not allowed) and for specifying initial state probabilities. HMMs are interesting and powerful, and I hope to learn more about them in the course of the Entagen work.
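To make the Viterbi idea concrete, here is a minimal pure-Python sketch. It is not BioPython’s API or implementation; the function name, the argument layout, and the toy healthy/fever model are all illustrative assumptions. It does take explicit initial state probabilities, and a missing transition is treated as probability zero, which is how a non-ergodic model would show up in this representation.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability).

    Hypothetical helper for illustration only. Probabilities are plain
    floats; missing transitions or emissions count as 0.0.
    """
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p.get(s, 0.0) * emit_p[s].get(observations[0], 0.0)
             for s in states}]
    back = []  # back[t-1][s]: predecessor of s on that best path

    for obs in observations[1:]:
        prev = best[-1]
        column, pointers = {}, {}
        for s in states:
            # Choose the predecessor that maximizes the path probability.
            prob, pred = max(
                ((prev[r] * trans_p[r].get(s, 0.0), r) for r in states),
                key=lambda pair: pair[0],
            )
            column[s] = prob * emit_p[s].get(obs, 0.0)
            pointers[s] = pred
        best.append(column)
        back.append(pointers)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (assumed numbers, not from any real data set).
STATES = ["Healthy", "Fever"]
START_P = {"Healthy": 0.6, "Fever": 0.4}
TRANS_P = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
EMIT_P = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

if __name__ == "__main__":
    path, prob = viterbi(["normal", "cold", "dizzy"],
                         STATES, START_P, TRANS_P, EMIT_P)
    print(path, prob)
```

For the three observations in the example, the most likely path is Healthy, Healthy, Fever. A real implementation would probably work in log space, since multiplying raw probabilities underflows on long sequences.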

IKEA “Fredrik” desk for standup computing: cheap but wobbly

Now that I’m working remotely, I need a real home office. At OPOWER, my previous employer, I switched to a standing desk and loved it. After lots of sitting at home, my back is now acting up again, so I decided to get my own standing desk. OPOWER splurged on adjustable standing desks from Dynamic Business Interiors. These desks are fabulous, but would cost me over $1,000 installed! Fuhgeddaboudit, I haven’t raised $50M of VC money (yet 🙂), so my budget is tighter. I went to IKEA and found a solution: the Fredrik computer workstation. Now that it’s all set up, I’m a bit disappointed. The desk is cheap – just $150! – reasonably attractive, and has plenty of room for both my 17″ MacBook Pro and a 24″ monitor, but there is one major defect: wobbliness. IKEA saved on materials by using only a single metal support on each side. As a result, unless I type very gently, the monitor wobbles noticeably. So I’d buy it again, but if my consulting business takes off I’ll consider replacing it with something more substantial.

Notes for prospective Fredrik buyers: you can set up the main shelf that will hold your keyboard at a good typing height – there are slots for the shelf about 4″ apart all the way up the supports. But having chosen your configuration, be prepared to live with it for a while; in order to reconfigure the shelf height, you would need to almost completely disassemble/reassemble the desk. Also be aware that you should assemble the shelves and supports from the bottom up. The directions show the main shelf being added first, but it’s really the bottom-most shelf that should go in first. Otherwise you’ll paint yourself into a corner, as I did, because it’s hard to insert cross-pieces in the middle when the structure is rigidly connected. (I managed to wiggle my way out of this problem with only partial disassembly of the bottom, but at the cost of some ear-ringing crashes when some metal pieces fell down during my first few attempts.) Good luck!

Cassandra: lazyboy

Lazyboy is a “wrapper for Cassandra’s Thrift client API, written in Python. It aims to make working with Cassandra datastores painless from Python.” Lazyboy provides higher-level record and view classes that are a lot more pleasant to use than working with Thrift directly, so this looks promising.

Started experimenting with the examples and ran into problems. Drew Schleck kindly provided fixes for some of them, and I was able to fix the rest and get it working. That required learning about git and github, an entertaining diversion. I didn’t dig deep enough to completely get git, but I’m impressed so far. If you’re an svn guy like me, git can be confusing at the outset; I recommend the Git – SVN Crash Course, which was very helpful.

Git – SVN Crash Course

Day 2 fiddling with Cassandra

Now nosetests fails:

ERROR: Failure: ImportError (No module named thrift.transport)

Found another blogger who has already gone down this road:

Installed Python module (thrift) not being picked up

See the post linked above for the full story; the short answer is that I need to add this line:

export PYTHONPATH=/usr/lib/python2.6/site-packages

to ~/.profile so that the Python search path is always up to date.
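A quick way to confirm the fix took effect is to ask Python itself whether it can find the module. The sketch below assumes a modern Python 3 interpreter (importlib.util is not available on the Python 2.6 setup described here), and is_importable is a hypothetical helper name, not part of nose or Thrift.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if Python can locate module_name on its search path."""
    try:
        # For a dotted name, find_spec imports the parent package first and
        # raises ModuleNotFoundError if that parent cannot be found.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    name = "thrift.transport"
    print(name, "importable:", is_importable(name))
    # Entries from PYTHONPATH show up near the front of sys.path.
    for entry in sys.path:
        print("  ", entry)
```

If the module is still reported as missing, the printed sys.path shows whether the directory from PYTHONPATH was picked up at all.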

Next problem: nosetests is failing with ERROR: system.test_server.TestMutations.test_bad_calls. Solution: kill off a stale Cassandra process that was interfering with the tests (thanks go to Jonathan Ellis and Michael Greene for quick help with that). An even better solution would be to have the test detect the stale process and exit with a clear warning; maybe I’ll explore that later.
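One way such a check could look, as a rough sketch only: before starting the tests, see whether something is already listening on the port the test server wants. This is not how Cassandra’s test suite works; port_in_use is a hypothetical helper, and the host and port (localhost, 9160 for the Thrift interface) are assumptions that should be matched to the local configuration.

```python
import socket
import sys


def port_in_use(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    HOST, PORT = "localhost", 9160  # assumed Thrift address; adjust as needed
    if port_in_use(HOST, PORT):
        sys.exit("Something is already listening on %s:%d; "
                 "is a stale Cassandra process running?" % (HOST, PORT))
    print("Port %d looks free, OK to start the tests." % PORT)
```

A port check cannot tell a stale Cassandra from any other program on that port, so the warning is worded as a question rather than a diagnosis.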

Cassandra: getting started as a developer

Cassandra is very cool: a “highly scalable, eventually consistent, distributed, structured key-value store” open-sourced by Facebook. I decided to try becoming a Cassandra contributor. This blog posting concerns some of the mundane details of getting the development environment set up on Ubuntu. Later posts should get into higher-level topics. The details are boring, but could be useful to someone else embarking on the same path, so I’m posting them here.

First step: get Cassandra up and running on my Ubuntu laptop via Getting Started. That was easy.

Next step: check out the Cassandra source code and follow the steps in HowToContribute. That went fine until step 3 under “Setting up and running system tests”:

from trunk/interface/, run thrift -gen py cassandra.thrift

Problem (following step 3 above): bash: thrift: command not found.

Solution: Cassandra uses the Thrift interface, so I need to download the Thrift source and build and install the thrift executable. The rest of this post is concerned with problems I encountered in trying to build thrift on Ubuntu 9.04.

After wget’ing the Thrift source code and expanding the tarball, I tried ./bootstrap.sh per the instructions linked to above. Error: can’t find autoscan. No worries, let’s do sudo apt-get install autoscan to get the autoscan package. Oops, there is no autoscan package for Ubuntu. Dug around and found that the package to get is automake, which pulls in autoconf, the essential package that actually provides autoscan, as a dependency. So install automake and move on.

Next missing dependency: libtool. So:

sudo apt-get install libtool

Now ./bootstrap.sh finally works, but ./configure still errors out. Need:

  • libboost1.35-dev – installed this via the Synaptic Package Manager GUI rather than apt-get, chose this version because it’s the supported one for this version of Ubuntu
  • install bison and flex packages to get yacc and lex, respectively

Then I was finally able to run the classic sequence make and make install to build the thrift executable and install it in /usr/local/bin.

Important: when you install packages to recover from a configure failure, always run configure again before moving on, to clean up debris.
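To avoid discovering the missing pieces one failure at a time, the command-line tools that tripped me up can be checked in one go before running ./bootstrap.sh. The sketch below is only a convenience check under some assumptions: missing_tools is a hypothetical helper, the tool list reflects what I hit on Ubuntu 9.04 (on other releases the executable names may differ), and it cannot detect the Boost development headers, which are not a command.

```python
import shutil

# Commands needed by bootstrap/configure in my case (assumed list).
REQUIRED_TOOLS = ["autoscan", "autoconf", "automake", "libtool", "bison", "flex"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All listed build tools found on PATH.")
```

The which parameter exists only so the lookup can be swapped out, for example in a test; normal use relies on shutil.which.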