Tuesday, August 16, 2011

No longer the critical path

Frequently research in theoretical physics is done in small collaborations of 2 to 5.  There are lots of different styles of collaboration.  One of these styles is to have everyone work simultaneously on every aspect  of a project.  This tends to work for more mathematical projects where there are steps that require mathematical inspiration.   In the age of the Large Hadron Collider, a lot of researchers are dealing with analyzing large data sets.  These projects seem to be more amenable to dividing up the labor and working on separate aspects.  This has the problem that if someone falls behind, they can be come the critical path -- their work must be completed before any further work is done.

The more senior you get, the more responsibilities you have and the more likely you are to become the critical path in a project if you sign up to do something substantial.  Most senior physicists seem to recuse themselves (for good reason) from taking roles where their other commitments may cause them to become the critical path.  I've done this on several projects.

On one of my current projects, I made the mistake of taking responsibility for a major technical aspect of the work.    It is actually a fun problem that can reduced to the following game illustrated in the figure below:
So there are a set of boxes you want to get that are numbered 1 to 5 in the figure above.  There are different methods of getting some of these boxes that are labeled A through D in the figure above.  Method A will get boxes 1, 2 and 5.  So the game is to find the minimum number of methods necessary to get all the boxes.  The example above requires 2 methods:  either A & B or C & D.    Not so bad.  At the very worse, you could try all the possibilities:

  • A, B, C, D  
  • A&B,  A&C,  A&D,  B&C, B&D, C&D 
  • A&B&C,  A&B&D,  A&C&D,  B&C&D, 
  • A&B&C&D  

15 possibilities = 2^4 -1.  You can see that this is the maximum number because you can choose to either use a method or not -- meaning that there are 2 choices for each method and the one where no methods are used is boring.

The problem I was trying to solve there were 6411 boxes and 8064 methods.  Well here, there 2^8064 -1 = 10^2400 possibilities. Even if we start going through the simplest searches by the time we get up to 15 or 20 necessary methods, it will take more time than the age of the Universe to complete even if we put all the computing power of the Earth on this problem.  

So I devised a simple genetic algorithms to solve this problem.  I had worked on a smaller version of this problem before, and I had hacked out a programming that did the job well.  But it was pretty kludgy and was requiring way too much work to run the code -- each one of those 6411 boxes and 8064 methods means something very important and trying to keep track of what box 2342 means or method 4721 entails was running into big problems.  I would have to spend hours reminding myself what all the conventions were and I was getting it wrong the first half dozen guesses I made.

This put a bit of a cramp into the project flow because I either had to devote 2 consecutive weeks of undivided concentration in order to not make a critical mistake.  Or I had to start over and rewrite the code so that the program kept track of all this information.   I was traveling extensively at the time and there was no way I could guarantee that type of dedicated concentration would arise in the near future (or ever!).  So that meant I had to throw away the kludge of the program (that worked damned well) and start over.

The right thing to do was to rewrite from scratch so that we all could trust the results.    This put me on an odyssey that caused me to take the 1100 line C program and transmute into a 3000 line Objective-C program.   If you know anything about the differences between languages, Objective-C allows for much more efficient programming!   The standard factor is 2.96 lines of Objective C is equal to 0.77 lines of C -- a factor of 4!  So this was a program that had 10 times the initial complexity.   However, the complexity was for the purpose of making the input and output human readable.   There was no chance that there would be a mistake.  I could ask questions such as "If I ignore all boxes of this type, how do the results change?" and do it easily and simply.  

Another bit of progress was that I was able to parallelize large chunks of the code so that instead of taking 20+ minutes to run, it would run in about 1 minute.
If you look at the red circle, it shows that the program is using 2382% of a CPU -- it is going 24 times as fast as as program running serially.    (This computer is a hyperthreaded, dual hex-core machine, so has 24 processors). 

So it all runs beautifully now and I'm no longer the critical path.

No comments:

Post a Comment