Meditations For Moses

Sunday, September 18, 2011

Jetset Lifestyle

Every now and again, I get that feeling of what it would be to look at my current life from the vantage point of my younger self. What would a 17 year old version of myself think of my current life? I had one of these moments when I arrived at the Venice airport yesterday morning.

I've been to the Marco Polo Airport a half dozen times over the past few years, usually to go to the International Centre for Theoretical Physics in Trieste. The airport feels very familiar and I'm quite comfortable with it. After landing, I walked straight out with my carry-on and computer bag and headed straight for the rental car station (noleggio auto). The woman manning the AutoEuropa/Dollar/Thrifty stall recognized me and proceeded to give me a better rate and said "you remember the way out to the cars?" To which the answer was "yes." I popped out and got my car and then began driving out without looking at a map. I knew all the turns and main cities, very much like the drive from SF to Palo Alto: knowing that the main city is Udine, where the toll booths are, knowing the exit off the A4 on to the SS14 and knowing the blind turn down to the Adriatico guesthouse.

It's just a strange life being familiar with dozens of cities around the world: New York, Tokyo, London, Florence, Chicago, Oslo, Geneva, LA, Istanbul, etc. There's no wonder or feeling of exoticness like there was a decade ago. It's hard to imagine before it happens.

I guess what is missing is the expected level of anxiety around traveling to some place half way around the world (okay only 9 timezones). After traveling for a decade, I'd gotten so familiar with having stress levels rise. The purpose of stress is to cause you to be more aware and vigilant -- experiencing the narrowing of vision, the heightened concentration and a general level of jumpiness. Instead that's all gone and life on the road is comfortable.

It's a bit sad, because that sense of newness is one of the reasons people travel. I don't think it's gone for good, but I think I need to travel to new places to experience it and that list is beginning to shrink. Europe is a very small place to me now. I still have a couple of countries to axe off the list: Portugal, Ukraine, Belarus, Lithuania, Latvia, Moldova, Iceland, Malta, Cyprus. But that's getting to be short now. Africa and South America are still mostly unexplored and India I've still yet to go to. I should go to one of these places in the aftermath of tenure.

Friday, September 16, 2011

Learning Something New

As you get a bit more mature in your profession, things begin to get easier. You've seen most things before in some guise, you've had practice at dealing with the issues and most ideas arise from synthesis with other ideas. You can get really confident about things and have your ego inflate. You look at younger people and think, "What's wrong with these people, it was so easy for me!" Of course that's baloney.

As a way to keep myself in check, every now and again I take up a challenge to learn something new. A while back it was cooking. More recently it was improving my writing. Recently it's been learning modern programming.

I know how to program alright. I'm pretty good at C, comfortable with Objective-C, can write C++ in a jam. Java and Python look pretty straightforward to pick up. So after reading around for a bit, I found that the big new thing is Scala. Scala's big thing is that it is a fully-enabled functional programming language -- meaning that functions are "first-class citizens" -- i.e. you can pass functions in every way you can pass a variable. In functional programming languages the whole structure of coding changes so that you want to avoid thinking in terms of states; instead you should think in terms of operations. This is a big change, because if you're familiar with object oriented programming, you'll know that the whole idea is built around having states that you're keeping good track of.

Scala is also an object oriented language, built on top of Java. Of course, I don't really know Java -- most importantly, all the standard libraries and classes. While this is considered a great feature of the language for most computer scientists, for me, it's a bit of a block trying to figure out what the Java-like object oriented syntax is doing at the same time as learning the truly bizarre looking functional programming.

I have to say that the functional aspects of the language are so bizarre and non-intuitive that I had to give up and go back to more elementary concepts. I didn't know this beforehand, but there are two different paradigms for programming -- the Turing-style of state-ful programming that we're all familiar with and the Church-style which is what functional programming descends from. There's a whole mathematical branch of computer science called Lambda Calculus and Combinators that is required to really understand functional programming. And so I'm back at basics staring blankly at definitions and trying to decipher what they mean.

I definitely see progress and it's great because it's opening up whole new ways of thinking. But it brings me back to where I started: you feel so comfortable with your knowledge because you're restricting yourself to a small playground. I feel like I'm 18 years old and in real analysis for the first time again and feel real stupid -- which is a good feeling, to know that there are major intellectual challenges out there.

Friday, August 26, 2011

Hate getting old... less time and gotta do things right

I've been working on a research project with two postdocs and a graduate student for almost a year now. Usually research projects take between 3 and 6 months, so this one is definitely dragging. We've had some definite breaks in progress along the way. These breaks usually occur when one of us is finishing another research article. So this long term project has been a major hassle. At two points along the way I became the critical path. Each time it slowed down the project by a month to a month and a half.

So all four of us have finally put this article on top of the list of things to do. Well, unfortunately, we've waited so long, that the world has changed out from under us. Some of the Monte Carlo calculations we have performed simply do not go far enough. We also left off a few cases due to an oversight. This is a big pain now because we have to go back and redo certain aspects of the project that had been done for 6 or 8 months. It's always a pain.

When I was younger, I was much more impatient and this sort of thing would have gotten me into a tail spin. I would have procrastinated and hemmed and hawed and would try to figure out a way of getting around doing the work. Now that I'm older, I find that it's better just to dive straight in and do it right. Don't try to dance around the issue. I usually find that the more you try to avoid doing the problem, the longer the ultimate task will take. These problems are a big deal because it will require a week of solid running on a computer cluster and essentially brings progress to a temporary halt again.

The only reason that it is not a complete disaster, a disaster big enough to cause us to potentially abandon a year's worth of research, is that when I was the critical path of the project, instead of hacking something together, I wrote code that was right. It is flexible enough to allow us to change our Monte Carlo data without any changes to the code.

Unfortunately, this put the critical path back on to my graduate student. I'm really impressed with the graduate student on the project. He's really grown throughout the years I've worked with him. He used to be impatient, but now when something comes up like this, he just dives straight into it to solve the problem. Ultimately, this will lead to the article being better and more useful.

But the upshot is that, letting a project stagnate causes many times more work. Doing something right is rarely glamorous at the time and it requires the will to push through the hurdles.

Thursday, August 25, 2011

Searching in the Dark

At the LHC we are searching for new laws of physics, new phenomena that we have never seen before. When speaking to people, I always get the feeling like they don't really grasp what I'm trying to do. The tried and true analogy is an explorer. But I think it's a bit more than that.

I'd like to take this analogy a bit further. Imagine that we are 15th century explorers setting foot in the New World for the first time. Now instead of finding a passage to the Orient, King Ferdinand and Queen Isabella want us to find extra-terrestrial species. In this alternate Universe, the Church has reason to believe that there are creatures descending from outer space and they want to discover them. Since these alien species haven't been discovered, they don't have a great idea what they look like.

The challenge is that there are a lot of different species of plants and animals that are from this world, but look rather different than what we've seen before. So the challenge is to not have too many false discoveries. The other challenge is that we don't know what alien species look like! We've never seen one before. Now as a conquistador tromping through the Amazon, you are overwhelmed with the possibilities, and you're searching through unfamiliar land for something that you can't quite identify. It's a real challenge to even define how you go about starting!

Now one way to go about searching for aliens is to listen to the friars at the monastery back in España who have their own opinions about what extra-terrestrial life looks like. The problem is that their opinions are driven by almost certainly wrong assumptions. Furthermore, every friar you ask has a different opinion. You'll end up with tomes of drawings and the chances that any one of their pictures of a space alien will match up with a real space alien hiding out in the Amazon are zero to nil.

Another way is to make a list of all normal species and then systematically add to it all the normal species that you can identify one-by-one, a long and arduous process. This will definitely discover an alien if they're out there, but it may take years to go about doing it. Furthermore, since you haven't specified what an alien could be, you might accidentally sort an alien into your dictionary and it might take decades to sort itself out. (Think of the Burgess Shale and the delay in discovering the Cambrian Explosion.)

Yet another way is to take the long list of aliens that your friars created and deconstruct them down to their bare anatomies -- find the essential commonalities between all the different models of aliens. You find the essential features that make something an extraterrestrial and look for those alone, rather than being distracted by superfluous specifications that the friars came up with. Maybe it's the green blood. Or maybe the 3 arms. This approach isn't perfect because you still have to rely on the friars, but you're looking at coarser features than the elaborate drawings that they come up with. This approach is also much faster because you don't have to understand everything before you can start looking for aliens -- after all, falling out of favor with the King during the Inquisition isn't completely ideal *wink-wink*.

So these three different approaches are being used at the LHC to search for new physics. The first approach is a model dependent search. A theorist (one of the sage, but insane friars) comes up with a model that should be searched for. The experimentalist (the conquistador) then has to go and look for that wonderful model, even though it has almost no chance of being precisely right. The next theorist comes along with a slightly different model and the experimentalist has to do a similar search.

The second approach is a model independent search. Here, you take in all the data and you just make sure that it agrees with the Standard Model of Particle Physics prediction. The problem here is that it is really difficult to know what the Standard Model predicts. You have the possibility of normalizing away signals, particularly if you aren't looking for anything in particular.

The final approach is one that I've been championing for several years. Here we strip out all of the details of models and get the essential aspects of the theories down to their cores. We call these stripped down theories Simplified Models. Each Simplified Model captures most of the details of a full theory, but captures many different theories at the same time. Here you know what you're looking for, but are doing so in a general enough way that you may discover new physics even if you haven't explicitly proposed the correct model, but were in the right ballpark.

Right now Simplified Models have gained a lot of traction at both experiments at the LHC. I'm about to put out a paper soon extending the set of Simplified Models that should be considered. I was also invited to a workshop on the Epistemology of LHC Physics in Aachen, Germany in January, 2012 to speak about what Simplified Models imply for the philosophy of physics.

Tuesday, August 23, 2011

More Regression to the Mean

I wrote about regression to the mean earlier this week:
http://jaywacker.blogspot.com/2011/08/regression-to-mean.html
This is what happens when you have a strange result, for instance flipping a coin 6 times and getting all heads: if the coin is fair, you'll find that eventually you'll get back to a 50:50 heads to tails ratio.

In particle physics, we have to read tea leaves a lot. It can take a long time for a result to regress to the mean; in some cases it can be a decade-long wait. So it's useful to understand how a result regresses back to the mean.

So let's consider trying to measure if a coin is fair or not. To do this, we need to have a mathematical model of an unfair coin. My unfair coin model is defined in terms of one parameter, r, that ranges from -1 to +1. It describes the probability that a coin will end up heads or tails. The probability for a coin to land heads is

pH = (1+r)/2

while the probability for tails is

pT = (1-r)/2.

If r=1, then the coin will always end up heads and if r=-1 it will always end up tails.
The binomial expansion is the easiest way to compute the probability of getting M heads in N flips. We just multiply out:

P(N) = ( pH Heads + pT Tails)^N

and grab the coefficient in front of the Heads^M Tails^(N-M) term which is:

P(M;N) = (N choose M) pH^M pT^(N-M)
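
Here's a minimal Python sketch of the formula above, in case it helps to play with the numbers. The function names `heads_probability` and `prob_m_heads` are just illustrative choices, not anything standard.

```python
from math import comb


def heads_probability(r: float) -> float:
    """pH = (1 + r) / 2 for an unfairness parameter r in [-1, 1]."""
    return (1 + r) / 2


def prob_m_heads(m: int, n: int, r: float) -> float:
    """P(M;N) = (N choose M) pH^M pT^(N-M)."""
    p_heads = heads_probability(r)
    p_tails = 1 - p_heads
    return comb(n, m) * p_heads**m * p_tails ** (n - m)


# Distribution of the number of heads in 6 flips of a fair coin (r = 0).
for m in range(7):
    print(m, prob_m_heads(m, 6, 0.0))
```

Setting r=0 recovers the fair coin, and the probabilities over all M add up to 1 for any allowed r.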

Now comes the fun part where we can do some science. If we get a result, say 6 Heads and 0 Tails, we can ask: if our coin had an unfairness parameter r, would I get this many heads (or more) more than 5% of the time? The range of r where this statement is true is known as the 95% confidence interval.

With r=0, we would find this result would only occur 1.5% of the time; therefore, we say that "at the 95% confidence level, we have excluded r=0." In fact, we'd say 98.5% confidence level, but 95% confidence level is a standard ruler.

Okay, so now we can ask: what is the minimum value for r that is consistent with 6 Heads and 0 Tails at the 95% confidence level? The answer is r=0.21. We can reverse the requirement and ask: what is the maximum r so that we would get 6 or fewer Heads 95% of the time? Well that is easy, r=1 is the maximum value, and r=1 is certainly consistent with getting 6 Heads in a row. So we can plot the 95% confidence interval below, 0.21 < r < 1.0:
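
As a rough check of the lower limit, here is a small Python sketch that scans for the smallest r at which "at least M heads in N flips" still happens 5% of the time. The bisection approach and the names `tail_probability` and `lower_limit` are my own assumptions for illustration; any root finder would do.

```python
from math import comb


def tail_probability(m: int, n: int, r: float) -> float:
    """Probability of at least m heads in n flips for unfairness parameter r."""
    p = (1 + r) / 2
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def lower_limit(m: int, n: int, alpha: float = 0.05, tol: float = 1e-9) -> float:
    """Smallest r for which at least m heads occurs with probability >= alpha."""
    lo, hi = -1.0, 1.0
    if tail_probability(m, n, lo) >= alpha:
        return lo  # even the most tails-biased coin is allowed
    # The tail probability grows with r, so bisection finds the crossing point.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tail_probability(m, n, mid) >= alpha:
            hi = mid
        else:
            lo = mid
    return hi


print(round(lower_limit(6, 6), 2))  # 0.21
```

For the all-heads case this is the same as solving pH^6 = 0.05 by hand, but the bisection also works when there are a few Tails in the mix.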

Of course, if it was a fair coin, 1.5% of the time, we would get 6 Heads and 0 Tails -- not that rare if we flip a lot of coins. So we'd want to keep on flipping and watch how the number of Heads and Tails evolve. So let's flip the coin another 6 times and then flip it another 12 times after that. So we'll double the data and then double it again. The following diagram illustrates the regression

So the green band corresponds to the 95% confidence level on r for the initial measurement, the blue bands correspond to the 95% confidence level for some selected outcomes of the next 6 flips (added together with the first 6) and the red bands correspond to some selected measurements of the following 12 flips (added together with the first 12 flips). The thickness of the arrows illustrates how probable it is that an r=0 coin (i.e. a fair coin) will take each path. The thicker the line, the more probable. The percentages are shown in the inset box. The red shows the most probable trajectory for an r=0.0 coin while the blue illustrates the trajectory for an r=0.62 coin.

The thing to notice is that when we double the data, we don't get back immediately to the mean, but that it slowly begins to back off the exceptional result. So what we see is that we had excluded r=0 at the 95% confidence level with the first measurement, but that the most probable outcome of the next measurement will be consistent with r=0.0 and further measurements make the most probable value for r smaller and smaller. Obviously it's possible to fluctuate up again, but it's not as common as downward fluctuations.

In contrast, let's look at an r=0.62 unfair coin.

What we see here is that the 6 Heads and 0 Tails was consistent with r=0.62; in fact it occurs 29% of the time. Now as we add more data, the most probable numbers of heads and tails become more and more inconsistent with r=0. The regression to the mean of r=0 is incredibly unlikely.

So what does this have to do with physics? Well, we are trying to discover r!=0 and so we look for exceptional results. We frequently get measurements that occur only 1.5% of the time. If you do 1000 measurements, you expect 15 such results by chance alone even if r=0 exactly. So what's important after an anomaly is seen is the next measurement. If you see a regression towards the mean of r=0, then the original anomaly was probably just a fluke. On the other hand, if the anomaly keeps on growing, then we're in business!

With the Higgs anomaly at EPS, we saw something akin to 6 Heads and 0 Tails. At Lepton Photon we saw that anomaly develop to 9 Heads and 3 Tails. Now that occurs 7.1% of the time in an r=0.62 coin, but occurs 21% of the time in r=0 coins. So we can release our breath and take a more wait-and-see approach. If on the other hand, that anomaly had developed to 12 Heads and 0 Tails (or even 10 Heads and 2 Tails), then we'd be in a different situation.

Monday, August 22, 2011

Regression to the mean...

Statistics is important in physics. The primary reason is that quantum mechanics says that you can not predict the outcome of any single experiment, but only the distribution of outcomes. This sounds very mysterious and bizarre, but it's not that unfamiliar. Let me describe a toy example of a quantum mechanical theory: a coin flip.

So you can understand everything about your coin, but if you flip the coin high enough, you won't be able to predict whether it will land heads or tails on any given flip. However, if you study the coin in enough detail, you can determine what fraction of the time it lands heads or tails. If it's a fair coin, then it will land 50% heads and 50% tails. I'll use a fancy name -- the fair coin hypothesis -- to describe the assumption that the coin is fair.

So let's say we want to test the fair coin hypothesis. How would we do it? Well you'd start flipping a coin and logging the results. Now if it's a fair coin, you can compute the probability that you'll get M heads in N flips of a coin:
P(M; N) = (N choose M) / 2^N
where (N choose M) is
N!/(M! (N-M)!).
If you work really hard you can show that for large N this formula approaches a normal distribution centered around M = N/2. Cool. (If you want to prove this yourself, use the Stirling approximation or see: Quora answer)
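
If you'd rather see it than prove it, here is a small Python sketch comparing the exact formula to a Gaussian. I'm assuming the Gaussian has mean N/2 and variance N/4 (the standard values for a fair coin); the function names are just for illustration.

```python
from math import comb, exp, pi, sqrt


def exact_probability(m: int, n: int) -> float:
    """Fair-coin probability of m heads in n flips: (N choose M) / 2^N."""
    return comb(n, m) / 2**n


def normal_approximation(m: int, n: int) -> float:
    """Gaussian density with mean N/2 and variance N/4, evaluated at m."""
    mean, variance = n / 2, n / 4
    return exp(-((m - mean) ** 2) / (2 * variance)) / sqrt(2 * pi * variance)


# Compare the two side by side for 100 flips.
n = 100
for m in (40, 45, 50, 55, 60):
    print(m, exact_probability(m, n), normal_approximation(m, n))
```

The two columns agree closely near the center for N=100; with only a handful of flips the agreement is much rougher, which is why the small examples below use the exact formula.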

So how do we use this to test a hypothesis? Well we can start asking, "What is the probability that we have a fair coin given our results?" This is a fairly straightforward question, but not quite precise enough. A better question is, "If we have a fair coin, what is the probability that we get a result as exceptional as the one we observed?" So let's take an example of 6 coin flips. The probability we get any given result is

6 Heads or 0 Heads: 1.56% each
5 Heads or 1 Head: 9.38% each
4 Heads or 2 Heads: 23.4% each
3 Heads: 31.3%

If we saw 5 Heads and 1 Tail, we would say that a fair coin would give a result this exceptional 21.9% of the time. That counts 5 Heads, 1 Head, 6 Heads and 0 Heads. So about 1 out of 5 times we would get a result this weird... which is pretty common. Now if we saw 6 Heads and 0 Tails we would say that a fair coin would give a result this exceptional 3.13% of the time (6 Heads or 0 Heads are as exceptional as this). This is getting pretty rare. We might start to get suspicious that the coin isn't fair. Of course we don't know that it's not fair; this type of result happens occasionally.
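
The "as exceptional as" counting can be sketched in a few lines of Python. I'm taking "as exceptional" to mean "at least as far from N/2 heads, in either direction", which is what the numbers above use; the function name is made up for this example.

```python
from math import comb


def as_exceptional_probability(m: int, n: int) -> float:
    """Fair-coin probability of a head count at least as far from n/2 as m."""
    distance = abs(m - n / 2)
    favourable = sum(
        comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance
    )
    return favourable / 2**n


print(as_exceptional_probability(5, 6))  # 0.21875
print(as_exceptional_probability(6, 6))  # 0.03125
```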

There is a theorem in statistics that goes under the name regression to the mean. What this expression says is that if you get an exceptional result -- say 6 Heads & 0 Tails -- and your hypothesis is right and you keep on performing the experiment, you'll find you'll get the average result eventually.

So if we flip the coin another 6 times we'll have the same probability as above, but we need to add the new results to our old results and we get

12 Heads: 0.024%
11 Heads: 0.29%
10 Heads: 1.6%
9 Heads: 5.4%
8 Heads: 12.1%
7 Heads: 19.3%
6 Heads: 22.6%

Now if we get 5 or 6 Heads in the second round, we'd start getting pretty suspicious. There is only a 0.6% chance of getting a result as exceptional as 11 or 12 Heads in 12 flips. However, if we got 4 Heads, we'd still have a fairly exceptional result of 10 Heads in 12 flips, but we'd get a result that exceptional 3.9% of the time, which means that we doubled our data and the exceptional result didn't get more exceptional. If we got 3 Heads in the second round we'd be up to a 14.6% chance of getting this result -- the result would be less exceptional. If we got 0, 1 or 2, we'd be back to the mean.
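
To see the whole second round at once, here is a Python sketch that starts from 6 Heads in the first 6 flips and lists how exceptional the combined 12-flip result is for each possible second-round outcome. As before, "exceptional" is assumed to mean at least as far from N/2 heads in either direction, and the names are illustrative only.

```python
from math import comb


def as_exceptional_probability(m: int, n: int) -> float:
    """Fair-coin probability of a head count at least as far from n/2 as m."""
    distance = abs(m - n / 2)
    favourable = sum(
        comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance
    )
    return favourable / 2**n


first_round_heads = 6  # the exceptional first batch: 6 Heads in 6 flips
for second_round_heads in range(6, -1, -1):
    total_heads = first_round_heads + second_round_heads
    p = as_exceptional_probability(total_heads, 12)
    print(f"{second_round_heads} more Heads -> {total_heads} of 12: {100 * p:.2f}%")
```

The percentage climbs steadily as the second round gets more ordinary, which is the regression to the mean in action.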

If the first result wasn't a fluke and the coin really wasn't fair, further results would begin to accumulate around 11 or 12 Heads.

So what does this have to do with the Higgs boson? Well at the EPS, the experiments at the LHC saw a 1 in 1000 fluke that could be interpreted as a Higgs boson. A 1 in 1000 fluke sounds weird, but we do a lot of measurements at the LHC, so flukes happen just by chance. So we usually set the bar for discovery very high: a fluke rarer than 1 in 1,000,000 (in terms of standard deviations, 5 sigma). The only way to get that is to take more data and see if the result becomes more exceptional.
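
For translating between "sigma" and "1 in so many", here is a short Python sketch. It assumes the usual convention of a one-sided tail of a Gaussian distribution, which is an assumption on my part about how the round numbers above are meant; the function name is illustrative.

```python
from math import erfc, sqrt


def one_sided_tail(n_sigma: float) -> float:
    """Probability that a Gaussian fluctuates upward by at least n_sigma."""
    return 0.5 * erfc(n_sigma / sqrt(2))


for n_sigma in (3, 4, 5):
    p = one_sided_tail(n_sigma)
    print(f"{n_sigma} sigma: about 1 in {1 / p:,.0f}")
```

With this convention a 1 in 1000 fluke sits close to 3 sigma, while 5 sigma comes out at a few parts in ten million.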

So today at the Lepton Photon conference in Mumbai, the experiments added in 60% more data. If the 1 in 1000 fluke was real, it should begin to become closer to a 1 in 10,000 fluke rather than staying a 1 in 1000 fluke or reducing to a 1 in 100 fluke.

We saw that the data stayed a 1 in 1000 fluke. This is kinda like the example of getting 6 Heads in a row and then in the second batch getting 3 Heads and 3 Tails, the most typical outcome for a fair coin. The combined 9 Heads and 3 Tails isn't that unusual, and it makes it seem like the result is regressing to the mean.

Friday, August 19, 2011

Releasing Into the Wild

It's always exciting to put out a paper for the world to see. In high energy physics we can do it very quickly through the arXiv (http://arxiv.org) which was created back in 1991 as a preprint server.

Journals are an intrinsically slow process. You submit a paper, an editor sends it out for review, then a referee writes a response, the authors make subtle changes (perhaps iterate the last two steps a few times), the paper is accepted, the journal typesets the article, the proofs are sent back to the authors, typos are found, the proofs are corrected and the article is put in for printing. If all goes smoothly, this is 4 months from when you submitted the paper to when it appears in print. In most circumstances it's closer to 6 or 8 months -- an eternity to communicate ideas.

To avoid this delay, physicists used to send around packets of preprints to different groups (by mail!). This was pretty effective: every week a packet of preprints would arrive and people would spend the day reading new articles. Of course this created a pretty terribly hierarchical system of information availability. Are you going to send off your preprints to the small universities? More importantly, to every institution in China, India and Russia?

Thus a preprint server was made shortly after the invention of the WWW. This allowed people to submit preprints of papers and let physicists around the world download them for free. A great democratization of information. These days at 5pm Pacific Time, papers are released to the world Sunday through Thursday. About a hundred papers a day are released in physics. In terms of priority, whoever gets to the arXiv first, wins.