Tuesday, November 17, 2015

Discerning Meaning From a Sample Size of One

Way back when I was in college, Nate Jenkins made a comment on his blog about how each athlete, marathon runners specifically, is a study of one: someone who will respond slightly differently to the training than anyone else. As a master of science (and I do have my M.S.) it has always stuck out to me, because the scientific part of me thinks, 'what can you possibly learn based on one example?' Yet another part of me knows that to say you can't learn anything from a sample of one does not make sense, because that is how we have learned everything. At one point Thomas Edison, or probably one of his assistants, turned the light bulb on and it didn't burn out in a few seconds, or even a few minutes. It kept glowing for 13.5 hours. It was only a sample of one, but it was a breakthrough.

I could talk about Patrick Makau, Wilson Kipsang, and Dennis Kimetto, the last three marathon world record holders, who don't have any Olympic medals between them because they are all relatively up and coming. I could say that each one is a study of one with little to teach sedentary Americans about running, yet those three are also a huge study in teamwork, training, and Kenya having an environment for the marathon. However, Kenya is about to have a number of athletes exposed for doping over the next couple of months, so I won't talk about running tonight. I just wanted to say that, as a middle-class American, I highly doubt I can replicate the conditions for anyone in the U.S.A. to have the same background as each of them. They do all run lots of miles and run hard workouts, though, which is a good place to start training to be your best. No, I'm not going to write about running today, but rather engineering.

I have had four days of testing in the cold room at -40 degrees C/F this year: two in May and two just recently in November. While we were preparing for these tests, one of the comments was that "each test is different, you can't conclude X, Y and Z from one test." Now that I have been through four of these I see what he meant. Do you know what oil does at -40 degrees? Probably not, because no one really does. If we knew how to model it analytically, we would do that instead of testing in the cold room. Every test has presented some surprise. I won't divulge the specifics, but, like the title of the article asks, how can you say X works or does not work from a one-hour real-time test? It can work for ten minutes, then quit working.

Another example of small sample sizes I really like is the space program, specifically Mercury, Gemini and Apollo. Every mission was trying something that had never been done before, and if they got it to work, once, it was a huge success. In hindsight we only hear about the successes, and maybe Apollo 1, but space walking was a huge challenge on Gemini until Buzz Aldrin trained in a swimming pool. Apollo 11 had its computer throw an error as it became overloaded. Apollo 14 almost didn't land on the moon because of a short circuit in the lander that could have triggered an automatic abort. I'm getting off topic, but the point is that just because we landed successfully on the moon once doesn't mean that future landings were risk free at all.

My four tests always happen in pairs, a baseline test and a prototype test, so I have really only tested two new designs. In May we failed: the results were inconclusive due to a number of factors, so we went back to the drawing board, asked the experts, and came up with a new design. The new design "passed," but we encountered a couple of issues during testing, and while the results still seemed to show the new design was better than the old one, there is no one waving a flag and cheering, "This is the answer! You solved the problem!" That is partly because people don't do that often in engineering, but also because we all know that minor variations in one test could affect future production reliability.

If I were more versed in statistics I would enjoy talking about Weibull slopes and probabilities, but I am not an expert there. There is always the chance that this test was the best possible situation and you won't pass again. Typically situations in real life are more similar than they are different, which is very fortunate, but we engineers still spend much of our time working on the rare cases and trying to discern the two factors that contributed to failure in that particular instance.
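
For anyone who wants a rough number on that worry, here is a minimal sketch in Python. It does not use Weibull analysis or anything from the cold room tests. It is only the textbook zero-failure bound from the binomial distribution (sometimes called the success-run bound): if a design passes n independent tests with no failures, then with confidence C its reliability is at least (1 - C)^(1/n). The function names are made up for illustration, and the big assumption is that every test is an independent pass/fail trial under the same conditions, which real tests rarely are.

```python
import math


def reliability_lower_bound(passes: int, confidence: float) -> float:
    """Lower bound on reliability after `passes` tests with zero failures.

    Assumes independent pass/fail trials under identical conditions.
    Solves R**passes = 1 - confidence for R.
    """
    if passes < 1:
        raise ValueError("need at least one passing test")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return (1.0 - confidence) ** (1.0 / passes)


def passes_needed(reliability: float, confidence: float) -> int:
    """Zero-failure tests needed to claim `reliability` at `confidence`."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


if __name__ == "__main__":
    # What does a single passing test support at a few confidence levels?
    for c in (0.5, 0.9):
        bound = reliability_lower_bound(1, c)
        print(f"1 pass, {c:.0%} confidence: reliability >= {bound:.2f}")
    # How many clean passes would a 90% reliable / 90% confident claim take?
    print("passes needed for 90/90:", passes_needed(0.9, 0.9))
```

Under those assumptions, one pass at 90% confidence only supports a reliability of about 10%, and claiming 90% reliability at 90% confidence would take 22 passes in a row with no failures. That is the statistical version of "it worked that time."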

What can you conclude from a sample of one? It worked that time, and it may work again under the same conditions. That's a positive spin on it, but an accurate view too.
