[obtained from Scientific American Blog, by Ed Yong]
Erez Lieberman Aiden is a talkative, witty fellow who will bend your ear on any number of intellectual topics. Just don’t ask him what he does. “This is actually the most difficult question that I run into on a regular basis,” he says. “I really don’t have anything for that.”
It is easy to understand why. Aiden is a scientist, yes, but while most of his peers stay within a specific field – say, neuroscience or genetics – Aiden crosses them with almost casual abandon. His research has taken him across molecular biology, linguistics, physics, engineering and mathematics. He was the man behind last year’s “culturomics” study, where he looked at the evolution of human culture through the lens of four per cent of all the books ever published. Before that, he solved the three-dimensional structure of the human genome, studied the mathematics of verbs, and invented an insole called the iShoe that can diagnose balance problems in elderly people. “I guess I just view myself as a scientist,” he says.
His approach stands in stark contrast to the standard scientific career: find an area of interest and become increasingly knowledgeable about it. Instead of branching out from a central speciality, Aiden is interested in ‘interdisciplinary’ problems that cross the boundaries of different disciplines. His approach is nomadic. He moves about, searching for ideas that will pique his curiosity, extend his horizons, and hopefully make a big impact. “I don’t view myself as a practitioner of a particular skill or method,” he tells me. “I’m constantly looking at what’s the most interesting problem that I could possibly work on. I really try to figure out what sort of scientist I need to be in order to solve the problem I’m interested in solving.”
It’s a philosophy that has paid dividends. At just 31 years of age, Aiden has a joint lab at MIT and Harvard. In 2010, he won the prestigious $30,000 Lemelson-MIT prize, awarded to people who show “exceptional innovation and a portfolio of inventiveness”. He has seven publications to his name, six of which appeared in the world’s top two journals – Nature and Science. His friend and colleague Jean-Baptiste Michel says, “He’s truly one of a kind. I just wonder about what discipline he will get a Nobel Prize in!”
When I meet Aiden at Harvard, he is dressed casually in a jersey, chinos and trainers. He talks quickly but eloquently, at once relaxed and in deep concentration. The door to his office, lettered with “Aiden Lab”, opens into a room that feels more like a lounge. In place of benches and stools, there is a comfortable sofa, armchairs, several computers and a big TV. Aside from a pile of snacks, the space is notably Spartan. There are no photos on the walls. Three rows of shelves are largely empty. The desks are unburdened. It is as if the room, like the man himself, is uncluttered by the past.
Rather than specialising in any one area, Aiden takes the opposite tack. He naturally gravitates to problems that he knows little about. “The reason is that most projects fail,” he says. “If the project you know a lot about fails, you haven’t gained anything. If a project you know relatively little about fails, you potentially have a bunch of new and better ideas.” And Aiden has a habit of using his failures as springboards for success.
In 2005, Aiden was fascinated by the way we make antibodies. Antibodies are all very similar, but their tips – the bits that recognise invaders – are extremely variable. These are created through a genetic pick-and-mix – genes from three groups, each with many different members, are united in one of 100 million different combinations. These vast permutations provide the variety we need to counter a legion of threats from bacteria, viruses, parasites, tumour cells and more. “The immune system constantly creates genes on the fly that are specific to the things that show up in the body. It’s amazing,” says Aiden. His goal was ambitious but simple: catalogue these genes and sequence the human immune system.
He failed. “The problem is that all the genes are very, very similar,” he says. Sequencing genes isn’t like reading text from start to finish. It’s more like looking at isolated fragments of sentences, and trying to join them into the original narrative. If the sentences all contain roughly the same words, that task becomes very difficult. “At a certain point, we just realised that the data just wasn’t good enough. That was a disaster – it was 85% of my time for 18 months. That was an epic failure.”
But it was not a wasted opportunity. In 2007, Aiden’s interest in antibodies took him to an immunology conference, where he accidentally went into the wrong talk. In that unplanned meander, Aiden found the inspiration that would lead him to solve the three-dimensional structure of the human genome.
The speaker, Amy L. Kenter, was discussing the physical distances between our genes. Each of our cells has the unenviable task of packing a two-metre-long stretch of DNA into a chamber about a million times smaller in diameter. They do it by folding DNA into complex shapes, a feat of origami that often turns distant genes into close neighbours. In the talk that he wandered into, Aiden learned that these distances were very hard to calculate. People would spend up to six months figuring out the distance between two sites. “It prompted a knee-jerk response,” he says. “I was completely convinced that what they were doing could be done better and faster.”
To speed up the process, Aiden invented a technique called Hi-C that simultaneously identifies neighbouring sites across the entire genome. First, he embalms the genome with formaldehyde. The chemical creates physical bridges between different pieces of DNA that lie next to one another, freezing the genome in all its twists and turns. Special enzymes shred the DNA, and the fragments are isolated, sequenced and mapped onto the reference copy of the human genome (watch Aiden illustrating the technique through the medium of dance). The result is a massive library of interacting DNA – a genetic social network. Aiden could then work out how the genome must have folded to accommodate these interactions.
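To make the idea of a “library of interacting DNA” concrete, here is a toy Python sketch of how pairs of mapped fragments might be tallied into a contact map. The function name, the bin size and the example coordinates are all invented for illustration, and the sketch assumes each sequenced pair has already been mapped to two positions on a single chromosome. The real Hi-C analysis is considerably more involved.

```python
def contact_map(pairs, chromosome_length, bin_size):
    """Count contacts between fixed-size bins of one chromosome.

    pairs: (position_a, position_b) tuples, one per cross-linked fragment pair.
    Every position is assumed to be smaller than chromosome_length.
    Returns a symmetric square matrix (list of lists) of contact counts.
    """
    n_bins = -(-chromosome_length // bin_size)  # ceiling division
    counts = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        counts[i][j] += 1
        if i != j:
            counts[j][i] += 1  # keep the matrix symmetric
    return counts


# Invented coordinates: positions near 120 and 880 are far apart along the
# sequence, but repeated contacts hint that they sit close together in 3-D.
toy_pairs = [(120, 880), (130, 870), (110, 150), (500, 520), (125, 890)]
for row in contact_map(toy_pairs, chromosome_length=1000, bin_size=250):
    print(row)
```

In this toy map, a high count far from the diagonal is the kind of signal that would mark two distant stretches of sequence as physical neighbours.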
He found something odd. Polymers – long chain molecules, such as DNA – tend to fold in predictable ways. They ought to form densely packed and knotted bundles called “equilibrium globules” – think a plate of cooked noodles, or headphones that have been left in a pocket for too long. But the Hi-C results weren’t compatible with this shape; they suggested that the genome was doing something different. At first, Aiden thought his technique had failed, so bizarre were the results. He started reading voraciously, absorbing everything he could find about polymer physics. And every source led to the same conclusion: his results seemed to violate established physical principles.
His breakthrough came in the dead of night. He discovered a paper by a physicist called Alexander Grosberg, who described a shape called a “fractal globule”. It too is a densely packed bundle, but unlike the equilibrium globule, it doesn’t have a single knot. The strands may loop and twist, but they never cross and tangle. Aiden likens it to uncooked noodles – you can pull out one strand without disrupting the rest.
The fractal globule was first described by an Italian mathematician called Giuseppe Peano in 1890, but it was completely theoretical. It took almost a century for Grosberg to suggest (in 1988) that a real polymer might fold into a fractal globule if the conditions were right. In 2009, Aiden proved him right. “I read [Grosberg’s paper] and I immediately thought: this solves it!” The fractal globule makes perfect sense as the shape of a genome. With no tangles, any stretch of DNA can be easily exposed so that its information can be transcribed and used. “That was one of the most exciting moments of my life intellectually,” says Aiden.
As far as anyone knew, the fractal globule was a hypothetical shape that only existed in Peano’s imagination. Aiden showed that it exists inside every human that has ever walked the earth. He tells me with a wry smile, “One is not reasonably entitled to expect that one’s data is going to happen to be consistent with some age-old dead hypothesis that ends up being much more beautiful than the dominant idea. That’s just pennies from heaven.”
All of this came from the failed project on antibodies. Aiden’s cutting room floor is filled with equally ambitious dead projects on the evolution of Chinese iconography, or network analyses of people suing each other. In most cases, they simply became too boring to continue but rare cases like the 3-D genome really took off. “The best types of problems are those that seem harder at first than when you think about it. If you have ten such projects and one of them works, you’re good because lots of people think it’s astronomically unlikely the project would have worked and they don’t know you’ve tried ten of them,” he says.
“The failures very naturally lead to new successes and opportunities. That’s why it’s great to get a couple of failures under your belt in a new area. The immunology project was the first big genomics project that I really sunk my teeth into and all of the tools that I picked up during that failure turned out to be very useful in the 3-D genome sequencing.”
In many ways, the 3-D genome project epitomises the themes that run through Aiden’s diverse oeuvre. He has a great belief in the power of technological advances. “Much of contemporary science is really the length and shadow of the technology we apply,” he says. By inventing the Hi-C technique, he could ask questions about the genome that simply weren’t possible to answer before. “I’m always on the lookout for new methods that I think will open up whole new domains.” In particular, he likes to accumulate large sets of data without any preconceptions. “For me, seeing is believing. I rarely have any hypotheses when I start looking at a dataset. I’m just trying to see what features jump out at me.”
Aiden’s mindset runs in the family. His son, Gabriel Galileo, is just one year old and shares his father’s high-flying streak. “He’s figuring out the fundamental things that challenged humans. Billions of years to work out how to balance on two feet and he’s like, ‘Well, that’s Thursday.’”
As a child himself, Aiden learned the value of being curious and well-rounded from his father – a technology entrepreneur called Aharon Lieberman. “I spent many days and even summer months working with him in his factory,” says Aiden. “The idea that one could support oneself by making ideas a reality is one my dad always emphasizes. He gave me a lot of self-confidence. This helps, because when you suddenly change the subject in your work, all you take with you are your brains and your confidence in your own ability to figure things out.”
As an undergraduate, he studied mathematics, physics, and philosophy at Princeton. “My reasoning was that I would be able to figure out the universe and make all subsequent life decisions from first principles,” he says, grinning. “It was the kind of thing that makes sense to you in high school. Oh yeah, everything is going to reduce to quantum mechanics and you can work it out… Anyway, that was a disastrous failure.” And once again, the quest to “debug this failure” led to something interesting.
“It turns out you can’t work everything out from first principles, because it seems like a lot of things have happened and I didn’t know anything about the universe before I was born in 1980,” he says dryly. “So I thought I have to go and understand that stuff.” To do that, he spent a year at New York’s Yeshiva University studying for a Masters in History. He took classes that worked backwards in time from the present day and others that read forwards from ancient history (he can now read Aramaic), stopping when the two streams met in the 17th century.
Eventually, Aiden returned to the sciences, securing a Masters in Applied Physics at Harvard and a PhD in Applied Math and Bioengineering at Harvard and MIT. But his foray into the humanities never left his side. His most ambitious project to date – culturomics – is very much a fusion of the so-called “two cultures”.
Once again, it began with a talk, this time by Steven Pinker. Pinker mentioned that while just three per cent of English verbs are irregular (such as ‘be’ or ‘do’), they are the most commonly used ones. All ten of the most commonly used verbs are irregular. For Aiden, who had long been thinking about studying culture in a mathematical way, this piece of trivia was irresistible. Together with Jean-Baptiste Michel, he charted the course of irregular verbs from the 9th-century Beowulf, to the 14th-century Canterbury Tales, to the 21st-century Harry Potter. They focused on 177 irregular verbs and found that they “regularise” with time, with rarer verbs falling in line more quickly. (Hear him talk about his project in this early video.)
More surprisingly, this road to conformity can be described by a very simple mathematical formula. Verbs regularise in a way that is “inversely proportional to the square root of their frequency”. If one is used a hundred times less frequently than another, it will become regular ten times as fast. If it’s used a million times less frequently, it will regularise a thousand times as fast. Based on how frequently a verb appears, you could predict when it will yield to regularity. ‘Read’ is unlikely to change to ‘readed’ anytime soon, but ‘burnt’ is rapidly being cast aside in favour of ‘burned’.
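As a rough illustration of that scaling, here is a minimal Python sketch. The function name and its parameter are made up for this example, and it assumes nothing beyond the inverse-square-root relationship described above. It is not the researchers’ own code.

```python
import math


def relative_regularisation_speed(frequency_ratio: float) -> float:
    """How many times faster a rarer verb regularises than a commoner one.

    frequency_ratio: how many times LESS frequent the rarer verb is
    (hypothetical parameter; 100 means "used a hundred times less often").
    Assumes the rate is inversely proportional to the square root of frequency.
    """
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return math.sqrt(frequency_ratio)


for ratio in (100, 1_000_000):
    speed = relative_regularisation_speed(ratio)
    print(f"{ratio:,} times rarer: regularises about {speed:,.0f} times as fast")
```

The two ratios in the loop are the ones quoted above, so the sketch reproduces the “ten times” and “a thousand times” figures.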
The result was fascinating, but scouring old books was an unenviable task. “The data collection took a year and a half. It was a huge pain and a Hail Mary because we never knew whether it would work,” says Aiden. “At the end of it, we said, we can never do this again.” Fortunately, they never had to. As the paper went to press, Aiden went back to his Middle English texts to check his facts and realised that, in the meantime, someone else had taken them out – Google.
In 2004, Google began digitising the world’s books, in an ambitious project that has since scanned over 15 million books from over 40 university libraries. This online corpus represents 12 per cent of all the books ever published, a massive electronic record of humanity’s culture. “On some level, this was phenomenally embarrassing,” says Aiden. “We realised our methods were so hopelessly obsolete. It was clear that you couldn’t compete with this juggernaut of digitisation.”
So instead of competing, Aiden and Michel decided to join them. Their pitch was simple: they would use the words in Google’s corpus to track the path of culture over time, just as palaeontologists use fossils to deduce the evolution of living things. Peter Norvig, Google’s Director of Research, was sold from the first meeting.
As the project’s merits became clear, the company’s commitment (and its funding) grew, but there were serious obstacles. “Midway through the project, Google gets sued by absolutely everybody,” says Aiden. “That’s not helping.” There were also problems with the data. In some cases, the scans weren’t clear enough and in others, the ‘metadata’, such as the dates of publication, were often inaccurate. This meant that words like “internet” would turn up well before such a thing was conceived.
It took a year to clean up the data and still there were imperfections. Eventually, Aiden and Michel restricted themselves to a third of the corpus – some 5 million books in six languages. They pulled out billions of individual words and phrases (“n-grams”) and tracked their frequency over time, compiling everything into a large data set that anyone can download and explore.
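For readers curious what “tracking n-gram frequency over time” might look like in practice, here is a small Python sketch. The helper names and the four-sentence “corpus” are invented for illustration, and the sketch assumes each book arrives as a publication year plus plain text. It is a toy, not the pipeline Aiden and Michel built.

```python
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """Split text on whitespace and return its n-grams as space-joined strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def yearly_frequency(books: list[tuple[int, str]], phrase: str) -> dict[int, float]:
    """Share of each year's n-grams that match `phrase`.

    books: (publication_year, text) pairs, toy stand-ins for scanned books.
    """
    n = len(phrase.split())
    matches: Counter = Counter()
    totals: Counter = Counter()
    for year, text in books:
        grams = ngrams(text, n)
        totals[year] += len(grams)
        matches[year] += grams.count(phrase.lower())
    return {year: matches[year] / totals[year] for year in sorted(totals) if totals[year]}


# Invented sentences, not real data.
toy_books = [
    (1900, "the fire burnt the toast"),
    (1900, "she burnt the letter"),
    (2000, "the fire burned the toast"),
    (2000, "he burned the letter and burnt the toast"),
]
for word in ("burnt", "burned"):
    print(word, yearly_frequency(toy_books, word))
```

Dividing by the total number of n-grams in each year matters here: without that step, a year with more books would appear to use every word more often.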
At the time, Aiden wrote, “Together, these furnish a great cache of bones from which to reconstruct the skeleton of a new science.” He named the science “culturomics” – the quantitative study of human culture. It was envisioned as the cultural equivalent of the human genome project – a treasure chest of data to be pored over by scholars or by more casual users, via Google’s popular n-gram viewer.
Michel and Aiden revealed culturomics to the world in 2010, with a paper that offered a tasting platter of the n-grams’ potential. It showed the expanding nature of the English lexicon and the evolving nature of its grammar. It showed “men” and “women” converging in frequency, new technologies permeating through culture with growing pace, and celebrities rising to ever-higher peaks of fame but falling from them more quickly. It even revealed the traces of suppression and censorship – “Tiananmen Square” is suspiciously absent from Chinese texts after 1989, as are Jewish artists and academics from German texts during the Nazi era.
The new approach was eye-opening but it was inevitable that it would draw fire. “There were significant subgroups within the humanities that were up in arms,” says Aiden, “because there were no humanists or historians on the paper.” Such criticisms were perplexing to a man who regularly jumps from field to field. “[Qualifications] never even occurred to me as something that’s relevant,” he says. By contrast, when he published the 3-D genome paper, his most advanced degree was his masters in history. “No one gave a damn in the sciences!”
Other critics focused on the problems with the data, which users of the n-gram viewer discovered for themselves. Aiden finds that frustrating. “We said in the paper that there are huge issues with the data outside the 1800 to 2000 range, but it’s like if you get a TiVo or a Wii, you don’t spend time reading the instructions. You just want to play with it. My hope is that people who are doing this for serious purposes eventually get the value of the tool.”
Several people certainly have, and Aiden has countless examples that vindicate the project’s value in his eyes. “[Alexis Madrigal] at the Atlantic, instead of writing a column on the nuclear age, collected a bunch of n-grams about it. These things are so clear and visual and transparent that people get that this is a way for the general public to learn a bit of history.”
There have been more substantial uses too. Wikipedia compared the quality of their articles about scientists to how famous those scientists are, as measured through n-grams. “There’s a strong effect. People who are more famous have better Wikipedia articles. That’s a good control. It shows that their editors have a good sense of what’s important.” But the analysis found something more unusual. It suggested that female scientists have systematically worse articles than their comparably famous male peers. “People talk about the fact that 15% of Wikipedians are female and that has the potential to introduce so much bias into Wikipedia itself. You could speculate about that, but now you can measure and check it.”
Aiden isn’t done with culturomics. He and Michel are now visiting faculty at Google (“We have access to almost all their data so that opens up a lot of doors”). They have started a group at Harvard called the Cultural Observatory, with the aim of creating more powerful sets of data like the one that powers culturomics. And Aiden is even working on a musical version that looks at scores across time.
Once again, data quality is a big issue – musical scores are poorly annotated – but once again, Aiden’s experience in unrelated fields is yielding unexpected benefits. One of the technical challenges he solved while working on his failed immunology project turns out to be “identical” to a problem with annotating scores. “I’d seen that because I’d been in this other area and invested lots of time and knew about it.” These are the moments that justify his nomadic career. “If we’re in a room and we’re talking about X, the X specialist will know more about X than I do, but I’ll know more about not-X. Every once in a while, something that’s not-X turns out to be very relevant.”
This comes at an obvious price: it is hard to hit the ground running in a new area, and Aiden often finds himself playing catch-up. But to him, his broader horizons compensate for this drawback. “People have this romantic notion of inventors as people who go into caves and come out with an amazing thing that’s totally novel. I think a huge amount of invention is recognising that A and B go together really well, putting them together and getting something better. The limiting step is knowing that A and B exist. And that’s the big disadvantage that one has as a specialist – you gradually lose sight of the things that are around. I feel I just get to see more.”
Aiden’s approach harkens back to an older era for the sciences, when polymaths like Leibniz and Newton commanded respect in a variety of different fields. Such people are a rare breed in today’s world, where the widening frontiers of scientific knowledge funnel scientists into narrow specialist channels. The intellectual nomads are being squeezed out.
But Aiden senses that the balance is shifting, and the connective power of the Internet plays a large part in that. “Thirty years ago, you didn’t know what was going on in a different field and you did not have Google. It could take you months to figure out that an idea was a good or bad one. These days, you can get a good sense of that in a matter of minutes because information is much more accessible. That’s really, really huge. It makes it much easier to move from one field to another.”
The free flow of information not only makes it easier to work out which problems are available and tractable, it also makes it clear how many problems there still are, enough to fill a rich career of discipline-hopping. “I had this feeling out of graduate school that everything had been done,” says Aiden. “Now, I think, wow, we don’t know anything yet.”
This feature is longer than my usual Not Exactly Rocket Science posts, and the type of story I would normally try to place in a paying mainstream publication. For various reasons, it’s been difficult to do so, but I’ve been interested in Aiden’s work for a long time and I very much wanted to tell his story. So here it is, cross-posted over at the Scientific American guest blog.