Searching for Meaning in the Digital Deluge
Statistician is the next ‘sexy job,’ one economist says.
By STEVE LOHR
MOUNTAIN VIEW, California
AT HARVARD UNIVERSITY, Carrie Grimes majored in archaeology and ventured to places like Honduras, where she studied Mayan settlement patterns by mapping where artifacts were found. But she was drawn to what she calls “all the computer and math stuff” that was part of the job.
“People think of field archaeology as Indiana Jones, but much of what you really do is data analysis,” she said.
Now Ms. Grimes does a different kind of digging. She works at Google, where she uses statistical analysis of mounds of data to come up with ways to improve its search engine.
Ms. Grimes is an Internet-age statistician, one of many who are changing the image of the profession as a place for dronish number nerds. They are finding themselves increasingly in demand - and even cool.
“I keep saying that the sexy job in the next 10 years will be statisticians,” said Hal Varian, chief economist at Google. “And I’m not kidding.”
The rising stature of statisticians, who can earn $125,000 at top companies in their first year after getting a doctorate, is a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore, as diverse as sensor signals and surveillance tapes or social-network chatter and public records. And the digital data surge only promises to accelerate, rising fivefold by 2012, according to a projection by IDC, a research firm.
Yet data is merely the raw material of knowledge. “We’re rapidly entering a world where everything can be monitored and measured,” said Erik Brynjolfsson, director of the Massachusetts Institute of Technology’s Center for Digital Business. “But the big problem is going to be the ability of humans to use, analyze and make sense of the data.”
The new breed of statisticians tackles that problem. They use powerful computers and sophisticated mathematical models to hunt for meaningful patterns and insights in vast troves of data. The applications are as diverse as improving Internet search and online advertising, culling gene sequencing information for cancer research and analyzing sensor and location data to optimize the timing and handling of food shipments.
Statisticians are only a small part of an army of experts using modern statistical techniques for data analysis. Computing and numerical skills, experts say, matter far more than university degrees. So data sleuths come from backgrounds like economics, computer science and mathematics. The data-handling experts are so highly valued these days that a new word - “datarati” - has been coined to describe them.
They are certainly welcomed in the United States government. “Robust, unbiased data are the first step toward addressing our long-term economic needs and key policy priorities,” Peter R. Orszag, director of the Obama administration’s Office of Management and Budget, declared in a speech in May. Later that day, Mr. Orszag confessed in a blog entry that his talk on the importance of statistics was a subject “near to my (admittedly wonkish) heart.”
I.B.M., seeing an opportunity in data-hunting services, created a Business Analytics and Optimization Services group in April. The unit will tap the expertise of the more than 200 mathematicians, statisticians and other data analysts in its research labs - but that number is not enough. I.B.M. plans to retrain or hire 4,000 more analysts across the company.
The data surge is transforming a profession that traditionally tackled less high-profile and less lucrative work, like figuring out life expectancy rates for insurance companies and calculating crop yields in agriculture.
Ms. Grimes, 32, got her doctorate in statistics in 2003 from Stanford University in California and joined Google later that year. She is now one of many statisticians in a group of 250 data analysts. She uses statistical modeling and prediction to help improve the company’s search technology.
For example, Ms. Grimes worked on a statistical algorithm to fine-tune Google’s crawler software, which roams the Web to constantly update its search index. The model increased the chances that the crawler would scan Web pages that are regularly updated and would make fewer trips to pages that change less frequently.
The goal, she explained, is to make tiny gains in the efficiency of computer and network use. “Even an improvement of a percent or two can be huge, when you do things over the millions and billions of times we do things at Google,” she said.
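The article does not describe how Google's model works, so the following is only a rough sketch of the general idea: estimate how often each page changes from past visits, then spend more of a fixed crawl budget on the pages that change most. The `PageHistory` type, the smoothing rule, the proportional split and the example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageHistory:
    """Hypothetical record of past crawls for one page."""
    url: str
    visits: int   # number of past crawls of this page
    changes: int  # crawls on which the content had changed


def estimated_change_rate(page: PageHistory) -> float:
    # Smoothed fraction of visits that found new content; the +1/+2
    # keeps the estimate away from exactly 0 or 1 for rarely seen pages.
    return (page.changes + 1) / (page.visits + 2)


def allocate_crawls(pages, budget):
    # Split the crawl budget in proportion to each page's estimated rate.
    rates = {p.url: estimated_change_rate(p) for p in pages}
    total = sum(rates.values())
    return {url: budget * rate / total for url, rate in rates.items()}


# Made-up example: a frequently updated page and a rarely updated one.
pages = [
    PageHistory("example.com/news", visits=100, changes=90),
    PageHistory("example.com/archive", visits=100, changes=2),
]
plan = allocate_crawls(pages, budget=100)
for url, crawls in plan.items():
    print(f"{url}: {crawls:.1f} crawls")
```

In this toy setup nearly all of the budget goes to the page that changes often, which is the behavior the article attributes to the model: more visits to regularly updated pages, fewer to static ones.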
It is the size of the data sets on the Web that opens new worlds of information and discovery. Traditionally, social sciences tracked people’s behavior by interviewing or surveying them. “But the Web provides this amazing resource for observing how millions of people interact,” said Jon Kleinberg, a computer scientist and social networking researcher at Cornell University in New York State.
The rich lode of Web data has its perils, experts warn. Its sheer volume can easily overwhelm statistical models. Statisticians also caution that strong correlations of data do not necessarily prove a cause-and-effect link.
For example, in the late 1940s, before there was a polio vaccine, public health experts in America noted that polio cases increased in step with the consumption of ice cream and soft drinks. Eliminating such treats was even recommended as part of an antipolio diet. But the real link was that polio outbreaks were most common in the hot months of summer.
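A small simulation can illustrate how a hidden common cause produces this kind of misleading correlation. The numbers below are synthetic and invented for the example, not historical data: temperature drives both ice cream consumption and polio cases, and neither affects the other. The variable names and coefficients are assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: temperature is the only thing the two series share.
temps = [rng.uniform(0, 30) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temps]
polio = [1.0 * t + rng.gauss(0, 5) for t in temps]

r_raw = pearson(ice_cream, polio)
r_partial = partial_corr(ice_cream, polio, temps)
print(f"raw correlation: {r_raw:.2f}")
print(f"controlling for temperature: {r_partial:.2f}")
```

The raw correlation between the two series is strong, but it largely vanishes once temperature is held fixed, which is the pattern one would expect if summer heat, not ice cream, were the real link.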
If the data explosion magnifies longstanding issues in statistics, it also opens new frontiers.
“The key is to let computers do what they are good at, which is trawling these massive data sets for something that is mathematically odd,” said Daniel Gruhl, an I.B.M. researcher whose recent work includes mining medical data for clues to improve disease treatment. “And that makes it easier for humans to do what they are good at - explain those anomalies.”
Statisticians like Carrie Grimes, who works for Google, are in demand to analyze the mass of data on the Web.