I spent the week in San Francisco at GraphLab, a data science and machine learning summit.
Interest in data science is simply amazing and growing rapidly (last year’s GraphLab had ~300 attendees; this year’s had something like 500+), and it is well justified. Companies like Pandora and Netflix that have made Machine Learning and Data Science techniques work have conquered the world, but capturing and recreating their magic remains an elusive task. The field is fraught with problems.
Data Science is a field deeply entrenched in its academic roots, and the people who can claim any sort of expertise are few and far between (and often already rich or highly valued, and therefore impossible to hire). Even rarer are large enterprises willing to invest in obtaining data science expertise.
The problem is partly one of supply and demand. A Data Scientist is basically a software engineer who can do statistics. Most tech companies have a difficult enough time finding normal software types; now they have to find ones who dig stats. Such a person will be as rare as the mythical “Developer-Designer-corn”, the hybrid developer/designer. Such a person would be so intelligent and talented that they would likely not bother working for your company and would be off starting their own.
The next part of the problem is communication: How do you actually get this mythical Software Engineering Statistician Unicorn to both turn up useful business insights and then convince management types to act on these insights?
In my experience, having business insights is often much, much easier than actually getting the bandwidth to implement any insight that falls outside the realm of critical path tasks. An organization willing to let Data Science drive its decisions probably can’t exist unless the founder built the company around such an approach.
In short: most existing companies are very unlikely to suddenly sprout machine learning competencies where they had none before, and the supply of people able to claim the expertise is nearly nonexistent. Therefore, most large enterprises are going to be better off getting some serious external help in that department. Otherwise, the only way I see data science becoming a focus is by starting your tech company with it as a cornerstone and proceeding from there.
The final hope for Data Science finding its way into mainstream technology companies is to expose its functionality so that BI, data analysts and online marketers can actually run exotic graph-based queries through simple interfaces (see: Google Analytics). Such interfaces have not been invented yet. Will they ever be?
Graph-based data does not really lend itself to analysis via traditional Microsoft Excel spreadsheets; a new set of tools will be needed.