Starting a couple of years back, there was a lot of hype about data scientists and how this was going to be the hottest job in the coming years. Analysts were predicting that analyzing big data and finding patterns and trends would require thousands of data scientists who could build statistical and predictive analysis tools.
Terms like R and Python suddenly became sexy and all the statistics nerds started coming out of the woodwork. Caching, sharding, clustering, classification, overfitting, underfitting, and scalability became terms to be thrown out at random at social events and parties. And to make sure resume-scanning programs found the buzzwords, Hadoop, MapReduce, Hive, and HBase became must-have words on resumes.
The dearth of qualified data scientists was a hot topic of discussion and, following the laws of supply and demand, companies were scrambling to hire the few data scientists available and paying them top dollar (or pounds, or euros, or bitcoins...). Of course, not to miss out on the hype and opportunity, academia jumped on the bandwagon and started hyping up its data science programs and offerings. Boring and unpopular statistics and econometrics classes suddenly got rebranded as ‘data science’ courses: marketing at its best.
A 2011 McKinsey report estimated there would be 140,000 to 190,000 unfilled positions for data analytics experts in the U.S. by 2018. In response, universities are scrambling to improve their existing degree programs and create entirely new offerings.
Note: Ironically, while companies like Facebook and Twitter that generate much of the data are in California, only a handful of universities in California offer a dedicated data science program.
I have not tracked whether companies like eHarmony.com jumped into the fray as well, promoting socially challenged candidates trying to find a match, but I would not be surprised if they did (“John does not talk much, has no interests or activities, but is great at projecting the outcome of a coin toss...”).
Recently though, the hype seems to have gone down.
One possible reason could be the improvement in predictive analysis tools like #SAS, #SPSS, and #KXEN (now called #InfiniteInsight after being acquired by SAP). I recently attended a 2-day workshop focused on SAP InfiniteInsight (formerly #KXEN) and I came away impressed. The tool allows you to build models using various statistical methods and predictive algorithms that can be used to analyze structured and unstructured data. With SAP HANA, one can process large volumes of data streaming in from various data sources (including Hadoop, OData, etc.) and use SAP InfiniteInsight to process this data and publish trends and correlations. With minimal training, I was able to build various models easily using the easy-to-use UI.
A data scientist has to constantly tweak and update their models manually as business conditions and data feed types change. Predictive analysis software tools, by contrast, can update models easily and then use them to analyze data and make predictions.
InfiniteInsight incorporates R and other statistical capabilities and can be used independently or on HANA. According to the SAP roadmap, InfiniteInsight is going to be embedded into SAP #Hybris; this will allow companies to use predictive analysis on their e-commerce sites.
Now back to the original topic: where does this leave the data scientist? My guess is that while data scientists may be hired to help build models in tools like InfiniteInsight, it will not necessarily be long-term employment. With more machine learning techniques being developed almost daily, predictive analytics software will only get better at modeling and predictions.
Now if only there were software to predict the future of the data scientist.