The numbers are staggering. A 2011 McKinsey & Company study (a) estimated that by 2018 the United States could face a shortage of up to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
Add to this the fact that more than 2.5 exabytes of data (that is 2.5 billion gigabytes, or 2.5 × 10^18 bytes) is generated every day. Astonishingly, almost 90% of this data has been created in the last few years. And we are nowhere near peaking: each self-driving car of the near future is expected to generate 5,000 GB of data per day. You get the idea.
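If you want to check the arithmetic yourself, here is a minimal sketch. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and an exabyte is 10^18 bytes, and it simply reuses the two figures quoted above; the variable names are made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary units differ slightly).
GIGABYTE = 10**9
EXABYTE = 10**18

# Daily data volume quoted in the text.
daily_bytes = 2.5 * EXABYTE
daily_gigabytes = daily_bytes / GIGABYTE

# Per-car figure quoted in the text.
car_bytes = 5_000 * GIGABYTE
cars_equivalent = daily_bytes / car_bytes

print(f"{daily_gigabytes:,.0f} GB per day")       # 2,500,000,000 GB per day
print(f"{cars_equivalent:,.0f} cars' worth of data")  # 500,000 cars' worth of data
```

In other words, on these figures, half a million such cars would match today's entire daily output.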
The term Data Scientist is relatively new, coined in 2008 by D.J. Patil and Jeff Hammerbacher. For some, Data Scientists are just glorified Statisticians; for others, they are the people who are going to change the world by giving us new insights and figuring out the future. Take your pick.
On a more serious note, Data Science is all about using scientific methods, processes and systems to gain insights and knowledge from data in its various forms.
“In God we trust, all others bring data”, William Edwards Deming
Today data science is an integral part of our lives. Healthcare, retail, transportation, manufacturing and the public sector are just a few examples of where data science is a fundamental part of decision making and day-to-day management.
Data is the new Oil
Not too long ago, data was more of a liability. How do we store these boxes full of paper? How do we keep them safe? And, God forbid, what if we need to find something in this haystack of paper journals? Today data is an asset. You only need to look at two of the largest companies in the world, Google and Facebook: what is their competitive advantage? Yes, it’s the data that they “own”.
The changing shape of data
Data is not only growing in size every minute; over the last few years it has changed shape as well. Broadly speaking, there are two types of data: structured and unstructured. Structured data is what you have in tabular format, nicely laid out in rows and columns, in an Excel sheet for example.
Unstructured data is everything else: text, voice, video and so on. Not too long ago most data was structured and hence easy to work with. Today, however, more than 75% of the data being generated is unstructured; think of the millions of blogs, web pages and videos being produced every day.
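To make the difference concrete, here is a small sketch of the same facts in both shapes. The records and review snippets are invented for illustration, and the pattern matching is deliberately crude; real unstructured data is rarely this cooperative.

```python
import re

# Structured: every record has the same named fields, like rows in a spreadsheet.
orders = [
    {"customer": "Asha", "item": "laptop", "amount": 1200},
    {"customer": "Ben", "item": "phone", "amount": 800},
]
# Answering a question is a direct lookup on a known column.
total_structured = sum(row["amount"] for row in orders)

# Unstructured: the same facts buried in free text (hypothetical review snippets).
reviews = [
    "Asha says the laptop was worth every bit of the $1200 she paid.",
    "Ben paid $800 for his phone and loves the camera.",
]
# The amounts have to be dug out first; here a simple pattern finds dollar figures.
amounts = [int(m) for text in reviews for m in re.findall(r"\$(\d+)", text)]
total_unstructured = sum(amounts)

print(total_structured, total_unstructured)
```

Both totals agree here, but only because the text was written to be easy to parse. That extra extraction step is a big part of why unstructured data is harder to work with.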
Continuously increasing computing power and the exponentially decreasing cost of storing data are also contributing to this rise in data quantity. Cloud-based storage has made it easy and affordable to store humongous amounts of data, something that only the richest governments of the world were able to do not so long ago.
So who can be a Data Scientist?
You need a curious mind to delve into data science. At times you will need a certain conviction, even a firm notion: you must be looking for something, trying to prove something. You need to formulate a hypothesis and set out to prove it right or wrong.
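What does setting out to test a hypothesis look like in practice? Below is one possible sketch, using a permutation test, which is just one of many techniques you could pick. The hypothesis ("the new store layout sells more") and all the sales numbers are made up for illustration, and the function name is hypothetical. Strictly speaking, a test like this gives you evidence for or against a hypothesis rather than proof.

```python
import random
from statistics import mean

# Hypothetical daily sales under two store layouts (made-up numbers).
layout_a = [20, 22, 19, 24, 21, 23, 20, 22]
layout_b = [25, 27, 24, 28, 26, 25, 27, 29]

# The gap we actually observed between the two layouts.
observed = mean(layout_b) - mean(layout_a)

def permutation_p_value(a, b, rounds=10_000, seed=0):
    """Share of random relabelings whose gap is at least as large as the observed one."""
    rng = random.Random(seed)
    pooled = a + b
    target = abs(mean(b) - mean(a))
    hits = 0
    for _ in range(rounds):
        # If layout made no difference, the labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        gap = abs(mean(pooled[len(a):]) - mean(pooled[:len(a)]))
        if gap >= target:
            hits += 1
    return hits / rounds

p_value = permutation_p_value(layout_a, layout_b)
print(f"observed gap: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value says a gap this large would rarely show up by chance if the layouts were really the same, which counts as support for the hypothesis.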
You also need strong storytelling skills. As discussed, Data Science is no longer confined to the Bureau of Statistics; it is now a critical part of many businesses. Trust me, few CEOs are interested in your confidence interval or variance. What they are interested in is how they can use your data insights to win more business and stay ahead of the competition.
So if you have these three skills, Statistics & Mathematics, Curiosity and Storytelling, you can be well on your way to the sexiest job of the century (b).
Hype and the title of “sexiest job” aside, Data Science will definitely be a critical part of our lives in the future. If you are looking for a career change or want to dive into something new, this may be what you have been waiting for.
(a) Big data: The next frontier for innovation, competition, and productivity
(b) Data Scientist: The Sexiest Job of the 21st Century
(c) Data Science and Cognitive Computing courses at Cognitive Class by IBM
(d) Microsoft Professional Program Certificate in Data Science
(e) Cloudera Certified Professional