Source: kdnuggets.com September 2018
Author: William Schmarzo, Hitachi
One of the most exciting challenges I have at Hitachi as the Vice-Chairman of Hitachi’s “Data Science” is to help lead the development of Hitachi’s data science capabilities. We have a target number of people we want trained and operational by 2020, so there is definitely a sense of urgency. And I like urgency because it’s required to sweep aside the inhibitors and resistors to change.
I started this assignment with a blog titled “What’s the Difference Between Data Integration and Data Engineering?” that laid out the differences between traditional Data Integration and modern Data Engineering (see Figure 1).
Figure 1: Data Integration versus Data Engineering
But that blog only addressed the Data Engineer role. The goals for the “Data Science” are to become more effective at leveraging data and analytics to optimize key business and operational processes, mitigate compliance and security risks, uncover new revenue opportunities, and create a more compelling, differentiated user experience. To achieve those goals, we need to consider three key roles, and the interactions among them, that round out the data science community. We need to understand the responsibilities, capabilities, expectations and competencies of the Data Engineer, Data Scientist and Business Stakeholder.
Researching the Data Science Triumvirate
We started this assignment by researching the job hiring profiles of Data Engineers and Data Scientists at Silicon Valley’s leading data science organizations (Thanks, John!). We then created a graphic to highlight the focus and capabilities of those roles, as well as the interactions between them: the Data Science Capabilities Venn Diagram (see Figure 2).
Figure 2: Data Science Capabilities Venn Diagram