As part of its efforts to help define standards for data science roles in the analytics industry, the Initiative for Analytics and Data Science Standards (IADSS) organized a workshop at the IEEE ICDM 2018 conference in Singapore on November 17, 2018. This brief report presents highlights from the workshop's contributors, summarizes its purpose and offers suggestions for next steps.
Workshop Title: Establishing Data Science Industry Standards
Proposed Standards on Definitions of Analytics Roles, Skill-sets and Career Paths in the Data Science Industry
“Who is a Data Scientist? Do you think everyone understands the same from 'data scientist'?”
Background and Motivation of the Workshop
As the role of data and analytics is expanding very rapidly in creating new business models or changing existing ones, the demand for analytics professionals is growing at an increasing rate. The world has witnessed an explosion in the number of people describing themselves as Data Scientists or Analytics Professionals. Yet in the majority of cases, such people do not fit the bill for an available role. This leaves employers, trainers, educational institutions, recruiters, and customers in a total state of confusion.
Kaggle, as a platform for data science projects, gives insight into the rapid increase in the number of analytics professionals. The number of Kaggle members has exceeded 1 million, with annual growth of over 100%.
The growing number of analytics professionals can also be observed on LinkedIn. A quick keyword search on LinkedIn targeting job titles and listed capabilities shows a large number of professionals who define themselves as working in analytics or related spaces. The number of LinkedIn profiles with analytics/data science-related titles exceeds 1.6 million, and this number exceeds 12 million when analytics/data science-related capabilities are targeted. The top 100 analytics and data science LinkedIn groups have more than 2.3 million de-duplicated members.
Almost every company in the industry has a unique way of defining roles and assigning titles in data analytics-related positions. For any given role or title, such as ‘Data Scientist’ or ‘Data Mining Manager’, there is wide variation in role definitions, expected hard and soft skills, expected level of experience, level in the organization, and career development plans including training. This creates inefficiencies and makes it difficult for companies to find the right match for a given position, leverage analytics skills effectively and retain talent. It also makes it hard for professionals to understand what a certain position requires and to create their own development plans. This has resulted in a chaotic market that is confusing to employers, academic and training institutions, recruiters, managers, customers, and candidates, with a large number of unqualified candidates calling themselves “data scientist” or “analytics professional”.
The workshop aimed to have an in-depth conversation about some of the issues brought forward above and explore the idea of establishing professional standards in data science.
Presenters & Presentation Highlights
Shonali Krishnaswamy, CTO, AIDA Technologies & Professor, Swinburne University of Technology
Ying Li, Chief Scientist, Eureka Analytics
Jirapun Daengdej, CTO, Merlin Solutions International
In her presentation, Shonali Krishnaswamy emphasized the importance of combining academic skills and real-world experience in tackling data science challenges. She distinguished between three types of roles (data engineer, data scientist and machine learning scientist) and explained how factors such as diversity of experience and domain knowledge play a role in matching the right person to the right job. She shared her conviction that there is as much art as science to solving analytics problems, which makes tools and algorithms only part of the equation. She believes that, given how much is available in open source libraries and the number of papers written on machine learning algorithms, the key to success is implementing them in combination with data management and domain expertise.
Ying Li shared her experience recruiting, training and managing large groups of analytics resources throughout her career, ranging from data analysts to applied scientists and from product managers to cybersecurity experts. She talked about the lack of a ‘dictionary’ meaning for a data scientist, in contrast to many other professions, and how one needs to look at the processes a data professional is involved in to label certain roles: for example, a data analyst generating reports for others to incorporate into their decision-making process, a data scientist building algorithms that can make different decisions when given different data, or an engineer building systems that run the algorithms that make decisions. Finally, she shared insights on recruiting analytics resources and how difficult it is to get a measure of competency. She noted that HR departments are generally not equipped to handle this and that a thorough interview puts too much strain on the existing but very limited data science resources in the company.
Our final presenter, Jirapun Daengdej, started by talking about the role of data science in digital transformation and how that has driven the demand for data science talent. He also described the challenges companies face in trying to find unicorn data scientists who combine math, statistics, programming, database, communication and visualization skills with domain knowledge, and pointed to the importance of building teams with the right overall mix of skills. He then shared his expertise on the data science project implementation process and highlighted several reasons why data science projects might fail, including not having the right talent mix in the team and what is probably still the largest issue for most companies: data quality. He then presented the long list of skills that various sources expect from data scientists and reiterated his view that we should stop looking for the ‘Data Superman’.
Panelists & Panel Highlights
Shaowei Ying, COO, Dataspark
Feng Yuan Liu, Former Director, Data Science and AI Division, Govtech Singapore
Gabor Benedek, Data Scientist Partner, Lynx Analytics
Graham Williams, Director of Data Science, Cloud AI and Research, Microsoft Asia Pacific
Shaowei Ying leads a team that generates insights from data in multiple geographies across Asia, and he sees that different skills are needed for different types of projects. He thinks that strong academic fundamentals are crucial for strong data scientists, whether in the math/statistics or the computer science domain. However, he was quick to point out that the strongest skill set does not necessarily translate into the strongest performance, because the ability to contribute in a team and a passion for data usually become even more important than technical know-how.
While leading the Centre of Excellence for Data Science for the Singapore Government, Feng Yuan Liu grew his team from 8 members to 60 in three years. He pointed out how, with the ‘data scientist’ title, businesses started seeking out PhDs who were traditionally confined to research functions, and noted the transformative effect this has had. His team includes people with social sciences/economics backgrounds who are developing models using R and open source libraries. Their main output is reports for the ministry office to support decision-making, so communication and visualization skills are critical, similar to the work of a management consultant. They use the ‘quant strategist’ title for this role. On the other hand, there are also data engineering roles that look more like traditional software engineering and require a completely different skill set. He also underlined that the industry is defining these jobs as ‘professions’ and that they have yet to find an exact match in academia as an area of educational focus.
Graham defines himself as a machine learning scientist and sees a data scientist’s main role as transforming data into a narrative. He considers open source platforms and tools important in the development of data science. Graham distinguishes between people with advanced degrees in fields such as quantum physics developing new technology vs. people who can expertly use existing technology to drive actionable insight from data. He commented on the impact of the discipline of statistics in the development of data science into a mainstream area and the critical nature of the conversation between statistics and computer science practitioners.
Gabor provided his observations from a service provider perspective and talked about the impact of his education as a mathematical economist on how he came to view the data science field. He mentioned that he had a clear understanding of the ‘data miner’ definition in the earlier days of analytics and how it came to mean a combination of statistical and communication skills. He is not sure the multi-disciplinary approach in defining a ‘data scientist’ will persist and thinks there will be several specialist focus areas that will come out of this very broad definition.
Closing and Next Steps
This workshop was an important part of an ongoing initiative to address the confusion surrounding data science roles in the analytics domain. In the workshop, we had an opportunity to hear from academicians, industry leaders, practicing data scientists and representatives of large employers in the field about how they view data analytics as a profession and their ideas for resolving the aforementioned confusion. We also had a productive conversation with workshop attendees, and we thank them for their time, attention and contribution.
IADSS is currently running its research and data collection effort, in which we are reaching out to several hundred analytics leaders and practitioners globally through in-depth interviews and detailed questionnaires, alongside an extensive review of existing market insight, social media data, job advertisements and similar sources. We are also in the process of organizing the Standards Board and Advisory Committee for this effort. We hope the discussion in the workshop will help us advance the initiative and provide meaningful input to sustain it.
We believe setting industry standards for analytics and data science professionals will support the healthy growth of the field. We look forward to sharing our findings and recommendations with the broader analytics community over the coming weeks and months through similar workshops and presentations, as well as through online channels such as the initiative website, LinkedIn group and other platforms.
We finish by once again thanking our program committee, workshop participants and attendees for contributing to this important topic.