KDD 2020 San Diego

2nd Workshop on Data Science Standards – What do you need to know as a Data Scientist? 

Special Workshop Theme: Training Data Scientists of the Future

Background, Definition and Audience of the Workshop

 

As the demand for data science talent has exploded, so have the efforts to train data scientists. There are many programs and formats for data science training, but they vary widely in what they target and what they emphasize. They range from short online courses to full-time undergraduate and graduate degree programs. Because there is not yet an agreed-upon definition of who data scientists are and which skills and knowledge they need, designing training programs and developing curricula has become challenging. Meanwhile, organizations in industry are often unable to articulate clearly what they expect from data science talent, which in turn makes hiring, managing and creating development programs for data professionals inefficient and ineffective.

The first workshop in this series was held at KDD-2019, organized by the Initiative for Analytics and Data Science (IADSS), where contributors discussed the challenges of defining data-related roles and ways to standardize the skills and knowledge required for a variety of roles in the analytics and data science space. Training data scientists has been a major topic over the last year, and an important area of coverage for Harvard Data Science Review (HDSR), the new open access platform of the Harvard Data Science Initiative that features foundational thinking, research milestones, educational innovations, and major applications in data science. Our proposed workshop, co-chaired by Usama Fayyad, Co-Founder of IADSS, and Xiao-Li Meng, Founding Editor-in-Chief of HDSR, aims to discuss how we can better train data scientists with the skills and knowledge that the industry needs, and to explore how industry and academia can collaborate to make sure we’re not only meeting the demands of today but also preparing for the changes and challenges of the future.

This workshop has three goals: to share detailed findings from IADSS research on industry needs and skill-set requirements for data science professionals, to discuss the various ways universities and training organizations are currently approaching data science education, and to explore how we can improve effectiveness through collaboration between industry and academia. We aim to have participation and contribution by administrators and educators from leading institutions as well as industry managers and executives responsible for managing, hiring and training data science talent, in order to gain insight into both the supply and demand sides of the data science profession. We will also invite representatives of training boot camps and other training programs, as well as representatives of in-house training programs in industry. To achieve this aim, the workshop will be held as a half-day working meeting with short talks, invited panels and discussion sessions to plan future steps on the topic.

Attendance at First Workshop: The first workshop, held at KDD-2019, attracted over 200 attendees and involved very active and vigorous discussion of the hot topics. The program for the first workshop is listed on the KDD-2019 web site.

IADSS Advisory Board

Previous Conference Presence

KDD 2019 Anchorage, Half-Day Workshop

“Proposed Standards on Definitions of Analytics Roles, Skill-sets and Career Paths in the Data Science Industry”

Speakers

KDD 2018 London, Applied Data Science Invited Panel

“Who is a Data Scientist? Defining the Analytics Profession and Cutting Out the Hype and Confusion”

 

IADSS co-founders Usama Fayyad and Hamit Hamutcu spoke on a specially invited panel, along with global analytics leaders, to discuss professional standards for analytics and data science roles in industry.

Knowledge Discovery and Data Mining (KDD) London Research

KDD 2018 Panel Speakers

ICDM Standards Workshop

“Establishing Data Science Industry Standards”

 

IADSS organized a special workshop at ICDM Singapore, held November 17-20, 2018, to share the initial results of the study and to hear from industry leaders and academicians on the topic of data science and analytics professional standards. You can see details here:

ICDM IEEE Singapore 2018 workshop

IEEE ICDM 2018 Singapore, Half-Day Workshop

“Establishing Data Science Industry Standards”

Speakers

Organizers
 Usama Fayyad, Hamit Hamutcu

Agenda of the Workshop at KDD 2020

  1. Opening by Usama Fayyad and Xiao-Li Meng: Motivation of the workshop and introduction of agenda

  2. Talk: Usama Fayyad / Hamit Hamutcu: Results from IADSS Research on Standardized Definitions in Industry Data Science Roles and Data Science Knowledge Framework

  3. Talk: Xiao-Li Meng / Liberty Vittert: Ensuring a Healthy Data Science Ecosystem: Pedagogical Challenges and Opportunities

  4. Short presentations representing the views from data science education program directors and academicians (2 short presentations)

  5. Short presentations representing the views from training organizations (2 short presentations)

  6. Short presentations representing the views from data science industry executives (2 short presentations)

  7. Panel discussion on the issues and requirements for training data scientists

  8. Open Discussion Session with Attendees

  9. Closing and Next Steps

We are also planning to include a special section on the impact of the COVID-19 pandemic on training data scientists. Potential factors to explore include: increased need for effective communication and collaboration, faster modeling, data sharing, ethical use of data, ensuring education quality, and assessing expertise as more training is conducted online.

Biographical Summaries of the Organizers

Usama M. Fayyad, Ph.D.  (Workshop Co-Chair)

 

Usama serves as founder/CEO of Open Insights (founded in 2008), where he works with large and small enterprises on AI/Machine Learning, BigData strategy, and launching new business models based on data assets. He most recently served as Interim CTO for Stella.AI, a VC-funded startup in AI for HR/recruiting, and as Interim COTO of MTN2.0, helping develop new revenue streams in mobile payments/MFS and Data-as-a-Service businesses at MTN, Africa’s largest mobile operator.

 

Usama was the first Global Chief Data Officer & Group Managing Director at Barclays in London (2013-2016), where he also took on the additional role of CIO of Risk, Finance & Treasury Technology in 2015. From 2010 to 2013 Usama was co-founder of OASIS-500, a tech startup investment fund, following his appointment as Founding Executive Chairman in 2010 by King Abdullah II of Jordan. Until joining Barclays in 2013 he was also Chairman, Co-Founder and Chief Technology Officer of Blue Kangaroo Corp, based in Silicon Valley, which was building a mobile search engine service for offers personalization and activation. His background includes Chairman/CEO roles at several startups, including DMX Group (acquired by Yahoo!) and digiMine (Audience Science), which was founded in 2000 in Seattle to build hosted data warehousing and data mining solutions for Fortune 500 companies.

 

He was the first person ever to hold the Chief Data Officer (CDO) title when Yahoo! acquired his second startup in 2004. In addition to CDO he was also Executive VP of Research and Strategic Data Solutions where he ran Yahoo!'s global data strategy, architecting its data policies and systems, and managing its data analytics and data processing infrastructure. The data teams he built at Yahoo! collected, managed, and processed over 25 terabytes of data per day, and drove a major part of ad targeting revenue and data insights businesses globally. He also founded Yahoo! Research Labs where much of the early work on BigData made it to open source and established the early collaborations that launched Hadoop and other open source contributions.

 

Usama held leadership roles at Microsoft (1996-2000) and founded the machine learning systems group at NASA's Jet Propulsion Laboratory (1989-1995) where his work on machine learning resulted in the top Excellence in Research award from Caltech, and a U.S. Government medal from NASA. 

Usama earned his Ph.D. in engineering in AI/Machine Learning from the University of Michigan. He holds two BSE degrees in Engineering, an MSE in Computer Engineering and an M.Sc. in Mathematics. He has published over 100 technical articles on data mining, data science, AI/ML, and databases, and holds over 30 patents. He is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI) and a Fellow of the Association for Computing Machinery (ACM). He is active in the academic community with several adjunct professor posts and is the only person to receive both the ACM’s SIGKDD Innovation Award (2007) and Service Award (2003). He has edited two influential books on data mining and served as editor-in-chief of two key industry journals. He is an active angel investor and advisor in many early-stage tech startups across the U.S., Europe and the Middle East. He has served on the boards or advisory boards of several private and public companies including: Criteo, Invensense, RapidMiner, Stella.AI, Martini Media, Virsec, Silniva, Abe.AI, Medio, NetSeer, Choicestream, and others. On the academic front he is on the advisory boards of the Data Science Institute at Imperial College, AAI at UTS, and The University of Michigan College of Engineering.

Prof. Xiao-Li Meng

 

Xiao-Li Meng, the Whipple V. N. Jones Professor of Statistics, and the Founding Editor-in-Chief of Harvard Data Science Review, is well known for his depth and breadth in research, his innovation and passion in pedagogy, his vision and effectiveness in administration, as well as for his engaging and entertaining style as a speaker and writer. Meng was named the best statistician under the age of 40 by COPSS (Committee of Presidents of Statistical Societies) in 2001, and he is the recipient of numerous awards and honors for his more than 150 publications in at least a dozen theoretical and methodological areas, as well as in areas of pedagogy and professional development. He has delivered more than 400 research presentations and public speeches on these topics, and he is the author of “The XL-Files,” a thought-provoking and entertaining column in the IMS (Institute of Mathematical Statistics) Bulletin.

 

His interests range from the theoretical foundations of statistical inferences (e.g., the interplay among Bayesian, Fiducial, and frequentist perspectives; frameworks for multi-source, multi-phase and multi-resolution inferences) to statistical methods and computation (e.g., posterior predictive p-value; EM algorithm; Markov chain Monte Carlo; bridge and path sampling) to applications in natural, social, and medical sciences and engineering (e.g., complex statistical modeling in astronomy and astrophysics, assessing disparity in mental health services, and quantifying statistical information in genetic studies). Meng received his BS in mathematics from Fudan University in 1982 and his PhD in statistics from Harvard in 1990. He was on the faculty of the University of Chicago from 1991 to 2001 before returning to Harvard, where he served as the Chair of the Department of Statistics (2004-2012) and the Dean of the Graduate School of Arts and Sciences (2012-2017).

Call for Contribution

Deadline for Submission: June 30, 2020
Notification of Acceptance: July 10, 2020
Workshop Date: August 24, 2020
 
The workshop welcomes contributions in the form of short presentations.
 
Please send the following information to info@iadss.org:

-    Full name
-    Contact information
-    Title of the presentation
-    A brief summary of the presentation

 

© 2020 by IADSS