#DemystifyDS Conference Updates and Q&A
Updated: Sep 5, 2019
“Best talk yet. Would really, really, really be nice to get access to his slide deck!”
- A listener from the DemystifyDS Audience
After Usama Fayyad’s IADSS presentation at Metis's Demystifying Data Science Conference, we received very positive comments about the analytics and data science standards research, along with many questions from the audience. A recording of the presentation is available on the IADSS YouTube channel. Below are answers from IADSS co-founders Usama Fayyad and Hamit Hamutcu to questions the audience asked after the talk.
Question: How do you draw the line between Data Analyst and Data Scientist? For example, my title is Data Analyst, but I built our data warehouse pipeline and I build models and insights. Does that put me more in the Data Scientist or the Data Analyst category?
Usama Fayyad: Exactly. Great question; this is basically the core of the dilemma, right? I draw my own line. I have my box. For a data scientist, I expect a good degree of programming. I expect them to know statistics. I expect them to meet certain bars that I have, but who says that my bars are the right bars? That's what we're trying to get to: what are the right bars? What do companies need, and how do we give these people the right titles?
So for example, because data scientist is such a highly paid and highly coveted position, lots of people are claiming to be data scientists. Most of them probably aren't qualified, and they wouldn't know a data warehouse if it hit them in the face. They wouldn't understand how to query it. They don't know how to do storytelling with analytics.
So this is why we need to solve the problem. It's an excellent question because it goes to the heart of the topic.
Question: Yeah, yeah. So I'm going to throw another wrench in your discussion, just for the sake of sort of strong manning things. There are many fields where sort of expertise in the specific topic is as important or more important than things that are sort of standard like statistics. And I will give you an example from my own background.
I'm trained as a biochemist, so as a traditional bench lab scientist and the data science and machine learning jobs that I've had, I could not have done those jobs without that training. So it's actually probably more important.
So how do you propose that we would evaluate people where domain expertise is paramount?
Usama Fayyad: Yeah. Look, domain expertise is a must in almost any real task you do outside research, and in some cases in research too, if you're doing research in a certain field.
But here's what I would say. The core skills for doing data science are applicable in many areas and many domains. The way I try to think of this is that it is very much like engineering. If you understand the principles of design, the principles of problem solving, and the principles of how to represent a problem abstractly, you can apply them to many domains, from manufacturing to transportation to construction.
Now, the trick here is how you work with the right experts. Of course, the best combination is an expert like you in a certain area who actually picks up the data science skills. That is very rare and very difficult.
It's much easier to find somebody who really knows how to do the data science, who is a machine learning expert or knows their statistics inside out, who can program and can dive into data, and who works very closely with deep domain experts.
That was my lucky exposure when I first graduated with my PhD in AI and machine learning from the University of Michigan. My first job was with NASA's Jet Propulsion Laboratory, which is a Caltech lab.
So I ended up working with many scientists, real scientists in astronomy, planetary geology, atmospherics, and many other areas, who really knew the domain inside out. But I could bring a new perspective, and we could solve problems that the scientists had struggled with for 30, 40, 50 years, because they didn't know what was possible with machine learning and algorithmic approaches to analysis.
So I think it's the fusion of both. You have to find ways to talk to domain experts, you have to know how to collaborate with them, and you have to figure out how to make your tools and your knowledge useful.
Question: From my experience, the team is more critical than the individual when it comes to data science. Has IADSS considered recommendations or standards for how to hire a data science team?
Usama Fayyad: I would say IADSS has been focused on getting the basic definitions right and getting some standards in place. I fully agree that teams matter and the whole notion of what is the right team and what is the right diversity of thought on a team makes the team more effective. That's an open problem in almost any area of management, it's not unique to data science.
It's a great question, and the direct answer is no, we haven't looked at it from a team perspective. We're just trying to nail the basics: when you describe a certain role, how do you describe it, what do you expect out of it, and how do you evaluate whether somebody has the knowledge required to fill the position successfully?
Hamit Hamutcu: A side benefit of what Usama described is that if the roles are more clearly defined, it would probably be easier to define what a team should look like versus what an organization currently has. You can put together all of the roles and make sure they cover all of the functions and skills the organization needs to have as a team.
Question: So title requirements depend highly on the needs of the company. Is the issue you see mostly with companies or with candidates?
Usama Fayyad: You are right that the title these days depends on the needs of the company, but that's part of the problem. Because there is no understanding and no agreement on standards for what a data scientist can do and what a data scientist should know, we end up in a situation where everybody uses the same title but means different things. So I would say the confusion is on the part of the employer who writes the job description, and it is fed by the fact that most employees or candidates don't know enough to push back on it or refine it. So it's probably both, but candidates pretty much respond to what companies want, and companies right now are confused. They don't know what they want.
Question: With this being a burgeoning field, why aren't more employers willing to bring in talent and bridge the gap with training? Academia is only good for training for a snapshot in time.
Usama Fayyad: First of all, employers think that candidates will come in with the necessary training, either from prior jobs or from university, but they do not. What we see in our surveys is that there's a huge reliance on self-teaching. Employers realize these candidates don't have the skills, but they don't support them with formal training programs. They let them go off and find their own courses online or at a university, learn on the job, and so on. What I would say about the role of academia is that there are basic principles, and academia is very good at teaching the basic principles: a fundamental understanding of probability distributions, of how you reason on top of those distributions, of how you do estimation and when an estimate is stable versus unstable. A good understanding of algorithms: what works, what doesn't, how complexity comes into it, and what is practical and what is not. All of those are common and needed as foundations. They should be taught in academia and taught in a systematic way.
However, the question points to a need: what do you do about practical training, and how do you reduce the inefficiency and high variance that come with this learn-on-your-own approach? I think bootcamps can help a lot here. I think practical courses can help a lot. And I think employers need to start thinking hard about this: once you've understood how to standardize the role description and expectations, you can create a checklist and say, okay, let's make sure we cover the weak points in employees. Right now, none of this is available, so it’s kind of the blind leading the blind.
Question: What university courses do you think would be necessary to enter the data science market?
Usama Fayyad: I think the most important thing is experience, having solved problems before. From a course perspective, if we're talking about university in particular, I would say any courses in statistics, probabilistic reasoning, and algorithms. Nowadays also a good overview class in machine learning. And it is probably very useful to have an artificial intelligence class.
Question: From your experience, how do employers view bootcamp graduates?
Usama Fayyad: There are a lot of bootcamps, and some of them are far more useful and deeper than others and impart better knowledge. So there's high variance.
But in terms of how employers view them, I would say employers probably underestimate the value of bootcamps. This is why it's important for the candidate to pick the right bootcamp, because what employers are looking for is results and prior experience in dealing with data and with these kinds of problems. Some bootcamps give good practical, hands-on experience in how to program, how to deal with data quality issues, and how to mitigate data quality issues or missing data. Those are things you gain by experience, and some bootcamps are very useful in that respect. However, most employers probably discount bootcamps in general, because bootcamps haven't built a strong reputation yet.
Employers look at university courses more favorably than bootcamps, but I do believe strongly that some bootcamps are more valuable than many university courses.
Question: What are some early startup practices that companies can implement to store their data for future analysis?
Usama Fayyad: Some practices that I think big companies can learn from startups: startups normally start with naturally smaller problems and more modern systems, and they end up recording a lot more detail along with each event, which becomes very valuable down the road. Probably one of the biggest problems I see with data in big companies is a lot of missing detail, and learning algorithms and data mining algorithms really require a lot of detail.
Question: I am studying mechanical engineering, but I love data science and I am working on a project related to it. Do I need to pursue a master's to get a job at top companies like Amazon, Google, IBM, etc.?
Usama Fayyad: I think data scientists come from a variety of fields. You just need the core training and an understanding of the algorithms, so the fact that you didn't study computer science is not a minus. The fact that somebody is a statistician, I would say that's a plus. The analogy I would draw is that back when I was at Microsoft, the most common major for undergrads who turned out to be good programmers was physics; it wasn't computer science. So as long as you get the right training and know the right algorithms, it doesn't matter what your basic traditional training is in. Any field of engineering, technology, statistics, or math would be useful, or physics for that matter.
Question: I am an ETL/big data engineer myself. How can I sell myself to companies so that they hire me as a data scientist, given that I have a fair amount of knowledge in data science?
Usama Fayyad: One of the themes we focus on is the fact that these are distinct roles. The difference between a data engineer and a data scientist is as big as the difference between a back-end engineer and a front-end engineer. You have a different set of skills and different kinds of technologies, and you tackle different kinds of problems. I would suggest you don’t try to sell yourself as a data scientist. Try to sell yourself as a very knowledgeable engineer who understands big data technology, understands how to use it, and understands how to work with a data scientist. If you wind up picking up those skills, then you qualify to be a data scientist. But the demand is high for both, and we need very solid, strong data engineers as much as we need solid data scientists. I wouldn't approach it as a matter of positioning.
Question: Would you employ someone who has proven that they have the necessary skills but obtained them through MOOCs?
Usama Fayyad: Yes, of course. To me, what matters is that you have the right skills and the right knowledge. The question becomes how I assess that knowledge and how I get confidence that you have enough experience to know what to do with it. That is much more important to me than a university degree.
Question: What do you think about the edX MicroMasters?
Usama Fayyad: I think it's a great idea. My answer would be that you don't need a full master's to gain the knowledge you need. If you take a MicroMasters and focus on the topics you care about, you can become a very effective data scientist faster.