Data Team Collaboration: Top Lessons from Nike

EPISODE 3

Meet the Guest

Vino Duraisamy

Vino is a Data Engineer at lakeFS, a “Git for data” platform enabling users to implement best practices from software engineering on data lakes. Vino is also a seasoned conference speaker and technical writer for Medium and Towards Data Science on a variety of topics including Data Engineering, AI, ML, Statistics, and Data Analytics.

Meet Your Host

Enrique Ruiz

Enrique is a certified Microsoft Excel Expert and top-rated instructor with a background in business intelligence, data analysis, and visualization. He has been producing advanced Excel and test prep courses since 2018, along with adaptations tailored to Spanish-speaking learners.

Top 3 Insights from Vino Duraisamy

  1. Aspiring data scientists should start with a foundation in SQL and Python and then tailor the rest of their skillset depending on the jobs they’re looking for.

  2. Companies lack a structured and methodical approach to employee training, but Nike had effective collaboration sessions between their different data teams that helped spread best practices and enhance their collective knowledge.

  3. You would never ship software to production without testing, but somehow in the data world, it’s acceptable? ETL & data quality testing is a big gap that data companies are not addressing.

————————————————————-

READ THE TRANSCRIPT

Enrique:

I don't know if you know this, but you're actually the 3rd guest in our Mavens of Data series. Fun fact: our first guest, Daniel Lee, mentioned the Harvard Business Review article on “Data Scientists: The Sexiest Job of the 21st Century” as the reason that he actually got into the profession. He was actually a Stats major, which is close enough, but not really where he thought he was heading. And then Alice Zhao, who's our data science instructor at Maven and was the host of the second interview, quoted the same exact article in the conversation that I had with her as the reason that she got hired after her data science masters. So imagine my surprise when I go on your Twitter feed to do some research for this interview and then I find an image of that same article with the quote, “the one that started it all”. I know you're an Engineering major, so is that article the reason you decided to get into the world of data science as well? Tell us a little bit about how you got started.

Vino:

Yeah, for sure. I mean, when I said “the one that started it all” it was mostly around “the one that started the frenzy” around data science and people getting into data. Almost all my colleagues and friends at the time were trying to pivot from what we were studying at the time, electronics, into data science. So that was definitely a starting point for me as well. I did not directly jump into data science after the article, but I did see a lot of things move around me, mostly my colleagues at the time trying to ramp up their data science skills by learning Python and Pandas. I was working for this company called NetApp at the time and we used to have our own brown bag sessions trying to catch up on all things new in the industry. And, at that time, it happened to be data science. So we were already learning data science, even though I was not working in data science at the time. But, for sure, that definitely started it. And yeah, once you start learning Python and you start doing some projects on the side, you look for anything that you can do with these newly learned skills and then apply it to your role. And I guess I gradually then got interested in the data side of things.

Enrique:

It's so interesting to me because I'd seen the article myself as well, but I think I was much earlier in my career path and didn't really see any influence. And then I'm just starting to see pretty much all the successful data science people that I know converge around the same point. It's funny, I read the book Outliers, which is all about how there are different, often overlooked, factors for success and how timing can be so important. They talk about Bill Gates and Steve Jobs and all these people being born within 12 months of each other. So it seems like there's a similar thing with data science as well. So, for anyone that maybe feels like they're late to the party, that they missed that article and they're just starting their transition, what advice would you give to encourage them to keep going?

Vino:

Timing definitely does matter to some extent, because if you are early enough to have the skills and become someone in demand, you can pick and choose where you want to be, what interesting projects you work on, and everything. But I went for my masters only in about 2019-2020 and I thought: “Oh my, I'm too late to the party” because it's been there for a while. Does it even make sense to do masters now or should I consider doing a little more in deep learning or focus and specialize in NLP? There was a lot that was going on. And then I realized, three years down the line, that I wasn’t too late. I actually don't think the market ever saturates or the technology ever saturates. I think it was around 2012 when the HBR article came out, and over time I saw the market evolve more towards data engineering and data infrastructure. There was also a mix of data, data science, data analytics, and data engineering. Now the market is moving towards large language models, ChatGPT, and everything. But I would say the foundation for it all is still data science and the foundations of ML. So it's never too late because you're going to build up these foundations and then by the time you're done with your courses or boot camps, things might have evolved somewhere else. But all these foundations would definitely help you know what's going on, and it would be easy for you to get up to speed on anything that you want to work on later.

Enrique:

That's a great answer. And it's interesting that you mentioned you got started with your masters and then you touched on courses and boot camps. What would you recommend for anyone that's getting started? Or what would you do if you had to do it all over again?

Vino:

I think if I had to do it again, I would not go back to school. What I was talking about with the industry also applies to the learning side of it. Things have changed. A few years ago, we were all focused on getting certifications and going back to school, and all of them had a lot of value. But today, having hands-on experience on a specific project that you can do in your spare time and being able to play with the product and the tech is what matters. And I think boot camps really do give you that project-based, hands-on experience, which I would 100% go for if I were to do it all over again.

Enrique:

I agree, and I think it starts to touch other subjects that I wanted to ask you about. As you mentioned, in 2012 you started to notice an evolution, which brings me to something interesting I found about your profile in particular. You’ve had such a cool progression of different tools that you were learning and using from one job to the next. From what I could gather, it started with SQL, Tableau, and Excel, then you added some Python on top of that, and now there's a myriad of tools like AWS, Spark, Airflow, Docker and so many others. Were all of these tools you were learning yourself in your spare time, or did you end up picking some of these up on the job?

Vino:

Yeah, I think it's mostly a combination of both, right? When I wanted to pivot into data science, I was trying to go through courses and pick up side projects, and get to learn something by myself. But there is only so much you can do because when you’re on the job it’s a whole different thing. You're not only learning the tools, but you also get to deal with the data at scale, in production, and the learnings that come there are totally incomparable. So it was a mix of both. When I was learning myself, it was more intro-level courses, like Airflow 101 or Docker 101. And then on the job you’re exposed to everything. But it is a mix for sure.

Enrique:

I'm curious. It makes sense to learn SQL on the side because that's an all-purpose tool that you're going to use no matter where you go, but some of these others that you learned in the 101 courses; did you learn those just because you wanted to, or because maybe you had an eye on a job that was using it? 

Vino:

Actually, this is very interesting. So when I started out, I went out to all these meet-ups that happened at the time. I was in Phoenix for a Databricks meet-up and I spoke to a couple of data engineers and data science folks. I asked, “Hey, I'm doing this masters and I plan to pivot into these roles. What should my tech stack be? What should I learn?” And everybody told me that I shouldn’t worry too much about all the things that are out there and just focus on Python and SQL since SQL is everywhere and Python would give you the basic programming knowledge you need. But then I looked at job descriptions, and they ask you for so much and I didn’t have any of those skills. My foundation was built on Python and SQL, just like they advised me. So what I did was go over the job descriptions in my target companies and think, “OK, so now everybody's looking for database experience or GCP experience”. So I took time to figure out what I wanted, stuck to one cloud, AWS, and then went all in. Not just to expand my skills, but mostly to align myself with the job descriptions so I could eventually land a job.

Enrique:

I'm assuming then you'd recommend that same path that was recommended to you, right?

Vino:

For sure. I think everybody should start with the foundation and then tailor it to whatever job they’re looking at.

Enrique:

So you prepared yourself on the side, identified the opportunity, prepared yourself a bit more specifically for that opportunity, and then landed the job. Once you were there, did you find that the companies themselves did a good job of training and upskilling you and the rest of the employees in the relevant tools needed for success there? Or was it more of a trial-by-fire experience? Are there any areas of opportunity for companies to do a better job?

Vino:

I don't think there was any specific training per se. When I was working as a data engineer at Nike, I had never done any work with Airflow in production, so it was all on-the-job learning. You will probably pick up a ticket just to fix the bugs in the existing DAG, so you're not jumping in straight away writing a new DAG by yourself. But then you’d gradually get used to the tools that the company was using and eventually start to contribute to the team by writing your own jobs and everything. But there was no defined way of learning the company-specific stack.

And I think the one thing that Nike did better was having brown bag sessions with different teams. For example, there were different data teams even within the organization and they sat within different businesses. But we’d have knowledge exchange sessions between us. So one team would talk about how they updated their infrastructure, and we’d learn from each other. It wasn’t a very organized or methodical learning approach, it was just, “Hey, you know, we've done this. Maybe you might find this useful. Why don't you take a look at it?” kind of approach. But I do think this is something that in general companies don't really focus so much on.

Enrique:

That peer-to-peer approach of leveraging the assets you already have to make them all better at the same time is such a good idea, and it's something that should be brought up. But it's interesting to see that no one is specifically trying to train their employees, which certainly feels like a gap somehow.

Vino:

Yeah, it's mostly like a peer-to-peer effort. If you want to do something, then find people who may be interested in doing similar things and sit together and try to do that. And it was also during COVID times, so we were trying to find excuses to sit together and spend more time with the team members. And if I remember correctly, we even had a couple of ML paper reading sessions. We would pick up a paper every week and we would just read the paper and try to understand it and it would be so much fun.

Enrique:

So that was Nike, but I know you've been at Apple as well. What was that like? Is it very different?

Vino:

Yeah, it was very different. I somehow assumed it would be very similar to Nike because of the size of the company and the data maturity of these two organizations. But it was very different in the sense that, with Apple, I started and I think one week in I already had my own tickets and was contributing to the team. It was literally like jumping in and starting right away and there was no time to ramp up or anything else. Yet it was very interesting, for sure, and it was fast-paced. Although the company was bigger, the data team that I worked with was smaller. We were about three data engineers and three ML engineers trying to work together to put a couple of models in production and it was super interesting.

Enrique:

I see you've also worked in research. How does that differ compared to working in a company?

Vino:

I worked as a research assistant to the professor when I was doing my masters. And I had never done that before. Comparing that to what I'm doing currently as a data engineer at work in an organization, it was 100% different. I don't think there was any framework or anything at all that we were following with research, and you would constantly feel like you were putting in so many hours and not seeing the results. But that’s research, it's slow-moving in itself. It takes a lot of effort to really move the needle. Compared to here where every two weeks you have tickets and you have something quantifiable to look at, and it gets you gratification, it keeps you motivated. And the other thing was also we were not really following a lot of engineering best practices. We would just copy the training data around to different members working on our team. I don't think we even used Git. We were all working on our own local Jupyter notebooks, and only when we had something solid to present or show each other we would then push the code to Git. To think about it now, that was a truly wild west of a data team.

Enrique:

Is there anything that you feel companies can learn from the research process that maybe they're lacking because they've never been there personally, or was it more of an isolated experience?

Vino:

I think I would say the other way around. Research could probably take a little more of a methodical approach or put engineering practices in place to make our lives easier and to make things more trackable.

Enrique:

Fair enough and hopefully not many data researchers are going to listen to this episode. 

Vino:

But that's the thing, right? Because we do take our algorithms and everything from the researchers, and just apply it to different industries.

Enrique:

That's true. And I think it leads very nicely to what you're doing now working as a data engineer at lakeFS and going through some of those engineering best practices that maybe you were lacking in the past. And I was looking at some of the articles you wrote on Medium and Towards Data Science on ETL testing. Which, I have to admit, I never really thought about. Certainly not in a standardized way. And judging from your Reddit survey, it looks like many people are with me. So I think I know the answer to this question, but is this a gap that you've identified in different companies that should be addressed?

Vino:

Yes, for sure. I’d worked with three different companies and three different data organizations before I landed at lakeFS, right? And lakeFS is an open-source project that gives data versioning solutions, so when I looked deep into what exactly they offer I realized that I needed this exactly two years ago when I was working on that specific ML project or that specific data engineering pipeline. Which is why I got into this, and now I'm a developer advocate for lakeFS. This means I go and talk about these engineering best practices and how we can test them. I don’t want to mention when, but I've been guilty of tinkering with my production code directly, without even going through dev or staging first. Which is what can happen when you have hard deadlines. You just focus on pulling the data and putting something together in time. But then you end up testing neither the data pipelines nor the data quality.

So there are a lot of things that get missed because we don't follow the traditional software development best practices in the data world. So once I heard of the lakeFS project, I could literally look back on my previous experiences and realize that I needed to talk more about what was missing and how we could take it further. And ETL testing was one of the biggest things that not everybody did, even though in software it's very common. You cannot think of any software that gets shipped without going through testing, right? But somehow we thought in the data world it was totally acceptable to do that. In most companies, the data teams do not have extensive ETL testing. I’m clearly not the only one that has worked directly on production data or production code, which is what got me to write about ETL testing and how we can make it better.
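
To make the idea of ETL testing concrete, here is a minimal sketch in Python. It is not code from Vino, lakeFS, or Nike: the `Order` record, the `transform` step, and the rules in `check_quality` are all hypothetical, invented purely for illustration. It shows the two kinds of tests mentioned above, one for the pipeline logic and one for the quality of the data it produces.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical cleaned record produced by the transform step."""
    order_id: int
    customer_email: str
    amount_usd: float


def transform(raw_rows):
    """Hypothetical 'T' of an ETL job: turn raw string rows into Order records."""
    return [
        Order(
            order_id=int(row["order_id"]),
            customer_email=row["email"].strip().lower(),
            amount_usd=round(float(row["amount"]), 2),
        )
        for row in raw_rows
    ]


def check_quality(orders):
    """Return a list of data quality failures; an empty list means the batch is clean."""
    failures = []
    ids = [o.order_id for o in orders]
    if len(ids) != len(set(ids)):
        failures.append("duplicate order_id values")
    if any(o.amount_usd < 0 for o in orders):
        failures.append("negative amount_usd")
    if any("@" not in o.customer_email for o in orders):
        failures.append("malformed customer_email")
    return failures


def test_transform_normalizes_fields():
    # Pipeline test: a known input must produce a known output.
    raw = [{"order_id": "1", "email": "  A@Example.com ", "amount": "19.999"}]
    assert transform(raw) == [Order(1, "a@example.com", 20.0)]


def test_quality_check_flags_bad_rows():
    # Data quality test: a deliberately bad batch must be rejected.
    raw = [
        {"order_id": "1", "email": "a@example.com", "amount": "5"},
        {"order_id": "1", "email": "no-at-sign", "amount": "-3"},
    ]
    assert check_quality(transform(raw)) == [
        "duplicate order_id values",
        "negative amount_usd",
        "malformed customer_email",
    ]


if __name__ == "__main__":
    test_transform_normalizes_fields()
    test_quality_check_flags_bad_rows()
```

A test runner such as pytest would pick up the two `test_` functions automatically. In a real pipeline, a check like `check_quality` would more likely run against a staging copy of the data before anything is promoted to production, which is the gap described in the answer above.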

Enrique:

It's always nice to be working somewhere and realizing that what you’re doing for others now is what you needed for yourself in the past. And it just kind of gives you that extra motivation of knowing you’re in the right place and you're going towards a great goal. Moving on, what is one example where you've seen analytics talent make a major impact on an organization?

Vino:

I think it happens in every organization, doesn't it? I’ve been the only data person or analytics person in an entire company in the past, and even then I helped the sales and marketing teams with realigning their strategy or telling them where to focus. And some days I used to think, “Am I really just one person who is running a couple of SQL queries on our prospect data and using that to drive marketing decisions?”. And this was for a whole company with about 200 employees, so suddenly it feels so important. That’s when I realized how powerful analytics is. So yeah, everything today is data-driven and the analytics teams are, of course, being valued more and more each passing day. So that's a good sign.

Enrique:

I can definitely relate to the weird, powerful feeling of knowing that you hitting enter on a line of code will drive all these big decisions. And then you're just sitting there at night thinking, “Please let me not have made a mistake there”, right?

Vino:

Yeah, it's so much power to be wielding.

Enrique:

If you could give one piece of advice to analytics leaders about hiring, training, or mentoring, what would that advice be?

Vino:

I can understand that, currently, you cannot have an exact training plan for your data teams. But as things keep evolving, I do feel like we need that sort of training. Especially as data folks themselves transition to different areas within the industry. You have those that are SQL-based, or Spark-based engineers, so helping them transition between these different roles is where I see the training could be really valuable. And I don’t feel data teams do a great job here.

Enrique:

It’s so interesting for you to say that, since it reminds me of what you mentioned during your Nike days, right? Having the different departments collaborate and bridge those gaps, enabling them to make those evolutions and transitions. Beyond that, though, what would your number one pitfall be that analytics organizations or companies should avoid when it comes to data?

Vino:

This is something very interesting, because I've been talking with a friend of mine who is on the business side of it, and he brought up that data teams don’t understand the impact they have on the business end, specifically in the way they prioritize certain things. But my reply is that businesses don’t understand how complex it is to put together a data pipeline and to provide the data they need. And I feel like in all the companies I’ve worked for in the past there’s been this constant tussle. Unfortunately, it's not a technical pitfall, but mostly the communication between internal customers. But I don't think a lot of data teams have got that right.

Enrique:

Yeah, maybe we're talking amongst each other, but not with the rest of the non-data teams in the company, which makes a lot of sense. Final question from me, if you had to give one or two keys to success in building a truly data-driven and successful organization, what would that be?

Vino:

I think one of the biggest challenges I see the data teams facing, especially when building a team from scratch, is identifying what exactly they need. Most teams think they need data scientists and ML engineers, but then don’t have enough data engineers to support them. Then the data scientists end up working on data pipelines themselves. And it's always such a hassle because that's not their primary responsibility, and I don't think any data scientist enjoys working on data pipelines. So it ends up lowering the morale and slowing things down. So when you're building a team from scratch, identify exactly what you need, what the data maturity of your team and your company is, and then get the right people on board. Instead of just going out and hiring four data scientists because of the HBR frenzy.

Enrique:

It's so interesting how that same concept applies to the work that just one person is doing as well. When you rush into the data without asking yourself why you’re analyzing it in the first place, you end up going in all different directions, or even worse, the wrong one. So I like that answer. Have you ever had a scenario in which you do feel like a synergy exists and it's not just a back-and-forth struggle?

Vino:

When I was working in bigger companies, like Nike, we had product managers who would smoothly interface between the business and the engineering teams. And that made it super easy for you to work with them or understand them because you’d have this person to go to for any questions you have from the business. And similarly, the business teams have us as well. So it was the data product managers we had who would take care of building this synergy between the teams. Unfortunately, though, not every company is the size of Nike and can afford to have that. And that’s when the data engineering and the data science teams directly interface with each other, and there are a lot of things that need to be ironed out.

Enrique:

That's so interesting. And do you think that was by design or was it due to the person that was in that spot that made everything flow so nicely?

Vino:

That is interesting. I thought it was by design because, around the same time, I saw a rise in data product managers and AI product managers. So I thought it was going to become a big role, which is much needed. But I don't see as many data product manager roles these days, so I don't know if that is still a thing or if we've just decided to keep it directly between the engineering and data science teams.

Enrique:

We missed the fight, right? It was getting too easy.

Vino:

Yeah, it does look like it.
