The Importance of Collaboration in Data
Asking for feedback is a secretly powerful tool in data work. Let’s talk about why, and how to do it well.
A recent conversation with a fellow data practitioner sparked an idea that I want to share today: what is your process for conducting data analysis or modeling, and what do you consider the important but perhaps unsung parts of getting the job done well? I realized as we were talking that getting feedback from other people as I go through the work is an extremely important part of my process, but in my experience it’s not something that is explicitly taught to junior practitioners. I thought it would be useful to explain how I do this, why, and what the benefits are, for anyone whose process doesn’t include a collaborative or peer feedback component.
The kinds of projects where I think this matters are those where you’re doing most of the work in solitude, alone with your data and your wits. That often includes building models, of course, but also things like analyses intended to answer specific business questions or explore research topics. Working alone can be nice because you can go at your own pace and explore the project area the way you want to. However, it’s easy to get sidetracked, lose the plot, or miss something important when you’re the only one looking at the work.
For this reason, I like to consult with other data scientists from time to time as I proceed. This can take a number of different forms, and it’s not necessarily the same for every project.
Casual Chats
Whatever kind of feedback you’re seeking, it’s important that you’ve already done a significant amount of work. This is not a time to ask somebody else to do the work for you. You should have a solid project plan, and you should have already made significant progress (loading your data, looking thoroughly at it, and conducting at least some of the analysis, at minimum) and have work to show for it before you bring in anyone else. Once you’re there, ping a colleague and see if they have a few minutes to look at your work and give their thoughts, not necessarily right at that moment, but when their schedule permits. If they can’t help, go to somebody else, and take a mental note for the future. It’s natural to be concerned about bothering people, which is why I try to limit this kind of ask to once per project, and try not to bug the same person multiple times in a row across different projects. Distribute your asks so that you’re not always taking up one person’s time. If you need more than one person’s help, that’s what the next section is for.
It’s also important to note that your feedback solicitation should be structured. Don’t just say “what do you think?” or dump a long script on somebody, but prepare readable, concise content and ask specific questions. “Here’s a draft slide. Does this EDA look complete to you? Do you see anything I have done that doesn’t make sense?” or “I tried these different hyperparameter combinations, and the model performance just isn’t improving past X. Do you think that’s reasonable or is there something else I should try to get it higher?” As you might intuit, this means having results, perhaps some visualizations or at least some tables, to show.
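To make the “have results to show” point concrete, here’s a minimal sketch of the kind of artifact I mean: a small, sorted summary table of hyperparameter trials that a reviewer can scan in seconds. The trial values and column names below are hypothetical; the point is simply to hand your colleague something tidy instead of a raw notebook.

```python
import pandas as pd

# Hypothetical results from a handful of hyperparameter trials,
# logged as you experimented (names and numbers are illustrative).
trials = [
    {"learning_rate": 0.10, "max_depth": 4, "val_auc": 0.81},
    {"learning_rate": 0.05, "max_depth": 6, "val_auc": 0.84},
    {"learning_rate": 0.01, "max_depth": 8, "val_auc": 0.83},
]

# A tidy table, sorted by the metric you're asking about, is far
# easier for a reviewer to react to than raw training output.
summary = pd.DataFrame(trials).sort_values("val_auc", ascending=False)
print(summary.to_string(index=False))
```

Pair a table like this with the specific question you want answered, and your reviewer can give you useful input in a few minutes.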
If you ask somebody and they don’t give very helpful feedback, make a mental note of this too: maybe they were busy, your questions were unclear, or your work was hard to read. Experiment with what different reviewers and colleagues find easiest to work with, and get better at framing your questions. You should also proactively volunteer to return the favor and give them feedback in the future, and follow through with a thorough job when the time comes. You’ll get out of this what you put into it, as with most work or life relationships.
Model Reviews
A model review can be the culmination of a project, but it can also be a nice penultimate step before a report or model is completed and productionized. Let me first explain what I mean by a model review, since this is not necessarily part of everyone’s process.
A model review is a presentation, complete with visuals and documentation, that explains the project (usually model training) you have completed from start to finish. You should explain your data, how it was collected, what it means, how you cleaned it, how you chose your model architecture and trained it, what you tried that didn’t work, how your final model performs, where it excels and where it struggles, what you didn’t do but would in the future, and so on. This should be a comprehensive discussion of everything you did as part of the work. Your audience should be other data scientists or engineers who are as familiar with the subject matter as possible; if they aren’t, you should provide that background as part of the presentation too.
This may sound like an overwhelming amount of work when you could just skip this step, jot down some notes in a wiki, and toss the model into prod, but I firmly believe it’s an important part of successful modeling. For one thing, just preparing the review will require you to revisit what you did and think about it. Knowing that it’s coming will incentivize you to document your work and keep a modeling journal, like a lab notebook in a science class. You’ll do more organized and effective work when you’re tracking your progress.
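If keeping a modeling journal sounds abstract, here’s one minimal sketch of what it could look like: a tiny helper that appends timestamped, structured entries to a file as you experiment. The file name, fields, and example entry are all my own illustrative choices, not a prescribed format; plain markdown notes in a wiki work just as well.

```python
import json
from datetime import datetime, timezone

JOURNAL_PATH = "modeling_journal.jsonl"  # hypothetical file name

def log_entry(tried: str, result: str, next_step: str = "") -> None:
    """Append one timestamped journal entry as a line of JSON."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tried": tried,
        "result": result,
        "next_step": next_step,
    }
    with open(JOURNAL_PATH, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Example: a note jotted down right after an experiment.
log_entry(
    tried="Dropped near-constant features before training",
    result="Validation AUC unchanged; training time down noticeably",
    next_step="Try target encoding on the categorical columns",
)
```

When model-review time comes, entries like these become the raw material for the “what I tried that didn’t work” part of the presentation.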
In addition, producing a quality model review will give you a chance to get suggestions and feedback from many of your peers at once. Your team should have ground rules around making sure the feedback is constructive, kind, and given with good intentions, but if you have that, the things you learn from a model review will be invaluable. You’ll probably teach other people by example, too! It’s an opportunity for everyone on the team to observe other people’s processes and ideas, instead of that insight being hidden in a DM thread on Slack or a 1:1 Zoom call.
Go into a model review expecting that the work is NOT perfect, and that you’ll come away with new things to try, changes to make, or questions to investigate and answer. The point is not to get accolades but to get insight into what you may have missed, so your model will be better at the end of the day.
Other Benefits
Obviously, soliciting feedback about your work from peers has the end goal of getting advice that will make your work better, on this project and on future ones. But it’s worth noting that there are some less obvious advantages to be gained as well. For one thing, if you do this right, you will build a reputation as someone who wants to learn and improve: take the feedback graciously, apply it, and show that you’re internalizing good notes. Your work will get better, and that growth will be visible to your peers and managers.
Second, you’ll get better at communicating your ideas. Even if you don’t do full, presentation-style model reviews (though I think you should), it’s a skill to be able to explain your work 1:1 to a peer and frame clear questions that you want them to answer. As we progress in our careers, being able to communicate clearly about technical topics becomes increasingly important, and practicing in informal, low-pressure situations will benefit you.
Third, feedback will get easier to receive. I know lots of people, including myself, feel anxiety about getting feedback sometimes. It’s easy to find that intimidating or to hear only the negatives and not the positives when others are evaluating your work. But the more you do it, the easier it gets. Asking repeatedly, in a culture of positivity and constructiveness, will desensitize you to hearing critiques or ways you can improve, and you’ll get better at absorbing and learning from that as a result.
So, as you proceed with your data work, build in intentional opportunities to ask for constructive advice that will improve your results. If you don’t have peers who are data savvy, get involved with meetups or Slack groups of data scientists where you can ask for advice (without violating any NDAs, of course), build a network of people whose opinions you trust, and offer the same feedback in return. I promise you’ll see benefits to your career and your skills, to say nothing of your models and data analyses, as a result.
Read more of my work at www.stephaniekirmer.com.
Also, you can see me speaking at the AIQ Conference in San Francisco on June 25 about data privacy considerations when building ML solutions.