Data scientists getting things done
You're presenting the next iteration of your machine learning model to your team:
"I improved the model lift by 20 percentage points!"
"Oh, that's really great. But didn't the product owner tell you? They're scrapping that project due to budget constraints."
You present a model to the product owner:
"Look I think if we use multi-armed bandits we can really drive customer engagement!"
"Hmm, I'm not sure... How will that interact with campaign X? I think we should wait until we're sure it will work."
You present an ETL pipeline for creating features to your boss:
"I've completed 90% of the features. All that's left is feature set Y but I don't have access to the source system."
"What? But feature set Y is the whole reason we launched this project! And it normally takes 4 weeks to get access."
If any of this sounds or feels familiar, it may be because barriers like the following are stopping you from getting things done:
Time/budget constraints.
Uncertainty on project outcomes.
Access issues.
Changing requirements.
Lack of alignment.
The approaches I highlight can help you to iterate on feedback more rapidly and stay aligned with your organisation's goals. When you're working in a field where things can change quickly and there are always new techniques, it can be tempting to try and find the perfect solution to a problem using the latest and greatest approach. The danger is that the data, the tech, or the organisation’s requirements can all change rapidly, making weeks or even months of work worthless. If you’re not careful, your impact at your organisation will be limited. It’s important to adapt and increase the tempo at which you can deliver value.
What is "done"?
When someone says "It's almost done, I just need to X" it often means that X will end up taking 90% of the time.
On the other hand, you also hear "It's done", followed by "Okay, but when will it be deployed to production and 'done-done'?"
Agile has the concept of the definition of done: an agreed-upon set of conditions that have to be met for a piece of work to be considered complete. Communicating what the expected outcome is to stakeholders is critical. With Agile becoming so popular and people moving away from the traditional project management approaches, I've noticed a trend where some people think that you don't need to plan up front and can just be 'agile'. Being agile doesn't remove the need for upfront work, planning, communication on what you're trying to achieve, and actually knowing when you’ve achieved it.
The more senior you become, the more you have to start creating your own definitions of done. Your input will influence the quality of the work of people around you. Practice this as much as possible, even with small tasks. I’ve seen data scientists present amazing work to their peers that never ends up being integrated with the product. I’ve also seen months spent on model refinement, only for the product to get canned before the model was deployed!
Why get it "done"?
Some data scientists might balk at the terminology, saying that the goal of data science is to explore and experiment, not churn out code. The reality is that if you never get things into production and test your hypotheses against reality, your experiments won't get you anywhere. A lot of machine learning hype has created unreasonable expectations of data scientists, and half-done implementations can damage your credibility. You can build trust with stakeholders by completing small pieces of work that deliver incremental value.
For example, my team once scheduled the deployment of an automated churn model. However, it was a lower priority until a manual test campaign was run. The business realised that their mechanism for retaining potential churners was ineffective. If we had spent more time on the model or even deployed it, that effort would have been wasted. Instead, the team delivered a good enough version that provided immediate value.
Know if it is your job to get it "done"
Sometimes you get it launched to production, it took 5 months, but there was a lot to do and you did it! Your boss is still not happy with you. Why? You built an entire web application for the stakeholders to upload their experimental ideas in real time, wasn't that cool? But was it your job to get that done? Could the project have been shortened by 3 months if you handed that portion off to another team? Could your time have been better spent on creating a better model or actually starting on another use-case? We all do things that are adjacent to our roles from time to time, and as an analytically minded person you might be able to spot the solution to problems surrounding your actual task and be able to solve them. That doesn't mean it's the best use of your time. The waste of effort compounds when the thing you are getting done doesn’t have the intended impact.
As a counterexample, I once spent the better part of 2 months figuring out API integration with Adobe Analytics and Google Ad Manager, something that was definitely not in my job scope. The result, however, was a level of integration that enabled 5 new real-time use-cases. That investment paid dividends. If I didn’t do it, it just wouldn’t have happened.
The costs of getting things "done"
Relationships
Putting too much pressure on others to get your own tasks done can backfire, especially if you don’t have the social capital. You've soured the relationship with the machine learning engineer because you kept putting in emergency fixes. They no longer want to answer questions, do any favours, and refuse to do anything not in the current sprint. Take into consideration that you’re all working towards a common goal.
Your own energy
You've exhausted yourself by working late nights and end up causing problems down the line due to things that you missed. You've expended all that brain power on version 1, which ends up being reworked a month later once the data comes through and a completely new approach is required. Having a reputation for getting things done is great but don't let it burn you out; you have a 40-year career and need to pace yourself.
Done-mindset creating wheel spinning
Be careful of getting the opposite result when looking at getting things done on a team level. An example I’ve seen involved an adjacent team where everything was labeled urgent and an emergency. What started as an initial set of emergencies became the new way of working, and prioritisation ended up being thrown out the window. The team actually started getting less done because their work-in-progress was through the roof, and every piece of work was at risk of being interrupted.
How to get it "done"
Break down the work
If you've ever done something like your task before, you should have a sense of the types of things you need to do: analyse the data, define the target variable, train a model, speak to the machine learning engineers to get capacity, etc. It helps to not jump into it straight away and instead think through a rough idea of what steps you need to complete. Review how long a task took previously and use that as a baseline. Mistrust any gut feelings of "but it should be simpler this time". Take any gut feeling estimates and double them. Think about the data science cycle, with the critical component throughout being communication with your key stakeholders.
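As a rough illustration of the estimation rule above, here is a minimal Python sketch. It reflects one possible reading of the advice: use the previous duration as the baseline when you have one, and otherwise double the gut feeling. The function name `estimate_days` and all the numbers are hypothetical, made up purely for illustration.

```python
from typing import Optional


def estimate_days(gut_feel_days: float, previous_days: Optional[float] = None) -> float:
    """Estimate a step: prefer history as the baseline, otherwise double the gut feeling."""
    if previous_days is not None:
        # A past duration beats "but it should be simpler this time".
        return previous_days
    # No history to lean on, so double the gut-feeling estimate.
    return 2 * gut_feel_days


# Hypothetical plan: (step, gut feeling in days, days it took last time or None).
plan = [
    ("analyse the data", 3, 5),
    ("define the target variable", 1, None),
    ("train a model", 4, 10),
    ("get capacity from the ML engineers", 2, None),
]

estimates = {step: estimate_days(gut, previous) for step, gut, previous in plan}
total_days = sum(estimates.values())

for step, days in estimates.items():
    print(f"{step}: {days} days")
print(f"total: {total_days} days")
```

The point is less the arithmetic than writing the steps down at all: once they are listed, the gap between the gut feeling and the baseline becomes visible.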
What to tackle first
Intimidated by the project and not even sure what needs to be broken down? Prioritisation can help, but a lot of machine learning requirements are vague and ill-defined. I’ve encountered many projects where I had to throw out the initial plan once analysis revealed roadblocks, or once ideas sparked by speaking to stakeholders led to new approaches.
You’ll encounter many problems you’ve never faced before and can end up spinning your wheels. These approaches can help you get unstuck.
The hardest part
This approach is also known as eating the frog: do the hardest part first to avoid procrastination, so that everything else is smooth sailing from there.
The most important part
Which part of the project, if it wasn't completed, would not only make the project a failure but would make it indistinguishable from the project never having started?
The easiest part
You might get stuck, wearing yourself down against either the hardest or the most important part and not making any progress. Switch it up and do something easy to gain momentum. Try to do an easier version of the hard problem that you are trying to solve; instead of building a recommender system on a product level, build a model that predicts if someone will buy anything at all.
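To make the "easier version" idea concrete, here is a small Python sketch that contrasts the two targets. The purchase log, customer IDs, and function names are all hypothetical; the only point is that the easier problem needs one binary label per customer instead of one label per customer and product.

```python
# Hypothetical purchase log: (customer_id, product_id) pairs.
purchases = [("c1", "shoes"), ("c1", "socks"), ("c3", "hat")]
customers = ["c1", "c2", "c3"]


def products_bought(customers, purchases):
    """Hard version: what each customer bought (a product-level target)."""
    target = {customer: set() for customer in customers}
    for customer, product in purchases:
        target[customer].add(product)
    return target


def bought_anything(customers, purchases):
    """Easy version: 1 if the customer bought anything at all, else 0."""
    buyers = {customer for customer, _ in purchases}
    return {customer: int(customer in buyers) for customer in customers}


hard_target = products_bought(customers, purchases)
easy_target = bought_anything(customers, purchases)
print(easy_target)
```

A model trained on the easy target will not recommend products, but it exercises the same data, pipeline, and stakeholder conversations, which is often enough to regain momentum.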
Do the wrong thing
Sometimes you just need a wrong answer to figure out what the right thing is to do. Asking an LLM to generate a hypothetical answer could help. Many times I've been fuelled by "ChatGPT, that is just wrong on so many levels, this is actually what you should have said...".
Work backwards
Do the last thing first. Need an API to output your result? Define that first and build a prototype. Need to write to a new database technology? Write a sample set of data to a dev table first.
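A minimal sketch of working backwards might look like the following. Everything here is an assumption for illustration: the `ChurnScore` fields, the stub `score_customer`, and the `dev_churn_scores` table are hypothetical, and an in-memory SQLite database stands in for whatever database technology you actually need to write to.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class ChurnScore:
    """Hypothetical output contract, agreed with consumers before any modelling."""
    customer_id: str
    churn_probability: float
    model_version: str


def score_customer(customer_id: str) -> ChurnScore:
    """Stub: returns a fixed placeholder until a real model exists."""
    return ChurnScore(customer_id, churn_probability=0.5, model_version="stub-0")


def to_response(score: ChurnScore) -> str:
    """Serialise the score the way the eventual API is expected to return it."""
    return json.dumps(asdict(score))


def write_sample(conn: sqlite3.Connection, scores: list) -> None:
    """Write a sample set of rows to a dev table to prove the last step works."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dev_churn_scores "
        "(customer_id TEXT, churn_probability REAL, model_version TEXT)"
    )
    conn.executemany(
        "INSERT INTO dev_churn_scores VALUES (?, ?, ?)",
        [(s.customer_id, s.churn_probability, s.model_version) for s in scores],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real dev database
write_sample(conn, [score_customer("c1"), score_customer("c2")])
row_count = conn.execute("SELECT COUNT(*) FROM dev_churn_scores").fetchone()[0]
response = to_response(score_customer("c1"))
print(response, row_count)
```

With the output shape and the write path proven end to end, the stub can later be swapped for a real model without renegotiating the interface.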
The riskiest part
My favourite approach. With all the uncertainty in data science, this will allow you to spare yourself some trouble down the road. Look at the murkiest part of the project, and tackle that first. Don't understand LLMs? Go play with them first. New dataset never used before? Go explore it. Never integrated with an API? Go and build a dummy integration or read up on the documentation. You might find that your problem cannot be solved or that you need help from others. Those dependencies are best discovered early.
Talk to people
This is the part some tech-minded people sometimes struggle with (myself included). You have some hypotheses or assumptions you need to vet, but instead of asking a person, you interrogate the data. This is especially tempting if you feel a little bit of imposter syndrome when working in a new business area or using a framework you're unfamiliar with. Delaying these conversations also means you miss the opportunity to foster relationships early.
Unblock yourself and strengthen your relationships
First realise when you are stuck; don't wait for people to come to you and ask whether you're blocked. "If you want to go fast, go alone, if you want to go far, go together." It's important to ask questions when you're stuck and to share your thought processes. If you don't know who to ask, start with the most promising person and ask them to refer you if necessary. By engaging with others, you can gain different perspectives and uncover information that might not be evident from data alone.
Don't start IMs with only a "hi". It’s annoying to busy people and means you won’t get your answer quickly. Conversely, if people ask you for help, be willing to assist; it's part of working together and building social capital. In a startup you may be able to expect responses faster or in person, whereas in a bigger organisation you have to politely annoy people into helping you. Since the signal-to-noise ratio is so low, asking for something once may not be a strong enough signal that something is important, or you might be using the wrong channel (email vs IM).
Many times I’ve relied on my network in the organisation to bounce ideas off of, to ask about similar approaches, and to figure out who I need to talk to. These relationships are key to not staying isolated and ineffectual.
Conclusion
With machine learning projects it can end up that 9 approaches out of 10 don't work. The above tips can help you iterate faster and get yourself unstuck. Then the 1 approach out of 10 that works will end up seeing the light of day and delivering value. You can only learn so much from models that don't get deployed. You've only got so many hours in the week. You've only got so much brain power to give, so make sure you use it on getting things done instead of just being busy.