
6 Technical Skills That Make You a Senior Data Scientist


Let’s be honest. Writing code in 2025 is much easier than it was ten, or even five, years ago.

We moved from Fortran to C to Python, each step lowering the effort needed to get something working. Now tools like Cursor and GitHub Copilot can write boilerplate, refactor functions, and improve coding pipelines from a few lines of natural language.

At the same time, more people than ever are getting into AI, data science and machine learning. Product managers, analysts, biologists, economists, you name it, are learning how to code, understand how AI models work, and interpret data efficiently.

All of this is to say:

The real difference between a Senior and a Junior Data Scientist is not the coding level anymore.

Do not get me wrong. The difference is still technical. It still depends on understanding data, statistics and modeling. But it is no longer about being the person who can invert a binary tree on a whiteboard or solve an algorithm in O(n).

Throughout my career, I have worked with some outstanding data scientists across different fields. Over time, I started to notice a pattern in how the senior professionals approached problems, and it wasn’t about the specific models they adopted or their coding abilities: it was about the structured, organized workflow they follow to turn a non-existent product into a robust data-driven solution.

In this article, I will describe the six-stage workflow that Senior Data Scientists use when developing a DS product or feature. Senior Data Scientists:

  1. Map the ecosystem before touching code
  2. Think about DS products like operators
  3. Design the system end-to-end with “pen and paper”
  4. Start simple, then earn the right to add complexity
  5. Interrogate metrics and outputs
  6. Tune the outputs to the audiences and select the right tools for displaying their work

Throughout the article, I will expand on each of these points. My goal is that, by the end, you will be able to apply these six stages on your own and think like a Senior Data Scientist in your day-to-day work.

Let’s get started!

Mapping the ecosystem

I get it, data professionals like us fall in love with the “data science core” of a product. We enjoy tuning models, trying different loss functions, playing with the number of layers, or testing new data augmentation tricks. After all, that is also how most of us were trained. At university, the focus is on the technique, not the environment where that technique will live.

However, Senior Data Scientists know that in real products, the model is only one piece of a larger system. Around it there is an entire ecosystem where the product needs to be integrated. If you ignore this context, you can easily build something clever that does not actually matter.

Understanding this ecosystem starts with asking questions like:

  • What exact problem are we improving, and how is it solved today?
  • Who will use this model, and how will it change their daily work?
  • What does “better” look like in practice from a business perspective (fewer tickets, more revenue, less manual review)?

In a few words, before doing any coding or system design, it is crucial to understand what the product is bringing to the table.

Image made by author

Your answer from this step will sound something like this:

[My data product] aims to improve feature [A] for product [X] in system [Y]. The data science product will improve [Z]. We expect to gain [Q], improve [R], and decrease [T].

Think about DS products like operators

Ok, now that we have a clear understanding of the ecosystem, we can start thinking about the data product.

This is an exercise in switching chairs with the actual user. If we were the user of this product, what would our experience with it look like?

To answer that, we need to work through questions like:

  1. What is a good metric of satisfaction (i.e. success/failure) for the product? What is the optimal case, the non-optimal case, and the worst case?
  2. How long is it ok to wait? Is it a couple of minutes, ten seconds, or real time?
  3. What is the budget for this product? How much is it ok to spend on it?
  4. What happens when the system fails? Do we fall back to a rule-based decision, ask the user for more information, or simply show “no result”? What is the safest default?
Image made by author

As you may notice, we are getting into the realm of system design, but we are not quite there yet. This is more of a preliminary phase, where we determine all the constraints, limits and functionality of the system.
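
To make this concrete, here is a minimal sketch of how those answers can be written down as explicit constraints, with a safe fallback wrapped around the model. Everything in it is a hypothetical placeholder (the field names, the numbers, and the `predict_with_fallback` helper), not values from a real product:

```python
from dataclasses import dataclass

@dataclass
class ProductConstraints:
    # Placeholder answers to the "operator" questions above
    max_latency_seconds: float = 2.0    # how long is it ok to wait?
    monthly_budget_usd: float = 500.0   # how much is it ok to spend?
    fallback: str = "rule_based"        # safest default when the system fails

def predict_with_fallback(model_predict, features, constraints: ProductConstraints):
    """Call the model, but fall back to the agreed safe default on failure."""
    try:
        return model_predict(features)
    except Exception:
        # Degrade gracefully instead of crashing or showing "no result":
        # a trivial rule-based answer stands in for the real fallback logic.
        if constraints.fallback == "rule_based":
            return "default_recommendation"
        return None
```

Writing the constraints down this explicitly also makes them testable later, instead of leaving them as implicit assumptions.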

Design the system end-to-end with “pen and paper”

Ok, now we have:

  1. A full understanding of the ecosystem where our product will sit.
  2. A full grasp of the required DS product’s performance and constraints.

So we have everything we need to start the System Design* phase.

In a nutshell, we are using everything we have discovered earlier to determine:

  1. The inputs and outputs
  2. The Machine Learning structure we can use
  3. How the training and test data will be built
  4. The metrics we are going to use to train and evaluate the model
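
One way to keep this phase honest is to write those four points down as a lightweight spec before any real code exists. The sketch below is purely illustrative: the field names and values describe a made-up churn use case, not a recommended design:

```python
from dataclasses import dataclass, field

@dataclass
class SystemDesignSpec:
    # 1. Inputs and outputs
    inputs: list[str] = field(default_factory=lambda: ["user_id", "session_features"])
    output: str = "churn_probability"
    # 2. Candidate Machine Learning structures
    candidate_models: list[str] = field(
        default_factory=lambda: ["logistic_regression", "gradient_boosting"]
    )
    # 3. How the training and test data will be built
    data_plan: str = "last 12 months of events, time-based 80/20 split"
    # 4. Metrics for training and evaluation
    training_metric: str = "log_loss"
    evaluation_metrics: list[str] = field(
        default_factory=lambda: ["precision", "recall", "auc"]
    )

print(SystemDesignSpec())
```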

Tools you can use to brainstorm this part are Figma and Excalidraw. For reference, the image below shows a piece of system design (the model part, point 2 of the list above) made in Excalidraw.

System Design made by author using Excalidraw

Now this is where the real skills of a Senior Data Scientist emerge. All the information you have accumulated so far must converge into your system. Do you have a small budget? Then training a 70B-parameter deep learning model is probably not a good idea. Do you need low latency? Batch processing is not an option. Do you need a complex NLP application where context matters and you have a limited dataset? Maybe LLMs can be an option.

Keep in mind that this is still only “pen and paper”: no code is written just yet. However, at this point, we have a clear understanding of what we need to build and how. NOW, and only now, we can start coding.

*System Design is a huge topic in its own right, and treating it in less than 10 minutes is basically impossible. If you want to go deeper, a course I highly recommend is this one by ByteByteGo.

Start simple, then earn the right to add complexity

When a Senior Data Scientist works on the modelling, the fanciest, most powerful, and sophisticated Machine Learning models are usually the last ones they try.

The usual workflow follows these steps:

  1. Try to solve the problem manually: what would you do if you (not the machine) had to perform the task?
  2. Engineer the features: Based on what you know from the previous point (1), what are the features you would consider? Can you craft some features to perform your task efficiently?
  3. Start simple: try a reasonably simple*, traditional machine learning model, for example, a Random Forest/Logistic Regression for classification or Linear/Polynomial Regression for regression tasks. If it is not accurate enough, build your way up.

When I say “build your way up”, this is what I mean:

Image made by author

In a few words: we only increase the complexity when necessary. Remember: we are not trying to impress anyone with the latest technology, we are trying to build a robust and functional data-driven product.

When I say “reasonably simple” I mean that, for certain complex problems, some very basic Machine Learning algorithms might already be out of the picture. For example, if you have to build a complex NLP application, you probably will never use Logistic Regression and it is safe to start from a more complex architecture from Hugging Face (e.g. BERT).
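
To show what “start simple” looks like in code, here is a minimal sketch of a baseline on a synthetic tabular classification task (the generated data is just a stand-in for whatever your design phase defined):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data for a generic binary classification task
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: the simple, traditional baseline comes first
baseline = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print(f"Baseline F1: {f1_score(y_test, baseline.predict(X_test)):.3f}")

# Only if this number does not clear the bar defined in the operator phase
# do we "earn the right" to try something more complex.
```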

Interrogate metrics and outputs

One of the key differences between a senior figure and a more junior professional is the way they look at the model output.

Usually, Senior Data Scientists spend a lot of time manually reviewing the output. This is because manual evaluation is one of the first things that Product Managers (the people Senior Data Scientists will share their work with) do when they want to get a grasp of the model’s performance. For this reason, it is important that the model output looks “convincing” from a manual-evaluation standpoint. Moreover, by reviewing hundreds or thousands of cases by hand, you might spot the cases where your algorithm fails. This will give you a starting point to improve your model if necessary.
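
As an illustration of this habit, here is a small hypothetical sketch: put labels and predictions side by side and sample the failures for manual review (the tiny `df` below is a stand-in for your model’s real output table):

```python
import pandas as pd

# Stand-in output table: in practice this would hold thousands of rows
df = pd.DataFrame({
    "text": ["refund please", "great product", "where is my order"],
    "label": ["complaint", "praise", "question"],
    "prediction": ["complaint", "question", "question"],
})

# Pull out the failure cases and eyeball a sample of them
failures = df[df["label"] != df["prediction"]]
print(failures.sample(n=min(200, len(failures)), random_state=0))
```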

Of course, that is just the beginning. The next important step is to choose the most appropriate metrics for a quantitative evaluation. For example, do we want our model to properly represent all the classes/choices in the dataset? Then recall is very important. Do we want our model to be extremely on point when it makes a classification, even at the cost of sacrificing some data coverage? Then we are prioritizing precision. Do we want both? AUC/F1 scores are our best bet.
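
As a minimal sketch of these trade-offs with scikit-learn (the label arrays below are dummy values, purely illustrative):

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]                    # hard class predictions
y_scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3]  # predicted probabilities

print("precision:", precision_score(y_true, y_pred))  # how on point the positives are
print("recall:   ", recall_score(y_true, y_pred))     # how much of the class we cover
print("F1:       ", f1_score(y_true, y_pred))         # balance of the two
print("AUC:      ", roc_auc_score(y_true, y_scores))  # ranking quality across thresholds
```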

In a few words: the best data scientists know exactly what metrics to use and why. Those metrics are the ones that will be communicated internally and/or to clients. Not only that: those metrics will be the benchmark for the next iteration. If someone wants to improve your model (for the same task), they have to improve on that metric.

Tune the outputs to the audiences and select the right tools to display their work

Let’s recap where we are:

  1. We have mapped our DS product in the ecosystem and defined our constraints.
  2. We have built our system design and developed the Machine Learning model.
  3. We have evaluated it, and it is accurate enough.

Now it is finally time to present our work. This is crucial: the quality of your work is only as high as your ability to communicate it. The first thing we have to understand is:

Who are we showing this to?

If we are showing this to a Staff Data Scientist for model evaluation, to a Software Engineer who will implement our model in production, or to a Product Manager who will need to report the work to higher decision-making roles, we will need different kinds of deliverables.

This is the rule of thumb:

  1. A very high-level model overview and the metric results are provided to Product Managers
  2. A more detailed explanation of the model internals and the metrics is shown to Staff Data Scientists
  3. Very hands-on details, through code scripts and notebooks, are handed to the superheroes who will take this code into production: the Software Engineers

Conclusions

In 2025, writing code is not what distinguishes Senior from Junior Data Scientists. Senior Data Scientists are not “better” because they know the TensorFlow documentation off the top of their heads. They are better because they follow a specific workflow when they build a data-powered product.

In this article, we walked through the standard Senior Data Scientist workflow as a six-stage process:

  • A way to map the ecosystem before touching code (problem, baseline, users, definition of “better”)
  • A framework to think about DS features like operators (latency, budget, reliability, failure modes, safest default)
  • A lightweight pen-and-paper system design process (inputs/outputs, data sources, training loop, evaluation loop, integration)
  • A modeling workflow that starts simple and adds complexity only when it’s necessary
  • A practical method to interrogate outputs and metrics (manual review first, then the right metric for the product goal)
  • A communication layer to tune the delivery to the audience (PM story, DS rigor, engineer-ready artifacts)

Before you head out

Thank you again for your time. It means a lot ❤️

My name is Piero Paialunga, and I’m this guy here:

Image made by author

I’m originally from Italy, hold a Ph.D. from the University of Cincinnati, and work as a Data Scientist at The Trade Desk in New York City. I write about AI, Machine Learning, and the evolving role of data scientists both here on TDS and on LinkedIn. If you liked the article and want to know more about machine learning and follow my studies, you can:

A. Follow me on LinkedIn, where I publish all my stories
B. Follow me on GitHub, where you can see all my code
C. For questions, you can send me an email at [email protected]

