Most days in machine learning are the same: coding, waiting for results, interpreting them, returning to coding. Plus the occasional presentation of one's progress. But things mostly being the same does not mean that there's nothing to learn. Quite the contrary! Two to three years ago, I started a daily habit of writing down lessons learned from my ML work. Looking back through this month's notes, I found three practical lessons that stand out:
- For new projects, choose libraries wisely
- Use a clipboard manager to save your clips
- Read broadly to read deeply
Choosing Between Libraries and Self-Made Code
Machine learning projects often begin with the same question: should you build everything yourself, or rely on existing libraries?
At the most fundamental level, this can mean deciding between frameworks like PyTorch or TensorFlow. Back when Towards Data Science was still hosted on Medium, I was a strong advocate for TensorFlow. Today, I lean much more towards PyTorch. But this section is not about such framework-level decisions.
Instead, I want to focus on project-level choices.
Imagine you are tasked with setting up a new ML project. The requirements are specific: sparsely labeled data, image inputs, and some architectural constraints. What should you do?
A good starting point is to search GitHub for projects that might already meet most of your needs. If you find one that matches 100%, great — use it. If you don’t find anything close, that’s also great — because the decision is now clear: you’ll need to build it yourself.
The more challenging case is when you do find something, but it doesn’t quite fit. Do you patch the existing codebase until it works? Or would it be faster to implement your own solution from scratch?
There’s no single right answer, but I’ve found a few rules of thumb useful:
If you need fine-grained control over every aspect of the ML pipeline → build it yourself.
If you just need a standard training pipeline → use a library.
If you want to modify an existing method → start with the library that already has it.
If you’re introducing your own method → do it yourself.
Another factor worth considering is longevity. Code that you write yourself is code you fully control — no sudden breaking changes, no obscure bugs hidden in a third-party repo. On the other hand, libraries can provide you with years of accumulated testing and optimization, things you’d struggle to reproduce alone. The art is to balance speed of progress now against maintainability later.
Sometimes, I've even found that starting with a library for rapid prototyping, and then reimplementing the crucial parts myself once I knew what worked, struck the best balance. That way, I get quick feedback early but still retain full ownership over the parts that matter most. In my experience, the best libraries, at least for research-heavy projects, are those that feel like research code.
Two contrasting examples are the Avalanche and Mammoth libraries. Avalanche is much more full-fledged, with everything nicely abstracted. Mammoth, on the other hand, is more like an expanded research project, where you can still directly control the methodological parts. Libraries like the latter can give you the best of both worlds.
The above guidelines won't solve the self-vs-library dilemma every time, but they have allowed me to approach it more systematically. Over the years, and this September again, they have saved me days of indecision.
The Benefit of Clipboard Managers
Suppose you're controlling an ML project from the command line. You start a run like this: `python3 run.py --param1 --param2`
Then another one with different parameters. And another. Soon you’re juggling several runs, and you want to compare the results.
The naive way is to copy each output manually into a central place: copy, paste; copy, paste; copy, paste. Until at some point, you overwrite the wrong result and have to start again.
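A slightly more robust variant of the same idea is to append every result to a central file programmatically rather than pasting it by hand. A minimal sketch of the idea, with an illustrative file name and helper of my own (not from any particular project):

```python
from pathlib import Path

# illustrative central results file; opened append-only, so earlier
# results can never be overwritten by a stray paste
RESULTS = Path("results.txt")

def record(run_name: str, output: str) -> None:
    """Append one run's summary line to the central results file."""
    with RESULTS.open("a") as f:
        f.write(f"{run_name}: {output}\n")
```

Because the file is only ever opened in append mode, re-recording a run adds a new line instead of silently replacing an old one.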
I found myself in exactly this situation at the beginning of this month. When I was setting up a new project (after deciding between building it myself and using a library; see above), I also did some code testing. I wanted to see whether everything ran without errors. So I evaluated several parameter settings, often changing one or two arguments from run to run. As my project was an ML project, and thus involved training ML models, each script took a while to run through, which meant I had to wait before I could test the next parameter setting. Spinning up separate runs in parallel was not an option because the cluster was occupied.
Between testing two parameter settings, I thus focused on the project setup and on fixing bugs. Then, once I saw that a parameter had been tested successfully, I scheduled the next test and resumed the project setup.
As you can imagine, this strategy only works up to a certain point. After repeating this back and forth for a while, I lost track of which parameter combinations I had already tested. Because this was only the setup phase, I had not yet implemented real testing and results collection; that I usually do later. Luckily, my habit of copying commands, pasting them, and then modifying the arguments, combined with a clipboard manager, saved me from having to run tests twice.
Instead of only storing the most recent item, these tools keep a history of everything you’ve copied. At any time, you can browse back and select the clip you need*.
The real strength of clipboard managers is how they reduce cognitive overhead. Instead of constantly worrying “did I just overwrite my last copy?” or “where did I save that snippet?”, you free up mental bandwidth for the actual task at hand. It’s one of those small tools** that doesn’t look like much but compounds over time.
And importantly, this isn’t only about experiments. The same holds when you’re preparing a talk, drafting a paper, or gathering figures from multiple sources. Once you’ve used a clipboard manager long enough, you’ll wonder how you ever worked without one.
I can attest to that from my own experience. On my Mac machines, I have been using the Launchbar clipboard manager (though it's much more than this!) for years; on Windows, I installed the free Ditto utility. They've often helped me when I clipped something and then deleted the original content from which I wanted to clip it. At all times, the last clips were still available with a single command, readily providing the information I needed.
Depth and Breadth in Reading
The same project also reminded me of something about reading papers. Setting it up required combining recent methodological advances with tabular data. As always, there was a flood of potentially relevant work. The question was: what should I read, and what can I skip?
This time, the decision was easier than expected. Over the past few months, I had been reading papers regularly — not intensively, but steadily, on and off. That gave me a solid mental map of my research field. More importantly, I had also read adjacent work, i.e. papers that are not fully from my field, but that tackle very similar challenges.
Reading widely now helped me identify connections across fields and recognize which methods were truly relevant. Instead of feeling overwhelmed, I could quickly decide which papers were worth my attention, and which ones I could safely ignore.
But the benefit goes beyond efficiency (and knowledge, the primary goal of reading). Looking outside your main field often gives you ideas you wouldn't have encountered otherwise. For me, insights from adjacent areas have sometimes ended up shaping the core of my own projects. In other words, breadth isn't just preparation for depth; it's also a source of creativity.
Over time, the practice of reading from close fields builds resilience. Research fields shift quickly, and methods that are en vogue today may be forgotten tomorrow. But if you’ve cultivated breadth, you can adapt more easily: you already know the neighboring fields, and you can move with the field rather than being swept away by it.
* Not recommended, but often the quickest way at the early stages of a project. For later stages, I recommend centrally logging the results.
** for Windows: Ditto; for Mac: Launchbar.