all the time:
“What projects should I do to get a job in data science or machine learning?”
This question is flawed from the beginning.
A great project is personal to you, which means any project I suggest will automatically be a “bad” choice.
In this article, I aim to break down the types of projects that actually help you get hired and the framework you can follow to find them.
4–5 simple projects
Start by building 4–5 smaller projects to give your portfolio some initial weight.
The goal here is mainly “optics”: ensuring that your resume/CV, GitHub, and LinkedIn profiles appear active and well-populated.
Please take a few weeks to build these smaller projects, ensuring they are of sufficient quality and not something you hastily generated with ChatGPT.
Aim to build a wide range of projects, each using different tools, datasets, and machine learning algorithms.
Algorithms and ML models
I recommend you have projects with the following algorithms:
- Gradient Boosted Trees — The gold standard algorithm for tabular data, so it’s something you will definitely use on the job.
- Neural Networks — A good understanding of deep learning frameworks like TensorFlow or PyTorch is valuable, especially if you want to work in computer vision, NLP or AI.
- Clustering Algorithms — Models like K-Means and DBSCAN demonstrate your grasp of unsupervised learning, which is needed for some roles.
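To make the clustering idea concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) using only the Python standard library. The function names `assign` and `kmeans` and the sample points are illustrative assumptions, not something from a specific library; in a real project you would more likely reach for scikit-learn's `KMeans`.

```python
import random
from math import dist


def assign(points, centroids):
    """Assignment step: group each point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (tuples of numbers) into k groups."""
    rng = random.Random(seed)
    # Use k distinct input points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Made-up sample data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Writing an algorithm from scratch once, then switching to a library version, is one way to show you understand what the library is doing for you.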
Getting exciting and novel data
When selecting datasets for your projects, avoid overused datasets such as MNIST, Titanic, or Iris. If I saw these, it would be an instant rejection, or at the very least, put me off a lot.
It’s much better to obtain a messier, more realistic dataset that reflects the data you will encounter in the real world. This will impress employers and interviewers far more, because it directly demonstrates your abilities as a data scientist.
Some good places to get data:
- Use public and free APIs — you can check out the free-apis site for some ideas.
- Web scrape data from relevant sites (make sure you are allowed to do this first!) — Here is a list of websites that allow web scraping.
- Public government data sources — data.gov is an example you can use.
- Gather your own data through surveys and questionnaires.
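Data from sources like these is rarely analysis-ready, and showing how you cleaned it is part of the project. The sketch below is one possible approach using only the standard library. The sample rows, column names, and the choice to fill missing values with the median are all invented for illustration.

```python
import csv
import io
from statistics import median

# Invented sample with typical problems: stray whitespace, inconsistent
# capitalisation, thousands separators, and missing values.
RAW = """city,price,bedrooms
London, 450000 ,2
london,N/A,3
Leeds,"210,000",
 leeds ,195000,2
"""


def parse_number(text):
    """Return a float, or None when the cell is blank or not numeric."""
    cleaned = text.strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean_rows(raw_csv):
    """Normalise text, parse numbers, and handle missing values."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({
            "city": row["city"].strip().title(),
            "price": parse_number(row["price"]),
            "bedrooms": parse_number(row["bedrooms"]),
        })
    # Drop rows without a price (the value we would want to predict).
    rows = [r for r in rows if r["price"] is not None]
    # Fill missing bedroom counts with the median of the known ones.
    known = [r["bedrooms"] for r in rows if r["bedrooms"] is not None]
    fill = median(known)
    for r in rows:
        if r["bedrooms"] is None:
            r["bedrooms"] = fill
    return rows


for row in clean_rows(RAW):
    print(row)
```

Whichever cleaning choices you make, write down why you made them; that reasoning is what an interviewer will ask about.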
To decide what your projects should be on, it’s best to start by answering specific questions you think will be interesting to discover from the data.
I recommend showcasing your results using tools like Streamlit or deploying a simple model via GitHub Actions.
However, don’t stress about building a fully end-to-end production system using something like AWS or its services, such as EC2 or ECS. At this stage, it’s completely fine if you don’t know how to do that, and it’s not the goal of these small projects.
One big project
This is where you really need to focus and take your time.
After you’ve built your smaller projects, it’s time to make one big project. This one might take a couple of months if you’re working on it for an hour or two each day.
This may intimidate you, but you need to put in the effort if you want a project that stands out from the rest.
The question is, what should you build?
As I mentioned earlier, I can’t choose this project for you, but I can provide a framework to follow, allowing you to find a great project yourself.
Example project
Let me give you an example of a great project.
At my previous company, we were hiring for a junior data scientist to work on optimisation and operations research problems.
The candidate we hired stood out for one main reason: they had a highly relevant and deeply personal project that closely matched the role.
They were passionate about NFL fantasy football and wanted to improve how they built their weekly lineups (this is similar to the Fantasy Premier League in the UK).
So, they developed their own optimisation engine to allocate players more effectively within the constraints of the program.
It wasn’t just the engine itself; they read academic papers on optimisation strategies and studied how others were approaching the same problem.
Do you see why this was such a powerful project?
- It was a personal problem that they were interested in.
- It was unique, and we hadn’t seen anything like it before or since.
- It showed their passion and interest in optimisation and operations research.
- It was directly relevant to the job for which they were applying.
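To give a feel for what a lineup optimiser involves, here is a toy sketch. It is not the candidate's engine: the player pool, costs, projected points, roster slots, and budget are all hypothetical, and it uses brute-force search, which only works for tiny pools. A real engine would more likely use an integer programming solver.

```python
from itertools import combinations

# Hypothetical player pool: (name, position, cost, projected_points).
PLAYERS = [
    ("QB-A", "QB", 8, 22.0),
    ("QB-B", "QB", 6, 18.5),
    ("RB-A", "RB", 9, 20.0),
    ("RB-B", "RB", 5, 13.0),
    ("RB-C", "RB", 4, 11.5),
    ("WR-A", "WR", 7, 16.0),
    ("WR-B", "WR", 5, 13.5),
    ("WR-C", "WR", 3, 9.0),
]
SLOTS = {"QB": 1, "RB": 2, "WR": 2}  # assumed roster rules
BUDGET = 28                          # assumed salary cap


def best_lineup(players, slots, budget):
    """Try every lineup of the right size and keep the best valid one."""
    size = sum(slots.values())
    best, best_points = None, float("-inf")
    for lineup in combinations(players, size):
        # Constraint 1: the lineup must fill each position exactly.
        counts = {}
        for _, position, _, _ in lineup:
            counts[position] = counts.get(position, 0) + 1
        if counts != slots:
            continue
        # Constraint 2: total cost must stay within the budget.
        if sum(cost for _, _, cost, _ in lineup) > budget:
            continue
        # Objective: maximise total projected points.
        points = sum(pts for _, _, _, pts in lineup)
        if points > best_points:
            best, best_points = lineup, points
    return best, best_points


lineup, points = best_lineup(PLAYERS, SLOTS, BUDGET)
print([name for name, _, _, _ in lineup], points)
```

Even in this toy version, the most expensive players do not all make the cut, which is the kind of trade-off that makes the problem interesting.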
My framework
Here’s a simple framework for you to follow to come up with great project ideas:
- List at least five things you’re interested in outside of work and the data science or machine learning field.
- For each thing, come up with questions you would like answers to or other people may find interesting.
- Think about how machine learning could help answer those questions. Don’t worry if the question seems impossible; be as creative as possible.
- Pick one question that excites you the most. Ideally, choose something that feels just slightly out of your reach; that way, you will really learn and push yourself out of your comfort zone.
Building complexity and scale
To make this project stand out, we need to add some complexity and scale to it. What that means depends on the project and the role you are targeting, and there are several ways to do it.
If you’re aiming for a role as a machine learning engineer, it’s especially valuable to build and deploy the project end-to-end.
Your project should ideally include the following:
- Data collection and storage.
- Data preprocessing.
- Model training and evaluation.
- Model deployment (via API, web app, etc).
- Analysis and presentation of your results.
To do this, you will need to learn some of the following:
It may seem like a lot, but you don’t need to do everything on this list.
The main thing is to start and pick these things up along the way. Don’t try to learn everything at once; that’s procrastination.
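The end-to-end stages listed earlier in this section can be sketched as a skeleton like the one below. It is a deliberately tiny stand-in under stated assumptions: the `samples` table, the function names, the in-memory SQLite database, and the one-variable linear model are all hypothetical placeholders for a real data store, a real ML library, and a real deployment target.

```python
import sqlite3


def collect_and_store(conn, rows):
    """Data collection and storage: save raw (x, y) rows to a table."""
    conn.execute("CREATE TABLE IF NOT EXISTS samples (x REAL, y REAL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    conn.commit()


def load_and_preprocess(conn):
    """Preprocessing: drop incomplete rows and return them ordered by x."""
    query = (
        "SELECT x, y FROM samples "
        "WHERE x IS NOT NULL AND y IS NOT NULL ORDER BY x"
    )
    return conn.execute(query).fetchall()


def train(rows):
    """Training: fit y = slope * x + intercept by least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Deployment: the function an API or web app would call."""
    slope, intercept = model
    return slope * x + intercept


def evaluate(model, rows):
    """Evaluation: mean absolute error on held-out rows."""
    return sum(abs(predict(model, x) - y) for x, y in rows) / len(rows)


def run_pipeline(raw_rows):
    conn = sqlite3.connect(":memory:")
    collect_and_store(conn, raw_rows)
    rows = load_and_preprocess(conn)
    # Simplified split: alternate rows between training and test sets.
    train_rows, test_rows = rows[::2], rows[1::2]
    model = train(train_rows)
    return model, evaluate(model, test_rows)
```

Keeping each stage in its own function makes it easier to swap in a real component later, for example replacing `train` with a gradient boosted model or wrapping `predict` in a web endpoint.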
Document and communicate
The final and arguably most important part is to document your learning.
Technical skills alone won’t land you the job.
Communication is one of the most important skills to have as a machine learning engineer or data scientist, especially as you move up the ranks.
Show your project by:
- Add your project to GitHub with a well-documented README.
- Include instructions for setup and usage so that users can explore and interact with your project.
- Write a blog post explaining your project and how you built it.
- Share it on LinkedIn, Twitter, Reddit, Discord, YouTube, or wherever people who may be interested in trying it spend time.
The more you share your work, the more visible you become to potential employers and collaborators.
It’s actually not that hard to create a solid portfolio of projects; it just requires consistent work and patience, which most people are unwilling to put in.
There is no “quick” project that gets you hired; what will get you hired is taking the time to build something personal, of good quality, and novel.
That’s the secret.
Another thing!
I offer 1:1 coaching calls where we can chat about whatever you need — whether it’s projects, career advice, or just figuring out your next step. I’m here to help you move forward!
1:1 Mentoring Call with Egor Howell
Career guidance, job advice, project help, resume review topmate.io