(If you haven’t read Part 1 yet, check it out here.)
Missing data in time-series analysis is a recurring problem.
As we explored in Part 1, simple imputation techniques or even regression-based models (linear regression, decision trees) can get us a long way.
But what if we need to handle more subtle patterns and capture the fine-grained fluctuations in complex time-series data?
In this article we will explore K-Nearest Neighbors (KNN) imputation. Its strengths include making few assumptions about your data while still capturing nonlinear relationships, which makes it a versatile and robust solution for missing data imputation.
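As a quick illustration of the idea, here is a minimal sketch of KNN imputation using scikit-learn's KNNImputer; the toy matrix, neighbor count, and weighting scheme below are illustrative assumptions, not the article's actual setup:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy feature matrix with two missing entries (illustrative only)
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, 6.0],
    [5.0, np.nan],
])

# Each missing value is filled with the mean of that feature across
# the k nearest complete rows; "distance" weighting lets closer
# neighbors contribute more to the imputed value.
imputer = KNNImputer(n_neighbors=2, weights="distance")
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

Because neighbors are found by comparing rows across all features, KNN can pick up local structure that a single global regression model would smooth over.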
We will be using the same mock energy production dataset you already saw in Part 1, with 10% of its values missing, introduced at random.
Since you can easily generate this dataset yourself, you can follow along and apply each technique in real time as you explore the process step by step!
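For reference, here is a rough sketch of how such a dataset could be generated; the exact recipe from Part 1 is not reproduced here, so the seasonal shape, noise level, and column name are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", periods=365, freq="D")

# Daily energy production with yearly seasonality plus Gaussian noise
values = (
    100
    + 20 * np.sin(2 * np.pi * np.arange(365) / 365)
    + rng.normal(0, 5, 365)
)
df = pd.DataFrame({"energy_production": values}, index=dates)

# Randomly knock out 10% of the values to simulate missing data
missing_idx = rng.choice(df.index, size=int(0.10 * len(df)), replace=False)
df.loc[missing_idx, "energy_production"] = np.nan
```

Fixing the random seed keeps the missing-value positions reproducible, so your results should stay comparable from run to run.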