Super happy with my learnings in supervised learning

19 Jul, 2024

The past week I've been in a war. Picture this: There I was, armed with months of ML theory and the confidence of a toddler in a cape, ready to conquer the ML world. Then I'm faced with that Dutch real estate dataset. Holy stroopwafel, what a mess!

I figured I was ready to get my hands dirty. Little did I know just how dirty they'd get. Remember that 80/20 rule about data cleaning? Ha! Try 99/1. I spent so much time cleaning data, I started dreaming in pandas DataFrames.

As I dug into the data, I felt like a kid trying to build a sandcastle with wet sand. Messy addresses, wonky price formats, missing values galore: it was a real data soup. But you know what? Each problem solved felt like finding treasure in a junk drawer.

I scrubbed, I formatted, I engineered features. Some days it felt like herding cats, other days like solving a really fun puzzle. And slowly but surely, that messy pile of data started looking like something I could actually use.
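To give a feel for what that scrubbing can look like, here's a minimal pandas sketch. Everything in it is an assumption for illustration: the column names (address, price, living_area), the sample rows, and the helper clean_listings are made up and are not the real dataset or the exact steps used on it.

```python
import pandas as pd

# Hypothetical raw listings; column names and values are invented for illustration.
raw = pd.DataFrame({
    "address": ["  Prinsengracht 12, 1015 DV Amsterdam ",
                "Coolsingel 5 3012AA Rotterdam",
                None],
    "price": ["€ 450.000 k.k.", "€ 1.250.000 v.o.n.", "Prijs op aanvraag"],
    "living_area": ["85 m²", None, "120 m²"],
})


def clean_listings(df):
    """Return a cleaned copy: numeric price and area, normalized postcode."""
    out = pd.DataFrame(index=df.index)

    # Dutch prices use '.' as a thousands separator: drop it, then grab the digits.
    # Rows without any number (e.g. "Prijs op aanvraag") become NaN.
    digits = df["price"].str.replace(".", "", regex=False).str.extract(r"(\d+)", expand=False)
    out["price"] = pd.to_numeric(digits, errors="coerce")

    # "85 m²" -> 85.0
    area = df["living_area"].str.extract(r"(\d+)", expand=False)
    out["living_area"] = pd.to_numeric(area, errors="coerce")

    # Pull a Dutch-style postcode (4 digits + 2 letters) out of the free-text address.
    postcode = df["address"].str.extract(r"(\d{4}\s?[A-Z]{2})", expand=False)
    out["postcode"] = postcode.str.replace(" ", "", regex=False)

    # The price is the prediction target, so rows without one are dropped.
    out = out.dropna(subset=["price"]).copy()

    # Fill remaining gaps in living area with the median (one simple choice among many).
    out["living_area"] = out["living_area"].fillna(out["living_area"].median())
    return out


clean = clean_listings(raw)
print(clean)
```

Median imputation is just one simple option here; depending on how much is missing and why, dropping rows or using a smarter imputer may make more sense.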

Now, with a clean dataset in hand, I moved on to building my machine learning model. I went with a regression approach, since I wanted to predict a continuous value (property prices). After experimenting with various algorithms, including linear regression and SGD, I settled on RandomForestRegressor, which scored 88% (strictly speaking a regression score such as R², not "accuracy" in the classification sense). Seeing the model predict property prices that well was incredibly satisfying after all the hard work. It felt like finally beating The Wither from Minecraft lmao.
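For anyone curious how a comparison like that might be set up, here's a rough sketch with scikit-learn. It runs on synthetic data from make_regression, not the real estate dataset, and the hyperparameters, the 5-fold cross-validation, and the helper name compare_models are all assumptions chosen for illustration, so the numbers it prints say nothing about the 88% above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a cleaned feature matrix X and target y.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)


def compare_models(X, y):
    """Return the mean 5-fold cross-validated R² for each candidate model."""
    models = {
        "linear": LinearRegression(),
        # SGD is sensitive to feature scale, so it gets a scaler in front.
        "sgd": make_pipeline(StandardScaler(), SGDRegressor(random_state=0)),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        # scoring="r2" is the same metric a regressor's .score() reports.
        cv_scores = cross_val_score(model, X, y, cv=5, scoring="r2")
        scores[name] = float(cv_scores.mean())
    return scores


scores = compare_models(X, y)
for name, r2 in scores.items():
    print(f"{name}: mean R² = {r2:.3f}")
```

Cross-validation is used here instead of a single train/test split so that one lucky split doesn't decide the winner. On real, messy housing data the ranking can easily look different from what this toy data shows.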

Here's what I learned: data's messy, and that's okay. It's in the cleaning that you really get to know your dataset. It's frustrating, sure, but it's also where the real learning happens.

So if you're staring down a messy dataset of your own, take heart. Roll up your sleeves, dive in, and remember – even ML enthusiasts have to start somewhere.