Overview
This project was my introduction to machine learning, using the Titanic survival dataset from Kaggle. The Titanic competition is a classic beginner challenge in data science and machine learning, offering a structured problem built on historical data.
Approach and Methods
- Initial Attempt: I implemented linear regression using only mathematics and NumPy, recalling and applying the least squares method. This achieved a score of 0.64.
- Improvement: I then used the scikit-learn library in Python, keeping the same data normalization methods. This significantly improved the score to 0.72.
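The manual least squares step from the initial attempt could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy data, the helper names `fit_least_squares` and `predict_survival`, and the 0.5 threshold used to turn a regression output into a survived / did-not-survive label are all assumptions.

```python
import numpy as np

def fit_least_squares(X, y):
    """Solve the normal equations (Xb^T Xb) w = Xb^T y; intercept comes first in w."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the intercept
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

def predict_survival(X, w, threshold=0.5):
    """Threshold the linear output to get 0/1 labels (the threshold is an assumed choice)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ w >= threshold).astype(int)

# Made-up toy data standing in for two normalized features; not the Kaggle dataset.
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
y = np.array([1, 0, 0, 1])

w = fit_least_squares(X, y)
pred = predict_survival(X, w)
accuracy = (pred == y).mean()  # fraction of correct labels on the toy data
print(w, pred, accuracy)
```

Solving the normal equations directly keeps the code close to the textbook derivation, at the cost of breaking down when the features are strongly correlated.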
The contrast between the manual implementation and the library-based approach piqued my curiosity about what optimizations and techniques scikit-learn uses under the hood for linear regression.
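One plausible contributor, offered as an assumption rather than a confirmed explanation of the 0.64 versus 0.72 gap, is the choice of solver. To the best of my knowledge, scikit-learn's `LinearRegression` relies on a LAPACK-backed least-squares routine instead of explicitly solving the normal equations. The sketch below uses NumPy's `np.linalg.lstsq` as a stand-in for that kind of solver and compares it with the normal equations on made-up data with a duplicated feature column.

```python
import numpy as np

# Made-up data: the third column duplicates the second, so Xb^T Xb is singular.
x = np.array([0.0, 1.0, 2.0, 3.0])
Xb = np.column_stack([np.ones_like(x), x, x])
y = 2.0 * x + 1.0

# Normal equations: expected to fail or be unreliable when Xb^T Xb is singular.
try:
    w_normal = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
except np.linalg.LinAlgError:
    w_normal = None  # the solver reported a singular matrix

# SVD-based least squares: returns the minimum-norm solution and the detected rank.
w_lstsq, _, rank, _ = np.linalg.lstsq(Xb, y, rcond=None)

print("normal equations:", w_normal)
print("lstsq:", w_lstsq, "rank:", rank)
```

With well-conditioned features both routes should agree closely, so a solver difference alone may not account for the score change; differences in preprocessing or in how predictions are thresholded are equally worth checking.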
Technologies Used
- Programming Language: Python
- Libraries: NumPy, scikit-learn
- Dataset: Titanic Survival Dataset from Kaggle