I'm Yahya

Data Scientist • Web Developer

Experience

I design and develop high-performance AI and web solutions with a strong focus on user experience and robustness. I have experience with these frameworks, languages, and tools.

Portfolio

TeeSize

TeeSize is a complete end-to-end project that uses a deep learning model trained on a subset of the DeepFashion2 dataset to detect landmarks on T-shirt images.

It uses these landmarks to perform perspective correction and then accurately measure the T-shirt dimensions.
It comes with a beautiful and intuitive GUI frontend for ease of use.
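
As a rough illustration of the perspective correction step, here is a minimal sketch using only the Python standard library. It is not the TeeSize code: it assumes a reference rectangle of known size (an A4 sheet here) is visible in the photo, and every coordinate, landmark name, and function name below is hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """Return the 3x3 homography (9 floats, last fixed to 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]

def warp(h, point):
    """Apply the homography to one (x, y) point."""
    x, y = point
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)

# Hypothetical pixel corners of an A4 sheet lying next to the T-shirt,
# and the same corners in centimetres (A4 landscape is 29.7 x 21.0 cm).
ref_px = [(100, 120), (520, 90), (560, 400), (80, 380)]
ref_cm = [(0.0, 0.0), (29.7, 0.0), (29.7, 21.0), (0.0, 21.0)]
h = homography(ref_px, ref_cm)

# Hypothetical detected landmarks (pixels) for the two armpit points.
left_armpit, right_armpit = (200, 200), (420, 190)
chest_cm = math.dist(warp(h, left_armpit), warp(h, right_armpit))
print(f"chest width: {chest_cm:.1f} cm")
```

Once the landmarks are mapped into the corrected plane, any dimension is just the distance between two mapped points. This only holds if the T-shirt lies flat in the same plane as the reference rectangle.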

Learn More

This Website

This is my personal website that I wrote from scratch to showcase my work.

It uses semantic, well-structured markup and follows best practices for web design.
Much of the styling follows the Neobrutalism design style.
It was built with accessibility, responsiveness, and security in mind.

Learn More

Articles

Jul 2024

Predicting the survivors of Titanic

The Titanic competition is based on the infamous shipwreck of the Titanic on April 15, 1912. The goal is to create a model that predicts which passengers survived the shipwreck.

I did a detailed analysis of the input features to understand what impact each feature has on the target.
After selecting suitable features, I developed strategies to fill in missing values and created suitable encoding schemes for non-numerical features.
I developed a complete pipeline to automatically perform the necessary data preprocessing.
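
To make the idea of such a preprocessing pipeline concrete, here is a dependency-free sketch, not the notebook's code. It assumes median filling for numeric columns, most-frequent filling for categorical ones, and one-hot encoding. The `Preprocessor` class is hypothetical, the rows are toy data, and the column names are simply borrowed from the Kaggle Titanic dataset.

```python
from statistics import median, mode

class Preprocessor:
    """Learns fill values and category levels on training rows, then reuses them."""

    def __init__(self, numeric, categorical):
        self.numeric = numeric
        self.categorical = categorical

    def fit(self, rows):
        self.fill, self.levels = {}, {}
        for col in self.numeric:
            self.fill[col] = median([r[col] for r in rows if r[col] is not None])
        for col in self.categorical:
            seen = [r[col] for r in rows if r[col] is not None]
            self.fill[col] = mode(seen)          # most frequent value
            self.levels[col] = sorted(set(seen))  # fixed one-hot column order
        return self

    def transform(self, rows):
        out = []
        for r in rows:
            vec = []
            for col in self.numeric:
                vec.append(float(self.fill[col] if r[col] is None else r[col]))
            for col in self.categorical:
                value = self.fill[col] if r[col] is None else r[col]
                # A category never seen during fit becomes an all-zero block.
                vec.extend(1.0 if value == level else 0.0
                           for level in self.levels[col])
            out.append(vec)
        return out

# Toy rows (made up for illustration only).
train = [
    {"Age": 22.0, "Sex": "male", "Embarked": "S"},
    {"Age": None, "Sex": "female", "Embarked": "C"},
    {"Age": 38.0, "Sex": "female", "Embarked": None},
    {"Age": 30.0, "Sex": "male", "Embarked": "S"},
]
prep = Preprocessor(numeric=["Age"], categorical=["Sex", "Embarked"]).fit(train)
features = prep.transform(train)
```

Fitting on the training rows and reusing the learned fill values on new rows is what keeps test-set statistics from leaking into the preprocessing.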
Then, I tested many commonly used classification models and found that the Support Vector Machine classifier and the Random Forest classifier were the most promising.
Finally, I ran grid searches on these two classifiers to find the best set of hyperparameters. The best model achieved roughly 82% accuracy in cross-validation, and I used this model to predict which passengers in the test set survived.
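
The mechanics of a cross-validated grid search can be sketched in a few lines of plain Python. This is an illustration only: a tiny k-nearest-neighbours classifier stands in for the SVM and Random Forest models, the data is synthetic, and all function names are hypothetical.

```python
import random
from itertools import product
from statistics import mean

def knn_predict(train_x, train_y, x, k):
    """Majority vote (labels 0/1) among the k nearest training points."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return int(2 * sum(votes) > len(votes))

def cv_accuracy(xs, ys, k, n_folds=5):
    """Mean accuracy over n_folds interleaved folds."""
    scores = []
    for fold in range(n_folds):
        test = [i for i in range(len(xs)) if i % n_folds == fold]
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        hits = sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test)
        scores.append(hits / len(test))
    return mean(scores)

def grid_search(xs, ys, grid, n_folds=5):
    """Score every parameter combination; return (best_params, best_score, all_scores)."""
    names = list(grid)
    scores = {}
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores[combo] = cv_accuracy(xs, ys, n_folds=n_folds, **params)
    best = max(scores, key=scores.get)
    return dict(zip(names, best)), scores[best], scores

# Synthetic two-class data: two Gaussian blobs with alternating labels.
rng = random.Random(0)
ys = [i % 2 for i in range(60)]
xs = [(rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)) for label in ys]

param_grid = {"k": [1, 3, 5, 7]}
best_params, best_score, all_scores = grid_search(xs, ys, param_grid)
print(best_params, round(best_score, 3))
```

Every candidate is scored on the same folds, so the comparison between parameter settings is fair. The reported best score is still slightly optimistic, because it was picked as the maximum over the grid.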

Notebook

Jul 2024

Estimating bank churn

The goal of the bank churn competition is to predict whether a customer continues with their account or closes it (churns).

I did a detailed analysis of the input features to understand what impact each feature has on the target.
The dataset had no missing values, but it did contain duplicate rows and some non-numeric columns. I dropped the duplicate rows and one-hot encoded the non-numeric columns.
To make the data preprocessing flexible and simple to use for new data, I created a complete pipeline to automatically perform the necessary steps.
Given the nature of the dataset, a decision tree ensemble was the most suitable model, so I used the LightGBM framework.
The best model was evaluated using the ROC AUC metric. It achieved a public score of roughly 0.75 on Kaggle, and I used this model to predict whether each customer will churn.
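
For readers unfamiliar with the metric, ROC AUC can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it directly from that definition. It is a minimal illustration on made-up scores, not the evaluation code used for the competition, and the `roc_auc` helper is hypothetical.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = churned) and predicted churn probabilities.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = roc_auc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
```

This pairwise form costs time proportional to the number of positive/negative pairs, so it suits small examples. A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking.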