Projects

Research & Machine Learning Projects

SpaceX Flight Landing Prediction

Developed a machine learning pipeline to predict Falcon 9 first-stage landing success, supporting cost-reduction strategies in commercial spaceflight.

Problem: Rocket launches are extremely costly. Predicting the likelihood of successful landings allows SpaceX to plan reusability strategies and reduce mission costs.

Approach: Collected launch data from the SpaceX API and web-scraping sources; cleaned and structured datasets; performed exploratory analysis with interactive maps and visualisations; engineered features (payload mass, orbit, launch site, booster type); trained classification models (logistic regression, decision tree, random forest, SVM); and evaluated performance with accuracy, F1, and cross-validation.
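
The model-comparison step looks roughly like the sketch below. It assumes a cleaned launch dataset; the file name, column names and hyper-parameters are illustrative placeholders rather than the project's actual configuration.

    # Illustrative model-comparison sketch; file name and columns are placeholders.
    import pandas as pd
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score, f1_score

    df = pd.read_csv("falcon9_launches_clean.csv")  # hypothetical cleaned dataset
    X = pd.get_dummies(df[["PayloadMass", "Orbit", "LaunchSite", "BoosterVersion"]])
    y = df["LandingSuccess"]                        # 1 = landed, 0 = failed

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    models = {
        "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "decision tree": DecisionTreeClassifier(max_depth=5, random_state=42),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }

    for name, model in models.items():
        cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        print(f"{name}: CV accuracy={cv_acc:.3f}, "
              f"test accuracy={accuracy_score(y_test, pred):.3f}, "
              f"F1={f1_score(y_test, pred):.3f}")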

Results: Achieved strong predictive performance on test data (≈85–90% accuracy depending on the model). Produced clear dashboards and visuals of launch outcomes by location and year. Demonstrated how data science can inform aerospace engineering decisions and cost-saving strategies.

Tools: Python, scikit-learn, pandas, matplotlib, seaborn, Folium (interactive maps), Jupyter.

Links: GitHub

Simulation-Based Mathematical Models & Parameter Estimation to Interpret Tissue Growth Experiments

Framework to forecast tissue growth and determine optimal conditions for artificial red blood cell generation.

Problem: Wet-lab experimentation to optimise growth conditions is costly and time-consuming.

Approach: Discretised a continuum model for numerical computation; implemented simulations in Python & Julia; preprocessed large multi-modal clinical datasets; performed parameter identifiability analysis and evaluated statistical fit before generating forward simulations across environments.
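
As an illustration of the simulate-then-fit loop (not the project's actual continuum model, which was implemented in Python and Julia), the sketch below discretises a 1D Fisher-KPP equation by finite differences and recovers a growth-rate parameter from noisy synthetic observations via nonlinear least squares; all parameter values are placeholders.

    # 1D Fisher-KPP model discretised by finite differences (method of lines),
    # fitted to noisy synthetic observations; all values are placeholders.
    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import least_squares

    L, N = 1.0, 101                       # domain length and number of grid points
    x = np.linspace(0.0, L, N)
    dx = x[1] - x[0]
    D = 0.001                             # diffusivity, assumed known here

    def rhs(t, u, r):
        """du/dt = D*u_xx + r*u*(1 - u) with zero-flux boundaries."""
        lap = np.empty_like(u)
        lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
        lap[0] = 2 * (u[1] - u[0]) / dx**2
        lap[-1] = 2 * (u[-2] - u[-1]) / dx**2
        return D * lap + r * u * (1.0 - u)

    def simulate(r, t_obs):
        u0 = np.exp(-200 * (x - 0.5) ** 2)  # initial seeded cell density
        sol = solve_ivp(rhs, (0.0, t_obs[-1]), u0, t_eval=t_obs, args=(r,), method="LSODA")
        return sol.y.T                      # density profile at each observation time

    t_obs = np.linspace(0.0, 10.0, 6)
    rng = np.random.default_rng(0)
    data = simulate(0.8, t_obs) + rng.normal(0, 0.01, (len(t_obs), N))  # synthetic "experiment"

    fit = least_squares(lambda p: (simulate(p[0], t_obs) - data).ravel(), x0=[0.3])
    print("estimated growth rate:", fit.x[0])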

Results: Delivered a reproducible simulation framework that forecasts tissue growth under varying conditions, reducing reliance on extensive wet-lab iterations.

Tools: Python, Julia, NumPy/SciPy, Jupyter, Git.

Links: GitHub

PK/PD Modelling to Assess the Effects of Anti-Cancer Agents on Tumour Volume

Dose-response modelling to forecast optimal dosing regimens, including combination therapy effects.

Problem: Tumour-volume response needs to be predicted under different dosing schedules, including combinations, from sensitive preclinical data.

Approach: Collaborated with GSK; adapted oncology PK/PD models from literature to capture combination effects; calibrated parameters to preclinical datasets using MATLAB, Python & Monolix; validated with held-out data.
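
The project itself used MATLAB, Monolix and published oncology PK/PD models; purely to illustrate the calibration idea, the Python sketch below fits a toy tumour growth inhibition ODE (exponential growth with concentration-driven kill) to synthetic tumour-volume data. Every name and number is a placeholder, not a GSK parameter or dataset.

    # Toy tumour growth inhibition model calibrated to synthetic data.
    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import curve_fit

    def conc(t, dose=1.0, ke=0.5):
        """Toy one-compartment PK: exponential decay after a bolus dose."""
        return dose * np.exp(-ke * t)

    def tumour_volume(t_obs, kg, kk, v0):
        """Integrate dV/dt = kg*V - kk*C(t)*V and return V at observation times."""
        sol = solve_ivp(lambda t, v: kg * v - kk * conc(t) * v,
                        (0.0, t_obs[-1]), [v0], t_eval=t_obs)
        return sol.y[0]

    t_obs = np.linspace(0.0, 20.0, 10)
    rng = np.random.default_rng(1)
    truth = tumour_volume(t_obs, kg=0.15, kk=0.4, v0=100.0)
    data = truth * rng.lognormal(0.0, 0.05, t_obs.size)     # noisy synthetic volumes

    params, _ = curve_fit(tumour_volume, t_obs, data, p0=[0.1, 0.2, 80.0])
    print("estimated kg, kk, V0:", params)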

Results: Produced regimen recommendations from fitted models and scenario analyses to inform experimental planning.

Tools: MATLAB, Python, Monolix, pandas, matplotlib.

Links: Download Dissertation

Deep Learning Experiments & Waste Classification

Collection of experiments exploring CNNs, Vision Transformers and data‑pipeline strategies, culminating in a capstone waste‑classification project.

Problem: Selecting an appropriate deep‑learning architecture and data‑loading strategy is critical for real‑world image‑classification tasks. Understanding trade‑offs between frameworks (PyTorch vs Keras) and models (CNNs vs ViTs) is necessary before tackling applied problems such as waste sorting.

Approach: Implemented convolutional neural networks and vision transformers in both Keras and PyTorch, comparing training performance and efficiency. Built a PyTorch classifier from scratch and explored hybrid CNN‑ViT architectures. Conducted experiments on breast‑cancer classification and compared memory‑based vs generator‑based data pipelines. For the capstone, fine‑tuned pre‑trained models on a waste‑product image dataset, applying cross‑validation and hyper‑parameter tuning.
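
The capstone's transfer-learning step followed the familiar pattern sketched below in PyTorch; the data directory, backbone choice and hyper-parameters shown are stand-ins, not the project's actual settings.

    # Transfer-learning sketch; path, backbone and hyper-parameters are placeholders.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    tfms = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    train_ds = datasets.ImageFolder("waste_dataset/train", transform=tfms)  # hypothetical path
    train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for p in model.parameters():            # freeze the pre-trained backbone
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))  # new classification head
    model = model.to(device)

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(3):                  # short fine-tuning loop
        for xb, yb in train_dl:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: last-batch loss {loss.item():.3f}")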

Results: Demonstrated how CNNs and ViTs perform across tasks, highlighting efficiency and accuracy trade‑offs. Achieved strong classification accuracy on benchmark datasets and improved waste‑classification performance via transfer learning. Provided insights into data‑pipeline choices and model selection for future projects.

Tools: Python, PyTorch, TensorFlow, Keras, NumPy, pandas, matplotlib, scikit‑learn.

Links: GitHub

Mathematics for Machine Learning Exercises

Comprehensive set of notebooks and exercises covering the mathematical foundations of machine learning.

Problem: A solid grasp of linear algebra, optimisation and probability is essential for designing and understanding machine‑learning algorithms. Many learners struggle to bridge theoretical concepts and practical implementation.

Approach: Worked through exercises inspired by Imperial College London’s Mathematics for Machine Learning series. Topics include vector spaces, orthogonality, projections, inner products and back‑propagation mathematics. Implemented algorithms such as K‑Nearest Neighbors and PageRank; practised neural‑network fundamentals via gradient‑based optimisation and fitting helical distributions. Additional exercises explore transformation matrices and reflections.
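
In the spirit of the PageRank exercise, the sketch below runs power iteration on a damped link matrix with NumPy; the tiny four-page "web" is invented for illustration.

    # Power iteration on a damped, column-stochastic link matrix.
    import numpy as np

    def pagerank(link_matrix, d=0.85, tol=1e-8, max_iter=1000):
        """Return the steady-state rank vector of a column-stochastic link matrix."""
        n = link_matrix.shape[0]
        M = d * link_matrix + (1 - d) / n * np.ones((n, n))  # damped transition matrix
        r = np.ones(n) / n                                   # start from uniform ranks
        for _ in range(max_iter):
            r_next = M @ r
            if np.linalg.norm(r_next - r, 1) < tol:
                break
            r = r_next
        return r_next

    # Each column holds a page's outgoing-link probabilities (columns sum to 1).
    L = np.array([[0,   1/2, 1/3, 0],
                  [1/3, 0,   1/3, 1/2],
                  [1/3, 1/2, 0,   1/2],
                  [1/3, 0,   1/3, 0]])
    print(pagerank(L))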

Results: Produced a set of well‑annotated notebooks that translate mathematical theory into code, serving both as a learning resource and as a reference for future projects. Completing these exercises deepened understanding of the maths underlying machine learning.

Tools: Python, Jupyter, NumPy, SciPy, matplotlib.

Links: GitHub

Little Lemon Restaurant API

Developed a RESTful back‑end for a fictional restaurant, enabling digital reservations and menu management.

Problem: Restaurants require reliable APIs to manage reservations, menu items and customer data. Building such an API involves creating secure endpoints, handling CRUD operations and supporting authentication.

Approach: Built a RESTful API using Django and the Django REST Framework. Implemented serializers, token‑based authentication and CRUD functionality for reservations, menu items and customer data. Validated the API with Postman, ran it locally, and published the source code on GitHub.
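
The endpoints follow the standard Django REST Framework serializer/viewset pattern, roughly as sketched below; the MenuItem model and its fields are hypothetical stand-ins for the project's actual schema.

    # Serializer/viewset pattern for one resource; the model is a stand-in.
    from rest_framework import permissions, serializers, viewsets
    from rest_framework.authentication import TokenAuthentication

    from .models import MenuItem            # hypothetical app model

    class MenuItemSerializer(serializers.ModelSerializer):
        class Meta:
            model = MenuItem
            fields = ["id", "title", "price", "inventory"]

    class MenuItemViewSet(viewsets.ModelViewSet):
        """Provides list/retrieve/create/update/destroy endpoints for menu items."""
        queryset = MenuItem.objects.all()
        serializer_class = MenuItemSerializer
        authentication_classes = [TokenAuthentication]
        permission_classes = [permissions.IsAuthenticatedOrReadOnly]

    # urls.py then registers the viewset with a router, e.g.:
    # router = DefaultRouter(); router.register("menu-items", MenuItemViewSet)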

Results: Delivered a functional back‑end that allows clients to create, read, update and delete restaurant data while ensuring secure access control. Successful Postman tests confirmed endpoint reliability, making the API suitable for integration with front‑end or mobile applications.

Tools: Python, Django, Django REST Framework, Postman, Git.

Links: GitHub

Stock Analysis of Apple Inc. (AAPL)

Performed exploratory data analysis and regression modelling on Apple’s stock to uncover trends and build predictive models.

Problem: Investors and analysts often need to dissect historical stock data to understand trends, assess normality assumptions and develop predictive models. Such analyses must handle outliers, non‑normal distributions and provide interpretable metrics.

Approach: Conducted in‑depth exploratory data analysis on Apple (AAPL) price data, including summary statistics, missing‑value analysis, kurtosis assessment and correlation heatmaps. Applied normality tests (Shapiro‑Wilk, D’Agostino’s K², Anderson–Darling) and visualised the data through histograms, scatter plots, line plots, boxplots and candlestick charts. Detected and removed outliers, then fit a linear regression model using ordinary least squares and evaluated it with metrics such as MSE, MAE, MAPE and R².
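
A condensed sketch of the test-then-fit flow is shown below; a synthetic price series stands in for the AAPL data, so the figures it prints are illustrative only.

    # Normality check on returns, then an OLS baseline on a synthetic price series.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from scipy import stats
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

    rng = np.random.default_rng(7)
    close = 150 + np.cumsum(rng.normal(0.1, 1.5, 250))   # placeholder price path
    df = pd.DataFrame({"close": close})
    returns = df["close"].pct_change().dropna()

    stat, p = stats.shapiro(returns)                      # normality test on daily returns
    print(f"Shapiro-Wilk p-value: {p:.4f} (small p suggests non-normal returns)")

    X = sm.add_constant(np.arange(len(df)))               # time index as the single regressor
    model = sm.OLS(df["close"], X).fit()
    pred = model.predict(X)
    print(f"MSE={mean_squared_error(df['close'], pred):.2f}, "
          f"MAE={mean_absolute_error(df['close'], pred):.2f}, "
          f"R2={r2_score(df['close'], pred):.3f}")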

Results: Generated comprehensive visual and statistical insights into AAPL’s historical behaviour, highlighting distributional properties and relationships between variables. The regression model provided a baseline predictive framework; metrics indicated reasonable fit while emphasising the limitations of simple linear models. The notebook serves as an educational reference rather than a trading recommendation.

Tools: Python (pandas, NumPy), matplotlib, seaborn, plotly, statsmodels, SciPy, scikit‑learn.

Links: GitHub

AI-Based EEG Decoding for Assistive Communication in Locked-In Syndrome

End-to-end signal-to-decision pipeline for decoding noisy EEG to support communication.

Problem: EEG signals are noisy and non-stationary, making reliable intent decoding difficult.

Approach: Built a MATLAB pipeline with denoising and feature extraction; trained classification models with train/validation/test splits; evaluated with accuracy, sensitivity and specificity.
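
The pipeline itself was written in MATLAB; as a language-neutral sketch of the same denoise, band-power feature extraction and classification pattern, the Python example below runs on synthetic signals, with the sampling rate, frequency bands and classifier chosen purely as placeholders.

    # Denoise -> band-power features -> classify, on synthetic two-class "EEG" epochs.
    import numpy as np
    from scipy.signal import butter, filtfilt, welch
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import accuracy_score, recall_score
    from sklearn.model_selection import train_test_split

    fs = 250                                             # Hz, assumed sampling rate
    b, a = butter(4, [1, 40], btype="bandpass", fs=fs)   # broadband denoising filter

    def band_power(epoch, lo, hi):
        f, pxx = welch(epoch, fs=fs, nperseg=fs)
        return pxx[(f >= lo) & (f < hi)].mean()

    def features(epoch):
        clean = filtfilt(b, a, epoch)
        return [band_power(clean, 8, 13), band_power(clean, 13, 30)]  # alpha and beta power

    # Synthetic two-class epochs stand in for recorded trials.
    rng = np.random.default_rng(3)
    t = np.arange(2 * fs) / fs
    X, y = [], []
    for k in range(200):
        label = k % 2
        amplitude = 2.0 if label else 0.5                # class difference in alpha power
        signal = amplitude * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)
        X.append(features(signal))
        y.append(label)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(f"accuracy={accuracy_score(y_test, pred):.2f}, "
          f"sensitivity={recall_score(y_test, pred):.2f}")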

Results: Achieved robust decoding performance with clear metric reporting, supporting feasibility for assistive interfaces.

Tools: MATLAB (Signal Processing), Statistics & Machine Learning Toolbox.

Links: GitHub

Cardiac Image Segmentation

Statistical learning pipelines in R to segment cardiac images with high accuracy.

Problem: Downstream cardiac analysis requires accurate, interpretable segmentation of cardiac images.

Approach: Prototyped multiple learners in RStudio (logistic regression, random forest, clustering); compared pipelines via cross-validation; reported per-class metrics.
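
The pipelines were prototyped in R with caret and the tidyverse; purely to illustrate the compare-by-cross-validation idea, the sketch below performs the analogous pixel-wise comparison in scikit-learn on placeholder features.

    # Cross-validated comparison of per-pixel classifiers on synthetic placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_validate

    # Placeholder per-pixel features (e.g. intensity, local mean, gradient) and
    # labels (1 = tissue of interest, 0 = background).
    rng = np.random.default_rng(5)
    X = rng.normal(size=(5000, 3))
    y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.5, 5000) > 0).astype(int)

    for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                        ("random forest", RandomForestClassifier(n_estimators=100))]:
        scores = cross_validate(model, X, y, cv=5, scoring=["accuracy", "f1"])
        print(f"{name}: accuracy={scores['test_accuracy'].mean():.3f}, "
              f"F1={scores['test_f1'].mean():.3f}")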

Results: Selected a high-performing segmentation approach suitable for reproducible clinical analysis workflows.

Tools: R, RStudio, tidyverse, caret.

Links: GitHub