Edwin Onuonga

Last Updated: 03/06/2022

Abstract

My aim is to build impactful large-scale machine learning infrastructure based on novel technologies, and deliver these solutions to automate processes, solve challenging problems and improve our knowledge of the world.

About

I am a software engineer with a strong interest in, and experience of, developing machine learning-focused products.

In particular, I am intrigued by probabilistic approaches to modelling problems, such as the Bayesian framework. Its inherent ability to capture and represent the uncertainties of the world around us makes it a fascinating and powerful choice for applications ranging from geoscience and ecology to robotics and finance.

Experience

Hazy

Machine Learning Engineer
June 2022 - Ongoing
Improving a privacy-preserving synthetic data generation platform for conducting enterprise data analytics while ensuring compliance with modern data regulation.

Nibble Technology

Data Scientist
September 2021 - May 2022
Understanding user negotiation styles and strengthening the use of data to drive decisions on a conversational AI product that enables e-commerce retailers to deliver personalized discounts to customers via an engaging negotiation agent.
Junior Developer (Intern)
May 2021 - August 2021
Adding new features to support the growth of the start-up, as well as improving the core negotiation algorithm and providing foundations for advancing the use of machine learning within the company.

Education

University of Edinburgh

MSc Statistics with Data Science — Distinction
September 2020 - September 2021

Research projects:

Courses:

  • Bayesian Theory
  • Bayesian Data Analysis
  • Statistical Methodology
  • Applied Statistics
  • Incomplete Data Analysis
  • Stochastic Modelling
  • Fundamentals of Operational Research
  • Credit Scoring
  • Biomedical Data Science
  • Generalized Regression Models
BSc (Hons) Computer Science — 1st class
September 2016 - May 2020

Research projects:

Courses:

  • Algorithms, Data Structures and Learning
  • Introductory Applied Machine Learning
  • Machine Learning Practical
  • Machine Learning and Pattern Recognition
  • Extreme Computing
  • Database Systems
  • Automatic Speech Recognition
  • Processing Formal and Natural Languages
  • Foundations of Natural Language Processing
  • Natural Language Understanding, Generation and Machine Translation

Skills

Technology

I have extensive practice in using the common Python data stack for data analysis, visualization and modelling. This includes packages such as numpy, scipy, pandas, scikit-learn, statsmodels, jupyter, matplotlib, seaborn and plotly.

I am equally experienced in the use of torch and tensorflow for developing more complex machine learning models. Recently I have been experimenting with probabilistic modelling via general purpose packages including tensorflow-probability and pymc3, as well as specialized packages such as gpflow for Gaussian processes.

Besides Python, I can also use R for analysis and statistical modelling, and SQL for relational database management. Lately I have also been focusing on learning about model deployment via services offered by AWS.

Machine Learning

In terms of model development, I am familiar with common practices such as dataset splitting, k-fold cross-validation, regularization, hyperparameter optimization and evaluation metric selection.

I am also very familiar with traditional machine learning approaches such as:

In addition, I have strong knowledge of implementing neural networks, including recurrent neural networks for sequences and convolutional neural networks for images. I also enjoy building custom architectures and trying to keep up with the latest advancements in deep learning. While deep learning is fascinating and has its applications, I also spend a lot of time exploring alternative techniques such as hidden Markov models and Gaussian processes.

Projects

Examples of some of my larger personal projects.
Project | Description | Technology
Sequentia | A machine learning interface for sequence classification algorithms in Python. | python, torch, numpy, hmmlearn
GTZAN Classification | Convolutional neural network based music genre classification via chromagram and MFCC features. | python, torch, optuna, w&b
TorchFSDD | A utility for wrapping the Free Spoken Digit Dataset into PyTorch-ready data set splits. | python, torch
Daze | Better multi-class confusion matrix plots for Scikit-Learn, incorporating per-class and overall evaluation measures. | python, sklearn, matplotlib, numpy
Arx | A Ruby interface for querying academic papers on the arXiv search API. | ruby, nokogiri