data and machine learning

artificial intelligence, deep learning, computer vision.

Diabetic retinopathy is a diabetes-related disease that affects the retina of the eye. Millions of people around the world suffer from it.

Diagnosis currently relies on fundus photography, a technique that photographs the rear of the eye. Screening for diabetic retinopathy takes place around the world, but access is harder for people living in rural areas.

Using machine learning and computer vision, we attempt to automate the diagnosis, which is currently performed manually by doctors. Using an ensemble of EfficientNet-B3 and EfficientNet-B5 models, we achieve a Quadratic Weighted Kappa score of 0.905775. In comparison, the winning solution on Kaggle achieved 0.93612.
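For readers unfamiliar with the metric, here is a minimal sketch of how a Quadratic Weighted Kappa score can be computed from two lists of integer grades. It is not the code used in the post: the function name, the choice of five grades (0 to 4) and the toy labels are assumptions made for illustration only.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Agreement between two lists of integer grades in [0, num_classes).

    Returns 1.0 for perfect agreement and roughly 0.0 for chance-level
    agreement. Assumes at least two different grades appear, otherwise the
    expected disagreement (the denominator) is zero.
    """
    n = len(y_true)

    # Observed confusion matrix: observed[i][j] counts true grade i, predicted j.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty: being off by two grades costs four times
            # as much as being off by one.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Hypothetical grades, for illustration only.
    truth = [0, 1, 2, 3, 4, 2, 0]
    preds = [0, 1, 2, 3, 3, 2, 1]
    print(quadratic_weighted_kappa(truth, preds, num_classes=5))
```

In practice, scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same quantity. The hand-written version is only meant to show why near misses are penalised less than large errors.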

Being able to automate the detection of metastasised cancer in pathological scans with machine learning and deep neural networks is an area of medical imaging and diagnostics with promising potential for clinical usefulness.

Here we explore a dataset prepared for this type of analysis. The PCam dataset is a binary image classification dataset containing approximately 300,000 labelled low-resolution images of lymph node sections extracted from digital histopathological scans. Each image is labelled by trained pathologists for the presence of metastasised cancer.

Using a convolutional neural network, transfer learning, and other hyperparameter optimisations, we show how we can predict the occurrence of cancer in this dataset with an accuracy of 98.6%.

The most groundbreaking advances in machine learning over the past decade, from computer vision to NLP, can be attributed to the rise of neural networks and deep learning.

The following is a theoretical introduction to neural nets. We start by learning how neural networks are represented in terms of math and code. We cover the structure of a basic neural net and what a hypothesis function looks like in a neural net, and then translate some of the theory into Matlab code.
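As a rough illustration of what a hypothesis function looks like in a neural net, the sketch below runs forward propagation through a tiny two-layer network. It is written in Python rather than Matlab, and the function names and the hand-picked XNOR weights are assumptions chosen for the example, not code from the post.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """Compute the network's hypothesis h(x) by forward propagation.

    `layers` is a list of weight matrices. Each row of a matrix holds the
    weights of one unit as [bias, w1, w2, ...].
    """
    activation = list(x)
    for theta in layers:
        with_bias = [1.0] + activation  # prepend the bias unit
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, with_bias)))
            for row in theta
        ]
    return activation


# Hand-picked weights (an assumption for this example) that make the
# network behave like XNOR: the output is close to 1 when both inputs match.
XNOR_LAYERS = [
    [[-30.0, 20.0, 20.0],    # hidden unit 1: x1 AND x2
     [10.0, -20.0, -20.0]],  # hidden unit 2: (NOT x1) AND (NOT x2)
    [[-10.0, 20.0, 20.0]],   # output unit: hidden1 OR hidden2
]

if __name__ == "__main__":
    for x1 in (0.0, 1.0):
        for x2 in (0.0, 1.0):
            print(x1, x2, round(forward([x1, x2], XNOR_LAYERS)[0], 4))
```

Each layer repeats the same recipe: add a bias unit, take a weighted sum, apply the sigmoid. The hypothesis of the whole network is simply the activation of the last layer.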

Logistic regression is used extensively in classification problems. Principles similar to those of linear regression apply here, and we go through how to implement cost functions and gradient descent for logistic regression problems. We also explore some new concepts, including optimisation algorithms, practical Matlab code implementing gradient descent, how to recognise overfitting and underfitting, and regularisation.
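To make the cost function and update rule concrete, here is a minimal sketch of regularised logistic regression trained with batch gradient descent. It uses Python instead of Matlab, and the function names, the toy data, the learning rate and the regularisation strength are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """Hypothesis h(x): estimated probability that x belongs to class 1."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def cost(theta, X, y, lam):
    """Cross-entropy cost plus an L2 penalty (the bias theta[0] is not penalised)."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = predict(theta, xi)
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    penalty = lam / (2 * m) * sum(t * t for t in theta[1:])
    return total / m + penalty


def gradient_step(theta, X, y, alpha, lam):
    """One batch gradient descent update, returning the new parameters."""
    m = len(y)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        error = predict(theta, xi) - yi
        grad[0] += error
        for j, v in enumerate(xi, start=1):
            grad[j] += error * v
    new_theta = [theta[0] - alpha * grad[0] / m]
    for j in range(1, len(theta)):
        new_theta.append(theta[j] - alpha * (grad[j] / m + lam / m * theta[j]))
    return new_theta


def train(X, y, alpha=0.1, lam=0.1, iterations=2000):
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha, lam)
    return theta


if __name__ == "__main__":
    # Hypothetical one-feature data: small values are class 0, large are class 1.
    X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
    y = [0, 0, 0, 1, 1, 1]
    theta = train(X, y)
    print("cost:", cost(theta, X, y, lam=0.1))
    print("P(y=1 | x=1.0):", predict(theta, [1.0]))
    print("P(y=1 | x=3.5):", predict(theta, [3.5]))
```

Raising `lam` shrinks the weights and pushes the model towards underfitting, while setting it to zero removes the penalty and makes overfitting more likely on noisy data.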

Following on from the introduction of the univariate cost function and gradient descent in the previous post, this post introduces multi-variate linear regression and shows how it changes the hypothesis, cost function and gradient descent.

We cover important topics including vectorisation, multi-variate gradient descent, tuning the learning rate alpha for gradient descent, feature scaling and normalisation.
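The sketch below shows one way feature scaling and multi-variate gradient descent fit together. It is an illustration under stated assumptions, not the post's code: the helper names, the learning rate and the housing-style data are hypothetical, and plain Python loops stand in for the vectorised matrix operations the post describes.

```python
def fit_scaler(X):
    """Return the per-feature mean and (population) standard deviation."""
    n = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n for col in columns]
    stds = [
        (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        for col, mu in zip(columns, means)
    ]
    return means, stds


def scale(X, means, stds):
    """Normalise each feature to zero mean and unit standard deviation."""
    return [[(v - mu) / s for v, mu, s in zip(row, means, stds)] for row in X]


def hypothesis(theta, row):
    return theta[0] + sum(t * v for t, v in zip(theta[1:], row))


def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent for linear regression with several features."""
    m = len(y)
    n_features = len(X[0])
    theta = [0.0] * (n_features + 1)
    for _ in range(iterations):
        errors = [hypothesis(theta, row) - target for row, target in zip(X, y)]
        grad = [sum(errors) / m]
        for j in range(n_features):
            grad.append(sum(e * row[j] for e, row in zip(errors, X)) / m)
        # Update every parameter simultaneously.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    # Hypothetical data: [size, rooms] with price = 50 + 0.1 * size + 20 * rooms.
    X = [[1000, 1], [1500, 3], [2000, 2], [2500, 5], [3000, 4]]
    y = [50 + 0.1 * size + 20 * rooms for size, rooms in X]

    means, stds = fit_scaler(X)
    X_scaled = scale(X, means, stds)
    theta = gradient_descent(X_scaled, y, alpha=0.1, iterations=2000)

    new_house = scale([[1800, 3]], means, stds)[0]
    print("predicted price:", hypothesis(theta, new_house))
```

Without scaling, the size feature is about a thousand times larger than the room count, so a learning rate small enough to keep the size weight stable would make the other weights converge very slowly. Note that new inputs must be scaled with the means and standard deviations computed from the training data.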

This is the first post in a series covering notes and key topics from Andrew Ng's seminal Machine Learning course at Stanford University.

These notes cover the mathematical basics of machine learning, including definitions of classification and regression, an introduction to the cost function, and of course gradient descent.