human / unsupervised

Histopathological Cancer Detection

Automating the detection of metastasised cancer in pathology scans with machine learning and deep neural networks is an area of medical imaging and diagnostics with promising clinical potential.

Here we explore a dataset prepared for this type of analysis. The PCam dataset is a binary image classification dataset containing approximately 300,000 labelled low-resolution images of lymph node sections extracted from digital histopathological scans. Each image is labelled by trained pathologists for the presence of metastasised cancer.

Using a convolutional neural network, transfer learning, and hyperparameter optimisation, we show how to predict the occurrence of cancer in this dataset with an accuracy of 98.6%.


Diabetic Retinopathy. Detecting Blindness.

Diabetic retinopathy is a diabetes-related disease that affects the retina of the eye. Millions of people around the world suffer from it.

Currently, diagnosis relies on fundus photography, a technique that involves photographing the rear of the eye. Medical screening for diabetic retinopathy occurs around the world, but is harder to access for people living in rural areas.

Using machine learning and computer vision, we attempt to automate the process of diagnosis, which is currently performed manually by doctors. Using an ensemble of EfficientNet-B3 and EfficientNet-B5 models, we achieve a Quadratic Weighted Kappa score of 0.905775. In comparison, the winning solution on Kaggle achieved 0.93612.
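
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of how a Quadratic Weighted Kappa score can be computed. It is an illustration rather than the code used in the post: the function name `quadratic_weighted_kappa` is hypothetical, and it assumes labels are integers from 0 to `num_classes - 1` (for example, five severity grades 0 to 4).

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic weights for integer labels 0..num_classes-1.

    Illustrative sketch; assumes at least two distinct grades appear,
    otherwise the expected disagreement is zero and the score is undefined.
    """
    n = len(y_true)

    # Observed confusion matrix: observed[i][j] counts true grade i predicted as j.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal histograms of true and predicted grades.
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty: grows with the squared distance between grades.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            # Count expected under chance agreement, from the marginals.
            expected = true_hist[i] * pred_hist[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    return 1.0 - weighted_observed / weighted_expected
```

A score of 1 means perfect agreement with the reference grades, and predictions that miss by several grades are penalised more heavily than near misses.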


Neural Networks. Hypothesis and Definition.

Neural networks have been around for decades, but only recently, with the rise of big data and ever-increasing computational power, have we started to see a lot of exciting progress in this branch of machine learning.

The most groundbreaking advances in machine learning over the past decade, from computer vision to NLP, can be attributed to the rise of neural networks, and in particular deep learning.

The following post is a theoretical introduction to neural nets. We start by learning how to represent neural networks in terms of math and code. We cover the structure of a basic neural net and what a hypothesis function looks like in a neural net, and begin translating some of the theory into Matlab code.
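
As a rough illustration of a neural net hypothesis function, the sketch below runs forward propagation through fully connected sigmoid layers. It is written in Python rather than Matlab, and the names `layer_forward` and `hypothesis` as well as the weight layout (one row per unit, bias weight first) are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(weights, activations):
    """Compute one layer's activations.

    weights: one row per unit, laid out as [bias, w1, ..., wn] (assumed layout).
    activations: outputs of the previous layer, without the bias unit.
    """
    inputs = [1.0] + list(activations)  # prepend the bias unit
    return [sigmoid(sum(w * a for w, a in zip(row, inputs))) for row in weights]


def hypothesis(thetas, x):
    """Forward-propagate input x through every weight matrix in thetas."""
    a = list(x)
    for theta in thetas:
        a = layer_forward(theta, a)
    return a


# Hand-picked weights for a tiny 2-2-1 network that computes XNOR.
theta1 = [[-30.0, 20.0, 20.0],   # hidden unit 1: x1 AND x2
          [10.0, -20.0, -20.0]]  # hidden unit 2: (NOT x1) AND (NOT x2)
theta2 = [[-10.0, 20.0, 20.0]]   # output unit: OR of the two hidden units

xnor_of_ones = hypothesis([theta1, theta2], [1.0, 1.0])[0]  # close to 1
```

The output is the activation of the final layer, so the whole network is a nested composition of weighted sums and sigmoids.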


Logistic Regression. Overfitting. Regularisation.

Logistic regression is one of the best-known regression algorithms and is used extensively in classification problems (i.e. labelling inputs as belonging to a particular class). Similar principles to linear regression apply here, and we go through how to implement cost functions and gradient descent for logistic regression problems. We also explore some new concepts, including optimisation algorithms, practical Matlab code implementing gradient descent, how to recognise overfitting and underfitting, and regularisation.
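
To make the ideas above concrete, here is a minimal sketch of a regularised logistic regression cost function and one batch gradient descent step. It uses plain Python rather than Matlab; the function names, the parameter names `alpha` (learning rate) and `lam` (regularisation strength), and the convention that every row of `X` starts with a constant 1.0 are assumptions for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """Hypothesis h(x) = sigmoid(theta . x); x must start with the constant 1.0."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def cost(theta, X, y, lam):
    """Cross-entropy cost plus an L2 penalty that skips the bias term theta[0]."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = predict(theta, xi)
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    penalty = lam / (2 * m) * sum(t * t for t in theta[1:])
    return total / m + penalty


def gradient_step(theta, X, y, alpha, lam):
    """Return theta after one batch gradient descent update."""
    m = len(y)
    errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
    new_theta = []
    for j, t in enumerate(theta):
        grad = sum(e * xi[j] for e, xi in zip(errors, X)) / m
        if j > 0:
            grad += lam / m * t  # the bias term is not regularised
        new_theta.append(t - alpha * grad)
    return new_theta


# Toy data: a single feature, with class 1 for the larger values.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
for _ in range(200):
    theta = gradient_step(theta, X, y, alpha=0.1, lam=0.1)
```

Raising `lam` shrinks the weights and counters overfitting, while a value that is too large pushes the model towards underfitting.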


Linear Regression (Multivariate). Cost Function. Hypothesis. Gradient

In lesson 1, we were introduced to the basics of linear regression in a univariate context.

Now in lesson 2, we start to introduce models that have a number of different input features (multivariate).

We also cover the Normal equation, mean normalisation, and feature scaling.
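
As a small illustration of two of these ideas, the sketch below mean-normalises a feature column and solves the normal equation, theta = (X^T X)^(-1) X^T y, for a tiny dataset. It is plain Python with hypothetical function names; it assumes the feature is scaled by its range (the standard deviation is a common alternative) and that X^T X is invertible.

```python
def mean_normalise(column):
    """Centre a feature on zero and scale it by its range (max - min).

    Assumes the column is not constant, otherwise the range is zero.
    """
    mean = sum(column) / len(column)
    spread = max(column) - min(column)
    return [(v - mean) / spread for v in column]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def normal_equation(X, y):
    """Least-squares theta from (X^T X) theta = X^T y.

    Each row of X must start with the constant 1.0 for the intercept.
    """
    n = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)]
           for i in range(n)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    return solve(XtX, Xty)


# Toy data generated from y = 1 + 2x, so theta should recover [1, 2].
theta = normal_equation([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [3.0, 5.0, 7.0])
scaled = mean_normalise([1.0, 2.0, 3.0])
```

The normal equation needs no learning rate or iterations, but it becomes expensive when the number of features is large, which is where gradient descent with scaled features tends to be preferred.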


Regression (Univariate). Cost Function. Hypothesis. Gradient Descent

This is the first post in a series covering notes and key topics from Andrew Ng's seminal Machine Learning course from Stanford University, the web's most highly rated machine learning course, taught by one of the field's most influential contributors.

The series is a compilation of my notes from the course. It aims to be a useful machine learning handbook that students can refer to and that practitioners can use for foundational review.