Jupyter notebooks for machine learning.
About
Neural Networks:
This notebook builds a neural network with a convolutional architecture, a design inspired by the mammalian visual system. The network is trained on a set of handwritten digits, then validated on held-out test data, where it reaches an accuracy above 95%.
A notebook using a neural network that narrows in the middle to learn a reduced-dimensional representation of the original data set. The network projects the original data into the lower-dimensional subspace, and also back-projects points from the low-dimensional space into high-dimensional, 8x8 images. This lets us traverse the "handwriting manifold" the network constructs and explore how it has learned to recognize handwriting.
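As a rough sketch of the classification task above (not the notebooks' actual convolutional architecture), a small fully-connected network can be trained on the same 8x8 digits. The scikit-learn dataset, layer size, and train/test split below are illustrative assumptions:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# The 8x8 handwritten digits bundled with scikit-learn
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A small fully-connected network; the notebook's CNN is more elaborate
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

Even this simple stand-in comfortably clears 90% test accuracy on the scikit-learn digits, which is why the data set is a good sandbox for the architectures explored here.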
Dimensionality Reduction:
Utilize principal components analysis to analyze the dimensionality of the handwritten digits data set and find a reduced-dimensional manifold that makes modeling easier.
Build on the PCA work from the prior notebook, using the reduced manifold to classify handwritten digits. Explore the impact of the number of principal components on classification accuracy.
This notebook compares a random forest approach to dimensionality reduction against principal components analysis to determine which performs better.
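The PCA-then-classify workflow described above can be sketched end to end. The component counts and the logistic-regression classifier below are illustrative choices, not necessarily what the notebooks use:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classification accuracy as a function of the number of principal components
accs = {}
for n in (2, 8, 32):
    model = make_pipeline(PCA(n_components=n),
                          LogisticRegression(max_iter=2000))
    model.fit(X_train, y_train)
    accs[n] = model.score(X_test, y_test)
    print(n, round(accs[n], 3))
```

A handful of components already captures most of the structure, and accuracy climbs as more components are retained, which is the trade-off the notebook explores.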
Models:
Jupyter Notebook: Attempts to fit the abalone data set by modeling the system response as a linear function of the input variables.
Jupyter Notebook: Further attempts to fit the abalone data set, using higher-order models for the system response.
Jupyter Notebook: Builds a simple k-nearest neighbors classifier model to fit observed inputs to outputs.
Jupyter Notebook: Uses linear classifiers, the analogue of linear regression models for categorical rather than continuous data, to categorize abalones.
Jupyter Notebook: Utilizes a Gaussian process model (kriging) to fit observed input/output data.
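A minimal sketch of the linear-versus-higher-order comparison from the first two notebooks above; since the abalone CSV is not bundled here, the snippet substitutes synthetic data with a quadratic response as a stand-in:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
# Synthetic stand-in for the abalone measurements (the real set is from UCI)
X = rng.uniform(-1, 1, size=(500, 3))
y = 2.0 + X[:, 0] + 3.0 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# First-order model, then a second-order (quadratic) model of the response
linear = LinearRegression().fit(X, y)
quadratic = make_pipeline(PolynomialFeatures(degree=2),
                          LinearRegression()).fit(X, y)
print(f"linear R^2:    {linear.score(X, y):.3f}")
print(f"quadratic R^2: {quadratic.score(X, y):.3f}")
```

When the true response has curvature, the higher-order model's R^2 is markedly better than the linear fit's, which is the pattern these notebooks investigate on the abalone data.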
Exploring the Data:
Initial exploration of the abalone data set.
This notebook explores why linear and higher-order models fail to fit the data well. The reason? The data have high variance!
A notebook exploring the use of the covariance matrix and its eigenvalues and eigenvectors to extract principal components and visualize the results.
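The covariance/eigenvector route to principal components described above can be sketched directly in NumPy, here using scikit-learn's digits as an assumed stand-in for the data set:

```python
import numpy as np
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
Xc = X - X.mean(axis=0)                  # center each feature
cov = np.cov(Xc, rowvar=False)           # 64x64 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

# Project onto the two leading eigenvectors (the principal components)
scores = Xc @ eigvecs[:, -2:]
print(scores.shape)  # prints (1797, 2)
```

By construction, the sample variance along each projected axis equals the corresponding eigenvalue, which is what makes the eigenvalue spectrum a direct readout of how much variance each component captures.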