Jupyter notebooks for machine learning.
About
Neural Networks:
This notebook builds a neural network with a convolutional architecture, a design inspired by the mammalian visual system. The network is trained on a set of handwritten digits, then validated on held-out test data, where it reaches an accuracy above 95%.
A notebook using a neural network that narrows in the middle to learn a reduced-dimensional representation of the original data set. The network projects the original data into the lower-dimensional subspace, and also back-projects points from the low-dimensional space into high-dimensional, 8x8 images. This lets us traverse the "handwriting manifold" the network constructs and explore how it has learned to recognize handwriting.
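As a rough sketch of the classification task above (not the notebooks' actual convolutional architecture), a small fully-connected network can be trained on the same 8x8 digits. The scikit-learn dataset, layer size, and train/test split below are illustrative assumptions:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# The 8x8 handwritten digits bundled with scikit-learn
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A small fully-connected network; the notebook's CNN is more elaborate
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

Even this simple stand-in comfortably clears 90% test accuracy on the scikit-learn digits, which is why the data set is a good sandbox for the architectures explored here.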
Dimensionality Reduction:
Utilize principal components analysis to analyze the dimensionality of the handwritten digits data set and find a reduced-dimensional manifold that makes modeling easier.
Build on the PCA work from the prior notebook, using the reduced manifold to classify handwritten digits. Explore the impact of the number of principal components on classification accuracy.
This notebook compares a random forest approach to dimensionality reduction against principal components analysis to determine which performs better.
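The PCA-then-classify workflow described above can be sketched end to end. The component counts and the logistic-regression classifier below are illustrative choices, not necessarily what the notebooks use:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classification accuracy as a function of the number of principal components
accs = {}
for n in (2, 8, 32):
    model = make_pipeline(PCA(n_components=n),
                          LogisticRegression(max_iter=2000))
    model.fit(X_train, y_train)
    accs[n] = model.score(X_test, y_test)
    print(n, round(accs[n], 3))
```

A handful of components already captures most of the structure, and accuracy climbs as more components are retained, which is the trade-off the notebook explores.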
Models:
Jupyter Notebook: Attempts to fit the abalone data set by modeling the system response as a linear function of the input variables.
Jupyter Notebook: Further attempts to fit the abalone data set, using higher-order models for the system response.
Jupyter Notebook: Builds a simple k-nearest neighbors classifier model to fit observed inputs to outputs.
Jupyter Notebook: Uses linear classifiers, the analogue of linear regression models for categorical rather than continuous data, to categorize abalones.
Jupyter Notebook: Utilizes a Gaussian process model (kriging) to fit observed input/output data.
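A minimal sketch of the linear-versus-higher-order comparison from the first two notebooks above; since the abalone CSV is not bundled here, the snippet substitutes synthetic data with a quadratic response as a stand-in:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
# Synthetic stand-in for the abalone measurements (the real set is from UCI)
X = rng.uniform(-1, 1, size=(500, 3))
y = 2.0 + X[:, 0] + 3.0 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# First-order model, then a second-order (quadratic) model of the response
linear = LinearRegression().fit(X, y)
quadratic = make_pipeline(PolynomialFeatures(degree=2),
                          LinearRegression()).fit(X, y)
print(f"linear R^2:    {linear.score(X, y):.3f}")
print(f"quadratic R^2: {quadratic.score(X, y):.3f}")
```

When the true response has curvature, the higher-order model's R^2 is markedly better than the linear fit's, which is the pattern these notebooks investigate on the abalone data.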
Exploring the Data:
Initial exploration of the abalone data set.
This notebook explores why linear and higher-order models fail to fit the data well. The reason? The data have high variance!
A notebook exploring the use of the covariance matrix and its eigenvalues and eigenvectors to extract principal components and visualize the results.
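The covariance/eigenvector route to principal components described above can be sketched directly in NumPy, here using scikit-learn's digits as an assumed stand-in for the data set:

```python
import numpy as np
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
Xc = X - X.mean(axis=0)                  # center each feature
cov = np.cov(Xc, rowvar=False)           # 64x64 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

# Project onto the two leading eigenvectors (the principal components)
scores = Xc @ eigvecs[:, -2:]
print(scores.shape)  # prints (1797, 2)
```

By construction, the sample variance along each projected axis equals the corresponding eigenvalue, which is what makes the eigenvalue spectrum a direct readout of how much variance each component captures.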