Using Machine Learning for Exploratory Data Analysis

Abstract

This tutorial introduces attendees to fundamental concepts in two areas of unsupervised machine learning: clustering and dimensionality reduction. Attendees will learn what assumptions these algorithms make about the data and how those assumptions make a given algorithm more or less suited to a particular dataset. Hands-on work with machine learning algorithms on real and synthetic data is a central component of this tutorial. Attendees will use the software platform Divvy (freely available from the Mac App Store or divvy.ucsd.edu) to visualize and analyze data in real time while testing the concepts covered during formal instruction. We encourage attendees to bring their Mac laptops and their own datasets for the hands-on portion of the tutorial and, if possible, to email their datasets ahead of time to josh@cogsci.ucsd.edu. Attendees will leave the tutorial with a clearer understanding of the basic concepts of unsupervised machine learning. In practical terms, they will know when to apply, for example, k-means rather than single-linkage clustering to a dataset. Attendees will also learn how to integrate Divvy into their existing research workflow so that they can quickly test and compare machine learning algorithms on their own data.
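The contrast between k-means and single-linkage clustering comes down to their assumptions. k-means assumes compact, convex clusters grouped around a centroid. Single linkage instead chains together points that lie close to a near neighbor. The following is a minimal, dependency-free Python sketch that runs both algorithms on a synthetic dataset of two concentric rings. It is not part of Divvy and does not reflect how Divvy implements these algorithms. The dataset and the function names (`make_rings`, `kmeans`, `single_linkage`, `same_partition`) are illustrative assumptions.

```python
import math

def make_rings(n_per_ring=40, radii=(1.0, 5.0)):
    """Hypothetical synthetic data: points evenly spaced on two concentric circles."""
    points, truth = [], []
    for label, r in enumerate(radii):
        for i in range(n_per_ring):
            theta = 2 * math.pi * i / n_per_ring
            points.append((r * math.cos(theta), r * math.sin(theta)))
            truth.append(label)
    return points, truth

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a simple deterministic initialization."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def single_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest members are nearest, until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def same_partition(a, b):
    """True if two labelings group the points identically, ignoring label names."""
    pairs = set(zip(a, b))
    return len(pairs) == len(set(a)) == len(set(b))

points, truth = make_rings()
km = kmeans(points, 2)
sl = single_linkage(points, 2)
print("k-means recovers rings:", same_partition(km, truth))         # False
print("single linkage recovers rings:", same_partition(sl, truth))  # True
```

k-means cannot separate the rings. With two centroids it splits the plane along a straight line, and the inner ring lies inside the outer ring's convex hull. Single linkage succeeds because the gap between the rings (4 units) is much larger than the spacing between neighboring points on either ring. On two well-separated, compact blobs, k-means would be the more natural choice.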
