Python - Multivariable Data Analysis
This is a demo on how to use Python to conduct statistical analysis on a multivariable data set. ISB microbiologist Alex Carr recreated some of the Python script he uses to analyze his bacterial samples and applied it to the publicly available iris data set, which is commonly used in statistics demonstrations. The iris data set contains measurements of petal width, petal length, sepal width, and sepal length for three different iris flower types: Setosa, Versicolour, and Virginica. Measurements were taken for 50 flowers of each type, 150 flowers in total.
The demo script shows how to use data imported in two different ways: from a Python package (scikit-learn) or from a .csv file on your computer. The demo then walks you through data normalization, linear regression, PCA, heatmaps, and K-means clustering.
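As a rough illustration of the two import routes, here is a minimal sketch. It is not the demo script itself. The file name iris_demo.csv and the variable names are made up for this example, and the .csv file is written to a temporary folder first so that the sketch runs on its own; in practice you would point csv_path at your own file.

```python
import csv
import os
import tempfile

import numpy as np
from sklearn.datasets import load_iris

# Route 1: load the copy of the iris data set bundled with scikit-learn.
iris = load_iris()
X = iris.data                        # shape (150, 4): one row per flower
y = iris.target                      # species coded as 0, 1, 2
feature_names = iris.feature_names   # list of the four column names

# Route 2: load the data from a .csv file.
# Assumption: "iris_demo.csv" is a hypothetical file name. We create the file
# here from the scikit-learn data only so the example is self-contained.
csv_path = os.path.join(tempfile.mkdtemp(), "iris_demo.csv")
with open(csv_path, "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(feature_names + ["species"])
    for row, label in zip(X, y):
        writer.writerow(row.tolist() + [str(iris.target_names[label])])

# Read the file back: first line is the header, the rest are measurements.
with open(csv_path, newline="") as fh:
    reader = csv.reader(fh)
    header = next(reader)
    rows = list(reader)

# The last column holds the species name; the others are numeric.
X_csv = np.array([[float(value) for value in r[:-1]] for r in rows])
species_csv = [r[-1] for r in rows]

print(header)
print(X_csv.shape, sorted(set(species_csv)))
```

Either route ends with the same table of numbers: one row per flower and one column per measurement. Many people read .csv files with the pandas package instead; the standard csv module is used here only to keep the dependencies small.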
Data normalization: rescaling your data so that variables measured on different scales can be compared
Linear regression: modeling a relationship between two variables
PCA plot: like a typical linear regression plot, a PCA plot portrays data points in 2-D space; however, it draws on more than two variables, combining them into two principal components
Heatmap: values are assigned to a color scale to help visually portray patterns across a lot of data
K-means clustering: grouping data points into k clusters by assigning each point to the cluster whose mean (centroid) is nearest
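The five steps listed above can be sketched with scikit-learn and NumPy as follows. This is an illustrative outline under stated assumptions, not Alex Carr's script: normalization is taken to mean z-scoring (mean 0, standard deviation 1), the regression pairs petal length with petal width, the heatmap step only builds a correlation matrix that a plotting library could color, and the choice of three clusters and all variable names are ours.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data  # columns: sepal length, sepal width, petal length, petal width

# 1. Normalization (assumed here to mean z-scoring): rescale every column
#    to mean 0 and standard deviation 1 so no measurement dominates.
X_scaled = StandardScaler().fit_transform(X)

# 2. Linear regression: model petal width as a straight-line function
#    of petal length (this pairing is our choice for the example).
petal_length = X[:, [2]]   # kept 2-D because scikit-learn expects a table
petal_width = X[:, 3]
reg = LinearRegression().fit(petal_length, petal_width)
slope = reg.coef_[0]
intercept = reg.intercept_
r_squared = reg.score(petal_length, petal_width)

# 3. PCA: combine the four normalized measurements into two principal
#    components, giving each flower an (x, y) position for a 2-D plot.
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_

# 4. Heatmap input: a 4 x 4 matrix of correlations between measurements.
#    A plotting library could map these values to a color scale.
corr = np.corrcoef(X_scaled, rowvar=False)

# 5. K-means clustering: split the flowers into 3 groups, each flower
#    joining the cluster whose mean (centroid) is nearest.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_scaled)

print("slope:", slope, "intercept:", intercept, "R^2:", r_squared)
print("variance explained by 2 components:", explained.sum())
print("cluster sizes:", np.bincount(cluster_labels))
```

K-means never sees the species names, so comparing cluster_labels with iris.target is one way to check how well the clusters recover the three flower types.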
Funding to support the development of this lesson was provided by the National Science Foundation Award DBI-1565166 & 0640950. The content of these pages was created by students for students with the help of teachers and scientists. The views expressed herein are those of the authors and do not necessarily reflect the views of NSF or ISB.