R for Data Analysis and Visualization
ECON 396 (Fall 2017)
TR 10:30-11:45, DURP Computer Lab (first floor Saunders)
Jonathan Page
2017-11-12
Syllabus
Office Hours
Monday 2-3 PM and Tuesday 3-4 PM, or by appointment, Saunders 509, jrpage at hawaii dot edu.
Student Learning Objectives
- To be familiar with standard techniques for visualizing data, including heat maps, contour plots, etc.
- To be able to transform raw data into formats suitable for analysis
- To be able to perform basic exploratory analysis
- To be able to create data visualizations in R
There is no prerequisite for this course.
Resources
Required
Introductory Statistics with Randomization and Simulation: Available as a free PDF (https://www.openintro.org/stat/textbook.php?stat_book=isrs) or for $8.49 on Amazon.
Course Requirements
Grades for this course will be based on weekly assignments (30%), project assignments (30%), the project proposal presentation (5%), the final project deliverable (20%), and final project presentation participation (15%).
Weekly assignments (30%)
Weekly assignments are short R exercises. Each exercise should take no longer than 15 minutes. You will typically be given time to complete the exercise in class the day the assignment is given. The assignment will be in the form of an R Markdown file (*.Rmd). You will submit the completed assignments via classroom.google.com by the following class period.
Individual Project
Project assignments (30%)
Each week leading up to the project proposal, you will be given an assignment designed to give you an organized workflow for approaching new data science projects. Project assignments are submitted via classroom.google.com, with the exception of the two presentations.
Project proposal presentation (5%)
This presentation should be less than 2 minutes. You simply need to communicate the core question your project seeks to answer and the dataset(s) you will be using to answer this question.
Final project (20%)
The final project will be an R Markdown document which communicates your project question, the data you used, and your results. You will need to deliver both your R Markdown file and any necessary data for running the file.
Final project presentation participation (15%)
Your final project participation grade is based on a combination of your own presentation and the feedback you provide to your classmates.
Schedule
The following schedule is tentative and subject to change. Typically, the Tuesday class will consist of the week’s R lecture. Depending on how quickly we get through the material, you may have time in class to work on the weekly assignment, which is due before the following class period. On Thursdays, we will discuss a relevant topic, and you should have time to work on your project assignment for the week. That assignment will generally be due before the following class period, except during the last several weeks, when you are completing your final project.
Week 1
- R Intro to R and RStudio; Histograms, scatterplots, summary statistics
- Data R Sample Datasets
- Topic Data sources overview
- Project Assignment Identify interesting datasets (include links to datasets) and questions
Week 2
- R read_csv, dplyr basics, heatmaps, hexbins
- Data ACS PUMS [CSV]
- Topic Anscombe’s Quartet
- Project Assignment Choose question and dataset (with link to your source) for your project
Week 3
- R ggplot facets, bubble plots, transparency
- Data Hawaii Tourism Authority [Excel]
- Topic Probability
- Project Assignment Write description of your question
Week 4
- R geom_smooth, abline, vline, hline
- Data State of Hawaii Department of Business, Economic Development and Tourism (DBEDT) [Excel]
- Topic Distributions
- Project Assignment Write description of your dataset(s)
Week 5
- R ggplot2 Extensions and Scatterplot Matrices (GGally)
- Data ACS Immigration [CSV]
- Topic JunkCharts Trifecta Checkup
- Project Assignment Create 2 descriptive plots of your dataset(s)
Week 6
- R Boxplots, violin plots
- Data SSA [Excel]
- Topic Intro to Inference
- Project Assignment Write a description of the data cleaning required for your project
Week 7
- R Spatial Visualizations with geom_spoke, gganimate, and GGally::glyphs
- Data NOAA Wind [netCDF]
- Topic Confidence Interval
- Project Assignment Write a description of your planned approach
Week 8
- R geom_area, geom_ribbon
- Data BLS American Time Use Survey (ATUS) [TSV]
- Topic Project Proposal Description
- Project Assignment Work on project proposal presentation
Week 9
- R jitter, rug, aesthetics
- Data PSID [SPS, TXT (Fixed-Width)]
- Topic Present project proposal (<2 Minutes)
Week 10
- R Themes, Labels, and Colors
- Data Zillow Age of Real Estate Inventory Data [CSV]
- Topic Inference for Numerical Data
- Project Assignment Work on final project
Week 11
- R Polar Coordinates
- Data University of Michigan - Survey of Consumers [CSV]
- Topic Inference for Categorical Data
- Project Assignment Work on final project (cont.)
Week 12
- R Text Analysis (Natural Language Processing)
- Data Twitter [twitteR API]
- Topic Linear Regression
- Project Assignment Work on final project (cont.)
Week 13
- R networks, geomnet extension
- Data UN Comtrade [CSV]
- Topic Multiple Regression
- Project Assignment Work on final project (cont.)
Week 14
- R Log Scales
- Data IRS Statistics of Income [CSV]
- Topic Putting your work online
- Project Assignment Work on final project (cont.)
Week 15
- R Cross-Section Modeling
- Data NSF National Survey of College Graduates [DAT (Fixed-Width)]
- R Time-Series Modeling
Week 16
- Final Project presentations
Other Resources
There are many useful resources you should be aware of while going through this course. I will attempt to keep this list updated as I become aware of more useful links:
RStudio’s List of Useful R Packages
Courses
Gary King - Quantitative Research Methodology
John Stasko - Information Visualization
Jenny Bryan - Data wrangling, exploration, and analysis with R