R for Data Analysis and Visualization

Data Sources Overview

While you can find many data sources by searching for "public data sources" in your favorite search engine, the following lists should help you get started.

Macro Data

US

  • Federal Reserve Economic Data (FRED)
  • Bureau of Labor Statistics
  • Bureau of Economic Analysis
  • National Bureau of Economic Research
  • Congressional Budget Office: Budget and Economic Data
  • American FactFinder (American Community Survey, Census Summary Files, etc.; retired in 2020, with its data now available through data.census.gov)

Other US

  • The Conference Board (includes the Consumer Confidence Index)
  • University of Michigan Surveys of Consumers
  • Historical Exchange Rates
  • Centers for Medicare and Medicaid Services

Micro Data

  • Panel Study of Income Dynamics
  • IPUMS (census and survey data)
  • US Census Public Use Microdata Sample (PUMS)
  • Centers for Medicare and Medicaid Services

Hawaii Data

  • State of Hawaii Department of Business, Economic Development and Tourism (DBEDT)

Collections of data lists

  • American Economic Association (AEA) list of data sources