June 5th, 2015

Putting this Talk in Context

Past INRUG Meetings

  • reproducibility (rmarkdown)
  • visualization (ggplot2 and shiny)
  • data cleaning and wrangling (tidyr and dplyr)

A Few Remarks for Today

  • we'll talk about teaching statistics (theory), not data science
  • national recommendations for statistics pedagogy call for a tool such as R
  • theory can't be compromised for neat tricks in R

Recent Efforts

  • make R more accessible for intro stats courses
  • include visualization and simulation to complement an intro stats course
  • incorporate aspects of Data Science in an intro stats course

Who Am I?

My Training

  • applied mathematician and probabilist, but not really a statistician…
  • enthusiastic user of R for ~1-2 years

Courses I Teach

  • calculus sequence, differential equations, …, linear algebra, and statistics
    • MATH 121: introductory statistics
    • MATH 321: calculus-based introductory statistics
    • MATH 421/422: proof-based probability and statistics

Why I'm Partial to Using Mathematical/Statistical Software

  • computational thinking and literacy
  • provides a mechanism for inquiry via exploration, experimentation, and discovery
  • move beyond the (often tedious) formulas and grapple with the more interesting questions
  • appreciate the increasing role that computation plays in mathematics
    • software is no replacement for understanding the concepts behind the calculations

Recent R-related Activities

Recommendations for Statistics Pedagogy

GAISE Recommendations:

  1. emphasize statistical literacy and develop statistical thinking
  2. use real data
  3. use technology for developing concepts and analyzing data
    • "to the maximum extent feasible, calculations and graphics should be automated."
  4. stress conceptual understanding, rather than mere knowledge of procedures

\(\star\) read more at http://www.amstat.org/education/gaise/

ASA Curriculum Guidelines:

  1. increased importance of data science
  2. real applications
  3. more diverse models and approaches
  4. ability to communicate

\(\star\) read more at http://www.amstat.org/education/curriculumguidelines.cfm

Technology and Software

Some sort of software is necessary, but which and how?

  • instructor demonstrations vs. hands-on experience?
  • possible software: RStudio, Minitab, SPSS, SAS/JMP, Deducer/Rcmdr, StatCrunch, etc.
  • online applets: StatKey, Rossman-Chance, etc.


R is…

  • "R is a free software environment for statistical computing and graphics.
    It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS"
    • The R Project Organization
  • "R is a data analysis software; R is a programming language; R is an
    environment for statistical analysis; R is open-source; R is a community"
  • "R is extremely powerful"
    • Revolution Analytics

So, Why R?

Why R?

  • free and open source
    • SAS, SPSS, Maple, Mathematica, MATLAB, etc. are expensive
    • valuable and employable skill (accessible after graduation)
  • R is growing rapidly in popularity
  • students won't outgrow it (it will grow with them)
  • instructional data is pre-loaded
  • R provides a gentle introduction to coding, computation, and \(\LaTeX\)
  • reproducible rather than point, drag, click, copy/paste, cross-your-fingers, …
  • high-quality graphics
  • clear and intuitive workflow

How Have I Incorporated R?

A Tale of Two Semesters

Fall 2014

  • R via SageMathCloud (not RStudio)
  • regular in-class demonstrations
    • R code embedded into \(\LaTeX\)
  • group writing projects (used R outside of class)
  • in-class labs (scheduled on the fly)

Spring 2015

  • R via RStudio (limited access outside of class)
  • regular in-class demonstrations
    • notes in R Markdown; shiny applications
  • no group projects
  • in-class labs (scheduled from day 1)

Lecture

R is used in some fashion for ~75% of my lectures

  • planned demonstrations (static and/or interactive)
  • simulations and graphics on the fly
  • post notes online in HTML format

Examples of Notes and R Code

  • permutation hypothesis tests
  • continuous RVs
  • simple linear regression
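
As a rough illustration of the first topic, here is a minimal permutation test in base R. It is a sketch, not the code from the course notes; the ToothGrowth dataset, the seed, and the 5000 resamples are assumptions made for the example.

```r
# Permutation test for a difference in group means (base R only).
# ToothGrowth ships with R; the dataset and the number of resamples
# are illustrative choices, not taken from the course notes.
set.seed(2015)

len  <- ToothGrowth$len
supp <- ToothGrowth$supp   # factor with levels "OJ" and "VC"

# Difference in mean tooth length (OJ minus VC)
diff_means <- function(y, g) mean(y[g == "OJ"]) - mean(y[g == "VC"])
observed <- diff_means(len, supp)

# Shuffle the group labels to mimic the null hypothesis of no difference
perm_diffs <- replicate(5000, diff_means(len, sample(supp)))

# Two-sided p-value: share of shuffled differences at least as extreme
p_value <- mean(abs(perm_diffs) >= abs(observed))
p_value
```

The shuffling step is the whole idea: if the supplement made no difference, the labels would be interchangeable, so the observed difference should look typical among the shuffled ones.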

Examples of shiny Applications

  • confidence intervals
  • Type I and II errors
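
The idea behind the confidence-interval demonstration can also be sketched without shiny. The simulation below is an assumption-laden stand-in, not the application itself: the population (normal, mean 10, sd 2), the sample size of 25, and the 2000 repetitions are hypothetical values.

```r
# Simulate how often a 95% t-interval captures the true mean (base R only).
# All numeric settings here are hypothetical, chosen for illustration.
set.seed(2015)
true_mean <- 10
n_reps    <- 2000

covers <- replicate(n_reps, {
  x  <- rnorm(25, mean = true_mean, sd = 2)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that contained the true mean
coverage <- mean(covers)
coverage
```

The proportion of intervals that miss, 1 - coverage, estimates the Type I error rate of the matching two-sided t-test, which ties the two demonstrations together.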

Group Writing Projects

In-depth exploration of a concept

  • groups are often asked to play the role of a consulting company
    • inspired by Gavin Larose (Univ. of Michigan)
  • use R outside of class; wrestle with R on their own
  • made R relevant and important
  • formed collaborative groups (and aided rapport)
  • students are given the .Rmd file to edit
    • they begin to take pride in producing high-quality documents

Examples

  • Descriptive Statistics
  • Airline Overbooking
  • Planning a Picnic

Labs

In-class labs are held in College 243

  • focused on discovery and inquiry
    • rather than learn concepts and R at the same time
  • collaborative teams encouraged
  • use R while exploring material from a different perspective
  • bimodal response from students (useful vs. waste of time)

Example

  • drug efficacy via sleep dataset

Example (drug efficacy via sleep dataset)

With provided code, students create plots similar to these:
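
The original plots are not reproduced here. As a stand-in, the sketch below shows one way such plots could be drawn in base R from the built-in sleep data; it is not the code handed to students.

```r
# Illustrative exploration of the built-in sleep data (base R only).
data(sleep)

# Side-by-side boxplots of extra sleep (hours) under each drug
boxplot(extra ~ group, data = sleep,
        xlab = "Drug", ylab = "Extra sleep (hours)")

# Each of the 10 subjects took both drugs (rows are ordered by ID within
# group), so the within-subject differences are the quantity of interest
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
diffs <- drug2 - drug1

# Dot plot of the differences with a reference line at "no difference"
stripchart(diffs, method = "stack", pch = 16,
           xlab = "Difference in extra sleep (drug 2 - drug 1)")
abline(v = 0, lty = 2)

mean(diffs)
```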

Next time, I'll have students explore with mplot instead.

Project MOSAIC (http://mosaic-web.org/)

A predominant philosophy: Less Volume, More Creativity

  • xpnorm, xqnorm, xpt, xqt, favstats, tally, do, etc. are functions built into the mosaic package
  • mplot, mScatter, mWrangle, mBar, etc. are functions that utilize the manipulate package

Freely available books:

  • A Compendium of Commands to Teach Statistics with R
  • Start Teaching Statistics Using R
  • Start R in Calculus

Other instructional material:

"Less Volume, More Creativity"

  • fewer R commands, but each with more flexibility and power
  • provide enough commands so that students can do something, but not so many that it is overwhelming
  • avoid programming aspects (if, for, etc.)
  • give students the bare minimum and make them "beg" for more
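
One way to picture "fewer commands, more flexibility" is a single formula template reused across tasks. The sketch below uses base R only, with ToothGrowth as a convenient built-in dataset (an assumption for illustration); mosaic extends the same formula pattern to functions such as mean() and favstats().

```r
# One formula, three tasks (base R only).

# Numerical summary: mean tooth length for each supplement
group_means <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)
group_means

# Graphical summary: same formula, different verb
boxplot(len ~ supp, data = ToothGrowth)

# Inference: same formula again
t.test(len ~ supp, data = ToothGrowth)
```

Students only need to learn the `y ~ x, data = ...` shape once; after that, changing the verb changes the task.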

A Quote to Live By

  • "if R is the hardest thing in the course, then the R is too hard and your questions are too easy"
    • R. Pruim (2015)

Useful Educational Tools in a Statistics Class

mplot

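  • a powerful way both to explore data and to learn ggplot2
require(mosaic)      # provides mplot()
require(manipulate)  # interactive controls; works inside RStudio only
mplot(ToothGrowth)   # opens an interactive plot builder for the data frame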

mosaic and manipulate

xqnorm(0.975, mean = 5, sd = 7)
manipulate(xqnorm(alpha, verbose = FALSE), alpha = slider(0.001, 0.999, step = 0.001))
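
For readers without mosaic installed, the quantile reported by the first xqnorm call above can be approximated in base R. This is only a sketch of the computation and a rough imitation of the shaded plot, not a reimplementation of xqnorm.

```r
# 97.5th percentile of a normal distribution with mean 5 and sd 7
q <- qnorm(0.975, mean = 5, sd = 7)
q

# Check: the area to the left of q is 0.975
pnorm(q, mean = 5, sd = 7)

# Draw the density and shade the area to the left of q
x <- seq(5 - 4 * 7, 5 + 4 * 7, length.out = 400)
plot(x, dnorm(x, mean = 5, sd = 7), type = "l", ylab = "density")
shade <- x[x <= q]
polygon(c(shade, max(shade), min(shade)),
        c(dnorm(shade, mean = 5, sd = 7), 0, 0),
        col = "grey80")
```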

Useful Educational Tools in a Statistics Class

mWrangle

  • several Data Science courses are popping up around the country and spend ~1/3 to ~1/2 of the time on data wrangling
  • how to easily include it in an introductory course?
require(DCF)
require(DCFinteractive)
require(mosaicData)
mWrangle(KidsFeet)

How Has it all Worked?

Enhancement or Distraction?

Struggles and Surprises

Selling R To Students

  • R can be tough to sell (it's free, remember?)
  • exposure to R seems conducive to learning and appears to have a positive impact on retention
    • but it may not help a student immediately, e.g., on the next exam
  • some students have really invested themselves in learning R

Lab Space

  • the mathematics department has no lab space; A&S has limited lab space
    • a 10 person seat license for Mathematica
  • RStudio on a server, personal machines, and SageMathCloud are each options

R Changes Things

MATH 321 with R (i.e., students using R) is fundamentally different from MATH 321 without R

How much math? How much data? How much R?

  • the "correct" balance changes from MATH 121 to MATH 321 to MATH 421

Bigger Picture and Future Directions

Keep Learning R

  • based on the GAISE recommendations (and others), software is necessary
    • R is one viable (and good) option
  • more deeply connects students to important concepts
  • provides a computational tool for exploration
  • fundamentally changed my course organization and structure
    • ease of creating lecture notes and interactive demonstrations

Possible Directions

  • interactive webpages
  • partially flipped classroom
    • expose students to software before coming to class
  • more writing projects and fewer exams
  • randomized notes, examples, HWs, textbooks, etc.

Ending with More Questions than Answers

SoTL Questions Regarding R

  • how best to deliver the material to students?
    • labs, demonstrations, and/or group projects?
    • potential for pseudo-flipped classroom
  • how does using R impact learning?
  • how does using R impact attitude/interest?
  • how should testing be done?
    • with or without R?
  • this is a math class about the study of data
    • does incorporating data help or hurt the learning of the material?
    • showing neat tricks in R is not a replacement for learning the (mathematical) content


How Well Would R Work in a Calculus Course?

Acknowledgements and Thanks

  • Inland Northwest R Users Group
  • Project NExT cohort and Isolated Statisticians
  • A&S for funding to participate in Project NExT
  • CTA for funding to attend USCOTS
  • Mathematics Department for their support