Course Motivation and Themes

— Christopher Genovese and Alex Reinhart

Motivation

Modern data analysis can be a complex business

Creating good software to manage this complexity has become an essential skill for statisticians.

Computing is taught at the margins in most statistics curricula

  • Typical statistical computing courses focus on the details of methods and algorithms for various concrete problems.

  • Students are expected to learn the practice of computing and software engineering organically during their research.

  • Typical feedback and incentives can obscure the benefits of building good software.

The organic development cycle for statistical software can limit correctness, clarity, and reusability

  1. Start with questions, ideas, objectives – sometimes incomplete
  2. Try many different approaches and methods
  3. The code starts out rough and quick, supporting these approaches.
    • Assumptions about the data and algorithm get baked in
    • Meaning and documentation are sparse
    • Structure and design are secondary to “getting it working”
    • There are working examples but few distinctive tests
  4. Once the paper is out, it’s time to move on…
    • Build on existing code base despite flaws
    • If it’s used, extensions build on top of that edifice, possibly for some time.
  5. Repeat

Building efficient, elegant, reusable software increases our productivity

Good software engineering emphasizes:

  • Managing complexity
  • Communicating clearly
  • Finding effective abstractions
  • Crafting well-chosen solutions to problems
  • Obtaining good performance, reuse, and generalizability

Sound familiar?

Programming well is lots of fun

Course Philosophy and Goals

Philosophy:

  • A broad and firm foundation in computing will pay off throughout your career
  • The way to get better at programming is to practice programming
  • Good software design and programming practice are skills every statistician needs
  • Revision is a critical part of the development process
  • Having at least a passing understanding of multiple languages will make you a better programmer

Goals: By the end of this course, you should be able to

  • develop correct, well-structured, and readable code;
  • design useful tests at all stages of development;
  • effectively use development tools such as editors/IDEs, debuggers, profilers, testing frameworks, and a version control system;
  • build a moderate-scale software system that is well-designed and that facilitates code reuse and generalization;
  • select algorithms and data structures for several common families of statistical and other problems;
  • write small programs in a new language.

Themes

Theme #1: Good programming practice

Theme #2: Efficient workflows

Theme #3: Good software design

Theme #4: Well-chosen representations, data structures, and algorithms

Theme #5: Killer Apps!

Meta-Theme: Manage Complexity!