Motivation
Modern data analysis can be a complex business
Creating good software to manage this complexity has become an essential skill for statisticians.
Computing is taught at the margins in most statistics curricula
- Typical statistical computing courses focus on the details of methods and algorithms for various concrete problems.
- Students are expected to learn the practice of computing and software engineering organically during their research.
- Typical feedback and incentives can obscure the benefits of building good software.
The organic development cycle for statistical software can limit correctness, clarity, and reusability.
- Start with questions, ideas, objectives – sometimes incomplete
- Try many different approaches and methods
- The code starts out rough and quick, supporting these approaches.
- Assumptions about the data and algorithm get baked in
- Meaning and documentation are sparse
- Structure and design are secondary to “getting it working”
- There are working examples but few systematic tests
- Once the paper is out, it’s time to move on…
- Build on the existing codebase despite its flaws
- If it’s used, extensions build on top of that edifice, possibly for some time.
- Repeat
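As a hypothetical illustration of how this cycle bakes in assumptions, here is a small Python sketch (the functions `zscores_quick` and `zscores` are invented for this example, not taken from any course material). The quick version works on the first dataset it meets but silently assumes there are no missing values and at least two non-constant observations; the revised version documents those assumptions, checks them, and comes with a small test.

```python
import math


# Rough, "getting it working" version: silently assumes a list of numbers
# with no missing values, at least two entries, and nonzero spread.
def zscores_quick(x):
    m = sum(x) / len(x)
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))
    return [(v - m) / s for v in x]


def zscores(x):
    """Standardize x to mean 0 and sample standard deviation 1.

    Missing values (None) are passed through unchanged and excluded from
    the mean and standard deviation. Raises ValueError if fewer than two
    non-missing values remain or if they are all equal.
    """
    observed = [v for v in x if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two non-missing values")
    m = sum(observed) / len(observed)
    s = math.sqrt(sum((v - m) ** 2 for v in observed) / (len(observed) - 1))
    if s == 0:
        raise ValueError("values are constant; standard deviation is zero")
    return [None if v is None else (v - m) / s for v in x]


def test_zscores():
    # Typical case: mean 2, sample standard deviation 1.
    z = zscores([1.0, 2.0, 3.0])
    assert all(math.isclose(a, b) for a, b in zip(z, [-1.0, 0.0, 1.0]))
    # Missing values are preserved in place.
    assert zscores([1.0, None, 3.0])[1] is None
    # Edge cases that the quick version would crash on or mishandle.
    for bad in ([], [5.0], [2.0, 2.0]):
        try:
            zscores(bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"expected ValueError for {bad}")


test_zscores()
```

Calling `zscores_quick([1.0, None, 3.0])` fails with a `TypeError` deep inside the arithmetic, and `zscores_quick([5.0])` fails with a `ZeroDivisionError`; neither message says which assumption about the data was violated. The revised version fails early, with a message that names the assumption.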
Building efficient, elegant, reusable software increases our productivity
Good software engineering emphasizes:
- Managing complexity
- Communicating clearly
- Finding effective abstractions
- Crafting well-chosen solutions to problems
- Obtaining good performance, reuse, and generalizability
Sound familiar?
Programming well is lots of fun
Course Philosophy and Goals
Philosophy:
- A broad and firm foundation in computing will pay off throughout your career
- The way to get better at programming is to practice programming
- Good software design and programming practice are skills every statistician needs
- Revision is a critical part of the development process
- Having at least a passing understanding of multiple languages will make you a better programmer
Goals: By the end of this course, you should be able to
- develop correct, well-structured, and readable code;
- design useful tests at all stages of development;
- effectively use development tools such as editors/IDEs, debuggers, profilers, testing frameworks, and a version control system;
- build a moderate scale software system that is well-designed and that facilitates code reuse and generalization;
- select algorithms and data structures for several common families of statistical and other problems;
- write small programs in a new language.