Practices (Week 5 Thursday)

— Christopher Genovese and Alex Reinhart

Announcements #

Plan for Today #

  • Debrief on Tuesday’s Activity
  • Challenge Discussion and Schedule Revision
  • Best Practices with activity (hopefully)

Debrief #

  • What was most challenging?
  • Concrete examples as a tool (more than just tests)
  • Design choices: object/function, immutable/mutable, representation?
  • Benefits of recursive thinking
  • Utility of the structure

Challenges #

There are two challenges to choose from; new versions of these have been uploaded into the problem-bank repository.

  • Classification Trees: Build binary classifiers using trees to partition covariate space
  • Shazam: Identify the source of audio snippets using hashing and spectral analysis

Phase 1: Design #

Goal: Think carefully – and specifically – about the structure of your code.

  • Identify Data Entities and their Relationships
  • Make some major structural decisions
  • Familiarize yourself with needed libraries or methods
  • Identify key layers, concerns, and abstractions
  • Interface specification (implementation of this should be fast)
  • Tests

This is an initial framework, intended to make your thoughts concrete and give you an infrastructure for building your implementation. It is ok if your plans and designs change as you proceed. Make it an initial attempt rather than a rough draft.

On Tuesday, we will have a group design activity focused on the challenges that will help you move forward. You should familiarize yourself with the challenge material beforehand.

Revised Schedule #

Challenge Part    Deadline
1                 October 13
2                 October 26
3                 November 23
Final Revision    December 13

Best Practices #

Programming is a form of communication to two audiences: the computer and human readers (including future you).

As long as your code is syntactically correct, the computer will run it, for better or worse. But your code will be checked, studied, tested, documented, debugged, used, modified, generalized, and reused by humans. To get the most value from your time spent programming, you need to pay attention to how humans process your code.

Indeed, the features that characterize good code are very similar in spirit to the features that characterize good writing.

There are many details to manage in a complex piece of code, and consequently there are many detailed choices to consider in practice. These include matters of

  • style
  • naming
  • documentation
  • organization
  • design
  • dependence
  • error handling
  • tooling

See the rubric or the book Code Complete by Steve McConnell. These are worth reading and studying.

But for our purposes today, we can cover a lot of ground with only a few basic principles.

Write code to be read #

Code communicates ideas and describes abstractions, often complicated ones. Its execution is like an unfolding story, with characters traveling along their own narrative arcs.

Try to maximize the ease and clarity with which the reader can process the code. Help them to

  • understand the ideas/abstractions behind the code
  • identify the characters/entities involved, and
  • follow the story.

A good principle for writing/presentation: prepare your reader for the information you are about to give.

Here are a few implications of this principle:

  • Format your code to make it easy to read
  • Use meaningful, concrete, and descriptive names
  • Arrange your code to bring out the central idea in each chunk
  • Make critical relationships salient
  • Structure your interfaces to present a clean and consistent abstraction
  • Avoid hidden side effects and obscure features
  • Use documentation to supplement the code, not mimic it

On documentation #

Give readers an entry point for understanding how the code is used, how data flows, et cetera. Examples and description can help, even if brief. Section labels and pointers to entry points can be helpful. Tests are a form of documentation too.

Use docstrings/structured comments for nontrivial functions.

Code conveys the details (the how) well, but not as easily the why or the when. Comments help there.

If you write a clever piece of code, first ask if you need to be clever and if so, consider documenting the goals of the code, constraints, reasons, etc.
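As a minimal sketch of these points in Python: the function `trimmed_mean` and its parameter `trim_fraction` are hypothetical, chosen only for illustration. The docstring states how the function is used and gives a short example, while the comment records a why that the code alone cannot express.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Return the mean of `values` after dropping extreme observations.

    Sorts the data and discards the lowest and highest `trim_fraction`
    of the observations (rounded down) before averaging.

    Example:
        >>> trimmed_mean([1, 2, 3, 4, 100], trim_fraction=0.2)
        3.0
    """
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    # Why: we drop the same number of points from each end so that the
    # trimming itself does not pull the estimate toward either tail.
    n_trim = int(len(ordered) * trim_fraction)
    kept = ordered[n_trim:len(ordered) - n_trim]
    return sum(kept) / len(kept)
```

The docstring example doubles as a test: Python's `doctest` module can run it.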

Task for sharpening: Go to an open-source repository that interests you and read the code. What works for you? What doesn’t? What makes you work?

Be consistent #

A foolish consistency may be the hobgoblin of little minds, but for programming, a practical consistency is helpful in many ways.

Here are a few implications of this principle.

  • Use consistent formatting, spacing, and style

  • Use consistent naming schemes for variables, functions, classes, and files (for example: CamelCase, sausage-case, snake_case, ALL_CAPS)

  • Use consistent documentation formatting, style, and scope

  • Use consistent interfaces to functions and classes

  • Use consistent error handling

Many conventions for naming, formatting, spacing, etc. are included in style guides used by projects or programming languages. For example, PEP 8 describes naming and formatting conventions for Python code, and your code will be expected to follow it. (PEP 8 is unusual because nearly every major Python project uses it.) R has a lot of historical cruft that means nobody uses the exact same style, but the tidyverse style guide is a good reference. Read these guides!
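To make this concrete, here is a small sketch of PEP 8's naming conventions applied consistently. The names `DEFAULT_START`, `RunningMean`, `add_value`, and `current_mean` are made up for this example.

```python
DEFAULT_START = 0.0               # module-level constant: ALL_CAPS


class RunningMean:                # class name: CamelCase
    """Track the mean of a stream of numbers."""

    def __init__(self, start=DEFAULT_START):
        self.count = 0            # attributes and variables: snake_case
        self.total = start

    def add_value(self, value):   # functions and methods: snake_case
        self.count += 1
        self.total += value

    def current_mean(self):
        return self.total / self.count


running_mean = RunningMean()
for observation in [2.0, 4.0, 6.0]:
    running_mean.add_value(observation)
```

Because every name follows the same scheme, a reader can tell what kind of thing a name refers to from its shape alone.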

Don’t Repeat Yourself #

Seriously, don’t repeat yourself. It’s inefficient to repeat yourself. So don’t do it. Really.

Keep your code DRY! (Not WET – wasting everyone’s time!)

Each piece of knowledge embodied in the code should have one unambiguous and authoritative representation.

Here are a few implications of this principle.

  • If you find yourself repeating a piece of code, put it in a function.
  • If you find yourself using a number or other literal, make it a named constant. (A few basic cases, such as 0 and 1, are exceptions.)
  • Documentation should not merely repeat what the code does but should add value. For instance: why, who, when?
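The first two implications can be sketched as follows. This is an illustration under assumptions, not a prescribed design: the constant `OUTLIER_Z_THRESHOLD`, the functions `z_scores` and `outliers`, and the data are all hypothetical.

```python
import statistics

# Hypothetical named constant: the cutoff lives in exactly one place.
OUTLIER_Z_THRESHOLD = 3.0


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    return [(v - center) / spread for v in values]


def outliers(values, threshold=OUTLIER_Z_THRESHOLD):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


heights = [150, 160, 170, 180, 190]
weights = [50, 60, 70, 80, 300]

# One function serves both data sets; no copy-pasted arithmetic.
heights_z = z_scores(heights)
weights_z = z_scores(weights)
```

If the definition of standardization changes (say, to use a robust spread), there is one place to edit, and every caller picks up the change.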

It’s easier to chew small pieces #

Any stretch of code focuses on a few key ideas. Organize your code to bring out one idea at a time, rearranging as needed.

  • Organize your code modularly (paragraphs, functions, files)
  • Prefer functions that do one thing well
  • Prefer orthogonality (decoupling)
  • Prefer functions/classes/modules with a distinct purpose and identity

Coupling: Consider how a change in your code/design/interface/… will cascade through your whole codebase.
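A rough sketch of what "one thing well" can look like: three small functions, each with a single purpose, composed at the end. The function names and the input string are hypothetical.

```python
def parse_numbers(text):
    """Turn comma-separated text into a list of floats."""
    return [float(field) for field in text.split(",") if field.strip()]


def summarize(numbers):
    """Compute summary statistics; knows nothing about text or display."""
    return {"n": len(numbers), "mean": sum(numbers) / len(numbers)}


def format_summary(summary):
    """Render a summary for display; knows nothing about how it was computed."""
    return f"n={summary['n']}, mean={summary['mean']:.2f}"


report = format_summary(summarize(parse_numbers("1, 2, 3, 6")))
```

Changing the input format touches only `parse_numbers`, and changing the display touches only `format_summary`. That is the sense in which the pieces are decoupled.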

Keep the contract clear #

Each function or class has an explicit contract behind it. “I give you this, you give me that.”

Make that contract salient in your code, your names, your tests, and your documentation.

An implication: separate calculations, actions, and data. (Referential transparency is a good goal in any paradigm.)

An idea we will discuss: using language features to enforce this contract (from types to assertions to explicit pre/post conditions).
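As one possible sketch of these ideas in Python, type hints and assertions can state the contract, with the calculation kept apart from the action. The functions `recenter` and `print_mean` are hypothetical, and assertions are only one of several ways to express pre- and postconditions.

```python
from typing import List, Sequence


def recenter(values: Sequence[float], target: float) -> List[float]:
    """Calculation: return a new list shifted so that its mean is `target`.

    Precondition: `values` is non-empty.
    Postcondition: the result has the same length as `values`.
    The input is never modified.
    """
    assert len(values) > 0, "values must be non-empty"
    offset = target - sum(values) / len(values)
    shifted = [v + offset for v in values]
    assert len(shifted) == len(values)
    return shifted


def print_mean(values: Sequence[float]) -> None:
    """Action: has a side effect (printing), so it lives apart from the math."""
    print(f"mean = {sum(values) / len(values):.3f}")


data = [1.0, 2.0, 3.0]
recentered = recenter(data, 10.0)
```

Because `recenter` depends only on its arguments and changes nothing else, calling it twice with the same inputs gives the same answer, which makes it easy to test.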

Keep information on a need-to-know basis #

Each function, class, and module in your code needs some information to do its job.

Give it the information it needs but no more.

Giving too much information couples parts of the code that should be independent, making them harder to test, debug, and reason about.

Objects in particular should “encapsulate” the information they contain quite jealously.
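A small sketch of encapsulation, assuming a hypothetical `Histogram` class: callers use `add` and `count_in_bin` and never see how the counts are stored.

```python
class Histogram:
    """Counts of values per bin; callers never touch the internal storage."""

    def __init__(self, bin_width):
        # Leading underscores mark these as internal by Python convention.
        self._bin_width = bin_width
        self._counts = {}

    def _bin_index(self, value):
        return int(value // self._bin_width)

    def add(self, value):
        index = self._bin_index(value)
        self._counts[index] = self._counts.get(index, 0) + 1

    def count_in_bin(self, value):
        """Number of observations that share a bin with `value`."""
        return self._counts.get(self._bin_index(value), 0)


histogram = Histogram(bin_width=10)
for observation in [3, 7, 12]:
    histogram.add(observation)
```

Since no outside code depends on the dictionary, it could later be replaced by a list or an array without changing any caller.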

Make it run, make it right, make it fast – in that order. #

Only optimize the bottlenecks!

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

– Donald Knuth, “Computer Programming as an Art” (1974 Turing Award lecture)
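Finding the bottleneck means measuring rather than guessing. Here is a minimal sketch using the standard library's `cProfile` and `pstats` modules; the functions `slow_sum_of_squares` and `analysis` are stand-ins for real work.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately plain loop, standing in for an expensive step."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analysis():
    return slow_sum_of_squares(200_000)


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()

# Collect the five most expensive entries by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
profile_report = stream.getvalue()
```

Printing `profile_report` shows where the time actually went. Only the functions at the top of that list are worth optimizing.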

A Demonstration #

  • In your local copy of the documents repository, do a git pull.
  • Open the file Activities/best-practices/shift-the-mean-1.r in an editor or in RStudio.

We will think about this code with respect to the principles and consider some modifications to improve it.

First, look over the code for five minutes and consider a few initial questions:

  1. What does this code do? How might you figure it out?
  2. What are the intended inputs?
  3. What is the intended output?
  4. Can you explain why anything is done the way it is?
  5. What about this code’s formatting and style makes it difficult to answer the questions above?

Second, a few modifications. See the modified files in the Activities/best-practices directory of the documents repository.

An Interactive Exercise #

Copy one of the files Activities/best-practices/nnk.py or Activities/best-practices/nnk.r into another directory (outside documents).

We will think about this code and make a series of modifications, in light of the principles we have discussed today.

A few initial questions to consider as you examine the code:

  1. What does this code do? How might you figure it out?
  2. What are the intended inputs?
  3. What is the intended output?
  4. Can you explain why anything is done the way it is?
  5. What works well here for clarity and readability? What does not?
  6. Where is the code consistent or inconsistent?
  7. Is there repeated code? What should you do about that?
  8. Are the concepts within the code separated into meaningful chunks?
  9. Is information properly encapsulated?

As you find the answers to these questions, restructure the code to make it follow our design principles.

Rough activity time: 25 minutes

You are encouraged to discuss this with your neighbors as you work, but you should enter your own changes.

Resources #

  • The book Code Complete by Steve McConnell
  • The Pragmatic Programmer by Andy Hunt and Dave Thomas
  • Community style guides