— Christopher Genovese and Alex Reinhart

Let’s consider design. Suppose you have a major software task. You’ve been asked to build, or need to build as part of your research, a big software system with many pieces that do many things. How do you go about designing your code, so you know how to organize your work and how to fit all the pieces together?

Design Example: Collaborative Assessment #

At your workplace, your team periodically gets together to do project performance reviews. Each project’s performance is reviewed by everyone involved (manager, employees, users from other departments), and the team uses these reviews to put together recommendations for each project.

These meetings tend to be very long and involve lots of details, many of which get lost.

You decide to write a software system in which all the stakeholders can enter their data and perspectives and which compiles the information into a report.

How should you go about designing this system?

Design Reality #

Design is a sloppy process #

Expect dead ends, wrong turns, mistakes, head-slapping regrets… leading to a good outcome.

Good designs are often only subtly different from bad ones.

Design is about tradeoffs and priorities #

Recognizing those tradeoffs and aligning them with your priorities is a key step.

What are some of the tradeoffs in the example?

Design involves constraints #

Constraints lead to creative solutions.

Design is heuristic #

No one methodology or process works in all contexts.

Design is iterative #

Requirements change #

In what ways might the requirements change over time in the example?

Considerations #

What considerations or criteria might affect our design choices?

Design Principles #

Minimize Unnecessary Complexity #

Software engineering is about managing complexity.

Form Consistent Abstractions #

Abstraction is the process of representing the essential features of a mechanism without delving into details or explanation of the underlying implementation.

If the various abstractions that comprise a piece of software are conceptually consistent and aligned, the system becomes easier to work with. If they are inconsistent or misaligned, complexity increases.

What are some of the abstractions in the Collaborative Assessment example?

Loose Coupling, Strong Cohesion, and Encapsulation #

Coupling is the interdependence between different parts of a software system.

In tightly coupled systems, changes to one part of the system tend to cascade, forcing changes in many other parts of the system. The system becomes rigid and fragile.

Key example of coupling: two different parts of the system depending on low-level details of one part’s implementation.

Cohesion describes how well the pieces of a system or module fit together in working towards their singular goal.

An example of a weakly cohesive design is one in which all the code for all parts of the system is in a single file.

Strong cohesion and loose coupling are often aided by encapsulating data and implementation details.

Example: getters and setters for objects
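
As a minimal sketch of that idea in Python, the hypothetical `Review` class below (the class name, its fields, and the 1 to 5 score range are assumptions made for illustration) hides how the score is stored behind a getter and a setter. Callers depend only on `review.score`, so the validation rule or the storage can change without touching them.

```python
class Review:
    """One stakeholder's review of a project (hypothetical class)."""

    def __init__(self, reviewer, score):
        self._reviewer = reviewer
        self.score = score  # assignment goes through the setter below

    @property
    def score(self):
        """Getter: callers read review.score without knowing how it is stored."""
        return self._score

    @score.setter
    def score(self, value):
        """Setter: the validation rule lives in exactly one place."""
        if not 1 <= value <= 5:  # assumed range, for illustration only
            raise ValueError("score must be between 1 and 5")
        self._score = value


review = Review("manager", 4)
review.score = 5  # valid update; an out-of-range value raises ValueError
```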

Modularity and Single Responsibilities #

As part of the design process, we try to understand the function and responsibilities of different parts of the system and to divide the code into modules that each have a single, focused responsibility.

Modularity works with loose coupling, high cohesion, and encapsulation to help enforce a separation of concerns.

What are some ideas for modules in the assessment example?

Extensibility #

Requirements change, and users often want to use software in ways that the authors did not anticipate.

When the design is rigid, it is hard to extend the functionality of the program, and users’ needs are not met.

We want to keep extensibility in mind as we design our programs. Modularity and consistent abstractions help us create extensible software.

What are some ways that we might need to extend the assessment software?

Reusability #

Writing software involves solving problems, some big, some small.

The same problems recur again and again, so why re-solve them?

Parts of our software can be reused (or built upon) in solving future problems. Modularity, encapsulation, and separation of concerns make it easier to build reusable components.

Examples: Building data visualizations

Ease of Maintenance #

Any software that is used for a period of time has to be maintained. Libraries – even languages – change, as do data formats, communication methods, interfaces, and platforms.

  • Good documentation
  • Good tests
  • Package and dependency management (e.g., virtual environments)
  • Version control with good commit messages
  • Well-written code

Use Libraries When Possible #

Using well-used and well-tested (and especially standard) libraries supports all of the above principles and often improves performance.

A Design Process #

We will combine top-down (starting with the high-level tasks and moving towards the details) and bottom-up (starting with the details and building to the high-level) approaches.

  1. Develop a clear idea of the objectives and requirements of your program.
  2. Identify the main concerns/subsystems. These are good initial candidates for modules.
  3. Determine what kind of data your program will operate on. How will it obtain that data? From one or many possible sources? How will that data be stored and organized for the tasks at hand?
  4. Name and declare the entry-point functions, including their interface. Define tests if possible.
  5. Develop pseudo-code for the main entry points.
  6. Identify auxiliary functions needed in the main functions. Define tests if possible.
  7. Reconsider the high-level design.
  8. Consider low-level functions. How will you process and store your data? What pieces will bind together the different modules in your system? What basic utilities do you need? Define tests if possible.
  9. How do these low-level details affect the high-level design?
  10. Iterate!
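
To make steps 4 and 5 concrete, here is a small sketch for the Collaborative Assessment example. The function name `compile_report`, the tuple layout of a review, and the report format are all assumptions chosen for illustration. The point is that the interface and a test are written down early, before the details are settled.

```python
from collections import defaultdict


def compile_report(reviews):
    """Group review comments by project (hypothetical entry point).

    `reviews` is an iterable of (project, reviewer, comment) tuples.
    Returns a dict mapping each project to a list of "reviewer: comment"
    strings, in the order the reviews were given.
    """
    report = defaultdict(list)
    for project, reviewer, comment in reviews:
        report[project].append(f"{reviewer}: {comment}")
    return dict(report)


def test_compile_report():
    """A test that pins down the interface before the design is final."""
    reviews = [
        ("website", "manager", "on schedule"),
        ("database", "user", "too slow"),
        ("website", "user", "hard to navigate"),
    ]
    report = compile_report(reviews)
    assert report["website"] == ["manager: on schedule", "user: hard to navigate"]
    assert report["database"] == ["user: too slow"]


test_compile_report()
```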

Team Design Task: M5 #

We are going to work in teams to design a macro processor that we will call M5, after the famous m4 and, of course, Star Trek.


The Program #

M5 is a program that reads in a text file. It breaks the file into tokens, which are essentially words separated by spaces (but see further details below).

It then loops through each token and does one of two things with each token.

  • If it’s recognized as a macro, out of a list of defined macros, the text is replaced with the macro expansion. More on that in a moment. The macro expansion is pushed back into the input, so M5 will then repeat the process on that text, looping through its tokens.
  • If the token is not the name of a macro, it is simply printed to the output.

Think of this as being a bit like LaTeX. In a LaTeX file, ordinary tokens (like text) are output into the PDF; macros (like \frac) are expanded into further commands, and you can define a new expansion with \newcommand{\foo}{bar}.

Let’s call this process of looking at tokens scanning.
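
The scanning loop can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: tokens are split on whitespace, macros take no arguments, quoting is ignored, and `scan` and `macros` are hypothetical names. It does show the key idea that an expansion is pushed back onto the input and scanned again.

```python
from collections import deque


def scan(text, macros):
    """Expand zero-argument macros in `text`; return the output tokens."""
    pending = deque(text.split())  # input tokens still to be scanned
    output = []
    while pending:
        token = pending.popleft()
        if token in macros:
            # Push the expansion back onto the front of the input so that
            # its tokens are scanned (and possibly expanded) next.
            pending.extendleft(reversed(macros[token].split()))
        else:
            output.append(token)
    return output


macros = {"foo": "bar baz", "bar": "zap"}
result = scan("a foo b", macros)  # foo -> bar baz, then bar -> zap
```

Note that in this sketch a macro whose expansion contains its own name would loop forever; a real design has to decide how to handle that case.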

Scanning can be suppressed with appropriate quoting. When scanned, text within quote pairs `’ (backtick-apostrophe) is returned as-is and unquoted. But note that text that is rescanned will be processed as usual, so it may require multiple sets of quotes to totally prevent expansion.
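
As a rough sketch of the unquoting step alone (it does not decide what gets expanded), the hypothetical helper `strip_quotes` below removes one level of backtick-apostrophe quotes while keeping any nested levels. It assumes that an apostrophe outside any quote is ordinary text.

```python
def strip_quotes(text):
    """Remove the outermost level of `...' quotes, keeping nested ones."""
    out = []
    depth = 0  # current quote nesting depth
    for ch in text:
        if ch == "`":
            if depth > 0:  # nested opening quote: keep it
                out.append(ch)
            depth += 1
        elif ch == "'" and depth > 0:
            depth -= 1
            if depth > 0:  # nested closing quote: keep it
                out.append(ch)
        else:
            out.append(ch)  # ordinary character (or unmatched apostrophe)
    return "".join(out)


once = strip_quotes("``bar''")  # one level removed
twice = strip_quotes(once)      # second scan removes the remaining level
```

Applying the function twice mirrors why rescanned text may need multiple sets of quotes to stay unexpanded.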

Now, what is a macro? M5 has a built-in macro called define. You give it a macro name and the text the macro should be replaced with:

define(`foo', `bar');  # The quotes are removed during initial scanning
foo                    #=> bar
define(`bar', `zap')   # A new definition, # is comment character
foo                    #=> zap because bar is rescanned and replaced
define(`foo', ``bar'') # foo will be replaced with `bar' to be rescanned
foo                    #=> bar

Empty quotes can be used in the middle of a name to avoid recognition. For example: f`'oo will not be recognized as a macro foo but will substitute to foo. (Note: `'foo will not work in this way, however, as the quotes just end up as an empty string before foo is even read.)

So, if foo is a macro with expansion `bar', then scanning foo will produce bar, with the quotes removed. But if bar is subsequently defined as a macro producing zap, then foo will cause bar to be rescanned and replaced with zap.

Macro names that are recognized during scanning must start with a letter, hyphen, or underscore and must consist of letters, hyphens, underscores, and digits. However, as we will see, the built-in define macro can accept and define other names.
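
The naming rule translates directly into a regular expression. The sketch below assumes that "letters" means ASCII letters, which the description does not specify; `is_scannable_name` is a hypothetical helper name.

```python
import re

# First character: letter, hyphen, or underscore.
# Remaining characters: letters, digits, hyphens, or underscores.
MACRO_NAME = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*")


def is_scannable_name(token):
    """Return True if `token` could be recognized as a macro name when scanned."""
    return MACRO_NAME.fullmatch(token) is not None
```

A name such as mypackage::foo fails this check, so it can be defined but is never recognized during ordinary scanning.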

Macros can have arguments. These are referred to by ${number} (e.g., ${1}, ${2}) to get the number-th argument. ${0} refers to the name of the macro being expanded; $# refers to the number of arguments, and $* expands to a comma-separated list of arguments to the macro.
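
One possible way to carry out this substitution is a single regular-expression pass over the expansion text. In the sketch below, `substitute_args` is a hypothetical helper, and treating a reference beyond the last supplied argument as the empty string is an assumption, since the description does not say what should happen.

```python
import re

# Matches ${number}, $#, or $*.
ARG_REF = re.compile(r"\$\{(\d+)\}|\$#|\$\*")


def substitute_args(expansion, name, args):
    """Replace argument references in `expansion` for one macro call."""

    def replace(match):
        ref = match.group(0)
        if ref == "$#":
            return str(len(args))  # number of arguments
        if ref == "$*":
            return ",".join(args)  # comma-separated argument list
        index = int(match.group(1))
        if index == 0:
            return name            # ${0} is the macro's own name
        # Assumption: a reference past the last argument becomes empty.
        return args[index - 1] if index <= len(args) else ""

    return ARG_REF.sub(replace, expansion)


text = substitute_args("${0}: ${2}-${1} ($#) [$*]", "swap", ["a", "b"])
```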

When a macro name is scanned and that macro can accept arguments, the arguments are read if an immediately following parenthesis appears. Arguments are scanned as they are read; this scanning can include commas, which are then treated as input commas on rescanning. Unquoted leading and trailing whitespace is ignored in the arguments. Quoted commas are part of the argument, not separators between arguments. Whitespace after the closing parenthesis is not ignored unless the closing parenthesis is immediately followed by a ;.

Note: comments are ignored but not dropped; use the built-in macro comment to drop text until the end of the line.

The program m5 should accept multiple files on the command line (along with any options deemed helpful) and produce output to standard output (by default).

Built-in Commands #

Here’s a selection of built-in commands; many more are possible:

  • define, define0, definen, clone, pushdef, popdef

    Use define(name, expansion) to create macro name. References to arguments in the expansion will be replaced with arguments, even if quoted. (Replacement happens before rescanning.) Both name and expansion are usually quoted here.

    Use define0(name, expansion) to create a macro name that cannot take arguments. Parentheses after the name are not scanned, and argument refs in the expansion are treated as is. Similarly, definen creates a macro that requires arguments.

    clone(name) expands to the quoted definition of name if it is a macro, or to the empty string otherwise.

    Use pushdef(name) and popdef(name) to temporarily redefine and undefine a macro.

  • include(file)

    Immediately starts reading input from file. Once everything has been read from the file, resumes reading from the original input file. These can be nested.

  • defined(name)

    Returns T,T if name is a defined macro, else T,F. The result can supply the first two arguments to ifelse, which compares them.

  • undefine(name)

    Undefines a macro.

  • comment

    Discard text from this macro through the end of the line.

  • ifelse(string1, string2, truex, falsex)

    If string1 and string2 are the same text, expand to truex, otherwise falsex.

    Example: ifelse(defined(`foo'), `yes', `no');

  • call(name)

    Replaced with the expansion of name if defined or the empty string. Note that name need not be a valid macro string, allowing one to define special namespaces.

    define(`mypackage::foo', ...);

  • first(arg1,args...) and rest(arg1,args...) and reverse(args...)

    Expands, respectively, to arg1 and args..., a comma separated list of arguments, and args... in reverse order.

  • Loops: for(iterator,start,end,body) and foreach(iterator,list,body)

  • Strings: substring, join, translate, search, format, …

  • Counters/Integers: inc, dec, calc, numerical comparisons.

  • Streams: input and output redirection, including virtual streams.
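
One way to support define, undefine, pushdef, and popdef together is to keep a stack of definitions per name. The `MacroTable` class below is a hypothetical sketch, not a prescribed design. It assumes that pushdef takes the temporary expansion as a second argument (as m4's pushdef does) and that define replaces only the newest definition.

```python
class MacroTable:
    """Maps each macro name to a stack of definitions, newest last."""

    def __init__(self):
        self._stacks = {}

    def define(self, name, expansion):
        """Create the macro, or replace its current (newest) definition."""
        stack = self._stacks.setdefault(name, [])
        if stack:
            stack[-1] = expansion
        else:
            stack.append(expansion)

    def pushdef(self, name, expansion):
        """Temporarily redefine `name`, remembering older definitions."""
        self._stacks.setdefault(name, []).append(expansion)

    def popdef(self, name):
        """Drop the newest definition, restoring the previous one if any."""
        stack = self._stacks.get(name)
        if stack:
            stack.pop()
            if not stack:
                del self._stacks[name]

    def undefine(self, name):
        """Remove every definition of `name`."""
        self._stacks.pop(name, None)

    def lookup(self, name):
        """Return the current expansion, or None if `name` is not defined."""
        stack = self._stacks.get(name)
        return stack[-1] if stack else None


table = MacroTable()
table.define("foo", "bar")
table.pushdef("foo", "temporary")
during = table.lookup("foo")  # the pushed definition is active
table.popdef("foo")
after = table.lookup("foo")   # the original definition is back
```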

An Example #

Suppose we have two files. In jumps.txt:

jumps over
define(`productive', `lazy');

In main.txt:

The quick brown fox
define(`fox', `dog');
productive fox.

If we run the command m5 main.txt, we will get the output

The quick brown fox
jumps over the
lazy dog.

The Task #

Divide into teams. Work together on the initial design steps 1-3 (maybe 1-4). Then divide into smaller groups to define the functions in different modules.

You should create actual files with actual content on your computer as part of the design process. These files may have design notes and sketched-out classes and functions, written just as we discussed when talking about Part 1 of the Challenge. You don’t have to write the code, just design the functions and classes.

We will stop periodically to check in on the ideas.

A few questions to consider, in no particular order:

  1. What data structures will be useful for managing the defined names? Do you need to distinguish built-ins from user-defined names? What properties do you need to store for each name?
  2. How will you handle include? What data structures are needed? How does this fit with a strategy for handling input and output?
  3. How do scanning and re-scanning work? How does this affect your input and output strategy?
  4. What are the principal entities that you have to manage in this program? How will you create them as classes, and what methods should those classes have?
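
As one possible starting point for the include question, nested input can be modeled as a stack of open sources. The `InputStack` class is a hypothetical sketch, and it uses in-memory `io.StringIO` objects in place of real files so that it is self-contained. Other designs, such as pushing the included text back onto a single input buffer, are equally plausible.

```python
import io


class InputStack:
    """A stack of open text sources; reading always uses the newest one."""

    def __init__(self, source):
        self._sources = [source]

    def push(self, source):
        """Start reading from `source` (what include would do)."""
        self._sources.append(source)

    def readline(self):
        """Return the next line, or "" once every source is exhausted."""
        while self._sources:
            line = self._sources[-1].readline()
            if line:
                return line
            self._sources.pop()  # finished: resume the previous source
        return ""


inputs = InputStack(io.StringIO("first\nsecond\n"))
lines = [inputs.readline()]             # read from the original input
inputs.push(io.StringIO("included\n"))  # simulate include(file)
lines.append(inputs.readline())         # read from the included source
lines.append(inputs.readline())         # included source done: resume original
```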

Resources #

  • Programming on Purpose by P.J. Plauger
  • The Art of Unix Programming by Eric Raymond
  • Code Complete by Steve McConnell (Focuses on Object-Oriented Design)