Following the very good advice of Brandon Whitcher, I recently started reading the book:
“Software for Data Analysis: Programming with R”
by John Chambers
I must admit that from the title, and from knowing that this is a book about how to use software for statistical analysis, I was expecting a very slow-paced, dry and abstract book.
Instead, however, I found a profoundly philosophical and strongly motivational book.
John Chambers sets up a perfectly aligned rationale to introduce us to the R system and its features.
It all starts with having a PURPOSE!
Chambers puts it clearly as:
DATA EXPLORATION is our MISSION
“We and those who use our software want to find new paths to understand the data and the underlying processes.”
“The mission is, indeed, to boldly go where no one has gone before”
“…but, we need boldness to be balanced by our responsibility…”
That responsibility is to make sure that the results of our software are TRUSTWORTHY.
Note that he didn’t say:
- Feature rich
- Backward compatible
- Easy to use
He said TRUSTWORTHY!
Chambers elaborates further:
“…The complexity of the data processes and of the computations applied to them mean that those who receive the results of modern data analysis have limited opportunity to verify the results by direct observation. Users of the analysis have no option but to trust the analysis, and by extension the software that produced it. Both the data analyst and the software provider therefore have a strong responsibility to produce a result that is trustworthy, and, if possible, one that can be shown to be trustworthy.”
This paragraph applies to almost all the software that we use and develop for data analysis, engineering and scientific research.
Chambers phrases this as
THE PRIME DIRECTIVE: “Make Trustworthy Software”
“…The creators of software have the obligation to program in such a way that the computations can be understood and trusted…”
“…Our directive is not to distort the message of the data and to provide computations whose content can be trusted and understood…”
The Prime Directive is NOT the Mission, but rather an important safeguard to apply in pursuing the mission.
There are then two motivating principles:
- The Mission: which is Bold Data Exploration
- The Prime Directive: Trustworthy Software
From these two clear principles, Chambers goes on to guide us through the features and design principles of the R language.
These characteristics flow naturally from the two motivating principles.
For example, the rationale for making R Open Source Software derives directly from the Prime Directive:
- First: “The simple openness allows any sufficiently competent observer to enquire fully about what is actually being computed. There are no intrinsic limitations to the validation of the software.”
- Second: “open-source systems demonstrably generate a spirit of community among contributors and active users. The active and demanding community is a key to trustworthy software, as well as making useful tools readily available.”
I envy Chambers and his clarity.
As software developers,
- it is so easy to get blinded by the software itself,
- and to let the software and its self-preservation become the purpose.
We easily let the software become an “Institution.”
This brings the associated drawback that Clay Shirky warned us about:
When the only goal of the software is to “come up with the next version”, then we are lost…
Here is where we pause,
…for that software that we write…
What is our MISSION?
But don’t answer so fast…!
- The MISSION is not “what the software does” nor “how it does it”.
- The MISSION is not the collective set of features: “…my software can do this, can do that…”
- The MISSION is not the answer to the “what?”, nor the “how?” of the software.
The MISSION is the answer to the “Why?” of it.