
RFC Word Occurrence

March 23rd, 2012

I had planned this for a long time, and now, just before the next IETF meeting, seemed like the right moment.

Out of curiosity, I wanted to visualize how often the following terms, the requirement-level key words defined in RFC 2119, occurred over the years: [“MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, “OPTIONAL”].
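As an illustration only, counting these terms in a single RFC could look like the Python sketch below. The post does not say how the counting was done, so every choice here is an assumption: matching is case-sensitive (RFC 2119 key words are conventionally written in capitals), a two-word form such as “MUST NOT” is counted as its own term and not also as “MUST”, and `count_keywords` is a hypothetical helper name.

```python
import re
from collections import Counter

# The ten RFC 2119 key words. Two-word forms come first so that, at a given
# position, "MUST NOT" is matched in preference to plain "MUST".
KEYWORDS = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD", "RECOMMENDED", "MAY", "OPTIONAL",
]

# Assumptions: matching is case-sensitive (lower-case "must" is ignored), any
# whitespace, including a line break, may separate the two words of a phrase,
# and "NOT RECOMMENDED" simply counts as "RECOMMENDED".
PATTERN = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in KEYWORDS) + r")\b"
)


def count_keywords(text: str) -> Counter:
    """Hypothetical helper: count RFC 2119 key words in one RFC's plain text."""
    # Collapse internal whitespace so "SHOULD\n   NOT" is tallied as "SHOULD NOT".
    return Counter(" ".join(m.group(1).split()) for m in PATTERN.finditer(text))
```

The `\b` word boundaries keep words such as “MAYBE” from being counted as “MAY”, and the whitespace collapsing matters because RFC plain text often wraps a phrase like “SHOULD NOT” across two lines.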

Since RFCs in the majority of cases represent standards, I was curious to see how the usage of these terms has evolved over the years. Of course, a rise in the frequency of “MUST” from year to year does not mean that the standards have become “stricter” in their statements and requirements; nonetheless, the dynamics can be observed in the following two graphs.

The first graph (to the right) shows the total number of occurrences of each term per year. It covers RFCs published between 1985 and 2011, because the terms were rare in earlier years and 2012 is still under way. “MUST” clearly dominates all other terms, and its usage, counted across all RFC document categories, peaked in 2010. “SHOULD”, “MAY” and “MUST NOT” have also grown significantly.

However, the number of published RFCs has also been increasing over the years, so the second graph (below) normalizes each term's yearly count by the number of RFCs published that year that mention at least one of the terms. Even after normalization, “MUST” remains the dominant term, growing by about 80% between 2000 and 2010. “MAY” and “SHOULD” track each other closely. By contrast, usage of “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD NOT”, “RECOMMENDED” and “OPTIONAL” stayed low and steady between 2000 and 2010.
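A minimal sketch of the normalization step in Python, assuming the per-RFC term counts are already available as `(year, Counter)` pairs. The post does not describe its tooling, so the input shape and the names `normalized_by_year` and `growth` are hypothetical. The sketch keeps only the 1985-2011 window, skips RFCs that use none of the terms so they do not enter the denominator, and divides each term's yearly total by the number of remaining RFCs. `growth` computes the relative change between two years, the kind of figure behind the ~80% quoted for “MUST”.

```python
from collections import Counter, defaultdict


def normalized_by_year(docs, first_year=1985, last_year=2011):
    """Hypothetical helper.

    docs: iterable of (publication_year, Counter of key-word counts), one per RFC.
    Returns {year: {term: total count / number of RFCs that year using any term}}.
    """
    totals = defaultdict(Counter)  # summed key-word counts per year
    using_any = Counter()          # RFCs per year mentioning at least one term
    for year, counts in docs:
        if not first_year <= year <= last_year:
            continue  # outside the plotted window
        if sum(counts.values()) == 0:
            continue  # uses none of the terms, so not part of the denominator
        totals[year].update(counts)
        using_any[year] += 1
    return {
        year: {term: n / using_any[year] for term, n in totals[year].items()}
        for year in sorted(totals)
    }


def growth(series, start, end):
    """Relative change of a {year: value} series, e.g. 0.8 means +80%."""
    return (series[end] - series[start]) / series[start]
```

Dividing by the RFCs that use at least one term, and not by every RFC published that year, follows the normalization described above; dividing by all RFCs would instead give the average usage per published RFC.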

There are several things that could be checked further, e.g., the distribution by document category or by working group, or the correlation with document length, but that will follow at some point in the future 🙂

Draw your own conclusions (e.g., why do the more frequently used terms seem to fluctuate together?), and feel free to share them with me, along with any suggestions for other interesting statistics I could extract from these documents.
