Pass Phrases


A grammar-based random pass-phrase generator can help make life easier for users and system administrators by generating memorable passwords that should meet the needs of most sites. With this algorithm, users should be able to choose a password more easily. The passwords produced by the algorithm should be easy enough to type, reducing the likelihood of being accidentally locked out of the system by logon failures. System administrators may finally be able to spend less time resetting passwords and unlocking accounts, without sacrificing security.

Passwords provide much of computer and data security, but they suffer from conflicting requirements: ideally, they would be easy to memorize and quick to type, yet they should also withstand attack by an automated password-cracking program. The United States Department of Defense (DOD) and the National Institute of Standards and Technology (NIST) have established requirements that are intended to strengthen passwords. Unfortunately, many users (and system administrators) find it tough to come up with passwords that meet DOD and NIST requirements, and even tougher to memorize them. Users forget their passwords, or mistype them and cause an account lockout. System administrators then need to come up with secure new passwords for these users.

A grammatically-correct random pass phrase generator can make passwords that are easy enough for users to memorize, yet still be secure. The program can generate over 200 trillion different equally-likely pass phrases (in security terms, a strength measured at about 47 bits of entropy). The passwords will be between 14 and 22 characters long. Since most of the password length comes from familiar English words, the length is more tolerable. The random selection of words often results in absurd phrases. Absurdity is good. Advertisers use absurdity to make their messages more memorable.

Here is a random sampling of passwords from the program, along with the words separated by spaces for easy reading:

`55ScabbyGateAromas` 55 Scabby Gate Aromas
||BroodsPaving25Ghouls Broods Paving 25 Ghouls
""ThreatPlops45Pumas Threat Plops 45 Pumas
[PreppyHotelScored45] Preppy Hotel Scored 45
63ChoirEssaysMooing"" 63 Choir Essays Mooing
(71ButteYolksTitter) 71 Butte Yolks Titter
--LynxesRiling41Keels Lynxes Riling 41 Keels
79PuttsRefuelCoral## 79 Putts Refuel Coral
{98HurryFilmyMall} 98 Hurry Filmy Mall
`78RetainEmuRaptly` 78 Retain Emu Raptly
<43SpunkyGulagsShaped> 43 Spunky Gulags Shaped
##47RidgedLensesPass 47 Ridged Lenses Pass
||98PrewarMasonsQuit 98 Prewar Masons Quit
38FooledMatsZap{} 38 Fooled Mats Zap
24DuckyClubsNailed## 24 Ducky Clubs Nailed
97NamingTapsKnelt!! 97 Naming Taps Knelt
{}99TravelGlazedCyst 99 Travel Glazed Cyst
~~HazelsLose93Dopers Hazels Lose 93 Dopers
50PlushyCuesGoing^^ 50 Plushy Cues Going
TuskAged41Apace$$ Tusk Aged 41 Apace

How it Works

To start, DOD password requirements include:

  • Passwords must be a minimum of 14 characters, including at least two uppercase characters, two lowercase characters, two punctuation characters, and two numeric digits.
  • No character can be used more than twice in a row.
  • Passwords cannot contain patterns, such as “ASDF”, “ABC”, and “123”, or be composed of a single word, such as “password”, or “secret”.
  • Passwords must be changed every 60 days, and more than four characters must be changed each time.
  • Passwords can’t be reused.
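
The composition rules above can be checked mechanically. The following is a minimal sketch, not part of the published implementations; the function name `meets_dod_rules` is hypothetical, and it assumes that "punctuation" means the ASCII punctuation set. It does not check the pattern rule ("ASDF", "123") or the rotation and reuse rules, which need a dictionary and a password history.

```python
import string


def meets_dod_rules(password: str) -> bool:
    """Check length, character-class counts, and repeated-character runs."""
    if len(password) < 14:
        return False
    counts = {
        "upper": sum(c.isupper() for c in password),
        "lower": sum(c.islower() for c in password),
        "digit": sum(c.isdigit() for c in password),
        # Assumption: "punctuation" means ASCII punctuation.
        "punct": sum(c in string.punctuation for c in password),
    }
    if any(n < 2 for n in counts.values()):
        return False
    # "No character can be used more than twice in a row": reject runs of 3.
    for i in range(len(password) - 2):
        if password[i] == password[i + 1] == password[i + 2]:
            return False
    return True
```

For example, `meets_dod_rules("(71ButteYolksTitter)")` passes, while the same phrase without its parentheses fails the punctuation count.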

Password randomness requirements may be derived from NIST Special Publication 800-63-1, December 8, 2008, Table A.2, “Required Password Entropy for Level 2 and various Password Lifetimes and Back-off Periods”. It specifies “Bits of Password Entropy Needed for Level 2” (single factor remote network authentication). The requirements range from 22 to 37 bits, depending on password lifetime and failure lockout periods. For purposes of randomly generated passwords, 37 bits of entropy means that the generator would need to produce at least two to the 37th power different, equally-likely choices, or about 137 billion passwords. The algorithm’s strength is well in excess of these requirements.

It is easy for a computer to generate a strong password that meets DOD requirements.  It is hard to generate one that a person could easily remember. If a person wants to memorize the sequence “EGBDF”, they structure it into something like “Every Good Boy Does Fine.” A random password generator needs to structure the information to make it easy enough for a person to memorize.

Here is the process that was implemented to create strong random passwords that can be memorized:

  1. Randomly draw three words from a list of about ten thousand words. Each word may be drawn more than once. The words are familiar words that people know how to spell, composed of between two and six unaccented letters. The first letter of each word is capitalized, and the rest of the letters are lowercase.
  2. Randomly select a two digit number between 10 and 99. This meets the DOD requirement of two numeric digits, while allowing for the fact that most people would not be comfortable with either zero or a number with a leading zero.
  3. Rearrange the three words and the two-digit number into a grammatically sensible order to make the phrase easy to remember. To do that, the program shuffles the three words and the number into an order that looks like an English phrase. If they can’t be made into something that looks like an English phrase, then discard all three words and try again. Because the grammar, not the order in which the words were drawn, determines the final word order, the number of possible passwords is calculated using combinations with repetition rather than permutations.
  4. Randomly select a pair of punctuation characters from a list of 14 choices. Four of the choices will be left-right punctuation pairs: (), [], {}, and <>. The other ten will just use the same punctuation character twice.
  5. Randomly put the two punctuation characters before, after, or at both ends of the generated pass phrase.
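
The five steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published Java or PL/SQL code: the word list, the three phrase structures, the `generate` function, and the choice of the ten doubled punctuation characters are all hypothetical stand-ins for the real data files.

```python
import secrets

# Hypothetical toy data. The real program reads about ten thousand tagged
# words and about 90 phrase structures from data files.
WORDS = [
    ("icy", "Ay"), ("plushy", "Ay"), ("cats", "Ns"), ("pumas", "Ns"),
    ("keels", "Ns"), ("moo", "I"), ("quit", "I"), ("add", "T"),
]
# Key: sorted tuple of the three word types drawn.
# Value: the one arrangement for that combination; "#" is the number slot.
STRUCTURES = {
    ("Ay", "I", "Ns"): ("#", "Ay", "Ns", "I"),
    ("Ns", "Ns", "T"): ("Ns", "T", "#", "Ns"),
    ("Ay", "Ay", "Ns"): ("#", "Ay", "Ay", "Ns"),
}
# Four left-right pairs plus ten doubled characters (the ten are assumed).
PUNCTUATION = ["()", "[]", "{}", "<>", "!!", "##", "$$", "^^", "~~",
               "||", "``", '""', "--", "**"]


def generate() -> str:
    while True:
        # Step 1: draw three words; the same word may be drawn again.
        drawn = [secrets.choice(WORDS) for _ in range(3)]
        key = tuple(sorted(kind for _, kind in drawn))
        pattern = STRUCTURES.get(key)
        if pattern is None:
            continue  # Step 3: no phrase structure fits, discard and redraw.
        # Step 2: two-digit number from 10 to 99.
        number = str(10 + secrets.randbelow(90))
        # Step 3: fill each slot; same-type words keep their draw order.
        pool = {}
        for word, kind in drawn:
            pool.setdefault(kind, []).append(word.capitalize())
        phrase = "".join(
            number if slot == "#" else pool[slot].pop(0) for slot in pattern
        )
        # Steps 4 and 5: pick punctuation and one of three placements.
        punct = secrets.choice(PUNCTUATION)
        place = secrets.randbelow(3)
        if place == 0:
            return punct + phrase
        if place == 1:
            return phrase + punct
        return punct[0] + phrase + punct[1]
```

Keying the structure table on the sorted word types means the draw order never matters, which matches the combinations-with-repetition count used in the math.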

The math is as follows; the fraction in the last step was measured over one million test draws:

  • Combinations with repetition of 9,869 words randomly taken three at a time = 160,250,798,855.
  • Times 90, for the two digit random number.
  • Times 14, for the random punctuation.
  • Times 3, for the three different places that the punctuation can go.
  • Times the fraction of phrase arrangements that work (399,851 out of 1,000,000).

The result is 242,208,951,413,829, or 242 trillion.  Base 2 log is 47.78, yielding a strength of 47 bits.
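
The arithmetic above can be reproduced with a short Python check. The variable names are illustrative; the figures are the ones quoted in the list.

```python
import math

WORD_COUNT = 9_869

# Combinations with repetition of n items taken k at a time: C(n + k - 1, k).
combos = math.comb(WORD_COUNT + 3 - 1, 3)

# Two-digit numbers (90), punctuation choices (14), placements (3).
raw = combos * 90 * 14 * 3

# Fraction of draws that fit a phrase structure: 399,851 of 1,000,000.
# Integer arithmetic with rounding to the nearest whole password.
total = (raw * 399_851 + 500_000) // 1_000_000

bits = math.log2(total)

print(f"{combos:,}")   # 160,250,798,855
print(f"{total:,}")    # 242,208,951,413,829
print(f"{bits:.2f}")   # 47.78
print(total > 2**37)   # True: exceeds the 37-bit NIST figure
```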

Here is the process for creating a list of about 10,000 words:

  1. Select words composed of between two and six unaccented letters. The primary source for the words was data derived from the British National Corpus (BNC), as hosted on the “Phrases in English” (PIE) website.
  2. Determine the dominant grammatical usage for each of the words. The “Phrases in English” website tagged each word with its frequency in different ways. This served as a starting point, though much more work was necessary, such as tagging verbs as being either transitive or intransitive. English words are often hard to classify. “Fish”, for example: “1 fish” (singular noun), “2 fish” (plural noun), “go fish” (intransitive verb), “fish bone” (adjective).
  3. Discard articles (“the”, “an”, “a”), prepositions (“of”, “to”, “in”), conjunctions (“and”, “but”, “or”), pronouns (“I”, “he”, “it”), proper names (“Mary”, “Ohio”, “French”), linking verbs (“be”, “become”, “seem”), numbers (“six”, “sixth”), quantifiers (“both”, “some”), and interjections (“oops”, “ouch”, “@#$%&?!”).
  4. Reclassify mass nouns as either singular or count nouns, or discard them if neither seems appropriate. Reclassify irregular past participle verb forms as adjectives or discard them.
  5. Group the remaining words into 14 different types of nouns, adjectives, verbs and adverbs.
  6. Discard words that are unfamiliar, hard to spell, hard to classify, or inappropriate for polite workplace usage. Five and six character singular nouns were drastically reduced, since singular nouns were the hardest to incorporate into phrase structures.
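
Only the mechanical parts of this process lend themselves to code; the grammatical tagging and the familiarity judgments were manual. A minimal sketch of the length filter from step 1 and the exclusions from step 3 follows. It assumes candidate words arrive in lowercase, and the `EXCLUDED` set holds only the examples quoted above, not the full exclusion lists.

```python
import re

# Two to six unaccented letters (input assumed to be lowercase).
SHORT_PLAIN = re.compile(r"[a-z]{2,6}")

# Only the examples named in step 3; a real list would be much longer.
EXCLUDED = {
    "the", "an", "a", "of", "to", "in", "and", "but", "or", "he", "it",
    "be", "become", "seem", "six", "sixth", "both", "some", "oops", "ouch",
}


def keep_candidate(word: str) -> bool:
    """Return True if the word survives the length and exclusion filters."""
    return SHORT_PLAIN.fullmatch(word) is not None and word not in EXCLUDED


candidates = ["cat", "the", "café", "gulags", "unmanageable", "a", "mooing"]
print([w for w in candidates if keep_candidate(w)])
# ['cat', 'gulags', 'mooing']
```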

The usual entropy-per-phrase estimates should not be applied to a randomly generated sentence. Natural language follows Zipf’s law (named for George Kingsley Zipf), which shows that a small number of words accounts for the majority of actual word usage. The artificial phrases from this program are composed of randomly selected words, giving all words an equal chance of being used. None of the top 50 most common English words are even included in the word list. Zipf’s law can still be used, however, to create a frequency list of vocabulary to use in the word list.

The usual entropy per word estimates should not be applied to the word list. Normal entropy estimates are based on the full language vocabulary. The word list is limited to words of between two and six letters, which is the set of words having the highest variation per character.

The word types (with examples) are as follows:

  • N – Noun: cat
  • Ns – Noun: cats
  • Ay – Adjective: icy
  • Aer – Adjective: icier
  • Aest – Adjective: iciest
  • Aly – Adverb: icily
  • T – Transitive verb: add
  • Ts – Transitive verb: adds
  • Ted – Transitive verb: added
  • Ting – Transitive verb: adding
  • I – Intransitive verb: moo
  • Is – Intransitive verb: moos
  • Ied – Intransitive verb: mooed
  • Iing – Intransitive verb: mooing

The NIST criteria could be met using as few as 1,000 words, so removing words from the list won’t hurt. There are more words that could be added, but most would be longer, harder to remember, harder to use in a phrase, or harder to spell. It would also take a huge number of them to alter the final password strength to any significant degree. If more words were to be added, then the best way would probably be based on the fact that words travel in packs, such as ice, icy, icier, iciest, icily, ices, iced, icing. The word list is sure to be missing some valid forms of common words.

The grammatical word sequence is obtained by referencing a list of acceptable phrase structures. The phrase structures were created manually, one at a time, by identifying the most commonly occurring combinations of the different word types and the two-digit number. These were evaluated to create arrangements that resembled common English phrase structures. Each of the phrase structures had to have one or two nouns. In all, there were about 90 different combinations that looked acceptable. Although some of the combinations had more than one possible acceptable arrangement, each combination of types was assigned only one arrangement.
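
One way to hold such a list is a lookup table keyed on the combination of word types, with a check that each entry obeys the constraints described above. The sketch below is an assumption about representation, not the format of the published data files; the names `check_structure` and `build_lookup` and the "#" marker for the number slot are hypothetical.

```python
NOUN_TYPES = {"N", "Ns"}


def check_structure(arrangement: tuple) -> bool:
    """Three word slots, one number slot ("#"), and one or two nouns."""
    words = [slot for slot in arrangement if slot != "#"]
    nouns = sum(slot in NOUN_TYPES for slot in words)
    return (
        len(words) == 3
        and arrangement.count("#") == 1
        and 1 <= nouns <= 2
    )


def build_lookup(arrangements) -> dict:
    """Map each combination of word types to its single arrangement."""
    lookup = {}
    for arrangement in arrangements:
        if not check_structure(arrangement):
            raise ValueError(f"invalid structure: {arrangement}")
        # The key ignores order, so any draw order finds the same entry.
        key = tuple(sorted(slot for slot in arrangement if slot != "#"))
        if key in lookup:
            raise ValueError(f"combination already assigned: {key}")
        lookup[key] = arrangement
    return lookup


lookup = build_lookup([
    ("#", "Ay", "Ns", "I"),    # like "50 Plushy Cues Going"
    ("Ns", "T", "#", "Ns"),    # like "Hazels Lose 93 Dopers"
])
print(lookup[("Ay", "I", "Ns")])  # ('#', 'Ay', 'Ns', 'I')
```

Rejecting a second arrangement for the same key enforces the rule that each combination of types gets only one arrangement.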

Other languages could probably use the same algorithm with little or no change to the code, since the vocabulary and grammar are both driven by data files. It should be a simple enough matter to swap out both the word list and phrase structure list to adapt the program for use with a language other than English.

The algorithm could be used either to make suggestions to users, or it could be used to force users to choose from dynamically-generated lists of passwords. Either approach will have both advantages and drawbacks.

The full algorithm has been implemented in both Java and Oracle PL/SQL. The PL/SQL version was the work of Curtis Copley. The Java version was the work of Gabriel Copley. The Java version makes use of permutation and combination functions. Gabriel Copley coded a factoradic permutation generator for PL/SQL, but Curtis Copley went with a hard-coded list of permutations. As a result, the PL/SQL program is less flexible regarding the number of words.
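
For readers unfamiliar with the term, a factoradic permutation generator turns an index into a specific ordering by writing the index in the factorial number system. The sketch below shows the general technique in Python; it is not the Copleys' code, and the function name `nth_permutation` is hypothetical.

```python
import math


def nth_permutation(items, index: int) -> list:
    """Return permutation number `index` (0-based, lexicographic order)."""
    pool = list(items)
    if not 0 <= index < math.factorial(len(pool)):
        raise ValueError("index out of range")
    result = []
    # Peel off one factoradic digit per position, most significant first.
    for remaining in range(len(pool), 0, -1):
        digit, index = divmod(index, math.factorial(remaining - 1))
        result.append(pool.pop(digit))
    return result


print(nth_permutation("abc", 0))  # ['a', 'b', 'c']
print(nth_permutation("abc", 5))  # ['c', 'b', 'a']
```

Because any ordering can be computed from its index, this approach extends to any number of words, which a hard-coded permutation list cannot do.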

A three-word phrase generator should be suitable for most applications. A four-word pass phrase would require an unmanageably large phrase structure list. A two-word pass phrase would result in about 58 billion possibilities, or a little less than 36 bits of strength. That would be adequate for some applications, depending on password lifetime and lockout periods.

The DICEWARE passphrase method can be used to generate even stronger passwords. The advantage of a grammar-oriented approach is that the passwords should be easier to remember.

Infotecs GmbH has implemented a similar concept in a free program called ViPNet Password Roulette. It might be an alternative for those who are not programming oriented. However, the algorithm presented in this paper was designed specifically to comply with DISA and NIST guidelines, and everything is open in order to facilitate implementation in any environment.

For a lengthier set of articles on pass phrases, see The Great Debates: Pass Phrases vs. Passwords by Jesper M. Johansson, Ph.D., ISSAP, CISSP, Security Program Manager, Microsoft Corporation. No prior work could be identified on the use of grammar in constructing random pass phrases.

The strength of the generated passwords has not yet been evaluated by the Defense Information Systems Agency (DISA) or any other agency. Its eventual approval or disapproval cannot be guaranteed. Both implementations use a cryptographically secure pseudo-random number generator (CSPRNG). Any other implementation must also use a CSPRNG in order to meet basic password generation requirements.
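
As an illustration of the CSPRNG requirement for anyone porting the algorithm, the Python standard library provides the `secrets` module, which draws from the operating system's secure random source. The general-purpose `random` module is not designed for security use. The helper names below are hypothetical.

```python
import secrets


def pick_number() -> int:
    """Two-digit number from 10 to 99, drawn from a CSPRNG."""
    return 10 + secrets.randbelow(90)


def pick_word(words):
    """Uniform choice from a non-empty sequence, drawn from a CSPRNG."""
    return secrets.choice(words)
```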

All research and development work on this algorithm was performed using personal time and equipment. No government resources were involved. Special thanks go to William H. Fletcher, Associate Professor at the United States Naval Academy (USNA), for his Phrases in English website and his encouragement to make this happen.

Code and data files

Currently, this site has implementations in Oracle PL/SQL, Java, Perl, and C#. If anyone ports this to another environment (PHP, Python, Tcl, shell, or some PDA language), please submit it so that others might be able to use it.

The files below are stored in plain-text format for maximum compatibility with various browser and firewall environments. Be sure to right-click on the links below and save to your computer with the proper file extension.

Data Files:

– word list for use with Java implementation
– phrase structure list for use with Java implementation

Oracle PL/SQL:

PL/SQL version by Curtis Copley:
password_pkg.sql – Oracle PL/SQL implementation of algorithm
– SQL to create table with list of words
– SQL to create table with list of valid phrase structures


Java version by Gabriel Copley:
– Java implementation of algorithm


Perl version by J.D. Baldwin:

It contains extensive internal comments and documentation. See “perldoc ./” for more details, or run it with a -h switch.

You can download all the above files in a single zip archive: – All three implementations.


C# version by Jim Foster. – C# implementation of algorithm

Paul Sand at the University of New Hampshire also wrote a Perl version, available here:
Grammatically-Correct Random Pass Phrase Generator (in Perl)