Find all the unique 4-letter words in a text

 $ cat ipsum.txt | perl -ne 'print map("$_\n", m/\w+/g);' | tr A-Z a-z | sort | uniq | awk 'length($1) == 4 {print}'

— by Janos on Jan. 29, 2012, 10:28 p.m.


  • The perl regex pattern m/\w+/g will match consecutive non-word characters, resulting in a list of all the words in the source string
  • map("$_\n", @list) transforms a list, appending a new-line at the end of each element
  • tr A-Z a-z transforms uppercase letters to lowercase
  • In awk, length($1) == 4 {print} means: for lines matching the filter condition "length of the first column is 4", execute the block of code, in this case simply print