
Print lines of a file that don't exist in another file, ignoring the first two fields

awk -F'|' -v OFS='|' 'NR == FNR { $1 = ""; $2 = ""; seen[$0]++ } NR != FNR { orig = $0; $1 = ""; $2 = ""; if (!seen[$0]) print orig }' first.txt second.txt

November 27, 2023 · bashoneliners

Explanation

For example, suppose the first file contains:

1234|12|Bill|Blatt|programmer
3243|34|Bill|Blatt|dentist
98734|25|Jack|Blatt|programmer
748567|31|Mark|Spark|magician

And the second file contains:

123|12|Bill|Blatt|programmer
3243|4|Bill|Blatt|dentist
934|25|Jack|Blatt|prograbber
30495|89|Dave|Scratt|slobber

The lines that are unique to the second file, ignoring the first two fields, are:

934|25|Jack|Blatt|prograbber
30495|89|Dave|Scratt|slobber

The one-liner is easier to read when expanded over multiple lines:

awk -F'|' -v OFS='|' '
  NR == FNR {
    $1 = "";
    $2 = "";
    seen[$0]++;
  }
  NR != FNR {
    orig = $0;
    $1 = "";
    $2 = "";
    if (!seen[$0]) print orig
  }' first.txt second.txt

Here's how it works:

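  • We pass two input files to the Awk script on the command line. The order matters: the first file supplies the keys to exclude, and the second file is the one being filtered.
  • -F'|' -- use pipe as the field separator. The expanded version also passes -v OFS='|', so that when Awk rebuilds $0 after a field is modified, the fields are joined with pipes rather than spaces.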
  • Filter 1: NR == FNR -- this matches lines in the first input file.
  • We build a map of the lines we've seen, keyed by the line without its first two fields. Assigning the empty string to $1 and $2 makes Awk rebuild $0 from the fields, and we use that rebuilt $0 as the key, incrementing its count.
  • Filter 2: NR != FNR -- this matches lines in the second input file.
  • We save the original line, compute the key the same way, and if that key was not seen in the first file, we print the original line.
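To watch the two filters and the keys at work, the following sketch prints, for every input line, which file it came from, the values of NR and FNR, and the key left after clearing the first two fields. The files a.txt and b.txt and the variables tmp and trace are made up for this illustration, not part of the original one-liner.

```shell
# Two tiny throwaway inputs (hypothetical files, created in a scratch directory).
tmp=$(mktemp -d)
printf '1|x|foo\n2|y|bar\n' > "$tmp/a.txt"
printf '3|z|foo\n' > "$tmp/b.txt"

# For each line: report the file (via NR == FNR), the counters, and the rebuilt $0.
trace=$(awk -F'|' -v OFS='|' '{
  which = (NR == FNR) ? "first" : "second"
  $1 = ""; $2 = ""          # clearing fields makes awk rebuild $0 using OFS
  printf "%s NR=%d FNR=%d key=%s\n", which, NR, FNR, $0
}' "$tmp/a.txt" "$tmp/b.txt")

printf '%s\n' "$trace"
rm -r "$tmp"
```

This prints:

first NR=1 FNR=1 key=||foo
first NR=2 FNR=2 key=||bar
second NR=3 FNR=1 key=||foo

NR keeps counting across files while FNR restarts at 1 for each file, so the two are equal only while the first file is being read. The key of the line from b.txt matches the key of the first line of a.txt even though their first two fields differ.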

Notice that this approach also preserves the original order of the lines in the second file.
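To check the result and the ordering on the sample data above, the files can be recreated in a scratch directory and fed to the expanded script. This is only a sketch for experimenting: the names dir and result are chosen for the illustration, and mktemp -d is assumed to be available.

```shell
# Recreate the two sample files in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/first.txt" <<'EOF'
1234|12|Bill|Blatt|programmer
3243|34|Bill|Blatt|dentist
98734|25|Jack|Blatt|programmer
748567|31|Mark|Spark|magician
EOF
cat > "$dir/second.txt" <<'EOF'
123|12|Bill|Blatt|programmer
3243|4|Bill|Blatt|dentist
934|25|Jack|Blatt|prograbber
30495|89|Dave|Scratt|slobber
EOF

# Same program as the expanded version: collect keys from the first file,
# then print lines of the second file whose key was not collected.
result=$(awk -F'|' -v OFS='|' '
  NR == FNR { $1 = ""; $2 = ""; seen[$0]++ }
  NR != FNR { orig = $0; $1 = ""; $2 = ""; if (!seen[$0]) print orig }
' "$dir/first.txt" "$dir/second.txt")

printf '%s\n' "$result"
rm -r "$dir"
```

The output is the two lines shown earlier, in the same order in which they appear in second.txt:

934|25|Jack|Blatt|prograbber
30495|89|Dave|Scratt|slobber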