╔╦╗╦ ╦╔═╗╦  ╦╔═╗╔═╗╔╦╗╔═╗
 ║║║ ║╠═╝║  ║║  ╠═╣ ║ ║╣
═╩╝╚═╝╩  ╩═╝╩╚═╝╩ ╩ ╩ ╚═╝
╔╦╗╔═╗╔╦╗╔═╗╔═╗╔╦╗╔═╗╦═╗
 ║║║╣  ║ ║╣ ║   ║ ║ ║╠╦╝
═╩╝╚═╝ ╩ ╚═╝╚═╝ ╩ ╚═╝╩╚═

============================================================================

Find duplicates. One of the most common tasks in sysadmin life.

    perl -ne 'print if $seen{$_}++'

That's it. Prints every line that appears more than once.

The trick is the post-increment. First time a line appears, $seen{$_} is 0
(false), so nothing prints. Second time, it's 1 (true), so it prints. Third
time, still true, prints again.

Want only the second occurrence? We'll get there.

============================================================================
PART 1: HOW IT WORKS
--------------------

    print if $seen{$_}++

Break it down:

    PIECE          WHAT IT DOES
    -------------  ------------------------------------------
    $seen{$_}      Hash lookup - has this line been seen?
    ++             Post-increment - add 1 AFTER returning value
    $seen{$_}++    Returns OLD value (0 first time), then increments
    print if ...   Print if the condition is true (non-zero)

First encounter: $seen{$_} is undef (0 in numeric context). Returns 0, then
becomes 1. Condition false, no print.

Second encounter: $seen{$_} is 1. Returns 1, then becomes 2. Condition true,
print.

        .--.
       |o_o |
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/

============================================================================
PART 2: VARIATIONS
------------------

Print duplicates only once (not every repeat):

    perl -ne 'print if $seen{$_}++ == 1'

The == 1 means "only on the second occurrence."

Print each line only once (first occurrence of each):

    perl -ne 'print unless $seen{$_}++'

Flip the logic. Print first occurrence, skip all repeats.
Print lines that appear exactly N times (count first, report at EOF):

    perl -ne '$c{$_}++; END { print for grep { $c{$_} == 3 } keys %c }'

One pass: tally every line into %c, then the END block prints the lines
whose count is exactly 3 (substitute your N).

============================================================================
PART 3: FIRST VS LAST OCCURRENCE
--------------------------------

First occurrence of each line:

    perl -ne 'print unless $seen{$_}++'

Last occurrence of each line:

    perl -ne '$last{$_} = $_; END { print values %last }'

This overwrites each time, so only the last survives. But order is lost.

Want last occurrence in order?

    perl -ne '$last{$_} = $.; END { print sort { $last{$a} <=> $last{$b} } keys %last }'

Store line numbers, sort by them at the end.

============================================================================
PART 4: CASE INSENSITIVE
------------------------

Ignore case when detecting duplicates:

    perl -ne 'print if $seen{lc $_}++'

The lc lowercases the key. "Hello" and "HELLO" are now the same.

Normalize whitespace too:

    perl -ne '$k = lc; $k =~ s/\s+/ /g; print if $seen{$k}++'

============================================================================
PART 5: FIELD-BASED DUPLICATES
------------------------------

Duplicate detection on a specific column:

    perl -ane 'print if $seen{$F[0]}++'

The -a splits each line into @F. This checks for duplicate first fields only.

Duplicate IPs in a log:

    perl -ane 'print if $seen{$F[0]}++' access.log

Duplicate usernames in /etc/passwd:

    perl -F: -ane 'print if $seen{$F[0]}++' /etc/passwd

============================================================================
PART 6: COUNTING DUPLICATES
---------------------------

How many times does each line appear?

    perl -ne '$c{$_}++; END { print "$c{$_}: $_" for keys %c }'

Output like:

    3: this line appeared three times
    1: this line appeared once
    5: this line appeared five times

Sorted by count:

    perl -ne '$c{$_}++; END { print "$c{$_}: $_" for sort { $c{$b} <=> $c{$a} } keys %c }'

Most frequent first.
============================================================================
PART 7: ADJACENT DUPLICATES
---------------------------

The uniq command only removes adjacent duplicates:

    perl -ne 'print unless $_ eq $last; $last = $_'

Line must equal the previous line to be skipped. This is faster (no hash)
but only catches consecutive repeats:

    aaa
    aaa   <- removed
    bbb
    aaa   <- NOT removed, not adjacent

============================================================================
PART 8: REAL WORLD EXAMPLES
---------------------------

Find duplicate lines in a config file:

    perl -ne 'print "$.: $_" if $seen{$_}++' config.ini

Includes line numbers so you can find them.

Duplicate entries in /etc/hosts:

    perl -ane 'print if $seen{$F[1]}++' /etc/hosts

Checks hostname field (second column).

Duplicate SSH keys:

    perl -ne 'print if $seen{(split)[1]}++' ~/.ssh/authorized_keys

Keys are in the second field. Finds if someone's key is listed twice.

Duplicate cron jobs:

    crontab -l | perl -ne 'print if $seen{$_}++'

============================================================================
PART 9: MEMORY CONSIDERATIONS
-----------------------------

The hash stores every unique line. For huge files with many unique lines,
this eats memory.

For massive files, consider:

    sort file.txt | uniq -d

External sort handles files larger than RAM. But loses original order.

Or process in chunks if you only care about recent duplicates:

    tail -10000 huge.log | perl -ne 'print if $seen{$_}++'

============================================================================
PART 10: THE FAMILY
-------------------

These patterns are related:

    perl -ne 'print if $seen{$_}++'                           # All duplicates
    perl -ne 'print if $seen{$_}++ == 1'                      # Each duplicate once
    perl -ne 'print unless $seen{$_}++'                       # Unique lines only
    perl -ne '$c{$_}++; END{print for grep{$c{$_}>1}keys%c}'  # Dupes, one each

The post-increment idiom is the heart of all of them.
============================================================================
PART 11: WHY POST-INCREMENT
---------------------------

Why $seen{$_}++ instead of ++$seen{$_}?

Pre-increment returns the NEW value:

    ++$seen{$_}    # Returns 1 on first encounter (true!)

Post-increment returns the OLD value:

    $seen{$_}++    # Returns 0 on first encounter (false!)

With pre-increment, everything prints. The ++ happens before the return.
Post-increment is what makes the logic work.

============================================================================
PART 12: COMBINING WITH OTHER PATTERNS
--------------------------------------

Duplicates matching a pattern:

    perl -ne 'print if /error/i && $seen{$_}++'

Only errors, and only repeated ones.

Duplicates across multiple files:

    perl -ne 'print "$ARGV: $_" if $seen{$_}++' *.log

Shows which file the duplicate is in.

Duplicates within a time window (log files):

    perl -ane '
        $t = $F[0];
        %seen = () if $t ne $last_t;
        print if $seen{$_}++;
        $last_t = $t
    ' timestamped.log

Resets the seen hash when the timestamp changes.

============================================================================

              $seen{$_}++
                   |
              +----+----+
              |         |
            first     again
              |         |
            skip      print

      The post-increment trick

============================================================================

japh.codes