╔╦╗╔═╗╔═╗╦╔═╗  ╔═╗╦═╗╔═╗╦  ╦
║║║╠═╣║ ╦║║    ╠═╣╠╦╝║ ╦╚╗╔╝
╩ ╩╩ ╩╚═╝╩╚═╝  ╩ ╩╩╚═╚═╝ ╚╝

============================================================================

Here's the scenario. You're a sysadmin. Logs rotate nightly and get
compressed. Your directory looks like this:

    access.log
    access.log.1
    access.log.2.gz
    access.log.3.gz
    access.log.4.gz

Some compressed, some not. You need to grep through all of them.

The ugly way:

    cat access.log access.log.1
    zcat access.log.*.gz

Two commands. Or write a wrapper script. Or...

    perl -e '
        s/^([^'"'"']+\.gz)\z/gzcat '"'"'$1'"'"' |/ for @ARGV;
        print while <>;
    ' access.log*

One command handles both. The diamond operator reads from pipes and plain
files alike and doesn't care which is which.

============================================================================

PART 1: THE TRICK
-----------------

Before the diamond operator runs, we rewrite @ARGV:

    s/^([^']+\.gz)\z/gzcat '$1' |/ for @ARGV;

Files ending in .gz become:

    access.log.2.gz  ->  gzcat 'access.log.2.gz' |

That trailing pipe tells Perl "run this command and read its output."
The diamond operator sees a pipe, executes gzcat, and gives you
decompressed data.

(gzcat is the name on macOS and the BSDs. Many Linux systems only ship
zcat; gzip -dc works on both.)

Plain files pass through unchanged. The regex doesn't match them.

============================================================================

PART 2: HOW PIPES IN @ARGV WORK
-------------------------------

The diamond operator has magic behaviors:

    ARGV ENTRY          PERL EFFECTIVELY DOES
    ----------------    ----------------------------------
    filename            open(ARGV, '<', 'filename')
    command |           open(ARGV, '-|', 'command')
    | command           open(ARGV, '|-', 'command')

A trailing pipe means "read from this command."
A leading pipe means "write to this command."

This is old-school two-argument open behavior: <> hands each entry to
open as a single string and lets the string itself pick the mode.
Usually dangerous. Here, we're exploiting it deliberately.

    .--.
   |o_o |
   |:_/ |
  //   \ \
 (|     | )
/'\_   _/`\
\___)=(___/

============================================================================

PART 3: THE REGEX EXPLAINED
---------------------------

    s/^([^']+\.gz)\z/gzcat '$1' |/

    PIECE         MEANING
    --------      ------------------------------------------
    ^             Start of string
    (             Begin capture
    [^']+         One or more chars that aren't single quotes
    \.gz          Literal .gz
    )             End capture
    \z            End of string
    gzcat '$1'    Replacement: gzcat command with filename
    |             Trailing pipe - read from command

Why [^']+ instead of .+?

Security. The filename ends up inside single quotes on a shell command
line. If someone passes a filename with a single quote in it, they could
close the quoting early and inject commands. The [^'] prevents that.

    evil'file;rm -rf /.gz    # Won't match - has quote

An entry that doesn't match is left alone and opened as a plain file name.

============================================================================

PART 4: MULTIPLE COMPRESSION FORMATS
------------------------------------

Handle .gz and .Z files:

    s/^([^']+\.(?:gz|Z))\z/gzcat '$1' |/ for @ARGV;

Handle .bz2 too:

    s/^([^']+\.bz2)\z/bzcat '$1' |/ for @ARGV;
    s/^([^']+\.(?:gz|Z))\z/gzcat '$1' |/ for @ARGV;

Two substitutions, three formats.

Or in a single loop, with .xz added:

    for (@ARGV) {
        s/^([^']+\.(?:gz|Z))\z/gzcat '$1' |/;
        s/^([^']+\.bz2)\z/bzcat '$1' |/;
        s/^([^']+\.xz)\z/xzcat '$1' |/;
    }

============================================================================

PART 5: PRACTICAL SCRIPT
------------------------

Make it a reusable tool:

    #!/usr/bin/env perl
    # zperl - run perl with transparent decompression

    for (@ARGV) {
        next if /^\s*-/;  # Skip flags
        s/^([^']+\.(?:gz|Z))\z/gzcat '$1' |/;
        s/^([^']+\.bz2)\z/bzcat '$1' |/;
        s/^([^']+\.xz)\z/xzcat '$1' |/;
        s/^([^']+\.zst)\z/zstdcat '$1' |/;
    }

    # Hand off to regular perl with modified @ARGV
    exec 'perl', @ARGV;

Now:

    zperl -ne 'print if /error/' access.log*

Handles every format listed in the script transparently.

============================================================================

PART 6: GREP ALL LOGS
---------------------

Search through rotated logs:

    perl -e '
        s/^([^'"'"']+\.gz)\z/gzcat '"'"'$1'"'"' |/ for @ARGV;
        print "$ARGV: $_" while <>;
    ' /var/log/syslog* | grep -i error

The $ARGV variable tells you which file (or pipe) you're reading from.
For a rewritten entry it holds the whole command string, such as
gzcat '/var/log/syslog.2.gz' |.

Or pure Perl:

    perl -e '
        s/^([^'"'"']+\.gz)\z/gzcat '"'"'$1'"'"' |/ for @ARGV;
        /error/i and print "$ARGV: $_" while <>;
    ' /var/log/syslog*

The "and" is there because Perl allows only one statement modifier per
statement; the "while" already takes that slot.

============================================================================

PART 7: THE QUOTE NIGHTMARE
---------------------------

Those ugly quote escapes:

    '"'"'

This is shell escaping. Inside single quotes, you can't use single quotes.
So you:

    1. End single quote: '
    2. Add escaped single quote: "'"
    3. Resume single quote: '

Result: '"'"' produces a literal single quote.

In a script file, it's cleaner:

    s/^([^']+\.gz)\z/gzcat '$1' |/ for @ARGV;

No shell escaping needed.

============================================================================

PART 8: SECURITY NOTES
----------------------

This trick uses two-argument open semantics. That's usually bad.

The [^']+ in the regex protects against quote injection. But two-argument
open treats other characters as special too:

    >file          # Opens for writing
    |command       # Pipe to command
    command|       # Pipe from command

If someone controls your filenames, they control your system.

Safe version:

    for (@ARGV) {
        if (/\.gz\z/ && -f $_) {        # Must be a real file
            $_ = "gzcat \Q$_\E |";      # \Q backslash-escapes metacharacters
        }
    }

The -f checks it's a regular file. The \Q puts a backslash in front of
every non-word character, so the shell reads ordinary file names literally.

This only hardens the .gz entries. Every other argument still goes through
two-argument open as-is.

============================================================================

PART 9: MODERN ALTERNATIVE
--------------------------

PerlIO::gzip does this cleanly, as an I/O layer you name when opening
the file:

    perl -MPerlIO::gzip -e '
        open my $fh, "<:gzip", $ARGV[0] or die $!;
        /error/ and print while <$fh>;
    ' access.log.gz

But it requires installing a module. The @ARGV trick works with stock Perl.

For serious work, use the module. For quick hacks on random servers where
you can't install anything? Magic @ARGV.

============================================================================

PART 10: OTHER @ARGV TRICKS
---------------------------

Set default files:

    @ARGV = glob('/var/log/messages*') unless @ARGV;
    print while <>;

Expands to all message logs if no args given.

Filter @ARGV:

    @ARGV = grep { -f $_ && -r $_ } @ARGV;

Only process readable regular files.

Add STDIN explicitly:

    push @ARGV, '-' unless @ARGV;

The dash means STDIN. The diamond operator already falls back to STDIN
when @ARGV is empty, so this mostly makes the intent explicit.

============================================================================

PART 11: THE FULL MONTY
-----------------------

Putting it all together:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Default to common log locations if no args
    @ARGV = glob('/var/log/syslog*') unless @ARGV;

    # Keep only regular files (and an explicit - for STDIN)
    @ARGV = grep { $_ eq '-' || -f $_ } @ARGV;

    # Handle compressed files
    for (@ARGV) {
        s/^([^']+\.gz)\z/gzcat '$1' |/;
        s/^([^']+\.bz2)\z/bzcat '$1' |/;
        s/^([^']+\.xz)\z/xzcat '$1' |/;
    }

    # Add STDIN if nothing left
    @ARGV = ('-') unless @ARGV;

    # Process
    while (<>) {
        print "$ARGV:$.: $_" if /error/i;
    } continue {
        close ARGV if eof;  # Restart $. at 1 for each file
    }

============================================================================

                    @ARGV
                      |
              +-------+-------+
              |       |       |
            file   file.gz    -
              |       |       |
            open    gzcat   STDIN
              |       |       |
              +-------+-------+
                      |
                     <>

              Transparent decompression

============================================================================

japh.codes