╦═╗╔═╗╔═╗╔═╗╦═╗╔╦╗  ╔═╗╔═╗╔═╗╔═╗╦═╗╔═╗╔╦╗╔═╗╦═╗
╠╦╝║╣ ║  ║ ║╠╦╝ ║║  ╚═╗║╣ ╠═╝╠═╣╠╦╝╠═╣ ║ ║ ║╠╦╝
╩╚═╚═╝╚═╝╚═╝╩╚══╩╝  ╚═╝╚═╝╩  ╩ ╩╩╚═╩ ╩ ╩ ╚═╝╩╚═

============================================================================

The $/ variable. Input record separator. Most people never touch it.

By default it's a newline. That's why <> reads one line at a time.

But change $/ and you change how Perl sees the world.

    $/ = undef;     # Slurp entire file
    $/ = "";        # Paragraph mode
    $/ = \1024;     # Read 1024 bytes at a time
    $/ = "END";     # Read until "END"

One variable. Four completely different behaviors.

============================================================================

PART 1: SLURP MODE
------------------

    $/ = undef;
    my $entire_file = <>;

That's it. The whole file in one scalar. No loop needed.

In a one-liner:

    perl -0777 -ne 'print length' file.txt

The -0777 sets $/ to undef. The -0 switch takes an octal character code,
and Perl treats any value of 0400 or above as "no separator at all."
0777 is just the conventional choice.

When would you use this? Multi-line regex matches:

    perl -0777 -pe 's/START.*?END/REPLACED/gs' file.txt

Without slurp mode, $_ holds one line at a time, so that regex can never
match a START and an END that sit on different lines.

============================================================================

PART 2: PARAGRAPH MODE
----------------------

    $/ = "";

Empty string. Now Perl reads paragraph by paragraph - chunks of text
separated by blank lines.

    perl -00 -ne 'print if /error/i' logfile.txt

The -00 enables paragraph mode. Each $_ is a full paragraph. If any line in
that paragraph contains "error", print the whole thing.

Context around your matches. For free.

        .--.
       |o_o |
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/

============================================================================

PART 3: FIXED-WIDTH RECORDS
---------------------------

Here's where it gets weird. Set $/ to a reference to a number:

    $/ = \1024;

Now <> reads 1024 bytes at a time (the last read may come up short).
Not lines. Bytes.

    perl -e '$/ = \16; print "<$_>\n" while <>' binary.dat

Reads 16 bytes per iteration.
Perfect for binary files with fixed record sizes. Database dumps. Network
packet captures. Old mainframe files with 80-character records. This
handles all of them.

============================================================================

PART 4: CUSTOM DELIMITERS
-------------------------

    $/ = "END";

Now Perl reads until it hits the literal string "END".

    perl -e '$/ = "</record>"; print "---\n$_" while <>' data.xml

Crude XML parsing. Each $_ contains everything up to and including the
closing tag (</record> is only a placeholder here; use whatever closing
tag your data has). Not proper parsing. But when you need something quick
and dirty on a server with no XML libraries? This works.

============================================================================

PART 5: THE SWITCHES
--------------------

Command line shortcuts:

    SWITCH    EQUIVALENT              MEANING
    ------    --------------------    -----------------------------
    -0        $/ = "\0"               Null-separated (find -print0)
    -00       $/ = ""                 Paragraph mode
    -0777     $/ = undef              Slurp mode
    -0NNN     $/ = chr(oct("NNN"))    Octal character code

The -0 without a number means null byte - pairs perfectly with find -print0
and xargs -0:

    find . -name "*.log" -print0 | perl -l -0 -ne 'print if -M $_ > 7'

Null-separated filenames. No issues with spaces or newlines in paths. The
-l goes before the -0 on purpose: it chomps the trailing null off each
filename (otherwise -M would be testing a name that doesn't exist) and
sets $\ to a newline, taken from $/ before -0 changes it.

============================================================================

PART 6: BINARY CHUNKING
-----------------------

Processing a large binary file in chunks:

    perl -e '
        $/ = \4096;
        my $total = 0;
        while (<>) {
            $total += length;
            print STDERR "Read $total bytes\r";
        }
        print STDERR "\nDone: $total bytes\n";
    ' hugefile.bin

Progress indicator for large file processing. Each read is exactly 4096
bytes (except possibly the last one).
============================================================================

PART 7: RECORD-ORIENTED DATA
----------------------------

Old-school fixed-width data:

    # Data file: 10-char name, 3-char age, 7-char salary
    # (shown here one record per line with the fields spaced apart for
    # reading; the file itself has no separators and no newlines)
    #   JohnSmith   034  0085000
    #   JaneDoe     028  0092000

    perl -e '
        $/ = \20;
        while (<>) {
            my ($name, $age, $salary) = unpack("A10 A3 A7", $_);
            print "$name: $age years, \$$salary\n";
        }
    ' payroll.dat

Twenty bytes per record. Unpack pulls out the fields. No parsing needed
because the structure is fixed.

============================================================================

PART 8: MULTIPLE FILES
----------------------

When processing multiple files, $/ persists:

    perl -0777 -ne 'print "$ARGV: " . length($_) . " bytes\n"' *.txt

Each file is slurped separately. $ARGV tells you which file.

But watch out - $. (the record number) doesn't reset between files unless
you explicitly close ARGV at the end of each one:

    perl -00 -ne 'print "$ARGV:$.: $_"; close ARGV if eof' *.txt

The bare eof is true at the end of each input file, so the close runs once
per file and $. starts over at 1 for the next one.

============================================================================

PART 9: COMBINING WITH OUTPUT
-----------------------------

There's also $\ - the output record separator:

    $/ = "";           # Read paragraphs
    $\ = "\n---\n";    # Print separator after each output

    while (<>) {
        print if /important/;
    }

Every print automatically appends a "---" line, so the matching paragraphs
come out separated by it.

In one-liner form:

    perl -00 -l -ne 'print if /error/' logfile.txt

The -l sets $\ to match $/ (sort of). It chomps each paragraph on the way
in and, because $/ is in paragraph mode, sets $\ to "\n\n" rather than "".
Output gets auto-newlined, with a blank line after each paragraph.
============================================================================

PART 10: THE MAGIC TABLE
------------------------

Quick reference:

    $/            READS
    ------------  ------------------------------------
    "\n"          One line (default)
    undef         Entire file
    ""            Paragraph (text between blank lines)
    \N            Exactly N bytes
    "STRING"      Until STRING appears
    "\0"          Until null byte

    SWITCH        EQUIVALENT
    ------        --------------------
    -0            $/ = "\0"
    -00           $/ = ""
    -0777         $/ = undef
    -0NNN         $/ = chr(oct("NNN"))

============================================================================

PART 11: REAL WORLD
-------------------

Extracting function signatures from C code:

    perl -0777 -ne 'print "$1\n" while /^(\w+ \w+\([^)]*\))\s*\{/gm' *.c

Finding the length of the longest line in a file:

    perl -0777 -ne 'print length($1), "\n" while /(.+)/g' file.txt | sort -n | tail -1

Processing CSV with embedded newlines (fields quoted):

    perl -0777 -ne '
        s{"([^"]*)"}{
            (my $clean = $1) =~ s/\n/\\n/g;
            qq("$clean")
        }ge;
        print
    ' data.csv

Each quoted field is rewritten once, with its real newlines turned into a
literal \n.

The record separator is one of Perl's quiet superpowers. Most languages
give you lines. Perl gives you whatever you want.

============================================================================

                                   $/
                                  /  \
                                 /    \
                                |  ??  |
                                 \    /
                                  \  /
                                   \/

                         Define your own reality

============================================================================

japh.codes