                           ╔═╗╔═╗╔═╗╔╦╗╔═╗╔═╗
                           ║ ╦║ ║╠═╣ ║ ╚═╗║╣
                           ╚═╝╚═╝╩ ╩ ╩ ╚═╝╚═╝
                        ╔═╗╔═╗╔═╗╦═╗╔═╗╔╦╗╔═╗╦═╗
                        ║ ║╠═╝║╣ ╠╦╝╠═╣ ║ ║ ║╠╦╝
                        ╚═╝╩  ╚═╝╩╚═╩ ╩ ╩ ╚═╝╩╚═

============================================================================

Count things. That's what this operator does. Count matches, count splits,
count elements. Four characters that force list context and return a number:

    =()=

The goatse operator. Yes, the name is exactly what you think it is. Turn
your head sideways, squint a little. Perl hackers have a dark sense of humor.

But don't let the crude name fool you. This thing is genuinely useful.

============================================================================

PART 1: THE TRICK
-----------------

How many words in a string?

    my $count = () = $string =~ m~\w+~g;

That's it. $count now holds the number of matches.

Without the =()= wrapper:

    my $count = $string =~ m~\w+~g;

That gives you 1 (true) or an empty string (false). Boolean, not count.

The goatse forces list context on the regex, making it return all matches.
Then it counts them.

============================================================================

PART 2: HOW IT WORKS
--------------------

Break it down piece by piece:

    my $count = () = $string =~ m~\w+~g;

    PIECE                 WHAT IT DOES
    ------------------    ------------------------------------------
    $string =~ m~\w+~g    Match all words, return list of matches
    () =                  Assign to empty list (forces list context)
    my $count =           Assign the COUNT of that list to $count

The magic is the empty parentheses. When you assign a list to (), Perl
evaluates the right side in list context but discards the actual values.
A list assignment evaluated in scalar context returns the number of
elements on its right side, and that count is what lands in $count.

It's the same reason this works:

    my $count = () = (1, 2, 3, 4, 5);   # $count is 5

                                  .--.
                                 |o_o |
                                 |:_/ |
                                //   \ \
                               (|     | )
                              /'\_   _/`\
                              \___)=(___/

============================================================================

PART 3: COUNTING MATCHES
------------------------

Count vowels in a string:

    my $text = "The quick brown fox";
    my $vowels = () = $text =~ m~[aeiou]~gi;
    print "Vowels: $vowels\n";   # Vowels: 5

Count IP addresses in a log:

    my $log = slurp('access.log');   # slurp(): any read-whole-file helper
    my $ips = () = $log =~ m~\d+\.\d+\.\d+\.\d+~g;
    print "Found $ips IP addresses\n";

Count lines matching a pattern:

    my @lines = <STDIN>;
    my $errors = () = grep { m~ERROR~ } @lines;
    print "Error count: $errors\n";

============================================================================

PART 4: COUNTING SPLITS
-----------------------

How many fields in a CSV line?

    my $line = "one,two,three,four,five";
    my $fields = () = split m~,~, $line, -1;
    print "Fields: $fields\n";   # Fields: 5

The -1 limit matters. When split is assigned to a list and no limit is
given, Perl uses a limit one larger than the number of variables in that
list. For () that limit is 1, so split hands back the whole string as a
single field and the count is always 1. A negative limit switches that
optimization off (and also keeps trailing empty fields).

Count path components:

    my $path = "/usr/local/bin/perl";
    my $depth = () = split m~/~, $path, -1;
    print "Depth: $depth\n";   # Depth: 5

Note: split returns an empty string for a leading delimiter, so the path
above gives you ("", "usr", "local", "bin", "perl") - five elements.

============================================================================

PART 5: IN ONE-LINERS
---------------------

Count words in a file:

    perl -0777 -ne 'print scalar(() = m~\w+~g), "\n"' file.txt

The scalar() is needed because print provides list context. We want the
count, not the matches.

Count matches per line:

    perl -lne '$n = () = m~\bthe\b~gi; print "$n: $_"' file.txt

The \b anchors keep the "the" buried inside "Another" from being counted.

Output:

    2: The cat and the dog
    0: No matches here
    1: Another the appears

============================================================================

PART 6: WHY NOT JUST USE SCALAR?
--------------------------------

You might think:

    my $count = scalar($string =~ m~\w+~g);

But that doesn't work. scalar() imposes scalar context, which is the context
the match already had, so m//g still reports only whether it matched. You
get true or false, not a count.

The goatse works because the assignment to () happens BEFORE the assignment
to $count.
Assignment is right-associative in Perl: the list assignment on the right
runs first, and the scalar assignment on the left then sees it in scalar
context, where it yields a count.

    my $count = () = $string =~ m~\w+~g;
                   ^
                   |
                   This assignment forces list context
                   on everything to its right

============================================================================

PART 7: VARIATIONS
------------------

Sometimes you see it written with variables instead of empty parens:

    my $count = my @dummy = $string =~ m~\w+~g;

Same effect, but now @dummy holds the actual matches. Useful if you want
both the count AND the matches.

Or with a statement modifier, matching against $_:

    my $count;
    $count = () = m~pattern~g for $string;

Statement modifier form. Same result. Declare $count on its own line first:
the behavior of my combined with a statement modifier is undefined.

============================================================================

PART 8: GOTCHAS
---------------

Watch out for captures:

    my $text = "abc 123 def 456";

    my $captured = () = $text =~ m~(\w+)~g;
    # $captured is 4 (the captured groups)

    my $plain = () = $text =~ m~\w+~g;
    # $plain is also 4 (the full matches)

With captures, you get the captured parts. Without, you get the full
matches. Same count here, but the actual strings differ.

Multiple captures multiply your count:

    my $pairs = () = $text =~ m~(\w)(\w)~g;
    # $pairs is 8 (two captures x four matches)

============================================================================

PART 9: REAL-WORLD USES
-----------------------

Validate credit card format (count digits):

    my $cc = "4111-1111-1111-1111";
    my $digits = () = $cc =~ m~\d~g;
    die "Invalid card" unless $digits == 16;

Count function calls in code:

    my $code = slurp('script.pl');   # slurp(): any read-whole-file helper
    my $prints = () = $code =~ m~\bprint\b~g;
    print "Found $prints print statements\n";

Check minimum word count:

    my $essay = do { local $/; <STDIN> };   # slurp all of STDIN
    my $words = () = $essay =~ m~\w+~g;
    die "Too short!" if $words < 500;

============================================================================

PART 10: THE NAME
-----------------

Look, I'm not going to explain the name. If you don't know, consider
yourself lucky. If you do know, I'm sorry.

Some people call it the "saturn operator" (looks like the planet with rings).
Others call it the "countof operator" (what it does). But goatse is what
stuck, because Perl culture is what it is.

The operator itself is elegant. The name is... memorable.

============================================================================

                                 =( )=
                               | count |
                                \_____/

                    Forcing list context since 1997

============================================================================

                              japh.codes
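
P.S. If you want to watch the context switch happen, here is a minimal
sketch that puts the pieces from Parts 1, 7 and 8 side by side. It assumes
nothing beyond core Perl. The sample string is the one from Part 8, and the
variable names ($bool, $kept, $pairs) are made up for illustration.

```perl
use strict;
use warnings;

my $text = "abc 123 def 456";

# Scalar context: m//g only reports whether the next match succeeded.
my $bool = $text =~ m~\w+~g;
pos($text) = undef;    # reset the position left behind by the scalar //g

# List assignment to () forces list context on the match; the outer
# scalar assignment receives the number of elements on the right side.
my $count = () = $text =~ m~\w+~g;

# Assigning to a real array gives the same count and keeps the matches.
my $kept = my @matches = $text =~ m~\w+~g;

# Each capture group contributes one element per match.
my $pairs = () = $text =~ m~(\w)(\w)~g;

print "bool=$bool count=$count kept=$kept pairs=$pairs\n";
# prints: bool=1 count=4 kept=4 pairs=8
print "matches: @matches\n";
# prints: matches: abc 123 def 456
```

Run it with perl and compare the numbers against the claims above. The
pos() reset is only there because the scalar-context match moves the
match position, and a later //g on the same string would start from it.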