Idiomatic Perl: Counting the Number of Times a Character Occurs in a String
From time to time you may want to count the number of occurrences of a character inside a string. You could loop from the beginning of the string to the end:
my $s = "Testing one two three.";Ugly. This is going to take all day with all those calls to
my $m = 0;
for (my $i = 0; $i < length($s); $i++) {
$m++ if (substr($s, $i, 1) eq 'e');
}
substr
.* You could split
the string. There's a special split
pattern //
that divides the input string into characters. So perhaps:my $n = 0;Maybe this doesn't seem quite right to you yet. There must be a shorter way to do this in Perl, right? Yes. The
for (split //, $s) {
$n++ if $_ eq 'e';
}
tr///
(transliterate characters) operator can count characters for you. The tr///
operator returns the number of characters it changed. For example:my $example = "Mixed Case";The value returned from
my $changed = $example =~ tr/a-z/A-Z/; # make everything uppercase
print $changed, "\n"; # 9
tr///
above is 9, which is the number of characters changed to uppercase. Armed with this knowledge, you might think of:my $c = $example =~ tr/e/e/; # change 'e' to 'e' ...which will work just fine. This is perfectly reasonable, but there's a shortcut. If you omit the second argument to
tr///
, no characters are changed, but tr///
still returns a count of the character(s) in its first argument.my $c = $example =~ tr/e//; # counts the number of 'e' just as above
my $c_lower = $example =~ tr/a-z//; # counts lowercase letters in $example
tr///
is so smart, in fact, that it knows when you aren't changing the string, meaning that you can write things like this:my $count_ignore_case = lc($example) =~ tr/t//;Normally,
tr///
requires a modifiable value, but if it won't be making changes, you can supply a constant argument like a result from an expression.-joseph
*Not really, but ouch.
6 Comments:
a slight variation on the idiom you propose could be:
my $x = 'test';
my $n_of_t = () = ($x =~ /t/g);
this is probably uglier (and slower) than using tr///, but I find it a nice idiom :-).
and it does saves you an lc() when you need to ignore case:
my $n_of_t = () = ($x =~ /t/gi);
cheers,
Aldo
Thanks for the example. Just one tiny thing I think:
This line:
print $changed, "\n"; # 9
prints 7
Yup, nice catch.
% perl -le '$a = "Mixed Case"; print $a =~ tr/a-z/A-Z/'
7
Great trick! I have a similar problem, where I have a series of characters ('a' through 'z') that I want to count in a string. I tried tr/$_//; but soon found that you can't use variables in a transliteration regex, unless you wrap it in exec(), which I hear is costly. I could try Aldo's approach:
for $char ('a'..'z'){
my $x = 'text';
my $count = () = ($x =~ /$char/g);
}
which works great. Still, I was wondering if anyone has any other ideas? I'll be using this function a whole lot, so speed is a must.
Thanks for all your help!
Dex
Dexter,
If you have a range of characters that may be changing, you'll have to use eval for each different range of characters. But this isn't necessarily difficult. The easiest way is probably a closure:
sub counter {
my $set = shift;
# print 'sub { return shift =~ tr/' . $set . '//; }' . "\n";
return eval 'sub { return shift =~ tr/' . $set . '//; }';
}
my $digit_ctr = counter('0-9');
my $letter_ctr = counter('a-zA-Z');
my $backslash_ctr = counter('\\\\');
my $newline_ctr = counter('\n');
my $str = '123abcABC[]\\' . "\n";
print "digits: " . $digit_ctr->($str) . "\n";
print "letters: " . $letter_ctr->($str) . "\n";
print "backslashes: " . $backslash_ctr->($str) . "\n";
print "newlines: " . $newline_ctr->($str) . "\n";
The subroutine "counter" creates a reference to an anonymous subroutine (in this case, a particular type called a closure because of the way the my variable $set is used) that takes one argument and does the desired counting using tr///. Each call to "counter" creates a subroutine by compiling a bit of Perl code, but you only need to call "counter" once for each class of characters you want to count. Once you have the subroutine that you want (or subroutines) you can reuse them as often as you like via the coderef call syntax $coderef->(args...).
Ref: comment by aldo.
my $x = 'test';
my $n_of_t = () = ($x =~ /t/g);
this is probably uglier (and slower) than using
Is it true that regex is slower
than tr ?
mallah@tradeindia.com
Post a Comment
<< Home