Learning How to Program in Perl

Dr. David J. Ritchie, Mr. Jim Skrine, Mr. Bruce Webber

Washington Junior High Computer Club

April 22, 1998

Summary

This is the third of several handouts for a short course in learning how to program in a computer language called "perl".

1.0 Introduction

This handout will cover more features of the language in Parts 1 and 2. Again, the treatment will be very schematic because it is assumed that you are trying out the features as you go along.

One of the best ways to learn any language is to read it and to try to understand what is being said. For computer languages, this is best done by adding comments to a program.

In Part 3, the handout will cover a sample, partially written Hangman game. Here you are asked to add comments to the sample code that tell a human reader what the Perl code is doing.

2.0 Language Features - Part 1

This first part focuses on control structures. These are perl "sentences" that change the flow of the program from that of doing one sentence after another to that of doing, for example, sentence A if a scalar variable is greater than zero and sentence B if the scalar variable is less than or equal to zero.

2.1 Statement Blocks

First, let us consider "statement blocks." Statement blocks are groups of perl "sentences." They are the simplest "control structure" in that they mandate no change in the program flow. Within a "statement block" of otherwise ordinary statements, control proceeds from one statement to another.

In a sense, statement blocks are like paragraphs in English or another human language. In English, you indicate a new paragraph by indenting the first line. Not only does that indicate the start of the new paragraph but it automatically ends the previous paragraph.

In Perl, you indicate a "statement block" with curly braces. The opening curly brace { indicates the start of a statement block. The closing curly brace } indicates the end of the statement block. Because it is very important that the opening and closing curly braces match, you generally put them in the same column on the page so that you can look down the page and spot the matching braces.

For example, here is a statement block:

{
    $a = "Hello";
    $b = 1.23;
}
It is also possible to have a statement block on one line. Here is an example.
{ $a = 1.23; }
The semicolon after the last statement in a block is optional, so you could write { $a = 1.23 } instead, but you can include it if you wish.

2.2 if and unless

Now we consider the "if" and the "unless" control statements. The form is that of a control expression and a statement block. If the control expression is true, the statement block is executed. If it is false, the statement block is skipped.

For example,

# example 1
if ( $a == 4.00 ) { $b = 0 }

# example 2
if ( $a <= 4.00 ) {
    $b = 17;
    $c = $b + 1;
}
# example 3
if ( $a > 4.00 ) {
    $d = "help";
} else {
    $d = "go away";
}
You can assume that "true" and "false" are defined so as to do the right thing. But if you really want to know, the false values are the number 0, the string "0", the empty (null) string "", and an undefined value; everything else is true.

You can also use the "unless" control statement. This is the reverse of the "if" control statement: the statement block is executed only when the control expression is false.
# example 4
unless ( $a > 4.00 ) {
    $d = "go away";
} else {
    $d = "help";
}
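
Here is a small sketch you can type in to check the rule for true and false. It tries several values as the control expression of an "if" and of an "unless"; the list of values and the variable names are made up for this illustration. Note that the string "0.0" counts as true, because it is neither the empty string nor the string "0".

```perl
# Made-up test values; the comment on each says how Perl treats it.
@values = (0,       # the number 0: false
           "0",     # the string "0": false
           "",      # the empty (null) string: false
           "0.0",   # a string that is not "0": true!
           1,       # true
           "abc");  # true
$true_count = 0;
foreach $value (@values) {
    if ($value) {
        print ("\"$value\" is true\n");
        $true_count = $true_count + 1;
    }
    unless ($value) {
        print ("\"$value\" is false\n");
    }
}
print ("$true_count of the values are true\n");
```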

2.3 while and until

The next form of control statement is the "while" and the "until". Again, the form is that of a control expression and a statement block. However, now the statement block is executed over and over again. A "while" loop repeats the block as long as the control expression is true and stops when it becomes false. An "until" loop does the reverse: it repeats the block as long as the control expression is false and stops when it becomes true.

For example,

# example 5
$a = 8.00;
while ( $a > 4.00 ) {
    $a = $a - 1.0;
}
# example 6
$a = 0;
$b = 0;
while ( $a <= 4.00 ) {
    $b = $a + $b;
    $a = $a + 1;
}
Here is an example of the until statement.

# example 7
$a = 0.00;
until ( $a > 4.00 ) {
    $a = $a + 1.0;
}


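Again, while and until are the reverse of one another. Note that the condition for continuing to loop is tested at the top of the block, before each pass, so if the condition fails the very first time, the block is never executed at all.

If you want instead to have the condition for continuing to loop tested at the bottom of the block, then you use the "do {...} while (...)" or the "do {...} until (...)" statement. With these, the block is always executed at least once.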

For example,

# example 8
$a = 0;
do {
    $a = $a + 1;
} until ( $a > 3.0 );

# example 9
$a = 0;
$b = 0;
do {
    $a = $a + 1;
    $b = $b + $a;
} while ( $a < 3.0 );
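
To see the difference between testing at the top and testing at the bottom, you might try a sketch like the following. The variable names are made up for the illustration. The "until" loop shows that "until" is "while" with the test reversed. After that, the condition is false from the very start, so the "while" block never runs, but the "do ... while" block runs once.

```perl
# until is while with the test reversed: it loops as long as $count >= 5 is false.
$count = 0;
$until_runs = 0;
until ($count >= 5) {
    $until_runs = $until_runs + 1;
    $count = $count + 1;
}

# Now start with a condition that is false right away.
$count = 10;

# while tests at the top, so its block is skipped entirely.
$while_runs = 0;
while ($count < 5) {
    $while_runs = $while_runs + 1;
    $count = $count + 1;
}

# do ... while tests at the bottom, so its block runs once before the test.
$do_runs = 0;
do {
    $do_runs = $do_runs + 1;
    $count = $count + 1;
} while ($count < 5);

print ("until: $until_runs, while: $while_runs, do-while: $do_runs\n");
```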

2.4 for

The "for" statement starts by evaluating an initial expression. It then evaluates a test expression to see if it should do the loop; if so, it executes the block, then executes a "re-init" expression, and then evaluates the test expression again.

For example,
 

# example 10
for ( $a = 1; $a < 4.00; $a++ ) {
    $b = $a;
    $c = $b * 2;
}


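This will begin with $a having the value 1, test to make sure that $a is less than 4.00, execute the block, add 1 to $a ($a++ is shorthand for $a = $a + 1), test to make sure that $a is still less than 4.00, and execute the block again. The block therefore runs with $a equal to 1, 2, and 3. When $a is finally no longer less than 4.00 at the time of the test (here, when it reaches 4), the loop ends and execution goes on to the statement after the loop.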
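
Because a "for" loop is really an initial expression, a test, and a re-init wrapped around a block, it can always be rewritten as a "while" loop. The sketch below runs the loop from the example above both ways. The running totals ($for_total and $while_total, names made up for this illustration) are there only so you have something to compare.

```perl
# The for loop from the example above, adding up $a * 2 on each pass.
$for_total = 0;
for ($a = 1; $a < 4.00; $a++) {
    $for_total = $for_total + $a * 2;
}

# The same loop written as a while loop.
$while_total = 0;
$a = 1;                        # initial expression
while ($a < 4.00) {            # test expression
    $while_total = $while_total + $a * 2;
    $a++;                      # re-init expression
}
print ("for: $for_total, while: $while_total\n");
```

Both totals come out the same, because both loops run the block with $a equal to 1, 2, and 3.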

2.5 foreach

The final control structure is the "foreach" structure. This statement takes a list of values, assigns them one at a time to a scalar variable, and executes a block. When the block is done, the process is repeated with the next item in the list. When all items have been handled, execution continues with the statement after the loop.

For example,

# example 11
@a = (1, 3, 5, 7, 9);
$sum = 0;
foreach $b ( @a ) {
    $sum = $sum + $b;
}
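
Here is another sketch of "foreach", this time over a list of strings. It uses Perl's built-in length function, which returns the number of characters in a string, to find the longest word. The words are borrowed from the Hangman game in Part 3; the other variable names are made up.

```perl
# Find the longest word in a list with foreach.
@words = ("ceiling", "niece", "tremendous", "curious");
$longest = "";
foreach $word (@words) {
    if (length($word) > length($longest)) {
        $longest = $word;     # remember the longest word seen so far
    }
}
print ("The longest word is $longest\n");
```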

3.0 Language Features - Part 2

This second part focuses on "regular expressions." Regular expressions are ways of specifying patterns of characters. Perl lets you specify such patterns and ask whether or not another string "matches" that pattern.

3.1 Simple Regular Expressions

The simplest regular expression is simply a string of characters. The string "abc" is a regular expression. When used as a pattern to match against an unknown character string (perhaps one that has been read from <STDIN>), the pattern "abc" matches the unknown string whenever the unknown string contains the letters "a", "b", and "c" next to each other, in that order. For example, it matches "xabcy" but not "axbxc".

Here is an example:

# example 12
if ( $a =~ m/abc/ ) { print ("The variable has an 'abc'!\n") }
The =~ operator matches the string on its left against the pattern on its right. Actually, you can leave out the m (which stands for match) if you use the / (slash) as a delimiter. So you can say...
# example 13
if ( $a =~ /abc/ ) { print ("We have a match!\n") }
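
To check the description of a simple match, you could run the pattern /abc/ against a few made-up strings and count how many match. Only strings with "a", "b", and "c" next to each other, in that order and in lower case, should match.

```perl
# Made-up strings to try against the pattern /abc/.
@tests = ("abc", "xabcy", "ABC", "axbxc", "cba");
$matches = 0;
foreach $string (@tests) {
    if ($string =~ /abc/) {
        print ("$string: match\n");
        $matches = $matches + 1;
    } else {
        print ("$string: no match\n");
    }
}
print ("$matches of the strings matched\n");
```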

3.2 Not So Simple Regular Expressions...

Regular expressions have lots of options. While they may seem complicated, let me tell you that it is much simpler to learn and use their seeming complexity than it is to program the equivalent thing yourself!

So, here are some additional regular expression pattern specifications. Try not to get too blown away. They will seem much simpler after you have used them for a while.

Single Character: A single character indicates itself.

Example:

if ($a =~ /a/) {print ("this")}
The print is executed if $a contains an "a".

Any Single Character: Dot (".") indicates any single character except a newline.

Example:

if ($a =~ /./) {print ("this")}
The print is executed if $a contains at least one character other than a newline.

Character Class: Square brackets specify any single character from the class given within the brackets.

Example:

if ($a =~ /[aeiou]/) {print ("this")}
The print is executed if $a contains an "a", an "e", an "i", an "o", or a "u" (but not an upper case A, E, I, O, or U!).

Some shortcuts exist: [A-Z] means any UPPER CASE letter from A to Z, and [a-z] means the same thing for lower case letters. Likewise, [0-9] means any digit. Ranges can be combined, as in [a-zA-Z] (any letter).

The caret ("^", which is above the 6 on many keyboards) says "not" when it is the first character inside the square brackets. That is, [^a-z] matches any single character that is not a lower case letter.

There are also the following: \d matches any digit (the same as [0-9]). \w matches any word character: a letter, a digit, or the underscore ("_"). \s matches any space-type character (space, carriage return, tab, linefeed, formfeed). The upper case versions (\D, \W, \S) match any single character that the corresponding lower case version does not match.
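
Here is a sketch that tries several character classes against one made-up string, "Room 101", and records which kinds of character it contains. Each variable (names made up for this illustration) is set to 1 if its pattern matches somewhere in the string.

```perl
# Which kinds of character does this string contain?
$string = "Room 101";
$vowel = $upper = $digit = $space = $other = 0;
if ($string =~ /[aeiou]/)         { $vowel = 1 }   # the lower case "o"s
if ($string =~ /[A-Z]/)           { $upper = 1 }   # the "R"
if ($string =~ /\d/)              { $digit = 1 }   # the "1"s and the "0"
if ($string =~ /\s/)              { $space = 1 }   # the blank between the words
if ($string =~ /[^a-zA-Z0-9\s]/)  { $other = 1 }   # no punctuation, so this stays 0
print ("vowel=$vowel upper=$upper digit=$digit space=$space other=$other\n");
```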

3.3 Grouping

Patterns composed of groups of the above are indicated as follows.

Sequence: This means simply that you can specify a pattern by writing a sequence of the pattern specifiers previously defined.

Example:

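if ($a =~ /../) {print (" The variable has at least two characters.")}
The print is executed if $a contains at least two characters (other than newlines) in a row.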


Multipliers: Asterisk ("*") means zero or more of the immediately previous character or character class.

Example:

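if ($a =~ /z*/) {print ("this")}
The print is executed if $a contains zero or more z's. Because zero z's always counts as a match, this test succeeds for every string, even one with no z in it at all. To require at least one z, use the plus sign, described next: /z+/.

The plus sign ("+") means one or more of the immediately previous character or character class.

The question mark ("?") means zero or one of the immediately previous character or character class.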

Note that multipliers are "greedy": they match as many characters as they can. For example, when if ($a =~ /z+/) {print ("this")} is tested against the string "azzzb", the z+ matches all three z's, not just the first one.
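
One way to watch greediness at work is to replace whatever the pattern matched with a single dash, using the substitution operator s/// that is described in section 3.5. This sketch is only an illustration; the variable names are made up. The second half also shows why /z*/ matches every string.

```perl
# z+ is greedy: it takes the whole run of z's, which is then replaced by one dash.
$plus = "azzzb";
$plus =~ s/z+/-/;
print ("$plus\n");

# z* is also greedy, but it is allowed to match zero z's. The first place it
# can match is the very start of the string (before the "a"), so that empty
# match is the one that gets replaced.
$star = "azzzb";
$star =~ s/z*/-/;
print ("$star\n");
```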

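Finally, there is the general multiplier {min,max}. This means at least min and at most max of the immediately previous character or character class.

Example:

if ($a =~ /a{5,10}/) {print ("this")}
The print is executed if $a contains at least 5 a's in a row (the match takes up to 10 of them, and a longer run also matches because it contains a run of 10). {5,} means 5 or more. {0,5} means 5 or fewer. {5} means exactly 5. Do not put spaces inside the braces, and do not leave out the minimum: older versions of Perl treat forms such as "{5, }" or "{ ,5}" as ordinary characters rather than as a multiplier.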

Alternation: Vertical bar indicates alternate possibilities.

Example:

if ($a =~ /a|b|c/) {print ("this")}
The print is executed if $a contains either a, b, or c.
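
Here is a sketch that tries the {min,max} multiplier, alternation, and the question mark against some made-up strings and counts the matches. The variable names are made up for this illustration.

```perl
# Made-up strings to try against {min,max} and alternation.
@strings = ("aaaa", "aaaaa", "aaaaaaaaaaaa", "cab", "xyz");
$five_to_ten = 0;
$abc_count = 0;
foreach $str (@strings) {
    if ($str =~ /a{5,10}/) { $five_to_ten = $five_to_ten + 1 }  # needs 5 a's in a row
    if ($str =~ /a|b|c/)   { $abc_count = $abc_count + 1 }      # needs any a, b, or c
}

# The question mark: zero or one "u".
$colour_count = 0;
foreach $str ("color", "colour", "colouur") {
    if ($str =~ /colou?r/) { $colour_count = $colour_count + 1 }
}
print ("$five_to_ten $abc_count $colour_count\n");
```

Notice that "aaaaaaaaaaaa" (twelve a's) still matches /a{5,10}/, because it contains a run of ten.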

3.4 Anchoring Patterns

There are notations which require that a pattern be "anchored," say, at the beginning of a string or the end of a string, etc.

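A beginning of string anchor is indicated by ^ (outside square brackets). An end of string anchor is indicated by $; it also matches just before a newline at the very end of the string, such as the one at the end of a line read from <STDIN>.

A word boundary anchor is indicated by \b. It matches the position between a word character (\w) and a non-word character, or between a word character and the start or end of the string. The negation is \B (not a word boundary).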

Example:

if ($a =~ /^a*b$/) {print ("this")}
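The print is executed only if the whole of $a consists of zero or more a's followed by a single b, such as "b", "ab", or "aaab" (but not "abb" or "xab"). The ^ and $ anchors force the pattern to cover the entire string.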
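
Here is a sketch you can use to test the anchored pattern /^a*b$/ against some made-up strings, followed by a word boundary test with \b. The variable names are made up for this illustration.

```perl
# Made-up strings to try against /^a*b$/; the anchors make it cover the whole string.
@strings = ("b", "ab", "aaab", "axyzb", "abb", "xab");
$whole = 0;
foreach $str (@strings) {
    if ($str =~ /^a*b$/) {
        print ("$str: match\n");
        $whole = $whole + 1;
    } else {
        print ("$str: no match\n");
    }
}

# \b on both sides finds "cat" only as a separate word.
$word_hits = 0;
foreach $str ("the cat sat", "concatenate") {
    if ($str =~ /\bcat\b/) { $word_hits = $word_hits + 1 }
}
print ("$whole $word_hits\n");
```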

3.5 Substitutions

Now that we can specify patterns, there are a number of things we can do with them.

We can substitute some characters in place of a pattern.

# example 14
$old = "Hello, World";
$new = "Goodbye";
$old =~ s/Hello/$new/;
In this example, the string "Hello" is replaced with the string "Goodbye" in the variable $old. Only the first match of the pattern is replaced.
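
Here is a sketch you can run to see the substitution at work. It also shows the g (global) modifier, which is a standard part of Perl's s/// operator but is not otherwise covered in this handout: without it, only the first match is replaced. The strings and the extra variable names are made up for the illustration.

```perl
# The substitution from the example above.
$old = "Hello, World";
$new = "Goodbye";
$old =~ s/Hello/$new/;      # $old is now "Goodbye, World"
print ("$old\n");

# s/// changes only the first match unless you add g (global) at the end.
$once = "la la la";
$once =~ s/la/LA/;          # only the first "la" changes
$all = "la la la";
$all =~ s/la/LA/g;          # every "la" changes
print ("$once\n$all\n");
```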

3.6 Split

The split function takes as input a regular expression and a string. It looks for all occurrences of the regular expression within that string and returns the pieces of the string between those matches as a list. The matched text itself is thrown away.
# example 15
$alphabet = "abcdefghijklmnopqrstuvwxyz";
@alphalist = split (//, $alphabet);
The example splits on the empty pattern //, which matches between every pair of characters, so @alphalist gets the 26 letters of the alphabet as separate elements.
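
Because split takes a real regular expression, the separator can be a pattern rather than a fixed string. Here is a sketch that splits a made-up line on a comma with any amount of space around it, and then splits one of the Hangman words into letters. The variable names are made up for this illustration.

```perl
# Split on a comma, allowing any spaces before or after it.
$line = "ceiling, jealous,possibly ,  niece";
@words = split (/\s*,\s*/, $line);
$num_words = @words;              # a list assigned to a scalar gives its length
print ("$num_words words: @words\n");

# Split a word into its letters with the empty pattern.
@letters = split (//, "hangman");
$num_letters = @letters;
print ("$num_letters letters\n");
```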

3.7 Join

The join function takes a glue string and a list of values. It glues the values together as a string with the glue string between each value.
# example 16
$glue = ":";
$joined = join($glue, @alphalist);
The example joins the alphabet list from the previous example back together as a string, with a colon between each pair of letters ("a:b:c:" and so on), and stores the result in $joined.
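
Since split and join do roughly opposite jobs, you can use one to undo the other. Here is a sketch that splits the alphabet into letters, joins them with colons, and then splits on the colon and joins with an empty glue string to get the original back. The variable names are made up for this illustration.

```perl
# Split the alphabet into letters, glue them with colons, then undo it.
$alphabet = "abcdefghijklmnopqrstuvwxyz";
@alphalist = split (//, $alphabet);
$joined = join (":", @alphalist);     # "a:b:c:" and so on up to "z"
print ("$joined\n");

@back = split (/:/, $joined);         # splitting on the glue gives the letters back
$rebuilt = join ("", @back);          # an empty glue string puts them side by side
print ("$rebuilt\n");
```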

4.0 The Hangman Spelling Game

This third part asks you to add comments, written for a human reader, to this (partially written) game.
#
#+
# Our first hangman game...
#
# Date     Author      Modifications
# 4/20/98  D. Ritchie  Original
#-
#
# initialize our words
@Words = qw(
    ceiling
    jealous
    possibly
    occasion
    curious
    mischief
    opposite
    difficult
    niece
    tremendous
);
$NumWords = @Words;
#
# Ask if person wants to play
print ("How about a nice game of Hangman? \n");
print ("Please answer Y or N and press the Return key.\n");
#
# Get a line of response
$response = <STDIN>;
#
# Did they at least answer Y or N?
if ($response =~ /[nN]|[yY]/) {
    print ("Thank you for answering Y or N.\n");
} else {
    print ("You didn't answer Y or N! Goodbye!\n");
}
#
# Did they answer N?
if ($response =~ /[nN]/) {
    print ("I'm sorry you don't want to play. Goodbye!\n");
}
#
# Did they answer Y?
if ($response =~ /[yY]/) {
    print ("Great! Let's get started!\n");
    #
    # For each word in our word list...
    foreach $Word (@Words) {
        @letts = split //, $Word;
        $letts = @letts;
        print ("I'm thinking of a word. It has $letts letters.\n");
        $need = "Y";
        while ($need eq "Y") {
            print ("Please enter a single letter as your guess.\n");
            print ("Then, press the Return key\n");
            $response = <STDIN>;
            chop ($response);
            @respletts = split //, $response;
            $respletts = @respletts;
            if ($response =~ /[a-z]|[A-Z]/ && $respletts == 1) {
                $need = "N";
            } else {
                $need = "Y";
                print ("You must enter a single letter! Try again! \n");
            }
        }
        #
        # We got a letter...is it in the word?
        $Where = index ($Word, $response);
        if ($Where < 0) {
            print (" Your guess $response was not in the word!\n");
        } else {
            print (" Your guess $response WAS in the word!\n");
        }
    }
}