Perl Programming/Exercise 3

Reading and writing files

Write a program that reads a file from disk and saves it as a new text file. (Make sure you're not working with an important document!) Try downloading a public-domain book [1] to test how your program works with large text files.
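One possible starting point for the copy exercise is sketched below. It is not the only way to do it; the sub name copy_file and the file names in the usage comment are made up for illustration.

```perl
use strict;
use warnings;

# Copy a text file line by line and return the number of lines copied.
# copy_file is a hypothetical name chosen for this sketch.
sub copy_file {
    my ($source, $target) = @_;

    open(my $in,  '<', $source) or die "Cannot read '$source': $!";
    open(my $out, '>', $target) or die "Cannot write '$target': $!";

    my $lines = 0;
    while (my $line = <$in>) {
        print {$out} $line;
        $lines++;
    }

    close($in);
    close($out) or die "Cannot finish writing '$target': $!";
    return $lines;
}

# Hypothetical usage: perl copy.pl book.txt book-copy.txt
if (@ARGV == 2) {
    my $count = copy_file(@ARGV);
    print "Copied $count lines\n";
}
```

Reading one line at a time keeps memory use low, which matters when you try it on a whole book.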

Using regular expressions to search text

Modify your program to copy only lines of text which start with "Chapter 1" or another such identifier. Try identifying each chapter number as it's found and doing some simple maths with it. Display the number of chapters in the book, along with the number of lines each chapter contains.
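As a rough sketch of the chapter-counting part, the code below assumes that headings look like "Chapter 12" at the start of a line; your book may use Roman numerals or a different word, so treat the pattern as an assumption. The sub name count_chapter_lines and the sample lines are invented for the example.

```perl
use strict;
use warnings;

# Count the lines that follow each chapter heading.
# Assumption: a heading is a line starting with "Chapter" and a number.
sub count_chapter_lines {
    my (@lines) = @_;
    my %lines_in;    # chapter number => lines seen after its heading
    my $current;     # chapter we are inside; undef before the first heading

    for my $line (@lines) {
        if ($line =~ /^Chapter\s+(\d+)/) {
            $current = $1;              # the captured chapter number
            $lines_in{$current} = 0;
        }
        elsif (defined $current) {
            $lines_in{$current}++;
        }
    }
    return %lines_in;
}

# Made-up sample text standing in for lines read from a file.
my %count = count_chapter_lines(
    "Preface\n", "Chapter 1\n", "It was a dark night.\n", "Rain fell.\n",
    "Chapter 2\n", "Morning came.\n",
);
printf "%d chapters found\n", scalar keys %count;
for my $n (sort { $a <=> $b } keys %count) {
    print "Chapter $n: $count{$n} lines\n";
}
```

Because the chapter number is captured in $1, it can be used in arithmetic straight away, for example to check that each chapter number is one more than the last.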

Add a regular expression that searches for repeated repeated words in the text. Make it find repetitions even when the capitalisation of the words is different.
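One way to approach the repeated-word search is with a backreference, as in this minimal sketch. The sub name find_repeats is hypothetical, and the pattern only catches a word that is immediately followed by itself.

```perl
use strict;
use warnings;

# Return each word that is immediately followed by itself, ignoring case.
sub find_repeats {
    my ($text) = @_;
    my @found;
    # (\w+) captures a word, \s+ allows spaces or line breaks between the
    # pair, \1 must match the captured word again, and /i ignores case.
    # The \b anchors stop "that thatch" from counting as a repeat.
    while ($text =~ /\b(\w+)\s+\1\b/gi) {
        push @found, $1;
    }
    return @found;
}

print join(", ", find_repeats("The the cat sat on on the mat")), "\n";
# prints: The, on
```

If you apply this one line at a time, a repeat that is split across a line break will be missed; matching against the whole text in one string avoids that.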

Try to search for words beginning with "de" and ending in a vowel. Write a search to identify numbers in the text, whether or not they have decimal places, minus signs, and so on. If you want, you can also try to find numbers written out as words.
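The two searches above might be sketched as follows. The sub names de_words and numbers are invented for this example, and the number pattern is deliberately simple: it will, for instance, read the range "1990-1995" as 1990 and -1995.

```perl
use strict;
use warnings;

# Words that begin with "de" and end in a vowel (a, e, i, o or u).
# Call in list context to get the matching words back.
sub de_words {
    my ($text) = @_;
    return $text =~ /\b(de\w*[aeiou])\b/gi;
}

# Numbers written in digits, with an optional leading minus sign and an
# optional decimal part. Thousands separators and exponents are ignored.
sub numbers {
    my ($text) = @_;
    return $text =~ /(-?\d+(?:\.\d+)?)/g;
}

print join(", ", de_words("They decide to delete the demo, despite the delay.")), "\n";
# prints: decide, delete, demo, despite
print join(", ", numbers("It was -3.5 degrees on 12 May, about 0.25 lower.")), "\n";
# prints: -3.5, 12, 0.25
```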

How many distinct proper nouns are there in the book you've selected? How many questions?

Doing search-and-replace

Modify the file-copy program so that it searches for each instance of one word, and replaces it with another word in the output file. Check that the program has operated correctly. Would you trust it with an important document yet?
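The replacement step on its own could look something like this sketch, which you would call on each line before writing it to the output file. The sub name replace_word is hypothetical, and the match is case-sensitive, so "Cat" is left alone when you ask for "cat".

```perl
use strict;
use warnings;

# Replace every whole-word occurrence of $old with $new in one line.
sub replace_word {
    my ($line, $old, $new) = @_;
    # \Q...\E stops characters in $old being treated as regex syntax;
    # \b keeps "cat" from matching inside "category".
    $line =~ s/\b\Q$old\E\b/$new/g;
    return $line;
}

print replace_word("The cat sat in the category aisle.\n", "cat", "dog");
# prints: The dog sat in the category aisle.
```

Try leaving out the \b anchors and compare the output; it is a quick way to see why you might not trust a naive version with an important document.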

Write a program to capitalise the first letter of each sentence, and test it. Make sure it's not being misled by full stops in numbers, abbreviations, and titles.
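A deliberately naive first attempt is sketched below so that you have something to test against. It assumes a sentence starts at the beginning of the text or after ".", "!" or "?" followed by whitespace. The sub name capitalise_sentences is made up for this example.

```perl
use strict;
use warnings;

# Capitalise a lower-case letter at the start of the text, or one that
# follows ., ! or ? and some whitespace.
sub capitalise_sentences {
    my ($text) = @_;
    # \u upper-cases the next character of the replacement, here $2.
    $text =~ s/(^|[.!?]\s+)([a-z])/$1\u$2/g;
    return $text;
}

print capitalise_sentences("it rained. we stayed in. pi is 3.14 today."), "\n";
# prints: It rained. We stayed in. Pi is 3.14 today.
```

The full stop in 3.14 is safe because no whitespace follows it, but an abbreviation such as "e.g." followed by a space still fools this version, which is the part the exercise asks you to fix.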

Write a program to search for numbers in a book, and add 10 to each number. Write a program to search for expressions in brackets (10+3/2) and replace them with the result (11.5). Check that it doesn't do anything dangerous when the brackets contain code, such as (print "hello";), instead of arithmetic.
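Both tasks can be approached with the /e flag, which runs the replacement as Perl code. The sketch below is one cautious way to do it, under the assumption that only plain numbers joined by +, -, * and / should ever reach eval; anything else in brackets is left untouched. The sub names add_ten and evaluate_brackets are hypothetical.

```perl
use strict;
use warnings;

# Add 10 to every number in the text. With /e the replacement "$1 + 10"
# is calculated rather than inserted literally.
sub add_ten {
    my ($text) = @_;
    $text =~ s/(-?\d+(?:\.\d+)?)/$1 + 10/ge;
    return $text;
}

# Replace bracketed arithmetic such as (10+3/2) with its result.
# Only numbers separated by + - * / can match, so letters, quotes and
# semicolons never reach eval.
sub evaluate_brackets {
    my ($text) = @_;
    my $num = qr/\d+(?:\.\d+)?/;
    $text =~ s{(\(\s*($num(?:\s*[-+*/]\s*$num)+)\s*\))}{
        my ($whole, $expr) = ($1, $2);
        my $result = eval $expr;             # undef if, say, dividing by zero
        defined $result ? $result : $whole;  # on failure keep the original
    }ge;
    return $text;
}

print add_ten("3 apples and 2.5 pears"), "\n";
# prints: 13 apples and 12.5 pears
print evaluate_brackets('Total (10+3/2) units, not (print "hello";)'), "\n";
# prints: Total 11.5 units, not (print "hello";)
```

Checking the text against a strict pattern before calling eval is the important design choice here: passing everything between the brackets to eval would run whatever code happened to be in the book.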

Benchmarking your programs

Using either the Time::HiRes module, or a benchmarking module, modify one of your programs to display the time taken to run.
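A minimal timing sketch using Time::HiRes, which ships with Perl, might look like this. The helper name time_it and the summing loop are placeholders for whichever of your programs you want to measure.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # replaces time() with a fractional-second version

# Run a piece of code and return how many seconds it took.
sub time_it {
    my ($code) = @_;
    my $start = time();
    $code->();
    return time() - $start;
}

# Placeholder workload: replace this with a call into your own program.
my $elapsed = time_it(sub {
    my $total = 0;
    $total += $_ for 1 .. 1_000_000;
});
printf "Took %.4f seconds\n", $elapsed;
```

Run it several times; the figure varies from run to run, so a single measurement says little.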

Try using a benchmarking system to identify which line of code takes the longest to run in one of your programs. Which line of code gets run the most often?

Putting it all together

Write a program which searches a directory for files whose names match a supplied regular expression. Display how long it took to run the program on a particular set of directories.
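For the single-directory case, one possible sketch uses opendir and readdir. It assumes the regular expression is matched against the file name rather than the file contents. The sub name matching_files and the usage line are invented for the example.

```perl
use strict;
use warnings;

# Return the names of plain files in one directory (not its
# subdirectories) whose names match $pattern.
sub matching_files {
    my ($dir, $pattern) = @_;
    my $regex = qr/$pattern/;    # compiled once; dies here if invalid

    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @matches = sort grep { -f "$dir/$_" && $_ =~ $regex } readdir($dh);
    closedir($dh);
    return @matches;
}

# Hypothetical usage: perl findfiles.pl /some/dir '\.txt$'
if (@ARGV == 2) {
    print "$_\n" for matching_files(@ARGV);
}
```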

Modify the program to search a directory-tree, rather than just one directory.
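To walk a whole tree, the core File::Find module does the recursion for you. The following is a sketch under the same assumption as before, that the pattern is tested against file names. The sub name matching_files_in_tree is hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Return the paths of all plain files under $top whose names match $pattern.
sub matching_files_in_tree {
    my ($top, $pattern) = @_;
    my $regex = qr/$pattern/;
    my @matches;

    find(sub {
        # Inside the callback, $_ is the bare file name and
        # $File::Find::name is the path including its directories.
        push @matches, $File::Find::name if -f $_ && $_ =~ $regex;
    }, $top);

    @matches = sort @matches;
    return @matches;
}

# Hypothetical usage: perl findtree.pl /some/dir '\.txt$'
if (@ARGV == 2) {
    print "$_\n" for matching_files_in_tree(@ARGV);
}
```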

Test how different search requests affect the time taken to do the search. See if you can find a regular expression which never finishes (press Ctrl-C to stop a busy Perl program). Then try to rewrite that regular expression to do the same job but faster.

Answers
