Python for Everybody
Chapter 10
Exercise 10.1
"""
Exercise 10.1: Revise a previous program as follows: Read and parse the "From"
lines and pull out the addresses from the line. Count the number of messages
from each person using a dictionary.
After all the data has been read, print the person with the most commits by
creating a list of (count, email) tuples from the dictionary. Then sort the
list in the reverse order and print out the person who has the most commits.
Sample line:
From stephen.marquard@uct.ac.az Sat Jan 05 09:14:16 2008
Sample Execution:
Enter a file name: mbox-short.txt
cwen@iupui.edu 5
Enter a file name: mbox.txt
zqian@umich.edu 195
Python for Everybody: Exploring Data Using Python 3
by Charles R. Severance
Solution by Jamison Lahman, June 1, 2017
"""
import sys

dictionary_addresses = dict()  # Maps each sender address to its message count
lst = list()
fname = input('Enter a file name: ')
try:
    fhand = open(fname)
except FileNotFoundError:
    print('File cannot be opened:', fname)
    sys.exit()
for line in fhand:
    words = line.split()
    # Keep only the envelope lines that start with the bare word 'From'
    if len(words) < 2 or words[0] != 'From':
        continue
    if words[1] not in dictionary_addresses:
        dictionary_addresses[words[1]] = 1  # First entry
    else:
        dictionary_addresses[words[1]] += 1  # Additional counts
fhand.close()
for key, val in dictionary_addresses.items():
    lst.append((val, key))  # (count, email) so tuples sort by count first
lst.sort(reverse=True)  # Highest count first
for count, email in lst[:1]:  # Only the top sender; prints nothing if no matches
    print(email, count)
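The decorate-sort step works because Python compares tuples element by element, so sorting `(count, email)` pairs in reverse puts the largest count first. The sketch below repackages the same logic as a function so it can be tried without an mbox file; the name `most_frequent_sender` is hypothetical, `dict.get` replaces the explicit if/else, and the sample lines use made-up addresses.

```python
def most_frequent_sender(lines):
    """Return (email, count) for the sender with the most 'From ' lines.

    Returns None when no envelope 'From ' lines are present.
    """
    counts = {}
    for line in lines:
        words = line.split()
        # 'From:' header lines split to 'From:' and are skipped here
        if len(words) < 2 or words[0] != 'From':
            continue
        counts[words[1]] = counts.get(words[1], 0) + 1

    # Decorate each entry as (count, email) so tuples sort by count first
    pairs = [(count, email) for email, count in counts.items()]
    if not pairs:
        return None
    pairs.sort(reverse=True)
    count, email = pairs[0]
    return email, count


# Hypothetical sample input; addresses are invented for illustration
sample = [
    'From alice@example.com Sat Jan  5 09:14:16 2008\n',
    'From: alice@example.com\n',
    'From bob@example.com Fri Jan  4 18:10:48 2008\n',
    'From alice@example.com Fri Jan  4 16:10:39 2008\n',
]
print(most_frequent_sender(sample))  # ('alice@example.com', 2)
```

Because a reverse tuple sort breaks ties on the second element, two senders with equal counts are ordered by address in descending order.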
Exercise 10.2
"""
Exercise 10.2: This program counts the distribution of the hour of the day
for each of the messages. You can pull the hour from the "From" line by finding
the time string and then splitting that string into parts using the colon
character. Once you have accumulated the counts for each hour, print out the
counts, one per line, sorted by hour as shown below.
Sample line: From stephen.marquard@uct.ac.az Sat Jan 05 09:14:16 2008
Sample Execution:
python timeofday.py
Enter a file name: mbox-short.txt
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
Python for Everybody: Exploring Data Using Python 3
by Charles R. Severance
Solution by Jamison Lahman, June 1, 2017
"""
import sys

dictionary_hours = dict()  # Maps each hour ('09', '10', ...) to its count
lst = list()
fname = input('Enter a file name: ')
try:
    fhand = open(fname)
except FileNotFoundError:
    print('File cannot be opened:', fname)
    sys.exit()
for line in fhand:
    words = line.split()
    # The time string is the sixth field, so skip lines too short to have one
    if len(words) < 6 or words[0] != 'From':
        continue
    col_pos = words[5].find(':')
    hour = words[5][:col_pos]  # '09:14:16' -> '09'
    if hour not in dictionary_hours:
        dictionary_hours[hour] = 1  # First entry
    else:
        dictionary_hours[hour] += 1  # Additional counts
fhand.close()
for key, val in dictionary_hours.items():
    lst.append((key, val))  # (hour, count) pairs
lst.sort()  # Zero-padded hour strings sort in numeric order
for key, val in lst:
    print(key, val)
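The exercise text suggests splitting the time string on the colon, while the solution above slices up to the first colon found with `find`. A minimal sketch of the split-based version as a function, assuming the standard envelope layout `From <addr> <day> <mon> <date> <hh:mm:ss> <year>`; `hour_distribution` is a hypothetical name and the sample lines are invented.

```python
def hour_distribution(lines):
    """Return a list of (hour, count) pairs sorted by hour."""
    counts = {}
    for line in lines:
        words = line.split()
        # Need at least six fields for words[5] (the time string) to exist
        if len(words) < 6 or words[0] != 'From':
            continue
        hour = words[5].split(':')[0]  # '09:14:16' -> '09'
        counts[hour] = counts.get(hour, 0) + 1
    # Hours are zero-padded strings, so string order matches numeric order
    return sorted(counts.items())


# Hypothetical sample input
sample = [
    'From alice@example.com Sat Jan  5 09:14:16 2008\n',
    'From bob@example.com Fri Jan  4 18:10:48 2008\n',
    'From alice@example.com Fri Jan  4 09:05:01 2008\n',
    'Subject: not an envelope line\n',
]
for hour, count in hour_distribution(sample):
    print(hour, count)
```

Sorting `counts.items()` directly works because each item is already an `(hour, count)` tuple, so no separate list-building loop is needed.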
Exercise 10.3
"""
Exercise 10.3: Write a program that reads a file and prints the letters in
decreasing order of frequency. Your program should convert all the input to
lower case and only count the letters a-z. Your program should not count
spaces, digits, punctuation, or anything other than letters a-z. Find text
samples from several different languages and see how letter frequency varies
between languages. Compare your results with the tables at
wikipedia.org/wiki/Letter_frequencies
Python for Everybody: Exploring Data Using Python 3
by Charles R. Severance
Solution by Jamison Lahman, June 1, 2017
"""
import string
import sys

counts = 0  # Total number of letters a-z seen
dictionary_counts = dict()  # Maps each letter to its count
relative_lst = list()
fname = input('Enter a file name: ')
try:
    # Assumes the sample texts are UTF-8 encoded
    fhand = open(fname, encoding='utf-8')
except FileNotFoundError:
    print('File cannot be opened:', fname)
    sys.exit()
for line in fhand:
    # Lower-case the line, then count only the letters a-z; digits,
    # punctuation, whitespace and non-ASCII letters are all skipped
    for letter in line.lower():
        if letter not in string.ascii_lowercase:
            continue
        counts += 1
        if letter not in dictionary_counts:
            dictionary_counts[letter] = 1
        else:
            dictionary_counts[letter] += 1
fhand.close()
for key, val in dictionary_counts.items():
    relative_lst.append((val / counts, key))  # (relative frequency, letter)
relative_lst.sort(reverse=True)  # Most frequent letter first
for freq, letter in relative_lst:
    print(letter, freq)
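A more compact sketch of the same relative-frequency computation, assuming `collections.Counter` is acceptable in place of the hand-built dictionary; `letter_frequencies` is a hypothetical helper name. Filtering each character against `string.ascii_lowercase` after lower-casing keeps exactly the letters a-z, which also drops accented letters such as `é` that stripping digits and punctuation alone would let through.

```python
import string
from collections import Counter


def letter_frequencies(text):
    """Return (letter, relative_frequency) pairs, most frequent first.

    Only the ASCII letters a-z are counted, after lower-casing.
    """
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    if total == 0:
        return []
    counts = Counter(letters)
    # Sort by descending count, breaking ties alphabetically
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(letter, count / total) for letter, count in ranked]


for letter, freq in letter_frequencies('Hello, World! 123'):
    print(f'{letter} {freq:.3f}')
```

The sort key here orders ties alphabetically; a reverse sort on `(frequency, letter)` tuples would order tied letters from z to a instead.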