Python Programming/Internet


The urllib package, which is bundled with Python, can be used for web interaction. Its urllib.request module provides a file-like interface for web URLs.

Getting page text as a string

An example of reading the contents of a webpage:

import urllib.request

# read() returns bytes; decode them to get a str (UTF-8 is assumed here)
pageText = urllib.request.urlopen("http://www.spam.org/eggs.html").read().decode("utf-8")
print(pageText)

Processing page text line by line:

import urllib.request

# Iterating over the response yields one line at a time, each as a bytes object
for line in urllib.request.urlopen("https://en.wikibooks.org/wiki/Python_Programming/Internet"):
    print(line)
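
Because urlopen returns bytes, the text usually has to be decoded before it can be handled as a string. The following is a minimal sketch, assuming the server declares its charset in the Content-Type header and falling back to UTF-8 otherwise. The helper name fetch_text is hypothetical, and the data: URL is used only so that the example runs without a network connection.

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Return the body of `url` as a str (hypothetical helper, not part of urllib)."""
    # The response is a context manager, so it is closed when the block exits.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes
        # Charset named in the Content-Type header, or None if none is declared.
        charset = response.headers.get_content_charset()
    return raw.decode(charset or fallback_encoding)


if __name__ == "__main__":
    # A data: URL keeps the demonstration offline; an http(s) URL is fetched the same way.
    print(fetch_text("data:text/plain;charset=utf-8,Hello%20world"))
```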

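GET and POST requests can be made, too.

import urllib.parse
import urllib.request

# urlencode lives in urllib.parse and returns a str such as "plato=1&socrates=10&..."
params = urllib.parse.urlencode({"plato": 1, "socrates": 10, "sophokles": 4, "arkhimedes": 11})

# Using GET method: the parameters go into the query string
pageText = urllib.request.urlopen("http://international-philosophy.com/greece?%s" % params).read()
print(pageText)

# Using POST method: the parameters are sent as the request body, which must be bytes
pageText = urllib.request.urlopen("http://international-philosophy.com/greece", params.encode("ascii")).read()
print(pageText)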
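
The difference between the two methods can be inspected without contacting a server by building Request objects first. This is only a sketch: the endpoint and the form fields below are made up for illustration.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, used only for illustration.
BASE_URL = "http://www.example.com/search"
fields = {"q": "spam and eggs", "page": 2}

query = urllib.parse.urlencode(fields)

# GET: the encoded fields are appended to the URL as a query string.
get_request = urllib.request.Request(BASE_URL + "?" + query)

# POST: the encoded fields become the request body, which must be bytes.
post_request = urllib.request.Request(BASE_URL, data=query.encode("ascii"))

print(query)
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

Passing either object to urllib.request.urlopen sends the request. A Request that carries data is sent as POST, one without data as GET.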

Downloading files

To save the content of a page on the internet directly to a file, you can read() it and write the resulting bytes to a file object:

import urllib.request

url = "http://upload.wikimedia.org/wikibooks/en/9/91/Python_Programming.pdf"
# Not recommended for very large files (1 GB or more): read() holds the whole download in RAM.
data = urllib.request.urlopen(url).read()
with open("pythonbook.pdf", "wb") as file:
    file.write(data)

This downloads the file from the URL given in the code and saves it as "pythonbook.pdf" on your hard drive.
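
For large files, the response can be copied to disk in chunks so that the whole download never sits in memory at once. This is a minimal sketch using shutil.copyfileobj from the standard library. The helper name download and the 64 KiB chunk size are arbitrary choices made for this example.

```python
import shutil
import urllib.request


def download(url, filename, chunk_size=64 * 1024):
    """Stream `url` to `filename` in chunks (hypothetical helper)."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        # copyfileobj reads and writes at most chunk_size bytes at a time.
        shutil.copyfileobj(response, out, chunk_size)
```

It could be called as download("http://upload.wikimedia.org/wikibooks/en/9/91/Python_Programming.pdf", "pythonbook.pdf").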

Other functions

The urllib.parse module includes other functions that may be helpful when writing programs that use the internet:

>>> import urllib.parse
>>> plain_text = "This isn't suitable for putting in a URL"
>>> print(urllib.parse.quote(plain_text))
This%20isn%27t%20suitable%20for%20putting%20in%20a%20URL
>>> print(urllib.parse.quote_plus(plain_text))
This+isn%27t+suitable+for+putting+in+a+URL

The urlencode function, described above, converts a dictionary of key-value pairs into a query string to pass to a URL. The quote and quote_plus functions encode ordinary strings. The quote_plus function uses plus signs for spaces, which is the form used when submitting data for form fields. The unquote and unquote_plus functions do the reverse, converting URL-encoded text back to plain text.
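
The decoding direction can be checked with a short round trip. This is only an illustrative sketch that reuses the sample string from the session above.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

plain_text = "This isn't suitable for putting in a URL"

encoded = quote(plain_text)            # spaces become %20
encoded_plus = quote_plus(plain_text)  # spaces become +

# Each decoder reverses its matching encoder.
assert unquote(encoded) == plain_text
assert unquote_plus(encoded_plus) == plain_text

# unquote leaves plus signs alone, so a mismatched pair does not round-trip.
print(unquote(encoded_plus))
```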

Email

Python can send MIME-compatible emails. The example below requires an SMTP server running on the local machine.

import smtplib
from email.mime.text import MIMEText

msg = MIMEText( 
"""Hi there,

This is a test email message.

Greetings""")

me  = 'sender@example.com'
you = 'receiver@example.com'
msg['Subject'] = 'Hello!'
msg['From'] =  me
msg['To'] =  you
s = smtplib.SMTP()
s.connect()  # with no arguments, connects to localhost on the standard SMTP port (25)
s.sendmail(me, [you], msg.as_string())
s.quit()

This sends the sample message from 'sender@example.com' to 'receiver@example.com'.
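
The same message can also be built with the newer email.message.EmailMessage class and sent inside a with block, which closes the connection automatically. This is a sketch under the assumption that an SMTP server is listening on localhost port 25. The helper names build_message and send are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text message (hypothetical helper)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def send(msg, host="localhost", port=25):
    """Send via an SMTP server assumed to be listening on host:port."""
    # The with block issues QUIT and closes the connection on exit.
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


msg = build_message("sender@example.com", "receiver@example.com",
                    "Hello!", "Hi there,\n\nThis is a test email message.\n\nGreetings")
print(msg["Subject"], msg["To"])
# send(msg)  # uncomment when an SMTP server is available
```

Keeping message construction separate from sending makes it possible to inspect or test the message without a mail server.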
