Python Programming/Files

File I/O

Read entire file:

inputFileText = open("testit.txt", "r").read()
print(inputFileText)

In this case the "r" parameter means the file will be opened in read-only mode.

Read a certain number of characters from a file (bytes, if the file is opened in binary mode):

inputFileText = open("testit.txt", "r").read(123)
print(inputFileText)

When a file is opened, reading starts at the beginning of the file. For random access, use seek() to change the current position in the file and tell() to report the current position. This is illustrated in the following example:

>>> f = open("/proc/cpuinfo", "r")
>>> f.tell()
0
>>> f.read(10)
'processor\t'
>>> f.read(10)
': 0\nvendor'
>>> f.tell()
20
>>> f.seek(10)
10
>>> f.tell()
10
>>> f.read(10)
': 0\nvendor'
>>> f.close()
>>> f.closed
True

Here a file is opened and ten characters are read twice. tell() then shows that the current offset is 20. Next, seek() moves back to position 10 (where the second read started), and the same ten characters are read and printed again. When no more operations on the file are needed, close() is called to close it.
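The same sequence can be reproduced on any system with a throwaway file. This is a minimal sketch: the temporary file and its contents are assumptions made for illustration, and binary mode is used so that offsets are plain byte counts.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on /proc/cpuinfo.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"0123456789abcdefghij")
    demo_path = tmp.name

# In binary mode ("rb") tell() and seek() work in bytes.
with open(demo_path, "rb") as f:
    first = f.read(10)              # bytes 0-9
    second = f.read(10)             # bytes 10-19
    offset_after_reads = f.tell()   # position after two reads
    f.seek(10)                      # jump back to where the second read started
    again = f.read(10)              # same data as the second read

os.remove(demo_path)
print(first, second, offset_after_reads, again)
```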

Read one line at a time:

for line in open("testit.txt", "r"):
    print(line)

In this case, iterating over the file object yields one line at a time. Alternatively, readlines() returns a list containing the individual lines of the file, and readline() returns the current line as a string. This example outputs an additional blank line between the individual lines of the file, because each line read from the file still ends with a newline and print adds another.
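As a rough sketch of the three ways of reading lines, the following writes a small sample file (a hypothetical lines.txt in a temporary directory) and reads it back. Stripping the trailing newline with rstrip() is one common way to avoid the doubled line breaks when printing.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "lines.txt")   # hypothetical sample file
    with open(demo_path, "w") as f:
        f.write("first\nsecond\nthird\n")

    with open(demo_path, "r") as f:
        first_line = f.readline()    # one line, trailing newline included
        remaining = f.readlines()    # list with the rest of the lines

    with open(demo_path, "r") as f:
        # Iterate line by line and drop the trailing newline from each line.
        stripped = [line.rstrip("\n") for line in f]

for line in stripped:
    print(line)
```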

Writing to a file requires the second parameter of open() to be "w". If the file already exists, opening it in this mode overwrites its existing contents:

outputFileText = "Here's some text to save in a file"
open("testit.txt", "w").write(outputFileText)

Appending to a file requires the second parameter of open() to be "a" (for append):

outputFileText = "Here's some text to add to the existing file."
open("testit.txt", "a").write(outputFileText)

Note that this does not add a line break between the existing file content and the string to be added.
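If the appended text should start on its own line, the line break has to be written explicitly. A minimal sketch, using a temporary directory so that no real testit.txt is touched (the location is an assumption for illustration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "testit.txt")

    # "w" creates the file (or truncates an existing one).
    with open(demo_path, "w") as f:
        f.write("Here's some text to save in a file")

    # "a" adds to the end; the newline is written explicitly.
    with open(demo_path, "a") as f:
        f.write("\n" + "Here's some text to add to the existing file.")

    with open(demo_path, "r") as f:
        lines = f.read().split("\n")

print(lines)
```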

Since Python 2.5, you can use the with keyword to ensure that the file handle is released as soon as possible, even if an exception is raised:

with open("input.txt") as file1:
  data = file1.read()
  # process the data

Or one line at a time:

with open("input.txt") as file1:
  for line in file1:
    print(line)

The with keyword is covered further in the Context Managers chapter.
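To illustrate the exception safety of with, the sketch below raises an error inside the block and then checks that the file was closed anyway. The file name and the simulated ValueError are assumptions made for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "input.txt")
    with open(demo_path, "w") as f:
        f.write("some data\n")

    try:
        with open(demo_path) as file1:
            data = file1.read()
            # Simulate a failure while processing the data.
            raise ValueError("simulated failure while processing")
    except ValueError:
        pass

    # The with block closed the file even though an exception was raised.
    closed_after_error = file1.closed

print(closed_after_error)
```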

Links:

7.5. The with statement, python.org
PEP 343 -- The "with" Statement, python.org

Testing Files

Determine whether path exists:

import os
os.path.exists('<path string>')

On systems such as Microsoft Windows™, the directory separator is the backslash, which is also the escape character in Python string literals. To get around this, double each backslash:

import os
os.path.exists('C:\\windows\\example\\path')

A better way, however, is to use a raw string literal, marked with the prefix r:

import os
os.path.exists(r'C:\windows\example\path')
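The following sketch shows that the two spellings produce the same string, and what goes wrong when the backslashes are left unescaped. The path C:\temp\new is a made-up example chosen because \t and \n are valid escape sequences.

```python
# Doubled backslashes and a raw string literal give the same text.
escaped = 'C:\\windows\\example\\path'
raw = r'C:\windows\example\path'
same = escaped == raw

# Without escaping, "\t" becomes a tab and "\n" becomes a newline.
broken = 'C:\temp\new'
has_tab = '\t' in broken
has_newline = '\n' in broken

print(same, has_tab, has_newline, len(broken))
```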

os.path.exists() only confirms whether or not a path exists. Other convenient functions in os.path report whether the path is a file, a directory, a mount point or a symlink, and os.path.realpath() reveals the true destination of a symlink:

>>> import os
>>> os.path.isfile("/")
False
>>> os.path.isfile("/proc/cpuinfo")
True
>>> os.path.isdir("/")
True
>>> os.path.isdir("/proc/cpuinfo")
False
>>> os.path.ismount("/")
True
>>> os.path.islink("/")
False
>>> os.path.islink("/vmlinuz")
True
>>> os.path.realpath("/vmlinuz")
'/boot/vmlinuz-2.6.24-21-generic'
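The transcript above depends on a Linux file system layout. A portable sketch of the same checks, run against a temporary directory and a hypothetical example.txt created for the purpose, might look like this:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "example.txt")   # hypothetical file
    with open(file_path, "w") as f:
        f.write("x")

    missing_path = os.path.join(tmp_dir, "missing")    # never created

    results = {
        "file is file": os.path.isfile(file_path),
        "file is dir": os.path.isdir(file_path),
        "dir is dir": os.path.isdir(tmp_dir),
        "file is link": os.path.islink(file_path),
        "missing exists": os.path.exists(missing_path),
    }

for name, value in results.items():
    print(name, value)
```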

Common File Operations

To copy or move a file, use the shutil library.

import shutil
shutil.move("originallocation.txt","newlocation.txt")
shutil.copy("original.txt","copy.txt")

To perform a recursive copy, use copytree(); to perform a recursive remove, use rmtree():

import shutil
shutil.copytree("dir1","dir2")
shutil.rmtree("dir1")

To remove an individual file there exists the remove() function in the os module:

import os
os.remove("file.txt")
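The operations above can be tried safely inside a temporary directory. This is a sketch under that assumption; the file and directory names mirror the snippets above but are otherwise arbitrary. Note that copytree() expects the destination directory not to exist yet.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    original = os.path.join(tmp_dir, "original.txt")
    with open(original, "w") as f:
        f.write("payload")

    # Copy, then move (rename) the original file.
    copy_path = shutil.copy(original, os.path.join(tmp_dir, "copy.txt"))
    shutil.move(original, os.path.join(tmp_dir, "moved.txt"))
    after_move = sorted(os.listdir(tmp_dir))

    # Build dir1 with one file, copy it recursively, then remove dir1.
    dir1 = os.path.join(tmp_dir, "dir1")
    dir2 = os.path.join(tmp_dir, "dir2")
    os.mkdir(dir1)
    shutil.copy(copy_path, dir1)
    shutil.copytree(dir1, dir2)   # dir2 must not exist beforehand
    shutil.rmtree(dir1)

    # Remove a single file.
    os.remove(copy_path)
    final = sorted(os.listdir(tmp_dir))
    copied_inside_dir2 = os.listdir(dir2)

print(after_move, final, copied_inside_dir2)
```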

Finding Files

Files can be found using glob:

import glob
glob.glob('*.txt') # Finds files in the current directory ending in dot txt
glob.glob('*\\*.txt') # Finds files in any of the direct subdirectories
                      # of the current directory ending in dot txt
glob.glob('C:\\Windows\\*.exe')
for fileName in glob.glob('C:\\Windows\\*.exe'):
  print(fileName)
# Py 3.5: ** matches any number of nested directories when recursive=True
glob.glob('C:\\Windows\\**\\*.exe', recursive=True)
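The difference between the three kinds of pattern can be seen on a small directory tree. The sketch below builds one in a temporary directory (the file names are invented for the example) and uses os.path.join so that the patterns work with either path separator. For recursion, ** has to be a path component of its own.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical layout: a.txt, d.log, sub/b.txt, sub/deeper/c.txt
    os.makedirs(os.path.join(tmp_dir, "sub", "deeper"))
    for rel in ("a.txt", "d.log",
                os.path.join("sub", "b.txt"),
                os.path.join("sub", "deeper", "c.txt")):
        open(os.path.join(tmp_dir, rel), "w").close()

    base = glob.escape(tmp_dir)   # protect any special characters in the path

    top_level = glob.glob(os.path.join(base, "*.txt"))
    one_down = glob.glob(os.path.join(base, "*", "*.txt"))
    everything = glob.glob(os.path.join(base, "**", "*.txt"), recursive=True)

    top_names = sorted(os.path.basename(p) for p in top_level)
    one_down_names = sorted(os.path.basename(p) for p in one_down)
    all_names = sorted(os.path.basename(p) for p in everything)

print(top_names, one_down_names, all_names)
```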

The content of a directory can be listed using listdir:

import os
filesAndDirectories = os.listdir('.')
for item in filesAndDirectories:
  if os.path.isfile(item) and item.endswith('.txt'):
    print("Text file: " + item)
  if os.path.isdir(item):
    print("Directory: " + item)

Getting a list of all items in a directory, including the nested ones:

import os
for root, directories, files in os.walk('/user/Joe Hoe'):
  print("Root: " + root)                         # e.g. /user/Joe Hoe/Docs
  for dir1 in directories:
    print("Dir.: " + dir1)                       # e.g. Fin
    print("Dir. 2: " + os.path.join(root, dir1)) # e.g. /user/Joe Hoe/Docs/Fin
  for file1 in files:
    print("File: " + file1)                      # e.g. MyFile.txt
    print("File 2: " + os.path.join(root, file1))# e.g. /user/Joe Hoe/Docs/MyFile.txt

Above, root takes the value of each directory under /user/Joe Hoe, including /user/Joe Hoe itself, while directories and files list only the entries directly present in each root.
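To make the behavior of os.walk() concrete, the sketch below builds a tiny tree in a temporary directory (the names Docs, Fin and MyFile.txt are borrowed from the comments above; the location is an assumption) and records what each iteration yields, with root shown relative to the starting directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Layout: Docs/MyFile.txt and an empty Docs/Fin directory
    os.makedirs(os.path.join(tmp_dir, "Docs", "Fin"))
    open(os.path.join(tmp_dir, "Docs", "MyFile.txt"), "w").close()

    visited = []
    for root, directories, files in os.walk(tmp_dir):
        rel_root = os.path.relpath(root, tmp_dir)   # "." for the start directory
        visited.append((rel_root, sorted(directories), sorted(files)))
    visited.sort()

for entry in visited:
    print(entry)
```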

Getting a list of all files in a directory, including the nested ones, ending in .txt, using a list comprehension:

import os
files = [os.path.join(r, f) for r, d, fs in os.walk(".") for f in fs
         if f.endswith(".txt")]
# As a lazy iterator (generator expression)
files = (os.path.join(r, f) for r, d, fs in os.walk(".") for f in fs
         if f.endswith(".txt"))

Links:

glob, python.org
glob, Py 3, python.org
os.listdir, python.org
os.walk, python.org
os.path.join, python.org

Current Directory

Getting current working directory:

import os
os.getcwd()

Changing current working directory:

os.chdir('C:\\')
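Because the working directory is global to the process, it is often worth restoring it after a temporary change. A minimal sketch, assuming a temporary directory as the target; realpath() is used only so that the comparison also holds where the temporary directory is reached through a symlink.

```python
import os
import tempfile

original_dir = os.getcwd()

with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)
    try:
        inside = os.path.realpath(os.getcwd())
        expected = os.path.realpath(tmp_dir)
    finally:
        # Always go back, and do so before the temporary directory is deleted.
        os.chdir(original_dir)

restored = os.getcwd() == original_dir
print(inside == expected, restored)
```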

External Links

os — Miscellaneous operating system interfaces in Python documentation
glob — Unix style pathname pattern expansion in Python documentation
shutil — High-level file operations in Python documentation
Brief Tour of the Standard Library in The Python Tutorial
