Files and Directories

This section discusses the operation of the Python standard library on files and directories. Python does not need to import any modules for operating files and directories.

File basic operation

open() function is used to open a file and return a file object. w represents writing mode, r indicates reading mode, a indicates the append mode, r+ represents the read and write mode. For example, write mode:

f = open('test.txt', 'w')
f.write('Hello, world!')
f.close()

In this way, we successfully created a file called test.txt, and wrote a row Hello, world! at the beginning of the file.

Using the r mode, you will read the contents of the file:

f = open('test.txt', 'r')
print(f.read())
f.close()

You can also use the with statement, it can automatically close the file:

with open('test.txt', 'r') as f:
    print(f.read())

In a mode, if the file does not exist, it will create a new file, if the file exists, add content at the end of the file.

with open('test.txt', 'a') as f:
    f.write('add something')

At this time, the content in test.txt file is Hello, world!add something.

Use the seek() function to move the file pointer to the specified location, for example:

with open('test.txt', 'r') as f:
    f.seek(5)
    print(f.read())

This is to move the file pointer to the 5th characters (the first character is the 0th character) and then read the contents thereof. The output is , world!add something.

Create a temporary file and directory

This part needs to import os and tempfile module.

import os
import tempfile

Get information about the temporary data environment:

print(tempfile.gettempdir())
print(tempfile.gettempprefix())

The path to the temporary data environment will be output, such as /tmp, and prefix of the temporary file, such as tmp.

Create a temporary file using the mkstemp() function:

(tempfh, tempfp) = tempfile.mkstemp(".tmp", "testTemp", None, True)
print(tempfp)

mkstemp() function returns a tuple, where the first element is a file descriptor, the second element is the file path. Its parameters are: file suffix, file name prefix, specified directory, whether the file is a text file.

Use the os's fdopen() function to convert the file descriptor into a file object. Let's write the contents in the temporary file, then read, and print it out, and finally close the temporary file and delete the temporary file.

f = os.fdopen(tempfh, "w+t")
f.write("Hello, world!")
f.seek(0)
print(f.read())
f.close()
os.remove(tempfp)

You can also create temporary files using the TemporaryFile() function:

with tempfile.TemporaryFile() as f:
    f.write("Hello, world!")
    f.seek(0)
    print(f.read())

If there are a lot of temporary files that need to be created, you can use the TemporaryDirectory() function, which creates a temporary directory and deletes the directory when exiting.

with tempfile.TemporaryDirectory() as tmpdirname:
    print("created temporary directory", tmpdirname)
    path = os.path.join(tmpdirname, "test.txt")
    with open(path, "w") as f:
        f.write("Hello, world!")
    print("File contents:")
    with open(path, "r") as f:
        print(f.read())

Read the CSV file

The CSV file is a text file, which is generally separated by a comma. We can read the CSV file using the CSV module. After importing the module, you can use the reader() function to read the CSV file, which returns an iterator, every iteration returns a list, each element in the list is a string, representing a line of data. Here is a simple example:

CSV文件的内容为：

"Month", "Average", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015"
"May",  0.1,  0,  0, 1, 1, 0, 0, 0, 2, 0,  0,  0  
"Jun",  0.5,  2,  1, 1, 0, 0, 1, 1, 2, 2,  0,  1
"Jul",  0.7,  5,  1, 1, 2, 0, 1, 3, 0, 2,  2,  1
"Aug",  2.3,  6,  3, 2, 4, 4, 4, 7, 8, 2,  2,  3
"Sep",  3.5,  6,  4, 7, 4, 2, 8, 5, 2, 5,  2,  5
"Oct",  2.0,  8,  0, 1, 3, 2, 5, 1, 5, 2,  3,  0
"Nov",  0.5,  3,  0, 0, 1, 1, 0, 1, 0, 1,  0,  1
"Dec",  0.0,  1,  0, 1, 0, 0, 0, 0, 0, 0,  0,  1

The Python code to read the above CSV file is as follows:

import csv
with open("data.csv", "r") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

The output is the content of the CSV file, and each line of data is a list.

We can use the Sniffer class to detect the format of the CSV file, and return a Dialect object, the object contains the format information of the CSV file.

print(csv.list_dialects()) # Print all available formats
with open("data.csv", "r") as f:
    dialect = csv.Sniffer().sniff(f.read(1024))
    f.seek(0)
    hasHeader = csv.Sniffer().has_header(f.read(1024))
    f.seek(0)
    print("Headers found: " + str(hasHeader))
    print(dialect.delimiter)
    print(dialect.escapechar)
    print(dialect.quotechar)

The first line is to print all available formats, the output is: ['excel', 'excel-tab', 'unix']. The second row begins to detect the format of the CSV file, 1024 means to read 1024 characters, the output result is:

0
0
Headers found: False
,
None
"

The latter three-line print is: Dilimiter, escape character, quote character.

Write CSV file

We use the writer() function to write to the CSV file, the function returns a writer. You can use the writerow() function to write a line of data, writerows() function to write multi-line data. For example:

with open("data.csv", mode="w") as f:
    writer = csv.writer(f)
    writer.writerow(["Month", "Average", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015"])
    writer.writerow(["May",  0.1,  0,  0, 1, 1, 0, 0, 0, 2, 0,  0,  0])
    writer.writerow(["Jun",  0.5,  2,  1, 1, 0, 0, 1, 1, 2, 2,  0,  1])
    writer.writerow(["Jul",  0.7,  5,  1, 1, 2, 0, 1, 3, 0, 2,  2,  1])
    writer.writerow(["Aug",  2.3,  6,  3, 2, 4, 4, 4, 7, 8, 2,  2,  3])
    writer.writerow(["Sep",  3.5,  6,  4, 7, 4, 2, 8, 5, 2, 5,  2,  5])
    writer.writerow(["Oct",  2.0,  8,  0, 1, 3, 2, 5, 1, 5, 2,  3,  0])
    writer.writerow(["Nov",  0.5,  3,  0, 0, 1, 1, 0, 1, 0, 1,  0,  1])
    writer.writerow(["Dec",  0.0,  1,  0, 1, 0, 0, 0, 0, 0, 0,  0,  1])

Handling ZIP files

Import the zipfile module and use the zipfile class to handle the ZIP file. Create a new Zip Archive and add the files data.csv to the Zip Archive:

import zipfile
with zipfile.ZipFile("data.zip", "w") as f:
    f.write("data.csv")

Check if a zip archive is valid:

print(zipfile.is_zipfile("data.zip"))

The output is: True, indicating that zip archive is valid.

Read the list of files for a ZIP Archive:

with zipfile.ZipFile("data.zip", "r") as f:
    print(f.namelist())

The output result is: ['data.csv'], indicating there is only one file `data.csv in the Zip Archive.

We can use the getinfo() function to get the file information in ZIP Archive:

with zipfile.ZipFile("data.zip", "r") as f:
    info = f.getinfo("data.csv")
    print(info.file_size)
    print(info.compress_size)
    print(info.compress_type)

The output result is:

327
327
0

Print the file content:

with zipfile.ZipFile("data.zip", "r") as f:
    with f.open("data.csv") as f2:
        print(f2.read())

We can extract files in Zip Archive:

with zipfile.ZipFile("data.zip", "r") as f:
    f.extractall()

Configuration file

Configuration File is a file for storing configuration information of the program so that we do not have to write a configuration information such as parameters in the program (this is called "hard code"). For example, the configuration file can be as follows:

[DEFAULT]
name = "John Doe" # Parameter name = parameter value
age = 30

[database]
host = localhost
port = 3306
user = root
password = 123456
database = mydb

We can use the configparser module to read the configuration file. First create a ConfigParser object, then use the read() function to read the configuration file:

import configparser
parser = configparser.ConfigParser()
parser.read("config.ini")
print(parser.sections()) # Output all sections
print(parser.has_section("database")) # Check if there is section Database
print(parser.get("DEFAULT", "name")) # Output Name in Default Section
print(parser.get("database", "port")) # Output port in Database Section
print(parser["database"]["host"]) # Output Host in Database Section, Parser can be seen as a dictionary

The output results are: "database"，True，"John Doe"和"3306". DEFAULT Section will be ignored when all Section is output. It is worth noting that the value thus acquired is a string, such as "3306", if you need to get a number, you need to use the int() function. But using the parser.getint() function can get the value directly.

print(type(parser["database"].getint("port"))) # output <class 'int'>

Similarly, parser.getfloat() and parser.getboolean() function can be used to get floating point values and boolean values.

If you try to get a section that does not exist, it will throw the KeyError Exception:

try:
    print(parser["database2"]["port"])
except KeyError as err:
    print("No such section: " + str(err))

The output will be: No such section: database2.

PreviousManipulating Data NextWorking with Numbers

Last updated 1 year ago