Introduction
Pickle is a Python module used for serializing and de-serializing Python object structures. Serialization is the process of converting a Python object (like a list, dictionary, or custom class instance) into a byte stream, which can be stored in a file or transmitted over a network. De-serialization is the reverse process – converting the byte stream back into a Python object. This tutorial explains how to read pickled data from a file in Python.
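To make the round trip concrete, here is a minimal in-memory sketch. It uses pickle.dumps() and pickle.loads(), the bytes-based counterparts of the file functions covered later; the variable names are only illustrative.

```python
import pickle

# Illustrative sample object; any picklable Python object would work here.
original = {'name': 'Alice', 'age': 30, 'city': 'New York'}

# Serialization: Python object -> byte stream.
blob = pickle.dumps(original)

# De-serialization: byte stream -> a new, equal Python object.
restored = pickle.loads(blob)

print(type(blob).__name__)   # bytes
print(restored == original)  # True: same contents
print(restored is original)  # False: a separate copy was rebuilt
```

The restored object is equal to the original but is a distinct object, which is what you would expect after the data has passed through a file or a network connection.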
Understanding the Basics
The pickle module allows you to save the state of Python objects to a file and then restore them later. This is useful for tasks like caching, storing program state, or sending data between different processes.
Important Note: While pickle is convenient, it’s crucial to be cautious when unpickling data from untrusted sources. Unpickling malicious data can execute arbitrary code, posing a security risk.
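One way to reduce this risk (it does not eliminate it) is to subclass pickle.Unpickler and override find_class() so that only an explicit allow-list of globals can be loaded. The sketch below follows that general pattern; the names SAFE_BUILTINS, RestrictedUnpickler, and restricted_loads are assumptions chosen for this example, and the allow-list would need adjusting for real data.

```python
import builtins
import io
import pickle

# Hypothetical allow-list for this sketch; extend it only with names you trust.
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not named in SAFE_BUILTINS."""

    def find_class(self, module, name):
        # find_class() is called whenever the pickle stream asks for a global.
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers need no globals, so they load normally.
safe = restricted_loads(pickle.dumps({'name': 'Alice', 'scores': [1, 2, 3]}))
print(safe)

# A pickle that references a global outside the allow-list is rejected.
try:
    restricted_loads(pickle.dumps(eval))
except pickle.UnpicklingError as exc:
    print(f"Rejected: {exc}")
```

For data that truly comes from an untrusted source, a format that cannot carry code, such as JSON, is usually the safer choice.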
Writing Pickled Data to a File
Before we discuss reading pickled data, let’s quickly review how to write it:
import pickle

data = {'name': 'Alice', 'age': 30, 'city': 'New York'}
filename = 'my_data.pkl'

with open(filename, 'wb') as file:
    pickle.dump(data, file)
In this code:
- import pickle imports the necessary module.
- data is the Python object we want to save.
- filename is the name of the file where the pickled data will be stored.
- open(filename, 'wb') opens the file in binary write mode ('wb'). It’s essential to open the file in binary mode when working with pickle.
- pickle.dump(data, file) serializes the data object and writes it to the opened file.
Reading Pickled Data from a File
Now, let’s focus on reading the pickled data back from the file. A common mistake is assuming pickle.load() will read all the data at once if multiple objects were pickled to the same file. Each call to pickle.load() reads only a single pickled object from the file. If you’ve appended multiple pickled objects to a file, you need to read them one by one until the end of the file is reached.
Here’s how to read a single pickled object:
import pickle

filename = 'my_data.pkl'

with open(filename, 'rb') as file:
    loaded_data = pickle.load(file)

print(loaded_data)
In this code:
- open(filename, 'rb') opens the file in binary read mode ('rb').
- pickle.load(file) de-serializes the pickled object from the file and assigns it to the loaded_data variable.
Reading Multiple Pickled Objects
If your file contains multiple pickled objects (created by repeatedly using pickle.dump() in append mode), you need to read them in a loop. A try-except block is a straightforward way to handle this: it catches the EOFError (End of File Error) that pickle.load() raises when there is nothing left to read:
import pickle

filename = 'multiple_data.pkl'
objects = []

with open(filename, 'rb') as file:
    try:
        while True:
            obj = pickle.load(file)
            objects.append(obj)
    except EOFError:
        pass  # Reached the end of the file

print(objects)
In this code:
- We initialize an empty list objects to store the loaded objects.
- The while True loop continues reading objects from the file until an EOFError is raised.
- Inside the loop, pickle.load(file) reads a single pickled object, which is then appended to the objects list.
- The except EOFError block catches the EOFError raised at the end of the file. Because the try statement wraps the whole loop, the exception itself ends the loop; pass does nothing and simply lets the program continue.
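The same idea can be packaged as a generator, which yields objects one at a time instead of collecting them all in a list first. The following is a sketch under stated assumptions: iter_pickled is a hypothetical helper name, and the file is created in a temporary directory so the example is self-contained.

```python
import os
import pickle
import tempfile


def iter_pickled(path):
    """Yield each pickled object stored in the file at path, in order."""
    with open(path, 'rb') as file:
        while True:
            try:
                yield pickle.load(file)
            except EOFError:
                return  # No more objects in the file


# Write several objects to one file, appending one pickle per call.
path = os.path.join(tempfile.mkdtemp(), 'multiple_data.pkl')
records = [{'id': 1}, [1, 2, 3], 'done']
for record in records:
    with open(path, 'ab') as file:
        pickle.dump(record, file)

# Read them back in the order they were written.
loaded = list(iter_pickled(path))
print(loaded)
```

A generator is convenient when the file is large, because the caller can process each object as it arrives and stop early without loading the rest.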
Alternative Libraries
While pickle is a standard Python module, other libraries provide similar functionality.
- joblib: Designed for efficiently serializing NumPy arrays and scikit-learn models. It often offers performance improvements for these specific data types. However, under the hood it still leverages the standard pickle library.
- pandas: The pandas library provides read_pickle for loading pickled pandas DataFrames or Series. It is built upon pickle but adds features specific to pandas data structures.
Important Considerations
- Binary Mode: Always open pickle files in binary mode ('wb' for writing, 'rb' for reading).
- Security: Be cautious when unpickling data from untrusted sources, since unpickling can execute arbitrary code.
- Compatibility: The pickle format exists in several protocol versions, and newer protocols cannot be read by older Python versions. Ensure compatibility if you are sharing pickled data between different versions of Python.
- File Structure: If you’re writing multiple objects to a file, you need to read them one by one, as pickle.load() reads only one object at a time.
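On the compatibility point, pickle.dump() and pickle.dumps() accept a protocol argument that controls which format version is written. The sketch below assumes, purely for illustration, that the data must stay readable by an older interpreter and therefore picks protocol 2; which protocol you actually need depends on the Python versions involved.

```python
import pickle

data = {'name': 'Alice', 'age': 30, 'city': 'New York'}

# Default protocol for the running interpreter.
default_blob = pickle.dumps(data)

# An older protocol, chosen here as an example for backward compatibility.
compat_blob = pickle.dumps(data, protocol=2)

# The newest protocol this interpreter supports.
newest_blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Loading needs no protocol argument: the version is detected from the data.
for blob in (default_blob, compat_blob, newest_blob):
    assert pickle.loads(blob) == data
```

Older protocols are more widely readable, while newer ones tend to be more compact and faster, so the choice is a trade-off between portability and efficiency.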