Converting JSON Data to Pandas DataFrames

Working with JSON data is a common task in data analysis, and pandas provides efficient ways to convert it into DataFrames. This tutorial shows how to parse JSON in Python and turn it into a pandas DataFrame, including the case where records contain nested objects.

Introduction to JSON

JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy to read and write. It is built from objects (collections of key-value pairs), arrays, and scalar values such as strings, numbers, booleans, and null. When working with APIs or web services, you often receive data in JSON format.
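To make the building blocks concrete, here is a minimal sketch of how each JSON type maps to a Python type once parsed with the standard json module (covered in the next section). The sample document and its field names are made up for illustration.

```python
import json

# A small, made-up JSON document that uses each basic JSON type.
sample = '{"name": "station-1", "active": true, "tags": ["a", "b"], "reading": 19.5, "note": null}'

parsed = json.loads(sample)

# JSON object -> dict, array -> list, string -> str,
# number -> int or float, true/false -> bool, null -> None
for key, value in parsed.items():
    print(key, type(value).__name__)
```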

Loading JSON Data

To work with JSON data in Python, we first parse it with the built-in json module. The json.loads() function converts a JSON string into Python objects; a top-level JSON object, as in the sample below, becomes a dictionary.

import json

# Sample JSON data
json_data = '''
{
    "results": [
        {
            "elevation": 243.3462677001953,
            "location": {
                "lat": 42.974049,
                "lng": -81.205203
            },
            "resolution": 19.08790397644043
        },
        {
            "elevation": 244.1318664550781,
            "location": {
                "lat": 42.974298,
                "lng": -81.19575500000001
            },
            "resolution": 19.08790397644043
        }
    ],
    "status": "OK"
}
'''

# Load JSON data into a dictionary
data = json.loads(json_data)
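Before converting anything, it can help to look at the shape of the parsed data. The following is a small sketch that uses a trimmed-down copy of the sample response (one result only) so that it runs on its own.

```python
import json

# Trimmed-down copy of the sample response, kept to a single result.
json_data = '''
{
    "results": [
        {
            "elevation": 243.3462677001953,
            "location": {"lat": 42.974049, "lng": -81.205203},
            "resolution": 19.08790397644043
        }
    ],
    "status": "OK"
}
'''

data = json.loads(json_data)

print(type(data).__name__)             # the top-level JSON object is a dict
print(list(data.keys()))               # its keys: results and status
print(type(data["results"]).__name__)  # the JSON array is a list

# Nested values are reached by chaining keys and list indexes.
first = data["results"][0]
print(first["location"]["lat"])
```

The records we want as rows live in the list under data["results"], which is why that list is what gets passed to pandas.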

Converting JSON Data to Pandas DataFrame

Once the JSON data is loaded into a dictionary, we can convert it to a pandas DataFrame with the pd.json_normalize() function. It has been part of the top-level pandas namespace since version 1.0; earlier versions expose it as pandas.io.json.json_normalize.

import pandas as pd

# Convert JSON data to a DataFrame
df = pd.json_normalize(data['results'])
print(df)
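Selecting data['results'] by hand leaves the top-level status field behind. As a hedged sketch, the record_path and meta parameters of pd.json_normalize() can pull the records out of the full dictionary and repeat status on every row. The variable name df_with_status is only illustrative, and the data is written as a Python dictionary so the block runs on its own.

```python
import pandas as pd

# The parsed sample response, written out as a Python dictionary.
data = {
    "results": [
        {
            "elevation": 243.3462677001953,
            "location": {"lat": 42.974049, "lng": -81.205203},
            "resolution": 19.08790397644043,
        },
        {
            "elevation": 244.1318664550781,
            "location": {"lat": 42.974298, "lng": -81.19575500000001},
            "resolution": 19.08790397644043,
        },
    ],
    "status": "OK",
}

# record_path names the key holding the list of records;
# meta lists top-level keys to copy onto each resulting row.
df_with_status = pd.json_normalize(data, record_path="results", meta=["status"])
print(df_with_status)
```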

Alternatively, you can use the pd.DataFrame.from_dict() function to build a DataFrame from the same list of records.

# Convert JSON data to a DataFrame
df = pd.DataFrame.from_dict(data['results'])
print(df)

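However, this method does not flatten nested dictionaries: each value in the location column remains a whole dictionary rather than being split into separate lat and lng columns. In such cases, using pd.json_normalize() is recommended.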
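The difference is easy to see side by side. This sketch uses the plain pd.DataFrame() constructor as a stand-in for the unflattened approach and compares its columns with those from pd.json_normalize(); the names plain and flat are illustrative.

```python
import pandas as pd

# The list of records from the sample response.
results = [
    {
        "elevation": 243.3462677001953,
        "location": {"lat": 42.974049, "lng": -81.205203},
        "resolution": 19.08790397644043,
    },
    {
        "elevation": 244.1318664550781,
        "location": {"lat": 42.974298, "lng": -81.19575500000001},
        "resolution": 19.08790397644043,
    },
]

plain = pd.DataFrame(results)      # nested dicts are kept as cell values
flat = pd.json_normalize(results)  # nested dicts become dotted columns

print(list(plain.columns))
print(type(plain.loc[0, "location"]).__name__)
print(list(flat.columns))
```

In the plain DataFrame, reaching the latitude means indexing into a dictionary in every cell, whereas the flattened one exposes it as an ordinary numeric column.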

Handling Nested Dictionaries

When dealing with nested dictionaries, you can use the pd.json_normalize() function to flatten the data. For example:

import json
import pandas as pd

# Sample JSON data with nested dictionaries
json_data = '''
{
    "results": [
        {
            "elevation": 243.3462677001953,
            "location": {
                "lat": 42.974049,
                "lng": -81.205203
            },
            "resolution": 19.08790397644043
        },
        {
            "elevation": 244.1318664550781,
            "location": {
                "lat": 42.974298,
                "lng": -81.19575500000001
            },
            "resolution": 19.08790397644043
        }
    ],
    "status": "OK"
}
'''

# Load JSON data into a dictionary
data = json.loads(json_data)

# Convert JSON data to a DataFrame with nested dictionaries
df = pd.json_normalize(data['results'])
print(df)

This will produce the following output (with pandas 1.0 or later, flattened fields are placed after the top-level ones):

    elevation  resolution  location.lat  location.lng
0  243.346268   19.087904     42.974049    -81.205203
1  244.131866   19.087904     42.974298    -81.195755

As you can see, the location dictionary has been flattened into separate columns.
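How the flattening happens can be adjusted. As a brief sketch, the sep parameter changes the separator used in the generated column names, and max_level limits how many levels of nesting are expanded. The variable names underscored and shallow are illustrative.

```python
import pandas as pd

# The list of records from the sample response.
results = [
    {
        "elevation": 243.3462677001953,
        "location": {"lat": 42.974049, "lng": -81.205203},
        "resolution": 19.08790397644043,
    },
    {
        "elevation": 244.1318664550781,
        "location": {"lat": 42.974298, "lng": -81.19575500000001},
        "resolution": 19.08790397644043,
    },
]

# Join nested keys with an underscore instead of the default dot.
underscored = pd.json_normalize(results, sep="_")
print(list(underscored.columns))

# max_level=0 expands nothing, so location stays a dictionary column.
shallow = pd.json_normalize(results, max_level=0)
print(list(shallow.columns))
```

Underscore-separated names can be convenient because columns such as location_lat are then usable with attribute access (df.location_lat).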

Conclusion

In this tutorial, we converted JSON data to pandas DataFrames using pd.json_normalize() and pd.DataFrame.from_dict(), and saw how pd.json_normalize() flattens nested dictionaries into separate columns. With these steps, you can work efficiently with JSON data in your data analysis projects.
