Dealing with Nested Data in Pandas

We’ll walk through how to deal with nested data using Pandas (for example - a JSON string column), transforming that data into a tabular format that’s easier to deal with and analyze.

Import Modules

import pandas as pd
import json

Create a Test Dataset

age=[21,22,23]
nested_data=[
    '''{"occupation":"nurse", "interests":"movies"}''',
    '''{"occupation":"teacher", "favorite_food":"spaghetti"}''', 
    '''{"occupation":"doctor"}'''
]

data = pd.DataFrame(dict(age=age, nested_data=nested_data))

data

	age	nested_data
0	21	{"occupation":"nurse", "interests":"movies"}
1	22	{"occupation":"teacher", "favorite_food":"spag...
2	23	{"occupation":"doctor"}

Load the JSON string into a dictionary and then convert it into a Series object. This flattens out the dictionary into a table-like format. Notice how this creates a column per key, and that NaNs are intelligently filled in via Pandas.

exploded = data.nested_data.apply(json.loads).apply(pd.Series)
exploded

	occupation	interests	favorite_food
0	nurse	movies	NaN
1	teacher	NaN	spaghetti
2	doctor	NaN	NaN

Last - we’ll drop the orignial nested column and concatenate the exploded version to create our final dataset.

pd.concat([data.drop(columns='nested_data'), exploded], axis=1)

	age	occupation	interests	favorite_food
0	21	nurse	movies	NaN
1	22	teacher	NaN	spaghetti
2	23	doctor	NaN	NaN