How to use Pandas to_json() to export JSON data

Learn how to use the Pandas to_json() function to export a dataframe to JSON, JSON lines, or a host of other JSON derivatives.

How to use Pandas to_json() to export JSON data
Picture by Hasan Albari, Pexels.
6 minutes to read

The Pandas to_json() function is one of a number of Pandas functions that allow you to export the data stored in a dataframe into other formats, in this case JavaScript Object Notation or JSON.

JSON data is structured and very similar to the Python dictionary format. Both data storage types are based on key-value pairs and look very similar. JSON was primarily designed as a lightweight data-interchange format that is easy to read and create, and easy for applications to parse.

When working with web data, you’ll often find that APIs return JSON output, or you may encounter JSON stored within Pandas dataframes. Similarly, many web applications will require you to POST JSON objects to them. A wide range of JSON formats have been created, and the Pandas to_json() function lets you export dataframe data to almost all of them.

Create a dataframe

To get started, open a Jupyter notebook and either create a dataframe of dummy data, or import data from a CSV file into a Pandas dataframe.

import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 'Pterophyllum', 12.5), 
     ('Coptodon snyderae', 'Coptodon', 8.2),
     ('Astronotus ocellatus', 'Astronotus', 31.2), 
     ('Corydoras aeneus', 'Corydoras', 5.3),
     ('Xenomystus nigri', 'Xenomystus', 5.3)
    ], 
    columns=['species', 'genus', 'length_cm']
)
df
species genus length_cm
0 Pterophyllum altum Pterophyllum 12.5
1 Coptodon snyderae Coptodon 8.2
2 Astronotus ocellatus Astronotus 31.2
3 Corydoras aeneus Corydoras 5.3
4 Xenomystus nigri Xenomystus 5.3

Export to JSON using orient=’columns’

The columns parameter is the main way to control the output of the to_json() function. When you call it without any arguments, i.e. df.to_json(), the method will take the dataframe object as its input and return JSON as its output. By default, it sets orient to columns, so df.to_json() will give you the same output as df.to_json(orient='columns').

df.to_json()
'{"species":{"0":"Pterophyllum altum","1":"Coptodon snyderae","2":"Astronotus ocellatus","3":"Corydoras aeneus","4":"Xenomystus nigri"},"genus":{"0":"Pterophyllum","1":"Coptodon","2":"Astronotus","3":"Corydoras","4":"Xenomystus"},"length_cm":{"0":12.5,"1":8.2,"2":31.2,"3":5.3,"4":5.3}}'
df.to_json(orient='columns')
'{"species":{"0":"Pterophyllum altum","1":"Coptodon snyderae","2":"Astronotus ocellatus","3":"Corydoras aeneus","4":"Xenomystus nigri"},"genus":{"0":"Pterophyllum","1":"Coptodon","2":"Astronotus","3":"Corydoras","4":"Xenomystus"},"length_cm":{"0":12.5,"1":8.2,"2":31.2,"3":5.3,"4":5.3}}'

Export to JSON using orient=’records’

By changing the orient parameter argument to records, the to_json() function will return the JSON formatted as a list of records where each record is represented by a dictionary for each row that appeared within the dataframe.

df.to_json(orient='records')
'[{"species":"Pterophyllum altum","genus":"Pterophyllum","length_cm":12.5},{"species":"Coptodon snyderae","genus":"Coptodon","length_cm":8.2},{"species":"Astronotus ocellatus","genus":"Astronotus","length_cm":31.2},{"species":"Corydoras aeneus","genus":"Corydoras","length_cm":5.3},{"species":"Xenomystus nigri","genus":"Xenomystus","length_cm":5.3}]'

Export to JSON lines using lines=True

If you need to export your data in the JSON lines format, you need to pass in both orient='records' and lines=True. You’ll get an error message back if you call lines=True without stating that you want the data formatted with orient='records'. The output is largely the same as orient='records' but a line break \n is added to split the data up over rows, which some applications require in order to handle the data.

# Output JSON lines only works with orient='records'
df.to_json(orient='records', lines=True)
'{"species":"Pterophyllum altum","genus":"Pterophyllum","length_cm":12.5}\n{"species":"Coptodon snyderae","genus":"Coptodon","length_cm":8.2}\n{"species":"Astronotus ocellatus","genus":"Astronotus","length_cm":31.2}\n{"species":"Corydoras aeneus","genus":"Corydoras","length_cm":5.3}\n{"species":"Xenomystus nigri","genus":"Xenomystus","length_cm":5.3}'

Export to JSON using orient=’index’

The orient='index' argument returns a dictionary for each row but adds an index to the main JSON object returned.

df.to_json(orient='index')
'{"0":{"species":"Pterophyllum altum","genus":"Pterophyllum","length_cm":12.5},"1":{"species":"Coptodon snyderae","genus":"Coptodon","length_cm":8.2},"2":{"species":"Astronotus ocellatus","genus":"Astronotus","length_cm":31.2},"3":{"species":"Corydoras aeneus","genus":"Corydoras","length_cm":5.3},"4":{"species":"Xenomystus nigri","genus":"Xenomystus","length_cm":5.3}}'

Export to JSON using orient=’values’

The orient='values' argument is not seen that often and returns JSON based on a list of lists, with each inner list representing a row in the dataframe.

df.to_json(orient='values')
'[["Pterophyllum altum","Pterophyllum",12.5],["Coptodon snyderae","Coptodon",8.2],["Astronotus ocellatus","Astronotus",31.2],["Corydoras aeneus","Corydoras",5.3],["Xenomystus nigri","Xenomystus",5.3]]'

Export to JSON using orient=’table’

The orient='table' is another relatively unusual one to see being used. This returns a schema JSON that includes the fields, data, and data types. Since there’s so much additional data present, it can increase file sizes quite a bit on larger datasets.

df.to_json(orient='table')
'{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"species","type":"string"},{"name":"genus","type":"string"},{"name":"length_cm","type":"number"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"species":"Pterophyllum altum","genus":"Pterophyllum","length_cm":12.5},{"index":1,"species":"Coptodon snyderae","genus":"Coptodon","length_cm":8.2},{"index":2,"species":"Astronotus ocellatus","genus":"Astronotus","length_cm":31.2},{"index":3,"species":"Corydoras aeneus","genus":"Corydoras","length_cm":5.3},{"index":4,"species":"Xenomystus nigri","genus":"Xenomystus","length_cm":5.3}]}'

Force an encoded string to not be ASCII

By default, you don’t need to pass a value to the force_ascii argument, since it’s set to True by default. However, should you wish to override this and force the string not to be encoded as ASCII you can do that with force_ascii=False.

df.to_json(force_ascii=False)
'{"species":{"0":"Pterophyllum altum","1":"Coptodon snyderae","2":"Astronotus ocellatus","3":"Corydoras aeneus","4":"Xenomystus nigri"},"genus":{"0":"Pterophyllum","1":"Coptodon","2":"Astronotus","3":"Corydoras","4":"Xenomystus"},"length_cm":{"0":12.5,"1":8.2,"2":31.2,"3":5.3,"4":5.3}}'

Matt Clarke, Monday, January 16, 2023

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.