The Pandas to_json()
function is one of a number of Pandas functions that allow you to export the data stored in a dataframe into other formats, in this case JavaScript Object Notation or JSON.
JSON data is structured and very similar to the Python dictionary format. Both data storage types are based on key-value pairs and look very similar. JSON was primarily designed as a lightweight data-interchange format that is easy to read and create, and easy for applications to parse.
When working with web data, you’ll often find that APIs return JSON output, or you may encounter JSON stored within Pandas dataframes. Similarly, many web applications will require you to POST
JSON objects to them. A wide range of JSON formats have been created, and the Pandas to_json()
function lets you export dataframe data to almost all of them.
To get started, open a Jupyter notebook and either create a dataframe of dummy data, or import data from a CSV file into a Pandas dataframe.
import pandas as pd
df = pd.DataFrame(
[('Pterophyllum altum', 'Pterophyllum', 12.5),
('Coptodon snyderae', 'Coptodon', 8.2),
('Astronotus ocellatus', 'Astronotus', 31.2),
('Corydoras aeneus', 'Corydoras', 5.3),
('Xenomystus nigri', 'Xenomystus', 5.3)
],
columns=['species', 'genus', 'length_cm']
)
df
species | genus | length_cm | |
---|---|---|---|
0 | Pterophyllum altum | Pterophyllum | 12.5 |
1 | Coptodon snyderae | Coptodon | 8.2 |
2 | Astronotus ocellatus | Astronotus | 31.2 |
3 | Corydoras aeneus | Corydoras | 5.3 |
4 | Xenomystus nigri | Xenomystus | 5.3 |
The columns
parameter is the main way to control the output of the to_json()
function. When you call it without any arguments, i.e. df.to_json()
, the method will take the dataframe object as its input and return JSON as its output. By default, it sets orient
to columns
, so df.to_json()
will give you the same output as df.to_json(orient='columns')
.
df.to_json()
'{"species":{"0":"Pterophyllum altum","1":"Coptodon snyderae","2":"Astronotus ocellatus","3":"Corydoras aeneus","4":"Xenomystus nigri"},"genus":{"0":"Pterophyllum","1":"Coptodon","2":"Astronotus","3":"Corydoras","4":"Xenomystus"},"length_cm":{"0":12.5,"1":8.2,"2":31.2,"3":5.3,"4":5.3}}'
df.to_json(orient='columns')
'{"species":{"0":"Pterophyllum altum","1":"Coptodon snyderae","2":"Astronotus ocellatus","3":"Corydoras aeneus","4":"Xenomystus nigri"},"genus":{"0":"Pterophyllum","1":"Coptodon","2":"Astronotus","3":"Corydoras","4":"Xenomystus"},"length_cm":{"0":12.5,"1":8.2,"2":31.2,"3":5.3,"4":5.3}}'
By changing the orient
parameter argument to records
, the to_json()
function will return the JSON formatted as a list of records where each record is represented by a dictionary for each row that appeared within the dataframe.
df.to_json(orient='records')
'[{"species":"Pterophyllum altum","genus":"Pterophyllum","length_cm":12.5},{"species":"Coptodon snyderae","genus":"Coptodon","length_cm":8.2},{"species":"Astronotus ocellatus","genus":"Astronotus","length_cm":31.2},{"species":"Corydoras aeneus","genus":"Corydoras","length_cm":5.3},{"species":"Xenomystus nigri","genus":"Xenomystus","length_cm":5.3}]'
If you need to export your data in the JSON lines format, you need to pass in both orient='records'
and lines=True
. You’ll get an error message back if you call lines=True
without stating that you want the data formatted with orient='records'
. The output is largely the same as orient='records'
but a line break \n
is added to split the data up over rows, which some applications require in order to handle the data.
# Output JSON lines only works with orient='records'
df.to_json(orient='records', lines=True)
'{"species":"Pterophyllum altum","genus":"Pterophyllum","length_cm":12.5}\n{"species":"Coptodon snyderae","genus":"Coptodon","length_cm":8.2}\n{"species":"Astronotus ocellatus","genus":"Astronotus","length_cm":31.2}\n{"species":"Corydoras aeneus","genus":"Corydoras","length_cm":5.3}\n{"species":"Xenomystus nigri","genus":"Xenomystus","length_cm":5.3}'
The orient='index'
argument returns a dictionary for each row but adds an index to the main JSON object returned.
df.to_json(orient='index')
'{"0":{"species":"Pterophyllum altum","genus":"Pterophyllum","length_cm":12.5},"1":{"species":"Coptodon snyderae","genus":"Coptodon","length_cm":8.2},"2":{"species":"Astronotus ocellatus","genus":"Astronotus","length_cm":31.2},"3":{"species":"Corydoras aeneus","genus":"Corydoras","length_cm":5.3},"4":{"species":"Xenomystus nigri","genus":"Xenomystus","length_cm":5.3}}'
The orient='values'
argument is not seen that often and returns JSON based on a list of lists, with each inner list representing a row in the dataframe.
df.to_json(orient='values')
'[["Pterophyllum altum","Pterophyllum",12.5],["Coptodon snyderae","Coptodon",8.2],["Astronotus ocellatus","Astronotus",31.2],["Corydoras aeneus","Corydoras",5.3],["Xenomystus nigri","Xenomystus",5.3]]'
The orient='table'
is another relatively unusual one to see being used. This returns a schema JSON that includes the fields, data, and data types. Since there’s so much additional data present, it can increase file sizes quite a bit on larger datasets.
df.to_json(orient='table')
'{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"species","type":"string"},{"name":"genus","type":"string"},{"name":"length_cm","type":"number"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"species":"Pterophyllum altum","genus":"Pterophyllum","length_cm":12.5},{"index":1,"species":"Coptodon snyderae","genus":"Coptodon","length_cm":8.2},{"index":2,"species":"Astronotus ocellatus","genus":"Astronotus","length_cm":31.2},{"index":3,"species":"Corydoras aeneus","genus":"Corydoras","length_cm":5.3},{"index":4,"species":"Xenomystus nigri","genus":"Xenomystus","length_cm":5.3}]}'
By default, you don’t need to pass a value to the force_ascii
argument, since it’s set to True
by default. However, should you wish to override this and force the string not to be encoded as ASCII you can do that with force_ascii=False
.
df.to_json(force_ascii=False)
'{"species":{"0":"Pterophyllum altum","1":"Coptodon snyderae","2":"Astronotus ocellatus","3":"Corydoras aeneus","4":"Xenomystus nigri"},"genus":{"0":"Pterophyllum","1":"Coptodon","2":"Astronotus","3":"Corydoras","4":"Xenomystus"},"length_cm":{"0":12.5,"1":8.2,"2":31.2,"3":5.3,"4":5.3}}'
Matt Clarke, Monday, January 16, 2023