The Pandas package is one of the main reasons why so many data scientists favour Python over Microsoft Excel. Pandas is incredibly powerful and versatile and can handle a wide range of file formats without the limitations that can cripple Excel.
As you might imagine, Pandas also lets you export data stored in a Pandas dataframe to a wide range of different file formats and systems. We’ll be covering how to export Pandas data to CSV, TSV, JSON, HTML, XML, Feather, Parquet, Stata, LaTeX, Pickle, and the clipboard in this project. However, you can also export to SQL and Google Sheets with some additional packages.
To get started we’ll first import the Pandas package and then import data into a dataframe that we can export to a variety of different file formats using Pandas’ export functions.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/marketing_mix.csv')
df.head()
The to_csv() function allows you to export or save the contents of a Pandas dataframe by writing the output to a CSV, or comma separated value, file. CSV files can be read by virtually any application, so they’re convenient and widely used. By default, you only need to pass a single argument to to_csv(): a string containing the filename or filepath, for example, data1.csv. When executed, Pandas will write the data to that file.
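The export call itself can be sketched as follows; a small stand-in dataframe is used here so the example is self-contained (the article’s own data is loaded from marketing_mix.csv above, and the column names here are just illustrative):

```python
import pandas as pd

# Small stand-in dataframe mirroring a couple of the marketing columns
df_demo = pd.DataFrame({'week': [0, 1, 2], 'sales': [22.1, 10.4, 9.3]})

# Write the dataframe to a CSV file; by default the index is written too
df_demo.to_csv('data1.csv')
```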
To understand why this default option may not be what you want, let’s load the CSV file we just created using
read_csv() to read it back into a Pandas dataframe. As you’ll see, there’s an extra column called Unnamed: 0 in the dataframe. This is because to_csv() writes the dataframe’s index to the file by default, and read_csv() reads it back in as an unnamed column.
df_data1 = pd.read_csv('data1.csv')
df_data1.head()
If you want to keep the index that’s added by default, you can pass a value to the index_label argument when calling to_csv(). Instead of the index column coming back as Unnamed: 0, it will use the name you’ve defined.
df.to_csv('data1.csv', index_label='row')  # 'row' is an example index name
df_data1 = pd.read_csv('data1.csv')
df_data1.head()
If you don’t want the index to be added, you can simply pass in the
index=False argument. This tends to be the standard approach for most dataframes.
df.to_csv('data2.csv', index=False)
df_data2 = pd.read_csv('data2.csv')
df_data2.head()
Despite the name, the
to_csv() function also lets you export Pandas dataframes to similar character-separated data formats, such as tab separated value (TSV) or pipe separated value (PSV) files used by some legacy platforms.
To do this, you simply need to define the field delimiter or separator character by passing it as a string to the
sep argument. For example,
sep='|' will create a pipe separated value file, and
sep='\t' will create a tab separated value or TSV file. By default, the
to_csv() function uses
sep=',', so you don’t need to enter anything if you want a CSV file back.
df.to_csv('data3.tsv', index=False, sep='\t')
Note that if you attempt to reload a file that uses a separator other than the default comma, the
read_csv() function won’t parse the dataframe correctly. Here’s what happens when we load the TSV file we created above using the default settings:
df_data3 = pd.read_csv('data3.tsv')
df_data3.head()
To load a TSV file in Pandas you simply need to define the separator character using
sep='\t' in the read_csv() function:
df_data3 = pd.read_csv('data3.tsv', sep='\t')
df_data3.head()
You can export a Pandas dataframe to JSON using the to_json() function. Like
to_csv(), you simply append the function name to your dataframe name using a dot and define the output filename as the argument. The
to_json() function also includes a range of other arguments that allow you to control the date format, index, compression, and orientation, and it can output regular JSON or JSON Lines.
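A minimal sketch, again using a small stand-in dataframe so it runs on its own (the filename and orient value are just illustrative):

```python
import pandas as pd

# Small stand-in dataframe for illustration
df_demo = pd.DataFrame({'week': [0, 1, 2], 'sales': [22.1, 10.4, 9.3]})

# orient='records' writes a list of row objects, a common format for
# web applications; the default orient for a dataframe is 'columns'
df_demo.to_json('data.json', orient='records')

# Read it back to confirm the round trip
df_json = pd.read_json('data.json')
```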
One really useful feature, if you write blog posts or want to export Pandas data to a web application, is the ability to export a Pandas dataframe to an HTML table. You can do this via the to_html() function.
When appended to the name of your Pandas dataframe, the
to_html() function returns a chunk of HTML defining the table layout for the Pandas dataframe, so you can copy it to your web page or insert it into a blog post. The
to_html() function also includes a range of extra arguments that you can use to style the table, such as emboldening values, adding class names, and justifying the content. It’s really powerful.
html = df.head().to_html()
'<table border="1" class="dataframe">\n <thead>\n <tr style="text-align: right;">\n <th></th>\n <th>week</th>\n <th>tv</th>\n <th>radio</th>\n <th>newspaper</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>230.1</td>\n <td>37.8</td>\n <td>69.2</td>\n <td>22.1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>44.5</td>\n <td>39.3</td>\n <td>45.1</td>\n <td>10.4</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2</td>\n <td>17.2</td>\n <td>45.9</td>\n <td>69.3</td>\n <td>9.3</td>\n </tr>\n <tr>\n <th>3</th>\n <td>3</td>\n <td>151.5</td>\n <td>41.3</td>\n <td>58.5</td>\n <td>18.5</td>\n </tr>\n <tr>\n <th>4</th>\n <td>4</td>\n <td>180.8</td>\n <td>10.8</td>\n <td>58.4</td>\n <td>12.9</td>\n </tr>\n </tbody>\n</table>'
If you want to export a Pandas dataframe to the clipboard, so you can paste it into a document, you can use the
to_clipboard() function. The
to_clipboard() function works out of the box on Windows and Mac OS, but you’ll need to install either xclip or xsel with the PyQt4 modules to run this on Linux. Once you’ve run the code the dataframe will be stored in your clipboard and you can paste it into another document, such as a Google Sheet, using Ctrl + v.
If you’re running Pandas version 1.3.0 or greater, you can now also export a Pandas dataframe to XML using the
to_xml() function. As with the other export functions, the first argument accepts a filename or filepath to which to write the data, e.g.
/data/file.xml. It also includes a load of additional arguments that can be used to modify the XML exported.
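A minimal sketch using a small stand-in dataframe; note that the default parser requires the lxml package, so this example passes parser='etree' to use Python’s standard library instead:

```python
import pandas as pd

# Small stand-in dataframe for illustration
df_demo = pd.DataFrame({'week': [0, 1], 'sales': [22.1, 10.4]})

# parser='etree' uses the built-in ElementTree parser; the default
# (parser='lxml') requires the lxml package to be installed
df_demo.to_xml('data.xml', parser='etree', index=False)
```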
The to_parquet() function allows you to export a Pandas dataframe to a Parquet file. In case you haven’t heard of it, Parquet is an open source column-oriented data file format designed for efficient data storage and retrieval.
The Parquet file format is supported by a range of enterprise data platforms, such as Hadoop, Pig, Spark, and Hive. Parquet files have the
.parquet suffix. By default you just need to provide a filepath or filename, e.g.
data.parquet. However, you can also define the Parquet engine used with
engine, control the index using
index, define partition columns with
partition_cols, and define storage options with storage_options.
The to_stata() function allows you to export a Pandas dataframe to a Stata file. Stata files have a
.dta file suffix and are a proprietary binary format used by the Stata statistics application from StataCorp.
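A minimal sketch of a Stata round trip, using a small stand-in dataframe (the filename is illustrative):

```python
import pandas as pd

# Small stand-in dataframe for illustration
df_demo = pd.DataFrame({'week': [0, 1, 2], 'sales': [22.1, 10.4, 9.3]})

# write_index=False skips writing the dataframe index to the .dta file
df_demo.to_stata('data.dta', write_index=False)

# Read it back with read_stata()
df_stata = pd.read_stata('data.dta')
```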
The to_feather() function lets you export a Pandas dataframe to a Feather file. Feather files are used by Apache Arrow for storing Arrow tables and use a
.feather (or sometimes .fea) file suffix. There are fewer additional arguments for
to_feather(); you mostly just need to define the file path.
If you’re one of those odd people who still uses LaTeX for typesetting, you can export the contents of your dataframe to LaTeX using the
to_latex() function. As with the other Pandas export functions, you only need to define a filepath or filename by default, but there are loads of other arguments you can use to control the LaTeX output generated. LaTeX files have a .tex suffix.
The to_pickle() function lets you export a Pandas dataframe to a Pickle file. Pickle is a Python module that uses object serialization to store data in a structured form. Data is serialized (or pickled) for storage, then deserialized (or unpickled) on retrieval. Many people prefer it over saving to CSV because it preserves dtypes and the index exactly. Pickle files usually have a
.pkl suffix, but
.db is also sometimes used.
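A minimal sketch of the pickle round trip, using a small stand-in dataframe:

```python
import pandas as pd

# Small stand-in dataframe for illustration
df_demo = pd.DataFrame({'week': [0, 1, 2], 'sales': [22.1, 10.4, 9.3]})

# Pickling serializes the dataframe object itself, so dtypes and
# the index survive the round trip exactly
df_demo.to_pickle('data.pkl')

# Read it back with read_pickle()
df_pickle = pd.read_pickle('data.pkl')
```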
Matt Clarke, Thursday, August 18, 2022