When loading data into a Pandas dataframe, you’ll often find that data is truncated, columns are replaced with an ellipsis, or that the float precision makes numbers harder to read.
Thankfully, Pandas includes a built-in options API to let you configure global settings that control the way data is displayed within dataframes. In this simple project we’ll cover the Pandas options and settings API so you can adjust your dataframes to get them working just as you want them.
There are only five functions in the Pandas API for controlling options and settings: get_option(), set_option(), reset_option(), describe_option(), and option_context().
Function | Usage |
---|---|
describe_option() | The Pandas describe_option() function is used to display a description of a given option's usage and configuration. |
get_option() | The Pandas get_option() function is used to get the value of a single option. For example, if you want to find out the maximum number of rows, the get_option() function will return the value currently stored so you can see if it needs to be changed. |
set_option() | The Pandas set_option() function is used to set the value of a single option. For example, you can use this function to change the maximum number of rows, the maximum column width, or maximum number of columns shown in your dataframe. |
reset_option() | The Pandas reset_option() function is used to reset the value of a single option to the default value. |
option_context() | The Pandas option_context() function is used to execute a code block with a set of options that revert to prior settings after execution. |
Next, we’ll examine each of the Pandas functions used for configuring Pandas dataframes by working with some real data. To get started, import the Pandas package and load a large dataset into a dataframe that includes a good mixture of data of various Pandas dtypes.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/telco.csv')
df.head()
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
5 rows × 21 columns
Pandas includes a range of functions to get, set, reset, and temporarily override any option or setting. However, to use any of these, you’ll first need to know which settings exist, which is where the describe_option() function comes in.
To return information on all of the Pandas options and settings you can configure, run pd.describe_option() with no arguments. Each entry begins with the option name, such as display.chop_threshold or display.encoding, which you can then pass to the other Pandas functions to get, set, or reset the value.
pd.describe_option()
compute.use_bottleneck : bool
Use the bottleneck library to accelerate if it is installed,
the default is True
Valid values: False,True
[default: True] [currently: True]
compute.use_numba : bool
Use the numba engine option for select operations if it is installed,
the default is False
Valid values: False,True
[default: False] [currently: False]
compute.use_numexpr : bool
Use the numexpr library to accelerate computation if it is installed,
the default is True
Valid values: False,True
[default: True] [currently: True]
display.chop_threshold : float or None
if set to a float value, all float values smaller then the given threshold
will be displayed as exactly 0 by repr and friends.
[default: None] [currently: None]
display.colheader_justify : 'left'/'right'
Controls the justification of column headers. used by DataFrameFormatter.
[default: right] [currently: right]
display.column_space No description available.
[default: 12] [currently: 12]
display.date_dayfirst : boolean
When True, prints and parses dates with the day first, eg 20/01/2005
[default: False] [currently: False]
display.date_yearfirst : boolean
When True, prints and parses dates with the year first, eg 2005/01/20
[default: False] [currently: False]
display.encoding : str/unicode
Defaults to the detected encoding of the console.
Specifies the encoding to be used for strings returned by to_string,
these are generally strings meant to be displayed on the console.
[default: UTF-8] [currently: UTF-8]
display.expand_frame_repr : boolean
Whether to print out the full DataFrame repr for wide DataFrames across
multiple lines, `max_columns` is still respected, but the output will
wrap-around across multiple "pages" if its width exceeds `display.width`.
[default: True] [currently: True]
display.float_format : callable
The callable should accept a floating point number and return
a string with the desired format of the number. This is used
in some places like SeriesFormatter.
See formats.format.EngFormatter for an example.
[default: None] [currently: None]
display.html.border : int
A ``border=value`` attribute is inserted in the ``<table>`` tag
for the DataFrame HTML repr.
[default: 1] [currently: 1]
display.html.table_schema : boolean
Whether to publish a Table Schema representation for frontends
that support it.
(default: False)
[default: False] [currently: False]
display.html.use_mathjax : boolean
When True, Jupyter notebook will process table contents using MathJax,
rendering mathematical expressions enclosed by the dollar symbol.
(default: True)
[default: True] [currently: True]
display.large_repr : 'truncate'/'info'
For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can
show a truncated table (the default from 0.13), or switch to the view from
df.info() (the behaviour in earlier versions of pandas).
[default: truncate] [currently: truncate]
display.latex.escape : bool
This specifies if the to_latex method of a Dataframe uses escapes special
characters.
Valid values: False,True
[default: True] [currently: True]
display.latex.longtable :bool
This specifies if the to_latex method of a Dataframe uses the longtable
format.
Valid values: False,True
[default: False] [currently: False]
display.latex.multicolumn : bool
This specifies if the to_latex method of a Dataframe uses multicolumns
to pretty-print MultiIndex columns.
Valid values: False,True
[default: True] [currently: True]
display.latex.multicolumn_format : bool
This specifies if the to_latex method of a Dataframe uses multicolumns
to pretty-print MultiIndex columns.
Valid values: False,True
[default: l] [currently: l]
display.latex.multirow : bool
This specifies if the to_latex method of a Dataframe uses multirows
to pretty-print MultiIndex rows.
Valid values: False,True
[default: False] [currently: False]
display.latex.repr : boolean
Whether to produce a latex DataFrame representation for jupyter
environments that support it.
(default: False)
[default: False] [currently: False]
display.max_categories : int
This sets the maximum number of categories pandas should output when
printing out a `Categorical` or a Series of dtype "category".
[default: 8] [currently: 8]
display.max_columns : int
If max_cols is exceeded, switch to truncate view. Depending on
`large_repr`, objects are either centrally truncated or printed as
a summary view. 'None' value means unlimited.
In case python/IPython is running in a terminal and `large_repr`
equals 'truncate' this can be set to 0 and pandas will auto-detect
the width of the terminal and print a truncated object which fits
the screen width. The IPython notebook, IPython qtconsole, or IDLE
do not run in a terminal and hence it is not possible to do
correct auto-detection.
[default: 20] [currently: 20]
display.max_colwidth : int or None
The maximum width in characters of a column in the repr of
a pandas data structure. When the column overflows, a "..."
placeholder is embedded in the output. A 'None' value means unlimited.
[default: 50] [currently: 50]
display.max_info_columns : int
max_info_columns is used in DataFrame.info method to decide if
per column information will be printed.
[default: 100] [currently: 100]
display.max_info_rows : int or None
df.info() will usually show null-counts for each column.
For large frames this can be quite slow. max_info_rows and max_info_cols
limit this null check only to frames with smaller dimensions than
specified.
[default: 1690785] [currently: 1690785]
display.max_rows : int
If max_rows is exceeded, switch to truncate view. Depending on
`large_repr`, objects are either centrally truncated or printed as
a summary view. 'None' value means unlimited.
In case python/IPython is running in a terminal and `large_repr`
equals 'truncate' this can be set to 0 and pandas will auto-detect
the height of the terminal and print a truncated object which fits
the screen height. The IPython notebook, IPython qtconsole, or
IDLE do not run in a terminal and hence it is not possible to do
correct auto-detection.
[default: 60] [currently: 60]
display.max_seq_items : int or None
when pretty-printing a long sequence, no more then `max_seq_items`
will be printed. If items are omitted, they will be denoted by the
addition of "..." to the resulting string.
If set to None, the number of items to be printed is unlimited.
[default: 100] [currently: 100]
display.memory_usage : bool, string or None
This specifies if the memory usage of a DataFrame should be displayed when
df.info() is called. Valid values True,False,'deep'
[default: True] [currently: True]
display.min_rows : int
The numbers of rows to show in a truncated view (when `max_rows` is
exceeded). Ignored when `max_rows` is set to None or 0. When set to
None, follows the value of `max_rows`.
[default: 10] [currently: 10]
display.multi_sparse : boolean
"sparsify" MultiIndex display (don't display repeated
elements in outer levels within groups)
[default: True] [currently: True]
display.notebook_repr_html : boolean
When True, IPython notebook will use html representation for
pandas objects (if it is available).
[default: True] [currently: True]
display.pprint_nest_depth : int
Controls the number of nested levels to process when pretty-printing
[default: 3] [currently: 3]
display.precision : int
Floating point output precision (number of significant digits). This is
only a suggestion
[default: 6] [currently: 6]
display.show_dimensions : boolean or 'truncate'
Whether to print out dimensions at the end of DataFrame repr.
If 'truncate' is specified, only print out the dimensions if the
frame is truncated (e.g. not display all rows and/or columns)
[default: truncate] [currently: truncate]
display.unicode.ambiguous_as_wide : boolean
Whether to use the Unicode East Asian Width to calculate the display text
width.
Enabling this may affect to the performance (default: False)
[default: False] [currently: False]
display.unicode.east_asian_width : boolean
Whether to use the Unicode East Asian Width to calculate the display text
width.
Enabling this may affect to the performance (default: False)
[default: False] [currently: False]
display.width : int
Width of the display in characters. In case python/IPython is running in
a terminal this can be set to None and pandas will correctly auto-detect
the width.
Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
terminal and hence it is not possible to correctly detect the width.
[default: 80] [currently: 80]
io.excel.ods.reader : string
The default Excel reader engine for 'ods' files. Available options:
auto, odf.
[default: auto] [currently: auto]
io.excel.ods.writer : string
The default Excel writer engine for 'ods' files. Available options:
auto, odf.
[default: auto] [currently: auto]
io.excel.xls.reader : string
The default Excel reader engine for 'xls' files. Available options:
auto, xlrd.
[default: auto] [currently: auto]
io.excel.xls.writer : string
The default Excel writer engine for 'xls' files. Available options:
auto, xlwt.
[default: auto] [currently: auto]
io.excel.xlsb.reader : string
The default Excel reader engine for 'xlsb' files. Available options:
auto, pyxlsb.
[default: auto] [currently: auto]
io.excel.xlsm.reader : string
The default Excel reader engine for 'xlsm' files. Available options:
auto, xlrd, openpyxl.
[default: auto] [currently: auto]
io.excel.xlsm.writer : string
The default Excel writer engine for 'xlsm' files. Available options:
auto, openpyxl.
[default: auto] [currently: auto]
io.excel.xlsx.reader : string
The default Excel reader engine for 'xlsx' files. Available options:
auto, xlrd, openpyxl.
[default: auto] [currently: auto]
io.excel.xlsx.writer : string
The default Excel writer engine for 'xlsx' files. Available options:
auto, openpyxl, xlsxwriter.
[default: auto] [currently: auto]
io.hdf.default_format : format
default format writing format, if None, then
put will default to 'fixed' and append will default to 'table'
[default: None] [currently: None]
io.hdf.dropna_table : boolean
drop ALL nan rows when appending to a table
[default: False] [currently: False]
io.parquet.engine : string
The default parquet reader/writer engine. Available options:
'auto', 'pyarrow', 'fastparquet', the default is 'auto'
[default: auto] [currently: auto]
mode.chained_assignment : string
Raise an exception, warn, or no action if trying to use chained assignment,
The default is warn
[default: warn] [currently: warn]
mode.sim_interactive : boolean
Whether to simulate interactive mode for purposes of testing
[default: False] [currently: False]
mode.use_inf_as_na : boolean
True means treat None, NaN, INF, -INF as NA (old way),
False means None and NaN are null, but INF, -INF are not NA
(new way).
[default: False] [currently: False]
mode.use_inf_as_null : boolean
use_inf_as_null had been deprecated and will be removed in a future
version. Use `use_inf_as_na` instead.
[default: False] [currently: False]
(Deprecated, use `mode.use_inf_as_na` instead.)
plotting.backend : str
The plotting backend to use. The default value is "matplotlib", the
backend provided with pandas. Other backends can be specified by
providing the name of the module that implements the backend.
[default: matplotlib] [currently: matplotlib]
plotting.matplotlib.register_converters : bool or 'auto'.
Whether to register converters with matplotlib's units registry for
dates, times, datetimes, and Periods. Toggling to False will remove
the converters, restoring any converters that pandas overwrote.
[default: auto] [currently: auto]
To return specific information on a single Pandas option or setting, pass its name to the describe_option() function. For example, to get an explanation of what max_rows does, you would enter the command pd.describe_option('max_rows').
pd.describe_option('max_rows')
display.max_rows : int
If max_rows is exceeded, switch to truncate view. Depending on
`large_repr`, objects are either centrally truncated or printed as
a summary view. 'None' value means unlimited.
In case python/IPython is running in a terminal and `large_repr`
equals 'truncate' this can be set to 0 and pandas will auto-detect
the height of the terminal and print a truncated object which fits
the screen height. The IPython notebook, IPython qtconsole, or
IDLE do not run in a terminal and hence it is not possible to do
correct auto-detection.
[default: 60] [currently: 60]
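If you would rather search the options than read the whole listing, something like the following may help. It is only a sketch: describe_option() treats its argument as a regular expression, and passing _print_desc=False returns the text instead of printing it. The display.max_ pattern and the way the returned text is parsed are illustrative assumptions based on the layout shown above.

```python
import pandas as pd

# Ask for the descriptions as a string rather than printing them.
text = pd.describe_option('display.max_', _print_desc=False)

# Each entry starts with the option name at the beginning of a line,
# followed by indented description lines, so keep only the name lines.
names = [line.split()[0] for line in text.splitlines()
         if line.startswith('display.')]

print(names)
```

The resulting list of names can then be passed one at a time to get_option(), set_option(), or reset_option().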
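The Pandas get_option() function returns the currently assigned value for a given Pandas configuration setting, such as the maximum number of rows, maximum column width, or maximum number of columns displayed. For example, to find the maximum number of rows Pandas will show, you’d run pd.get_option('max_rows'). If you prefer to use dot notation, pd.options.display.max_rows will return the same value. Short names such as max_rows are matched as patterns, so they only work when they match exactly one option; if your Pandas version reports that the pattern matched multiple keys, use the full name, display.max_rows, instead.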
pd.get_option('max_colwidth')
50
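As a small illustration of the two ways of reading a value, the sketch below compares get_option() with dot notation using the full option name, and shows what happens when a name does not exist. The name display.not_a_real_option is deliberately made up for the demonstration.

```python
import pandas as pd

# The full option name avoids any ambiguity with similarly named options.
by_name = pd.get_option('display.max_colwidth')

# Dot notation reads the same underlying value.
by_attr = pd.options.display.max_colwidth
print(by_name, by_attr)

# An unknown name raises pandas' OptionError, which is a subclass of KeyError.
try:
    pd.get_option('display.not_a_real_option')  # hypothetical, invalid name
    found = True
except KeyError:
    found = False

print(found)
```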
The Pandas set_option() function changes the value currently stored for a setting or option. To use it, pass two arguments: the name of the option or setting, such as max_rows, and the value you want to assign. For example, to increase max_rows from the default value of 60 to 100, you’d run pd.set_option('max_rows', 100).
pd.set_option('max_rows', 100)
pd.set_option('max_colwidth', 1000)
pd.set_option('max_columns', 100)
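The following sketch shows the same kind of change made through set_option() and through dot notation, and what to expect when a value is not acceptable. The specific numbers and the string 'lots' are arbitrary choices for the example, and the original value is put back at the end so the demonstration leaves your session unchanged.

```python
import pandas as pd

# Remember the current value so it can be restored afterwards.
original = pd.get_option('display.max_rows')

# Set the option by its full name.
pd.set_option('display.max_rows', 100)
via_function = pd.get_option('display.max_rows')

# Assigning through dot notation changes the same option.
pd.options.display.max_rows = 200
via_attribute = pd.get_option('display.max_rows')

# Values are validated: a string is rejected for this integer option.
try:
    pd.set_option('display.max_rows', 'lots')
    rejected = False
except ValueError:
    rejected = True

# Put the original value back.
pd.set_option('display.max_rows', original)

print(via_function, via_attribute, rejected)
```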
The reset_option() function is used to reset a Pandas option or setting back to its default value. To call the function, you simply pass in the name of the option as the sole argument. For example, pd.reset_option('max_rows') will reset the maximum number of rows back to 60. Dot notation, such as pd.options.display.max_rows, can read or assign a value but has no reset equivalent, so use reset_option() whenever you want to restore a default.
Running pd.get_option('max_rows') first reveals that max_rows is currently set to 100, so we’ll reset it to the default value using pd.reset_option('max_rows'), and finally re-run pd.get_option('max_rows') to show that the value is now the default of 60.
pd.get_option('max_rows')
100
pd.reset_option('max_rows')
pd.get_option('max_rows')
60
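reset_option() also accepts a pattern, which can be handy when you have changed several related options. The sketch below resets everything whose name matches display.max_ in one call. The values 500 and 50 are arbitrary, and the defaults are read at runtime rather than assumed, since the default for display.max_columns can differ between a terminal and a notebook.

```python
import pandas as pd

# Start from the defaults so there is something to compare against.
pd.reset_option('display.max_rows')
pd.reset_option('display.max_columns')
default_rows = pd.get_option('display.max_rows')
default_cols = pd.get_option('display.max_columns')

# Change two related options.
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 50)

# One pattern resets every option whose name matches it.
pd.reset_option('display.max_')

rows_after = pd.get_option('display.max_rows')
cols_after = pd.get_option('display.max_columns')
print(rows_after, cols_after)
```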
The option_context() function is used less often. It is specifically designed to set a Pandas option or setting to one value temporarily and then restore the previous value automatically.
For example, perhaps you’ve got a single dataframe in your Jupyter notebook to which you want to apply a certain type of formatting. You’d use option_context() to apply the setting while displaying that single dataframe and then change the setting back straight after.
Unlike the other Pandas options functions, option_context() needs to be used as part of a with statement. Here’s a simple example showing how it can be used. To show it working, we’ll check the current value with get_option(), then temporarily change it with option_context(), then confirm it’s been changed back by calling get_option() again.
pd.get_option('max_rows')
10
with pd.option_context('max_rows', 30):
    print(pd.get_option('max_rows'))
30
pd.get_option('max_rows')
10
To temporarily change multiple Pandas settings, pass additional name and value pairs when calling the option_context() function.
with pd.option_context('max_rows', 2,
                       'max_columns', 3):
    print(pd.get_option('max_rows'))
    print(pd.get_option('max_columns'))
2
3
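One practical benefit of the with statement is that the previous values come back even if the code inside the block fails. The sketch below demonstrates this with a deliberately raised RuntimeError; the error and the option values are purely illustrative.

```python
import pandas as pd

before = pd.get_option('display.max_rows')

try:
    with pd.option_context('display.max_rows', 5, 'display.max_columns', 3):
        # Inside the block the temporary value is active.
        inside = pd.get_option('display.max_rows')
        raise RuntimeError('simulated failure')  # illustrative error only
except RuntimeError:
    pass

# The previous value is restored even though the block raised.
after = pd.get_option('display.max_rows')
print(before, inside, after)
```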
Now that we’ve been over the various ways to get, set, reset, and temporarily change Pandas settings and options, let’s take a look at some of the common settings you might want to change when dealing with large or complex Pandas dataframes.
If you find that long strings are being truncated with an ellipsis (…), or you can’t read a dataframe because too much data is being shown, you may want to increase or decrease the column width. To change the maximum column width, pass max_colwidth to set_option() with the desired column width in characters. If you want an unlimited column width, you can pass None.
pd.set_option('max_colwidth', 100)
pd.set_option('max_colwidth', None)
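To see the effect without changing your global settings, you could compare the text representation of a small dataframe under two widths. This is a sketch using a made-up, one-column dataframe containing an 80-character string; the column name and widths are arbitrary.

```python
import pandas as pd

# A made-up dataframe with one long string value.
long_text = 'x' * 80
df = pd.DataFrame({'note': [long_text]})

# With a narrow limit, the value is cut short and ends with an ellipsis.
with pd.option_context('display.max_colwidth', 20):
    narrow = repr(df)

# With no limit, the full value is shown.
with pd.option_context('display.max_colwidth', None):
    unlimited = repr(df)

print(narrow)
print(unlimited)
```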
When there are too many columns to display, Pandas will hide some of them. You can increase the maximum number of columns shown by passing max_columns to set_option(). Similarly, if you want to condense a dataframe and only show a selection of columns, you can set max_columns to a lower value.
pd.set_option('max_columns', 5)
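The sketch below shows the kind of truncation to expect, using a made-up dataframe with 30 columns named c0 to c29. The middle columns are replaced by an ellipsis in the displayed text, while the dataframe itself is unchanged.

```python
import pandas as pd

# A made-up, single-row dataframe with 30 columns.
df = pd.DataFrame([list(range(30))], columns=[f'c{i}' for i in range(30)])

# Limit the display to five columns for this block only.
with pd.option_context('display.max_columns', 5):
    text = repr(df)

print(text)

# Only the display is affected; all 30 columns are still present.
print(df.shape)
```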
By default, Pandas will only show 60 rows. When a dataframe has more rows than this, Pandas will show a truncated view comprising the top and bottom rows separated by an ellipsis.
To increase or decrease the maximum number of rows, pass max_rows to set_option() with the maximum number of rows you wish to display. To display all rows you can pass None, but be cautious of doing this when the dataframe is large, as rendering every row can be very slow and may make your notebook unresponsive.
pd.set_option('max_rows', 10)
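The display.min_rows option described in the listing earlier works alongside max_rows: once max_rows is exceeded, min_rows decides how many rows the truncated view contains. A minimal sketch, assuming a made-up dataframe of 100 rows and arbitrary limits of 20 and 6:

```python
import pandas as pd

# A made-up dataframe with 100 rows.
df = pd.DataFrame({'n': range(100)})

# 100 rows exceeds max_rows, so the view is truncated to min_rows rows.
with pd.option_context('display.max_rows', 20, 'display.min_rows', 6):
    text = repr(df)

print(text)
```

The printed view contains the first and last few rows, an ellipsis row, and a footer giving the full dimensions.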
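If you find that you have too many trailing digits on floating point numbers, you can reduce them with display.precision. For example, by running pd.set_option('display.precision', 2) Pandas will display numbers to two decimal places, rather than the usual 6. The full name is used here because recent Pandas versions also include a styler.format.precision option, which makes the short name precision ambiguous. For full control over number formatting, including scientific notation, the display.float_format option listed earlier accepts a formatting function.
pd.set_option('display.precision', 2)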
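It is worth remembering that these options only change how numbers are displayed, not the stored values. The sketch below uses made-up numbers to show display.precision, and also display.float_format from the listing above, which takes a callable; the '{:,.2f}' format string is just one possible choice.

```python
import pandas as pd

# Made-up values for illustration.
df = pd.DataFrame({'x': [3.14159265, 2.71828183]})

# Show two decimal places for this block only.
with pd.option_context('display.precision', 2):
    rounded_view = repr(df)

# float_format takes a callable that turns a float into a string.
big = pd.DataFrame({'x': [1234567.891]})
with pd.option_context('display.float_format', '{:,.2f}'.format):
    formatted_view = repr(big)

print(rounded_view)
print(formatted_view)

# The underlying data keeps its full precision.
print(df['x'][0])
```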
Matt Clarke, Friday, August 26, 2022