How to change Pandas dataframe settings and options

When loading data into a Pandas dataframe, you’ll often find that data is truncated, columns are replaced with an ellipsis, or that the float precision makes numbers harder to read.

Thankfully, Pandas includes a built-in options API to let you configure global settings that control the way data is displayed within dataframes. In this simple project we’ll cover the Pandas options and settings API so you can adjust your dataframes to get them working just as you want them.

The 5 Pandas options and settings functions

There are only 5 functions in the Pandas API for controlling options and settings. These are: get_option(), set_option(), reset_option(), describe_option(), and option_context().

Function	Usage
`describe_option()`	The Pandas `describe_option()` function is used to display a description of a given option, including its default and current values.
`get_option()`	The Pandas `get_option()` function is used to get the value of a single option. For example, if you want to find out the maximum number of rows, the `get_option()` function will return the value currently stored so you can see if it needs to be changed.
`set_option()`	The Pandas `set_option()` function is used to set the value of a single option. For example, you can use this function to change the maximum number of rows, the maximum column width, or maximum number of columns shown in your dataframe.
`reset_option()`	The Pandas `reset_option()` function is used to reset the value of a single option to the default value.
`option_context()`	The Pandas `option_context()` function is used to execute a code block with a set of options that revert to their prior settings after execution.
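As a quick orientation, here is a minimal sketch that exercises all five functions on a single option. The choice of `display.max_rows` and the values 25 and 5 are arbitrary assumptions made for illustration, and the full option name is used so the calls stay unambiguous.

```python
import pandas as pd

OPTION = "display.max_rows"  # full option name, chosen only for this sketch

# get_option(): read the current value
before = pd.get_option(OPTION)

# set_option(): change the value globally
pd.set_option(OPTION, 25)
after_set = pd.get_option(OPTION)

# option_context(): override the value only inside the with block
with pd.option_context(OPTION, 5):
    inside_context = pd.get_option(OPTION)
after_context = pd.get_option(OPTION)

# reset_option(): go back to the library default
pd.reset_option(OPTION)
after_reset = pd.get_option(OPTION)

# describe_option(): print the documentation for the option
pd.describe_option(OPTION)

print(before, after_set, inside_context, after_context, after_reset)
```

The value set with set_option() stays in place after the with block ends, and only reset_option() returns it to the default.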

Import Pandas and load some data

Next, we’ll examine each of the Pandas functions used for configuring Pandas dataframes by working with some real data. To get started, import the Pandas package and load a large dataset into a dataframe that includes a good mixture of Pandas dtypes.

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/telco.csv')
df.head()

	customerID	gender	Partner	Dependents	tenure	PhoneService	MultipleLines	InternetService	OnlineSecurity	...	DeviceProtection	TechSupport	StreamingTV	StreamingMovies	Contract	PaperlessBilling	PaymentMethod	MonthlyCharges	TotalCharges	Churn
0	7590-VHVEG	Female	Yes	No	1	No	No phone service	DSL	No	...	No	No	No	No	Month-to-month	Yes	Electronic check	29.85	29.85	No
1	5575-GNVDE	Male	No	No	34	Yes	No	DSL	Yes	...	Yes	No	No	No	One year	No	Mailed check	56.95	1889.5	No
2	3668-QPYBK	Male	No	No	2	Yes	No	DSL	Yes	...	No	No	No	No	Month-to-month	Yes	Mailed check	53.85	108.15	Yes
3	7795-CFOCW	Male	No	No	45	No	No phone service	DSL	Yes	...	Yes	Yes	No	No	One year	No	Bank transfer (automatic)	42.30	1840.75	No
4	9237-HQITU	Female	No	No	2	Yes	No	Fiber optic	No	...	No	No	No	No	Month-to-month	Yes	Electronic check	70.70	151.65	Yes

5 rows × 21 columns

Using describe_option() to view Pandas settings and options

Pandas includes functions to get, set, reset, and temporarily override any of its options and settings. However, to use any of these, you’ll first need to know which settings exist, which is where the describe_option() function comes in.

To return information on all of the available Pandas options and settings, run pd.describe_option() with no arguments. Each entry starts with the option’s full name, such as display.chop_threshold or display.encoding, and you can pass that name to the other Pandas functions to get, set, or reset the value.

pd.describe_option()

compute.use_bottleneck : bool
    Use the bottleneck library to accelerate if it is installed,
    the default is True
    Valid values: False,True
    [default: True] [currently: True]
compute.use_numba : bool
    Use the numba engine option for select operations if it is installed,
    the default is False
    Valid values: False,True
    [default: False] [currently: False]
compute.use_numexpr : bool
    Use the numexpr library to accelerate computation if it is installed,
    the default is True
    Valid values: False,True
    [default: True] [currently: True]
display.chop_threshold : float or None
    if set to a float value, all float values smaller then the given threshold
    will be displayed as exactly 0 by repr and friends.
    [default: None] [currently: None]
display.colheader_justify : 'left'/'right'
    Controls the justification of column headers. used by DataFrameFormatter.
    [default: right] [currently: right]
display.column_space No description available.
    [default: 12] [currently: 12]
display.date_dayfirst : boolean
    When True, prints and parses dates with the day first, eg 20/01/2005
    [default: False] [currently: False]
display.date_yearfirst : boolean
    When True, prints and parses dates with the year first, eg 2005/01/20
    [default: False] [currently: False]
display.encoding : str/unicode
    Defaults to the detected encoding of the console.
    Specifies the encoding to be used for strings returned by to_string,
    these are generally strings meant to be displayed on the console.
    [default: UTF-8] [currently: UTF-8]
display.expand_frame_repr : boolean
    Whether to print out the full DataFrame repr for wide DataFrames across
    multiple lines, `max_columns` is still respected, but the output will
    wrap-around across multiple "pages" if its width exceeds `display.width`.
    [default: True] [currently: True]
display.float_format : callable
    The callable should accept a floating point number and return
    a string with the desired format of the number. This is used
    in some places like SeriesFormatter.
    See formats.format.EngFormatter for an example.
    [default: None] [currently: None]
display.html.border : int
    A ``border=value`` attribute is inserted in the ``<table>`` tag
    for the DataFrame HTML repr.
    [default: 1] [currently: 1]
display.html.table_schema : boolean
    Whether to publish a Table Schema representation for frontends
    that support it.
    (default: False)
    [default: False] [currently: False]
display.html.use_mathjax : boolean
    When True, Jupyter notebook will process table contents using MathJax,
    rendering mathematical expressions enclosed by the dollar symbol.
    (default: True)
    [default: True] [currently: True]
display.large_repr : 'truncate'/'info'
    For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can
    show a truncated table (the default from 0.13), or switch to the view from
    df.info() (the behaviour in earlier versions of pandas).
    [default: truncate] [currently: truncate]
display.latex.escape : bool
    This specifies if the to_latex method of a Dataframe uses escapes special
    characters.
    Valid values: False,True
    [default: True] [currently: True]
display.latex.longtable :bool
    This specifies if the to_latex method of a Dataframe uses the longtable
    format.
    Valid values: False,True
    [default: False] [currently: False]
display.latex.multicolumn : bool
    This specifies if the to_latex method of a Dataframe uses multicolumns
    to pretty-print MultiIndex columns.
    Valid values: False,True
    [default: True] [currently: True]
display.latex.multicolumn_format : bool
    This specifies if the to_latex method of a Dataframe uses multicolumns
    to pretty-print MultiIndex columns.
    Valid values: False,True
    [default: l] [currently: l]
display.latex.multirow : bool
    This specifies if the to_latex method of a Dataframe uses multirows
    to pretty-print MultiIndex rows.
    Valid values: False,True
    [default: False] [currently: False]
display.latex.repr : boolean
    Whether to produce a latex DataFrame representation for jupyter
    environments that support it.
    (default: False)
    [default: False] [currently: False]
display.max_categories : int
    This sets the maximum number of categories pandas should output when
    printing out a `Categorical` or a Series of dtype "category".
    [default: 8] [currently: 8]
display.max_columns : int
    If max_cols is exceeded, switch to truncate view. Depending on
    `large_repr`, objects are either centrally truncated or printed as
    a summary view. 'None' value means unlimited.

    In case python/IPython is running in a terminal and `large_repr`
    equals 'truncate' this can be set to 0 and pandas will auto-detect
    the width of the terminal and print a truncated object which fits
    the screen width. The IPython notebook, IPython qtconsole, or IDLE
    do not run in a terminal and hence it is not possible to do
    correct auto-detection.
    [default: 20] [currently: 20]
display.max_colwidth : int or None
    The maximum width in characters of a column in the repr of
    a pandas data structure. When the column overflows, a "..."
    placeholder is embedded in the output. A 'None' value means unlimited.
    [default: 50] [currently: 50]
display.max_info_columns : int
    max_info_columns is used in DataFrame.info method to decide if
    per column information will be printed.
    [default: 100] [currently: 100]
display.max_info_rows : int or None
    df.info() will usually show null-counts for each column.
    For large frames this can be quite slow. max_info_rows and max_info_cols
    limit this null check only to frames with smaller dimensions than
    specified.
    [default: 1690785] [currently: 1690785]
display.max_rows : int
    If max_rows is exceeded, switch to truncate view. Depending on
    `large_repr`, objects are either centrally truncated or printed as
    a summary view. 'None' value means unlimited.

    In case python/IPython is running in a terminal and `large_repr`
    equals 'truncate' this can be set to 0 and pandas will auto-detect
    the height of the terminal and print a truncated object which fits
    the screen height. The IPython notebook, IPython qtconsole, or
    IDLE do not run in a terminal and hence it is not possible to do
    correct auto-detection.
    [default: 60] [currently: 60]
display.max_seq_items : int or None
    when pretty-printing a long sequence, no more then `max_seq_items`
    will be printed. If items are omitted, they will be denoted by the
    addition of "..." to the resulting string.

    If set to None, the number of items to be printed is unlimited.
    [default: 100] [currently: 100]
display.memory_usage : bool, string or None
    This specifies if the memory usage of a DataFrame should be displayed when
    df.info() is called. Valid values True,False,'deep'
    [default: True] [currently: True]
display.min_rows : int
    The numbers of rows to show in a truncated view (when `max_rows` is
    exceeded). Ignored when `max_rows` is set to None or 0. When set to
    None, follows the value of `max_rows`.
    [default: 10] [currently: 10]
display.multi_sparse : boolean
    "sparsify" MultiIndex display (don't display repeated
    elements in outer levels within groups)
    [default: True] [currently: True]
display.notebook_repr_html : boolean
    When True, IPython notebook will use html representation for
    pandas objects (if it is available).
    [default: True] [currently: True]
display.pprint_nest_depth : int
    Controls the number of nested levels to process when pretty-printing
    [default: 3] [currently: 3]
display.precision : int
    Floating point output precision (number of significant digits). This is
    only a suggestion
    [default: 6] [currently: 6]
display.show_dimensions : boolean or 'truncate'
    Whether to print out dimensions at the end of DataFrame repr.
    If 'truncate' is specified, only print out the dimensions if the
    frame is truncated (e.g. not display all rows and/or columns)
    [default: truncate] [currently: truncate]
display.unicode.ambiguous_as_wide : boolean
    Whether to use the Unicode East Asian Width to calculate the display text
    width.
    Enabling this may affect to the performance (default: False)
    [default: False] [currently: False]
display.unicode.east_asian_width : boolean
    Whether to use the Unicode East Asian Width to calculate the display text
    width.
    Enabling this may affect to the performance (default: False)
    [default: False] [currently: False]
display.width : int
    Width of the display in characters. In case python/IPython is running in
    a terminal this can be set to None and pandas will correctly auto-detect
    the width.
    Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
    terminal and hence it is not possible to correctly detect the width.
    [default: 80] [currently: 80]
io.excel.ods.reader : string
    The default Excel reader engine for 'ods' files. Available options:
    auto, odf.
    [default: auto] [currently: auto]
io.excel.ods.writer : string
    The default Excel writer engine for 'ods' files. Available options:
    auto, odf.
    [default: auto] [currently: auto]
io.excel.xls.reader : string
    The default Excel reader engine for 'xls' files. Available options:
    auto, xlrd.
    [default: auto] [currently: auto]
io.excel.xls.writer : string
    The default Excel writer engine for 'xls' files. Available options:
    auto, xlwt.
    [default: auto] [currently: auto]
io.excel.xlsb.reader : string
    The default Excel reader engine for 'xlsb' files. Available options:
    auto, pyxlsb.
    [default: auto] [currently: auto]
io.excel.xlsm.reader : string
    The default Excel reader engine for 'xlsm' files. Available options:
    auto, xlrd, openpyxl.
    [default: auto] [currently: auto]
io.excel.xlsm.writer : string
    The default Excel writer engine for 'xlsm' files. Available options:
    auto, openpyxl.
    [default: auto] [currently: auto]
io.excel.xlsx.reader : string
    The default Excel reader engine for 'xlsx' files. Available options:
    auto, xlrd, openpyxl.
    [default: auto] [currently: auto]
io.excel.xlsx.writer : string
    The default Excel writer engine for 'xlsx' files. Available options:
    auto, openpyxl, xlsxwriter.
    [default: auto] [currently: auto]
io.hdf.default_format : format
    default format writing format, if None, then
    put will default to 'fixed' and append will default to 'table'
    [default: None] [currently: None]
io.hdf.dropna_table : boolean
    drop ALL nan rows when appending to a table
    [default: False] [currently: False]
io.parquet.engine : string
    The default parquet reader/writer engine. Available options:
    'auto', 'pyarrow', 'fastparquet', the default is 'auto'
    [default: auto] [currently: auto]
mode.chained_assignment : string
    Raise an exception, warn, or no action if trying to use chained assignment,
    The default is warn
    [default: warn] [currently: warn]
mode.sim_interactive : boolean
    Whether to simulate interactive mode for purposes of testing
    [default: False] [currently: False]
mode.use_inf_as_na : boolean
    True means treat None, NaN, INF, -INF as NA (old way),
    False means None and NaN are null, but INF, -INF are not NA
    (new way).
    [default: False] [currently: False]
mode.use_inf_as_null : boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.
    [default: False] [currently: False]
    (Deprecated, use `mode.use_inf_as_na` instead.)
plotting.backend : str
    The plotting backend to use. The default value is "matplotlib", the
    backend provided with pandas. Other backends can be specified by
    providing the name of the module that implements the backend.
    [default: matplotlib] [currently: matplotlib]
plotting.matplotlib.register_converters : bool or 'auto'.
    Whether to register converters with matplotlib's units registry for
    dates, times, datetimes, and Periods. Toggling to False will remove
    the converters, restoring any converters that pandas overwrote.
    [default: auto] [currently: auto]

To return specific information on a single Pandas option or setting you can pass the option or setting name to the describe_option() function. For example, to get an explanation of what max_rows does, you would enter the command pd.describe_option('max_rows').

pd.describe_option('max_rows')

display.max_rows : int
    If max_rows is exceeded, switch to truncate view. Depending on
    `large_repr`, objects are either centrally truncated or printed as
    a summary view. 'None' value means unlimited.

    In case python/IPython is running in a terminal and `large_repr`
    equals 'truncate' this can be set to 0 and pandas will auto-detect
    the height of the terminal and print a truncated object which fits
    the screen height. The IPython notebook, IPython qtconsole, or
    IDLE do not run in a terminal and hence it is not possible to do
    correct auto-detection.
    [default: 60] [currently: 60]
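The argument to describe_option() is treated as a regular expression, so one call can describe several related options. The sketch below assumes the `_print_desc` parameter behaves as described in the function’s docstring (returning the text instead of printing it); the leading underscore suggests it is intended mainly for testing, so treat it as a convenience rather than a stable interface.

```python
import pandas as pd

# A regular expression selects several related options at once.
# _print_desc=False asks pandas to return the text instead of printing it.
text = pd.describe_option("display.max_(rows|columns)", _print_desc=False)

# Keep only the header line of each entry (lines that are not indented).
names = [
    line.split(" : ")[0]
    for line in text.splitlines()
    if line and not line.startswith(" ")
]
print(names)
```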

Using get_option() to get the value of a Pandas option

The Pandas get_option() function returns the currently assigned value for a given Pandas configuration setting, such as the maximum number of rows, maximum column width, or maximum number of columns displayed. For example, to find the maximum number of rows Pandas will show you’d run pd.get_option('display.max_rows'). A shorter pattern such as 'max_colwidth' also works, provided it matches exactly one option; if it matches several, Pandas raises an error, so the full name is the safest choice. If you prefer to use dot notation, pd.options.display.max_rows will return the same value.

pd.get_option('max_colwidth')
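To make the two access styles concrete, here is a small sketch comparing get_option() with dot notation, plus what happens when the name does not exist. The invalid option name is made up for the example.

```python
import pandas as pd

# Full option name: unambiguous regardless of which other options exist.
by_name = pd.get_option("display.max_colwidth")

# Attribute (dot) access reads the same underlying value.
by_attribute = pd.options.display.max_colwidth

print(by_name, by_attribute)

# An unknown name raises pandas' OptionError, which subclasses KeyError.
try:
    pd.get_option("display.not_a_real_option")  # hypothetical, invalid name
    lookup_failed = False
except KeyError as exc:
    lookup_failed = True
    print("Lookup failed:", exc)
```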

Using set_option() to set the value of a Pandas option

The Pandas set_option() function changes the value currently stored for an option or setting. To use it you need to pass two arguments: the name of the option or setting, such as display.max_rows, and the value you want to assign. For example, to increase max_rows from the default value of 60 to 100, you’d run pd.set_option('display.max_rows', 100).

pd.set_option('display.max_rows', 100)

pd.set_option('display.max_colwidth', 1000)

pd.set_option('display.max_columns', 100)
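The following sketch shows that set_option() and attribute assignment change the same global setting. The values 100 and 50 are arbitrary, and the original value is stored first so the example can put it back afterwards.

```python
import pandas as pd

# Remember the starting value so the sketch can tidy up after itself.
original = pd.get_option("display.max_columns")

# set_option() with a full option name.
pd.set_option("display.max_columns", 100)
via_function = pd.get_option("display.max_columns")

# Attribute assignment changes the same global setting.
pd.options.display.max_columns = 50
via_attribute = pd.get_option("display.max_columns")

# Put the original value back.
pd.set_option("display.max_columns", original)
print(via_function, via_attribute)
```

Because these settings are global, they affect every dataframe displayed afterwards in the same session, not just the one you are looking at.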

Using reset_option() to reset the value of a Pandas option

The reset_option() function is used to reset a Pandas option or setting back to its default value. To call the function you simply pass in the name of the option as the sole argument. For example, pd.reset_option('display.max_rows') will reset the maximum number of rows back to 60. Dot notation, such as pd.options.display.max_rows, can read or assign a value, but it has no reset of its own, so use reset_option() to restore the default.

Running pd.get_option('display.max_rows') first reveals that max_rows is currently set to 100, so we’ll reset it to the default value using pd.reset_option('display.max_rows'), and finally re-run pd.get_option('display.max_rows') to show that the value is now the default of 60.

pd.get_option('display.max_rows')

pd.reset_option('display.max_rows')

pd.get_option('display.max_rows')
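If you have changed several options, one simple approach is to loop over their names and reset each in turn. This is only a sketch; the two options and the values assigned to them are chosen for illustration.

```python
import pandas as pd

# Change two options away from their defaults.
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_colwidth", 1000)
changed = (
    pd.get_option("display.max_rows"),
    pd.get_option("display.max_colwidth"),
)

# Reset each option by its full name.
for name in ("display.max_rows", "display.max_colwidth"):
    pd.reset_option(name)

restored = (
    pd.get_option("display.max_rows"),
    pd.get_option("display.max_colwidth"),
)
print(changed, restored)
```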

Using option_context() to change then reset a Pandas setting value

The option_context() function is one that gets used rarely. It is specifically designed to set a Pandas option or setting to a new value inside a block of code and then automatically restore the previous value when the block ends.

For example, perhaps you’ve got a single dataframe in your Jupyter notebook to which you want to apply a certain type of formatting. You’d use option_context() to modify that single dataframe and then change the setting back straight after.

Unlike the other options functions, option_context() is a context manager, so it needs to be used as part of a with statement. Here’s a simple example showing how it can be used. To show it working we’ll read the current value with get_option(), then temporarily change it with option_context(), then confirm it’s been changed back by calling get_option() again.

pd.get_option('display.max_rows')

with pd.option_context('display.max_rows', 30):
    print(pd.get_option('display.max_rows'))

pd.get_option('display.max_rows')
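One practical benefit of the with statement is that the previous value comes back even if the code inside the block fails. The sketch below simulates a failure with a deliberately raised RuntimeError to show this.

```python
import pandas as pd

before = pd.get_option("display.max_rows")

try:
    with pd.option_context("display.max_rows", 30):
        inside = pd.get_option("display.max_rows")
        # Simulated failure: the context manager still restores the option.
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

after = pd.get_option("display.max_rows")
print(before, inside, after)
```

With set_option() you would have to remember to restore the value yourself, typically in a try/finally block.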

To temporarily change multiple Pandas settings, pass additional option name and value pairs in the same option_context() call.

with pd.option_context('display.max_rows', 2,
                       'display.max_columns', 3):
    print(pd.get_option('display.max_rows'))
    print(pd.get_option('display.max_columns'))

2
3
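A common use for several temporary settings is a small helper that renders one dataframe in full without touching the global configuration. The function name `to_full_text` and the synthetic data are hypothetical, introduced only for this sketch.

```python
import pandas as pd


def to_full_text(frame):
    """Return the repr of `frame` with row and column truncation switched off."""
    with pd.option_context("display.max_rows", None,
                           "display.max_columns", None):
        return repr(frame)


# Synthetic frame (hypothetical data): 100 rows x 25 columns.
df_demo = pd.DataFrame({f"col_{i:02d}": range(100) for i in range(25)})

rows_before = pd.get_option("display.max_rows")
full_text = to_full_text(df_demo)
rows_after = pd.get_option("display.max_rows")

print(full_text)
```

The global display.max_rows value is the same before and after the call, so other dataframes in the session keep their usual truncated display.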

Changing common Pandas display settings

Now we’ve been over the various ways to get, set, reset, and temporarily change Pandas settings and options, let’s take a look at some of the common settings you might want to change when dealing with large or complex Pandas dataframes.

1. Increase or decrease the maximum column width

If you find that long strings are being truncated with an ellipsis (...), or you can’t read a dataframe because too much data is being shown, you may want to increase or decrease the column width. To change the maximum column width you need to pass max_colwidth to set_option() along with the desired column width in characters. If you want an unlimited column width, you can pass None.

pd.set_option('max_colwidth', 100)

pd.set_option('max_colwidth', None)
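To see the effect on a real cell, the sketch below builds a one-row dataframe holding an 80-character string (made-up data) and renders it with a narrow and an unlimited column width. It uses option_context() so the global setting is left alone.

```python
import pandas as pd

long_text = "x" * 80  # longer than the default column width of 50
df_text = pd.DataFrame({"note": [long_text]})

# A narrow limit shortens the cell and ends it with an ellipsis.
with pd.option_context("display.max_colwidth", 20):
    narrow = repr(df_text)

# None removes the limit, so the whole string is shown.
with pd.option_context("display.max_colwidth", None):
    unlimited = repr(df_text)

print(narrow)
print(unlimited)
```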

2. Increase or decrease the maximum number of columns

When there are too many columns to display, Pandas will hide the middle ones and replace them with an ellipsis. You can increase the maximum number of columns shown by passing max_columns to set_option(). Similarly, if you want to condense a dataframe and only show a selection of columns, you can set max_columns to a lower value.

pd.set_option('display.max_columns', 5)
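Here is a small sketch of the column limit in action, using a synthetic one-row dataframe with 30 columns (the column names are invented for the example). The setting is applied temporarily with option_context().

```python
import pandas as pd

# Synthetic frame (hypothetical data): 1 row x 30 columns.
columns = [f"col_{i:02d}" for i in range(30)]
df_wide = pd.DataFrame([list(range(30))], columns=columns)

# With a limit of 5, pandas keeps columns from both ends and hides the middle.
with pd.option_context("display.max_columns", 5):
    truncated = repr(df_wide)

print(truncated)
```

The footer line reporting the dataframe’s dimensions tells you how many columns exist even though most are hidden.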

3. Increase or decrease the maximum number of rows

By default, Pandas will only show 60 rows. When a dataframe has more rows than this, Pandas will show a truncated view comprising the top and bottom rows separated by an ellipsis.

To increase or decrease the maximum number of rows you can pass max_rows to set_option() with the maximum number of rows you wish to display. To display all rows you can pass None, but be cautious of doing this when the dataframe is large, as rendering every row can be slow and may make your notebook unresponsive.

pd.set_option('display.max_rows', 10)
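The sketch below compares a truncated and a full rendering of a 100-row dataframe built from made-up data. Both settings are applied temporarily with option_context().

```python
import pandas as pd

# Synthetic frame (hypothetical data): 100 rows x 1 column.
df_long = pd.DataFrame({"value": range(100)})

# With a limit of 10, only the first and last few rows are shown.
with pd.option_context("display.max_rows", 10):
    truncated = repr(df_long)

# None removes the limit, so every row is rendered.
with pd.option_context("display.max_rows", None):
    full = repr(df_long)

print(truncated)
print(len(truncated.splitlines()), len(full.splitlines()))
```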

4. Changing the precision or number of trailing digits

If you have too many trailing digits on floating point numbers, you can reduce them with the display.precision option. For example, after running pd.set_option('display.precision', 2) Pandas will display floats to two decimal places, rather than the usual 6. This only changes how values are shown; the underlying data is not rounded. Scientific notation is governed mainly by display.float_format, which appears in the describe_option() output above.

pd.set_option('display.precision', 2)
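To confirm that precision affects only the display, this sketch renders a small dataframe of made-up floats at two decimal places and then reads a stored value back.

```python
import pandas as pd

df_float = pd.DataFrame({"value": [3.14159265, 2.71828183]})

# Only the rendered text is shortened to two decimal places.
with pd.option_context("display.precision", 2):
    shown = repr(df_float)

# The stored number keeps its full precision.
stored = df_float.loc[0, "value"]

print(shown)
print(stored)
```

If you need the data itself rounded, for example before exporting it, use a method such as DataFrame.round() instead.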

Matt Clarke, Friday, August 26, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.