How to use Pandas head() and tail() to get the first and last rows

Picture by Nacho Canepa, Pexels.

One of the first things you’ll do whenever you load data into a Pandas dataframe is view it to check that it’s formatted correctly and see what you’re dealing with. It’s an important step: a commonly quoted rule of thumb is that around 80% of a data scientist’s time goes, unfortunately, on cleaning and reformatting data before the more interesting work can start.

Pandas includes a few very useful tools for viewing and checking data in dataframes. In this simple tutorial we’ll be going over the head() function used for showing the first rows, the tail() function for showing the last rows, the sample() function for showing random rows, and the T attribute, which transposes the dataframe to flip its orientation.

Import Pandas and load a dataframe

To get started, open a Jupyter notebook, import Pandas and NumPy using the conventional import pandas as pd and import numpy as np aliases, and create a dummy dataset containing some random data. The dataframe’s shape attribute returns a tuple indicating how many rows and columns are present. Because shape is an attribute rather than a method, it is written without parentheses.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))
df.shape

(1000, 4)
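
Because the dummy data is random, the numbers you see will differ on every run. If you want repeatable values, a minimal sketch is to build the same dataframe from a seeded NumPy generator. The seed value 42 and the variable names here are arbitrary choices for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the random values are identical on every run
# (42 is an arbitrary seed chosen for this sketch).
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.standard_normal((1000, 4)), columns=list("ABCD"))

# shape is an attribute, so there are no parentheses after it
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# len() gives the same counts for rows and columns individually
print(len(df), len(df.columns))
```

Unpacking the tuple is handy when you only need one of the two numbers later in a notebook.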

Using head() to view the first rows of a DataFrame

The Pandas head() function returns the first rows of a dataframe. By default it returns the first 5 rows, so calling head() without any arguments gives you 5 rows back. This default is built into the function itself (its n parameter defaults to 5) and is not affected by display settings changed with set_option().

df.head()

	A	B	C	D
0	-1.695955	0.151810	-1.304380	1.117109
1	-0.483403	0.229203	-0.490425	0.728589
2	-0.091534	-1.057842	0.325895	0.769804
3	0.962251	0.885115	0.078876	-0.723674
4	0.303216	-0.204397	0.150116	0.367043

To return a specific number of rows, such as just the first row, you can pass an integer value to the head() function, so df.head(1) will return the first row and df.head(10) will return the first 10 rows.

df.head(1)

	A	B	C	D
0	-1.695955	0.15181	-1.30438	1.117109
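
To make the behaviour of the n argument concrete, here is a small sketch that checks how many rows come back in a few cases, including asking for more rows than the dataframe holds. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 4), columns=list("ABCD"))

first_five = df.head()     # no argument: n defaults to 5
first_ten = df.head(10)    # explicit n

# Asking for more rows than exist simply returns every row
small = df.head(3)
everything = small.head(50)

# head(n) selects the same rows as positional slicing with iloc
same_as_slice = df.head(10).equals(df.iloc[:10])

print(len(first_five), len(first_ten), len(everything), same_as_slice)
```

The comparison with iloc is a useful mental model: head(n) is a convenient shorthand for taking the first n rows by position.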

Using set_option() to prevent truncation

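When you import Pandas it sets a number of default display options, such as display.max_rows, which defines the maximum number of rows displayed before the output is truncated. You can find its current value by calling pd.get_option('display.max_rows'). By default, this is set to 60 rows. You may see the short name 'max_rows' used elsewhere, but Pandas resolves short names by pattern matching, and in recent versions more than one option name can match, so the full name is the safer choice.

pd.get_option('display.max_rows')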

If you print the whole df dataframe, or call head() with a value that exceeds the max_rows setting, e.g. df.head(100), Pandas displays a truncated view in which the first five and last five rows are shown, with a row of ellipses in the middle to indicate that rows have been hidden. The data itself is untouched; only the display is shortened.

df.head(100)

	A	B	C	D
0	0.164807	-0.286455	1.340928	0.115890
1	-1.060355	-0.644209	-1.364114	-2.747539
2	0.464657	0.478078	0.622145	0.294728
3	0.753968	-0.275934	-0.605848	1.109735
4	-0.661911	0.092234	0.951647	-1.525059
...	...	...	...	...
95	-0.653926	1.001738	-0.118835	-0.291664
96	-1.303779	-0.465464	-1.025301	-1.536585
97	-0.598785	-0.384292	1.649467	0.543135
98	0.389788	0.482393	-0.544763	-1.138746
99	-1.838366	-0.515860	-0.615266	-0.272381

100 rows × 4 columns

If you want to override the default value you can use the Pandas set_option() function to set display.max_rows to a higher number, after which df.head(100) will show all 100 rows. This is very useful when you just need to scan the data by eye to check that it looks right. The setting stays in effect for the rest of the session unless you change it back.

pd.set_option('display.max_rows', 100)
df.head(100)
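
If you only want the larger limit for one cell rather than the whole session, one option is the pd.option_context() context manager, which restores the previous value when the block ends. This is a sketch of an alternative to the approach above rather than a replacement for it, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 4), columns=list("ABCD"))
head_100 = df.head(100)

# Remember whatever the limit is right now (60 unless it was changed)
default_max_rows = pd.get_option("display.max_rows")

# Raise the limit only inside this block
with pd.option_context("display.max_rows", 100):
    inside = pd.get_option("display.max_rows")
    full_repr = repr(head_100)  # text rendering with the raised limit
    print(head_100)

# Outside the block the previous setting is back in place
restored = pd.get_option("display.max_rows")
print(inside, restored)
```

In a notebook you would typically call display(head_100) inside the with block instead of print(), so the table is rendered as HTML.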

Using tail() to view the last rows of a DataFrame

The Pandas tail() function works just like the head() function but instead shows the last rows. Calling the function with no arguments will return the last five rows in the dataframe.

df.tail()

	A	B	C	D
995	-1.781818	0.417116	-1.995486	0.316706
996	-0.738063	1.208763	1.200226	0.143066
997	0.758632	-0.186649	1.618236	0.711830
998	0.256206	-1.166524	0.709279	0.610565
999	0.681827	0.873835	1.829247	-0.641025

As with head(), you can also pass an integer value to the tail() function to return a specific number of rows from the bottom of the dataframe, so df.tail(1) will return only the last row.

df.tail(1)

	A	B	C	D
999	0.681827	0.873835	1.829247	-0.641025
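
As a quick check of how tail() behaves, the sketch below compares it with positional slicing and also combines head() and tail() into a single preview of both ends of the data. Using pd.concat() for the preview is just one way to do this, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 4), columns=list("ABCD"))

last_five = df.tail()    # no argument: n defaults to 5
last_row = df.tail(1)    # just the final row

# tail(n) selects the same rows as positional slicing from the end
same_as_slice = df.tail(10).equals(df.iloc[-10:])

# Stack the first and last three rows to eyeball both ends at once
preview = pd.concat([df.head(3), df.tail(3)])

print(list(last_five.index))
print(list(preview.index))
```

Note that the original index labels are kept, which is why the last rows of a 1,000-row dataframe are labelled 995 to 999.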

Using transpose to flip the orientation of the data

The T attribute of a dataframe (a shorthand for the transpose() method) flips the orientation of the dataframe, so the columns become rows and the rows become columns. Because T is an attribute, it is written without parentheses. Transposing is an extremely useful technique for visually comparing or checking data, especially on dataframes that are wide due to the presence of lots of columns or long column values.

df.head(3).T

	0	1	2
A	-1.695955	-0.483403	-0.091534
B	0.151810	0.229203	-1.057842
C	-1.304380	-0.490425	0.325895
D	1.117109	0.728589	0.769804
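
Transposing pays off most on wide data. The sketch below uses a made-up dataframe with 30 columns (the col_0 to col_29 names are hypothetical) to show that the column names become the index after transposing, so a handful of rows can be read top to bottom instead of scrolling sideways.

```python
import numpy as np
import pandas as pd

# Hypothetical wide dataset: 3 rows and 30 columns named col_0 ... col_29
columns = [f"col_{i}" for i in range(30)]
wide = pd.DataFrame(np.random.randn(3, 30), columns=columns)

# After transposing, each original column is a row
flipped = wide.head(3).T
print(wide.shape, flipped.shape)

# T is shorthand for the transpose() method
same = wide.T.equals(wide.transpose())
print(same)
```

Combining head() with T keeps the transposed view narrow, since each original row becomes one column.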

Using negative values in head() and tail()

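More rarely, you might also see a negative value being passed to the head() or tail() functions. A negative n means "everything except n rows at the opposite end": df.head(-10) returns all rows apart from the last 10, while df.tail(-10) returns all rows apart from the first 10. In the examples below, df.head(-10) returns rows 0 to 989, and df.tail(-990) drops the first 990 rows, leaving rows 990 to 999.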

df.head(-10)

	A	B	C	D
0	0.164807	-0.286455	1.340928	0.115890
1	-1.060355	-0.644209	-1.364114	-2.747539
2	0.464657	0.478078	0.622145	0.294728
3	0.753968	-0.275934	-0.605848	1.109735
4	-0.661911	0.092234	0.951647	-1.525059
...	...	...	...	...
985	0.505878	0.641314	0.848925	2.262935
986	0.148572	-0.984472	1.963678	-0.302820
987	0.155646	0.799404	-0.867468	0.233681
988	1.879391	-0.530778	-0.906801	-1.321481
989	0.333411	-0.520215	0.180943	-0.336810

990 rows × 4 columns

df.tail(-990)

	A	B	C	D
990	-0.064963	0.217709	-0.767372	0.363326
991	-1.285855	-1.214493	0.542552	-0.511454
992	0.500009	0.864383	-1.350805	0.192343
993	1.150621	-1.119834	0.054419	-1.936994
994	-0.182967	0.872534	0.841756	-1.004139
995	-0.842824	-0.120532	-0.190949	-0.652673
996	-1.805779	0.398528	-1.638430	-1.060032
997	1.519081	-0.947822	-1.514677	0.031164
998	-0.905574	0.761248	0.219420	-0.913892
999	1.812816	-0.031498	-0.910258	0.607475
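
Negative arguments are easy to get the wrong way round, so it can help to verify them against iloc slices. This short sketch checks that a negative n drops rows from the opposite end of the dataframe; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 4), columns=list("ABCD"))

# head(-10): everything except the LAST 10 rows
all_but_last_10 = df.head(-10)
head_matches = all_but_last_10.equals(df.iloc[:-10])

# tail(-10): everything except the FIRST 10 rows
all_but_first_10 = df.tail(-10)
tail_matches = all_but_first_10.equals(df.iloc[10:])

print(all_but_last_10.index[0], all_but_last_10.index[-1], head_matches)
print(all_but_first_10.index[0], all_but_first_10.index[-1], tail_matches)
```

If the negative form feels hard to read, the equivalent iloc slice is often clearer to anyone reviewing the code later.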

Using sample() to return a random sample of rows

If you want to return a random sample of rows, you can use the Pandas sample() function. Like head() and tail(), sample() takes an optional n parameter, which specifies the number of rows to return. If you don’t specify n, it will return a single row. So, df.sample() returns a single random row, and df.sample(3) returns 3 random rows.

df.sample()

	A	B	C	D
259	-1.745995	0.407777	0.596915	-0.540192

df.sample(3)

	A	B	C	D
691	1.004318	1.131174	0.046987	-0.391565
863	0.126247	0.584434	-0.274276	-0.386791
834	0.126157	0.636516	0.694208	-0.555351
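
Since sample() picks different rows on each call, it can be useful to fix the random_state argument when you want the same rows back every time, for example when sharing a notebook. The sketch below also shows the frac argument, which samples a fraction of the rows instead of a fixed count. The seed value 0 and the variable names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 4), columns=list("ABCD"))

one_row = df.sample()                      # n defaults to 1

# The same random_state gives the same rows on every call
three_rows = df.sample(3, random_state=0)
again = df.sample(3, random_state=0)
repeatable = three_rows.equals(again)

# frac samples a proportion of the rows: 10% of 1000 is 100 rows
tenth = df.sample(frac=0.1, random_state=0)

print(len(one_row), len(three_rows), len(tenth), repeatable)
```

By default sample() draws without replacement, so no row appears twice in the result.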

Matt Clarke, Saturday, November 26, 2022

About the author: Matt Clarke is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.