Python Pandas to extract rows and columns from a DataFrame
Get number of rows and columns
godarda@gd:~$ python3
...
>>> import pandas as pd
>>> df=pd.read_csv("/home/godarda/gd.csv")
>>> df
account_no name city dob bank amount
0 25622348989 James Moore Phoenix 1985-05-26 Barclays 5000
1 25622348990 Donald Taylor Irvine 1990-08-20 Citi 7000
2 25622348991 Edward Parkar Irvine 1994-01-29 ICICI 95000
3 25622348992 Ryan Bakshi Mumbai 1982-01-14 Citi 50000
4 25622348993 Marie Peters Ribe 1967-01-05 DZBank 12250
5 25622348994 Aanya Delhi 1975-08-18 SBI 105000
6 25622348995 James Moore NaN 1978-06-26 Citi 97800
>>> df.shape
(7, 6)
>>> r,c=df.shape # r = number of rows, c = number of columns
>>> r
7
>>> c
6
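The same counts can also be read without unpacking df.shape. A minimal sketch, assuming a small hand-built DataFrame stands in for gd.csv (the three rows are copied from the output above; the variable names rows and cols are illustrative only):

```python
import pandas as pd

# Hypothetical stand-in for gd.csv: three rows copied from the output above.
df = pd.DataFrame({
    "account_no": [25622348989, 25622348990, 25622348991],
    "name": ["James Moore", "Donald Taylor", "Edward Parkar"],
    "amount": [5000, 7000, 95000],
})

rows = len(df)          # number of rows
cols = len(df.columns)  # number of columns
print(rows, cols)
print(df.shape == (rows, cols))
```

len(df) is handy when only the row count is needed, for example in a loop bound or a log message.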
Slicing on a DataFrame
>>> df[0:7]
account_no name city dob bank amount
0 25622348989 James Moore Phoenix 1985-05-26 Barclays 5000
1 25622348990 Donald Taylor Irvine 1990-08-20 Citi 7000
2 25622348991 Edward Parkar Irvine 1994-01-29 ICICI 95000
3 25622348992 Ryan Bakshi Mumbai 1982-01-14 Citi 50000
4 25622348993 Marie Peters Ribe 1967-01-05 DZBank 12250
5 25622348994 Aanya Delhi 1975-08-18 SBI 105000
6 25622348995 James Moore NaN 1978-06-26 Citi 97800
>>> df[4:7]
account_no name city dob bank amount
4 25622348993 Marie Peters Ribe 1967-01-05 DZBank 12250
5 25622348994 Aanya Delhi 1975-08-18 SBI 105000
6 25622348995 James Moore NaN 1978-06-26 Citi 97800
>>> df[0::2] # Retrieve every second row, starting at row 0
account_no name city dob bank amount
0 25622348989 James Moore Phoenix 1985-05-26 Barclays 5000
2 25622348991 Edward Parkar Irvine 1994-01-29 ICICI 95000
4 25622348993 Marie Peters Ribe 1967-01-05 DZBank 12250
6 25622348995 James Moore NaN 1978-06-26 Citi 97800
>>> df[0::3]
account_no name city dob bank amount
0 25622348989 James Moore Phoenix 1985-05-26 Barclays 5000
3 25622348992 Ryan Bakshi Mumbai 1982-01-14 Citi 50000
6 25622348995 James Moore NaN 1978-06-26 Citi 97800
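Plain df[start:stop:step] slices rows by position. The same selections can be written with iloc, which makes the positional intent explicit. A minimal sketch, assuming a hand-built DataFrame with the names and amounts shown above in place of gd.csv:

```python
import pandas as pd

# Hypothetical stand-in for gd.csv, reduced to two columns from the output above.
df = pd.DataFrame({
    "name": ["James Moore", "Donald Taylor", "Edward Parkar", "Ryan Bakshi",
             "Marie Peters", "Aanya", "James Moore"],
    "amount": [5000, 7000, 95000, 50000, 12250, 105000, 97800],
})

every_second = df.iloc[0::2]  # same rows as df[0::2]
last_three = df.iloc[-3:]     # same rows as df[4:7] on this 7-row frame

print(every_second.index.tolist())
print(last_three.index.tolist())
```

A negative start such as -3 counts from the end, so the slice keeps working if more rows are added later.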
Retrieving column(s) data
>>> df.columns
Index(['account_no', 'name', 'city', 'dob', 'bank', 'amount'], dtype='object')
>>> df.account_no
0 25622348989
1 25622348990
2 25622348991
3 25622348992
4 25622348993
5 25622348994
6 25622348995
Name: account_no, dtype: int64
>>> df.dob
0 1985-05-26
1 1990-08-20
2 1994-01-29
3 1982-01-14
4 1967-01-05
5 1975-08-18
6 1978-06-26
Name: dob, dtype: object
>>> df[['account_no','name']]
account_no name
0 25622348989 James Moore
1 25622348990 Donald Taylor
2 25622348991 Edward Parkar
3 25622348992 Ryan Bakshi
4 25622348993 Marie Peters
5 25622348994 Aanya
6 25622348995 James Moore
>>> df.amount
0 5000
1 7000
2 95000
3 50000
4 12250
5 105000
6 97800
Name: amount, dtype: int64
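Attribute access (df.amount) and bracket access (df['amount']) return the same Series. Bracket access is the safer habit, because attribute access does not work for column names that contain spaces or that collide with a DataFrame method such as count. A minimal sketch, assuming a hand-built DataFrame in place of gd.csv:

```python
import pandas as pd

# Hypothetical stand-in for gd.csv: three rows copied from the output above.
df = pd.DataFrame({
    "account_no": [25622348989, 25622348990, 25622348991],
    "name": ["James Moore", "Donald Taylor", "Edward Parkar"],
    "amount": [5000, 7000, 95000],
})

by_attr = df.amount                   # attribute access
by_key = df["amount"]                 # bracket access, one label -> Series
subset = df[["account_no", "amount"]] # list of labels -> DataFrame

print(by_attr.equals(by_key))
print(type(by_key).__name__, type(subset).__name__)
```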
Retrieving data using a condition (boolean indexing)
>>> df['amount'].min()
5000
>>> df['amount'].max()
105000
>>> df[df.amount>=50000]
account_no name city dob bank amount
2 25622348991 Edward Parkar Irvine 1994-01-29 ICICI 95000
3 25622348992 Ryan Bakshi Mumbai 1982-01-14 Citi 50000
5 25622348994 Aanya Delhi 1975-08-18 SBI 105000
6 25622348995 James Moore NaN 1978-06-26 Citi 97800
>>> df[['account_no','name','amount']][df.amount>=50000]
account_no name amount
2 25622348991 Edward Parkar 95000
3 25622348992 Ryan Bakshi 50000
5 25622348994 Aanya 105000
6 25622348995 James Moore 97800
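A chained selection like the last command can also be written as a single loc call, which filters rows and picks columns in one step, or with DataFrame.query, which takes the condition as a string. A minimal sketch, assuming a hand-built DataFrame with the values shown above in place of gd.csv:

```python
import pandas as pd

# Hypothetical stand-in for gd.csv, using values from the output above.
df = pd.DataFrame({
    "account_no": [25622348989, 25622348990, 25622348991, 25622348992,
                   25622348993, 25622348994, 25622348995],
    "name": ["James Moore", "Donald Taylor", "Edward Parkar", "Ryan Bakshi",
             "Marie Peters", "Aanya", "James Moore"],
    "amount": [5000, 7000, 95000, 50000, 12250, 105000, 97800],
})

cols = ["account_no", "name", "amount"]

# Row condition and column list in a single .loc call.
high = df.loc[df["amount"] >= 50000, cols]

# The same condition expressed as a query string.
same = df.query("amount >= 50000")[cols]

print(high.index.tolist())
print(high.equals(same))
```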
Getting top and bottom data
>>> df.head() # head() will retrieve top 5 rows
account_no name city dob bank amount
0 25622348989 James Moore Phoenix 1985-05-26 Barclays 5000
1 25622348990 Donald Taylor Irvine 1990-08-20 Citi 7000
2 25622348991 Edward Parkar Irvine 1994-01-29 ICICI 95000
3 25622348992 Ryan Bakshi Mumbai 1982-01-14 Citi 50000
4 25622348993 Marie Peters Ribe 1967-01-05 DZBank 12250
>>> df.tail() # tail() will retrieve bottom 5 rows
account_no name city dob bank amount
2 25622348991 Edward Parkar Irvine 1994-01-29 ICICI 95000
3 25622348992 Ryan Bakshi Mumbai 1982-01-14 Citi 50000
4 25622348993 Marie Peters Ribe 1967-01-05 DZBank 12250
5 25622348994 Aanya Delhi 1975-08-18 SBI 105000
6 25622348995 James Moore NaN 1978-06-26 Citi 97800
>>> df.head(3)
account_no name city dob bank amount
0 25622348989 James Moore Phoenix 1985-05-26 Barclays 5000
1 25622348990 Donald Taylor Irvine 1990-08-20 Citi 7000
2 25622348991 Edward Parkar Irvine 1994-01-29 ICICI 95000
>>> df.tail(3)
account_no name city dob bank amount
4 25622348993 Marie Peters Ribe 1967-01-05 DZBank 12250
5 25622348994 Aanya Delhi 1975-08-18 SBI 105000
6 25622348995 James Moore NaN 1978-06-26 Citi 97800
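head() and tail() select by position, not by value. When "top" means the largest or smallest values in a column, nlargest() and nsmallest() may be a better fit. A minimal sketch, assuming a hand-built DataFrame with the names and amounts shown above in place of gd.csv:

```python
import pandas as pd

# Hypothetical stand-in for gd.csv, reduced to two columns from the output above.
df = pd.DataFrame({
    "name": ["James Moore", "Donald Taylor", "Edward Parkar", "Ryan Bakshi",
             "Marie Peters", "Aanya", "James Moore"],
    "amount": [5000, 7000, 95000, 50000, 12250, 105000, 97800],
})

first_three = df.head(3)              # first three rows by position
top_three = df.nlargest(3, "amount")  # three rows with the largest amount
low_two = df.nsmallest(2, "amount")   # two rows with the smallest amount

print(first_three.index.tolist())
print(top_three["amount"].tolist())
print(low_two["amount"].tolist())
```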
Statistical description of a DataFrame
>>> df.describe()
account_no amount
count 7.000000e+00 7.000000
mean 2.562235e+10 53150.000000
std 2.160247e+00 45761.055131
min 2.562235e+10 5000.000000
25% 2.562235e+10 9625.000000
50% 2.562235e+10 50000.000000
75% 2.562235e+10 96400.000000
max 2.562235e+10 105000.000000
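describe() summarizes every numeric column, so account_no is included even though statistics on an identifier carry little meaning. Selecting the column first restricts the summary to the values of interest. A minimal sketch, assuming the amounts shown above are entered by hand in place of gd.csv:

```python
import pandas as pd

# Hypothetical stand-in for gd.csv: only the amount column from the output above.
df = pd.DataFrame({
    "amount": [5000, 7000, 95000, 50000, 12250, 105000, 97800],
})

# describe() on a single column returns a Series indexed by statistic name.
summary = df["amount"].describe()

print(summary)
print(summary["mean"], summary["50%"])
```

Individual statistics can be read from the result by label, such as summary["mean"] or summary["75%"].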