Simple Pandas Functions I wish I knew earlier

January 15, 2021

When slicing a pandas DataFrame, we have position-based selection, data.iloc[row position, column position], and label-based selection, data.loc[row label, column label]. Tutorials abound for both; however, when I have a large dataset with a numeric or time-series index and labeled columns, more often than not I simply want to select a row by its integer position and a column by its label. This simple selection eluded me for much too long:

data.iloc[0].column_name
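Here is a small sketch of the same idea on a time-indexed frame. The column names and dates are made up for illustration. Attribute access such as .temperature only works when the label is a valid Python identifier that doesn't clash with an existing Series attribute, so bracket access is the safer general form. Combining iloc with columns.get_loc does the whole selection in a single indexing call.

```python
import pandas as pd

# Hypothetical daily time series with labelled columns
data = pd.DataFrame(
    {"temperature": [21.5, 22.0, 19.8], "humidity": [0.40, 0.45, 0.50]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)

# Row by position, column by label (attribute access)
first_temp = data.iloc[0].temperature

# Same selection with bracket access, which also works for labels like "max temp"
first_temp_brackets = data.iloc[0]["temperature"]

# One positional lookup: translate the column label to its integer position first
col_pos = data.columns.get_loc("temperature")
first_temp_single = data.iloc[0, col_pos]

# Or translate the row position to its label and stay label-based
first_temp_loc = data.loc[data.index[0], "temperature"]

print(first_temp, first_temp_brackets, first_temp_single, first_temp_loc)
```

The chained form data.iloc[0].column_name is fine for reading. For assigning a value, the single-call forms are preferable, because data.iloc[0] may hand back a copy of the row rather than a view into the DataFrame.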

By default, when you create a new DataFrame from a list of arrays, pandas treats each array as a row, so the arrays are stacked vertically (i.e. along axis=0). It boggles me that stacking arrays horizontally, so that each array becomes a column, isn't spelled out more clearly in the documentation:

import numpy as np
import pandas as pd

data_one = np.array([1, 2, 3])
data_two = np.array([4, 5, 6])

# Default: each array becomes a row
pd.DataFrame([data_one, data_two], columns=['col_1', 'col_2', 'col_3'])
#    col_1  col_2  col_3
# 0      1      2      3
# 1      4      5      6

# np.column_stack turns each 1-D array into a column
df = pd.DataFrame(np.column_stack((data_one, data_two)), columns=['col_1', 'col_2'])
#    col_1  col_2
# 0      1      4
# 1      2      5
# 2      3      6
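There are two other ways to get the same column-wise layout, sketched here with the same data_one and data_two arrays. You can pass a dict that maps column names to arrays, where each value becomes a column. You can also transpose the row-stacked array with .T. The dict form also keeps each column's own dtype, whereas np.column_stack first builds a single 2-D array with one shared dtype. The "ints"/"floats" frame at the end is a made-up example of that.

```python
import numpy as np
import pandas as pd

data_one = np.array([1, 2, 3])
data_two = np.array([4, 5, 6])

# Reference result from np.column_stack
stacked = pd.DataFrame(np.column_stack((data_one, data_two)), columns=["col_1", "col_2"])

# Dict of name -> array: each value becomes a column
from_dict = pd.DataFrame({"col_1": data_one, "col_2": data_two})

# Stack as rows, then transpose so each array becomes a column
transposed = pd.DataFrame(np.array([data_one, data_two]).T, columns=["col_1", "col_2"])

# Mixed dtypes: the dict constructor keeps the int and float columns separate
mixed = pd.DataFrame({"ints": data_one, "floats": data_two / 2})
print(mixed.dtypes)
```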

And finally, let's avoid some "SettingWithCopyWarning" warnings. When adding new columns to a DataFrame, use the assign method, which returns a new DataFrame instead of modifying an existing one in place:

import pandas as pd

df = pd.DataFrame([1, 2, 3], columns=['col_1'])
new_column = [4, 5, 6]
# assign returns a new DataFrame that includes the extra column
df = df.assign(col_2=new_column)
#    col_1  col_2
# 0      1      4
# 1      2      5
# 2      3      6
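Here is a sketch of where assign helps most, using a filtered subset. The col_1 > 1 filter is made up for illustration. Writing subset['col_2'] = ... directly on a boolean-filtered slice is the classic trigger for SettingWithCopyWarning, because pandas cannot tell whether you meant to modify the original frame. assign never modifies anything in place. It returns a new DataFrame and accepts callables, so you can chain it right after the filter.

```python
import pandas as pd

df = pd.DataFrame({"col_1": [1, 2, 3]})

# Filter, then add a column computed from the filtered rows.
# The lambda receives the intermediate (filtered) DataFrame.
subset = df[df["col_1"] > 1].assign(col_2=lambda d: d["col_1"] * 10)

# assign returned a new object; the original df still has only col_1
print(subset)
print(df.columns.tolist())

# If you really want an independent subset to modify in place,
# taking an explicit .copy() first also avoids the warning
subset_copy = df[df["col_1"] > 1].copy()
subset_copy["col_2"] = subset_copy["col_1"] * 10
```

Note that pandas' Copy-on-Write mode changes these copy/view semantics, so the exact warnings you see depend on your pandas version.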


