Dealing with NaN values in Pandas

22 September 2017

In my continued[1] series on Pandas, today I’m going to write about how to deal with NaN values.[2] NaN values are entries in a DataFrame that lack data; perhaps the column is not applicable, the data is messy, or the data is simply incomplete. Many models do not do well with NaN data, so dealing with these rows is a critical part of the data science process. Below are four examples of how to drop NaN values from a Pandas DataFrame.

# Each call returns a new DataFrame; df itself is left unchanged.
df.dropna()            # drops every row that has at least one NaN value
df.dropna(how='all')   # drops a row only if ALL of its columns are NaN
df.dropna(thresh=2)    # drops a row unless it has at least two non-NaN values
df.dropna(subset=[1])  # drops a row only if the column labeled 1 is NaN

Depending on exactly how you want to deal with NaN values in Pandas, each of these results could be the right answer: sometimes it’s worth dropping a row if it has any NaN value, sometimes it’s worth dropping a row only if all of its columns are NaN, and so on. Being able to slice and dice data in whatever way you’d like is truly a superpower of Python, and of Pandas specifically.
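To make the difference between the four calls concrete, here is a minimal sketch on a toy DataFrame. The data is made up for illustration (it is not from the Stack Overflow post), and I'm assuming default integer column labels 0, 1, 2 so that `subset=[1]` refers to the middle column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; columns get the default integer labels 0, 1, 2.
df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0],           # row 0: no NaN at all
        [4.0, np.nan, 6.0],        # row 1: NaN only in column 1
        [np.nan, 8.0, np.nan],     # row 2: a single non-NaN value
        [np.nan, np.nan, np.nan],  # row 3: every column is NaN
    ]
)

any_nan = df.dropna()               # keep rows with no NaN
all_nan = df.dropna(how="all")      # drop rows where every column is NaN
at_least_two = df.dropna(thresh=2)  # keep rows with two or more non-NaN values
column_one = df.dropna(subset=[1])  # drop rows where column 1 is NaN

print(list(any_nan.index))       # [0]
print(list(all_nan.index))       # [0, 1, 2]
print(list(at_least_two.index))  # [0, 1]
print(list(column_one.index))    # [0, 2]
print(len(df))                   # 4, because dropna() does not modify df
```

Which of these is "right" depends on the data: row 1 survives three of the four calls, while row 3 survives none of them.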

  1. yet unanticipated

  2. Summarized from this Stack Overflow post.
