Pandas: Can you rewrite this line to use df.loc?

I have been working through Reindert-Jan Ekker's excellent Pluralsight course Pandas Playbook: Manipulating Data. Right at the end of Demo: Detecting and Inspecting Missing Values, part of Module 5, Cleaning Data, he asks the following question:

# Can you rewrite this line to use df.loc?

This code checks that the MIN_TEMP_GROUND column only contains data in every 6th row. It drops every 6th row; the remaining rows should then all be null, which is confirmed when the code executes and returns True.

One problem with the existing solution is that it uses chaining to obtain the answer, and it might be more efficient to use loc instead. After kicking this around for a couple of hours and beginning to spin my wheels, I asked a question on Stack Overflow. The answer used loc in conjunction with the % (modulo) operator to identify the rows of interest and check that they were null.

df.loc[(df.index % 6) != 5, 'MIN_TEMP_GROUND'].isnull().all()
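To see the two approaches side by side, here is a minimal sketch on made-up data. The 18-row DataFrame and its values are my own assumption, not the course dataset, and the chained line is only one plausible form of a drop-then-select check, not necessarily the line used in the course.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption, not the course dataset): 18 rows with a
# default RangeIndex, where only positions 5, 11 and 17 hold a value.
values = np.full(18, np.nan)
values[5::6] = [1.5, 0.2, -0.7]
df = pd.DataFrame({"MIN_TEMP_GROUND": values})

# Hypothetical chained form: drop every 6th row, then select the column.
chained_result = df.drop(df.index[5::6])["MIN_TEMP_GROUND"].isnull().all()

# Single .loc call: a boolean row mask and a column label in one indexing step.
mask = (df.index % 6) != 5
loc_result = df.loc[mask, "MIN_TEMP_GROUND"].isnull().all()

print(chained_result, loc_result)
```

Both checks should agree. Note that the modulo mask relies on the index being the default 0, 1, 2, ... integers; with any other index, a positional mask such as np.arange(len(df)) % 6 != 5 would be the safer choice.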


Thanks to Reindert-Jan Ekker, whose Pluralsight catalogue of courses I can recommend.

Thanks to Henry Ecker for generously answering my Stack Overflow question.
