I have been working through Reindert-Jan Ekker's excellent Pluralsight course Pandas Playbook: Manipulating Data. Right at the end of Demo: Detecting and Inspecting Missing Values, part of Module 5, Cleaning Data, he asks the following question:
# Can you rewrite this line to use df.loc?
df['MIN_TEMP_GROUND'].drop(every_6th_row).isnull().all()
True
This code checks that the MIN_TEMP_GROUND column only contains data in every 6th row. It drops every 6th row, so the remaining rows should all be null, which is confirmed when the code executes and returns True.
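To make the check concrete, here is a minimal sketch on a made-up 12-row frame. The data and the definition of every_6th_row are my assumptions (the course defines its own earlier in the demo); I have taken it to be the index labels 5, 11, 17 and so on, with a default integer index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course data: 12 rows, a default RangeIndex,
# and a reading only on rows 5 and 11 (every 6th row).
df = pd.DataFrame({"MIN_TEMP_GROUND": [np.nan] * 12})
df.loc[[5, 11], "MIN_TEMP_GROUND"] = [1.5, 2.0]

# Assumed definition: the index labels of every 6th row, starting at label 5.
every_6th_row = df.index[5::6]

# Drop those labels; whatever is left should hold no data at all.
remaining = df["MIN_TEMP_GROUND"].drop(every_6th_row)
print(len(remaining))            # 10 rows are left after dropping 2
print(remaining.isnull().all())  # True

# A stray reading outside the every-6th pattern makes the check fail.
df.loc[3, "MIN_TEMP_GROUND"] = 0.7
print(df["MIN_TEMP_GROUND"].drop(every_6th_row).isnull().all())  # False
```

The last two lines show why the check is useful: a single value in the wrong row is enough to turn the result to False.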
One drawback of the original line is that it chains several calls to get the answer, and it might be more efficient to use loc instead. After kicking this around for a couple of hours and starting to spin my wheels, I asked a question on Stack Overflow. The answer used loc together with the % (modulo) operator to select the rows of interest, that is, every row whose index does not leave a remainder of 5 when divided by 6, and check that they were all null.
df.loc[(df.index % 6) != 5, 'MIN_TEMP_GROUND'].isnull().all()
True
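The sketch below shows what the modulo mask looks like and that, on a small made-up frame, it selects the same rows as the drop version. The toy data is hypothetical, and it assumes a default integer index (0, 1, 2, ...) in which the populated rows carry labels 5, 11, and so on.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, assuming a default RangeIndex (0, 1, 2, ...).
df = pd.DataFrame({"MIN_TEMP_GROUND": [np.nan] * 12})
df.loc[[5, 11], "MIN_TEMP_GROUND"] = [1.5, 2.0]

# True for every row except labels 5, 11, 17, ...
mask = (df.index % 6) != 5
print(mask.tolist())
# [True, True, True, True, True, False, True, True, True, True, True, False]

# Select rows and column in a single loc call, then compare with the drop version.
via_loc = df.loc[mask, "MIN_TEMP_GROUND"]
via_drop = df["MIN_TEMP_GROUND"].drop(df.index[5::6])

print(via_loc.isnull().all())    # True
print(via_loc.equals(via_drop))  # True
```

Note that the mask works on the index labels, so it relies on the index being plain integers. If the frame had some other index, something like np.arange(len(df)) % 6 could be used to build the mask by position instead.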
Acknowledgements
Reindert-Jan Ekker, whose Pluralsight catalogue of courses I can recommend.
Henry Ecker, for generously answering my Stack Overflow question.