Checking for NaN in a Pandas DataFrame
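NaN stands for Not a Number and is one of the common ways to represent missing values in data. It is a special floating-point value, so a column that contains it cannot be stored as a plain integer type.
NaN values are one of the main problems in data analysis: left unhandled, they can distort results, so they need to be detected and dealt with before the analysis proceeds.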
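The floating-point nature of NaN can be seen in a short sketch. The variable names here (`ints`, `with_nan`) are illustrative and not part of the original examples.

```python
import numpy as np
import pandas as pd

# A column of whole numbers is stored with an integer dtype...
ints = pd.Series([10, 15, 30])
# ...but a single missing value forces the whole column to float64.
with_nan = pd.Series([10, 15, np.nan])

ints_dtype = str(ints.dtype)
nan_dtype = str(with_nan.dtype)

# NaN never compares equal to itself, so == cannot be used to find it.
nan_equals_itself = np.nan == np.nan

print(ints_dtype, nan_dtype, nan_equals_itself)
```

This is why the methods in this article rely on isnull() rather than on an equality comparison.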
Checking for NaN values in a Pandas DataFrame
NaN values in a Pandas DataFrame can be checked in the following ways:
- Use isnull().values.any() to check whether any NaN is present
- Use isnull().sum() to count the NaN values
- Use isnull().sum().any() to check whether any NaN is present
- Use isnull().sum().sum() to count the NaN values
Method 1: Using isnull().values.any()
# importing libraries
import pandas as pd
import numpy as np
num = {'Integers': [10, 15, 30, 40, 55, np.nan,
                    75, np.nan, 90, 150, np.nan]}
# Create the dataframe
df = pd.DataFrame(num, columns=['Integers'])
# Applying the method
check_nan = df['Integers'].isnull().values.any()
# printing the result
print(check_nan)
# Output: True
To get the exact positions of the NaN values, drop .values.any() from isnull().values.any() and inspect the boolean Series that isnull() returns:
df['Integers'].isnull()
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 True
8 False
9 False
10 True
Name: Integers, dtype: bool
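The boolean Series above can also be turned into a list of row labels. The following is a minimal sketch using the same data; `mask` and `nan_positions` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Integers': [10, 15, 30, 40, 55, np.nan,
                                75, np.nan, 90, 150, np.nan]})

# Boolean mask: True where the value is NaN
mask = df['Integers'].isnull()

# Index labels of the rows that hold NaN
nan_positions = df.index[mask].tolist()
print(nan_positions)
# Output: [5, 7, 10]
```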
Method 2: Using isnull().sum()
# importing libraries
import pandas as pd
import numpy as np
num = {'Integers': [10, 15, 30, 40, 55, np.nan,
                    75, np.nan, 90, 150, np.nan]}
# Create the dataframe
df = pd.DataFrame(num, columns=['Integers'])
# applying the method
count_nan = df['Integers'].isnull().sum()
# printing the number of values present
# in the column
print('Number of NaN values present: ' + str(count_nan))
Number of NaN values present: 3
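The same idea extends to counting the values that are present and to the share of the column that is missing. This is a sketch rather than part of the original method; it relies on notnull() and on the fact that the mean of a boolean Series is the fraction of True values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Integers': [10, 15, 30, 40, 55, np.nan,
                                75, np.nan, 90, 150, np.nan]})
col = df['Integers']

# True counts as 1 and False as 0 when summed
missing = int(col.isnull().sum())
present = int(col.notnull().sum())

# Mean of the boolean mask = fraction of entries that are NaN
fraction_missing = col.isnull().mean()

print(missing, present, round(fraction_missing, 3))
# Output: 3 8 0.273
```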
Method 3: Using isnull().sum().any()
# importing libraries
import pandas as pd
import numpy as np
nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
# Create the dataframe
df = pd.DataFrame(nums, columns=['Integers_1', 'Integers_2'])
# applying the method
nan_in_df = df.isnull().sum().any()
# Print the result
print(nan_in_df)
# Output: True
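To get the exact positions of the NaN values, drop .sum().any() from isnull().sum().any(); what remains, df.isnull(), is a boolean DataFrame with True wherever a value is NaN.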
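For a DataFrame with several columns, the boolean result can be reduced per column or per row. The sketch below uses the same two-column data; `columns_with_nan` and `rows_with_nan` are illustrative names.

```python
import numpy as np
import pandas as pd

nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
df = pd.DataFrame(nums)

# Boolean DataFrame: True wherever a value is NaN
nan_mask = df.isnull()

# One flag per column: does the column contain any NaN?
columns_with_nan = nan_mask.any().to_dict()

# Labels of the rows in which at least one column is NaN
rows_with_nan = df.index[nan_mask.any(axis=1)].tolist()

print(columns_with_nan)
print(rows_with_nan)
# Output:
# {'Integers_1': True, 'Integers_2': True}
# [0, 4, 5, 7, 9, 10]
```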
Method 4: Using isnull().sum().sum()
# importing libraries
import pandas as pd
import numpy as np
nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
# Create the dataframe
df = pd.DataFrame(nums, columns=['Integers_1', 'Integers_2'])
# applying the method
nan_in_df = df.isnull().sum().sum()
# printing the number of values present in
# the whole dataframe
print('Number of NaN values present: ' + str(nan_in_df))
Number of NaN values present: 8
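The total of 8 is the sum of the per-column counts, which the first .sum() produces on its own. As a sketch, the intermediate counts can be inspected per column and, with axis=1, per row:

```python
import numpy as np
import pandas as pd

nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
df = pd.DataFrame(nums)

# NaN count in each column (the intermediate step of method 4)
per_column = df.isnull().sum()

# NaN count in each row
per_row = df.isnull().sum(axis=1)

print(per_column.to_dict())
print(int(per_column.sum()))
# Output:
# {'Integers_1': 3, 'Integers_2': 5}
# 8
```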
References
- https://www.geeksforgeeks.org/check-for-nan-in-pandas-dataframe/