Alignment Operations in Pandas
Arithmetic and Data Alignment
In [67]:
import numpy as np
import pandas as pd
s1 = pd.Series(np.arange(4), index=['a','b','c','d'])
s2 = pd.Series(np.arange(5), index=['a','c','e','f','g'])
In [68]:
# Example of arithmetic on Series
s1
Out[68]:
a    0
b    1
c    2
d    3
dtype: int64
In [69]:
s2
Out[69]:
a    0
c    1
e    2
f    3
g    4
dtype: int64
In [70]:
s1 + s2
Out[70]:
a    0.0
b    NaN
c    3.0
d    NaN
e    NaN
f    NaN
g    NaN
dtype: float64
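The result above can be read as a two-step process: both Series are first reindexed to the union of their labels, and only then added. The following is a minimal sketch of that reading, using `Series.align` to make the alignment step explicit; the variable names `left`, `right`, and `manual` are illustrative only.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])
s2 = pd.Series(np.arange(5), index=['a', 'c', 'e', 'f', 'g'])

# Step 1: reindex both operands to the union of their labels (outer join).
# Labels missing from one side become NaN, which promotes the dtype to float64.
left, right = s1.align(s2, join='outer')

# Step 2: add the aligned Series position by position.
manual = left + right

# The explicit two-step version matches what the + operator returns.
assert manual.equals(s1 + s2)
```

Only the labels `a` and `c` exist in both Series, so they are the only positions that hold a number in the sum.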
In [74]:
df1 = pd.DataFrame(np.arange(12).reshape(4,3),index=['a','b','c','d'],columns=['A','B','C'])
df2 = pd.DataFrame(np.arange(9).reshape(3,3),index=['a','d','f'],columns=['A','B','D'])
In [75]:
df1
Out[75]:
| | A | B | C |
---|---|---|---|
a | 0 | 1 | 2 |
b | 3 | 4 | 5 |
c | 6 | 7 | 8 |
d | 9 | 10 | 11 |
In [76]:
df2
Out[76]:
| | A | B | D |
---|---|---|---|
a | 0 | 1 | 2 |
d | 3 | 4 | 5 |
f | 6 | 7 | 8 |
In [89]:
df1 + df2
Out[89]:
| | A | B | C | D |
---|---|---|---|---|
a | 0.0 | 2.0 | NaN | NaN |
b | NaN | NaN | NaN | NaN |
c | NaN | NaN | NaN | NaN |
d | 12.0 | 14.0 | NaN | NaN |
f | NaN | NaN | NaN | NaN |
In [91]:
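# Store the aligned sum so it can be inspected
df3 = df1 + df2
df3
# Note: df3 == np.nan is False in every cell because NaN never compares equal; use df3.isna() instead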
Out[91]:
| | A | B | C | D |
---|---|---|---|---|
a | 0.0 | 2.0 | NaN | NaN |
b | NaN | NaN | NaN | NaN |
c | NaN | NaN | NaN | NaN |
d | 12.0 | 14.0 | NaN | NaN |
f | NaN | NaN | NaN | NaN |
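Because alignment introduces NaN wherever a row or column label is missing from one operand, it helps to know how to locate those cells. The sketch below rebuilds the same two frames and contrasts an equality test against `np.nan` with `DataFrame.isna()`; the names `eq_mask` and `na_mask` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   index=['a', 'b', 'c', 'd'], columns=['A', 'B', 'C'])
df2 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   index=['a', 'd', 'f'], columns=['A', 'B', 'D'])
df3 = df1 + df2

# NaN never compares equal to anything, including itself,
# so the equality test is False in every cell.
eq_mask = df3 == np.nan

# isna() is the reliable way to find missing values.
na_mask = df3.isna()

print(eq_mask.to_numpy().any())        # False
print(int(na_mask.to_numpy().sum()))   # 16
```

Of the 20 cells in `df3`, only the four at rows `a` and `d`, columns `A` and `B`, have a value in both inputs; the other 16 are NaN.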
Arithmetic Methods with Fill Values
In [92]:
s1
Out[92]:
a    0
b    1
c    2
d    3
dtype: int64
In [93]:
s2
Out[93]:
a    0
c    1
e    2
f    3
g    4
dtype: int64
In [94]:
# With fill_value=0, a label present in only one Series is treated as 0 in the other before adding
s1.add(s2,fill_value=0)
Out[94]:
a    0.0
b    1.0
c    3.0
d    3.0
e    2.0
f    3.0
g    4.0
dtype: float64
In [95]:
df1
Out[95]:
| | A | B | C |
---|---|---|---|
a | 0 | 1 | 2 |
b | 3 | 4 | 5 |
c | 6 | 7 | 8 |
d | 9 | 10 | 11 |
In [96]:
df2
Out[96]:
| | A | B | D |
---|---|---|---|
a | 0 | 1 | 2 |
d | 3 | 4 | 5 |
f | 6 | 7 | 8 |
In [97]:
df1.add(df2, fill_value=0)
Out[97]:
| | A | B | C | D |
---|---|---|---|---|
a | 0.0 | 2.0 | 2.0 | 2.0 |
b | 3.0 | 4.0 | 5.0 | NaN |
c | 6.0 | 7.0 | 8.0 | NaN |
d | 12.0 | 14.0 | 11.0 | 5.0 |
f | 6.0 | 7.0 | NaN | 8.0 |
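The output above still contains NaN even though `fill_value=0` was passed. `fill_value` only substitutes for a value that is missing from one of the two operands; a cell that is missing from both stays NaN. The sketch below lists those cells; `filled` and `missing_cells` are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   index=['a', 'b', 'c', 'd'], columns=['A', 'B', 'C'])
df2 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   index=['a', 'd', 'f'], columns=['A', 'B', 'D'])

filled = df1.add(df2, fill_value=0)

# Collect the (row, column) labels of the cells that are still NaN.
rows, cols = np.where(filled.isna().to_numpy())
missing_cells = [(filled.index[r], filled.columns[c]) for r, c in zip(rows, cols)]

print(missing_cells)  # [('b', 'D'), ('c', 'D'), ('f', 'C')]
```

Rows `b` and `c` exist only in `df1`, which has no column `D`, and row `f` exists only in `df2`, which has no column `C`, so neither frame can supply a value for those cells.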
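More arithmetic methods (such as sub, mul, and div, plus reversed variants like radd) are listed in the pandas online documentation.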
In [100]:
# fill_value can also be used with other operations, such as reindex
df1.reindex(columns=df2.columns,fill_value=0)
Out[100]:
| | A | B | D |
---|---|---|---|
a | 0 | 1 | 0 |
b | 3 | 4 | 0 |
c | 6 | 7 | 0 |
d | 9 | 10 | 0 |
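One possible use of `reindex` with `fill_value`, offered here as a sketch rather than a recommended recipe: reindex both frames to the union of their row and column labels before adding, so that every cell has a value on both sides and no NaN is produced. The names `rows`, `cols`, `a`, `b`, and `total` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   index=['a', 'b', 'c', 'd'], columns=['A', 'B', 'C'])
df2 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   index=['a', 'd', 'f'], columns=['A', 'B', 'D'])

# Union of the labels on each axis.
rows = df1.index.union(df2.index)
cols = df1.columns.union(df2.columns)

# Reindex both frames to the same shape, filling new cells with 0.
a = df1.reindex(index=rows, columns=cols, fill_value=0)
b = df2.reindex(index=rows, columns=cols, fill_value=0)

# The shapes and labels now match exactly, so plain + introduces no NaN.
total = a + b
```

Whether a 0 is the right stand-in for a cell that neither frame contains depends on the data; the version with `add(..., fill_value=0)` keeps such cells as NaN instead.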
Operations Between DataFrame and Series
In [101]:
arr = np.arange(12).reshape(3,4)
arr
Out[101]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [102]:
# Take the first row of the 2-D array arr
arr[0]
Out[102]:
array([0, 1, 2, 3])
In [103]:
# arr - arr[0] subtracts arr[0] from every row (broadcasting)
arr - arr[0]
Out[103]:
array([[0, 0, 0, 0],
       [4, 4, 4, 4],
       [8, 8, 8, 8]])
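Subtracting a row works directly because its shape `(4,)` matches the last dimension of `arr`. Subtracting a column is the mirror case and needs one extra step in NumPy: the column must be reshaped to `(3, 1)` before it can broadcast across the columns. A minimal sketch, with `first_col` and `col_diff` as illustrative names:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# The first column as a 1-D array of shape (3,).
first_col = arr[:, 0]

# arr - first_col would raise ValueError: shapes (3, 4) and (3,) cannot broadcast.
# Adding a trailing axis gives shape (3, 1), which stretches across the 4 columns.
col_diff = arr - first_col[:, np.newaxis]

print(col_diff)
# [[0 1 2 3]
#  [0 1 2 3]
#  [0 1 2 3]]
```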
In [104]:
df1
Out[104]:
| | A | B | C |
---|---|---|---|
a | 0 | 1 | 2 |
b | 3 | 4 | 5 |
c | 6 | 7 | 8 |
d | 9 | 10 | 11 |
In [106]:
s3 = df1.iloc[0]
s3
Out[106]:
A    0
B    1
C    2
Name: a, dtype: int64
In [108]:
# By default, DataFrame-Series arithmetic matches the Series index to the DataFrame's columns and broadcasts down the rows
df1 - s3
Out[108]:
| | A | B | C |
---|---|---|---|
a | 0 | 0 | 0 |
b | 3 | 3 | 3 |
c | 6 | 6 | 6 |
d | 9 | 9 | 9 |
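Unlike the NumPy case, this broadcast is matched by label rather than by position. The sketch below uses a hypothetical Series `s`, not part of the original notebook, whose labels only partly overlap the columns of `df1`, to show that the usual alignment rules still apply.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   index=['a', 'b', 'c', 'd'], columns=['A', 'B', 'C'])

# Hypothetical Series: 'B' matches a column of df1, 'E' does not.
s = pd.Series([10, 20], index=['B', 'E'])

# The Series index is aligned with df1's columns (union of labels).
# Only column 'B' exists on both sides; 'A', 'C', and 'E' become NaN.
result = df1 - s

print(result['B'].tolist())  # [-9.0, -6.0, -3.0, 0.0]
```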
In [110]:
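# To broadcast across the columns instead (matching on the row index), use an arithmetic method such as sub with axis='index'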
s4 = df1['A']
s4
Out[110]:
a    0
b    3
c    6
d    9
Name: A, dtype: int64
In [113]:
df1.sub(s4, axis='index')
Out[113]:
| | A | B | C |
---|---|---|---|
a | 0 | 1 | 2 |
b | 0 | 1 | 2 |
c | 0 | 1 | 2 |
d | 0 | 1 | 2 |
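To see why the `axis` argument matters here, the sketch below compares the call above with the plain `-` operator on the same operands. With the operator, the Series labels `a` to `d` are matched against the column labels `A` to `C`, nothing overlaps, and every cell comes out as NaN. The names `by_rows` and `default` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   index=['a', 'b', 'c', 'd'], columns=['A', 'B', 'C'])
s4 = df1['A']

# Match s4's labels against the row index and broadcast across the columns.
by_rows = df1.sub(s4, axis='index')

# axis=0 is the numeric spelling of axis='index'.
assert by_rows.equals(df1.sub(s4, axis=0))

# The plain operator matches s4's labels against the columns instead.
# No label is shared, so the result has 7 columns and contains only NaN.
default = df1 - s4

print(default.shape)                     # (4, 7)
print(bool(default.isna().all().all()))  # True
```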