
Date: 2024-04-17
Time: 23:50
Author: Dongming

3. Basic DataFrame Operations

Constructing a DataFrame¶

From dict-like data:

A dict of arrays, lists, or tuples
A dict of Series
A dict of dicts

From list-like data:

A 2D ndarray
A list of dicts
A list of Series

In [39]:

import numpy as np
import pandas as pd

# 1. Build a DataFrame from a dict of arrays, lists, or tuples
data = {
    'a': [1, 2, 3, 4],
    'b': (5, 6, 7, 8),
    'c': np.arange(9, 13),
}

# Construct the DataFrame; each dict key becomes a column
frame = pd.DataFrame(data)
frame

Out[39]:

	a	b	c
0	1	5	9
1	2	6	10
2	3	7	11
3	4	8	12

In [40]:

# The index attribute holds the row labels
frame.index

Out[40]:

RangeIndex(start=0, stop=4, step=1)

In [41]:

# The columns attribute holds the column labels
frame.columns

Out[41]:

Index(['a', 'b', 'c'], dtype='object')

In [42]:

# The values attribute returns the data as a NumPy array
frame.values

Out[42]:

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])
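
As a small supplement to the `values` attribute above, the sketch below uses `DataFrame.to_numpy()`, which returns the same kind of array. The frame `mixed` is a made-up example, not part of the original notebook. It illustrates that when columns have different numeric types, the resulting array takes a single common dtype.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
mixed = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})

# to_numpy() returns one 2D array, so both columns must share a dtype.
arr = mixed.to_numpy()

print(arr.dtype)
print(arr)
```

Because the array has to hold both columns, the integers are converted to floats here.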

In [43]:

# Set the row labels with the index argument
frame = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
frame

Out[43]:

	a	b	c
A	1	5	9
B	2	6	10
C	3	7	11
D	4	8	12

In [51]:

# Set the column labels with the columns argument
frame = pd.DataFrame(data, index=['A', 'B', 'C', 'D'], columns=['a', 'b', 'c'])
frame

Out[51]:

	a	b	c
A	1	5	9
B	2	6	10
C	3	7	11
D	4	8	12

In [55]:

# 2. Build a DataFrame from a dict of Series
pd1 = pd.DataFrame({
    'a':pd.Series(np.arange(3)),
    'b':pd.Series(np.arange(3,5))
                    })
pd1

Out[55]:

	a	b
0	0	3.0
1	1	4.0
2	2	NaN
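
The NaN in column b above comes from index alignment: a dict of Series is combined label by label, not position by position. The following is a minimal sketch of that behaviour; the names `long_`, `short`, and `aligned` are illustrative only.

```python
import numpy as np
import pandas as pd

# Two Series of different lengths, both with the default integer labels.
long_ = pd.Series(np.arange(3))      # labels 0, 1, 2
short = pd.Series(np.arange(3, 5))   # labels 0, 1

aligned = pd.DataFrame({'a': long_, 'b': short})

# Label 2 has no value in `short`, so pandas fills NaN there.
# NaN is a float, so column 'b' is stored as float64.
print(aligned.dtypes)
print(aligned['b'].isna().tolist())
```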

In [57]:

# 3. Build a DataFrame from a dict of dicts (outer keys become columns, inner keys become rows)
data1 = {
    'beijing': {'apple':3.6, 'banana': 5.6},
    'lanzhou': {'apple':3.2, 'banana': 6.6},
    'tianjin': {'apple':3.2}
}
pd2 = pd.DataFrame(data1)
pd2

Out[57]:

	beijing	lanzhou	tianjin
apple	3.6	3.2	3.2
banana	5.6	6.6	NaN
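
If the outer keys should become rows instead of columns, one option is `DataFrame.from_dict` with `orient='index'`. The sketch below uses a shortened, hypothetical version of the price dict to compare the two orientations.

```python
import pandas as pd

# Shortened, illustrative version of the nested dict.
prices = {
    'beijing': {'apple': 3.6, 'banana': 5.6},
    'tianjin': {'apple': 3.2},
}

by_column = pd.DataFrame(prices)                         # outer keys -> columns
by_row = pd.DataFrame.from_dict(prices, orient='index')  # outer keys -> rows

print(by_column)
print(by_row)
```

In both orientations the missing banana price for tianjin appears as NaN.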

In [59]:

# 4. Build a DataFrame from a 2D ndarray
arr1 = np.arange(12).reshape(4,3)  # 4 rows, 3 columns
frame1 = pd.DataFrame(arr1)
frame1

Out[59]:

	0	1	2
0	0	1	2
1	3	4	5
2	6	7	8
3	9	10	11

In [60]:

# 5. Build a DataFrame from a list of dicts (each dict becomes a row)
list_data = [
    {'apple': 3.6, 'banana':5.6,},
    {'apple': 3.9, 'banana':4.6,},
    {'apple': 3.2}
]
pd3 = pd.DataFrame(list_data)
pd3

Out[60]:

	apple	banana
0	3.6	5.6
1	3.9	4.6
2	3.2	NaN

In [68]:

# 6. Build a DataFrame from a list of Series (each Series becomes a row)
list_data2 = [pd.Series(np.random.rand(4)), pd.Series(np.random.rand(4))]
print(list_data2)
pd4 = pd.DataFrame(list_data2)
pd4

[0    0.026158
1    0.033837
2    0.305968
3    0.162702
dtype: float64, 0    0.308487
1    0.819329
2    0.555282
3    0.294235
dtype: float64]

Out[68]:

	0	1	2	3
0	0.026158	0.033837	0.305968	0.162702
1	0.308487	0.819329	0.555282	0.294235
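
In the list-of-Series case each Series becomes one row. When the Series carry a `name`, that name is used as the row label. A short sketch with made-up values and names (`row_a`, `row_b`):

```python
import pandas as pd

# Hypothetical named Series; fixed values are used so the result is reproducible.
row_a = pd.Series([1.0, 2.0, 3.0], name='first')
row_b = pd.Series([4.0, 5.0, 6.0], name='second')

rows = pd.DataFrame([row_a, row_b])

print(rows.shape)          # 2 rows, 3 columns
print(list(rows.index))    # row labels taken from the Series names
```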

Basic DataFrame operations¶

Transposing¶

In [78]:

# 1. Transposing with .T
# Row labels: '语文' (Chinese), '数学' (Math), '英语' (English)
pd5 = pd.DataFrame(np.arange(9).reshape(3,3), index=['语文', '数学', '英语'], columns=['counts', 'scores', 'numbers'])
pd5

Out[78]:

	counts	scores	numbers
语文	0	1	2
数学	3	4	5
英语	6	7	8

In [80]:

# .T swaps the rows and the columns
pd5.T

Out[80]:

	语文	数学	英语
counts	0	3	6
scores	1	4	7
numbers	2	5	8

Getting data¶

In [83]:

# 2. Select a column by its label (the result is a Series)
print(pd5['counts'])
print(pd5['counts'].iloc[1])  # second element of that column, by position

语文    0
数学    3
英语    6
Name: counts, dtype: int64
3

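Adding a column¶

In [87]:

# 3. Add columns
pd5['体育'] = 9              # a scalar is broadcast to every row ('体育' means PE)
pd5['lens'] = [15, 29, 40]  # a list supplies one value per row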
pd5

Out[87]:

	counts	scores	numbers	体育	lens
语文	0	1	2	9	15
数学	3	4	5	9	29
英语	6	7	8	9	40

Deleting a column¶

In [88]:

# 4. Delete a column
del pd5['体育']
pd5

Out[88]:

	counts	scores	numbers	lens
语文	0	1	2	15
数学	3	4	5	29
英语	6	7	8	40
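
Assignment with `[]` and `del` both change the DataFrame in place. If the original should stay untouched, `assign` and `drop(columns=...)` return new frames instead. This is a sketch on a small hypothetical frame called `scores`, not the `pd5` used above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows [0, 1, 2] and [3, 4, 5].
scores = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['x', 'y', 'z'])

# assign returns a new frame with the extra column.
with_total = scores.assign(total=scores['x'] + scores['y'])

# drop(columns=...) returns a new frame without the listed columns.
without_z = with_total.drop(columns=['z'])

print(list(scores.columns))     # the original frame is unchanged
print(list(without_z.columns))
```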

DataFrame index operations¶

In [1]:

import numpy as np
import pandas as pd

1. In both Series and DataFrame, the index is an Index object

In [2]:

ps = pd.Series(range(5), index=['a','b','c','d','e'])
ps

Out[2]:

a    0
b    1
c    2
d    3
e    4
dtype: int64

In [3]:

pd1 = pd.DataFrame(np.arange(9).reshape(3,3), index=['a','b','c'], columns=['A','B','C'])
pd1

Out[3]:

	A	B	C
a	0	1	2
b	3	4	5
c	6	7	8

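2. Index objects are immutable, which protects the data from accidental changes

In [103]:

# ps.index[0] = 'f'    # Index objects are immutable; running this raises a TypeError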
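
The immutability can be checked directly. The sketch below catches the error and then shows one possible way to relabel, `rename`, which builds a new Series and leaves the original alone. The choice of `rename` is a suggestion here, not something the original notebook uses.

```python
import pandas as pd

ps = pd.Series(range(3), index=['a', 'b', 'c'])

try:
    ps.index[0] = 'f'   # item assignment on an Index is not supported
except TypeError as exc:
    print('TypeError:', exc)

# One possible alternative: rename returns a relabelled copy.
renamed = ps.rename(index={'a': 'f'})
print(list(renamed.index))
print(list(ps.index))   # the original labels are intact
```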

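3. Common index types

Index: the general-purpose index
Int64Index: integer index (removed in pandas 2.0, where integer labels are held in a plain Index with an integer dtype)
MultiIndex: hierarchical index
DatetimeIndex: timestamp index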
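
To see these index types concretely, the sketch below builds one of each and prints its class name and dtype. The variable names are illustrative. The class printed for the integer index depends on the installed pandas version, so no expected output is shown.

```python
import pandas as pd

plain = pd.Index(['a', 'b'])
ints = pd.Index([1, 2, 3])
multi = pd.MultiIndex.from_tuples([('x', 1), ('x', 2)])
dates = pd.date_range('2024-01-01', periods=3, freq='D')

# Print the concrete class and dtype of each index.
for idx in (plain, ints, multi, dates):
    print(type(idx).__name__, idx.dtype)
```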

Basic index operations (add, delete, modify, query)¶

Reindexing¶

In [4]:

# Reindex a Series
ps1 = pd.Series(range(5), index=[1,2,3,4,6])
ps1

Out[4]:

1    0
2    1
3    2
4    3
6    4
dtype: int64

In [107]:

ps2 = ps1.reindex([4,3,2,0,1,6])
ps2

Out[107]:

4    3.0
3    2.0
2    1.0
0    NaN
1    0.0
6    4.0
dtype: float64
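
In the output above, label 0 does not exist in the original Series, so `reindex` inserts NaN and the values become floats. If a default value is acceptable, the `fill_value` parameter avoids the NaN. A minimal sketch:

```python
import pandas as pd

ps1 = pd.Series(range(5), index=[1, 2, 3, 4, 6])

# Labels missing from ps1 (here: 0) receive fill_value instead of NaN,
# so the integer dtype can be kept.
filled = ps1.reindex([4, 3, 2, 0, 1, 6], fill_value=0)

print(filled)
```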

In [8]:

# Reindex a DataFrame
pd1 = pd.DataFrame(np.arange(9).reshape(3,3), index=['a','b','c'], columns=['A','B','C'])
pd1

Out[8]:

	A	B	C
a	0	1	2
b	3	4	5
c	6	7	8

In [111]:

# Reindex the rows (the new label 'd' has no data, so it is filled with NaN)
pd2 = pd1.reindex(['c', 'd','a', 'b',])
pd2

Out[111]:

	A	B	C
c	6.0	7.0	8.0
d	NaN	NaN	NaN
a	0.0	1.0	2.0
b	3.0	4.0	5.0

In [112]:

# Reindex the columns
pd3 = pd1.reindex(columns=['B','C','A'])
pd3

Out[112]:

	B	C	A
a	1	2	0
b	4	5	3
c	7	8	6
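
Rows and columns can also be reindexed in a single call by passing both `index` and `columns`. In the sketch below, the column label 'D' is hypothetical and does not exist in the frame, so it comes back filled with NaN.

```python
import numpy as np
import pandas as pd

pd1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

# Select and reorder rows and columns at once; 'D' is a new, empty column.
both = pd1.reindex(index=['c', 'a'], columns=['B', 'A', 'D'])

print(both)
```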

Adding data¶

In [114]:

import numpy as np
import pandas as pd

ps2 = pd.Series(range(5), index=[1,2,3,4,6])
ps2

Out[114]:

1    0
2    1
3    2
4    3
6    4
dtype: int64

In [115]:

# Series: add a new element in place (use the integer label 5 to match the integer index)
ps2[5] = 8
ps2

Out[115]:

1    0
2    1
3    2
4    3
6    4
5    8
dtype: int64

In [119]:

# Series: add new data without changing the original (concat returns a new Series)
s1 = pd.Series({10: 99})
ps3 = pd.concat([ps2, s1])
ps3

Out[119]:

1      0
2      1
3      2
4      3
6      4
5      8
10    99
dtype: int64

In [138]:

# DataFrame: add new data in place
pd1 = pd.DataFrame(np.arange(9).reshape(3,3), index=['a','b','c'], columns=['A','B','C'])
pd1

Out[138]:

	A	B	C
a	0	1	2
b	3	4	5
c	6	7	8

In [139]:

pd1[4] = 9   # assigning to a new label adds a column; the scalar fills every row
pd1

Out[139]:

	A	B	C	4
a	0	1	2	9
b	3	4	5	9
c	6	7	8	9

In [140]:

# Assign a list to give each row its own value (column 4 already exists, so it is overwritten)
pd1[4] = [10, 11, 12]
pd1

Out[140]:

	A	B	C	4
a	0	1	2	10
b	3	4	5	11
c	6	7	8	12

In [141]:

# insert places a new column at a given position (here position 0)
pd1.insert(0, 'E', [15, 16, 17])
pd1

Out[141]:

	E	A	B	C	4
a	15	0	1	2	10
b	16	3	4	5	11
c	17	6	7	8	12

In [142]:

# Add a row
# pd1[:][4] = [1,2,3,4,5] does not add a row, so use label-based indexing instead
# Label-based indexing with loc
pd1.loc['d'] = [1,1,1,1,1]
pd1

Out[142]:

	E	A	B	C	4
a	15	0	1	2	10
b	16	3	4	5	11
c	17	6	7	8	12
d	1	1	1	1	1

In [143]:

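# Alternatively, wrap a dict in a DataFrame and append it as a row with concat
new_data = {'E': 6, 'A': 7, 'B': 10, 'C': 6, 4: 6}  # the key 4 is an integer so that it matches the existing column label
new_row_df = pd.DataFrame([new_data])
pd1 = pd.concat([pd1, new_row_df], ignore_index=True)  # ignore_index=True replaces the row labels with 0..n-1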
pd1

Out[143]:

	E	A	B	C	4
0	15	0	1	2	10
1	16	3	4	5	11
2	17	6	7	8	12
3	1	1	1	1	1
4	6	7	10	6	6
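
Note that `ignore_index=True` replaced the row labels a to d with 0 to 4 in the output above. If the labels matter, one option is to give the new row its own label and omit `ignore_index`. The sketch below uses a small hypothetical frame, `base`, to compare both variants.

```python
import pandas as pd

base = pd.DataFrame({'A': [0, 3], 'B': [1, 4]}, index=['a', 'b'])

# Label the new row explicitly so that it fits the existing index.
new_row = pd.DataFrame([{'A': 7, 'B': 10}], index=['c'])

kept = pd.concat([base, new_row])                       # labels a, b, c are kept
reset = pd.concat([base, new_row], ignore_index=True)   # labels become 0, 1, 2

print(kept)
print(reset)
```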

Deleting data¶

In [5]:

ps1

Out[5]:

1    0
2    1
3    2
4    3
6    4
dtype: int64

In [7]:

del ps1[3]
ps1

Out[7]:

1    0
2    1
4    3
6    4
dtype: int64

In [9]:

pd1

Out[9]:

	A	B	C
a	0	1	2
b	3	4	5
c	6	7	8

In [10]:

# On a DataFrame, del can only delete columns
del pd1['C']
pd1

Out[10]:

	A	B
a	0	1
b	3	4
c	6	7

In [11]:

# drop removes labels along an axis and returns a new object
ps6 = ps1.drop(2)
ps6

Out[11]:

1    0
4    3
6    4
dtype: int64

In [12]:

# Drop several labels at once
ps1.drop([1,4])

Out[12]:

2    1
6    4
dtype: int64

In [13]:

# Dropping from a DataFrame (rows are dropped by default)
pd1.drop('c')

Out[13]:

	A	B
a	0	1
b	3	4

In [14]:

# axis=1 drops columns; axis=0 (the default) drops rows
pd1.drop('B',axis=1)

Out[14]:

	A
a	0
b	3
c	6

In [15]:

# The inplace parameter: drop from the original object instead of returning a copy
ps1

Out[15]:

1    0
2    1
4    3
6    4
dtype: int64

In [17]:

ps1.drop(4,inplace=True)
ps1

Out[17]:

1    0
2    1
6    4
dtype: int64
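
The difference between the two calling styles of `drop` is easy to miss: without `inplace` it returns a new object, and with `inplace=True` it modifies the original and returns None. A short sketch with an illustrative Series `s`:

```python
import pandas as pd

s = pd.Series([0, 1, 3, 4], index=[1, 2, 4, 6])

copy = s.drop(2)                   # new Series; s still contains label 2
result = s.drop(4, inplace=True)   # s itself loses label 4; the call returns None

print(list(copy.index))
print(result)
print(list(s.index))
```

Assigning the result of an `inplace=True` call back to the variable would therefore overwrite it with None.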

Modifying data¶

In [18]:

import numpy as np
import pandas as pd

ps1 = pd.Series(range(5), index=['a','b','c','d','e'])
ps1

Out[18]:

a    0
b    1
c    2
d    3
e    4
dtype: int64

In [20]:

# Modify a value in ps1 by label
ps1['a'] = 999
ps1

Out[20]:

a    999
b      1
c      2
d      3
e      4
dtype: int64

In [21]:

# Modify a value by position (iloc makes the positional access explicit)
ps1.iloc[0] = 888
ps1

Out[21]:

a    888
b      1
c      2
d      3
e      4
dtype: int64

In [ ]:

# Modify data in a DataFrame

In [22]:

pd1 = pd.DataFrame(np.arange(9).reshape(3,3), index=['a','b','c'], columns=['A','B','C'])
pd1

Out[22]:

	A	B	C
a	0	1	2
b	3	4	5
c	6	7	8

In [23]:

# Set every value in column A
pd1['A'] = 9
pd1

Out[23]:

	A	B	C
a	9	1	2
b	9	4	5
c	9	7	8

In [24]:

pd1['A'] = [9, 10, 11]
pd1

Out[24]:

	A	B	C
a	9	1	2
b	10	4	5
c	11	7	8

In [25]:

# Replace a whole row by label with loc
pd1.loc['b'] = [1,1,1]
pd1

Out[25]:

	A	B	C
a	9	1	2
b	1	1	1
c	11	7	8
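
Besides whole columns and whole rows, single cells and conditional selections can be modified too. The following sketch starts again from a fresh 3x3 frame and uses `loc` and `at`. The values 100, -1, and 0 are arbitrary.

```python
import numpy as np
import pandas as pd

pd1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

pd1.loc['a', 'B'] = 100          # one cell, addressed by row and column label
pd1.at['c', 'C'] = -1            # at is a scalar-only accessor for a single cell
pd1.loc[pd1['A'] > 0, 'A'] = 0   # only the rows where column A is positive

print(pd1)
```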

Querying data¶

In [26]:

ps1

Out[26]:

a    888
b      1
c      2
d      3
e      4
dtype: int64

In [29]:

# 1. Single-element indexing
# Look up by label
ps1['a']

Out[29]:

888

In [28]:

# Look up by position (iloc makes the positional access explicit)
ps1.iloc[0]

Out[28]:

888

In [30]:

# 2. Slicing
# Slice by position (the stop position is excluded)
ps1[2:4]

Out[30]:

c    2
d    3
dtype: int64

In [33]:

# Slice by label (both endpoints are included)
ps1['a':'c']

Out[33]:

a    888
b      1
c      2
dtype: int64

In [35]:

# 3. Selecting non-contiguous elements with a list of labels
ps1[['b','d','e']]

Out[35]:

b    1
d    3
e    4
dtype: int64

In [38]:

# The same selection with a list of positions
ps1.iloc[[1,3,4]]

Out[38]:

b    1
d    3
e    4
dtype: int64
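
Being explicit about labels versus positions matters most when the labels are themselves integers, because then a plain `s[0]` is a label lookup. The sketch below uses a hypothetical Series whose integer labels are deliberately out of order.

```python
import pandas as pd

# Integer labels that do not match the positions.
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s.loc[0]       # the element whose label is 0
by_position = s.iloc[0]   # the first element, whatever its label

print(by_label, by_position)
```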

In [39]:

# 4. Boolean indexing (keep the elements for which the condition is True)
ps1[ps1>2]

Out[39]:

a    888
d      3
e      4
dtype: int64
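
Boolean conditions can be combined with `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than the comparisons. A sketch using the same values as `ps1`; the name `in_range` is illustrative.

```python
import pandas as pd

s = pd.Series([888, 1, 2, 3, 4], index=list('abcde'))

# Both conditions must hold; note the parentheses around each one.
in_range = (s > 1) & (s < 100)

print(s[in_range])    # elements inside the range
print(s[~in_range])   # everything else
```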

In [42]:

# Indexing a DataFrame
pd1

Out[42]:

	A	B	C
a	9	1	2
b	1	1	1
c	11	7	8

In [43]:

# 1. Select a column by label
pd1['A']

Out[43]:

a     9
b     1
c    11
Name: A, dtype: int64

In [47]:

# Positional lookup
# Note: pd1[0] does not select by position; it looks for a column labelled 0 and raises a KeyError here
# pd1[0]

In [48]:

# Select several columns with a list of labels
pd1[['A','C']]

Out[48]:

	A	C
a	9	2
b	1	1
c	11	8

In [49]:

# Look up a single value (column first, then row)
pd1['A']['a']

Out[49]:

9

In [ ]:

# 2. Slicing

In [50]:

# Note: slicing with [] selects rows, not columns
pd1[:2]

Out[50]:

	A	B	C
a	9	1	2
b	1	1	1

In [51]:

pd1['a':'c']

Out[51]:

	A	B	C
a	9	1	2
b	1	1	1
c	11	7	8

In [54]:

# 3.1 Slice rows and columns by label with loc (both endpoints are included)
pd1.loc['a':'b','A':'C']

Out[54]:

	A	B	C
a	9	1	2
b	1	1	1

In [57]:

# 3.2 Slice rows and columns by position with iloc (the stop position is excluded)
pd1.iloc[0:2,1:3]

Out[57]:

	B	C
a	1	2
b	1	1
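
The two accessors treat the end of a slice differently: `loc` includes the end label, while `iloc` excludes the stop position, as Python slices do. The sketch below selects the same 2x2 block both ways on a fresh frame named `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

by_label = df.loc['a':'b', 'A':'B']   # end labels 'b' and 'B' are included
by_pos = df.iloc[0:2, 0:2]            # stop position 2 is excluded

print(by_label)
print(by_label.equals(by_pos))
```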
