内容目录
Pandas的数据读取¶
方法名 | 描述 | 示例 |
---|---|---|
read_csv() |
读取逗号分隔的文件(csv) | pd.read_csv('file.csv') |
read_excel() |
读取Excel文件 | pd.read_excel('file.xlsx') |
read_json() |
读取JSON文件 | pd.read_json('file.json') |
read_sql() |
从SQL数据库中读取数据 | pd.read_sql('SELECT * FROM table', connection) |
read_html() |
从HTML文件中读取表格数据 | pd.read_html('file.html') |
In [224]:
import numpy as np
import pandas as pd
读写文本格式文件¶
读取文件¶
In [226]:
df = pd.read_csv('testdata/example_scores.csv')
df
Out[226]:
英语 | 数学 | |
---|---|---|
0 | 98.0 | 14.0 |
1 | 19.0 | 83.0 |
2 | 11.0 | 48.0 |
3 | 73.0 | 21.0 |
4 | 71.0 | 25.0 |
5 | 19.0 | NaN |
6 | 19.0 | 6.0 |
7 | 26.0 | 20.0 |
8 | 99.0 | 60.0 |
9 | NaN | 65.0 |
10 | 97.0 | 47.0 |
11 | 54.0 | 87.0 |
12 | 19.0 | 6.0 |
13 | 13.0 | 92.0 |
14 | 1.0 | 91.0 |
15 | 95.0 | 56.0 |
16 | 4.0 | 57.0 |
17 | 19.0 | 28.0 |
18 | 74.0 | 9.0 |
19 | 44.0 | 6.0 |
In [228]:
pd.read_table('testdata/example_scores.csv',sep=',')
Out[228]:
英语 | 数学 | |
---|---|---|
0 | 98.0 | 14.0 |
1 | 19.0 | 83.0 |
2 | 11.0 | 48.0 |
3 | 73.0 | 21.0 |
4 | 71.0 | 25.0 |
5 | 19.0 | NaN |
6 | 19.0 | 6.0 |
7 | 26.0 | 20.0 |
8 | 99.0 | 60.0 |
9 | NaN | 65.0 |
10 | 97.0 | 47.0 |
11 | 54.0 | 87.0 |
12 | 19.0 | 6.0 |
13 | 13.0 | 92.0 |
14 | 1.0 | 91.0 |
15 | 95.0 | 56.0 |
16 | 4.0 | 57.0 |
17 | 19.0 | 28.0 |
18 | 74.0 | 9.0 |
19 | 44.0 | 6.0 |
In [232]:
pd.read_csv('testdata/example_scores_notitle.csv',header=None)
Out[232]:
0 | 1 | |
---|---|---|
0 | 98.0 | 14.0 |
1 | 19.0 | 83.0 |
2 | 11.0 | 48.0 |
3 | 73.0 | 21.0 |
4 | 71.0 | 25.0 |
5 | 19.0 | NaN |
6 | 19.0 | 6.0 |
7 | 26.0 | 20.0 |
8 | 99.0 | 60.0 |
9 | NaN | 65.0 |
10 | 97.0 | 47.0 |
11 | 54.0 | 87.0 |
12 | 19.0 | 6.0 |
13 | 13.0 | 92.0 |
14 | 1.0 | 91.0 |
15 | 95.0 | 56.0 |
16 | 4.0 | 57.0 |
17 | 19.0 | 28.0 |
18 | 74.0 | 9.0 |
19 | 44.0 | 6.0 |
In [233]:
pd.read_csv('testdata/example_scores_notitle.csv',names=['语文','数学'])
Out[233]:
语文 | 数学 | |
---|---|---|
0 | 98.0 | 14.0 |
1 | 19.0 | 83.0 |
2 | 11.0 | 48.0 |
3 | 73.0 | 21.0 |
4 | 71.0 | 25.0 |
5 | 19.0 | NaN |
6 | 19.0 | 6.0 |
7 | 26.0 | 20.0 |
8 | 99.0 | 60.0 |
9 | NaN | 65.0 |
10 | 97.0 | 47.0 |
11 | 54.0 | 87.0 |
12 | 19.0 | 6.0 |
13 | 13.0 | 92.0 |
14 | 1.0 | 91.0 |
15 | 95.0 | 56.0 |
16 | 4.0 | 57.0 |
17 | 19.0 | 28.0 |
18 | 74.0 | 9.0 |
19 | 44.0 | 6.0 |
In [238]:
pd.read_csv('testdata/Employee Sample Data.csv',encoding='latin1')
Out[238]:
EEID | Full Name | Job Title | Department | Business Unit | Gender | Ethnicity | Age | Hire Date | Annual Salary | Bonus % | Country | City | Exit Date | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | E02387 | Emily Davis | Sr. Manger | IT | Research & Development | Female | Black | 55 | 4/8/2016 | $141,604 | 15% | United States | Seattle | 10/16/2021 |
1 | E04105 | Theodore Dinh | Technical Architect | IT | Manufacturing | Male | Asian | 59 | 11/29/1997 | $99,975 | 0% | China | Chongqing | NaN |
2 | E02572 | Luna Sanders | Director | Finance | Speciality Products | Female | Caucasian | 50 | 10/26/2006 | $163,099 | 20% | United States | Chicago | NaN |
3 | E02832 | Penelope Jordan | Computer Systems Manager | IT | Manufacturing | Female | Caucasian | 26 | 9/27/2019 | $84,913 | 7% | United States | Chicago | NaN |
4 | E01639 | Austin Vo | Sr. Analyst | Finance | Manufacturing | Male | Asian | 55 | 11/20/1995 | $95,409 | 0% | United States | Phoenix | NaN |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
995 | E03094 | Wesley Young | Sr. Analyst | Marketing | Speciality Products | Male | Caucasian | 33 | 9/18/2016 | $98,427 | 0% | United States | Columbus | NaN |
996 | E01909 | Lillian Khan | Analyst | Finance | Speciality Products | Female | Asian | 44 | 5/31/2010 | $47,387 | 0% | China | Chengdu | 1/8/2018 |
997 | E04398 | Oliver Yang | Director | Marketing | Speciality Products | Male | Asian | 31 | 6/10/2019 | $176,710 | 15% | United States | Miami | NaN |
998 | E02521 | Lily Nguyen | Sr. Analyst | Finance | Speciality Products | Female | Asian | 33 | 1/28/2012 | $95,960 | 0% | China | Chengdu | NaN |
999 | E03545 | Sofia Cheng | Vice President | Accounting | Corporate | Female | Asian | 63 | 7/26/2020 | $216,195 | 31% | United States | Miami | NaN |
1000 rows × 14 columns
In [240]:
# 创建层次化索引
pa = pd.read_csv('testdata/Employee Sample Data.csv',encoding='latin1',index_col=['Country','City'])
pa
Out[240]:
EEID | Full Name | Job Title | Department | Business Unit | Gender | Ethnicity | Age | Hire Date | Annual Salary | Bonus % | Exit Date | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Country | City | ||||||||||||
United States | Seattle | E02387 | Emily Davis | Sr. Manger | IT | Research & Development | Female | Black | 55 | 4/8/2016 | $141,604 | 15% | 10/16/2021 |
China | Chongqing | E04105 | Theodore Dinh | Technical Architect | IT | Manufacturing | Male | Asian | 59 | 11/29/1997 | $99,975 | 0% | NaN |
United States | Chicago | E02572 | Luna Sanders | Director | Finance | Speciality Products | Female | Caucasian | 50 | 10/26/2006 | $163,099 | 20% | NaN |
Chicago | E02832 | Penelope Jordan | Computer Systems Manager | IT | Manufacturing | Female | Caucasian | 26 | 9/27/2019 | $84,913 | 7% | NaN | |
Phoenix | E01639 | Austin Vo | Sr. Analyst | Finance | Manufacturing | Male | Asian | 55 | 11/20/1995 | $95,409 | 0% | NaN | |
… | … | … | … | … | … | … | … | … | … | … | … | … | |
Columbus | E03094 | Wesley Young | Sr. Analyst | Marketing | Speciality Products | Male | Caucasian | 33 | 9/18/2016 | $98,427 | 0% | NaN | |
China | Chengdu | E01909 | Lillian Khan | Analyst | Finance | Speciality Products | Female | Asian | 44 | 5/31/2010 | $47,387 | 0% | 1/8/2018 |
United States | Miami | E04398 | Oliver Yang | Director | Marketing | Speciality Products | Male | Asian | 31 | 6/10/2019 | $176,710 | 15% | NaN |
China | Chengdu | E02521 | Lily Nguyen | Sr. Analyst | Finance | Speciality Products | Female | Asian | 33 | 1/28/2012 | $95,960 | 0% | NaN |
United States | Miami | E03545 | Sofia Cheng | Vice President | Accounting | Corporate | Female | Asian | 63 | 7/26/2020 | $216,195 | 31% | NaN |
1000 rows × 12 columns
In [259]:
# 分块读取大文件
pa2 = pd.read_csv('testdata/Employee Sample Data.csv',encoding='latin1',chunksize=10)
# print(pa2.get_chunk(5))
pa2.get_chunk(5)
Out[259]:
EEID | Full Name | Job Title | Department | Business Unit | Gender | Ethnicity | Age | Hire Date | Annual Salary | Bonus % | Country | City | Exit Date | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | E02387 | Emily Davis | Sr. Manger | IT | Research & Development | Female | Black | 55 | 4/8/2016 | $141,604 | 15% | United States | Seattle | 10/16/2021 |
1 | E04105 | Theodore Dinh | Technical Architect | IT | Manufacturing | Male | Asian | 59 | 11/29/1997 | $99,975 | 0% | China | Chongqing | NaN |
2 | E02572 | Luna Sanders | Director | Finance | Speciality Products | Female | Caucasian | 50 | 10/26/2006 | $163,099 | 20% | United States | Chicago | NaN |
3 | E02832 | Penelope Jordan | Computer Systems Manager | IT | Manufacturing | Female | Caucasian | 26 | 9/27/2019 | $84,913 | 7% | United States | Chicago | NaN |
4 | E01639 | Austin Vo | Sr. Analyst | Finance | Manufacturing | Male | Asian | 55 | 11/20/1995 | $95,409 | 0% | United States | Phoenix | NaN |
In [257]:
pa3 = pd.read_csv('testdata/Employee Sample Data.csv',encoding='latin1',iterator=True)
pa3.get_chunk(5)
Out[257]:
EEID | Full Name | Job Title | Department | Business Unit | Gender | Ethnicity | Age | Hire Date | Annual Salary | Bonus % | Country | City | Exit Date | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | E02387 | Emily Davis | Sr. Manger | IT | Research & Development | Female | Black | 55 | 4/8/2016 | $141,604 | 15% | United States | Seattle | 10/16/2021 |
1 | E04105 | Theodore Dinh | Technical Architect | IT | Manufacturing | Male | Asian | 59 | 11/29/1997 | $99,975 | 0% | China | Chongqing | NaN |
2 | E02572 | Luna Sanders | Director | Finance | Speciality Products | Female | Caucasian | 50 | 10/26/2006 | $163,099 | 20% | United States | Chicago | NaN |
3 | E02832 | Penelope Jordan | Computer Systems Manager | IT | Manufacturing | Female | Caucasian | 26 | 9/27/2019 | $84,913 | 7% | United States | Chicago | NaN |
4 | E01639 | Austin Vo | Sr. Analyst | Finance | Manufacturing | Male | Asian | 55 | 11/20/1995 | $95,409 | 0% | United States | Phoenix | NaN |
写入文件¶
In [242]:
df2 = pd.read_csv('testdata/example_scores.csv')
df2
Out[242]:
英语 | 数学 | |
---|---|---|
0 | 98.0 | 14.0 |
1 | 19.0 | 83.0 |
2 | 11.0 | 48.0 |
3 | 73.0 | 21.0 |
4 | 71.0 | 25.0 |
5 | 19.0 | NaN |
6 | 19.0 | 6.0 |
7 | 26.0 | 20.0 |
8 | 99.0 | 60.0 |
9 | NaN | 65.0 |
10 | 97.0 | 47.0 |
11 | 54.0 | 87.0 |
12 | 19.0 | 6.0 |
13 | 13.0 | 92.0 |
14 | 1.0 | 91.0 |
15 | 95.0 | 56.0 |
16 | 4.0 | 57.0 |
17 | 19.0 | 28.0 |
18 | 74.0 | 9.0 |
19 | 44.0 | 6.0 |
In [243]:
df2.to_csv('testdata/example_scores_pandas.csv')