当前位置：

首页
/
IT
/
程序
/
Objective-C
/
2Python进阶强化训练之csv|json|xml|excel高

2Python进阶强化训练之csv|json|xml|excel高

excel 如何

Python 进阶强化训练之 csv|json|xml|excel 高

如何读写 csv 数据？

实际案例

我们可以通过 http://table.finance.yahoo.com/table.csv?s=000001.sz，这个 url 获取中国股市 (深市) 数据集，它以 csv 数据格式存储：

Date,
Open,
High,
Low,
Close,
Volume,
Adj Close 2016 - 09 - 15,
9.06,
9.06,
9.06,
9.06,
000,
9.06 2016 - 09 - 14,
9.17,
9.18,
9.05,
9.06,
42148100,
9.06 2016 - 09 - 13,
9.18,
9.21,
9.14,
9.19,
46093100,
9.19 2016 - 09 - 12,
9.29,
9.32,
9.13,
9.16,
75658100,
9.16 2016 - 09 - 09,
9.40,
9.43,
9.36,
9.38,
32743100,
9.38 2016 - 09 - 08,
9.39,
9.42,
9.38,
9.40,
29521800,
9.40 2016 - 09 - 07,
9.41,
9.42,
9.37,
9.40,
45937300,
9.40 2016 - 09 - 06,
9.42,
9.43,
9.36,
9.41,
57473800,
9.41 2016 - 09 - 05,
9.46,
9.46,
9.40,
9.42,
46993600,
9.42 2016 - 09 - 02,
9.43,
9.46,
9.42,
9.45,
36879600,
9.45 2016 - 09 - 01,
9.49,
9.52,
9.42,
9.45,
48013100,
9.45 2016 - 08 - 31,
9.46,
9.50,
9.43,
9.49,
48974600,
9.49 2016 - 08 - 30,
9.42,
9.47,
9.41,
9.47,
59508100,
9.47 2016 - 08 - 29,
9.41,
9.45,
9.38,
9.42,
56523100,
9.42 2016 - 08 - 26,
9.45,
9.47,
9.40,
9.45,
50223300,
9.45 2016 - 08 - 25,
9.42,
9.45,
9.34,
9.44,
61738900,
9.44 2016 - 08 - 24,
9.41,
9.43,
9.38,
9.43,
73228100,
9.43

请将平安银行这支股票，在 2016 年中成交量超过

14000000

的记录存储到另一个 csv 文件中

解决方案

使用标准库中的 csv 模块，可以使用其中

reader

和

writer

完成文件读写

下载数据

>>> from urllib import urlretrieve
# 获取平安银行股票信息，保存到pingan.csv文件中
>>> urlretrieve('http://table.finance.yahoo.com/table.csv?s=000001.sz', 'pingan.csv')
('pingan.csv', <httplib.HTTPMessage instance at 0x1a997e8>)

使用 csv 模块进行读

>>> import csv
>>> rf = open('pingan.csv', 'rb')
>>> reader = csv.reader(rf)
# 获取的对象是一个可迭代的
>>> reader.next()
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
>>> reader.next()
['2016-09-15', '9.06', '9.06', '9.06', '9.06', '000', '9.06']

使用 csv 模块进行写

>>> wf = open('pingan_copy.csv', 'wb')
>>> writer = csv.writer(wf)
>>> writer.writerow(['2016-09-14', '9.17', '9.18', '9.05', '9.06', '42148100', '9.06'])
>>> writer.writerow(reader.next())                                                     
>>> wf.flush()

查看写入的文件内容

[[email protected] ~]# cat pingan_copy.csv 2016 - 09 - 14,
9.17,
9.18,
9.05,
9.06,
42148100,
9.06 2016 - 09 - 13,
9.18,
9.21,
9.14,
9.19,
46093100,
9.19

如上的问题解决方案如下：

#!/use/bin/env python
# _*_ coding:utf-8 _*_
import csv
with open('pingan.csv', 'rb') as rf:
    reader = csv.reader(rf)
    with open('pingan2016.csv', 'wb') as wf:
        writer = csv.writer(wf)
        headers = reader.next()
        writer.writerow(headers)
        for row in reader:
            if row[0] < '2016-01-01':
                break
            if int(row[5]) >= 50000000:
                writer.writerow(row)

如何读写 json 数据？

实际案例

在 web 应用中常用 JSON(JavaScript Object Notation) 格式传输数据，在 python 中如何读写 json 数据

解决方案

使用标准库中的 json 模块，其中

loads

、

dumps

函数可以完成 json 数据的读写

将数据类型转换为字符串

>>>  import json# 创建一个列表 >>>  l  =   [1, 2, 'asd', {'blgo_url',
    'ansheng.me'
}]# 使用dumps转换为字符串 >>>  json.dumps(l)' [1,  2,  "asd",   {
    "blgo_url":  "ansheng.me"
}]'# 去掉空格 >>>  json.dumps(l,  separators = [', ', ': '])' [1, 2, "asd", {
    "blgo_url": "ansheng.me"
}]'# 排序 >>>  d  =   {'b': None,
    'a': '111',
    'g': 'Null'
} >>>  json.dumps(d,  sort_keys = True)' {
    "a":  "111",
     "b":  null,
     "g":  "Null"
}'

将字符串转换为数据类型

>>>  json.loads(' [1, 2, "asd", {
    "blgo_url": "ansheng.me"
}]')[1,  2,  'asd',   {'blgo_url':  'ansheng.me'
}]

如何解析简单的 xml 文档？

实际案例

如以下

XML

文档，如何使用 python 进行解析？

<?xml version="1.0"  encoding="utf-8"  ?>
    <data>
            
        <country name="Liechtenstein">
                    
            <rank update="yes">
                2
                </rank>
                        
                <year>
                    2008
                </year>
                        
                <gdppc>
                    141100
                </gdppc>
                        
                <neighbor name="Austria"  direction="E" />
                        
                <neighbor name="Switzerland"  direction="W" />
                    
                </country>
                    
                <country name="Singapore">
                            
                    <rank update="yes">
                        5
                        </rank>
                                
                        <year>
                            2011
                        </year>
                                
                        <gdppc>
                            59900
                        </gdppc>
                                
                        <neighbor name="Malaysia"  direction="N" />
                            
                        </country>
                            
                        <country name="Panama">
                                    
                            <rank update="yes">
                                69
                                </rank>
                                        
                                <year>
                                    2011
                                </year>
                                        
                                <gdppc>
                                    13600
                                </gdppc>
                                        
                                <neighbor name="Costa Rica"  direction="W" />
                                        
                                <neighbor name="Colombia"  direction="E" />
                                    
                                </country>
    </data>

解决方案

可以使用标准库中的

xml.etree.ElementTree

，其中的

parse

函数可以解析 XML 文档

# 导入parse
>>> from xml.etree.ElementTree import parse
>>> f = open('a.xml')
# 获得ElementTree对象
>>> et = parse(f)
# 获取根节点，也就是data
>>> root = et.getroot()
>>> root
<Element 'data' at 0x00000203ECA1E728>
# 查看标签
>>> root.tag
'data'
# 属性
>>> root.attrib
{}
# 文本
>>> root.text.strip()
''
# 获得一个节点的子元素，然后在获取每个子元素的属性
>>> for child in root: print(child.get('name'))
...
Liechtenstein
Singapore
Panama
# 根据标签寻找子元素，每次之寻找第一个
>>> root.find('country')
<Element 'country' at 0x00000203ECBD3228>
# 寻找所有
>>> root.findall('country')
[<Element 'country' at 0x00000203ECBD3228>, <Element 'country' at 0x00000203ECBDBC28>, <Element 'country' at 0x00000203ECBDBDB8>]
# 获得一个生成器对象
>>> root.iterfind('country')
<generator object prepare_child.<locals>.select at 0x00000203ECBC5FC0>
>>> for e in root.iterfind('country'): print(e.get('name'))
...
Liechtenstein
Singapore
Panama
# 获取所有的元素节点
>>> list(root.iter())
[<Element 'data' at 0x00000203ECA1E728>, <Element 'country' at 0x00000203ECBD3228>, <Element 'rank' at 0x00000203ECBDBA98>, <Element 'year' at 0x00000203ECBDBAE8>, <Element 'gdppc' at 0x00000203ECBDBB38>, <Element 'neighbor' at 0x00000203ECBDBB88>, <Element 'neighbor' at 0x00000203ECBDBBD8>, <Element 'country' at 0x00000203ECBDBC28>, <Element 'rank' at 0x00000203ECBDBC78>, <Element 'year' at 0x00000203ECBDBCC8>, <Element 'gdppc' at 0x00000203ECBDBD18>, <Element 'neighbor' at 0x00000203ECBDBD68>, <Element 'country' at 0x00000203ECBDBDB8>, <Element 'rank' at 0x00000203ECBDBE08>, <Element 'year' at 0x00000203ECBDBE58>, <Element 'gdppc' at 0x00000203ECBDBEA8>, <Element 'neighbor' at 0x00000203ECBDBEF8>, <Element 'neighbor' at 0x00000203ECBDBF48>]
# 寻找标签为rank的子节点
>>> list(root.iter('rank'))
[<Element 'rank' at 0x00000203ECBDBA98>, <Element 'rank' at 0x00000203ECBDBC78>, <Element 'rank' at 0x00000203ECBDBE08>]

查找的高级用法

# 匹配country下的所有子节点 >>>  root.findall('country
/*')
[<Element 'rank' at 0x00000203ECBDBA98>, <Element 'year' at 0x00000203ECBDBAE8>, <Element 'gdppc' at 0x00000203ECBDBB38>, <Element 'neighbor' at 0x00000203ECBDBB88>, <Element 'neighbor' at 0x00000203ECBDBBD8>, <Element 'rank' at 0x00000203ECBDBC78>, <Element 'year' at 0x00000203ECBDBCC8>, <Element 'gdppc' at 0x00000203ECBDBD18>, <Element 'neighbor' at 0x00000203ECBDBD68>, <Element 'rank' at 0x00000203ECBDBE08>, <Element 'year' at 0x00000203ECBDBE58>, <Element 'gdppc' at 0x00000203ECBDBEA8>, <Element 'neighbor' at 0x00000203ECBDBEF8>, <Element 'neighbor' at 0x00000203ECBDBF48>]
# 找到所有节点的rank
>>> root.findall('.//rank')
[<Element 'rank' at 0x00000203ECBDBA98>, <Element 'rank' at 0x00000203ECBDBC78>, <Element 'rank' at 0x00000203ECBDBE08>]
# 找到rank的父对象
>>> root.findall('.//rank/..')
[<Element 'country' at 0x00000203ECBD3228>, <Element 'country' at 0x00000203ECBDBC28>, <Element 'country' at 0x00000203ECBDBDB8>]
# 查找country包含name属性的
>>> root.findall('country[@name]')
[<Element 'country' at 0x00000203ECBD3228>, <Element 'country' at 0x00000203ECBDBC28>, <Element 'country' at 0x00000203ECBDBDB8>]
>>> root.findall('country[@age]')
[]
# 查找属性等于特定值的元素
>>> root.findall('country[@name="Singapore"]')
[<Element 'country' at 0x00000203ECBDBC28>]
# 查找必须包含某一个子元素
>>> root.findall('country[rank]')
[<Element 'country' at 0x00000203ECBD3228>, <Element 'country' at 0x00000203ECBDBC28>, <Element 'country' at 0x00000203ECBDBDB8>]
# 查找子元素等于指定的值
>>> root.findall('country[rank="5"]')
[<Element 'country' at 0x00000203ECBDBC28>]
# 根据位置查找，从1开始
>>> root.findall('country')
[<Element 'country' at 0x00000203ECBD3228>, <Element 'country' at 0x00000203ECBDBC28>, <Element 'country' at 0x00000203ECBDBDB8>]
# 根据位置查找
>>> root.findall('country[1]')
[<Element 'country' at 0x00000203ECBD3228>]
# 倒数第一个
>>> root.findall('country[last()]')
[<Element 'country' at 0x00000203ECBDBDB8>]
# 倒数第二个
>>> root.findall('country[last()-1]')
[<Element 'country' at 0x00000203ECBDBC28>]*/

如何构建 xml 文档？

实际案例

某些时候，我们需要将其他格式数据转换为 xml，例如下面的字符串如何转换为 XML？

解决方案

使用标准库中的

xml.etree.ElementTree

，构建

ElementTree

，使用

write

方法写入文件

>>> from xml.etree.ElementTree import Element, ElementTree
>>> from xml.etree.ElementTree import tostring
>>> e = Element('Data')
>>> e.set('name', 'abc')
>>> e.text = '123'
# 查看结果
>>> tostring(e)
b'<Data name="abc">123</Data>'
>>> e2 = Element('Row')
>>> e3 = Element('Open')
>>> e3.text = '9.17'
>>> e2.append(e3)
>>> e.text = None
>>> e.append(e2)
>>> tostring(e)
b'<Data name="abc"><Row><Open>9.17</Open></Row></Data>'
>>> et = ElementTree(e)
>>> et.write('demo.xml')

解决如上问题的脚本如下：

import csv
from xml.etree.ElementTree import Element, ElementTree
def pretty(e, level=0):
    if len(e) > 0:
        e.text = '\n' + '\t' * (level + 1)
        for child in e:
            pretty(child, level + 1)
        child.tail = child.tail[:-1]
    e.tail = '\n' + '\t' * level
    
def csvToXML(fname):
    with open(fname, 'rb') as f:
        reader = csv.reader(f)
        headers = reader.next()
        root = Element('Data')
        for row in reader:
            eRow = Element('Row')
            root.append(eRow)
            for tag, text in zip(headers, row):
                e = Element(tag)
                e.text = text
                eRow.append(e)
    pretty(root)
    return ElementTree(root)
    
et = csvToXML('pingan.csv')
et.write('pingan.xml')

转换好的文件内容为：

<Data>
            
    <Row>
                        
        <Date>
            2016-09-15
        </Date>
                        
        <Open>
            9.06
        </Open>
                        
        <High>
            9.06
        </High>
                        
        <Low>
            9.06
        </Low>
                        
        <Close>
            9.06
        </Close>
                        
        <Volume>
            000
        </Volume>
                        
        <Adj Close>
            9.06
        </Adj Close>
                
    </Row>
            
    <Row>
                        
        <Date>
            2016-09-14
        </Date>
                        
        <Open>
            9.17
        </Open>
                        
        <High>
            9.18
        </High>
                        
        <Low>
            9.05
        </Low>
                        
        <Close>
            9.06
        </Close>
                        
        <Volume>
            42148100
        </Volume>
                        
        <Adj Close>
            9.06
        </Adj Close>
                
    </Row>
            .....

如何读写 excel 文件？

实际案例

Microsoft Excel 是日常办公中使用最频繁的软件，起数据格式为 xls,xlsx, 一种非常常用的电子表格。

某小学某班成绩记录在 excel 了文件中，内容如下：

姓名    语文    数学    外语
小明    95    96    94
张三    85    84    92
王五    86    85    75
小哈    96    92    100

利用 python 读写 excel 了，添加总分列，计算每人总分

解决方案

使用第三方库 xlrd 和 xlwt，这两个库分别用于 excel 读和写

安装这两个模块

pip3 install xlrd
pip3 install xlwt

脚本如下：

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# 导入xlrd和xlwt
import xlrd
import xlwt
# 打开excel文件
rbook = xlrd.open_workbook('x.xlsx')
# 表
rsheet = rbook.sheet_by_index(0)
# 添加一个总分的列
# 列
nc = rsheet.ncols
# 第0行，列，文本类型，文字，内容
rsheet.put_cell(0, nc, xlrd.XL_CELL_TEXT, '总分', None)
# 迭代表中的所有数据，计算总分
for row in range(1, rsheet.nrows):
    # 计算每行的总分，跳过第0行,0==姓名,sum对列表进行求和，t等于最后加上拿出来的分数
    t = sum(rsheet.row_values(row, 1))
    # 写入数据
    rsheet.put_cell(row, nc, xlrd.XL_CELL_NUMBER, t, None)
    
# 写入到文件中
wbook = xlwt.Workbook()
wsheet = wbook.add_sheet(rsheet.name)
# 对其方式，垂直和水平都是剧中
style = xlwt.easyxf('align: vertical center, horizontal center')
# rsheet的每个单元格写入到wsheet中
for r in range(rsheet.nrows):  # 行
    for c in range(rsheet.ncols):  # 列
        wsheet.write(r, c, rsheet.cell_value(r, c), style)
        
wbook.save('output.xlsx')

计算结果如下：

姓名    语文    数学    外语    总分
小明    95    96    94    285
张三    85    84    92    261
王五    86    85    75    246
小哈    96    92    100    288

#Python

来源: http://www.bubuko.com/infodetail-2065811.html

与本文相关文章

暂无,快来抢沙发吧！