1. CSV file storage
1.1 Writing
A simple example
- import csv
- with open('data.csv', 'a') as csvfile:
-     writer = csv.writer(csvfile)  # create the writer object, passing in the file handle
-     writer.writerow(['id', 'name', 'age'])  # call writerow() with the data for each row
-     writer.writerow(['1', 'rose', '18'])
-     writer.writerow(['2', 'john', '19'])
Opened as plain text, the file looks like this (the default delimiter is a comma):
- id,name,age
- 1,rose,18
- 2,john,19
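To change the default delimiter:
writer = csv.writer(csvfile, delimiter=' ')  # use a space as the delimiter
To write several rows at once, call writerows():
- # the argument is now a two-dimensional list
- writer.writerows([['1', 'rose', '18'], ['2', 'john', '19']])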
To avoid blank lines between rows (this happens on Windows), pass newline='' when opening the file:
with open("test.csv", "a+", newline='') as csvfile:
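The options above (a custom delimiter, writerows(), and newline='') can be combined. The following is a minimal sketch rather than a definitive recipe; the temporary directory is an assumption made only so the snippet does not touch real files.

```python
import csv
import os
import tempfile

rows = [
    ["id", "name", "age"],
    ["1", "rose", "18"],
    ["2", "john", "19"],
]

# Write into a throwaway directory (an assumption, to keep the sketch harmless).
path = os.path.join(tempfile.mkdtemp(), "data.csv")

# newline='' lets the csv module control line endings itself.
with open(path, "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=" ")  # space instead of comma
    writer.writerows(rows)  # one call writes every row in the 2D list

# Read the raw text back, again without newline translation.
with open(path, newline="") as csvfile:
    content = csvfile.read()

print(repr(content))
```

The csv writer ends each row with \r\n by default, which is why opening the file without newline='' can produce doubled line endings on Windows.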
If the data source is a dictionary
- import csv
- with open('data1.csv', 'a') as csvfile:
-     fieldnames = ['id', 'name', 'age']  # define the header
-     writer = csv.DictWriter(csvfile, fieldnames=fieldnames)  # create a DictWriter, passing in the file handle and the header
-     writer.writeheader()  # write the header row
-     writer.writerow({'id': '1', 'name': 'rose', 'age': 18})  # write a data row
To avoid encoding problems, specify the encoding in the open() call:
open('data.csv', 'a', encoding='utf-8')
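As a rough illustration of why the encoding matters, the sketch below writes a row containing Chinese text with DictWriter and reads it back. The temporary path and the sample value 玫瑰 are assumptions chosen for the demonstration.

```python
import csv
import os
import tempfile

# Throwaway location (an assumption, so no real file is modified).
path = os.path.join(tempfile.mkdtemp(), "data1.csv")
fieldnames = ["id", "name", "age"]

# Write with an explicit encoding so non-ASCII text is stored predictably.
with open(path, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({"id": "1", "name": "玫瑰", "age": 18})

# Read with the same encoding; every value comes back as a string.
with open(path, "r", newline="", encoding="utf-8") as csvfile:
    rows = list(csv.reader(csvfile))

print(rows)  # [['id', 'name', 'age'], ['1', '玫瑰', '18']]
```

Reading and writing with the same explicit encoding avoids depending on the platform default, which differs between systems.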
The to_csv() method of a pandas DataFrame can also write data to a CSV file.
1.2 Reading
- import csv
- with open('data1.csv', 'r') as csvfile:
-     reader = csv.reader(csvfile)
-     for row in reader:
-         print(row)
The output is:
- ['id', 'name', 'age']
- ['1', 'rose', '18']
Tip: if the file contains Chinese text, specify the file encoding when opening it.
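When the first row is a header, csv.DictReader (the reading counterpart of DictWriter) may be more convenient, since it returns each row keyed by column name. This is only a sketch; it reads from an in-memory io.StringIO buffer instead of a file, which is an assumption made to keep it self-contained.

```python
import csv
import io

# In-memory stand-in for a CSV file (an assumption for this sketch).
csvfile = io.StringIO("id,name,age\n1,rose,18\n2,john,19\n")

# DictReader takes the first row as the field names.
reader = csv.DictReader(csvfile)
records = [dict(row) for row in reader]

for record in records:
    print(record["name"], record["age"])
```

As with csv.reader, every value is returned as a string, so numeric columns need an explicit conversion such as int(record["age"]).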
Using the pandas read_csv() function
- import pandas as pd
- df = pd.read_csv('data.csv')
- print(df)
The output is:
-    id  name  age
- 0   1  rose   18
- 1   2  john   19
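1.3 Avoiding duplicate header rows
- import csv
- # newline='' prevents a blank line from being added on every insert
- # keyword and miaoshu hold the values for one row and are defined elsewhere
- with open("test.csv", "a+", newline='') as csvfile:  # a+ opens the file for appending
-     writer = csv.writer(csvfile)
-     # open the same file for reading and use csv.reader to check whether a header already exists
-     with open("test.csv", "r", newline="") as f:
-         reader = csv.reader(f)
-         if not [row for row in reader]:
-             writer.writerow(["型号", "分类"])  # header: model, category
-             writer.writerows([[keyword, miaoshu]])
-         else:
-             writer.writerows([[keyword, miaoshu]])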
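An alternative to reading the whole file back is to check whether the file is missing or empty before appending. The sketch below is one possible way to do that, not the approach used above; append_row is a hypothetical helper, and the sample values are made up for illustration.

```python
import csv
import os
import tempfile


def append_row(path, header, row):
    """Append one row, writing the header first only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        if need_header:
            writer.writerow(header)
        writer.writerow(row)


# Throwaway file and made-up rows (assumptions for this sketch).
path = os.path.join(tempfile.mkdtemp(), "test.csv")
header = ["model", "category"]
append_row(path, header, ["6968", "printer"])
append_row(path, header, ["8710", "printer"])

with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

print(rows)  # the header appears once, followed by both rows
```

Checking the file size avoids loading every existing row into memory just to find out whether a header is present, which matters more as the file grows.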
Example
Scrape all the reviews on this page: https://www.bestbuy.ca/en-ca/product/hp-hp-officejet-pro-6968-all-in-one-inkjet-printer-with-fax-6968/10441056/review
- import requests
- import time
- import csv
- headers = {
-     "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) "
-                   "Version/11.0 Mobile/15A372 Safari/604.1",
-     "Referer": "https://www.bestbuy.ca/en-ca/product/hp-hp-officejet-pro-6968-all-in-one-inkjet-printer-with-fax-"
-                "6968/10441056/review"
- }
- def get_content(url):
-     """Fetch the data."""
-     res = requests.get(url=url, headers=headers)
-     # print(res.status_code)
-     return res.json()
- def parse_res(res):
-     """Parse the data."""
-     csv_data = {}
-     # print(res, type(res))
-     data = res["reviews"]
-     for i in data:
-         csv_data["title"] = i["title"]
-         csv_data["comment"] = i["comment"]
-         csv_data["publish"] = i["reviewerName"]
-         csv_data["publish_time"] = i["submissionTime"]
-         print(csv_data)
-         save_data(csv_data)
- def save_data(csv_data):
-     """Save the data."""
-     with open('data.csv', 'a+', newline='') as csvfile:
-         # open the file for reading to check whether it already contains data
-         with open('data.csv', 'r', newline='') as f:
-             reader = csv.reader(f)
-             fieldnames = ['title', 'comment', 'publish', 'publish_time']
-             writer = csv.DictWriter(csvfile, fieldnames=fieldnames)  # DictWriter writes rows from dictionaries
-             if not [row for row in reader]:
-                 writer.writeheader()
-                 writer.writerow(csv_data)
-             else:
-                 writer.writerow(csv_data)
- if __name__ == '__main__':
-     for i in range(1, 11):
-         url = ('https://www.bestbuy.ca/api/v2/json/reviews/10441056?source=all&lang=en-CA&pageSize=10&page=%s'
-                '&sortBy=date&sortDir=desc' % i)
-         res = get_content(url)
-         time.sleep(2)
-         parse_res(res)
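The parsing and saving steps can be tried without any network access. The sketch below is an offline variant under stated assumptions: parse_reviews is a hypothetical helper that returns a list instead of saving each row immediately, the sample payload is invented and only mimics the keys the scraper reads, and the rows go to an in-memory buffer rather than data.csv.

```python
import csv
import io

FIELDNAMES = ["title", "comment", "publish", "publish_time"]


def parse_reviews(res):
    """Return one dict per review, keeping only the four saved fields."""
    return [
        {
            "title": item["title"],
            "comment": item["comment"],
            "publish": item["reviewerName"],
            "publish_time": item["submissionTime"],
        }
        for item in res["reviews"]
    ]


# Invented payload with the same keys the scraper reads (not real data).
sample = {
    "reviews": [
        {
            "title": "Great",
            "comment": "Works well",
            "reviewerName": "rose",
            "submissionTime": "2019-01-01T00:00:00",
        }
    ]
}

rows = parse_reviews(sample)

# Write the header once, then all rows in a single call.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(rows)

lines = buffer.getvalue().splitlines()
print(lines)
```

Collecting the rows first and writing them with one writerows() call means the file only needs to be opened once per page rather than once per review.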
2. JSON file storage
2.1 Reading JSON
- import json
- s = '''
- [{
-     "name": "rose",
-     "gender": "female",
-     "age": "18"
- }]
- '''
- data = json.loads(s)
- print(data)
- print(type(data))
The output is:
- [{'name': 'rose', 'gender': 'female', 'age': '18'}]
- <class 'list'>  # because the outermost element is a list
Reading a JSON file
- import json
- with open('data.json', 'r') as f:
-     s = f.read()
-     data = json.loads(s)
-     print(data)
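The read-then-parse steps can also be collapsed with json.load(), which parses directly from a file object. A minimal sketch, assuming a temporary file created on the spot so the example runs on its own:

```python
import json
import os
import tempfile

# Create a small JSON file to read (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('[{"name": "rose", "gender": "female", "age": "18"}]')

# json.load() takes the file object; json.loads() takes a string.
with open(path, "r", encoding="utf-8") as f:
    data = json.load(f)

print(data[0]["name"])  # rose
```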
2.2 Writing JSON
- import json
- data = [{
-     "name": "rose",
-     "gender": "female",
-     "age": "18"
- }]
- with open('data.json', 'w') as f:  # 'w' overwrites; appending a second document with 'a' would leave invalid JSON
-     f.write(json.dumps(data))
Indent by 2 spaces to make the structure clearer:
- with open('data.json', 'w') as f:
-     f.write(json.dumps(data, indent=2))
The output is:
- [
-   {
-     "name": "rose",
-     "gender": "female",
-     "age": "18"
-   }
- ]
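If the output contains Chinese characters, you need to pass ensure_ascii=False; otherwise they are written as \uXXXX Unicode escape sequences by default. Specify the file encoding as well:
- with open('data.json', 'w', encoding='utf-8') as f:
-     f.write(json.dumps(data, indent=2, ensure_ascii=False))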
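The effect of ensure_ascii is easiest to see by comparing the two outputs side by side. The sketch below uses a made-up value, 玫瑰, purely for illustration.

```python
import json

data = [{"name": "玫瑰"}]  # sample value chosen for the demonstration

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keeps the characters as-is

print(escaped)   # the name appears as \u escape sequences
print(readable)  # the name appears as 玫瑰

# Both strings decode back to the same Python object.
print(json.loads(escaped) == json.loads(readable))  # True
```

Both forms are valid JSON and round-trip to the same data; ensure_ascii=False only changes how readable the file is to a person.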
Source: http://www.bubuko.com/infodetail-3162506.html