Saving data scraped with Python to a database

There are many ways to save scraped data to a database. A convenient option is the sqlite3 module that is built into Python.

# Import the required modules, including the database library
 import urllib.request
 import re
 import sqlite3
 # Function that scrapes the data
 def get_content(page, key):
     url = 'https://search.51job.com/list/010000%2C020000%2C030200%2C040000,000000,0000,00,9,99,' + key + ',2,' + str(page) + '.html'
     a = urllib.request.urlopen(url)
     HTML = a.read().decode('gbk')
     lst = re.findall(r'<span class="t3">(北京|上海|广州|深圳).*?</span>\s+<span class="t4">(\d+\.?\d?)-(\d+\.?\d?)(万|千)/(年|月)</span>', HTML)  # keep only listings in the four target cities that state a salary range
     return lst
 # Connect to the database with sqlite3 and create the jobs table
 conn = sqlite3.connect('51.db')
 c = conn.cursor()
 c.execute('''CREATE TABLE IF NOT EXISTS jobs
         (key text, addr text, min float, max float)''')
 c.execute('''delete from jobs''')
 conn.commit()  # commit the transaction
 # Write the data to both the 51.txt file and the database
 with open('51.txt', 'w', encoding='utf-8') as f:
     f.write('%s\t%s\t%s\t%s\n' % ('key','addr','min','max'))
     for key in ('python', 'java'):
         for each in range(1, 11):
             for items in get_content(each, key):
                 low = float(items[1])
                 high = float(items[2])
                 if items[3] == "千":    # normalize units for comparison: thousands -> ten-thousands (万)
                     low /= 10
                     high /= 10
                 if items[4] == "年":    # yearly salary -> monthly
                     low /= 12
                     high /= 12
                 f.write('%s\t%s\t%s\t%s\n' % (key, items[0], round(low, 2), round(high, 2)))
                 c.execute("INSERT INTO jobs VALUES (?,?,?,?)", (key, items[0], round(low, 2), round(high, 2)))
 conn.commit()
 conn.close()
 # Entry point: call get_content for the first page of python listings and print the result as a quick check
 if __name__ == '__main__':
     lst = get_content(1, 'python')
     print(lst)
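
The parsing and unit conversion in the script can be checked without touching the network. Below is a minimal sketch under a few assumptions: SAMPLE_HTML is a hand-written fragment made up for illustration (not real 51job output), to_monthly_wan is a hypothetical helper that repeats the script's conversion logic, and an in-memory database stands in for 51.db.

```python
import re
import sqlite3

# Same pattern as in get_content, compiled once.
PATTERN = re.compile(
    r'<span class="t3">(北京|上海|广州|深圳).*?</span>\s+'
    r'<span class="t4">(\d+\.?\d?)-(\d+\.?\d?)(万|千)/(年|月)</span>'
)

# Hypothetical sample markup, written by hand for this sketch.
SAMPLE_HTML = (
    '<span class="t3">北京-朝阳区</span>\n'
    '<span class="t4">1-1.5万/月</span>\n'
    '<span class="t3">上海</span>\n'
    '<span class="t4">8-9千/月</span>\n'
    '<span class="t3">深圳-南山区</span>\n'
    '<span class="t4">12-24万/年</span>\n'
)


def to_monthly_wan(low, high, unit, period):
    """Convert a salary range to 万 (ten thousand) per month."""
    low, high = float(low), float(high)
    if unit == '千':      # thousands -> ten-thousands
        low, high = low / 10, high / 10
    if period == '年':    # per year -> per month
        low, high = low / 12, high / 12
    return round(low, 2), round(high, 2)


# Build rows in the same (key, addr, min, max) layout as the jobs table.
rows = [
    ('python', city, *to_monthly_wan(low, high, unit, period))
    for city, low, high, unit, period in PATTERN.findall(SAMPLE_HTML)
]

conn = sqlite3.connect(':memory:')  # in-memory database, nothing is written to disk
conn.execute('CREATE TABLE IF NOT EXISTS jobs '
             '(key text, addr text, min float, max float)')
conn.executemany('INSERT INTO jobs VALUES (?,?,?,?)', rows)
conn.commit()

stored = conn.execute('SELECT COUNT(*) FROM jobs').fetchone()[0]
for row in conn.execute('SELECT * FROM jobs'):
    print(row)
conn.close()
```

Using executemany inserts all rows with one call, which is a small change from the script's row-by-row execute and is only a design choice for this sketch.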

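The sqlite3 and pymysql modules differ in several ways. First, sqlite3 ships with Python's standard library and drives an embedded database, so there is nothing extra to download: you simply import it. pymysql is a third-party package that has to be installed separately with pip, and it also needs a running MySQL server to connect to, so there is more that can go wrong during setup. One further detail: the placeholder in sqlite3 is ?, whereas in pymysql it is %s.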
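
The placeholder difference can be illustrated with a short sketch. It only runs the sqlite3 side, against an in-memory database; the pymysql form appears as a comment because it would need a MySQL server, and the sample rows are made up for illustration.

```python
import sqlite3

# DB-API modules advertise their placeholder style in `paramstyle`.
print(sqlite3.paramstyle)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE jobs (key text, addr text, min float, max float)')

# sqlite3: one ? per value, with the values passed as a separate tuple.
row = ('python', '北京', 1.0, 1.5)
conn.execute('INSERT INTO jobs VALUES (?, ?, ?, ?)', row)

# With pymysql the same statement would use %s placeholders instead:
#     cursor.execute('INSERT INTO jobs VALUES (%s, %s, %s, %s)', row)
# (not executed here)

# Placeholders also keep values with quotes from breaking the SQL text.
tricky = ("py'thon", '上海', 0.8, 0.9)
conn.execute('INSERT INTO jobs VALUES (?, ?, ?, ?)', tricky)

fetched = conn.execute('SELECT * FROM jobs WHERE addr = ?', ('上海',)).fetchone()
count = conn.execute('SELECT COUNT(*) FROM jobs').fetchone()[0]
print(fetched, count)
conn.close()
```

In both libraries the values are passed as a second argument rather than formatted into the SQL string, so the %s in pymysql is a placeholder and not Python string formatting.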

Source: http://www.bubuko.com/infodetail-3295381.html
