Python Study Notes 3 -- File Operations and Handling JSON

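f = open('file.txt', 'r')  # Open a file read-only and get its file handle; 'r' may be omitted because read-only is the default
# Python 2 also had a file() function for opening files; Python 3 removed it, leaving only open()
res = f.read()  # Read the entire content of the file
print(res)  # Print the entire content
f.close()  # Close the file
f = open('file.txt', 'r')
first_line = f.readline()  # Read the first line; the result is a str, not a list
print(first_line)  # Print the first line
f.close()  # Close the file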

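When opening a file, you specify its path and the mode in which to open it. The call returns a file handle, and every later operation on the file goes through that handle.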

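File open modes:

r: read-only mode (the default). Raises an error if the file does not exist.

w: write-only mode. [Not readable; creates the file if it does not exist; erases the content if it does.]

a: append mode. [Not readable; creates the file if it does not exist; if it exists, writes only go to the end.]

"+" means the file can be both read and written:

r+: read-write mode. [Readable and writable; writes start at the current position (the beginning right after opening), so they overwrite existing content unless you first move to the end; raises an error if the file does not exist.]

w+: write-read mode. [Existing content is erased on open; what has been written can be read back.]

a+: append-read mode. [Readable; creates the file if it does not exist; writes always go to the end.]

"U" requests universal newlines: when reading, \r, \n and \r\n are all converted to \n. This flag belongs to Python 2. Python 3 text mode does this conversion by default, rejects "U" combined with "+", and dropped the flag entirely in Python 3.11.

rU

"b" opens the file in binary mode (for example, when uploading an ISO image over FTP). In Python 2 it could be omitted on Linux but was required on Windows. In Python 3 it matters on every platform, because binary mode reads and writes bytes instead of str.

rb
wb
ab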
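The mode descriptions above can be checked with a small experiment. The following is only a sketch: it works in a temporary directory, and the file name `demo.txt` and the helper `read_text` are made up for the illustration.

```python
import os
import tempfile

def read_text(path):
    """Return the whole content of a small text file."""
    with open(path, encoding='utf-8') as f:
        return f.read()

workdir = tempfile.mkdtemp()                    # scratch directory for the demo
path = os.path.join(workdir, 'demo.txt')

with open(path, 'w', encoding='utf-8') as f:    # 'w' creates the file
    f.write('first\n')
with open(path, 'w', encoding='utf-8') as f:    # opening with 'w' again erases it
    f.write('second\n')
after_w = read_text(path)

with open(path, 'a', encoding='utf-8') as f:    # 'a' keeps the content and appends
    f.write('third\n')
after_a = read_text(path)

with open(path, 'r+', encoding='utf-8') as f:   # 'r+' starts writing at the beginning
    f.write('SECOND')                           # overwrites the first six characters
after_r_plus = read_text(path)

try:
    open(os.path.join(workdir, 'missing.txt'), 'r')   # 'r' needs an existing file
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

print(repr(after_w), repr(after_a), repr(after_r_plus), missing_raises)
```

Note that `r+` does not insert text: it overwrites whatever is already at the write position.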

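File object methods:

f = open('file.txt', 'r+', encoding='utf-8')  # the encoding parameter sets the file's text encoding
f.readline()   # read one line
f.readable()   # check whether the file is readable
f.writable()   # check whether the file is writable
f.encoding     # the encoding the file was opened with
f.read()       # read everything; avoid on large files, because the whole content is loaded into memory
f.readlines()  # read everything into a list with one element per line; avoid on large files for the same reason
f.tell()       # get the current position of the file pointer
f.seek(0)      # move the file pointer to the given position (here, the start)
f.write('爱情证书')  # write content
f.flush()      # push buffered writes out to disk right away
f.truncate()   # cut the file off at the current pointer position; call f.seek(0) first to empty it
f.writelines(['爱情证书', '孙燕姿'])  # write every string in the list; no newlines are added
f.close()      # close the file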
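To see how the file pointer interacts with these methods, here is a minimal sketch on a scratch file. The path and the variable names are assumptions made for the example only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')   # hypothetical scratch file

with open(path, 'w+', encoding='utf-8') as f:
    f.writelines(['line1\n', 'line2\n'])   # writelines does not add newlines itself
    f.flush()                              # push the buffered data out
    f.seek(0)                              # rewind to the start
    position_after_seek = f.tell()
    first = f.readline()                   # reads up to and including the newline
    rest = f.readlines()                   # the remaining lines as a list
    can_read, can_write = f.readable(), f.writable()
    f.seek(0)
    f.truncate()                           # pointer is at 0, so the file is emptied
    f.write('fresh\n')
    f.seek(0)
    final = f.read()

print(position_after_seek, repr(first), rest, repr(final))
```

Calling `seek(0)` before `truncate()` is what makes the call empty the file. With the pointer elsewhere, only the part after it is cut off.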

For a small file, you can read it line by line with readline():

f = open('users.txt', encoding='utf-8')  # file object, also called a file handle
while True:
    line = f.readline()
    if line != '':
        print('line:', line)
    else:
        print('Reached the end of the file')
        break
f.close()

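The read() and readlines() methods above load the whole file into memory first, so a large file can make the program very slow or exhaust memory. The efficient approach is to read one line, process it, and move on, so that lines already handled can be released from memory.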

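An efficient way to read a large file:

f = open('users.txt', encoding='utf-8')
for line in f:
    print(line)
f.close()

Here line holds one line of the file per iteration, and the memory for a line can be released once the loop has moved past it.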
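As an illustration of line-by-line processing, the sketch below counts matching lines without ever holding the whole file in memory. The function name `count_matching_lines` and the sample data are hypothetical.

```python
import os
import tempfile

def count_matching_lines(path, keyword):
    """Count lines containing keyword, keeping only one line in memory at a time."""
    count = 0
    with open(path, encoding='utf-8') as f:
        for line in f:                      # the file object yields one line per iteration
            if keyword in line.rstrip('\n'):
                count += 1
    return count

# Build a small sample file so the sketch can run on its own.
sample = os.path.join(tempfile.mkdtemp(), 'users.txt')
with open(sample, 'w', encoding='utf-8') as f:
    f.write('alice,admin\nbob,guest\ncarol,admin\n')

admins = count_matching_lines(sample, 'admin')
print(admins)
```

Each line returned by the loop still ends with its newline character, which is why `print(line)` shows a blank line between entries. `print(line, end='')` avoids that.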

Using with:

It is easy to forget to close a file. The with statement solves this: it closes the file automatically once the block that uses the handle ends. Usage:

with open('file.txt', 'r') as f:  # open a file and bind its handle to f
    for line in f:
        print(line)

with open('file.txt') as fr, open('file_bak', 'w') as fw:  # several files in one with: fr reads file.txt, fw creates file_bak
    for line in fr:  # loop over each line of file.txt
        fw.write(line)  # write it to file_bak
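That the handle really is closed after the block, even when the block fails, can be verified through the `closed` attribute. This is a small sketch using a scratch file; the names are chosen for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'file.txt')   # hypothetical scratch file

with open(path, 'w', encoding='utf-8') as f:
    f.write('hello\n')
closed_after_block = f.closed          # the handle is closed once the block ends

try:
    with open(path, encoding='utf-8') as f2:
        raise ValueError('simulated failure while processing')
except ValueError:
    pass
closed_after_error = f2.closed         # closed even though the block raised

print(closed_after_block, closed_after_error)
```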

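Modifying a file:

There are two ways to modify a file.

The first reads the entire content into memory, empties the original file, and writes the new content back.

The second writes the modified content to a new file, which then takes the place of the original.

Below is a sample file.txt: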

寂寞当然有一点

你不在我身边

总是特别想念你的脸

距离是一份考卷

Method 1, variant a:

# 1. Simple and blunt
f = open('file.txt', encoding='utf-8')
res = f.read().replace('一点', '二点')
f.close()
f = open('file.txt', mode='w', encoding='utf-8')
f.write(res)
f.flush()  # push the buffered content to disk right away
f.close()

file.txt after the replacement:

寂寞当然有二点

你不在我身边

总是特别想念你的脸

距离是一份考卷

Or, variant b:

with open('file.txt', 'r+', encoding='utf-8') as fr:
    res1 = fr.read()
    fr.seek(0)
    new_res = res1.replace('你', 'you')
    fr.write(new_res)
    fr.truncate()  # drop leftover bytes in case the new content is shorter than the old

Or:

f = open('file.txt', 'a+', encoding='utf-8')
f.seek(0)
res = f.read().replace('你', 'you')
f.seek(0)
f.truncate()  # empty the file
f.write(res)
f.close()

file.txt after the modification:

寂寞当然有二点

you 不在我身边

总是特别想念 you 的脸

距离是一份考卷
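One detail of rewriting a file through a single r+ handle: the new text is laid over the old bytes, so a shorter result leaves a stale tail unless the file is truncated afterwards. The example above does not show this, because '你' and 'you' are both three bytes long in UTF-8. The sketch below makes the effect visible; the helper names are invented for the illustration.

```python
import os
import tempfile

def replace_in_place(path, old, new, truncate):
    """Rewrite the file through one 'r+' handle, optionally cutting off leftovers."""
    with open(path, 'r+', encoding='utf-8') as f:
        text = f.read().replace(old, new)
        f.seek(0)
        f.write(text)
        if truncate:
            f.truncate()   # drop whatever remains of the old, longer content

def make_and_replace(truncate):
    """Create a scratch file, shorten its content in place, and return the result."""
    path = os.path.join(tempfile.mkdtemp(), 'file.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('hello world')
    replace_in_place(path, 'hello', 'hi', truncate)
    with open(path, encoding='utf-8') as f:
        return f.read()

without_truncate = make_and_replace(False)
with_truncate = make_and_replace(True)
print(repr(without_truncate), repr(with_truncate))
```

Without the `truncate()` call, the last three characters of the old content survive at the end of the file.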

Method 2:

Variant a:
import os
f = open('file.txt', encoding='utf-8')
f2 = open('file.txt.bak', 'w', encoding='utf-8')
for line in f:
    new_line = line.replace('一点', '二点')
    f2.write(new_line)
f.close()
f2.close()
os.remove('file.txt')
os.rename('file.txt.bak', 'file.txt')

Variant b:
import os
with open('file.txt', encoding='utf-8') as f, open('file.txt.bak', 'w', encoding='utf-8') as f2:  # several files in one with: f reads file.txt, f2 creates file.txt.bak
    for line in f:  # loop over each line of file.txt
        new_line = line.replace('一点', '二点')
        f2.write(new_line)  # write it to file.txt.bak
os.remove('file.txt')
os.rename('file.txt.bak', 'file.txt')

file.txt after the replacement:

寂寞当然有二点

你不在我身边

总是特别想念你的脸

距离是一份考卷
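The remove-then-rename step leaves a brief moment in which file.txt does not exist. One possible alternative, sketched here as an assumption rather than as the notes' own method, is `os.replace`, which swaps the new file over the old one in a single call. The function name `replace_in_file` is hypothetical.

```python
import os
import tempfile

def replace_in_file(path, old, new):
    """Stream path into a temporary file with old replaced by new, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.bak')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as dst, \
                open(path, encoding='utf-8') as src:
            for line in src:                    # one line in memory at a time
                dst.write(line.replace(old, new))
        os.replace(tmp_path, path)              # overwrites the target in one step
    except Exception:
        os.remove(tmp_path)                     # clean up the half-written copy
        raise

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'file.txt')
with open(target, 'w', encoding='utf-8') as f:
    f.write('寂寞当然有一点\n你不在我身边\n')

replace_in_file(target, '一点', '二点')
with open(target, encoding='utf-8') as f:
    result = f.read()
print(result)
```

The temporary file is created in the same directory as the target so that the final swap stays on one filesystem. On POSIX systems `mkstemp` creates it with restrictive permissions, so the original file's mode is not preserved in this sketch.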

Extended exercise: monitoring a log

Log file:

access.log
178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /wp-includes/logo_img.php HTTP/1.0" 302 161 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) ApplewebKit/533.4 (Khtml, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /blog HTTP/1.0" 301 233 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
178.210.90.90 - - [04/Jun/2017:03:44:15 +0800] "GET /blog/ HTTP/1.0" 200 38278 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
66.249.75.29 - - [04/Jun/2017:03:45:55 +0800] "GET /bbs/forum.php?mod=forumdisplay&fid=574&filter=hot HTTP/1.1" 200 17482 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
37.9.169.20 - - [04/Jun/2017:03:47:59 +0800] "GET /wp-admin/security.php HTTP/1.1" 302 161 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:01 +0800] "GET /blog HTTP/1.1" 301 233 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:02 +0800] "GET /blog/ HTTP/1.1" 200 38330 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:21 +0800] "GET /wp-admin/security.php HTTP/1.1" 302 161 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:21 +0800] "GET /blog HTTP/1.1" 301 233 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:23 +0800] "GET /blog/ HTTP/1.1" 200 38330 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
42.236.49.31 - - [04/Jun/2017:03:49:04 +0800] "GET /questions HTTP/1.1" 200 41977 "http://bbs.besttest.cn/questions" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36; 360Spider" "-"
66.249.75.28 - - [04/Jun/2017:03:49:42 +0800] "GET /bbs/forum.php?mod=forumdisplay&fid=473&filter=digest&digest=1 HTTP/1.1" 200 17242 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
123.125.71.60 - - [04/Jun/2017:03:52:50 +0800] "GET /robots.txt HTTP/1.1" 302 161 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
123.125.71.117 - - [04/Jun/2017:03:52:50 +0800] "GET /blog HTTP/1.1" 301 233 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
123.125.71.80 - - [04/Jun/2017:03:52:51 +0800] "GET /blog/ HTTP/1.1" 200 38330 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
66.249.75.28 - - [04/Jun/2017:03:53:29 +0800] "GET /bbs/forum.php?mod=forumdisplay&fid=516&filter=heat&orderby=heats HTTP/1.1" 200 17019 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
40.77.167.135 - - [04/Jun/2017:03:55:07 +0800] "GET /static/CSS/bootstrap/fonts/glyphicon
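# Goal 1: find the IPs in the log that made more than 200 requests within 1 minute
# Goal 2: run the check once every minute
# Step 1: read the file and extract the IP addresses
# Step 2: store each IP address in a dict {}
# Step 3: check whether an IP's request count exceeds 200
# Step 4: add it to the blacklist (here: print)
# ['118.24.4.30','118.24.4.30','118.24.4.30','118.1xx.x.xx','118.1xx.x.xx']
# {
#     '118.23.3.40':2,
#     '118.23.3.41':5
# }
import time
point = 0  # initial read position
while True:
    ips = {}  # maps each ip to its request count for this round
    f = open('access.log', encoding='utf-8')
    f.seek(point)
    for line in f:  # loop over each line of the file
        if not line.strip():  # skip blank lines, which have no first field
            continue
        ip = line.split()[0]  # split on whitespace; the first field is the IP
        if ip in ips:  # has this ip been seen already?
            ips[ip] += 1  # yes: add 1 to its count
        else:
            ips[ip] = 1  # no: its count starts at 1
    point = f.tell()  # remember the position; the next round starts from here 60 s later
    f.close()
    for ip, count in ips.items():  # loop over the dict and report counts above 200
        if count > 200:
            print('%s added to blacklist' % ip)
    time.sleep(60)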
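The counting step can also be separated from the file handling, which makes it easy to try on a few lines. The following sketch uses `collections.Counter` in place of the hand-written dict logic. The function names and the `limit` parameter are assumptions, and the sample lines are abbreviated from the log above.

```python
from collections import Counter

def count_ips(lines):
    """Count requests per client IP; the IP is the first whitespace-separated field."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:                      # skip blank lines
            counts[fields[0]] += 1
    return counts

def over_limit(counts, limit):
    """Return the IPs whose request count exceeds limit, sorted for stable output."""
    return sorted(ip for ip, n in counts.items() if n > limit)

# Abbreviated sample lines in the same layout as access.log.
sample = [
    '178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /blog HTTP/1.0" 301 233',
    '178.210.90.90 - - [04/Jun/2017:03:44:15 +0800] "GET /blog/ HTTP/1.0" 200 38278',
    '66.249.75.29 - - [04/Jun/2017:03:45:55 +0800] "GET /bbs/forum.php HTTP/1.1" 200 17482',
    '',
]

counts = count_ips(sample)
flagged = over_limit(counts, limit=1)
print(counts, flagged)
```

Because `count_ips` accepts any iterable of lines, an open file object can be passed to it directly.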

2. Handling JSON

# JSON is a general-purpose data format that every language understands
# key-value pairs: { }
# a JSON string is an ordinary string

JSON string format: wrap the JSON text in triple single quotes. Note: keys in JSON must be enclosed in double quotes.

s='''
{
    "error_code": 0,
    "stu_info": [
        {
            "id": 309,
            "name": "小白",
            "sex": "男",
            "age": 28,
            "addr": "河南省济源市北海大道 32 号",
            "grade": "天蝎座",
            "phone": "18512572946",
            "gold": 100
        },
        {
            "id": 310,
            "name": "小白",
            "sex": "男",
            "age": 28,
            "addr": "河南省济源市北海大道 32 号",
            "grade": "天蝎座",
            "phone": "18516572946",
            "gold": 100
        }
    ]
}
'''
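A string in this shape can be turned into Python objects with `json.loads`. The sketch below uses a shortened version of the data (fewer fields per student) and also shows what happens when single quotes are used for keys. The variable names are chosen for the example.

```python
import json

# Shortened version of the student data, for illustration only.
s = '''
{
    "error_code": 0,
    "stu_info": [
        {"id": 309, "name": "小白", "gold": 100},
        {"id": 310, "name": "小白", "gold": 100}
    ]
}
'''

data = json.loads(s)                               # JSON object -> dict, array -> list
ids = [stu["id"] for stu in data["stu_info"]]
total_gold = sum(stu["gold"] for stu in data["stu_info"])

try:
    json.loads("{'error_code': 0}")                # single quotes are not valid JSON
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

print(ids, total_gold, single_quotes_ok)
```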

JSON is a key-value data format shared by all languages, and it looks much like a Python dict. Python handles it with the json module, whose commonly used functions are:

json.dumps()
json.dump()
json.loads()
json.load()

import json
dic = {"name": "niuniu", "age": 18}
print(json.dumps(dic))  # convert the dict to a JSON string
# Output (key order follows insertion order on Python 3.7+):
# {"name": "niuniu", "age": 18}
fj = open('a.json', 'w')  # a.json does not exist yet
json.dump(dic, fj)  # convert the dict to JSON and write it to the file; json.dump itself returns None
fj.close()
# Result: a new a.json file appears in the current directory, containing:
# {"name": "niuniu", "age": 18}
s_json = '{"name":"niuniu","age":20,"status":true}'
print(json.loads(s_json))  # convert the JSON string to a dict
# Output:
# {'name': 'niuniu', 'age': 20, 'status': True}
fr = open('a.json', 'r')  # a.json contains: {"name": "niuniu", "age": 18}
print(json.load(fr))  # read JSON from the file and convert it to a dict
fr.close()
# Output:
# {'name': 'niuniu', 'age': 18}
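When the data contains Chinese text, two optional parameters of `json.dumps` and `json.dump` are often useful: `ensure_ascii=False` keeps non-ASCII characters readable, and `indent` pretty-prints the result. The following round trip is a small sketch; the scratch path and variable names are assumptions.

```python
import json
import os
import tempfile

dic = {"name": "小白", "age": 18, "status": True}

escaped = json.dumps(dic)                        # non-ASCII characters become \uXXXX escapes
readable = json.dumps(dic, ensure_ascii=False)   # characters are kept as they are

path = os.path.join(tempfile.mkdtemp(), 'a.json')     # hypothetical scratch file
with open(path, 'w', encoding='utf-8') as fw:
    json.dump(dic, fw, ensure_ascii=False, indent=4)  # write pretty-printed JSON
with open(path, encoding='utf-8') as fr:
    loaded = json.load(fr)                            # read it back as a dict

print(escaped)
print(readable)
print(loaded == dic)
```

Both forms decode to the same dict, so the choice only affects how the stored text looks.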

Source: http://www.bubuko.com/infodetail-2751948.html
