Reading Files in Python with a list or dict Field Schema

This article shows how to read a file in Python by mapping its columns through a list-style or dict-style field schema, with an explanation and sample code for each approach.

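Python is an object-oriented, interpreted programming language. Guido van Rossum began work on it in late 1989, and the first public release appeared in 1991. Its syntax is concise and clear, and it comes with a rich, powerful set of libraries. It is often nicknamed a glue language because it can easily tie together modules written in other languages (C/C++ in particular).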

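Preface

Python is an excellent tool for processing text data. Its very simple support for reading, splitting, filtering, and converting text spares developers the tedious stream handling (compared with Java, that is). In my own work I use Python for complex text-processing jobs, including Streaming programs written for Hadoop.

The first step in text processing is loading the file into memory, which raises the question of how to map each column of the file to a variable. The crudest approach is to reference fields by index, like this: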

 
# fields is the list obtained by reading one line and splitting it on the delimiter
user_id = fields[0]
user_name = fields[1]
user_type = fields[2]

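With this approach, any change to the column order, or any added or removed column, turns code maintenance into a nightmare. Code like this should be avoided.

This article recommends two cleaner ways to read the data. Both configure a field schema first and then read according to it; the schema takes one of two forms, a dict schema or a list schema.

Reading the file and splitting each line into a list of fields

First, read the file and split each line on the delimiter, returning a list of fields for later processing.

The code is as follows: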

 
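def read_file_data(filepath):
    '''Read a file line by line.
    @param filepath: path of the file to read
    @return: generator yielding, for each non-empty line, the list of fields split on tabs
    '''
    # The with block closes the file even if the caller stops iterating early
    with open(filepath, 'r') as fin:
        for line in fin:
            # Strip only the trailing newline, so a last line without one is not truncated
            line = line.rstrip('\n')
            if not line:
                continue
            # Yield the list of fields for the current line
            yield line.split('\t')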

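The yield keyword makes the function a generator that produces the split fields of one line at a time, so the calling code can iterate over the lines like this:

for fields in read_file_data(fpath):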
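The line-at-a-time behavior described above can be tried on a throwaway file. The following is a minimal, self-contained sketch rather than the article's own code: `read_tsv_rows`, `sample_path`, and the sample data are hypothetical names introduced for illustration, and the file is assumed to be UTF-8 and tab-separated.

```python
import os
import tempfile


def read_tsv_rows(filepath):
    """Yield each non-empty line of a tab-separated file as a list of fields."""
    # Hypothetical helper for illustration; assumes UTF-8 text.
    with open(filepath, "r", encoding="utf-8") as fin:
        for line in fin:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            yield line.split("\t")


# Write a small sample file so the example runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("1\tname1\t0\n\n2\tname2\t1\n")
    sample_path = tmp.name

# list() drains the generator, which also closes the file.
rows = list(read_tsv_rows(sample_path))
os.remove(sample_path)

for fields in rows:
    print(fields)
```

The blank line in the middle of the sample is skipped, so only two field lists come back.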

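Mapping to a model, method 1: assembling the field list with a configured dict schema

This method configures a dict of the form {"field name": field position} as the data schema, assembles each field list according to that schema, and lets you access the data by field name.

The function used: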

 
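@staticmethod
def map_fields_dict_schema(fields, dict_schema):
    """Map field values to field names according to the schema. For example, with fields ['a','b','c'] and schema {'name':0, 'age':1}, the result is {'name':'a','age':'b'}
    @param fields: list of data values, usually obtained by splitting a line on tabs
    @param dict_schema: dict whose keys are field names and whose values are field positions
    @return: dict whose keys are field names and whose values are field values
    """
    pdict = {}
    # items() works on both Python 2 and Python 3; iteritems() exists only on Python 2
    for fstr, findex in dict_schema.items():
        pdict[fstr] = str(fields[int(findex)])
    return pdict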

With this function and the previous one, the data can be read as follows:

 
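# coding:utf8
"""
@author: www.crazyant.net
Test loading field lists with a dict schema
Pros: for a file with many columns, configuring only the fields you need is enough to read those columns
Cons: with many fields, configuring the position of each one is tedious
"""
import file_util
import pprint

# The dict schema to read with; only the columns you care about need a position
dict_schema = {"userid": 0, "username": 1, "usertype": 2}
for fields in file_util.FileUtil.read_file_data("userfile.txt"):
    # Map the field list according to the dict schema
    dict_fields = file_util.FileUtil.map_fields_dict_schema(fields, dict_schema)
    pprint.pprint(dict_fields)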

Output:

{'userid': '1', 'username': 'name1', 'usertype': '0'}
{'userid': '2', 'username': 'name2', 'usertype': '1'}
{'userid': '3', 'username': 'name3', 'usertype': '2'}
{'userid': '4', 'username': 'name4', 'usertype': '3'}
{'userid': '5', 'username': 'name5', 'usertype': '4'}
{'userid': '6', 'username': 'name6', 'usertype': '5'}
{'userid': '7', 'username': 'name7', 'usertype': '6'}
{'userid': '8', 'username': 'name8', 'usertype': '7'}
{'userid': '9', 'username': 'name9', 'usertype': '8'}
{'userid': '10', 'username': 'name10', 'usertype': '9'}
{'userid': '11', 'username': 'name11', 'usertype': '10'}
{'userid': '12', 'username': 'name12', 'usertype': '11'}
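One advantage of the dict schema is that it can name only the columns of interest. The sketch below illustrates this on a single in-memory row; `map_fields_by_dict_schema` is a hypothetical stand-in for `FileUtil.map_fields_dict_schema`, written as a plain function so the example runs on its own.

```python
def map_fields_by_dict_schema(fields, dict_schema):
    """Return {field name: value} for the columns named in dict_schema."""
    # Hypothetical stand-in for FileUtil.map_fields_dict_schema.
    return {
        name: str(fields[int(index)])
        for name, index in dict_schema.items()
    }


# Only the first and third columns are of interest here.
partial_schema = {"userid": 0, "usertype": 2}
row = ["1", "name1", "0"]

record = map_fields_by_dict_schema(row, partial_schema)
print(record)
```

The second column is simply never looked up, so adding further columns to the file would not affect this schema.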

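Mapping to a model, method 2: assembling the field list with a configured list schema

If you need every column of the file, or the first several columns, the dict schema is somewhat cumbersome: each field needs its index configured, and those indexes have to be counted up from 0 by hand. That is menial work worth eliminating.

The list schema addresses this: convert the configured list schema into a dict schema first, then load the data through the dict schema.

The code for converting the schema and for reading with a list schema: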

 
@staticmethod
def transform_list_to_dict(para_list):
    """Convert ['a', 'b'] into the form {'a':0, 'b':1}
    @param para_list: list holding the field name of each column
    @return: dict mapping each field name to its position
    """
    res_dict = {}
    idx = 0
    while idx < len(para_list):
        res_dict[str(para_list[idx]).strip()] = idx
        idx += 1
    return res_dict

@staticmethod
def map_fields_list_schema(fields, list_schema):
    """Map field values to field names according to the schema. For example, with fields ['a','b','c'] and schema ['name', 'age'], the result is {'name':'a','age':'b'}
    @param fields: list of data values, usually obtained by splitting a line on tabs
    @param list_schema: list of column names
    @return: dict whose keys are field names and whose values are field values
    """
    dict_schema = FileUtil.transform_list_to_dict(list_schema)
    return FileUtil.map_fields_dict_schema(fields, dict_schema)
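The list-to-dict conversion can also be written more compactly with `enumerate`. This is a sketch of an equivalent formulation, not the code in file_util.py; `list_schema_to_dict_schema` is a hypothetical name.

```python
def list_schema_to_dict_schema(list_schema):
    """Turn ['a', 'b'] into {'a': 0, 'b': 1}."""
    # Hypothetical name; same result as transform_list_to_dict, using enumerate.
    return {
        str(name).strip(): index
        for index, name in enumerate(list_schema)
    }


dict_schema = list_schema_to_dict_schema(["userid", " username ", "usertype"])
print(dict_schema)
```

As in the original function, surrounding whitespace in a field name is stripped before the name becomes a key.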

In use, the schema can be configured as a list, which is more concise because no indexes are needed:

 
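# coding:utf8
"""
@author: www.crazyant.net
Test loading field lists with a list schema
Pros: to read all columns, the list schema only needs the field names written out in order
Cons: you cannot pick out just the fields you care about; the leading columns must all be listed
"""
import file_util
import pprint

# The list schema to read with; it can only cover the first several columns, or all columns
list_schema = ["userid", "username", "usertype"]
for fields in file_util.FileUtil.read_file_data("userfile.txt"):
    # Map the field list according to the list schema
    dict_fields = file_util.FileUtil.map_fields_list_schema(fields, list_schema)
    pprint.pprint(dict_fields)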

The output is exactly the same as with the dict schema.
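For comparison, Python's standard library offers a related facility: `csv.DictReader` accepts a list of field names and a delimiter and produces one dict per row. The sketch below is not part of file_util.py; it assumes a tab-separated file with no header row and uses an in-memory sample in place of userfile.txt.

```python
import csv
import io

# In-memory stand-in for userfile.txt (assumed tab-separated, no header row).
sample = io.StringIO("1\tname1\t0\n2\tname2\t1\n")

list_schema = ["userid", "username", "usertype"]
reader = csv.DictReader(sample, fieldnames=list_schema, delimiter="\t")

# Each row comes back keyed by the names in list_schema.
records = [dict(row) for row in reader]
for record in records:
    print(record)
```

Note that the csv module applies its own quoting rules by default, so fields containing double quotes may be parsed differently than with a plain `split("\t")`.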

Complete file_util.py code

Below is the complete code of file_util.py, which you can drop into your own shared library.

 
# -*- encoding:utf8 -*-
'''
@author: www.crazyant.net
@version: 2014-12-5
'''

class FileUtil(object):
    '''Common helpers for files and paths
    '''
    @staticmethod
    def read_file_data(filepath):
        '''Read a file line by line.
        @param filepath: path of the file to read
        @return: generator yielding, for each non-empty line, the list of fields split on tabs
        '''
        # The with block closes the file even if the caller stops iterating early
        with open(filepath, 'r') as fin:
            for line in fin:
                # Strip only the trailing newline, so a last line without one is not truncated
                line = line.rstrip('\n')
                if not line:
                    continue
                # Yield the list of fields for the current line
                yield line.split('\t')

    @staticmethod
    def transform_list_to_dict(para_list):
        """Convert ['a', 'b'] into the form {'a':0, 'b':1}
        @param para_list: list holding the field name of each column
        @return: dict mapping each field name to its position
        """
        res_dict = {}
        idx = 0
        while idx < len(para_list):
            res_dict[str(para_list[idx]).strip()] = idx
            idx += 1
        return res_dict

    @staticmethod
    def map_fields_list_schema(fields, list_schema):
        """Map field values to field names according to the schema. For example, with fields ['a','b','c'] and schema ['name', 'age'], the result is {'name':'a','age':'b'}
        @param fields: list of data values, usually obtained by splitting a line on tabs
        @param list_schema: list of column names
        @return: dict whose keys are field names and whose values are field values
        """
        dict_schema = FileUtil.transform_list_to_dict(list_schema)
        return FileUtil.map_fields_dict_schema(fields, dict_schema)

    @staticmethod
    def map_fields_dict_schema(fields, dict_schema):
        """Map field values to field names according to the schema. For example, with fields ['a','b','c'] and schema {'name':0, 'age':1}, the result is {'name':'a','age':'b'}
        @param fields: list of data values, usually obtained by splitting a line on tabs
        @param dict_schema: dict whose keys are field names and whose values are field positions
        @return: dict whose keys are field names and whose values are field values
        """
        pdict = {}
        # items() works on both Python 2 and Python 3; iteritems() exists only on Python 2
        for fstr, findex in dict_schema.items():
            pdict[fstr] = str(fields[int(findex)])
        return pdict

Summary

That is all for this article. I hope it helps you in learning or using Python; if you have questions, feel free to leave a comment.
