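At work we often need to parse different kinds of files. Regular expressions are probably the most common tool, with awk for the simpler cases. Here I want to recommend a less common approach: parsing files with pyparsing.
What can pyparsing do? Mainly, it makes it quite convenient to define your own tokenizer from small named pieces, so the grammar is easy to extend into a parser of your own.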
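As a minimal sketch of what composing a tokenizer looks like, the snippet below parses a made-up `name = number` line. The line format and the names `key`, `value` and `assignment` are assumptions chosen for illustration; `Word`, `Suppress`, `parseString` and `asList` are part of pyparsing itself.

```python
from pyparsing import Word, alphas, alphanums, nums, Suppress

# Hypothetical "name = number" format, used only to illustrate composition.
key = Word(alphas, alphanums + "_")   # starts with a letter, then letters/digits/_
value = Word(nums)                    # one or more digits
assignment = key("key") + Suppress("=") + value("value")

result = assignment.parseString("timeout = 30")
print(result.asList())   # ['timeout', '30']
print(result["key"])     # timeout
```

Each piece is an ordinary Python object, so a larger grammar is built by combining smaller ones with `+`, and whitespace between tokens is skipped by default.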
Below is an example of parsing traceview output.
- 16803 AsyncTask #3
- 16804 pool-2-thread-5
- 16806 pool-3-thread-1
- 16807 uil-pool-2-thread-1
- 16808 uil-pool-2-thread-2
- 16809 uil-pool-2-thread-3
- 16810 uil-pool-2-thread-4
- Trace (threadID action usecs class.method signature):
- 16736 xit 0 ..dalvik.system.VMDebug.startMethodTracingFilename (Ljava/lang/String;IIZI)V VMDebug.java
- 16804 xit 0 ..com.android.org.conscrypt.NativeCrypto.EVP_DigestUpdate (Lcom/android/org/conscrypt/OpenSSLDigestContext;[BII)V NativeCrypto.java
- 16736 xit 218 .dalvik.system.VMDebug.startMethodTracing (Ljava/lang/String;IIZI)V VMDebug.java
- 16736 xit 225 android.os.Debug.startMethodTracing (Ljava/lang/String;II)V Debug.java
- 16736 xit 230-android.os.Debug.startMethodTracing (Ljava/lang/String;I)V Debug.java
- 16736 xit 266-java.lang.reflect.Method.invoke (Ljava/lang/Object;[Ljava/lang/Object;Z)Ljava/lang/Object; Method.java
- 16804 ent 528 ..java.lang.ClassLoader.loadClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
- 16804 ent 543 ...java.lang.ClassLoader.loadClass (Ljava/lang/String;Z)Ljava/lang/Class; ClassLoader.java
- 16804 ent 548 ....java.lang.ClassLoader.findLoadedClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
- 16804 ent 567 .....java.lang.BootClassLoader.getInstance ()Ljava/lang/BootClassLoader; ClassLoader.java
- 16804 xit 576 .....java.lang.BootClassLoader.getInstance ()Ljava/lang/BootClassLoader; ClassLoader.java
- 16804 xit 681 ....java.lang.ClassLoader.findLoadedClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
- 16804 ent 689 ....com.uc.base.aerie.hack.ClassLoaderSupport$a.loadClass (Ljava/lang/String;Z)Ljava/lang/Class; ProGuard
- 16804 ent 704 .....java.lang.ClassLoader.getParent ()Ljava/lang/ClassLoader; ClassLoader.java
- 16804 ent 726 ......java.lang.BootClassLoader.loadClass (Ljava/lang/String;Z)Ljava/lang/Class; ClassLoader.java
- 16804 ent 730 .......java.lang.ClassLoader.findLoadedClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
- 16804 ent 734 ........java.lang.BootClassLoader.getInstance ()Ljava/lang/BootClassLoader; ClassLoader.java
- 16804 xit 740 ........java.lang.BootClassLoader.getInstance ()Ljava/lang/BootClassLoader; ClassLoader.java
- 16804 xit 754 .......java.lang.ClassLoader.findLoadedClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
- 16804 xit 759 ......java.lang.BootClassLoader.loadClass (Ljava/lang/String;Z)Ljava/lang/Class; ClassLoader.java
- 16804 xit 763 .....java.lang.ClassLoader.loadClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
This is part of the converted raw log. The format is fairly regular, so the grammar can be defined like this:
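- import os
- from pyparsing import (Word, nums, Combine, alphas, Literal, ZeroOrMore,
-                        Group, Suppress, ParseException)
- # Building blocks for one trace line: threadID action usecs class.method signature
- semiFlag = Literal(";")
- dotFlag = Suppress(Literal("."))   # leading dots only encode call depth, so drop them
- multiDot = ZeroOrMore(dotFlag)
- threadID = Word(nums, max=5)
- actionField = Word(alphas)         # "ent" / "xit" in the sample above
- usecsField = Word(nums, max=8)
- clsField = Word(alphas + ".")      # letters and dots only: names containing "$" or digits do not match
- # Signature such as (Ljava/lang/String;Z)Ljava/lang/Class;
- # The trailing ";" is required, so return types like ")V" do not match.
- methodField = Combine("(" + ZeroOrMore(Word(alphas + ";/")) + ")" + Word(alphas + "/") + semiFlag)
- regex = threadID + actionField + usecsField + multiDot + Group(clsField + methodField) + clsField
- with open(os.path.join(os.getcwd(), "StepBeforeFirstDraw_o.txt"), "r") as f:
-     inTrace = False
-     for line in f:                 # iterating the file stops at EOF
-         if "threadID action usecs" in line:
-             inTrace = True         # trace records start after this header line
-             continue
-         if inTrace:
-             try:
-                 print(regex.parseString(line).asList())
-             except ParseException:
-                 pass               # skip lines the grammar does not cover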
The parse results are:
- /usr/bin/python2.7 /home/alex/workspace/virtual_space/project/calclex.py
- ['16804', 'ent', '528', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '543', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '548', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '567', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
- ['16804', 'xit', '576', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
- ['16804', 'xit', '681', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '704', ['java.lang.ClassLoader.getParent', '()Ljava/lang/ClassLoader;'], 'ClassLoader.java']
- ['16804', 'ent', '726', ['java.lang.BootClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '730', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '734', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
- ['16804', 'xit', '740', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
- ['16804', 'xit', '754', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'xit', '759', ['java.lang.BootClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'xit', '763', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'xit', '771', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'xit', '774', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '809', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '814', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '818', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '822', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
- ['16804', 'xit', '827', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
- ['16804', 'xit', '842', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '853', ['java.lang.ClassLoader.getParent', '()Ljava/lang/ClassLoader;'], 'ClassLoader.java']
- ['16804', 'xit', '857', ['java.lang.ClassLoader.getParent', '()Ljava/lang/ClassLoader;'], 'ClassLoader.java']
- ['16804', 'ent', '861', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '865', ['java.lang.BootClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
- ['16804', 'ent', '869', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
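The nested lists above can be consumed directly by ordinary Python code. As one hedged sketch of such post-processing, the function below pairs each `ent` record with the matching `xit` on the same thread and sums the elapsed `usecs` per method. The function name `inclusive_times` and the pairing strategy are my own assumptions, not something the trace format prescribes; the records are copied from the output above.

```python
from collections import defaultdict

# A few records copied from the parse output above.
GET_INSTANCE = ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;']
GET_PARENT = ['java.lang.ClassLoader.getParent', '()Ljava/lang/ClassLoader;']
records = [
    ['16804', 'ent', '567', GET_INSTANCE, 'ClassLoader.java'],
    ['16804', 'xit', '576', GET_INSTANCE, 'ClassLoader.java'],
    ['16804', 'ent', '734', GET_INSTANCE, 'ClassLoader.java'],
    ['16804', 'xit', '740', GET_INSTANCE, 'ClassLoader.java'],
    ['16804', 'ent', '853', GET_PARENT, 'ClassLoader.java'],
    ['16804', 'xit', '857', GET_PARENT, 'ClassLoader.java'],
]


def inclusive_times(records):
    """Sum usecs between each 'ent' and its matching 'xit', per method name."""
    stacks = defaultdict(list)   # thread id -> stack of ((method, signature), enter usecs)
    totals = defaultdict(int)    # method name -> accumulated usecs
    for thread_id, action, usecs, (method, signature), _source in records:
        stack = stacks[thread_id]
        key = (method, signature)
        if action == 'ent':
            stack.append((key, int(usecs)))
        elif action == 'xit':
            # Find the most recent matching 'ent'. Frames above it belong to
            # calls whose 'xit' line was not parsed, so they are dropped.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == key:
                    totals[method] += int(usecs) - stack[i][1]
                    del stack[i:]
                    break
    return dict(totals)


print(inclusive_times(records))
# {'java.lang.BootClassLoader.getInstance': 15, 'java.lang.ClassLoader.getParent': 4}
```

Totals are keyed by method name only, so overloads are merged; keying by `(method, signature)` instead would keep them apart.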
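Output in this form is already easy to process further, and the parsing rules are more readable than the equivalent regular expression.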
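For comparison, a hand-written regular expression for the same line shape might look roughly like the sketch below. The pattern is my own approximation written for this comparison, not taken from the original script, and the name `LINE_RE` is hypothetical.

```python
import re

# Approximate regex counterpart of the pyparsing grammar; for comparison only.
LINE_RE = re.compile(
    r"^\s*(\d{1,5})\s+"                  # thread id
    r"([A-Za-z]+)\s+"                    # action (ent / xit)
    r"(\d{1,8})\s*\.*"                   # usecs, then depth dots (discarded)
    r"([A-Za-z.]+)\s+"                   # class.method
    r"(\([A-Za-z;/]*\)[A-Za-z/]+;)\s+"   # signature ending in ';'
    r"([A-Za-z.]+)\s*$"                  # source file
)

line = ("16804 ent 528 ..java.lang.ClassLoader.loadClass "
        "(Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java")
match = LINE_RE.match(line)
print(match.groups())
```

Even with comments on every fragment, the fields are identified only by group position, whereas in the pyparsing version each field is a named Python object that can be reused or swapped out on its own.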
Source: http://www.cnblogs.com/alexkn/p/7129168.html