Writing a Compiler from Scratch (8): Semantic Analysis, Building the Symbol Table

The complete code for the project is in C2j-Compiler: https://github.com/dejavudwh/C2j-Complier

Preface

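In the previous post we finished the data structures that describe the symbol table, so now we can actually build it. The symbol table is naturally built as the parser works through the input, so its construction lives in the takeActionForReduce method of LRStateTableParser.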

Before that, though, we need a class that makes operating on the symbol table convenient.

The two main files in this post are:

TypeSystem.java

LRStateTableParser.java

Operating on the symbol table

The methods that operate on the symbol table are all in the TypeSystem class.

TypeSystem mainly provides the following methods:

Type specifiers

newType turns a type keyword into a Specifier wrapped in a TypeLink. The logic is simple:

public TypeLink newType(String typeText) {
    Specifier sp;
    int type = Specifier.NONE;
    boolean isLong = false, isSigned = true;
    // Dispatch on the first letter of the type keyword
    switch (typeText.charAt(0)) {
        case 'c':
            // "char" vs "const": only char has 'h' as its second letter
            if (typeText.charAt(1) == 'h') {
                type = Specifier.CHAR;
            }
            break;
        case 'd':
        case 'f':
            // double and float
            System.err.println("Floating point Numbers are not supported");
            System.exit(1);
            break;
        case 'i':
            type = Specifier.INT;
            break;
        case 'l':
            // long and unsigned only set flags; type stays NONE
            isLong = true;
            break;
        case 'u':
            isSigned = false;
            break;
        case 'v':
            // "void" vs "volatile": only void has 'i' as its third letter
            if (typeText.charAt(2) == 'i') {
                type = Specifier.VOID;
            }
            break;
        case 's':
            //ignore short signed
            break;
        default:
            break;
    }
    sp = new Specifier();
    sp.setType(type);
    sp.setLong(isLong);
    sp.setSign(isSigned);
    // First argument false: this link holds a specifier, not a declarator
    TypeLink link = new TypeLink(false, false, sp);
    return link;
}
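The first-letter switch in newType only needs to look further when two keywords start with the same letter: 'c' begins both char and const, and 'v' begins both void and volatile. The standalone sketch below keeps just that dispatch to show why checking charAt(1) and charAt(2) is enough; the class name and the integer tags are hypothetical stand-ins, not C2j-Compiler's Specifier constants.

```java
public class KeywordDispatch {
    // Hypothetical type tags standing in for Specifier.NONE / CHAR / INT / VOID
    static final int NONE = 0, CHAR = 1, INT = 2, VOID = 3;

    // Mirrors the first-character switch in newType, returning only the type tag
    static int classify(String typeText) {
        switch (typeText.charAt(0)) {
            case 'c':
                // "char" has 'h' at index 1; "const" has 'o'
                return typeText.charAt(1) == 'h' ? CHAR : NONE;
            case 'i':
                return INT;
            case 'v':
                // "void" has 'i' at index 2; "volatile" has 'l'
                return typeText.charAt(2) == 'i' ? VOID : NONE;
            default:
                // In newType, long and unsigned only set flags and short/signed are ignored
                return NONE;
        }
    }

    public static void main(String[] args) {
        for (String kw : new String[] {"char", "const", "int", "void", "volatile", "long"}) {
            System.out.println(kw + " -> " + classify(kw));
        }
    }
}
```

In this sketch const, volatile and long all fall through to NONE, which matches newType leaving the type unset for them.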

Creating storage classes

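Some of this information is not actually acted on later, during interpretation or code generation; the compiler in its current form does not handle it.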

The logic here is just as simple:

public TypeLink newClass(String classText) {
    Specifier sp = new Specifier();
    sp.setType(Specifier.NONE);
    // The storage class is chosen by the keyword's first letter
    setClassType(sp, classText.charAt(0));
    TypeLink link = new TypeLink(false, false, sp);
    return link;
}

private void setClassType(Specifier sp, char c) {
    switch (c) {
        case 0:
            // (char) 0: fixed storage, neither static nor extern
            sp.setStorageClass(Specifier.FIXED);
            sp.setStatic(false);
            sp.setExternal(false);
            break;
        case 't':   // typedef
            sp.setStorageClass(Specifier.TYPEDEF);
            break;
        case 'r':   // register
            sp.setStorageClass(Specifier.REGISTER);
            break;
        case 's':   // static
            sp.setStatic(true);
            break;
        case 'e':   // extern
            sp.setExternal(true);
            break;
        default:
            System.err.println("Internal error, Invalid Class type");
            System.exit(1);
            break;
    }
}

Adding specifiers to symbols

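addDeclarator attaches a new Declarator of the given kind to a symbol. addSpecifierToDeclaration adds the specifier to every Symbol in the current chain, which is what a declaration such as int x,y,z; needs.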

// Wrap a new Declarator (e.g. POINTER, ARRAY, FUNCTION) in a TypeLink and attach it to the symbol
public Declarator addDeclarator(Symbol symbol, int declaratorType) {
    Declarator declarator = new Declarator(declaratorType);
    TypeLink link = new TypeLink(true, false, declarator);
    symbol.addDeclarator(link);
    return declarator;
}

// Attach the same specifier to every symbol in the chain, e.g. x, y and z in "int x,y,z;"
public void addSpecifierToDeclaration(TypeLink specifier, Symbol symbol) {
    while (symbol != null) {
        symbol.addSpecifier(specifier);
        symbol = symbol.getNextSymbol();
    }
}
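A minimal sketch of why one loop is enough for int x,y,z;: the declarators are already linked through their next pointers, so walking the chain attaches the same specifier object to each one. The Symbol class here is a simplified, hypothetical stand-in, and the specifier is reduced to a String instead of a TypeLink.

```java
import java.util.ArrayList;
import java.util.List;

public class SpecifierChainDemo {
    // Hypothetical stand-in for Symbol: a name, its type links, and a next pointer
    static class Symbol {
        final String name;
        final List<String> typeLinks = new ArrayList<>();
        Symbol next;

        Symbol(String name) { this.name = name; }

        void addSpecifier(String specifier) { typeLinks.add(specifier); }
    }

    // Same loop shape as TypeSystem.addSpecifierToDeclaration
    static void addSpecifierToDeclaration(String specifier, Symbol symbol) {
        while (symbol != null) {
            symbol.addSpecifier(specifier);
            symbol = symbol.next;
        }
    }

    public static void main(String[] args) {
        // Chain for "int x, y, z;" (the order here is only illustrative)
        Symbol x = new Symbol("x"), y = new Symbol("y"), z = new Symbol("z");
        x.next = y;
        y.next = z;
        String intSpecifier = "int";
        addSpecifierToDeclaration(intSpecifier, x);
        for (Symbol s = x; s != null; s = s.next) {
            System.out.println(s.name + ": " + s.typeLinks);
        }
    }
}
```

Every symbol receives the same object rather than a copy; whether the project's Symbol.addSpecifier copies the link is not shown in this post.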

The remaining methods will be covered as they come up.

Building the symbol table

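The symbol table is built during parsing, specifically whenever a reduce happens. It is easy to see why: a reduce means a variable name or a variable definition has just been recognized, which makes it the best moment to add it to the symbol table.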

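So we add a method to the parser that handles reduce actions, plus an attribute stack to help. The attribute stack holds the results of earlier actions so that later steps can use them.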

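For example, suppose a specifier has been produced but the parser has not read the variable name yet. The specifier is pushed onto the attribute stack (valueStack in the code); once the name is read, a Symbol is created and the specifier is popped off the stack and linked to it.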
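The attribute stack idea can be sketched as a list that grows alongside the parse stack. This is a simplified illustration under assumptions: the method names reduceTypeSpecifier, reduceNewName and reduceDeclaration are hypothetical, the declaration step pops the entries itself (the real parser presumably pops the right-hand-side entries generically and pushes attributeForParentNode), and a specifier is again just a String.

```java
import java.util.ArrayList;
import java.util.List;

public class AttributeStackDemo {
    static class Symbol {
        final String name;
        String specifier;
        Symbol(String name) { this.name = name; }
    }

    // Hypothetical value stack running parallel to the parse stack
    static final List<Object> valueStack = new ArrayList<>();

    // Reduce to TYPE_SPECIFIER: the specifier is known but no name yet, so push it
    static void reduceTypeSpecifier(String text) { valueStack.add(text); }

    // Reduce to NEW_NAME: create the Symbol and push it above the specifier
    static void reduceNewName(String text) { valueStack.add(new Symbol(text)); }

    // Reduce of a complete "specifier name" declaration: pop both and link them
    static Symbol reduceDeclaration() {
        Symbol symbol = (Symbol) valueStack.remove(valueStack.size() - 1);
        String specifier = (String) valueStack.remove(valueStack.size() - 1);
        symbol.specifier = specifier;
        return symbol;
    }

    public static void main(String[] args) {
        // Parsing "int x": the specifier arrives first, the name later
        reduceTypeSpecifier("int");
        reduceNewName("x");
        Symbol x = reduceDeclaration();
        System.out.println(x.name + " has specifier " + x.specifier);
    }
}
```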

takeActionForReduce

The argument of takeActionForReduce is the number of the production being used for the reduce.

We will only look at the more complicated cases:

StructSpecifier_TO_TypeSpecifier:

Here a struct definition produces a struct-typed specifier. The corresponding productions are:

* TYPE_SPECIFIER -> STRUCT_SPECIFIER
* STRUCT_SPECIFIER -> STRUCT OPT_TAG LC DEF_LIST RC
*                     | STRUCT TAG

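A Specifier is created first and its type is set to Specifier.STRUCTURE, which marks it as a struct; then the struct definition (a StructDefine) is fetched and attached with setStruct. From the productions we can see that the struct definition was produced in the step just before, so it sits at the top of the attribute stack.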
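A reduced version of the StructSpecifier_TO_TypeSpecifier case, showing only the stack access: the StructDefine left by the previous reduction is read from index size - 1. The classes and the STRUCTURE value below are hypothetical simplifications of the project's Specifier and StructDefine.

```java
import java.util.ArrayList;
import java.util.List;

public class StructSpecifierDemo {
    static final int NONE = 0, STRUCTURE = 4;  // hypothetical type tags

    static class StructDefine {
        final String tag;
        StructDefine(String tag) { this.tag = tag; }
    }

    static class Specifier {
        int type = NONE;
        StructDefine struct;
    }

    // Reduce TYPE_SPECIFIER -> STRUCT_SPECIFIER: the StructDefine produced by the
    // previous reduction is the topmost entry of the value stack
    static Specifier reduceStructSpecifier(List<Object> valueStack) {
        Specifier sp = new Specifier();
        sp.type = STRUCTURE;
        sp.struct = (StructDefine) valueStack.get(valueStack.size() - 1);
        return sp;
    }

    public static void main(String[] args) {
        List<Object> valueStack = new ArrayList<>();
        valueStack.add(new StructDefine("point"));  // left there by STRUCT_SPECIFIER
        Specifier sp = reduceStructSpecifier(valueStack);
        System.out.println("type=" + sp.type + ", tag=" + sp.struct.tag);
    }
}
```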

SPECIFIERS_TypeOrClass_TO_SPECIFIERS

The corresponding production is

* SPECIFIERS -> SPECIFIERS TYPE_OR_CLASS

Its purpose is simply to merge several specifiers into one, as in unsigned int.
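specifierCopy itself is not shown in this post, so the merge rule below is an assumption: copy every field of the accumulated specifier that differs from its default into the newly reduced one, which then becomes the attribute passed up (matching the direction of specifierCopy(dst, last), where dst comes from the top of the stack). Everything else in the sketch, including the field names, is hypothetical.

```java
public class SpecifierMergeDemo {
    static final int NONE = 0, CHAR = 1, INT = 2;  // hypothetical type tags

    static class Specifier {
        int type = NONE;
        boolean isLong = false;
        boolean isSigned = true;
        boolean isStatic = false;

        @Override
        public String toString() {
            return "type=" + type + " long=" + isLong + " signed=" + isSigned + " static=" + isStatic;
        }
    }

    // Assumed merge rule: copy every field of src that differs from its default into dst
    static void specifierCopy(Specifier dst, Specifier src) {
        if (src.type != NONE) dst.type = src.type;
        if (src.isLong) dst.isLong = true;
        if (!src.isSigned) dst.isSigned = false;
        if (src.isStatic) dst.isStatic = true;
    }

    public static void main(String[] args) {
        // "static unsigned int": fold each new TYPE_OR_CLASS into the accumulated SPECIFIERS
        Specifier acc = new Specifier();
        acc.isStatic = true;                 // from "static"
        Specifier unsignedSp = new Specifier();
        unsignedSp.isSigned = false;         // from "unsigned"
        specifierCopy(unsignedSp, acc);
        acc = unsignedSp;                    // the newest specifier becomes the accumulator
        Specifier intSp = new Specifier();
        intSp.type = INT;                    // from "int"
        specifierCopy(intSp, acc);
        acc = intSp;
        System.out.println(acc);
    }
}
```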

DEFLIST
case SyntaxProductionInit.ExtDeclList_COMMA_ExtDecl_TO_ExtDeclList:
case SyntaxProductionInit.VarList_COMMA_ParamDeclaration_TO_VarList:
case SyntaxProductionInit.DeclList_Comma_Decl_TO_DeclList:
case SyntaxProductionInit.DefList_Def_TO_DefList:

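This part links a series of variable definitions together. For example, ExtDeclList_COMMA_ExtDecl_TO_ExtDeclList matches comma-separated definitions such as x,y and links their symbols into one chain.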
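Following the same case in the source, the newly reduced symbol is pointed at the chain built so far. Assuming attributeForParentNode holds the right-most symbol when the case runs and is passed up unchanged (the cast in the code suggests so, but this is an assumption), the newest symbol becomes the head and the chain ends up in reverse source order. The sketch below, with a hypothetical reduceList helper, shows that effect.

```java
public class SymbolChainDemo {
    static class Symbol {
        final String name;
        Symbol next;
        Symbol(String name) { this.name = name; }
    }

    // Reduce ExtDeclList -> ExtDeclList COMMA ExtDecl: the newly reduced symbol
    // points at the chain built so far and becomes its new head
    static Symbol reduceList(Symbol chainSoFar, Symbol current) {
        current.next = chainSoFar;
        return current;
    }

    static String names(Symbol head) {
        StringBuilder sb = new StringBuilder();
        for (Symbol s = head; s != null; s = s.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(s.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "x, y, z": x alone forms the first list, then y and z are added one reduce at a time
        Symbol head = new Symbol("x");
        head = reduceList(head, new Symbol("y"));
        head = reduceList(head, new Symbol("z"));
        System.out.println(names(head));
    }
}
```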

DECLARATOR
case SyntaxProductionInit.OptSpecifier_ExtDeclList_Semi_TO_ExtDef:
case SyntaxProductionInit.TypeNT_VarDecl_TO_ParamDeclaration:
case SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def:

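This completes a full definition: after something like int x,y; the Specifier is fetched, attached to every symbol, and the symbols are added to the table. The tricky part is where the Specifier is fetched from, which is computed from the number of symbols on the right-hand side of the production being reduced.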
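The offsets follow from how an LR value stack is laid out: when reducing A -> X1 ... Xn, the attributes of X1 through Xn are the top n entries, so the specifier, being X1, is at size - n. That gives size - 3 for the three-symbol Def and ExtDef productions and size - 2 for TypeNT VarDecl. A small sketch with placeholder strings instead of real TypeLink and Symbol objects (the helper name is hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class RhsOffsetDemo {
    // When reducing A -> X1 X2 ... Xn, the attributes of X1..Xn are the top n entries
    // of the value stack, so X1 (here: the specifier) sits at index size - n
    static Object leftmostRhsAttribute(List<Object> valueStack, int rhsLength) {
        return valueStack.get(valueStack.size() - rhsLength);
    }

    public static void main(String[] args) {
        // DEF -> SPECIFIERS DECL_LIST SEMI (3 symbols), e.g. "int x, y;"
        List<Object> defStack = Arrays.asList("<outer>", "int", "x,y", ";");
        System.out.println(leftmostRhsAttribute(defStack, 3));

        // PARAM_DECLARATION -> TYPE_NT VAR_DECL (2 symbols), e.g. the parameter "char c"
        List<Object> paramStack = Arrays.asList("<outer>", "char", "c");
        System.out.println(leftmostRhsAttribute(paramStack, 2));
    }
}
```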

STRUCT and FUNCTION

The next things to pay attention to are the methods that handle structs and functions.

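When handling a series of variable declarations, if the type is a struct, the following processing is applied: the struct's symbols, i.e. the fields of its StructDefine, are copied into the variable's argList (a field originally meant to hold function parameters).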

private void handleStructVariable(Symbol symbol) {
    if (symbol == null) {
        return;
    }
    // Walk the symbol's type chain looking for a struct specifier
    boolean isStruct = false;
    TypeLink typeLink = symbol.typeLinkBegin;
    Specifier specifier = null;
    while (typeLink != null) {
        if (!typeLink.isDeclarator) {
            specifier = (Specifier) typeLink.getTypeObject();
            if (specifier.getType() == Specifier.STRUCTURE) {
                isStruct = true;
                break;
            }
        }
        typeLink = typeLink.toNext();
    }
    if (isStruct) {
        // Copy the struct's field symbols one by one, preserving their order,
        // and hang the copies on the variable's argList
        StructDefine structDefine = specifier.getStruct();
        Symbol copy = null, headCopy = null, original = structDefine.getFields();
        while (original != null) {
            if (copy != null) {
                Symbol sym = original.copy();
                copy.setNextSymbol(sym);
                copy = sym;
            } else {
                copy = original.copy();
                headCopy = copy;
            }
            original = original.getNextSymbol();
        }
        symbol.setArgList(headCopy);
    }
}
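One plausible reason for copying rather than sharing the field list (an inference, not something the post states) is that each struct variable then owns separate field symbols. The sketch below reproduces the copy loop on a hypothetical Symbol class whose copy() simply creates a new, unlinked symbol with the same name; the project's own Symbol.copy() may copy more state.

```java
public class StructFieldCopyDemo {
    static class Symbol {
        final String name;
        Symbol next;
        Symbol(String name) { this.name = name; }

        // Hypothetical shallow copy: same name, not linked to anything yet
        Symbol copy() { return new Symbol(name); }
    }

    // Same loop shape as handleStructVariable: copy each field and relink the copies in order
    static Symbol copyChain(Symbol original) {
        Symbol copy = null, headCopy = null;
        while (original != null) {
            Symbol sym = original.copy();
            if (copy != null) {
                copy.next = sym;
            } else {
                headCopy = sym;
            }
            copy = sym;
            original = original.next;
        }
        return headCopy;
    }

    public static void main(String[] args) {
        // Fields of struct point { int x; int y; }, copied for two separate
        // declarations such as "struct point p;" and "struct point q;"
        Symbol x = new Symbol("x");
        x.next = new Symbol("y");
        Symbol pFields = copyChain(x);
        Symbol qFields = copyChain(x);
        System.out.println(pFields.name + "," + pFields.next.name);
        System.out.println("p and q share field symbols: " + (pFields == qFields));
    }
}
```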

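setFunctionSymbol uses whether the function has parameters to work out where the function name sits on the stack, then adds a FUNCTION declarator to it.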

private void setFunctionSymbol(boolean hasArgs) {
    Symbol funcSymbol;
    if (hasArgs) {
        funcSymbol = (Symbol) valueStack.get(valueStack.size() - 4);
    } else {
        funcSymbol = (Symbol) valueStack.get(valueStack.size() - 3);
    }
    typeSystem.addDeclarator(funcSymbol, Declarator.FUNCTION);
    attributeForParentNode = funcSymbol;
}
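The stack positions in setFunctionSymbol follow the same right-hand-side counting: NewName LP VarList RP has four symbols, so the name is at size - 4 and the parameter list at size - 2, while NewName LP RP has three. The sketch below (hypothetical classes, with strings standing in for the parenthesis tokens) finds the name that way and then, like the NewName_LP_VarList_RP_TO_FunctDecl case in the source below, tags each parameter with the function name as its scope.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FunctionDeclDemo {
    static class Symbol {
        final String name;
        Symbol next;
        final List<String> scopes = new ArrayList<>();
        Symbol(String name) { this.name = name; }
    }

    // FUNCT_DECL -> NEW_NAME LP VAR_LIST RP has 4 symbols, FUNCT_DECL -> NEW_NAME LP RP has 3,
    // so the name is 4 or 3 entries below the top, as in setFunctionSymbol
    static Symbol findFunctionName(List<Object> valueStack, boolean hasArgs) {
        return (Symbol) valueStack.get(valueStack.size() - (hasArgs ? 4 : 3));
    }

    public static void main(String[] args) {
        // "f(int a, int b)"
        Symbol f = new Symbol("f");
        Symbol a = new Symbol("a"), b = new Symbol("b");
        a.next = b;
        List<Object> withArgs = Arrays.asList(f, "(", a, ")");
        Symbol func = findFunctionName(withArgs, true);
        // The function name becomes the current scope of every parameter
        String symbolScope = func.name;
        for (Symbol s = a; s != null; s = s.next) {
            s.scopes.add(symbolScope);
        }
        System.out.println(func.name + " params scoped to " + a.scopes + ", " + b.scopes);

        // "g()": no VAR_LIST, so the name is only 3 entries down
        List<Object> noArgs = Arrays.asList(new Symbol("g"), "(", ")");
        System.out.println(findFunctionName(noArgs, false).name);
    }
}
```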

takeActionForReduce source code

  private void takeActionForReduce(int productionNum) {
      switch (productionNum) {
          case SyntaxProductionInit.TYPE_TO_TYPE_SPECIFIER:
              attributeForParentNode = typeSystem.newType(text);
              break;
          case SyntaxProductionInit.EnumSpecifier_TO_TypeSpecifier:
              attributeForParentNode = typeSystem.newType("int");
              break;
          case SyntaxProductionInit.StructSpecifier_TO_TypeSpecifier:
              attributeForParentNode = typeSystem.newType(text);
              TypeLink link = (TypeLink) attributeForParentNode;
              Specifier sp = (Specifier) link.getTypeObject();
              sp.setType(Specifier.STRUCTURE);
              StructDefine struct = (StructDefine) valueStack.get(valueStack.size() - 1);
              sp.setStruct(struct);
              break;
          case SyntaxProductionInit.SPECIFIERS_TypeOrClass_TO_SPECIFIERS:
              attributeForParentNode = valueStack.peek();
              Specifier last = (Specifier) ((TypeLink) valueStack.get(valueStack.size() - 2)).getTypeObject();
              Specifier dst = (Specifier) ((TypeLink) attributeForParentNode).getTypeObject();
              typeSystem.specifierCopy(dst, last);
              break;
          case SyntaxProductionInit.NAME_TO_NewName:
          case SyntaxProductionInit.Name_TO_NameNT:
              attributeForParentNode = typeSystem.newSymbol(text, nestingLevel);
              break;
          case SyntaxProductionInit.START_VarDecl_TO_VarDecl:
          case SyntaxProductionInit.Start_VarDecl_TO_VarDecl:
              typeSystem.addDeclarator((Symbol) attributeForParentNode, Declarator.POINTER);
              break;
          case SyntaxProductionInit.VarDecl_LB_ConstExpr_RB_TO_VarDecl:
              Declarator declarator = typeSystem.addDeclarator((Symbol) valueStack.get(valueStack.size() - 4), Declarator.ARRAY);
              int arrayNum = (Integer) attributeForParentNode;
              declarator.setElementNum(arrayNum);
              attributeForParentNode = valueStack.get(valueStack.size() - 4);
              break;
          case SyntaxProductionInit.Name_TO_Unary:
              attributeForParentNode = typeSystem.getSymbolByText(text, nestingLevel, symbolScope);
              break;
          case SyntaxProductionInit.ExtDeclList_COMMA_ExtDecl_TO_ExtDeclList:
          case SyntaxProductionInit.VarList_COMMA_ParamDeclaration_TO_VarList:
          case SyntaxProductionInit.DeclList_Comma_Decl_TO_DeclList:
          case SyntaxProductionInit.DefList_Def_TO_DefList: {
              Symbol currentSym = (Symbol) attributeForParentNode;
              Symbol lastSym = null;
              if (productionNum == SyntaxProductionInit.DefList_Def_TO_DefList) {
                  lastSym = (Symbol) valueStack.get(valueStack.size() - 2);
              } else {
                  lastSym = (Symbol) valueStack.get(valueStack.size() - 3);
              }
              currentSym.setNextSymbol(lastSym);
          }
          break;
          case SyntaxProductionInit.OptSpecifier_ExtDeclList_Semi_TO_ExtDef:
          case SyntaxProductionInit.TypeNT_VarDecl_TO_ParamDeclaration:
          case SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def:
              Symbol symbol = (Symbol) attributeForParentNode;
              TypeLink specifier;
              if (productionNum == SyntaxProductionInit.TypeNT_VarDecl_TO_ParamDeclaration) {
                  specifier = (TypeLink) (valueStack.get(valueStack.size() - 2));
              } else {
                  specifier = (TypeLink) (valueStack.get(valueStack.size() - 3));
              }
              typeSystem.addSpecifierToDeclaration(specifier, symbol);
              typeSystem.addSymbolsToTable(symbol, symbolScope);
              handleStructVariable(symbol);
              break;
          case SyntaxProductionInit.VarDecl_Equal_Initializer_TO_Decl:
              // Pass the declared Symbol up as this node's attribute; otherwise the cast Symbol symbol = (Symbol) attributeForParentNode; in the case above would fail
              attributeForParentNode = (Symbol) valueStack.get(valueStack.size() - 2);
              break;
          case SyntaxProductionInit.NewName_LP_VarList_RP_TO_FunctDecl:
              setFunctionSymbol(true);
              Symbol argList = (Symbol) valueStack.get(valueStack.size() - 2);
              ((Symbol) attributeForParentNode).args = argList;
              typeSystem.addSymbolsToTable((Symbol) attributeForParentNode, symbolScope);
              symbolScope = ((Symbol) attributeForParentNode).getName();
              Symbol sym = argList;
              while (sym != null) {
                  sym.addScope(symbolScope);
                  sym = sym.getNextSymbol();
              }
              break;
          case SyntaxProductionInit.NewName_LP_RP_TO_FunctDecl:
              setFunctionSymbol(false);
              typeSystem.addSymbolsToTable((Symbol) attributeForParentNode, symbolScope);
              symbolScope = ((Symbol) attributeForParentNode).getName();
              break;
          case SyntaxProductionInit.OptSpecifiers_FunctDecl_CompoundStmt_TO_ExtDef:
              symbol = (Symbol) valueStack.get(valueStack.size() - 2);
              specifier = (TypeLink) (valueStack.get(valueStack.size() - 3));
              typeSystem.addSpecifierToDeclaration(specifier, symbol);
              symbolScope = GLOBAL_SCOPE;
              break;
          case SyntaxProductionInit.Name_To_Tag:
              symbolScope = text;
              attributeForParentNode = typeSystem.getStructFromTable(text);
              if (attributeForParentNode == null) {
                  attributeForParentNode = new StructDefine(text, nestingLevel, null);
                  typeSystem.addStructToTable((StructDefine) attributeForParentNode);
              }
              break;
          case SyntaxProductionInit.Struct_OptTag_LC_DefList_RC_TO_StructSpecifier:
              Symbol defList = (Symbol) valueStack.get(valueStack.size() - 2);
              StructDefine structObj = (StructDefine) valueStack.get(valueStack.size() - 4);
              structObj.setFields(defList);
              attributeForParentNode = structObj;
              symbolScope = GLOBAL_SCOPE;
              break;
          case SyntaxProductionInit.Enum_TO_EnumNT:
              enumVal = 0;
              break;
          case SyntaxProductionInit.NameNT_TO_Emurator:
              doEnum();
              break;
          case SyntaxProductionInit.Name_Eequal_ConstExpr_TO_Enuerator:
              enumVal = (Integer) (valueStack.get(valueStack.size() - 1));
              attributeForParentNode = (Symbol) (valueStack.get(valueStack.size() - 3));
              doEnum();
              break;
          case SyntaxProductionInit.Number_TO_ConstExpr:
          case SyntaxProductionInit.Number_TO_Unary:
              attributeForParentNode = Integer.valueOf(text);
              break;
          default:
              break;
      }
      astBuilder.buildSyntaxTree(productionNum, text);
  }

Scope

One point about building the symbol table has not been covered yet: the scope of each symbol.

LRStateTableParser has two fields for this:

nestingLevel

indicates the nesting level of the current symbol

symbolScope

indicates the scope of the current symbol: GLOBAL_SCOPE for a global variable, or the name of the enclosing function for a symbol inside a function

nestingLevel is incremented or decremented whenever a shift encounters a left or right brace.

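symbolScope is updated during reduces: it is set to the function name when a function declarator is reduced, and reset to GLOBAL_SCOPE once a complete function definition has been reduced (struct definitions do the same with the struct tag).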
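A minimal sketch of how the two fields evolve while parsing int g; void f() { int x; }. The class, its method names, and the GLOBAL_SCOPE value are hypothetical; only the update points (braces on shift, function declarator and full function definition on reduce) come from the description above.

```java
public class ScopeTrackerDemo {
    static final String GLOBAL_SCOPE = "global";  // placeholder value for the constant

    int nestingLevel = 0;
    String symbolScope = GLOBAL_SCOPE;

    // Shifting a brace adjusts the nesting level
    void shiftLeftBrace() { nestingLevel++; }
    void shiftRightBrace() { nestingLevel--; }

    // Reduce of a function declarator: later symbols belong to this function
    void reduceFunctDecl(String functionName) { symbolScope = functionName; }

    // Reduce of the whole function definition: back to global scope
    void reduceFunctionDefinition() { symbolScope = GLOBAL_SCOPE; }

    String describe(String name) {
        return name + " @ level " + nestingLevel + ", scope " + symbolScope;
    }

    public static void main(String[] args) {
        // int g; void f() { int x; }
        ScopeTrackerDemo t = new ScopeTrackerDemo();
        System.out.println(t.describe("g"));
        t.reduceFunctDecl("f");        // f() is reduced before "{" is shifted
        t.shiftLeftBrace();
        System.out.println(t.describe("x"));
        t.shiftRightBrace();
        t.reduceFunctionDefinition();
        System.out.println(t.describe("after f"));
    }
}
```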

Summary

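This post relies on the fact that at each reduce we know which production is being reduced, so the symbol-table construction can be inserted right there. The process may look complicated, but it boils down to: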

create information according to the production being reduced -> combine information according to the production being reduced

Other GitHub blog: https://dejavudwh.cn/

Source: https://www.cnblogs.com/secoding/p/11375710.html
