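Preface
The values in this dataset span a huge range, anywhere from 10 to 10000. If I fed all of it straight into a decision tree, I figure I wouldn't get anything else done; I could just sit and watch my computer freeze! So I transformed the raw continuous data into a more compact set of derived attributes, to see whether that produces a new, better decision tree!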
Main text
Enough chatter, straight to the code! Otherwise it really is painful; it took me quite a while to get this working!
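- #include <fstream>
- #include <iostream>
- #include <string>
- using namespace std;
- int main()
- {
-     const int kColumns = 34; // columns per row in the raw file
-     const int kRows = 1941;  // number of data rows to convert
-     ifstream in("/Users/zhangzhaobo/Documents/Graduation-Design/Mydata.txt");
-     ofstream out("/Users/zhangzhaobo/Documents/Graduation-Design/Data/New_Data.txt");
-     if (!in || !out)
-     {
-         cerr << "Cannot open the input or output file" << endl;
-         return 1;
-     }
-     // Read the 34 original column names.
-     string line[kColumns];
-     for (int i = 0; i < kColumns; ++i)
-     {
-         in >> line[i];
-     }
-     // Write the new header: 24 columns instead of 34.
-     out << "Diff_X" << "\t" << "Diff_Y" << "\t";
-     for (int i = 4; i < 8; ++i)
-     {
-         out << line[i] << "\t";
-     }
-     out << "Diff_Luminosity\t";
-     out << line[10] << "\t";
-     out << "TypeouOfSteel\t";
-     for (int i = 13; i < 27; ++i)
-     {
-         out << line[i] << "\t";
-     }
-     out << "Fault";
-     out << endl;
-     float attr[kColumns];
-     int count = 0;
-     while (count < kRows)
-     {
-         for (int i = 0; i < kColumns; ++i)
-         {
-             in >> attr[i];
-         }
-         if (!in)
-         {
-             break; // stop early on a short or malformed file
-         }
-         // Collapse each pair of columns (0/1, 2/3, 8/9) into a single difference.
-         float X_dis = attr[1] - attr[0];
-         float Y_dis = attr[3] - attr[2];
-         float Luminosity_dis = attr[9] - attr[8];
-         // Keep column 11 as the steel type; column 12 is dropped.
-         float TypeOfSteel = attr[11];
-         out << X_dis << "\t" << Y_dis << "\t";
-         for (int i = 4; i < 8; ++i)
-         {
-             out << attr[i] << "\t";
-         }
-         out << Luminosity_dis << "\t";
-         out << attr[10] << "\t";
-         out << TypeOfSteel << "\t";
-         for (int i = 13; i < 27; ++i)
-         {
-             out << attr[i] << "\t";
-         }
-         // Fold the 7 fault columns (27..33) into one integer:
-         // a 1 in column 27+i contributes 2^(7-i), so the first column gives 128.
-         int Fault = 0;
-         for (int i = 0; i < 7; ++i)
-         {
-             Fault = (Fault + static_cast<int>(attr[i + 27])) * 2;
-         }
-         out << Fault << endl;
-         count++;
-     }
-     in.close();
-     out.close();
-     return 0;
- }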
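The Fault loop above is the least obvious step, so here is a minimal sketch of it in isolation. The names `encodeFault` and `decodeFault` are hypothetical and not part of the original program, and the decoder assumes that exactly one of the 7 fault columns is set per row, which the post does not state explicitly.

```cpp
#include <array>
#include <cassert>

// Same folding as the loop in the program above:
// a 1 in flag i contributes 2^(7 - i) to the result.
int encodeFault(const std::array<int, 7>& flags)
{
    int fault = 0;
    for (int flag : flags)
    {
        assert(flag == 0 || flag == 1); // assumption: each fault column holds 0 or 1
        fault = (fault + flag) * 2;
    }
    return fault;
}

// Hypothetical inverse: returns the index (0..6) of the single set flag,
// or -1 when the code is not one of 128, 64, 32, 16, 8, 4, 2.
int decodeFault(int code)
{
    for (int i = 0; i < 7; ++i)
    {
        if (code == (1 << (7 - i)))
        {
            return i;
        }
    }
    return -1;
}
```

Under that one-flag-per-row assumption, a Fault value of 128 means the first of the 7 fault columns was set, and 2 means the last one was.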
After the transformation, the attributes look like this (the header row followed by the first two records):
- Diff_X Diff_Y Pixels_Areas X_Perimeter Y_Perimeter Sum_of_Luminosity Diff_Luminosity Length_of_Conveyer TypeouOfSteel Steel_Plate_Thickness Edges_Index Empty_Index Square_Index Outside_X_Index Edges_X_Index Edges_Y_Index Outside_Global_Index LogOfAreas Log_X_Index Log_Y_Index Orientation_Index Luminosity_Index SigmoidOfAreas Fault
- 8 44 267 17 44 24220 32 1687 1 80 0.0498 0.2415 0.1818 0.0047 0.4706 1 1 2.4265 0.9031 1.6435 0.8182 -0.2913 0.5822 128
- 6 29 108 10 30 11397 39 1687 1 80 0.7647 0.3793 0.2069 0.0036 0.6 0.9667 1 2.0334 0.7782 1.4624 0.7931 -0.1756 0.2984 128
I even wrote a small C++ program just to inspect this output!
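- #include <iostream>
- #include <string>
- using namespace std;
- int main()
- {
-     const int kColumns = 24; // columns per row in New_Data.txt
-     const int kRows = 3;     // the header row plus two records
-     string line[kColumns * kRows];
-     for (int i = 0; i < kColumns * kRows; ++i)
-     {
-         cin >> line[i];
-     }
-     // Print each column name next to its value in the two records.
-     for (int i = 0; i < kColumns; ++i)
-     {
-         cout << "[->" << i << ": " << line[i] << " --> " << line[i + kColumns] << " --> " << line[i + 2 * kColumns] << endl;
-     }
-     return 0;
- }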
The final result looks pretty nice!
- [->0: Diff_X --> 8 --> 6
- [->1: Diff_Y --> 44 --> 29
- [->2: Pixels_Areas --> 267 --> 108
- [->3: X_Perimeter --> 17 --> 10
- [->4: Y_Perimeter --> 44 --> 30
- [->5: Sum_of_Luminosity --> 24220 --> 11397
- [->6: Diff_Luminosity --> 32 --> 39
- [->7: Length_of_Conveyer --> 1687 --> 1687
- [->8: TypeouOfSteel --> 1 --> 1
- [->9: Steel_Plate_Thickness --> 80 --> 80
- [->10: Edges_Index --> 0.0498 --> 0.7647
- [->11: Empty_Index --> 0.2415 --> 0.3793
- [->12: Square_Index --> 0.1818 --> 0.2069
- [->13: Outside_X_Index --> 0.0047 --> 0.0036
- [->14: Edges_X_Index --> 0.4706 --> 0.6
- [->15: Edges_Y_Index --> 1 --> 0.9667
- [->16: Outside_Global_Index --> 1 --> 1
- [->17: LogOfAreas --> 2.4265 --> 2.0334
- [->18: Log_X_Index --> 0.9031 --> 0.7782
- [->19: Log_Y_Index --> 1.6435 --> 1.4624
- [->20: Orientation_Index --> 0.8182 --> 0.7931
- [->21: Luminosity_Index --> -0.2913 --> -0.1756
- [->22: SigmoidOfAreas --> 0.5822 --> 0.2984
- [->23: Fault --> 128 --> 128
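The viewer above is fixed to 24 columns and 3 rows. As a rough sketch only, the same column-by-column listing could be produced for any table size; `formatColumns` is a hypothetical helper that is not in the original post, and it assumes the tokens arrive row by row with the header first.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// tokens holds the table row by row: first `columns` names, then the records.
// Returns one line per column in the "[->i: name --> v1 --> v2" layout.
std::vector<std::string> formatColumns(const std::vector<std::string>& tokens,
                                       std::size_t columns)
{
    // Require at least one complete row and no partial rows.
    assert(columns > 0 && !tokens.empty() && tokens.size() % columns == 0);
    const std::size_t rows = tokens.size() / columns;
    std::vector<std::string> result;
    for (std::size_t c = 0; c < columns; ++c)
    {
        std::ostringstream os;
        os << "[->" << c << ": " << tokens[c];
        for (std::size_t r = 1; r < rows; ++r)
        {
            os << " --> " << tokens[r * columns + c];
        }
        result.push_back(os.str());
    }
    return result;
}
```

Called with the 72 tokens from the three rows shown earlier and `columns` set to 24, this would be expected to yield the same 24 lines as the listing above.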
Source: http://www.jianshu.com/p/bbac3491c2a8