The gcc compile option -fno-strict-aliasing

1. Introducing the Problem

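Recently I ran into a problem in a project: when using data of type double, garbage values appeared after a jce encode/decode round trip. For example, encode the following data.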

Encode:
{
		"index": 10,
		"score": 10.12,
		......
}

Decoding it again yields data that differs from what was encoded; the value looks like something undefined:

Decode:
{
		"index": 10,
		"score": -1.53533e+267,
		......
}

2. Locating the Problem

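The project had used the same scenario before without any problems, so my first suspicion was that the jce version had been upgraded. It turned out the jce version had not changed, which seemed to rule out jce as the cause (it actually is a jce problem, as explained later). Then I remembered that the project had recently added the -O2 optimization option to its build. I tested that, and -O2 was indeed the culprit. But why does adding -O2 trigger this bug? To answer that, two things need to be clarified: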

Which optimization options does -O2 enable at compile time

Which jce code triggers this bug

gcc -O2 enables many optimization options, one of which is -fstrict-aliasing. Here is how gcc describes -fstrict-aliasing: "Allows the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type."

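Roughly, this means the compiler assumes that pointers of different types (except for similar types such as int and unsigned int) never point to the same memory address. If code violates this rule, the behavior is undefined. An example: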

[huanghaibin33@DevTJ-todo ~/test]$ cat test_aliasing.cpp
#include <cstdio>
int main()
{
		int i = 0x12345678;
		short *p = (short *) &i;
		short tmp;
		tmp = *p;
		*p = *(p+1);
		*(p+1) = tmp;
		printf("i=%x\n", i);
		return 0;
}
[huanghaibin33@DevTJ-todo ~/test]$ g++ test_aliasing.cpp  -o test_aliasing
[huanghaibin33@DevTJ-todo ~/test]$ ./test_aliasing
i=56781234
[huanghaibin33@DevTJ-todo ~/test]$ g++ -O2 test_aliasing.cpp  -o test_aliasing
[huanghaibin33@DevTJ-todo ~/test]$ ./test_aliasing
i=12345678
[huanghaibin33@DevTJ-todo ~/test]$ g++ -O2 -fno-strict-aliasing test_aliasing.cpp  -o test_aliasing
[huanghaibin33@DevTJ-todo ~/test]$ ./test_aliasing
i=56781234

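The purpose of this code is to swap the upper two bytes and the lower two bytes of an int. Compiled normally, or with -O2 -fno-strict-aliasing, the program works as expected, but with -O2 and without -fno-strict-aliasing the result is not what we want. The reason is that -O2 turns on -fstrict-aliasing by default. The statement short *p = (short *) &i breaks the aliasing rules: the compiler assumes that the short pointer p cannot point to the address of the int i, so it treats operations through p as having no effect on the value of i.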
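The gcc description quoted above notes that a character type may alias any other type. As a minimal sketch of that exception (the function names here are hypothetical and not part of the original test program, and a 4-byte int is assumed), the same half-swap can be written with byte-wise access, which should behave the same with or without -fstrict-aliasing:

```cpp
#include <cassert>

// Hypothetical helper, not from the original program.
// Swaps the two 16-bit halves of a 4-byte int by touching its bytes
// through unsigned char*, which is allowed to alias any object type.
inline int swap_halves_bytes(int i)
{
    static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");
    unsigned char *p = reinterpret_cast<unsigned char *>(&i);
    unsigned char t0 = p[0];
    unsigned char t1 = p[1];
    p[0] = p[2];
    p[1] = p[3];
    p[2] = t0;
    p[3] = t1;
    return i;
}

// Small self-check using the same value as test_aliasing.cpp.
inline void check_swap_halves_bytes()
{
    assert(swap_halves_bytes(0x12345678) == 0x56781234);
}
```

Because whole byte pairs are exchanged, the result is the same on little-endian and big-endian machines.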

The problem is now fairly clear. Next, let's see which jce code violates the aliasing rules:

inline Int64 jce_htonll(Int64 x)
{
		jce::bswap_helper h;
		h.i64 = x;
		Int32 tmp = htonl(h.i32[1]);
		h.i32[1] = htonl(h.i32[0]);
		h.i32[0] = tmp;
		return h.i64;
}
inline Double jce_ntohd(Double x)
{
		Int64 __t__ = jce_htonll((*((Int64 *)&x)));
		return *((Double *) &__t__);
}

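Two places in the code above violate the aliasing rules, so the behavior of the compiled program is unpredictable. wup fixed this bug in the newer release wup-linux-c++-1.0.8.1.tgz. Here is the fixed code: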

inline Double jce_ntohd(Double x)
{
		union {
				Double d;
				Int64 i64;
		} helper;
		helper.d = x;
		helper.i64 = jce_htonll( helper.i64 );
		return helper.d;
}
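Another way to reinterpret the bits of a double without casting pointers is to copy them with std::memcpy. The following is only a sketch under stated assumptions: swap_bytes64 and swap_double_bytes are hypothetical names, not jce functions, and the sketch always reverses the byte order instead of calling htonl, so it is not a drop-in replacement for jce_ntohd.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reverse the byte order of a 64-bit value
// using shifts only, so no pointer casts are involved.
inline std::uint64_t swap_bytes64(std::uint64_t v)
{
    std::uint64_t r = 0;
    for (int k = 0; k < 8; ++k) {
        r = (r << 8) | (v & 0xFF);  // append the lowest byte of v to r
        v >>= 8;
    }
    return r;
}

// Hypothetical stand-in for a double byte-swap routine.
// std::memcpy copies the object representation, which does not
// violate the aliasing rules.
inline double swap_double_bytes(double x)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits = swap_bytes64(bits);
    double out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Swapping twice should give back the original value.
inline void check_round_trip(double x)
{
    assert(swap_double_bytes(swap_double_bytes(x)) == x);
}
```

Calling check_round_trip(10.12) exercises the score value from the example at the top of the article. Compilers typically turn a fixed-size memcpy like this into a plain register move, so the copy should not cost anything at -O2.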

3. Solutions

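Fix it at the code level, for example by converting through a union, as in the test_aliasing_v2.cpp code below. (GCC documents type punning through a union as allowed even with -fstrict-aliasing, as long as the memory is accessed through the union type.)

When -O2 or -O3 is enabled, add -fno-strict-aliasing, which lets pointers of different types point to the same memory address. (This is a quick fix when a lot of existing code violates the aliasing rules.)

Do not enable the -O2 or -O3 optimizations.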

[huanghaibin33@DevTJ-todo ~/test]$ cat test_aliasing_v2.cpp
#include <cstdio>
int main()
{
		union helper {
				int i;
				short s[2];	// the two 16-bit halves of i
		};
		int i = 0x12345678;
		helper h;
		h.i = i;
		short tmp = h.s[0];
		h.s[0] = h.s[1];
		h.s[1] = tmp;
		i = h.i;
		printf("i=%x\n", i);
		return 0;
}
[huanghaibin33@DevTJ-todo ~/test]$ g++ test_aliasing_v2.cpp -o test_aliasing_v2
[huanghaibin33@DevTJ-todo ~/test]$ ./test_aliasing_v2
i=56781234
[huanghaibin33@DevTJ-todo ~/test]$ g++ -O2 test_aliasing_v2.cpp -o test_aliasing_v2
[huanghaibin33@DevTJ-todo ~/test]$ ./test_aliasing_v2
i=56781234
[huanghaibin33@DevTJ-todo ~/test]$ g++ -O2 -fno-strict-aliasing test_aliasing_v2.cpp -o test_aliasing_v2
[huanghaibin33@DevTJ-todo ~/test]$ ./test_aliasing_v2
i=56781234
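For this particular example, a further code-level option is to avoid reinterpreting memory at all and do the swap with integer arithmetic. A minimal sketch, where swap_halves_shift is a hypothetical name and a fixed-width 32-bit unsigned type stands in for int:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: swap the upper and lower 16 bits of a 32-bit
// value with shifts. No pointers are involved, so the aliasing rules
// never come into play.
inline std::uint32_t swap_halves_shift(std::uint32_t v)
{
    return (v << 16) | (v >> 16);
}

// Small self-check using the same value as test_aliasing_v2.cpp.
inline void check_swap_halves_shift()
{
    assert(swap_halves_shift(0x12345678u) == 0x56781234u);
}
```

This version does not depend on byte order or on any compiler option.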

4. Conclusion

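When code converts between pointer types with casts, building with -O1 can produce different runtime results than building with -O2 or -O3. Projects should avoid converting between pointers of different types wherever possible, and when using optimization options, pay close attention to compiler warnings (for example -Wstrict-aliasing, which is included in -Wall).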

References

http://km.oa.com/group/578/articles/show/150732?kmref=search&from_page=1&no=1
https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule

Source: https://www.qcloud.com/developer/article/1159055
