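Recently I have been building an RSS reader for my own use. One step fetches a JSON file containing the list of RSS sources from the server, then downloads and parses the RSS content according to that file. The core code is as follows: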
- class PresenterImpl(val context: Context, val activity: MainActivity) : IPresenter {
- private val URL_API = "https://vimerzhao.github.io/others/rssreader/RSS.json"
- override fun getRssResource(): RssSource {
- val gson = GsonBuilder().create()
- return gson.fromJson(getFromNet(URL_API), RssSource::class.java)
- }
- private fun getFromNet(url: String): String {
- val result = URL(url).readText()
- return result
- }
- ......
- }
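This worked fine until a couple of days ago, when I bought the domain `vimerzhao.top` and redirected the original domain `vimerzhao.github.io` to `vimerzhao.top`. After that the tool stopped working, even though entering `URL_API` in a browser still returned the data. So why did `URL.readText()` fail to get the data?
The following code can be used to test it: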
- import java.net.*;
- import java.io.*;
- public class TestRedirect {
- public static void main(String args[]) {
- try {
- URL url1 = new URL("https://vimerzhao.github.io/others/rssreader/RSS.json");
- URL url2 = new URL("http://vimerzhao.top/others/rssreader/RSS.json");
- read(url1);
- System.out.println("=--------------------------------=");
- read(url2);
- } catch(Exception e) {
- e.printStackTrace();
- }
- }
- public static void read(URL url) {
- try {
- BufferedReader in =new BufferedReader(new InputStreamReader(url.openStream()));
- String inputLine;
- while ((inputLine = in.readLine()) != null) {
- System.out.println(inputLine);
- }
- in.close();
- } catch(IOException e) {
- e.printStackTrace();
- }
- }
- }
The output is:
- <html>
- <head><title>301 Moved Permanently</title></head>
- <body bgcolor="white">
- <center><h1>301 Moved Permanently</h1></center>
- <hr><center>nginx</center>
- </body>
- </html>
- =--------------------------------=
- {"theme":"tech","author":"zhaoyu","email":"dutzhaoyu@gmail.com","version":"0.01","contents":[{"category":"综合版块","websites":[{"tag":"门户网站","url":["http://geek.csdn.net/admin/news_service/rss","http://blog.jobbole.com/feed/","http://feed.cnblogs.com/blog/sitehome/rss","https://segmentfault.com/feeds","http://www.codeceo.com/article/category/pick/feed"]},{"tag":"知名社区","url":["https://stackoverflow.com/feeds","https://www.v2ex.com/index.xml"]},{"tag":"官方博客","url":["https://www.blog.google/rss/","https://blog.jetbrains.com/feed/"]},{"tag":"个人博客-行业","url":["http://feed.williamlong.info/","https://www.liaoxuefeng.com/feed/articles"]},{"tag":"个人博客-学术","url":["http://www.norvig.com/rss-feed.xml"]}]},{"category":"编程语言","websites":[{"tag":"Kotlin","url":["https://kotliner.cn/api/rss/latest"]},{"tag":"Python","url":["https://www.python.org/dev/peps/peps.rss/"]},{"tag":"Java","url":["http://www.codeceo.com/article/category/develop/java/feed"]}]},{"category":"行业动态","websites":[{"tag":"Android","url":["http://www.codeceo.com/article/category/develop/android/feed"]}]},{"category":"乱七八遭","websites":[{"tag":"Linux-综合","url":["https://linux.cn/rss.xml","http://www.linuxidc.com/rssFeed.aspx","http://www.codeceo.com/article/tag/linux/feed"]},{"tag":"Linux-发行版","url":["https://blog.linuxmint.com/?feed=rss2","https://manjaro.github.io/feed.xml"]}]}]}
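The HTTP status code is 301, meaning a redirect occurred. In a browser the redirect happens so quickly that we never see the 301 page. Note that `URL.readText()` is a Kotlin extension function that ultimately calls the `openStream` method of the `URL` class. Part of the source: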
- .....
- /**
- * Reads the entire content of this URL as a String using UTF-8 or the specified [charset].
- *
- * This method is not recommended on huge files.
- *
- * @param charset a character set to use.
- * @return a string with this URL entire content.
- */
- @kotlin.internal.InlineOnly public inline fun URL.readText(charset: Charset = Charsets.UTF_8) : String = readBytes().toString(charset)
- /**
- * Reads the entire content of the URL as byte array.
- *
- * This method is not recommended on huge files.
- *
- * @return a byte array with this URL entire content.
- */
- public fun URL.readBytes() : ByteArray = openStream().use {
- it.readBytes()
- }
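So the test code above shows why `URL.readText()` failed. Is it reasonable for `URL` not to follow the redirect, and why does it not? That remains to be explored. One likely factor is that `HttpURLConnection` follows redirects by default only when the protocol stays the same, and this redirect appears to go from `https` to `http`.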
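One way to cope with a 301 like the one above is to follow the `Location` header by hand. By default `HttpURLConnection` does not follow a redirect that switches between `https` and `http`, so the sketch below turns automatic redirects off and loops itself. It assumes Java 9 or later (for `InputStream.readAllBytes`); the class name `RedirectingReader`, the method names, and the limit of 5 hops are illustrative choices and not part of the original tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RedirectingReader {
    // Hypothetical cap on redirect hops, chosen for illustration.
    static final int MAX_REDIRECTS = 5;

    // True for the status codes that normally carry a Location header.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Reads the body as UTF-8, following redirects manually so that a
    // switch between https and http is followed as well.
    static String readFollowingRedirects(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (isRedirect(status)) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without a Location header");
                }
                // Location may be relative, so resolve it against the current URL.
                current = new URL(current, location);
                continue;
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        throw new IOException("Too many redirects");
    }
}
```

Calling `RedirectingReader.readFollowingRedirects(new URL(...))` with the original `URL_API` value would be the intended use. Following a redirect from `https` down to `http` gives up transport security, so whether to allow it is a decision for the application.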
The `equals` method
First, look at the documentation for `equals` (URL (Java Platform SE 7)):
Compares this URL for equality with another object.
If the given object is not a URL then this method immediately returns false.
Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.
Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can't be resolved, the host names must be equal without regard to case; or both host names equal to null.
Since hosts comparison requires name resolution, this operation is a blocking operation.
Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.
Next, consider this code:
- import java.net.*;
- public class TestEquals {
- public static void main(String args[]) {
- try {
- // vimerzhao's blog home page
- URL url1 = new URL("https://vimerzhao.github.io/");
- // zhanglanqing's blog home page
- URL url2 = new URL("https://zhanglanqing.github.io/");
- // the domain that vimerzhao's blog home page redirects to
- URL url3 = new URL("http://vimerzhao.top/");
- System.out.println(url1.equals(url2));
- System.out.println(url1.equals(url3));
- } catch(Exception e) {
- e.printStackTrace();
- }
- }
- }
According to the definition, what should the output be? Running it gives:
- true
- false
You may have guessed right. But if I disconnect the computer from the network and run it again, the result is:
- false
- false
Yet all three domains actually resolve to the same IP address, as a `ping` shows:
- zhaoyu@Inspiron ~/Project $ ping vimezhao.github.io
- PING sni.github.map.fastly.net (151.101.77.147) 56(84) bytes of data.
- 64 bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=396 ms
- ^C
- --- sni.github.map.fastly.net ping statistics ---
- 1 packets transmitted, 1 received, 0% packet loss, time 0ms
- rtt min/avg/max/mdev = 396.692/396.692/396.692/0.000 ms
- zhaoyu@Inspiron ~/Project $ ping zhanglanqing.github.io
- PING sni.github.map.fastly.net (151.101.77.147) 56(84) bytes of data.
- 64 bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=396 ms
- ^C
- --- sni.github.map.fastly.net ping statistics ---
- 2 packets transmitted, 1 received, 50% packet loss, time 1000ms
- rtt min/avg/max/mdev = 396.009/396.009/396.009/0.000 ms
- zhaoyu@Inspiron ~/Project $ ping vimezhao.top
- ping: unknown host vimezhao.top
- zhaoyu@Inspiron ~/Project $ ping vimerzhao.top
- PING sni.github.map.fastly.net (151.101.77.147) 56(84) bytes of data.
- 64 bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=409 ms
- ^C
- --- sni.github.map.fastly.net ping statistics ---
- 2 packets transmitted, 1 received, 50% packet loss, time 1001ms
- rtt min/avg/max/mdev = 409.978/409.978/409.978/0.000 ms
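First, the case with a network connection. `vimerzhao.github.io` and `zhanglanqing.github.io` are my blog and a classmate's blog. Their content differs, but they resolve to the same IP and share the same protocol, port, and so on, so they compare as equal. `vimerzhao.github.io` and `vimerzhao.top`, on the other hand, point to the same blog, but one uses `https` and the other `http`; the protocols differ, so they compare as not equal. I suspect this runs against most people's intuition: URLs pointing to different blogs are equal, while URLs pointing to the same blog are not!
Now for the result without a network. Start with the source of `URL`: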
- public boolean equals(Object obj) {
- if (!(obj instanceof URL))
- return false;
- URL u2 = (URL)obj;
- return handler.equals(this, u2);
- }
Next, the source of the `handler` object:
- protected boolean equals(URL u1, URL u2) {
- String ref1 = u1.getRef();
- String ref2 = u2.getRef();
- return (ref1 == ref2 || (ref1 != null && ref1.equals(ref2))) &&
- sameFile(u1, u2);
- }
The source of `sameFile`:
- protected boolean sameFile(URL u1, URL u2) {
- // Compare the protocols.
- if (!((u1.getProtocol() == u2.getProtocol()) ||
- (u1.getProtocol() != null &&
- u1.getProtocol().equalsIgnoreCase(u2.getProtocol()))))
- return false;
- // Compare the files.
- if (!(u1.getFile() == u2.getFile() ||
- (u1.getFile() != null && u1.getFile().equals(u2.getFile()))))
- return false;
- // Compare the ports.
- int port1, port2;
- port1 = (u1.getPort() != -1) ? u1.getPort() : u1.handler.getDefaultPort();
- port2 = (u2.getPort() != -1) ? u2.getPort() : u2.handler.getDefaultPort();
- if (port1 != port2)
- return false;
- // Compare the hosts.
- if (!hostsEqual(u1, u2))
- return false; // this line is reached when there is no network connection
- return true;
- }
Finally, the source of `hostsEqual`:
- protected boolean hostsEqual(URL u1, URL u2) {
- InetAddress a1 = getHostAddress(u1);
- InetAddress a2 = getHostAddress(u2);
- // if we have internet address for both, compare them
- if (a1 != null && a2 != null) {
- return a1.equals(a2);
- // else, if both have host names, compare them
- } else if (u1.getHost() != null && u2.getHost() != null)
- return u1.getHost().equalsIgnoreCase(u2.getHost());
- else
- return u1.getHost() == null && u2.getHost() == null;
- }
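With a network connection, `a1` and `a2` are both non-`null`, so `return a1.equals(a2)` runs and returns `true`. Without a network, the second branch, `return u1.getHost().equalsIgnoreCase(u2.getHost());`, runs instead. The `host` of `url1` (`vimerzhao.github.io`) obviously differs from the `host` of `url2` (`zhanglanqing.github.io`), so it returns `false`; the condition `if (!hostsEqual(u1, u2))` is then true and `return false` executes.
So the `equals` method of the `URL` class is not only counterintuitive but also inconsistent: it gives different results in different environments, which is very dangerous!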
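The online and offline behavior can be reproduced without touching the network. The sketch below copies the branch structure of `hostsEqual`, but replaces the DNS lookup with a resolver function passed in by the caller. The class name `HostsEqualSketch`, the `resolver` parameter, and the use of plain strings for addresses are assumptions made for illustration; the IP is the one from the `ping` output above. It needs Java 9 or later for `Map.of`.

```java
import java.util.Map;
import java.util.function.Function;

public class HostsEqualSketch {
    // Same branches as URLStreamHandler.hostsEqual, with the DNS lookup
    // replaced by a resolver that returns null when resolution fails.
    static boolean hostsEqual(String host1, String host2,
                              Function<String, String> resolver) {
        String a1 = (host1 == null) ? null : resolver.apply(host1);
        String a2 = (host2 == null) ? null : resolver.apply(host2);
        if (a1 != null && a2 != null) {
            // Both hosts resolved: compare addresses.
            return a1.equals(a2);
        } else if (host1 != null && host2 != null) {
            // Resolution failed: fall back to comparing names.
            return host1.equalsIgnoreCase(host2);
        } else {
            return host1 == null && host2 == null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> dns = Map.of(
                "vimerzhao.github.io", "151.101.77.147",
                "zhanglanqing.github.io", "151.101.77.147");
        Function<String, String> online = dns::get;      // simulated DNS
        Function<String, String> offline = host -> null; // no network
        System.out.println(
                hostsEqual("vimerzhao.github.io", "zhanglanqing.github.io", online));
        System.out.println(
                hostsEqual("vimerzhao.github.io", "zhanglanqing.github.io", offline));
    }
}
```

With the simulated DNS the two hosts compare as equal, and with the offline resolver they do not, which matches the `true` and `false` seen for `url1.equals(url2)` in the two runs.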
The `hashCode()` method
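In addition, `equals` is a time-consuming operation, because with a network connection it has to perform DNS resolution. The same goes for `hashCode()`, which is used as the example here.
The `hashCode()` source in the `URL` class: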
- public synchronized int hashCode() {
- if (hashCode != -1)
- return hashCode;
- hashCode = handler.hashCode(this);
- return hashCode;
- }
The `hashCode()` method of the `handler` object:
- protected int hashCode(URL u) {
- int h = 0;
- // Generate the protocol part.
- String protocol = u.getProtocol();
- if (protocol != null)
- h += protocol.hashCode();
- // Generate the host part.
- InetAddress addr = getHostAddress(u);
- if (addr != null) {
- h += addr.hashCode();
- } else {
- String host = u.getHost();
- if (host != null)
- h += host.toLowerCase().hashCode();
- }
- // Generate the file part.
- String file = u.getFile();
- if (file != null)
- h += file.hashCode();
- // Generate the port part.
- if (u.getPort() == -1)
- h += getDefaultPort();
- else
- h += u.getPort();
- // Generate the ref part.
- String ref = u.getRef();
- if (ref != null)
- h += ref.hashCode();
- return h;
- }
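Here `getHostAddress()` can take a long time, because it may block on a DNS lookup (the result is then cached per `URL` object, as the `hashCode != -1` check shows). So storing `URL` objects in a hash-based container is simply a disaster. The code below compares how `URL` and `URI` perform when each is added 50 times: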
- import java.net.*;
- import java.util.*;
- public class TestHash {
- public static void main(String args[]) {
- HashSet<URL> list1 = new HashSet<>();
- HashSet<URI> list2 = new HashSet<>();
- try {
- URL url1 = new URL("https://vimerzhao.github.io/");
- URI url2 = new URI("https://zhanglanqing.github.io/");
- long cur = System.currentTimeMillis();
- int cnt = 50;
- for (int i = 0; i < cnt; i++) {
- list1.add(url1);
- }
- System.out.println(System.currentTimeMillis() - cur);
- cur = System.currentTimeMillis();
- for (int i = 0; i < cnt; i++) {
- list2.add(url2);
- }
- System.out.println(System.currentTimeMillis() - cur);
- } catch(Exception e) {
- e.printStackTrace();
- }
- }
- }
The output (elapsed milliseconds, first for `URL`, then for `URI`) is:
- 271
- 0
So it is best not to use `URL` in containers based on hash tables.
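A common alternative is to keep the addresses as `URI` objects and convert to `URL` (for example with `toURL()`) only when a connection is actually opened. `URI.equals` and `URI.hashCode` work on the parsed components and never perform a DNS lookup. The sketch below is a minimal illustration; the class name `UriSetSketch` and the method `dedupe` are made up for this example.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class UriSetSketch {
    // Collects addresses into a hash set of URIs. No name resolution happens,
    // so two different hosts stay distinct even if they share an IP address.
    static Set<URI> dedupe(String... addresses) {
        Set<URI> seen = new HashSet<>();
        for (String address : addresses) {
            seen.add(URI.create(address));
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<URI> feeds = dedupe(
                "https://vimerzhao.github.io/",
                "https://zhanglanqing.github.io/",
                "https://VIMERZHAO.github.io/"); // host differs only in case
        System.out.println(feeds.size());
    }
}
```

Here the two blogs remain separate entries, while the entry that differs only in the case of the host name is treated as a duplicate, since `URI` compares hosts without regard to case.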
The role of the `TrailingSlash`
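The so-called `TrailingSlash` is the slash at the end of a domain name. For example, the browser shows `vimerzhao.top`, but copying and pasting it gives `http://vimerzhao.top/`. First, test with the following code: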
- import java.net.*;
- import java.io.*;
- public class TestTrailingSlash {
- public static void main(String args[]) {
- try {
- URL url1 = new URL("https://vimerzhao.github.io/");
- URL url2 = new URL("https://vimerzhao.github.io");
- System.out.println(url1.equals(url2));
- outputInfo(url1);
- outputInfo(url2);
- } catch(Exception e) {
- e.printStackTrace();
- }
- }
- public static void outputInfo(URL url) {
- System.out.println("------" + url.toString() + "----------");
- System.out.println(url.getRef());
- System.out.println(url.getFile());
- System.out.println(url.getHost());
- System.out.println("----------------");
- }
- }
The output is:
- false
- ------https://vimerzhao.github.io/----------
- null
- /
- vimerzhao.github.io
- ----------------
- ------https://vimerzhao.github.io----------
- null
- vimerzhao.github.io
- ----------------
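In fact, whether you read them with the earlier `read()` method or type the URL into the address bar, `url1` and `url2` return the same content. But a trailing `/` indicates a directory, while its absence indicates a file, so `getFile()` returns different results for the two (`/` for `url1` and an empty string for `url2`), and `equals` returns `false`. When typing in the address bar you would not even notice the `TrailingSlash`, and the returned result is the same, yet `equals` returns `false`. It is really hard to guard against!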
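If two addresses that differ only in this root slash should count as the same, one option is to normalize them before comparing. The sketch below works on `URI` rather than `URL`, so no DNS lookup is involved, and it only touches an empty path: `/tags` and `/tags/` are left alone because they can name different resources. The class name `TrailingSlashSketch` and the method `withRootPath` are hypothetical names for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class TrailingSlashSketch {
    // Returns the URI with an empty path replaced by "/", so that
    // "https://host" and "https://host/" compare as equal.
    // URIs without a host, or with a non-empty path, are returned unchanged.
    static URI withRootPath(URI uri) {
        String path = uri.getPath();
        if (uri.getHost() == null || (path != null && !path.isEmpty())) {
            return uri;
        }
        try {
            return new URI(uri.getScheme(), uri.getUserInfo(), uri.getHost(),
                    uri.getPort(), "/", uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI a = URI.create("https://vimerzhao.github.io/");
        URI b = URI.create("https://vimerzhao.github.io");
        System.out.println(a.equals(b));                             // raw comparison
        System.out.println(withRootPath(a).equals(withRootPath(b))); // normalized
    }
}
```

The raw comparison treats the two as different, just as `URL.equals` did above, while the normalized comparison treats them as the same address.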
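If the URL ends with `/`, the server looks for an `index.html` file in that directory. If it does not, taking `vimerzhao.top/tags` as an example, the server first looks for a file named `tags`; if none is found, it automatically appends a `/` and then looks for `index.html` in the `tags` directory. As shown in the figure:
[Figure]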
Here is an interesting test. Write the following two programs:
- import java.net.*;
- import java.io.*;
- public class TestTrailingSlash {
- public static void main(String args[]) {
- try {
- URL urlWithSlash = new URL("http://vimerzhao.top/tags/");
- int cnt = 5;
- long cur = System.currentTimeMillis();
- for (int i = 0; i < cnt; i++) {
- read(urlWithSlash);
- }
- System.out.println(System.currentTimeMillis() - cur);
- } catch(Exception e) {
- e.printStackTrace();
- }
- }
- public static void read(URL url) {
- try {
- BufferedReader in =new BufferedReader(new InputStreamReader(url.openStream()));
- String inputLine;
- while ((inputLine = in.readLine()) != null) {
- //System.out.println(inputLine);
- }
- in.close();
- } catch(IOException e) {
- e.printStackTrace();
- }
- }
- }
- import java.net.*;
- import java.io.*;
- public class TestWithoutTrailingSlash {
- public static void main(String args[]) {
- try {
- URL urlWithoutSlash = new URL("http://vimerzhao.top/tags");
- int cnt = 5;
- long cur = System.currentTimeMillis();
- for (int i = 0; i < cnt; i++) {
- read(urlWithoutSlash);
- }
- System.out.println(System.currentTimeMillis() - cur);
- } catch(Exception e) {
- e.printStackTrace();
- }
- }
- public static void read(URL url) {
- try {
- BufferedReader in =new BufferedReader(new InputStreamReader(url.openStream()));
- String inputLine;
- while ((inputLine = in.readLine()) != null) {
- //System.out.println(inputLine);
- }
- in.close();
- } catch(IOException e) {
- e.printStackTrace();
- }
- }
- }
Test them with the following script:
- #!/bin/bash
- # Brace expansion {1..20} needs bash; >> appends so that all 20 runs are kept.
- for i in {1..20}; do
- java TestTrailingSlash >> out1
- java TestWithoutTrailingSlash >> out2
- done
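Collecting the output times into a table:
[Table]
The version with the trailing `/` is faster, because it skips the step of checking whether a `tags` file exists (and, typically, the extra redirect that appends the slash). The takeaway: it is best to include the `/` at the end of a URL!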
Those are some of the pitfalls I ran into this weekend.
Source: http://www.cnblogs.com/zhaoyu1995/p/7909849.html