Web crawler + simulated browser (fetching resources from sites that restrict access):
Get the URL
Download the resource
Analyze
Process
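- import java.io.BufferedReader;
- import java.io.InputStream;
- import java.io.InputStreamReader;
- import java.net.URL;
- public class http {
-     public static void main(String[] args) throws Exception {
-         // HTTPS (HTTP over TLS) is more secure than plain HTTP.
-         // URL.openStream() opens a connection to the URL and returns an
-         // InputStream for reading data from that connection.
-         // Get the URL
-         URL url = new URL("https://www.jd.com");
-         // Download the resource
-         InputStream is = url.openStream();
-         // try-with-resources closes the reader and the underlying stream, even on error
-         try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
-             String msg;
-             while ((msg = br.readLine()) != null) {
-                 System.out.println(msg);
-             }
-         }
-     }
- }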
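The listing above covers only the first two steps (get the URL, download the resource). The remaining two, analyze and process, might look roughly like the sketch below. It is an illustration under assumptions, not part of the original code: the class name `LinkExtractor`, both method names, the sample HTML, and the `/a` path are hypothetical, and a regular expression is used only for brevity (a real crawler would normally use an HTML parser).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post.
public class LinkExtractor {
    // Matches href="..." or href='...' and captures the attribute value.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Analyze step: collect every href value found in the HTML text. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Process step: keep only absolute http/https links and drop duplicates. */
    public static List<String> keepHttpLinks(List<String> links) {
        List<String> result = new ArrayList<>();
        for (String link : links) {
            boolean isHttp = link.startsWith("http://") || link.startsWith("https://");
            if (isHttp && !result.contains(link)) {
                result.add(link);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input standing in for a downloaded page (made up for this sketch).
        String html = "<a href=\"https://www.jd.com/a\">A</a>"
                + "<a href='/relative'>B</a>"
                + "<a href=\"https://www.jd.com/a\">A again</a>";
        System.out.println(keepHttpLinks(extractLinks(html))); // prints [https://www.jd.com/a]
    }
}
```

In a real run, the string passed to `extractLinks` would be the text accumulated from `br.readLine()` rather than a hard-coded sample.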
Fetching network resources from a site that restricts access:
- import java.io.BufferedReader;
- import java.io.InputStreamReader;
- import java.net.HttpURLConnection;
- import java.net.URL;
- public class http {
-     public static void main(String[] args) throws Exception {
-         // URL.openConnection() returns a URLConnection instance that represents
-         // a connection to the remote object referred to by the URL.
-         // Subclasses of URLConnection include HttpURLConnection and JarURLConnection.
-         URL url = new URL("https://www.jd.com");
-         // Download the resource
-         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
-         conn.setRequestMethod("GET"); // simulate a browser's GET request
-         conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763");
-         // try-with-resources closes the reader and the underlying stream, even on error
-         try (BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
-             String msg;
-             while ((msg = br.readLine()) != null) {
-                 System.out.println(msg);
-             }
-         }
-     }
- }
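The same idea can be split into a reusable helper that sets the browser-like headers, plus a fetch method that checks the HTTP status before reading the body. This is a minimal sketch under assumptions: the class name `BrowserLikeFetcher`, the method names, and the 5-second timeouts are hypothetical choices, and whether a given site accepts this User-Agent is not guaranteed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original post.
public class BrowserLikeFetcher {
    // User-Agent value reused from the listing above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763";

    /** Builds a GET connection with browser-like headers; no network traffic happens yet. */
    public static HttpURLConnection openAsBrowser(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setConnectTimeout(5000); // assumed value, in milliseconds
        conn.setReadTimeout(5000);    // assumed value, in milliseconds
        return conn;
    }

    /** Sends the request and returns the body as text; requires network access. */
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn = openAsBrowser(address);
        try {
            int status = conn.getResponseCode(); // this call actually connects
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + address);
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Only inspects the configured request; call fetch(...) to download for real.
        HttpURLConnection conn = openAsBrowser("https://www.jd.com");
        System.out.println(conn.getRequestMethod());
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

Checking the status code first means a block page or redirect surfaces as a clear error message; without the check, `getInputStream()` still throws on 4xx/5xx responses, but with a less descriptive exception.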
Source: http://www.bubuko.com/infodetail-3165610.html