Spark 教程 --(4)Spark-shell 调用 SQLContext(HiveContext)

启动 Spark-shell:

[[email protected] ~]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, App id = application_1554951897984_0111).
SQL context available as sqlContext.
scala> sc
res0: org.apache.spark.SparkContext = [email protected]
scala> sqlContext
res1: org.apache.spark.sql.SQLContext = [email protected]

上下文已经包含 sc 和 sqlContext:

Spark context available as sc (master = yarn-client, App id = application_1554951897984_0111).
SQL context available as sqlContext.

本地创建 people07041119.JSON

{
	"name":"zhangsan","job number":"101","age":33,"gender":"male","deptno":1,"sal":18000	
}
{
	"name":"lisi","job number":"102","age":30,"gender":"male","deptno":2,"sal":20000	
}
{
	"name":"wangwu","job number":"103","age":35,"gender":"female","deptno":3,"sal":50000	
}
{
	"name":"zhaoliu","job number":"104","age":31,"gender":"male","deptno":1,"sal":28000	
}
{
	"name":"tianqi","job number":"105","age":36,"gender":"female","deptno":3,"sal":90000	
}

本地创建 dept.JSON

{
	"name":"development","deptno":1	
}
{
	"name":"personnel","deptno":2	
}
{
	"name":"testing","deptno":3	
}

将本地文件上传到 HDFS 上:

bash-4.2$ hadoop dfs -put /home/**/data/people07041119.JSON /user/**

bash-4.2$ hadoop dfs -put /home/**/data/dept.JSON /user/**

结果如下:

执行 Scala 脚本, 加载文件:

scala> val people=sqlContext.jsonFile("/user/**/people07041119.json")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
people: org.apache.spark.sql.DataFrame = [age: bigint, deptno: bigint, gender: string, job number: string, name: string, sal: bigint]
scala> val dept=sqlContext.jsonFile("/user/**/dept.json")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
people: org.apache.spark.sql.DataFrame = [deptno: bigint, name: string]

执行 Scala 脚本, 查看文件内容:

scala> people.show
+---+------+------+----------+--------+-----+
|age|deptno|gender|job number|    name|  sal|
+---+------+------+----------+--------+-----+
| 33|     1|  male|       101|zhangsan|18000|
| 30|     2|  male|       102|    lisi|20000|
| 35|     3|female|       103|  wangwu|50000|
| 31|     1|  male|       104| zhaoliu|28000|
| 36|     3|female|       105|  tianqi|90000|
+---+------+------+----------+--------+-----+

显示前三条记录:

scala> people.show(3)
+---+------+------+----------+--------+-----+
|age|deptno|gender|job number|    name|  sal|
+---+------+------+----------+--------+-----+
| 33|     1|  male|       101|zhangsan|18000|
| 30|     2|  male|       102|    lisi|20000|
| 35|     3|female|       103|  wangwu|50000|
+---+------+------+----------+--------+-----+
only showing top 3 rows

查看列信息:

scala>  people.columns
res5: Array[String] = Array(age, deptno, gender, job number, name, sal)

添加过滤条件:

scala>  people.filter("gender='male'").count
res6: Long = 3

参考:

来源: http://www.bubuko.com/infodetail-3112908.html

与本文相关文章

暂无,快来抢沙发吧！