DML:Data Manipulation Language(数据操作语言, 与关系型数据库相似)
官方手册:
一, 加载数据到 hive 表中
1, 从文件系统加载数据入库
- LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1,
- partcol2=val2 ...)]
- LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1,
- partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
说明:
[local]: 如果加上表示本地地址, 如果没有表示 HDFS 上的地址.
[overwrite]: 如果加上表示覆盖之前的数据, 如果没有表示追加之前的数据.
[partition_sepc]: 如果加上表示加载进相应的分区.
2, 从其他表中装载
- INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
- select_statement1 FROM from_statement;
- INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
- FROM from_statement;
- (其中插入的字段要和查询的字段要保持一致)
二, 导出 hive 表中数据到文件系统
- INSERT OVERWRITE [LOCAL] DIRECTORY directory1
- [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
- SELECT ... FROM ...
注意: 输出到 HDFS, 有些 CDH 版本不支持;
输出到本地文件系统和 HDFS, 取决于 LOCAL 关键字
也可以用 Linux 重定向导出
- hive -e "select * from emp limit 5"> file
- (数据的导入导出常用 sqoop)
三, 基本 sql
- hive>select * from emp;
- hive>select * from emp where ename='...';hive>select * from emp where deptno=10;
- hive>select * from emp where sal between 800 and 1500;
- hive>select * from emp where ename in ('SMITH','WARD');
- hive>select count(1),max(sal),min(sal),avg(sal) from emp;// 聚合函数 - max,min,count,sum,avg
- hive>select deptno,avg(sal) from emp group by deptno;// 分组函数 - group by
- hive>select ename, sal,
- case
- when sal> 1 and sal <= 1000 then 'LOWER'
- when sal> 1000 and sal <= 2000 then 'MIDDLE'
- when sal> 2000 and sal <= 4000 then 'HIGH'
- ELSE 'HIGHEST' end
- from emp; //case when
四, export/import https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport
- EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
- TO 'export_target_path' [ FOR replication('eventid') ]
- IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
- FROM 'source_path'
- [LOCATION 'import_target_path']
- Replication usage
来源: http://www.bubuko.com/infodetail-3016356.html