Developing Hadoop programs on Windows: after some experimentation, the Dajiangtai instructor has summarized how to develop Hadoop program code on Windows with Eclipse.
Select the Map/Reduce perspective and click OK. At the bottom right you will see a Map/Reduce Locations tab; click it, then click the small elephant icon on the right to open the Hadoop Location configuration window. Enter any Location Name you like, then configure Map/Reduce Master and DFS Master: set Host and Port to match the settings in core-site.xml.
Check the core-site.xml configuration:
<property>
  <name>fs.default.name</name>
  <value>hdfs://name01:9000</value>
</property>
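Before wiring these values into Eclipse, it can help to confirm that the NameNode address is actually reachable from the Windows machine. The following is a minimal sketch, assuming the hdfs://name01:9000 address above and the standard Hadoop FileSystem API (the class name is only for illustration); it simply lists the HDFS root directory:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckDfsConnection {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same URI as fs.default.name in core-site.xml (assumption: reachable from this machine)
        FileSystem fs = FileSystem.get(URI.create("hdfs://name01:9000"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}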
Click the "Finish" button to close the window. Click DFS Locations -> myhadoop (the location name configured in the previous step) on the left. If you can see the user folder, the installation succeeded, but opening it shows an error: Error: Permission denied: user=root, access=READ_EXECUTE, inode="/tmp"; hadoop:supergroup:drwx------.
This is a permissions problem: make the hadoop user the owner of all Hadoop-related folders under /tmp/ and grant them 777 permissions.
cd /tmp/
chmod 777 /tmp/
chown -R hadoop.hadoop /tmp/hsperfdata_root
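Note that inode="/tmp" in the error message is the HDFS form of a permission error, so the equivalent fix can also be applied through the HDFS Java API. This is a sketch under that assumption (it must run as a user allowed to change ownership, and the URI comes from the core-site.xml above):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class FixTmpPermissions {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://name01:9000"), conf);
        // Open up /tmp (equivalent to chmod 777) and hand it to the hadoop user
        fs.setPermission(new Path("/tmp"), new FsPermission((short) 0777));
        fs.setOwner(new Path("/tmp"), "hadoop", "hadoop");
        fs.close();
    }
}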
After reconnecting and reopening DFS Locations, everything displays normally.
(Note: the Map/Reduce Master here is the Map/Reduce address of the Hadoop cluster and should match the mapred.job.tracker setting in mapred-site.xml.)
(1) Clicking it reports an error:
An internal error occurred during: "Connecting to DFS hadoopname01".
java.net.UnknownHostException:name01
Simply enter the IP address 192.168.52.128 in the hostname field instead, and the connection opens normally.
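The UnknownHostException simply means Windows cannot resolve the name01 hostname. A quick check, using only the plain JDK (class name is illustrative), is to try resolving it yourself; if this throws, either use the IP as above or add a name01 entry to the Windows hosts file:

import java.net.InetAddress;

public class ResolveNameNode {
    public static void main(String[] args) throws Exception {
        // Throws java.net.UnknownHostException if name01 is not resolvable,
        // which is exactly the error Eclipse reported
        System.out.println(InetAddress.getByName("name01"));
    }
}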
5. Create the WordCount project
File -> Project, select Map/Reduce Project, and enter WordCount as the project name.
Create a new class named WordCount in the WordCount project. An error appears: Invalid Hadoop Runtime specified; please click 'Configure Hadoop install directory' or fill in library location input field. The cause is a bad directory choice: the Hadoop install directory cannot be the root-level directory E:\hadoop; changing it to E:\u\hadoop\ fixes it.
Click Next through the remaining steps and click the Finish button to complete the project creation. The Eclipse console then prints the following:
14-12-9 04:03:10 PM: Eclipse is running in a JRE, but a JDK is required
Some Maven plugins may not work when importing projects or updating source folders.
14-12-9 04:03:13 PM: Refreshing [/WordCount/pom.xml]
14-12-9 04:03:14 PM: Refreshing [/WordCount/pom.xml]
14-12-9 04:03:14 PM: Refreshing [/WordCount/pom.xml]
14-12-9 04:03:14 PM: Updating index central|http://repo1.maven.org/maven2
14-12-9 04:04:10 PM: Updated index for central|http://repo1.maven.org/maven2
6. Import the lib JARs:
The Hadoop JARs that need to be added are:
all JARs under /hadoop-2.3.0/share/hadoop/common, plus all JARs in its lib subdirectory;
all JARs under /hadoop-2.3.0/share/hadoop/hdfs, excluding those in its lib subdirectory;
all JARs under /hadoop-2.3.0/share/hadoop/mapreduce, excluding those in its lib subdirectory;
all JARs under /hadoop-2.3.0/share/hadoop/yarn, excluding those in its lib subdirectory;
about 18 JARs in total.
7. The code needed to submit a MapReduce job directly from Eclipse is shown below:
package wc;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class W2 {

    // Mapper: tokenize each input line and emit (word, 1) pairs
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sum the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The original listing is truncated at "System.setProperty(". When
        // submitting from Windows, a common choice is to set the HDFS user,
        // e.g. (assumption): System.setProperty("HADOOP_USER_NAME", "hadoop");

        // The driver below is the standard WordCount setup, reconstructed
        // from the imports above.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(W2.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
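To run this from Eclipse, create a Run Configuration for the W2 class and pass two program arguments: the HDFS input and output paths. Assuming the hdfs://name01:9000 NameNode and the directories used in step 8 below, a plausible argument pair is:

hdfs://name01:9000/data/input hdfs://name01:9000/data/output

Note that the output directory must not exist before the job starts; FileOutputFormat fails fast if it does.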
8. Run
8.1. Create the input directory on HDFS
[hadoop@name01 hadoop-2.3.0]$ hadoop fs -ls /
[hadoop@name01 hadoop-2.3.0]$ hadoop fs -mkdir input
mkdir: `input': No such file or directory
[hadoop@name01 hadoop-2.3.0]$
PS: fs needs the full directory path to create a folder.
If the Apache Hadoop version is 0.x or 1.x:
bin/hadoop fs -mkdir /in
bin/hadoop fs -put /home/du/input /in
If the Apache Hadoop version is 2.x:
bin/hdfs dfs -mkdir -p /in
bin/hdfs dfs -put /home/du/input /in
If you are running a Hadoop distribution such as Cloudera CDH, IBM BI, or Hortonworks HDP, the first form of the command works as well. Note that directories must be created with their full path, and that the HDFS root directory is /.
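The same directory can also be created from Java, which is convenient when driving everything from Eclipse. A minimal sketch, again assuming the hdfs://name01:9000 NameNode and the /data/input path used below (class name is illustrative):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MakeInputDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://name01:9000"), conf);
        // mkdirs creates missing parent directories, like "hdfs dfs -mkdir -p"
        fs.mkdirs(new Path("/data/input"));
        fs.close();
    }
}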
8.2. Copy the local README.txt into the HDFS input directory
[hadoop@name01 hadoop-2.3.0]$ find . -name README.txt
./share/doc/hadoop/common/README.txt
[hadoop@name01 ~]$ hadoop fs -copyFromLocal ./src/hadoop-2.3.0/share/doc/hadoop/common/README.txt /data/input
[hadoop@name01 ~]$
[hadoop@name01 ~]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2014-12-15 23:34 /data
-rw-r--r--   3 hadoop supergroup         88 2014-08-26 02:21 /input
You have new mail in /var/spool/mail/root
[hadoop@name01 ~]$
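The copy can likewise be done through the FileSystem API. A hedged sketch, assuming the README.txt path from the transcript above exists on the machine running the code:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyReadme {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://name01:9000"), conf);
        // Equivalent of "hadoop fs -copyFromLocal <local> <hdfs>"
        fs.copyFromLocalFile(
                new Path("./src/hadoop-2.3.0/share/doc/hadoop/common/README.txt"),
                new Path("/data/input"));
        fs.close();
    }
}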
8.3. After the Hadoop job finishes, view the output
(1) View it directly on the Hadoop server:
[hadoop@name01 ~]$ hadoop fs -ls /data/
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2014-12-15 23:29 /data/input
drwxr-xr-x   - hadoop supergroup          0 2014-12-15 23:34 /data/output
[hadoop@name01 ~]$
(2) View it in Eclipse.
(3) View the messages on the console. A programmatic way to read the result is sketched below.
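As an alternative to the three options above, the job's result can be read back over HDFS from Java. A minimal sketch, assuming the job wrote a single reducer's output to the conventional part-r-00000 file under /data/output:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintWordCounts {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://name01:9000"), conf);
        // Reducer output lands in part-r-00000 by default (assumption: one reducer)
        Path result = new Path("/data/output/part-r-00000");
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(fs.open(result)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line: word <TAB> count
            }
        }
        fs.close();
    }
}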
End.
Source: http://www.36dsj.com/archives/97236