Running the WordCount Program on Hadoop
Published: 2019-05-25


1. The classic WordCount program (WordCount.java)

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    /**
     * A reducer class that just emits the sum of the input values.
     */
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    static int printUsage() {
        System.out.println("wordcount [-m <maps>] [-r <reduces>] <input> <output>");
        ToolRunner.printGenericCommandUsage(System.out);
        return -1;
    }

    /**
     * The main driver for the word count map/reduce program. Invoke this
     * method to submit the map/reduce job.
     *
     * @throws IOException
     *             When there are communication problems with the job tracker.
     */
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), WordCount.class);
        conf.setJobName("wordcount");

        // the keys are words (strings)
        conf.setOutputKeyClass(Text.class);
        // the values are counts (ints)
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(MapClass.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        List<String> other_args = new ArrayList<String>();
        for (int i = 0; i < args.length; ++i) {
            try {
                if ("-m".equals(args[i])) {
                    conf.setNumMapTasks(Integer.parseInt(args[++i]));
                } else if ("-r".equals(args[i])) {
                    conf.setNumReduceTasks(Integer.parseInt(args[++i]));
                } else {
                    other_args.add(args[i]);
                }
            } catch (NumberFormatException except) {
                System.out.println("ERROR: Integer expected instead of " + args[i]);
                return printUsage();
            } catch (ArrayIndexOutOfBoundsException except) {
                System.out.println("ERROR: Required parameter missing from " + args[i - 1]);
                return printUsage();
            }
        }
        // Make sure there are exactly 2 parameters left.
        if (other_args.size() != 2) {
            System.out.println("ERROR: Wrong number of parameters: "
                    + other_args.size() + " instead of 2.");
            return printUsage();
        }
        FileInputFormat.setInputPaths(conf, other_args.get(0));
        FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}
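The listing above targets the old org.apache.hadoop.mapred API that shipped with 0.19.x. For comparison, here is a minimal sketch of the same mapper written against the newer org.apache.hadoop.mapreduce API (available from 0.20 onward); the class name TokenCounterMapper is my own choice, and this is not a drop-in replacement for the whole driver:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Same tokenize-and-emit logic as MapClass above, written against the
// newer Mapper base class; output is emitted through the Context object.
public class TokenCounterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}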

2. Make sure your Hadoop cluster is set up and running; a single-node installation is fine. Create a directory, e.g. /home/admin/WordCount, and compile WordCount.java:

javac -classpath /home/admin/hadoop/hadoop-0.19.1-core.jar WordCount.java -d /home/admin/WordCount
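The command above matches the Hadoop 0.19.1 layout used in this post; on newer releases the core jar is split into several differently named jars, so the hard-coded path will not exist. A more portable variant, assuming the hadoop command is on your PATH (hadoop classpath prints the client classpath on recent releases), is:

javac -classpath "$(hadoop classpath)" -d /home/admin/WordCount WordCount.java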

3. After compiling, you will find three class files in /home/admin/WordCount: WordCount.class, WordCount$MapClass.class, and WordCount$Reduce.class.
cd into /home/admin/WordCount, then run:

jar cvf WordCount.jar *.class

This produces the WordCount.jar file.
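You can list the archive's contents with jar tf to confirm the three class files made it in:

jar tf WordCount.jar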

4. Prepare some input data.
The files input1.txt and input2.txt contain a few words, as follows:

[admin@host WordCount]$ cat input1.txt
Hello, i love china
are you ok?

[admin@host WordCount]$ cat input2.txt
hello, i love word
You are ok

Create the input directory on HDFS and put the input files the job needs there. Do not pre-create the output directory: Hadoop refuses to overwrite existing results, so /tmp/output must not exist when the job starts (see problem 2 below).

hadoop fs -mkdir /tmp/input
hadoop fs -put input1.txt /tmp/input/
hadoop fs -put input2.txt /tmp/input/
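Before launching the job it is worth confirming the files actually landed on HDFS; -ls and -cat behave like their Unix counterparts:

hadoop fs -ls /tmp/input/
hadoop fs -cat /tmp/input/input1.txt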
5. Run the program. It prints some information about the running job:

[admin@host WordCount]$ hadoop jar WordCount.jar WordCount /tmp/input /tmp/output
10/09/16 22:49:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/09/16 22:49:43 INFO mapred.FileInputFormat: Total input paths to process : 2
10/09/16 22:49:43 INFO mapred.JobClient: Running job: job_201008171228_76165
10/09/16 22:49:44 INFO mapred.JobClient: map 0% reduce 0%
10/09/16 22:49:47 INFO mapred.JobClient: map 100% reduce 0%
10/09/16 22:49:54 INFO mapred.JobClient: map 100% reduce 100%
10/09/16 22:49:55 INFO mapred.JobClient: Job complete: job_201008171228_76165
10/09/16 22:49:55 INFO mapred.JobClient: Counters: 16
10/09/16 22:49:55 INFO mapred.JobClient:   File Systems
10/09/16 22:49:55 INFO mapred.JobClient:     HDFS bytes read=62
10/09/16 22:49:55 INFO mapred.JobClient:     HDFS bytes written=73
10/09/16 22:49:55 INFO mapred.JobClient:     Local bytes read=152
10/09/16 22:49:55 INFO mapred.JobClient:     Local bytes written=366
10/09/16 22:49:55 INFO mapred.JobClient:   Job Counters
10/09/16 22:49:55 INFO mapred.JobClient:     Launched reduce tasks=1
10/09/16 22:49:55 INFO mapred.JobClient:     Rack-local map tasks=2
10/09/16 22:49:55 INFO mapred.JobClient:     Launched map tasks=2
10/09/16 22:49:55 INFO mapred.JobClient:   Map-Reduce Framework
10/09/16 22:49:55 INFO mapred.JobClient:     Reduce input groups=11
10/09/16 22:49:55 INFO mapred.JobClient:     Combine output records=14
10/09/16 22:49:55 INFO mapred.JobClient:     Map input records=4
10/09/16 22:49:55 INFO mapred.JobClient:     Reduce output records=11
10/09/16 22:49:55 INFO mapred.JobClient:     Map output bytes=118
10/09/16 22:49:55 INFO mapred.JobClient:     Map input bytes=62
10/09/16 22:49:55 INFO mapred.JobClient:     Combine input records=14
10/09/16 22:49:55 INFO mapred.JobClient:     Map output records=14
10/09/16 22:49:55 INFO mapred.JobClient:     Reduce input records=14

6. Check the results:

[admin@host WordCount]$ hadoop fs -ls /tmp/output/
Found 2 items
drwxr-x---   - admin admin          0 2010-09-16 22:43 /tmp/output/_logs
-rw-r-----   1 admin admin        102 2010-09-16 22:44 /tmp/output/part-00000

[admin@host WordCount]$ hadoop fs -cat /tmp/output/part-00000
Hello,  1
You     1
are     2
china   1
hello,  1
i       2
love    2
ok      1
ok?     1
word    1
you     1
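If you would rather have the result on the local filesystem than read it off HDFS, -get copies a file down, and -getmerge concatenates every part-* file in the output directory into one local file (the local file name here is just an example):

hadoop fs -get /tmp/output/part-00000 ./wordcount-result.txt
hadoop fs -getmerge /tmp/output ./wordcount-result.txt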
Problems you may run into

1. java.io.FileNotFoundException

This exception came from a mistake in the directory paths: on re-checking I found I had written /opt/hadoop/tmp/inout when the job actually expected /tmp/input.

2. org.apache.hadoop.mapred.FileAlreadyExistsException

This one follows from the previous problem. Because Hadoop runs resource-intensive computations, it refuses by default to overwrite existing results, so the output directory must not exist before the job starts. Delete it first:

/opt/hadoop/bin/hadoop fs -rmr /tmp/output
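On recent Hadoop releases -rmr is deprecated; assuming a 2.x or later client, the equivalent is:

hadoop fs -rm -r /tmp/output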

3. ERROR namenode.NameNode: java.io.IOException: Cannot create directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name/current

This happens because the hadoop-datastore directory lacks the permissions the Hadoop user needs.
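A typical fix, sketched here under the assumption that Hadoop runs as a dedicated hadoop user (the user, group, and path below are assumptions; match them to your installation), is to hand the data directory over to that user:

# user/group and path are assumptions, adjust to your setup
sudo chown -R hadoop:hadoop /usr/local/hadoop-datastore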

Reprinted from: http://yhwvi.baihongyu.com/
