Example 1: Counting the words in a text file
[hadoop@mylinux ~]$ cd hadoop-0.20.2/
[hadoop@mylinux hadoop-0.20.2]$ mkdir input    # create a directory named input on the local file system
[hadoop@mylinux hadoop-0.20.2]$ cd input    # enter the input directory
[hadoop@mylinux input]$ vi file01    # create a new file, file01, with the following content
hello hadoop this is first examples by huxin hadoop
[hadoop@mylinux input]$ cd
[hadoop@mylinux ~]$ cd hadoop-0.20.2/
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -mkdir input    # create a directory named input in the Hadoop file system; do not confuse it with the local input directory above
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -put ./input/file01 input    # copy the local file01 into the Hadoop file system directory
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -ls input
Found 1 items
-rw-r--r-- 1 hadoop supergroup 54 2011-06-22 19:25 /user/hadoop/input/file01
[hadoop@mylinux hadoop-0.20.2]$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output    # count the words in the Hadoop input directory; the job creates the output directory automatically when it finishes
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -ls output
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2011-06-22 19:26 /user/hadoop/output/_logs
-rw-r--r-- 1 hadoop supergroup 61 2011-06-22 19:26 /user/hadoop/output/part-r-00000
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -cat output/part-r-00000    # view part-r-00000
by 1
examples 1
first 1
hadoop 2
hello 1
huxin 1
is 1
this 1
----------------------- Done!
The WordCount source code is located at src/examples/org/apache/hadoop/examples/WordCount.java.
Now let's compile and run this source code by hand:
[hadoop@mylinux ~]$ cd hadoop-0.20.2/
[hadoop@mylinux hadoop-0.20.2]$ mkdir playground
[hadoop@mylinux hadoop-0.20.2]$ mkdir playground/src
[hadoop@mylinux hadoop-0.20.2]$ mkdir playground/classes
[hadoop@mylinux hadoop-0.20.2]$ cp src/examples/org/apache/hadoop/examples/WordCount.java playground/src/WordCount.java
[hadoop@mylinux hadoop-0.20.2]$ javac -classpath hadoop-0.20.2-core.jar:lib/commons-cli-1.2.jar -d playground/classes/ playground/src/WordCount.java
[hadoop@mylinux hadoop-0.20.2]$ jar -cvf playground/wordcount.jar -C playground/classes/ .
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -rmr output    # remember to delete the output directory created earlier first; the job will not run if it already exists
Deleted hdfs://master:9000/user/hadoop/output
[hadoop@mylinux hadoop-0.20.2]$ hadoop jar playground/wordcount.jar org.apache.hadoop.examples.WordCount input output
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -ls output
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2011-06-22 19:55 /user/hadoop/output/_logs
-rw-r--r-- 1 hadoop supergroup 61 2011-06-22 19:55 /user/hadoop/output/part-r-00000
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -cat output/part-r-00000    # view part-r-00000
by 1
examples 1
first 1
hadoop 2
hello 1
huxin 1
is 1
this 1
----- done
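To see what the job computes without a cluster, here is a minimal plain-Java sketch of the same counting logic. It is not the Hadoop WordCount.java source and uses no Hadoop classes. The class name LocalWordCount and the method count are hypothetical names chosen for illustration, and the code assumes Java 8 or later. Like the Hadoop example, it splits text on whitespace with StringTokenizer. A TreeMap stands in for the sorted keys that the reduce phase produces.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the word count job (no Hadoop dependency).
class LocalWordCount {

    // Counts whitespace-separated tokens; TreeMap keeps the words in sorted order.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            // Rough analogue of map emitting (word, 1) and reduce summing the ones.
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same words as file01 in the example above.
        String text = "hello hadoop this is first examples by huxin hadoop";
        for (Map.Entry<String, Integer> entry : count(text).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Compiling this with javac and running it with java LocalWordCount should print the same word/count pairs as part-r-00000, which makes it a quick way to check what to expect from a small input before submitting the real job.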