MapReduce in IDEA
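In development, waiting until the environment is fully set up before testing, computing statistics, or processing data wastes a lot of time. So the work usually proceeds on two fronts: one group sets up the environment on the client machines, while the other sets up a virtual (local) Hadoop environment in IDEA and writes and tests the functional code in that virtual environment.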
Implementing Classic MapReduce Cases with the Java API
【Case 1: Data Deduplication】
(1) Configure the Hadoop environment variables on Windows
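Step 1: Extract the Hadoop installation package to a location of your choice (in this example: C:\Program Files).
Step 2: Create a new system environment variable HADOOP_HOME that points to the Hadoop installation directory.
Step 3: Edit the system environment variable Path and add %HADOOP_HOME%\bin.
Step 4: Add the Windows dependency files: place the 3 files winutils.exe, winutils.pdb, and hadoop.dll in the bin directory under the Hadoop installation path (Hadoop looks for winutils.exe in %HADOOP_HOME%\bin).
Notes:
1. Restart the computer so that the configuration above takes effect (sometimes it works without a restart).
2. If the Hadoop version cannot be found from the command prompt (cmd), this does not affect the programming that follows.
(2) Configure Maven
Step 1: Place the Maven folder apache-maven-3.6.0 in the root directory of drive D:.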
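Step 2: Create a new Maven project in IDEA and configure IDEA's Maven settings for it.
(3) Edit pom.xml and add the Maven dependencies:

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
</dependencies>
```

(4) Implement the Map stage: write DedupMapper.java (textbook p. 116)

```java
package com.xyzy;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class DedupMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    // Reused output key object (the usual Hadoop pattern instead of allocating one per record)
    private final Text field = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key = byte offset of the line, value = the line itself.
        // Emit the whole line as the key: identical lines become identical keys,
        // which the shuffle groups together for the reducer.
        field.set(value);
        context.write(field, NullWritable.get());
    }
}
```

(5) Implement the Reduce stage: write DedupReducer.java (textbook p. 117)

```java
package com.xyzy;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class DedupReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        // Each distinct line arrives once as a key (its duplicates are only extra
        // entries in values), so writing the key once removes the duplicates.
        context.write(key, NullWritable.get());
    }
}
```

(6) Implement the driver class: write DedupDriver.java (textbook p. 117)

```java
package com.xyzy;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class DedupDriver {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(DedupDriver.class);
        job.setMapperClass(DedupMapper.class);
        job.setReducerClass(DedupReducer.class);

        // Mapper and reducer both emit <Text, NullWritable>
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // Local Windows paths; the output directory must not exist before the run
        FileInputFormat.setInputPaths(job, new Path("D:/testdata/input"));
        FileOutputFormat.setOutputPath(job, new Path("D:/testdata/output"));

        boolean res = job.waitForCompletion(true);
        System.exit(res ? 0 : 1);
    }
}
```

(7) Prepare the input data in d:/testdata/input beforehand. (Note: output is not a folder you create yourself; the job generates it automatically when it runs.)
(8) Result after running: the job automatically creates the output directory under d:/testdata/, and the deduplicated lines are written to the part-r-00000 file inside it.
If a result has already been produced once and you want to run the deduplication again, change the output folder name in DedupDriver, for example from output to output1. Hadoop refuses to start a job whose output directory already exists (it fails with FileAlreadyExistsException).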
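To predict what the job's output file should contain for a given input, the data flow can be mimicked in plain Java without Hadoop. This is a minimal sketch, not Hadoop's implementation: `DedupSimulation` and `dedup` are hypothetical names, the input is assumed to be the lines of all input files combined, and a `TreeMap` stands in for the shuffle's sort-and-group step. For ASCII text its key order matches the order in which Hadoop sorts `Text` keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the dedup job's map -> shuffle/sort -> reduce flow.
class DedupSimulation {

    /** Returns the distinct lines in sorted order, like the job's output file. */
    static List<String> dedup(List<String> inputLines) {
        // Shuffle/sort: group the map output <line, null> by key.
        // TreeMap keeps keys sorted, like the sorted keys a reducer receives.
        TreeMap<String, Integer> groups = new TreeMap<>();
        for (String line : inputLines) {
            // Map: the line itself becomes the key; the count stands in for
            // the Iterable<NullWritable> holding one entry per duplicate.
            groups.merge(line, 1, Integer::sum);
        }
        // Reduce: write each key once and ignore its values.
        return new ArrayList<>(groups.keySet());
    }
}
```

For example, the hypothetical input lines `b, a, b, c, a` produce `a, b, c`: each line appears once, in sorted order, no matter how many times or in which input file it occurred.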
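Renaming the output folder on every run works, but the driver could instead delete the previous output directory before submitting the job. With Hadoop's own API this is typically `outputPath.getFileSystem(conf).delete(outputPath, true)`; because this example writes to local Windows paths, the sketch below does the same with plain `java.nio.file`. `OutputCleaner.deleteRecursively` is a hypothetical helper, and it permanently removes the old results, so use it only when those results are no longer needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes a previous run's output directory so the job
// can be rerun with the same output path. The old results are lost.
class OutputCleaner {

    /** Deletes dir and everything under it; returns false if dir did not exist. */
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts children before their parent directories,
            // so every directory is already empty when it is deleted.
            toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return true;
    }
}
```

For example, calling `OutputCleaner.deleteRecursively(java.nio.file.Paths.get("D:/testdata/output"))` at the start of `DedupDriver.main`, before the job is submitted, would let the same output path be reused on every run.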