So is dfs only for dfs.webhdfs.enabled? Then when should fs be used?

Posted: 2017-07-09 02:24

hdfs dfs mkdir

Question: what does dfs.replication=2 mean?
(Last edited by 不了峰 at 09:58)
My understanding of dfs.replication=2 is that every data file you import is stored twice in total.
However, following the Definitive Guide, running fs -ls in gives:
Found 2 items
-rw-r--r--   3 grid supergroup         25 11:24 /user/grid/in/test1.txt
-rw-r--r--   3 grid supergroup         13 11:24 /user/grid/in/test2.txt
Does the 3 after the permission bits mean the data is stored in three copies? What is going on? I only have two datanodes.
-- Solved: the configuration file had been written as shown below, which is why dfs.replication=2 had no effect:
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/hadoop-0.20.2/hadoop_data</value>
  </property>
</configuration>
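Here is a minimal sketch of why the setting was lost. It uses only the JDK's XML parser. It rests on one assumption, which is my reading of Hadoop's Configuration loader and is worth checking against your version: each <property> element yields one key/value pair, taken from the last <name> and the last <value> inside it. Under that assumption the broken file defines only dfs.data.dir. dfs.replication therefore falls back to its default of 3, which is the 3 shown by fs -ls. The corrected layout puts each setting in its own <property> and is held in the FIXED string. ReplicationConfigCheck is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ReplicationConfigCheck {
    // The layout from the post: two name/value pairs squeezed into ONE <property>.
    static final String BROKEN = "<configuration><property>"
            + "<name>dfs.replication</name><value>2</value>"
            + "<name>dfs.data.dir</name><value>/opt/hadoop/hadoop-0.20.2/hadoop_data</value>"
            + "</property></configuration>";

    // The corrected layout: one <property> element per setting.
    static final String FIXED = "<configuration>"
            + "<property><name>dfs.replication</name><value>2</value></property>"
            + "<property><name>dfs.data.dir</name><value>/opt/hadoop/hadoop-0.20.2/hadoop_data</value></property>"
            + "</configuration>";

    /** Simplified loader (assumption): inside each <property>, the last <name> and last <value> win. */
    static Map<String, String> load(String xml) {
        Map<String, String> props = new HashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                String name = null, value = null;
                NodeList fields = properties.item(i).getChildNodes();
                for (int j = 0; j < fields.getLength(); j++) {
                    Node node = fields.item(j);
                    if (!(node instanceof Element)) continue;
                    Element field = (Element) node;
                    if ("name".equals(field.getTagName())) name = field.getTextContent().trim();
                    if ("value".equals(field.getTagName())) value = field.getTextContent().trim();
                }
                if (name != null && value != null) props.put(name, value);
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse configuration XML", e);
        }
        return props;
    }

    /** Effective replication: the configured value, or the HDFS default of 3 when it is missing. */
    static int replication(Map<String, String> props) {
        return Integer.parseInt(props.getOrDefault("dfs.replication", "3"));
    }

    public static void main(String[] args) {
        System.out.println("broken layout -> replication " + replication(load(BROKEN)));
        System.out.println("fixed layout  -> replication " + replication(load(FIXED)));
    }
}
```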
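dfs.replication=2 sets the number of redundant copies. This is HDFS's fault-tolerance mechanism: it keeps data from being lost when a disk is damaged.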
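I would also like to know whether this is related to the configured dfs.data.dir. Where are the redundant copies stored? I will probably figure it out eventually, but it keeps bothering me.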
Replying to 嘉瑜猫:
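These files are stored on each datanode under the directory given by dfs.data.dir, for example /home/grid/hadoop-0.20.2/data. By default a file is split into blocks of at most 64 MB. Suppose you upload a 4 GB local file. On the datanodes, the block files under /home/grid/hadoop-0.20.2/data/current (entries like -rw-r--r--. 1 grid grid Apr 29 09:04 blk_2348469) then add up to 4 GB × the replication factor. These block files are not necessarily spread evenly across the datanodes. This happens, for instance, when there are more datanodes than replicas or when the datanodes have different capacities. Note, however, that with exactly two datanodes and a replication factor of 2, every block must have one replica on each node, so each datanode does end up holding the full 4 GB.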
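The storage arithmetic in this reply can be checked with a few lines of plain Java. This sketch assumes the 64 MB default block size mentioned above and counts block data only. Each blk_ file on disk also has a small .meta checksum file beside it, which is ignored here. BlockMath is a made-up name.

```java
class BlockMath {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;

    /** Number of blocks a file is split into; the last block may be only partly full. */
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** Bytes of block data across all datanodes: every byte is stored `replication` times. */
    static long rawBytes(long fileSize, int replication) {
        return fileSize * replication;
    }

    public static void main(String[] args) {
        long file = 4 * GB;
        long block = 64 * MB;      // default block size in Hadoop 0.20
        int replication = 2;       // dfs.replication = 2
        long blocks = blockCount(file, block);
        System.out.println(blocks + " blocks, " + blocks * replication + " block replicas");
        System.out.println(rawBytes(file, replication) / GB + " GB of block files in total");
    }
}
```

For the 4 GB example with a replication factor of 2, it prints 64 blocks, 128 block replicas, and 8 GB of block files in total. Only the last block of a file can be smaller than the block size, and it occupies only its actual size on disk. That is why the total is simply the file size times the replication factor.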
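When you start out with only two datanodes, it seems this parameter can simply be set to 1. It controls how many redundant copies are kept and is presumably used by the master.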
Replying to bluerchow:
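dfs.replication is dynamic: every time you put a file, you can specify its replication factor. The value is recorded for each file when it is created, so the setting in hdfs-site.xml is only the default.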
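Because replication is a per-file property, this also explains the original question: a listing that shows 3 even though there are only two datanodes. The toy model below is an illustration, not HDFS code. It rests on two points you may want to confirm for your version. First, the number shown by fs -ls is the replication factor recorded for the file. Second, HDFS never puts two replicas of the same block on one datanode. ReplicationModel and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model, not HDFS code: each file records its own target replication when it is created. */
class ReplicationModel {
    private final short defaultReplication;   // stands in for dfs.replication
    private final int liveDatanodes;          // hypothetical cluster size
    private final Map<String, Short> target = new LinkedHashMap<>();

    ReplicationModel(short defaultReplication, int liveDatanodes) {
        this.defaultReplication = defaultReplication;
        this.liveDatanodes = liveDatanodes;
    }

    /** Upload using the configured default. */
    void put(String path) {
        put(path, defaultReplication);
    }

    /** Upload with an explicit per-file replication factor. */
    void put(String path, short replication) {
        target.put(path, replication);
    }

    /** What `fs -ls` shows: the recorded target, not a count of existing copies. */
    short listedReplication(String path) {
        return target.get(path);
    }

    /** A datanode holds at most one replica of a block, so copies are capped by cluster size. */
    int actualReplicas(String path) {
        return Math.min(target.get(path), liveDatanodes);
    }

    public static void main(String[] args) {
        // dfs.replication effectively 3 (the configured 2 was lost), but only 2 datanodes
        ReplicationModel cluster = new ReplicationModel((short) 3, 2);
        cluster.put("/user/grid/in/test1.txt");
        cluster.put("/user/grid/in/test2.txt", (short) 1);
        for (String p : new String[] {"/user/grid/in/test1.txt", "/user/grid/in/test2.txt"}) {
            System.out.println(p + " listed=" + cluster.listedReplication(p)
                    + " actual=" + cluster.actualReplicas(p));
        }
    }
}
```

In the first case the file is listed with 3 but only 2 copies can exist. hadoop fsck would report such blocks as under-replicated.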
replication is the number of replicas; this setting keeps 2 copies of the data.
Replying to niss:
Thanks, although it is still not entirely clear to me. It should become clear once I actually write MapReduce programs.
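Programming with the HDFS and HBase Java APIs

Part One

This part shows how to use the HDFS API to write data into HDFS from Java.

1. Imitating the hadoop fs -put and -copyFromLocal commands: copying a local file into HDFS: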
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;
public class test {
    public static void main(String[] args) throws Exception {
        String local = args[0];   // local source file
        String hdfs = args[1];    // destination path in HDFS
        InputStream in = new BufferedInputStream(new FileInputStream(local));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // progress() is called back while data is being written; print a dot each time
        OutputStream out = fs.create(new Path(hdfs), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });
        // Copy with a 4 KB buffer; true = close both streams when finished
        IOUtils.copyBytes(in, out, 4096, true);
    }
}
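The FileSystem class has a family of methods for creating files. The overloaded versions of create() let you specify whether to overwrite an existing file, the replication factor, the write buffer size, the block size, the file permissions, and so on. create() also creates any missing parent directories automatically. FileSystem also offers an append method, FSDataOutputStream append(). It requires the HDFS property dfs.support.append to be set to true; otherwise you get:
Exception in thread "main" org.apache.hadoop.ipc.RemoteException:
java.io.IOException: Append to hdfs not supported. Please refer to
dfs.support.append configuration parameter.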
In addition, a message in the Apache mailing list archive advises against using append:
In short, appends in HDFS are extremely experimental and dangerous. Most
would advise you to leave this disabled. Your best option for “append” like
behavior is to rewrite the file with new content being added at the end. Append
support was briefly introduced and then removed as a number of issues came
The message goes on to explain the reasons in detail and is worth reading.
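The mailing-list advice is to rewrite the file with the new content added at the end. That idea can be sketched as below. To keep it runnable without a cluster, it works on the local file system through java.nio. An HDFS port would follow the same steps with the FileSystem open, create, delete and rename calls; that port is an assumption and is not shown or tested here. RewriteAppend and appendByRewrite are illustrative names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class RewriteAppend {
    /** "Append" by rewriting: old bytes plus new bytes go to a temp file that replaces the original. */
    static void appendByRewrite(Path file, byte[] extra) {
        try {
            Path dir = file.toAbsolutePath().getParent();
            Path tmp = Files.createTempFile(dir, "rewrite-", ".tmp");
            try (OutputStream out = Files.newOutputStream(tmp)) {
                Files.copy(file, out);   // existing content first
                out.write(extra);        // then the new data at the end
            }
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small self-check on a throwaway local file; returns the file's final content. */
    static String demo() {
        try {
            Path f = Files.createTempFile("log-", ".txt");
            Files.write(f, "line1\n".getBytes(StandardCharsets.UTF_8));
            appendByRewrite(f, "line2\n".getBytes(StandardCharsets.UTF_8));
            String result = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
            Files.delete(f);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The cost is obvious: every "append" copies the whole file again. The approach is only reasonable for files that are small or rarely extended.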
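2. A simple putmerge program, the counterpart of hadoop fs -getmerge. The getmerge command fetches many files from HDFS and combines them into a single file on the local file system. Hadoop (at least the version used here) has no command for the reverse direction. Unfortunately, that is exactly what we often need when processing Apache logs, which are typically split into many files by date.
To process them with MapReduce, the clumsy approach is to merge them into one big file locally and then upload that file, which is inefficient. The program below uses the HDFS Java API to merge multiple files into one big file while uploading them: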
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class putMerge {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);        // target: the default (HDFS) file system
        FileSystem local = FileSystem.getLocal(conf);  // source: the local file system
        Path inputDir = new Path(args[0]);             // local directory to merge
        Path hdfsFile = new Path(args[1]);             // single output file in HDFS
        try {
            FileStatus[] inputFiles = local.listStatus(inputDir);
            FSDataOutputStream out = hdfs.create(hdfsFile);
            for (int i = 0; i < inputFiles.length; i++) {
                System.out.println(inputFiles[i].getPath().getName());
                FSDataInputStream in = local.open(inputFiles[i].getPath());
                byte buffer[] = new byte[256];
                int bytesRead = 0;
                // copy this local file into the shared HDFS output stream
                while ((bytesRead = in.read(buffer)) > 0) {
                    out.write(buffer, 0, bytesRead);
                }
                in.close();
            }
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
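For convenience we can give this program a short alias. For continuity, I packaged it as $HADOOP_HOME/bin/putMerge.jar. Following the usual alias convention, I then added a new alias to /etc/profile:
alias hputm='hadoop jar $HADOOP_HOME/bin/putMerge.jar putMerge'
The command is then used like this:
hputm <local input directory> <HDFS output file name>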
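For comparison, here is the same merge-while-copying idea on the local file system alone, using java.nio instead of the Hadoop API, so it runs without a cluster. There is one deliberate difference from putMerge, which is my choice and not the original's: files are sorted by name first. This makes date-stamped logs come out in chronological order. As far as I know, listStatus on the local file system does not guarantee any particular order. The file names such as access-2011-04-01.log are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LocalPutMerge {
    /** Concatenates the regular files of inputDir, in file-name order, into target. */
    static void merge(Path inputDir, Path target) {
        try (Stream<Path> entries = Files.list(inputDir)) {
            List<Path> files = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            try (OutputStream out = Files.newOutputStream(target)) {
                for (Path p : files) {
                    System.out.println(p.getFileName());   // same progress output as putMerge
                    Files.copy(p, out);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds two dated log files in a temp dir, merges them, and returns the merged text. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("logs");
            Files.write(dir.resolve("access-2011-04-02.log"), "day2\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("access-2011-04-01.log"), "day1\n".getBytes(StandardCharsets.UTF_8));
            Path target = Files.createTempFile("merged", ".log");   // outside dir, so it is not merged into itself
            merge(dir, target);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```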
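3. Sometimes we want to merge files that are already in HDFS and keep the result in HDFS, without first downloading them to the local file system. A program like the following does that and can optionally recurse into subdirectories: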
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class filesmerge {
    public static boolean isRecur = false;   // set by the -r / -R option
    // Appends every file under inputDir to out; descends into subdirectories only when isRecur is set.
    // The stream is NOT closed here, because recursive calls share it; main closes it once.
    public static void merge(Path inputDir, Path hdfsFile, FileSystem hdfs, FSDataOutputStream out) {
        try {
            FileStatus[] inputFiles = hdfs.listStatus(inputDir);
            for (int i = 0; i < inputFiles.length; i++) {
                if (!hdfs.isFile(inputFiles[i].getPath())) {
                    if (isRecur) {
                        merge(inputFiles[i].getPath(), hdfsFile, hdfs, out);
                    } else {
                        System.out.println(inputFiles[i].getPath().getName()
                                + " is not a file and recursion is not allowed, skip!");
                    }
                } else {
                    System.out.println(inputFiles[i].getPath().getName());
                    FSDataInputStream in = hdfs.open(inputFiles[i].getPath());
                    byte buffer[] = new byte[256];
                    int bytesRead = 0;
                    while ((bytesRead = in.read(buffer)) > 0) {
                        out.write(buffer, 0, bytesRead);
                    }
                    in.close();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    public static void errorMessage(String str) {
        System.out.println("Error Message: " + str);
        System.exit(1);
    }
    public static void main(String[] args) throws IOException {
        if (args.length == 0)
            errorMessage("filesmerge [-r|-R] <hdfsTargetDir> <hdfsFileName>");
        if (args[0].matches("^-[rR]$")) {
            isRecur = true;
        }
        if ((isRecur && args.length != 3) || (!isRecur && args.length != 2)) {
            errorMessage("filesmerge [-r|-R] <hdfsTargetDir> <hdfsFileName>");
        }
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        Path inputDir;
        Path hdfsFile;
        if (isRecur) {
            inputDir = new Path(args[1]);
            hdfsFile = new Path(args[2]);
        } else {
            inputDir = new Path(args[0]);
            hdfsFile = new Path(args[1]);
        }
        if (!hdfs.exists(inputDir)) {
            errorMessage("hdfsTargetDir not exist!");
        }
        if (hdfs.exists(hdfsFile)) {
            errorMessage("hdfsFileName exist!");
        }
        FSDataOutputStream out = hdfs.create(hdfsFile);
        merge(inputDir, hdfsFile, hdfs, out);
        out.close();   // close once, after the whole tree has been merged
        System.exit(0);
    }
}
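P.S. Section 4 is still to come.

4. Even more unfortunately, the files we deal with are often not plain text. Storing text as-is wastes space, so most server administrators compress and archive such log files. In many cases, probably most, what we really need is a command that decompresses a large number of archives, merges them and uploads the result to HDFS. Once again we will have to build it ourselves.

Part Two

This part introduces some ways for Java programs to read data from HDFS.

1. First, imitate the hadoop fs -cat command with a small program that reads file data from HDFS: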
import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;
public class URLcat {
    static {
        // Let java.net.URL understand hdfs:// URLs (this can be done only once per JVM)
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }
    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
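Usage:
hadoop jar URLcat.jar URLcat hdfs://localhost:9000/user/hadoop/test.txt
hadoop jar URLcat.jar URLcat file:///usr/home/hadoop/test.txt
It can read files in HDFS as well as files on the local file system. It cannot read several files at once, although the program could be changed to loop over its arguments. Another drawback is that a JVM allows URL.setURLStreamHandlerFactory to be called only once. After that, no other code running in the same JVM can install its own factory. See example 2 for an improved program.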
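The once-per-JVM restriction is easy to see with the JDK alone. The sketch installs a do-nothing factory twice. The factory returns null for every protocol, which tells java.net.URL to use its built-in handlers. According to the URL.setURLStreamHandlerFactory documentation, the second call throws an Error. FactoryOnce and NO_OP are illustrative names.

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;

class FactoryOnce {
    // Handles no protocol itself: returning null makes java.net.URL fall back to its built-in handlers.
    static final URLStreamHandlerFactory NO_OP = protocol -> null;

    /** Tries to install a factory; returns false if this JVM already has one. */
    static boolean tryInstall(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error alreadySet) {   // documented behaviour when a factory was already set
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("first call:  " + tryInstall(NO_OP));
        System.out.println("second call: " + tryInstall(NO_OP));
    }
}
```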
2. An improved program for reading HDFS data:
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
public class HDFScat {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads the Hadoop config files on the classpath
        FileSystem fs = FileSystem.get(conf);       // the default file system named in that config
        InputStream in = null;
        try {
            in = fs.open(new Path(args[0]));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
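Usage:
hadoop jar HDFScat.jar HDFScat test.txt
No URI is needed here because the Configuration instance reads the configuration files when it is created, and those files name the default file system. A relative path such as test.txt is resolved against the user's home directory on that file system. For the same reason, this program cannot be pointed at another file system, for example with a file:/// argument. That would require obtaining the FileSystem with FileSystem.get(URI, Configuration) instead.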
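The way a bare test.txt becomes a full HDFS location can be illustrated with java.net.URI alone. The sketch makes two assumptions for illustration: fs.default.name is hdfs://localhost:9000 (the address used in the URLcat example), and relative paths resolve against the home directory /user/hadoop. This is not Hadoop's own resolution code. DefaultFsResolver is a made-up name.

```java
import java.net.URI;

class DefaultFsResolver {
    // Hypothetical values: fs.default.name from the config, and the user's HDFS home directory.
    static final URI DEFAULT_FS = URI.create("hdfs://localhost:9000");
    static final URI WORKING_DIR = URI.create("hdfs://localhost:9000/user/hadoop/");

    /** Relative paths are resolved against the working directory; absolute URIs come back unchanged. */
    static URI resolve(String path) {
        return WORKING_DIR.resolve(path);
    }

    /** True if the URI lives on the default file system (same scheme and host:port). */
    static boolean onDefaultFs(URI uri) {
        return DEFAULT_FS.getScheme().equals(uri.getScheme())
                && DEFAULT_FS.getAuthority().equals(uri.getAuthority());
    }

    public static void main(String[] args) {
        for (String arg : new String[] {"test.txt", "/tmp/a.txt", "file:///usr/home/hadoop/test.txt"}) {
            URI u = resolve(arg);
            System.out.println(arg + " -> " + u + (onDefaultFs(u) ? "" : "  (different file system)"));
        }
    }
}
```

The file:/// argument resolves to a URI with a different scheme. A FileSystem obtained from the default configuration would reject it for that reason.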
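3. In fact, FileSystem's open method returns an FSDataInputStream rather than a plain java.io stream. FSDataInputStream extends java.io.DataInputStream and supports random access, so data can be read from any position in the stream: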
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
public class doubleCat {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(args[0]));
            IOUtils.copyBytes(in, System.out, 4096, false);   // print the file once
            in.seek(0);                                       // jump back to the beginning
            IOUtils.copyBytes(in, System.out, 4096, false);   // and print it again
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
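The doubleCat pattern, together with the positioned read described next, can be tried on a local file with the JDK alone. RandomAccessFile.seek plays the role of FSDataInputStream.seek. FileChannel.read(ByteBuffer, long) is a positioned read: like PositionedReadable's read(position, buffer, offset, length), it leaves the current stream position where it was. This is an analogy on local files, not Hadoop code; SeekDemo and its helpers are illustrative names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SeekDemo {
    /** Reads the whole file, seeks back to 0, and reads it again, like doubleCat. */
    static String readTwice(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) raf.length()];
            raf.readFully(buf);
            String first = new String(buf, StandardCharsets.UTF_8);
            raf.seek(0);                          // rewind the stream position
            raf.readFully(buf);
            return first + new String(buf, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Positioned read: at most `length` bytes from `position`; the channel position is untouched. */
    static String readAt(Path file, long position, int length) {
        try (FileChannel ch = FileChannel.open(file)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            int n = ch.read(buf, position);       // -1 at end of file
            return n <= 0 ? "" : new String(buf.array(), 0, n, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes `content` to a fresh temp file and returns its path. */
    static Path sample(String content) {
        try {
            Path f = Files.createTempFile("seek", ".txt");
            Files.write(f, content.getBytes(StandardCharsets.UTF_8));
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path f = sample("hello hdfs\n");
        System.out.print(readTwice(f));
        System.out.println(readAt(f, 6, 4));
    }
}
```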
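Usage is the same as in the previous example. The seek method of FSDataInputStream repositions the stream. FSDataInputStream also implements the PositionedReadable interface, which provides methods such as read(position, buffer, offset, length). This read copies at most length bytes, starting at position in the file, into buffer starting at offset, without changing the stream's current position. Note that seek is a relatively expensive operation.

Part Three

This part covers creating, deleting and querying files in HDFS from Java.

1. Creating a file was covered in an earlier part; here is the code again, briefly: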
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
fs.create(new Path(hdfsPath)).close();   // create() returns an open output stream; close it (or write to it first)
create has several overloads; see the API documentation for details.
2. Creating a directory:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
fs.mkdirs(new Path(hdfsPath));
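mkdirs also has several overloads; see the API documentation. Like create above, it creates the file or directory for the given path and automatically creates any missing parent directories. If that is not what you want, check each directory level of the path first.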
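The difference between creating missing parents and checking each level first can be shown on the local file system, where java.nio offers both behaviours. Files.createDirectories matches what the text says about mkdirs. Files.createDirectory creates only the last path component. The helper names are hypothetical. On HDFS, the equivalent check would call fs.exists on the parent path; that is a suggestion, not the original author's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class MkdirsDemo {
    /** Like FileSystem.mkdirs: creates every missing parent; true if the directory exists afterwards. */
    static boolean mkdirs(Path dir) {
        try {
            Files.createDirectories(dir);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Stricter variant: refuses to create anything unless the parent directory already exists. */
    static boolean mkdirIfParentExists(Path dir) {
        Path parent = dir.toAbsolutePath().getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            return false;                       // the level above is missing, so do nothing
        }
        try {
            Files.createDirectory(dir);         // creates only the last path component
            return true;
        } catch (IOException e) {
            return false;                       // e.g. it already exists
        }
    }

    /** A fresh, empty temp directory to experiment in. */
    static Path tempRoot() {
        try {
            return Files.createTempDirectory("mkdirs");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path deep = tempRoot().resolve("a/b/c");
        System.out.println("strict: " + mkdirIfParentExists(deep));   // a/b does not exist yet
        System.out.println("mkdirs: " + mkdirs(deep));                // creates a, a/b and a/b/c
    }
}
```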
3. Checking whether a directory or file exists:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
fs.exists(new Path(hdfsPath));
4. Viewing the metadata of a file, including its length, block size, replication, modification time, owner and permissions:
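import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class getStatus {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FileStatus stat = fs.getFileStatus(new Path(args[0]));
        // access time, block size, group, length, modification time, owner, replication, permission
        System.out.println(stat.getAccessTime() + " " + stat.getBlockSize() + " " + stat.getGroup()
                + " " + stat.getLen() + " " + stat.getModificationTime() + " " + stat.getOwner()
                + " " + stat.getReplication() + " " + stat.getPermission());
    }
}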
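FileStatus has an isDir() method that tells whether the path is a directory. getFileStatus() throws a FileNotFoundException when the path does not exist, so for a simple existence check the exists() method is more convenient.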
5. Listing a directory:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
public class getPaths {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FileStatus[] statu = fs.listStatus(new Path(args[0]));
        Path[] listPaths = FileUtil.stat2Paths(statu);   // FileStatus[] -> Path[]
        for (Path p : listPaths) {
            System.out.println(p);
        }
    }
}
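The key call is FileSystem.listStatus(), which has several overloads. One of them takes a Path array and queries several paths at once. Listing the contents of subdirectories as well needs a separate recursive function; it is simple enough that it is not written out here.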
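The recursive helper that the text leaves out is short. To keep this sketch runnable without a cluster, it walks the local file system with java.nio. With the Hadoop API the loop would have the same shape, using fs.listStatus(dir) and FileStatus.isDir() in place of the DirectoryStream and Files.isDirectory calls. That port is untested, so treat it as a starting point. RecursiveList is an illustrative name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class RecursiveList {
    /** All files under dir, at any depth. */
    static List<Path> listFiles(Path dir) {
        List<Path> out = new ArrayList<>();
        collect(dir, out);
        return out;
    }

    /** Depth-first: collect files, descend into directories. */
    private static void collect(Path dir, List<Path> out) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isDirectory(p)) {
                    collect(p, out);            // the recursive call
                } else {
                    out.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds root/{a.txt, sub/b.txt, sub/deeper/c.txt} in a temp directory. */
    static Path demoTree() {
        try {
            Path root = Files.createTempDirectory("tree");
            Files.createDirectories(root.resolve("sub/deeper"));
            Files.createFile(root.resolve("a.txt"));
            Files.createFile(root.resolve("sub/b.txt"));
            Files.createFile(root.resolve("sub/deeper/c.txt"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Path p : listFiles(demoTree())) {
            System.out.println(p);
        }
    }
}
```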
6. Deleting files and directories: use FileSystem.delete(Path f, boolean recursive). A non-empty directory is deleted only when recursive is true.
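7. File patterns. Careful readers may already have noticed that none of the programs above accept wildcards such as * or [] in their arguments. FileSystem provides a globStatus() method that accepts paths containing wildcards.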
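import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
// Rejects every path whose last component (file or directory name) matches the regex
public class pathFilter implements PathFilter {
    private final String regex;
    public pathFilter(String regex) {
        this.regex = regex;
    }
    public boolean accept(Path path) {
        return !path.getName().matches(regex);
    }
}
//---------------------------------------------
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
public class regxList {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // args[0] may contain wildcards (quote it in the shell); names starting with 2007 are excluded
        FileStatus[] statu = fs.globStatus(new Path(args[0]), new pathFilter("^2007.*"));
        Path[] listPaths = FileUtil.stat2Paths(statu);
        for (Path p : listPaths) {
            System.out.println(p);
        }
    }
}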
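The glob-then-exclude combination can be tried locally with the JDK's PathMatcher. Its glob syntax (*, ?, [...], {a,b}) is similar to what globStatus accepts, but I would not assume it is identical. The paths here are plain strings, so nothing touches a real file system. The exclusion step mirrors the pathFilter class, assuming the filter is applied to the file name. GlobWithFilter and the sample paths are illustrative.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class GlobWithFilter {
    /** Keeps paths matching the glob, then drops those whose last component matches excludeRegex. */
    static List<String> select(List<String> paths, String glob, String excludeRegex) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return paths.stream()
                .filter(p -> matcher.matches(Paths.get(p)))                                  // like globStatus
                .filter(p -> !Paths.get(p).getFileName().toString().matches(excludeRegex)) // like the PathFilter
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/logs/2007-12-31.log", "/logs/2008-01-01.log", "/logs/readme.txt", "/logs/old/2008-01-02.log");
        System.out.println(select(paths, "/logs/*.log", "^2007.*"));
    }
}
```

In the glob, * does not cross a directory separator, so /logs/old/2008-01-02.log is not matched. Only /logs/2008-01-01.log survives both steps.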
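Here the PathFilter removes entries that the wildcard pattern matches but that should not be listed.

Part Four

This part shows a few simple HBase operations through the Java API: creating a table, inserting data, querying data and deleting the table. The program: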
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
public class test {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        // Create table "test" with a single column family "data"
        HBaseAdmin admin = new HBaseAdmin(config);
        HTableDescriptor htd = new HTableDescriptor("test");
        HColumnDescriptor hcd = new HColumnDescriptor("data");
        htd.addFamily(hcd);
        admin.createTable(htd);
        byte[] tablename = htd.getName();
        // Check that the table was created (works even if the cluster has other tables)
        if (!admin.tableExists(tablename)) {
            throw new IOException("Failed to create table");
        }
        // Put value1 into column data:1 of row1
        HTable table = new HTable(config, tablename);
        byte[] row1 = Bytes.toBytes("row1");
        Put p1 = new Put(row1);
        byte[] databytes = Bytes.toBytes("data");
        p1.add(databytes, Bytes.toBytes("1"), Bytes.toBytes("value1"));
        table.put(p1);
        // Read the row back
        Get g = new Get(row1);
        Result result = table.get(g);
        System.out.println("Get: " + result);
        // Scan the whole table
        Scan scan = new Scan();
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result scannerResult : scanner) {
                System.out.println("Scan: " + scannerResult);
            }
        } finally {
            scanner.close();
        }
        table.close();
        // A table must be disabled before it can be deleted
        admin.disableTable(tablename);
        admin.deleteTable(tablename);
    }
}
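The program first creates a Configuration instance. This class reads the HBase settings from the hbase-site.xml and hbase-default.xml files on the program's classpath. (When I read this, I assumed I would have to copy these two files onto the classpath myself. Testing showed this is completely unnecessary when the program is run with the hbase classname command.) The Configuration instance is then used to create the HBaseAdmin and HTable instances. HBaseAdmin administers the HBase cluster, adding and dropping tables; HTable is used to access one particular table.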
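The HBaseAdmin instance creates a table named test with a single column family, data. The check that follows tests whether the table was created successfully.
The HTable instance is then used to work with the table. A Put object places the value value1 in the column data:1 of the row row1. The column name is given in two parts: the column family (Bytes.toBytes("data")) and the qualifier (Bytes.toBytes("1")). The Put object is then inserted with the put method. Reading values back is just as simple and needs no further explanation.
A table must be disabled before it can be deleted; that is what the last two statements do.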
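To make the data model in this walkthrough concrete, here is a toy in-memory stand-in. It is not the HBase API. A table maps a row key to columns named family:qualifier, rows stay sorted by key, and deletion is refused while the table is enabled. Real HBase also keeps timestamps and multiple versions per cell and spreads data across region servers; all of that is left out. The class and method names are made up.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory picture of an HBase table, NOT the HBase client API. */
class ToyTable {
    private final String family;   // the single column family, e.g. "data"
    // row key -> ("family:qualifier" -> value); TreeMap keeps rows sorted by key
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();
    private boolean enabled = true;

    ToyTable(String family) {
        this.family = family;
    }

    /** Like a Put: store value under family:qualifier in the given row. */
    void put(String row, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(family + ":" + qualifier, value);
    }

    /** Like a Get for a single column; null if the row or column is absent. */
    String get(String row, String column) {
        NavigableMap<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    /** Like a full Scan: every row in row-key order (read-only view). */
    NavigableMap<String, NavigableMap<String, String>> scan() {
        return Collections.unmodifiableNavigableMap(rows);
    }

    void disable() {
        enabled = false;
    }

    void delete() {
        if (enabled) {
            throw new IllegalStateException("disable the table before deleting it");
        }
        rows.clear();
    }

    public static void main(String[] args) {
        ToyTable test = new ToyTable("data");
        test.put("row1", "1", "value1");
        System.out.println("Get: " + test.get("row1", "data:1"));
        System.out.println("Scan: " + test.scan());
        test.disable();
        test.delete();
    }
}
```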
