[hadoop3.x] HDFS Storage Policies and Hot, Warm, and Cold Three-Phase Data Storage (Part 6): Overview

Maynor学长, posted on 2022/03/04 19:46:46
[Abstract] Most Hadoop blog posts are still stuck at Hadoop 2.x. Based on the 黑马程序员 (itheima) Big Data Hadoop 3.x course, this series fills in the new features that 2.x does not have. Like, save, and follow so you won't get lost next time!

<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>[hadoop3.x] HDFS Storage Policies and Hot, Warm, and Cold Three-Phase Data Storage (Part 6): Overview</title>
  <link rel="stylesheet" href="https://stackedit.io/style.css" />
</head>

<body class="stackedit">
  <div class="stackedit__html"><p></p><div class="toc"><h3>文章目录</h3><ul><ul><li><a href="#_1">前言</a></li><li><a href="#_8">历史文章</a></li><li><a href="#_11___32">🍑 1.1  💃存储策略命令💃</a></li><ul><ul><li><a href="#1___34">1  💃列出存储策略💃</a></li><li><a href="#2___49">2  💃设置存储策略💃</a></li><li><a href="#3___63">3  💃取消存储策略💃</a></li><li><a href="#4___75">4  💃获取存储策略💃</a></li></ul></ul><li><a href="#21___85">🍑2.1  💃冷热温三阶段数据存储💃</a></li><ul><ul><li><a href="#1__DataNode_94">1  💃配置DataNode存储目录💃</a></li><li><a href="#2___141">2  💃配置策略💃</a></li><li><a href="#3___207">3  💃上传测试💃</a></li></ul></ul><li><a href="#_249">后记</a></li></ul></ul></div><p></p>
<h2><a id="_1"></a>前言</h2>
<blockquote>
<p>Most Hadoop blog posts are still stuck at Hadoop 2.x. Based on the 黑马程序员 (itheima) Big Data Hadoop 3.x course, this series fills in the new features that 2.x does not have. Like, save, and follow so you won't get lost next time!</p>
</blockquote>
<h2><a id="_8"></a>历史文章</h2>
<p><a href="https://manor.blog.csdn.net/article/details/120521427">[hadoop3.x系列]HDFS REST HTTP API的使用(一)WebHDFS<br>
</a></p>
<p><a href="https://manor.blog.csdn.net/article/details/120541104">[hadoop3.x系列]HDFS REST HTTP API的使用(二)HttpFS<br>
</a></p>
<p><a href="https://manor.blog.csdn.net/article/details/120559671">[hadoop3.x系列]Hadoop常用文件存储格式及BigData File Viewer工具的使用(三)</a></p>
<p><a href="https://manor.blog.csdn.net/article/details/120566839">✨[hadoop3.x]新一代的存储格式Apache Arrow(四)<br>
</a></p>
<p><a href="https://manor.blog.csdn.net/article/details/120629086">[hadoop3.x]HDFS存储类型和存储策略(五)概述<br>
</a></p>
<p><a href="https://manor.blog.csdn.net/article/details/120640406">[hadoop3.x]HDFS存储策略和冷热温三阶段数据存储(六)概述<br>
</a></p>
<p><a href="https://manor.blog.csdn.net/article/details/120654013">[hadoop3.x]HDFS中的内存存储支持(七)概述<br>
</a></p>
<h2><a id="_11___32"></a>🍑 1.1  💃存储策略命令💃</h2>
<h4><a id="1___34"></a>1  💃列出存储策略💃</h4>
<p>列出所有存储策略。</p>
<p>命令:</p>
<pre><code>[root@node1 Examples]# hdfs storagepolicies -listPolicies
Block Storage Policies:
BlockStoragePolicy{PROVIDED:1, storageTypes=[PROVIDED, DISK], creationFallbacks=[PROVIDED, DISK], replicationFallbacks=[PROVIDED, DISK]}
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
</code></pre>
<h4><a id="2___49"></a>2  💃设置存储策略💃</h4>
<p>给一个文件或目录设置存储策略</p>
<p>hdfs storagepolicies -setStoragePolicy -path <path> -policy </path></p>
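<p>For example, to mark a directory as hot data (the path here is only an illustration; it assumes the directory already exists in HDFS):</p>
<pre><code># Apply the built-in HOT policy to an example directory
hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/hot -policy HOT
</code></pre>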
<p>Parameters:</p>

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>-path &lt;path&gt;</td>
<td>Path of the target directory or file</td>
</tr>
<tr>
<td>-policy &lt;policy&gt;</td>
<td>Name of the storage policy</td>
</tr>
</tbody>
</table>
<h4><a id="3___63"></a>3  💃Unset a Storage Policy💃</h4>
<p>Unset the storage policy of a file or directory. After the unset command is executed, the storage policy of the nearest ancestor directory applies; if no ancestor has a policy, the default storage policy applies.</p>
<p>hdfs storagepolicies -unsetStoragePolicy -path &lt;path&gt;</p>
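<p>For example, to remove the policy from a directory (illustrative path), after which it falls back to the nearest ancestor's policy or, failing that, the default policy (HOT):</p>
<pre><code>hdfs storagepolicies -unsetStoragePolicy -path /data/hdfs-test/data_phase/hot
</code></pre>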
<p>Parameters:</p>

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>-path &lt;path&gt;</td>
<td>Path of the target directory or file</td>
</tr>
</tbody>
</table>
<h4><a id="4___75"></a>4  💃Get a Storage Policy💃</h4>
<p>Get the storage policy of a file or directory.</p>
<p>hdfs storagepolicies -getStoragePolicy -path &lt;path&gt;</p>
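<p>For example (illustrative path; the full output for real directories is shown in section 2.1 below):</p>
<pre><code>hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/hot
</code></pre>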

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>-path &lt;path&gt;</td>
<td>Path of the target directory or file</td>
</tr>
</tbody>
</table>
<h2><a id="21___85"></a>🍑2.1  💃Hot, Warm, and Cold Three-Phase Data Storage💃</h2>
<p>To make fuller use of storage resources, we can store data in three phases: hot, warm, and cold, using the following directories:</p>

<table>
<thead>
<tr>
<th>Directory</th>
<th>Data phase</th>
</tr>
</thead>
<tbody>
<tr>
<td>/data/hdfs-test/data_phase/hot</td>
<td>Hot data</td>
</tr>
<tr>
<td>/data/hdfs-test/data_phase/warm</td>
<td>Warm data</td>
</tr>
<tr>
<td>/data/hdfs-test/data_phase/cold</td>
<td>Cold data</td>
</tr>
</tbody>
</table>
<h4><a id="1__DataNode_94"></a>1  💃Configure DataNode Storage Directories💃</h4>
<p>To support the different data tiers, we need to configure where each storage type keeps its data in hdfs-site.xml.</p>
<p>Go to the Hadoop configuration directory and edit hdfs-site.xml:</p>
<pre><code>cd /export/server/hadoop-3.1.4/etc/hadoop
vim hdfs-site.xml
&lt;property&gt;
  &lt;name&gt;dfs.datanode.data.dir&lt;/name&gt;
  &lt;value&gt;[DISK]file:///export/server/hadoop-3.1.4/data/datanode,[ARCHIVE]file:///export/server/hadoop-3.1.4/data/archive&lt;/value&gt;
  &lt;description&gt;Comma-separated list of local filesystem paths where the DataNode stores its blocks; each path may be prefixed with a storage type such as [DISK] or [ARCHIVE]&lt;/description&gt;
&lt;/property&gt;
</code></pre>
<p>Distribute the file to the other two nodes:</p>
<pre><code>scp hdfs-site.xml node2.itcast.cn:$PWD

scp hdfs-site.xml node3.itcast.cn:$PWD
</code></pre>
<p>Restart the HDFS cluster:</p>
<pre><code>stop-dfs.sh
start-dfs.sh
</code></pre>
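<p>As an optional sanity check (not part of the original steps), confirm that all DataNodes have re-registered after the restart:</p>
<pre><code># The report should list every DataNode as live, e.g. "Live datanodes (3):"
hdfs dfsadmin -report | grep "Live datanodes"
</code></pre>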
<p>Once configured, go to the Datanodes page in the WebUI and click any DataNode:</p>
<p><img src="https://gitee.com/the_efforts_paid_offf/picture-blog/raw/master/img/20211006210405.jpg" alt="DataNode list in the HDFS WebUI"></p>
<p><img src="https://gitee.com/the_efforts_paid_offf/picture-blog/raw/master/img/20211006210409.jpg" alt="Storage directories of a DataNode showing DISK and ARCHIVE volumes"></p>
<p>As you can see, two directories are now configured: one with StorageType ARCHIVE and one with StorageType DISK.</p>
<h4><a id="2___141"></a>2  💃配置策略💃</h4>
<p>创建测试目录结构</p>
<pre><code>hdfs dfs -mkdir -p /data/hdfs-test/data_phase/hot
hdfs dfs -mkdir -p /data/hdfs-test/data_phase/warm
hdfs dfs -mkdir -p /data/hdfs-test/data_phase/cold
</code></pre>
<ol start="2">
<li>Check the storage policies currently supported by HDFS</li>
</ol>
<pre><code>[root@node1 Examples]# hdfs storagepolicies -listPolicies
Block Storage Policies:
BlockStoragePolicy{PROVIDED:1, storageTypes=[PROVIDED, DISK], creationFallbacks=[PROVIDED, DISK], replicationFallbacks=[PROVIDED, DISK]}
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
</code></pre>
<ol start="3">
<li>Set the storage policy on each of the three directories</li>
</ol>
<pre><code>hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/hot -policy HOT
hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/warm -policy WARM
hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/cold -policy COLD
</code></pre>
<ol start="4">
<li>Check the storage policy of each of the three directories</li>
</ol>
<pre><code>hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/hot
hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/warm 
hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/cold 
</code></pre>
<pre><code>[root@node1 Examples]# hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/hot
The storage policy of /data/hdfs-test/data_phase/hot:
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

[root@node1 Examples]# hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/warm 
The storage policy of /data/hdfs-test/data_phase/warm:
BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}

[root@node1 Examples]# hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/cold 
The storage policy of /data/hdfs-test/data_phase/cold:
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
</code></pre>
<h4><a id="3___207"></a>3  💃上传测试💃</h4>
<p>分别上传文件到三个目录中测试</p>
<pre><code>hdfs dfs -put /etc/profile /data/hdfs-test/data_phase/hot
hdfs dfs -put /etc/profile /data/hdfs-test/data_phase/warm
hdfs dfs -put /etc/profile /data/hdfs-test/data_phase/cold
</code></pre>
<p>Check the block locations of the files stored under the different policies:</p>
<pre><code>[root@node1 hadoop]# hdfs fsck /data/hdfs-test/data_phase/hot/profile -files -blocks -locations
Connecting to namenode via http://node1.itcast.cn:9870/fsck?ugi=root&amp;files=1&amp;blocks=1&amp;locations=1&amp;path=%2Fdata%2Fhdfs-test%2Fdata_phase%2Fhot%2Fprofile
FSCK started by root (auth:SIMPLE) from /192.168.88.100 for path /data/hdfs-test/data_phase/hot/profile at Sun Oct 11 22:03:05 CST 2020

/data/hdfs-test/data_phase/hot/profile 3158 bytes, replicated: replication=3, 1 block(s):  OK
0. BP-538037512-192.168.88.100-1600884040401:blk_1073742535_1750 len=3158 Live_repl=3  [DatanodeInfoWithStorage[192.168.88.101:9866,DS-96feb29a-5dfd-4692-81ea-9e7f100166fe,DISK], DatanodeInfoWithStorage[192.168.88.100:9866,DS-79739be9-5f9b-4f96-a005-aa5b507899f5,DISK], DatanodeInfoWithStorage[192.168.88.102:9866,DS-e28af2f2-21ae-4aa6-932e-e376dd04ddde,DISK]]

hdfs fsck /data/hdfs-test/data_phase/warm/profile -files -blocks -locations

/data/hdfs-test/data_phase/warm/profile 3158 bytes, replicated: replication=3, 1 block(s):  OK
0. BP-538037512-192.168.88.100-1600884040401:blk_1073742536_1751 len=3158 Live_repl=3  [DatanodeInfoWithStorage[192.168.88.102:9866,DS-636f34a0-682c-4d1b-b4ee-b4c34e857957,ARCHIVE], DatanodeInfoWithStorage[192.168.88.101:9866,DS-ff6970f8-43e0-431f-9041-fc440a44fdb0,ARCHIVE], DatanodeInfoWithStorage[192.168.88.100:9866,DS-79739be9-5f9b-4f96-a005-aa5b507899f5,DISK]]


hdfs fsck /data/hdfs-test/data_phase/cold/profile -files -blocks -locations
/data/hdfs-test/data_phase/cold/profile 3158 bytes, replicated: replication=3, 1 block(s):  OK
0. BP-538037512-192.168.88.100-1600884040401:blk_1073742537_1752 len=3158 Live_repl=3  [DatanodeInfoWithStorage[192.168.88.102:9866,DS-636f34a0-682c-4d1b-b4ee-b4c34e857957,ARCHIVE], DatanodeInfoWithStorage[192.168.88.101:9866,DS-ff6970f8-43e0-431f-9041-fc440a44fdb0,ARCHIVE], DatanodeInfoWithStorage[192.168.88.100:9866,DS-ca9759a0-f6f0-4b8b-af38-d96f603bca93,ARCHIVE]]
</code></pre>
<p>We can see that:</p>
<blockquote>
<p>For the file in the hot directory, all 3 replicas of its block are on DISK storage.</p>
<p>For the warm directory, 1 replica is on DISK and the other 2 are on ARCHIVE storage.</p>
<p>For the cold directory, all 3 replicas are on ARCHIVE storage.</p>
</blockquote>
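<p>One caveat: setting or changing a storage policy on a directory that already holds data does not relocate existing blocks by itself; only newly written blocks follow the policy. The HDFS Mover tool migrates existing blocks to match the policy. A minimal sketch, reusing the cold directory from the example above:</p>
<pre><code># Scan the given path and move block replicas so their storage types satisfy the COLD policy
hdfs mover -p /data/hdfs-test/data_phase/cold
</code></pre>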
<h2><a id="_249"></a>后记</h2>
<p>📢博客主页:<a href="https://manor.blog.csdn.net">https://manor.blog.csdn.net</a><br>
📢欢迎点赞 👍 收藏 ⭐留言 📝 如有错误敬请指正!<br>
📢本文由 manor 原创,首发于 CSDN博客🙉<br>
📢Hadoop系列文章会每天更新!✨</p>
</div>
</body>

</html>
