2019-01-07

Copying data from HDFS to S3 with the hadoop command

Hadoop

Prerequisites

Beforehand, issue an access key and a secret key in AWS IAM that have permission to write to S3.

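Overview

This is an example of copying data from HDFS to AWS S3 with the hadoop command.

The command examples use the placeholder names below; substitute your own values.

HDFS directory	hdfsdir
S3 bucket name	s3bucket
S3 directory	s3dir
S3 access key	S3_ACCESS_KEY_ID
S3 secret key	S3_SECRET_ACCESS_KEY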

Command examples

Using `distcp`

hadoop distcp \
hdfs:///hdfsdir \
s3a://S3_ACCESS_KEY_ID:S3_SECRET_ACCESS_KEY@s3bucket/s3dir

Using `fs -cp`

hadoop fs -cp \
hdfs:///hdfsdir \
s3a://S3_ACCESS_KEY_ID:S3_SECRET_ACCESS_KEY@s3bucket/s3dir

TIPS

Depending on the Hadoop version, the commands above can fail with a syntax error.

If that happens, try passing the keys as options instead:

hadoop distcp \
-Dfs.s3a.access.key=S3_ACCESS_KEY_ID \
-Dfs.s3a.secret.key=S3_SECRET_ACCESS_KEY \
hdfs:///hdfsdir s3a://s3bucket/s3dir
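Typing the keys into every command is error-prone, so the option form above can be assembled from environment variables. A minimal sketch in Bash, assuming the keys are exported as `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY` (the placeholder names from the table, reused as variable names); `run_distcp_to_s3` and its `DRY_RUN` switch are hypothetical. The keys still show up in the process list while the job runs, so this is about convenience, not secrecy.

```shell
# Hypothetical helper: runs (or, with DRY_RUN=1, only prints) a distcp
# command that passes the S3 keys as -D options rather than in the URI.
# Assumes S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY are set in the shell.
run_distcp_to_s3() {
  local src="$1" dst="$2"
  if [[ -z "${S3_ACCESS_KEY_ID:-}" || -z "${S3_SECRET_ACCESS_KEY:-}" ]]; then
    echo "set S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY first" >&2
    return 1
  fi
  # Build the command as an array so each argument stays intact.
  local cmd=(hadoop distcp
    "-Dfs.s3a.access.key=${S3_ACCESS_KEY_ID}"
    "-Dfs.s3a.secret.key=${S3_SECRET_ACCESS_KEY}"
    "$src" "$dst")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"   # print only; note that this echoes the secret key
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_distcp_to_s3 hdfs:///hdfsdir s3a://s3bucket/s3dir` prints the command, and running it without `DRY_RUN=1` starts the copy.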

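Differences between `distcp` and `fs -cp`

distcp performs a distributed copy (it runs as a MapReduce job across the cluster's nodes)
fs -cp copies through a single machine (the client running the command)

With a large amount of data, distcp transfers faster
With a small amount of data, fs -cp can be faster, since distcp has to start a MapReduce job first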
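The rule of thumb above can be turned into a small script that measures the source first. A minimal sketch in Bash: `choose_copy_tool` is a hypothetical helper, and its 1 GiB default threshold is an arbitrary assumption rather than a measured crossover point, so tune it for your cluster. The size in bytes can come from the first column of `hdfs dfs -du -s`.

```shell
# Hypothetical default threshold: 1 GiB. An assumption, not a benchmark result.
COPY_THRESHOLD_BYTES=$((1024 * 1024 * 1024))

# Prints "distcp" when the input is at least the threshold, otherwise "fs -cp".
# Arguments: size in bytes, optional threshold in bytes.
choose_copy_tool() {
  local bytes="$1" threshold="${2:-$COPY_THRESHOLD_BYTES}"
  if (( bytes >= threshold )); then
    echo "distcp"
  else
    echo "fs -cp"
  fi
}
```

For example: `choose_copy_tool "$(hdfs dfs -du -s /hdfsdir | awk '{print $1}')"`.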

A slash (`/`) in the secret key causes an error

If your secret key contains a slash, the only remedy is to regenerate it.

The error output looks like this:

With failures, global counters are inaccurate; consider running with -i
Copy failed: java.lang.NullPointerException: null uri host. This can be caused by unencoded / in the password string
    at java.util.Objects.requireNonNull(Objects.java:228)
    at org.apache.hadoop.fs.s3native.S3xLoginHelper.buildFSURI(S3xLoginHelper.java:69)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:185)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2816)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1061)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:678)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:895)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:922)
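Since a slash in the secret key is what triggers this error, the key can be checked before it is put into the s3a:// URI. A minimal sketch in Bash; `secret_has_slash` and `check_secret_key` are hypothetical helpers that only detect the problem, and the remedy is still to regenerate the key as described above.

```shell
# Returns success (0) if the given secret key contains a "/" character.
secret_has_slash() {
  [[ "$1" == */* ]]
}

# Warns and fails if S3_SECRET_ACCESS_KEY (assumed variable name) has a slash.
check_secret_key() {
  if secret_has_slash "${S3_SECRET_ACCESS_KEY:-}"; then
    echo "secret key contains '/'; regenerate it in IAM" >&2
    return 1
  fi
}
```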

References

Copying Cluster Data Using DistCp | 5.8.x | Cloudera Documentation
Hadoop and S3 - Qiita

2018-12-27

Port mirroring on a Cisco switch

Overview

Notes from setting up port mirroring on a Cisco switch to capture packets.

Pick an unused monitor session number and configure mirroring on it.

Checking existing monitor sessions

First, check that session number 1 is not already in use.
You can check this with the following command:

# show monitor session 1

Output like the following means session 1 is unused and safe to use.

No SPAN configuration is present in the system for session [1].
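If the console session is logged to a file (for example with a terminal emulator's logging feature), this check can be run against the captured text. A minimal sketch in Bash, assuming the `show monitor session` output has been saved to a file; `span_session_free` is a hypothetical helper, and the message it looks for is copied from the output above and may differ between IOS versions.

```shell
# Reads captured "show monitor session N" output on stdin and succeeds
# if it contains the "no SPAN configuration" message for session N.
span_session_free() {
  local session="$1"
  # -F: match the text literally, so the [ ] brackets are not a regex class.
  grep -qF "No SPAN configuration is present in the system for session [${session}]."
}
```

For example: `span_session_free 1 < show_monitor_session_1.txt && echo "session 1 is free"` (the file name is hypothetical).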

Configuring the monitor session

Use session number 1
Mirror packets on GigabitEthernet1/0/1 to GigabitEthernet1/0/2
Use a filter to capture only packets on VLAN 100

# configure terminal
(config)# monitor session 1 source interface GigabitEthernet1/0/1
(config)# monitor session 1 filter vlan 100
(config)# monitor session 1 destination interface GigabitEthernet1/0/2
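When the same setup has to be applied to several sessions or switches, the commands can be generated from parameters and pasted into the console. A minimal sketch in Bash; `span_config` is a hypothetical helper that only prints the lines shown above (plus `end` to leave configuration mode) and does not connect to the switch.

```shell
# Prints the IOS configuration lines for one local SPAN session.
# Arguments: session number, source interface, destination interface, VLAN.
span_config() {
  local session="$1" src="$2" dst="$3" vlan="$4"
  echo "configure terminal"
  echo "monitor session ${session} source interface ${src}"
  echo "monitor session ${session} filter vlan ${vlan}"
  echo "monitor session ${session} destination interface ${dst}"
  echo "end"
}
```

For example: `span_config 1 GigabitEthernet1/0/1 GigabitEthernet1/0/2 100`.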

Verifying the monitor configuration

# show monitor session 1
Session 1
---------
Type                   : Local Session
Source Ports           :
    Both               : Gi1/0/1
Destination Ports      : Gi1/0/2
    Encapsulation      : Native
          Ingress      : Disabled
Filter VLANs           : 100

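If the output looks like the above, the session is configured correctly.

Removing the monitor configuration

Once it is no longer needed, remove the configuration from global configuration mode:

# configure terminal
(config)# no monitor session 1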

References

Packet capture - Catalyst switch configuration 2 (SPAN / RSPAN)
SPAN (Mirroring, Part 3) - Learning CCNP on Real Hardware

2018-12-19

How Redis's slave-read-only setting behaves

Redis

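Overview

Redis replication has a slave-read-only setting.
It defaults to yes, which prevents data from being written to a slave.
Setting it to no allows writes to a slave, but the written data exists only on that slave server.

These are notes from checking that behavior.

The master/slave setup used for the test was as follows: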

redis01	master
redis02	slave
redis03	slave

redis03(slave)

With slave-read-only yes

[tsunokawa@redis03 ~]$ redis-cli
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
127.0.0.1:6379>
127.0.0.1:6379> set testkey6 testvalue6
(error) READONLY You can't write against a read only slave.
127.0.0.1:6379>

Changing yes to no

slave-read-only yes

↓

slave-read-only no
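The post does not show how the change was applied. Two ways that should work: edit `redis.conf` and restart Redis, or change it at runtime with `redis-cli CONFIG SET slave-read-only no` (a runtime change is lost on restart unless written back, e.g. with `CONFIG REWRITE`). Since Redis 5 the directive is named `replica-read-only`, with `slave-read-only` still accepted. A minimal sketch of the file edit in Bash, assuming the directive sits at the start of its own line; `set_slave_read_only` is a hypothetical helper.

```shell
# Hypothetical helper: sets slave-read-only (or its Redis 5+ name,
# replica-read-only) to yes/no in the given redis.conf file.
set_slave_read_only() {
  local conf="$1" value="$2"
  if [[ "$value" != yes && "$value" != no ]]; then
    echo "value must be yes or no" >&2
    return 1
  fi
  local tmp
  tmp="$(mktemp)" || return 1
  # sed -E without -i works with both GNU and BSD sed; only uncommented
  # directive lines (anchored at ^) are changed.
  sed -E "s/^(slave|replica)-read-only[[:space:]]+.*/\1-read-only ${value}/" "$conf" > "$tmp" \
    && cat "$tmp" > "$conf"   # write back in place, keeping the file's permissions
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example: `set_slave_read_only /etc/redis/redis.conf no` (the path is an assumption; use your own config file), then restart Redis.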

This made redis03 (slave) writable.

Let's check that writes actually work.

[tsunokawa@redis03 ~]$ redis-cli
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
127.0.0.1:6379>
127.0.0.1:6379> set testkey6 testvalue6
OK
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
 6) "testkey6"
127.0.0.1:6379>

The write succeeded.

redis01(master)

[tsunokawa@redis01 ~]$ redis-cli
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
127.0.0.1:6379>

redis02(slave)

[tsunokawa@redis02 ~]$ redis-cli
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
127.0.0.1:6379>

As expected, the update is not reflected on redis01 (master) or on the other slave, redis02 (slave).

Next, I checked what happens to testkey6:testvalue6, the value that was forcibly set only on redis03 (slave) above,
when the same key is set on redis01 (master) with only the value changed, as
testkey6:testvalue777.

To give the result first: the slave's data was overwritten.

redis03(slave)

[tsunokawa@redis03 ~]$ redis-cli
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
 6) "testkey6"
127.0.0.1:6379> get testkey6
"testvalue6"
127.0.0.1:6379>

redis01(master)

[tsunokawa@redis01 ~]$ redis-cli
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
127.0.0.1:6379>

testkey6 does not exist yet.

redis02(slave)

[tsunokawa@redis02 ~]$ redis-cli
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
127.0.0.1:6379>

testkey6 does not exist yet.

In this state, set the value from redis01 (master).

[tsunokawa@redis01 ~]$ redis-cli
127.0.0.1:6379> set testkey6 testvalue777
OK
127.0.0.1:6379> get testkey6
"testvalue777"
127.0.0.1:6379>

127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
 6) "testkey6"
127.0.0.1:6379>

Because of replication, the value also appeared on redis02 (slave).

[tsunokawa@redis02 ~]$ redis-cli
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
 6) "testkey6"
127.0.0.1:6379> get testkey6
"testvalue777"
127.0.0.1:6379>

On redis03 (slave), the value was overwritten.

[tsunokawa@redis03 ~]$ redis-cli
127.0.0.1:6379> KEYS *
 1) "testkey1"
 2) "testkey2"
 3) "testkey3"
 4) "testkey4"
 5) "testkey5"
 6) "testkey6"
127.0.0.1:6379>
127.0.0.1:6379> get testkey6
"testvalue777"
127.0.0.1:6379>

So when the master sets a key that exists only on a slave, replication silently overwrites the slave's value.
No error is raised.
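The per-host checks above can be scripted so that master and slaves are compared in one go. A minimal sketch in Bash; `show_key_on_hosts` is a hypothetical helper, it assumes each host is reachable on the default port from where you run it, and `REDIS_CLI` is an override hook added for testing, not a Redis setting.

```shell
# Prints "host: value" for one key on each host so that divergence
# between master and slaves is easy to spot. An empty value means the key
# is missing on that host.
show_key_on_hosts() {
  local key="$1"; shift
  local cli="${REDIS_CLI:-redis-cli}"   # override hook (assumption, for testing)
  local host
  for host in "$@"; do
    printf '%s: %s\n' "$host" "$("$cli" -h "$host" GET "$key")"
  done
}
```

For example: `show_key_on_hosts testkey6 redis01 redis02 redis03`.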
