#onenote# hadoop security

HDFS encryption zone

For transparent encryption, we introduce a new abstraction to HDFS: the encryption zone. An encryption zone is a special directory whose contents will be transparently encrypted upon write and transparently decrypted upon read. Each encryption zone is associated with a single encryption zone key which is specified when the zone is created. Each file within an encryption zone has its own unique data encryption key (DEK). DEKs are never handled directly by HDFS. Instead, HDFS only ever handles an encrypted data encryption key (EDEK). Clients decrypt an EDEK, and then use the subsequent DEK to read and write data. HDFS datanodes simply see a stream of encrypted bytes.

 

A new cluster service is required to manage encryption keys: the Hadoop Key Management Server (KMS). In the context of HDFS encryption, the KMS performs three basic responsibilities:

  1. Providing access to stored encryption zone keys
  2. Generating new encrypted data encryption keys for storage on the NameNode
  3. Decrypting encrypted data encryption keys for use by HDFS clients

 

From <https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html>
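The zone/key flow above can be exercised from the command line. A minimal sketch using the stock `hadoop key` and `hdfs crypto` commands (the key name `mykey` and zone path `/secure` are illustrative; a KMS must already be configured as the cluster's key provider):

```shell
# Create an encryption zone key in the KMS
# (run as a user with key-admin rights)
hadoop key create mykey

# Create an empty directory and turn it into an encryption zone
# keyed by "mykey" (run as the HDFS superuser; the directory must be empty)
hdfs dfs -mkdir /secure
hdfs crypto -createZone -keyName mykey -path /secure

# Confirm the zone exists
hdfs crypto -listZones
```

From this point on, every file written under /secure gets its own DEK, and HDFS stores only the EDEK in the file's metadata.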

 

 

 

  1. HDFS Encryption Overview

HDFS data at rest encryption implements end-to-end encryption of data read from and written to HDFS. End-to-end encryption means that data is encrypted and decrypted only by the client. HDFS does not have access to unencrypted data or keys.

HDFS encryption involves several elements:

  • Encryption key: A new level of permission-based access protection, in addition to standard HDFS permissions.
  • HDFS encryption zone: A special HDFS directory within which all data is encrypted upon write, and decrypted upon read.
    • Each encryption zone is associated with an encryption key that is specified when the zone is created.
    • Each file within an encryption zone has a unique encryption key, called the “data encryption key” (DEK).
    • HDFS does not have access to DEKs. HDFS DataNodes only see a stream of encrypted bytes. HDFS stores “encrypted data encryption keys” (EDEKs) as part of the file’s metadata on the NameNode.
    • Clients decrypt an EDEK and use the associated DEK to encrypt and decrypt data during write and read operations.
  • Ranger Key Management Service (Ranger KMS): An open source key management service based on Hadoop’s KeyProvider API.
    For HDFS encryption, the Ranger KMS has three basic responsibilities:

    • Provide access to stored encryption zone keys.
    • Generate and manage encryption zone keys, and create encrypted data keys to be stored in Hadoop.
    • Audit all access events in Ranger KMS.

 

From <https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/hdfs-encryption-overview.html>
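The DEK/EDEK handling described above amounts to envelope encryption: the zone key wraps each file's DEK, and only the KMS can unwrap it. A minimal conceptual sketch in Python, where a toy XOR cipher stands in for AES-CTR and the function names are illustrative, not the real KMS API:

```python
import os

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy XOR cipher standing in for AES-CTR; illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# KMS side: the encryption-zone key (EZ key) never leaves the KMS.
EZ_KEY = os.urandom(16)

def kms_generate_edek() -> bytes:
    """Generate a fresh DEK and hand back only its wrapped form (EDEK)."""
    dek = os.urandom(16)
    return xor_cipher(dek, EZ_KEY)

def kms_decrypt_edek(edek: bytes) -> bytes:
    """Unwrap an EDEK for an authorized client."""
    return xor_cipher(edek, EZ_KEY)

# NameNode side: stores the EDEK in file metadata; it never sees the DEK.
edek = kms_generate_edek()

# Client side: ask the KMS to unwrap the EDEK, then use the DEK on the data.
dek = kms_decrypt_edek(edek)
plaintext = b"hello hdfs"
ciphertext = xor_cipher(plaintext, dek)   # what DataNodes actually store
assert xor_cipher(ciphertext, dek) == plaintext
```

The point of the layering is visible in the sketch: compromising a DataNode yields only `ciphertext`, and compromising the NameNode yields only `edek`; reading data requires both HDFS access and KMS authorization.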

 

To test the commands (key creation, zone creation, reads and writes), see:

http://www.cloudera.com/documentation/enterprise/5-6-x/topics/sg_hdfs_encryption_keys_zones.html
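A quick smoke test of transparent encryption, assuming an encryption zone already exists at /secure (zone path and file name are illustrative):

```shell
# Write a file into the zone; the client encrypts transparently on write
hdfs dfs -put localfile /secure/

# Read it back; the client fetches the EDEK, has the KMS unwrap it,
# and decrypts transparently on read
hdfs dfs -cat /secure/localfile

# Inspect the per-file encryption metadata (cipher suite, EDEK, IV, key name)
hdfs crypto -getFileEncryptionInfo -path /secure/localfile
```

If the reading user lacks DECRYPT_EEK permission on the zone key in the KMS, the `-cat` fails even when HDFS file permissions would allow the read.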

 

 

Security design

LAYERS OF DEFENSE FOR A HADOOP CLUSTER

  • Perimeter-level security: network security, Apache Knox (gateway)
  • Authentication: Kerberos
  • Authorization: HDFS permissions and service-level access control
  • Data protection: encryption of data in transit (on the network) and at rest (in HDFS)

 

The purpose of the Apache Knox Gateway project is to simplify and standardize how secure Hadoop clusters are published and implemented, by providing centralized access to services through REST APIs.

Knox extends Hadoop's security boundary: it integrates fully with frameworks such as LDAP and Active Directory for credential management, and provides a common authorization service across Hadoop and all related projects.

The Apache Knox Gateway is a REST API Gateway for interacting with Apache Hadoop clusters.

The Knox Gateway provides a single access point for all REST interactions with Apache Hadoop clusters.

In this capacity, the Knox Gateway is able to provide valuable functionality to aid in the control, integration, monitoring and automation of critical administrative and analytical needs of the enterprise.

      • Authentication (LDAP and Active Directory Authentication Provider)
      • Federation/SSO (HTTP Header Based Identity Federation)
      • Authorization (Service Level Authorization)
      • Auditing

Coupled with proper network isolation of a Kerberos-secured Apache Hadoop cluster, the Knox Gateway provides the enterprise with a solution that:

      • Integrates well with enterprise identity management solutions
      • Protects the details of the cluster deployment (hosts and ports are hidden from end users)
      • Reduces the number of service endpoints that clients need to interact with

[Diagram: the Apache Knox Gateway provides Federation/SSO, Service Level Authorization, and Auditing, fronting internal cluster services such as webhdfs.internal:50070, templeton.internal:50111, stargate.internal:60080, oozie.internal:11000, namenode.internal:8020, and jobtracker.internal:8050. Clients reach everything over a single HTTPS endpoint (e.g. https://host:8443/gateway/mycluster) with JSON/XML/TEXT responses.]

From <https://knox.apache.org/>

WHAT IS APACHE KNOX?

The Apache Knox Gateway is a system that provides a single point of authentication and access for Apache™ Hadoop® services. It provides the following features:

      • Single REST API Access Point
      • Centralized authentication, authorization and auditing for Hadoop REST/HTTP services
      • LDAP/AD Authentication, Service Authorization and Audit
      • Eliminates SSH edge node risks
      • Hides Network Topology

From <http://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/>
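The single-access-point idea is easiest to see with WebHDFS through the gateway. A sketch with curl, where the gateway host, the `default` topology name, and the demo `guest` credentials are assumptions for illustration (the demo LDAP user from the sandbox tutorial):

```shell
# List the HDFS root directory through Knox instead of talking to the
# NameNode directly; -k skips TLS verification for a self-signed demo cert
curl -iku guest:guest-password \
  'https://knox-host:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS'
```

The client only ever sees the gateway's host and port; the NameNode's real address (e.g. namenode.internal:8020) stays hidden behind the topology.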

[Screenshot: the Ambari service list on the sandbox (HDFS, MapReduce2, YARN, Tez, Hive, HBase, Pig, Sqoop, Oozie, ZooKeeper, Falcon, Storm, Flume, Ambari Metrics, Atlas, Slider, Spark, Zeppelin Notebook), showing Knox among the installed services with a "Start Demo LDAP" action.]
