HDFS encryption zone
For transparent encryption, we introduce a new abstraction to HDFS: the encryption zone. An encryption zone is a special directory whose contents are transparently encrypted upon write and transparently decrypted upon read. Each encryption zone is associated with a single encryption zone key, which is specified when the zone is created. Each file within an encryption zone has its own unique data encryption key (DEK). DEKs are never handled directly by HDFS. Instead, HDFS only ever handles an encrypted data encryption key (EDEK). Clients have an EDEK decrypted by the key management service and then use the resulting DEK to read and write data. HDFS DataNodes simply see a stream of encrypted bytes.
A new cluster service is required to manage encryption keys: the Hadoop Key Management Server (KMS). In the context of HDFS encryption, the KMS has three basic responsibilities:
- Providing access to stored encryption zone keys
- Generating new encrypted data encryption keys for storage on the NameNode
- Decrypting encrypted data encryption keys for use by HDFS clients
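The key flow described above can be illustrated with a small, self-contained sketch. It is not the Hadoop implementation: `ToyKMS`, `write_file`, `read_file`, and the two dictionaries standing in for the NameNode and DataNodes are hypothetical names, and the cipher is a toy SHA-256 keystream used only so the example runs with the standard library (real deployments use AES). What it tries to show is that the zone key stays inside the KMS, the NameNode stores only EDEKs, and the DataNodes store only ciphertext.

```python
import hashlib
import secrets


def keystream_xor(key, nonce, data):
    """Toy stream cipher (SHA-256 in counter mode). NOT secure; a stand-in for AES."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)  # XOR is symmetric, so the same call encrypts and decrypts


class ToyKMS:
    """Hypothetical in-memory stand-in for a key management server."""

    def __init__(self):
        self._zone_keys = {}  # zone key name -> key bytes; never leaves the KMS

    def create_zone_key(self, name):
        self._zone_keys[name] = secrets.token_bytes(32)

    def generate_edek(self, zone_key_name):
        """Create a fresh DEK and hand it out only in encrypted form."""
        dek = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return nonce, keystream_xor(self._zone_keys[zone_key_name], nonce, dek)

    def decrypt_edek(self, zone_key_name, nonce, edek):
        """Recover the DEK for a client (access checks omitted in this sketch)."""
        return keystream_xor(self._zone_keys[zone_key_name], nonce, edek)


kms = ToyKMS()
kms.create_zone_key("zone1-key")

namenode_metadata = {}  # path -> (zone key name, nonce, EDEK); no plaintext keys
datanode_blocks = {}    # path -> encrypted bytes only


def write_file(path, plaintext, zone_key_name="zone1-key"):
    nonce, edek = kms.generate_edek(zone_key_name)      # new EDEK for this file
    namenode_metadata[path] = (zone_key_name, nonce, edek)
    dek = kms.decrypt_edek(zone_key_name, nonce, edek)  # client obtains the DEK
    datanode_blocks[path] = keystream_xor(dek, nonce, plaintext)


def read_file(path):
    zone_key_name, nonce, edek = namenode_metadata[path]
    dek = kms.decrypt_edek(zone_key_name, nonce, edek)
    return keystream_xor(dek, nonce, datanode_blocks[path])


write_file("/secure/report.txt", b"quarterly numbers")
assert read_file("/secure/report.txt") == b"quarterly numbers"
assert datanode_blocks["/secure/report.txt"] != b"quarterly numbers"
```

Because each file gets its own DEK, a leaked DEK exposes one file only, and the zone key itself is never sent to HDFS or to clients.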
HDFS Encryption Overview
HDFS data at rest encryption implements end-to-end encryption of data read from and written to HDFS. End-to-end encryption means that data is encrypted and decrypted only by the client. HDFS does not have access to unencrypted data or keys.
HDFS encryption involves several elements:
- Encryption key: A new level of permission-based access protection, in addition to standard HDFS permissions.
- HDFS encryption zone: A special HDFS directory within which all data is encrypted upon write, and decrypted upon read.
  - Each encryption zone is associated with an encryption key that is specified when the zone is created.
  - Each file within an encryption zone has a unique encryption key, called the “data encryption key” (DEK).
  - HDFS does not have access to DEKs. HDFS DataNodes only see a stream of encrypted bytes. HDFS stores “encrypted data encryption keys” (EDEKs) as part of the file’s metadata on the NameNode.
  - Clients decrypt an EDEK and use the associated DEK to encrypt and decrypt data during write and read operations.
- Ranger Key Management Service (Ranger KMS): An open source key management service based on Hadoop’s KeyProvider API.
For HDFS encryption, the Ranger KMS has three basic responsibilities:
- Provide access to stored encryption zone keys.
- Generate and manage encryption zone keys, and create encrypted data keys to be stored in Hadoop.
- Audit all access events in Ranger KMS.
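The idea that key access is an extra permission layer, and that every access event is audited, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ranger KMS API: `AuditedKeyPolicy`, `allow`, and `authorize` are hypothetical names, and the operation label `"DECRYPT_EEK"` is used here only as an illustrative string.

```python
from datetime import datetime, timezone


class AuditedKeyPolicy:
    """Hypothetical per-key access policy that records every request."""

    def __init__(self):
        self._acls = {}      # (key name, operation) -> set of permitted users
        self.audit_log = []  # one record per attempt, allowed or denied

    def allow(self, key_name, operation, users):
        self._acls.setdefault((key_name, operation), set()).update(users)

    def authorize(self, user, key_name, operation):
        allowed = user in self._acls.get((key_name, operation), set())
        # Record the attempt before enforcing it, so denials are audited too.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "key": key_name,
            "operation": operation,
            "result": "ALLOWED" if allowed else "DENIED",
        })
        if not allowed:
            raise PermissionError(f"{user} may not {operation} with key {key_name}")


policy = AuditedKeyPolicy()
policy.allow("zone1-key", "DECRYPT_EEK", {"alice"})

policy.authorize("alice", "zone1-key", "DECRYPT_EEK")    # permitted
try:
    policy.authorize("bob", "zone1-key", "DECRYPT_EEK")  # denied, still logged
except PermissionError:
    pass

print([(r["user"], r["result"]) for r in policy.audit_log])
```

In this sketch a user who can read a file's encrypted bytes under standard HDFS permissions would still be unable to decrypt them without a matching key policy entry, and both outcomes leave an audit record.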
LAYERS OF DEFENSE FOR A HADOOP CLUSTER
- Perimeter-level security: network security, Apache Knox (gateway)
- Authentication: Kerberos
- OS security: encryption of data in transit over the network and at rest in HDFS