Hadoop Distributed File System – portable file system

The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework. It provides high-throughput access to application data and offers functionality similar to that of the Google File System.

HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.

A Hadoop instance typically has a single namenode, and a cluster of datanodes forms the HDFS cluster. This arrangement is typical rather than mandatory, because not every node needs to run a datanode. Each datanode serves up blocks of data over the network using a block protocol specific to HDFS.
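
The split between one namenode and many datanodes can be pictured with a toy model. The sketch below is not Hadoop code: the class ToyNameNode, its methods, and the round-robin placement are all hypothetical, chosen only to show the namenode's role as an index of where blocks live while the datanodes hold the actual bytes. Real HDFS placement is more involved than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not Hadoop code): one namenode-like index over many datanode names. */
final class ToyNameNode {
    private final List<String> dataNodes;
    private final Map<String, List<String>> blockLocations = new LinkedHashMap<>();
    private int next = 0; // round-robin cursor; an assumption, not HDFS's placement policy

    ToyNameNode(List<String> dataNodes) {
        if (dataNodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one datanode");
        }
        this.dataNodes = new ArrayList<>(dataNodes);
    }

    /** Records which datanodes should hold the replicas of one block. */
    List<String> allocateBlock(String blockId, int replication) {
        // Never place two replicas of the same block on one node.
        int copies = Math.min(replication, dataNodes.size());
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            targets.add(dataNodes.get((next + i) % dataNodes.size()));
        }
        next = (next + 1) % dataNodes.size();
        blockLocations.put(blockId, Collections.unmodifiableList(targets));
        return blockLocations.get(blockId);
    }

    /** Metadata lookup only: a client would then read the bytes from one of these nodes. */
    List<String> locate(String blockId) {
        return blockLocations.getOrDefault(blockId, Collections.emptyList());
    }
}
```

With four datanodes named dn1 to dn4 and a replication factor of 3, the first allocated block in this model lands on dn1, dn2 and dn3, and the next one on dn2, dn3 and dn4. The point of the design is that lookups touch only the small metadata map, never the block data itself.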

HDFS is designed to scale to tens of petabytes of storage and runs on top of the filesystems of the underlying operating systems. It is a sub-project of the Apache Hadoop project.

Key Features

  • Supports very large files.
  • Master/slave architecture.
  • Simple Coherency Model.
  • Data access via MapReduce streaming.
  • Easily portable from one platform to another.
  • Supports a traditional hierarchical file organization.
  • Designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size.
  • Blocks of a file are replicated for fault tolerance across multiple hosts, avoiding the need for RAID storage.
  • Safemode.
  • Persistence of File System Metadata.
  • HDFS communication protocols are layered on top of the TCP/IP protocol.
  • Compatible with data rebalancing schemes.
  • Checksum checking on the contents of HDFS files.
  • Snapshots.
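
The block and replication features listed above come down to simple arithmetic, sketched here. The class BlockLayout and its methods are hypothetical helpers, not part of Hadoop, and the 128 MiB block size, 300 MiB file, and replication factor of 3 are assumed values for the example; in a real cluster these come from configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: how a file of a given length divides into fixed-size blocks. */
final class BlockLayout {
    /** Returns the length of each block in order; only the last one may be shorter. */
    static List<Long> blockLengths(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        List<Long> lengths = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            lengths.add(Math.min(blockSize, fileLength - offset));
        }
        return lengths;
    }

    /** Raw bytes stored cluster-wide when every block is kept on `replication` hosts. */
    static long rawBytesStored(long fileLength, int replication) {
        return Math.multiplyExact(fileLength, (long) replication);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long blockSize = 128 * mib;   // assumed block size for this example
        long fileLength = 300 * mib;  // assumed file size
        int replication = 3;          // assumed replication factor

        System.out.println(blockLengths(fileLength, blockSize));
        System.out.println(rawBytesStored(fileLength, replication));
    }
}
```

Under these assumptions the 300 MiB file occupies three blocks of 128, 128 and 44 MiB, and with three replicas of each block the cluster stores 900 MiB of raw data for it. That overhead is the price of tolerating host failures without relying on RAID.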

Website: hadoop.apache.org
Support: Users Guide
Developer: The Apache Software Foundation
License: Apache License 2.0


Related Software

File Systems
HDFS: Distributed file system providing high-throughput access
SeaweedFS: Simple and highly scalable distributed file system
Lustre: File system for computer clusters
CephFS: Unified, distributed storage system
Alluxio: Virtual distributed file system
GlusterFS: Scale-out NAS file system
JuiceFS: Distributed POSIX file system
XtreemFS: Object-based, distributed file system for wide area networks
MooseFS: POSIX-compliant distributed file system
Quantcast File System: High-performance, fault-tolerant, distributed file system
OrangeFS: Multi-server scalable parallel file system
LeilFS: Distributed POSIX file system
