Hadoop is an open-source framework for the distributed storage and processing of large data sets across clusters of commodity hardware. It was initially developed by Doug Cutting and Mike Cafarella in 2005 and is maintained by the Apache Software Foundation.
Hadoop's core components include:
Hadoop Distributed File System (HDFS): HDFS is a distributed file system that stores data across multiple machines in a cluster. It is designed to provide high-throughput access to data and to be highly fault-tolerant.
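To make the "stores data across multiple machines" idea concrete, here is a minimal Python sketch (illustrative only, not Hadoop source code) of the core HDFS concept: a file is split into fixed-size blocks, which are then distributed across the cluster. The 128 MB value is HDFS's default block size.

```python
# Illustrative sketch (not HDFS internals): HDFS conceptually splits a
# file into fixed-size blocks before distributing them across nodes.
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a byte string into chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# Small demo with a 4-byte "block size" so the split is visible.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
print(blocks)  # [b'abcd', b'efgh', b'ij']
```

Each of these blocks would then be replicated and placed on different machines, which is what gives HDFS both its throughput and its fault tolerance.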
MapReduce: MapReduce is a programming model and software framework for processing large data sets in a distributed environment. It is designed to simplify the development of distributed applications by providing an easy-to-use programming model.
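The MapReduce model is easiest to see with the classic word-count example. The toy Python sketch below runs the three phases (map, shuffle, reduce) in a single process; in real Hadoop, the framework distributes these phases across many machines.

```python
from collections import defaultdict

# Toy word count in the MapReduce style: map -> shuffle -> reduce.

def map_phase(line: str):
    """Emit a (word, 1) pair for each word in a line of input."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big ideas", "big data tools"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 2, 'ideas': 1, 'tools': 1}
```

The appeal of the model is that a developer only writes the map and reduce functions; partitioning, shuffling, and recovery from machine failures are handled by the framework.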
YARN: YARN (Yet Another Resource Negotiator) is the resource management layer of Hadoop. It is responsible for managing the resources in a Hadoop cluster, including CPU, memory, and disk.
Hadoop is widely used in big data applications, where it provides a cost-effective solution for storing, processing, and analyzing large data sets. It has become the de facto standard for distributed data processing, and is used by many organizations, including Facebook, Yahoo, and LinkedIn.
Hadoop is designed to work on commodity hardware, which makes it an affordable solution for organizations that need to process large amounts of data. It can also be used on cloud infrastructure, such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
One of the key advantages of Hadoop is its ability to handle large volumes of data. This is achieved through distributed processing, where the data is broken down into smaller chunks and processed on multiple machines in parallel. This allows for much faster processing times than would be possible on a single machine.
Hadoop also provides fault tolerance, which means that it can continue to operate even if one or more machines in the cluster fail. This is achieved through data replication, where multiple copies of the data are stored across the cluster. If one machine fails, the data can be retrieved from another machine that has a copy of the data.
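The replication mechanism can be sketched as follows. This is a hypothetical toy model, not HDFS internals: each block is placed on several nodes (HDFS's default replication factor is 3), and a read simply falls back to another replica when a node is down.

```python
# Toy sketch (hypothetical, not HDFS internals) of replication-based
# fault tolerance: each block lives on several nodes, so losing one
# node does not lose the data.
REPLICATION_FACTOR = 3  # HDFS's default replication factor

def place_replicas(block_id: int, nodes: list[str],
                   replication: int = REPLICATION_FACTOR) -> list[str]:
    """Pick `replication` distinct nodes for a block (simple round-robin
    here; real HDFS placement is rack-aware)."""
    return [nodes[(block_id + i) % len(nodes)] for i in range(replication)]

def read_block(block_id: int, replicas: list[str], alive: set[str]) -> str:
    """Read from the first replica whose node is still alive."""
    for node in replicas:
        if node in alive:
            return node
    raise IOError(f"all replicas of block {block_id} are unreachable")

nodes = ["node1", "node2", "node3", "node4"]
replicas = place_replicas(block_id=0, nodes=nodes)  # ['node1', 'node2', 'node3']
alive = set(nodes) - {"node1"}                      # simulate node1 failing
print(read_block(0, replicas, alive))               # node2 serves the block
```

In real HDFS the NameNode additionally notices the lost replicas and re-replicates them onto healthy nodes, restoring the replication factor.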
There are many tools and technologies built on top of Hadoop, including Apache Hive, Apache Pig, and Apache Spark. These tools provide additional functionality for data processing and analysis, and make it easier to work with data in a Hadoop environment.
Hadoop originated as an open-source implementation of Google's MapReduce and Google File System (GFS) papers. Since then, it has grown into a mature ecosystem with a large and active community of developers and users.
In addition to handling large data volumes and providing fault tolerance, Hadoop is highly scalable. A cluster can be scaled out or in simply by adding or removing nodes, making it a flexible solution for organizations whose processing needs vary over time.
Hadoop is used in a wide variety of applications, from scientific research and financial analysis to social media and online advertising. Some of the most common use cases include data warehousing, log processing, machine learning, and ETL (extract, transform, load) operations.
The Hadoop ecosystem includes a number of related projects, such as Apache HBase (a distributed, NoSQL database), Apache Kafka (a distributed streaming platform), and Apache Flink (a stream processing framework). These projects extend the capabilities of Hadoop and provide additional functionality for data processing and analysis.
Overall, Hadoop is a powerful and flexible framework for big data processing and analysis. Its ability to handle large data volumes and provide fault tolerance makes it a popular choice for organizations that need to process and analyze massive amounts of data.
ㅇ Hadoop basics summary
https://han-py.tistory.com/361
ㅇ Hadoop
https://wikidocs.net/22654
ㅇ Hadoop
https://velog.io/@ha0kim/2021-03-02
ㅇ Hadoop
https://www.databricks.com/kr/glossary/hadoop
ㅇ HDFS architecture
https://wikidocs.net/23582
ㅇ Hadoop and its components
https://www.opentutorials.org/course/2908/17055
https://gachonyws.github.io/hadoop/hadoop3/