Hadoop is an open-source framework for the distributed storage and processing of large data sets across clusters of commodity hardware. It was initially developed by Doug Cutting and Mike Cafarella in 2005 and is maintained by the Apache Software Foundation.
Hadoop's core components include:
Hadoop Distributed File System (HDFS): HDFS is a distributed file system that stores data across multiple machines in a cluster. It is designed to provide high-throughput access to data and to be highly fault-tolerant.
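To make the "stores data across multiple machines" idea concrete, here is a minimal Python sketch (illustrative only, not Hadoop source code) of the core HDFS concept: a file is split into fixed-size blocks, which are then distributed across the cluster. The 128 MB value is HDFS's default block size.

```python
# Illustrative sketch (not HDFS internals): HDFS conceptually splits a
# file into fixed-size blocks before distributing them across nodes.
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a byte string into chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# Small demo with a 4-byte "block size" so the split is visible.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
print(blocks)  # [b'abcd', b'efgh', b'ij']
```

Each of these blocks would then be replicated and placed on different machines, which is what gives HDFS both its throughput and its fault tolerance.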
MapReduce: MapReduce is a programming model and software framework for processing large data sets in a distributed environment. It is designed to simplify the development of distributed applications by providing an easy-to-use programming model.
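The MapReduce model is easiest to see with the classic word-count example. The toy Python sketch below runs the three phases (map, shuffle, reduce) in a single process; in real Hadoop, the framework distributes these phases across many machines.

```python
from collections import defaultdict

# Toy word count in the MapReduce style: map -> shuffle -> reduce.

def map_phase(line: str):
    """Emit a (word, 1) pair for each word in a line of input."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big ideas", "big data tools"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 2, 'ideas': 1, 'tools': 1}
```

The appeal of the model is that a developer only writes the map and reduce functions; partitioning, shuffling, and recovery from machine failures are handled by the framework.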
YARN: YARN (Yet Another Resource Negotiator) is the resource management layer of Hadoop. It is responsible for managing the resources in a Hadoop cluster, including CPU, memory, and disk.
Hadoop is widely used in big data applications, where it provides a cost-effective solution for storing, processing, and analyzing large data sets. It has become the de facto standard for distributed data processing, and is used by many organizations, including Facebook, Yahoo, and LinkedIn.
Hadoop is designed to work on commodity hardware, which makes it an affordable solution for organizations that need to process large amounts of data. It can also be used on cloud infrastructure, such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
One of the key advantages of Hadoop is its ability to handle large volumes of data. This is achieved through distributed processing, where the data is broken down into smaller chunks and processed on multiple machines in parallel. This allows for much faster processing times than would be possible on a single machine.
Hadoop also provides fault tolerance, which means that it can continue to operate even if one or more machines in the cluster fail. This is achieved through data replication, where multiple copies of the data are stored across the cluster. If one machine fails, the data can be retrieved from another machine that has a copy of the data.
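The replication mechanism can be sketched as follows. This is a hypothetical toy model, not HDFS internals: each block is placed on several nodes (HDFS's default replication factor is 3), and a read simply falls back to another replica when a node is down.

```python
# Toy sketch (hypothetical, not HDFS internals) of replication-based
# fault tolerance: each block lives on several nodes, so losing one
# node does not lose the data.
REPLICATION_FACTOR = 3  # HDFS's default replication factor

def place_replicas(block_id: int, nodes: list[str],
                   replication: int = REPLICATION_FACTOR) -> list[str]:
    """Pick `replication` distinct nodes for a block (simple round-robin
    here; real HDFS placement is rack-aware)."""
    return [nodes[(block_id + i) % len(nodes)] for i in range(replication)]

def read_block(block_id: int, replicas: list[str], alive: set[str]) -> str:
    """Read from the first replica whose node is still alive."""
    for node in replicas:
        if node in alive:
            return node
    raise IOError(f"all replicas of block {block_id} are unreachable")

nodes = ["node1", "node2", "node3", "node4"]
replicas = place_replicas(block_id=0, nodes=nodes)  # ['node1', 'node2', 'node3']
alive = set(nodes) - {"node1"}                      # simulate node1 failing
print(read_block(0, replicas, alive))               # node2 serves the block
```

In real HDFS the NameNode additionally notices the lost replicas and re-replicates them onto healthy nodes, restoring the replication factor.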
There are many tools and technologies built on top of Hadoop, including Apache Hive, Apache Pig, and Apache Spark. These tools provide additional functionality for data processing and analysis, and make it easier to work with data in a Hadoop environment.
Hadoop originated as an open-source implementation of Google's MapReduce and Google File System (GFS) papers. Since then, it has grown into a mature ecosystem with a large and active community of developers and users.
In addition to handling large data volumes and providing fault tolerance, Hadoop is highly scalable. A cluster can be scaled out or in simply by adding or removing nodes, making it a flexible solution for organizations whose processing needs vary over time.
Hadoop is used in a wide variety of applications, from scientific research and financial analysis to social media and online advertising. Some of the most common use cases include data warehousing, log processing, machine learning, and ETL (extract, transform, load) operations.
The Hadoop ecosystem includes a number of related projects, such as Apache HBase (a distributed, NoSQL database), Apache Kafka (a distributed streaming platform), and Apache Flink (a stream processing framework). These projects extend the capabilities of Hadoop and provide additional functionality for data processing and analysis.
Overall, Hadoop is a powerful and flexible framework for big data processing and analysis. Its ability to handle large data volumes and provide fault tolerance makes it a popular choice for organizations that need to process and analyze massive amounts of data.
ㅇ Hadoop basics summary
https://han-py.tistory.com/361
ㅇ Hadoop
https://wikidocs.net/22654
ㅇ Hadoop
https://velog.io/@ha0kim/2021-03-02
ㅇ Hadoop
https://www.databricks.com/kr/glossary/hadoop
ㅇ HDFS architecture
https://wikidocs.net/23582
ㅇ Hadoop and its components
https://www.opentutorials.org/course/2908/17055
https://gachonyws.github.io/hadoop/hadoop3/