In this article, you will learn why we need distributed computing systems and what makes up the Hadoop ecosystem.
Introduction

This article is a continuation of Hadoop – Distributed Computing Environment; click here to get details about distributed computing environments. Here we will build up our knowledge of why we need Hadoop and what the Hadoop ecosystem consists of.

Working with distributed systems requires software that can coordinate and manage the processors and machines within the distributed environment. As giant corporations like Google kept scaling up, they had to build new software that could run across all of their distributed machines. The software developed for distributed systems had two objectives:

- to distribute the storage of data across many machines, and
- to distribute the processing of that data across those same machines.
Thus, Google worked on these two concepts and designed software for this purpose: the Google File System and MapReduce.
Both of these come together in Hadoop: the Google File System became the Hadoop Distributed File System (HDFS), and Google's MapReduce became the MapReduce framework that we have in Hadoop. Hence, HDFS and MapReduce join together in Hadoop for us. HDFS is a file system used to manage the storage of data across the machines in a cluster, while MapReduce is a framework for processing that data across multiple servers. Hadoop itself is open source and is distributed by the Apache Software Foundation. A short sketch of the HDFS client API follows below.
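To make the storage abstraction concrete, here is a minimal sketch (not from the original article) of writing and reading a file through the standard HDFS Java client API. The NameNode address and file path are hypothetical; in a real cluster, the address normally comes from core-site.xml rather than being set in code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; usually configured in core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/demo/hello.txt");

        // Write: HDFS splits the file into blocks and replicates them
        // across DataNodes, but the client only sees a single stream.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("Hello, HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back through the same single-stream abstraction.
        try (FSDataInputStream in = fs.open(path)) {
            byte[] buf = new byte[32];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```

The point of the sketch is that the client code never deals with individual machines; HDFS decides where the blocks physically live.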
In 2013, with Hadoop 2.x, MapReduce inside Hadoop was split into two components, as shown below.

Now the MapReduce framework only defines the data processing task, that is, the logic to be applied to the raw data. YARN, in turn, is the framework that runs that data processing task across multiple machines, managing memory, processing, scheduling, and so on.

Work allocation of Hadoop
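As an illustration of this division of labor, below is a minimal word-count sketch using the standard Hadoop MapReduce Java API (org.apache.hadoop.mapreduce). The class names and input/output paths are illustrative; note that the program only defines the map and reduce logic, while YARN decides where the tasks actually run.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // YARN (not this program) allocates containers for the map and
        // reduce tasks and manages their memory and CPU.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The separation is visible here: the job defines what to compute, and submitting it hands the question of where and with what resources over to YARN.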
Hadoop Ecosystem

The Hadoop ecosystem holds several building blocks: at its core are HDFS for storage, YARN for resource management, and MapReduce for processing, commonly surrounded by tools such as Hive, Pig, HBase, Sqoop, and ZooKeeper.