A hands-on session was conducted for third-year IT students on 19th, 20th, 26th and 27th December 2014. The session was conducted by corporate trainer Mr. Niraj Bhatt. The following topics were covered in the session:
Topics Covered on 19th and 26th of December 2014:
- Introduction to Hadoop: Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
- MapReduce: A programming model and associated implementation for processing and generating large data sets. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on compute clusters of commodity hardware. It is a sub-project of the Apache Hadoop project. The framework takes care of scheduling tasks, monitoring them, and re-executing any failed tasks.
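The map/shuffle/reduce phases described above can be sketched in plain Python. This is a minimal single-machine simulation of the programming model with a word-count example, not the actual Hadoop Java API; the function names and sample lines are illustrative only:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does automatically between the map and reduce phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: sum the counts collected for each word
    return (key, sum(values))

lines = ["Hadoop stores big data", "Hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)
# → {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```

In real Hadoop, the map and reduce functions run in parallel across many machines, and the framework handles the shuffle, scheduling and failure recovery that this sketch performs in-process.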
Topics Covered on 20th and 27th of December 2014:
- HDFS: Introduction to the scalable and portable Hadoop Distributed File System. HDFS uses a master/slave architecture, in which one device (the master) controls one or more other devices (the slaves). An HDFS cluster consists of a single NameNode, the master server that manages the file system namespace and regulates access to files, along with DataNodes that store the actual data blocks.
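Users interact with HDFS much like an ordinary file system, through its shell. A few illustrative commands are shown below (these require a running Hadoop cluster to execute, and the paths and file name are hypothetical):

```shell
hdfs dfs -mkdir /user/student          # create a directory in HDFS
hdfs dfs -put sales.csv /user/student  # copy a local file into HDFS
hdfs dfs -ls /user/student             # list the directory contents
hdfs dfs -cat /user/student/sales.csv  # print the file's contents
```

Behind these commands, the NameNode resolves paths and the DataNodes serve the file's blocks.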
- HIVE: A data warehouse built on Hadoop for data summarization and analysis. Hadoop was built to organize and store massive amounts of data of all shapes, sizes and formats. Because of Hadoop’s “schema on read” architecture, a Hadoop cluster is a perfect reservoir of heterogeneous data—structured and unstructured—from a multitude of sources. Data analysts use Hive to explore, structure and analyze that data, then turn it into business insight.
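As a sketch of what that exploration looks like, a Hive query over a hypothetical `sales` table might read as follows (the table and column names are assumptions for illustration, not part of the session material):

```sql
-- Define a table over delimited files stored in HDFS ("schema on read")
CREATE TABLE sales (region STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Summarize total sales per region; Hive compiles this
-- SQL-like query into MapReduce jobs on the cluster
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region;
```

The analyst writes familiar SQL-style queries; Hive translates them into the distributed jobs described earlier.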
- PIG: A high-level platform for creating MapReduce programs on Apache Hadoop. Pig enables data workers to write complex data transformations without knowing Java. Pig’s simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL.
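An illustrative Pig Latin script for the same kind of summarization might look like this (the HDFS path and field names are hypothetical; running it requires a Hadoop cluster):

```
-- Load comma-separated records from HDFS
sales = LOAD '/user/student/sales.csv' USING PigStorage(',')
        AS (region:chararray, amount:double);

-- Group the records by region and sum the amounts in each group
by_region = GROUP sales BY region;
totals = FOREACH by_region GENERATE group, SUM(sales.amount);

-- Print the result; Pig compiles these steps into MapReduce jobs
DUMP totals;
```

Each statement names an intermediate relation, so complex transformations can be built step by step without writing any Java.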
Thus, the session was useful to the students, providing them with a basic understanding of what exactly the Hadoop platform is. Students became familiar with handling databases and unstructured data on distributed file system platforms. This will certainly help them in the industry when handling large amounts of data for data analysis.