Introduction to Big Data and Hadoop

Latest Batch Schedule
Please note the name of the batch you wish to sign up for; you will need it at checkout.
Date (DD/MM/YY) | Batch Name | Batch Timing |
---|---|---|
03/02/21 | Weekdays Batch 1 | 8:00 PM - 10:00 PM |
15/02/21 | Weekdays Batch 2 | 8:30 PM - 10:30 PM |
28/02/21 | Weekend Batch 1 | 11 PM - 1 PM |
The Big Data and Hadoop market has evolved continuously over the last few years. Learn how to analyze Big Data with the Big Data and Hadoop certification. Starting with an introduction to Hadoop architecture, take your analytics career to the next level. Big Data with Hadoop is a comprehensive course that familiarizes you with the key models and terminology of Big Data analytics.
Introduction To Big Data & Data Engineering, Big Data Analytics
- What are Big Data and data engineering?
- Importance of data engineering in the Big Data world
- Role of RDBMS (SQL Server), Hadoop, Spark, NoSQL and cloud computing in data engineering
- What is Big Data analytics?
- Key terminology (Data Mart, Data Warehouse, Data Lake, Data Ocean, ETL, Data Model, Schema, Data Pipeline, etc.)
Because Irizpro offers live, instructor-led training, you get direct, real-time access to the instructor in a virtual classroom. With a recorded course, by contrast, you may have to rely on external sources to clear your doubts.
About Irizpro Courses: All our courses are live, virtual classroom-based sessions. The instructor shares their screen live with participants, and sessions are open to questions and discussion. After the course is completed, the session recordings are shared with participants for future reference. Each batch is limited to 15 participants so that instructors and participants can interact meaningfully. All courses are certified by IRIZPRO Learning Solutions.
For further details, contact us at 7506937544 by call or WhatsApp, or email us at: [email protected]
Course Features
- Lectures: 145
- Quizzes: 0
- Duration: 50 hours
- Skill level: All levels
- Language: English
- Certificate: Yes
- Assessments: Yes
Introduction To RDBMS & SQL Server
- What are databases and RDBMSs?
- Creating a data model (schema, metadata, ER diagram) and a database
- Data Integrity Constraints & types of Relationships
- Introduction to SQL Server & SQL
- Working with Tables
- SQL Server Management Studio (SSMS) & using the Object Explorer
- Basic concepts: queries, data types & NULL values, operators, comments in SQL, joins, indexes, functions, views, sorting, filtering, subqueries, summarising, merging, appending, creating new variables, CASE WHEN statements, etc.
- Data manipulation: reading and manipulating single and multiple tables
- Creating database objects with DDL commands (tables, indexes, views, etc.)
- Optimizing your work
- End-to-end data manipulation exercise
Hadoop (Big Data) Ecosystem
- Motivation for Hadoop
- Limitations of existing data analytics architectures and their solutions
- Comparison of traditional data management systems with Big Data
- Evaluating key framework requirements for Big Data analytics
- Hadoop Ecosystem & core components
- The Hadoop Distributed File System – Concept of data storage
- Explain different types of cluster setups (fully distributed, pseudo-distributed, etc.)
- Hadoop Cluster Overview & Architecture
- A typical enterprise cluster: Hadoop cluster modes
- HDFS Overview & Data storage in HDFS
- Loading data into Hadoop from a local machine, and vice versa
- Practice loading and managing data using the command line (Hadoop commands) & Hue
- MapReduce overview (traditional way vs. MapReduce way)
RDBMS & Hadoop Integration Using Sqoop
Data Analysis Using Hive
- Apache Hive: Hive vs. Pig, Hive use cases
- Discuss Hive's data storage principles
- Explain the file formats and record formats supported by Hive
- Perform operations with data in Hive
- HiveQL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
- Hive Script, Hive UDF
- Join datasets using a variety of techniques, including Map-side joins and Sort-Merge-Bucket joins
- Use advanced Hive features like windowing, views and ORC files
- Hive Persistence formats
- Methods for loading data into Hive
- Serialization & Deserialization
- Integrating external BI tools with Hadoop Hive
- Use Hive to compute n-grams on Avro-formatted files
- Use the Hive analytics functions (rank, dense_rank, cume_dist, row_number)
Data Analysis Using Impala
Data Transformation And Analysis Using Pig
- Introduction to Data Analysis Tools
- Apache Pig: MapReduce vs. Pig, Pig use cases
- Pig's data model
- Pig streaming
- Pig Latin Program & Execution
- Pig Latin: Relational Operators, File Loaders, Group Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDFs
- Pig macros
- Parameterization in Pig (Parameter Substitution)
- Use Pig to automate the design and implementation of MapReduce applications
- Use Pig to apply structure to unstructured Big Data
Spark: Introduction
Spark: Spark In Practice
Introduction To Scala For Apache Spark
Resilient Distributed Datasets (RDDs)
Spark SQL: Analysing Structured Data
Spark Streaming (Introduction)
Spark MLlib: Scalable Machine Learning On Spark (Introduction)
Introduction To NoSQL
- Limitations of RDBMS & Motivation for NoSQL
- NoSQL design goals & advantages
- Categories of NoSQL databases, with examples (Cassandra, MongoDB, HBase)
- CAP theorem
- NoSQL query and update languages
- Indexing and searching in NoSQL Databases
- Reducing data with a reduce function
- Clustering and scaling NoSQL databases
- Query behaviors in MongoDB
MongoDB Architecture & Installation
- Overview & Architecture of MongoDB
- In-depth understanding of databases and collections
- Documents, key/value pairs, etc.
- Introduction to JSON and BSON Documents
- Installing MongoDB on Linux
- Using the various tools bundled with the MongoDB package
- Introduction to MongoDB shell
- MongoDB Data types
- CRUD concepts & operations
Data Modeling & Schema Design - MongoDB
MongoDB Drivers (Node.js)
MongoDB - Indexing
MongoDB Administration, Scale & Security
- MongoDB monitoring, health checks, backup & recovery options, performance tuning
- Data Imports & Exports to & from MongoDB
- Introduction to Scalability & Availability
- MongoDB replication, Concepts around sharding, Types of sharding and Managing shards
- Master-slave replication
- Security concepts & Securing MongoDB
MongoDB – Application Development
Introduction To Cloud Computing
- What is cloud computing, and why does it matter?
- Cloud providers (Microsoft Azure, GCP, AWS) & their cloud services (compute, storage, networking, apps, cognitive services, etc.)
- Traditional IT Infrastructure vs. Cloud Infrastructure
- Use Cases of Cloud computing
- Overview of Cloud Segments: IaaS, PaaS, SaaS
- Overview of Cloud Deployment Models
- Overview of Cloud Security
- Introduction to AWS, Microsoft Azure and OpenStack: similarities and differences between these public and private cloud offerings
Big Data Analytics Using Cloud Computing (GCP, Azure Or AWS)
- Creating a virtual machine
- Overview of available Big Data products & analytics services in the cloud
- Storage services
- Database Services
- Compute Services
- Analytics Services
- Machine Learning Services
- Managing the Hadoop ecosystem, Spark and NoSQL with cloud services
- Creating Data pipelines
- Scaling Data Analysis & Machine Learning Models
Case Studies