Course Description |
Big data analysis is one of the most in-demand job skills today, but it is also a hard problem. This course teaches how to query RDBMS and big data ecosystem products, design modern data warehouses, and manage massively parallel processing (MPP) data warehouse technologies on cloud platforms. We start by asking a few questions: What problems does the data world face? Which technologies address them? Why does each technology exist, and why do I need it? How can I get the best out of it using something familiar like SQL? How can I design and query an RDBMS, Hadoop ecosystem products such as Pig Latin, Hive, and Spark, and MPP products such as Azure SQL DW, AWS Redshift, Azure Stream Analytics, and Data Lake Analytics? |
|
Program Outcomes and Competences |
Level |
Assessed by |
1) |
Building on the skills acquired during the undergraduate degree, an improved and deepened level of expertise in the field of big data analytics related to machine learning. |
H |
|
2) |
Applied in-depth theoretical and practical knowledge in the fields of statistics, computation and computer science related to machine learning. |
H |
|
3) |
Extensive knowledge about the analysis and modeling methods used in machine learning and their limitations. |
S |
|
4) |
Ability to design and perform exploratory research based on analytics, modeling and experimentation; to generate solutions to complex situations encountered in this process and to interpret the results. |
S |
|
5) |
Ability to describe the analytics process and its results both verbally and in writing on national and international platforms within or outside of the field of machine learning. |
H |
|
6) |
Awareness of social, scientific and ethical values regarding the machine learning, processing, usage, interpretation and dissemination stages and in all related professional activities. |
H |
|
7) |
Professional awareness of new and emerging applications in the machine learning field and an ability to demonstrate their uses. |
H |
|
8) |
Competence to act as a leader in multi-disciplinary teams, to develop big data-driven solutions to complex situations; to take responsibility. |
N |
|
9) |
Ability to communicate in English both verbally and in writing at European Language Portfolio General Level B2. |
S |
|
10) |
Understanding of social and environmental aspects of machine learning applications. |
N |
|
Week |
Subject |
1) |
1. History of Data
1.1 Definition of OLTP Systems
1.2 Why we need data warehouse systems
1.3 Real world problems (volume, velocity, variety, variability)
2. Vendors of Big Data
2.1 Data Transformation Products
2.2 Data Visualization Products
2.3 Data Analytics Products
2.4 Cloud oriented MPP Systems
2.5 Volume Problem Oriented Vendors (Cloud Products & Open Source)
2.6 Velocity Problem Oriented Vendors (Cloud Products & Open Source)
2.7 Variety Problem Oriented Vendors (Cloud Products & Open Source)
2.8 Variability Problem Oriented Vendors (Cloud Products & Open Source)
2.9 Data Mining Vendors (Cloud Products & Open Source)
2.10 Automated Reporting Products
|
2) |
2. Data Warehousing and Business Intelligence Insights
2.1 Designing and implementing a data warehouse
2.2 Developing Data Access and Transformation Layer (ETL)
2.3 Reporting Layer
2.4 Analytics Layer
2.5 Building Data Quality Solutions
2.6 Scenarios & Solutions for Data Warehouses
|
3) |
3. Designing & Querying OLTP Systems
3.1 Database Design
3.2 Learning SQL Queries (CRUD Operations)
3.3 Logical Query Processing
3.4 Programmable SQL Objects
3.5 Aggregates and Analysis
3.6 Query Optimization and Understanding In-Memory Tables
3.7 T-SQL for Business Intelligence Practitioners
|
4) |
4. Designing ETL Layer
4.1 Introduction to Integration Services
4.2 Learning the Control Flow and Data Flow Task Items
4.3 Using Variables, Parameters and Expressions
4.4 Error and Event Handling
4.5 Data Cleansing Demos with Real World Dirty Data
4.6 ETL Package Monitoring and Optimization
4.7 Special Design Scenarios
|
5) |
5. SQL on Hadoop
5.1 Introduction to the Hadoop Ecosystem and Azure HDInsight
5.2 Hive Architecture and Principles
5.3 Data Definition, Description and Selection using Hive Query Language
5.4 Advanced Data Analysis using Hive
|
6) |
3. Real Time Analytics
3.1 Definition of Real Time Analytics
3.2 Ingesting Data into Event Hubs
3.3 Benefits and Use Cases of Azure Event Hubs & Stream Analytics
3.4 Querying with Azure Stream Analytics
3.5 Case Study: Real Time Social Media Analytics
3.6 Data Visualization for Streaming Data in Power BI
|
7) |
4. Working with Unstructured Data
4.1 Understanding the rationale of Pig
4.2 Writing Evaluation and Filter Functions
4.3 Developing and Testing Pig Latin Scripts
4.4 Real World Scenarios: Analyzing TV Series
|
8) |
5. Massively Parallel Processing Products
5.1 Introduction to MPP Systems
5.2 Amazon Data Warehouse and Amazon Redshift Integration Projects
5.3 Azure Data Warehouse Overview
|
9) |
5. Massively Parallel Processing Products
5.4 Designing & Querying Data on MPP Systems
5.5 Scalability & elasticity for Amazon Redshift and Azure SQL Data Warehouse
|
10) |
6. Data Visualization
6.1 Creating Visualization and Dashboard Architecture
6.2 Visual Analytics with Microsoft Power BI
6.3 In-Memory Analytics using QlikView
|
11) |
6. Advanced Analytics
6.4 Getting started with Azure Machine Learning
6.5 Using Azure ML Studio
|
12) |
6. Advanced Analytics
6.6 Getting Data in and out of ML Studio
6.7 AWS Machine Learning vs. Azure Machine Learning
6.8 Advanced Analytics on SQL Server 2016 using R Script
|
13) |
8. Big Data Lake Analytics
8.1 The Need for Data Lake
8.2 ADLA complements Hadoop systems
|
14) |
8. Big Data Lake Analytics
8.3 Using C# with U-SQL
|
15) |
Final Examination Period |
16) |
Final Examination Period |