Course Description
This course illustrates the common technology stack found in most advanced database management frameworks. It covers advanced topics across that stack, such as databases (NoSQL databases), storage (HDFS, S3 on AWS, etc.), and data access applications and technologies (Hive, Shark, MLlib, etc.). The course relies heavily on research papers: students read the papers and present them in class.
Reading Assignments
Research papers are available to students as reading assignments. Each student should prepare a review document for each assigned research paper. The review should follow the review template available in the course attachments.
- Google File System (2003) [PDF]
- HDFS (2010) [PDF]
- MapReduce (2004) [PDF]
- Google Spanner (2012) [PDF]
- Hive (2010) [PDF]
- Shark (2013) [PDF]
- BigTable (2006) [PDF]
- NoSQL DBs [PDF]
Project
Students are required to deliver a project. The project is a group project on which at most 3 students can collaborate. It can be one of the following:
- A data analysis project, in which students identify a question and a data set and answer the question by analyzing the data set.
- A research-area topic that involves data management or data analysis.
- A competition data set and problem, such as those on [KAGGLE] or in conference competitions such as [COMAD 2016].
Examples of Previous Projects
IBM Bluemix Accounts
- You can use an IBM Bluemix cloud account. Ask for the promo code.
- You can use IBM Watson for data analytics. Ask for the promo code.
Grading Scheme
- Reading Assignments: 21% (3% for each paper review)
- Paper Presentation: 9%
- Project Presentation: 20%
- Project Paper and Implementation: 20%
- Final: 30% (in addition, the Project and Presentation grade will be given as the rest of the 30% of the legal grades of the final exam)