CS 533 -- Fall 2008

I:  XML Data Management and Querying

II.  Data Mining Techniques 

(MWF 2:00-2:50 p.m. at Lindegren Lab).

Course Web Site: www.cs.siu.edu/~dche/courses/CS533

 

INSTRUCTOR

Dr. (Daren) Dunren Che (Associate Professor)

Office: Faner 2128
Office Hours: 10am-12pm MWF
Phone: 453-6046 (email contact preferred!)
Email: dche@cs.siu.edu

PREREQUISITES

CS 430 with a grade of C or better, and strong enthusiasm for doing research in the area of data management and data mining. This course demands dedication and hard work.

COURSE DESCRIPTION

The main theme of this course is the concepts and techniques of data mining. A major additional ingredient, included as an integral part of the course, is a comprehensive introduction to state-of-the-art XML data management technology.

The XML data management topic is added because of the increasing importance of XML to the entire discipline of computer science. XML data is now proliferating, and XML-related technologies are spreading everywhere. Consequently, XML data repositories will be a vast future realm for data mining technology to explore.

Data mining, or knowledge discovery from data repositories, has during the last few years emerged as one of the most exciting fields in computer science. Data mining aims at finding useful regularities in large data sets. Interest in the field is motivated by the growth of computerized data collections which are routinely kept by many organizations and commercial enterprises, and by the high potential value of patterns discovered in those collections. For instance, bar code readers at supermarkets produce extensive amounts of data about purchases. An analysis of this data can reveal previously unknown, yet useful information about the shopping behavior of the customers.

Data mining refers to a set of techniques that have been designed to efficiently find interesting pieces of information or knowledge in large amounts of data. Association rules, for instance, are a class of patterns that tell which products tend to be purchased together. There is currently a large commercial interest in the area, both for the development of data mining software and for the offering of consulting services on data mining.
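As a rough illustration of the association rule idea above, the following minimal sketch computes the two usual measures, support and confidence, over a handful of made-up market baskets. The basket contents, the function names, and the threshold values are all assumptions chosen for illustration; real miners such as Apriori avoid enumerating every candidate the way this brute-force version does.

```python
from itertools import combinations

# Hypothetical market baskets (illustrative data only, not from a real store).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Brute-force search for single-item rules lhs -> rhs above both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, sup, conf in pair_rules(baskets):
        print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

In this toy data every basket that contains beer also contains diapers, so the rule beer -> diapers has the highest possible confidence, while the reverse rule diapers -> beer is weaker because one basket has diapers without beer.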

In this course we explore how this interdisciplinary field brings together techniques from databases, statistics, machine learning, and information retrieval. We will discuss the main data mining methods currently used, including data cleaning, clustering and classification techniques, algorithms for association rule mining, text indexing and searching algorithms, how search engines rank pages, and recent techniques for web mining. Designing algorithms for these tasks is difficult because the input data sets are very large, and the tasks may be very complex.

XML DB Topics to be Covered:

Data Mining Topics Covered:

TEXT AND REFERENCES

1. Text:

    Data Mining: Introductory and Advanced Topics, 1/e
    Margaret H. Dunham
    ©2003 | Prentice Hall | Paper; 315 pp
    ISBN-10: 0130888923 | ISBN-13: 9780130888921
 

2. Reference:

    Data Mining: Concepts and Techniques, 2nd ed.

    Jiawei Han and Micheline Kamber

    The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor
    Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6

REQUIREMENTS AND GRADING POLICY

A final letter grade will be assigned to each student based on two components, each worth 50%: one written exam (covering the data mining topics only) and one research report/paper (for CS majors only) on anything related to databases, data mining, and/or XML. For non-CS majors, an extra written exam can be used as a substitute for the research component.

This course encourages and promotes student research, and is expected to lead interested students to a suitable thesis topic (pre-thesis study!). Students are required to choose a topic related to data mining, XML data management, or both, read at least 10 selected research papers, and submit a research report/paper at the end, which counts for 50% of the course grade. A 25-minute time slot will be scheduled for each student to present his/her research to the whole class (students will be involved in grading their classmates' presentations).

The instructor advocates working in pairs to produce quality research results. This means that two students may work together on the same research topic and coauthor the same research report/paper, but they need to clearly indicate (usually at the end of their paper) the specific contribution of each author. Research reports/papers will be graded using the common criteria for reviewing academic journal/conference papers, e.g., readability and clarity of presentation, adequacy of literature review, adequacy of analysis of issues, and originality of the proposed approach/ideas/methods.

The final letter grade will be assigned based on the following "standard" scale:

A ---- 90 and above

B ---- 80 to 89

C ---- 70 to 79

D ---- 60 to 69

F ---- Less than 60

Important: Start your research EARLY, because there is never too much time for producing a good research result.

A good attendance record may quietly earn you extra credit!

A ROUGH SCHEDULE of Lectures

Week 1 to Week 4: XML data management topics

Week 5 to Week 12: Data mining topics

Week 13 to Week 15: Student presentations and final exam (the Friday before finals week)

Emergency Response Guide

See the attachment (next page)