(ICT3) Students, Welcome!


This is the official webpage hosting the materials for the Data Mining and Knowledge Discovery course at the Jožef Stefan IPS. Here you can find the main materials, past projects, course schedule and more. For any questions, please write to us.

In the 1960s, statisticians and economists used terms like data fishing or data dredging to refer to what they considered the bad practice of analyzing data without an a priori hypothesis. The term "data mining" was used in a similarly critical way by economist Michael Lovell in an article published in the Review of Economic Studies in 1983. Lovell indicates that the practice "masquerades under a variety of aliases, ranging from 'experimentation' (positive) to 'fishing' or 'snooping' (negative)". The term data mining appeared around 1990 in the database community, generally with positive connotations. For a short time in the 1980s, the phrase "database mining"™ was used, but because it had been trademarked by HNC, a San Diego-based company, to pitch their Database Mining Workstation, researchers turned to "data mining" instead. Other terms used include data archaeology, information harvesting, information discovery, and knowledge extraction. Gregory Piatetsky-Shapiro coined the term "knowledge discovery in databases" for the first workshop on the topic (KDD-1989), and this term became more popular in the AI and machine learning communities, whereas the term data mining became more popular in the business and press communities. Currently, the terms data mining and knowledge discovery are used interchangeably.


Schedule

Below you can find the current schedule. Please attend all lessons and labs if possible. (to be added)
Date        Time           Room  Professor
09.11.2021  15:00 - 17:00  ZOOM  Lavrač Nada
16.11.2021  15:00 - 18:00  ZOOM  Lavrač Nada, Osojnik Aljaž
23.11.2021  15:00 - 18:00  ZOOM  Lavrač Nada, Osojnik Aljaž
30.11.2021  15:00 - 18:30  ZOOM  Ženko Bernard, Osojnik Aljaž
07.12.2021  15:00 - 19:00  ZOOM  Ženko Bernard, Osojnik Aljaž
21.12.2021  15:00 - 19:00  ZOOM  Ženko Bernard, Osojnik Aljaž
04.01.2022  15:00 - 17:00  ZOOM  Žnidaršič Martin
18.01.2022  15:00 - 18:30  ZOOM  Žnidaršič Martin, Osojnik Aljaž

Course Materials

Here you can find the relevant course materials. They will be added as the year progresses.

Nada Lavrač

The collection of lecture notes.

Bernard Ženko

The collection of lecture notes.

Martin Žnidaršič

The collection of lecture notes.

Aljaž Osojnik (practicals)

The collection of practical notes, partially adapted from materials by Prof. Petra Kralj Novak.

Course Requirements

Students must fulfil all of the requirements stated below in order to pass.

Main requirements

  • Attending lectures and hands-on exercises
  • Written exam (40%)
  • Data mining seminar on advanced data mining topics (60%)
    • Analysis of your own data in Orange or with other data mining tools
    • Half-page seminar proposal, due on the day of the written exam
    • A 4-page written report (printed and electronic copy) in the Information Society paper format, due on the day of the seminar presentations (use the paper template and guidelines)
    • Oral presentation of seminar results (10 minutes for the presentation + 5 minutes for discussion, use the slides template)
    • Deadlines: topic definition by 4. 1. 2022, submission by 22. 2. 2022, presentation on 1. 3. 2022

Useful links


Literature:

  1. Fawcett, Tom. "An introduction to ROC analysis." Pattern recognition letters 27.8 (2006): 861-874
  2. Bramer, Principles of Data Mining
  3. Optional (advanced): Friedman, J., Hastie, T., & Tibshirani, R. (2001). The Elements of Statistical Learning, second edition. New York: Springer Series in Statistics, pages 9-18.
  4. Demsar J, Curk T, Erjavec A, Gorup C, Hocevar T, Milutinovic M, Mozina M, Polajnar M, Toplak M, Staric A, Stajdohar M, Umek L, Zagar L, Zbontar J, Zitnik M, Zupan B (2013) Orange: Data Mining Toolbox in Python, JMLR 14(Aug): pages 2349−2353.
  5. Pedregosa et al. (2011) Scikit-learn: Machine Learning in Python, JMLR 12, pp. 2825-2830.
  6. Chollet, F. et al. (2015) "Keras"
(2021) Created by Blaž Škrlj and Aljaž Osojnik.