    Rosanna CAMPAGNA

    Course: NUMERICAL METHODS FOR DATA ANALYSIS

    Degree programme: DATA ANALYTICS

    SSD (scientific disciplinary sector): MAT/08

    CFU (ECTS credits): 6

    Hours per teaching unit: 56

    Teaching period: Second semester

    Teaching language

    English

    Contents

    Design and implementation of basic numerical methods for
    - Linear Least Squares;
    - Principal Component Analysis;
    - Clustering (k-means and k-medoids);
    - Linear Discriminant Analysis;
    - Non-negative Matrix Factorization.

    Application of the above methods to simple classification, text mining and page ranking problems.

    Textbook and course materials

    L. Eldén, Matrix Methods in Data Mining and Pattern Recognition, SIAM, 2007.

    A draft of the book "Data Mining: An Algorithmic Approach to Clustering and Classification", by D. Calvetti and E. Somersalo, should be available by February 2020.

    Course objectives

    Knowledge and understanding: students are expected to acquire basic knowledge of numerical methods and software for data analysis.

    Applying knowledge and understanding: students should be able to select and properly apply basic numerical methods and software tools for data analysis.

    Communication skills: students should be able to illustrate the methods and tools learnt during the course and to communicate the results obtained with them, using a suitable basic technical and scientific language.

    Prerequisites

    Students are not required to pass the exams of other courses before taking this one, but knowledge of the contents of the Linear Algebra course and of the basics of the Analysis course is recommended.

    Teaching methods

    The course consists of lectures (32 hours, 4 CFU - ECTS credits) and laboratory sessions (24 hours, 2 CFU - ECTS credits).

    Course attendance is not mandatory, but it is strongly recommended.

    Evaluation methods

    Students are evaluated through an oral assessment, which verifies whether they have met the course objectives. During the assessment, students are also asked to give a computer-based illustration of the methods and tools studied in the course. For this purpose, they may use computer programs they developed themselves or programs made available by the instructor during the course. Use of other course material is not allowed.

    Marks are expressed in thirtieths. The minimum passing mark is 18/30. Outstanding performance is marked 30/30 cum laude.

    To be admitted to the evaluation, students must show a valid ID card.

    Course Syllabus

    - Linear Algebra (review of basics and introduction of new concepts): matrices, vectors, norms, fundamental subspaces, projections, orthogonal matrices, eigenvalues and eigenvectors, singular value decomposition (SVD).

    - Linear Least Squares (LLS) fitting: mathematical formulation, numerical algorithms for LLS, application to regression problems.

    - Principal Component Analysis (PCA): mathematical formulation, removal of redundancies in data, model reduction and visualization of high-dimensional data, data centering, computation of PCA via SVD, examples of application.

    - Clustering: mathematical formulation of the clustering problem, k-means algorithm, k-medoids algorithm, applications to simple problems and comparisons.

    - Linear Discriminant Analysis (LDA): scatter matrices and spread, optimization of the spread among clusters, LDA algorithm, simple examples of application.

    - Non-negative Matrix Factorization (NMF): mathematical formulation of the rank-k NMF problem, Alternating Non-negative Least Squares algorithm, application: identifying communities in networks.

    - Classification: basic concepts, distance classifier, dissimilarity measures, k-nearest neighbor classifier, PCA classifier, LDA classifier, Learning Vector Quantization, example: handwritten digits classification.

    - Text mining: basic concepts of text data, query matching, Latent Semantic Indexing, application of NMF, examples of use.

    - Page ranking: page ranking and random surfing, random walks on graphs, eigenvalue analysis, page ranking algorithm, application to ranking of web pages.

    Laboratory activities are held for all of the topics above. They consist of implementing the numerical algorithms discussed in the lectures, or using available software, and applying the algorithms to simple classification, text mining and page ranking problems. All activities are performed in the MATLAB software environment.
