    Fabrizio MATURO

    Course: STATISTICAL LEARNING

    Degree programme: DATA ANALYTICS

    Scientific disciplinary sector (SSD): SECS-S/01

    Credits (CFU): 2.00

    Hours per teaching unit: 24.00

    Teaching period: Second Semester

    Teaching language

    English

    Contents

    1) Introduction to Modern Statistical Learning Approaches
    2) Unsupervised Classification
    3) Supervised Classification
    4) Semi-Supervised Classification
    5) Introduction to Social network analysis
    6) Introduction to Fuzzy Set Theory

    Textbook and course materials

    - Material provided during the lessons
    - JAMES, WITTEN, HASTIE, TIBSHIRANI. An introduction to statistical learning with applications in R. Springer.
    - HASTIE, TIBSHIRANI AND FRIEDMAN. The elements of statistical learning: data mining, inference and prediction. Springer-Verlag.

    Course objectives

    Knowledge and understanding.
    The course introduces the methodological foundations of Statistical Learning (preliminary concepts).
    Applied knowledge and understanding.
    The course develops knowledge and understanding of how the main Statistical Learning techniques are applied, through exercises, laboratory activities and the use of specialist software.
    Making judgements.
    The course aims to enable students to:
    - form their own evaluations and judgements based on the concepts learned and on classroom discussion with the teacher and the other students;
    - identify and collect further information on the subject from additional books and teaching materials;
    - improve their practical and decision-making skills by considering the various aspects of a problem, especially the applied ones;
    - extract knowledge from databases using Statistical Learning methodologies and techniques with specialist software (R and Python).


    Communication skills.
    The course aims to develop students' ability to communicate the data analysis methods they have learned and the results of practical exercises.

    Learning skills.
    The course aims to provide the student with:
    - the learning skills needed to understand and use Statistical Learning techniques for data processing;
    - the ability to draw on different bibliographical sources, in English, in order to acquire new skills in this field.

    Prerequisites

    Basic knowledge of mathematics, descriptive and inferential statistics.

    Teaching methods

    Teaching consists of classroom lectures, divided into theoretical lessons and practical sessions using the R software.

    Evaluation methods

    Students' learning will be assessed through a computer-based test followed by an oral discussion.
    The computer-based test consists of exercises on the methods illustrated during the course and may include some theory questions.
    The duration of the test will depend on the difficulty of the questions and will be communicated during the course.
    The main objective of the practical test is to assess "knowledge" and "know-how". The oral exam, by contrast, is aimed at assessing communication skills, mastery of the discipline's technical language, clarity of exposition and the ability to interpret.
    The exam methods are the same for attending and non-attending students.

    Course Syllabus

    1) Introduction to Modern Statistical Learning Approaches

    2) Unsupervised Classification: Clustering
    Dissimilarity measures
    Hierarchical clustering
    - Agglomerative hierarchical clustering
    - Divisive hierarchical clustering
    Selecting the number of groups
    Pre-processing operations for clustering
    Non-hierarchical clustering
    - K-Means
    - PAM
    - CLARA
    Cluster validity analysis
    Introduction to Fuzzy set theory
    Fuzzy clustering
    Biclustering

    3) Supervised Classification
    k-Nearest-Neighbours
    Misclassification Error
    Resampling methods: Cross-validation, bootstrap
    Linear Regression Model
    Linear Discriminant Analysis
    Logistic Regression
    ROC Curve, Sensitivity, Specificity
    Naïve Bayes Classifier
    Regression & classification trees
    Bagging, boosting, random forests
    Support vector machines
    Feature selection

    4) Semi-Supervised Classification
    Self-Training
    Cluster-then-label

    5) Splines & smoothing splines

    6) Introduction to Social network analysis
