
Experimental projects - A.Y. 2023/2024

Students can submit their work by the following fixed deadlines:

Within approximately two to three weeks of each deadline, there will be an oral examination in which the project is discussed in detail with the TAs. We stress that group projects are not allowed: students must complete their projects individually.

Read the following instructions carefully. Besides complying with the project's specifications, it is extremely important that students follow a sound methodology both in the data preprocessing phase and when running the experiments. In particular, no data manipulation should depend on test set information. Moreover, hyperparameter tuning should focus on regions of values where performance trade-offs are explicit. Any implementation must use Python 3 (any other choice must be agreed upon in advance with the teaching assistants).

Project 1: Kernelized Linear Classification

Download this dataset. The goal is to learn how to classify the labels based on the numerical features according to the 0-1 loss, which is the metric you should adopt when evaluating the trained models. Explore the dataset and perform the appropriate preprocessing steps. Please be mindful of data leakage between the training and test sets.

Implement from scratch (without using libraries such as Scikit-learn) the following machine learning algorithms:

  1. The Perceptron
  2. Support Vector Machines (SVMs) using the Pegasos algorithm
  3. Regularized logistic classification (i.e., the Pegasos objective function with logistic loss instead of hinge loss)
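As a starting point, the Pegasos update for the SVM objective can be sketched as follows. This is a minimal from-scratch illustration, not the required interface: the function name, the absence of a bias term, and the omission of the paper's optional projection step are all choices of this sketch.

```python
import numpy as np

def pegasos_svm(X, y, lam=0.01, T=1000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the SVM objective.

    X: (n, d) array of features; y: (n,) array of labels in {-1, +1}.
    Returns the weight vector after T iterations (no bias term here).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)          # pick one example uniformly at random
        eta = 1.0 / (lam * t)        # step size 1 / (lambda * t)
        if y[i] * X[i].dot(w) < 1:   # margin violated: hinge loss is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                        # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w
```

Replacing the hinge-loss branch with the gradient of the logistic loss yields the regularized logistic classifier of point 3.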

Test the performance of these models. Next, attempt to improve their performance by using polynomial feature expansion of degree 2. Report and compare the linear weights corresponding to the various numerical features found after the training phase.
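The degree-2 expansion can be implemented without external libraries, for example by enumerating all monomials up to degree 2. This is one possible sketch; the ordering of the monomials and the inclusion of a bias column are choices of this example.

```python
import itertools
import numpy as np

def poly2_features(X):
    """Degree-2 polynomial feature expansion.

    Maps each row x = (x1, ..., xd) to
    (1, x1, ..., xd, x1*x1, x1*x2, ..., xd*xd): all monomials of degree <= 2.
    """
    n, d = X.shape
    cols = [np.ones(n)]                      # bias term
    cols += [X[:, j] for j in range(d)]      # linear terms
    for j, k in itertools.combinations_with_replacement(range(d), 2):
        cols.append(X[:, j] * X[:, k])       # quadratic terms
    return np.column_stack(cols)
```

For d original features this produces 1 + d + d(d+1)/2 columns, so the linear models above can be reused unchanged on the expanded matrix.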

Then, try using kernel methods. Specifically, implement from scratch (again, without using libraries such as Scikit-learn):

  1. The kernelized Perceptron with the Gaussian and the polynomial kernels
  2. The kernelized Pegasos with the Gaussian and the polynomial kernels for SVM (refer to the pseudo-code of the kernelized Pegasos paper, Figure 3; note that there is a typo in the pseudo-code: identify and correct it)
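The two kernels and the kernelized Perceptron can be sketched as below, assuming labels in {-1, +1}. The kernel hyperparameters (gamma, degree, c) and the mistake-count representation of the dual coefficients are illustrative choices; the kernelized Pegasos is intentionally left out, since correcting its pseudo-code is part of the assignment.

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x1_i - x2_j||^2)."""
    sq = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
          - 2 * X1 @ X2.T)
    return np.exp(-gamma * sq)

def polynomial_kernel(X1, X2, degree=2, c=1.0):
    """K[i, j] = (x1_i . x2_j + c)^degree."""
    return (X1 @ X2.T + c) ** degree

def kernel_perceptron(X, y, kernel, epochs=10):
    """Kernelized Perceptron: alpha[i] counts the mistakes on example i.

    The implicit predictor is sign(sum_i alpha_i * y_i * K(x_i, x)).
    """
    n = X.shape[0]
    K = kernel(X, X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:  # mistake: update
                alpha[i] += 1
    return alpha

def kp_predict(alpha, X_train, y_train, X_test, kernel):
    """Predict labels for X_test from the dual coefficients."""
    K = kernel(X_train, X_test)
    return np.sign((alpha * y_train) @ K)
```

With the Gaussian kernel this predictor handles problems that are not linearly separable in the original feature space, such as XOR-like data.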

Evaluate the performance of these models as well.

Remember that relevant hyperparameter tuning is a crucial part of the project and must be performed using a sound procedure.
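One sound tuning procedure is k-fold cross-validation on the training data only. The sketch below is generic over any model; the callback-based interface (`train_fn`, `predict_fn`) is an assumption of this example, not a required design.

```python
import numpy as np

def kfold_cv_error(X, y, train_fn, predict_fn, k=5, seed=0):
    """Estimate the 0-1 risk of a learning algorithm by k-fold cross-validation.

    train_fn(X_tr, y_tr) -> model;  predict_fn(model, X_te) -> predicted labels.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))            # shuffle before splitting
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train_idx], y[train_idx])
        y_hat = predict_fn(model, X[test_idx])
        errors.append(np.mean(y_hat != y[test_idx]))
    return float(np.mean(errors))
```

Hyperparameter selection then amounts to calling this routine once per candidate configuration and keeping the one with the lowest estimated risk.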

Ensure that the code you provide is polished, working, and, importantly, well-documented.

Write a report discussing your findings, with particular attention to the adopted methodology, and provide a thorough discussion of the models’ performance and their theoretical interpretation. Include comments on the presence of overfitting or underfitting and discuss the computational costs.

Project 2: Tree predictors for binary classification

Download the Mushroom dataset. The main task of this project is the implementation from scratch of tree predictors for binary classification to determine whether mushrooms are poisonous. The tree predictors must use single-feature binary tests as the decision criteria at any internal node (as seen in the lectures). More precisely, consider thresholds on a single feature in the case of a numerical/ordinal feature or membership tests in the case of a categorical feature. We suggest the following guidelines to aid your work on this project.

First, implement a basic class/structure for the nodes of the tree predictors. It should possess the following attributes/procedures:

Second, implement a class/structure for the (binary) tree predictor. It should contain the following attributes/procedures:

Feel free to add extra attributes/procedures if deemed necessary for the task.
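A minimal skeleton for the two classes/structures could look like the following. All attribute and method names here are illustrative, since the required lists of attributes/procedures are given above; extend or rename freely.

```python
class Node:
    """A node of a binary tree predictor (illustrative skeleton).

    Internal nodes store a single-feature binary test; leaves store a label.
    """
    def __init__(self, feature=None, test=None, left=None, right=None, label=None):
        self.feature = feature    # index of the feature tested at this node
        self.test = test          # threshold (numerical/ordinal) or value set (categorical)
        self.left = left          # child followed when the test passes
        self.right = right        # child followed when the test fails
        self.label = label        # predicted class (meaningful at leaves)

    def is_leaf(self):
        return self.left is None and self.right is None


class TreePredictor:
    """Binary tree predictor: routes each example from the root to a leaf."""
    def __init__(self, root):
        self.root = root

    def predict_one(self, x):
        node = self.root
        while not node.is_leaf():
            if isinstance(node.test, (int, float)):
                passes = x[node.feature] <= node.test   # threshold test
            else:
                passes = x[node.feature] in node.test   # membership test
            node = node.left if passes else node.right
        return node.label
```

Training (choosing the tests and growing the tree) is deliberately omitted here, since that is where the expansion and stopping criteria of the assignment come in.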

Train the tree predictors adopting at least 3 reasonable criteria for the expansion of the leaves, and at least 2 reasonable stopping criteria. Compute the training error of each tree predictor according to the 0-1 loss.
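Common leaf-expansion criteria score the label distribution at a leaf; three standard choices, plus the 0-1 training error, can be sketched as functions of the fraction p of positive examples at a leaf. The function names are illustrative.

```python
import math

def gini(p):
    """Gini impurity of a leaf with a fraction p of positive examples."""
    return 2 * p * (1 - p)

def entropy(p):
    """Entropy (in bits) of the label distribution at the leaf."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def misclassification(p):
    """Training error of the majority-vote label at the leaf."""
    return min(p, 1 - p)

def zero_one_error(y_true, y_pred):
    """Average 0-1 loss of the predictions."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)
```

A split is then chosen to maximize the decrease of the chosen criterion, weighted by the number of examples routed to each child.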

Perform hyperparameter tuning according to the splitting criteria and the stopping criteria adopted (e.g., tune the threshold on the maximum size of the tree) for at least one of the tree predictors. Keep in mind that hyperparameter tuning is an important part of the project and must be performed using a sound procedure.

Write a report discussing your findings, with particular attention to the adopted methodology, and provide a thorough discussion of the models' performance (with comments on the possible presence of over- or underfitting). In the case of overfitting, some ways to tackle it would be pruning the tree predictors (see the references below) or an appropriate stopping criterion.

We suggest the following resources, in addition to the lecture notes of the course, as further reading for the interested student:

Optional: Implement random forest by reusing the already implemented tree predictor class/structure.
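The optional random forest reduces to bagging over the existing tree predictor. The sketch below assumes a caller-supplied `fit_tree(X, y)` returning a callable that predicts labels in {-1, +1}; that interface is an assumption of this example.

```python
import numpy as np

def random_forest_predict(X_train, y_train, X_test, fit_tree, n_trees=25, seed=0):
    """Bagging sketch: train each tree on a bootstrap sample, majority-vote.

    fit_tree(X, y) -> callable mapping an array of examples to {-1, +1} labels.
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    votes = np.zeros(len(X_test))
    for _ in range(n_trees):
        idx = rng.integers(n, size=n)              # bootstrap sample with replacement
        tree = fit_tree(X_train[idx], y_train[idx])
        votes += tree(X_test)                      # accumulate signed votes
    return np.sign(votes)
```

A full random forest would additionally restrict each split to a random subset of features inside `fit_tree`; that part is omitted here.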


Project 3: Comparative analysis of manual vs. automatic feature extraction in ultrasound musculoskeletal segmentation masks to identify distension

Description: Ultrasound imaging is a critical tool in diagnosing musculoskeletal disorders. In this project, we want to evaluate the relationship between the segmented area of the knee recess and the presence of liquid (distension).

Dataset: We have a dataset of ~700 binary masks of the knee recess, extracted from ultrasound images and manually annotated by expert physicians. The masks represent the knee joint recess, and each is also labeled as "distended" (i.e., enlarged) or "not distended". For a brief introduction to the medical problem from a computer science perspective, please refer to [1].

Objective: Classify each mask as “distended” or “not-distended”. Classification performance should be evaluated in the following two cases:

  1. when features are manually extracted from the mask (e.g., area of the recess, longest segment, etc.);
  2. when features are automatically extracted with deep automatic feature extraction.
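For case 1, hand-crafted features can be computed directly from the binary mask. The sketch below extracts the area and, as a rough stand-in for the longest segment, the longest horizontal run of mask pixels; both the feature set and the function name are illustrative assumptions.

```python
import numpy as np

def mask_features(mask):
    """Hand-crafted features from a binary segmentation mask (illustrative).

    mask: 2D array of 0/1 values. Returns the area (pixel count) and the
    longest horizontal run of mask pixels as a crude proxy for the longest
    segment; a full solution would measure segments in any orientation.
    """
    area = int(mask.sum())
    longest = 0
    for row in mask:
        run = 0
        for v in row:
            run = run + 1 if v else 0
            longest = max(longest, run)
    return {"area": area, "longest_horizontal_run": longest}
```

These per-mask feature vectors can then be fed to any standard classifier for the "distended" / "not distended" decision.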

Extra task: Use explainable AI to better understand which features the models consider relevant for the classification.

Notice: Before starting to work on project 3 (or if you are interested in these or other ML challenges in the field of medicine), make sure to get in touch with Marco Colussi or Prof. Sergio Mascetti for further instructions or clarifications. They will be responsible for the evaluation of your project.

References:

[1]: Colussi, Marco, et al. "Ultrasound detection of subquadricipital recess distension." Intelligent Systems with Applications 17 (2023): 200183.

Project 4: Advanced techniques in medical image segmentation for musculoskeletal ultrasound

Description: Accurate segmentation of musculoskeletal structures in ultrasound images is essential for diagnosis and treatment planning. Advanced deep learning techniques, particularly convolutional neural networks (CNNs), diffusion, and foundation models have shown great promise in medical image segmentation. We are addressing the problem of automatically segmenting the joint recess.

Dataset: We will provide the student with a set of 687 ultrasound images of both distended and non-distended recesses, each with the annotated area of the recess.

Objective: Develop and compare state-of-the-art deep learning models for the segmentation of medical images.
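Segmentation models are commonly compared with overlap metrics such as the Dice coefficient; a minimal sketch for binary masks is shown below (the epsilon term to avoid division by zero is a choice of this example).

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    inter = np.sum(pred * target)
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

The same quantity, written in a differentiable form over soft predictions, is also widely used as a training loss for segmentation networks.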

Extra task: Introduce a constrained loss [2].

Notice: Before starting to work on project 4 (or if you are interested in these or other ML challenges in the field of medicine), make sure to get in touch with Marco Colussi or Prof. Sergio Mascetti for further instructions or clarifications. They will be responsible for the evaluation of your project.

References:

[2]: Wang, Ping, et al. "CAT: Constrained adversarial training for anatomically-plausible semi-supervised segmentation." IEEE Transactions on Medical Imaging (2023).


Theory projects

Students who want to work on a theory project must write an email to the instructor indicating a topic (typically chosen among those covered in class) they would like to focus their project on. The instructor will then suggest one or two papers in that area.

Keep in mind that theory projects are specifically intended for students who have a good disposition toward mathematics. Do not choose a theory project only because you are not good at coding.

Here is an example of a good report for a theory project.