The practical work below is divided into sessions. For each session, read the material, do the exercises where there are any, and ask me questions about anything you do not understand. At the end of each session, send your work to: cguyeux@femto-st.fr.
At the end of the course, I will randomly draw two practical assignments per student to give a final grade for the practical part. The last session will include a final written test covering all the sessions.
First session: introduction to Python
Second session: The Pandas library
Third session: Clustering (Unsupervised learning)
Fourth session: Supervised learning
General presentation
Supervised learning preparation
Project
Take your clustering code from the previous session and add a dimensionality reduction step upstream. Try several methods, and try to improve your clustering results as measured by the metrics proposed previously.
Fifth session
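As a starting point for this project, here is a minimal sketch of a reduction step placed before clustering. It assumes scikit-learn is available and uses choices that are not prescribed by the course: synthetic data from make_blobs as a stand-in for your own dataset, PCA as the reduction method, KMeans as the clustering algorithm, and the silhouette score as the metric. Replace each of these with what you used in the previous session.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for your own data (assumption): 300 samples, 20 features.
X, _ = make_blobs(n_samples=300, n_features=20, centers=3, random_state=0)

# Standardize so that no single feature dominates the distances.
X_scaled = StandardScaler().fit_transform(X)


def cluster_and_score(features, n_clusters=3):
    """Run KMeans on `features` and return (labels, silhouette score)."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(features)
    return labels, silhouette_score(features, labels)


# Baseline: clustering directly on the standardized features.
labels_raw, score_raw = cluster_and_score(X_scaled)

# Same clustering, with a PCA reduction to 2 components upstream.
X_reduced = PCA(n_components=2, random_state=0).fit_transform(X_scaled)
labels_pca, score_pca = cluster_and_score(X_reduced)

print(f"silhouette without reduction: {score_raw:.3f}")
print(f"silhouette with PCA (2 components): {score_pca:.3f}")
```

Note that the two silhouette scores are computed in different feature spaces, so they are only indicative. For a fairer comparison, you may prefer to score both label sets against the same reference, for example the original standardized features or known classes when they exist.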
Sixth session: time series
Choose a time series from the UCI Machine Learning Repository and use it as an example for the following points. At the end of the day, you can send me what you have achieved on the time series of your choice, covering both its analysis and its prediction.
- Generalities on time series
- Temporal structures
- Reference series
- Decomposition of time series
- Time series prediction
Deep Learning for IoT - M2 IoT
- Reinforcement Learning
- Deep Sequence Modeling
- MIT Deep Sequence Modeling labs
- Classification of text (What you have to do) - slides to read
- Sentiment classification - IMDB dataset - Colab file to complete
- Number of weights: IMDB-LSTM versus IMDB-GRU
- A focus on word embedding
- Colab file Word2Vec
- Colab file Glove - pretrained_word_embeddings
- Correct the file if needed
- Use only 10% of the samples, then 20%, and keep increasing until the RAM is close to saturation
- Load a smaller file of pre-trained GloVe embeddings from Stanford NLP; I suggest glove.2024.wikigiga.50d.zip
- SMS classification - Link to CSV file in "Classification of text"
- Colab file SMS-LSTM-Classif
- Try to learn an embedding with Word2Vec based on Colab file Word2Vec
- Complaint classification - Link to CSV file in "Classification of text"
- Use SMS classification file to perform multiclass classification of complaints
- Colab file with clean_text which could be useful
- Prediction (What you have to do) slide(s) to read
- Possibilities to format data with timeseries_dataset_from_array
- Prediction / forecasting of electrical power consumption
- Evaluation
- Information on the different Colab files you must submit is given here
- The deadline for sending the files is Wednesday 7 January 2026, 3 PM
- An interview will likely be scheduled for the beginning of the week of January 12
- My email address is: michel.salomon@umlp.fr
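For the item "Number of weights: IMDB-LSTM versus IMDB-GRU" in the list above, the following sketch counts the trainable weights of a single recurrent layer from the standard gate equations. The sizes used (128 input features, 128 units) are assumptions chosen for illustration, not values taken from the Colab files, and the function names are hypothetical. The GRU count is given in two variants: the classic formulation with one bias vector per gate, and the variant with two bias vectors per gate, which is what Keras uses by default (reset_after=True).

```python
def lstm_weights(input_dim, units):
    """LSTM: 4 gates, each with input weights, recurrent weights and one bias."""
    return 4 * (units * (input_dim + units) + units)


def gru_weights(input_dim, units, two_biases=True):
    """GRU: 3 gates. With two_biases=True, each gate has separate input and
    recurrent bias vectors (the Keras default, reset_after=True)."""
    bias_per_gate = 2 * units if two_biases else units
    return 3 * (units * (input_dim + units) + bias_per_gate)


# Assumed sizes for illustration: embedding dimension 128, 128 recurrent units.
input_dim, units = 128, 128

n_lstm = lstm_weights(input_dim, units)
n_gru = gru_weights(input_dim, units)
n_gru_classic = gru_weights(input_dim, units, two_biases=False)

print("LSTM weights:", n_lstm)
print("GRU weights (two biases per gate):", n_gru)
print("GRU weights (one bias per gate):", n_gru_classic)
print("GRU / LSTM ratio:", round(n_gru / n_lstm, 3))
```

You can check these numbers against the output of model.summary() in your notebooks: the GRU layer should have roughly three quarters of the weights of an LSTM layer of the same size.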