Weekly Meeting Reports

5/13 Meeting Report

The following points were discussed at the meeting held on May 13th:

  • We shared the papers and articles we had read and discussed the most suitable techniques we found.
  • Among those techniques, we are using SVD, whose output will be fed into KNN. Another option is to combine Matrix Factorization with KNN and optimize our blender to possibly get better results.
  • One of our team members suggested an extension of the KNN technique (ENN), which could improve our current final results. We went over the method to make sure all of us understood it correctly.
  • Next, we assigned tasks to each of our team members. The tasks are divided into the following groups:
    1. SVD and webpage
    2. Matrix Factorization and blender
    3. ENN
    4. Visualizing ENN results and creating figures, preparing the sequence of our presentation and its style, and creating our presentation slides.
  • Note: We decided that each member will focus on one group of tasks. However, we will stay flexible and help each other complete the different parts.

As usual, we will continue communicating via our Slack page to discuss these tasks and other related topics further.

5/7 Meeting Report

The following points were discussed at the meeting held on May 7th:

  • We decided to spend about 5 days reading papers, articles, journals, and other reliable online sources to learn the best approaches and techniques for combining our current methods.
  • We also concluded that, because of the time complexity, we should stop focusing on methods that require extensive feature engineering and instead focus on methods that do not, such as KNN, SVD, and MF.
  • Based on the above points, we plan to have our code ready by Monday 5/16, and our presentation and website ready by the end of Tuesday 5/17.
  • Our next meeting will be on the night of Saturday, May 14th.

4/27 Meeting Report

Here are the main points discussed at our meeting:

  • As time complexity was a big concern for feature engineering, we knew we had to come up with ways to filter our data down to a smaller size.
  • Initially there were many ideas on how to design appropriate filters:
    1. Finding the strongest and weakest students and filtering out the average ones.
    2. Calculating the average duration time to see if we could eliminate some samples based on that.
    3. Engineering unique problems ("ProblemHierarchy;ProblemName") and calculating the total number of views for each one, so that we can remove some unique problems based on their view counts.
  • After data engineering, we researched and thought about an appropriate ratio to create new training, validation and test sets based on the original dataset.
  • Next, our team considered the most appropriate methods to use in Phase 2. We predicted, and then confirmed, that even after filtering the initial data, the new dataset would still be too large for us to experiment with machine learning methods that require comprehensive feature engineering. We therefore chose the KNN algorithm to experiment with before the progress report deadline.
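
As a rough illustration of the third filtering idea above, the following is a minimal Python sketch of building the "ProblemHierarchy;ProblemName" key, counting views per unique problem, and dropping problems below a view-count threshold. The sample rows, the function names, and the min_views threshold are assumptions made for illustration only; they are not taken from the project code or dataset.

```python
from collections import Counter

# Hypothetical sample rows; only the two field names come from the report.
rows = [
    {"student": "s1", "ProblemHierarchy": "Unit A", "ProblemName": "P1"},
    {"student": "s2", "ProblemHierarchy": "Unit A", "ProblemName": "P1"},
    {"student": "s3", "ProblemHierarchy": "Unit A", "ProblemName": "P1"},
    {"student": "s1", "ProblemHierarchy": "Unit A", "ProblemName": "P2"},
    {"student": "s2", "ProblemHierarchy": "Unit B", "ProblemName": "P1"},
    {"student": "s3", "ProblemHierarchy": "Unit B", "ProblemName": "P1"},
]


def unique_problem(row):
    """Build the combined key "ProblemHierarchy;ProblemName" for one row."""
    return f'{row["ProblemHierarchy"]};{row["ProblemName"]}'


def filter_by_view_count(rows, min_views):
    """Keep only rows whose unique problem appears at least min_views times.

    Returns the kept rows and the per-problem view counts.
    The threshold min_views is an assumed tuning parameter.
    """
    views = Counter(unique_problem(r) for r in rows)
    kept = [r for r in rows if views[unique_problem(r)] >= min_views]
    return kept, views


kept_rows, view_counts = filter_by_view_count(rows, min_views=2)
```

With this toy data, the problem "Unit A;P2" is viewed only once, so its row is removed while the other five rows are kept. In practice the threshold would have to be chosen by inspecting the real distribution of view counts.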

Note: The above points were the primary topics we discussed in our weekly meeting. Since then, our team members have been communicating regularly on our Slack team page to discuss the details of each point, new ideas, etc.

4/19 Revised Meeting with Professor Ding

Hello Professor Ding,


Thank you for taking the time to give us feedback on our Phase 1 report. The following are the main points of what we discussed today.

  • Our Phase 1 report shows our team's hard work and is well organized, but it does not demonstrate that we deeply understand the methods we used or why we chose those techniques.
  • For the final phase, we are free to create our own data and test sets in any suitable way.
  • It was suggested that we pick one or two methods and focus on them instead of working on many methods at the same time.
  • A progress report is needed for Phase 2. It will be a webpage that we create to publish our project report, with main sections covering:
    1. Why is our topic a good machine learning project?
    2. Methods used, with reasoning
    3. Experiments we have done
    4. Overall design
    5. Source code
    6. Datasets
  • The project website can be used to showcase our skills in machine learning / data analysis if we would like to work in a related field at a company.
  • Our final presentation will be on May 19th and should take no more than 30 minutes.

Regards,
Team Swift