This was my graduation project related to customer recommendation of airline industry. The aim of the project is to find out certain pattern out of the flight history of customers. Used R and Python to implement different algorithms for predicting. image-title-here

Advisor: Prof. Hsin-Min Lu @NTUIM
Co-workers: Kelly Hsieh, Andre Chang @NTUIM

Due to authority and IP policies, only few slides and contents are shown below.

image-title-here

Data Description

Data is based on XXX airline’s customer hisotry from 2013-2015.
The attributes contained two categories of attributes:

  • Demographic attributes: Age, Gender, Member Status …
  • Flight attributes: Flight number, Destination, Departure Time … We later define demographic attributes as users, and flight attributes as items. In total, we defined 2572 items and 13347 users.

Data Preprocessing

Used Python to deal with data sparsity and bias.
Toolkits used:

  • scikit-learn
  • numpy
  • panda

The raw data is quite sparse and biased, and thus we spent lots of time to conduct data preprocessing.

Algorithms

  • User-based collaborative filtering: Emphasis on user features from data, calculate the cosine-similarity of data.
  • Item-based collaborative filtering: Emphasis on user features from data. calculate the cosine-similarity of data.
  • Singular Value Decomposition (SVD): Calculated the singular values of matrix, and used them to conduct analysis.

image-title-here
image-title-here

Result

In short, we compared diffrent parameter tuning (k, nn, irnba).
As a result, we got the best hit rate of 20.11% hit rate with item-based method.
That is, in the top 10 recommendations from 2572 options, we can get a 20.11% hit rate.

image-title-here