This was my graduation project on customer recommendation in the airline industry.
The aim of the project was to find patterns in customers' flight histories.
We used R and Python to implement several prediction algorithms.
Advisor: Prof. Hsin-Min Lu @NTUIM
Co-workers: Kelly Hsieh, Andre Chang @NTUIM
Due to authorization and IP restrictions, only a few slides and parts of the content are shown below.
Data Description
The data is based on XXX airline’s customer history from 2013 to 2015.
The attributes fall into two categories:
- Demographic attributes: Age, Gender, Member Status …
- Flight attributes: Flight number, Destination, Departure Time …
We later treated the demographic attributes as users and the flight attributes as items. In total, we defined 2572 items and 13347 users.
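One way to picture the user and item definitions above is the sketch below. It is only an illustration: the field names (`age_group`, `departure_slot`, and so on), the choice of which fields form a user or an item, and the sample rows are all assumptions, since the real schema is not public. Each distinct combination of demographic values becomes a user key, each distinct combination of flight values becomes an item key, and the density figure shows how sparse the resulting user-item table is.

```python
from collections import defaultdict

# Hypothetical raw rows; the real airline schema is not public.
rows = [
    {"age_group": "30-39", "gender": "F", "member_status": "Gold",
     "destination": "NRT", "departure_slot": "morning"},
    {"age_group": "30-39", "gender": "F", "member_status": "Gold",
     "destination": "HKG", "departure_slot": "evening"},
    {"age_group": "20-29", "gender": "M", "member_status": "Basic",
     "destination": "NRT", "departure_slot": "morning"},
    {"age_group": "20-29", "gender": "M", "member_status": "Basic",
     "destination": "NRT", "departure_slot": "morning"},  # repeat booking
    {"age_group": "40-49", "gender": "F", "member_status": "Silver",
     "destination": "LAX", "departure_slot": "evening"},
]

# Assumed groupings: which columns define a "user" and which an "item".
USER_FIELDS = ("age_group", "gender", "member_status")
ITEM_FIELDS = ("destination", "departure_slot")


def make_key(row, fields):
    """Build a hashable key from the selected columns of one row."""
    return tuple(row[field] for field in fields)


def build_interactions(rows):
    """Map each user key to the set of item keys it has interacted with."""
    interactions = defaultdict(set)
    for row in rows:
        interactions[make_key(row, USER_FIELDS)].add(make_key(row, ITEM_FIELDS))
    return dict(interactions)


def density(interactions):
    """Fraction of user-item cells that are non-empty (low means sparse)."""
    if not interactions:
        return 0.0
    n_items = len(set().union(*interactions.values()))
    filled = sum(len(items) for items in interactions.values())
    return filled / (len(interactions) * n_items)


interactions = build_interactions(rows)
print(len(interactions), "users,", round(density(interactions), 3), "density")
```

Using sets makes the table binary, so repeat bookings count once. Whether the project used binary or count-based interactions is not stated, so this is a design choice of the sketch only.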
Data Preprocessing
We used Python to handle data sparsity and bias.
Toolkits used:
- scikit-learn
- numpy
- pandas
The raw data is quite sparse and biased, so we spent a lot of time on data preprocessing.
Algorithms
- User-based collaborative filtering: Emphasizes user features in the data; computes the cosine similarity between users.
- Item-based collaborative filtering: Emphasizes item features in the data; computes the cosine similarity between items.
- Singular Value Decomposition (SVD): Computes the singular values of the data matrix and uses them for analysis.
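The three approaches in the list can be sketched with numpy as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the toy matrix `R`, the helper names, and the scoring rules (a similarity-weighted sum for collaborative filtering, a rank-`k` reconstruction for SVD) are all hypothetical choices. Reading `k` as the number of singular values kept is also an assumption.

```python
import numpy as np

# Toy binary user-item matrix (rows: users, columns: items); hypothetical data.
R = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
], dtype=float)


def cosine_similarity(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = M / norms
    return unit @ unit.T


def user_based_scores(R):
    """Score items for each user from the items of similar users."""
    sim = cosine_similarity(R)       # user-user similarity
    np.fill_diagonal(sim, 0.0)       # a user is not their own neighbour
    return sim @ R


def item_based_scores(R):
    """Score items by similarity to the items each user already has."""
    sim = cosine_similarity(R.T)     # item-item similarity
    np.fill_diagonal(sim, 0.0)       # an item should not recommend itself
    return R @ sim


def svd_scores(R, k):
    """Rank-k reconstruction of R from its k largest singular values."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def top_n(scores, R, n):
    """Indices of the n best-scoring items per user, hiding seen items."""
    masked = np.where(R > 0, -np.inf, scores)
    return np.argsort(-masked, axis=1)[:, :n]


print(top_n(item_based_scores(R), R, 1).ravel())
print(top_n(svd_scores(R, 2), R, 1).ravel())
```

Note that `top_n` does not check whether a user has at least `n` unseen items; with a catalogue of thousands of items and a top-10 list that is rarely an issue, but a production version would need to handle it.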
Result
In short, we compared different parameter settings (k, nn, irnba).
The best hit rate we obtained was 20.11%, with the item-based method.
That is, in the top 10 recommendations from 2572 options, we can get a 20.11% hit rate.
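For readers unfamiliar with the metric, a top-N hit rate can be computed roughly as in the sketch below. The evaluation protocol is not described here, so this assumes a leave-one-out setup in which each user has exactly one held-out item; the function name, the sample data, and that protocol are all hypothetical. Under the same assumption, a uniformly random top-10 list over 2572 items would contain the held-out item with probability 10/2572, which gives a rough baseline for reading the 20.11% figure.

```python
def hit_rate_at_n(recommendations, held_out, n=10):
    """Share of users whose held-out item appears in their top-n list.

    recommendations: dict mapping user -> ranked list of item ids
    held_out:        dict mapping user -> the single hidden item (assumed)
    """
    if not held_out:
        return 0.0
    hits = sum(
        1
        for user, target in held_out.items()
        if target in recommendations.get(user, [])[:n]
    )
    return hits / len(held_out)


# Hypothetical ranked lists and held-out items for four users.
recommendations = {
    "u1": ["a", "b", "c"],
    "u2": ["d", "e", "f"],
    "u3": ["g", "h", "i"],
    "u4": ["j", "k", "l"],
}
held_out = {"u1": "b", "u2": "x", "u3": "i", "u4": "z"}

toy_rate = hit_rate_at_n(recommendations, held_out, n=3)

# Chance level for a random top-10 list over 2572 items, one target per user.
random_baseline = 10 / 2572

print(f"toy hit rate: {toy_rate:.2%}")
print(f"random top-10 baseline: {random_baseline:.2%}")
```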