Store Memebers Behavior Analysis

1. Data preparation and customer analytics

Background:

The Manager wants to better understand the types of customers who purchase Chips and their purchasing behaviour within the region.
The insights from analysis will feed into the supermarket’s strategic plan for the chip category in the next half year.

End Goal:

To form a strategy based on the findings to provide a clear recommendation to Manager so make sure the insights can have a commercial application.

Early Exploratory analysis

  • Examine transaction data
  • Examine customer data
  • Data analysis and customer segments
  • Deep dive into customer segments
  • Derive extra features (pack size and brand name)and define metrics of interest that draw insights on who spends on chips and what drives spending for each customer segment

Data Cleaning and Fixing

Step 1: Basic Checking and Editing by Excel

  • View the data in Excel and apply Filter to check the missing values in each columns – No missing values in the dataset.

  • Use short date format to change date columns to normal date expression. However, there are some outliers in the dataset,the card number 226000 only have two transactions in the whole dataset, so I decide to remove them.

  • By observing the product names using Filter, I find some similar names such as WW - Woolworths,CCs - Cheezels Cheese,Infzns - Infzions, NCC-Natural Chip - Natural Chip Company, RRD - Red Rock Deli, Smith - Smiths, Snbts- Sunbites, GrnWves - Grain Waves, which I decide to fix using Python for simplicity.

  • Load the data in Python for futher checking and operation.

Step 2: Data Manipulation by Python

  1. Load the Transaction Data as Tdata, Load the purchase data as Pdata and try to merge two data as one,since Transaction Data and Purchase data share the same membership card column
1
2
3
4
Tdata=pd.read_excel(r'C:\Users\Administrator\Desktop\transaction_data.xlsx')
Pdata=pd.read_csv(r'C:\Users\Administrator\Desktop\purchase_behaviour.csv')
data=Tdata.merge(Pdata,on= 'LYLTY_CARD_NBR')
data.info() # By checking the info of data, we didn't find any strange in the dataset.
  1. Because we are focusing on chip category , we only want to keep the chip products. So let’s create a new dataframe for chip prodcut.
1
2
3
4
5
# Use str.contains to find the product name that contain the 'chip'. And Create a new dataframe for chips only.
chipdata=data[data['PROD_NAME'].str.contains('chip',case=False)] # Case sensitive = False
chipdata.reset_index(drop=True,inplace=True) # If you don't want it saved as a column then drop = True, inplace= True(reassign the index)
print(chipdata.head())
chipdata.duplicated().any()# check duplicated data
  1. Fix the product names to make product name consistent
1
2
3
4
5
6
7
8
9
10
# Clean all the special character in the product name
chipdata['PROD_NAME'] = chipdata['PROD_NAME'].str.replace('\W', '')
# Regular expression is the key point to understand the formula
# Replace all the name to standard format
chipdata['PROD_NAME']=chipdata['PROD_NAME'].replace({'WW': 'Woolworths','CCs':'Cheezels Cheese','Infzns':'Infzions','Natural Chip Co': 'Natural Chip Company','NCC':'Natural Chip Company','RRD' :'Red Rock Deli', 'Smith':'Smiths','Snbts':'Sunbites','GrnWves':'Grain Waves'},regex=True)
# check to see what product name we have
chipdata['PROD_NAME'].unique()
# From the result we can see that we have some problem on Smithss
chipdata['PROD_NAME']=chipdata['PROD_NAME'].replace({'Smithsss':'Smiths'},regex=True)
chipdata['PROD_NAME'].unique()# now it appears normall

Now the data is good for customer analysis.For simplicity, we can load the data into Tableau for visualization analysis

Customer Analysis

**Lets figure out several questions that may help us understand the customer behavior:

  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
  • Copyrights © 2020-2021 By Yang Yue
  • Visitors: | Views:

请我喝杯奶茶吧~