Association Mining Using Apriori Algorithm | Hashtag Statistics


Have you ever been to a shop? Of course you have!

Have you ever wondered why toothpaste, mouthwash, bathing soap, conditioner and washing powder are kept in one aisle? Why are bread, butter, eggs and cornflakes kept in another aisle?

Aisles are laid out this way to reduce the effort customers spend buying products, aren't they?

So, the question is, "Do all retailers apply Association Rule Mining before placing products on the shelves in their stores?"

Not exactly! It's just common sense!


Wait!


But what if you see <hot dog, coke, chips> or <beer, burger, potato, chips> together on one shelf?

They must have done some study, or some Association Mining, to arrive at this clever arrangement!


Associations among different products can be discovered using several approaches, e.g. the Eclat algorithm, the FP-growth algorithm and the Apriori algorithm.

In this article, we will focus on the Apriori Algorithm to find associations among different products.


Example:

Let's say we have data from six different transactions recorded at a retail shop's point of sale.

The transaction record is as follows:

Transaction ID Items Purchased 
T1 Burger, Cold Drink, Beer, Chips
T2 Milk, Cake, Eggs
T3 Beer, Chips
T4 Eggs, Juice, Milk, Burger, Chips, Beer
T5 Cold Drink, Cake, Chips
T6 Burger, Beer, Chips
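If you want to follow along in code, one simple way to hold this table is a list of Python sets, one set per transaction, so each item counts at most once per transaction. This is just a sketch; the names TRANSACTIONS and ALL_ITEMS are our own choice, not part of any library.

```python
# The six transactions from the table above, one set of items per transaction.
TRANSACTIONS = [
    {"Burger", "Cold Drink", "Beer", "Chips"},               # T1
    {"Milk", "Cake", "Eggs"},                                # T2
    {"Beer", "Chips"},                                       # T3
    {"Eggs", "Juice", "Milk", "Burger", "Chips", "Beer"},    # T4
    {"Cold Drink", "Cake", "Chips"},                         # T5
    {"Burger", "Beer", "Chips"},                             # T6
]

# All distinct items that appear in at least one transaction, sorted by name.
ALL_ITEMS = sorted(set().union(*TRANSACTIONS))

print(len(TRANSACTIONS))  # 6
print(ALL_ITEMS)
```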


Support Threshold

Before we proceed further, let's learn about the Support Threshold. You must be wondering what this new keyword is all about. Well, it's not so complicated. In fact, learning the Apriori Algorithm could be a cakewalk for you!

Support tells us the popularity of a specific product in the store, measured as the proportion of transactions that include it. In simpler terms, it says how often a product is purchased in a given store. The Support Threshold is the minimum support a product must reach to be considered further.

Let's take an example:

Support (Burger) = Number of transactions containing Burger / Total number of transactions
Support (Burger) = 3 / 6
Support (Burger) = 50%

Support (Beer) = Number of transactions containing Beer / Total number of transactions
Support (Beer) = 4 / 6
Support (Beer) = 66.67%

Here we can interpret that Burger is 50% popular (it appears in half of all transactions), whereas Beer is 66.67% popular in the store.
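The support formula translates directly into a small helper. This is a minimal sketch: `support` is a hypothetical helper name, and it counts a transaction once no matter how many units of an item were bought, which matches how the table is counted. The transaction list is repeated so the snippet runs on its own.

```python
TRANSACTIONS = [
    {"Burger", "Cold Drink", "Beer", "Chips"},
    {"Milk", "Cake", "Eggs"},
    {"Beer", "Chips"},
    {"Eggs", "Juice", "Milk", "Burger", "Chips", "Beer"},
    {"Cold Drink", "Cake", "Chips"},
    {"Burger", "Beer", "Chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)

print(f"Support(Burger) = {support({'Burger'}, TRANSACTIONS):.2%}")  # 50.00%
print(f"Support(Beer)   = {support({'Beer'}, TRANSACTIONS):.2%}")    # 66.67%
```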


Now let's assume that our minimum Support Threshold is 50%. It means we would remove all the items whose support lies below 50%.

The reason for setting a minimum Support Threshold is that if a product is rarely purchased, any set of products that includes it can appear in at most as many transactions as the product itself, so it cannot form a frequent association with another product.

So, expressed as a number of transactions, minimum Support Threshold = Total number of transactions x 50%
minimum Support Threshold = 6 x 50%
minimum Support Threshold = 3

It signifies that we would remove all the items whose frequency is less than 3.
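Converting the percentage into a count can be sketched as below. The helper name `min_support_count` is hypothetical, and rounding up (so that, say, 7 transactions at 50% would require 4 rather than 3.5) is our own assumption; in this article's example the division is exact, so rounding does not matter. Using `Fraction` avoids floating-point surprises such as 10 x 0.3 coming out slightly above 3.

```python
import math
from fractions import Fraction

def min_support_count(n_transactions, min_support_percent):
    """Smallest number of transactions an item-set must appear in to reach
    `min_support_percent` percent support (rounded up to a whole count)."""
    # Exact rational arithmetic, then round up to the next whole transaction.
    return math.ceil(Fraction(n_transactions * min_support_percent, 100))

print(min_support_count(6, 50))  # 3
```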


Items        Frequency   Kept (frequency at least 3)?
Burger       3           Yes
Cold Drink   2           No
Beer         4           Yes
Chips        5           Yes
Milk         2           No
Cake         2           No
Eggs         2           No
Juice        1           No

Since min. Support is 3, we need to exclude all the items whose frequency is less than 3.


After removing all the items whose frequency is less than 3, we are left with the following items in our table.

Items       Frequency
Burger 3
Beer 4
Chips 5
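Counting every item and keeping only those that reach the minimum count can be sketched with `collections.Counter`. Because each transaction is a set, an item is counted at most once per transaction. MIN_COUNT = 3 is taken from the calculation above; the variable names are our own.

```python
from collections import Counter

TRANSACTIONS = [
    {"Burger", "Cold Drink", "Beer", "Chips"},
    {"Milk", "Cake", "Eggs"},
    {"Beer", "Chips"},
    {"Eggs", "Juice", "Milk", "Burger", "Chips", "Beer"},
    {"Cold Drink", "Cake", "Chips"},
    {"Burger", "Beer", "Chips"},
]
MIN_COUNT = 3  # minimum Support Threshold expressed as a count

# Number of transactions in which each item appears.
item_counts = Counter(item for t in TRANSACTIONS for item in t)

# Keep only the items whose count reaches the minimum.
frequent_items = {item: c for item, c in item_counts.items() if c >= MIN_COUNT}

# Print the survivors in increasing order of frequency.
for item, c in sorted(frequent_items.items(), key=lambda kv: kv[1]):
    print(item, c)
```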

Since our goal is to associate different items into relevant item-sets, we will next pair up all the items selected in the previous step.

Make item-sets of two products (associate two products together)

So, we are left with three items: Burger, Beer and Chips. From these we can form at most three pairs: <Burger, Beer>, <Burger, Chips> and <Beer, Chips>.

Item Set       Frequency
Burger, Beer 3
Burger, Chips 3
Beer, Chips 4

Since the frequency of every item-set is at least 3 (the min. Support Threshold), we keep all of them for the next step.
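Forming every pair of the surviving items and counting the transactions that contain both can be sketched with `itertools.combinations`. FREQUENT_ITEMS holds the result of the previous step; `count` is a hypothetical helper name.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Burger", "Cold Drink", "Beer", "Chips"},
    {"Milk", "Cake", "Eggs"},
    {"Beer", "Chips"},
    {"Eggs", "Juice", "Milk", "Burger", "Chips", "Beer"},
    {"Cold Drink", "Cake", "Chips"},
    {"Burger", "Beer", "Chips"},
]
MIN_COUNT = 3
FREQUENT_ITEMS = ["Burger", "Beer", "Chips"]  # survivors of the previous step

def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# Every pair of frequent items is a candidate two-item set.
pair_counts = {pair: count(pair, TRANSACTIONS)
               for pair in combinations(FREQUENT_ITEMS, 2)}
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= MIN_COUNT}

for pair, c in frequent_pairs.items():
    print(", ".join(pair), c)
```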

Make item-sets of three products (associate three products together)

From these three items, Burger, Beer and Chips, we can form only one set that contains all three: <Burger, Beer, Chips>

Item Set         Frequency
Burger, Beer, Chips 3
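Before counting a three-item candidate, Apriori normally checks that every two-item subset of it is already frequent, and discards the candidate otherwise. Here all three pairs passed, so <Burger, Beer, Chips> survives and is counted. The sketch below shows this join-and-prune step; `candidates_from` and `count` are hypothetical helper names, and the join used here (any two frequent sets whose union has k items) is a simple variant rather than the classic prefix-based join.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Burger", "Cold Drink", "Beer", "Chips"},
    {"Milk", "Cake", "Eggs"},
    {"Beer", "Chips"},
    {"Eggs", "Juice", "Milk", "Burger", "Chips", "Beer"},
    {"Cold Drink", "Cake", "Chips"},
    {"Burger", "Beer", "Chips"},
]
FREQUENT_PAIRS = {frozenset(p) for p in
                  [("Burger", "Beer"), ("Burger", "Chips"), ("Beer", "Chips")]}

def candidates_from(frequent_smaller, k):
    """Join frequent (k-1)-item-sets into k-item-sets, then drop any candidate
    that has a (k-1)-subset which is not frequent."""
    joined = {a | b for a in frequent_smaller for b in frequent_smaller
              if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in frequent_smaller for s in combinations(c, k - 1))}

def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

triples = candidates_from(FREQUENT_PAIRS, 3)
for c in triples:
    print(", ".join(sorted(c)), count(c, TRANSACTIONS))
```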


In the end we are left with only one set, and no larger association can be built from it.

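From the above analysis, we find that Burger, Beer and Chips can be associated with each other. We can also conclude that if <Beer> were infrequent, every item-set containing it, such as <Burger, Beer> or <Burger, Beer, Chips>, would be infrequent too, because adding items to a set can never increase the number of transactions that contain it.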
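The pruning idea behind Apriori rests on one property: adding an item to an item-set can never increase the number of transactions that contain it, so any superset of an infrequent item-set is itself infrequent. The brute-force check below is a hypothetical illustration on the six transactions, not something Apriori itself runs; it tries every item-set and every extra item and confirms there are no exceptions in this data.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Burger", "Cold Drink", "Beer", "Chips"},
    {"Milk", "Cake", "Eggs"},
    {"Beer", "Chips"},
    {"Eggs", "Juice", "Milk", "Burger", "Chips", "Beer"},
    {"Cold Drink", "Cake", "Chips"},
    {"Burger", "Beer", "Chips"},
]

def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

items = sorted(set().union(*TRANSACTIONS))
violations = 0
# For every item-set X and every item y not in X, check count(X + y) <= count(X).
for size in range(1, len(items)):
    for combo in combinations(items, size):
        x = frozenset(combo)
        for y in items:
            if y not in x and count(x | {y}, TRANSACTIONS) > count(x, TRANSACTIONS):
                violations += 1

print("violations:", violations)  # violations: 0
```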


So, the retailer can now set up a specific aisle placing Burger, Beer and Chips beside each other, helping customers pick products easily from the store; this can also boost the store's sales.



Caution:

- In the given example, we had only eight items. Now think of a big store where thousands of products are on sale. How cumbersome and expensive would it be to calculate support, given that every pass has to go through the entire database?

- When we have to evaluate a large number of candidate item-sets or rules, the computation can become expensive.
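Putting the steps together gives the level-wise loop sketched below, which also makes the cautions concrete: each item-set size needs one full counting pass over the transactions, and every candidate is checked against every transaction. This is a compact illustration written for this example (the function name `apriori` and its return shape are our own), not a library API or an optimized implementation.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise Apriori sketch. Returns ({frozenset(itemset): count}, passes),
    where the dict holds every item-set found in at least `min_count`
    transactions and `passes` is the number of counting passes made."""
    transactions = [frozenset(t) for t in transactions]
    passes = 0
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One full pass over the database counts all size-k candidates.
        passes += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)

        # Join frequent size-k sets into size-(k+1) candidates, then prune
        # any candidate that has an infrequent size-k subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent, passes

TRANSACTIONS = [
    {"Burger", "Cold Drink", "Beer", "Chips"},
    {"Milk", "Cake", "Eggs"},
    {"Beer", "Chips"},
    {"Eggs", "Juice", "Milk", "Burger", "Chips", "Beer"},
    {"Cold Drink", "Cake", "Chips"},
    {"Burger", "Beer", "Chips"},
]

frequent, passes = apriori(TRANSACTIONS, min_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(", ".join(sorted(itemset)), n)
print("database passes:", passes)  # database passes: 3
```

On this data the loop finds the same three items, three pairs and one triple as the walkthrough. On a store with thousands of products, both the number of candidates per level and the cost of each pass grow, which is the expense the cautions describe.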


As we have seen from the above example, the Apriori Algorithm is easy to implement and interpret, which is why it has been used extensively in market research.


How often do you use the Apriori Algorithm in your business segment? Do you use any other alternatives for Association Mining? I'd appreciate it if you let me know your thoughts in the comments below. Please do share the post with your friends as well!

Comments

  1. Great explanation! Indeed it is so easy to understand.

  2. Can you show some light on some alternative approach of this algorithm?
