The example below illustrates how to use the Association Rules method in Analytic Solver Data Mining with the example dataset contained in the file Associations.xlsx. To open this dataset, click Help – Example Models on the Data Mining ribbon, then Forecasting/Data Mining Examples.
- Click Associate – Association Rules to open the Association Rules dialog.
- Since the data contained in the Associations.xlsx dataset are all 0’s and 1’s, Data in binary matrix format is selected by default for the option Input data format. Analytic Solver Data Mining will treat the data as a matrix of two kinds of entries: zeros and non-zeros. A 0 signifies that the item is absent from that transaction, and a 1 signifies that the item is present.
- Note: If a value other than 0 or 1 were present in the dataset, Data in Item List Format would have been selected by default for the option, Input Data Format.
- Keep the default of 200 for the Minimum Support (# transactions). This option specifies the minimum number of transactions in which a particular item-set must appear to qualify for inclusion in an association rule.
- Keep the default of 50 for Minimum confidence %. This option specifies the minimum confidence threshold for rule generation. If A is the set of Antecedents and C the set of Consequents, then an A => C ("Antecedent implies Consequent") rule qualifies only if the ratio (support of A U C) / (support of A), expressed as a percentage, is at least this value.
- Click OK.
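The two thresholds set in the steps above can be illustrated with a minimal Python sketch. It is not Analytic Solver's implementation: the toy transactions, the item names, and the helper functions `support_count`, `confidence`, and `qualifies` are all hypothetical, and the sketch assumes that the minimum support is applied to the combined item-set A U C.

```python
# Hypothetical toy data: each row is a transaction, each column an item.
# As in binary matrix format, 0 means absent and any non-zero value means present.
ITEMS = ["Cook", "Reference", "Child"]
TRANSACTIONS = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
]


def support_count(itemset):
    """Count the transactions that contain every item in `itemset`."""
    cols = [ITEMS.index(name) for name in itemset]
    return sum(1 for row in TRANSACTIONS if all(row[c] != 0 for c in cols))


def confidence(antecedent, consequent):
    """Return (support of A U C) / (support of A) as a fraction."""
    a_support = support_count(antecedent)
    if a_support == 0:
        return 0.0
    return support_count(set(antecedent) | set(consequent)) / a_support


def qualifies(antecedent, consequent, min_support, min_confidence_pct):
    """Check a candidate rule against both thresholds (assumed semantics)."""
    rule_support = support_count(set(antecedent) | set(consequent))
    conf_pct = 100 * confidence(antecedent, consequent)
    return rule_support >= min_support and conf_pct >= min_confidence_pct


rule = (["Cook", "Reference"], ["Child"])
print(support_count(rule[0]))      # A-Support for the toy data
print(confidence(*rule))           # confidence as a fraction
print(qualifies(*rule, min_support=2, min_confidence_pct=50))
```

With only six toy transactions the default minimum support of 200 could never be met, which is why the usage line passes a much smaller threshold.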
AR_Output is inserted to the right of the Assoc_binary worksheet.
Rule 27 indicates that if a Cook book and a Reference book are purchased, then with 80% confidence a Child book will also be purchased. The A-Support column indicates that the antecedent has the support of 305 transactions, meaning that 305 transactions included both a Cook book and a Reference book. The C-Support column indicates the number of transactions involving the purchase of Child books. The Support column indicates the number of transactions in which all three book types were purchased.
The Lift Ratio indicates how much more likely a Child book purchase is among transactions that include both a Cook book and a Reference book, as compared to the entire population of transactions. In other words, the Lift Ratio is the Confidence divided by the proportion of C-Support transactions in the entire dataset. For Rule 27, this proportion is 0.423 (846/2000). Confidence is then divided by this value to find the Lift Ratio: 0.803/0.423 = 1.899. Given a confidence of 80.3% and a lift ratio of 1.899 (lift ratio > 1), this rule can be considered “useful”.
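The Lift Ratio arithmetic for Rule 27 can be checked with a short Python sketch. The function name `lift_ratio` is hypothetical, and the inputs are the rounded figures quoted above, so the result may differ from the reported value in the third decimal place.

```python
def lift_ratio(confidence, c_support, n_transactions):
    """Confidence divided by the share of transactions containing the consequent."""
    return confidence / (c_support / n_transactions)


# Figures quoted for Rule 27: confidence 0.803, C-Support 846, 2000 transactions.
rule_27_lift = lift_ratio(confidence=0.803, c_support=846, n_transactions=2000)
print(round(rule_27_lift, 3))

# A lift ratio above 1 means the consequent is more likely when the antecedent is present.
print(rule_27_lift > 1)
```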