What are association rules? How do we find them?: Association rules are patterns in a dataset of the form “if A is in a transaction, then B is in it too, with high probability”. We can find them with data mining algorithms in two steps. First, we find all frequent itemsets. Then we use these frequent itemsets to derive the association rules.
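The two steps can be sketched in Python as follows. This is a minimal brute-force illustration, not an efficient algorithm such as Apriori; the toy transactions, the function names, and the inclusive thresholds are all assumptions made for the example.

```python
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Step 1: enumerate all itemsets and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

def rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so the lookup is safe.
                conf = sup / frequent[a]
                if conf >= min_confidence:
                    found.append((a, itemset - a, sup, conf))
    return found
```

With a minimal support of 0.5 and a minimal confidence of 0.9, the only rule that survives on this toy data is butter → bread: every basket with butter also contains bread.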
How can we describe association rules?: We describe an association rule by its two itemsets (if A then B), the support of the rule, and its confidence.
Support is the proportion of transactions that contain both A and B (yellow area). Confidence of A → B is the proportion of transactions containing A that also contain B. 1) All people who buy a also buy b, therefore a implies b. The rules a → b and b → a have the same support, because the number of transactions that contain both a and b is the same in both cases. But the confidence of b → a is low, because b occurs without a far more often than together with a.
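The asymmetry between a → b and b → a can be checked on a small example. The sketch below uses made-up transactions in which every buyer of a also buys b, while b is mostly bought without a; the helper names are assumptions.

```python
# Hypothetical transactions: everyone who buys "a" also buys "b",
# but "b" is bought much more often without "a".
transactions = [
    {"a", "b"},
    {"a", "b"},
    {"b"},
    {"b"},
    {"b"},
    {"b", "c"},
    {"c"},
    {"c"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

sup_ab = support({"a", "b"}, transactions)         # shared by a -> b and b -> a
conf_a_b = confidence({"a"}, {"b"}, transactions)  # all buyers of a also buy b
conf_b_a = confidence({"b"}, {"a"}, transactions)  # only a third of b buyers buy a
```

Both rules share the support 2/8, but the confidence is 1 for a → b and only 1/3 for b → a.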
What are frequent/maximal/closed itemsets?: An itemset is frequent when its support is above some threshold (the minimal support). Itemsets that are not frequent are not interesting for us.
A frequent itemset is maximal if it is not a proper subset of any other frequent itemset. Both properties, frequent and maximal, depend on the minimal support.
An itemset is closed if it is not a proper subset of any itemset with the same support. Closedness does not depend on the minimal support.
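A small sketch makes the three definitions concrete. It enumerates all itemsets by brute force on made-up transactions. The function name `classify` and the inclusive threshold (count >= min_count) are assumptions, and closed itemsets are taken over all itemsets that occur at least once.

```python
from itertools import combinations

# Hypothetical transactions, for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
]

def classify(transactions, min_count):
    """Return (frequent, maximal, closed) itemsets as sets of frozensets."""
    items = sorted(set().union(*transactions))
    counts = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            c = sum(s <= t for t in transactions)
            if c > 0:
                counts[s] = c
    frequent = {s for s, c in counts.items() if c >= min_count}
    # Maximal: no proper superset is frequent.
    maximal = {s for s in frequent if not any(s < t for t in frequent)}
    # Closed: no proper superset has the same support count.
    closed = {
        s for s in counts
        if not any(s < t and counts[t] == counts[s] for t in counts)
    }
    return frequent, maximal, closed
```

On this data, raising `min_count` from 2 to 3 changes the maximal itemset from {a, b, c} to {a, b}, while the closed itemsets {a}, {a, b}, and {a, b, c} stay the same.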
Give an exemplary association rule with high/low support/confidence: High support and low confidence: set B is huge, and set A is small compared to it. Then B → A has a low confidence, because most transactions that contain B do not contain A. But if B covers most of the dataset and the transactions containing both A and B still make up a sizeable share of it, the support of B → A can be high.
Why can support/confidence for association rules be misleading?: If two items are negatively correlated but each is frequent enough, we can still get a rule between them with good support and confidence. Meanwhile, the “correct” rules may end up with low support and confidence values.
We can use another measure, called lift: the confidence of A → B divided by the support of B. A lift below 1 indicates a negative correlation, a lift above 1 a positive one.
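A worked example shows how a rule can look good by confidence and still be negatively correlated. The counts below are invented for illustration (100 transactions, 20 with tea, 80 with coffee, 15 with both), and the function names are assumptions.

```python
def confidence(n_a, n_both):
    """confidence(A -> B) = count(A and B) / count(A)."""
    return n_both / n_a

def lift(n_total, n_a, n_b, n_both):
    """lift(A -> B) = support(A and B) / (support(A) * support(B)).

    Written with raw counts; this is algebraically the same as
    confidence(A -> B) / support(B).
    """
    return (n_both * n_total) / (n_a * n_b)

# Hypothetical counts: A = tea, B = coffee.
conf_tea_coffee = confidence(n_a=20, n_both=15)
lift_tea_coffee = lift(n_total=100, n_a=20, n_b=80, n_both=15)
```

Here tea → coffee has a confidence of 0.75, which looks convincing, but the lift is below 1: tea buyers buy coffee less often than customers in general (80%).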
What are multi-dimensional association rules?: We can extend association rule mining from plain itemsets to relational data. For example, we can look for association rules in a bank table that contains the nationality, age, and income of each customer.
What are multi-level association rules and how to find them?: We can have a hierarchy of objects instead of a flat list of objects. There can be interesting associations between different levels. We can compare single items from one category with a whole other category, for example White Bread → Milk. Or, more obviously, red wine → meat and white wine → fish.
We can encode each category and its items, and specify a different threshold for each hierarchy level.
Then we count the occurrences of the different prefixes in the dataset and filter out all values below the threshold. This data can be used to mine multi-level associations.
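The encoding and prefix counting can be sketched as below. The three-digit codes (category, subcategory, item), the baskets, and the per-level thresholds are all hypothetical; the notes do not fix a concrete encoding.

```python
from collections import Counter

# Hypothetical encoding: 1st digit = category, 2nd = subcategory, 3rd = item.
codes = {
    "white bread": "111",
    "wheat bread": "112",
    "skim milk": "211",
    "whole milk": "212",
}

baskets = [
    ["white bread", "skim milk"],
    ["white bread", "whole milk"],
    ["wheat bread", "skim milk"],
    ["white bread"],
]

# Assumed minimum support counts, keyed by hierarchy level (prefix length).
min_count = {1: 3, 2: 3, 3: 2}

def frequent_prefixes(baskets, codes, min_count):
    """Count each code prefix once per basket, then apply the level threshold."""
    counts = Counter()
    for basket in baskets:
        # A set, so each prefix is counted at most once per basket.
        prefixes = {codes[item][:level] for item in basket for level in min_count}
        counts.update(prefixes)
    return {p: c for p, c in counts.items() if c >= min_count[len(p)]}

frequent = frequent_prefixes(baskets, codes, min_count)
```

The surviving prefixes mix levels: "2" (all milk) and "111" (white bread) are both frequent, so a cross-level rule such as white bread → milk can then be evaluated on them. Using a lower threshold for deeper levels is one common choice, since specific items are naturally rarer than whole categories.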