Questions:
- Explain KDD
- There is an alternative approach called CRISP-DM. Can you explain the differences? -> It contains a loop: after the evaluation we go back and reiterate the previous steps to improve based on what we learned
- How can we compare how closely two variables are related? -> Pearson correlation, covariance, Chi^2
- Explain Chi^2 test
- How would we compare two continuous-valued variables? -> Kolmogorov-Smirnov
- What's the difference between Mann-Whitney and K-S?
- What paradigms have we seen for outlier detection? -> Density, distance, statistical, hierarchical
- What's the problem with the statistical approach? -> We have to assume some model for the data, which is very difficult, especially for multivariate distributions
- Explain LOF
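- For which values of LOF is a point an outlier? -> The higher the LOF, the higher the outlierness; values around 1 mean the point is about as dense as its neighbors, values well above 1 indicate an outlier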
- How do we choose the k for LOF? -> Rule of thumb: 2*d; otherwise we need some ground truth to decide
- Explain the difference between trivial and non-trivial outliers
- Explain the difference between strong and weak outliers
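
For the LOF questions above, a minimal brute-force sketch in plain Python may help with recalling the definitions (k-distance, reachability distance, local reachability density, LOF). The function name `lof_scores`, the toy data, and the choice k = 4 (the 2*d rule of thumb, reading d as the number of dimensions) are illustrative assumptions, not taken from the lecture. The sketch also assumes there are no duplicate points, since those would give a zero reachability sum.

```python
import math


def lof_scores(points, k):
    """Return the LOF score of every point (brute force, O(n^2) distances)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # distance from each point to its k-th nearest neighbor
    neighbors = []   # indices of all points within that k-distance (ties kept)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Toy data (assumption): a dense 3x3 grid plus one isolated point.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = lof_scores(points, k=4)

for p, s in zip(points, scores):
    print(p, round(s, 2))
```

On this toy data the grid points end up with scores near 1, while the isolated point gets a score well above 1, which matches the "higher LOF, higher outlierness" answer.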