Northwestern Polytechnic University Anomaly Detection Discussion Questions
YOUR ANSWERS MUST APPEAR WITHIN THIS PROBLEM DOCUMENT.
YOU MUST WRITE USING YOUR OWN WORDS. ANSWERS TAKEN FROM THE INTERNET OR ANSWERS THAT MATCH ANOTHER STUDENT'S WILL RECEIVE ZERO (0) POINTS.
10% WILL BE DEDUCTED IF YOU CREATE A NEW OR SEPARATE DOCUMENT.
10% WILL BE DEDUCTED IF YOU CREATE A “TITLE PAGE” TYPE OF DOCUMENT.
1. Many statistical tests for outliers were developed in an environment in which a few hundred observations constituted a large data set. Here we explore the limitations of such approaches.
(a) For a set of 1,000,000 values, how likely are we to have outliers according to the test that says a value is an outlier if it is more than three standard deviations from the average? (Assume a normal distribution.)
ANSWER:
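The arithmetic behind part (a) can be checked with a short sketch. It assumes the values are exactly normally distributed and that the true mean and standard deviation are known (in practice both are estimated from the data); the helper name `two_sided_tail` is illustrative only. Under those assumptions about 0.27% of values fall beyond three standard deviations, so roughly 2,700 of a million values would be flagged even if none is a genuine anomaly.

```python
import math

def two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))

n = 1_000_000
p = two_sided_tail(3.0)            # fraction of values beyond 3 standard deviations
expected = n * p                   # expected number of values flagged as "outliers"
p_at_least_one = 1 - (1 - p) ** n  # chance that at least one value is flagged

print(f"P(|Z| > 3)           = {p:.6f}")
print(f"expected flagged     = {expected:.0f} of {n:,}")
print(f"P(at least one flag) = {p_at_least_one:.6f}")
```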
(b) Does the approach that states an outlier is an object of unusually low probability need to be adjusted when dealing with large data sets? If so, how?
ANSWER:
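One possible adjustment for part (b), offered here as an assumption rather than as the textbook's prescribed answer, is a Bonferroni-style correction: instead of a fixed cutoff, pick the threshold $k$ so that the expected number of normal values flagged across the whole data set, $n \cdot P(|Z| > k)$, stays below a chosen budget $\alpha$. The sketch below finds that $k$ by bisection, again assuming exact normality; the function names and the example budgets are hypothetical.

```python
import math

def two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))

def threshold_for_false_alarms(n: int, alpha: float) -> float:
    """Smallest k (in standard deviations) with n * P(|Z| > k) <= alpha.

    Hypothetical helper: uses bisection, relying on the tail probability
    decreasing monotonically in k.
    """
    lo, hi = 0.0, 40.0  # n * P(|Z| > 40) underflows to 0, so hi always satisfies the bound
    for _ in range(200):
        mid = (lo + hi) / 2
        if n * two_sided_tail(mid) > alpha:
            lo = mid  # still too many expected false alarms: raise the cutoff
        else:
            hi = mid  # bound satisfied: try a lower cutoff
    return hi

n = 1_000_000
k_one = threshold_for_false_alarms(n, 1.0)      # about one false alarm expected
k_strict = threshold_for_false_alarms(n, 0.05)  # 5% chance of any false alarm (approx.)
print(f"alpha = 1.00 -> flag beyond {k_one:.2f} standard deviations")
print(f"alpha = 0.05 -> flag beyond {k_strict:.2f} standard deviations")
```

The point of the design is that the cutoff grows with $n$: for a million values it moves from 3 to roughly 5 standard deviations.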
2. Consider the (relative distance) K-means scheme for outlier detection described in Section 10.5 and the accompanying figure.
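To make the scheme concrete, here is a minimal sketch assuming the relative-distance score is a point's distance to its nearest centroid divided by the median distance of that centroid's assigned points. The centroids are taken as given (as if produced by K-means) and held fixed for illustration; the function name and the toy data are hypothetical. It uses NumPy.

```python
import numpy as np

def relative_distance_scores(X, centroids):
    """Assign each point to its nearest centroid and score it by
    (distance to that centroid) / (median distance of the cluster's points)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise point-to-centroid distances, shape (n_points, n_centroids)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    dist = d[np.arange(len(X)), labels]
    scores = np.empty(len(X))
    for k in range(len(centroids)):
        mask = labels == k
        if mask.any():
            med = np.median(dist[mask])
            # Guard against a zero median (all points sitting on the centroid)
            scores[mask] = dist[mask] / max(med, 1e-12)
    return labels, dist, scores

# Toy data: a tight cluster around (0, 0) and a loose cluster around (10, 0)
tight = [(0, 0.1), (0, -0.1), (0.1, 0), (-0.1, 0)]
loose = [(10, 2), (10, -2), (12, 0), (8, 0)]
A = (0, 1)   # 1 unit from the tight cluster's centroid
B = (10, 5)  # 5 units from the loose cluster's centroid
X = np.array(tight + loose + [A, B])
centroids = np.array([(0, 0), (10, 0)])

labels, dist, scores = relative_distance_scores(X, centroids)
print(f"A: distance {dist[-2]:.2f}, relative score {scores[-2]:.2f}")
print(f"B: distance {dist[-1]:.2f}, relative score {scores[-1]:.2f}")
```

Point A is closer to its centroid in absolute terms (1 versus 5), yet it scores about 10 while B scores about 2.5, because each distance is scaled by its own cluster's median distance (0.1 for the tight cluster, 2 for the loose one).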
(a) The points at the bottom of the compact cluster shown in the figure have a somewhat higher outlier score than those points at the top of the compact cluster. Why?
ANSWER:
(b) Suppose that we choose the number of clusters to be much larger, e.g., 10. Would the proposed technique still be effective in finding the most extreme outlier at the top of the figure? Why or why not?
ANSWER:
(c) The use of relative distance adjusts for differences in density. Give an example of where such an approach might lead to the wrong conclusion.
ANSWER: