Type II Error in statistical hypothesis testing is incorrectly retaining a false null hypothesis (a “false negative”). A type II error (or error of the second kind) is the failure to reject a false null hypothesis. Examples of type II errors would be a blood test failing to detect the disease it was designed to detect, in a patient who really has the disease; a…

Read More## Category: GLOSSARY

Type I Error in statistical hypothesis testing is the incorrect rejection of a true null hypothesis (a false positive). More simply stated, a type I error is detecting an effect that is not present. A type I error (or error of the first kind) is the incorrect rejection of a true null hypothesis. Usually, a type I error leads one to conclude that a supposed…

Read MoreTrue Positive Rate (Sensitivity) is a statistical measure which measures the proportion of positives that are correctly identified as such (for example, the percentage of sick people who are correctly identified as having the condition). Another way to understand it, with examples in the context of medical tests is that sensitivity is the extent to which true positives are not missed/overlooked. Thus a highly sensitive…

Read MoreTrue Negative Rate (Specificity) is a statistical measure which measures the proportion of negatives that are correctly identified as such (for example, the percentage of healthy people who are correctly identified as not having the condition). Specificity is the extent to which positives really represent the condition of interest and not some other condition being mistaken for it. A highly specific test rarely registers a…

Read MoreThree Sigma Rule in the empirical sciences express a conventional heuristic that “nearly all” values are taken to lie within three standard deviations of the mean, i.e. that it is empirically useful to treat 99.7% probability as “near certainty”.The rule states that even for non-normally distributed variables, at least 88.8% of cases should fall within properly-calculated three-sigma intervals. It follows from Chebyshev’s Inequality. For unimodal…

Read MoreSupport Vector Machines (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good…

Read MoreSupervised Learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and the desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which…

Read MoreStatistical Significance in statistical hypothesis testing is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study. The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. The significance level, α, is the probability of rejecting the null hypothesis, given that it is true….

Read MoreStatistical Power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Statistical power is inversely related to beta or the probability of making a Type II error. The power is a function of the possible distributions, often determined by a parameter, under the alternative hypothesis. As the power increases, there are decreasing chances of a…

Read MoreSentiment Analysis refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. Generally…

Read More