Study Notes: scikit-learn - 浩然119 - 博客园
- https://www.cnblogs.com/pegasus923/p/9997485.html
- 3. Model selection and evaluation — scikit-learn 0.20.3 documentation
- https://scikit-learn.org/stable/model_selection.html#model-selection
Accuracy paradox - Wikipedia
- https://en.wikipedia.org/wiki/Accuracy_paradox
- The accuracy paradox is the finding that accuracy is not a good metric for predictive models when classifying in predictive analytics. This is because a simple model may have a high level of accuracy but be too crude to be useful. For example, if the incidence of category A is dominant, being found in 99% of cases, then predicting that every case is category A will have an accuracy of 99%. Precision and recall are better measures in such cases. The underlying issue is that class priors need to be accounted for in error analysis. Precision and recall help, but precision too can be biased by very unbalanced class priors in the test sets.
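- A minimal sketch of the paradox (synthetic data, not from the article): a model that always predicts the dominant category A reaches ~99% accuracy while finding none of the rare cases. The `zero_division` argument assumes scikit-learn ≥ 0.22.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)  # ~1% rare positives, ~99% category A (0)
y_pred = np.zeros_like(y_true)                    # always predict the dominant category A

print("accuracy :", accuracy_score(y_true, y_pred))    # ~0.99 despite a useless model
print("recall   :", recall_score(y_true, y_pred))      # 0.0 -- no rare cases are found
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0 (no positive predictions)
```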
Confusion matrix - Wikipedia
- https://en.wikipedia.org/wiki/Confusion_matrix
- In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa). The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
- It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).
- condition positive (P): the number of real positive cases in the data
- condition negative (N): the number of real negative cases in the data
- true positive (TP): eqv. with hit
- true negative (TN): eqv. with correct rejection
- false positive (FP): eqv. with false alarm, Type I error
- false negative (FN): eqv. with miss, Type II error
- sensitivity, recall, hit rate, or true positive rate (TPR): TPR = TP / P = TP / (TP + FN)
- specificity, selectivity, or true negative rate (TNR): TNR = TN / N = TN / (TN + FP)
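- A minimal sketch (labels invented for illustration) of how these counts map onto `sklearn.metrics.confusion_matrix`; note that scikit-learn puts actual classes in rows and predicted classes in columns.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # condition positive P = 4, condition negative N = 6
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]  # hypothetical classifier output

# With labels=[0, 1] the returned matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} (hits), TN={tn} (correct rejections), "
      f"FP={fp} (false alarms, Type I), FN={fn} (misses, Type II)")
```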
Sensitivity and specificity - Wikipedia
- https://en.wikipedia.org/wiki/Sensitivity_and_specificity
- Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function:
- Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
- Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
- In general, Positive = identified and negative = rejected. Therefore:
- True positive = correctly identified
- False positive = incorrectly identified
- True negative = correctly rejected
- False negative = incorrectly rejected
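- A short sketch (same invented labels as above) computing sensitivity and specificity with `recall_score`; specificity is simply the recall of the negative class.

```python
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

sensitivity = recall_score(y_true, y_pred)               # TP / (TP + FN) = 2/4
specificity = recall_score(y_true, y_pred, pos_label=0)  # TN / (TN + FP) = 5/6
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```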
Precision and recall - Wikipedia
- https://en.wikipedia.org/wiki/Precision_and_recall
- In pattern recognition, information retrieval and classification (machine learning), precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that have been retrieved over the total amount of relevant instances. Both precision and recall are therefore based on an understanding and measure of relevance.
- Suppose a computer program for recognizing dogs in photographs identifies 8 dogs in a picture containing 12 dogs and some cats. Of the 8 identified as dogs, 5 actually are dogs (true positives), while the rest are cats (false positives). The program's precision is 5/8 while its recall is 5/12. When a search engine returns 30 pages, only 20 of which were relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. So, in this case, precision is "how useful the search results are", and recall is "how complete the results are".
- In statistics, if the null hypothesis is that all items are irrelevant (where the hypothesis is accepted or rejected based on the number selected compared with the sample size), absence of type I and type II errors (i.e.: perfect sensitivity and specificity of 100% each) corresponds respectively to perfect precision (no false positive) and perfect recall (no false negative). The above pattern recognition example contained 8 − 5 = 3 type I errors and 12 − 5 = 7 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. The exact relationship of sensitivity and specificity to precision depends on the percent of positive cases in the population.
- In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant ones, while high recall means that an algorithm returned most of the relevant results.
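- The dog example above reproduced with scikit-learn (the two cats the program correctly ignores are an added assumption; the article only says "some cats"):

```python
from sklearn.metrics import precision_score, recall_score

# 1 = dog, 0 = cat. 12 dogs in the photo; the program flags 8 objects as dogs,
# 5 of them correctly (TP=5, FP=3, FN=7), and correctly ignores 2 cats (TN=2).
y_true = [1] * 12 + [0] * 5
y_pred = [1] * 5 + [0] * 7 + [1] * 3 + [0] * 2

print("precision:", precision_score(y_true, y_pred))  # 5/8 = 0.625
print("recall   :", recall_score(y_true, y_pred))     # 5/12 ≈ 0.417
```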
The 2×2 layout and the measures derived from it:

| | True condition: positive | True condition: negative |
| --- | --- | --- |
| Predicted condition: positive | True positive (TP) | False positive (FP), Type I error |
| Predicted condition: negative | False negative (FN), Type II error | True negative (TN) |

- Prevalence = Σ Condition positive / Σ Total population
- Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
- Positive predictive value (PPV), Precision = Σ True positive / Σ Predicted condition positive
- False discovery rate (FDR) = Σ False positive / Σ Predicted condition positive
- False omission rate (FOR) = Σ False negative / Σ Predicted condition negative
- Negative predictive value (NPV) = Σ True negative / Σ Predicted condition negative
- True positive rate (TPR), Recall, Sensitivity, probability of detection = Σ True positive / Σ Condition positive
- False positive rate (FPR), Fall-out, probability of false alarm = Σ False positive / Σ Condition negative
- False negative rate (FNR), Miss rate = Σ False negative / Σ Condition positive
- Specificity (SPC), Selectivity, True negative rate (TNR) = Σ True negative / Σ Condition negative
- Positive likelihood ratio (LR+) = TPR / FPR
- Negative likelihood ratio (LR−) = FNR / TNR
- Diagnostic odds ratio (DOR) = LR+ / LR−
- F1 score = 2 · Precision · Recall / (Precision + Recall)
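- A small numeric sketch (counts chosen arbitrarily, not from the sources above) showing how the derived measures in this table are computed from the four raw counts:

```python
tp, fp, fn, tn = 90, 10, 30, 870          # arbitrary example counts
P, N = tp + fn, tn + fp                   # condition positive / condition negative

accuracy  = (tp + tn) / (P + N)
precision = tp / (tp + fp)                # PPV
tpr       = tp / P                        # sensitivity / recall
fpr       = fp / N                        # fall-out
tnr       = tn / N                        # specificity
f1        = 2 * precision * tpr / (precision + tpr)
lr_plus   = tpr / fpr                     # positive likelihood ratio
lr_minus  = (fn / P) / tnr                # negative likelihood ratio
dor       = lr_plus / lr_minus            # diagnostic odds ratio

print(f"ACC={accuracy:.3f} PPV={precision:.3f} TPR={tpr:.3f} FPR={fpr:.4f} "
      f"F1={f1:.3f} LR+={lr_plus:.1f} DOR={dor:.1f}")
```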
Receiver operating characteristic - Wikipedia
- https://en.wikipedia.org/wiki/Receiver_operating_characteristic
- A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
- The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall, or probability of detection in machine learning. The false-positive rate is also known as the fall-out or probability of false alarm and can be calculated as (1 − specificity). It can also be thought of as a plot of the power as a function of the Type I error of the decision rule (when the performance is calculated from just a sample of the population, it can be thought of as estimators of these quantities). The ROC curve is thus the sensitivity as a function of fall-out. In general, if the probability distributions for both detection and false alarm are known, the ROC curve can be generated by plotting the cumulative distribution function (area under the probability distribution from −∞ to the discrimination threshold) of the detection probability on the y-axis versus the cumulative distribution function of the false-alarm probability on the x-axis.
- ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution. ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.
- The ROC curve was first developed by electrical engineers and radar engineers during World War II for detecting enemy objects in battlefields and was soon introduced to psychology to account for perceptual detection of stimuli. ROC analysis since then has been used in medicine, radiology, biometrics, forecasting of natural hazards, meteorology, model performance assessment, and other areas for many decades and is increasingly used in machine learning and data mining research.
- The ROC is also known as a relative operating characteristic curve, because it is a comparison of two operating characteristics (TPR and FPR) as the criterion changes.
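- A minimal scikit-learn sketch (synthetic data and a plain LogisticRegression are assumptions) of drawing an ROC curve and computing AUC; `roc_curve` sweeps the discrimination threshold over the predicted scores:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # score of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # TPR vs FPR at each threshold
print("AUC =", roc_auc_score(y_test, scores))

plt.plot(fpr, tpr, label="ROC")
plt.plot([0, 1], [0, 1], "--", label="chance")    # diagonal = random guessing
plt.xlabel("False positive rate (fall-out)")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```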
Machine Learning with Python: Confusion Matrix in Machine Learning with Python
- https://www.python-course.eu/confusion_matrix.php
Study Notes: Machine Learning Crash Course | Google Developers - 浩然119 - 博客园
- https://www.cnblogs.com/pegasus923/p/10508444.html
- Classification: ROC Curve and AUC | Machine Learning Crash Course | Google Developers
- https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
- A quick-start style introduction; the fundamentals are covered very systematically.
Precision and Recall, ROC Curves and PR Curves - 刘建平Pinard - 博客园
- https://www.cnblogs.com/pinard/p/5993450.html
- Mainly a conceptual introduction.
Classification Performance Metrics in Machine Learning: ROC Curve, AUC, Accuracy, Recall - 简书
- https://www.jianshu.com/p/c61ae11cc5f6
- https://zhwhong.cn/2017/04/14/ROC-AUC-Precision-Recall-analysis/
- A detailed introduction to ROC curves, the effect of the threshold, and AUC, with figures that make them easier to understand.
What are the respective pros and cons of precision, recall, F1 score, ROC, and AUC? - 知乎
- https://www.zhihu.com/question/30643044
- Explains the adjustment of ROC / PR / threshold in great detail and very clearly.
A Basic Summary of Model Evaluation Methods - AI遇见机器学习
- https://mp.weixin.qq.com/s/nZfu90fOwfNXx3zRtRlHFA
- An introduction to the basic concepts.
- I. Hold-out method
- II. Cross-validation
- 1. Simple cross-validation
- 2. S-fold cross-validation
- 3. Leave-one-out cross-validation
- III. Bootstrapping
- IV. Parameter tuning and the final model (a scikit-learn sketch of these evaluation and tuning steps follows this list)
- When studying algorithms, we also frequently encounter parameters that need to be set (such as the step size in gradient ascent), and different parameter configurations often affect the model's performance. This setting of algorithm parameters is what we usually call "parameter tuning" (调参).
- Machine learning involves two kinds of parameters:
- The first kind are parameters that we set manually, called hyperparameters; there are usually no more than about 10 of them.
- The other kind are model parameters; there can be very many of them, and large deep learning models may even have tens of billions of parameters.
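- A minimal sketch (the iris data and LogisticRegression are assumptions, not from the article) of the hold-out, cross-validation, and parameter-tuning steps listed above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, LeaveOneOut,
                                     cross_val_score, train_test_split)

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# I. Hold-out: a single random train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
print("hold-out accuracy :", clf.fit(X_tr, y_tr).score(X_te, y_te))

# II. S-fold cross-validation (S = 5) and leave-one-out cross-validation
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
print("LOO CV accuracy   :", cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())

# IV. Parameter tuning: grid-search the hyperparameter C with cross-validation,
# then evaluate the refit final model on the held-out test set.
grid = GridSearchCV(clf, param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_tr, y_tr)
print("best C:", grid.best_params_, "test accuracy:", grid.score(X_te, y_te))
```

- (The bootstrap in item III has no single scikit-learn helper; one way to sketch it is `sklearn.utils.resample`, sampling n rows with replacement and evaluating on the rows left out.)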
A Comprehensive Guide to Model Performance Evaluation Methods - 机器学习算法与自然语言处理
- https://mp.weixin.qq.com/s/5kWdmi8LgdDTjJ40lqz9_A
- Summarizes each method, with formulas and figures.
- Evaluating a model requires not only effective and feasible experimental estimation methods, but also criteria that measure the model's generalization ability; these are called performance measures.
- Performance measures reflect the task requirements. When comparing the capabilities of different models, using different performance measures often leads to different conclusions; in other words, how good a model is is relative. Which model is "suitable" depends not only on the algorithm and the data but also on the task requirements. The performance measures described in this chapter start from the task requirements and are used to evaluate models.
- I. Mean squared error
- II. Error rate and accuracy
- III. Precision and recall
- IV. Break-Even Point (BEP) and F1
- V. Combining multiple binary confusion matrices
- VI. ROC and AUC
- VII. Cost-sensitive error rate and cost curves
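- A short sketch (illustrative labels and scores only) of two of the measures listed above: the precision-recall curve with its break-even point (BEP), and F1 at the conventional 0.5 threshold:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9, 0.6, 0.3])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Break-even point: the point on the PR curve where precision ≈ recall
bep = np.argmin(np.abs(precision - recall))
print(f"BEP: precision={precision[bep]:.2f}, recall={recall[bep]:.2f}")

# F1 score at the default 0.5 cut-off
print("F1 @ 0.5:", f1_score(y_true, (scores >= 0.5).astype(int)))
```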
How to tune the threshold to get a different confusion matrix?
- Note: be careful to avoid overfitting (see the threshold-sweeping sketch after the links below).
- classification - Scikit - changing the threshold to create multiple confusion matrixes - Stack Overflow
- https://stackoverflow.com/questions/32627926/scikit-changing-the-threshold-to-create-multiple-confusion-matrixes
- python - scikit .predict() default threshold - Stack Overflow
- https://stackoverflow.com/questions/19984957/scikit-predict-default-threshold
- python - How to set a threshold for a sklearn classifier based on ROC results? - Stack Overflow
- https://stackoverflow.com/questions/41864083/how-to-set-a-threshold-for-a-sklearn-classifier-based-on-roc-results?noredirect=1&lq=1
- python - how to set threshold to scikit learn random forest model - Stack Overflow
- https://stackoverflow.com/questions/49785904/how-to-set-threshold-to-scikit-learn-random-forest-model
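- A sketch of the technique discussed in these links (the model and data are assumptions): `.predict()` implicitly cuts `predict_proba` at 0.5, so to get a different confusion matrix you threshold the probabilities yourself. Pick the threshold on validation data, not the test set, to avoid the overfitting warned about above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]      # class-1 probabilities; .predict() would cut at 0.5

for threshold in (0.3, 0.5, 0.7):          # one confusion matrix per candidate threshold
    y_pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold}\n{confusion_matrix(y_te, y_pred)}")
```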