Study Notes: Model selection and evaluation
Published: 2019-06-24


Study Notes: scikit-learn - 浩然119 - 博客园

  • https://www.cnblogs.com/pegasus923/p/9997485.html
  • 3. Model selection and evaluation — scikit-learn 0.20.3 documentation
    • https://scikit-learn.org/stable/model_selection.html#model-selection

Accuracy paradox - Wikipedia

  • https://en.wikipedia.org/wiki/Accuracy_paradox
  • The accuracy paradox is the paradoxical finding that accuracy is not a good metric for predictive models when classifying in predictive analytics. This is because a simple model may have a high level of accuracy but be too crude to be useful. For example, if the incidence of category A is dominant, being found in 99% of cases, then predicting that every case is category A will have an accuracy of 99%. Precision and recall are better measures in such cases. The underlying issue is that class priors need to be accounted for in error analysis. Precision and recall help, but precision too can be biased by very unbalanced class priors in the test sets.
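A tiny sketch (my own, not from the cited article) of the paradox in scikit-learn terms: on a 99%-negative toy dataset, a model that always predicts the majority class reaches 99% accuracy while detecting none of the positives. The dataset and numbers are invented for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Imbalanced labels: 99% of cases are class 0 (negative), 1% are class 1 (positive).
y = np.array([0] * 990 + [1] * 10)
X = np.zeros((1000, 1))          # the features are irrelevant for this demonstration

# A trivial "model" that always predicts the majority class.
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = clf.predict(X)

print(accuracy_score(y, y_pred))   # 0.99 -- looks excellent
print(recall_score(y, y_pred))     # 0.0  -- yet every positive case is missed
```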

Confusion matrix - Wikipedia

  • https://en.wikipedia.org/wiki/Confusion_matrix
  • In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa). The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
  • It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).
  • condition positive (P) the number of real positive cases in the data
  • condition negative (N) the number of real negative cases in the data
  • true positive (TP) eqv. with hit
  • true negative (TN) eqv. with correct rejection
  • false positive (FP) eqv. with false alarm, Type I error
  • false negative (FN) eqv. with miss, Type II error
  • sensitivity, recall, hit rate, or true positive rate (TPR)
    • TPR = TP / P = TP / (TP + FN) = 1 − FNR
  • specificity, selectivity, or true negative rate (TNR)
    • TNR = TN / N = TN / (TN + FP) = 1 − FPR
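For concreteness, a minimal sketch (my own) of reading these counts and rates out of scikit-learn's confusion_matrix; the toy labels are invented. Note that scikit-learn puts true labels in rows and predicted labels in columns, which may be the opposite of the convention quoted above.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # actual classes
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]   # predicted classes

# For a binary problem with labels {0, 1}, ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)                      # 3 1 2 4

tpr = tp / (tp + fn)                       # sensitivity / recall = 1 - FNR
tnr = tn / (tn + fp)                       # specificity = 1 - FPR
print(tpr, tnr)                            # 0.75 0.666...
```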

Sensitivity and specificity - Wikipedia

  • https://en.wikipedia.org/wiki/Sensitivity_and_specificity
  • Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function:
    • Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
    • Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
  • In general, Positive = identified and negative = rejected. Therefore:
    • True positive = correctly identified
    • False positive = incorrectly identified
    • True negative = correctly rejected
    • False negative = incorrectly rejected
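As a small follow-up sketch (my own, reusing the toy labels from the earlier confusion matrix example): scikit-learn reports sensitivity and specificity directly, because sensitivity is the recall of the positive class and specificity is the recall of the negative class.

```python
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# Sensitivity = recall of the positive class; specificity = recall of the negative class.
sensitivity = recall_score(y_true, y_pred, pos_label=1)   # 3/4 = 0.75
specificity = recall_score(y_true, y_pred, pos_label=0)   # 4/6 ≈ 0.67
print(sensitivity, specificity)
```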

 
Precision and recall - Wikipedia
  • https://en.wikipedia.org/wiki/Precision_and_recall
  • In pattern recognition, information retrieval and classification, precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that have been retrieved over the total amount of relevant instances. Both precision and recall are therefore based on an understanding and measure of relevance.
  • Suppose a computer program for recognizing dogs in photographs identifies 8 dogs in a picture containing 12 dogs and some cats. Of the 8 identified as dogs, 5 actually are dogs (true positives), while the rest are cats (false positives). The program's precision is 5/8 while its recall is 5/12. When a search engine returns 30 pages only 20 of which were relevant while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. So, in this case, precision is "how useful the search results are", and recall is "how complete the results are".
  • In statistics, if the null hypothesis is that all items are irrelevant (where the hypothesis is accepted or rejected based on the number selected compared with the sample size), absence of type I and type II errors (i.e.: perfect specificity and sensitivity of 100% each) corresponds respectively to perfect precision (no false positive) and perfect recall (no false negative). The above pattern recognition example contained 8 − 5 = 3 type I errors and 12 − 5 = 7 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. The exact relationship of sensitivity and specificity to precision depends on the percent of positive cases in the population.
  • In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant ones, while high recall means that an algorithm returned most of the relevant results.
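A quick check of the dog example above with scikit-learn (my own sketch; the example does not say how many cats are in the photo, so 8 cats are assumed here purely to build the label vectors — the number of true negatives does not affect precision or recall).

```python
from sklearn.metrics import precision_score, recall_score

# 12 actual dogs (1) and an assumed 8 cats (0) in the photo.
y_true = [1] * 12 + [0] * 8
# The program flags 5 real dogs (TP), misses 7 dogs (FN), and flags 3 cats as dogs (FP).
y_pred = [1] * 5 + [0] * 7 + [1] * 3 + [0] * 5

print(precision_score(y_true, y_pred))   # 5/8  = 0.625
print(recall_score(y_true, y_pred))      # 5/12 ≈ 0.417
```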

Derived metrics (Wikipedia's confusion matrix table, reformatted as a list):

  • Prevalence = Σ Condition positive / Σ Total population
  • Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
  • Precision, Positive predictive value (PPV) = Σ True positive / Σ Predicted condition positive
  • False discovery rate (FDR) = Σ False positive / Σ Predicted condition positive
  • Negative predictive value (NPV) = Σ True negative / Σ Predicted condition negative
  • False omission rate (FOR) = Σ False negative / Σ Predicted condition negative
  • True positive rate (TPR), Recall, Sensitivity, probability of detection = Σ True positive / Σ Condition positive
  • False positive rate (FPR), Fall-out, probability of false alarm = Σ False positive / Σ Condition negative
  • False negative rate (FNR), Miss rate = Σ False negative / Σ Condition positive
  • True negative rate (TNR), Specificity (SPC), Selectivity = Σ True negative / Σ Condition negative
  • Positive likelihood ratio (LR+) = TPR / FPR
  • Negative likelihood ratio (LR−) = FNR / TNR
  • Diagnostic odds ratio (DOR) = LR+ / LR−
  • F1 score = 2 · Precision · Recall / (Precision + Recall)
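For reference, a short Python sketch (my own) that evaluates the formulas above on invented counts, which makes the relationships between the metrics easy to verify by hand.

```python
# Invented counts, chosen only to exercise the formulas listed above.
TP, FP, FN, TN = 40, 10, 20, 130
P, N = TP + FN, TN + FP                 # condition positive / condition negative
total = P + N

prevalence = P / total                  # 0.30
accuracy = (TP + TN) / total            # 0.85
ppv = TP / (TP + FP)                    # precision                 = 0.80
fdr = FP / (TP + FP)                    # false discovery rate      = 0.20
npv = TN / (TN + FN)                    # negative predictive value ≈ 0.867
fomr = FN / (TN + FN)                   # false omission rate       ≈ 0.133
tpr = TP / P                            # recall / sensitivity      ≈ 0.667
fpr = FP / N                            # fall-out                  ≈ 0.071
fnr = FN / P                            # miss rate                 ≈ 0.333
tnr = TN / N                            # specificity               ≈ 0.929
lr_pos = tpr / fpr                      # positive likelihood ratio
lr_neg = fnr / tnr                      # negative likelihood ratio
dor = lr_pos / lr_neg                   # diagnostic odds ratio
f1 = 2 * ppv * tpr / (ppv + tpr)        # F1 score                  ≈ 0.727
print(accuracy, ppv, tpr, f1)
```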

Receiver operating characteristic - Wikipedia

  • https://en.wikipedia.org/wiki/Receiver_operating_characteristic
  • A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
  • The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall or probability of detection in machine learning. The false-positive rate is also known as the fall-out or probability of false alarm and can be calculated as (1 − specificity). It can also be thought of as a plot of the power as a function of the Type I error of the decision rule (when the performance is calculated from just a sample of the population, it can be thought of as estimators of these quantities). The ROC curve is thus the sensitivity as a function of fall-out. In general, if the probability distributions for both detection and false alarm are known, the ROC curve can be generated by plotting the cumulative distribution function (area under the probability distribution from −∞ to the discrimination threshold) of the detection probability on the y-axis versus the cumulative distribution function of the false-alarm probability on the x-axis.
  • ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution. ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.
  • The ROC curve was first developed by electrical engineers and radar engineers during World War II for detecting enemy objects in battlefields and was soon introduced to psychology to account for perceptual detection of stimuli. ROC analysis since then has been used in medicine, radiology, biometrics, forecasting of natural hazards, meteorology, model performance assessment, and other areas for many decades and is increasingly used in machine learning and data mining research.
  • The ROC is also known as a relative operating characteristic curve, because it is a comparison of two operating characteristics (TPR and FPR) as the criterion changes.
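A minimal sketch (my own, on synthetic data) of computing the ROC curve points and the AUC with scikit-learn; the dataset and the classifier are assumptions made only to have scores to threshold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced binary data.
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]        # probability of the positive class

# One (FPR, TPR) point per threshold; the AUC summarizes the whole curve.
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))

# Example read-off: the best TPR achievable while keeping FPR at or below 5%.
print("TPR @ FPR <= 0.05:", tpr[fpr <= 0.05].max())
```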

Machine Learning with Python: Confusion Matrix in Machine Learning with Python

  • https://www.python-course.eu/confusion_matrix.php

Study Notes: Machine Learning Crash Course | Google Developers - 浩然119 - 博客园

  • https://www.cnblogs.com/pegasus923/p/10508444.html
  • Classification: ROC Curve and AUC  |  Machine Learning Crash Course  |  Google Developers
    • https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc  
    • A quick-start introduction; the fundamentals are covered very systematically.

Precision and Recall, ROC Curves and PR Curves - 刘建平Pinard - 博客园

  • https://www.cnblogs.com/pinard/p/5993450.html
  • Mainly a conceptual introduction.

Classification Performance Metrics in Machine Learning: ROC Curve, AUC, Accuracy, Recall - 简书

  • https://www.jianshu.com/p/c61ae11cc5f6
  • https://zhwhong.cn/2017/04/14/ROC-AUC-Precision-Recall-analysis/
  • A detailed, illustrated explanation of ROC curves, the effect of the threshold, and AUC; the figures help understanding.

What are the respective pros and cons of precision, recall, F1 score, ROC, and AUC? - 知乎

  • https://www.zhihu.com/question/30643044
  • Explains ROC, PR, and threshold adjustment in detail and on point.

A Basic Summary of Model Evaluation Methods - AI遇见机器学习

  • https://mp.weixin.qq.com/s/nZfu90fOwfNXx3zRtRlHFA
  • Introduces the basic concepts (a code sketch of these methods follows the list below).
  • 1. Hold-out method
  • 2. Cross-validation
    • 1. Simple cross-validation
    • 2. S-fold cross-validation
    • 3. Leave-one-out cross-validation
  • 3. Bootstrapping
  • 4. Parameter tuning and the final model
    • In learning algorithms we often encounter parameters that have to be set (such as the step size in gradient ascent), and different parameter configurations often affect model performance. Setting these algorithm parameters is what we usually call "parameter tuning".
    • Machine learning involves two kinds of parameters:
      • The first kind are parameters we set manually, called hyperparameters; there are usually no more than about ten of them.
      • The other kind are model parameters, which can be very numerous; large deep learning models can have tens of billions of them.
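As referenced above, a brief sketch (my own, on scikit-learn's bundled breast-cancer dataset) of the hold-out split, S-fold cross-validation, and hyperparameter tuning; the particular model and parameter grid are assumptions made only for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold-out: a single train/test split; the test part is reserved for the final estimate.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print("hold-out accuracy:", model.fit(X_train, y_train).score(X_test, y_test))

# S-fold (here 5-fold) cross-validation on the training portion only.
print("5-fold CV accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())

# Parameter tuning: grid search over the regularization strength C, scored by
# cross-validation, then evaluated once on the held-out test set.
grid = GridSearchCV(model, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_, "test accuracy:", grid.score(X_test, y_test))
```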

A Thorough Understanding of Model Performance Evaluation Methods - 机器学习算法与自然语言处理

  • https://mp.weixin.qq.com/s/5kWdmi8LgdDTjJ40lqz9_A
  • Summarizes and introduces each method, with formulas and figures.
  • Evaluating a model requires not only valid and feasible experimental estimation methods, but also evaluation criteria that measure the model's generalization ability; these are called performance measures.
  • Performance measures reflect the requirements of the task: when comparing the abilities of different models, different performance measures often lead to different verdicts. In other words, the quality of a model is relative; which model is "suitable" depends not only on the algorithm and the data but also on the task requirements. The performance measures described in that article start from the task requirements and are used to evaluate models.
  • 1. Mean squared error
  • 2. Error rate and accuracy
  • 3. Precision and recall
  • 4. Break-Even Point (BEP) and F1
  • 5. Combined evaluation over multiple binary confusion matrices
  • 6. ROC and AUC
  • 7. Cost-sensitive error rate and cost curves

How to tune the threshold to get different confusion matrices?

  • Note: be careful to avoid overfitting when tuning the threshold; choose it on validation data rather than the test set (see the sketch after the links below).
  • classification - Scikit - changing the threshold to create multiple confusion matrixes - Stack Overflow
    • https://stackoverflow.com/questions/32627926/scikit-changing-the-threshold-to-create-multiple-confusion-matrixes
  • python - scikit .predict() default threshold - Stack Overflow
    • https://stackoverflow.com/questions/19984957/scikit-predict-default-threshold
  • python - How to set a threshold for a sklearn classifier based on ROC results? - Stack Overflow
    • https://stackoverflow.com/questions/41864083/how-to-set-a-threshold-for-a-sklearn-classifier-based-on-roc-results?noredirect=1&lq=1
  • python - how to set threshold to scikit learn random forest model - Stack Overflow
    • https://stackoverflow.com/questions/49785904/how-to-set-threshold-to-scikit-learn-random-forest-model
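The Stack Overflow threads above converge on the same idea: for a binary probabilistic classifier, predict() is equivalent to thresholding predict_proba at 0.5, so to obtain different confusion matrices you apply your own threshold to the predicted probabilities. A minimal sketch on assumed synthetic data (and, per the note above, the threshold should be chosen on validation data rather than the test set):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]          # P(class 1) for each sample

# Lowering the threshold trades false negatives for more false positives, and vice versa.
for threshold in (0.3, 0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print(f"threshold={threshold}: tp={tp} fp={fp} fn={fn} tn={tn}")
```

From here, the usual next step is to pick the threshold that meets a precision or recall target, e.g. by scanning the outputs of precision_recall_curve or roc_curve on a validation split.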

Reposted from: https://www.cnblogs.com/pegasus923/p/10469919.html
