
German Credit Data Set Arff Download



A couple of days ago I was looking for a well-known dataset, German Credit. It is a good starting point for practicing credit risk scoring. Unfortunately, I found only the original file in .data format, without column names.
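The missing column names can be attached when the file is read. Below is a minimal Python sketch, not the post's R code. It assumes the whitespace-separated layout of the UCI german.data file: 20 attributes followed by the class label, with categories coded as strings such as "A11". The short column names are hypothetical labels chosen here, and the sample line is only illustrative.

```python
# Hypothetical short names for the 20 attributes plus the class label,
# in the order the attributes appear in the UCI german.data file.
COLUMNS = [
    "checking_status", "duration_months", "credit_history", "purpose",
    "credit_amount", "savings", "employment_since", "installment_rate",
    "personal_status_sex", "other_debtors", "residence_since", "property",
    "age_years", "other_installment_plans", "housing", "existing_credits",
    "job", "people_liable", "telephone", "foreign_worker", "class",
]


def parse_german_line(line):
    """Split one whitespace-separated line and attach column names.

    Tokens made only of digits become ints; coded categories such as
    'A11' are kept as strings.
    """
    tokens = line.split()
    if len(tokens) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(tokens)}")
    return {
        name: int(tok) if tok.isdigit() else tok
        for name, tok in zip(COLUMNS, tokens)
    }


def load_german(path):
    """Read a german.data-style file into a list of dicts (one per applicant)."""
    with open(path) as f:
        return [parse_german_line(line) for line in f if line.strip()]


# Illustrative line in the german.data layout.
row = parse_german_line(
    "A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1"
)
print(row["age_years"], row["purpose"], row["class"])  # 67 A43 1
```

The explicit field count check catches truncated or malformed lines early. Without it, zip would silently drop or misalign columns.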

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers].

In our data science course this morning, we used random forests to improve prediction on the German Credit dataset. The dataset is


Almost all variables are treated as numeric, but most of them are actually factors. Let us convert the categorical variables into factors.
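In R, converting a column to a factor makes the model fit one coefficient per level instead of treating the codes as quantities. The Python sketch below approximates this with 0/1 indicator columns. It is not the post's code. The function name dummy_encode and the toy rows are hypothetical, and dropping the first sorted level as the reference is an assumption modeled on R's default treatment contrasts.

```python
def dummy_encode(rows, categorical):
    """Replace each categorical column by 0/1 indicator columns.

    The first level (in sorted order) is dropped and acts as the
    reference level, similar to R's default treatment contrasts.
    """
    levels = {col: sorted({r[col] for r in rows}) for col in categorical}
    encoded = []
    for r in rows:
        # keep the non-categorical columns unchanged
        out = {k: v for k, v in r.items() if k not in categorical}
        for col in categorical:
            for level in levels[col][1:]:
                out[f"{col}={level}"] = 1 if r[col] == level else 0
        encoded.append(out)
    return encoded


rows = [
    {"purpose": "A40", "age": 30},
    {"purpose": "A43", "age": 45},
    {"purpose": "A49", "age": 22},
]
for r in dummy_encode(rows, ["purpose"]):
    print(r)
# {'age': 30, 'purpose=A43': 0, 'purpose=A49': 0}
# {'age': 45, 'purpose=A43': 1, 'purpose=A49': 0}
# {'age': 22, 'purpose=A43': 0, 'purpose=A49': 1}
```

Levels are computed from the rows passed in. In practice they should be taken from the training data, so that training and validation sets share the same indicator columns.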


Let us now create our training/calibration and validation/testing datasets, with proportion 1/3-2/3
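A random split of this kind can be sketched as follows in Python, not the post's R code. The sketch assumes that the smaller third is held out for validation and the remaining two thirds are used for training. The function name and the fixed seed are hypothetical choices, made so that the split is reproducible.

```python
import random


def train_validation_split(n, validation_fraction=1 / 3, seed=1):
    """Return (train_idx, valid_idx): a random partition of range(n)."""
    rng = random.Random(seed)  # local generator, so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    n_valid = round(n * validation_fraction)
    return sorted(idx[n_valid:]), sorted(idx[:n_valid])


train_idx, valid_idx = train_validation_split(1000)
print(len(train_idx), len(valid_idx))  # 667 333
```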

The first model we can fit is a logistic regression on selected covariates.
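To show what fitting a logistic regression involves, here is a minimal Python sketch that maximizes the log-likelihood by batch gradient ascent. This is a swapped-in technique: R's glm uses iteratively reweighted least squares instead. The learning rate, iteration count and toy data are assumptions. Using only the selected covariates, or all of them, simply changes which feature columns go into X.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit intercept + weights by batch gradient ascent on the log-likelihood.

    X: list of feature lists (ideally standardized), y: list of 0/1 labels.
    Returns w, where w[0] is the intercept.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            # residual y - P(y = 1 | x) drives the gradient
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w, X):
    """Predicted probability of label 1 for each row of X."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


# Toy, non-separable data with a single covariate.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
print([round(p, 2) for p in predict_proba(w, X)])
```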

Based on that model, it is possible to draw the ROC curve and to compute the AUC (on the validation dataset).
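The ROC curve plots the true positive rate against the false positive rate as the decision threshold on the predicted score moves from high to low. The AUC is the area under that curve. Below is a minimal Python sketch that takes validation scores and 0/1 labels. Here label 1 marks the class whose probability the model predicts, and the toy scores are made up for illustration.

```python
def roc_curve(scores, labels):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    labels are 1 for positives and 0 for negatives; tied scores are
    processed together, so each distinct threshold yields one point.
    """
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_curve(scores, labels)))  # 8/9, about 0.889
```

The value 8/9 matches the share of (positive, negative) pairs in which the positive gets the higher score. This is the usual probabilistic reading of the AUC.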

An alternative is to consider a logistic regression on all explanatory variables

We might overfit here, and if so, it should show up on the ROC curve computed on the validation dataset.

There is a slight improvement here, compared with the previous model, where only five explanatory variables were considered.

Consider now a regression tree (on all covariates).
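A tree recursively splits the data on the single feature and threshold that best separates the two classes, and each leaf predicts the share of positives it contains. The Python sketch below shows this idea with Gini impurity as the split criterion, one common choice. It is a stand-in for the R tree used in the post, and the max_depth and min_size stopping rules are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(X, y):
    """Find the (feature, threshold) minimizing weighted Gini impurity, or None."""
    best, best_score, n = None, gini(y), len(y)
    for j in range(len(X[0])):
        # exclude the largest value so both sides are non-empty
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow_tree(X, y, max_depth=3, min_size=5):
    """Grow a binary tree; each leaf stores the share of label 1."""
    leaf = {"p": sum(y) / len(y)}
    if max_depth == 0 or len(y) < min_size:
        return leaf
    split = best_split(X, y)
    if split is None:
        return leaf
    j, t = split
    left_rows = [(xi, yi) for xi, yi in zip(X, y) if xi[j] <= t]
    right_rows = [(xi, yi) for xi, yi in zip(X, y) if xi[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow_tree([a for a, _ in left_rows], [b for _, b in left_rows],
                          max_depth - 1, min_size),
        "right": grow_tree([a for a, _ in right_rows], [b for _, b in right_rows],
                           max_depth - 1, min_size),
    }


def predict_tree(node, x):
    """Follow the splits down to a leaf and return its probability."""
    while "p" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["p"]


X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]
tree = grow_tree(X, y, max_depth=2, min_size=2)
print(tree["threshold"], predict_tree(tree, [1]), predict_tree(tree, [6]))
```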


We can visualize the tree using

The ROC curve for that model is


As expected, a single tree performs worse than a logistic regression. A natural idea is then to grow several trees using a bootstrap procedure and to aggregate their predictions.
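This is bootstrap aggregation (bagging). Each model is fit on a sample of n rows drawn with replacement, and the predicted probabilities are averaged. The Python sketch below uses depth-one trees (stumps) to stay short, which is a simplifying assumption. The optional max_features argument restricts each stump to a random subset of features, mimicking the extra randomization a random forest adds on top of bagging. The function names are hypothetical.

```python
import random


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def fit_stump(X, y, features):
    """Best single split over the given features, as a small dict."""
    n = len(y)
    best = {"p": sum(y) / n}  # fallback: constant prediction
    best_score = gini(y)
    for j in features:
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score = score
                best = {"feature": j, "threshold": t,
                        "p_left": sum(left) / len(left),
                        "p_right": sum(right) / len(right)}
    return best


def predict_stump(stump, x):
    if "p" in stump:
        return stump["p"]
    return stump["p_left"] if x[stump["feature"]] <= stump["threshold"] else stump["p_right"]


def fit_bagged_stumps(X, y, n_trees=50, max_features=None, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max_features or p
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        features = rng.sample(range(p), k)          # random feature subset
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return ensemble


def predict_ensemble(ensemble, x):
    # aggregate by averaging the predicted probabilities
    return sum(predict_stump(s, x) for s in ensemble) / len(ensemble)


X = [[float(i)] for i in range(20)]
y = [0] * 10 + [1] * 10
ens = fit_bagged_stumps(X, y, n_trees=50, seed=0)
print(predict_ensemble(ens, [0.0]), predict_ensemble(ens, [19.0]))
```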


Here this model is (slightly) better than the logistic regression. In fact, if we create many training/validation samples and compare the AUC, we can observe that, on average, random forests perform better than logistic regressions.
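The comparison over many samples can be sketched as a loop. It draws a fresh random split each time, fits both models on the training part and records each model's validation AUC. Below is a minimal Python sketch, not the post's R code. The models are passed in as hypothetical callables of the form (X_train, y_train, X_valid) -> scores. The AUC is computed with the Mann-Whitney formula, ties counting 1/2. The two toy "models" in the usage example only exercise the loop and say nothing about the German Credit data.

```python
import random
import statistics


def auc_mann_whitney(scores, labels):
    """AUC as P(random positive outscores random negative), ties counting 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def compare_models(X, y, fit_predict_a, fit_predict_b, n_rep=100,
                   valid_fraction=1 / 3, seed=0):
    """Repeat random train/validation splits; collect both models' AUCs."""
    rng = random.Random(seed)
    n = len(y)
    n_valid = round(n * valid_fraction)
    auc_a, auc_b = [], []
    for _ in range(n_rep):
        idx = list(range(n))
        rng.shuffle(idx)
        valid, train = idx[:n_valid], idx[n_valid:]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        Xv, yv = [X[i] for i in valid], [y[i] for i in valid]
        if len(set(yv)) < 2:
            continue  # AUC is undefined without both classes
        auc_a.append(auc_mann_whitney(fit_predict_a(Xt, yt, Xv), yv))
        auc_b.append(auc_mann_whitney(fit_predict_b(Xt, yt, Xv), yv))
    return auc_a, auc_b


# Hypothetical toy "models": one scores by the feature, one is uninformative.
def score_by_first_feature(Xt, yt, Xv):
    return [x[0] for x in Xv]


def constant_score(Xt, yt, Xv):
    return [0.0] * len(Xv)


X = [[float(i)] for i in range(30)]
y = [0] * 15 + [1] * 15
a, b = compare_models(X, y, score_by_first_feature, constant_score, n_rep=20)
print(statistics.mean(a), statistics.mean(b))
```

Using the same split for both models in each repetition makes the comparison paired. This removes split-to-split noise from the difference in AUC.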
