Outlier Detection by Boosting Regression Trees

Chèze, Nathalie; Poggi, Jean-Michel

doi:10.18869/acadpub.jsri.3.1.1

Journal of Statistical Research of Iran (JSRI)

Volume 3, Issue 1 (9-2006)

JSRI 2006, 3(1): 1-22

Nathalie Chèze, Jean-Michel Poggi¹

¹ Jean-Michel.Poggi@math.u-psud.fr

Abstract:

A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the observation that is most frequently resampled along the boosting iterations, remove it, and repeat. The selection criterion applies Tchebychev's inequality to the maximum, over the boosting iterations, of the average number of appearances of each observation in the bootstrap samples, so the procedure makes no assumption about the noise distribution. Outliers are thus selected as observations that are particularly hard to predict. Many well-known benchmark data sets are considered, and a comparative study against two well-known competing methods shows the value of the method.
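The abstract describes the procedure only in outline. The Python sketch below illustrates one way to realize it, under assumptions that are ours and not the paper's: a single predictor, least-squares regression stumps instead of full regression trees, Drucker's AdaBoost.R2 reweighting with a linear loss and weighted bootstrap resampling, and a flagging rule that declares the most resampled observation an outlier when its score lies at least t = 1/sqrt(alpha) standard deviations above the mean score. The helper names `fit_stump`, `boosting_scores` and `detect_outliers` and the parameters `n_iter`, `alpha` and `seed` are hypothetical.

```python
# Sketch under stated assumptions (one predictor, stumps, AdaBoost.R2); not the authors' code.
import math
import random
from statistics import mean, pstdev


def fit_stump(xs, ys):
    """Least-squares regression stump on one predictor: (threshold, left, right)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for j in range(1, len(pairs)):
        if pairs[j - 1][0] == pairs[j][0]:
            continue  # no split point between tied x values
        left = [y for _, y in pairs[:j]]
        right = [y for _, y in pairs[j:]]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[j - 1][0] + pairs[j][0]) / 2, ml, mr)
    if best is None:  # all x tied: constant prediction
        m = mean(ys)
        return (math.inf, m, m)
    return best[1:]


def boosting_scores(xs, ys, n_iter, rng):
    """For each observation, the max over k of its average number of
    appearances in the first k weighted bootstrap samples."""
    n = len(xs)
    w = [1.0 / n] * n
    counts = [0] * n
    scores = [0.0] * n
    for k in range(1, n_iter + 1):
        sample = rng.choices(range(n), weights=w, k=n)  # resample by weight
        for i in sample:
            counts[i] += 1
        for i in range(n):
            scores[i] = max(scores[i], counts[i] / k)  # running average, then max
        thr, lv, rv = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        errs = [abs((lv if x <= thr else rv) - y) for x, y in zip(xs, ys)]
        emax = max(errs)
        if emax == 0:
            break  # perfect fit on all data: nothing to reweight
        losses = [e / emax for e in errs]  # AdaBoost.R2 linear loss in [0, 1]
        lbar = sum(wi * li for wi, li in zip(w, losses))
        if lbar <= 0 or lbar >= 0.5:
            break  # zero weighted loss, or AdaBoost.R2's stop at average loss 1/2
        beta = lbar / (1 - lbar)
        w = [wi * beta ** (1 - li) for wi, li in zip(w, losses)]
        total = sum(w)
        w = [wi / total for wi in w]  # hard-to-predict points gain weight
    return scores


def detect_outliers(xs, ys, n_iter=50, alpha=0.05, seed=0):
    """Flag the most resampled observation, remove it, and repeat.

    Assumed rule: flag it when its score is at least t = 1/sqrt(alpha) standard
    deviations above the mean score. By Tchebychev's inequality at most a
    fraction alpha of scores can lie that far out, whatever their distribution.
    """
    rng = random.Random(seed)
    remaining = list(range(len(xs)))  # original indices still in play
    outliers = []
    t = 1 / math.sqrt(alpha)
    while len(remaining) > 2:
        scores = boosting_scores([xs[i] for i in remaining],
                                 [ys[i] for i in remaining], n_iter, rng)
        mu, sd = mean(scores), pstdev(scores)
        j = max(range(len(scores)), key=scores.__getitem__)
        if sd == 0 or scores[j] - mu < t * sd:
            break  # the most resampled point is not exceptional: stop
        outliers.append(remaining.pop(j))
    return outliers


if __name__ == "__main__":
    xs = [float(i) for i in range(40)]
    ys = [2.0 * x for x in xs]
    ys[20] = 300.0  # one gross error in the middle of the design
    print(detect_outliers(xs, ys, n_iter=100, alpha=0.05))
```

Because the population standard deviation is used, a single score can lie at most sqrt(n - 1) standard deviations above the mean (Samuelson's inequality), so with alpha = 0.05 this sketch cannot flag anything in a sample of fewer than 21 observations.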