Reading Lists
I've highlighted the most important papers, even if
they are earlier in the progression.
The AdaBoost Approach
- A decision-theoretic generalization of on-line learning and an
application to boosting. Freund and Schapire 1995/7. You have to
look at this paper, but you don't have to read it.
- Improved Boosting Algorithms Using Confidence-rated
Predictions. Schapire and Singer 1999. Sections 1-4 are an
absolute must read. Very concise and packed with useful
interpretations.
-
The boosting approach to machine learning: An overview. Schapire 2002.
Section 6. Supplement with the original Friedman, Hastie, Tibshirani
paper if desired. Describes an alternative and gentler loss
(a.k.a. potential, potential loss, cost, etc) function.
-
Boosting the Margin: A New Explanation for the Effectiveness
of Voting Methods. Schapire et al. 1998. Sections 1,5,6.
Understanding the idea is more important than the actual proofs.
-
Boosting Algorithms as Gradient Descent. Mason et al. 2000. Similar
in spirit to the view of Friedman, Hastie, and Tibshirani 1998.
Sections 1 and 2 develop the AnyBoost framework, which is a helpful
generalizations to AdaBoost.
The Boost By Majority Approach
- Boosting a weak learning algorithm by majority.
Freund 1990/5. The first part of section 2 (before 2.1) describes the
"boost by majority game".
- Improved Boosting Algorithms Using Confidence-rated
Predictions. Schapire and Singer 1999. Sections 1-4 are an absolute
must read. Very concise and packed with useful interpretations.
-
An adaptive version of the boost by majority
algorithm. Freund 1999/2000. IMHO, the biggest jump in
boosting since Schapire's original algorithm. There are two main
parts to the paper: 1) infinitely many iterations with infinitely
small movements and 2) setting alpha by satisfying the constraints: a)
expected value of $h(x)y$ is 0 w.r.t. the new weighting and b) the
average difference in potential. If you understand these two ideas
and their motivations (from Schapire and Singer 99 and Freund 1990/95)
the algorithm falls out naturally.
-
Drifting games. Schapire 2000. A very nice generalization of the
boost by majority paradigm to the more "natural" space. This
interpretation is also useful for understanding BrownBoost.
-
Continuous Drifting games. Freund and Opper 2002. An extension of
Schapire's drifting games to continuous domains. Both BrownBoost and
NormalBoost are natural consequences of this work.
The Bregmen Divergences, Geometry, and Optimization Approach
- Boosting as Entropy Projection. Kivinen and
Warmuth 1999. An interesting and very different approach that
minimizes convex divergence measures.
- Logistic Regression, AdaBoost and Bregman
Distances. Collins, Schapire, Singer 2000. Similar to
Kivinen and Warmuth; however, it provides a few very practical results
for converting AdaBoost to LogitBoost using an alternative loss
function. Also ties in well with some of the statistical approaches
to boosting.
- Linear Programming Boosting via Column Generation. Demiriz,
Bennett, and Shawe-Taylor 2002. Introduction of LPBoost and general
relationship to linear programming and boosting. A very complete
discussion, showing how LPBoost can be used in many of the classic
statistical settings.
- Totally Corrective Boosting Algorithms that Maximize the
Margin. Warmuth, Liao, and Ratsch 2006. Puts the LPBoost and
geometric (Bregman divergence of Kivinen and Warmuth) together.
Instead of minimizing the correlation of the hypothesis and the
subsequent weighting of the training set, TotalBoost minimizes the
corrleations between all previous hypotheses and the next
weighting. This leads to sparser sets of hypotheses and a very nifty
iteration bound.
- Boosting Algorithms for Maximizing the Soft Margin. Warmuth,
Glocer, and Ratsch 2007. Simple moral: TotalBoost overfits, SoftBoost
is noise resistant. SoftBoost comes with an iteration bound and noise
resistance using slack variables in SVM literature.
- Entropy Regularized LPBoost. Warmuth,
Glocer, and Vishwanathan 2008. Slam together all the good things in this framework
and you get... ERLP. This algorithm is surprisingly effective and
an excellent implmentation is on its way.
Statistical Approaches to Boosting
- Additive Logistic Regression: a Statistical View of
Boosting. Friedman, Hastie, Tibshirani 1999. An interesting paper
that casts boosting into the classic log likelihood model that all of
statistics follows. The resulting algorithmic contribution,
LogitBoost, can be implement in a AdaBoost framework (LogLossBoost in
JBoost) using an alternative weighting. Be sure to read rejoinders as
some of them gave me a chuckle.
- Friedman has a paper on \epsilon boosting which sounds very
interesting, though I have yet to read it.
- Tong Zhang has a lot of very good papers. I'll list the
highlights when I finish reading them... this may take a while.
- The question of consistency was a big deal in the statistics
community and was approached by a large variety of people. They
showed variants of boosting were consistent, but never AdaBoost. I
believe Peter Bartlett gets credit for finally showing that AdaBoost
is consistent.
-
Evidence Contrary to the Statistcal View of
Boosting. Mease and Wyner 2008. This paper is extremely
important to read if you plan on doing research in boosting for
practical purposes. It is not bulletproof (I'm sure someone will be
able to find issues with it), but it is a well thought out and
executed idea that uses empirical evidence instead of theory to drive
boosting research. Some of the details of the paper may be proven
slightly incorrect, but I believe that the overall idea will stand up
to scrutiny.
Asymmetric Cost
- AUC Optimization vs. Error Rate Minimization. This paper shows
that rankboost is directly related to optimizing the AUC over the
error rate. A nice set of ideas and summary of other work.
- Boosting the Area Under the ROC Curve. Uses the MartiBoost approach
to maintain an arbitrary distribution over examples of different classes.
This is a pretty large departure from normal boosting, so it has not seemed
to catch on yet.
- Fast and Robust Classification using Asymmetric AdaBoost
and a Detector Cascade. NIPS 2001. This is one of the
Viola & Jones papers. They use a simple reweighting and show some
improvement in classification. This and "sliding bias" are the
defacto asymmetric methods due to their simplicity of implementation.
(sliding bias is just forcing a certain margin to be obtained before
classifying.
- The Alternating Decision Tree Learning Algorithm. This paper is not
asymmetric cost, but the root is "Class is always +1" so it rebalances the
dataset after the first iteration, which is kinda nice. It is also
able to implement any (asymmetric) boosting algorithms. Also, the final
experimental method is interesting, since an area of abstention is used
to enforce a certain amount of confidence in each prediction.
- Asymmetric Boosting. Hamed Masnadi-Shirazi and Nuno Vasconcelos.
ICML 2007. This paper just puts a multiplicative factor in the
exponent of the reweighting equation. This actually reduces the
margin on the more important class!
- Boosting algorithms for maximizing the soft margin. While not a
asymmetric boosting paper, methods for asymmetric cost are obvious
extensions to this work. This would likely be better then asymmetric
AdaBoost since AdaBoost uses an exponential loss function. SoftBoost
is likely similar to the LogitBoost (or LogLossBoost) loss function,
so the reweighting would be similar to asymmetric LogitBoost. Both of
these would likely be better than asymmetric AdaBoost algorithms.
Misc Background Reading
- The strength of weak learnability. Schapire 1990. Cool
algorithm, cool analysis, cool write up. Unfortunately rendered
useless by majority vote methods.
- Any of the tutorials. There are a lot of really good boosting
tutorials that cover the whole spectrum of applied to
theoretical.
-
The alternating decision tree learning algorithm. Freund and Mason
1999. An interesting way to use boosting with decision trees/stumps.
-
Any of the empirical comparisons of different voting methods.
-
The ideas of bagging and arcing are frequently mentioned in boosting
papers. These are ideas of the extremely well respected statistician
Leo Breiman. While they haven't caught on in the same way that
boosting has (see "Boosting the margin..." for an explanation) they
are interesting ideas and bagging (along with bootstrapping) is widely
used for variance reduction of any estimator.