[language] Datamining, Statistics and Linguistics

H.M. Hubey hubeyh at mail.montclair.edu
Mon Jan 20 00:23:15 UTC 2003


<><><><><><><><><><><><>--This is the Language List--<><><><><><><><><><><><><>


This excerpt is from a brand new and very influential book. The Preface
explains why this book was necessary.

-------------------begin here------------------------

The field of statistics is constantly challenged by the problems that
science and industry brings to its door. In the early days, these
problems often came from agricultural and industrial experiments and
were relatively small in scope. With the advent of computers and the
information age, statistical problems have exploded in both size and
complexity. Challenges in the areas of data storage, organization and
searching have led to the new field of "datamining"; statistical and
computational problems in biology and medicine have created
"bioinformatics". Vast amounts of data are being generated in many
fields, and the statistician's job is to make sense of it all: extract
important patterns and trends, and understand "what the data says." We
call this learning from data.

The challenges from data have led to a revolution in the statistical
sciences. Since computation plays such a key role, it is not surprising
that much of this new development has been done by researchers in other
fields such as computer science and engineering.

The learning problems that we consider can be roughly categorized as
either supervised or unsupervised. In supervised learning, the goal is
to predict the value of an outcome measure based on a number of input
measures; in unsupervised learning, there is no outcome measure, and the
goal is to describe the associations and patterns among a set of input
measures.

This book is our attempt to bring together many of the new ideas in
learning, and explain them in a statistical framework. While some
mathematical details are needed, we emphasize the methods and their
conceptual underpinnings rather than their theoretical properties. As a
result, we hope that this book will appeal not just to statisticians but
also to researchers and practitioners in a wide variety of fields.

Just as we have learned a great deal from researchers outside the field
of statistics, our statistical viewpoint may help others to better
understand different aspects of learning:

        There is no true interpretation of anything; interpretation is
        a vehicle in the service of human comprehension. The value of
        interpretation is in enabling others to fruitfully think about
        an idea.
                                                            Andreas Buja

We would like to acknowledge.......

                    Trevor Hastie

                    Robert Tibshirani

                    Jerome Friedman


                    May 2001


--
M. Hubey
-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o
The only difference between humans and machines is that humans
can be created by unskilled labor. Arthur C. Clarke

/\/\/\/\//\/\/\/\/\/\/ http://www.csam.montclair.edu/~hubey



