By Grant S. Ingersoll
Grant S. Ingersoll, Thomas S. Morton and Andrew L. Farris, "Taming Text: How to Find, Organize, and Manipulate It"
English | ISBN: 193398838X | 2013 | Publisher: Manning Publications | EPUB | 320 pages | 10 + 10 MB
There is so much text in our lives that we are practically drowning in it. Fortunately, there are innovative tools and techniques for managing unstructured information that can throw the smart developer a much-needed lifeline. You'll find them in this book.
Taming Text is a practical, example-driven guide to working with text in real applications. This book introduces you to useful techniques like full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. You'll explore real use cases as you systematically absorb the foundations upon which they are built. Written in a clear and concise style, this book avoids jargon, explaining the subject in terms you can understand without a background in statistics or natural language processing. Examples are in Java, but the concepts can be applied in any language.
Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.
When to use text-taming techniques
Important open-source libraries like Solr and Mahout
How to build text-processing applications
About the Authors
Grant Ingersoll is an engineer, speaker, and trainer, a Lucene committer, and a cofounder of the Mahout machine-learning project. Thomas Morton is the primary developer of OpenNLP and Maximum Entropy. Drew Farris is a technology consultant, software developer, and contributor to Mahout, Lucene, and Solr.
"Takes the mystery out of very complex processes." (From the Foreword by Liz Liddy, Dean, iSchool, Syracuse University)
Table of Contents
Getting started taming text
Foundations of taming text
Fuzzy string matching
Identifying people, places, and things
Classification, categorization, and tagging
Building an example question answering system
Untamed text: exploring the next frontier