Why web data extraction service? [复制链接]

Posted on 2008-05-16 11:29
Without extraction tools
Tools are needed to manage all available information, including the Web, subscription services, and internal data stores. Without an extraction tool (a product specifically designed to find, organize, and output the data you want), you have very poor choices for getting information. Your choices are:
Use search engines: Search engines help find some Web information, but they do not pinpoint information, cannot fill out the Web forms they encounter to get you the information you need, are perpetually behind in indexing content, and at best can only go two or three levels deep into a Web site. And they cannot search file directories on your network.
Manually surf the Web and file directories: Aside from the labor-intensive aspect of this option, the work is tedious, costly, error prone, and very time consuming. Humans have to read the content of each page to see if it matches their criteria, whereas a computer simply matches patterns, which is much faster.
Create custom programming: Custom programming is costly, can be buggy, requires maintenance, and takes time to develop. Plus, the programs must be constantly updated, as the location of information frequently changes.
Inefficient methods mean the information analyst spends time finding, collecting, and aggregating data instead of analyzing data and gaining the competitive edge. This also affects the application programmer, who has to spend time developing extraction tools instead of developing tools for the core business.

For more information, please visit our website: http://www.knowlesys.com
New solutions improve productivity
Extraction tools that use a concise notation to define precise navigation and extraction rules greatly reduce the time spent on systematic collection efforts. Tools that support a variety of format options provide a single development platform for all collection needs, regardless of the electronic information source.
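The "concise notation" idea can be illustrated with a small sketch. The rule syntax below (named regular expressions applied to page text) is invented for illustration and is not any particular product's notation:

```python
# Hypothetical sketch: declarative extraction rules as named patterns.
# Each rule says what to pull out; the engine applies them all.
import re

RULES = {
    "title": r"<h1>(.*?)</h1>",
    "price": r"<span class=\"price\">\$([0-9.]+)</span>",
}

def extract(html: str, rules: dict) -> dict:
    """Apply each named rule to the page and collect the first match."""
    out = {}
    for name, pattern in rules.items():
        m = re.search(pattern, html, re.DOTALL)
        if m:
            out[name] = m.group(1)
    return out

page = '<h1>Widget</h1><span class="price">$19.99</span>'
print(extract(page, RULES))  # {'title': 'Widget', 'price': '19.99'}
```

The point of the notation is that adding a new field means adding one rule, not writing a new custom program.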
Early attempts at software tools for “Web harvesting” and unstructured data mining emerged and started to get the attention of information professionals. These products did a reasonable job of finding and extracting Web information for intelligence-gathering purposes. But this was not enough. Organizations needed to reach the “deep Web” and other electronic information sources, capabilities beyond simplistic Web content clipping.
A new generation of information extraction tools is markedly improving productivity for information analysts and application developers.

Uses for extraction tools
The most popular applications for information extraction tools remain competitive intelligence gathering and market research, but some new applications are emerging as organizations learn how to better use the functionality in the new generation of tools.
Deep Web price gathering: The explosion of e-tailing, e-business, and e-government makes a plethora of competitive pricing information available on Web sites and government information portals. Unfortunately, price lists are difficult to extract without selecting product categories or filling out Web forms. Also, some prices are buried deep in .pdf documents. Automated form completion and automated downloading are necessary features to retrieve prices from the deep Web.
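Automated form completion amounts to encoding the same fields a human would type into the search form and submitting them programmatically. A minimal sketch with the Python standard library; the endpoint and field names are hypothetical:

```python
# Hypothetical sketch: automating a price-search form on the "deep Web".
# The URL and form field names below are invented for illustration.
import urllib.parse
import urllib.request

def build_price_query(endpoint: str, category: str, keyword: str) -> urllib.request.Request:
    """Encode the form fields a human would fill in, as a POST request."""
    data = urllib.parse.urlencode({"category": category, "q": keyword}).encode()
    return urllib.request.Request(endpoint, data=data, method="POST")

req = build_price_query("http://example.com/price-search", "widgets", "gizmo")
# urllib.request.urlopen(req) would submit the form and return the results page.
```

A real tool would then feed the returned page into its extraction rules; PDF price lists would need a separate download-and-parse step.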
Primary research: Message boards, e-pinion sites, and other Web forums provide a wealth of public opinion and user-experience information on consumer products, air travel, test drives, experimental drugs, etc. While much of this information can be found with a search engine, features like simultaneous board crawling, selective content extraction, task scheduling, and custom output reformatting are only available with extraction tools.
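"Simultaneous board crawling" boils down to fetching many forums in parallel. A minimal sketch using a thread pool; fetch() is a stand-in for a real HTTP download:

```python
# Hypothetical sketch: crawling several message boards concurrently.
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Placeholder: a real crawler would download and parse the page here.
    return f"<html>content of {url}</html>"

def crawl_all(urls):
    """Fetch every board in parallel and map each URL to its page."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))

pages = crawl_all(["http://example.com/board1", "http://example.com/board2"])
```

Task scheduling is then a matter of running crawl_all() periodically and handing each page to the extraction rules.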
Content aggregation for information portals: Content is exploding and available from Web and non-Web sources. Extraction tools can crawl the Web, internal information sources, and subscription services to automatically populate portals with pertinent content such as competitive information, news, and financial data.
Supporting CRM systems: The Web is a valuable source of external data to selectively populate a data warehouse or a CRM database. To date, most organizations focus on aggregating internal data for their data warehouses and CRM systems. Now, however, some organizations are realizing the value of adding external data as well. In the book Web Farming for the Data Warehouse from Morgan Kaufmann Publishers, Dr. Richard Hackathorn writes, “It is the synergism of external market information with internal customer data that creates the greatest business benefit.”
Scientific research: Scientific information on a given topic (such as a gene sequence) is available on multiple Web sites and subscription services. An effective extraction tool can automate the location and extraction of this information and aggregate it into a single presentation format or portal. This saves scientific researchers countless hours of searching, reading, copying, and pasting.
Business activity monitoring: Extraction tools can continuously monitor dynamically changing information sources to provide real-time alerts and to populate information portals and dashboards.
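One simple way to monitor a changing source, assumed here for illustration rather than taken from any specific tool, is to fingerprint its content between polls and alert when the fingerprint changes:

```python
# Hypothetical sketch: change detection for business activity monitoring.
# A real tool would fetch the source on a schedule and alert on change.
import hashlib
from typing import Optional

def fingerprint(content: str) -> str:
    """Hash the page content so two polls can be compared cheaply."""
    return hashlib.sha256(content.encode()).hexdigest()

def changed(previous: Optional[str], content: str) -> bool:
    """True when the source differs from the last recorded fingerprint."""
    return previous != fingerprint(content)

last = fingerprint("price: 10")
print(changed(last, "price: 10"), changed(last, "price: 11"))  # False True
```

Hashing the whole page triggers on any edit; a production monitor would fingerprint only the extracted fields so that cosmetic page changes do not raise alerts.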

