Why web data extraction service? [复制链接]

Posted on 2008-05-16 11:29
Without extraction tools
Tools are needed to manage all available information, including the Web, subscription services, and internal data stores. Without an extraction tool (a product specifically designed to find, organize, and output the data you want), you have very poor choices for getting information. Your choices are:
Use search engines: Search engines help find some Web information, but they do not pinpoint information, cannot fill out the Web forms they encounter to get you the information you need, are perpetually behind in indexing content, and at best can only go two or three levels deep into a Web site. And they cannot search file directories on your network.
Manually surf the Web and file directories: Aside from the labor-intensive aspect of this option, the work is tedious, costly, error prone, and very time consuming. Humans have to read the content of each page to see if it matches their criteria, whereas a computer simply matches patterns, which is much faster.
Create custom programming: Custom programming is costly, can be buggy, requires maintenance, and takes time to develop. Plus, the programs must be constantly updated, as the location of information frequently changes.
Inefficient methods mean the information analyst spends time finding, collecting, and aggregating data instead of analyzing data and gaining the competitive edge. This also affects the application programmer, who has to spend time developing extraction tools instead of developing tools for the core business.

For more information, please visit our website: http://www.knowlesys.com
New solutions improve productivity
Extraction tools that use a concise notation to define precise navigation and extraction rules greatly reduce the time spent on systematic collection efforts. Tools that support a variety of format options provide a single development platform for all collection needs, regardless of the electronic information source.
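The "concise notation" idea can be illustrated with a small sketch. The rule syntax below (named regular expressions applied to page text) is invented for illustration and is not any particular product's notation:

```python
# Hypothetical sketch: declarative extraction rules as named patterns.
# Each rule says what to pull out; the engine applies them all.
import re

RULES = {
    "title": r"<h1>(.*?)</h1>",
    "price": r"<span class=\"price\">\$([0-9.]+)</span>",
}

def extract(html: str, rules: dict) -> dict:
    """Apply each named rule to the page and collect the first match."""
    out = {}
    for name, pattern in rules.items():
        m = re.search(pattern, html, re.DOTALL)
        if m:
            out[name] = m.group(1)
    return out

page = '<h1>Widget</h1><span class="price">$19.99</span>'
print(extract(page, RULES))  # {'title': 'Widget', 'price': '19.99'}
```

The point of the notation is that adding a new field means adding one rule, not writing a new custom program.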
Early attempts at software tools for “Web harvesting” and unstructured data mining emerged and started to get the attention of information professionals. These products did a reasonable job of finding and extracting Web information for intelligence-gathering purposes. But this was not enough. Organizations needed to reach the “deep Web” and other electronic information sources, capabilities beyond simplistic Web content clipping.
A new generation of information extraction tools is markedly improving productivity for information analysts and application developers.

Uses for extraction tools
The most popular applications for information extraction tools remain competitive intelligence gathering and market research, but some new applications are emerging as organizations learn how to better use the functionality in the new generation of tools.
Deep Web price gathering: The explosion of e-tailing, e-business, and e-government makes a plethora of competitive pricing information available on Web sites and government information portals. Unfortunately, price lists are difficult to extract without selecting product categories or filling out Web forms. Also, some prices are buried deep in .pdf documents. Automated form completion and automated downloading are necessary features to retrieve prices from the deep Web.
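Automated form completion amounts to encoding the same fields a human would type into the search form and submitting them programmatically. A minimal sketch with the Python standard library; the endpoint and field names are hypothetical:

```python
# Hypothetical sketch: automating a price-search form on the "deep Web".
# The URL and form field names below are invented for illustration.
import urllib.parse
import urllib.request

def build_price_query(endpoint: str, category: str, keyword: str) -> urllib.request.Request:
    """Encode the form fields a human would fill in, as a POST request."""
    data = urllib.parse.urlencode({"category": category, "q": keyword}).encode()
    return urllib.request.Request(endpoint, data=data, method="POST")

req = build_price_query("http://example.com/price-search", "widgets", "gizmo")
# urllib.request.urlopen(req) would submit the form and return the results page.
```

A real tool would then feed the returned page into its extraction rules; PDF price lists would need a separate download-and-parse step.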
Primary research: Message boards, e-pinion sites, and other Web forums provide a wealth of public opinion and user-experience information on consumer products, air travel, test drives, experimental drugs, etc. While much of this information can be found with a search engine, features like simultaneous board crawling, selective content extraction, task scheduling, and custom output reformatting are only available with extraction tools.
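"Simultaneous board crawling" boils down to fetching many forums in parallel. A minimal sketch using a thread pool; fetch() is a stand-in for a real HTTP download:

```python
# Hypothetical sketch: crawling several message boards concurrently.
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Placeholder: a real crawler would download and parse the page here.
    return f"<html>content of {url}</html>"

def crawl_all(urls):
    """Fetch every board in parallel and map each URL to its page."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))

pages = crawl_all(["http://example.com/board1", "http://example.com/board2"])
```

Task scheduling is then a matter of running crawl_all() periodically and handing each page to the extraction rules.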
Content aggregation for information portals: Content is exploding and available from Web and non-Web sources. Extraction tools can crawl the Web, internal information sources, and subscription services to automatically populate portals with pertinent content such as competitive information, news, and financial data.
Supporting CRM systems: The Web is a valuable source of external data to selectively populate a data warehouse or a CRM database. To date, most organizations focus on aggregating internal data for their data warehouses and CRM systems. Now, however, some organizations are realizing the value of adding external data as well. In the book Web Farming for the Data Warehouse from Morgan Kaufmann Publishers, Dr. Richard Hackathorn writes, “It is the synergism of external market information with internal customer data that creates the greatest business benefit.”
Scientific research: Scientific information on a given topic (such as a gene sequence) is available on multiple Web sites and subscription services. An effective extraction tool can automate the location and extraction of this information and aggregate it into a single presentation format or portal. This saves scientific researchers countless hours of searching, reading, copying, and pasting.
Business activity monitoring: Extraction tools can continuously monitor dynamically changing information sources to provide real-time alerts and to populate information portals and dashboards.
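One simple way to monitor a changing source, assumed here for illustration rather than taken from any specific tool, is to fingerprint its content between polls and alert when the fingerprint changes:

```python
# Hypothetical sketch: change detection for business activity monitoring.
# A real tool would fetch the source on a schedule and alert on change.
import hashlib
from typing import Optional

def fingerprint(content: str) -> str:
    """Hash the page content so two polls can be compared cheaply."""
    return hashlib.sha256(content.encode()).hexdigest()

def changed(previous: Optional[str], content: str) -> bool:
    """True when the source differs from the last recorded fingerprint."""
    return previous != fingerprint(content)

last = fingerprint("price: 10")
print(changed(last, "price: 10"), changed(last, "price: 11"))  # False True
```

Hashing the whole page triggers on any edit; a production monitor would fingerprint only the extracted fields so that cosmetic page changes do not raise alerts.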

