Thread starter: zhangzhh05

[Algorithms] A few small thoughts on the superiority of Google's algorithm

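#291, posted 2008-06-28 22:07
This is very fast. If each table is kept at 2 million records, all you need to do is compute from the email address which server group, which server, which database, and which table the record lives in, and the lookup is extremely fast.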
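
A minimal sketch of that routing step, in Python. The post does not say how the location is computed, so the hash (MD5, used here only as a stable, well-mixed hash) and all four layout constants are assumptions made for illustration.

```python
import hashlib

# Hypothetical layout; none of these counts come from the post.
GROUPS = 4
SERVERS_PER_GROUP = 4
DBS_PER_SERVER = 8
TABLES_PER_DB = 16


def locate(email):
    """Map an email address to (group, server, database, table)."""
    # Normalize so "A@x.com" and "a@x.com" land in the same table.
    key = email.strip().lower().encode("utf-8")
    # MD5 is used only as a stable hash here, not for security.
    h = int.from_bytes(hashlib.md5(key).digest(), "big")
    # Peel off one coordinate at a time from the hash value.
    h, table = divmod(h, TABLES_PER_DB)
    h, db = divmod(h, DBS_PER_SERVER)
    h, server = divmod(h, SERVERS_PER_GROUP)
    group = h % GROUPS
    return group, server, db, table


print(locate("someone@example.com"))
```

The point is that no lookup table is consulted: the location is a pure function of the key, so any front-end machine can compute it locally. The trade-off is that changing any of the counts moves existing records unless some re-sharding scheme is added.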

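#292, posted 2008-06-28 22:21
The in-memory approach cannot rule out the database approach, because without a database there is no effective, convenient way to process the data.
The database approach cannot rule out the in-memory approach either, because some business requirements are special.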

#293, posted 2008-06-29 18:23

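People who say MySQL is not up to the job lack both a suitable hardware environment and in-depth study.

How many people here have actually managed a project with tens of millions of users?
I find it rather laughable:
you write one test script and then declare that MySQL cannot handle writes and queries.
First of all, have you tuned your MySQL? What configuration is your machine? Have you done any optimization of the operating system? How many disks make up your disk array?
How large a write load does your SQL test generate? Is your connection local, or do you write over a network socket?

MySQL has more than 100 tunable parameters. In my modest experience, tuning at least 40 to 50 of them can improve database performance by a factor of 5 to 10.
On the system side there are probably several hundred kernel parameters to tune as well.

Many people think MySQL performs poorly. My personal feeling is that the machines they have worked with are too poor; in most cases the cause is disk I/O, not MySQL itself.

Have a look at how eBay uses MySQL: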
Fun Facts About eBay

110 Million items for sale on the site

$59 Billion in gross merchandise value (GMV) per year

Approx $2,039 worth of goods traded on the site every second

276 Million registered users

2 Billion URL requests per day

6,000 application servers with 12,000 Java processes

40 Billion database requests per day

300 different databases (over 700 instances)

9 PB of data storage

13 million lines of source code
(in 2008 it will surpass the 16 million lines of the Windows NT 4.0 OS)


Background

Further distinguish the eBay shopping experience

Provide a more relevant and even better user experience

Provide users with a richer experience with greater continuity

Provide users with the best selection tailored to their interests/profile

Provide a better user experience through a real-time personalization data feedback loop that is immediately available

Provide users with tailored alternatives

Further distinguish the eBay business value proposition

Advertising shown to more relevant buyers

More effective merchandising and marketing of items

Increase conversion rates through better buyer experience and greater relevancy of items presented to the buyer


Background

eBay needed to expand its real time personalization capabilities

eBay needed to be able to associate more data with sessions

Both personalization and session data were constrained by technology

Cookies limitation

      Client side cookie limit of 4KB data

      Long term scalability issue of sending all cookie data, whether needed or not

High cost of traditional server side solutions using an OLTP database

      eBay’s very large scale quickly multiplies costs into a very large number

      Throughput of OLTP databases decreases with a high write ratio of approximately 50%

      Large number of licenses/servers needed for throughput was cost prohibitive

High cost of other commercial alternatives at eBay’s very large scale

These constraints were limiting business decisions and had to be solved


General Vision

Every Application Server

Can Access Data

For Every URL Request

(All 2 Billion of them!)

Session Data

Personalization Data


General Requirements

Handle 4 Billion reads/writes per day

Support connections and requests from 12,000 Java processes

High throughput on low cost hardware

Scale both horizontally and vertically for 10x future growth

Scale without operational interruption

High availability and operational failure robustness

Low latency response times

Low licensing, support, and total cost of ownership costs

Enterprise class support agreement

Enterprise class management and monitoring tools

Driver for Java


Why MySQL Memory Engine?

MySQL Memory Engine had the best performance

Very impressive POC results for MySQL Memory Engine

Approx 2X more throughput than nearest competitor (Java driver)

eBay test case of 50/50 read/writes showed approx 13,000 TPS @ 50% CPU for a Sun 4100 running Solaris 10 x86 (2 CPU, Dual Core Opteron, 16GB RAM) for a network client

Handled 20,000 concurrent connections with less than 1% degradation in throughput relative to the baseline case (eBay developed patch)

Production performance has been consistent with POC results


Why MySQL Memory Engine?

MySQL Enterprise had a very attractive cost structure

MySQL’s ability to offer enterprise class support

MySQL’s combined throughput and cost structure provided a low cost system for the scale of eBay

Power and flexibility of using SQL for different needs

A company with a significant track record



Why MySQL Memory Engine?

The power of open source

eBay has developed and contributed two enhancements to MySQL

      Support for an event port based threading and connection handling model for scalable connection handling

      Support for true variable size columns in MySQL Memory Engine

Option to be able to apply our talent and create the enhancements we need quickly

Receive the benefits of innovations of others via open source


Why MySQL Memory Engine?

The power of an open source company behind the product

Ability to collaborate with MySQL on enhancements to the product

Option to request enhancements from a company behind the product

Out of the box monitoring and administration tools

Eliminate tying up high end eBay talent in owning it ourselves

An enterprise class open source product

Enterprise class support offerings for use in critical systems  





eBay Personalization System Overview

[Diagram: Browser, Application Servers, MySQL Memory Engine Cache Tier, Persistent Database; label: Replication]

eBay Personalization System Overview

[Diagram: Application Servers, MySQL Memory Engine Cache Tier, Persistent Database; flow labels: Read/Write, Cache Miss Read, 5 min Batched Write Back]


eBay Personalization System Overview

Replication optional based on criticality of data loss for past 5 min

Trade-off between data criticality versus double the memory cost

Some personalization data may not be critical enough for the additional hardware cost

Single threaded MySQL replication is generally problematic

Once replication falls behind it stays behind with continued traffic

Replication can be achieved via dual writes from the application server performed transparently by the framework

Second write to replica can be asynchronous

Automatic redistribution of data when a node fails or is drained
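
A rough sketch of the dual-write idea from this slide, in Python. The slides give no implementation details, so everything here is an assumption for illustration: the class name `DualWriter` is hypothetical, and plain dicts stand in for the primary and replica cache nodes.

```python
import queue
import threading


class DualWriter:
    """Write synchronously to a primary node and asynchronously to a replica."""

    def __init__(self, primary, replica):
        self.primary = primary      # dict standing in for the primary cache node
        self.replica = replica      # dict standing in for the replica cache node
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def put(self, key, value):
        self.primary[key] = value          # first write: the caller waits for it
        self._pending.put((key, value))    # second write: queued, caller moves on

    def _drain(self):
        # Background thread applying queued writes to the replica in order.
        while True:
            key, value = self._pending.get()
            self.replica[key] = value
            self._pending.task_done()

    def wait_for_replica(self):
        """Block until every queued replica write has been applied."""
        self._pending.join()


primary, replica = {}, {}
writer = DualWriter(primary, replica)
writer.put("session:1", "cart=3")
writer.wait_for_replica()
print(replica)
```

Because the application issues both writes itself, the replica does not depend on the single-threaded replication stream that the slide calls problematic. The cost is that a crash between the two writes can leave the replica briefly behind.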



eBay Personalization System Overview

Write back to persistent database performed by batch process

Evictions performed by batch process based on target free memory

Buffering space is set aside in case persistent database is unavailable

Special techniques used to minimize table lock duration during write back and eviction operations
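
The read/write, cache-miss-read, batched write-back, and eviction flows described above could be sketched roughly as follows, in Python. This is not eBay's implementation: the class `WriteBackCache` is hypothetical, dicts stand in for the MySQL Memory Engine tier and the persistent database, and eviction uses insertion order and an entry count rather than a real free-memory target.

```python
class WriteBackCache:
    def __init__(self, store, max_entries):
        self.store = store              # stand-in for the persistent database
        self.max_entries = max_entries  # stand-in for the free-memory target
        self.cache = {}                 # stand-in for the in-memory tier
        self.dirty = set()              # keys changed since the last write back

    def get(self, key):
        if key not in self.cache:
            if key not in self.store:
                return None
            self.cache[key] = self.store[key]   # cache miss read
        return self.cache[key]

    def put(self, key, value):
        # Writes touch only the cache; the database sees them at write back.
        self.cache[key] = value
        self.dirty.add(key)

    def write_back(self):
        """Batch job: flush every dirty entry to the persistent store."""
        batch = {k: self.cache[k] for k in self.dirty}
        self.store.update(batch)
        self.dirty.clear()
        return len(batch)

    def evict(self):
        """Batch job: drop the oldest clean entries until under the target."""
        for key in list(self.cache):
            if len(self.cache) <= self.max_entries:
                break
            if key not in self.dirty:   # never drop data not yet written back
                del self.cache[key]


store = {"u1": "a"}
cache = WriteBackCache(store, max_entries=1)
cache.get("u1")
cache.put("u2", "b")
cache.write_back()
cache.evict()
print(store, cache.cache)
```

In this sketch, running `write_back` on a 5-minute timer would give the "5 min Batched Write Back" behaviour; whatever is dirty at the moment of a crash is the data at risk, which is the trade-off the replication slide discusses.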


Results

A business critical system running on MySQL Enterprise for one of the largest scale websites in the world

Highly scalable and low cost system that handles all of eBay’s personalization and session data needs

Ability to handle 4 billion requests per day of 50/50 read/write operations for approximately 40KB of data per user / session

Approx 25 Sun 4100’s running 100% of eBay’s personalization and session data service (2 CPU, Dual core Opteron, 16 GB RAM, Solaris 10 x86)
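
A back-of-the-envelope check of these figures, in Python. All inputs are taken from the slides; the interpretation is an assumption: the load is treated as evenly spread over the day and over the servers (real traffic peaks well above the average), and the RAM figure is an upper bound that ignores MySQL overhead, indexes, and any replicas.

```python
REQUESTS_PER_DAY = 4_000_000_000   # "4 billion requests per day"
SERVERS = 25                       # "approx 25 Sun 4100's"
POC_TPS_PER_SERVER = 13_000        # POC result at 50% CPU
RAM_PER_SERVER_GB = 16
DATA_PER_SESSION_KB = 40           # "approximately 40KB of data per user / session"

avg_tps = REQUESTS_PER_DAY / 86_400            # seconds per day
avg_tps_per_server = avg_tps / SERVERS
headroom = POC_TPS_PER_SERVER / avg_tps_per_server

total_ram_kb = SERVERS * RAM_PER_SERVER_GB * 1024 * 1024
max_sessions = total_ram_kb // DATA_PER_SESSION_KB

print(f"average load:        {avg_tps:,.0f} requests/s")
print(f"average per server:  {avg_tps_per_server:,.0f} requests/s")
print(f"POC rate / average:  {headroom:.2f}x")
print(f"sessions that fit:   {max_sessions:,} (upper bound)")
```

Under these assumptions the average per-server load is a small fraction of the POC rate, which is consistent with the slides' claim that about 25 machines carry the whole service.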


Results

Highly manageable system for entire operational life cycle

Leveraging MySQL Dashboard as a critical tool in providing insight into system performance, trending, and identifying issues

Adding new applications to ebay.com domain that previously would have been in a different domain because of cookie constraints

Creating several new business opportunities that would not have been possible without this new low cost personalization platform

Leveraging MySQL Memory Engine for other types of caching tiers that are enabling new business opportunities

#294, posted 2008-06-29 19:25
I can't believe this got bumped to 30 pages. Hats off to all of you~
Everyone take a break~

#295, posted 2008-06-30 02:27
Nice, nice, I've learned something~~

#296, posted 2008-06-30 12:21
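Originally posted by zszyj on 2008-6-28 10:47

Also, the degree of collision depends on how dispersed the KEY values are: the more similar the keys, the more likely a collision.


@.@ Google's use of 8 groups is indeed meant to guard against collisions, but are you sure this understanding of hashing is correct?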
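
One way to test the quoted claim, sketched in Python. This uses MD5 purely as an example of a well-mixing hash; it says nothing about whichever hash function was discussed earlier in the thread, and the two sample keys are made up.

```python
import hashlib


def digest_bits(key):
    """Return the 128-bit MD5 digest of key as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def differing_bits(a, b):
    """Count the bit positions where the two digests disagree."""
    return bin(digest_bits(a) ^ digest_bits(b)).count("1")


# Two keys that differ in a single character.
print(differing_bits("user0001@example.com", "user0002@example.com"))
```

For a hash that mixes well, the count should come out near half of the 128 bits, just as for two unrelated keys, so similar keys would be no more collision-prone than dissimilar ones. The quoted claim does hold for weak hashes, for example one that only sums the bytes of the key.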

#297, posted 2008-07-03 11:53
Huh, nobody is discussing anymore? Where did everyone go... Carry on, everyone...

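#298, posted 2008-07-07 17:41
Sorry, but this is actually very simple. Suppose an account name consists only of characters. We can build a letter tree (a trie), and even on your own computer you can easily decide whether an account name is a duplicate in O(n) * Sigma time, where n is the length of the account name and Sigma is the number of distinct characters. For example, if only English letters are allowed, Sigma = 26; if digits are added, Sigma = 26 + 10 = 36. I assume account names contain no Chinese characters! If they do, I have another solution.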
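
A minimal trie sketch of that duplicate check, in Python. The class names are hypothetical. One difference from the description above is an implementation choice of this sketch: children are kept in a dict rather than in a scanned list of up to Sigma slots, so each character costs expected constant time instead of up to Sigma comparisons.

```python
class TrieNode:
    __slots__ = ("children", "is_account")

    def __init__(self):
        self.children = {}        # character -> TrieNode
        self.is_account = False   # True if a registered name ends here


class AccountTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, name):
        """Register name; return False if it was already registered."""
        node = self.root
        for ch in name:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = TrieNode()
                node.children[ch] = nxt
            node = nxt
        if node.is_account:
            return False
        node.is_account = True
        return True

    def __contains__(self, name):
        node = self.root
        for ch in name:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_account


accounts = AccountTrie()
print(accounts.add("alice01"))   # first registration
print(accounts.add("alice01"))   # duplicate attempt
print("alice" in accounts)       # a prefix is not a registered name
```

The check touches at most one node per character of the name, regardless of how many accounts are stored. Whether the whole trie fits in one machine's memory at the user counts discussed in this thread is a separate question that the sketch does not address.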

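#299, posted 2008-07-07 17:58
A classmate recommended the link to this thread while I was at work today. I read all 30 pages without skipping a single post, and could not resist registering an account to comment.

Many thanks to bbpet, zszyj, and shan_ghost. It goes without saying that all three are technically very strong.


bbpet: sticks to the facts and lets the data speak. Respect.


zszyj: rich practical experience, deep knowledge of databases, technically strong, blunt in speech.


shan_ghost: deep theoretical grounding, but your later posts increasingly read as gloating, especially on pages 25 and 26.


The above is only my personal view after reading 30-odd pages of posts. My sincere thanks to the three of you; I really did learn something.


Many people say that technical people rarely defer to anyone unless that person is widely acknowledged to be truly excellent.


There is no need at all for personal attacks in a discussion. If you disagree with someone's view, show him the data; if you say his algorithm does not work, feel free to paste a piece of code. Nobody refuses to be convinced in the face of the truth. If no code snippets had been posted on pages 29 and 30, I reckon this thread would have been bumped to 100-odd pages.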

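#300, posted 2008-07-07 18:01
PS: To borrow the words of an earlier poster (I forget the name and can't be bothered to look it up): if I were the company's decision-maker, I would pick bbpet and zszyj to do the development and let shan_ghost do forward-looking technical exploration.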