
[C++] ASIO, lock-free, high-concurrency, high-reliability, unified network architecture, DoS-resistant: 870,000 QPS ECHO on a low-end 4-core server CPU

wlmqgzm

Rank: Prosperous and Happy

Forum badges: 9

Post #111

Posted 2015-10-31 21:03

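For comparison with the dual-core G3258, here is the new 4-core server CPU running a single ab test program: about 347,000 QPS ECHO.
The dual-core G3258 running a single ab test program reached about 325,000 QPS ECHO.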

guo@guo-desktop:~$ (ab -n 10000000 -c 100 -k http://127.0.0.1:1971/jjjjjjjjjjjjj &);
guo@guo-desktop:~$ This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd,
Licensed to The Apache Software Foundation,

Benchmarking 127.0.0.1 (be patient)
Completed 1000000 requests
Completed 2000000 requests
Completed 3000000 requests
Completed 4000000 requests
Completed 5000000 requests
Completed 6000000 requests
Completed 7000000 requests
Completed 8000000 requests
Completed 9000000 requests
Completed 10000000 requests
Finished 10000000 requests

Server Software:
Server Hostname:       127.0.0.1
Server Port:          1971

Document Path:       /jjjjjjjjjjjjj
Document Length:       0 bytes

Concurrency Level:    100
Time taken for tests: 28.817 seconds
Complete requests:    10000000
Failed requests:       0
Non-2xx responses:    10000000
Keep-Alive requests: 10000000
Total transferred:    1190000000 bytes
HTML transferred:    0 bytes
Requests per second: 347019.14 [#/sec] (mean)
Time per request:    0.288 [ms] (mean)
Time per request:    0.003 [ms] (mean, across all concurrent requests)
Transfer rate:       40327.42 [Kbytes/sec] received

Connection Times (ms)
            min  mean[+/-sd] median max
Connect:       0 0 0.0    0    3
Processing:    0 0 0.0    0    2
Waiting:       0 0 0.0    0    2
Total:       0 0 0.0    0    5

Percentage of the requests served within a certain time (ms)
  50%    0
  66%    0
  75%    0
  80%    0
  90%    0
  95%    0
  98%    0
  99%    1
100%    5 (longest request)


wlmqgzm

Rank: Prosperous and Happy

Forum badges: 9

Post #112

Posted 2015-11-01 12:59

Last edited by wlmqgzm on 2015-11-01 13:42

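Three ab instances: 740,000 QPS ECHO. Below is the top output taken while the three ab instances were running.
Most of the overall CPU time is system time (sy, 43.7%), followed by user time (us, 24.3%) and software interrupts (si, 20.8%); 11.2% is idle.
This means the part our own code can optimize is only that small ~24% user-time share (and because the top figure is system-wide, it also includes the three ab processes), so future code optimization has limited room to improve performance.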

Tasks: 263 total, 4 running, 259 sleeping, 0 stopped, 0 zombie
%Cpu(s): 24.3 us, 43.7 sy,  0.0 ni, 11.2 id,  0.0 wa,  0.0 hi, 20.8 si,  0.0 st
KiB Mem:  32865404 total, 30318112 used,  2547292 free, 81688 buffers
KiB Swap:       0 total,       0 used,       0 free. 536076 cached Mem

  PID USER    PR  NI VIRT RES SHR S  %CPU %MEM    TIME+ COMMAND
4742 guo    20 0  611276 8404 3100 S 408.5  0.0  17:29.12 echo_server
8732 guo    20 0  342740 200836 3852 R  99.7  0.6 0:26.89 ab
8735 guo    20 0  342740 208344 3772 R  99.7  0.6 0:26.88 ab
8734 guo    20 0  342740 187580 3856 R  99.4  0.6 0:26.88 ab
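The "limited headroom" argument above is essentially Amdahl's law. A minimal sketch, assuming the server is CPU-bound and treating the 24.3% "us" share from this top snapshot as the only part our code can speed up; since that share is system-wide and also counts the ab processes, this is an optimistic upper bound. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdio>

// Amdahl's law: overall speedup when a fraction p of the work
// is made s times faster and the remaining (1 - p) is unchanged.
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

// Upper bound when the optimizable part costs nothing at all (s -> infinity).
double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}

void print_user_space_headroom()
{
    const double p = 0.243;  // "us" share from the top snapshot above (assumption: all of it is optimizable)
    std::printf("user code 2x faster:  %.3fx overall\n", amdahl_speedup(p, 2.0));
    std::printf("user code eliminated: %.3fx overall\n", amdahl_limit(p));
}
```

With p = 0.243, halving user-space CPU time gives only about 1.14x overall, and even removing it entirely caps out near 1.32x.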


wlmqgzm

Rank: Prosperous and Happy

Forum badges: 9

Post #113

Posted 2015-11-01 13:44

Last edited by wlmqgzm on 2015-11-01 13:49

Four ab test programs: 820,000 QPS ECHO. Test details below:

guo@guo-desktop:~$ top

top - 13:48:26 up  1:28,  4 users,  load average: 0.12, 0.94, 1.60
Tasks: 269 total, 6 running, 263 sleeping, 0 stopped, 0 zombie
%Cpu(s): 22.0 us, 52.0 sy,  0.0 ni,  1.7 id,  0.0 wa,  0.0 hi, 24.3 si,  0.0 st
KiB Mem:  32865404 total, 30346440 used,  2518964 free, 114592 buffers
KiB Swap:       0 total,       0 used,       0 free. 759520 cached Mem

  PID USER    PR  NI VIRT RES SHR S  %CPU %MEM    TIME+ COMMAND
29085 guo    20 0  152524 8460 3152 S 404.5  0.0 0:12.57 echo_server
29182 guo    20 0  342740  26192 3908 R  95.7  0.1 0:02.98 ab
29178 guo    20 0  342740  24916 3876 R  95.1  0.1 0:02.95 ab
29184 guo    20 0  342740  25176 3876 R  93.4  0.1 0:02.87 ab
29180 guo    20 0  342740  23612 3908 R  90.4  0.1 0:02.82 ab

guo@guo-desktop:~$ (ab -n 10000000 -c 100 -k http://127.0.0.1:1971/jjjjjjjjjjjjj &);(ab -n 10000000 -c 100 -k http://127.0.0.1:1971/jjjjjjjjjjjjj &);(ab -n 10000000 -c 100 -k http://127.0.0.1:1971/jjjjjjjjjjjjj &);(ab -n 10000000 -c 100 -k http://127.0.0.1:1971/jjjjjjjjjjjjj &);
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd,
Licensed to The Apache Software Foundation,

Benchmarking 127.0.0.1 (be patient)
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd,
Licensed to The Apache Software Foundation,

Benchmarking 127.0.0.1 (be patient)
guo@guo-desktop:~$ This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd,
Licensed to The Apache Software Foundation,

Benchmarking 127.0.0.1 (be patient)
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd,
Licensed to The Apache Software Foundation,

Benchmarking 127.0.0.1 (be patient)
Completed 1000000 requests
Completed 1000000 requests
Completed 1000000 requests
Completed 1000000 requests
Completed 2000000 requests
Completed 2000000 requests
Completed 2000000 requests
Completed 2000000 requests
Completed 3000000 requests
Completed 3000000 requests
Completed 3000000 requests
Completed 3000000 requests
Completed 4000000 requests
Completed 4000000 requests
Completed 4000000 requests
Completed 4000000 requests
Completed 5000000 requests
Completed 5000000 requests
Completed 5000000 requests
Completed 5000000 requests
Completed 6000000 requests
Completed 6000000 requests
Completed 6000000 requests
Completed 6000000 requests
Completed 7000000 requests
Completed 7000000 requests
Completed 7000000 requests
Completed 7000000 requests
Completed 8000000 requests
Completed 8000000 requests
Completed 8000000 requests
Completed 8000000 requests
Completed 9000000 requests
Completed 9000000 requests
Completed 9000000 requests
Completed 9000000 requests
Completed 10000000 requests
Finished 10000000 requests

Server Software:
Server Hostname:       127.0.0.1
Server Port:          1971

Document Path:       /jjjjjjjjjjjjj
Document Length:       0 bytes

Concurrency Level:    100
Time taken for tests: 47.748 seconds
Complete requests:    10000000
Failed requests:       0
Non-2xx responses:    10000000
Keep-Alive requests: 10000000
Total transferred:    1190000000 bytes
HTML transferred:    0 bytes
Requests per second: 209434.07 [#/sec] (mean)
Time per request:    0.477 [ms] (mean)
Time per request:    0.005 [ms] (mean, across all concurrent requests)
Transfer rate:       24338.53 [Kbytes/sec] received
Completed 10000000 requests
Finished 10000000 requests

Server Software:
Server Hostname:       127.0.0.1
Server Port:          1971

Document Path:       /jjjjjjjjjjjjj
Document Length:       0 bytes

Concurrency Level:    100
Time taken for tests: 48.357 seconds
Complete requests:    10000000
Failed requests:       0
Non-2xx responses:    10000000
Keep-Alive requests: 10000000
Total transferred:    1190000000 bytes
HTML transferred:    0 bytes
Requests per second: 206794.63 [#/sec] (mean)
Time per request:    0.484 [ms] (mean)
Time per request:    0.005 [ms] (mean, across all concurrent requests)
Transfer rate:       24031.80 [Kbytes/sec] received
Completed 10000000 requests
Finished 10000000 requests

Server Software:
Server Hostname:       127.0.0.1
Server Port:          1971

Document Path:       /jjjjjjjjjjjjj
Document Length:       0 bytes

Concurrency Level:    100
Time taken for tests: 48.437 seconds
Complete requests:    10000000
Failed requests:       0
Non-2xx responses:    10000000
Keep-Alive requests: 10000000
Total transferred:    1190000000 bytes
HTML transferred:    0 bytes
Requests per second: 206452.02 [#/sec] (mean)
Time per request:    0.484 [ms] (mean)
Time per request:    0.005 [ms] (mean, across all concurrent requests)
Transfer rate:       23991.98 [Kbytes/sec] received
Completed 10000000 requests
Finished 10000000 requests

Server Software:
Server Hostname:       127.0.0.1
Server Port:          1971

Document Path:       /jjjjjjjjjjjjj
Document Length:       0 bytes

Concurrency Level:    100
Time taken for tests: 50.615 seconds
Complete requests:    10000000
Failed requests:       0
Non-2xx responses:    10000000
Keep-Alive requests: 10000000
Total transferred:    1190000000 bytes
HTML transferred:    0 bytes
Requests per second: 197569.37 [#/sec] (mean)
Time per request:    0.506 [ms] (mean)
Time per request:    0.005 [ms] (mean, across all concurrent requests)
Transfer rate:       22959.72 [Kbytes/sec] received

Connection Times (ms)
            min  mean[+/-sd] median max
Connect:       0 0 0.0    0    3
Processing:    0 0 1.4    0    37
Waiting:       0 0 1.4    0    37
Total:       0 0 1.4    0    37

Percentage of the requests served within a certain time (ms)
  50%    0
  66%    0
  75%    0
  80%    0
  90%    0
  95%    1
  98%    3
  99%    7
100%    37 (longest request)

Connection Times (ms)
            min  mean[+/-sd] median max
Connect:       0 0 0.0    0    4
Processing:    0 0 1.4    0    37
Waiting:       0 0 1.4    0    37
Total:       0 0 1.4    0    37

Percentage of the requests served within a certain time (ms)
  50%    0
  66%    0
  75%    0
  80%    0
  90%    0
  95%    1
  98%    3
  99%    7
100%    37 (longest request)

Connection Times (ms)
            min  mean[+/-sd] median max
Connect:       0 0 0.0    0    3
Processing:    0 0 1.4    0    37
Waiting:       0 0 1.4    0    37
Total:       0 0 1.4    0    37

Percentage of the requests served within a certain time (ms)
  50%    0
  66%    0
  75%    0
  80%    0
  90%    0
  95%    1
  98%    3
  99%    7
100%    37 (longest request)

Connection Times (ms)
            min  mean[+/-sd] median max
Connect:       0 0 0.0    0    3
Processing:    0 1 1.4    0    39
Waiting:       0 1 1.4    0    39
Total:       0 1 1.4    0    39

Percentage of the requests served within a certain time (ms)
  50%    0
  66%    0
  75%    0
  80%    0
  90%    0
  95%    1
  98%    3
  99%    7
100%    39 (longest request)

guo@guo-desktop:~$
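The 820,000 QPS figure is the sum of the four "Requests per second" values ab reported above. Because the four runs finished at slightly different times (47.7 s to 50.6 s), a conservative cross-check divides all 40,000,000 requests by the longest run time. A small sketch of both calculations; the helper names are illustrative.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Aggregate throughput of concurrent ab runs: the sum of each run's
// "Requests per second" (slightly optimistic if the runs do not fully overlap).
double aggregate_qps(const std::vector<double>& per_instance_rps)
{
    assert(!per_instance_rps.empty());
    return std::accumulate(per_instance_rps.begin(), per_instance_rps.end(), 0.0);
}

// Conservative estimate: all requests divided by the longest wall-clock run.
double conservative_qps(double total_requests, double longest_run_seconds)
{
    assert(longest_run_seconds > 0.0);
    return total_requests / longest_run_seconds;
}

// "Requests per second" values from the four ab runs above.
const std::vector<double> kFourAbRuns = {209434.07, 206794.63, 206452.02, 197569.37};
```

With these numbers the sum is about 820,250 QPS and the conservative bound about 790,280 QPS, so the 820,000 figure holds to within a few percent.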


wlmqgzm

Rank: Prosperous and Happy

Forum badges: 9

Post #114

Posted 2015-11-01 14:40

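4-core test: small-packet send/receive traffic reaches about 100 MByte per second, which is already enough to fill a gigabit network.
In the current test, the benchmarking tool itself uses roughly half of the CPU, which distorts the real results. My estimate is that the actual capacity of the 4-core CPU already exceeds 1,000,000 QPS ECHO.
If the load were generated from a separate machine, getting more than 820,000 QPS ECHO would require at least two bonded gigabit NICs or a 10-gigabit network to obtain meaningful results.

I don't have a spare machine for testing right now, so network-layer testing is on hold for the moment; I'll continue with other work.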
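To check the gigabit claim against the ab output: each ab run reports "Transfer rate" as received Kbytes/sec (ab's Kbyte is 1024 bytes), and each response here is 1190000000 / 10000000 = 119 bytes. A small sketch of the unit conversion; the helper names are illustrative, and it counts only HTTP response bytes, not requests or protocol headers.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Convert an ab "Transfer rate" (Kbytes/sec, 1 Kbyte = 1024 bytes)
// into megabits per second (1 Mbit = 1,000,000 bits).
double kbytes_per_sec_to_mbit(double kbytes_per_sec)
{
    return kbytes_per_sec * 1024.0 * 8.0 / 1e6;
}

// Sum the transfer rates of several concurrent ab runs, in Mbit/s.
double total_received_mbit(const std::vector<double>& transfer_rates_kb)
{
    assert(!transfer_rates_kb.empty());
    double total_kb = std::accumulate(transfer_rates_kb.begin(), transfer_rates_kb.end(), 0.0);
    return kbytes_per_sec_to_mbit(total_kb);
}

// Response size: ab's "Total transferred" divided by "Complete requests".
constexpr long long kBytesPerResponse = 1190000000LL / 10000000LL;  // 119
```

The four runs in post #113 add up to roughly 781 Mbit/s of HTTP response payload alone; per-packet TCP/IP and Ethernet framing overhead, which is large relative to a 119-byte payload, would push the on-the-wire rate higher still.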


wlmqgzm

Rank: Prosperous and Happy

Forum badges: 9

Post #115

Posted 2015-11-02 00:00

Last edited by wlmqgzm on 2015-11-02 11:33

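I hadn't done much tuning before; now I have to test a lot of low-level things myself and build up experience.

mutex: 100 million uncontended lock operations take about 1.5 s
atomic: 100 million atomic increments take about 0.5 s
memory barrier: 100 million full fences take about 0.41 s
loop: 100 million loop additions take about 0.0 s (in a Release build the optimizer most likely replaces the whole loop with its result, so this number does not measure the loop)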

guo@guo-desktop:/cpp/echo_server/bin/Release$ ./echo_server
begin test mutex.
100000000
1.503490s wall, 1.500000s user + 0.000000s system = 1.500000s CPU (99.8%)
begin test atomic.
0.509658s wall, 0.500000s user + 0.000000s system = 0.500000s CPU (98.1%)
100000000
begin test loop.
0.000000s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
100000000
begin test fenced_block
0.410092s wall, 0.400000s user + 0.000000s system = 0.400000s CPU (97.5%)
100000000

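#include <atomic>
#include <iostream>
#include <mutex>
#include <boost/timer/timer.hpp>                  // boost::timer::auto_cpu_timer (link against Boost.Timer)
#include <boost/asio/detail/fenced_block.hpp>     // internal Boost.Asio header, not a public API

void test_mutex( void )
{
    std::mutex mutex1;
    unsigned int j = 0;
    std::cout << "begin test mutex." << std::endl;
    boost::timer::auto_cpu_timer a2;              // reports elapsed time when it goes out of scope
    for( unsigned int i = 0; i < 100000000; ++i ) {
        std::lock_guard<std::mutex> lock(mutex1); // was cout_mutex, which is not declared here; uncontended
        ++j;
    }
    std::cout << j << std::endl;                  // printed before the timer line, as in the log
}

void test_atomic( void )
{
    std::atomic<unsigned int> j(0);
    std::cout << "begin test atomic." << std::endl;
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            ++j;                                  // atomic read-modify-write (lock-prefixed on x86)
        }
    }
    std::cout << j << std::endl;
}

void test_loop( void )
{
    unsigned int j = 0;
    std::cout << "begin test loop." << std::endl;
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            ++j;                                  // an optimizing build can fold this loop to j = 100000000
        }
    }
    std::cout << j << std::endl;
}

void test_fenced_block( void )
{
    unsigned int j = 0;
    std::cout << "begin test fenced_block" << std::endl;
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            // full: a load-side fence on construction and a store-side fence on
            // destruction; the exact instructions depend on the platform/config.
            boost::asio::detail::fenced_block fb( boost::asio::detail::fenced_block::full );
            ++j;
        }
    }
    std::cout << j << std::endl;
}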
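The 0.0 s loop result is most likely the optimizer deleting the loop rather than a measurement of loop cost. A minimal sketch of one common workaround, assuming GCC or Clang on x86-64: an empty extended-asm statement that takes the counter as a `"+r"` operand, so the compiler must treat the value as read and modified on every iteration (no CPU instruction is emitted). `std::chrono::steady_clock` stands in for Boost.Timer, and `time_plain_loop` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>

// GCC/Clang extended asm: the empty statement "uses and modifies" value in a
// register, so the optimizer cannot fold the increments into a constant.
inline void keep_in_register(std::uint64_t& value)
{
    asm volatile("" : "+r"(value));
}

// Times n iterations of "++j" and returns the final value of j.
std::uint64_t time_plain_loop(std::uint64_t n)
{
    std::uint64_t j = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) {
        ++j;
        keep_in_register(j);
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    std::cout << "loop: " << n << " iterations in " << elapsed.count() << " s\n";
    assert(j == n);  // the asm is empty, so at run time j still counts every iteration
    return j;
}
```

Calling `time_plain_loop(100000000)` should then report a small but non-zero time; the same trick can wrap the other loop bodies so that all four tests measure comparable work.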


wlmqgzm

Rank: Prosperous and Happy

Forum badges: 9

Post #116

Posted 2015-11-02 00:04

Last edited by wlmqgzm on 2015-11-02 11:29

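Comparing various memory pools on allocation speed: the std default is still basically the fastest, and Boost and the others are much slower, by a very wide margin. So there is no further optimization to be had from a memory pool.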

Allocating 32 bytes, 100 million times
begin test vector mem 1, std default .
2.132984s wall, 2.120000s user + 0.000000s system = 2.120000s CPU (99.4%)
100000000
begin test vector mem 2, boost::pool_allocator.
8.729578s wall, 8.720000s user + 0.000000s system = 8.720000s CPU (99.9%)
100000000
begin test vector mem 3.  boost::fast_pool_allocator
8.764987s wall, 8.760000s user + 0.000000s system = 8.760000s CPU (99.9%)
100000000
begin test vector mem 4.  __gnu_cxx::__pool_alloc
4.872141s wall, 4.870000s user + 0.000000s system = 4.870000s CPU (100.0%)
100000000
begin test vector mem 5.  __gnu_cxx::malloc_allocato
2.955835s wall, 2.950000s user + 0.000000s system = 2.950000s CPU (99.8%)
100000000
begin test vector mem 6.  __gnu_cxx::__mt_alloc
3.311399s wall, 3.300000s user + 0.000000s system = 3.300000s CPU (99.7%)
100000000

Allocating 4 KBytes, 100 million times
begin test vector mem 1, std default .
6.051421s wall, 6.040000s user + 0.000000s system = 6.040000s CPU (99.8%)
100000000
begin test vector mem 2, boost::pool_allocator.
200.278515s wall, 200.100000s user + 0.000000s system = 200.100000s CPU (99.9%)
100000000
begin test vector mem 3.  boost::fast_pool_allocator
196.984990s wall, 196.820000s user + 0.000000s system = 196.820000s CPU (99.9%)
100000000
begin test vector mem 4.  __gnu_cxx::__pool_alloc
112.791246s wall, 112.680000s user + 0.000000s system = 112.680000s CPU (99.9%)
100000000
begin test vector mem 5.  __gnu_cxx::malloc_allocator
112.410630s wall, 112.310000s user + 0.000000s system = 112.310000s CPU (99.9%)
100000000
begin test vector mem 6.  __gnu_cxx::__mt_alloc
113.117813s wall, 113.020000s user + 0.000000s system = 113.020000s CPU (99.9%)
100000000

// Allocation micro-benchmarks: construct and destroy a 32-byte std::vector<char>
// 100 million times with different allocators. The 4 KByte results above
// presumably come from the same code with 32 replaced by 4096.
#include <iostream>
#include <vector>
#include <boost/pool/pool_alloc.hpp>     // boost::pool_allocator, boost::fast_pool_allocator
#include <boost/timer/timer.hpp>         // boost::timer::auto_cpu_timer
#include <ext/pool_allocator.h>          // __gnu_cxx::__pool_alloc (libstdc++ extension)
#include <ext/malloc_allocator.h>        // __gnu_cxx::malloc_allocator (libstdc++ extension)
#include <ext/mt_allocator.h>            // __gnu_cxx::__mt_alloc (libstdc++ extension)

void test_vector_mem1( void )
{
    unsigned int j = 0;
    std::cout << "begin test vector mem 1, std default ." << std::endl;
    std::vector<char> vt0(32);                                   // allocated once before timing starts
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            std::vector<char> vt1(32);                           // one allocation + one deallocation
            ++j;
        }
    }
    std::cout << j << std::endl;
}

void test_vector_mem2( void )
{
    unsigned int j = 0;
    std::cout << "begin test vector mem 2, boost::pool_allocator." << std::endl;
    std::vector<char, boost::pool_allocator<char> > vt0(32);
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            std::vector<char, boost::pool_allocator<char> > vt1(32);
            ++j;
        }
    }
    std::cout << j << std::endl;
}

void test_vector_mem3( void )
{
    unsigned int j = 0;
    std::cout << "begin test vector mem 3. boost::fast_pool_allocator" << std::endl;
    std::vector<char, boost::fast_pool_allocator<char> > vt0(32);
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            std::vector<char, boost::fast_pool_allocator<char> > vt1(32);
            ++j;
        }
    }
    std::cout << j << std::endl;
}

void test_vector_mem4( void )
{
    unsigned int j = 0;
    std::cout << "begin test vector mem 4. __gnu_cxx::__pool_alloc" << std::endl;
    std::vector<char, __gnu_cxx::__pool_alloc<char> > vt0(32);
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            std::vector<char, __gnu_cxx::__pool_alloc<char> > vt1(32);  // was vt0, shadowing the outer vector
            ++j;
        }
    }
    std::cout << j << std::endl;
}

void test_vector_mem5( void )
{
    unsigned int j = 0;
    std::cout << "begin test vector mem 5. __gnu_cxx::malloc_allocator" << std::endl;
    std::vector<char, __gnu_cxx::malloc_allocator<char> > vt0(32);
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            std::vector<char, __gnu_cxx::malloc_allocator<char> > vt1(32);
            ++j;
        }
    }
    std::cout << j << std::endl;
}

void test_vector_mem6( void )
{
    unsigned int j = 0;
    std::cout << "begin test vector mem 6. __gnu_cxx::__mt_alloc" << std::endl;
    std::vector<char, __gnu_cxx::__mt_alloc<char> > vt0(32);
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            std::vector<char, __gnu_cxx::__mt_alloc<char> > vt1(32);
            ++j;
        }
    }
    std::cout << j << std::endl;
}
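The six functions above differ only in the allocator type. A sketch of the same measurement written once as a template, assuming C++11 and `std::chrono` in place of Boost.Timer; `bench_vector_alloc` is a hypothetical name. Taking the size as a parameter also lets the 32-byte and 4 KByte runs share one code path.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Constructs and destroys `iterations` vectors of `size` chars using
// allocator Alloc, prints the elapsed time, and returns how many were built.
template <class Alloc>
std::size_t bench_vector_alloc(const std::string& label,
                               std::size_t size,
                               std::size_t iterations)
{
    assert(size > 0);
    std::size_t count = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        std::vector<char, Alloc> v(size);  // allocate + zero-fill, freed at end of scope
        if (v.size() == size) ++count;
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    std::cout << label << ": " << iterations << " x " << size
              << " bytes in " << elapsed.count() << " s\n";
    return count;
}
```

For example, `bench_vector_alloc<std::allocator<char>>("std default", 32, 100000000)`, or the same call with `boost::pool_allocator<char>` once `<boost/pool/pool_alloc.hpp>` is included. Since C++14, compilers are allowed to elide paired allocations, so an implausibly fast result is worth checking against the generated code.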


windoze

Moderator

Forum badges: 44

Post #117

Posted 2015-11-02 02:16

Last edited by windoze on 2015-11-02 02:17

Reply to #113 wlmqgzm

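At the very least you should try existing malloc implementations such as tcmalloc, dlmalloc and libumem, and then try building your own lock-free/wait-free slab-based memory allocator.
Just saying "the std default performs best" shows you haven't really figured out how this thing works inside.

And I'm certainly not going to tell you that glibc on current Linux uses ptmalloc by default…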
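A sketch of the core idea behind the slab allocator suggested above, under loud assumptions: single-threaded, one fixed slot size, no growth, and `FixedSlab` is a hypothetical class, not an existing library. A large block is carved into equal slots and free slots are chained through an intrusive linked list, so allocate and free are a couple of pointer moves. A lock-free or per-thread version would build on the same free list.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Minimal fixed-size slab (single-threaded only). Making it lock-free would
// additionally need atomic head updates and protection against the ABA problem.
class FixedSlab {
public:
    FixedSlab(std::size_t slot_size, std::size_t slot_count)
        : slot_size_(round_up(slot_size)), storage_(slot_size_ * slot_count)
    {
        assert(slot_count > 0);
        for (std::size_t i = 0; i < slot_count; ++i) {
            push(storage_.data() + i * slot_size_);  // thread every slot onto the free list
        }
    }
    FixedSlab(const FixedSlab&) = delete;             // free list points into our own storage
    FixedSlab& operator=(const FixedSlab&) = delete;

    void* allocate()
    {
        if (head_ == nullptr) return nullptr;         // slab exhausted
        Node* n = head_;
        head_ = n->next;
        return n;
    }

    void deallocate(void* p) { push(static_cast<unsigned char*>(p)); }

private:
    struct Node { Node* next; };

    // Slots hold at least one link and keep max_align_t alignment
    // (assuming the vector's buffer itself is suitably aligned by operator new).
    static std::size_t round_up(std::size_t n)
    {
        const std::size_t a = alignof(std::max_align_t);
        std::size_t m = n < sizeof(Node) ? sizeof(Node) : n;
        return (m + a - 1) / a * a;
    }

    void push(unsigned char* slot) { head_ = new (slot) Node{head_}; }

    std::size_t slot_size_;
    std::vector<unsigned char> storage_;
    Node* head_ = nullptr;
};
```

Freed slots are reused in LIFO order, which tends to keep recently touched memory in cache.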


wlmqgzm

Rank: Prosperous and Happy

Forum badges: 9

Post #118

Posted 2015-11-02 09:03

Last edited by wlmqgzm on 2015-11-02 09:06

Reply to #114 windoze

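Thanks for the reminder; more testing is worthwhile. I used tcmalloc from google-perftools before, but hadn't used it in a long time and had forgotten about it.
I only tested the three most mainstream ones: the default ptmalloc, tcmalloc (the fastest), and jemalloc, which is claimed to perform better under multithreading.
ptmalloc is a fork of dlmalloc; for libumem I could only find a version from 2007, nothing newer, so I didn't test it.
Overall, tcmalloc is still the fastest.

Also, switching the memory allocator makes no difference to the echo server's send/receive performance: 12 KByte is preallocated at accept time, a single ab test program uses only 100 connections, and after that no send or receive involves memory allocation.
It should, however, affect the performance of establishing new connections.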

ptmalloc 32Byte
begin test vector mem 1, std default .
2.132984s wall, 2.120000s user + 0.000000s system = 2.120000s CPU (99.4%)
100000000

jemalloc 32Byte
begin test vector mem 1, std default .
1.625155s wall, 1.620000s user + 0.000000s system = 1.620000s CPU (99.7%)
100000000

tcmalloc 32Byte
begin test vector mem 1, std default .
1.210702s wall, 1.210000s user + 0.000000s system = 1.210000s CPU (99.9%)
100000000

===================================================

ptmalloc 4KByte
begin test vector mem 1, std default .
6.051421s wall, 6.040000s user + 0.000000s system = 6.040000s CPU (99.8%)
100000000

jemalloc 4KByte
begin test vector mem 1, std default .
5.321043s wall, 5.310000s user + 0.000000s system = 5.310000s CPU (99.8%)
100000000

tcmalloc 4KByte
begin test vector mem 1, std default .
4.931310s wall, 4.920000s user + 0.000000s system = 4.920000s CPU (99.8%)
100000000


wlmqgzm

Rank: Prosperous and Happy

Forum badges: 9

Post #119

Posted 2015-11-02 11:04

Last edited by wlmqgzm on 2015-11-02 11:28

Reply to #95 hellioncu

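Thanks for the reminder. Yesterday I went through the server code again looking for places that might need memory barriers. Essentially none of the code needs them, because every place where threads exchange data goes through io_service.post, and post sets up the memory barrier internally.
Only one function, which displays the aggregate statistics counters, might need one.
That function is called from exactly one place: at final shutdown, during the overall destruction, and the destruction path has at least a 1 ms sleep before the counters are finally displayed, so in theory no memory barrier is needed there either.
However, since the code may later change to allow viewing the totals at any time, I added an sfence in the function that updates the counters: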

boost::asio::detail::fenced_block  fb(  boost::asio::detail::fenced_block::half );
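As an alternative to a hand-placed Asio fence, C++11 atomics express the same intent portably. A minimal sketch, assuming each statistic only needs to be individually accurate (not a mutually consistent snapshot across counters); `Stats`, `on_packet`, `total_packets` and `simulate` are hypothetical names. On x86, ordinary stores are already kept in order by the hardware, so an `sfence` mainly matters for non-temporal stores; a relaxed atomic increment instead guarantees no lost updates from concurrent threads and lets any thread read the counter at any time.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical server-wide statistics counter.
struct Stats {
    std::atomic<std::uint64_t> packets{0};

    // Called from any I/O thread: atomic, so concurrent increments are never lost,
    // and memory_order_relaxed adds no ordering beyond the counter itself.
    void on_packet() { packets.fetch_add(1, std::memory_order_relaxed); }

    // Safe to call at any time from any thread; the value may be stale by the time it returns.
    std::uint64_t total_packets() const { return packets.load(std::memory_order_relaxed); }
};

// Runs `threads` workers that each record `per_thread` packets.
std::uint64_t simulate(unsigned threads, std::uint64_t per_thread)
{
    assert(threads > 0);
    Stats stats;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&stats, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) stats.on_packet();
        });
    }
    for (auto& w : workers) w.join();  // join() makes all worker increments visible here
    return stats.total_packets();
}
```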

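Continuing with low-level baseline performance tests, partly as a learning exercise, to understand the details more precisely.
In the comparison, code using only sfence runs about 600 million times per second, roughly 10 times faster than an uncontended lock, so reducing lock usage does improve performance somewhat.
However, an uncontended lock still runs more than 60 million times per second on a single thread, which is also very fast.
So when locks are used, the most important thing is to reduce lock contention; a few locks in the code are not expensive as long as they don't contend.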
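To see the point about contention directly, a minimal sketch that compares one mutex shared by every thread on every increment against per-thread counters that take the same mutex only once at the end. The function names are hypothetical; timings will vary by machine, and only the counts are deterministic.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Every increment takes the one shared mutex: contended on every iteration.
std::uint64_t shared_mutex_count(unsigned threads, std::uint64_t per_thread)
{
    assert(threads > 0);
    std::mutex m;
    std::uint64_t total = 0;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++total;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Each thread counts privately and locks once to publish its result,
// so the mutex is acquired only `threads` times in total.
std::uint64_t per_thread_count(unsigned threads, std::uint64_t per_thread)
{
    assert(threads > 0);
    std::mutex m;
    std::uint64_t total = 0;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < per_thread; ++i) ++local;
            std::lock_guard<std::mutex> lock(m);
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Runs f(), prints its result and wall-clock time, and returns the result.
template <class F>
std::uint64_t timed(const char* label, F f)
{
    auto start = std::chrono::steady_clock::now();
    std::uint64_t result = f();
    std::chrono::duration<double> s = std::chrono::steady_clock::now() - start;
    std::cout << label << ": " << result << " in " << s.count() << " s\n";
    return result;
}
```

For example, `timed("shared mutex", [] { return shared_mutex_count(4, 10000000); })` versus the same call with `per_thread_count`; the optimizer may also shrink the private loop, which is exactly the point, since it touches no shared state.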

begin test  half fenced_block(sfence)
0.163584s wall, 0.160000s user + 0.000000s system = 0.160000s CPU (97.8%)
100000000
begin test fenced_block(sfence+lfence)
0.410092s wall, 0.400000s user + 0.000000s system = 0.400000s CPU (97.5%)
100000000
begin test mutex.
100000000
1.503490s wall, 1.500000s user + 0.000000s system = 1.500000s CPU (99.8%)
begin test atomic.
0.509658s wall, 0.500000s user + 0.000000s system = 0.500000s CPU (98.1%)
100000000

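#include <iostream>
#include <boost/timer/timer.hpp>                  // boost::timer::auto_cpu_timer
#include <boost/asio/detail/fenced_block.hpp>     // internal Boost.Asio header, not a public API

void test_fenced_block( void )
{
    unsigned int j = 0;
    std::cout << "begin test fenced_block" << std::endl;
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            // full: load-side fence when constructed, store-side fence when destroyed
            boost::asio::detail::fenced_block fb( boost::asio::detail::fenced_block::full );
            ++j;
        }
    }
    std::cout << j << std::endl;
}

void test_half_fenced_block( void )
{
    unsigned int j = 0;
    std::cout << "begin test half fenced_block" << std::endl;
    {
        boost::timer::auto_cpu_timer a2;
        for( unsigned int i = 0; i < 100000000; ++i ) {
            // half: nothing when constructed, store-side fence only when destroyed
            // (as in Boost.Asio's x86 and std::atomic-based implementations)
            boost::asio::detail::fenced_block fb( boost::asio::detail::fenced_block::half );
            ++j;
        }
    }
    std::cout << j << std::endl;
}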


windoze

Moderator

Forum badges: 44

Post #120

Posted 2015-11-02 12:15

Reply to #115 wlmqgzm

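Unbelievable. Of course you can't see any difference when you test with your current 100-connection echo server and 32-byte small packets…

Are you really putting all this effort into building an echo server that supports 100 concurrent connections? And what's the point of benchmarking an echo server like this to death? The bottlenecks in a real application will certainly be different.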
