[Repost] Installing and testing Sphinx on Windows

Posted 2011-11-03 17:42
Reposted from: http://www.cnblogs.com/ainiaa/archive/2010/12/21/1912459.html

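1. Download the latest Windows build from http://www.sphinxsearch.com/downloads.html. I used "Win32 release binaries with MySQL support" and unpacked it into D:\sphinx.
2. Under D:\sphinx\, create a data directory for the index files and a log directory for the log files, then copy D:\sphinx\sphinx.conf.in to D:\sphinx\bin\sphinx.conf (note the changed file name).
3. Edit D:\sphinx\bin\sphinx.conf. These are the settings that need changing: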

    type          = mysql            # data source type; MySQL here
    sql_host      = localhost        # database server
    sql_user      = root             # database user name
    sql_pass      = ''               # database password
    sql_db        = test             # database name
    sql_port      = 3306             # database port
    sql_query_pre = SET NAMES utf8   # uncomment this line if your database uses UTF-8

    index test1
    {
        # directory that holds the index files
        path          = D:/sphinx/data/
        # encoding
        charset_type  = utf-8
        # character table for UTF-8
        charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
        # simple n-gram tokenization; only 0 and 1 are supported; set it to 1 to search Chinese
        ngram_len     = 1
        # characters to tokenize as n-grams; uncomment this line to search Chinese
        ngram_chars   = U+3000..U+2FA1F
    }

    # index test1stemmed : test1
    # {
    #     path       = @CONFDIR@/data/test1stemmed
    #     morphology = stem_en
    # }

    # if you have no distributed index, comment out the following section
    # index dist1
    # {
    #     'distributed' index type MUST be specified
    #     type  = distributed
    #     local index to be searched
    #     there can be many local indexes configured
    #     local = test1
    #     local = test1stemmed
    #     remote agent
    #     multiple remote agents may be specified
    #     syntax is 'hostname:port:index1,[index2[,...]]
    #     agent = localhost:3313:remote1
    #     agent = localhost:3314:remote2,remote3
    #     remote agent connection timeout, milliseconds
    #     optional, default is 1000 ms, ie. 1 sec
    #     agent_connect_timeout = 1000
    #     remote agent query timeout, milliseconds
    #     optional, default is 3000 ms, ie. 3 sec
    #     agent_query_timeout   = 3000
    # }

    # settings to change for the search daemon
    searchd
    {
        # log file
        log      = D:/sphinx/log/searchd.log
        # PID file, searchd process ID file name
        pid_file = D:/sphinx/log/searchd.pid
        # on Windows this line must be commented out for searchd to start
        # seamless_rotate = 1
    }
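To get a feel for what ngram_len = 1 together with ngram_chars is meant to do, here is a rough Python sketch. It is only an illustration and not Sphinx's tokenizer: the function names are made up for this example, it handles only the ASCII part of charset_table (the Cyrillic ranges are ignored), and it assumes that every character in U+3000..U+2FA1F becomes a keyword of its own while other characters are grouped into lowercased words.

```python
from typing import List

# Range copied from the ngram_chars setting in the config above.
NGRAM_FIRST, NGRAM_LAST = 0x3000, 0x2FA1F


def is_ngram_char(ch: str) -> bool:
    """True if the character falls inside the ngram_chars range."""
    return NGRAM_FIRST <= ord(ch) <= NGRAM_LAST


def tokenize(text: str) -> List[str]:
    """Split text roughly the way a 1-gram setup is assumed to behave.

    Characters in the n-gram range are emitted one by one; ASCII letters,
    digits and '_' are collected into lowercased words; anything else
    acts as a separator.
    """
    tokens: List[str] = []
    word: List[str] = []

    def flush() -> None:
        # Emit the pending ASCII word, if any.
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if is_ngram_char(ch):
            flush()
            tokens.append(ch)          # each CJK character is its own token
        elif ch.isascii() and (ch.isalnum() or ch == "_"):
            word.append(ch.lower())    # mimic the A..Z->a..z folding
        else:
            flush()                    # separator
    flush()
    return tokens


print(tokenize("Test 中文"))  # ['test', '中', '文']
```

Under these assumptions a query for 中文 is looked up as the two single-character keywords 中 and 文, which is why no word dictionary is needed for this simple kind of Chinese search.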
4. Import the test data:

    C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot test<d:/sphinx/example.sql
5. Build the index:

    D:\sphinx\bin>indexer.exe --all
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    using config file './sphinx.conf'...
    indexing index 'test1'...
    collected 4 docs, 0.0 MB
    sorted 0.0 Mhits, 100.0% done
    total 4 docs, 193 bytes
    total 0.101 sec, 1916.30 bytes/sec, 39.72 docs/sec
    D:\sphinx\bin>
6. Try searching for 'test':

    D:\sphinx\bin>search.exe test
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    using config file './sphinx.conf'...
    index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec
    displaying matches:
    1. document=1, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
            id=1
            group_id=1
            group_id2=5
            date_added=2008-11-26 14:58:59
            title=test one
            content=this is my test document number one. also checking search within phrases.
    2. document=2, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
            id=2
            group_id=1
            group_id2=6
            date_added=2008-11-26 14:58:59
            title=test two
            content=this is my test document number two
    3. document=4, weight=1, group_id=2, date_added=Wed Nov 26 14:58:59 2008
            id=4
            group_id=2
            group_id2=8
            date_added=2008-11-26 14:58:59
            title=doc number four
            content=this is to test groups
    words:
    1. 'test': 3 documents, 5 hits
    D:\sphinx\bin>
7. Test Chinese search. First update the documents table in the test database:

    UPDATE `test`.`documents` SET `title` = '测试中文', `content` = 'this is my test document number two,应该搜的到吧' WHERE `documents`.`id` = 2;
Rebuild the index:

    D:\sphinx\bin>indexer.exe --all

Then try searching for '中文':

    D:\sphinx\bin>search.exe 中文
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    using config file './sphinx.conf'...
    index 'test1': query '中文 ': returned 0 matches of 0 total in 0.000 sec
    words:
    D:\sphinx\bin>
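Nothing seems to match. That is because the Windows command line uses GBK encoding, so the query bytes do not match the UTF-8 text in the index. We can query from a program instead: create a file named foo.php under D:\sphinx\api and make sure it is saved as UTF-8.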

    <?php
    // PHP API client that ships in D:\sphinx\api
    require 'sphinxapi.php';

    $s = new SphinxClient();
    // searchd listens on port 3312 in this setup
    $s->SetServer('localhost', 3312);
    // the query string is UTF-8 because this file is saved as UTF-8
    $result = $s->Query('中文');
    var_dump($result);
    ?>
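The encoding mismatch mentioned above is easy to see outside of Sphinx. The small Python sketch below only compares the raw bytes of the same query string in the two encodings; it does not prove what search.exe actually received on the command line, which is the post's own explanation.

```python
query = "中文"

# Bytes as they would look from a UTF-8 source (database, foo.php).
utf8_bytes = query.encode("utf-8")
# Bytes as a GBK console would produce them.
gbk_bytes = query.encode("gbk")

print(utf8_bytes.hex(" "))  # e4 b8 ad e6 96 87
print(gbk_bytes.hex(" "))   # d6 d0 ce c4

# The two byte sequences share nothing, so a byte-level match cannot succeed.
print(utf8_bytes == gbk_bytes)  # False

# The GBK bytes are not even valid UTF-8.
try:
    gbk_bytes.decode("utf-8")
    gbk_is_valid_utf8 = True
except UnicodeDecodeError:
    gbk_is_valid_utf8 = False
print(gbk_is_valid_utf8)  # False
```

Saving foo.php as UTF-8 means the query string is sent in the same encoding the index was built with.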
Start the Sphinx searchd service:

    D:\sphinx\bin>searchd.exe
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    WARNING: forcing --console mode on Windows
    using config file './sphinx.conf'...
    creating server socket on 0.0.0.0:3312
    accepting connections
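If the query from a program fails to connect, it can help to first check that something is really listening on the port from the searchd output (3312 here). The following is a minimal sketch using only Python's standard socket module; port_is_open is a name made up for this example, and it only tests that a TCP connection can be opened, not that the listener is actually searchd.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example: port_is_open("localhost", 3312) should be True while searchd is running.
```

A False result usually points at searchd not running or at a different port in sphinx.conf.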
Run the PHP query:

    php d:/sphinx/api/foo.php