Post #1 | posted 2009-01-04 23:14

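The two attached files are installers for JCC and PyLucene, built under MinGW32.
Calling the Lucene API from Python through JNI works quite well in my experience.
JRE version: jre1.6.0.3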

Attachments:
lucene-2.4.0-py2.6-win32.rar (3.31 MB, 183 downloads): PyLucene installer
JCC-2.1.win32-py2.6.rar (207.55 KB, 170 downloads)

xiaoyu9805119 | post #2 | posted 2009-01-05 09:02

Looking forward to the OP sharing some examples.

redskywy | post #3 | posted 2009-01-05 21:58

The code below comes from the PyLucene samples.
Building the index:

import sys, os, lucene, threading, time
from datetime import datetime

"""
This class is loosely based on the Lucene (java implementation) demo class
org.apache.lucene.demo.IndexFiles.  It will take a directory as an argument
and will index all of the files in that directory and downward recursively.
It will index on the file path, the file name and the file contents.  The
resulting Lucene index will be placed in the current directory and called
'index'.
"""

class Ticker(object):
    """Prints a dot every second until tick is set to False."""

    def __init__(self):
        self.tick = True

    def run(self):
        while self.tick:
            sys.stdout.write('.')
            sys.stdout.flush()
            time.sleep(1.0)

class IndexFiles(object):
    """Usage: python IndexFiles <doc_directory>"""

    def __init__(self, root, storeDir, analyzer):

        if not os.path.exists(storeDir):
            os.mkdir(storeDir)
        # The True arguments create a fresh index, replacing any existing one.
        store = lucene.FSDirectory.getDirectory(storeDir, True)
        writer = lucene.IndexWriter(store, analyzer, True)
        # Index up to 1048576 terms per field.
        writer.setMaxFieldLength(1048576)
        self.indexDocs(root, writer)
        ticker = Ticker()
        print 'optimizing index',
        threading.Thread(target=ticker.run).start()
        try:
            writer.optimize()
            writer.close()
        finally:
            # Stop the ticker even if optimize() or close() raises; otherwise
            # the non-daemon thread would keep the process alive.
            ticker.tick = False
        print 'done'

    def indexDocs(self, root, writer):
        for root, dirnames, filenames in os.walk(root):
            for filename in filenames:
                # Only .txt and .py files are indexed.
                if not filename.endswith('.txt') and not filename.endswith(".py"):
                    continue
                print "adding", filename
                try:
                    path = os.path.join(root, filename)
                    file = open(path)
                    contents = unicode(file.read(), 'iso-8859-1')
                    file.close()
                    doc = lucene.Document()
                    # name and path are stored and indexed as single terms.
                    doc.add(lucene.Field("name", filename,
                                         lucene.Field.Store.YES,
                                         lucene.Field.Index.UN_TOKENIZED))
                    doc.add(lucene.Field("path", path,
                                         lucene.Field.Store.YES,
                                         lucene.Field.Index.UN_TOKENIZED))
                    if len(contents) > 0:
                        # contents is tokenized for searching but not stored.
                        doc.add(lucene.Field("contents", contents,
                                             lucene.Field.Store.NO,
                                             lucene.Field.Index.TOKENIZED))
                    else:
                        print "warning: no content in %s" % filename
                    writer.addDocument(doc)
                except Exception, e:
                    print "Failed in indexDocs:", e

if __name__ == '__main__':
    # Appending "." makes the current directory the default document
    # directory. As a side effect, the usage check below never triggers.
    sys.argv.append(".")
    if len(sys.argv) < 2:
        print IndexFiles.__doc__
        sys.exit(1)
    lucene.initVM(lucene.CLASSPATH)
    print 'lucene', lucene.VERSION
    start = datetime.now()
    try:
        IndexFiles(sys.argv[1], "index", lucene.StandardAnalyzer())
        end = datetime.now()
        print end - start
    except Exception, e:
        print "Failed: ", e
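
If you want to check which files the indexer would pick up before involving Lucene at all, the directory walk can be tried on its own. The following is only a minimal sketch of that step, not part of the PyLucene samples: the names collect_files and EXTENSIONS are made up for illustration, and it uses io.open so that it should run on Python 2.6 as well as Python 3.

```python
import io
import os

# Hypothetical name; same extension filter as the sample above.
EXTENSIONS = ('.txt', '.py')


def collect_files(root, extensions=EXTENSIONS):
    """Yield (name, path, contents) for each file the sample would index."""
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            # str.endswith accepts a tuple of suffixes.
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            # iso-8859-1 maps every byte to a character, so decoding
            # cannot fail, whatever the file really contains.
            with io.open(path, encoding='iso-8859-1') as f:
                contents = f.read()
            yield filename, path, contents
```

Calling list(collect_files(".")) gives one tuple per .txt or .py file under the current directory. Files with empty contents are still returned, which matches the sample: it adds the document and only skips the contents field.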

Rating: 1 participant, +5 points
xiaoyu9805119: +5 (reason: example)

redskywy | post #4 | posted 2009-01-05 21:59

from lucene import \
    QueryParser, IndexSearcher, StandardAnalyzer, FSDirectory, Hit, \
    VERSION, initVM, CLASSPATH

"""
This script is loosely based on the Lucene (java implementation) demo class
org.apache.lucene.demo.SearchFiles.  It will prompt for a search query, then it
will search the Lucene index in the current directory called 'index' for the
search query entered against the 'contents' field.  It will then display the
'path' and 'name' fields for each of the hits it finds in the index.  Note that
search.close() is currently commented out because it causes a stack overflow in
some cases.
"""
def run(searcher, analyzer):
    while True:
        print
        print "Hit enter with no input to quit."
        command = raw_input("Query:")
        if command == '':
            return

        print
        print "Searching for:", command
        query = QueryParser("contents", analyzer).parse(command)
        hits = searcher.search(query)
        print "%s total matching documents." % hits.length()

        for hit in hits:
            doc = Hit.cast_(hit).getDocument()
            print 'path:', doc.get("path"), 'name:', doc.get("name")

if __name__ == '__main__':
    STORE_DIR = "index"
    initVM(CLASSPATH)
    print 'lucene', VERSION
    directory = FSDirectory.getDirectory(STORE_DIR, False)
    searcher = IndexSearcher(directory)
    analyzer = StandardAnalyzer()
    run(searcher, analyzer)
    searcher.close()
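
A note on why the search script prints only path and name: the indexing sample adds those two fields with Field.Store.YES, while contents is added with Field.Store.NO, so the text can be searched but is not kept in the index for retrieval. The toy model below tries to illustrate that split in plain Python. It is not the PyLucene API: ToyIndex and its methods are invented names, and real Lucene analysis, query parsing and scoring are far more involved.

```python
import re
from collections import defaultdict


class ToyIndex(object):
    """Toy stand-in for an index; hypothetical, not the PyLucene API."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.stored = []                  # stored fields, one dict per document

    def add(self, name, path, contents):
        doc_id = len(self.stored)
        # name and path are stored, so they can be shown with each hit.
        self.stored.append({"name": name, "path": path})
        # contents is only split into terms; the text itself is discarded.
        for term in re.findall(r"\w+", contents.lower()):
            self.postings[term].add(doc_id)

    def search(self, term):
        doc_ids = sorted(self.postings.get(term.lower(), ()))
        return [self.stored[i] for i in doc_ids]


index = ToyIndex()
index.add("a.txt", "docs/a.txt", "Lucene is a search library")
index.add("b.py", "src/b.py", "import lucene")
hits = index.search("lucene")
for hit in hits:
    print("path: %s name: %s" % (hit["path"], hit["name"]))
```

Both toy documents match the term, and each hit carries only the stored fields. The original text cannot be read back from a hit, which corresponds to what Store.NO does for contents in the real index.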

linu_z | post #5 | posted 2009-06-06 02:01

Could you share how you compiled it with MinGW on Windows?

Thread: python2.6 + jcc + pylucene (lucene2.4) installation (Chinaunix forum, Python board)
