PyAMS additional features and services
Elasticsearch
First, you need to install Elasticsearch (ES); PyAMS is currently compatible with version 6.4. The ingest-attachment plug-in is also required to handle attachments correctly.
Visit https://www.elastic.co/ to learn how to install the Elasticsearch server and the ingest-attachment plug-in.
Tip
Documentation for installing Elasticsearch 6.4
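Before going further, it can help to check that the installed server matches these requirements. The following minimal sketch queries the standard Elasticsearch root endpoint (GET /) and the _cat/plugins API; it assumes the server listens on http://127.0.0.1:9200 without authentication, and the helper names (get_json, check_server, main) are hypothetical, not part of PyAMS.

```python
import json
import urllib.request

# Assumed server address, without authentication; adjust to your setup.
ES_URL = "http://127.0.0.1:9200"


def get_json(url):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def check_server(root_info, plugins, expected_version="6.4."):
    """Return a list of problems; an empty list means the server looks usable.

    root_info is the document returned by GET /, plugins the list returned
    by GET /_cat/plugins?format=json.
    """
    problems = []
    version = root_info.get("version", {}).get("number", "")
    if not version.startswith(expected_version):
        problems.append("unexpected Elasticsearch version: %r" % version)
    # _cat/plugins returns one entry per (node, plug-in) pair
    if not any(p.get("component") == "ingest-attachment" for p in plugins):
        problems.append("ingest-attachment plug-in is not installed")
    return problems


def main(es_url=ES_URL):
    """Query a running server and print the result of the checks."""
    problems = check_server(get_json(es_url + "/"),
                            get_json(es_url + "/_cat/plugins?format=json"))
    print("\n".join(problems) or "Elasticsearch looks ready for PyAMS")
```

Calling main() from a Python shell prints either the detected problems or a confirmation message; keeping the checks in check_server makes them easy to reuse on responses obtained by other means (curl output, for example).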
Once Elasticsearch is installed, the following steps describe how to configure it for PyAMS.
Initializing the Elasticsearch index
To use an Elasticsearch index, you first have to initialize its settings and mappings; Elasticsearch integration is provided by the PyAMS_content_es package.
1. Enable the service
In the Pyramid INI application files (etc/development.ini and etc/production.ini), add:
# Elasticsearch server settings
elastic.server = http://127.0.0.1:9200
elastic.index = pyams
- Where:
- elastic.server: address of the Elasticsearch server; you can include authentication credentials in the form http://login:password@w.x.y.z:9200
- elastic.index: name of the Elasticsearch index.
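To illustrate how these two settings fit together, the following sketch reads them from an INI file with the standard configparser module and separates optional credentials from the server address. It assumes the settings sit in the [app:main] section, as in a usual Pyramid application; read_elastic_settings and split_server_url are hypothetical helpers, not the PyAMS_content_es API.

```python
import configparser
from urllib.parse import urlsplit


def read_elastic_settings(ini_text, section="app:main"):
    """Extract elastic.server and elastic.index from Pyramid INI text.

    Assumption: settings live in [app:main]. Interpolation is disabled so
    that Pyramid-style values such as %(here)s are left untouched.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    settings = parser[section]
    return settings.get("elastic.server"), settings.get("elastic.index")


def split_server_url(server):
    """Split elastic.server into (base URL, login, password).

    login and password are None when the URL carries no credentials.
    """
    parts = urlsplit(server)
    netloc = parts.hostname
    if parts.port:
        netloc = "%s:%d" % (netloc, parts.port)
    return "%s://%s" % (parts.scheme, netloc), parts.username, parts.password
```

With elastic.server = http://login:secret@10.0.0.5:9200, split_server_url returns ("http://10.0.0.5:9200", "login", "secret"), so the credentials could be handed to an HTTP client as basic authentication rather than kept in the URL.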
On startup, the main PyAMS application process can start an indexer process, which handles indexing requests asynchronously; this process is configured with the following settings:
# PyAMS content Elasticsearch indexer process settings
pyams_content.es.tcp_handler = 127.0.0.1:5557
pyams_content.es.start_handler = false
pyams_content.es.allow_auth = admin:admin
pyams_content.es.allow_clients = 127.0.0.1
- Where:
- pyams_content.es.tcp_handler: IP address and listening port of the PyAMS indexer process
- pyams_content.es.start_handler: if true, the indexer process is started on PyAMS startup; otherwise (typically in a cluster configuration), the process is expected to be started from another master server
- pyams_content.es.allow_auth: login and password used to connect to the indexer process (this setting must be defined identically on the indexer process and on all its clients)
- pyams_content.es.allow_clients: list of IP addresses allowed to connect to the indexer process.
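The meaning of these settings can be made concrete with a small sketch that turns the raw strings into typed values and checks whether a client address is allowed. This is a hypothetical helper, not the parser used by PyAMS_content_es; it assumes allow_clients entries are separated by whitespace or commas. In a Pyramid application, pyramid.settings.asbool could replace the hand-written boolean test.

```python
import ipaddress

TRUE_VALUES = ("true", "yes", "on", "1")


def parse_indexer_settings(settings):
    """Convert raw pyams_content.es.* strings into typed values.

    Hypothetical helper: clients are assumed to be separated by
    whitespace or commas.
    """
    host, _, port = settings["pyams_content.es.tcp_handler"].rpartition(":")
    login, _, password = settings["pyams_content.es.allow_auth"].partition(":")
    start = settings.get("pyams_content.es.start_handler", "false")
    clients = settings.get("pyams_content.es.allow_clients", "")
    return {
        "address": (host, int(port)),
        "start_handler": start.strip().lower() in TRUE_VALUES,
        "auth": (login, password),
        "allowed_clients": {ipaddress.ip_address(client)
                            for client in clients.replace(",", " ").split()},
    }


def is_client_allowed(config, client_ip):
    """Check whether an incoming client address may reach the indexer."""
    return ipaddress.ip_address(client_ip) in config["allowed_clients"]
```

With the sample values above, the indexer listens on ("127.0.0.1", 5557), is not started by this process, and only accepts connections from 127.0.0.1.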
2. Initialize the Elasticsearch database
Configuration files for the attachment pipeline, index settings and mappings are available in the pyams_content_es source package or in the PyAMS installation folder:
(env) $ cd docs/elasticsearch
(env) $ curl --noproxy localhost -XPUT http://localhost:9200/_ingest/pipeline/attachment -H 'Content-Type: application/json' -d @attachment-pipeline.json
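Then, with elastic.index = pyams defined as the Elasticsearch index name (so that the index URL is http://localhost:9200/pyams), recreate the index and its mappings; note that the first command deletes any existing index with this name: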
(env) $ curl -XDELETE http://localhost:9200/pyams
(env) $ curl -XPUT http://localhost:9200/pyams -H 'Content-Type: application/json' -d @index-settings.json
(env) $ curl -XPUT http://localhost:9200/pyams/WfTopic/_mapping -H 'Content-Type: application/json' -d @mappings/WfTopic.json
(env) $ curl -XPUT http://localhost:9200/pyams/WfNewsEvent/_mapping -H 'Content-Type: application/json' -d @mappings/WfNewsEvent.json
(env) $ curl -XPUT http://localhost:9200/pyams/WfBlogPost/_mapping -H 'Content-Type: application/json' -d @mappings/WfBlogPost.json
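Troubleshooting: if you get a 406 (Not Acceptable) error, make sure the -H 'Content-Type: application/json' option is present on the curl command lines that send a JSON body with -d; by default, curl sends such data as application/x-www-form-urlencoded, which Elasticsearch 6.x rejects.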
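If curl is not available, the same sequence of requests can be sent from Python. The sketch below mirrors the curl commands above with the standard urllib module, always sending an explicit JSON content type; it assumes it runs from the docs/elasticsearch folder (or that config_dir points to it), and the function names are hypothetical.

```python
import json
import os
import urllib.error
import urllib.request

MAPPING_TYPES = ("WfTopic", "WfNewsEvent", "WfBlogPost")


def build_init_requests(base_url="http://localhost:9200", index="pyams",
                        config_dir="."):
    """List (method, URL, JSON file) triplets mirroring the curl commands."""
    steps = [
        ("PUT", base_url + "/_ingest/pipeline/attachment",
         os.path.join(config_dir, "attachment-pipeline.json")),
        ("DELETE", "%s/%s" % (base_url, index), None),
        ("PUT", "%s/%s" % (base_url, index),
         os.path.join(config_dir, "index-settings.json")),
    ]
    for name in MAPPING_TYPES:
        steps.append(("PUT", "%s/%s/%s/_mapping" % (base_url, index, name),
                      os.path.join(config_dir, "mappings", name + ".json")))
    return steps


def send(method, url, body_file=None):
    """Send one request with an explicit JSON content type."""
    data = None
    if body_file is not None:
        with open(body_file, "rb") as f:
            data = f.read()
        json.loads(data)  # fail early on a malformed configuration file
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


def run_all(**kwargs):
    """Run every step in order, tolerating a missing index on DELETE."""
    for method, url, body_file in build_init_requests(**kwargs):
        try:
            print(method, url, send(method, url, body_file))
        except urllib.error.HTTPError as exc:
            if method == "DELETE" and exc.code == 404:
                continue  # the index did not exist yet: nothing to delete
            raise
```

Unlike the curl sequence, run_all stops on the first unexpected HTTP error, which makes a failing step easier to spot.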
3. Update index contents
If your ZODB database already stores contents, you can index all of them into Elasticsearch with the pyams_es_index command line script. From a shell:
(env) $ ./bin/pyams_es_index ../etc/development.ini
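Once the script has finished, one way to check that contents were actually indexed is to ask Elasticsearch for document counts with its _count API, for the whole index or for a single mapping type. This is a minimal sketch assuming the server runs on localhost:9200 without authentication; count_url and count_documents are hypothetical helpers.

```python
import json
import urllib.request


def count_url(base_url="http://localhost:9200", index="pyams", doc_type=None):
    """Build the _count URL for the whole index or for one mapping type."""
    path = "/".join(part for part in (index, doc_type, "_count") if part)
    return "%s/%s" % (base_url.rstrip("/"), path)


def count_documents(url):
    """Return the number of documents reported by Elasticsearch."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))["count"]
```

For example, count_documents(count_url(doc_type="WfTopic")) returns the number of indexed topics, which can be compared with the number of topics stored in the ZODB.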
Natural Language Toolkit - NLTK
PyAMS uses NLTK features through the PyAMS_catalog package.
See also
Visit https://www.nltk.org/ to learn more about NLTK
Initializing NLTK (Natural Language Toolkit)
Some NLTK collections, such as tokenizers and stopwords, are used to index fulltext content elements; you can adapt this indexing to your own needs. NLTK requires several elements to be downloaded and configured, as follows:
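To show what the tokenizer and stopwords collections are used for, here is an illustrative sketch of fulltext term extraction: split a text into words, lower-case them and drop punctuation and stopwords. It is not the PyAMS_catalog implementation; by default it relies on NLTK's word_tokenize and stopwords corpus (which need the punkt and stopwords data installed in the following steps), and both can be replaced by plain Python callables and sets, for example in tests.

```python
def fulltext_terms(text, language="english", tokenize=None, stop_words=None):
    """Split text into lower-cased index terms, dropping stopwords.

    Illustrative sketch, not the PyAMS_catalog code. NLTK is only imported
    when no tokenizer or stopword set is provided.
    """
    if tokenize is None:
        from nltk.tokenize import word_tokenize  # requires punkt models

        def tokenize(value):
            return word_tokenize(value, language=language)

    if stop_words is None:
        from nltk.corpus import stopwords  # requires the stopwords corpus
        stop_words = set(stopwords.words(language))
    terms = []
    for token in tokenize(text):
        term = token.lower()
        # keep alphabetic words only, skipping punctuation and stopwords
        if term.isalpha() and term not in stop_words:
            terms.append(term)
    return terms
```

With NLTK data installed, fulltext_terms("Le chat et le chien", language="french") would keep only the content words; the language parameter selects both the punkt model and the stopword list.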
1. Run the Python shell in the PyAMS environment:
(env) $ ./bin/py
2. In the Python shell:
>>> import nltk
>>> nltk.download()
3. Configure the installation directory:
Tip
On Debian GNU/Linux, you can choose any directory among ‘~/nltk_data’ (where ‘~’ is the home directory of the user running the Pyramid application), ‘/usr/share/nltk_data’, ‘/usr/local/share/nltk_data’, ‘/usr/lib/nltk_data’ and ‘/usr/local/lib/nltk_data’; these are locations where NLTK looks for its data by default.
Please check that you have permission to write to this directory!
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> c
Data Server:
- URL: <https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml>
- 6 Package Collections Available
- 107 Individual Packages Available
Local Machine:
- Data directory: /home/tflorac/nltk_data
Config> d
New directory> /usr/local/lib/nltk_data
4. Return to the main menu:
---------------------------------------------------------------------------
s) Show Config u) Set Server URL d) Set Data Dir m) Main Menu
---------------------------------------------------------------------------
Config> m
5. Download the required packages:
- punkt: Punkt Tokenizer Models
- stopwords: Stopwords Corpus
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> d
Download which package (l=list; x=cancel)?
Identifier> punkt
Downloading package punkt to /usr/local/lib/nltk_data...
Downloader> d
Download which package (l=list; x=cancel)?
Identifier> stopwords
Downloading package stopwords to /usr/local/lib/nltk_data...
Tip
The full list of NLTK collections can be displayed with the l) List option.
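The interactive session above can also be scripted, which is convenient when deploying several servers. The following sketch picks the first writable directory among those listed in the Tip, then downloads punkt and stopwords only if NLTK cannot already find them, using nltk.data.find and nltk.download(..., download_dir=..., quiet=True). The helper names are hypothetical; run it from ./bin/py so that NLTK is importable.

```python
import os

# Directories listed in the Tip above, in the same order
CANDIDATE_DIRS = (
    os.path.expanduser("~/nltk_data"),
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
)

# NLTK resource to look for -> package identifier to download
REQUIRED_PACKAGES = {
    "tokenizers/punkt": "punkt",
    "corpora/stopwords": "stopwords",
}


def is_usable(path):
    """True if path is a writable directory, or could be created as one."""
    if os.path.isdir(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(os.path.abspath(path))
    return os.path.isdir(parent) and os.access(parent, os.W_OK)


def first_usable_dir(candidates=CANDIDATE_DIRS):
    """Return the first usable candidate directory, or None."""
    for path in candidates:
        if is_usable(path):
            return path
    return None


def ensure_nltk_data(download_dir=None):
    """Download punkt and stopwords unless NLTK can already find them."""
    import nltk  # imported here so the helpers above work without NLTK

    download_dir = download_dir or first_usable_dir()
    if download_dir is None:
        raise RuntimeError("no writable NLTK data directory found")
    if download_dir not in nltk.data.path:
        nltk.data.path.append(download_dir)
    for resource, package in REQUIRED_PACKAGES.items():
        try:
            nltk.data.find(resource)
        except LookupError:
            nltk.download(package, download_dir=download_dir, quiet=True)
    return download_dir
```

Calling ensure_nltk_data("/usr/local/lib/nltk_data") reproduces the interactive session shown above. Depending on your NLTK version, tokenization may require the punkt_tab package instead of punkt; check the NLTK documentation if a LookupError mentions it.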