PyAMS additional features and services

Elasticsearch

First, you need to install Elasticsearch (ES); PyAMS is currently compatible with version 6.4. The ingest-attachment plug-in is also required to handle attachments correctly.

Visit https://www.elastic.co/ to learn how to install Elasticsearch Server and ingest-attachment plug-in

After installing Elasticsearch, the following steps describe how to configure ES with PyAMS.

Initializing Elasticsearch index

If you want to use an Elasticsearch index, you have to initialize index settings and mappings; Elasticsearch integration is defined through the PyAMS_content_es package.

1. Enable service

In Pyramid INI application files (etc/development.ini and etc/production.ini):

# Elasticsearch server settings
elastic.server = http://127.0.0.1:9200
elastic.index = pyams
Where:
  • elastic.server: address of Elasticsearch server; you can include authentication arguments in the form http://login:password@w.x.y.z:9200
  • elastic.index: name of Elasticsearch index.
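As an illustration of the expected URL format, the authentication form of elastic.server can be split into its components with the Python standard library; this is only a sketch of the setting's syntax, not part of PyAMS itself:

```python
# Illustrative sketch: splitting an elastic.server value (possibly with
# embedded credentials) into its components, using only the standard library.
from urllib.parse import urlsplit

def parse_elastic_server(url):
    """Return the components of an elastic.server setting."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "login": parts.username,
        "password": parts.password,
    }

print(parse_elastic_server("http://login:password@127.0.0.1:9200"))
```

When no credentials are embedded, the login and password components are simply None.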

On startup, the main PyAMS application process can start an indexer process which will handle indexing requests in asynchronous mode; this process's settings are defined like this:

# PyAMS content Elasticsearch indexer process settings
pyams_content.es.tcp_handler = 127.0.0.1:5557
pyams_content.es.start_handler = false
pyams_content.es.allow_auth = admin:admin
pyams_content.es.allow_clients = 127.0.0.1
Where:
  • pyams_content.es.tcp_handler: IP address and listening port of PyAMS indexer process
  • pyams_content.es.start_handler: if true, the indexer process is started on PyAMS startup; otherwise (typically in a cluster configuration), the process is supposed to be started from another master server
  • pyams_content.es.allow_auth: login and password used to connect to the indexer process (these settings are defined in the same way on the indexer process and on all its clients)
  • pyams_content.es.allow_clients: list of IP addresses allowed to connect to indexer process.
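To make the expected value types concrete, here is a small sketch reading these settings with the standard configparser module; the "app:main" section name is an assumption about your INI layout, so adjust it to match your files:

```python
# Sketch: parsing the indexer process settings from a Pyramid-style INI
# file. Note that tcp_handler combines host and port, start_handler is a
# boolean, and allow_clients may hold a comma-separated list of addresses.
import configparser

INI_SAMPLE = """
[app:main]
pyams_content.es.tcp_handler = 127.0.0.1:5557
pyams_content.es.start_handler = false
pyams_content.es.allow_auth = admin:admin
pyams_content.es.allow_clients = 127.0.0.1
"""

parser = configparser.ConfigParser()
parser.read_string(INI_SAMPLE)
settings = parser["app:main"]

host, port = settings["pyams_content.es.tcp_handler"].split(":")
start_handler = settings.getboolean("pyams_content.es.start_handler")
allow_clients = [c.strip()
                 for c in settings["pyams_content.es.allow_clients"].split(",")]
print(host, int(port), start_handler, allow_clients)
```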

2. Initialize Elasticsearch database

Configuration files for the attachment pipeline, index and mappings settings are available in the pyams_content_es source package or in the PyAMS installation folder:

(env) $ cd docs/elasticsearch
(env) $ curl --noproxy localhost -XPUT http://localhost:9200/_ingest/pipeline/attachment -d @attachment-pipeline.json

Then, with elastic.index = pyams defined as the Elasticsearch index name (so the index URL is “http://localhost:9200/pyams”):

(env) $ curl -XDELETE http://localhost:9200/pyams

(env) $ curl -XPUT http://localhost:9200/pyams -d @index-settings.json

(env) $ curl -XPUT http://localhost:9200/pyams/WfTopic/_mapping  -d @mappings/WfTopic.json
(env) $ curl -XPUT http://localhost:9200/pyams/WfNewsEvent/_mapping -d @mappings/WfNewsEvent.json
(env) $ curl -XPUT http://localhost:9200/pyams/WfBlogPost/_mapping -d @mappings/WfBlogPost.json

Troubleshooting: if you get a 406 error, try adding -H 'Content-Type: application/json' to the curl command lines.
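The same initialization sequence can also be expressed from Python. The sketch below only builds the ordered list of operations (it performs no HTTP requests), so you can replay them with the HTTP client of your choice, remembering to send the Content-Type: application/json header:

```python
# Sketch of the index-initialization sequence as data, mirroring the curl
# commands above. Base URL, index name and file paths follow the example
# configuration (elastic.index = pyams, files from docs/elasticsearch).
def init_requests(base="http://localhost:9200", index="pyams"):
    """Return the (method, url, json_file) operations to run, in order."""
    ops = [
        ("PUT", f"{base}/_ingest/pipeline/attachment", "attachment-pipeline.json"),
        ("DELETE", f"{base}/{index}", None),
        ("PUT", f"{base}/{index}", "index-settings.json"),
    ]
    for doc_type in ("WfTopic", "WfNewsEvent", "WfBlogPost"):
        ops.append(("PUT", f"{base}/{index}/{doc_type}/_mapping",
                    f"mappings/{doc_type}.json"))
    return ops

for method, url, payload in init_requests():
    print(method, url, payload or "")
```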

3. Update index contents

If your ZODB database already stores contents, you can update the Elasticsearch index with all these contents using the pyams_es_index command-line script. From a shell:

(env) $ ./bin/pyams_es_index ../etc/development.ini

Natural Language Toolkit - NLTK

PyAMS uses NLTK features through the PyAMS_catalog package.

See also

Visit https://www.nltk.org/ to learn more about NLTK

Initializing NLTK (Natural Language ToolKit)

Some NLTK collections, like the tokenizers and stopwords utilities, are used to index full-text content elements. You can enhance NLTK indexing according to your own needs. This package requires downloading and configuring several elements, as follows:

1. Run the Python shell into PyAMS environment:

(env) $ ./bin/py

2. In the Python shell:

>>> import nltk
>>> nltk.download()

3. Configure the installation directory:

Tip

On Debian GNU/Linux, you can choose any directory among ‘~/nltk_data’ (where ‘~’ is the home directory of the user running the Pyramid application), ‘/usr/share/nltk_data’, ‘/usr/local/share/nltk_data’, ‘/usr/lib/nltk_data’ and ‘/usr/local/lib/nltk_data’

Please check if you have permission to write to this directory!

NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> c

Data Server:
  - URL: <https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml>
  - 6 Package Collections Available
  - 107 Individual Packages Available

Local Machine:
  - Data directory: /home/tflorac/nltk_data

Config> d
  New directory> /usr/local/lib/nltk_data

4. Return to the main menu:

---------------------------------------------------------------------------
    s) Show Config   u) Set Server URL   d) Set Data Dir   m) Main Menu
---------------------------------------------------------------------------
Config> m

5. Download utilities:

punkt
Punkt Tokenizer Models
stopwords
Stopwords Corpus
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d
Download which package (l=list; x=cancel)?
  Identifier> punkt
    Downloading package punkt to /usr/local/lib/nltk_data...
Downloader> d
Download which package (l=list; x=cancel)?
  Identifier> stopwords
    Downloading package stopwords to /usr/local/lib/nltk_data...

Tip

The full list of NLTK collections can be displayed with the l) List option.
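The interactive session above can also be scripted: NLTK's documented downloader module accepts package names and a target directory on the command line. This sketch only builds the equivalent command (the download directory matches the one chosen in the walkthrough):

```python
# Non-interactive alternative to the menu-driven downloader: build the
# command line for NLTK's downloader module. Running the printed command
# from a shell in the PyAMS environment fetches both packages directly.
data_dir = "/usr/local/lib/nltk_data"
packages = ["punkt", "stopwords"]
command = "python -m nltk.downloader -d {} {}".format(data_dir, " ".join(packages))
print(command)
```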