.. _plugins:

PyAMS additional features and services
======================================


Elasticsearch
+++++++++++++

First, you need to install Elasticsearch (ES); PyAMS is currently compatible with version 6.4. The *Ingest Attachment*
plug-in is also required to handle attachments correctly.

Visit https://www.elastic.co/ to learn how to install the Elasticsearch server and the ``ingest-attachment`` plug-in.


.. tip:: Documentation for installing Elasticsearch 6.4

    - https://www.elastic.co/guide/en/elasticsearch/reference/6.4/gs-installation.html
    - https://www.elastic.co/guide/en/elasticsearch/plugins/6.4/ingest-attachment.html


Once Elasticsearch is installed, the following steps describe how to configure it for PyAMS.
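
Before going further, you may want to check that the server meets these requirements. The following Python sketch is
only an illustration, not part of PyAMS: it assumes a server listening on *http://127.0.0.1:9200* without
authentication, reads the version reported by the root endpoint (``GET /``) and looks for the ``ingest-attachment``
component in the output of ``GET /_cat/plugins?format=json``. The function names are hypothetical.

.. code-block:: python

    """Sketch: check the Elasticsearch version and the ingest-attachment plug-in.

    Assumes a local server without authentication; helper names are hypothetical.
    """
    import json
    import urllib.request

    ES_URL = "http://127.0.0.1:9200"  # assumption: adjust to your elastic.server setting


    def get_json(url):
        """Send a GET request and decode the JSON response body."""
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.loads(response.read().decode("utf-8"))


    def is_supported_version(root_info, expected=(6, 4)):
        """Return True if the version reported by ``GET /`` matches major.minor."""
        number = root_info["version"]["number"]  # e.g. "6.4.3"
        major, minor = (int(part) for part in number.split(".")[:2])
        return (major, minor) == expected


    def has_plugin(plugins, name="ingest-attachment"):
        """Return True if ``GET /_cat/plugins?format=json`` lists the plug-in on at least one node."""
        return any(entry.get("component") == name for entry in plugins)


    def check_server(es_url=ES_URL):
        """Query the server and return (version_ok, plugin_ok)."""
        root_info = get_json(es_url + "/")
        plugins = get_json(es_url + "/_cat/plugins?format=json")
        return is_supported_version(root_info), has_plugin(plugins)

Calling ``check_server()`` from a Python shell should return ``(True, True)`` on a correctly installed server.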


Initializing Elasticsearch index
--------------------------------

To use an Elasticsearch index, you first have to initialize its settings and mappings; Elasticsearch integration is
provided by the *PyAMS_content_es* package.


1. Enable service
'''''''''''''''''

Add the following settings to the Pyramid application INI files (*etc/development.ini* and *etc/production.ini*):

.. code-block:: ini

    # Elasticsearch server settings
    elastic.server = http://127.0.0.1:9200
    elastic.index = pyams

Where:
 - **elastic.server**: address of the Elasticsearch server; authentication credentials can be included in the form
   *http://login:password@w.x.y.z:9200*
 - **elastic.index**: name of the Elasticsearch index.
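
As an illustration of how these two settings are interpreted, here is a Python sketch that reads them with the
standard ``configparser`` module and separates the optional credentials from the server address. It assumes that the
settings are declared in the ``[app:main]`` section, as is usual in Pyramid INI files; the function name is
hypothetical and PyAMS performs its own parsing.

.. code-block:: python

    """Sketch: read the Elasticsearch settings from a Pyramid INI file (hypothetical helper)."""
    import configparser
    from urllib.parse import urlsplit


    def read_elastic_settings(ini_text, section="app:main"):
        """Return (base_url, credentials, index) from INI content."""
        # interpolation=None: Pyramid INI files may contain "%(here)s" or logging formats
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(ini_text)
        server = parser.get(section, "elastic.server")
        index = parser.get(section, "elastic.index")
        parts = urlsplit(server)
        # "http://login:password@w.x.y.z:9200" -> ("login", "password")
        credentials = (parts.username, parts.password) if parts.username else None
        # Rebuild the address without the "login:password@" prefix
        netloc = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
        return f"{parts.scheme}://{netloc}", credentials, index

With the example settings shown above, the function returns ``("http://127.0.0.1:9200", None, "pyams")``.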


On startup, the main PyAMS application process can start an *indexer* process, which handles indexing requests
asynchronously; this process is configured with the following settings:

.. code-block:: ini

    # PyAMS content Elasticsearch indexer process settings
    pyams_content.es.tcp_handler = 127.0.0.1:5557
    pyams_content.es.start_handler = false
    pyams_content.es.allow_auth = admin:admin
    pyams_content.es.allow_clients = 127.0.0.1

Where:
 - **pyams_content.es.tcp_handler**: IP address and listening port of the PyAMS indexer process
 - **pyams_content.es.start_handler**: if *true*, the indexer process is started on PyAMS startup; otherwise (typically
   in a cluster configuration), the process is expected to be started from another *master* server
 - **pyams_content.es.allow_auth**: login and password used to connect to the indexer process (these settings must be
   defined in the same way on the indexer process and on all its clients)
 - **pyams_content.es.allow_clients**: list of IP addresses allowed to connect to the indexer process.
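
The sketch below shows how the value formats described above can be turned into Python values, together with a client
address check. It is a hypothetical helper, not the PyAMS implementation; in particular, it assumes that
``allow_clients`` may contain several addresses separated by spaces or commas, which is not specified here.

.. code-block:: python

    """Sketch: interpret the PyAMS indexer process settings (hypothetical helper)."""
    import ipaddress
    import re

    TRUE_VALUES = {"true", "yes", "on", "1"}


    def parse_indexer_settings(settings):
        """Convert the raw string settings into typed values."""
        # "127.0.0.1:5557" -> ("127.0.0.1", 5557)
        host, port = settings["pyams_content.es.tcp_handler"].rsplit(":", 1)
        raw_start = settings.get("pyams_content.es.start_handler", "false")
        start = raw_start.strip().lower() in TRUE_VALUES
        # "admin:admin" -> ("admin", "admin"); must be identical on the indexer and its clients
        login, password = settings["pyams_content.es.allow_auth"].split(":", 1)
        # Assumption: several client addresses may be separated by spaces or commas
        raw_clients = settings.get("pyams_content.es.allow_clients", "").strip()
        clients = [ipaddress.ip_address(item) for item in re.split(r"[\s,]+", raw_clients) if item]
        return {
            "tcp_handler": (host, int(port)),
            "start_handler": start,
            "allow_auth": (login, password),
            "allow_clients": clients,
        }


    def is_client_allowed(address, parsed_settings):
        """Return True if the given client IP address is listed in allow_clients."""
        return ipaddress.ip_address(address) in parsed_settings["allow_clients"]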


2. Initialize Elasticsearch database
''''''''''''''''''''''''''''''''''''

Configuration files for the attachment pipeline, index settings and mappings are available in the ``pyams_content_es``
source package or in the PyAMS installation folder. Start by creating the attachment pipeline:


.. code-block:: bash

    (env) $ cd docs/elasticsearch
    (env) $ curl --noproxy localhost -XPUT http://localhost:9200/_ingest/pipeline/attachment -H 'Content-Type: application/json' -d @attachment-pipeline.json


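Then, with ``elastic.index = pyams`` defined as the Elasticsearch index name (i.e. *http://localhost:9200/pyams*),
create the index and its mappings. Note that the first command deletes any existing index with this name, including
all its indexed documents: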

.. code-block:: shell

    (env) $ curl -XDELETE http://localhost:9200/pyams

    (env) $ curl -XPUT http://localhost:9200/pyams -H 'Content-Type: application/json' -d @index-settings.json

    (env) $ curl -XPUT http://localhost:9200/pyams/WfTopic/_mapping -H 'Content-Type: application/json' -d @mappings/WfTopic.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfNewsEvent/_mapping -H 'Content-Type: application/json' -d @mappings/WfNewsEvent.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfBlogPost/_mapping -H 'Content-Type: application/json' -d @mappings/WfBlogPost.json


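*Troubleshooting*: since version 6.0, Elasticsearch rejects request bodies sent without a valid content type and
answers with a *406 Not Acceptable* error; in that case, make sure that ``-H 'Content-Type: application/json'`` is
included in each ``curl`` command sending a body.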
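
The same initialization can also be scripted with the Python standard library. This sketch is only an illustration:
it assumes that it runs from the *docs/elasticsearch* folder, that the server listens on *http://localhost:9200*
without authentication and that the index is named *pyams*; helper names are hypothetical. Every request carrying a
body sends a ``Content-Type: application/json`` header, which avoids the 406 error mentioned above.

.. code-block:: python

    """Sketch: replay the curl initialization commands with urllib (hypothetical helpers)."""
    import urllib.error
    import urllib.request
    from pathlib import Path

    ES_URL = "http://localhost:9200"
    INDEX = "pyams"  # must match the elastic.index setting
    MAPPINGS = ("WfTopic", "WfNewsEvent", "WfBlogPost")
    JSON_HEADERS = {"Content-Type": "application/json"}


    def build_requests(base_dir=".", es_url=ES_URL, index=INDEX):
        """Return the requests sent by the curl commands above, in the same order."""
        base = Path(base_dir)

        def put_json(path, filename):
            return urllib.request.Request(es_url + path, data=(base / filename).read_bytes(),
                                          headers=JSON_HEADERS, method="PUT")

        pending = [
            put_json("/_ingest/pipeline/attachment", "attachment-pipeline.json"),
            # Warning: deletes the existing index and all its indexed documents
            urllib.request.Request(f"{es_url}/{index}", method="DELETE"),
            put_json(f"/{index}", "index-settings.json"),
        ]
        pending += [put_json(f"/{index}/{name}/_mapping", f"mappings/{name}.json") for name in MAPPINGS]
        return pending


    def run(pending):
        """Send the requests in order; a 404 answer to DELETE (no index yet) is ignored."""
        for request in pending:
            try:
                with urllib.request.urlopen(request, timeout=30) as response:
                    print(request.get_method(), request.full_url, response.status)
            except urllib.error.HTTPError as error:
                if request.get_method() == "DELETE" and error.code == 404:
                    continue
                raise

From the *docs/elasticsearch* folder, ``run(build_requests())`` would send the whole sequence.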


3. Update index contents
''''''''''''''''''''''''

If your ZODB database already stores contents, you can index all of them into Elasticsearch with the
``pyams_es_index`` command line script. From a shell:

.. code-block:: bash

    (env) $ ./bin/pyams_es_index ../etc/development.ini
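
To check the result, you can ask Elasticsearch how many documents the index contains with its ``_count`` API. The
sketch below assumes the server address and index name used above (*http://localhost:9200* and *pyams*); the function
names are hypothetical. The count may lag slightly behind, as Elasticsearch only makes new documents visible after an
index refresh.

.. code-block:: python

    """Sketch: count indexed documents with the Elasticsearch _count API (hypothetical helpers)."""
    import json
    import urllib.request


    def count_url(es_url="http://localhost:9200", index="pyams", doc_type=None):
        """Build the _count endpoint URL, optionally restricted to one mapping type."""
        path = f"/{index}/{doc_type}/_count" if doc_type else f"/{index}/_count"
        return es_url.rstrip("/") + path


    def parse_count(body):
        """Extract the document count from a _count JSON response body."""
        return json.loads(body)["count"]


    def indexed_documents(es_url="http://localhost:9200", index="pyams", doc_type=None):
        """Query the server and return the number of indexed documents."""
        with urllib.request.urlopen(count_url(es_url, index, doc_type), timeout=10) as response:
            return parse_count(response.read().decode("utf-8"))

For example, ``indexed_documents(doc_type="WfTopic")`` would only count documents stored with the *WfTopic* mapping.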


Natural Language Toolkit - NLTK
+++++++++++++++++++++++++++++++

PyAMS uses NLTK features through the *PyAMS_catalog* package.

.. seealso::

    Visit https://www.nltk.org/ to learn more about NLTK


Initializing NLTK (Natural Language ToolKit)
--------------------------------------------

Some NLTK collections, such as the **tokenizers** and **stopwords** utilities, are used to index full-text contents.
You can extend NLTK indexing according to your own needs. NLTK requires several data packages to be downloaded and
configured, as follows:


*1. Run the Python shell in the PyAMS environment:*

.. code-block:: bash

    (env) $ ./bin/py


*2. In the Python shell:*

.. code-block:: pycon

    >>> import nltk
    >>> nltk.download()


*3. Configure the installation directory:*

.. tip::

    On Debian GNU/Linux, you can choose any of the following directories: '*~/nltk_data*' (where '~' is the home
    directory of the user running the Pyramid application), '*/usr/share/nltk_data*', '*/usr/local/share/nltk_data*',
    '*/usr/lib/nltk_data*' or '*/usr/local/lib/nltk_data*'.

    Please check that you have write permission on the selected directory!


.. code-block:: shell

    NLTK Downloader
    ---------------------------------------------------------------------------
        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> c

    Data Server:
      - URL: <https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml>
      - 6 Package Collections Available
      - 107 Individual Packages Available

    Local Machine:
      - Data directory: /home/tflorac/nltk_data

    Config> d
      New directory> /usr/local/lib/nltk_data


*4. Return to the main menu:*

.. code-block:: shell

    ---------------------------------------------------------------------------
        s) Show Config   u) Set Server URL   d) Set Data Dir   m) Main Menu
    ---------------------------------------------------------------------------
    Config> m


*5. Download utilities:*

    punkt
        Punkt Tokenizer Models
    stopwords
        Stopwords Corpus


.. code-block:: shell

    ---------------------------------------------------------------------------
        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> d
    Download which package (l=list; x=cancel)?
      Identifier> punkt
        Downloading package punkt to /usr/local/lib/nltk_data...
    Downloader> d
    Download which package (l=list; x=cancel)?
      Identifier> stopwords
        Downloading package stopwords to /usr/local/lib/nltk_data...

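
The same downloads can be done without the interactive menu, which is convenient for automated deployments:
``nltk.download()`` accepts a package identifier and a ``download_dir`` argument. The sketch below picks the first
writable location among those listed in the Debian tip above and downloads *punkt* and *stopwords* into it; the
selection logic and helper names are assumptions of this example, not part of PyAMS.

.. code-block:: python

    """Sketch: download the NLTK packages without the interactive menu (hypothetical helpers)."""
    import os

    # Default NLTK data locations listed in the Debian tip above
    CANDIDATE_DIRS = (
        os.path.expanduser("~/nltk_data"),
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
    )
    PACKAGES = ("punkt", "stopwords")


    def choose_data_dir(candidates=CANDIDATE_DIRS):
        """Return the first candidate that is writable, or can be created in a writable parent."""
        for path in candidates:
            target = path if os.path.isdir(path) else os.path.dirname(path)
            if os.path.isdir(target) and os.access(target, os.W_OK):
                return path
        return None


    def download_packages(packages=PACKAGES, data_dir=None):
        """Download each package into data_dir; return True if all downloads succeeded."""
        import nltk  # imported here so that choose_data_dir() works without NLTK installed

        data_dir = data_dir or choose_data_dir()
        if data_dir is None:
            raise RuntimeError("no writable NLTK data directory found")
        # nltk.download() returns a boolean telling whether the download succeeded
        results = [nltk.download(name, download_dir=data_dir) for name in packages]
        return all(results)

From ``./bin/py``, ``download_packages(data_dir="/usr/local/lib/nltk_data")`` forces the location used in the session
above.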

.. tip::

    The full list of NLTK collections and packages can be displayed with the ``l) List`` option.
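
Once the *punkt* and *stopwords* packages are installed, the following sketch gives an idea of how they can be used to
turn a text into index terms: the text is split with ``nltk.word_tokenize`` (based on the *punkt* models), then words
listed in the *stopwords* corpus for the given language are dropped. This is only an illustration with hypothetical
helper names, not the actual *PyAMS_catalog* code. Depending on your NLTK version, ``word_tokenize`` may require the
*punkt_tab* package instead of *punkt*.

.. code-block:: python

    """Sketch: turn a text into index terms with NLTK tokenizers and stopwords (illustration only)."""


    def filter_tokens(tokens, stop_words):
        """Lowercase tokens and keep alphanumeric words that are not stop words."""
        terms = []
        for token in tokens:
            word = token.lower()
            if word.isalnum() and word not in stop_words:
                terms.append(word)
        return terms


    def index_terms(text, language="english"):
        """Tokenize text and remove stop words (requires the NLTK data packages)."""
        # Imported lazily so that filter_tokens() can be used without NLTK
        from nltk.corpus import stopwords
        from nltk.tokenize import word_tokenize

        tokens = word_tokenize(text, language=language)
        return filter_tokens(tokens, set(stopwords.words(language)))

Passing another language name supported by both packages, such as ``index_terms(text, language="french")``, selects
the matching tokenizer model and stop words list.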