.. _plugins:

PyAMS additional features and services
======================================

Elasticsearch
+++++++++++++

First, install Elasticsearch (ES); PyAMS is currently compatible with version 6.4.
The Ingest attachment plug-in is also required to handle attachments correctly.

Visit https://www.elastic.co/ to learn how to install Elasticsearch Server and the ``ingest-attachment`` plug-in.

.. tip::

    Documentation for installing Elasticsearch 6.4:

    - https://www.elastic.co/guide/en/elasticsearch/reference/6.4/gs-installation.html
    - https://www.elastic.co/guide/en/elasticsearch/plugins/6.4/ingest-attachment.html

Once Elasticsearch is installed, the following steps describe how to configure it for PyAMS.


Initializing Elasticsearch index
--------------------------------

If you want to use an Elasticsearch index, you have to initialize index settings and mappings;
Elasticsearch integration is defined through the *PyAMS_content_es* package.


1. Enable service
'''''''''''''''''

In Pyramid INI application files (*etc/development.ini* and *etc/production.ini*):

.. code-block:: ini

    # Elasticsearch server settings
    elastic.server = http://127.0.0.1:9200
    elastic.index = pyams

Where:

- **elastic.server**: address of the Elasticsearch server; you can include authentication arguments
  in the form *http://login:password@w.x.y.z:9200*
- **elastic.index**: name of the Elasticsearch index.

On startup, the main PyAMS application process can start an *indexer* process which handles indexing
requests asynchronously; the settings of this process are defined as follows:

.. code-block:: ini

    # PyAMS content Elasticsearch indexer process settings
    pyams_content.es.tcp_handler = 127.0.0.1:5557
    pyams_content.es.start_handler = false
    pyams_content.es.allow_auth = admin:admin
    pyams_content.es.allow_clients = 127.0.0.1

Where:

- **pyams_content.es.tcp_handler**: IP address and listening port of the PyAMS indexer process
- **pyams_content.es.start_handler**: if *true*, the indexer process is started on PyAMS startup;
  otherwise (typically in a cluster configuration), the process is expected to be started from another
  *master* server
- **pyams_content.es.allow_auth**: login and password used to connect to the indexer process (these
  settings are defined in the same way on the indexer process and on all its clients)
- **pyams_content.es.allow_clients**: list of IP addresses allowed to connect to the indexer process.


2. Initialize Elasticsearch database
''''''''''''''''''''''''''''''''''''

Configuration files for the attachment pipeline, index settings and mappings are available in the
``pyams_content_es`` source package or in the PyAMS installation folder:

.. code-block:: bash

    (env) $ cd docs/elasticsearch
    (env) $ curl --noproxy localhost -XPUT http://localhost:9200/_ingest/pipeline/attachment -d @attachment-pipeline.json

With ``elastic.index = pyams`` defined as the Elasticsearch index name, the index URL is
*http://localhost:9200/pyams*:

.. code-block:: shell

    (env) $ curl -XDELETE http://localhost:9200/pyams
    (env) $ curl -XPUT http://localhost:9200/pyams -d @index-settings.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfTopic/_mapping -d @mappings/WfTopic.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfNewsEvent/_mapping -d @mappings/WfNewsEvent.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfBlogPost/_mapping -d @mappings/WfBlogPost.json

*Troubleshooting*: if you get a 406 error, add ``-H 'Content-Type: application/json'`` to the curl
command lines.


3. Update index contents
''''''''''''''''''''''''

If your ZODB database already stores contents, you can index all of them into Elasticsearch with the
``pyams_es_index`` command-line script. From a shell:

.. code-block:: bash

    (env) $ ./bin/pyams_es_index ../etc/development.ini


Natural Language Toolkit - NLTK
+++++++++++++++++++++++++++++++

PyAMS uses NLTK features through the *PyAMS_catalog* package.

.. seealso::

    Visit https://www.nltk.org/ to learn more about NLTK


Initializing NLTK (Natural Language ToolKit)
--------------------------------------------

Some NLTK collections, such as the **tokenizers** and **stopwords** utilities, are used to index
full-text content. You can extend NLTK indexing according to your own needs.

This package requires several elements to be downloaded and configured, as follows:

*1. Run the Python shell in the PyAMS environment:*

.. code-block:: bash

    (env) $ ./bin/py

*2. In the Python shell:*

.. code-block:: pycon

    >>> import nltk
    >>> nltk.download()

*3. Configure the installation directory:*

.. tip::

    On Debian GNU/Linux, you can choose any of the following directories: '*~/nltk_data*' (where '~' is
    the home directory of the user running the Pyramid application), '*/usr/share/nltk_data*',
    '*/usr/local/share/nltk_data*', '*/usr/lib/nltk_data*' and '*/usr/local/lib/nltk_data*'.

    Make sure you have write permission on this directory!

.. code-block:: shell

    NLTK Downloader
    ---------------------------------------------------------------------------
        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> c

    Data Server:
      - URL:
      - 6 Package Collections Available
      - 107 Individual Packages Available

    Local Machine:
      - Data directory: /home/tflorac/nltk_data

    Config> d
      New directory> /usr/local/lib/nltk_data

*4. Return to the main menu:*

.. code-block:: shell

    ---------------------------------------------------------------------------
        s) Show Config   u) Set Server URL   d) Set Data Dir   m) Main Menu
    ---------------------------------------------------------------------------
    Config> m

*5. Download utilities:*

- ``punkt``: Punkt Tokenizer Models
- ``stopwords``: Stopwords Corpus

.. code-block:: shell

    ---------------------------------------------------------------------------
        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> d

    Download which package (l=list; x=cancel)?
      Identifier> punkt
        Downloading package punkt to /usr/local/lib/nltk_data...

    Downloader> d

    Download which package (l=list; x=cancel)?
      Identifier> stopwords
        Downloading package stopwords to /usr/local/lib/nltk_data...

.. tip::

    The full list of NLTK collections can be displayed with the ``l) List`` option.
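
The interactive NLTK session described above can also be scripted, which may be convenient for unattended deployments. The following is only a minimal sketch, not part of PyAMS: the helper names ``first_writable_dir`` and ``download_nltk_data`` are hypothetical, and the order in which the candidate directories are tried is an assumption. It relies on ``nltk.download()`` accepting a package identifier and a ``download_dir`` argument and returning a boolean.

```python
import os
from pathlib import Path

# Directories mentioned in the tip above; the order tried here is an assumption.
CANDIDATE_DIRS = (
    "/usr/local/lib/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/share/nltk_data",
    "~/nltk_data",
)

# Packages downloaded in step 5.
PACKAGES = ("punkt", "stopwords")


def first_writable_dir(candidates=CANDIDATE_DIRS):
    """Return the first candidate that can be created and written to, or None.

    Hypothetical helper: note that it creates the directory if it is missing.
    """
    for candidate in candidates:
        path = Path(candidate).expanduser()
        try:
            path.mkdir(parents=True, exist_ok=True)
        except OSError:
            # Permission denied, or the path is blocked by a regular file.
            continue
        if os.access(path, os.W_OK):
            return path
    return None


def download_nltk_data(target, packages=PACKAGES):
    """Download each package into ``target``; return the names that failed.

    Hypothetical helper wrapping ``nltk.download``.
    """
    import nltk  # imported lazily so the directory helper works without NLTK

    failed = []
    for name in packages:
        if not nltk.download(name, download_dir=str(target), quiet=True):
            failed.append(name)
    return failed
```

From the ``./bin/py`` shell, calling ``download_nltk_data(first_writable_dir())`` would then fetch ``punkt`` and ``stopwords`` into the first usable directory and return the list of packages that could not be downloaded. Check that ``first_writable_dir()`` did not return ``None`` before passing its result on.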