This repository has been archived on 2026-04-06.
Egor Svitin 3b81d997a9 show env
2023-02-21 15:34:36 +03:00

Book Meta Data Parser

A microservice that solves exactly one problem: parsing book metadata from our publishers. Whatever format a publisher stores this data in, the service must grab the information and send it to the main application as an array of records, without any formatting. The main idea is to add components for parsing different formats, so that new publishers can be added just by updating the config files.
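The plug-in idea above can be sketched as a registry that maps a format name (the "format" value under "parser_type" in the source config) to a parser function. This is a minimal sketch under stated assumptions, not the service's actual code. `PARSERS`, `register_parser` and `parse_records` are hypothetical names, and each parser is assumed to return a list of dicts whose values stay raw strings.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical contract: a parser turns raw file content into unformatted records.
Parser = Callable[[str], List[Dict[str, str]]]

# Hypothetical registry: format name from sources.json -> parser function.
PARSERS: Dict[str, Parser] = {}


def register_parser(fmt: str) -> Callable[[Parser], Parser]:
    """Register a parser under a format name so configs can refer to it."""
    def decorator(func: Parser) -> Parser:
        PARSERS[fmt] = func
        return func
    return decorator


@register_parser("csv")
def parse_csv(text: str) -> List[Dict[str, str]]:
    # Values are left as plain strings: no type conversion or formatting.
    return list(csv.DictReader(io.StringIO(text)))


def parse_records(source: dict, text: str) -> List[Dict[str, str]]:
    """Pick the parser named in the source config and apply it."""
    fmt = source["parser_type"]["format"]
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"no parser registered for format {fmt!r}") from None
    return parser(text)
```

For example, `parse_records({"parser_type": {"format": "csv"}}, "isbn,title\n978-0,Example\n")` returns `[{'isbn': '978-0', 'title': 'Example'}]`. With this design, supporting a new format means writing one more decorated function. A publisher that uses an already supported format only needs a new entry in the config.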

Version 1.0

Added two components: one for parsing CSV files and one for fetching files over FTP.
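An FTP component could look roughly like the sketch below, which uses only the standard library's `ftplib` and `fnmatch`. The function names are hypothetical. The parameters mirror the `ftp_url`, `ftp_login`, `ftp_password`, `local_files_path` and `file_regex` keys from the source config. Files are assumed to be taken from the FTP user's default directory.

```python
import fnmatch
import os
from ftplib import FTP
from typing import Iterable, List


def matching_files(names: Iterable[str], pattern: str) -> List[str]:
    """Return names matching a shell-style pattern such as '*.csv' (case-sensitive)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def download_files(ftp_url: str, ftp_login: str, ftp_password: str,
                   local_files_path: str, file_regex: str) -> List[str]:
    """Download matching files from the FTP server and return their local paths."""
    os.makedirs(local_files_path, exist_ok=True)
    downloaded: List[str] = []
    with FTP(ftp_url) as ftp:
        ftp.login(user=ftp_login, passwd=ftp_password)
        for name in matching_files(ftp.nlst(), file_regex):
            local_path = os.path.join(local_files_path, name)
            with open(local_path, "wb") as fh:
                # RETR streams the remote file in binary chunks into the local file.
                ftp.retrbinary(f"RETR {name}", fh.write)
            downloaded.append(local_path)
    return downloaded
```

Despite its name, the `file_regex` value in the configuration example (`*.csv`) uses shell-glob syntax rather than regular-expression syntax. For that reason this sketch matches it with `fnmatch` instead of `re`, where `*.csv` would not even compile.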

Tech Stack

• Docker
• Python 3.11
• MongoDB 6.0.2
• Dynaconf
• Pydantic
• MongoEngine

Folder structure

• app
    ◦ components
    ◦ configs
        ▪ configs.py: keys and URL for connecting to our main app, and credentials for the service DB
        ▪ main.json: main config
        ▪ sources.json: list of sources and the components they use
    ◦ models
    ◦ sources
        ▪ file_types
        ▪ source_types

Sources configuration

To configure a new source, add an entry with the parameters below to the sources config (app/configs/sources.json):

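• source_name
• source          // where to fetch files from (e.g. "type": "ftp"), with the parameters that component needs
• parser_type     // how to parse the files (e.g. "format": "csv"), with the parameters that component needs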

Example for CSV files from FTP:

  {
    "sources": {
      "McGrawHill": {
        "source_name": "McGrawHill",
        "source": {
          "type": "ftp",
          "ftp_url": "127.0.0.1",
          "ftp_login": "ftp_login",
          "ftp_password": "ftp_pass",
          "local_files_path": "/app/files/McGrawHill/",
          "file_regex": "*.csv"
        },
        "parser_type": {
          "format": "csv"
        }
      }
    }
  }
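The service might read this file and look up a single source along the lines of the sketch below, which uses only the standard library. The project lists Pydantic, so the real code probably validates entries with a Pydantic model. `SourceConfig` and `load_source` are hypothetical stand-ins built on `dataclasses` so the sketch has no third-party dependencies. Treating `source.type` and `parser_type.format` as required is also an assumption based on the example.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SourceConfig:
    """One entry under "sources" (hypothetical model)."""
    source_name: str
    source: Dict[str, Any]       # e.g. {"type": "ftp", ...} plus component params
    parser_type: Dict[str, Any]  # e.g. {"format": "csv"} plus component params

    def __post_init__(self) -> None:
        # Assumed required keys, based on the example config.
        if "type" not in self.source:
            raise ValueError(f"{self.source_name}: source.type is required")
        if "format" not in self.parser_type:
            raise ValueError(f"{self.source_name}: parser_type.format is required")


def load_source(config_text: str, name: str) -> SourceConfig:
    """Parse sources.json content and return the entry for one source."""
    sources = json.loads(config_text)["sources"]
    if name not in sources:
        raise KeyError(f"unknown source {name!r}; known: {sorted(sources)}")
    # Missing or unexpected keys raise TypeError from the dataclass constructor.
    return SourceConfig(**sources[name])
```
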

Each source parser is started by cron with the command:

python update.py {source_name}

To see the list of source types, run:

python update.py -h
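The `update.py` script itself is not shown here. A hypothetical version built on `argparse` could take the source name as a positional argument and restrict it to the configured sources, so that `-h` prints them. `build_parser`, `main` and the hard-coded source list are assumptions for illustration. The real script presumably reads the names from sources.json.

```python
import argparse
from typing import List, Optional, Sequence


def build_parser(source_names: List[str]) -> argparse.ArgumentParser:
    """Build a CLI whose help text lists the configured sources."""
    parser = argparse.ArgumentParser(
        prog="update.py",
        description="Fetch and parse book metadata for one source.",
    )
    parser.add_argument(
        "source_name",
        choices=source_names,  # unknown names are rejected; choices appear in -h
        help="name of the source from sources.json",
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    # Hypothetical: the real service would load these names from sources.json.
    args = build_parser(["McGrawHill"]).parse_args(argv)
    # Placeholder for: download files, parse them, send records to the main app.
    return args.source_name
```

In the script, `main()` would be called from an `if __name__ == "__main__":` block. A crontab line then runs it once per source, for example `0 3 * * * python update.py McGrawHill`. The schedule in that line is only illustrative.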

Run Updates

Copy .env.sample to .env and update the settings in it.
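In this stack Dynaconf is presumably what reads the settings. The sketch below only illustrates what the .env file amounts to: plain `KEY=value` lines that end up as environment variables. `parse_env` and `load_env_file` are hypothetical helpers, and the key names in the test are invented. The real keys are the ones listed in .env.sample.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Read the file and export its values without overriding existing ones."""
    with open(path, encoding="utf-8") as fh:
        values = parse_env(fh.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```
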
