Archived

This repository has been archived on 2026-04-06. You can view files and clone it. You cannot open issues or pull requests or push a commit.


Egor Svitin 217d63a3c0 update ant target

2023-02-02 12:36:13 +03:00

build

update ant target

2023-02-02 12:36:13 +03:00

components

first commit

2022-12-14 12:36:37 +01:00

configs

add build settings

2023-01-30 10:45:26 +03:00

models

first commit

2022-12-14 12:36:37 +01:00

sources

add build settings

2023-01-30 10:45:26 +03:00

.env.sample

update env file

2023-02-01 14:24:56 +03:00

.gitignore

add gitingnore

2023-01-30 10:44:20 +03:00

Dockerfile

update Dockerfile

2023-02-01 15:29:16 +03:00

README.md

update env file

2023-02-01 14:24:56 +03:00

update.py

first commit

2022-12-14 12:36:37 +01:00

README.md

Book Meta Data Parser

A microservice that solves exactly one problem: parsing book metadata from our publishers. Regardless of the format in which a publisher stores this data, the service must grab it and send it to the main application as an array of data, without any formatting. The main idea is to add components for parsing different formats, so that new publishers can be added just by updating config files.
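A minimal sketch of the "components plus config" idea, assuming components are plain functions looked up by the "type" and "format" values from sources.json. The registry names (SOURCE_COMPONENTS, PARSER_COMPONENTS, run_source) and the dummy components are hypothetical; the README does not document the real interfaces.

```python
from typing import Callable

# Hypothetical registries mapping the "type" / "format" values used in
# sources.json to the functions that implement them.
SOURCE_COMPONENTS: dict[str, Callable[[dict], list[str]]] = {}
PARSER_COMPONENTS: dict[str, Callable[[list[str], dict], list[dict]]] = {}


def source_component(name: str):
    """Register a function that fetches files and returns their local paths."""
    def register(func):
        SOURCE_COMPONENTS[name] = func
        return func
    return register


def parser_component(name: str):
    """Register a function that turns fetched files into a list of records."""
    def register(func):
        PARSER_COMPONENTS[name] = func
        return func
    return register


def run_source(config: dict) -> list[dict]:
    """Fetch and parse one publisher using only its config entry."""
    source_params = config["source"]
    parser_params = config["parser_type"]
    fetch = SOURCE_COMPONENTS[source_params["type"]]
    parse = PARSER_COMPONENTS[parser_params["format"]]
    return parse(fetch(source_params), parser_params)


# Dummy components, registered only to illustrate the flow.
@source_component("dummy")
def fetch_dummy(params: dict) -> list[str]:
    return ["book1", "book2"]


@parser_component("upper")
def parse_upper(paths: list[str], params: dict) -> list[dict]:
    return [{"title": p.upper()} for p in paths]
```

With this layout, supporting a new publisher that uses existing formats only means adding a config entry; a new format means registering one more component.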

Version 1.0

Added two components for working with CSV and FTP.
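A sketch of what an FTP source component could look like with the standard library's ftplib, assuming it downloads every remote file whose name matches a shell-style pattern into a local folder. The function names are hypothetical; the parameter keys mirror the sources.json example in Sources configuration.

```python
import fnmatch
import os
from ftplib import FTP


def matching_names(names: list[str], pattern: str) -> list[str]:
    """Keep remote names whose base name matches a pattern such as "*.csv"."""
    return [n for n in names if fnmatch.fnmatch(os.path.basename(n), pattern)]


def download_matching(ftp: FTP, pattern: str, local_dir: str) -> list[str]:
    """Download matching files from the current remote directory.

    Returns the local paths of the downloaded files.
    """
    os.makedirs(local_dir, exist_ok=True)
    paths = []
    for name in matching_names(ftp.nlst(), pattern):
        local_path = os.path.join(local_dir, os.path.basename(name))
        with open(local_path, "wb") as fh:
            # retrbinary streams the file in chunks to the callback.
            ftp.retrbinary(f"RETR {name}", fh.write)
        paths.append(local_path)
    return paths


def fetch_ftp(params: dict) -> list[str]:
    """Hypothetical FTP source component driven by a sources.json entry."""
    with FTP(params["ftp_url"]) as ftp:
        ftp.login(user=params["ftp_login"], passwd=params["ftp_password"])
        return download_matching(
            ftp, params["file_regex"], params["local_files_path"]
        )
```

Keeping the download loop separate from the connection makes it testable with a fake FTP object.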

Tech Stack

• Docker
• Python 3.11
• MongoDB 6.0.2
• Dynaconf
• Pydantic
• MongoEngine

Folder structure

• app
    ◦ components
    ◦ configs
        ▪ configs.py – keys and URL for connecting to the main app, plus credentials for the service DB
        ▪ main.json – main config
        ▪ sources.json – list of sources with components that they use
    ◦ models
    ◦ sources
        ▪ file_types
        ▪ source_types

Sources configuration

To configure a new source, add an entry to the sources config (sources.json) with the params below:

• source_name
• source – with the params required by the source component
• parser_type – with the params required by the parser component
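A dependency-free sketch of validating one source entry. The project lists Pydantic for this kind of model; stdlib dataclasses are used here only to keep the example self-contained. Treating source.type and parser_type.format as mandatory is an assumption based on the example config.

```python
from dataclasses import dataclass


@dataclass
class SourceConfig:
    """One entry of sources.json (hypothetical model)."""
    source_name: str
    source: dict       # params for the source component; must include "type"
    parser_type: dict  # params for the parser component; must include "format"

    def __post_init__(self) -> None:
        # Fail early if a component cannot be selected for this source.
        if "type" not in self.source:
            raise ValueError(f"{self.source_name}: source.type is required")
        if "format" not in self.parser_type:
            raise ValueError(f"{self.source_name}: parser_type.format is required")


def load_sources(raw: dict) -> dict[str, SourceConfig]:
    """Validate the "sources" mapping loaded from sources.json."""
    return {name: SourceConfig(**entry) for name, entry in raw["sources"].items()}
```

Unknown top-level keys in an entry make the dataclass constructor raise TypeError, which also surfaces config typos early.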

Example for CSV files from FTP:

  {
    "sources": {
      "McGrawHill": {
        "source_name": "McGrawHill",
        "source": {
          "type": "ftp",
          "ftp_url": "127.0.0.1",
          "ftp_login": "ftp_login",
          "ftp_password": "ftp_pass",
          "local_files_path": "/app/files/McGrawHill/",
          "file_regex": "*.csv"
        },
        "parser_type": {
          "format": "csv"
        }
      }
    }
  }
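A sketch of a CSV parser component, assuming it returns every row as an unformatted dict. Since "*.csv" is a shell-style glob rather than a valid regular expression, this sketch treats file_regex as a glob. The optional encoding and delimiter keys are hypothetical.

```python
import csv
import glob
import os


def local_csv_files(source: dict) -> list[str]:
    """Find downloaded files, using the "file_regex" value as a glob pattern."""
    pattern = os.path.join(source["local_files_path"], source["file_regex"])
    return sorted(glob.glob(pattern))


def parse_csv(paths: list[str], params: dict) -> list[dict]:
    """Read CSV files into one flat list of row dicts, leaving values as strings."""
    rows: list[dict] = []
    for path in paths:
        with open(path, newline="", encoding=params.get("encoding", "utf-8")) as fh:
            rows.extend(csv.DictReader(fh, delimiter=params.get("delimiter", ",")))
    return rows
```

Keeping values as raw strings matches the goal of sending data to the main application without any formatting.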

Each source parser is started by crontab with the command:

python update.py {source_name}

To see the list of source types, use the command:

python update.py -h
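A sketch of how update.py might expose the configured sources on the command line with argparse, assuming sources.json lives at configs/sources.json (a hypothetical path). Using choices makes -h list the valid names and rejects unknown ones.

```python
import argparse
import json
from typing import Optional


def build_parser(source_names: list[str]) -> argparse.ArgumentParser:
    """CLI with one positional argument; -h shows the configured sources."""
    parser = argparse.ArgumentParser(
        prog="update.py",
        description="Fetch and parse book metadata for one configured source.",
    )
    parser.add_argument(
        "source_name",
        choices=source_names,
        help="name of the source entry in sources.json",
    )
    return parser


def main(argv: Optional[list[str]] = None,
         config_path: str = "configs/sources.json") -> str:
    with open(config_path, encoding="utf-8") as fh:
        sources = json.load(fh)["sources"]
    args = build_parser(sorted(sources)).parse_args(argv)
    # A real run would hand sources[args.source_name] to the components here.
    return args.source_name
```

For example, python update.py McGrawHill would select the source defined above, while an unknown name makes argparse exit with an error listing the valid choices. A script would call main() under an if __name__ == "__main__": guard.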

Run Updates

Copy .env.sample to .env and update the settings.
