This repository has been archived on 2026-04-06. You can view files and clone it. You cannot open issues or pull requests or push a commit.

Converter


LiveCarta converter

Introduction

This is a Python 3 project that converts DOCX and EPUB documents into the LiveCarta inner format. A LiveCarta book is a tree whose nodes are chapters, and each chapter consists of a title and HTML code. The HTML code must follow these restrictions:

  • Structure: the markup should be as flat as possible; the fewer nested tags, the better.
    • The only permitted nesting is a <p> containing a <span> and a <span> containing a <span>.
  • The structure tags that are used: span, p, blockquote, ul, ol, li, table.
  • List of all valid tags: iframe, a, span, blockquote, i, s, u, ul, ol, li, strong, p, img, caption, thead, tbody, td, th, tr, table, sup.
  • Styles are applied inline, i.e. through the style attribute of the HTML tag.
  • Each tag has its own restrictions on attributes and styles. See doc/style_config.
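
The tag whitelist above can be checked mechanically. The following is a minimal sketch that uses only the standard library; the names TagCollector and find_invalid_tags are hypothetical and are not taken from this project, and the per-tag attribute and style rules from doc/style_config are not covered.

```python
from html.parser import HTMLParser

# Tag list copied from the restrictions above.
VALID_TAGS = {
    "iframe", "a", "span", "blockquote", "i", "s", "u", "ul", "ol", "li",
    "strong", "p", "img", "caption", "thead", "tbody", "td", "th", "tr",
    "table", "sup",
}


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag that is not whitelisted."""

    def __init__(self):
        super().__init__()
        self.invalid = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag not in VALID_TAGS:
            self.invalid.append(tag)


def find_invalid_tags(html):
    """Return the disallowed tag names in document order."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.invalid


if __name__ == "__main__":
    print(find_invalid_tags("<div><p>Text</p><br/></div>"))
```

An empty result means that only whitelisted tags were found; it says nothing about nesting depth or attributes.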

Features

  • Converts EPUB and DOCX to JSON (LiveCarta inner format)
  • Compatible with Python 3
  • Very small size (only .py files)
  • Multithreaded

Top level project structure

  • consumer.py - code that receives messages from RabbitMQ
  • class Access - contains the API code that interacts with the server.
  • class Solver - contains the pipeline that handles a task: receiving the book file, converting it, updating the status, and sending the result back to the server.
  • livecarta_config.py - constants that depend on LiveCarta

How it Works

There are two conversion pipelines, each with three steps:

Epub

Step 1 - Add the CSS to the HTML as inline styles
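
As an illustration of this step, here is a minimal sketch that merges CSS declarations into style attributes using only the standard library. It assumes the stylesheet has already been reduced to a mapping from tag name to declarations; the names StyleInliner and inline_styles are hypothetical, and the project's real code may handle selectors, comments, and other cases that this sketch ignores.

```python
from html import escape
from html.parser import HTMLParser


class StyleInliner(HTMLParser):
    """Hypothetical helper: re-emits HTML with stylesheet rules inlined."""

    def __init__(self, rules):
        # Keep character references untouched so text passes through as-is.
        super().__init__(convert_charrefs=False)
        self.rules = rules  # assumed shape: {"p": "margin: 0", ...}
        self.out = []

    def _render(self, tag, attrs, end):
        merged = dict(attrs)
        rule = self.rules.get(tag)
        if rule:
            existing = merged.get("style")
            # The existing inline style goes last so it wins on conflicts.
            merged["style"] = f"{rule}; {existing}" if existing else rule
        parts = [tag]
        for name, value in merged.items():
            safe = escape(value or "", quote=True)
            parts.append(f'{name}="{safe}"')
        self.out.append("<" + " ".join(parts) + end)

    def handle_starttag(self, tag, attrs):
        self._render(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._render(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def inline_styles(html, rules):
    """Return html with the declarations from rules added as style attributes."""
    inliner = StyleInliner(rules)
    inliner.feed(html)
    inliner.close()
    return "".join(inliner.out)
```

For example, inline_styles('<p>Hi</p>', {"p": "margin: 0"}) produces a <p> tag that carries style="margin: 0".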

Step 2 - Process every HTML chapter of Epub with presets

Step 3 - Convert dicts of HTML to JSON (LiveCarta inner format)
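
To make the target shape concrete, here is a minimal sketch of a chapter tree serialized to JSON. The field names title, html, and children are assumptions made for illustration; the actual keys of the LiveCarta inner format are not documented here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Chapter:
    """Hypothetical node of the book tree: a title plus HTML code."""

    title: str
    html: str
    children: list = field(default_factory=list)  # nested Chapter objects


def book_to_json(chapters):
    """Serialize a list of top-level chapters; asdict recurses into children."""
    return json.dumps([asdict(chapter) for chapter in chapters], ensure_ascii=False)


if __name__ == "__main__":
    book = [
        Chapter("Part I", "<p>Intro</p>", [Chapter("Chapter 1", "<p>Text</p>")]),
    ]
    print(book_to_json(book))
```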

Docx

Step 1 - Conversion of DOCX to HTML via LibreOffice
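
A minimal sketch of this step, assuming that LibreOffice is installed and that its soffice binary is on the PATH. The function names are hypothetical, and the options the project actually passes may differ.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, binary="soffice"):
    """Build the LibreOffice command line for a headless DOCX to HTML conversion."""
    return [
        binary,
        "--headless",            # run without a GUI
        "--convert-to", "html",  # target format
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_docx_to_html(docx_path, out_dir):
    """Run the conversion and return the path where the HTML file is expected."""
    subprocess.run(build_convert_command(docx_path, out_dir), check=True)
    # LibreOffice keeps the input file name and swaps the extension.
    return Path(out_dir) / (Path(docx_path).stem + ".html")
```

Building the command separately from running it makes the argument list easy to inspect on a machine without LibreOffice.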

Step 2 - Process HTML with presets

Step 3 - Conversion of HTML to JSON (LiveCarta inner format)

Setup

python -m pip install -r requirements.txt

Development

To fix a bug or enhance an existing module, follow these steps:

  • Fork the repo
  • Create a new branch (git checkout -b improve-feature)
  • Make the appropriate changes in the files
  • Add changes to reflect the changes made
  • Commit your changes (git commit -am 'Improve feature')
  • Push to the branch (git push origin improve-feature)
  • Create a Pull Request

How to Use

1. Run consumer.py. The script waits continuously for a message from the RabbitMQ queue, which receives the book when you use Import File to Convert in the admin panel. You can also upload a book that has been converted locally by using the local_convert() function in consumer.py.

2. Run docx_solver.py

  1. The script must run on a Linux system. On Windows, use a Python interpreter running in Docker.
  2. Upload a book to books/docx/ and set the variable docx_file_path = books/docx/book_name in main

3. Run epub_solver.py. Before that, upload a book to books/epub/ and set the variable epub_file_path = books/epub/book_name in main
