Converter
Table of Contents
- Introduction
- Features
- Top level project structure
- How it Works
- Setup
- How to Use
Introduction
This is a Python 3 project for converting Docx and Epub documents to the LiveCarta inner format. A LiveCarta book is a tree structure whose nodes are chapters. A LiveCarta chapter consists of a title and HTML code. LiveCarta HTML code must follow these restrictions:
- Structure: avoid nesting; the fewer nested tags, the better.
- Only the following nesting is allowed: <p> may contain <span>, and <span> may contain <span>.
- Structural tags used: span, p, blockquote, ul, ol, li, table.
- All valid tags: iframe, a, span, blockquote, i, s, u, ul, ol, li, strong, p, img, caption, thead, tbody, td, th, tr, table, sup.
- Styles are added inline, i.e. through the style attribute of the HTML tag.
- Each tag has its own restrictions on attributes and styles. See doc/style_config.
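One quick way to check a chapter against the tag list above is a small audit built on Python's standard html.parser. This is a hypothetical helper, not the project's own validator: it reports tags outside the whitelist and the maximum nesting depth (a rough signal for the "less nesting" rule), and it does not check the attribute and style restrictions from doc/style_config.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Whitelist copied from the list of valid tags above.
VALID_TAGS = frozenset({
    "iframe", "a", "span", "blockquote", "i", "s", "u", "ul", "ol", "li",
    "strong", "p", "img", "caption", "thead", "tbody", "td", "th", "tr",
    "table", "sup",
})
# HTML void elements never get a closing tag, so they must not add nesting depth.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class TagAudit(HTMLParser):
    """Collects tags outside VALID_TAGS and the deepest element nesting seen."""

    def __init__(self) -> None:
        super().__init__()
        self.invalid: set[str] = set()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag not in VALID_TAGS:
            self.invalid.add(tag)
        if tag in VOID_TAGS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def audit_chapter_html(html: str) -> tuple[set[str], int]:
    """Return (disallowed tag names, maximum nesting depth) for one chapter."""
    parser = TagAudit()
    parser.feed(html)
    parser.close()
    return parser.invalid, parser.max_depth
```

For example, auditing `<p><span style="color:red">Hi</span></p><div>x</div><br>` reports the disallowed tags div and br and a maximum depth of 2.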
Features
- Converts Epub and Docx to JSON (LiveCarta inner format)
- Compatible with Python 3
- Very small size (only .py files)
- Multithreaded
Top level project structure
- consumer.py: code responsible for receiving messages from RabbitMQ.
- class Access: API code responsible for interaction with the server.
- class Solver: the pipeline for solving a task (receiving the book file, conversion, status updating, sending the result back to the server).
- livecarta_config.py: constants that depend on LiveCarta.
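The Solver responsibilities listed above form a simple pipeline. The sketch below is hypothetical: the method names get_book, update_status and send_result, the status strings, and the in-memory stub are assumptions for illustration, not the actual Access or Solver API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol, Tuple


class ServerApi(Protocol):
    """Hypothetical stand-in for the Access class (method names are assumed)."""

    def get_book(self, book_id: str) -> bytes: ...

    def update_status(self, book_id: str, status: str) -> None: ...

    def send_result(self, book_id: str, result: dict) -> None: ...


def solve(api: ServerApi, book_id: str, convert: Callable[[bytes], dict]) -> dict:
    """Receive the book file, convert it, update the status, send the result back."""
    api.update_status(book_id, "processing")  # assumed status string
    try:
        book_file = api.get_book(book_id)
        result = convert(book_file)
    except Exception:
        api.update_status(book_id, "error")  # report the failure, then re-raise
        raise
    api.send_result(book_id, result)
    api.update_status(book_id, "done")
    return result


@dataclass
class InMemoryApi:
    """Hypothetical stub that records calls instead of talking to a server."""

    books: Dict[str, bytes]
    statuses: List[Tuple[str, str]] = field(default_factory=list)
    results: Dict[str, dict] = field(default_factory=dict)

    def get_book(self, book_id: str) -> bytes:
        return self.books[book_id]

    def update_status(self, book_id: str, status: str) -> None:
        self.statuses.append((book_id, status))

    def send_result(self, book_id: str, result: dict) -> None:
        self.results[book_id] = result


if __name__ == "__main__":
    api = InMemoryApi(books={"42": b"raw book bytes"})
    solve(api, "42", lambda data: {"size": len(data)})
    print(api.statuses)  # [('42', 'processing'), ('42', 'done')]
```

Keeping the server calls behind a small interface lets the conversion step be tested without a running server or queue.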
How it Works
There are two approaches, each working in three steps:
Epub
Step 1 - Add CSS to HTML inline_style
Step 2 - Process every HTML chapter of Epub with presets
Step 3 - Convert dicts of HTML to JSON (LiveCarta inner format)
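Step 3 produces the tree described in the Introduction: chapters with a title, HTML content, and nested subchapters. The sketch below serializes such a tree with the standard json module; the Chapter class and the JSON keys title, content and children are assumptions for illustration, not the actual LiveCarta schema.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Chapter:
    """One node of the book tree: a title plus its HTML (hypothetical model)."""

    title: str
    html: str
    children: list[Chapter] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Assumed key names; the real inner format may differ.
        return {
            "title": self.title,
            "content": self.html,
            "children": [child.to_dict() for child in self.children],
        }


def book_to_json(chapters: list[Chapter]) -> str:
    """Serialize top-level chapters (and their subchapters) to a JSON string."""
    # ensure_ascii=False keeps non-Latin text readable in the output.
    return json.dumps([c.to_dict() for c in chapters], ensure_ascii=False)
```

A recursive to_dict keeps the nesting of the output identical to the chapter tree, so subchapters of any depth serialize without special handling.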
Docx
Step 1 - Conversion of DOCX to HTML via LibreOffice
Step 2 - Process HTML with presets
Step 3 - Conversion of HTML to JSON (LiveCarta inner format)
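Step 1 of the Docx approach can be reproduced with LibreOffice's headless mode. The sketch below assumes the soffice (or libreoffice) executable is on PATH and that plain html output is acceptable; the function names are hypothetical, and the project itself may use a different export filter or options.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(docx_path: Path, out_dir: Path, binary: str = "soffice") -> list:
    """Headless LibreOffice call that writes <stem>.html into out_dir."""
    return [
        binary,
        "--headless",
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_docx_to_html(docx_path: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path of the generated HTML file."""
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise FileNotFoundError("LibreOffice (soffice) is not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(build_soffice_command(docx_path, out_dir, binary),
                   check=True, capture_output=True)
    return out_dir / (docx_path.stem + ".html")
```

Building the command separately from running it makes the invocation easy to inspect and test on machines without LibreOffice installed.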
Setup
Install the dependencies:
python -m pip install -r requirements.txt
Development
To fix a bug or enhance an existing module, follow these steps:
- Fork the repo
- Create a new branch (git checkout -b improve-feature)
- Make the appropriate changes in the files
- Stage the changes you made (git add)
- Commit your changes (git commit -am 'Improve feature')
- Push to the branch (git push origin improve-feature)
- Create a Pull Request
How to Use
a. Run consumer.py
The script constantly waits for a message from the queue (RabbitMQ), into which books are loaded via Import File to Convert in the admin panel.
You can also upload a book that has been converted locally using the local_convert() function in consumer.py.
b. Run docx_solver.py
- It must be run on a Linux system; if you are using Windows, use a Python Docker interpreter.
- Upload a book to books/docx/ and set the variable docx_file_path = books/docx/book_name in main
c. Run epub_solver.py
- Before running it, upload a book to books/epub/ and set the variable epub_file_path = books/epub/book_name in main