This repository has been archived on 2026-04-06. You can view files and clone it. You cannot open issues or pull requests or push a commit.
2023-06-09 14:21:39 +03:00


<h1 align="center"> Converter </h1> <br>
<p align="center">
<a href="https://livecarta.com/">
<img alt="LiveCarta converter" title="LiveCarta converter" src="https://assets.openstax.org/oscms-prodcms/media/partner_logos/LiveCarta_Logo.png" width="450">
</a>
</p>
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
## Table of Contents
- [Introduction](#introduction)
- [Features](#features)
- [Top level project structure](#top-level-project-structure)
- [How it Works](#how-it-works)
- [Setup](#setup)
- [Development](#development)
- [How to use](#how-to-use)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## Introduction
This is a Python 3 project for converting DOCX and EPUB documents to the LiveCarta inner format.
The LiveCarta book format is a tree structure whose nodes are chapters.
A LiveCarta chapter consists of a title plus HTML code. This HTML code must follow these restrictions:
- Structure: the HTML should be as flat as possible; the fewer nested tags, the better.
- The only permitted nesting is `<p>` containing `<span>` and `<span>` containing `<span>`.
- Structural tags used: `span, p, blockquote, ul, ol, li, table`.
- List of all valid tags: `iframe, a, span, blockquote, i, s, u, ul, ol, li, strong, p, img, caption, thead, tbody, td, th, tr, table, sup`.
- Styles are added _inline_, i.e. via the `style` attribute of the HTML tag.
- Each tag has its own restrictions on attributes and styles. See `doc/style_config`.
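
To catch violations early, a chapter's HTML can be checked against the tag whitelist above. The following is a minimal sketch using only the standard library `html.parser`; the names `ChapterHtmlChecker` and `check_chapter_html` are hypothetical and not part of this repository. It reports tags outside the whitelist and the maximum nesting depth, so overly nested output is easy to spot. It does not enforce the per-tag attribute and style rules from `doc/style_config`.

```python
from html.parser import HTMLParser

# Whitelist copied from the "List of all valid tags" above.
VALID_TAGS = {
    "iframe", "a", "span", "blockquote", "i", "s", "u", "ul", "ol", "li",
    "strong", "p", "img", "caption", "thead", "tbody", "td", "th", "tr",
    "table", "sup",
}
# HTML void elements have no closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ChapterHtmlChecker(HTMLParser):
    """Hypothetical checker: collects disallowed tags and the deepest nesting level."""

    def __init__(self):
        super().__init__()
        self.invalid_tags = set()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag not in VALID_TAGS:
            self.invalid_tags.add(tag)
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def check_chapter_html(markup):
    """Return (sorted list of disallowed tags, maximum nesting depth)."""
    checker = ChapterHtmlChecker()
    checker.feed(markup)
    checker.close()
    return sorted(checker.invalid_tags), checker.max_depth
```

For example, `check_chapter_html('<p><span>Hi <b>there</b></span></p><img src="x.png">')` returns `(['b'], 3)`: `<b>` is not whitelisted (the whitelist uses `<strong>`), and the deepest path is `p > span > b`.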
## Features
- Converts EPUB and DOCX to JSON (LiveCarta inner format)
- Compatible with Python 3
- Very small size (only .py files)
- Multithreaded
## Top level project structure
- `consumer.py` - code responsible for receiving messages from RabbitMQ
- class `Access` - API code responsible for interaction with the server
- class `Solver` - the pipeline that solves a task: receiving the book file, converting it, updating the status and sending the result back to the server
- `livecarta_config.py` - LiveCarta-specific constants
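
The division of labour between `Access` and `Solver` can be illustrated with a small in-memory sketch. Everything here is hypothetical: `FakeAccess` stands in for the server API, `SolverSketch` for the pipeline, and the status strings `"processing"`, `"done"` and `"error"` are assumptions. The real classes in this repository may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAccess:
    """Hypothetical stand-in for the server API wrapped by `Access`."""
    books: dict
    statuses: list = field(default_factory=list)
    results: dict = field(default_factory=dict)

    def get_book(self, book_id):
        return self.books[book_id]

    def update_status(self, book_id, status):
        self.statuses.append((book_id, status))

    def send_result(self, book_id, content):
        self.results[book_id] = content


class SolverSketch:
    """Hypothetical pipeline: receive book -> convert -> update status -> send result."""

    def __init__(self, access, convert):
        self.access = access
        self.convert = convert  # any callable: raw book bytes -> JSON-ready object

    def solve(self, book_id):
        self.access.update_status(book_id, "processing")
        try:
            raw = self.access.get_book(book_id)
            result = self.convert(raw)
        except Exception:
            # Report the failure to the server, then re-raise for the caller.
            self.access.update_status(book_id, "error")
            raise
        self.access.send_result(book_id, result)
        self.access.update_status(book_id, "done")
        return result
```

Keeping the server I/O behind an `Access`-like object means the conversion step can be swapped or tested without a running server.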
## How it Works
There are **2 approaches**, each consisting of 3 steps:
#### Epub
- **Step 1** - Convert the CSS into inline styles (`style` attributes) in the HTML
- **Step 2** - Process every HTML chapter of the EPUB with presets
- **Step 3** - Convert the dicts of HTML to JSON (LiveCarta inner format)
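
EPUB Step 1 (turning stylesheet rules into `style` attributes) can be sketched with the standard library alone. This is an illustrative sketch, not the project's implementation: `parse_css`, `CssInliner` and `inline_css` are hypothetical names, and only flat rules with plain tag (`p`) or class (`.note`) selectors are supported (no `@media`, combinators or pseudo-classes). Declarations are applied in order tag, then class, then any existing inline style, so the existing inline style wins.

```python
import html
import re
from html.parser import HTMLParser

RULE_RE = re.compile(r"([^{}]+)\{([^}]*)\}")


def parse_css(css):
    """Map simple selectors ('p', '.note') to declaration strings."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop comments
    rules = {}
    for selectors, body in RULE_RE.findall(css):
        decls = "; ".join(d.strip() for d in body.split(";") if d.strip())
        for sel in selectors.split(","):
            sel = sel.strip()
            rules[sel] = "; ".join(filter(None, [rules.get(sel, ""), decls]))
    return rules


class CssInliner(HTMLParser):
    """Re-emits the HTML with matching CSS declarations merged into `style`."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.out = []

    def _emit_start(self, tag, attrs, self_closing):
        attrs = dict(attrs)
        styles = [self.rules.get(tag, "")]
        for cls in (attrs.get("class") or "").split():
            styles.append(self.rules.get("." + cls, ""))
        styles.append(attrs.get("style") or "")  # existing inline style comes last
        style = "; ".join(s.strip().rstrip(";") for s in styles if s and s.strip())
        if style:
            attrs["style"] = style
        parts = [tag] + [
            name if value is None else f'{name}="{html.escape(value)}"'
            for name, value in attrs.items()
        ]
        self.out.append("<" + " ".join(parts) + ("/>" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs, False)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape text.
        self.out.append(html.escape(data, quote=False))


def inline_css(markup, css):
    inliner = CssInliner(parse_css(css))
    inliner.feed(markup)
    inliner.close()
    return "".join(inliner.out)
```

For example, `inline_css('<p class="note">Hi</p>', 'p { margin: 0 } .note { color: red; }')` returns `'<p class="note" style="margin: 0; color: red">Hi</p>'`. A production converter would need a real CSS parser to handle specificity and complex selectors.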
#### Docx
- **Step 1** - Convert DOCX to HTML via LibreOffice
- **Step 2** - Process the HTML with presets
- **Step 3** - Convert the HTML to JSON (LiveCarta inner format)
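
In both approaches, Step 3 produces the chapter tree described in the Introduction. The sketch below assembles such a tree from flat `(level, title, html)` triples in reading order. The field names `title`, `content` and `children`, and the level-based input, are assumptions for illustration; the actual JSON schema is defined by LiveCarta and is not documented in this README.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Chapter:
    """Hypothetical chapter node: a title plus HTML, with sub-chapters."""
    title: str
    content: str
    children: list = field(default_factory=list)


def build_book(flat_chapters):
    """Turn (level, title, html) triples in reading order into a chapter tree.

    Level 1 is a top-level chapter, level 2 its sub-chapter, and so on.
    """
    root = Chapter(title="", content="")
    stack = [(0, root)]  # path of (level, node) from the root to the last chapter
    for level, title, markup in flat_chapters:
        node = Chapter(title, markup)
        # Climb back up until the top of the stack is a valid parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((level, node))
    return [asdict(chapter) for chapter in root.children]
```

Usage: `json.dumps(build_book([(1, "Intro", "<p>a</p>"), (2, "Part", "<p>b</p>")]), indent=2)` yields one top-level chapter, "Intro", whose `children` list holds "Part".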
## Setup
```shell
python -m pip install -r requirements.txt
```
### Development
To fix a bug or enhance an existing module, follow these steps:
- Fork the repo
- Create a new branch (`git checkout -b improve-feature`)
- Make the appropriate changes in the files
- Stage the changed files (`git add`)
- Commit your changes (`git commit -am 'Improve feature'`)
- Push to the branch (`git push origin improve-feature`)
- Create a Pull Request
## How to Use
**1.** Run `consumer.py`
The script continuously waits for messages from the RabbitMQ queue; books are placed into the queue via "Import File to Convert" in the admin panel.
You can also convert a book locally and upload the result using `local_convert()` in `consumer.py`.
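
The "wait for a message, then solve it" loop of `consumer.py`, combined with the multithreading mentioned under Features, follows a familiar worker pattern. The sketch below uses the standard library `queue.Queue` as an in-process stand-in for RabbitMQ; it does not talk to a broker, and the function `consume`, the JSON message body and the `None` stop sentinel are all hypothetical.

```python
import json
import queue
import threading


def consume(messages, handle, workers=2):
    """Hypothetical stand-in for the RabbitMQ consumer loop.

    Each worker thread blocks on the queue, decodes a JSON message and passes
    it to `handle` (e.g. a Solver). A `None` message stops one worker.
    """
    def worker():
        while True:
            body = messages.get()  # blocks until a message arrives
            try:
                if body is None:  # sentinel: stop this worker
                    return
                handle(json.loads(body))
            except Exception as exc:  # keep the worker alive on a bad message
                print(f"failed to process message: {exc}")
            finally:
                messages.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    return threads
```

With a real broker, the `messages.get()` call would be replaced by the client library's consume callback.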
**2.** Run `docx_solver.py`
1. It must be run on Linux; if you're using Windows, use a Python interpreter running in Docker.
2. Put the book in `books/docx/` and set the variable `docx_file_path = books/docx/book_name` in the `__main__` block.
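
DOCX Step 1 relies on LibreOffice. One documented way to drive it is the headless command line, `soffice --headless --convert-to html --outdir DIR FILE`. The sketch below wraps that call; the helper names are hypothetical, the exact options used by `docx_solver.py` are not shown in this README, and `soffice` must be on `PATH`.

```python
import subprocess
from pathlib import Path


def libreoffice_html_command(docx_path, out_dir, soffice="soffice"):
    """Build a headless LibreOffice command that converts a DOCX file to HTML."""
    return [soffice, "--headless", "--convert-to", "html",
            "--outdir", str(out_dir), str(docx_path)]


def convert_docx_to_html(docx_path, out_dir):
    """Run LibreOffice and return the expected path of the generated HTML file.

    LibreOffice names the output after the input, e.g. book.docx -> book.html.
    """
    subprocess.run(libreoffice_html_command(docx_path, out_dir), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".html")
```

`check=True` makes a failed LibreOffice run raise `subprocess.CalledProcessError` instead of silently producing no HTML.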
**3.** Run `epub_solver.py`
Before running it, put the book in `books/epub/` and set the variable `epub_file_path = books/epub/book_name` in the `__main__` block.
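
Before running `epub_solver.py`, it can help to see which HTML chapters the EPUB contains and in what order. An EPUB is a ZIP archive whose `META-INF/container.xml` points to an OPF package file; the OPF's spine lists manifest items in reading order. This standard-library sketch (`list_epub_chapters` is a hypothetical helper, not the project's parser) follows that chain; it does not decode URL-encoded `href` values.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def list_epub_chapters(epub_path):
    """Return the chapter file paths inside an EPUB, in spine (reading) order."""
    with zipfile.ZipFile(epub_path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)
        # The manifest maps item ids to paths relative to the OPF file.
        manifest = {
            item.get("id"): posixpath.join(base, item.get("href"))
            for item in opf.findall("opf:manifest/opf:item", NS)
        }
        # The spine lists manifest ids in reading order.
        return [manifest[ref.get("idref")]
                for ref in opf.findall("opf:spine/opf:itemref", NS)]
```

Usage: `list_epub_chapters("books/epub/book_name")` returns paths such as `OEBPS/text/ch1.xhtml` that can be read with `zipfile.ZipFile.read`.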