forked from LiveCarta/BookConverter
<h1 align="center"> Converter </h1> <br>
<p align="center">
<a href="https://livecarta.com/">
<img alt="LiveCarta converter" title="LiveCarta converter" src="https://assets.openstax.org/oscms-prodcms/media/partner_logos/LiveCarta_Logo.png" width="450">
</a>
</p>

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
## Table of Contents

- [Introduction](#introduction)
- [Features](#features)
- [Top level project structure](#top-level-project-structure)
- [How it Works](#how-it-works)
- [Setup](#setup)
- [Development](#development)
- [How to Use](#how-to-use)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Introduction
This is a Python 3 project that converts DOCX and EPUB documents to the LiveCarta inner format.
The LiveCarta book format is a tree structure whose nodes are chapters.
A LiveCarta chapter consists of a title and HTML code. The HTML code must follow these restrictions:

- Structure: keep nesting shallow; the fewer nested tags, the better.
- Only the following nesting is allowed: `<p>` may contain `<span>`, and `<span>` may contain `<span>`.
- The structure tags that are used: `span, p, blockquote, ul, ol, li, table`.
- List of all valid tags: `iframe, a, span, blockquote, i, s, u, ul, ol, li, strong, p, img, caption, thead, tbody, td, th, tr, table, sup`.
- Styles are added _inline_, i.e. through the `style` attribute of the HTML tag.
- Each tag has its own restrictions on attributes and style. See `doc/style_config`.
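
As a rough illustration of the tag whitelist above, the following Python sketch scans an HTML fragment and reports every tag that is not in the list. The names `VALID_TAGS`, `TagCollector` and `find_invalid_tags` are made up for this example and are not taken from the project. The per-tag attribute and style restrictions from `doc/style_config` are not checked here.

```python
from html.parser import HTMLParser

# Whitelist copied from the "List of all valid tags" item above.
VALID_TAGS = {
    "iframe", "a", "span", "blockquote", "i", "s", "u", "ul", "ol", "li",
    "strong", "p", "img", "caption", "thead", "tbody", "td", "th", "tr",
    "table", "sup",
}


class TagCollector(HTMLParser):
    """Hypothetical helper: collects every tag name that is not whitelisted."""

    def __init__(self):
        super().__init__()
        self.invalid = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        # Self-closing tags such as <br/> also end up here by default.
        if tag not in VALID_TAGS:
            self.invalid.append(tag)


def find_invalid_tags(html):
    """Return the disallowed tag names found in `html`, in document order."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.invalid
```

An empty result means the fragment uses only whitelisted tags; it says nothing about nesting depth or styles.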

## Features
- Converts EPUB and DOCX to JSON (LiveCarta inner format)
- Compatible with Python 3
- Very small size (only .py files)
- Multithreaded

## Top level project structure
- `consumer.py` - code responsible for receiving messages from RabbitMQ
- class `Access` - contains the API code responsible for interaction with the server.
- class `Solver` - contains the code responsible for the task pipeline: receiving the book file, conversion, status updating, and sending the result back to the server.
- `livecarta_config.py` - constants that depend on LiveCarta
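
To make the division of work between `Access` and `Solver` more concrete, here is a minimal sketch of the pipeline described above. It is not the project's code: `FakeAccess`, `SolverSketch`, the method names `get_book`, `update_status` and `send_book`, and the status strings are all assumptions chosen for illustration, and the fake records calls in memory instead of talking to a server.

```python
class FakeAccess:
    """Hypothetical stand-in for `Access`: records calls instead of using the API."""

    def __init__(self, files):
        self.files = files      # book_id -> raw file bytes
        self.statuses = []      # (book_id, status) pairs, in call order
        self.results = {}       # book_id -> converted content

    def get_book(self, book_id):
        return self.files[book_id]

    def update_status(self, book_id, status):
        self.statuses.append((book_id, status))

    def send_book(self, book_id, content):
        self.results[book_id] = content


class SolverSketch:
    """Hypothetical pipeline: receive file, convert, update status, send result."""

    def __init__(self, access, convert):
        self.access = access
        self.convert = convert  # any callable: raw bytes -> converted content

    def solve(self, book_id):
        raw = self.access.get_book(book_id)
        self.access.update_status(book_id, "processing")
        try:
            content = self.convert(raw)
        except Exception:
            # Report the failure to the server side, then let the caller see it.
            self.access.update_status(book_id, "error")
            raise
        self.access.send_book(book_id, content)
        self.access.update_status(book_id, "done")
        return content
```

Passing the access object and the converter into the solver keeps the pipeline testable without a running server or queue.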

## How it Works
There are **2 approaches**, one per input format, each consisting of 3 steps:
#### Epub
**Step 1** - Add the CSS to the HTML as inline styles
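
The idea behind this step can be sketched with the standard library alone. The example below is an assumption about what "inlining" means here, not the project's implementation: it only understands plain tag selectors such as `p` or `span`, it drops comments and doctype declarations, and the names `parse_css`, `StyleInliner` and `inline_css` are hypothetical.

```python
import re
from html import escape
from html.parser import HTMLParser


def parse_css(css):
    """Map each plain tag selector (e.g. `p`) to its declaration string."""
    rules = {}
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        declarations = "; ".join(
            part.strip() for part in body.split(";") if part.strip()
        )
        for name in selector.split(","):
            rules[name.strip().lower()] = declarations
    return rules


class StyleInliner(HTMLParser):
    """Re-serializes HTML, merging matching CSS rules into each tag's `style`."""

    def __init__(self, rules):
        super().__init__(convert_charrefs=False)
        self.rules = rules
        self.out = []

    def _render(self, tag, attrs, closer):
        attrs = dict(attrs)
        css = self.rules.get(tag)
        if css:
            own = attrs.get("style")
            # Rule first, existing inline style last, so the inline value wins.
            attrs["style"] = f"{css}; {own}" if own else css
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs.items()
        )
        self.out.append(f"<{tag}{rendered}{closer}>")

    def handle_starttag(self, tag, attrs):
        self._render(tag, attrs, "")

    def handle_startendtag(self, tag, attrs):
        self._render(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def inline_css(html, css):
    """Return `html` with the rules from `css` written into `style` attributes."""
    inliner = StyleInliner(parse_css(css))
    inliner.feed(html)
    inliner.close()
    return "".join(inliner.out)
```

Class, id and descendant selectors would need a real CSS parser; this sketch only shows the direction of the transformation.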

**Step 2** - Process every HTML chapter of the EPUB with presets

**Step 3** - Convert the dicts of HTML to JSON (LiveCarta inner format)

#### Docx
**Step 1** - Convert DOCX to HTML via LibreOffice
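
For orientation, a headless LibreOffice conversion can be driven from Python roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the binary is assumed to be reachable as `soffice`, and the project may pass a different export filter or additional options.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, binary="soffice"):
    """Build the LibreOffice command line that converts one DOCX file to HTML."""
    return [
        binary,
        "--headless",                # run without a GUI
        "--convert-to", "html",      # target format
        "--outdir", str(out_dir),    # directory for the generated file
        str(docx_path),
    ]


def convert_docx_to_html(docx_path, out_dir, binary="soffice"):
    """Run the conversion and return the path of the HTML file LibreOffice writes."""
    subprocess.run(build_convert_command(docx_path, out_dir, binary), check=True)
    # LibreOffice keeps the base name of the input and swaps the extension.
    return Path(out_dir) / (Path(docx_path).stem + ".html")
```

Separating command construction from execution makes it possible to inspect or log the command without LibreOffice installed.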

**Step 2** - Process the HTML with presets

**Step 3** - Convert the HTML to JSON (LiveCarta inner format)
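
Both approaches end by serializing a tree of chapters, each with a title and HTML, to JSON. The actual LiveCarta JSON schema is not documented in this README, so the field names `title`, `html` and `children` below are assumptions used only to show the shape of such a tree.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Chapter:
    """Hypothetical chapter node: a title, its HTML, and nested sub-chapters."""
    title: str
    html: str
    children: list = field(default_factory=list)


def book_to_json(chapters):
    """Serialize a list of top-level chapters (and their subtrees) to JSON."""
    # asdict() recurses into nested dataclasses, including those inside lists.
    return json.dumps([asdict(chapter) for chapter in chapters], ensure_ascii=False)
```

For example, `book_to_json([Chapter("Part 1", "<p>Intro</p>", [Chapter("Chapter 1", "<p>Text</p>")])])` produces a single top-level node with one nested child.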

## Setup

    python -m pip install -r requirements.txt

### Development
To fix a bug or enhance an existing module, follow these steps:

- Fork the repo
- Create a new branch (`git checkout -b improve-feature`)
- Make the appropriate changes in the files
- Update any related files, such as documentation, to reflect the changes made
- Commit your changes (`git commit -am 'Improve feature'`)
- Push to the branch (`git push origin improve-feature`)
- Create a Pull Request

## How to Use
**a.** Run `consumer.py`

The script waits continuously for messages from the RabbitMQ queue. Books are loaded into the queue through Import File to Convert in the admin panel.

You can also upload a book that has been converted locally by using `local_convert()` in `consumer.py`.

**b.** Run `docx_solver.py`

1. Run it on a Linux system. If you are using Windows, use a Python Docker interpreter instead.
2. Upload a book to `books/docx/` and set the variable `docx_file_path = books/docx/book_name` in `__main__`.

**c.** Run `epub_solver.py`

Before that, upload a book to `books/epub/` and set the variable `epub_file_path = books/epub/book_name` in `__main__`.