Development Considerations

Why Markdown?

To provide a flexible base for migrating your notes to the app of your choice.

Sort all iterators with arbitrary order

Reproducibility is more important than memory usage and speed.

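# good: sorted() makes the iteration order deterministic
for item in sorted(file_or_folder.iterdir()):
    ...

# bad: iterdir() yields entries in arbitrary order
for item in file_or_folder.iterdir():
    ...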
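
A self-contained sketch of the same rule, assuming a temporary folder with three empty files. The helper name `list_names` is hypothetical and only serves the illustration:

```python
import tempfile
from pathlib import Path


def list_names(folder: Path) -> list[str]:
    """Return the entry names of `folder` in a reproducible (sorted) order."""
    # Path objects are orderable, so sorted() can be applied to them directly.
    return [item.name for item in sorted(folder.iterdir())]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Create the files in a deliberately unsorted order.
    for name in ("b.md", "a.md", "c.md"):
        (folder / name).write_text("")
    names = list_names(folder)
    print(names)  # ['a.md', 'b.md', 'c.md']
```

Note that `sorted()` reads the whole iterator into a list first, which is the memory and speed cost accepted above.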

Why pyinstaller and not nuitka?

I already had some experience setting up pyinstaller. Nuitka was tested in a proof of concept, but didn't show any major benefits.

Why is the executable so large?

Pandoc is bundled and takes up ~144 MB on its own, which has the biggest impact on the size. To analyze the size of the individual modules, use the following snippet in the pyinstaller spec file. It collects the build into a folder, so that each module can be inspected separately:

coll = COLLECT(
    exe,
    a.binaries,
    a.datas,
    strip=False,
    upx=True,
    upx_exclude=[],
    name="jimmy",
)

The resulting files can then be listed and sorted by size:

$ du -lh --max-depth=2 dist/jimmy | sort -h
12K     dist/jimmy/_internal/src
24K     dist/jimmy/_internal/wheel-0.44.0.dist-info
40K     dist/jimmy/_internal/Markdown-3.7.dist-info
44K     dist/jimmy/_internal/anyblock_exporter
60K     dist/jimmy/_internal/cryptography-43.0.3.dist-info
60K     dist/jimmy/_internal/setuptools
108K    dist/jimmy/_internal/ossl-modules
164K    dist/jimmy/_internal/puremagic
296K    dist/jimmy/_internal/charset_normalizer
2,4M    dist/jimmy/_internal/yaml
11M     dist/jimmy/_internal/cryptography
15M     dist/jimmy/_internal/lib-dynload
144M    dist/jimmy/_internal/pypandoc
213M    dist/jimmy/_internal
262M    dist/jimmy
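
Where `du` is not available (for example on Windows), a similar overview can be sketched in Python. This is a rough equivalent rather than a replacement: the function name `dir_sizes` is hypothetical, and it sums apparent file sizes instead of the disk usage that `du` reports, so the numbers can differ slightly.

```python
from pathlib import Path


def dir_sizes(root: Path) -> list[tuple[int, str]]:
    """Return (size_in_bytes, name) for each direct child of `root`, smallest first."""
    sizes = []
    for child in sorted(root.iterdir()):
        if child.is_dir():
            # Sum the sizes of all files below this folder.
            size = sum(f.stat().st_size for f in child.rglob("*") if f.is_file())
        else:
            size = child.stat().st_size
        sizes.append((size, child.name))
    return sorted(sizes)


if __name__ == "__main__":
    # Path taken from the listing above; nothing is printed if it doesn't exist.
    internal = Path("dist/jimmy/_internal")
    if internal.is_dir():
        for size, name in dir_sizes(internal):
            print(f"{size / 1024 / 1024:8.1f} MiB  {name}")
```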

Why cryptography and not pycryptodome?

Both worked in the initial implementation. cryptography made a slightly better impression, so it was chosen.

Format Conversion Paths

graph TD;
    File -- Plain Formats (plaintext,markdown) --> Markdown;

    File -- Pandoc Unsupported Formats (anytype,colornote,tid,zim,zkn3 bbcode) --> pandoc_unsupported[Pyparsing / External Lib];
    pandoc_unsupported --> Markdown;

    %% pandoc paths
    File -- HTML --> Beautifulsoup;
    Beautifulsoup -- Preprocessed HTML String --> Pandoc;
    File -- Pandoc Supported Formats --> Pandoc;
    File -- Encapsulated Pandoc Supported Formats (EML,ENEX,Notion,Zoho) --> note_extraction[Extract Note];
    note_extraction -- Pandoc Supported Formats --> Pandoc;
    Pandoc --> Markdown;
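
The routing in the graph above can be read as a simple dispatch on the input format. The following is a minimal sketch of that idea, not the project's actual code: the function `conversion_path`, the set names and the lowercase format keys are assumptions made for illustration, and the format sets are not exhaustive.

```python
# Format groups taken from the edge labels of the graph (not exhaustive).
PLAIN = {"plaintext", "markdown"}
PANDOC_UNSUPPORTED = {"anytype", "colornote", "tid", "zim", "zkn3"}
ENCAPSULATED = {"eml", "enex", "notion", "zoho"}


def conversion_path(source_format: str) -> list[str]:
    """Return the graph nodes a file of the given format passes through."""
    fmt = source_format.lower()
    if fmt in PLAIN:
        return ["File", "Markdown"]
    if fmt in PANDOC_UNSUPPORTED:
        return ["File", "Pyparsing / External Lib", "Markdown"]
    if fmt == "html":
        # HTML is preprocessed before it is handed to Pandoc.
        return ["File", "Beautifulsoup", "Pandoc", "Markdown"]
    if fmt in ENCAPSULATED:
        # The note body is extracted first, then converted by Pandoc.
        return ["File", "Extract Note", "Pandoc", "Markdown"]
    # Assumption: everything else is a format that Pandoc reads directly.
    return ["File", "Pandoc", "Markdown"]


print(conversion_path("ENEX"))  # ['File', 'Extract Note', 'Pandoc', 'Markdown']
```

Every path ends in Markdown, and all Pandoc-based paths share the last step, which is why the preprocessing happens before Pandoc rather than after it.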

Intermediate Format

  • HTML:
    • Easily modifiable by beautifulsoup and others.
    • Supports a wide range of elements that can be "reduced" to Markdown.
    • No additional dependency (beautifulsoup is used already).
  • Pandoc AST:
    • Python: Panflute and pandocfilters aren't up-to-date (problems with tables especially).
    • Lua: Learning curve, second scripting language in this repo.
    • General: Some filters need some preprocessing (in HTML), like iframes.