How I shrank texlive 2023 built from source.
--------------------------------------------

License: CC-BY https://creativecommons.org/licenses/by/4.0/
(c) 2023 Ken Moffat

Introduction:
-------------

I keep backups of the root filesystems of my active systems (initially rsync over nfs), and as the size of every package has grown I'm finding that the space taken by the filesystems increases. Some of my root filesystems are small (20GB - I keep /home, /sources and /scratch (a workspace) separately) and the space for a full texlive from source, as well as for the backups, can become a problem.

If you are using binary texlive, my approach here will NOT work for you (it breaks tlmgr updates).

In texlive, the package database 'tlpdb' determines what gets used to assemble packages and put them into schemes. Two observations that interested me are that 'source' for everything is included (some of that, e.g. for OTF or TTF fonts, requires tools that almost no BLFS user is likely to want), and that items very rarely get removed.

My initial attempt looked at not building certain programs I knew I did not want. But in general the listed configure options tend not to work, or have unfortunate side-effects (e.g. removing a whole chunk of programs).

Since one of the biggest parts of a full texlive 2023 installation is the fonts, I decided to begin by removing what _I_ have no intention of using. In a full 2023 texmf-dist (7.4G) the big parts are doc (3.5G) and fonts (2.7G).

Doing this was "fun" (in the usual programmer's or sysadmin's use of that word). For everything except 'context', using strace can eventually determine where things are going wrong. But getting to an "I think this will work" situation and then testing it needs two systems (one with the current cut-down attempt, one with the full version to compare strace outputs), and a work area holding the full texmf-dist and a tarball of the font files I thought I needed. And, of course, the documents I wanted to render.
At the completion of the *full* exercise I've reduced my 2023 texmf-dist to 1.8 GB and I can render all my current test documents, including things I hope to add to my tests. Please note that I have no interest in typesetting equations, except in an old context example. There is still a lot of small stuff which I could probably remove, but spending time on doing and testing that does not seem worthwhile.

1. Fonts.
---------

1.1 What actually happens when TeX variants render a file.
----------------------------------------------------------

A codepoint needs to be mapped to a glyph, and then tfm (TeX Font Metrics) files are used to get the glyph's width, height, etc.

In the plain TeX case Type1 pfb files are used to get the glyph shapes.

For documents using pdflatex, various sizes of computer modern tfm files are used; it then appears to look for corresponding cm*.vf files (none were found) before using the type1 pfb fonts. For UTF-8 text, European Computer Modern (jknappen/ec/) tfm and type1 fonts may be needed.

Lualatex seems to default to using Latin Modern opentype public fonts, plus whatever you specify using fontspec (tfm files are not needed for opentype fonts).

Asymptote uses computer modern fonts and (if needed) type1 urw fonts such as helvetic (for the Helvetica pen in one of my examples).

Xelatex seems to default to Latin Modern opentype public fonts plus whatever you specify using fontspec, but in one of my examples it apparently used computer modern tfm files. Trying to add a different opentype font back into my system failed, suggesting that tfm files were needed.

Dvisvgm uses computer modern tfm and type1 files.

Uplatex (an older Japanese vertical-typesetting command) uses tfm files from uptex-fonts and computer-modern, and is followed by dvipdfmx, which then uses cm type1 for English text and opentype HaranoAjiMincho-Regular for normal text, with HaranoAjiGothic-Medium for the headings.
Context (mkiv) defaults to Latin Modern but can use opentype fonts.

1.2 Debugging.
--------------

To get all debug output when an engine fails to render because something is missing, set the environment variable KPATHSEA_DEBUG=-1. There will be a lot of output, so it is best to log it. In some cases it is not obvious why an engine failed; for those, you need to be able to render the document with the debug information on a full system and then compare the logs.

KPATHSEA_DEBUG is a bit mask; the individual values of interest here are:

    2    hash tables, including map files
    4    file open and close
    32   file searches

Later testing suggests KPATHSEA_DEBUG=32 provides more than enough info.

NB: context cannot use this approach; it will fall over at the first error before eventually stopping. If you have a known-good context document you should be able to scroll back to see why it failed (bad documents include those with typos in the commands, and the output may be less than clear).

1.3 The directories in texmf-dist/fonts.
----------------------------------------

    du -sch /opt/texlive/2023/texmf-dist/fonts/*

    207M    /opt/texlive/2023/texmf-dist/fonts/afm
    864K    /opt/texlive/2023/texmf-dist/fonts/cid
    22M     /opt/texlive/2023/texmf-dist/fonts/cmap
    22M     /opt/texlive/2023/texmf-dist/fonts/enc
    36K     /opt/texlive/2023/texmf-dist/fonts/lig
    11M     /opt/texlive/2023/texmf-dist/fonts/map
    12M     /opt/texlive/2023/texmf-dist/fonts/misc
    23M     /opt/texlive/2023/texmf-dist/fonts/ofm
    366M    /opt/texlive/2023/texmf-dist/fonts/opentype
    872K    /opt/texlive/2023/texmf-dist/fonts/ovf
    2.7M    /opt/texlive/2023/texmf-dist/fonts/ovp
    212K    /opt/texlive/2023/texmf-dist/fonts/pk
    884K    /opt/texlive/2023/texmf-dist/fonts/sfd
    85M     /opt/texlive/2023/texmf-dist/fonts/source
    622M    /opt/texlive/2023/texmf-dist/fonts/tfm
    298M    /opt/texlive/2023/texmf-dist/fonts/truetype
    661M    /opt/texlive/2023/texmf-dist/fonts/type1
    37M     /opt/texlive/2023/texmf-dist/fonts/type3
    319M    /opt/texlive/2023/texmf-dist/fonts/vf
    2.7G    total

1.3.1 afm
---------

Adobe font metrics files, for type1 postscript.

1.3.2 cid
---------

Fontforge cidmap files to map from (Adobe) CIDs (Character IDs) to Unicode code points.

1.3.3 cmap
----------

The CMap and PDF Mapping resources distributed by Adobe.

1.3.4 enc
---------

Files listing the codepoints for a font.

1.3.5 lig
---------

This contains only files for afm2pl, used to convert afm (Adobe Font Metrics) to TeX pl (property list) files for conversion to tfm files; this preserves kerns and ligatures.

1.3.6 map
---------

The map files map glyph or font names (e.g. a TeX font name to the Type 1 file that provides it).

1.3.7 misc
----------

These appear to be for CJK or, for xetex, for mapping various things.

1.3.8 ofm
---------

This contains (binary) Omega Font Metrics files for the obsolete omega system of handling large character sets.

1.3.9 opentype
--------------

This contains Open Type Font (otf) files, usable from lualatex and xelatex via fontspec, and from context.

1.3.10 ovf
----------

According to 'file' these ovf files contain TeX virtual font data. From the directory names they seem to be associated with the Omega ofm files (above).

1.3.11 pk
---------

PK (packed bitmap) fonts used to be used by previewers to view documents generated using Type 1 fonts. It turns out that asymptote uses these (pk/ljfour/public/cm/dpi600/) to prepare its own documentation during its tests.

1.3.12 sfd
----------

This contains subfont definition data for CJK encodings.

1.3.13 source
-------------

Source code for the fonts (e.g. METAFONT sources), for anyone who wants to change or regenerate them.

1.3.14 tfm
----------

The TeX Font Metric files.

1.3.15 truetype
---------------

TrueType fonts.

1.3.16 type1
------------

Type 1 (PostScript) fonts.

1.3.17 type3
------------

Type 3 is an outdated variant of PostScript fonts. These fonts do not support hinting (unlike Type 1) so they look worse at smaller sizes, but they can use all of the PostScript language, for example to contain shades of grey or variable stroke widths.

1.3.18 vf
---------

Virtual fonts.
Quoting texfaq.org: 'Virtual fonts provide a means of collecting bits and pieces together to make the glyphs of a font: the bits and pieces may be glyphs from “other” fonts, rules and other “basic” typesetting commands, and the positioning information that specifies how everything comes together.'

1.4 What I wish to render.
--------------------------

It is some years since I started trying to use texlive source to render test documents and (eventually) to test whether my attempts to build asy, biber and xindy worked. At that time I mostly looked at what I could find online, so my old files do not specify fonts, they just use the defaults (Latin Modern), and some of them use old markup.

Later I found uses for lualatex and xelatex in rendering Unicode text with other fonts and in detailing what various fonts contained. Nowadays, I do not expect to create new files that are not Unicode.

Things which I hope to do (time permitting) are: integrate test files for Japanese vertical typesetting, revise my context (mkiv) example to use the public example from the context garden (one of the few examples there that still works), use dvisvgm to create svg files, and use mkiv with OTF and TTF files. I have the test examples for all of these.

My point in specifying this detail is that you almost certainly will not use all of the fonts I use, but will instead need others specified by the packages you use.

1.5 The font files I needed or decided to keep.
-----------------------------------------------

1.5.1 afm and cmap
------------------

afm: I need afm/public/amsfonts/{cm,cmextra} for pdflatex usage (1.1 MB).

cmap: I kept all of these (22 MB) because when I'm playing with Japanese typesetting, particularly vertical typesetting, these get used by (at least) the older (uptex) method.

1.5.2 enc
---------

For dvips I copied only the afm2pl, base, cmathbb, cm-lgc, cmsrb, cm-super and cm-unicode directories (i.e.
afm2pl in case it is needed, since it is small, then base, and then only the computer-modern variants). For pdftex, t2 and ttf2pk I copied everything (these directories are small). These take only 612 KB.

1.5.3 map
---------

I seem to have missed mentioning this in my initial version of this file. I'm now keeping:

  · dvipdfm (for latin modern).
  · dvipdfmx (for a test using uplatex which needs to invoke dvipdfmx).
  · dvips: only amsfonts, cm* and updmap. The amsfonts and cm variants are used where computer modern is in use. The files in updmap are used by asymptote's test suite when determining whether ghostscript is old (<= 9.13) or new (>= 9.14), and might therefore also be used elsewhere in asymptote.
  · fontname (used by dvipdfmx).
  · glyphlist (used by dvipdfmx).
  · luatex (used by context).
  · pdftex (at least the updmap/pdftex.map file is needed for pdflatex; the other two directories are small and may be useful. At that point I was not concerned about small space savings).
  · vtex: aliases files, again only a small directory.

The updated total for this is 33 MB.

1.5.4 opentype
--------------

I copied the following; they are all in opentype/public: fandol (simplified Chinese), gnu-freefont, haranoaji and haranoaji-extra (Japanese), libertine (Linux Libertine etc; I use this), lm and lm-math (Latin Modern), nimbus15 (Nimbus Mono, Nimbus Sans L), tex-gyre (TG Adventor, TG Bonum, TG Cursor, TG Heros, TG Pagella, TG Schola, TG Termes) and tex-gyre-math. I particularly use TG Heros in one context file, although I probably don't need tex-gyre-math. These total 338 MB.

1.5.5 pk
--------

After I had discarded an old slimmed-down version which turned out to be missing a few things including context, I had to reinstall texlive and then reinstall the extra programs. Asy complained about the missing pk fonts and said the glyphs would be blank, so I added all 212 KB of this tree.
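The keep-lists in 1.5 all name paths under texmf-dist/fonts, so the tarball-and-restore step described in 1.6 can be driven from a plain list of those paths. This is a minimal sketch, not the author's script: prune_fonts and the keep-list format (one path per line, relative to fonts/, blank lines and '#' comments ignored) are hypothetical, and it assumes GNU tar for 'tar -T -'. It refuses to delete anything if an entry is missing, but it is still destructive, so try it on a copy of texmf-dist first.

```shell
#!/bin/bash
# prune_fonts TEXMF_DIST KEEPLIST   (hypothetical helper, not part of texlive)
# KEEPLIST (assumed format): one path per line relative to TEXMF_DIST/fonts,
# e.g. "type1/urw" or "tfm/public/cm"; blank lines and '#' lines are ignored.
prune_fonts() {
    local texmf=$1 keeplist=$2
    local fonts="$texmf/fonts"
    local path tarball status

    # Check every entry exists before anything is deleted.
    while IFS= read -r path; do
        case $path in ''|'#'*) continue ;; esac
        if [ ! -e "$fonts/$path" ]; then
            echo "prune_fonts: missing $fonts/$path" >&2
            return 1
        fi
    done < "$keeplist"

    # Archive the kept trees, with their directories, relative to fonts/.
    tarball=$(mktemp) || return 1
    if ! grep -v -e '^$' -e '^#' "$keeplist" |
            tar -C "$fonts" -cf "$tarball" -T -; then
        rm -f "$tarball"
        return 1
    fi

    # Delete everything under fonts/, then unpack only the kept trees.
    rm -rf "${fonts:?}"/*
    tar -C "$fonts" -xf "$tarball"
    status=$?
    rm -f "$tarball"
    return "$status"
}
```

For example, as root: 'prune_fonts /opt/texlive/2023/texmf-dist /root/keep-fonts.list' (the list file name is made up), followed by 'mktexlsr' so that ls-R matches what is left.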
1.5.6 tfm
---------

I copied the following:

  · adobe (needed for helvetica in one of my asymptote tests; others might be needed for labelling in asymptote). On its own this comes to 27 MB;
  · jknappen/ec (European Computer Modern);
  · metapost (not really sure if needed, but not worth retesting without it);
  · ptex-fonts (not sure if I need this now that I've settled on uptex, again too small to justify retesting);
  · public: only amsfonts/cmextra, amsfonts/symbols, cm, lm, mflogo and tipa. One of my tests uses tipa for a glyph classed as phonetic;
  · uptex-fonts (for vertical Japanese using uptex);
  · urw and urw35vf, both for asymptote.

These total 40 MB.

1.5.7 truetype
--------------

Although there are some useful fonts here, fontspec (at least in xelatex) can only find TTF fonts by name via fontconfig, so these fonts would need to be accessed by filename. Where I have a use for these fonts (e.g. analysing their coverage) I will need to install them to the system. However, the last release of my tests looks for a Japanese serif font and ought to look for a Korean font (I had thought no Korean TTFs were included in texlive). I also hope to look some more at Japanese, Korean and perhaps Traditional Chinese, so I've copied a few fonts:

  · truetype/public/arphic-ttf/bsmi00lp.ttf
  · truetype/public/ipaex/ipaex{g,m}.ttf
  · truetype/public/unfonts-core/Un{Batang,Graphic}*.ttf

These total 41 MB.

1.5.8 type1
-----------

I copied the following:

  · hoekwater (copied all, used by context; the directories are small so I kept the manfnt-font and mflogo-font directories).
  · public/: amsfonts/{cm,cmextra} (used by pdflatex), cm-super and cm-unicode (can be used by pdflatex).
  · urw: drop-in replacements for Adobe fonts.

These total 77 MB.

1.5.9 vf
--------

adobe (actually urw and adobe source*pro) and uptex-fonts. These total 30 MB and I found a need for some of them when testing vertical Japanese text.

1.6 The process after identifying what I think I want to keep.
--------------------------------------------------------------

Because the directory structure of fonts is quite deep, I decided it was easiest for me to create a tarball of the items I required, in their directories, so that I could then delete everything under fonts/ and extract the tarball there. Obviously, ownership is root:root.

For everything else I delete, it is simplest for me to just delete the appropriate directories.

Regardless of *how* the files are removed, root must then run 'mktexlsr' so that the ls-R file in the installed system is up to date. If you are using a DESTDIR method you should be able to cd to the texmf-dist in your DESTDIR before installing it to the system, and run 'ls -LAR ./ >ls-R'.

Please note that a straight BLFS-style source install in the absence of texmf-dist will copy certain scripts to texmf-dist. This is why I do the full install and then remove things afterwards.

If you add or remove packages you should probably also run 'fmtutil-sys --all'.

2. Removing things other than fonts from texmf-dist.
----------------------------------------------------

My initial thought was that I wanted to keep all the documentation so that I could access it while offline. But then I realised that for most of the packages I have no idea what they do, nor why I would want to use them, so in the end I decided it would be better to only look online. For the modern variants (lualatex and xelatex) I can see a reason to look at their documentation even if my internet connection is down.

2.1 Removing documentation
--------------------------

2.1.1 doc/context
-----------------

This seems to include a lot of historic stuff and most of it is beyond me. Removing saves 119 MB.

2.1.2 doc/cstex
---------------

The 00-README-cslatex says this has been obsolete and unmaintained since October 2012. The remaining files suggest cstex was for typesetting Czech and Slovak. Removing saves 4.6 MB.

2.1.3 doc/eplain
----------------

Expanded plain TeX, so of no interest to me; removing only saves 1.8 MB.
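Each removal in 2.1 (and texmf-dist/source) is just deleting one directory relative to texmf-dist, so the sizes quoted can be gathered at the same time. A minimal sketch, assuming a hypothetical helper remove_trees that takes paths relative to texmf-dist and reports 'du -sk' sizes (KiB) before deleting; it skips empty, absolute or '..' paths so it cannot escape texmf-dist.

```shell
#!/bin/bash
# remove_trees TEXMF_DIST PATH...   (hypothetical helper, not part of texlive)
# Each PATH is relative to TEXMF_DIST, e.g. "doc/context" or "source".
# Prints the size of each tree (du -sk, KiB) as it is deleted, then a total.
remove_trees() {
    local texmf=$1; shift
    local total=0 rel kib
    for rel in "$@"; do
        # Never let an empty, absolute or '..' path escape texmf-dist.
        case $rel in
            ''|/*|*..*) echo "remove_trees: refusing '$rel'" >&2; continue ;;
        esac
        if [ ! -d "$texmf/$rel" ]; then
            echo "remove_trees: $rel not present, skipping" >&2
            continue
        fi
        kib=$(du -sk "$texmf/$rel" | cut -f1)
        rm -rf "${texmf:?}/$rel"
        total=$((total + kib))
        printf '%s\t%s KiB\n' "$rel" "$kib"
    done
    printf 'total\t%s KiB\n' "$total"
}
```

For example: 'remove_trees /opt/texlive/2023/texmf-dist doc/context doc/cstex doc/eplain'. As with the fonts, root then needs to run 'mktexlsr' so that ls-R no longer lists the removed files.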
2.1.4 doc/etex
--------------

This was intended to fill the gap between TeX3 and the New Typesetting System and was apparently used for the development of latex2e (which has been in use for many years). Removing only saves 516 KB.

2.1.5 doc/fonts
---------------

Documentation, typically including samples in pdf and tex form, for the fonts shipped in texlive. Removing saves 257 MB.

2.1.6 doc/hitex
---------------

HiTeX produces hnt (HINT) files for onscreen reading of documents on mobile devices with small screens, using a special viewer which adapts dynamically to the available display area. Removing saves 3.2 MB.

2.1.7 doc/latex
---------------

The main documentation for latex packages. Because I would need prompting for why I might want to use any of these, I decided that asking online, followed by reading online docs, is the approach I will use. Removing saves 2.2 GB.

2.1.8 doc/latex-dev
-------------------

Documentation for other latex items, such as graphics. Removing saves 60 MB.

2.1.9 doc/luatex
----------------

Documentation for basic things in luatex, such as luaotfload, also old files and various CJK documentation. Removing saves 20 MB.

2.1.10 doc/metapost
-------------------

Documentation for MetaPost (creating scalable graphics in PostScript). Removing saves 44 MB.

2.1.11 doc/mex
--------------

Appears to be for typesetting Polish using PostScript. Removing saves 188 KB.

2.1.12 doc/omega
----------------

An old system for handling large character sets. Removing only saves 848 KB.

2.1.13 doc/optex
----------------

OpTeX is a LuaTeX format with Plain TeX and OPmac, intended to be a modern Plain TeX with power from OPmac macros, using preferred Unicode fonts. Removing saves 2.4 MB.

2.1.14 doc/otherformats
-----------------------

Files for jadetex, lollipop, psizzl, startex, texsis and xmltex. Removing saves 2.1 MB.

2.1.15 doc/plain
----------------

Documentation for TeX without LaTeX. Removing saves 56 MB.

2.1.16 doc/platex
-----------------

The original variant of latex for Japanese typesetting; removing saves 4.5 MB.

2.1.17 doc/pmxchords
--------------------

Something for musixtex; removing only saves 668 KB.

2.1.18 doc/ptex
---------------

For Japanese typesetting (includes coverage of variants such as uptex); removing saves 3.5 MB.

2.1.19 doc/support
------------------

This appears to be documentation for random packages; removing saves 71 MB.

2.1.20 doc/texlive
------------------

Documentation for texlive itself; removing saves 35 MB.

2.1.21 doc/xetex
----------------

Documentation from when XeTeX was developed for the Macintosh; removing saves 2.6 MB.

2.1.22 source
-------------

Documentation or source for packages, often in .dtx files (commented source code plus the user documentation) and .ltx (latex document) files; removing saves 424 MB.

2.2 Other package files (stylesheets, etc)
------------------------------------------

When I looked at the opentype fonts in tlpdb I realised that many had an associated package. Many of those packages contained tfm, type1 and similar files for use in latex, but I have deleted those tfm and type1 files. Although many of the package directories are trivially small, they are a waste of space.

I first looked at a random sample and saw some which were half a megabyte or larger, so I decided to look at them all. That is 3 days I won't get back :-(

In the end I decided to remove these, and also some associated directories (e.g. courierten had related courier and courierscaled packages). I do not propose to list these directories here: there are more than 180 of them, saving 23 MB (over 3200 files), and the choice is specific to my uses.

This still leaves many packages I am unlikely to ever want to use, but I see no point in spending more time looking through them and then running my tests against what is left.

Revision history:

2023-07-30 Additions to get asymptote to run its tests.
           I realised that I had omitted mentioning map/ in what I needed.
           Spell-checked.
2023-07-14 Initial version.