The UnZip package contains utilities for extracting files from ZIP
archives. Such archives are created with the PKZIP or Info-ZIP
utilities, primarily in a DOS environment.
The UnZip package has some locale-related issues. See the discussion below in the section called “UnZip Locale Issues”. A more general discussion of these problems can be found in the Program Assumes Encoding section of the Locale Related Issues page.
Download (HTTP): http://downloads.sourceforge.net/infozip/unzip552.tar.gz
Download (FTP): ftp://tug.ctan.org/tex-archive/tools/zip/info-zip/src/unzip552.tar.gz
Download MD5 sum: 9d23919999d6eac9217d1f41472034a9
Download size: 1.1 MB
Estimated disk space required: 6.7 MB
Estimated build time: 0.1 SBU
User Notes: http://wiki.linuxfromscratch.org/blfs/wiki/unzip
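The MD5 sum listed above can be checked before unpacking. The following is a minimal sketch, assuming the md5sum utility is available and the tarball is in the current directory; the verify_md5 function name is hypothetical and not part of the package.

```shell
# verify_md5 FILE EXPECTED_SUM
# Hypothetical helper: succeeds only if the MD5 sum of FILE equals EXPECTED_SUM.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<sum>  <filename>"; keep only the sum.
    actual=$(md5sum "$file" 2>/dev/null | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Usage with the values from the package information above:
# verify_md5 unzip552.tar.gz 9d23919999d6eac9217d1f41472034a9 && echo "checksum OK"
```

If the function returns a non-zero status, the download is incomplete or corrupted and should be fetched again.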
Use of UnZip in the JDK, Mozilla, DocBook or any other BLFS package installation is not a problem, as BLFS instructions never use UnZip to extract a file with non-ASCII characters in the file's name.
The UnZip package assumes that
filenames stored in the ZIP archives created on non-Unix systems
are encoded in CP850, and that they should be converted to
ISO-8859-1 when writing files onto the filesystem. Such assumptions
are not always valid. In fact, inside the ZIP archive, filenames
are encoded in the DOS codepage that is in use in the relevant
country, and the filenames on disk should be in the locale
encoding. In MS Windows, the OemToChar() C function (from
User32.DLL) does the correct
conversion (which is indeed the conversion from CP850 to a superset
of ISO-8859-1 if MS Windows is set up to use the US English
language), but there is no equivalent in Linux.
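To make the mismatch concrete, the sketch below uses iconv to apply both conversions to the same bytes. It assumes an iconv that knows the CP850, CP866, ISO-8859-1 and KOI8-R charsets (as the glibc one does). The sample name "тест" and the helper names name_in_zip and to_hex are illustrative only.

```shell
# The Russian word "тест" as a DOS system using CP866 would store it
# in the archive (four bytes, written here as octal escapes).
name_in_zip() { printf '\342\245\341\342'; }

# Print a byte stream as space-separated hexadecimal values.
to_hex() { od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'; }

# What unzip assumes: CP850 inside the archive, ISO-8859-1 on disk.
wrong=$(name_in_zip | iconv -f CP850 -t ISO-8859-1 | to_hex)

# What a ru_RU.KOI8-R system needs: CP866 inside the archive, KOI8-R on disk.
right=$(name_in_zip | iconv -f CP866 -t KOI8-R | to_hex)

echo "bytes written by the CP850 -> ISO-8859-1 conversion: $wrong"
echo "bytes expected by the KOI8-R locale:                 $right"
```

The two byte sequences differ, so the name written to disk does not display as the original word in a KOI8-R locale.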
When using unzip to unpack a ZIP archive containing non-ASCII filenames, the filenames are damaged because unzip uses improper conversion when any of its encoding assumptions are incorrect. For example, in the ru_RU.KOI8-R locale, conversion of filenames from CP866 to KOI8-R is required, but conversion from CP850 to ISO-8859-1 is done, which produces filenames consisting of undecipherable characters instead of words (the closest equivalent understandable example for English-only users is rot13). There are several ways around this limitation:
1) For unpacking ZIP archives with filenames containing non-ASCII characters, use WinZip while running the Wine Windows emulator.
2) After running unzip, fix the damage made to the filenames using the convmv tool (http://j3e.de/linux/convmv/). The following is an example for the ru_RU.KOI8-R locale:
Step 1. Undo the conversion done by unzip:
convmv -f iso-8859-1 -t cp850 -r --nosmart --notest \
    </path/to/unzipped/files>
Step 2. Do the correct conversion instead:
convmv -f cp866 -t koi8-r -r --nosmart --notest \
    </path/to/unzipped/files>
3) Apply this patch to unzip: https://bugzilla.altlinux.ru/attachment.cgi?id=532. It will apply with some offsets.
The patch allows you to specify the assumed filename encoding in the ZIP
archive using the -O charset_name
option and the on-disk filename encoding using the -I charset_name option. By default, the on-disk
filename encoding is the locale encoding, and the encoding inside the
ZIP archive is guessed from a built-in table based on the
locale encoding. For US English users, this still means that unzip
converts from CP850 to ISO-8859-1 by default.
Caveat: this method works only with 8-bit locale encodings, not with UTF-8. Attempting to use a patched unzip in UTF-8 locales may result in a segmentation fault and is probably a security risk.
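The two convmv steps from the second method can be wrapped in a small function so that the second conversion runs only if the first one succeeds. This is a sketch for the ru_RU.KOI8-R case only; the fix_unzip_names name is hypothetical, and the charset pairs would need to be changed for other locales.

```shell
# fix_unzip_names DIR
# Hypothetical helper: repair filenames under DIR that unzip converted
# from CP850 to ISO-8859-1 when they were really stored in CP866.
fix_unzip_names() {
    dir=$1
    # Step 1: undo the conversion done by unzip.
    convmv -f iso-8859-1 -t cp850 -r --nosmart --notest "$dir" || return 1
    # Step 2: do the correct conversion instead.
    convmv -f cp866 -t koi8-r -r --nosmart --notest "$dir"
}

# Usage:
# fix_unzip_names /path/to/unzipped/files
```

Because --notest makes convmv rename files for real, it is worth trying the function on a copy of the unpacked directory first.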
Note that if you applied the patch described above for locale issues, the required security patch will have some offsets. Now install UnZip by running the following commands:
patch -Np1 -i ../unzip-5.52-security_fix-1.patch &&
make -f unix/Makefile LOCAL_UNZIP=-D_FILE_OFFSET_BITS=64 linux
To test the results, issue: make check.
Now, as the root user:
make prefix=/usr install
linux: This target in the Makefile makes assumptions that are useful
for a Linux system when compiling the executables. To obtain
alternatives to this target, use make list.
LOCAL_UNZIP=...: This sets the compilation flags to allow UnZip to
handle files up to 4 GB.
Last updated on 2007-04-04 21:42:53 +0200