Category: developers journal

PHP’s ZipArchive and zip archives made on OSX gotcha

Author: seven September 20, 2010

Think happy posts! :) While trying to unzip (extract) archives made on OS X, using PHP 5.3's built-in ZipArchive library, I ran into a mysterious problem. ZipArchive was seeing directory (folder) separators and filenames inside the zip archive in a really screwed-up way:

dizajn publikacija:jelovnik/1:poc?etna.jpg -> folder "dizajn publikacija", subfolder "jelovnik", file "1_početna.jpg"
1:poc?etna.jpg -> file named "1_početna.jpg"

Obviously an encoding problem of some sort, but I really didn't have time to find out which one (if that really is the case; maybe I've once again hit a new undocumented bug).
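One possible explanation, which this post does not verify, is that the names were stored in decomposed Unicode form (NFD), where "č" is written as "c" followed by a combining caron. The sketch below shows how such a name could be recomposed to NFC. It assumes PHP 7+ and the intl extension; recompose_name() is a hypothetical helper, not part of ZipArchive.

```php
<?php
// Hypothetical helper (not from the original post): recompose a name that was
// stored in decomposed (NFD) form. Needs the intl extension for Normalizer.
function recompose_name(string $name): string
{
    if (!class_exists('Normalizer')) {
        return $name; // intl not loaded: leave the name untouched
    }
    $nfc = Normalizer::normalize($name, Normalizer::FORM_C);
    return $nfc === false ? $name : $nfc; // normalize() returns false on failure
}

// Assumption: the archive entry used "c" + U+030C (combining caron).
$decomposed = "poc\u{030C}etna.jpg";
$recomposed = recompose_name($decomposed);

// Compare the raw bytes to see whether the two spellings differ.
echo bin2hex($decomposed), "\n";
echo bin2hex($recomposed), "\n";
```

If the two hex dumps differ, the name really was decomposed and the recomposed version is the one most other systems expect.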

So I kissed ZipArchive's extractTo() method goodbye and wrote my own extraction, using the getStream() method and writing the output with fwrite(). Luckily I only had to unzip all files into a single folder (flatten the archive), so I couldn't care less about folders. The filenames did, however, contain the ":" character, which is illegal on Windows file systems. To get around that, I "normalized" each filename into something more relaxing for storage on a web server (pseudo code):

...
$entry = $zip->getNameIndex($i);
$base = basename($entry);
$base = str_replace(array(' ', '-'), array('_','_'),$base) ;
$base = preg_replace('/[^A-Za-z0-9_\.]/', '', $base) ; 
...

This didn't preserve the filenames, but I got the content out of the archive.
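To see what that normalization does to the names shown earlier, here is a minimal sketch that wraps the same three steps in a function. The name normalize_filename() is made up for this example, and the test entry assumes the odd character in the archive was a "c" followed by a combining caron (PHP 7+ is needed for the "\u{...}" escape).

```php
<?php
// Hypothetical helper wrapping the normalization steps from the post.
function normalize_filename(string $entry): string
{
    $base = basename($entry);                          // drop the folder part
    $base = str_replace(array(' ', '-'), '_', $base);  // spaces and dashes become underscores
    // Remove every byte that is not an ASCII letter, digit, underscore or dot.
    return (string) preg_replace('/[^A-Za-z0-9_.]/', '', $base);
}

// Assumption: "c" + U+030C (combining caron) is how the entry spelled "č".
$entry = "dizajn publikacija:jelovnik/1:poc\u{030C}etna.jpg";

echo normalize_filename($entry), "\n";              // 1pocetna.jpg
echo normalize_filename("my file-name.JPG"), "\n";  // my_file_name.JPG
```

The ":" and the combining mark are simply dropped. Note that a precomposed "č" would lose the whole letter and come out as "1poetna.jpg", which is one more reason the original names are not preserved.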

Here is the full code (it lacks proper error handling):

function unzip_archive($zipfile, $destination)
{
	
	$zipfile = str_replace("\\","/",$zipfile); 
	$destination = str_replace("\\","/",$destination); 
	
	if(!file_exists($zipfile)) throw new Exception('No such file.');


    if (!is_dir ($destination) )
    {
        $oldumask = umask(0);
        $created = mkdir($destination , 0777);
        umask($oldumask); // restore the umask even when mkdir() fails
        if(!$created)
        {
            throw new Exception('Cannot create destination folder.');
        }
    }
    $zip = new ZipArchive;
    if ( $zip->open( $zipfile ) === true ) // open() returns an integer error code on failure, which is truthy
    {
        for ( $i=0; $i < $zip->numFiles; $i++ )
        {
            $entry = $zip->getNameIndex($i);
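            if ( substr( $entry, -1 ) == '/' ) continue; // skip directories
            // skip OS X metadata: "._*" resource forks, .DS_Store files and the __MACOSX folder
            // (dots are escaped and anchored so that names like "1_file.jpg" are not skipped)
            $pattern = '#(^|/)(\._|\.DS_Store$|__MACOSX/)#';
            if (preg_match($pattern, $entry)) {
                continue;
            }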

            $base = basename($entry);
            $base = str_replace(array(' ', '-'), array('_','_'),$base) ;
            $base = preg_replace('/[^A-Za-z0-9_\.]/', '', $base) ; 
       	
            // $zip->extractTo($destination, array($entry));
            //echo $zip->getStatusString();
            
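            $fp = $zip->getStream( $entry );
            if ( ! $fp )
                throw new Exception('Unable to extract the file.');

            // open the target only after the source stream is known to be good
            $ofp = fopen( $destination.'/'.$base, 'wb' );
            if ( ! $ofp )
            {
                fclose($fp);
                throw new Exception('Unable to write the file.');
            }

            while ( ! feof( $fp ) )
                fwrite( $ofp, fread($fp, 8192) );

            fclose($fp);
            fclose($ofp);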
        }

        $zip->close();
        return true;
        
    }
    else
    {	// not a zip archive?
    	return false;	
    }
}
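
One thing to watch with a flattened archive: two entries from different folders can end up with the same normalized name, and the later one silently overwrites the earlier. A rough sketch of one way around it is below. The helper unique_target() is hypothetical and not part of the code above; it just appends _1, _2, ... until the name is free.

```php
<?php
// Hypothetical helper: pick a path in $dir that does not exist yet,
// appending _1, _2, ... before the extension when $base is already taken.
function unique_target(string $dir, string $base): string
{
    $info = pathinfo($base);
    $ext  = isset($info['extension']) ? '.' . $info['extension'] : '';
    $name = $info['filename'];

    $candidate = $dir . '/' . $base;
    for ($n = 1; file_exists($candidate); $n++) {
        $candidate = $dir . '/' . $name . '_' . $n . $ext;
    }
    return $candidate;
}

// Demo in a throwaway directory.
$dir = sys_get_temp_dir() . '/unzip_demo_' . uniqid();
mkdir($dir);

$first = unique_target($dir, '1pocetna.jpg');
touch($first); // pretend the first entry was extracted here
$second = unique_target($dir, '1pocetna.jpg');

echo basename($first), "\n";   // 1pocetna.jpg
echo basename($second), "\n";  // 1pocetna_1.jpg

unlink($first);
rmdir($dir);
```

In the extraction loop, the result of such a helper could be passed to fopen() in place of $destination.'/'.$base.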
Author
seven
CEO/CTO at Nivas®
Neven Jacmenović has been passionately involved with computers since the late 80s, the age of Atari and Commodore Amiga. As one of the internet industry pioneers in Croatia, he has been involved since the 90s in the making of many award-winning, innovative and successful online projects. He is an experienced full-stack web developer, analyst and system engineer. In his spare time, Neven transforms his retro-futuristic passion into various golang, Adobe Flash and JavaScript/WebGL projects.
