{"id":1745,"date":"2010-09-20T20:46:37","date_gmt":"2010-09-20T19:46:37","guid":{"rendered":"http:\/\/www.nivas.hr\/blog\/?p=1745"},"modified":"2011-01-12T12:45:51","modified_gmt":"2011-01-12T11:45:51","slug":"phps-ziparchive-and-zip-archives-made-on-osx-gotcha","status":"publish","type":"post","link":"https:\/\/www.nivas.hr\/blog\/2010\/09\/20\/phps-ziparchive-and-zip-archives-made-on-osx-gotcha\/","title":{"rendered":"PHP&#8217;s ZipArchive and zip archives made on OSX gotcha"},"content":{"rendered":"<p>Think happy posts! :) While trying to unzip (extract) zip archives made on OS X with PHP v5.3, using the built-in <a href=\"http:\/\/php.net\/manual\/en\/book.zip.php\">ZipArchive<\/a> library, I&#8217;ve encountered a mysterious problem. ZipArchive was seeing the directory (folder) separators and filenames inside the zip archive in a really screwed up way:<\/p>\n<p><strong>dizajn publikacija:jelovnik\/1:poc?etna.jpg<\/strong> -> this is a folder &#8220;dizajn publikacija&#8221;, subfolder &#8220;jelovnik&#8221; and file &#8220;1_po\u010detna.jpg&#8221;<br \/>\n<strong>1:poc?etna.jpg<\/strong> &#8211; a file named &#8220;1_po\u010detna.jpg&#8221;<\/p>\n<p>Obviously an encoding problem of some sort, but I really didn&#8217;t have time to find out which one (if that really is the case, or maybe I&#8217;ve once again scored a new undocumented bug).<br \/>\n<!--more--><\/p>\n<p>So I kissed ZipArchive&#8217;s extractTo() method goodbye and wrote my own extraction, using the getStream() method and writing the output with fwrite. Luckily, I only had to unzip all files into a single folder (flatten the archive), so I couldn&#8217;t care less about folders. The filenames did, however, contain the &#8220;:&#8221; character, which is illegal on the Windows file system. 
To get around that, I &#8220;normalized&#8221; each filename into something safer for storage on a web server (pseudo code):<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">...\r\n$entry = $zip-&gt;getNameIndex($i);\r\n$base = basename($entry);\r\n$base = str_replace(array(' ', '-'), array('_','_'),$base) ;\r\n$base = preg_replace('\/[^A-Za-z0-9_\\.]\/', '', $base) ; \r\n...<\/pre>\n<p>This didn&#8217;t preserve the filenames, but I got the content out of the archive.<\/p>\n<p>Here is the full code (it lacks proper error handling):<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\r\nfunction unzip_archive($zipfile, $destination)\r\n{\r\n    \/\/ normalize Windows path separators\r\n    $zipfile = str_replace(&quot;\\\\&quot;,&quot;\/&quot;,$zipfile); \r\n    $destination = str_replace(&quot;\\\\&quot;,&quot;\/&quot;,$destination); \r\n\r\n    if(!file_exists($zipfile)) throw new Exception('No such file.');\r\n\r\n    if (!is_dir ($destination) )\r\n    {\r\n        $oldumask = umask(0);\r\n        $created = mkdir($destination , 0777);\r\n        umask($oldumask); \/\/ restore the umask whether or not mkdir succeeded\r\n        if(!$created)\r\n        {\r\n            throw new Exception('Cannot create destination folder.');\r\n        }\r\n    }\r\n    $zip = new ZipArchive;\r\n    \/\/ open() returns true on success and an integer error code on failure,\r\n    \/\/ so the result has to be compared strictly against true\r\n    if ( $zip-&gt;open( $zipfile ) === true )\r\n    {\r\n        for ( $i=0; $i &lt; $zip-&gt;numFiles; $i++ )\r\n        {\r\n            $entry = $zip-&gt;getNameIndex($i);\r\n            if ( substr( $entry, -1 ) == '\/' ) continue; \/\/ skip directories \r\n            \/\/ skip OS X metadata: ._* files, .DS_Store and the __MACOSX folder\r\n            $pattern = '\/(^\\._|\\.DS_Store|__MACOSX)\/';\r\n            $matched = preg_match($pattern, $entry, $matches);\r\n            if ($matched) { \r\n                \/\/echo $entry; print_r($matches);\r\n                continue;\r\n            }\r\n\r\n            \/\/ flatten the path and keep only characters that are safe in a filename\r\n            $base = basename($entry);\r\n            $base = str_replace(array(' ', '-'), array('_','_'),$base) ;\r\n            $base = preg_replace('\/[^A-Za-z0-9_\\.]\/', '', $base) ; \r\n\r\n            \/\/ $zip-&gt;extractTo($destination, array($entry));\r\n            \/\/echo 
$zip-&gt;getStatusString();\r\n            \r\n            \/\/ open the entry first, so no empty output file is left behind on failure\r\n            $fp = $zip-&gt;getStream( $entry );\r\n            if ( ! $fp )\r\n                throw new Exception('Unable to extract the file.');\r\n           \r\n            $ofp = fopen( $destination.'\/'.$base, 'w' );\r\n            if ( ! $ofp )\r\n            {\r\n                fclose($fp);\r\n                throw new Exception('Unable to write the extracted file.');\r\n            }\r\n           \r\n            \/\/ copy the entry to disk in 8 KB chunks\r\n            while ( ! feof( $fp ) )\r\n                fwrite( $ofp, fread($fp, 8192) );\r\n           \r\n            fclose($fp);\r\n            fclose($ofp);\r\n        }\r\n\r\n        $zip-&gt;close();\r\n        return true;\r\n        \r\n    }\r\n    else\r\n    {\t\/\/ not a zip archive?\r\n    \treturn false;\t\r\n    }\r\n}\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Think happy posts! :) While trying to unzip (extract) zip archives made on OS X with PHP v5.3, using the built-in ZipArchive library, I&#8217;ve encountered a mysterious problem. ZipArchive was seeing the directory (folder) separators and filenames inside the zip archive in a really screwed up way: dizajn publikacija:jelovnik\/1:poc?etna.jpg -> this is a folder &#8220;dizajn publikacija&#8221;, subfolder &#8220;jelovnik&#8221; 
and&#8230;<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/posts\/1745"}],"collection":[{"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/comments?post=1745"}],"version-history":[{"count":8,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/posts\/1745\/revisions"}],"predecessor-version":[{"id":1853,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/posts\/1745\/revisions\/1853"}],"wp:attachment":[{"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/media?parent=1745"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/categories?post=1745"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/tags?post=1745"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}