Using the new ZIP

Posted 2008-02-04 09:21

Introducing ZIP
The ZIP file format, introduced in 1989, helps computer users transport large files by removing the empty spaces and redundant information from a file and by packaging multiple files into one bundle. Originally created by Phil Katz for the shareware program PKZIP, ZIP is a speedy, reliable, and open-specification compression format. These factors made ZIP a hit among PC users, and its open specification made it possible for programmers to build ZIP into Windows®, OS X, Konqueror, and Nautilus file managers. It's safe to consider ZIP a file compression format that almost every computer can handle.
Compression has two main benefits: smaller files and the ability to bundle several files into one archive file. Shrinking a file reduces the bandwidth and storage it consumes by the same amount. That benefit is obvious, but bundling is just as valuable when the only alternative is to download files one after another. You can let your users download many individual files as a single archive, which makes it much easier to get your wares to them.
Because ZIP is supported on almost all computers, it is a great choice for applications that compress files, especially when users will download those files to their desktops. For example, Windows XP supports ZIP natively through a feature called compressed folders: when a Windows XP user downloads a ZIP file, it appears as a compressed folder that the user can double-click to open. Users of OS X, Konqueror, and Nautilus also need only double-click a ZIP file to open it.


Where did that space go?
Files selected for an archive (another name for a ZIP file) are recorded in the archive's file structure and then deflated, or compressed. The ZIP format allows several compression algorithms, the most common being Deflate, but they all work on the same principle: finding redundant data, such as repeated strings and runs of whitespace, and storing it more compactly. This is especially effective on text files, where repeated words and punctuation can be de-duplicated to save space. A graphic file such as a JPEG image compresses much less, because JPEG data is already compressed and has little redundancy left to remove.
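To see which algorithm PHP actually chooses, the sketch below writes a highly repetitive text entry and then reads back its metadata with statName(). It assumes the zip extension is installed, and the temporary file and entry names are invented for illustration. For text like this, ZipArchive should report Deflate (ZipArchive::CM_DEFLATE) as the method, with a compressed size well below the original.

```php
<?php
// Sketch (assumes the zip extension): which compression method did
// ZipArchive choose for a text entry? File and entry names are hypothetical.
$path = sys_get_temp_dir() . '/method_demo_' . uniqid() . '.zip';

$zip = new ZipArchive();
if ($zip->open($path, ZipArchive::CREATE) !== true) {
    exit("cannot open $path\n");
}
// Highly repetitive text gives Deflate plenty of redundancy to remove.
$text = str_repeat("you make a living by what you get; ", 200);
$zip->addFromString('repetitive.txt', $text);
$zip->close(); // compression happens when the archive is written out

$zip->open($path);
$stat = $zip->statName('repetitive.txt');
$zip->close();

$isDeflate = $stat['comp_method'] === ZipArchive::CM_DEFLATE;
printf("deflate=%s size=%d compressed=%d\n",
    $isDeflate ? 'yes' : 'no', $stat['size'], $stat['comp_size']);
```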
The ZIP specification defines these compression methods, and every language or application that supports ZIP understands them. PHP is no different: it compresses files with the same algorithms as other applications and produces the same results any other language or program would. Let's walk through a simplified example of the underlying idea.
Let's take a quote often attributed to one of my favorite historical figures, Winston Churchill. The text file Winston.txt has the following text: "You make a living by what you get; you make a life by what you give."
To compress Winston.txt, we can remove redundancy by building an index of repeated words (much like a search engine does) and replacing the text with a copy of this index plus a listing of how the indexed words are laid out. With an example this small, the index can end up nearly as large as the text it replaces. However, a word is added to the index only when it is duplicated, which keeps the index from growing out of hand.
In our example, we have the following duplicated words:
  • "you" is used four times
  • "make" is used twice
  • "a" is used twice
  • "by" is used twice
  • "what" is used twice
We are left with the following single words that will simply be written out:
  • "living"
  • "get"
  • "life"
  • "give"
When we number the duplicated words 1 through 5 in order of appearance and write out our example using the index, we get "1 2 3 living 4 5 1 get; 1 2 3 life 4 5 1 give."
You can already see that the sentence itself has become shorter. To be exact, you can do the math: the original quote is 68 characters long, including spaces and punctuation, while the encoded sentence is 46. The index has to be stored as well, though. Even written compactly, with each entry giving its number, the word, and its count (1you4 2make2 3a2 4by2 5what2), the index takes 28 characters including the spaces between entries, for a total of 74. On a sentence this short, the index costs more than the de-duplication saves. You would start to see serious improvements in your compression, however, if you added the rest of that speech to the file. The index would grow, but so would the number of times each indexed word appears, so the index overhead would shrink relative to the savings and a good compression rate would begin to materialize.
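The hand calculation above can be reproduced with a short PHP sketch. This is only a toy version of the word-index idea from this section. Real ZIP compression (Deflate) matches repeated byte sequences rather than whole words, and the compact "1you4" index notation is simply the one used here.

```php
<?php
// Toy word-index "compression" of the Winston.txt quote.
// Not how Deflate works; it only mirrors the hand calculation.
$text = "You make a living by what you get; you make a life by what you give.";
$tokens = explode(' ', $text);

// Count each word case-insensitively, ignoring trailing punctuation.
$counts = [];
foreach ($tokens as $token) {
    $word = strtolower(rtrim($token, ';,.'));
    $counts[$word] = ($counts[$word] ?? 0) + 1;
}

// Number the repeated words in order of first appearance.
$index = [];
foreach ($counts as $word => $n) {
    if ($n > 1) {
        $index[$word] = count($index) + 1;
    }
}

// Replace each repeated word with its number; keep other tokens as-is.
$encoded = [];
foreach ($tokens as $token) {
    $bare = rtrim($token, ';,.');
    $punct = substr($token, strlen($bare)); // trailing punctuation, if any
    $word = strtolower($bare);
    $encoded[] = isset($index[$word]) ? $index[$word] . $punct : $token;
}
$encodedText = implode(' ', $encoded);

// Index entries in the compact "number, word, count" form.
$entries = [];
foreach ($index as $word => $num) {
    $entries[] = $num . $word . $counts[$word];
}
$indexText = implode(' ', $entries);

echo $encodedText, "\n", $indexText, "\n";
printf("original=%d encoded=%d index=%d\n",
    strlen($text), strlen($encodedText), strlen($indexText));
```

Running it prints the encoded sentence, the index, and the three lengths (68, 46, and 28 characters), matching the totals worked out above.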
As a rough rule of thumb, compression shrinks text files by around 70 percent on average, while more complex, difficult-to-compress files such as images shrink by only around 10 percent.
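A quick way to check this rule of thumb is PHP's gzdeflate() from the zlib extension, which uses the same Deflate algorithm that ZIP archives normally use. In this sketch, random bytes stand in for already-compressed data such as a JPEG. Real images often still shrink a little, while random data usually grows slightly, so treat the numbers as illustrative rather than as benchmarks.

```php
<?php
// Rough check of compression ratios with raw Deflate (zlib extension).
// Returns the percentage of bytes saved; negative means the data grew.
function savedPercent(string $data): float
{
    $compressed = gzdeflate($data, 9);
    return 100 * (1 - strlen($compressed) / strlen($data));
}

$text = str_repeat("Tread softly because you tread on my dreams.\n", 100);
$noise = random_bytes(4500); // stand-in for already-compressed data

printf("text:  %.1f%% saved\n", savedPercent($text));
printf("noise: %.1f%% saved\n", savedPercent($noise));
```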


    What is in a ZIP?
Because ZIP archives can hold multiple files, they have their own file structure, much like your local computer's file system. This structure is built around entries, which lets a crafty coder pick out just one file among many when decompressing. That is handy when you want only one picture or one text file from the bunch. The structure also preserves directory information, which is extremely useful when transporting Web sites or other groups of files that depend on their relative locations in the file system.
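Both ways of pulling out a single entry are sketched below. The sketch assumes the zip extension is installed, and the archive and entry names are hypothetical. getFromName() returns an entry's contents as a string. extractTo() accepts an entry name (or an array of names) as its second argument, so only those entries are written to disk, with their directory paths recreated.

```php
<?php
// Sketch: get one entry out without unpacking the whole archive.
// Archive and entry names are hypothetical.
$path = sys_get_temp_dir() . '/single_' . uniqid() . '.zip';
$zip = new ZipArchive();
if ($zip->open($path, ZipArchive::CREATE) !== true) {
    exit("cannot open $path\n");
}
$zip->addFromString('photos/readme.txt', "Only this one, please.\n");
$zip->addFromString('photos/other.txt', "Leave me packed.\n");
$zip->close();

$zip->open($path);
// Option 1: read a single entry straight into a string.
$contents = $zip->getFromName('photos/readme.txt');
// Option 2: write only that entry to disk; photos/ is recreated under $dest.
$dest = sys_get_temp_dir() . '/single_out_' . uniqid();
$zip->extractTo($dest, 'photos/readme.txt');
$zip->close();

echo $contents;
```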
Each entry has a set of information associated with it that is available to your PHP script. Once you open the archive, you can read this information through the ZipArchive object you create. You can use it for various purposes, such as verifying the size of an unzipped file or listing the contents of a ZIP file without extracting it.
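Here is a sketch of the size check mentioned above, using hypothetical file names and assuming the zip extension. statName() returns an array of metadata for a named entry, including its uncompressed size and CRC-32 checksum, and both can be compared against the extracted file.

```php
<?php
// Sketch: verify an extracted file against the archive's metadata.
// File and entry names are hypothetical.
$path = sys_get_temp_dir() . '/stat_' . uniqid() . '.zip';
$zip = new ZipArchive();
if ($zip->open($path, ZipArchive::CREATE) !== true) {
    exit("cannot open $path\n");
}
$zip->addFromString('notes.txt', str_repeat("line\n", 50)); // 250 bytes
$zip->close();

$zip->open($path);
$info = $zip->statName('notes.txt'); // includes name, size, comp_size, crc, mtime
$dest = sys_get_temp_dir() . '/stat_out_' . uniqid();
$zip->extractTo($dest, 'notes.txt');
$zip->close();

$file = $dest . '/notes.txt';
$sizeOk = filesize($file) === $info['size'];
// The ZIP checksum is a standard CRC-32, the same as hash('crc32b').
$crcOk = sprintf('%08x', $info['crc']) === hash_file('crc32b', $file);
echo ($sizeOk && $crcOk) ? "notes.txt verified\n" : "notes.txt mismatch\n";
```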
PHP's ZIP features let you handle these file structures in a very useful way. In the next examples, we will show a couple of ways to inspect them and how to build the file structure while creating ZIP files.


    Putting the squeeze on
Let's get going on an example of using PHP to make a ZIP file. The code examples I'm using are almost exactly the examples set forth in the PHP.net manual, with a few modifications. (See Resources for the ZIP function page.) Let's go through the steps of creating a ZIP file and see how to build the file structure properly. We will insert strings as files into the new ZIP and also add existing files to it.
    First, we need to have a text file handy to add to the ZIP.
    Listing 1. testtext.txt
                   
    Had I the heavens' embroidered cloths,
    Enwrought with golden and silver light,
    The blue and the dim and the dark cloths
    Of night and light and the half-light,
    I would spread the cloths under your feet:
    But I, being poor, have only my dreams;
    I have spread my dreams under your feet,
    Tread softly because you tread on my dreams.
    William Butler Yeats
You may be testing this code to see the results firsthand. If so, copy Listing 1 into a file in the same directory as your PHP scripts and save it as testtext.txt, because the PHP code references that name.
    Next, we need to create a ZIP file.
    Listing 2. zipcreate.php
                   
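    <?php
    $zip = new ZipArchive();
    $filename = "./newzip.zip";

    if ($zip->open($filename, ZIPARCHIVE::CREATE) !== TRUE) {
       exit("cannot open $filename\n");
    }

    // Entry names embed a Unix timestamp, so later runs add new entries.
    $zip->addFromString("firstfile." . time() . ".txt",
        "This is the first file in our ZIP, added as firstfile.txt.\n");
    // The "testdir/" prefix places this entry in a directory inside the archive.
    $zip->addFromString("testdir/secondfile." . time() . ".txt",
        "This is the second file in our ZIP, added as secondfile.txt in the testdir directory.\n");

    echo "numfiles: " . $zip->numFiles . "\n";
    $zip->close();
    ?>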
We start by creating a new ZipArchive object on the first line. Once we have our object, we set up a variable holding the filename for later use in the script. We could spell out the name each time, but a variable is easier when the name is used more than once. The constant ZIPARCHIVE::CREATE tells open() to create a new file if none exists by that name, so the first time you run the script, the ZIP file is created from scratch. If the archive cannot be opened or created, open() returns an error code instead of TRUE, and the script displays an error and exits.
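If you would rather start with an empty archive on every run, one option is to combine ZipArchive::CREATE with ZipArchive::OVERWRITE. This is a sketch, not part of the original listing. open() returns TRUE on success and an integer error code otherwise, and the hypothetical openFresh() helper below turns that code into an exception.

```php
<?php
// Sketch: start a fresh archive on every run and report open() failures.
// openFresh() is a hypothetical helper, not part of ZipArchive.
function openFresh(string $filename): ZipArchive
{
    $zip = new ZipArchive();
    // CREATE makes the file if it is missing; OVERWRITE empties an existing one.
    $result = $zip->open($filename, ZipArchive::CREATE | ZipArchive::OVERWRITE);
    if ($result !== true) {
        // On failure, open() returns an error code such as ZipArchive::ER_OPEN.
        throw new RuntimeException("cannot open $filename (error code $result)");
    }
    return $zip;
}

$filename = sys_get_temp_dir() . '/fresh_' . uniqid() . '.zip';
for ($run = 1; $run <= 3; $run++) {
    $zip = openFresh($filename);
    $zip->addFromString("firstfile.$run.txt", "Written on run $run.\n");
    $zip->close();
}

$zip = new ZipArchive();
$zip->open($filename);
$count = $zip->numFiles; // only the last run's entry survives
$zip->close();
echo "numfiles: $count\n";
```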
Next, we use the addFromString() method, which creates an entry in the ZIP archive from string data or a string variable. Each entry needs a name that identifies it in the ZIP file structure. In this case, we named the first entry firstfile.txt with the current Unix timestamp inserted before the extension, and the string data is stored under that name. We do the same a second time with secondfile.txt, but this time the name includes a directory. The entry name testdir/secondfile.<timestamp>.txt sits one level deeper in the path, so the file will appear in the testdir directory.
We show the number of files in the archive to our users and close the file, which writes it out as a ZIP archive, ready for download. You may notice that each time you press Refresh or otherwise rerun the script, the number of files in the ZIP grows. This is because we keep opening the same file in the same directory, adding two more entries, and closing it again. Because each entry name contains the timestamp (in seconds), runs in different seconds produce new names instead of replacing the old entries.
    Figure 1. Sample output from zipcreate.php
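The growth described above comes from the entry names, not from reopening the file as such. The sketch below reopens an archive three times, using hypothetical file names and an illustrative addRun() helper. Adding an entry under a fixed name replaces the previous entry of that name, whereas a name that changes on every run, like the time()-based names in Listing 2, adds a new entry each time.

```php
<?php
// Sketch: reopen the same archive several times, as a page refresh would.
// addRun() is a hypothetical helper; it returns the entry count afterwards.
function addRun(string $filename, string $entryName): int
{
    $zip = new ZipArchive();
    if ($zip->open($filename, ZipArchive::CREATE) !== true) {
        exit("cannot open $filename\n");
    }
    $zip->addFromString($entryName, "Added by $entryName\n");
    $zip->close();

    $zip->open($filename);
    $count = $zip->numFiles;
    $zip->close();
    return $count;
}

$fixed = sys_get_temp_dir() . '/fixed_' . uniqid() . '.zip';
$unique = sys_get_temp_dir() . '/unique_' . uniqid() . '.zip';
for ($run = 1; $run <= 3; $run++) {
    $a = addRun($fixed, 'firstfile.txt');        // same name each run: replaced
    $b = addRun($unique, "firstfile.$run.txt");  // new name each run: added
    echo "run $run: fixed name -> $a entries, changing name -> $b entries\n";
}
```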


As you may notice, we haven't yet added an existing file from disk to the ZIP. We need to get our poetry in there so our users get some culture. Let's add that now.
    Listing 3. Adding an external file to newzip.zip
                   
    <?php
    $zip = new ZipArchive();
    $filename = "./newzip.zip";

    if ($zip->open($filename, ZIPARCHIVE::CREATE) !== TRUE) {
       exit("cannot open $filename\n");
    }

    $zip->addFromString("firstfile." . time() . ".txt",
        "This is the first file in our ZIP, added as firstfile.txt.\n");
    $zip->addFromString("testdir/secondfile." . time() . ".txt",
        "This is the second file in our ZIP, added as secondfile.txt.\n");
    // Pack an existing file from disk; the relative path is also the entry name.
    $zip->addFile("testtext.txt");

    echo "numfiles: " . $zip->numFiles . "\n";
    $zip->close();
    ?>
Because the path is relative, we simply point to the file we want to add and use the addFile() method to pack it in. If testtext.txt is in the same directory as the script, it is added to the archive under that same name. We have now created a new archive from arbitrary string data and added an external file to it. These are the most common tasks when building a new archive.
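addFile() also accepts an optional second argument, which sets the entry's name inside the archive. Without that argument, the entry is named after the path you pass. That works for a relative path like testtext.txt, but an absolute path would be stored as the full entry name. Here is a sketch with hypothetical paths:

```php
<?php
// Sketch: store a file on disk under a chosen entry name (hypothetical paths).
$dir = sys_get_temp_dir() . '/addfile_' . uniqid();
mkdir($dir);
$source = $dir . '/testtext.txt';
file_put_contents($source, "Tread softly because you tread on my dreams.\n");

$zip = new ZipArchive();
if ($zip->open($dir . '/newzip.zip', ZipArchive::CREATE) !== true) {
    exit("cannot open archive\n");
}
// Second argument = entry name inside the archive.
$zip->addFile($source, 'poems/yeats.txt');
$zip->close(); // the file's contents are read and compressed at this point

$zip->open($dir . '/newzip.zip');
$name = $zip->getNameIndex(0);
$poem = $zip->getFromName('poems/yeats.txt');
$zip->close();
echo "$name\n";
```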


    Cracking it open
ZIP files wouldn't be much use if we couldn't get the files back out of the archive. Some programs can read files directly from an archive, though they still have to decompress the data to do so. Most commonly, we simply open the entire archive and expand it into its individual files to prepare them for normal use. For our purposes, we will open the ZIP file we created earlier and see what's inside.
    Listing 4. zipread.php
                   
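    <?php
    $zip = new ZipArchive();
    $filename = "./newzip.zip";

    if ($zip->open($filename) !== TRUE) {
       exit("cannot open $filename\n");
    }

    // Dump the ZipArchive object itself (status, numFiles, filename, ...).
    print_r($zip);
    var_dump($zip);
    echo "\n";

    echo "The file " . $filename . " has the following files:\n";
    for ($i = 0; $i < $zip->numFiles; $i++) {
       echo "index: $i\n";
       // statIndex() returns name, size, comp_size, crc, mtime, and more.
       print_r($zip->statIndex($i));
       echo "\n";
    }

    // Unpack every entry into ./testdestination/ (created if missing).
    $zip->extractTo('./testdestination/');
    $zip->close();
    ?>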
As usual, we create a new instance of the ZipArchive class in a variable named $zip. Using the open() method of ZipArchive, we open the ZIP archive we created. The if statement acts as simple error control: if the file cannot be opened, the script exits with a somewhat graceful error message for our users. If we successfully open the file, the script moves on and prints some information about the ZIP archive.
Here, we accomplish two important tasks. First, we list what is in the ZIP archive, entry by entry. Because we print the entire array that statIndex() returns for each entry, we get a lot of data, including the file size and checksum. To trim this down, you can print just the individual keys you need, such as name or size.
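To trim the listing as suggested, print only the keys you need from each statIndex() array. This sketch assumes the zip extension and builds its own small archive with hypothetical names so it runs on its own. It prints one line per entry showing the name, size, and compressed size.

```php
<?php
// Sketch: a compact listing built from selected statIndex() fields.
// The archive and entry names are hypothetical.
$filename = sys_get_temp_dir() . '/list_' . uniqid() . '.zip';
$zip = new ZipArchive();
if ($zip->open($filename, ZipArchive::CREATE) !== true) {
    exit("cannot open $filename\n");
}
$zip->addFromString('firstfile.txt', "This is the first file in our ZIP.\n");
$zip->addFromString('testdir/secondfile.txt', "This is the second file.\n");
$zip->close();

$zip->open($filename);
$rows = [];
for ($i = 0; $i < $zip->numFiles; $i++) {
    $stat = $zip->statIndex($i);
    $rows[] = sprintf("%-24s %6d bytes (%d compressed)",
        $stat['name'], $stat['size'], $stat['comp_size']);
}
$zip->close();
echo implode("\n", $rows), "\n";
```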
Second, once we have printed out what's in the file, we extract the whole thing into a directory named testdestination. If this directory doesn't exist, it is created for us. Note that if the directory already exists and contains files with the same names as entries in the archive, extractTo() overwrites them without asking.
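Because extractTo() overwrites existing files, a cautious script might check for collisions first. The wouldOverwrite() helper below is a hypothetical sketch. It compares entry names with files already in the target directory and extracts only when nothing would be clobbered.

```php
<?php
// Sketch: list entries that would overwrite existing files before extracting.
// wouldOverwrite() and all file names are hypothetical.
function wouldOverwrite(ZipArchive $zip, string $dest): array
{
    $clashes = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Skip directory entries (names ending in "/").
        if (substr($name, -1) !== '/' && file_exists($dest . '/' . $name)) {
            $clashes[] = $name;
        }
    }
    return $clashes;
}

$dest = sys_get_temp_dir() . '/dest_' . uniqid();
mkdir($dest);
file_put_contents($dest . '/testtext.txt', "my local edits\n");

$filename = $dest . '.zip';
$zip = new ZipArchive();
$zip->open($filename, ZipArchive::CREATE);
$zip->addFromString('testtext.txt', "archived copy\n");
$zip->addFromString('firstfile.txt', "new file\n");
$zip->close();

$zip->open($filename);
$clashes = wouldOverwrite($zip, $dest);
if ($clashes) {
    echo "would overwrite: " . implode(', ', $clashes) . "\n";
} else {
    $zip->extractTo($dest);
}
$zip->close();
```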
We have opened the ZIP archive, listed its contents (perhaps in preparation for modifying the original ZIP file), and saved the files to a local directory so they are ready for use. These simple tasks are just the beginning and hardly the most complex application of file compression. Smart use of compression can add convenience to all kinds of file transfers, and PHP's native ZIP support can help resolve many file-transfer problems.


    Summary


ZIP is a great way to reduce bandwidth overhead or storage usage when handling large files that contain a lot of whitespace or repeated data. Compression strips out redundant or empty space and reduces a file to its essence, making it much more compact. This shrinks the file's footprint on the file system and reduces the bandwidth needed to move it around.
One potential application is uploading a large number of files to the server, such as photos for a photo gallery or a batch of text files. Rather than working arduously through the upload dialog box for each file, you can ZIP the files and have the upload script uncompress them. This can save a headache's worth of clicking through the Browse dialog.
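A minimal sketch of the unpacking step in such an upload script follows. It assumes the uploaded archive has already been received. In a real handler, the path would come from $_FILES, for example $_FILES['archive']['tmp_name'] for a hypothetical form field named archive, and extractUpload() is an illustrative helper name. As a basic precaution, it skips entries with absolute paths or ".." segments. A production handler would also need to check upload errors, file types, and sizes.

```php
<?php
// Sketch of the unpacking step of an upload script (hypothetical helper).
// In a real handler, $zipPath would be $_FILES['archive']['tmp_name'].
function extractUpload(string $zipPath, string $destDir): array
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("not a readable ZIP archive: $zipPath");
    }
    $safe = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        $parts = explode('/', str_replace('\\', '/', $name));
        // Basic precaution: skip absolute paths and ".." segments.
        if (substr($name, 0, 1) === '/' || in_array('..', $parts, true)) {
            continue;
        }
        $safe[] = $name;
    }
    if ($safe) {
        $zip->extractTo($destDir, $safe); // creates $destDir if needed
    }
    $zip->close();
    return $safe; // names of the entries that were extracted
}

// Usage with a locally built archive standing in for an upload.
$upload = sys_get_temp_dir() . '/upload_' . uniqid() . '.zip';
$zip = new ZipArchive();
$zip->open($upload, ZipArchive::CREATE);
$zip->addFromString('photo1.jpg', 'not really a JPEG');
$zip->addFromString('docs/readme.txt', "Gallery notes\n");
$zip->close();

$dest = sys_get_temp_dir() . '/gallery_' . uniqid();
$extracted = extractUpload($upload, $dest);
echo implode("\n", $extracted), "\n";
```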
In general, it's a great idea to ZIP files that won't be accessed directly or that will be downloaded before use, and we finally have this capability native to PHP.
    Resources
    Learn


This article is from the ChinaUnix blog. Original post: http://blog.chinaunix.net/u/10599/showart_476031.html