Computing: Website and Database Programming

Uploading files using CGI and Perl.



This tutorial is intended for beginners in website programming or Perl who know how to create HTML forms with user input fields and how to write a Perl CGI script that retrieves the user data, but who do not know, or are not sure, how to upload a file located on the user's computer to the webserver, or how to read the user's file and use its content within the uploading Perl script. As this file is located on the client's computer and not on the server, I'll call it the remote file in the following paragraphs. The tutorial concentrates on the essentials, describing the basics that allow readers to create their own upload scripts with a minimum of effort, without neglecting the security issues that file uploads may cause. Most of the examples used in the tutorial are taken from my DNA molecular weight calculator online application.

HTML forms for file upload.

The first thing to consider is that, in order to be able to upload a file, the form element must be defined with the special multipart/form-data encoding type:
   <form id="form1" name="form1" action="/cgi-bin/dna_molweight.pl" method="post" enctype="multipart/form-data">

We then need an element on the form that lets the user specify the file to be uploaded. This element has to be a file input field. If we insert <input type="file" name="filename" id="filename"/> into the HTML code, the resulting webpage shows an upload button that allows the user to browse to the file. This button (labeled Choose file in Chromium-based browsers and Browse... in Firefox) is followed by the text "No file selected", which is replaced by the name of the file once the user has selected one. Here is the file input field definition in my DNA application (the checkbox lets the user choose between manual input of a DNA sequence and uploading a DNA file), followed by a screenshot of the DNA application webpage showing the corresponding upload button (the user not yet having selected a file).
    <input type="checkbox" name="usefile" id="usefile" value="usefile" /> Load DNA sequence from local file:   <input type="file" name="filename" id="filename"/>

DNA molecular weight calculator: Webpage with a file input field for file upload

The submit input field is the same as on other forms. When the user pushes the resulting Submit button, the name of the remote file is passed to the CGI script (together with the file's content) in the same way as the other user data entered on the form.

CGI scripts accessing remote files.

No special code is needed in the CGI script to read data from a remote file. However, for obvious reasons, you should always limit the file size allowed for uploads. The maximum file size is, of course, application dependent. In my DNA application, I chose 25 KB; if the upload consists of photos, a few megabytes would be an appropriate limit. The maximum size may be set at the beginning of the CGI script (immediately after the use statements and before the CGI object is created), using the $CGI::POST_MAX variable. The value has to be given in bytes and limits the total size of the POST request, not only the size of the file. Here are the first lines of my DNA script:
    #!/usr/bin/perl -I"."
    use strict; use warnings;
    use CGI;
    use CGI::Carp "fatalsToBrowser";
    use File::Basename;
    $CGI::POST_MAX = 1024 * 25;

Another thing that you should always do is make sure that the name of the uploaded file contains only safe characters. This is particularly important if you intend to store the file on the server. Characters such as "/" are really dangerous in filenames, as they might allow attackers to upload files to any directory they want (or at least to store them somewhere other than the directory that you intended). The best practice is to allow only the following characters in filenames: letters, digits, underscores, hyphens, and (as needed for file extensions) periods, and possibly also spaces (converting them to underscores before saving the file). Additional security can be obtained by checking the file extension; if, for example, the upload is expected to be a photo, refuse any file whose extension is not .png, .jpg, .jpeg, or another image extension that you accept.
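
The rules of this paragraph can be sketched as a small helper. This is only an illustration, not the code of the DNA application: the function name safe_filename, the list of accepted extensions, and the decision to refuse names that start with a period are all assumptions.

```perl
use strict;
use warnings;
use File::Basename;

# Extensions accepted by a hypothetical photo site (an assumption).
my %allowed_ext = map { $_ => 1 } qw(.png .jpg .jpeg .gif);

# Returns a safe filename, or undef if the name must be refused.
sub safe_filename {
    my ($raw) = @_;
    return undef unless defined $raw && length $raw;

    # Keep only the last path component, whichever separator was used.
    (my $name = $raw) =~ s{^.*[/\\]}{}s;

    # Spaces are tolerated, but stored as underscores.
    $name =~ tr/ /_/;

    # Refuse anything outside the whitelist instead of silently stripping it.
    return undef unless $name =~ /\A[A-Za-z0-9_.\-]+\z/;

    # Refuse names that start with a period (hidden files, "..").
    return undef if $name =~ /\A\./;

    # Check the extension (case-insensitively) against the accepted list.
    my (undef, undef, $ext) = fileparse($name, qr/\.[^.]*/);
    return undef unless $allowed_ext{ lc $ext };

    return $name;
}

print safe_filename('C:\photos\my cat.JPG') // 'refused', "\n";   # my_cat.JPG
print safe_filename('../../etc/passwd')     // 'refused', "\n";   # refused
```

Refusing a suspicious name, rather than repairing it, keeps the behavior predictable: the user gets an error message and can rename the file.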

As some browsers return not only the simple name, but the complete path of the remote file, we need some code anyway to split the filename into path, name and extension. A very simple way to do this is to use the fileparse function of the File::Basename module (that's the reason for the use File::Basename; in the code above).
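
As a minimal sketch of this splitting step: by default, File::Basename assumes the path syntax of the server's operating system, so a Windows-style path sent by a browser is not split on a Unix server unless the module is told which syntax to use. The helper name split_remote_path and the sample paths are made up for the illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# Split a client-supplied path into directory, base name and extension.
# $fstype tells File::Basename which path syntax to assume.
sub split_remote_path {
    my ($remote, $fstype) = @_;
    my $previous = fileparse_set_fstype($fstype);   # returns the old setting
    my ($name, $path, $ext) = fileparse($remote, qr/\.[^.]*/);
    fileparse_set_fstype($previous);                # restore the old setting
    return ($path, $name, $ext);
}

my ($path, $name, $ext) = split_remote_path('C:\Users\me\seq.fasta', 'MSWin32');
print "$name$ext\n";    # seq.fasta

($path, $name, $ext) = split_remote_path('/home/me/seq.fasta', 'Unix');
print "$name$ext\n";    # seq.fasta
```

The suffix pattern qr/\.[^.]*/ captures only the last extension (".gz" for "archive.tar.gz").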

My DNA application first checks whether the DNA has to be read from a remote file (if the checkbox is not selected, the sequence is assumed to have been entered manually). If this is the case, it checks whether it has received a filename from the calling webpage. No filename is received if the user didn't select a file, of course, but also if the size of the selected file exceeds the limit defined at the beginning of the script. If the filename is present, the script validates it: any file whose name contains one or more characters other than those listed above is refused with an "Invalid filename" message. The script doesn't do any extension checking, as the file content, which has to be a valid DNA sequence, is checked character by character.
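    # $cgi (the CGI object) and $mess (the message shown to the user) are defined elsewhere in the script
    my %params = $cgi->Vars;
    # An unchecked checkbox is not submitted at all, so default to an empty string
    my $fileusage = $params{'usefile'}  // '';
    my $filename  = $params{'filename'} // '';
    if ($fileusage eq 'usefile') {
        if ($filename) {
            # Drop any path sent by the browser; keep name and extension
            my ( $name, $path, $extension ) = fileparse ( $filename, '\..*' );
            $filename = $name . $extension; my $filename2 = $filename;
            # Remove all characters that are not in the list of safe characters
            $filename2 =~ s/[^a-zA-Z0-9 _\.\-]//g;
            if ($filename2 eq $filename) {
                # - read DNA sequence from remote file -
            }
            else {
                $mess = "Invalid filename: $filename!";
            }
        }
        else {
            $mess = 'No filename or could not upload local DNA file!';
        }
    }
    else {
        # - get DNA sequence from text area -
    }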
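
The script cannot tell "no file selected" from "file too large", as both leave the filename empty. According to its documentation, CGI.pm reports a POST that exceeds $CGI::POST_MAX through the cgi_error method, with a status string starting with 413. The following sketch, which assumes that CGI.pm is installed, shows one way to use this; the helper name upload_error_message and the message texts are hypothetical.

```perl
use strict;
use warnings;
use CGI;

# Translate the error state of a CGI object into a message for the user.
# Returns undef when the request was parsed without error.
sub upload_error_message {
    my ($cgi) = @_;
    my $error = $cgi->cgi_error;
    return undef unless $error;
    # A status starting with 413 means that the POST exceeded $CGI::POST_MAX.
    return 'The file is larger than the allowed maximum!' if $error =~ /^413/;
    return "Upload failed: $error";
}
```

In a script, the helper would be called right after the CGI object has been created, for example my $mess = upload_error_message($cgi);, and the form parameters would be examined only if it returns undef.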

Reading remote files and saving them (or not) onto the webserver.

Reading a remote file works the same way as reading a local file: a file handle associated with the (input) file is used to get its data. For a local file, the handle is created and opened by a statement such as open(my $FH, '<', "input_filename") or die "$!";. For a remote file, we get the already opened file handle from the upload method of the CGI module, passing it the name of the file input field on the form:
    my $FH = $cgi->upload("filename");
With the file handle, we can use the file's data in exactly the same way as we do with local files. The following paragraphs show the basics with two examples.

Saving a remote file onto the webserver.

This is not what my DNA application does (why should I save a file that contains user data passed to the app, when I neither need it myself nor have any other use for a copy on the server?), but it would, for example, be the procedure to follow for a photo site where users can upload their pictures. First, we need a file upload directory. This directory may be located anywhere within the file structure. If you want the pictures to be visible on your website, you have to use a subdirectory of the server's htdocs (ex: /home/public_html/htdocs/upload). In other cases (for example a photo site where you want to review the photos before they become accessible to everyone), you may prefer a directory outside the htdocs (ex: /home/photos/new). In both cases, make sure that the user account under which the webserver runs has (read and) write permission for the upload directory (on Unix systems, a chmod 777 works, but it grants write access to every user on the system; making the webserver's user the owner of the directory is the more restrictive choice). The following code uses the file handle obtained from the upload method to simply copy the remote file's content (for example a photo) to a file on the webserver (please note that "filename" is the value of the name attribute of the file input field on the form).
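    # $query is the CGI object
    my $upload_dir = "/home/public_html/htdocs/upload";
    # Name as sent by the browser: validate it as described above before using it
    my $filename = $query->param("filename");
    my $FH_IN = $query->upload("filename")
        or die "No file uploaded";
    open(my $FH_OUT, '>', "$upload_dir/$filename")
        or die "$!";
    binmode $FH_OUT;
    while (<$FH_IN>) {
        print $FH_OUT $_;
    }
    close $FH_OUT;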
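
For binary files such as photos, reading fixed-size chunks is a common alternative to reading "lines". The following is a minimal sketch of such a copy routine, not the author's code: the function name save_upload, the chunk size of 8192 bytes, and the refusal to overwrite an existing file are assumptions. It works with any opened input handle and expects a filename that has already been validated.

```perl
use strict;
use warnings;

# Copy everything from an already opened input handle to $dir/$name,
# reading fixed-size binary chunks instead of text lines.
# $name is assumed to have been validated beforehand.
sub save_upload {
    my ($fh_in, $dir, $name) = @_;
    my $target = "$dir/$name";

    # Refuse to overwrite an existing file (a policy choice, not a requirement).
    die "File already exists: $target\n" if -e $target;

    open(my $fh_out, '>', $target) or die "Cannot write $target: $!";
    binmode $fh_in;
    binmode $fh_out;

    my ($buffer, $total) = ('', 0);
    while (my $bytes = read($fh_in, $buffer, 8192)) {
        print {$fh_out} $buffer;
        $total += $bytes;
    }
    close $fh_out or die "Cannot close $target: $!";
    return $total;    # number of bytes written
}
```

With the variables of the code above, the call would be save_upload($FH_IN, $upload_dir, $filename);.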

Reading a remote file to work with its data.

This is the case of my DNA application, which reads the sequence from a file stored on the user's computer and uses the file's content to calculate the molecular weight of the sequence. Obviously, there is no need to save the DNA sequence on the webserver. Here is the code (again, "filename" is the value of the name attribute of the file input field on the form).
    my $alldna = '';
    my $FH_IN = $query->upload("filename");
    while (<$FH_IN>) {
        $alldna .= $_;
    }
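
As an illustration of working with the data once the handle is available, here is a sketch that separates the FASTA header from the sequence and checks the sequence character by character. It is not the code of the DNA application: the function name read_dna and the accepted alphabet (A, C, G and T only) are assumptions. An in-memory file stands in for the uploaded file.

```perl
use strict;
use warnings;

# Read a FASTA-style text from an open handle and return the header line
# (without the leading ">") and the sequence with all whitespace removed.
# Dies if the sequence contains anything other than A, C, G or T.
sub read_dna {
    my ($fh) = @_;
    my ($header, $sequence) = ('', '');
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;             # Unix and Windows line ends
        if ($line =~ /\A>(.*)/) {
            $header = $1;
        }
        else {
            $sequence .= $line;
        }
    }
    $sequence =~ s/\s+//g;
    die "Invalid character in DNA sequence\n" if $sequence =~ /[^ACGTacgt]/;
    return ($header, uc $sequence);
}

# Demonstration with an in-memory file instead of a real upload.
my $text = ">test sequence\nACGT\nacgt\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my ($header, $dna) = read_dna($fh);
print "$header: $dna\n";    # test sequence: ACGTACGT
```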

That's it!

Some final notes for readers who have little or no experience in working with files in Perl. The variable $alldna, used in the code above, is a long string that contains the complete content of the DNA file. This includes the end-of-line markers that terminate the physical lines in text files. This makes it easy to display the string formatted the same way as the text file; in the case of the DNA application, to display the formatted sequence (FASTA header in the first line, then the sequence data lines, with the same number of bases per line as in the file). You may wonder how to fill an HTML text area, as this element has no value attribute. The page generated by the DNA script is based on a template HTML file containing custom tags, which the script replaces with the actual values. Here is the text area line in the template:
    <td colspan="4" style="padding-top: 10px"><textarea name="dna" id="dna" cols="90" rows="15">#dna#</textarea>
To fill the text area with the actual DNA sequence, the script simply replaces the #dna# tag with the content of the $alldna variable.
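
The replacement itself can be sketched as follows. The function names escape_html and fill_template are made up for this illustration, and the HTML escaping is an addition that the text above does not mention. It is useful here because a FASTA header starts with ">", which is a special character in HTML.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text content.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Replace every #tag# placeholder in the template with its escaped value.
# $values maps tag names (without the # signs) to the text to insert;
# unknown tags are left unchanged.
sub fill_template {
    my ($template, $values) = @_;
    $template =~ s/#(\w+)#/exists $values->{$1} ? escape_html($values->{$1}) : "#$1#"/ge;
    return $template;
}

my $template = '<textarea name="dna" id="dna" cols="90" rows="15">#dna#</textarea>';
my $alldna   = ">seq1\nACGT\n";
print fill_template($template, { dna => $alldna }), "\n";
```

The end-of-line markers in $alldna are kept as they are, so the text area shows the sequence with the same line breaks as the file.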