344 lines
11 KiB
Perl
344 lines
11 KiB
Perl
package PerlIO;
|
|
|
|
our $VERSION = '1.04';
|
|
|
|
# Map layer name to package that defines it
|
|
our %alias;
|
|
|
|
sub import
|
|
{
|
|
my $class = shift;
|
|
while (@_)
|
|
{
|
|
my $layer = shift;
|
|
if (exists %alias{$layer})
|
|
{
|
|
$layer = %alias{$layer}
|
|
}
|
|
else
|
|
{
|
|
$layer = "{$class}::$layer";
|
|
}
|
|
eval "require $layer";
|
|
warn "failed loading $layer\: {$@->message}" if $@;
|
|
}
|
|
}
|
|
|
|
sub F_UTF8 () { 0x8000 }
|
|
|
|
1;
|
|
__END__
|
|
|
|
=head1 NAME
|
|
|
|
PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space
|
|
|
|
=head1 SYNOPSIS
|
|
|
|
open($fh,"<:crlf", "my.txt"); # support platform-native and CRLF text files
|
|
|
|
open($fh,"<","his.jpg"); # portably open a binary file for reading
|
|
binmode($fh);
|
|
|
|
Shell:
|
|
PERLIO=perlio perl ....
|
|
|
|
=head1 DESCRIPTION
|
|
|
|
When an undefined layer 'foo' is encountered in an C<open> or
|
|
C<binmode> layer specification then C code performs the equivalent of:
|
|
|
|
use PerlIO 'foo';
|
|
|
|
The perl code in PerlIO.pm then attempts to locate a layer by doing
|
|
|
|
require PerlIO::foo;
|
|
|
|
Otherwise the C<PerlIO> package is a place holder for additional
|
|
PerlIO related functions.
|
|
|
|
The following layers are currently defined:
|
|
|
|
=over 4
|
|
|
|
=item :unix
|
|
|
|
Lowest level layer which provides basic PerlIO operations in terms of
|
|
UNIX/POSIX numeric file descriptor calls
|
|
(open(), read(), write(), lseek(), close()).
|
|
|
|
=item :stdio
|
|
|
|
Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note
|
|
that as this is "real" stdio it will ignore any layers beneath it and
|
|
got straight to the operating system via the C library as usual.
|
|
|
|
=item :perlio
|
|
|
|
A from scratch implementation of buffering for PerlIO. Provides fast
|
|
access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt>
|
|
and in general attempts to minimize data copying.
|
|
|
|
C<:perlio> will insert a C<:unix> layer below itself to do low level IO.
|
|
|
|
=item :crlf
|
|
|
|
A layer that implements DOS/Windows like CRLF line endings. On read
|
|
converts pairs of CR,LF to a single "\n" newline character. On write
|
|
converts each "\n" to a CR,LF pair. Note that this layer likes to be
|
|
one of its kind: it silently ignores attempts to be pushed into the
|
|
layer stack more than once.
|
|
|
|
It currently does I<not> mimic MS-DOS as far as treating of Control-Z
|
|
as being an end-of-file marker.
|
|
|
|
(Gory details follow) To be more exact what happens is this: after
|
|
pushing itself to the stack, the C<:crlf> layer checks all the layers
|
|
below itself to find the first layer that is capable of being a CRLF
|
|
layer but is not yet enabled to be a CRLF layer. If it finds such a
|
|
layer, it enables the CRLFness of that other deeper layer, and then
|
|
pops itself off the stack. If not, fine, use the one we just pushed.
|
|
|
|
The end result is that a C<:crlf> means "please enable the first CRLF
|
|
layer you can find, and if you can't find one, here would be a good
|
|
spot to place a new one."
|
|
|
|
Based on the C<:perlio> layer.
|
|
|
|
=item :mmap
|
|
|
|
A layer which implements "reading" of files by using C<mmap()> to
|
|
make (whole) file appear in the process's address space, and then
|
|
using that as PerlIO's "buffer". This I<may> be faster in certain
|
|
circumstances for large files, and may result in less physical memory
|
|
use when multiple processes are reading the same file.
|
|
|
|
Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
|
|
layer. Writes also behave like C<:perlio> layer as C<mmap()> for write
|
|
needs extra house-keeping (to extend the file) which negates any advantage.
|
|
|
|
The C<:mmap> layer will not exist if platform does not support C<mmap()>.
|
|
|
|
=item :utf8
|
|
|
|
Declares that the stream accepts perl's I<internal> encoding of
|
|
characters. (Which really is UTF-8 on ASCII machines, but is
|
|
UTF-EBCDIC on EBCDIC machines.) This allows any character perl can
|
|
represent to be read from or written to the stream. The UTF-X encoding
|
|
is chosen to render simple text parts (i.e. non-accented letters,
|
|
digits and common punctuation) human readable in the encoded file.
|
|
|
|
Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
|
|
and then read it back in.
|
|
|
|
open(F, ">:utf8", "data.utf");
|
|
print F $out;
|
|
close(F);
|
|
|
|
open(F, "<:utf8", "data.utf");
|
|
$in = <F>;
|
|
close(F);
|
|
|
|
Note that this layer does not validate byte sequences. For reading
|
|
input, using C<:encoding(utf8)> instead of bare C<:utf8>, is strongly
|
|
recommended.
|
|
|
|
=item :bytes
|
|
|
|
This is the inverse of C<:utf8> layer. It turns off the flag
|
|
on the layer below so that data read from it is considered to
|
|
be "octets" i.e. characters in range 0..255 only. Likewise
|
|
on output perl will warn if a "wide" character is written
|
|
to a such a stream.
|
|
|
|
=item :raw
|
|
|
|
The C<:raw> layer is I<defined> as being identical to calling
|
|
C<binmode($fh)> - the stream is made suitable for passing binary data
|
|
i.e. each byte is passed as-is. The stream will still be
|
|
buffered.
|
|
|
|
In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
|
|
referred to as a "discipline") is documented as the inverse of the
|
|
C<:crlf> layer. That is no longer the case - other layers which would
|
|
alter binary nature of the stream are also disabled. If you want UNIX
|
|
line endings on a platform that normally does CRLF translation, but still
|
|
want UTF-8 or encoding defaults the appropriate thing to do is to add
|
|
C<:perlio> to PERLIO environment variable.
|
|
|
|
The implementation of C<:raw> is as a pseudo-layer which when "pushed"
|
|
pops itself and then any layers which do not declare themselves as suitable
|
|
for binary data. (Undoing :utf8 and :crlf are implemented by clearing
|
|
flags rather than popping layers but that is an implementation detail.)
|
|
|
|
As a consequence of the fact that C<:raw> normally pops layers
|
|
it usually only makes sense to have it as the only or first element in
|
|
a layer specification. When used as the first element it provides
|
|
a known base on which to build e.g.
|
|
|
|
open($fh,":raw:utf8",...)
|
|
|
|
will construct a "binary" stream, but then enable UTF-8 translation.
|
|
|
|
=item :pop
|
|
|
|
A pseudo layer that removes the top-most layer. Gives perl code
|
|
a way to manipulate the layer stack. Should be considered
|
|
as experimental. Note that C<:pop> only works on real layers
|
|
and will not undo the effects of pseudo layers like C<:utf8>.
|
|
An example of a possible use might be:
|
|
|
|
open($fh,...)
|
|
...
|
|
binmode($fh,":encoding(...)"); # next chunk is encoded
|
|
...
|
|
binmode($fh,":pop"); # back to un-encoded
|
|
|
|
A more elegant (and safer) interface is needed.
|
|
|
|
=item :win32
|
|
|
|
On Win32 platforms this I<experimental> layer uses native "handle" IO
|
|
rather than unix-like numeric file descriptor layer. Known to be
|
|
buggy as of perl 5.8.2.
|
|
|
|
=back
|
|
|
|
=head2 Custom Layers
|
|
|
|
It is possible to write custom layers in addition to the above builtin
|
|
ones, both in C/XS and Perl. Two such layers (and one example written
|
|
in Perl using the latter) come with the Perl distribution.
|
|
|
|
=over 4
|
|
|
|
=item :encoding
|
|
|
|
Use C<:encoding(ENCODING)> either in open() or binmode() to install
|
|
a layer that does transparently character set and encoding transformations,
|
|
for example from Shift-JIS to Unicode. Note that under C<stdio>
|
|
an C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding>
|
|
for more information.
|
|
|
|
=item :via
|
|
|
|
Use C<:via(MODULE)> either in open() or binmode() to install a layer
|
|
that does whatever transformation (for example compression /
|
|
decompression, encryption / decryption) to the filehandle.
|
|
See L<PerlIO::via> for more information.
|
|
|
|
=back
|
|
|
|
=head2 Alternatives to raw
|
|
|
|
To get a binary stream an alternate method is to use:
|
|
|
|
open($fh,"whatever")
|
|
binmode($fh);
|
|
|
|
this has advantage of being backward compatible with how such things have
|
|
had to be coded on some platforms for years.
|
|
|
|
To get an un-buffered stream specify an unbuffered layer (e.g. C<:unix>)
|
|
in the open call:
|
|
|
|
open($fh,"<:unix",$path)
|
|
|
|
=head2 Defaults and how to override them
|
|
|
|
If the platform is MS-DOS like and normally does CRLF to "\n"
|
|
translation for text files then the default layers are :
|
|
|
|
unix crlf
|
|
|
|
(The low level "unix" layer may be replaced by a platform specific low
|
|
level layer.)
|
|
|
|
Otherwise if C<Configure> found out how to do "fast" IO using system's
|
|
stdio, then the default layers are:
|
|
|
|
unix stdio
|
|
|
|
Otherwise the default layers are
|
|
|
|
unix perlio
|
|
|
|
These defaults may change once perlio has been better tested and tuned.
|
|
|
|
The default can be overridden by setting the environment variable
|
|
PERLIO to a space separated list of layers (C<unix> or platform low
|
|
level layer is always pushed first).
|
|
|
|
This can be used to see the effect of/bugs in the various layers e.g.
|
|
|
|
cd .../perl/t
|
|
PERLIO=stdio ./perl harness
|
|
PERLIO=perlio ./perl harness
|
|
|
|
For the various value of PERLIO see L<perlrun/PERLIO>.
|
|
|
|
=head2 Querying the layers of filehandles
|
|
|
|
The following returns the B<names> of the PerlIO layers on a filehandle.
|
|
|
|
my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".
|
|
|
|
The layers are returned in the order an open() or binmode() call would
|
|
use them. Note that the "default stack" depends on the operating
|
|
system and on the Perl version, and both the compile-time and
|
|
runtime configurations of Perl.
|
|
|
|
The following table summarizes the default layers on UNIX-like and
|
|
DOS-like platforms and depending on the setting of the C<$ENV{PERLIO}>:
|
|
|
|
PERLIO UNIX-like DOS-like
|
|
------ --------- --------
|
|
unset / "" unix perlio / stdio [1] unix crlf
|
|
stdio unix perlio / stdio [1] stdio
|
|
perlio unix perlio unix perlio
|
|
mmap unix mmap unix mmap
|
|
|
|
# [1] "stdio" if Configure found out how to do "fast stdio" (depends
|
|
# on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"
|
|
|
|
By default the layers from the input side of the filehandle is
|
|
returned, to get the output side use the optional C<output> argument:
|
|
|
|
my @layers = PerlIO::get_layers($fh, output => 1);
|
|
|
|
(Usually the layers are identical on either side of a filehandle but
|
|
for example with sockets there may be differences, or if you have
|
|
been using the C<open> pragma.)
|
|
|
|
There is no set_layers(), nor does get_layers() return a tied array
|
|
mirroring the stack, or anything fancy like that. This is not
|
|
accidental or unintentional. The PerlIO layer stack is a bit more
|
|
complicated than just a stack (see for example the behaviour of C<:raw>).
|
|
You are supposed to use open() and binmode() to manipulate the stack.
|
|
|
|
B<Implementation details follow, please close your eyes.>
|
|
|
|
The arguments to layers are by default returned in parenthesis after
|
|
the name of the layer, and certain layers (like C<utf8>) are not real
|
|
layers but instead flags on real layers: to get all of these returned
|
|
separately use the optional C<details> argument:
|
|
|
|
my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);
|
|
|
|
The result will be up to be three times the number of layers:
|
|
the first element will be a name, the second element the arguments
|
|
(unspecified arguments will be C<undef>), the third element the flags,
|
|
the fourth element a name again, and so forth.
|
|
|
|
B<You may open your eyes now.>
|
|
|
|
=head1 AUTHOR
|
|
|
|
Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>
|
|
|
|
=head1 SEE ALSO
|
|
|
|
L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
|
|
L<Encode>
|
|
|
|
=cut
|