root/branches/feature-server/plagger/lib/Plagger/Util.pm

Revision 1009 (checked in by miyagawa, 6 years ago)

r2680@rock (orig r938): miyagawa | 2006-06-09 10:28:17 +0900
add podtrac TruePermalink. via http://d.hatena.ne.jp/mryfmo/20060608
r2681@rock (orig r939): miyagawa | 2006-06-09 10:30:48 +0900
TruePermalink: add feedburner podcast redirector. Refs #226
r2682@rock (orig r940): miyagawa | 2006-06-09 16:11:35 +0900
use Last-Modified header to populate entry date, even if handler can't find one.
via http://subtech.g.hatena.ne.jp/otsune/20060608/norkdailymemo
r2683@rock (orig r941): miyagawa | 2006-06-09 16:12:52 +0900
take off utf-8 flag when taking digest value
r2684@rock (orig r942): miyagawa | 2006-06-09 17:04:38 +0900
Publish::CHTML: Don't die if body contains characters that can't be mapped to Shift_JIS
r2685@rock (orig r943): miyagawa | 2006-06-09 17:26:01 +0900
defaulting to cp932 would be better
r2686@rock (orig r944): miyagawa | 2006-06-09 17:37:37 +0900

r2687@rock (orig r945): miyagawa | 2006-06-09 18:48:15 +0900
add pya.cc upgrader via http://subtech.g.hatena.ne.jp/otsune/20060608/pya2feed
r2688@rock (orig r946): miyagawa | 2006-06-09 21:21:47 +0900
CustomFeed::2chSearch
r2689@rock (orig r947): miyagawa | 2006-06-09 21:26:31 +0900
oops, remove </b>
r2690@rock (orig r948): miyagawa | 2006-06-09 21:44:42 +0900
fix date if it found true entry
r2691@rock (orig r949): miyagawa | 2006-06-09 21:59:05 +0900
need quotes
r2692@rock (orig r950): miyagawa | 2006-06-09 22:06:35 +0900
Planet: Scrubber support back in lib/Plagger/Plugin/Publish/Planet.pm
r2693@rock (orig r951): miyagawa | 2006-06-09 22:08:01 +0900
oops
r2694@rock (orig r952): otsune | 2006-06-09 22:11:04 +0900
fix extract http://pyc.cc/

r2695@rock (orig r953): otsune | 2006-06-09 22:12:28 +0900
add EntryFulltext for seesaa blog

r2696@rock (orig r954): otsune | 2006-06-09 23:27:11 +0900
fix %3A

r2697@rock (orig r955): miyagawa | 2006-06-10 02:26:28 +0900
MixiDiarySearch: decode keyword query
r2698@rock (orig r956): miyagawa | 2006-06-10 02:53:41 +0900
TruePermalink enbug stuff. Use permalink to find handlers
r2699@rock (orig r957): otsune | 2006-06-10 03:08:33 +0900
add EntryFulltext http://headlines.yahoo.co.jp/

r2700@rock (orig r958): otsune | 2006-06-10 04:38:27 +0900
add Apple KB and TIL document

r2701@rock (orig r959): otsune | 2006-06-10 04:43:22 +0900
oops.

r2702@rock (orig r960): miyagawa | 2006-06-10 23:07:48 +0900
set Bloglines n=100
r2703@rock (orig r961): miyagawa | 2006-06-11 01:35:38 +0900
MixiDiarySearch: allow no_photo.gif
r2704@rock (orig r962): miyagawa | 2006-06-11 01:45:53 +0900
2chSearch: Fix error handling
r2705@rock (orig r963): miyagawa | 2006-06-11 02:07:11 +0900
added takesako-san for his patch
r2706@rock (orig r964): otsune | 2006-06-11 05:59:58 +0900
modified Chugoku Shinbun, add EFT for http://www.zianplus.net/

r2707@rock (orig r965): otsune | 2006-06-11 10:17:02 +0900
add pMachine ExpressionEngine http://www.pmachine.com/

r2708@rock (orig r966): youpy | 2006-06-11 12:38:21 +0900
fix regexp

r2709@rock (orig r967): otsune | 2006-06-12 04:09:24 +0900
fix extract regexp

r2710@rock (orig r968): otsune | 2006-06-12 04:13:19 +0900
update regexp

r2711@rock (orig r969): otsune | 2006-06-12 04:29:18 +0900
support http://www.mainichi-msn.co.jp/photo/etc/photo_feature/

r2712@rock (orig r970): otsune | 2006-06-12 06:08:15 +0900
fix wordpress.
Add mainichi-msn Photo and separate handle.
Add http://www.actiblog.com/

r2713@rock (orig r971): otsune | 2006-06-12 07:02:23 +0900
refine livedoorblog.pl
fix miss.

r2714@rock (orig r972): miyagawa | 2006-06-12 13:25:28 +0900
extract_title should be case insensitive. via http://d.hatena.ne.jp/sfujiwara/20060611/1150051152
r2715@rock (orig r973): miyagawa | 2006-06-12 13:39:12 +0900
rewrite config doesn't die even if it can't rewrite because of permission problem
r2716@rock (orig r974): miyagawa | 2006-06-12 13:43:25 +0900
skip all livedoorkeyword link
r2719@rock (orig r975): otsune | 2006-06-12 14:50:19 +0900
fix misc regexp

r2720@rock (orig r976): miyagawa | 2006-06-12 15:44:57 +0900
support handle only in livedoorblog.pl to work with aggregated feeds
r2721@rock (orig r977): miyagawa | 2006-06-12 18:22:40 +0900
TruePermalink for blogpeople redirector
r2722@rock (orig r978): otsune | 2006-06-12 22:14:03 +0900
oops 'Unmatched ( in regex;'

r2723@rock (orig r979): youpy | 2006-06-13 10:21:42 +0900
add mailman upgrader


r2724@rock (orig r980): youpy | 2006-06-13 10:28:19 +0900
fix handle regexp


r2727@rock (orig r983): miyagawa | 2006-06-13 19:00:22 +0900
Subscription::Planet: add feedster.jp
r2728@rock (orig r984): miyagawa | 2006-06-13 19:06:06 +0900
use lang/all on feedster.jp
r2734@rock (orig r985): otsune | 2006-06-13 22:11:21 +0900
fix regexp

r2735@rock (orig r986): miyagawa | 2006-06-14 00:34:01 +0900
new plugin Notify::Beep
r2736@rock (orig r987): miyagawa | 2006-06-14 00:34:40 +0900
planet: remove unnecessary bit
r2737@rock (orig r988): miyagawa | 2006-06-14 00:35:03 +0900
update example to use sixapart-std
r2738@rock (orig r989): otsune | 2006-06-14 02:55:47 +0900
remove icon_re. RecentComment can't get it

r2745@rock (orig r990): miyagawa | 2006-06-14 12:07:29 +0900
t/core is for developer test and not needed for installers
r2746@rock (orig r991): miyagawa | 2006-06-14 12:49:00 +0900
support mixi_tos_paranoia mode
r2747@rock (orig r992): miyagawa | 2006-06-14 13:10:40 +0900
title would be ok
r2792@rock (orig r993): miyagawa | 2006-06-16 15:04:12 +0900
New plugin Subscription::Bookmarks (and its IE subclass) to read IE favorites.
r2793@rock (orig r994): miyagawa | 2006-06-16 15:11:52 +0900
added TODO as comment
r2794@rock (orig r995): youpy | 2006-06-17 20:36:18 +0900
add Plugin::Subscription::Bookmarks::Safari


r2795@rock (orig r996): youpy | 2006-06-17 21:39:18 +0900
add tag support by folder name


r2796@rock (orig r997): youpy | 2006-06-18 15:41:59 +0900
use $uri->file when scheme is 'file'


r2797@rock (orig r998): youpy | 2006-06-18 15:42:56 +0900
add Plugin::Subscription::Bookmarks::Mozilla


r2798@rock (orig r999): miyagawa | 2006-06-19 15:23:13 +0900
bump URI::Fetch req
r2800@rock (orig r1000): miyagawa | 2006-06-22 00:26:46 +0900
dependency for Bookmarks::Safari. 1000th commit!
r2801@rock (orig r1001): miyagawa | 2006-06-22 00:30:57 +0900
fix config rewriting bug when the password contains regexp metachars. via http://d.hatena.ne.jp/sfujiwara/20060621/1150899012
r2802@rock (orig r1002): otsune | 2006-06-22 00:54:24 +0900
add http://www.computerworld.jp/ http://autopage.teacup.com/
fix headlines_yahoo_jp (Thanks woremacx)
fix goo blog

r2803@rock (orig r1003): miyagawa | 2006-06-22 01:10:00 +0900
import drawnboy's EntryFullText yamls via http://svn.nowherenear.net/repos/public/misc/eft/
r2804@rock (orig r1004): miyagawa | 2006-06-22 01:10:39 +0900
update AUTHOR
r2805@rock (orig r1005): s_nobu | 2006-06-22 06:17:15 +0900
require HTML::Entities for enclosure support.

r2807@rock (orig r1006): miyagawa | 2006-06-22 15:46:30 +0900
URI::Fetch 0.07 is broken (I was a moron), reverting back to 0.06 for now
r2808@rock (orig r1007): miyagawa | 2006-06-22 16:04:48 +0900
packaging 0.7.3
package Plagger::Util;
use strict;
require Exporter;    # @ISA below relies on Exporter being loaded
our @ISA = qw(Exporter);
our @EXPORT_OK = qw( strip_html dumbnail decode_content extract_title load_uri mime_type_of );

use Encode ();
use List::Util qw(min);
use HTML::Entities;
use MIME::Types;
use MIME::Type;
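
our $Detector;

# Prefer Encode::Detect::Detector when it is installed; otherwise fall
# back to Encode::Guess with a short list of Japanese suspects.
BEGIN {
    if ( eval { require Encode::Detect::Detector; 1 } ) {
        $Detector = sub { Encode::Detect::Detector::detect($_[0]) };
    } else {
        require Encode::Guess;
        $Detector = sub {
            my @guess = qw(utf-8 euc-jp shift_jis); # xxx japanese only?
            # guess_encoding returns an error string when it can't decide;
            # calling ->name on that dies inside the eval, yielding undef.
            eval { Encode::Guess::guess_encoding($_[0], @guess)->name };
        };
    }
}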
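The BEGIN block above prefers Encode::Detect::Detector when it can be loaded and otherwise falls back to Encode::Guess. The sketch below shows the same optional-dependency pattern in isolation. It assumes a hypothetical module name (Hypothetical::Charset::Detector) that will normally be missing, and a core-only fallback that merely checks whether the bytes are valid UTF-8; it is an illustration of the pattern, not Plagger's detector.

```perl
use strict;
use warnings;
use Encode ();

our $Detector;

BEGIN {
    # Hypothetical optional module: normally not installed, so the
    # else-branch below is the one that runs.
    if ( eval { require Hypothetical::Charset::Detector; 1 } ) {
        $Detector = sub { Hypothetical::Charset::Detector::detect($_[0]) };
    } else {
        # Core-only fallback: answer 'utf-8' when the bytes are valid
        # strict UTF-8, otherwise undef ("don't know").
        $Detector = sub {
            my $copy = shift;    # decode() with a CHECK flag may alter its input
            my $ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
            return $ok ? 'utf-8' : undef;
        };
    }
}

for my $bytes ("caf\xc3\xa9", "caf\xe9") {
    my $charset = $Detector->($bytes);
    printf "%s\n", defined $charset ? $charset : 'undetected';
}
```

Because the choice is made once at compile time, every later call pays only for a code-ref invocation, not for repeated `require` attempts.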
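
# Remove tags, then decode HTML entities. Tags are stripped before the
# entities are decoded, so escaped markup such as &lt;b&gt; survives as text.
sub strip_html {
    my $html = shift;
    $html =~ s/<[^>]*>//g;
    HTML::Entities::decode($html);
}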
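HTML::Entities is not a core module, so this standalone sketch of strip_html swaps in a hypothetical five-entry entity table plus decimal and hex character references. It keeps the original order (strip tags first, then decode), which is why the escaped markup in the second call comes out as a literal tag.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny entity table. The real strip_html calls
# HTML::Entities::decode, which knows every named entity.
my %ENTITY = ( amp => '&', lt => '<', gt => '>', quot => '"', apos => "'" );

sub strip_html_sketch {
    my $html = shift;
    $html =~ s/<[^>]*>//g;    # same tag-stripping regex as strip_html
    # Decode in a single pass so "&amp;lt;" becomes "&lt;", not "<".
    $html =~ s{&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|(\w+));}
              { defined $1 ? chr($1)
              : defined $2 ? chr(hex $2)
              : exists $ENTITY{$3} ? $ENTITY{$3}
              : "&$3;" }ge;    # unknown entities are left untouched
    return $html;
}

print strip_html_sketch('<p>Fish &amp; <b>Chips</b></p>'), "\n";
print strip_html_sketch('Escaped &lt;b&gt; stays as text'), "\n";
```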

# Return width/height attributes that shrink $img to fit inside the
# $p box while keeping its aspect ratio; images that already fit keep
# their own size. Returns '' when neither dimension is known.
sub dumbnail {
    my($img, $p) = @_;

    if (!$img->{width} && !$img->{height}) {
        return '';
    }

    if ($img->{width} <= $p->{width} && $img->{height} <= $p->{height}) {
        return qq(width="$img->{width}" height="$img->{height}");
    }

    my $ratio_w = $p->{width}  / $img->{width};
    my $ratio_h = $p->{height} / $img->{height};
    my $ratio   = min($ratio_w, $ratio_h);

    sprintf qq(width="%d" height="%d"), ($img->{width} * $ratio), ($img->{height} * $ratio);
}
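dumbnail shrinks an image to a bounding box by applying the smaller of the width and height ratios to both dimensions, so the limiting side lands exactly on the box edge. The copy below is condensed from the listing so it runs without Plagger; the 200x200 box is only an illustrative value.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Same rule as Plagger::Util::dumbnail: shrink (never enlarge) to fit
# inside the $p box, keeping the aspect ratio.
sub dumbnail {
    my($img, $p) = @_;
    return '' if !$img->{width} && !$img->{height};
    if ($img->{width} <= $p->{width} && $img->{height} <= $p->{height}) {
        return qq(width="$img->{width}" height="$img->{height}");
    }
    my $ratio = min($p->{width} / $img->{width}, $p->{height} / $img->{height});
    sprintf qq(width="%d" height="%d"), $img->{width} * $ratio, $img->{height} * $ratio;
}

my $box = { width => 200, height => 200 };
print dumbnail({ width => 400, height => 300 }, $box), "\n";  # width is the limit
print dumbnail({ width => 300, height => 400 }, $box), "\n";  # height is the limit
print dumbnail({ width => 120, height =>  80 }, $box), "\n";  # already fits
```

`%d` truncates, so fractional sizes are rounded down. When only one dimension is known and it exceeds the box, the ratio computation divides by an undefined (zero) dimension and Perl dies with "Illegal division by zero".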

sub decode_content {
    my $stuff = shift;

    my $content;
    my $res;
    if (ref($stuff) && ref($stuff) eq 'URI::Fetch::Response') {
        $res     = $stuff;
        $content = $res->content;
    } elsif (ref($stuff)) {
        Plagger->context->error("Don't know how to decode " . ref($stuff));
    } else {
        $content = $stuff;
    }

    my $charset;

    # 1) for an HTTP response, take the charset from the Content-Type header
    if ($res) {
        $charset = ($res->content_type =~ /charset=([\w\-]+)/)[0];
    }

    # 2) if there is none, try the XML declaration's encoding
    $charset ||= ( $content =~ /<\?xml version="1.0" encoding="([\w\-]+)"\?>/ )[0];

    # 3) if there is still none, try the META tag
    $charset ||= ( $content =~ m!<meta http-equiv="Content-Type" content=".*charset=([\w\-]+)"!i )[0];

    # 4) if there is still none, try Detector/Guess
    $charset ||= $Detector->($content);

    # 5) fall back to UTF-8
    $charset ||= 'utf-8';

    my $decoded = eval { Encode::decode($charset, $content) };

    if ($@ && $@ =~ /Unknown encoding/) {
        Plagger->context->log(warn => $@);
        $charset = $Detector->($content) || 'utf-8';
        $decoded = Encode::decode($charset, $content);
    }

    $decoded;
}
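decode_content consults up to five charset sources in a fixed order and retries when Encode rejects the label. The core-only sketch below reproduces steps 1-3 and 5 under hypothetical helper names (guess_charset, decode_bytes). As simplifications, it takes the Content-Type header as a plain string rather than a URI::Fetch::Response, skips the Detector step, and retries with UTF-8 instead of the Detector.

```perl
use strict;
use warnings;
use Encode ();

# $content_type stands in for the HTTP Content-Type header (undef for
# local data). Step 4 (Detector/Guess) is omitted in this sketch.
sub guess_charset {
    my($content_type, $content) = @_;
    my $charset;
    # 1) charset parameter of the Content-Type header
    $charset = ($content_type =~ /charset=([\w\-]+)/)[0] if defined $content_type;
    # 2) XML declaration (double-quoted, version="1.0" only)
    $charset ||= ( $content =~ /<\?xml version="1.0" encoding="([\w\-]+)"\?>/ )[0];
    # 3) META http-equiv tag, case-insensitive
    $charset ||= ( $content =~ m!<meta http-equiv="Content-Type" content=".*charset=([\w\-]+)"!i )[0];
    # 5) last resort
    $charset ||= 'utf-8';
    return $charset;
}

# Decode with the guessed label; if Encode does not know it,
# Encode::decode croaks with "Unknown encoding" and we retry as UTF-8.
sub decode_bytes {
    my($content_type, $content) = @_;
    my $charset = guess_charset($content_type, $content);
    my $decoded = eval { Encode::decode($charset, $content) };
    if ($@ && $@ =~ /Unknown encoding/) {
        $decoded = Encode::decode('utf-8', $content);
    }
    return $decoded;
}

print guess_charset('text/html; charset=EUC-JP', ''), "\n";
print guess_charset(undef, '<?xml version="1.0" encoding="Shift_JIS"?><rss/>'), "\n";
```

The XML pattern only recognizes a double-quoted `version="1.0"` declaration, so `<?xml version='1.0' encoding='EUC-JP'?>` falls through to the later steps (in this sketch, straight to the UTF-8 default).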
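
# Return the first <title> element's text (case-insensitive, surrounding
# whitespace trimmed) with entities decoded, or nothing if there is none.
sub extract_title {
    my $content = shift;
    my $title = ($content =~ m!<title>\s*(.*?)\s*</title>!is)[0] or return;
    HTML::Entities::decode($title);
}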
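The /i modifier on extract_title's pattern is what lets `<TITLE>` match as well as `<title>`, and /s lets the title span line breaks. This sketch keeps the pattern but returns the raw text, since HTML::Entities is not core Perl; extract_title_raw is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical variant of extract_title that skips entity decoding.
sub extract_title_raw {
    my $content = shift;
    # /i: <TITLE> matches too; /s: .*? may cross newlines;
    # the \s* on both sides trims surrounding whitespace.
    my $title = ($content =~ m!<title>\s*(.*?)\s*</title>!is)[0] or return;
    return $title;
}

print extract_title_raw("<HTML><HEAD><TITLE>\n  Plagger  \n</TITLE></HEAD>"), "\n";
```

Because `or return` tests truthiness, a page whose title is exactly `0` is treated as having no title.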

# Return the data behind a scalar ref as-is, or fetch an http(s) or
# file URI and return its content decoded by decode_content.
sub load_uri {
    my($uri, $plugin) = @_;

    require Plagger::UserAgent;

    my $data;
    if (ref($uri) eq 'SCALAR') {
        $data = $$uri;
    }
    elsif ($uri->scheme =~ /^https?$/) {
        Plagger->context->log(debug => "Fetch remote file from $uri");

        my $response = Plagger::UserAgent->new->fetch($uri, $plugin);
        if ($response->is_error) {
            Plagger->context->log(error => "GET $uri failed: " .
                                  $response->http_status . " " .
                                  $response->http_response->message);
        }
        $data = decode_content($response);
    }
    elsif ($uri->scheme eq 'file') {
        Plagger->context->log(debug => "Open local file " . $uri->file);
        open my $fh, '<', $uri->file
            or Plagger->context->error( $uri->file . ": $!" );
        $data = decode_content(join '', <$fh>);
    }
    else {
        Plagger->context->error("Unsupported URI scheme: " . $uri->scheme);
    }

    return $data;
}
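load_uri dispatches on what it is given: a scalar reference, an http(s) URI, or a file URI. Since the URI module and Plagger::UserAgent are not core Perl, the sketch below assumes plain strings instead of URI objects, omits the HTTP branch, and returns raw bytes rather than passing them through decode_content; load_source is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring load_uri's dispatch on plain strings.
sub load_source {
    my $src = shift;

    # Scalar reference: the data is passed inline, return it unchanged.
    return $$src if ref($src) eq 'SCALAR';

    my ($scheme) = $src =~ m{^([A-Za-z][A-Za-z0-9+.\-]*):}
        or die "Not a URI: $src\n";

    if (lc $scheme eq 'file') {
        (my $path = $src) =~ s{^file://}{}i;
        open my $fh, '<', $path or die "$path: $!\n";
        local $/;    # slurp mode: read the whole file in one go
        return scalar <$fh>;
    }

    # http(s) fetching is omitted from this sketch.
    die "Unsupported URI scheme: $scheme\n";
}

print load_source(\"inline data"), "\n";
```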

# MIME::Types table plus two types it may lack (Flash video, AAC audio).
our $mimetypes = MIME::Types->new;
$mimetypes->addType( MIME::Type->new(type => 'video/x-flv', extensions => [ 'flv' ]) );
$mimetypes->addType( MIME::Type->new(type => 'audio/aac', extensions => [ 'm4a', 'aac' ]) );

# Map an extension, or a URI whose path ends in one, to a MIME::Type object.
sub mime_type_of {
    my $ext = shift;

    if (UNIVERSAL::isa($ext, 'URI')) {
        $ext = ( $ext->path =~ /\.(\w+)$/ )[0];
    }

    return unless $ext;
    return $mimetypes->mimeTypeOf($ext);
}
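mime_type_of reduces a URI to the extension of its path before looking it up. The sketch below uses a hypothetical hash in place of the MIME::Types database (which is not core Perl), seeded with the flv and m4a/aac types registered above plus mp3. Unlike a bare `/\.(\w+)/`, which stops at the first dot anywhere in the path, the `$` anchor here takes the extension of the last segment only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the MIME::Types database.
my %TYPE_OF = (
    mp3 => 'audio/mpeg',
    m4a => 'audio/aac',
    aac => 'audio/aac',
    flv => 'video/x-flv',
);

# Extension of the last path segment: "/v1.2/show.mp3" yields "mp3", not "2".
sub extension_of {
    my $path = shift;
    my ($ext) = $path =~ /\.(\w+)$/;
    return $ext;
}

sub mime_type_sketch {
    my $ext = extension_of(shift) or return;
    return $TYPE_OF{ lc $ext };
}

print mime_type_sketch('/v1.2/show.mp3'), "\n";
```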

1;