root/branches/feature-server/plagger/lib/Plagger.pm

Revision 1009 (checked in by miyagawa, 2 years ago)

r2680@rock (orig r938): miyagawa | 2006-06-09 10:28:17 +0900
add podtrac TruePermalink. via http://d.hatena.ne.jp/mryfmo/20060608
r2681@rock (orig r939): miyagawa | 2006-06-09 10:30:48 +0900
TruePermalink: add feedburner podcast redirector. Refs #226
r2682@rock (orig r940): miyagawa | 2006-06-09 16:11:35 +0900
use Last-Modified header to populate entry date, even if handler can't find one.
via http://subtech.g.hatena.ne.jp/otsune/20060608/norkdailymemo
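The fallback described in r940 can be sketched as follows. This is a minimal sketch, not Plagger's code. `entry_date_from` is a hypothetical helper, and the header is assumed to be in the RFC 1123 form servers normally send (e.g. `Fri, 09 Jun 2006 07:11:35 GMT`). Core `Time::Piece`, bundled since Perl 5.10 and so newer than Plagger's 5.8.1 baseline, stands in for whatever date parser Plagger actually uses.

```perl
use strict;
use warnings;
use Time::Piece;   # core since Perl 5.10 (assumption: stands in for Plagger's parser)

# Hypothetical helper: prefer the date the handler extracted; otherwise
# fall back to the HTTP Last-Modified header (RFC 1123 format, always GMT).
sub entry_date_from {
    my ($handler_date, $last_modified) = @_;
    return $handler_date if defined $handler_date;
    return undef unless defined $last_modified;

    # strptime() croaks on input it cannot parse, so trap that and give up quietly.
    my $t = eval {
        Time::Piece->strptime($last_modified, '%a, %d %b %Y %H:%M:%S GMT');
    };
    return $t;    # a Time::Piece object (UTC), or undef
}

my $date = entry_date_from(undef, 'Fri, 09 Jun 2006 07:11:35 GMT');
print $date->ymd, ' ', $date->hms, "\n";
```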
r2683@rock (orig r941): miyagawa | 2006-06-09 16:12:52 +0900
take off utf-8 flag when taking digest value
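The UTF-8 flag matters here because Digest::MD5 hashes octets. A decoded string therefore has to be encoded back to bytes before it is hashed. This is a minimal sketch. `digest_of` is a hypothetical name, and the use of MD5 is an assumption about where Plagger takes its digests.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use Digest::MD5 qw(md5_hex);

# md5_hex() hashes bytes and croaks ("Wide character in subroutine entry")
# when given a decoded string containing characters above U+00FF, so turn
# the decoded text back into UTF-8 octets before hashing.
sub digest_of {
    my $text = shift;
    return md5_hex(encode_utf8($text));
}

my $title = "\x{3042}\x{3044}";   # a decoded (character) string
print digest_of($title), "\n";
```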
r2684@rock (orig r942): miyagawa | 2006-06-09 17:04:38 +0900

Publish::CHTML: Don't die if body contains non-SJIS-mappable characters
r2685@rock (orig r943): miyagawa | 2006-06-09 17:26:01 +0900
defaulting to cp932 would be better
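One way to get the r942/r943 behaviour, assuming the CHTML output is encoded with Encode, is to encode to cp932 with the `FB_HTMLCREF` fallback. Characters with no Shift_JIS mapping then become HTML numeric character references instead of raising an error. `to_cp932` is hypothetical, and whether Publish::CHTML uses this particular fallback is an assumption.

```perl
use strict;
use warnings;
use Encode ();

# Encode text as cp932 without dying on characters cp932 cannot represent:
# FB_HTMLCREF replaces each such character with a decimal HTML character
# reference (e.g. U+1F600 becomes "&#128512;").
sub to_cp932 {
    my $text = shift;
    return Encode::encode('cp932', $text, Encode::FB_HTMLCREF());
}

print unpack('H*', to_cp932("\x{3042}")), "\n";
```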
r2686@rock (orig r944): miyagawa | 2006-06-09 17:37:37 +0900

r2687@rock (orig r945): miyagawa | 2006-06-09 18:48:15 +0900
add pya.cc upgrader via http://subtech.g.hatena.ne.jp/otsune/20060608/pya2feed
r2688@rock (orig r946): miyagawa | 2006-06-09 21:21:47 +0900
CustomFeed::2chSearch
r2689@rock (orig r947): miyagawa | 2006-06-09 21:26:31 +0900
oops, remove </b>
r2690@rock (orig r948): miyagawa | 2006-06-09 21:44:42 +0900
fix date if it found true entry
r2691@rock (orig r949): miyagawa | 2006-06-09 21:59:05 +0900
need quotes
r2692@rock (orig r950): miyagawa | 2006-06-09 22:06:35 +0900
Planet: Scrubber support back in lib/Plagger/Plugin/Publish/Planet.pm
r2693@rock (orig r951): miyagawa | 2006-06-09 22:08:01 +0900
oops
r2694@rock (orig r952): otsune | 2006-06-09 22:11:04 +0900
fix extract http://pyc.cc/

r2695@rock (orig r953): otsune | 2006-06-09 22:12:28 +0900
add EntryFulltext for seesaa blog

r2696@rock (orig r954): otsune | 2006-06-09 23:27:11 +0900
fix %3A

r2697@rock (orig r955): miyagawa | 2006-06-10 02:26:28 +0900
MixiDiarySearch: decode keyword query
r2698@rock (orig r956): miyagawa | 2006-06-10 02:53:41 +0900
TruePermalink: enbug stuff. Use permalink to find handlers
r2699@rock (orig r957): otsune | 2006-06-10 03:08:33 +0900
add EntryFulltext http://headlines.yahoo.co.jp/

r2700@rock (orig r958): otsune | 2006-06-10 04:38:27 +0900
add Apple KB and TIL document

r2701@rock (orig r959): otsune | 2006-06-10 04:43:22 +0900
oops.

r2702@rock (orig r960): miyagawa | 2006-06-10 23:07:48 +0900
set Bloglines n=100
r2703@rock (orig r961): miyagawa | 2006-06-11 01:35:38 +0900
MixiDiarySearch: allow no_photo.gif
r2704@rock (orig r962): miyagawa | 2006-06-11 01:45:53 +0900
2chSearch: Fix error handling
r2705@rock (orig r963): miyagawa | 2006-06-11 02:07:11 +0900
added takesako-san for his patch
r2706@rock (orig r964): otsune | 2006-06-11 05:59:58 +0900
modified Chugoku Shinbun, add EFT for http://www.zianplus.net/

r2707@rock (orig r965): otsune | 2006-06-11 10:17:02 +0900
add pMachine ExpressionEngine http://www.pmachine.com/

r2708@rock (orig r966): youpy | 2006-06-11 12:38:21 +0900
fix regexp

r2709@rock (orig r967): otsune | 2006-06-12 04:09:24 +0900
fix extract regexp

r2710@rock (orig r968): otsune | 2006-06-12 04:13:19 +0900
update regexp

r2711@rock (orig r969): otsune | 2006-06-12 04:29:18 +0900
support http://www.mainichi-msn.co.jp/photo/etc/photo_feature/

r2712@rock (orig r970): otsune | 2006-06-12 06:08:15 +0900
fix wordpress.
Add mainichi-msn Photo and separate handle.
Add http://www.actiblog.com/

r2713@rock (orig r971): otsune | 2006-06-12 07:02:23 +0900
refine livedoorblog.pl
fix miss.

r2714@rock (orig r972): miyagawa | 2006-06-12 13:25:28 +0900
extract_title should be case insensitive. via http://d.hatena.ne.jp/sfujiwara/20060611/1150051152
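The example below shows what a case-insensitive title match looks like. The body of `extract_title` is a hypothetical sketch, because the real function lives elsewhere in Plagger and is not shown here. The point is the `/i` flag; `/s` is added so that a title split across lines still matches.

```perl
use strict;
use warnings;

# Hypothetical extract_title: /i lets <TITLE> and <Title> match as well as
# <title>; /s lets the title text span line breaks.
sub extract_title {
    my $html = shift;
    my ($title) = $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is;
    return $title;    # undef when there is no <title> element
}

print extract_title("<HTML><HEAD><TITLE>\n  Plagger\n</TITLE></HEAD>"), "\n";
```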
r2715@rock (orig r973): miyagawa | 2006-06-12 13:39:12 +0900
rewrite config doesn't die even if it can't rewrite because of permission problem
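The non-fatal write in `rewrite_config` (`open ... or return $self->log(error => ...)`) follows this pattern. Below is a minimal sketch that uses a hypothetical `save_config` helper, with a plain callback standing in for Plagger's logger.

```perl
use strict;
use warnings;

# Hypothetical helper: write the rewritten config, but log and return false
# instead of dying when the file cannot be opened (e.g. no permission).
sub save_config {
    my ($path, $data, $log) = @_;
    open my $fh, '>', $path or do {
        $log->("$path: $!");
        return 0;
    };
    print {$fh} $data;
    close $fh or do { $log->("$path: $!"); return 0 };
    return 1;
}

my @errors;
my $ok = save_config('/nonexistent-dir/config.yaml', "global: {}\n",
                     sub { push @errors, shift });
print $ok ? "saved\n" : "skipped: $errors[0]\n";
```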
r2716@rock (orig r974): miyagawa | 2006-06-12 13:43:25 +0900
skip all livedoorkeyword link
r2719@rock (orig r975): otsune | 2006-06-12 14:50:19 +0900
fix misc regexp

r2720@rock (orig r976): miyagawa | 2006-06-12 15:44:57 +0900
support handle only in livedoorblog.pl to work with aggregated feeds
r2721@rock (orig r977): miyagawa | 2006-06-12 18:22:40 +0900
TruePermalink? for blogpeople redirector
r2722@rock (orig r978): otsune | 2006-06-12 22:14:03 +0900
oops 'Unmatched ( in regex;'

r2723@rock (orig r979): youpy | 2006-06-13 10:21:42 +0900
add mailman upgrader


r2724@rock (orig r980): youpy | 2006-06-13 10:28:19 +0900
fix handle regexp


r2727@rock (orig r983): miyagawa | 2006-06-13 19:00:22 +0900
Subscription::Planet: add feedster.jp
r2728@rock (orig r984): miyagawa | 2006-06-13 19:06:06 +0900
use lang/all on feedster.jp
r2734@rock (orig r985): otsune | 2006-06-13 22:11:21 +0900
fix regexp

r2735@rock (orig r986): miyagawa | 2006-06-14 00:34:01 +0900
new plugin Notify::Beep
r2736@rock (orig r987): miyagawa | 2006-06-14 00:34:40 +0900
planet: remove unnecessary bit
r2737@rock (orig r988): miyagawa | 2006-06-14 00:35:03 +0900
update example to use sixapart-std
r2738@rock (orig r989): otsune | 2006-06-14 02:55:47 +0900
remove icon_re. RecentComment can't get it

r2745@rock (orig r990): miyagawa | 2006-06-14 12:07:29 +0900
t/core is for developer test and not needed for installers
r2746@rock (orig r991): miyagawa | 2006-06-14 12:49:00 +0900
support mixi_tos_paranoia mode
r2747@rock (orig r992): miyagawa | 2006-06-14 13:10:40 +0900
title would be ok
r2792@rock (orig r993): miyagawa | 2006-06-16 15:04:12 +0900
  • New plugin Subscription::Bookmarks (and its IE subclass) to read IE favorites.
r2793@rock (orig r994): miyagawa | 2006-06-16 15:11:52 +0900
added TODO as comment
r2794@rock (orig r995): youpy | 2006-06-17 20:36:18 +0900
add Plugin::Subscription::Bookmarks::Safari


r2795@rock (orig r996): youpy | 2006-06-17 21:39:18 +0900
add tag support by folder name


r2796@rock (orig r997): youpy | 2006-06-18 15:41:59 +0900
use $uri->file when scheme is 'file'


r2797@rock (orig r998): youpy | 2006-06-18 15:42:56 +0900
add Plugin::Subscription::Bookmarks::Mozilla


r2798@rock (orig r999): miyagawa | 2006-06-19 15:23:13 +0900
bump URI::Fetch req
r2800@rock (orig r1000): miyagawa | 2006-06-22 00:26:46 +0900
dependency for Bookmarks::Safari. 1000th commit!
r2801@rock (orig r1001): miyagawa | 2006-06-22 00:30:57 +0900
fix config rewriting bug when the password contains regexp metachars. via http://d.hatena.ne.jp/sfujiwara/20060621/1150899012
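The r1001 fix is the `\Q$old_value\E` in the substitution in `rewrite_config`. The sketch below reproduces that substitution in a standalone, hypothetical `rewrite_value` function. Without `\Q...\E`, a password such as `a+b(c` would be compiled as a pattern and would die with `Unmatched ( in regex`.

```perl
use strict;
use warnings;

# Same substitution shape as Plagger::rewrite_config: replace the value of
# "key:" on an indented YAML line. \Q...\E quotes the old value so that
# characters such as ( + * . are matched literally, not as regex syntax.
sub rewrite_value {
    my ($data, $key, $old, $new) = @_;
    my $count = $data =~ s/^(\s+$key:\s+)\Q$old\E[ \t]*$/$1$new/m;
    return ($count ? 1 : 0, $data);
}

my $yaml = "global:\n  password: a+b(c\n";
my ($ok, $out) = rewrite_value($yaml, 'password', 'a+b(c', 'REWRITTEN');
print $out;
```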
r2802@rock (orig r1002): otsune | 2006-06-22 00:54:24 +0900
add http://www.computerworld.jp/ http://autopage.teacup.com/
fix headlines_yahoo_jp (Thanks woremacx)
fix goo blog

r2803@rock (orig r1003): miyagawa | 2006-06-22 01:10:00 +0900
import drawnboy's EntryFullText yamls via http://svn.nowherenear.net/repos/public/misc/eft/
r2804@rock (orig r1004): miyagawa | 2006-06-22 01:10:39 +0900
update AUTHOR
r2805@rock (orig r1005): s_nobu | 2006-06-22 06:17:15 +0900
require HTML::Entities for enclosure support.

r2807@rock (orig r1006): miyagawa | 2006-06-22 15:46:30 +0900
URI::Fetch 0.07 is broken (I was a moron), reverting back to 0.06 for now
r2808@rock (orig r1007): miyagawa | 2006-06-22 16:04:48 +0900
packaging 0.7.3
  • Property svn:keywords set to Id Revision
package Plagger;
use strict;
our $VERSION = '0.7.3';

use 5.8.1;
use Carp;
use Data::Dumper;
use File::Copy;
use File::Basename;
use File::Find::Rule;
use File::Spec;
use YAML;
use UNIVERSAL::require;

use base qw( Class::Accessor::Fast );
__PACKAGE__->mk_accessors( qw(conf update subscription plugins_path cache) );

use Plagger::Cache;
use Plagger::CacheProxy;
use Plagger::Date;
use Plagger::Entry;
use Plagger::Feed;
use Plagger::Subscription;
use Plagger::Template;
use Plagger::Update;

sub context { undef }

sub bootstrap {
    my($class, %opt) = @_;

    my $self = bless {
        conf  => {},
        update => Plagger::Update->new,
        subscription => Plagger::Subscription->new,
        plugins_path => {},
        plugins => [],
        rewrite_tasks => []
    }, $class;

    my $config;
    if (-e $opt{config} && -r _) {
        $config = YAML::LoadFile($opt{config});
        $self->{config_path} = $opt{config};
    } elsif (ref($opt{config}) && ref($opt{config}) eq 'SCALAR') {
        $config = YAML::Load(${$opt{config}});
    } elsif (ref($opt{config}) && ref($opt{config}) eq 'HASH') {
        $config = $opt{config};
    } else {
        croak "Plagger->bootstrap: $opt{config}: $!";
    }

    $self->load_include($config);
    $self->{conf} = $config->{global};
    $self->{conf}->{log} ||= { level => 'debug' };

    no warnings 'redefine';
    local *Plagger::context = sub { $self };

    $self->load_recipes($config);
    $self->load_cache($opt{config});
    $self->load_plugins(@{ $config->{plugins} || [] });
    $self->rewrite_config if @{ $self->{rewrite_tasks} };
    $self->run();
}

sub add_rewrite_task {
    my($self, @stuff) = @_;
    push @{ $self->{rewrite_tasks} }, \@stuff;
}

sub rewrite_config {
    my $self = shift;

    unless ($self->{config_path}) {
        $self->log(warn => "config is not loaded from file. Ignoring rewrite tasks.");
        return;
    }

    open my $fh, $self->{config_path} or $self->error("$self->{config_path}: $!");
    my $data = join '', <$fh>;
    close $fh;

    my $old = $data;
    my $count;

    # xxx this is a quick hack: It should be a YAML roundtrip maybe
    for my $task (@{ $self->{rewrite_tasks} }) {
        my($key, $old_value, $new_value ) = @$task;
        if ($data =~ s/^(\s+$key:\s+)\Q$old_value\E[ \t]*$/$1$new_value/m) {
            $count++;
        } else {
            $self->log(error => "$key: $old_value not found in $self->{config_path}");
        }
    }

    if ($count) {
        File::Copy::copy( $self->{config_path}, $self->{config_path} . ".bak" );
        open my $fh, ">", $self->{config_path} or return $self->log(error => "$self->{config_path}: $!");
        print $fh $data;
        close $fh;

        $self->log(info => "Rewrote $count password(s) and saved to $self->{config_path}");
    }
}

sub load_include {
    my($self, $config) = @_;

    return unless $config->{include};
    for (@{ $config->{include} }) {
        my $include = YAML::LoadFile($_);

        for my $key (keys %{ $include }) {
            my $add = $include->{$key};
            unless ($config->{$key}) {
                $config->{$key} = $add;
                next;
            }
            if (ref($config->{$key}) eq 'HASH') {
                next unless ref($add) eq 'HASH';
                for (keys %{ $include->{$key} }) {
                    $config->{$key}->{$_} = $include->{$key}->{$_};
                }
            } elsif (ref($config->{$key}) eq 'ARRAY') {
                # append the included value(s) to the existing list
                $add = [ $add ] unless ref($add) eq 'ARRAY';
                push(@{ $config->{$key} }, @{ $add });
            } elsif ($add) {
                $config->{$key} = $add;
            }
        }
    }
}

sub load_recipes {
    my($self, $config) = @_;

    for (@{ $config->{recipes} }) {
        $self->error("no such recipe: $_") unless $config->{define_recipes}->{$_};
        my $plugin = $config->{define_recipes}->{$_};
        $plugin = [ $plugin ] unless ref($plugin) eq 'ARRAY';
        push(@{ $config->{plugins} }, @{ $plugin });
    }
}

sub load_cache {
    my($self, $config) = @_;

    # derive the cache directory from the config filename:
    # config.yaml -> ~/.plagger, foo.yaml -> ~/.plagger-foo
    my $base = ( basename($config) =~ /^(.*?)\.yaml$/ )[0] || 'config';
    my $dir  = $base eq 'config' ? ".plagger" : ".plagger-$base";

    $self->{conf}->{cache} ||= {
        base => File::Spec->catfile($ENV{HOME}, $dir),
    };

    $self->cache( Plagger::Cache->new($self->{conf}->{cache}) );
}

sub load_plugins {
    my($self, @plugins) = @_;

    if ($self->conf->{plugin_path}) {
        for my $path (@{ $self->conf->{plugin_path} }) {
            opendir my $dir, $path or do {
                $self->log(warn => "$path: $!");
                next;
            };
            while (my $ent = readdir $dir) {
                next if $ent =~ /^\./;
                $ent = File::Spec->catfile($path, $ent);
                if (-f $ent && $ent =~ /\.pm$/) {
                    $self->add_plugin_path($ent);
                } elsif (-d $ent) {
                    my $lib = File::Spec->catfile($ent, "lib");
                    if (-e $lib && -d _) {
                        $self->log(debug => "Add $lib to INC path");
                        unshift @INC, $lib;
                    } else {
                        my $rule = File::Find::Rule->new;
                           $rule->file;
                           $rule->name('*.pm');
                        my @modules = $rule->in($ent);
                        for my $module (@modules) {
                            $self->add_plugin_path($module);
                        }
                    }
                }
            }
        }
    }

    for my $plugin (@plugins) {
        $self->load_plugin($plugin) unless $plugin->{disable};
    }
}

sub add_plugin_path {
    my($self, $file) = @_;

    my $pkg = $self->extract_package($file)
        or die "Can't find package from $file";
    $self->plugins_path->{$pkg} = $file;
    $self->log(debug => "$file is added as a path to plugin $pkg");
}

sub extract_package {
    my($self, $file) = @_;

    open my $fh, $file or die "$file: $!";
    while (<$fh>) {
        /^package (Plagger::Plugin::.*?);/ and return $1;
    }

    return;
}

sub autoload_plugin {
    my($self, $plugin) = @_;
    unless ($self->is_loaded($plugin)) {
        $self->load_plugin({ module => $plugin });
    }
}

sub is_loaded {
    my($self, $stuff) = @_;

    my $sub = ref $stuff && ref $stuff eq 'Regexp'
        ? sub { $_[0] =~ $stuff }
        : sub { $_[0] eq $stuff };

    for my $plugin (@{ $self->{plugins} }) {
        my $module = ref $plugin;
           $module =~ s/^Plagger::Plugin:://;
        return 1 if $sub->($module);
    }

    return;
}

sub load_plugin {
    my($self, $config) = @_;

    my $module = delete $config->{module};
    $module =~ s/^Plagger::Plugin:://;
    $module = "Plagger::Plugin::$module";

    if ($module->isa('Plagger::Plugin')) {
        $self->log(debug => "$module is loaded elsewhere ... maybe .t script?");
    } elsif (my $path = $self->plugins_path->{$module}) {
        eval { require $path } or die $@;
    } else {
        $module->require or die $@;
    }

    $self->log(info => "plugin $module loaded.");

    my $plugin = $module->new($config);
    $plugin->cache( Plagger::CacheProxy->new($plugin, $self->cache) );
    $plugin->register($self);

    push @{$self->{plugins}}, $plugin;
}

sub register_hook {
    my($self, $plugin, @hooks) = @_;
    while (my($hook, $callback) = splice @hooks, 0, 2) {
        # default the plugin's rule_hook to this hook unless one is already set
        $plugin->rule_hook($hook) unless $plugin->rule_hook;

        push @{ $self->{hooks}->{$hook} }, +{
            callback  => $callback,
            plugin    => $plugin,
        };
    }
}

sub run_hook {
    my($self, $hook, $args, $once) = @_;
    for my $action (@{ $self->{hooks}->{$hook} }) {
        my $plugin = $action->{plugin};
        if ( $plugin->rule->dispatch($plugin, $hook, $args) ) {
            my $done = $action->{callback}->($plugin, $self, $args);
            return 1 if $once && $done;
        }
    }

    # with $once set, reaching here means no callback handled the hook (failure)
    return if $once;
}

sub run_hook_once {
    my($self, $hook, $args) = @_;
    $self->run_hook($hook, $args, 1);
}

sub run {
    my $self = shift;

    $self->run_hook('plugin.init');
    $self->run_hook('subscription.load');

    unless ( $self->is_loaded(qr/^Aggregator::/) ) {
        $self->load_plugin({ module => 'Aggregator::Simple' });
    }

    for my $feed ($self->subscription->feeds) {
        if (my $sub = $feed->aggregator) {
            $sub->($self, { feed => $feed });
        } else {
            my $ok = $self->run_hook_once('customfeed.handle', { feed => $feed });
            if (!$ok) {
                Plagger->context->log(error => $feed->url . " is not aggregated by any aggregator");
                Plagger->context->subscription->delete_feed($feed);
            }
        }
    }

    $self->run_hook('aggregator.finalize');

    for my $feed ($self->update->feeds) {
        for my $entry ($feed->entries) {
            $self->run_hook('update.entry.fixup', { feed => $feed, entry => $entry });
        }
        $self->run_hook('update.feed.fixup', { feed => $feed });
    }

    $self->run_hook('update.fixup');

    $self->run_hook('smartfeed.init');
    for my $feed ($self->update->feeds) {
        for my $entry ($feed->entries) {
            $self->run_hook('smartfeed.entry', { feed => $feed, entry => $entry });
        }
        $self->run_hook('smartfeed.feed', { feed => $feed });
    }
    $self->run_hook('smartfeed.finalize');

    $self->run_hook('publish.init');
    for my $feed ($self->update->feeds) {
        for my $entry ($feed->entries) {
            $self->run_hook('publish.entry.fixup', { feed => $feed, entry => $entry });
        }

        $self->run_hook('publish.feed', { feed => $feed });

        for my $entry ($feed->entries) {
            $self->run_hook('publish.entry', { feed => $feed, entry => $entry });
        }
    }

    $self->run_hook('publish.finalize');
}

sub log {
    my($self, $level, $msg, %opt) = @_;

    # hack to get the original caller as Plugin or Rule
    my $caller = $opt{caller};
    unless ($caller) {
        my $i = 0;
        while (my $c = caller($i++)) {
            last if $c !~ /Plugin|Rule/;
            $caller = $c;
        }
        $caller ||= caller(0);
    }

    chomp($msg);
    if ($self->should_log($level)) {
        warn "$caller [$level] $msg\n";
    }
}

my %levels = (
    debug => 0,
    warn  => 1,
    info  => 2,
    error => 3,
);

sub should_log {
    my($self, $level) = @_;
    $levels{$level} >= $levels{$self->conf->{log}->{level}};
}

sub error {
    my($self, $msg) = @_;
    my($caller, $filename, $line) = caller(0);
    chomp($msg);
    die "$caller [fatal] $msg at line $line\n";
}

sub dumper {
    my($self, $stuff) = @_;
    local $Data::Dumper::Indent = 1;
    $self->log(debug => Dumper($stuff));
}

sub template {
    my $self = shift;
    my $plugin = shift || (caller)[0];
    Plagger::Template->new($self, $plugin->class_id);
}

sub templatize {
    my($self, $plugin, $file, $vars) = @_;
    my $tt = $self->template($plugin);
    $tt->process($file, $vars, \my $out) or $self->error($tt->error);
    $out;
}


1;
__END__

=head1 NAME

Plagger - Pluggable RSS/Atom Aggregator

=head1 SYNOPSIS

  % plagger -c config.yaml

=head1 DESCRIPTION

Plagger is a pluggable RSS/Atom feed aggregator and remixer platform.

Everything is implemented as a small plugin, just like qpsmtpd, blosxom
and perlbal. All you have to do is describe a flow of aggregation,
filter, syndication, publishing and notification plugins in a YAML
config file.

See L<http://plagger.org/> for cookbook examples, a quickstart document,
the development community (mailing list and IRC), the Subversion
repository and bug tracking.

=head1 BUGS / DEVELOPMENT

If you find a bug, or have an idea for a nice plugin and want help
with it, drop us a line on our mailing list
L<http://groups.google.com/group/plagger-dev> or stop by the IRC
channel C<#plagger> at irc.freenode.net.

=head1 AUTHOR

Tatsuhiko Miyagawa E<lt>miyagawa@bulknews.netE<gt>

See the I<AUTHORS> file for the names of all the contributors.

=head1 LICENSE

Except where otherwise noted, Plagger is free software; you can
redistribute it and/or modify it under the same terms as Perl itself.

=head1 SEE ALSO

L<http://plagger.org/>

=cut
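`run` drives everything through named hooks. Plugins register callbacks with `register_hook`, and `run_hook_once('customfeed.handle', ...)` stops at the first callback that returns true; a feed that no callback claims is logged and deleted. The sketch below models only that bookkeeping, in a hypothetical `MiniContext` class. Plagger's per-plugin rule dispatch (`$plugin->rule->dispatch`) is left out, so every callback is assumed to match.

```perl
use strict;
use warnings;

# Hypothetical, stripped-down model of Plagger's hook table:
# hook name => list of { plugin, callback }, kept in registration order.
package MiniContext;

sub new { bless { hooks => {} }, shift }

sub register_hook {
    my ($self, $plugin, @hooks) = @_;
    while (my ($hook, $callback) = splice @hooks, 0, 2) {
        push @{ $self->{hooks}{$hook} }, { plugin => $plugin, callback => $callback };
    }
}

sub run_hook {
    my ($self, $hook, $args, $once) = @_;
    for my $action (@{ $self->{hooks}{$hook} || [] }) {
        # assumption: no rule filtering, every registered callback runs
        my $done = $action->{callback}->($action->{plugin}, $self, $args);
        return 1 if $once && $done;
    }
    return;    # with $once: no callback claimed the hook
}

sub run_hook_once { my ($self, $hook, $args) = @_; $self->run_hook($hook, $args, 1) }

package main;

my $c = MiniContext->new;
my @seen;
$c->register_hook('A', 'customfeed.handle' => sub { push @seen, 'A'; 0 });
$c->register_hook('B', 'customfeed.handle' => sub { push @seen, 'B'; 1 });
$c->register_hook('C', 'customfeed.handle' => sub { push @seen, 'C'; 1 });

my $ok = $c->run_hook_once('customfeed.handle', { feed => 'http://example.com/' });
print "handled=$ok by @seen\n";   # C is never called
```

Callbacks run in registration order. Since `load_plugins` loads plugins in the order they are listed, that order decides which CustomFeed handler wins when more than one could claim a feed.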