recently fixed TODO items
Creole is a wannabe standard markup for all wikis.
It's an agreement achieved by many wiki engine developers.
Currently MoinMoin and Oddmuse support it, and a lot of wikis (dokuwiki, tiddlywiki, pmwiki, podwiki, etc) have partial support. More info on support: http://www.wikicreole.org/wiki/Engines
Some useful information:
There is a perl module: Text::WikiCreole
Syntax file for vim: http://www.peter-hoffmann.com/code/vim/ (since a typical ikiwiki user usually uses an external editor)
Should be pretty easy to add a plugin to do it using Text::WikiCreole. --Joey
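A minimal sketch of what such a plugin could look like (assuming Text::WikiCreole's creole_parse function, which is its documented entry point; untested):

    #!/usr/bin/perl
    # creole markup via Text::WikiCreole
    package IkiWiki::Plugin::creole;

    use warnings;
    use strict;
    use IkiWiki;

    sub import { #{{{
        hook(type => "htmlize", id => "creole", call => \&htmlize);
    } #}}}

    sub htmlize (@) { #{{{
        my %params=@_;
        my $content = $params{content};

        # load lazily, so the plugin degrades gracefully if the
        # module isn't installed
        eval q{use Text::WikiCreole};
        return $content if $@;

        return creole_parse($content);
    } #}}}

    1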
done
Posted Thu Jun 19 19:01:36 2008
The bzr plugin echoes "added: somefile.mdwn" when it adds somefile.mdwn to the repository. As a result, the redirect performed after a new article is created fails, because the bzr output comes before the HTTP headers.
The fix is simply to call bzr
with the --quiet switch. Something like this applied to bzr.pm works for me:
46c46
< my @cmdline = ("bzr", $config{srcdir}, "update");
---
> my @cmdline = ("bzr", "update", "--quiet", $config{srcdir});
74c74
< my @cmdline = ("bzr", "commit", "-m", $message, "--author", $user,
---
> my @cmdline = ("bzr", "commit", "--quiet", "-m", $message, "--author", $user,
86c86
< my @cmdline = ("bzr", "add", "$config{srcdir}/$file");
---
> my @cmdline = ("bzr", "add", "--quiet", "$config{srcdir}/$file");
94a95,97
> eval q{use CGI 'escapeHTML'};
> error($@) if $@;
>
Posted Tue Jun 10 13:46:01 2008
done, although I left off the escapeHTML thing, which seems to be in your patch by accident.
(Please use diff -u BTW..) --Joey
The search plugin could use xapian terms to allow some special searches. For example, "title:foo", or "link:somepage", or "author:foo", or "copyright:GPL".
Reference: http://xapian.org/docs/omega/termprefixes.html
done for title and link, which seem like the really useful ones.
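A rough sketch of how such prefixed terms might be added with the Search::Xapian bindings (the S prefix is omega's convention for titles; XLINK is a made-up custom prefix; variable values are illustrative):

    use Search::Xapian qw(:all);

    my $db = Search::Xapian::WritableDatabase->new(
        ".ikiwiki/xapian/default", DB_CREATE_OR_OPEN);
    my $doc = Search::Xapian::Document->new();
    my $tg = Search::Xapian::TermGenerator->new();
    $tg->set_stemmer(Search::Xapian::Stem->new("english"));
    $tg->set_document($doc);

    # illustrative values for the page being indexed
    my $title = "foo";
    my @links = ("somepage");
    my $content = "the page body text";

    $tg->index_text($title, 1, "S");            # supports title:foo
    $doc->add_term("XLINK".$_) foreach @links;  # supports link:somepage
    $tg->index_text($content);                  # plain body text
    $db->add_document($doc);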
Posted Tue Jun 10 13:46:01 2008
The aggregate plugin's locking is suboptimal.
There should be no need to lock the wiki while aggregating -- it's annoying that long aggregate runs can block edits from happening. However, not locking would present problems. One is, if an aggregate run is happening, and the feed is removed, it could continue adding pages for that feed. Those pages would then become orphaned, and stick around, since the feed that had created them is gone, and thus there's no indication that they should be removed.
To fix that, garbage collect any pages that were created by aggregation once their feed is gone.
Are there other things that could happen while it's aggregating that it should check for?
Well, things like the feed url etc could change, and it would have to merge in such changes before saving the aggregation state. New feeds could also be added, feeds could be moved from one source page to another.
Merging that feed info seems doable: just re-load the aggregation state from disk, and set the message, lastupdate, numposts, and error fields to their new values if the feed still exists.
Another part of the mess is that it needs to avoid stacking multiple aggregate processes up if aggregation is very slow. Currently this is done by taking the lock in nonblocking mode, and not aggregating if it's locked. This has various problems, for example a page edit at the right time can prevent aggregation from happening.
Adding another lock just for aggregation could solve this. Check that lock (in checkconfig) and exit if another aggregator holds it.
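A minimal sketch of such a lock, assuming a lock file under the wikistatedir (the helper name is made up):

    use Fcntl qw(:flock);

    sub aggregatelock_try () {
        my $lockfile = "$config{wikistatedir}/aggregatelock";
        open(my $fh, '>', $lockfile) || error("cannot open $lockfile: $!");
        if (! flock($fh, LOCK_EX | LOCK_NB)) {
            return undef; # another aggregator is running; give up
        }
        return $fh; # keep the handle open to hold the lock
    }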
The other part of the mess is that it currently does aggregation in checkconfig, locking the wiki for that, and loading state, and then dropping the lock, unloading state, and letting the render happen. Which reloads state. That state reloading is tricky to do just right.
A simple fix: move the aggregation to the new 'render' hook. Then state would already be loaded, and there would be no reason to worry about reloading it.
Or aggregation could be kept in checkconfig, like so:
- load aggregation state
- get list of feeds needing aggregation
- exit if none
- attempt to take aggregation lock, exit if another aggregation is happening
- fork a child process to do the aggregation
  - load wiki state (needed for aggregation to run)
  - aggregate
  - lock wiki
  - reload aggregation state
  - merge in aggregation state changes
  - unlock wiki
  - drop aggregation lock
  - force rebuild of sourcepages of feeds that were aggregated
- exit checkconfig and continue with usual refresh process
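A rough sketch of the fork step in that plan (loadstate, mergestate, and savestate are illustrative helper names, not existing functions; IkiWiki::lockwiki, unlockwiki, and loadindex are real internals):

    defined(my $pid = fork()) || error("fork failed: $!");
    if (! $pid) {
        # child: do the slow aggregation without holding the wiki lock
        IkiWiki::loadindex();   # wiki state, needed for aggregation
        aggregate();
        IkiWiki::lockwiki();
        loadstate();            # re-read aggregation state from disk
        mergestate();           # merge in changes made while aggregating
        savestate();
        IkiWiki::unlockwiki();
        exit 0;
    }
    waitpid($pid, 0);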
done
Posted Tue Jun 10 13:46:00 2008
done, using xapian-omega! --Joey
After using it for a while, my feeling is that hyperestraier, as used in the search plugin, is not robust enough for ikiwiki. It doesn't upgrade well, and it has a habit of sig-11 on certain input from time to time.
So some other engine should be found and used instead.
Enrico had one that he was using for debtags stuff that looked pretty good. That was Xapian, which has perl bindings in libsearch-xapian-perl. The nice thing about xapian is that it does a ranked search so it understands what words are most important in a search. (So does Lucene..) Another nice thing is it supports "more documents like this one" kind of search. --Joey
xapian
I've investigated xapian briefly. I think a custom xapian indexer and use of omega for cgi searches could work well for ikiwiki. --Joey
indexer
A custom indexer is needed because omindex isn't good enough for ikiwiki's needs for incremental rendering. (And because, since ikiwiki has page info in memory, it's silly to write it to disk and have omindex read it back.)
The indexer would run as an ikiwiki hook. It needs to be passed the page name and the content. Which hook to use is an open question. Possibilities:
filter
- Since this runs before preprocess, only the actual text written on the page would be indexed, not text generated by directives, pulled in by inlining, etc. There's something to be said for that, and something to be said against it. It would also get markdown-formatted content, mostly, though it would still need to strip html, and probably strip preprocessor directives too.

sanitize
- Would get the htmlized content, so would need to strip html. Preprocessor directive output would be indexed. Doesn't get a destpage parameter, making optimisation hard.

format
- Would get the entire html page, including the page template. Probably not a good choice, as indexing the same template for each page is unnecessary.
The hook would remove any html from the content, and index it. It would need to add the same document data that omindex would.
The indexer (and deleter) will need a way to figure out the ids in xapian of the documents to delete. One way is storing the id of each page in the ikiwiki index.
The other way would be adding a special term to the xapian db that can be used with replacedocumentbyterm/deletedocumentbyterm. Hmm, let's use a term named "P".
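Continuing the sketch above, that could look something like this (assuming the Search::Xapian bindings for those calls; $page is the page name):

    my $pageterm = "P".$page;   # unique id term for this page
    $doc->add_term($pageterm);
    $db->replace_document_by_term($pageterm, $doc); # index or re-index
    # ... and when the page is deleted:
    $db->delete_document_by_term($pageterm);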
The hook should try to avoid re-indexing pages that have not changed since they were last indexed. One problem is that, if a page with an inline is built, every inlined item will have each hook run on it, and so a naive hook would index each of those items, even though none of them has necessarily changed. Date stamps are one possibility. Another would be to have the hook skip indexing when %preprocessing is set (IkiWiki.pm would need to expose that variable). Another approach would be to use a needsbuild hook and only index the pages that are being built.
cgi
The cgi hook would exec omega to handle the searching, much as is done with estseek in the current search plugin.
It would first set OMEGA_CONFIG_FILE=.ikiwiki/omega.conf; that omega.conf would set database_dir=.ikiwiki/xapian, and probably also set a custom template_dir, which would have modified templates branded for ikiwiki. So the actual xapian db would be in .ikiwiki/xapian/default/.
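A sketch of what that cgi hook might look like, modeled on the estseek handling in the current plugin (P is the CGI parameter omega uses for the query):

    sub cgi ($) { #{{{
        my $cgi=shift;

        if (defined $cgi->param('P')) {
            $ENV{OMEGA_CONFIG_FILE}="$config{wikistatedir}/omega.conf";
            chdir($config{wikistatedir}) || error("chdir: $!");
            exec("omega") || error("omega failed");
        }
    } #}}}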
lucene
I've done a bit of prototyping on this. The current hip search library is Lucene. There's a Perl port called Plucene. Given that it's already packaged, as libplucene-perl, I assumed it would be a good starting point. I've written a very rough patch against IkiWiki/Plugin/search.pm to handle the indexing side (there's no facility to view the results yet, although I have a command-line interface working). That's below, and should apply to SVN trunk.

Of course, there are problems:

- Plucene throws up a warning when running under Taint mode. There's a patch on the mailing list, but I haven't tried applying it yet. So for now you'll have to build IkiWiki with NOTAINT=1 make install.
- If I kill ikiwiki while it's indexing, I can screw up Plucene's locks. I suspect that this will be an easy fix.

There is a C++ port of Lucene which is packaged as libclucene0. The Perl interface to this is called Lucene. This is supposed to be significantly faster, and presumably won't have the taint bug. The API is virtually the same, so it will be easy to switch over. I'd use this now, were it not for the lack of a package. (I assume you won't want to make core functionality depend on installing a module from CPAN.) I've never built a Debian package before, so I can either learn and then try building this, or somebody else could do the honours.

If this seems a sensible approach, I'll write the CGI interface, and clean up the plugin. -- Ben
The weird thing about lucene is that these are all reimplementations of it. Thank you java.. The C++ version seems like a better choice to me (packages are trivial). --Joey
Might I suggest renaming the "search" plugin to "hyperestraier", and then creating new search plugins for different engines? No reason to pick a single replacement. --JoshTriplett
Index: IkiWiki/Plugin/search.pm
===================================================================
--- IkiWiki/Plugin/search.pm    (revision 2755)
+++ IkiWiki/Plugin/search.pm    (working copy)
@@ -1,33 +1,55 @@
 #!/usr/bin/perl
-# hyperestraier search engine plugin
 package IkiWiki::Plugin::search;
 
 use warnings;
 use strict;
 use IkiWiki;
+use Plucene::Analysis::SimpleAnalyzer;
+use Plucene::Document;
+use Plucene::Document::Field;
+use Plucene::Index::Reader;
+use Plucene::Index::Writer;
+use Plucene::QueryParser;
+use Plucene::Search::HitCollector;
+use Plucene::Search::IndexSearcher;
+
+#TODO: Run the Plucene optimiser after a rebuild
+#TODO: CGI query interface
+
+my $PLUCENE_DIR;
+# $config{wikistatedir} may not be defined at this point, so we delay
+# setting $PLUCENE_DIR until a subroutine actually needs it.
+sub init () {
+    error("Plucene: Statedir <$config{wikistatedir}> does not exist!")
+        unless -e $config{wikistatedir};
+    $PLUCENE_DIR = $config{wikistatedir}.'/plucene';
+}
+
 sub import { #{{{
-    hook(type => "getopt", id => "hyperestraier",
-        call => \&getopt);
-    hook(type => "checkconfig", id => "hyperestraier",
+    hook(type => "checkconfig", id => "plucene",
         call => \&checkconfig);
-    hook(type => "pagetemplate", id => "hyperestraier",
-        call => \&pagetemplate);
-    hook(type => "delete", id => "hyperestraier",
+    hook(type => "delete", id => "plucene",
         call => \&delete);
-    hook(type => "change", id => "hyperestraier",
+    hook(type => "change", id => "plucene",
         call => \&change);
-    hook(type => "cgi", id => "hyperestraier",
-        call => \&cgi);
 } # }}}
 
-sub getopt () { #{{{
-    eval q{use Getopt::Long};
-    error($@) if $@;
-    Getopt::Long::Configure('pass_through');
-    GetOptions("estseek=s" => \$config{estseek});
-} #}}}
+sub writer {
+    init();
+    return Plucene::Index::Writer->new(
+        $PLUCENE_DIR, Plucene::Analysis::SimpleAnalyzer->new(),
+        (-e "$PLUCENE_DIR/segments" ? 0 : 1));
+}
+
+#TODO: Better name for this function.
+sub src2rendered_abs (@) {
+    return map { Encode::encode_utf8($config{destdir}."/$_") }
+        map { @{$renderedfiles{pagename($_)}} }
+        grep { defined pagetype($_) } @_;
+}
+
 sub checkconfig () { #{{{
     foreach my $required (qw(url cgiurl)) {
         if (! length $config{$required}) {
@@ -36,112 +58,55 @@
     }
 } #}}}
 
-my $form;
-sub pagetemplate (@) { #{{{
-    my %params=@_;
-    my $page=$params{page};
-    my $template=$params{template};
+#my $form;
+#sub pagetemplate (@) { #{{{
+#    my %params=@_;
+#    my $page=$params{page};
+#    my $template=$params{template};
+#
+#    # Add search box to page header.
+#    if ($template->query(name => "searchform")) {
+#        if (! defined $form) {
+#            my $searchform = template("searchform.tmpl", blind_cache => 1);
+#            $searchform->param(searchaction => $config{cgiurl});
+#            $form=$searchform->output;
+#        }
+#
+#        $template->param(searchform => $form);
+#    }
+#} #}}}
 
-    # Add search box to page header.
-    if ($template->query(name => "searchform")) {
-        if (! defined $form) {
-            my $searchform = template("searchform.tmpl", blind_cache => 1);
-            $searchform->param(searchaction => $config{cgiurl});
-            $form=$searchform->output;
-        }
-
-        $template->param(searchform => $form);
-    }
-} #}}}
-
 sub delete (@) { #{{{
-    debug(gettext("cleaning hyperestraier search index"));
-    estcmd("purge -cl");
-    estcfg();
+    debug("Plucene: purging: ".join(',',@_));
+    init();
+    my $reader = Plucene::Index::Reader->open($PLUCENE_DIR);
+    my @files = src2rendered_abs(@_);
+    for (@files) {
+        $reader->delete_term(
+            Plucene::Index::Term->new({ field => "id", text => $_ }));
+    }
+    $reader->close;
 } #}}}
 
 sub change (@) { #{{{
-    debug(gettext("updating hyperestraier search index"));
-    estcmd("gather -cm -bc -cl -sd",
-        map {
-            Encode::encode_utf8($config{destdir}."/".$_)
-                foreach @{$renderedfiles{pagename($_)}};
-        } @_
-    );
-    estcfg();
+    debug("Plucene: updating search index");
+    init();
+    #TODO: Do we want to index source or rendered files?
+    #TODO: Store author, tags, etc. in distinct fields; may need new API hook.
+    my @files = src2rendered_abs(@_);
+    my $writer = writer();
+
+    for my $file (@files) {
+        my $doc = Plucene::Document->new;
+        $doc->add(Plucene::Document::Field->Keyword(id => $file));
+        my $data;
+        eval { $data = readfile($file) };
+        if ($@) {
+            debug("Plucene: can't read <$file> - $@");
+            next;
+        }
+        debug("Plucene: indexing <$file> (".length($data).")");
+        $doc->add(Plucene::Document::Field->UnStored('text' => $data));
+        $writer->add_document($doc);
+    }
 } #}}}
-
-sub cgi ($) { #{{{
-    my $cgi=shift;
-
-    if (defined $cgi->param('phrase') || defined $cgi->param("navi")) {
-        # only works for GET requests
-        chdir("$config{wikistatedir}/hyperestraier") || error("chdir: $!");
-        exec("./".IkiWiki::basename($config{cgiurl})) || error("estseek.cgi failed");
-    }
-} #}}}
-
-my $configured=0;
-sub estcfg () { #{{{
-    return if $configured;
-    $configured=1;
-
-    my $estdir="$config{wikistatedir}/hyperestraier";
-    my $cgi=IkiWiki::basename($config{cgiurl});
-    $cgi=~s/\..*$//;
-
-    my $newfile="$estdir/$cgi.tmpl.new";
-    my $cleanup = sub { unlink($newfile) };
-    open(TEMPLATE, ">:utf8", $newfile) || error("open $newfile: $!", $cleanup);
-    print TEMPLATE IkiWiki::misctemplate("search",
-        "\n\n\n\n\n\n",
-        baseurl => IkiWiki::dirname($config{cgiurl})."/") ||
-        error("write $newfile: $!", $cleanup);
-    close TEMPLATE || error("save $newfile: $!", $cleanup);
-    rename($newfile, "$estdir/$cgi.tmpl") ||
-        error("rename $newfile: $!", $cleanup);
-
-    $newfile="$estdir/$cgi.conf";
-    open(TEMPLATE, ">$newfile") || error("open $newfile: $!", $cleanup);
-    my $template=template("estseek.conf");
-    eval q{use Cwd 'abs_path'};
-    $template->param(
-        index => $estdir,
-        tmplfile => "$estdir/$cgi.tmpl",
-        destdir => abs_path($config{destdir}),
-        url => $config{url},
-    );
-    print TEMPLATE $template->output || error("write $newfile: $!", $cleanup);
-    close TEMPLATE || error("save $newfile: $!", $cleanup);
-    rename($newfile, "$estdir/$cgi.conf") ||
-        error("rename $newfile: $!", $cleanup);
-
-    $cgi="$estdir/".IkiWiki::basename($config{cgiurl});
-    unlink($cgi);
-    my $estseek = defined $config{estseek} ?
-        $config{estseek} : '/usr/lib/estraier/estseek.cgi';
-    symlink($estseek, $cgi) || error("symlink $estseek $cgi: $!");
-} # }}}
-
-sub estcmd ($;@) { #{{{
-    my @params=split(' ', shift);
-    push @params, "-cl", "$config{wikistatedir}/hyperestraier";
-    if (@_) {
-        push @params, "-";
-    }
-
-    my $pid=open(CHILD, "|-");
-    if ($pid) {
-        # parent
-        foreach (@_) {
-            print CHILD "$_\n";
-        }
-        close(CHILD) || print STDERR "estcmd @params exited nonzero: $?\n";
-    }
-    else {
-        # child
-        open(STDOUT, "/dev/null"); # shut it up (closing won't work)
-        exec("estcmd", @params) || error("can't run estcmd");
-    }
-} #}}}
-
-1
+1;

Posted Tue Jun 10 13:46:00 2008
Supporting or switching to MultiMarkdown would take care of a few of the outstanding feature requests. Quoting from the MultiMarkdown site:
MultiMarkdown is a modification of John Gruber's original Markdown.pl file. It uses the same basic syntax, with several additions:
- I have added a basic metadata feature, to allow the inclusion of metadata within a document that can be used in different ways based on the output format.
- I have allowed the automatic use of cross-references within a Markdown document. For instance, you can easily jump to [the Introduction][Introduction].
- I have incorporated John's proposed syntax for footnotes. Since he has not determined the output format, I created my own. Mainly, I wanted to be able to add footnotes to the LaTeX output; I was less concerned with the XHTML formatting.
- Most importantly, however, I have changed the way that the processed output is created, so that it is quite simple to export Markdown syntax to a variety of outputs. By setting the Format metadata to complete, you generate a well-formed XHTML page. You can then use XSLT to convert to virtually any format you like.
MultiMarkdown would solve the BibTeX request, and the multiple output formats would make the print_link request an easy fix. MultiMarkdown is actively developed and can be found at:
Posted Tue Jun 10 13:46:00 2008
I don't think MultiMarkdown solves the BibTeX request, but it might solve the request for LaTeX output. --JoshTriplett
Unless there's a way to disable a zillion of the features, please no. Do not switch to it. One thing that I like about markdown, as opposed to most other ASCII markup languages, is that it has at least a bit of moderation in its syntax (although it could be even simpler): there's not yet another reserved character lurking behind every corner. Not so in multimarkdown anymore. Footnotes, bibliography and internal references I could use, and they do not add any complex syntax: it's all inside the already reserved sequences of bracketed stuff. (If you can even say that ASCII markup languages have reserved sequences, as they randomly decide to interpret stuff, never actually failing on illegal input, like a proper language to write any serious documentation in would do.) But tables, math, and so on, no thanks! Too much syntax! Syntax overload! Bzzzt! I don't want mischievous syntaxes lurking behind every corner, out to get me. --tuomov
ikiwiki already supports MultiMarkdown, since it has the same API as Markdown. So if you install it as Markdown.pm (or as /usr/bin/markdown), it should Just Work. It would also be easy to support some other extension such as mmdwn to use multimarkdown installed as MultiMarkdown.pm, if someone wanted to do that for some reason -- just copy the mdwn plugin and lightly modify it. --Joey
There's now a multimarkdown setup file option that uses Text::MultiMarkdown for .mdwn files. done --Joey
It would be nice if the user could set the timezone of the wiki, and have ikiwiki render the pages with that timezone.
This is nice for shared hosting, and other situations where the user doesn't have control over the server timezone.
Posted Tue Jun 10 13:46:00 2008
done via the ENV setting in the setup file. --Joey
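For example, a setup file might contain something like this (hypothetical value; the variables in ENV are simply passed through to the ikiwiki process):

    use IkiWiki::Setup::Standard {
        # ...
        # environment variables to set when running ikiwiki;
        # TZ controls the timezone dates are rendered in
        ENV => { TZ => "America/New_York" },
    };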
Expand a comment so you know which bit to uncomment if you want to turn on feeds for recentchanges.
diff --git a/doc/ikiwiki.setup b/doc/ikiwiki.setup
index 99c81cf..7ca7687 100644
--- a/doc/ikiwiki.setup
+++ b/doc/ikiwiki.setup
@@ -91,9 +91,9 @@ use IkiWiki::Setup::Standard {
#},
],
- # Default to generating rss feeds for blogs?
+ # Default to generating rss feeds for blogs/recentchanges?
#rss => 1,
- # Default to generating atom feeds for blogs?
+ # Default to generating atom feeds for blogs/recentchanges?
#atom => 1,
# Allow generating feeds even if not generated by default?
#allowrss => 1,
Posted Tue Jun 10 13:46:00 2008
Hmm, recentchanges is just a blog. Of course the word "blog" is perhaps being used in too broad a sense here, since it tends to imply personal opinions, commentary, not-a-journalist, sitting-in-ones-underwear-typing, and lots of other fairly silly stuff. But I don't know of a better word w/o all these connotations. I've reworded it to not use the term "blog".. done --Joey
In some situations, it makes sense to have the repository in use by ikiwiki reside on a different machine. In that case, one could juggle SSH keys for the post-update hook. A better way may be to provide a different do parameter handler for the CGI, which would pull new commits into the working clone and refresh the wiki. Then, the remote post-update hook could just wget that URL. To prevent simple DoS attacks, one might assign a simple password.
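A sketch of what such a CGI handler could look like (do=pull and the pull_passwd setting are made-up names for illustration; the feature as actually merged became the pinger and pingee plugins noted below):

    sub cgi ($) { #{{{
        my $cgi=shift;

        if (defined $cgi->param('do') && $cgi->param('do') eq 'pull') {
            error("bad password")
                unless $cgi->param('passwd') eq $config{pull_passwd};
            chdir($config{srcdir}) || error("chdir: $!");
            system("git", "pull", "--quiet") == 0
                || error("git pull failed");
            IkiWiki::refresh();
            print "Content-type: text/plain\r\n\r\nrefreshed\n";
        }
    } #}}}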
Posted Tue Jun 10 13:45:59 2008
done via the pinger and pingee plugins --Joey
I'd like the ability to use a shortcut, but declare an explicit link text
rather than using the link text defined on shortcuts. For example, if I
create a shortcut protogit
pointing to files in the xcb/proto.git gitweb
repository, I don't always want to use the path to the file as the link text;
I would like to link to src/xcb.xsd, but use the link text "XML Schema for the X
Window System protocol". --JoshTriplett
If I understand you correctly, you can use Markdown's [your link text](the path or URL) syntax. Using your example: XML Schema for the X Window System protocol
If I don't understand this, can you give an HTML example? --JeremyReed
The problem is that shortcuts don't allow that. We would like to use the shortcuts plugin, but add a descriptive text -- in this case [[xcbgit src/xcb.xsd|XML Schema...]]. The file src/xcb.xsd could be at any url, and the point of shortcuts is that you get to shorten it. --Ethan
Some clarifications: You can always write something like
[XML Schema for the X Window System Protocol](http://gitweb.freedesktop.org/?p=xcb/proto.git;a=blob;hb=HEAD;f=src/xcb.xsd)
to get XML Schema for the X Window System Protocol. However, I want to define a shortcut to save the typing. If I define something like protogit pointing to http://gitweb.freedesktop.org/?p=xcb/proto.git;a=blob;hb=HEAD;f=%s, then I can write [[protogit src/xcb.xsd]]; however, I then can't change the link text to anything other than what the shortcut defines as the link text. I want to write something like [[XML Schema for the X Window System Protocol|protogit src/xcb.xsd]], just as I would write a wikilink like [[the_shortcuts_on_this_wiki|shortcuts]] to get the shortcuts on this wiki. (The order you suggest, with the preprocessor directive first, seems quite confusing since wikilinks work the other way around.) --JoshTriplett

How about [xcbgit XML_Schema|src/xcb.xsd]? That's the same way round as a wikilink, if you look at it the right way. The syntax Josh suggests is not currently possible in ikiwiki.
However.. Short wikilinks has some similar objectives in a way, and over there a similar syntax to what Josh proposes was suggested. So maybe I should modify how ikiwiki preprocessors work to make it doable. Although, I seem to have come up with a clear alternative syntax over there. --Joey
One possible alternative would be a general [[url ]] scheme for all kinds of links. As mentioned in Short wikilinks, I have wanted a way to enter links to the wiki with markdown-style references,
specifying the actual target elsewhere from the text, with just a short reference in the text. To facilitate automatic conversion from an earlier (already markdownised) "blog", I finally ended up writing a custom plugin that simply gets the location of a wiki page, and uses markdown mechanisms:
Here [is][1] a link.
[1]: [[l a_page_in_the_wiki]]
Obviously [this]([[l another_page]]) also works, although the syntax is quite cumbersome.
So the 'l' plugin inserts the location of the page there, and markdown does the rest. My plugin currently fails if it can't find the page, as that is sufficient for my needs. Differing colouring for non-existing pages is not doable in a straightforward manner with this approach.
For external links, that is no concern, however. So you could define for each shortcut an alternative directive that inserts the URL. Perhaps [[url shortcutname params]] or \[[@shortcutname params]] (if the preprocessor supported the @), and this could be extended to local links in an obvious manner: [[url page]] or [[@page]]. Now, if you could just get rid of the parentheses for markdown, for the short inline links... --tuomov (who'd really rather not have two separate linking mechanisms: ikiwiki's heavy syntax and markdown's lighter one)
I've added code to make the [[foo 123]] syntax accept a desc parameter. I've named it like this to signal that it overrides the desc provided at shortcut definition time. %s is expanded in it as well.
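For example, with the protogit shortcut above, one could presumably write:

    \[[protogit src/xcb.xsd desc="XML Schema for the X Window System protocol"]]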
done -- Adeodato Simó
Posted Sun Jun 8 00:14:52 2008