Major updates to text-slicer plugin

* In the interests of performance and expressiveness, switched to using a SAX parser instead of a DOM implementation.
* Use extensible declarative rules to control the slicing process
* Added new optional set of rules for slicing by heading, where the paragraphs underneath a heading are packed into the same tiddler as the heading
* Added a modal dialogue for specifying parameters when slicing in the browser
Jermolene 2017-12-14 14:16:54 +00:00
parent f128650c6e
commit e344c38349
39 changed files with 2943 additions and 713 deletions

File diff suppressed because one or more lines are too long

View File

@ -2,7 +2,7 @@
"description": "Tools for slicing up long texts into individual tiddlers",
"plugins": [
"tiddlywiki/text-slicer",
"tiddlywiki/xmldom"
"tiddlywiki/sax"
],
"languages": [
],

View File

@ -0,0 +1,41 @@
The ISC License
Copyright (c) Isaac Z. Schlueter and Contributors
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR
IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
====
`String.fromCodePoint` by Mathias Bynens used according to terms of MIT
License, as follows:
Copyright Mathias Bynens <https://mathiasbynens.be/>
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

View File

@ -0,0 +1,225 @@
# sax js
A sax-style parser for XML and HTML.
Designed with [node](http://nodejs.org/) in mind, but should work fine in
the browser or other CommonJS implementations.
## What This Is
* A very simple tool to parse through an XML string.
* A stepping stone to a streaming HTML parser.
* A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML
docs.
## What This Is (probably) Not
* An HTML Parser - That's a fine goal, but this isn't it. It's just
XML.
* A DOM Builder - You can use it to build an object model out of XML,
but it doesn't do that out of the box.
* XSLT - No DOM = no querying.
* 100% Compliant with (some other SAX implementation) - Most SAX
implementations are in Java and do a lot more than this does.
* An XML Validator - It does a little validation when in strict mode, but
not much.
* A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic
masochism.
* A DTD-aware Thing - Fetching DTDs is a much bigger job.
## Regarding `<!DOCTYPE`s and `<!ENTITY`s
The parser will handle the basic XML entities in text nodes and attribute
values: `&amp; &lt; &gt; &apos; &quot;`. It's possible to define additional
entities in XML by putting them in the DTD. This parser doesn't do anything
with that. If you want to listen to the `ondoctype` event, and then fetch
the doctypes, and read the entities and add them to `parser.ENTITIES`, then
be my guest.
Unknown entities will fail in strict mode, and in loose mode, will pass
through unmolested.
## Usage
```javascript
var sax = require("./lib/sax"),
strict = true, // set to false for html-mode
parser = sax.parser(strict);
parser.onerror = function (e) {
// an error happened.
};
parser.ontext = function (t) {
// got some text. t is the string of text.
};
parser.onopentag = function (node) {
// opened a tag. node has "name" and "attributes"
};
parser.onattribute = function (attr) {
// an attribute. attr has "name" and "value"
};
parser.onend = function () {
// parser stream is done, and ready to have more stuff written to it.
};
parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();
// stream usage
// takes the same options as the parser
var saxStream = require("sax").createStream(strict, options)
saxStream.on("error", function (e) {
// unhandled errors will throw, since this is a proper node
// event emitter.
console.error("error!", e)
// clear the error
this._parser.error = null
this._parser.resume()
})
saxStream.on("opentag", function (node) {
// same object as above
})
// pipe is supported, and it's readable/writable
// same chunks coming in also go out.
fs.createReadStream("file.xml")
.pipe(saxStream)
.pipe(fs.createWriteStream("file-copy.xml"))
```
## Arguments
Pass the following arguments to the parser function. All are optional.
`strict` - Boolean. Whether or not to be a jerk. Default: `false`.
`opt` - Object bag of settings regarding string formatting. All default to `false`.
Settings supported:
* `trim` - Boolean. Whether or not to trim text and comment nodes.
* `normalize` - Boolean. If true, then turn any whitespace into a single
space.
* `lowercase` - Boolean. If true, then lowercase tag names and attribute names
in loose mode, rather than uppercasing them.
* `xmlns` - Boolean. If true, then namespaces are supported.
* `position` - Boolean. If false, then don't track line/col/position.
* `strictEntities` - Boolean. If true, only parse [predefined XML
entities](http://www.w3.org/TR/REC-xml/#sec-predefined-ent)
(`&amp;`, `&apos;`, `&gt;`, `&lt;`, and `&quot;`)
## Methods
`write` - Write bytes onto the stream. You don't have to do this all at
once. You can keep writing as much as you want.
`close` - Close the stream. Once closed, no more data may be written until
it is done processing the buffer, which is signaled by the `end` event.
`resume` - To gracefully handle errors, assign a listener to the `error`
event. Then, when the error is taken care of, you can call `resume` to
continue parsing. Otherwise, the parser will not continue while in an error
state.
## Members
At all times, the parser object will have the following members:
`line`, `column`, `position` - Indications of the position in the XML
document where the parser currently is looking.
`startTagPosition` - Indicates the position where the current tag starts.
`closed` - Boolean indicating whether or not the parser can be written to.
If it's `true`, then wait for the `ready` event to write again.
`strict` - Boolean indicating whether or not the parser is a jerk.
`opt` - Any options passed into the constructor.
`tag` - The current tag being dealt with.
And a bunch of other stuff that you probably shouldn't touch.
## Events
All events emit with a single argument. To listen to an event, assign a
function to `on<eventname>`. Functions get executed in the this-context of
the parser object. The list of supported events is also in the exported
`EVENTS` array.
When using the stream interface, assign handlers using the EventEmitter
`on` function in the normal fashion.
`error` - Indication that something bad happened. The error will be hanging
out on `parser.error`, and must be deleted before parsing can continue. By
listening to this event, you can keep an eye on that kind of stuff. Note:
this happens *much* more in strict mode. Argument: instance of `Error`.
`text` - Text node. Argument: string of text.
`doctype` - The `<!DOCTYPE` declaration. Argument: doctype string.
`processinginstruction` - Stuff like `<?xml foo="blerg" ?>`. Argument:
object with `name` and `body` members. Attributes are not parsed, as
processing instructions have implementation dependent semantics.
`sgmldeclaration` - Random SGML declarations. Stuff like `<!ENTITY p>`
would trigger this kind of event. This is a weird thing to support, so it
might go away at some point. SAX isn't intended to be used to parse SGML,
after all.
`opentagstart` - Emitted immediately when the tag name is available,
but before any attributes are encountered. Argument: object with a
`name` field and an empty `attributes` set. Note that this is the
same object that will later be emitted in the `opentag` event.
`opentag` - An opening tag. Argument: object with `name` and `attributes`.
In non-strict mode, tag names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, then it will contain
namespace binding information on the `ns` member, and will have a
`local`, `prefix`, and `uri` member.
`closetag` - A closing tag. In loose mode, tags are auto-closed if their
parent closes. In strict mode, well-formedness is enforced. Note that
self-closing tags will have `closetag` emitted immediately after `opentag`.
Argument: tag name.
`attribute` - An attribute node. Argument: object with `name` and `value`.
In non-strict mode, attribute names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, it will also contain namespace
information.
`comment` - A comment node. Argument: the string of the comment.
`opencdata` - The opening tag of a `<![CDATA[` block.
`cdata` - The text of a `<![CDATA[` block. Since `<![CDATA[` blocks can get
quite large, this event may fire multiple times for a single block, if it
is broken up into multiple `write()`s. Argument: the string of random
character data.
`closecdata` - The closing tag (`]]>`) of a `<![CDATA[` block.
`opennamespace` - If the `xmlns` option is set, then this event will
signal the start of a new namespace binding.
`closenamespace` - If the `xmlns` option is set, then this event will
signal the end of a namespace binding.
`end` - Indication that the closed stream has ended.
`ready` - Indication that the stream has reset, and is ready to be written
to.
`noscript` - In non-strict mode, `<script>` tags trigger a `"script"`
event, and their contents are not checked for special xml characters.
If you pass `noscript: true`, then this behavior is suppressed.
## Reporting Problems
It's best to write a failing test if you find an issue. I will always
accept pull requests with failing tests if they demonstrate intended
behavior, but it is very hard to figure out what issue you're describing
without a test. Writing a test is also the best way for you yourself
to figure out if you really understand the issue you think you have with
sax-js.

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,18 @@
{
"tiddlers": [
{
"file": "lib/sax.js",
"fields": {
"type": "application/javascript",
"title": "$:/plugins/tiddlywiki/sax/sax.js",
"module-type": "library"
}
},{
"file": "LICENSE",
"fields": {
"type": "text/plain",
"title": "$:/plugins/tiddlywiki/sax/license"
}
}
]
}

View File

@ -0,0 +1,7 @@
{
"title": "$:/plugins/tiddlywiki/sax",
"description": "Wrapper for sax.js library by Isaac Z. Schlueter",
"author": "Isaac Z. Schlueter",
"core-version": ">=5.0.0",
"list": "readme license"
}

View File

@ -0,0 +1,3 @@
title: $:/plugins/tiddlywiki/sax/readme
This plugin packages [[sax.js|https://github.com/isaacs/sax-js]] for use by other plugins. It does not provide any end-user visible features.

View File

@ -0,0 +1,7 @@
title: $:/plugins/tiddlywiki/text-slicer/docs/exporters
tags: $:/plugins/tiddlywiki/text-slicer/docs
caption: Exporters
Documents can be saved under Node.js, or previewed in the browser.
[TBD]

View File

@ -0,0 +1,97 @@
title: $:/plugins/tiddlywiki/text-slicer/docs/internals
tags: $:/plugins/tiddlywiki/text-slicer/docs
caption: Internals
! Introduction
The slicing process is performed by a simple automaton that scans the document and applies simple declarative rules to yield a collection of tiddlers.
The automaton processes the incoming XML document starting with the root element and then recursively visits each child node and their children. Actions are triggered as each component of the document is encountered:
* Opening tags of elements
* Closing tags of elements
* Text nodes
Components are matched against the current set of rules to determine what actions should be performed. They can include a combination of:
* Starting a new tiddler with specified fields
* Rendering the markup for the current tag into the current tiddler
* Appending the content of the current text node to the current tiddler
* Threading tiddlers to their parents using a combination of the `list` and `tags` fields
! Slicing State Data
As the automaton performs its scan, it maintains the following state information:
* ''chunks'' - an array of tiddlers without titles, addressed by their numeric index. The title field is reused to hold the plain text of the chunk that is later used to generate the final title for the tiddler
* ''currentChunk'' - the numeric index of the chunk currently being filled, or `null` if there is no current chunk
* ''parentStack'' - a stack of parent chunks stored as `{chunk: <chunk-index>, actions: <actions>}`
At the start, the special document chunk is created and pushed onto the stack of parent chunks.
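As a rough illustration, the state described above could be initialised along the following lines. This is a sketch rather than the plugin's actual code: the function names `createSlicerState` and `startNewChunk` are hypothetical, as are the `toc-type` value used for the document chunk and the empty `actions` object recorded for it on the parent stack.

```javascript
// Hypothetical sketch of the slicing state; not the plugin's actual implementation
function createSlicerState(documentFields) {
	var state = {
		chunks: [],         // tiddlers without titles, addressed by numeric index
		currentChunk: null, // index of the chunk being filled, or null if there is none
		parentStack: []     // entries of the form {chunk: <chunk-index>, actions: <actions>}
	};
	// The special document chunk is created first and becomes the initial parent
	state.chunks.push(Object.assign({title: ""}, documentFields));
	state.parentStack.push({chunk: 0, actions: {}}); // empty actions object is an assumption
	return state;
}

// Start a new chunk with the specified fields and make it the current chunk
function startNewChunk(state, fields) {
	state.chunks.push(Object.assign({title: ""}, fields));
	state.currentChunk = state.chunks.length - 1;
	return state.currentChunk;
}

// Usage: one document chunk plus one paragraph chunk
var state = createSlicerState({"toc-type": "document"}); // illustrative field value
var index = startNewChunk(state, {"toc-type": "paragraph"});
console.log(index, state.chunks.length, state.parentStack.length);
```

Note that the `title` field starts out empty because, as described above, it is reused to accumulate the plain text from which the final title is later generated.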
! Slicing Rules
Slicing rules are maintained in tiddlers tagged `$:/tags/text-slicer/slicer-rules` with the following fields:
* ''title'' - title of the tiddler containing the list of rules
* ''name'' - short, human readable name for the set of rules
* ''inherits-from'' - (optional) the ''name'' field of another set of rules that should be inherited as a base
* ''text'' - JSON data as described below
The JSON data is an array of rules, each of which is an object with the following fields:
* ''selector'' - a selector string identifying the components to be matched by this rule
* ''actions'' - an object describing the actions to be performed when this selector matches a tag
!! Selectors
The selector format is a simplified form of CSS selectors. They are specified as follows:
* A ''selector'' is a list of one or more ''match expressions'' separated by commas. The rule is triggered if any of the match expressions produce a positive match
* A ''match expression'' is a list of one or more element ''tag names'' separated by spaces. The rule is triggered if the final tag name in the list matches the tag of the current element, and all of the preceding tags in the expression exist as ancestors of the current element in the specified order (but not necessarily as immediate children of one another)
* A ''tag name'' is the textual name of an element
* Tag names in match expressions may optionally be separated by a `>` sign surrounded by spaces to impose the requirement that the left hand element be the immediate parent of the right hand element
!!! Example Selectors
This XML document will be used to illustrate some examples:
```
<a>
<b>
<d>one</d>
</b>
<c>
<d>two</d>
<e>
three
<e>
four
</e>
</e>
</c>
</a>
```
|!Selector |!Matches |
|b |Matches the single `<b>` element |
|d |Matches both of the two `<d>` elements |
|c,d |Matches the `<c>` element and both of the two `<d>` elements |
|c d |Matches the second of the two `<d>` elements |
|a d |Matches both of the two `<d>` elements |
|a > d |Doesn't match anything |
|e |Matches both of the two `<e>` elements |
|c > e |Matches the outermost of the two `<e>` elements |
|e > e |Matches the innermost of the two `<e>` elements |
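The matching behaviour shown in the table can be sketched in a few lines of JavaScript. This is an illustrative approximation, not the plugin's own matcher: the function names are hypothetical, an element is represented simply as the array of tag names leading from the root to that element, and treating `*` as a wildcard is an assumption based on its use in the bundled rule sets.

```javascript
// Hypothetical sketch of selector matching; not the plugin's actual implementation

// Parse a match expression such as "c > e" into steps.
// A step with direct === true must be an immediate child of the preceding step.
function parseMatchExpression(expression) {
	var steps = [],
		direct = false;
	expression.trim().split(/\s+/).forEach(function(token) {
		if(token === ">") {
			direct = true;
		} else if(token) {
			steps.push({tag: token, direct: direct});
			direct = false;
		}
	});
	return steps;
}

// Check that steps[stepIndex] matches path[pathIndex] and that all
// preceding steps match ancestors further up the path
function matchSteps(steps, stepIndex, path, pathIndex) {
	if(pathIndex < 0) {
		return false;
	}
	var step = steps[stepIndex];
	if(step.tag !== "*" && step.tag !== path[pathIndex]) {
		return false;
	}
	if(stepIndex === 0) {
		return true;
	}
	if(step.direct) {
		// The previous step must match the immediate parent
		return matchSteps(steps, stepIndex - 1, path, pathIndex - 1);
	}
	// Otherwise the previous step may match any ancestor
	for(var i = pathIndex - 1; i >= 0; i--) {
		if(matchSteps(steps, stepIndex - 1, path, i)) {
			return true;
		}
	}
	return false;
}

// path is an array of tag names from the root down to the current element
function matchesSelector(selector, path) {
	return selector.split(",").some(function(expression) {
		var steps = parseMatchExpression(expression);
		return steps.length > 0 && matchSteps(steps, steps.length - 1, path, path.length - 1);
	});
}

// The innermost <e> of the example document
console.log(matchesSelector("e > e", ["a", "c", "e", "e"]));
// The first <d> of the example document
console.log(matchesSelector("a > d", ["a", "b", "d"]));
```

The recursive search over ancestors is needed because an expression can mix the space and `>` separators, so a match for an earlier tag cannot always be chosen greedily.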
!! Actions
The ''actions'' property of a slicer rule is an object that can have any of the following optional fields:
* ''startNewChunk'' - causes a new chunk to be started on encountering an opening tag. The value is an object containing the fields to be assigned to the new chunk
* ''isParent'' - causes the new chunk to be marked as a child of the current chunk (boolean flag; only applies if ''startNewChunk'' is set)
* ''headingLevel'' - arrange heading parents according to level (numerical index; only applies if ''startNewChunk'' and ''isParent'' are set)
* ''dontRenderTag'' - disables the default rendering of opening and closing tags to the current chunk. By default the tags are rendered as XML tags, but this can be overridden via ''markup'' (boolean; defaults to ''false'')
* ''isImage'' - identifies an element as representing an HTML image element, with special processing for the ''src'' attribute
* ''markup'' - optional object with either or both of `{wiki: {prefix: <str>,suffix: <str>}}` and `{html: {prefix: <str>,suffix: <str>}}` allowing the rendered tags to be customised
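To make the interaction between ''dontRenderTag'' and ''markup'' more concrete, here is a minimal sketch of how the text emitted around an element might be chosen. The function name `getTagMarkup` is hypothetical, attributes are ignored for brevity, and the precedence shown (''dontRenderTag'' suppressing any custom markup) is an assumption rather than documented behaviour.

```javascript
// Hypothetical sketch: work out the text emitted before and after an element's content
function getTagMarkup(tagName, actions, outputMode) {
	actions = actions || {};
	// Assumption: dontRenderTag suppresses all rendering, including custom markup
	if(actions.dontRenderTag) {
		return {prefix: "", suffix: ""};
	}
	// Custom markup for the current output mode ("wiki" or "html"), if any
	var custom = actions.markup && actions.markup[outputMode];
	if(custom) {
		return {prefix: custom.prefix || "", suffix: custom.suffix || ""};
	}
	// Default rendering as XML tags (attributes are omitted in this sketch)
	return {prefix: "<" + tagName + ">", suffix: "</" + tagName + ">"};
}

// Usage: an emphasis rule with wiki markup only
var emphasis = {markup: {wiki: {prefix: "//", suffix: "//"}}};
var wiki = getTagMarkup("em", emphasis, "wiki");
var html = getTagMarkup("em", emphasis, "html");
console.log(wiki.prefix + "text" + wiki.suffix);
console.log(html.prefix + "text" + html.suffix);
```

In this sketch a rule that only defines `wiki` markup falls back to plain XML tags when the output mode is `html`.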

View File

@ -1,61 +1,6 @@
title: $:/plugins/tiddlywiki/text-slicer/docs
! Introduction
This plugin contains tools to help work with documents that are structured as a hierarchical outline of tiddlers. The structural relationships within the document are expressed as lists: for example, headings have a list specifying the content to be shown under the heading.
TiddlyWiki is built on the philosophy that text is easier to re-use and work with if it is sliced up into separate chunks that can be independently manipulated, and then woven back together to make up stories and narratives for publication.
The components within the text slicer plugin include:
* ''the slicer'', a tool that slices up an existing monolithic document according to the headings, lists and paragraphs. It is available as a toolbar button for the browser, or as a command for use under Node.js
* ''document preview column'', a new sidebar on the left that shows the full text of any documents in the wiki and allows individual tiddlers to be opened with a click
* ''exporters'' for exporting the individual documents as HTML files (and for previewing them)
! Slicing Monolithic Documents
The tool can slice any tiddler that can be rendered as HTML, including both WikiText and HTML itself.
Documents created with Microsoft Word will need to be first converted to HTML. The library [[mammoth.js|https://github.com/mwilliamson/mammoth.js]] is recommended for this purpose.
!! Browser
In the browser, you can slice a monolithic document tiddler using the slicer toolbar button.
!! Node.js
The `--slice` command allows a tiddler to be sliced under Node.js:
```
tiddlywiki mywiki --slice SourceDocument --build index
```
! Working with the Document Preview Column
The document preview column appears at the left side of the screen. The content of headings can be collapsed and expanded to help navigation. Clicking on a tiddler opens the corresponding tiddler in the main story river.
Clicking ''Show toolbar'' causes each tiddler be preceded by a toolbar showing the underlying title. It can be edited directly to rename the tiddler. References to the tiddler in the ''tags'' and ''list'' are automatically updated to reflect the change, but note that links to the tiddler will not be automatically changed.
The following theme tweaks should be applied to enable the preview column:
* Set [[story left position|$:/themes/tiddlywiki/vanilla/metrics/storyleft]] to ''400px'' (or more)
* It is recommended to also set the [[sidebar layout|$:/themes/tiddlywiki/vanilla/options/sidebarlayout]] to ''fluid-fixed''.
! Exporting Documents
Documents can be saved under Node.js, or previewed in the browser.
!! Exporting Documents in the Browser
To preview a document, locate it in the preview column and click the button labelled "View document". The document will open in plain text in a new window. The window will be automatically updated as you work on the document.
!! Exporting Documents under Node.js
[TBD]
! Sliced Document Format
!! Introduction
title: $:/plugins/tiddlywiki/text-slicer/docs/model
tags: $:/plugins/tiddlywiki/text-slicer/docs
caption: Document Model
Individual tiddlers are created for each heading, paragraph and list item. They are linked together into a hierarchical outline using lists.
@ -186,9 +131,3 @@ Notes are available during editing but hidden for static renderings. The slicing
* ''title'': an automatically generated unique title
* ''text'': the text of the note
* ''tags'': any CSS classes found in the HTML are converted into tags
! Document Metadata, Tags and Classes
[TBD]

View File

@ -0,0 +1,14 @@
title: $:/plugins/tiddlywiki/text-slicer/docs/preview
tags: $:/plugins/tiddlywiki/text-slicer/docs
caption: Preview
The document preview column appears at the left side of the screen. The content of headings can be collapsed and expanded to help navigation. Clicking on a tiddler opens the corresponding tiddler in the main story river.
Clicking ''Show toolbar'' causes each tiddler to be preceded by a toolbar showing the underlying title. It can be edited directly to rename the tiddler. References to the tiddler in the ''tags'' and ''list'' are automatically updated to reflect the change, but note that links to the tiddler will not be automatically changed.
The following theme tweaks should be applied to enable the preview column:
* Set [[story left position|$:/themes/tiddlywiki/vanilla/metrics/storyleft]] to ''400px'' (or more)
* It is recommended to also set the [[sidebar layout|$:/themes/tiddlywiki/vanilla/options/sidebarlayout]] to ''fluid-fixed''.
To preview the entire document in a separate window, locate it in the preview column and click the button labelled "View document". The document will open in plain text in a new window. The window will be automatically updated as you work on the document.

View File

@ -0,0 +1,19 @@
title: $:/plugins/tiddlywiki/text-slicer/docs/usage
tags: $:/plugins/tiddlywiki/text-slicer/docs
caption: Usage
The tool can slice any tiddler that can be rendered as HTML, including both WikiText and HTML itself.
Documents created with Microsoft Word will need to be first converted to HTML. The library [[mammoth.js|https://github.com/mwilliamson/mammoth.js]] is recommended for this purpose.
!! Browser
In the browser, you can slice a monolithic document tiddler using the slicer toolbar button.
!! Node.js
The `--slice` command allows a tiddler to be sliced under Node.js:
```
tiddlywiki mywiki --slice SourceDocument --build index
```

View File

@ -0,0 +1,19 @@
title: $:/plugins/tiddlywiki/text-slicer/docs
list: $:/plugins/tiddlywiki/text-slicer/docs/usage $:/plugins/tiddlywiki/text-slicer/docs/preview $:/plugins/tiddlywiki/text-slicer/docs/model $:/plugins/tiddlywiki/text-slicer/docs/exporters $:/plugins/tiddlywiki/text-slicer/docs/internals
! Introduction
This plugin contains tools to help work with documents that are structured as a hierarchical outline of tiddlers. The structural relationships within the document are expressed through the `list` and `tags` fields: for example, headings have a list specifying the chunks of content to be shown under the heading.
The major components within the text slicer plugin include:
* ''the slicer'', a tool that slices up an existing monolithic document according to the headings, lists and paragraphs. It is available as a toolbar button for the browser, or as a command for use under Node.js
* ''document preview column'', a new sidebar on the left that shows the full text of any documents in the wiki and allows individual tiddlers to be opened with a click
* ''templates'' for previewing and exporting the individual documents as HTML files
Minor components include:
* a new `list-children` filter that returns all the descendants listed in the `list` field of the selected tiddlers
* a new canned filter for [[advanced search|$:/AdvancedSearch]] that lists orphan tiddlers that are not part of any document
<<tabs "[all[tiddlers+shadows]tag[$:/plugins/tiddlywiki/text-slicer/docs]!has[draft.of]]" "$:/plugins/tiddlywiki/text-slicer/docs/usage">>

View File

@ -0,0 +1,6 @@
title: $:/plugins/tiddlywiki/text-slicer/readme
This plugin contains tools to help slice up long texts into individual tiddlers. It currently works directly with XHTML documents and with Microsoft Word compatible DOCX documents via conversion to HTML.
It is an expression of the philosophy of TiddlyWiki: that text is easier to re-use and work with if it is sliced up into separate chunks that can be independently manipulated, and then woven back together to make up stories and narratives for publication.

View File

@ -1,3 +0,0 @@
title: $:/plugins/tiddlywiki/text-slicer/exporters/full-doc
{{||$:/plugins/tiddlywiki/text-slicer/templates/static/document}}

View File

@ -4,5 +4,18 @@ description: Slice a hierarchical document into individual tiddlers
Slices the specified tiddler
```
--slice <title>
--slice <source-title> [<dest-title>] [<slicer-rules>] [<output-mode>]
```
* ''source-title'': Title of the tiddler to be sliced
* ''dest-title'': Base title for the generated output tiddlers
* ''slicer-rules'': Name of the slicer rules to use for the operation (see below)
* ''output-mode'': Format of the generated tiddlers, either "html" (the default) or "wiki"
The plugin comes with several built-in sets of slicer rules:
* //html-by-paragraph//: Slice every paragraph into a separate tiddler, threaded by heading
* //html-by-heading//: Slice every heading into separate threaded tiddlers
* //html-plain-paragraphs//: Slice every paragraph into a separate tiddler, without formatting or headings
Advanced users can create or edit their own slicer rules for precise control over the conversion process.

View File

@ -34,13 +34,22 @@ Command.prototype.execute = function() {
wiki = this.commander.wiki,
sourceTitle = this.params[0],
destTitle = this.params[1],
slicerRules = this.params[2],
outputMode = this.params[3],
slicer = new textSlicer.Slicer({
sourceTiddlerTitle: sourceTitle,
baseTiddlerTitle: destTitle,
wiki: wiki
slicerRules: slicerRules,
outputMode: outputMode,
wiki: wiki,
callback: function(err,tiddlers) {
if(err) {
return self.callback(err);
}
wiki.addTiddlers(tiddlers);
self.callback();
}
});
wiki.addTiddlers(slicer.getTiddlers());
$tw.utils.nextTick(this.callback);
return null;
};

View File

@ -0,0 +1,189 @@
title: $:/plugins/tiddlywiki/text-slicer/slicer-rules/html-by-heading.json
name: html-by-heading
caption: By Heading (HTML)
description: One tiddler per heading, threaded (HTML)
inherits-from: html-by-paragraph
type: application/json
tags: $:/tags/text-slicer/slicer-rules
[
{
"selector": "address,center,fieldset,form,hr,iframe,isindex,noframes,noscript,ol,ul,li,pre,table",
"actions": {}
},
{
"selector": "blockquote",
"actions": {
"markup": {
"wiki": {
"prefix": "<<<\n",
"suffix": "<<<\n"
}
}
}
},
{
"selector": "dd",
"actions": {
"markup": {
"wiki": {
"prefix": "\n: ",
"suffix": "\n"
}
}
}
},
{
"selector": "dl",
"actions": {
"markup": {
"wiki": {
"prefix": "\n",
"suffix": "\n"
}
}
}
},
{
"selector": "dt",
"actions": {
"markup": {
"wiki": {
"prefix": "\n; ",
"suffix": "\n"
}
}
}
},
{
"selector": "h1",
"actions": {
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h1"
},
"mergeNext": true,
"setCaption": true,
"isParent": true,
"headingLevel": 1,
"markup": {
"wiki": {
"prefix": "! ",
"suffix": "\n"
}
}
}
},
{
"selector": "h2",
"actions": {
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h2"
},
"mergeNext": true,
"setCaption": true,
"isParent": true,
"headingLevel": 2,
"markup": {
"wiki": {
"prefix": "!! ",
"suffix": "\n"
}
}
}
},
{
"selector": "h3",
"actions": {
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h3"
},
"mergeNext": true,
"setCaption": true,
"isParent": true,
"headingLevel": 3,
"markup": {
"wiki": {
"prefix": "!!! ",
"suffix": "\n"
}
}
}
},
{
"selector": "h4",
"actions": {
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h4"
},
"mergeNext": true,
"setCaption": true,
"isParent": true,
"headingLevel": 4,
"markup": {
"wiki": {
"prefix": "!!!! ",
"suffix": "\n"
}
}
}
},
{
"selector": "h5",
"actions": {
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h5"
},
"mergeNext": true,
"setCaption": true,
"isParent": true,
"headingLevel": 5,
"markup": {
"wiki": {
"prefix": "!!!!! ",
"suffix": "\n"
}
}
}
},
{
"selector": "h6",
"actions": {
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h6"
},
"mergeNext": true,
"setCaption": true,
"isParent": true,
"headingLevel": 6,
"markup": {
"wiki": {
"prefix": "!!!!!! ",
"suffix": "\n"
}
}
}
},
{
"selector": "p",
"actions": {
"markup": {
"wiki": {
"prefix": "",
"suffix": "\n"
}
}
}
},
{
"selector": "*",
"actions": {
"dontRenderTag": true
}
}
]

View File

@ -0,0 +1,265 @@
title: $:/plugins/tiddlywiki/text-slicer/slicer-rules/html-by-paragraph.json
name: html-by-paragraph
caption: By Paragraph (HTML)
description: One tiddler per paragraph, threaded by heading (HTML)
type: application/json
tags: $:/tags/text-slicer/slicer-rules
[
{
"selector": "address,center,fieldset,form,hr,iframe,isindex,noframes,noscript,pre,table",
"actions": {
"startNewChunk": {
"toc-type": "paragraph"
}
}
},
{
"selector": "blockquote",
"actions": {
"startNewChunk": {
"toc-type": "paragraph"
},
"markup": {
"wiki": {
"prefix": "<<<\n",
"suffix": "<<<\n"
}
}
}
},
{
"selector": "body,div,head,html,span",
"actions": {
"dontRenderTag": true
}
},
{
"selector": "dd",
"actions": {
"dontRenderTag": true,
"startNewChunk": {
"toc-type": "definition"
}
}
},
{
"selector": "dl",
"actions": {
"dontRenderTag": true,
"isParent": true,
"startNewChunk": {
"toc-type": "def-list",
"toc-list-filter": "[list<currentTiddler>!has[draft.of]]"
}
}
},
{
"selector": "dt",
"actions": {
"dontRenderTag": true,
"startNewChunk": {
"toc-type": "term"
}
}
},
{
"selector": "em,i",
"actions": {
"markup": {
"wiki": {
"prefix": "//",
"suffix": "//"
}
}
}
},
{
"selector": "h1",
"actions": {
"dontRenderTag": true,
"isParent": true,
"headingLevel": 1,
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h1"
}
}
},
{
"selector": "h2",
"actions": {
"dontRenderTag": true,
"isParent": true,
"headingLevel": 2,
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h2"
}
}
},
{
"selector": "h3",
"actions": {
"dontRenderTag": true,
"isParent": true,
"headingLevel": 3,
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h3"
}
}
},
{
"selector": "h4",
"actions": {
"dontRenderTag": true,
"isParent": true,
"headingLevel": 4,
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h4"
}
}
},
{
"selector": "h5",
"actions": {
"dontRenderTag": true,
"isParent": true,
"headingLevel": 5,
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h5"
}
}
},
{
"selector": "h6",
"actions": {
"dontRenderTag": true,
"isParent": true,
"headingLevel": 6,
"startNewChunk": {
"toc-type": "heading",
"toc-heading-level": "h6"
}
}
},
{
"selector": "img",
"actions": {
"isImage": true
}
},
{
"selector": "li",
"actions": {
"dontRenderTag": true,
"startNewChunk": {
"toc-type": "item"
}
}
},
{
"selector": "ol",
"actions": {
"dontRenderTag": true,
"isParent": true,
"startNewChunk": {
"toc-type": "list",
"toc-list-type": "ol",
"toc-list-filter": "[list<currentTiddler>!has[draft.of]]"
}
}
},
{
"selector": "p",
"actions": {
"dontRenderTag": true,
"startNewChunk": {
"toc-type": "paragraph"
}
}
},
{
"selector": "strike",
"actions": {
"markup": {
"wiki": {
"prefix": "~~",
"suffix": "~~"
}
}
}
},
{
"selector": "strong,b",
"actions": {
"markup": {
"wiki": {
"prefix": "''",
"suffix": "''"
}
}
}
},
{
"selector": "sub",
"actions": {
"markup": {
"wiki": {
"prefix": ",,",
"suffix": ",,"
}
}
}
},
{
"selector": "sup",
"actions": {
"markup": {
"wiki": {
"prefix": "^^",
"suffix": "^^"
}
}
}
},
{
"selector": "head > title",
"actions": {
"dontRenderTag": true,
"startNewChunk": {
"toc-type": "paragraph"
}
}
},
{
"selector": "u",
"actions": {
"markup": {
"wiki": {
"prefix": "__",
"suffix": "__"
}
}
}
},
{
"selector": "ul",
"actions": {
"dontRenderTag": true,
"isParent": true,
"startNewChunk": {
"toc-type": "list",
"toc-list-type": "ul",
"toc-list-filter": "[list<currentTiddler>!has[draft.of]]"
}
}
},
{
"selector": "*",
"actions": {}
}
]
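
The `markup` actions in the rules above pair each element with a prefix and suffix for a given output mode. The sketch below illustrates how such an action could be applied to one inline element; `renderInline` is a hypothetical helper written for this illustration and is not part of the plugin, which emits the prefix and suffix from separate open-tag and close-tag handlers.

```javascript
// Rule shape copied from the rules above
const strongRule = {
  selector: "strong,b",
  actions: {markup: {wiki: {prefix: "''", suffix: "''"}}}
};

// Hypothetical helper: wrap the inner text using the markup defined for the
// requested output mode, or fall back to the literal tag if none is defined
function renderInline(tagName, innerText, actions, outputMode) {
  const markupInfo = actions.markup && actions.markup[outputMode];
  if (markupInfo) {
    return markupInfo.prefix + innerText + markupInfo.suffix;
  }
  return "<" + tagName + ">" + innerText + "</" + tagName + ">";
}

console.log(renderInline("b", "bold", strongRule.actions, "wiki"));
console.log(renderInline("b", "bold", strongRule.actions, "html"));
```

Because the rule only defines `wiki` markup, any other output mode falls through to the plain tag.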

View File

@ -0,0 +1,24 @@
title: $:/plugins/tiddlywiki/text-slicer/slicer-rules/html-plain-paragraphs.json
name: html-plain-paragraphs
caption: Plain Paragraphs (HTML)
description: One tiddler per paragraph, without formatting (HTML)
type: application/json
tags: $:/tags/text-slicer/slicer-rules
[
{
"selector": "address,blockquote,center,dd,dt,h1,h2,h3,h4,h5,h6,li,p",
"actions": {
"startNewChunk": {
"toc-type": "paragraph"
},
"dontRenderTag": true
}
},
{
"selector": "*",
"actions": {
"dontRenderTag": true
}
}
]
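
As a rough illustration of what this rule set does, the sketch below runs a hand-written event stream through a much simplified chunker. The event format, `findActions` and `sliceEvents` are assumptions made for this example; the real slicer receives its events from the sax parser and also supports descendant and child selectors, which are omitted here.

```javascript
// The two rules from html-plain-paragraphs, copied from above
const plainRules = [
  {
    selector: "address,blockquote,center,dd,dt,h1,h2,h3,h4,h5,h6,li,p",
    actions: {startNewChunk: {"toc-type": "paragraph"}, dontRenderTag: true}
  },
  {selector: "*", actions: {dontRenderTag: true}}
];

// Simplified lookup: first rule whose comma-separated list contains the tag, or "*"
function findActions(tag) {
  const rule = plainRules.find(function(r) {
    return r.selector === "*" ||
      r.selector.split(",").map(function(s) { return s.trim(); }).includes(tag);
  });
  return rule ? rule.actions : {};
}

// Hypothetical chunker driven by {type, tag, text} events
function sliceEvents(events) {
  const chunks = [];
  const stack = [];
  let current = null;
  events.forEach(function(ev) {
    if (ev.type === "open") {
      const actions = findActions(ev.tag);
      if (actions.startNewChunk) {
        chunks.push(Object.assign({text: ""}, actions.startNewChunk));
        current = chunks.length - 1;
      }
      if (!actions.dontRenderTag && current !== null) {
        chunks[current].text += "<" + ev.tag + ">";
      }
      stack.push(actions);
    } else if (ev.type === "text") {
      // Text outside any chunk is dropped
      if (current !== null) {
        chunks[current].text += ev.text;
      }
    } else if (ev.type === "close") {
      const actions = stack.pop();
      if (!actions.dontRenderTag && current !== null) {
        chunks[current].text += "</" + ev.tag + ">";
      }
      if (actions.startNewChunk) {
        current = null;
      }
    }
  });
  return chunks;
}

// Stand-in for: <p>Hello <b>world</b></p><div>ignored</div><li>Item</li>
const chunks = sliceEvents([
  {type: "open", tag: "p"}, {type: "text", text: "Hello "},
  {type: "open", tag: "b"}, {type: "text", text: "world"}, {type: "close", tag: "b"},
  {type: "close", tag: "p"},
  {type: "open", tag: "div"}, {type: "text", text: "ignored"}, {type: "close", tag: "div"},
  {type: "open", tag: "li"}, {type: "text", text: "Item"}, {type: "close", tag: "li"}
]);
console.log(chunks);
```

Under these assumptions every block-level element listed in the first rule becomes its own chunk, and the catch-all `*` rule strips all remaining tags so that only plain text survives.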

View File

@ -6,11 +6,13 @@ module-type: library
Slice a tiddler or DOM document into individual tiddlers
var slicer = new textSlicer.Slicer(doc,{
slicerRules: JSON data defining slicer rules -or- title of rules taken from tiddlers tagged $:/tags/text-slicer/slicer-rules
sourceTiddlerTitle: tiddler to slice -or-
sourceText: text to slice -or-
sourceDoc: DOM document to
baseTiddlerTitle: "MySlicedTiddlers-",
sourceText: text to slice
outputMode: "html" (default) -or- "wiki"
baseTiddlerTitle: "MySlicedTiddlers-"
role: "sliced-content"
callback: function(err,tiddlers)
});
\*/
@ -20,178 +22,383 @@ var slicer = new textSlicer.Slicer(doc,{
/*global $tw: false */
"use strict";
var DOMParser = $tw.browser ? window.DOMParser : require("$:/plugins/tiddlywiki/xmldom/dom-parser").DOMParser;
function Slicer(options) {
// Quick tests
this.testSlicerRuleMatching();
// Marshal parameters
this.sourceDoc = options.sourceDoc;
this.sourceTiddlerTitle = options.sourceTiddlerTitle;
this.sourceText = options.sourceText;
this.wiki = options.wiki;
if(options.baseTiddlerTitle) {
this.baseTiddlerTitle = options.baseTiddlerTitle
} else {
if(this.sourceTiddlerTitle) {
this.baseTiddlerTitle = "Sliced up " + this.sourceTiddlerTitle;
} else {
this.baseTiddlerTitle = "SlicedTiddler";
}
}
this.role = options.role || "sliced-html";
// Initialise state
this.extractedTiddlers = {}; // Hashmap of created tiddlers
this.parentStack = []; // Stack of parent heading or list
this.containerStack = []; // Stack of elements containing other elements
this.slicers = $tw.modules.applyMethods("slicer");
this.anchors = Object.create(null); // Hashmap of HTML anchor ID to tiddler title
// Get the DOM document for the source text
if(!this.sourceDoc) {
if(this.sourceTiddlerTitle) {
this.sourceDoc = this.parseTiddlerText(this.sourceTiddlerTitle);
} else {
this.sourceDoc = this.parseHtmlText(this.sourceText);
}
this.outputMode = options.outputMode || "html";
this.callbackFn = options.callback;
// Get the slicer rules
var nameSlicerRules = null;
if(!options.slicerRules) {
nameSlicerRules = "html-by-paragraph";
this.slicerRules = this.loadSlicerRules(nameSlicerRules);
} else if(typeof options.slicerRules === "string") {
nameSlicerRules = options.slicerRules;
this.slicerRules = this.loadSlicerRules(nameSlicerRules);
} else {
this.slicerRules = options.slicerRules;
}
// Create parent tiddler
console.log("Slicing to",this.baseTiddlerTitle)
var sliceTiddler = {
title: this.baseTiddlerTitle,
text: "Sliced at " + (new Date()),
// Set up the base tiddler title
this.baseTiddlerTitle = this.getBaseTiddlerTitle(options.baseTiddlerTitle);
// Initialise state
this.namespaces = {}; // Hashmap of namespace prefix to URI
this.chunks = []; // Array of tiddlers without titles, addressed by their index. We use the title field to hold the plain text content
this.currentChunk = null; // Index of the chunk currently being written to
this.parentStack = []; // Stack of parent chunks {chunk: chunk index,actions:}
this.elementStack = []; // Stack of {tag:,isSelfClosing:,actions:}
// Set up the document tiddler as top level heading
this.chunks.push({
"toc-type": "document",
tags: [],
title: "", // makeUniqueTitle will later initialise it to baseTiddlerTitle
text: "<div class='tc-table-of-contents'><<toc-selective-expandable '" + this.baseTiddlerTitle + "document'>></div>",
list: [],
role: this.role
};
this.addTiddler(sliceTiddler);
// Slice the text into subordinate tiddlers
this.parentStack.push({type: "h0", title: sliceTiddler.title});
this.currentTiddler = sliceTiddler.title;
this.containerStack.push(sliceTiddler.title);
this.processNodeList(this.sourceDoc.childNodes);
this.containerStack.pop();
tags: [],
role: this.role,
"slicer-rules": nameSlicerRules,
"slicer-output-mode": this.outputMode
});
this.parentStack.push({chunk: 0, actions: this.getMatchingSlicerRuleActions("(document)")});
// Set up the parser
var sax = require("$:/plugins/tiddlywiki/sax/sax.js");
this.sax = sax.parser(true,{
xmlns: true
});
this.sax.onerror = this.onError.bind(this);
this.sax.onopennamespace = this.onOpenNamespace.bind(this);
this.sax.onclosenamespace = this.onCloseNamespace.bind(this);
this.sax.onopentag = this.onOpenTag.bind(this);
this.sax.onclosetag = this.onCloseTag.bind(this);
this.sax.ontext = this.onText.bind(this);
this.sax.onend = this.onEnd.bind(this);
// Start streaming the data
this.sax.write(this.getSourceText());
this.sax.close();
}
Slicer.prototype.parseTiddlerText = function(title) {
var tiddler = this.wiki.getTiddler(title);
if(tiddler) {
if(tiddler.fields.type === "text/html") {
return this.parseHtmlText(tiddler.fields.text);
Slicer.prototype.callback = function(err,tiddlers) {
var self = this;
$tw.utils.nextTick(function() {
self.callbackFn(err,tiddlers);
});
};
Slicer.prototype.loadSlicerRules = function(name) {
// Collect the available slicer rule tiddlers
var self = this,
titles = this.wiki.getTiddlersWithTag("$:/tags/text-slicer/slicer-rules"),
tiddlers = {},
rules = {},
ruleNames = [];
titles.forEach(function(title) {
var tiddler = self.wiki.getTiddler(title);
tiddlers[tiddler.fields.name] = tiddler;
rules[tiddler.fields.name] = self.wiki.getTiddlerData(title,[]);
});
// Follow the inheritance trail to get a stack of slicer rule names
var n = name;
do {
ruleNames.push(n);
n = tiddlers[n] && tiddlers[n].fields["inherits-from"];
} while(n && ruleNames.indexOf(n) === -1);
// Concatenate the slicer rules
rules = ruleNames.reduce(function(accumulator,name) {
return accumulator.concat(rules[name]);
},[]);
return rules;
};
Slicer.prototype.getMatchingSlicerRuleActions = function(name) {
var rule = this.searchSlicerRules(name,this.slicerRules,this.elementStack);
if(!rule) {
return {};
} else {
return rule.actions;
}
};
Slicer.prototype.testSlicerRuleMatching = function() {
var tests = [
{
test: this.searchSlicerRules("title",[
{selector: "title,head,body", rules: true},
{selector: "body", rules: true}
],[
{tag:"head"}
]),
result: "title,head,body"
},
{
test: this.searchSlicerRules("body",[
{selector: "title,head,body", rules: true},
{selector: "body", rules: true}
],[
{tag:"head"}
]),
result: "title,head,body"
},
{
test: this.searchSlicerRules("title",[
{selector: "head > title", rules: true},
{selector: "title", rules: true}
],[
{tag:"head"}
]),
result: "head > title"
}
];
tests.forEach(function(test,index) {
if(test.test.selector !== test.result) {
throw new Error("Failing test " + index + ", returns " + test.test.selector + " instead of " + test.result);
}
});
};
Slicer.prototype.searchSlicerRules = function(name,rules,elementStack) {
return rules.find(function(rule) {
// Split and trim the selectors for this rule
return !!rule.selector.split(",").map(function(selector) {
return selector.trim();
// Find the first selector that matches, if any
}).find(function(selector) {
// Split and trim the parts of the selector
var parts = selector.split(" ").map(function(part) {
return part.trim();
});
// * matches any element
if(parts.length === 1 && parts[0] === "*") {
return true;
}
// Make a copy of the element stack so that we can be destructive
var elements = elementStack.slice(0).concat({tag: name}),
nextElementMustBeAtTopOfStack = true,
currentPart = parts.length - 1;
while(currentPart >= 0) {
if(parts[currentPart] === ">") {
nextElementMustBeAtTopOfStack = true;
} else {
if(!nextElementMustBeAtTopOfStack) {
while(elements.length > 0 && elements[elements.length - 1].tag !== parts[currentPart]) {
elements.pop();
}
}
if(elements.length === 0 || elements[elements.length - 1].tag !== parts[currentPart]) {
return false;
}
elements.pop();
nextElementMustBeAtTopOfStack = false;
}
currentPart--;
}
return true;
});
});
};
Slicer.prototype.getBaseTiddlerTitle = function(baseTiddlerTitle) {
if(baseTiddlerTitle) {
return baseTiddlerTitle;
} else {
if(this.sourceTiddlerTitle) {
return "Sliced up " + this.sourceTiddlerTitle + ":";
} else {
return this.parseWikiText(tiddler);
return "SlicedTiddler";
}
}
};
Slicer.prototype.parseWikiText = function(tiddler) {
Slicer.prototype.getSourceText = function() {
if(this.sourceTiddlerTitle) {
var tiddler = this.wiki.getTiddler(this.sourceTiddlerTitle);
if(!tiddler) {
console.log("Tiddler '" + this.sourceTiddlerTitle + "' does not exist");
return "";
}
if(tiddler.fields.type === "text/html" || tiddler.fields.type === "text/xml" || (tiddler.fields.type || "").slice(-4) === "+xml") {
return tiddler.fields.text;
} else {
return this.getTiddlerAsHtml(tiddler);
}
} else {
return this.sourceText;
}
};
Slicer.prototype.getTiddlerAsHtml = function(tiddler) {
var widgetNode = this.wiki.makeTranscludeWidget(tiddler.fields.title,{
document: $tw.fakeDocument,
parseAsInline: false,
importPageMacros: true}),
container = $tw.fakeDocument.createElement("div");
widgetNode.render(container,null);
return container;
return ["<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">","<html xmlns=\"http://www.w3.org/1999/xhtml\">","<head>","</head>","<body>",container.innerHTML,"</body>","</html>"].join("\n");
};
Slicer.prototype.parseHtmlText = function(text) {
text = text || "";
if($tw.browser) {
this.iframe = document.createElement("iframe");
document.body.appendChild(this.iframe);
this.iframe.contentWindow.document.open();
this.iframe.contentWindow.document.write(text);
this.iframe.contentWindow.document.close();
return this.iframe.contentWindow.document;
} else {
return new DOMParser().parseFromString(text);
}
Slicer.prototype.getImmediateParent = function() {
return this.parentStack.slice(-1)[0];
};
Slicer.prototype.addToList = function(parent,child) {
var parentTiddler = this.getTiddler(parent) || {},
parentList = parentTiddler.list || [];
parentList.push(child);
this.addTiddler($tw.utils.extend({title: parent},parentTiddler,{list: parentList}));
Slicer.prototype.onError = function(e) {
console.error("Sax error: ", e);
// Try to resume after errors
this.sax.error = null;
this.sax.resume();
};
Slicer.prototype.insertBeforeListItem = function(parent,child,beforeSibling) {
var parentTiddler = this.getTiddler(parent) || {},
parentList = parentTiddler.list || [],
parentListSiblingPosition = parentList.indexOf(beforeSibling);
if(parentListSiblingPosition !== -1) {
parentList.splice(parentListSiblingPosition,0,child)
this.addTiddler($tw.utils.extend({title: parent},parentTiddler,{list: parentList}));
}
else {debugger;}
Slicer.prototype.onOpenNamespace = function(info) {
this.namespaces[info.prefix] = info.uri;
};
Slicer.prototype.popParentStackUntil = function(type) {
// Pop the stack to remove any entries at the same or lower level
var newLevel = this.convertTypeToLevel(type),
topIndex = this.parentStack.length - 1;
do {
var topLevel = this.convertTypeToLevel(this.parentStack[this.parentStack.length - 1].type);
if(topLevel !== null && topLevel < newLevel ) {
break;
Slicer.prototype.onCloseNamespace = function(info) {
};
Slicer.prototype.onOpenTag = function(node) {
var actions = this.getMatchingSlicerRuleActions(node.name);
// Check for an element that should start a new chunk
if(actions.startNewChunk) {
// If this is a heading, first pop any headings at the same or a deeper level (headingLevel >= this one) off the parent stack
if(actions.isParent && actions.headingLevel) {
var parentActions = this.getImmediateParent().actions;
while(parentActions.isParent && parentActions.headingLevel && parentActions.headingLevel >= actions.headingLevel) {
this.parentStack.pop();
parentActions = this.getImmediateParent().actions;
}
}
// Start the new chunk
this.startNewChunk(actions.startNewChunk);
// If this is a parent then also add it to the parent stack
if(actions.isParent) {
this.parentStack.push({chunk: this.currentChunk, actions: actions});
}
this.parentStack.length--;
} while(true);
return this.parentStack[this.parentStack.length - 1].title;
};
Slicer.prototype.getTopContainer = function() {
return this.containerStack[this.containerStack.length-1];
};
Slicer.prototype.appendToCurrentContainer = function(newText) {
var title = this.containerStack[this.containerStack.length-1];
if(title) {
var tiddler = this.getTiddler(title) || {},
text = tiddler.text || "";
this.addTiddler($tw.utils.extend({title: title},tiddler,{text: text + newText}));
}
else {debugger;}
// Render the tag inline in the current chunk unless we should ignore it
if(!actions.dontRenderTag) {
if(actions.isImage) {
this.onImage(node);
} else {
var markupInfo = actions.markup && actions.markup[this.outputMode];
if(markupInfo) {
this.addTextToCurrentChunk(markupInfo.prefix);
} else {
this.addTextToCurrentChunk("<" + node.name + (node.isSelfClosing ? "/" : "") + ">");
}
}
}
// Remember whether this tag is self closing
this.elementStack.push({tag: node.name,isSelfClosing: node.isSelfClosing, actions: actions});
};
Slicer.prototype.convertTypeToLevel = function(type) {
if(type.charAt(0) === "h") {
return parseInt(type.charAt(1),10);
} else {
return null;
Slicer.prototype.onImage = function(node) {
var url = node.attributes.src.value;
if(url.slice(0,5) === "data:") {
// var parts = url.slice(5).split(",");
// this.chunks.push({
// title: ,
// text: parts[1],
// type: parts[0].split(";")[0],
// role: this.role
// });
}
this.addTextToCurrentChunk("[img[" + url + "]]");
};
Slicer.prototype.onCloseTag = function(name) {
var e = this.elementStack.pop(),
actions = e.actions,
selfClosing = e.isSelfClosing;
// Set the caption if required
if(actions.setCaption) {
this.chunks[this.currentChunk].caption = this.chunks[this.currentChunk].title;
}
// Render the tag
if (!actions.dontRenderTag && !selfClosing) {
var markupInfo = actions.markup && actions.markup[this.outputMode];
if(markupInfo) {
this.addTextToCurrentChunk(markupInfo.suffix);
} else {
this.addTextToCurrentChunk("</" + name + ">");
}
}
// Check for an element that started a new chunk
if(actions.startNewChunk) {
if(!actions.mergeNext) {
this.currentChunk = null;
}
// If this is a parent and not a heading then also pop it from the parent stack
if(actions.isParent && !actions.headingLevel) {
this.parentStack.pop();
}
}
};
Slicer.prototype.onText = function(text) {
this.addTextToCurrentChunk($tw.utils.htmlEncode(text));
this.addTextToCurrentChunk(text,"title");
};
Slicer.prototype.onEnd = function() {
this.assignTitlesToChunks();
this.callback(null,this.chunks);
};
Slicer.prototype.addTextToCurrentChunk = function(str,field) {
field = field || "text";
if(this.currentChunk !== null) {
this.chunks[this.currentChunk][field] += str;
}
};
Slicer.prototype.startNewChunk = function(fields) {
var parentIndex = this.getImmediateParent().chunk;
this.chunks.push($tw.utils.extend({},{
title: "",
text: "",
tags: [parentIndex],
list: [],
role: this.role
},fields));
this.currentChunk = this.chunks.length - 1;
this.chunks[parentIndex].list.push(this.currentChunk);
};
Slicer.prototype.isBlank = function(s) {
return (/^[\s\xA0]*$/g).test(s);
};
Slicer.prototype.registerAnchor = function(id) {
this.anchors[id] = this.currentTiddler;
}
Slicer.prototype.processNodeList = function(domNodeList) {
$tw.utils.each(domNodeList,this.processNode.bind(this));
}
Slicer.prototype.processNode = function(domNode) {
var nodeType = domNode.nodeType,
tagName = (domNode.tagName || "").toLowerCase(),
hasProcessed = false;
for(var slicerTitle in this.slicers) {
var slicer = this.slicers[slicerTitle];
if(slicer.bind(this)(domNode,tagName)) {
hasProcessed = true;
break;
Slicer.prototype.assignTitlesToChunks = function() {
var self = this;
// Create a title for each tiddler
var titles = {};
this.chunks.forEach(function(chunk) {
var title = self.makeUniqueTitle(titles,chunk["toc-type"] + "-" + chunk.title);
titles[title] = true;
chunk.title = title;
});
// Link up any indices in the tags and list fields
this.chunks.forEach(function(chunk) {
if(chunk.tags) {
chunk.tags.forEach(function(tag,index) {
if(typeof tag === "number") {
chunk.tags[index] = self.chunks[tag].title;
}
});
}
}
if(!hasProcessed) {
if(nodeType === 1 && domNode.hasChildNodes()) {
this.processNodeList(domNode.childNodes);
if(chunk.list) {
chunk.list.forEach(function(listItem,index) {
if(typeof listItem === "number") {
chunk.list[index] = self.chunks[listItem].title;
}
});
}
}
});
};
Slicer.prototype.makeUniqueTitle = function(rawText) {
Slicer.prototype.makeUniqueTitle = function(tiddlers,rawText) {
// Remove characters other than lowercase alphanumeric and spaces
var prefix = this.baseTiddlerTitle,
self = this,
@ -215,45 +422,19 @@ Slicer.prototype.makeUniqueTitle = function(rawText) {
var c = 0,
s = "";
while(c < words.length && (s.length + words[c].length + 1) < 50) {
s += "-" + words[c++];
s += (s === "" ? "" : "-") + words[c++];
}
prefix = prefix + s;
}
// Check for duplicates
var baseTitle = prefix;
c = 0;
var title = baseTitle;
while(this.getTiddler(title)) {
title = baseTitle + "-" + (++c);
var title = prefix;
while(title in tiddlers) {
title = prefix + "-" + (++c);
}
return title;
};
Slicer.prototype.addTiddler = function(fields) {
if(fields.title) {
this.extractedTiddlers[fields.title] = Object.assign({},fields);
}
return fields.title;
};
Slicer.prototype.addTiddlers = function(fieldsArray) {
var self = this;
(fieldsArray || []).forEach(function(fields) {
self.addTiddler(fields);
});
};
Slicer.prototype.getTiddler = function(title) {
return this.extractedTiddlers[title];
};
Slicer.prototype.getTiddlers = function() {
var self = this;
return Object.keys(this.extractedTiddlers).map(function(title) {
return self.extractedTiddlers[title]
})
};
exports.Slicer = Slicer;
})();
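
The selector matching in `searchSlicerRules` walks each selector from right to left against the stack of open elements. The following is a standalone sketch of that idea, written for illustration and not the plugin's own code; the names `matchesSelector` and `searchRules` are assumptions. It supports `*`, descendant selectors (`body p`) and child selectors (`head > title`), with tokens separated by spaces.

```javascript
// Returns true if `selector` matches an element called `name` whose open
// ancestors are given by `elementStack` (outermost first, as {tag: ...} objects)
function matchesSelector(selector, name, elementStack) {
  const parts = selector.trim().split(/\s+/);
  if (parts.length === 1 && parts[0] === "*") {
    return true;
  }
  // Work on a copy of the tag names so that popping is safe
  const elements = elementStack.map(function(e) { return e.tag; }).concat(name);
  let mustBeAtTop = true; // the rightmost part must match the element itself
  for (let i = parts.length - 1; i >= 0; i--) {
    const part = parts[i];
    if (part === ">") {
      // Child combinator: the next part must match the immediate parent
      mustBeAtTop = true;
      continue;
    }
    if (!mustBeAtTop) {
      // Descendant combinator: skip ancestors until one matches
      while (elements.length > 0 && elements[elements.length - 1] !== part) {
        elements.pop();
      }
    }
    if (elements.length === 0 || elements[elements.length - 1] !== part) {
      return false;
    }
    elements.pop();
    mustBeAtTop = false;
  }
  return true;
}

// First rule with any comma-separated selector that matches wins
function searchRules(name, rules, elementStack) {
  return rules.find(function(rule) {
    return rule.selector.split(",").some(function(selector) {
      return matchesSelector(selector, name, elementStack);
    });
  });
}

const titleRules = [{selector: "head > title"}, {selector: "title"}];
console.log(searchRules("title", titleRules, [{tag: "head"}]).selector);
console.log(searchRules("title", titleRules, [{tag: "body"}]).selector);
```

Because the first matching rule wins, more specific selectors need to appear before more general ones in a rule set, with a catch-all `*` rule last.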

View File

@ -1,26 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/anchor.js
type: application/javascript
module-type: slicer
Handle slicing anchor nodes
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processAnchorNode = function(domNode,tagName) {
if(domNode.nodeType === 1 && tagName === "a") {
var id = domNode.getAttribute("id");
if(id) {
this.registerAnchor(id);
return true;
}
}
return false;
};
})();

View File

@ -1,40 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/def-list.js
type: application/javascript
module-type: slicer
Handle slicing definition list nodes
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processDefListNode = function(domNode,tagName) {
if(domNode.nodeType === 1 && tagName === "dl") {
var title = this.makeUniqueTitle("def-list-" + tagName),
parentTitle = this.parentStack[this.parentStack.length - 1].title,
tags = [];
if(domNode.className && domNode.className.trim() !== "") {
tags = tags.concat(domNode.className.split(" "));
}
this.addToList(parentTitle,title);
this.parentStack.push({type: tagName, title: this.addTiddler({
"toc-type": "def-list",
"toc-list-filter": "[list<currentTiddler>!has[draft.of]]",
text: "",
title: title,
list: [],
tags: tags
})});
this.currentTiddler = title;
this.processNodeList(domNode.childNodes);
this.parentStack.pop();
return true;
}
return false;
};
})();

View File

@ -1,44 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/definition.js
type: application/javascript
module-type: slicer
Handle slicing definition nodes in definition lists
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processDefinitionNode = function(domNode,tagName) {
var text = $tw.utils.htmlEncode(domNode.textContent);
if(domNode.nodeType === 1 && tagName === "dd") {
// if(!this.isBlank(text)) {
var title = this.makeUniqueTitle("definition " + text),
parentTitle = this.parentStack[this.parentStack.length - 1].title,
tags = [];
if(domNode.className && domNode.className.trim() !== "") {
tags = tags.concat(domNode.className.split(" "));
}
this.addToList(parentTitle,title);
this.addTiddler({
"toc-type": "definition",
title: title,
text: "",
list: [],
tags: tags
});
this.currentTiddler = title;
this.containerStack.push(title);
// this.containerStack.push("Just testing" + new Date());
this.processNodeList(domNode.childNodes);
this.containerStack.pop();
return true;
// }
}
return false;
};
})();

View File

@ -1,42 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/heading.js
type: application/javascript
module-type: slicer
Handle slicing heading nodes
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processHeadingNode = function(domNode,tagName) {
if(domNode.nodeType === 1 && (tagName === "h1" || tagName === "h2" || tagName === "h3" || tagName === "h4")) {
var text = $tw.utils.htmlEncode(domNode.textContent);
var title = this.makeUniqueTitle("heading " + text),
parentTitle = this.popParentStackUntil(tagName),
tags = [];
if(domNode.className && domNode.className.trim() !== "") {
tags = tags.concat(domNode.className.split(" "));
}
this.addToList(parentTitle,title);
this.parentStack.push({type: tagName, title: this.addTiddler({
"toc-type": "heading",
"toc-heading-level": tagName,
title: title,
text: "",
list: [],
tags: tags
})});
this.currentTiddler = title;
this.containerStack.push(title);
this.processNodeList(domNode.childNodes);
this.containerStack.pop();
return true;
}
return false;
};
})();

View File

@ -1,71 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/image.js
type: application/javascript
module-type: slicer
Handle slicing img nodes
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processImageNode = function(domNode,tagName) {
if(domNode.nodeType === 1 && tagName === "img") {
var src = domNode.getAttribute("src");
if(src) {
var containerTitle = this.getTopContainer(),
containerTiddler = this.getTiddler(containerTitle),
title, tiddler = {
"toc-type": "image"
};
if(src.substr(0,5) === "data:") {
var parts = src.toString().substr(5).split(";base64,");
tiddler.type = parts[0];
tiddler.text = parts[1];
var contentTypeInfo = $tw.config.contentTypeInfo[tiddler.type] || {extension: ""};
title = this.makeUniqueTitle("image " + containerTitle) + contentTypeInfo.extension;
tiddler.title = title;
this.addTiddler(tiddler);
} else {
title = $tw.utils.resolvePath(src,this.baseTiddlerTitle);
}
switch(containerTiddler["toc-type"]) {
case "document":
// Make the image be the next child of the document
this.addToList(containerTitle,title);
break;
case "heading":
// Make the image be the older sibling of the heading
var parentTitle = this.parentStack[this.parentStack.length - 2].title;
this.insertBeforeListItem(parentTitle,title,containerTitle);
break;
case "paragraph":
// Make the image be the older sibling of the paragraph
var parentTitle = this.parentStack[this.parentStack.length - 1].title;
this.insertBeforeListItem(parentTitle,title,containerTitle);
break;
case "item":
// Create a new older sibling item to contain the image
var parentTitle = this.parentStack[this.parentStack.length - 1].title,
itemTitle = this.makeUniqueTitle("image-item-wrapper " + containerTitle),
itemTiddler = {
title: itemTitle,
"toc-type": "item",
list: [title],
text: "[img[" + title + "]]"
};
this.addTiddler(itemTiddler);
this.insertBeforeListItem(parentTitle,itemTitle,containerTitle);
break;
}
// this.appendToCurrentContainer("[img[" + title + "]]");
return true;
}
}
return false;
};
})();

View File

@ -1,44 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/item.js
type: application/javascript
module-type: slicer
Handle slicing list item nodes
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processListItemNode = function(domNode,tagName) {
var text = $tw.utils.htmlEncode(domNode.textContent);
if(domNode.nodeType === 1 && tagName === "li") {
// if(!this.isBlank(text)) {
var title = this.makeUniqueTitle("list-item " + text),
parentTitle = this.parentStack[this.parentStack.length - 1].title,
tags = [];
if(domNode.className && domNode.className.trim() !== "") {
tags = tags.concat(domNode.className.split(" "));
}
this.addToList(parentTitle,title);
this.addTiddler({
"toc-type": "item",
title: title,
text: "",
list: [],
tags: tags
});
this.currentTiddler = title;
this.containerStack.push(title);
// this.containerStack.push("Just testing" + new Date());
this.processNodeList(domNode.childNodes);
this.containerStack.pop();
return true;
// }
}
return false;
};
})();

View File

@ -1,41 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/list.js
type: application/javascript
module-type: slicer
Handle slicing list nodes
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processListNode = function(domNode,tagName) {
if(domNode.nodeType === 1 && (tagName === "ul" || tagName === "ol")) {
var title = this.makeUniqueTitle("list " + tagName),
parentTitle = this.parentStack[this.parentStack.length - 1].title,
tags = [];
if(domNode.className && domNode.className.trim() !== "") {
tags = tags.concat(domNode.className.split(" "));
}
this.addToList(parentTitle,title);
this.parentStack.push({type: tagName, title: this.addTiddler({
"toc-type": "list",
"toc-list-type": tagName,
"toc-list-filter": "[list<currentTiddler>!has[draft.of]]",
text: "",
title: title,
list: [],
tags: tags
})});
this.currentTiddler = title;
this.processNodeList(domNode.childNodes);
this.parentStack.pop();
return true;
}
return false;
};
})();

View File

@ -1,41 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/paragraph.js
type: application/javascript
module-type: slicer
Handle slicing paragraph nodes
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processParagraphNode = function(domNode,tagName) {
var text = $tw.utils.htmlEncode(domNode.textContent);
if(domNode.nodeType === 1 && tagName === "p") {
if(!this.isBlank(text)) {
var parentTitle = this.parentStack[this.parentStack.length - 1].title,
tags = [],
title = this.makeUniqueTitle("paragraph " + text);
if(domNode.className && domNode.className && domNode.className.trim() !== "") {
tags = tags.concat(domNode.className.split(" "));
}
this.addToList(parentTitle,this.addTiddler({
"toc-type": "paragraph",
title: title,
text: "",
tags: tags
}));
this.currentTiddler = title;
this.containerStack.push(title);
this.processNodeList(domNode.childNodes);
this.containerStack.pop();
return true;
}
}
return false;
};
})();

View File

@ -1,44 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/term.js
type: application/javascript
module-type: slicer
Handle slicing term nodes in definition lists
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processTermNode = function(domNode,tagName) {
var text = $tw.utils.htmlEncode(domNode.textContent);
if(domNode.nodeType === 1 && tagName === "dt") {
// if(!this.isBlank(text)) {
var title = this.makeUniqueTitle("term " + text),
parentTitle = this.parentStack[this.parentStack.length - 1].title,
tags = [];
if(domNode.className && domNode.className.trim() !== "") {
tags = tags.concat(domNode.className.split(" "));
}
this.addToList(parentTitle,title);
this.addTiddler({
"toc-type": "term",
title: title,
text: "",
list: [],
tags: tags
});
this.currentTiddler = title;
this.containerStack.push(title);
// this.containerStack.push("Just testing" + new Date());
this.processNodeList(domNode.childNodes);
this.containerStack.pop();
return true;
// }
}
return false;
};
})();

View File

@ -1,23 +0,0 @@
/*\
title: $:/plugins/tiddlywiki/text-slicer/modules/slicers/text.js
type: application/javascript
module-type: slicer
Handle slicing text nodes
\*/
(function(){
/*jslint node: true, browser: true */
/*global $tw: false */
"use strict";
exports.processTextNode = function(domNode,tagName) {
if(domNode.nodeType === 3) {
this.appendToCurrentContainer($tw.utils.htmlEncode(domNode.textContent));
return true;
}
return false;
};
})();

View File

@ -22,20 +22,29 @@ exports.synchronous = true;
// Install the root widget event handlers
exports.startup = function() {
// Check xmldom is installed
if(!$tw.utils.hop($tw.modules.titles,"$:/plugins/tiddlywiki/xmldom/dom-parser")) {
// Check sax is installed
if(!$tw.utils.hop($tw.modules.titles,"$:/plugins/tiddlywiki/sax/sax.js")) {
// Make a logger
var logger = new $tw.utils.Logger("text-slicer");
logger.alert("The plugin 'text-slicer' requires the 'xmldom' plugin to be installed");
logger.alert("The plugin 'text-slicer' requires the 'sax' plugin to be installed");
}
// Add tm-slice-tiddler event handler
$tw.rootWidget.addEventListener("tm-slice-tiddler",function(event) {
var slicer = new textSlicer.Slicer({
sourceTiddlerTitle: event.param,
slicerRules: event.paramObject && event.paramObject.slicerRules,
outputMode: event.paramObject && event.paramObject.outputMode,
baseTiddlerTitle: event.paramObject && event.paramObject.destTitle,
wiki: $tw.wiki
role: event.paramObject && event.paramObject.role,
wiki: $tw.wiki,
callback: function(err,tiddlers) {
if(err) {
(new $tw.utils.Logger("text-slicer")).alert("Slicer error: " + err);
} else {
$tw.wiki.addTiddlers(tiddlers);
}
}
});
$tw.wiki.addTiddlers(slicer.getTiddlers());
});
};

View File

@ -1,5 +0,0 @@
title: $:/plugins/tiddlywiki/text-slicer/readme
//''This plugin is under active development, and is subject to change in the future''. It is currently only intended for advanced users. The tools are in the early stages of development, and likely to need some customisation to do what you need.//
This plugin contains tools to help slice up long texts into individual tiddlers.

View File

@ -22,12 +22,12 @@ $:/state/plugins/tiddlywiki/text-slicer/heading-status/$(currentTiddler)$
<div class="tc-sliced-document">
<div class="tc-sliced-document-header">
<div class="tc-document-tiddler-toolbar">
<$reveal type="nomatch" state=<<config-document-status>> text="close" default="open">
<$reveal type="nomatch" state=<<config-document-status>> text="close" default="open" tag="div">
<$button set=<<config-document-status>> setTo="close" class="tc-btn-invisible">
{{$:/core/images/down-arrow}}
</$button>
</$reveal>
<$reveal type="match" state=<<config-document-status>> text="close" default="open">
<$reveal type="match" state=<<config-document-status>> text="close" default="open" tag="div">
<$button set=<<config-document-status>> setTo="open" class="tc-btn-invisible">
{{$:/core/images/right-arrow}}
</$button>
@ -35,7 +35,7 @@ $:/state/plugins/tiddlywiki/text-slicer/heading-status/$(currentTiddler)$
</div>
<h1 class="tc-sliced-document-title">''Document'': <$link><$view field="title"/></$link></h1>
</div>
<$reveal type="nomatch" state=<<config-document-status>> text="close" default="open">
<$reveal type="nomatch" state=<<config-document-status>> text="close" default="open" tag="div">
{{||$:/plugins/tiddlywiki/text-slicer/ui/document/header}}
<div class='tc-sliced-document-body'>
<$set name="tv-show-toolbar" value={{$(config-show-toolbar)$}}>

View File

@ -8,12 +8,12 @@ $(tv-heading-status-config-title)$/$(tv-heading-status-config-prefix)$/$(current
<$set name="tv-heading-status-config-title" value=<<config-heading-status>>>
<div class="tc-document-tiddler">
<div class="tc-document-tiddler-toolbar">
<$reveal type="nomatch" state=<<tv-heading-status-config-title>> text="close" default=<<tv-default-heading-state>>>
<$reveal type="nomatch" state=<<tv-heading-status-config-title>> text="close" default=<<tv-default-heading-state>> tag="div">
<$button set=<<tv-heading-status-config-title>> setTo="close" class="tc-btn-invisible">
{{$:/core/images/down-arrow}}
</$button>
</$reveal>
<$reveal type="match" state=<<tv-heading-status-config-title>> text="close" default=<<tv-default-heading-state>>>
<$reveal type="match" state=<<tv-heading-status-config-title>> text="close" default=<<tv-default-heading-state>> tag="div">
<$button set=<<tv-heading-status-config-title>> setTo="open" class="tc-btn-invisible">
{{$:/core/images/right-arrow}}
</$button>
@ -22,7 +22,7 @@ $(tv-heading-status-config-title)$/$(tv-heading-status-config-prefix)$/$(current
<$link tag="$level$" class="tc-document-tiddler-link">
<$transclude/>
</$link>
<$reveal type="nomatch" state=<<tv-heading-status-config-title>> text="close" default=<<tv-default-heading-state>>>
<$reveal type="nomatch" state=<<tv-heading-status-config-title>> text="close" default=<<tv-default-heading-state>> tag="div">
<$list filter="[list<currentTiddler>!has[draft.of]]" template="$:/plugins/tiddlywiki/text-slicer/templates/interactive/tiddler"/>
</$reveal>
</div>

@ -1,45 +1,28 @@
title: $:/plugins/tiddlywiki/text-slicer/templates/interactive/tiddler
\define if(condition,then,else)
<$list filter="""$condition$ +[limit[1]]""" emptyMessage="""$else$""" variable="ignore">
$then$
</$list>
\end
\define include-component(type)
<<if "[{!!toc-type}prefix[$type$]]" """
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/$type$" mode="block"/>
""">>
\end
<$list filter="[<tv-show-toolbar>prefix[yes]]" variable="hasToolbar">
{{||$:/plugins/tiddlywiki/text-slicer/ui/tiddler/toolbar}}
</$list>
<$reveal type="match" state="!!toc-type" text="document">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/document" mode="block"/>
</$reveal>
<$reveal type="match" state="!!toc-type" text="heading">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/heading" mode="block"/>
</$reveal>
<$reveal type="match" state="!!toc-type" text="paragraph">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/paragraph" mode="block"/>
</$reveal>
<$reveal type="match" state="!!toc-type" text="note">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/note" mode="block"/>
</$reveal>
<$reveal type="match" state="!!toc-type" text="list">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/list" mode="block"/>
</$reveal>
<$reveal type="match" state="!!toc-type" text="item">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/item" mode="block"/>
</$reveal>
<$reveal type="match" state="!!toc-type" text="image">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/image" mode="block"/>
</$reveal>
<$reveal type="match" state="!!toc-type" text="def-list">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/def-list" mode="block"/>
</$reveal>
<$reveal type="match" state="!!toc-type" text="term">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/term" mode="block"/>
</$reveal>
<$reveal type="match" state="!!toc-type" text="definition">
<$transclude tiddler="$:/plugins/tiddlywiki/text-slicer/templates/interactive/definition" mode="block"/>
</$reveal>
<<include-component "document">>
<<include-component "heading">>
<<include-component "paragraph">>
<<include-component "note">>
<<include-component "list">>
<<include-component "item">>
<<include-component "image">>
<<include-component "def-list">>
<<include-component "term">>
<<include-component "definition">>

@ -0,0 +1,18 @@
title: $:/plugins/tiddlywiki/text/slicer/ui/slice-modal
footer: <$button message="tm-close-tiddler">Cancel</$button> <$button><$action-sendmessage $message="tm-close-tiddler"/><$action-sendmessage $message="tm-slice-tiddler" $param=<<currentTiddler>> slicerRules={{$:/config/plugins/text-slicer/slice-rule}} outputMode={{$:/config/plugins/text-slicer/output-mode}} destTitle={{$:/config/plugins/text-slicer/base-title}}/>Slice</$button>
subtitle: Slicing "<$text text=<<currentTiddler>>/>" into chunks
''Choose how the tiddler should be sliced''
Prefix for extracted tiddlers: <$edit-text tiddler="$:/config/plugins/text-slicer/base-title" default={{{ [[Sliced up ]addsuffix<currentTiddler>addsuffix[:]] }}} tag="input" size="30"/>
<$select tiddler="$:/config/plugins/text-slicer/slice-rule" default="html-by-paragraph">
<$list filter="[all[shadows+tiddlers]tag[$:/tags/text-slicer/slicer-rules]!has[draft.of]]">
<option value={{!!name}}><$text text={{!!description}}/></option>
</$list>
</$select>
Output mode: <$select tiddler="$:/config/plugins/text-slicer/output-mode" default="html">
<option value="html">HTML</option>
<option value="wiki">Wiki text</option>
</$select>

@ -7,10 +7,11 @@ description: Slice this text tiddler by headings and lists
\whitespace trim
\define hint()
Slice this text tiddler by headings and lists
Slice this text tiddler into chunks
\end
<$button message="tm-slice-tiddler" param=<<currentTiddler>> tooltip=<<hint>> aria-label=<<hint>> class=<<tv-config-toolbar-class>>>
<$button tooltip=<<hint>> aria-label=<<hint>> class=<<tv-config-toolbar-class>>>
<$action-sendmessage $message="tm-modal" $param="$:/plugins/tiddlywiki/text/slicer/ui/slice-modal" currentTiddler=<<currentTiddler>>/>
<$list filter="[<tv-config-toolbar-icons>prefix[yes]]">
{{$:/plugins/tiddlywiki/text-slicer/images/text-slicer-icon}}
</$list>