Chapter 3. Text inclusions
The general form of a text inclusion is:
|<xi:include xmlns:xi='http://www.w3.org/2001/XInclude'
| href='/path/to/document.txt'
| parse='text'
| fragid='text(…)'/>
The parse attribute must be present and must have the value text; that’s what makes it a text inclusion. The fragment identifier is optional; if it’s not present, the entire document is included. The attribute xpointer can be used instead of fragid, but that’s discouraged because technically an XPointer can only refer to an XML document.
Parsing the example file from Chapter 2, XML inclusions, as text inserts the whole file:
|<xi:include href="abstraction.xml" parse="text"/>
|<blockquote xmlns="http://docbook.org/ns/docbook" version="5.2">
|<title>Abstraction</title>
|<attribution><personname>Paul Hudak</personname></attribution>
|<para xml:id="abs"><quote>Abstraction, abstraction and abstraction.</quote>
|This is the answer to the question, <quote>What are the three most
|important words in programming?</quote></para>
|</blockquote>
3.1. Text fragment identifier schemes
char=
A char= fragment identifier is interpreted according to RFC 5147 with integrity checking.
Example 3.2. Text inclusion with a char identifier
|<xi:include href="abstraction.xml" parse="text" fragid="char=68,87"/>
Subexample 3.2.1. The XInclude
|tle>Abstraction</ti
Subexample 3.2.2. What’s included
line=
A line= fragment identifier is interpreted according to RFC 5147 with integrity checking.
Example 3.3. Text inclusion with a line identifier
|<xi:include href="abstraction.xml" parse="text" fragid="line=3,5"/>
Subexample 3.3.1. The XInclude
|<para xml:id="abs"><quote>Abstraction, abstraction and abstraction.</quote>
|This is the answer to the question, <quote>What are the three most
Subexample 3.3.2. What’s included
L#-L#
This scheme is the loosely documented format supported by GitHub. It identifies a line or range of lines; for example, L3 identifies line 3 and L3-L7 identifies lines 3 through 7, inclusive.
Example 3.4. Text inclusion with a L#-L# identifier
|<xi:include href="abstraction.xml" parse="text" fragid="L3-L5"/>
Subexample 3.4.1. The XInclude
|<attribution><personname>Paul Hudak</personname></attribution>
|<para xml:id="abs"><quote>Abstraction, abstraction and abstraction.</quote>
|This is the answer to the question, <quote>What are the three most
Subexample 3.4.2. What’s included
search=
The search= fragment identifier locates lines by searching within the text.
Example 3.5. Text inclusion with a search identifier
|<xi:include href="abstraction.xml" parse="text"
|fragid="search=/&lt;para/,#/para#"/>
Subexample 3.5.1. The XInclude
|<para xml:id="abs"><quote>Abstraction, abstraction and abstraction.</quote>
|This is the answer to the question, <quote>What are the three most
|important words in programming?</quote></para>
Subexample 3.5.2. What’s included
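The position arithmetic behind the first three schemes is easy to get wrong by one, so here is a minimal Python sketch of it. It is not SInclude’s implementation; the function names are hypothetical, and it assumes the file uses single-character line endings and that integrity options have already been removed from the fragment identifier. RFC 5147 positions count the gaps between characters (or lines) starting at 0, while the L#-L# scheme counts lines from 1 and includes both ends.

```python
import re

# The abstraction.xml example from this chapter, assuming "\n" line endings.
TEXT = (
    '<blockquote xmlns="http://docbook.org/ns/docbook" version="5.2">\n'
    '<title>Abstraction</title>\n'
    '<attribution><personname>Paul Hudak</personname></attribution>\n'
    '<para xml:id="abs"><quote>Abstraction, abstraction and abstraction.</quote>\n'
    'This is the answer to the question, <quote>What are the three most\n'
    'important words in programming?</quote></para>\n'
    '</blockquote>\n'
)


def select_char(text, start, end):
    """char=start,end: positions are gaps between characters, counted from 0."""
    return text[start:end]


def select_line(text, start, end):
    """line=start,end: positions are gaps between lines, counted from 0."""
    lines = text.splitlines(keepends=True)
    return "".join(lines[start:end])


def select_github(text, fragid):
    """L#-L# (hypothetical helper): 1-based line numbers, both ends included."""
    match = re.fullmatch(r"L(\d+)(?:-L(\d+))?", fragid)
    if match is None:
        raise ValueError(f"not an L#-L# identifier: {fragid!r}")
    first = int(match.group(1))
    last = int(match.group(2) or first)
    lines = text.splitlines(keepends=True)
    return "".join(lines[first - 1:last])
```

Under those assumptions, select_char(TEXT, 68, 87) returns the tle>Abstraction</ti fragment from Example 3.2, select_line(TEXT, 3, 5) returns the two lines from Example 3.3, and select_github(TEXT, "L3-L5") returns the three lines from Example 3.4.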
3.1.1. RFC 5147 integrity checking
Both the char= and line= flavors of RFC 5147 identifiers (and the search= extension scheme) support either file size or MD5 integrity checking. This fragment identifier: line=23,67;length=3134 will fail unless the file identified is 3,134 bytes long. Alternatively, line=23,67;md5=135b35933056ba8d06e8d3f5f4ecd318 will fail unless the file has an MD5 message digest equal to 135b35933056ba8d06e8d3f5f4ecd318.
|<xi:include href="abstraction.xml" parse="text"
| fragid="line=3,5;md5=d6090e3280649716833e3c33269d1892"/>
|<para xml:id="abs"><quote>Abstraction, abstraction and abstraction.</quote>
|This is the answer to the question, <quote>What are the three most
Many systems come with a program named md5 that will compute the MD5 hash of a file:
|$ md5 abstraction.xml
|MD5 (abstraction.xml) = d6090e3280649716833e3c33269d1892
Alternatively, you can specify an incorrect hash in the fragment identifier and SInclude will tell you what it was expecting when the integrity check fails.
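If there’s no md5 program to hand, the same check can be sketched in a few lines of Python using the standard hashlib module. This is an illustration of the idea, not SInclude’s code: check_integrity is a hypothetical name, it only understands char= and line= identifiers (splitting on ";" is too naive for search strings that contain a semicolon), and it ignores the optional character set that RFC 5147 allows after the value.

```python
import hashlib


def check_integrity(data: bytes, fragid: str) -> None:
    """Raise ValueError if a length= or md5= option in fragid does not match data."""
    # Everything after the first ";" is treated as a list of options.
    for option in fragid.split(";")[1:]:
        name, _, value = option.partition("=")
        value = value.split(",")[0]  # drop any trailing ",charset"
        if name == "length":
            if len(data) != int(value):
                raise ValueError(
                    f"length check failed: expected {value}, found {len(data)}"
                )
        elif name == "md5":
            actual = hashlib.md5(data).hexdigest()
            if actual != value.lower():
                raise ValueError(
                    f"md5 check failed: expected {value}, found {actual}"
                )


def md5_of_file(path: str) -> str:
    """Compute the hex MD5 digest of a file, like the md5 command does."""
    with open(path, "rb") as stream:
        return hashlib.md5(stream.read()).hexdigest()
```

Both checks work on the raw bytes of the file, so a change in line endings or encoding is enough to make them fail.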
3.2. Text searching
The search scheme has no official standard. I invented it a few years ago. The idea is that, instead of using explicit character or line references as RFC 5147 does, the user identifies the lines by what they contain.
Expressed in a lazy pseudo-BNF, it looks like this:
|search = "search=" startSearch? ("," endSearch?)? (";" searchOpt?)?
|startSearch = searchExpr (";" startOpt?)?
|endSearch = searchExpr (";" endOpt?)?
|searchExpr = ([0-9]+)? (.) (.*?) \2
|startOpt = "from" | "after" | "trim"
|endOpt = "to" | "before" | "trim"
|searchOpt = "strip" | RFC 5147 integrity checks
The core of the syntax is the searchExpr. A search expression is an optional number, followed by any quote character, followed by a string delimited by a second occurrence of the quote character. The number allows you to find a specific occurrence of the string.
The expression 3/abcde/ finds the third line that contains the string “abcde”. So do 3#abcde# and 3xabcdex. If you leave the occurrence number out, it defaults to 1: /marker text/ finds the first line that contains the string “marker text”.
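The searchExpr production translates almost directly into a regular expression. The following Python sketch shows one way to read it; the function names are hypothetical and it makes no claim to match SInclude’s behavior in edge cases such as an empty search string.

```python
import re

# ([0-9]+)? (.) (.*?) \2 from the pseudo-BNF: an optional occurrence number,
# any quote character, then the shortest string closed by the same character.
SEARCH_EXPR = re.compile(r"(\d+)?(.)(.*?)\2")


def parse_search_expr(expr):
    """Return (occurrence, string) for a search expression."""
    match = SEARCH_EXPR.match(expr)
    if match is None:
        raise ValueError(f"not a search expression: {expr!r}")
    occurrence = int(match.group(1)) if match.group(1) else 1
    return occurrence, match.group(3)


def find_line(lines, expr):
    """Return the 0-based index of the matching line, or None if none matches."""
    occurrence, needle = parse_search_expr(expr)
    seen = 0
    for index, line in enumerate(lines):
        if needle in line:
            seen += 1
            if seen == occurrence:
                return index
    return None
```

Note that the occurrence number counts matching lines, not matches: a line that contains the string twice still counts once.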
If you don’t specify a start expression, inclusion starts at the beginning of the file. If you don’t specify an end expression, all of the file after the starting match is included. It’s an error if the starting expression is specified and it never matches.
After that, it’s just a matter of a few useful options. On search expressions, the default options are from and to. They specify that the matched line is included. The values after and before specify that the matched line is not included. The value trim specifies not only that the matched line is not included, but that any leading (in the case of start) or trailing (in the case of end) lines that consist entirely of whitespace are trimmed away.
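To make the start and end options concrete, here is a small Python sketch of how a range could be cut once the matching lines are known. It is only my reading of the rules above, not SInclude’s code: select_range is a hypothetical name, start and end are the 0-based indexes of the matched lines, and None stands for an expression that was not given.

```python
def select_range(lines, start=None, end=None, start_opt="from", end_opt="to"):
    """Cut the included lines out of a list, honoring the start and end options."""
    # No start expression: begin at the top. No end expression: run to the end.
    first = 0 if start is None else start
    last = len(lines) - 1 if end is None else end

    # "after", "before" and "trim" all leave the matched line itself out.
    if start is not None and start_opt in ("after", "trim"):
        first += 1
    if end is not None and end_opt in ("before", "trim"):
        last -= 1

    selected = lines[first:last + 1]

    # "trim" additionally drops whitespace-only lines at that end of the range.
    if start is not None and start_opt == "trim":
        while selected and not selected[0].strip():
            selected.pop(0)
    if end is not None and end_opt == "trim":
        while selected and not selected[-1].strip():
            selected.pop()
    return selected
```

With marker lines wrapped around a block of text, trim on both ends is usually what you want: the markers disappear and so do the blank lines that separated them from the content.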
The top level search option strip specifies that whitespace stripping should be performed on the start of each included line. The smallest indent value is determined and that number of whitespace characters is removed from the beginning of each line.
The other top level search options are the RFC 5147 integrity check options.
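The strip option amounts to a simple outdent. A minimal Python sketch follows, with one assumption that the description above does not settle: lines that are empty or contain only whitespace are ignored when working out the smallest indent. The name strip_indent is hypothetical.

```python
def strip_indent(lines):
    """Remove the smallest common indent from the start of every line."""
    # Assumption: blank and whitespace-only lines do not count toward the minimum.
    indents = [len(line) - len(line.lstrip()) for line in lines if line.strip()]
    smallest = min(indents, default=0)
    return [line[smallest:] for line in lines]
```

This is handy when the included lines come from the middle of an indented block, such as a function body, and you want them flush left in the output.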