Extensible Markup Language (XML)
class StreamCacher : public StreamReader
Lingfa Yang
Inspired by the water bucket in a water playground that fills up and then spills.
-
We first make StreamCacher behave exactly like StreamReader. It must pass the following test:
bool streamCacher_test()
{
    fstring fileName = "c:/document.xml";
    StreamReader r0;
    StreamCacher r1;
    if (!r0.readFile(fileName)) return false;
    if (!r1.readFile(fileName)) return false;
    while (!r1.atEnd() && !r0.atEnd()) {
        // Read the same types of tokens
        if (r1.readNext() != r0.readNext()) return false;
        // Start the same element, with the same attribute set
        if (r1.isStartElement()) {
            if (!r0.isStartElement()) return false;
            if (r1.name() != r0.name()) return false;
            if (r1.attrs() != r0.attrs()) return false;
        }
        // End the same element
        else if (r1.isEndElement()) {
            if (!r0.isEndElement()) return false;
            if (r1.name() != r0.name()) return false;
        }
        // Read the same element content
        else if (r1.isCharacters()) {
            if (!r0.isCharacters()) return false;
            if (r1.text() != r0.text()) return false;
        }
    }
    // They behave the same only if both also run out of tokens together.
    return r1.atEnd() && r0.atEnd();
}
|
-
Seen from outside, the stream is continuous on both the input and the output side, while internally it is quantized, like a water bucket in a water playground that fills up and then periodically spills.
Here, the water bucket is a token vector:
std::vector <TokenInfo> tokens;
|
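The bucket itself can be sketched in isolation. The following is a minimal, self-contained illustration, not the article's implementation: `TokenType`, `TokenInfo` and the `Bucket` class are hypothetical stand-ins. Tokens are poured in as one batch and spilled out one at a time in the same order, so whoever reads from the bucket still sees a continuous sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the article's token types.
enum TokenType { START_ELEMENT, END_ELEMENT, CHARACTERS };

struct TokenInfo {
    TokenType type;
    std::string name;  // element name (START_ELEMENT / END_ELEMENT)
    std::string text;  // text content (CHARACTERS)
};

// A bucket that is filled in one go and then emptied token by token.
class Bucket {
public:
    void fill(const std::vector<TokenInfo>& batch) {
        tokens = batch;
        idx = 0;
    }
    bool empty() const { return idx >= tokens.size(); }
    std::size_t size() const { return tokens.size(); }
    // Precondition: !empty(). Hands out the next cached token;
    // once the last one is out, the bucket is cleared (it has "spilled").
    TokenInfo spill() {
        TokenInfo t = tokens[idx++];
        if (idx == tokens.size()) {
            tokens.clear();
            idx = 0;
        }
        return t;
    }

private:
    std::vector<TokenInfo> tokens;  // the "water bucket"
    std::size_t idx = 0;            // next token to hand out
};
```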
-
load
As the stream flows normally, check whether to start loading:
enum TokenType StreamCacher::readNext()
{
    ...
    enum TokenType type = StreamReader::readNext();
    if (isStartElement()) {
        if (name() == cacheTag) {
            return load();   // an opening cacheTag starts filling the bucket
        }
    }
    ...
}
|
Loading pushes tokens into the vector, from the opening cacheTag through its matching closing tag:
enum TokenType StreamCacher::load()
{
    int32 count = 0;   // nesting depth of cacheTag elements
    while (!this->atEnd()) {
        if (isEndElement()) {
            if (name() == cacheTag) {
                -- count;
                if (!count) {
                    tokens.push_back(TokenInfo(*token)); // store the last token
                    break;
                }
            }
        }
        else if (isStartElement(cacheTag)) ++ count;
        tokens.push_back(TokenInfo(*token));
        StreamReader::readNext();
    }
    token0 = token;   // back up the reader's current token (the closing cacheTag)
    idx = 0;
    return next();    // start spilling with the first cached token
}
|
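The depth counting in load() is what lets a cached element contain nested elements of the same name. Here is a sketch of the same idea over an already tokenized input, assuming a plain `std::vector` stands in for the live stream; `loadElement` and the toy token types are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the article's token types.
enum TokenType { START_ELEMENT, END_ELEMENT, CHARACTERS };

struct TokenInfo {
    TokenType type;
    std::string name;
    std::string text;
};

// Copy tokens from stream[pos] (which must be an opening cacheTag) up to and including
// its matching closing tag. Nested cacheTag elements are balanced with a depth counter,
// like the count variable in load(). On return, pos indexes the matching end tag,
// or equals stream.size() if the input ended first.
std::vector<TokenInfo> loadElement(const std::vector<TokenInfo>& stream,
                                   std::size_t& pos, const std::string& cacheTag) {
    std::vector<TokenInfo> bucket;
    int depth = 0;
    for (; pos < stream.size(); ++pos) {
        const TokenInfo& t = stream[pos];
        if (t.name == cacheTag) {
            if (t.type == START_ELEMENT) ++depth;
            else if (t.type == END_ELEMENT) --depth;
        }
        bucket.push_back(t);
        if (depth == 0) break;  // matching closing tag stored; stop here
    }
    return bucket;
}
```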
-
next
When the bucket is full, spill it: step through the cached tokens one by one with next().
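enum TokenType StreamCacher::next()
{
    token = &tokens[idx]; ++ idx;    // expose the next cached token
    enum TokenType type = token->type;
    if (idx == tokens.size()) {      // the last cached token has been handed out
        tokens.clear();              // empty the bucket
        token = token0;              // point back at the reader's own token
    }
    return type;
}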
|
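The pointer swap in next() is easy to miss: while spilling, `token` points into the bucket, and after the last cached token it points back at the reader's own current token, which is that same closing tag. A reduced sketch, with hypothetical member names that mirror the article's, and toy token types:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the article's token types.
enum TokenType { START_ELEMENT, END_ELEMENT, CHARACTERS };

struct TokenInfo {
    TokenType type;
    std::string name;
    std::string text;
};

// Only the state that next() touches.
struct MiniCacher {
    std::vector<TokenInfo> tokens;      // the filled bucket
    std::size_t idx = 0;                // next cached token to hand out
    const TokenInfo* token = nullptr;   // what name()/text() would report
    const TokenInfo* token0 = nullptr;  // backup of the reader's own current token

    // Precondition: idx < tokens.size(). Hands out one cached token; after the
    // last one, empties the bucket and points token back at the reader's token.
    TokenType next() {
        token = &tokens[idx];
        ++idx;
        TokenType type = token->type;  // read before the bucket may be cleared
        if (idx == tokens.size()) {
            tokens.clear();
            token = token0;
        }
        return type;
    }
};
```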
-
readNext: fill the bucket, then spill it
enum TokenType StreamCacher::readNext()
{
    if (idx < tokens.size()) {       // bucket not empty yet: keep spilling
        return next();
    }
    enum TokenType type = StreamReader::readNext();
    if (isStartElement()) {
        if (name() == cacheTag) {    // opening cacheTag: fill the bucket
            return load();
        }
    }
    return type;
}
|
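To see that this readNext() keeps StreamCacher indistinguishable from StreamReader, here is a toy, self-contained model of both classes. `ToyReader` and `ToyCacher` are hypothetical and replay a pre-tokenized document instead of parsing XML. As an assumption of this sketch, the cacher overrides atEnd() so that it does not report the end while tokens are still in the bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the article's token types.
enum TokenType { START_ELEMENT, END_ELEMENT, CHARACTERS };

struct TokenInfo {
    TokenType type;
    std::string name;
    std::string text;
};

// Hypothetical minimal reader that walks an already tokenized document.
class ToyReader {
public:
    explicit ToyReader(std::vector<TokenInfo> doc) : doc_(std::move(doc)) {}
    virtual ~ToyReader() = default;
    bool atEnd() const { return pos_ >= doc_.size(); }
    // Precondition: !atEnd().
    virtual TokenType readNext() {
        cur_ = &doc_[pos_++];
        return cur_->type;
    }
    const TokenInfo* current() const { return cur_; }

protected:
    std::vector<TokenInfo> doc_;
    std::size_t pos_ = 0;
    const TokenInfo* cur_ = nullptr;  // the token name()/text() would report
};

// Same interface; an opening cacheTag element is loaded whole into a bucket
// and then handed out one token at a time, mirroring load()/next()/readNext().
class ToyCacher : public ToyReader {
public:
    ToyCacher(std::vector<TokenInfo> doc, std::string cacheTag)
        : ToyReader(std::move(doc)), cacheTag_(std::move(cacheTag)) {}

    // Not at the end while cached tokens remain in the bucket.
    bool atEnd() const { return idx_ >= tokens_.size() && ToyReader::atEnd(); }

    TokenType readNext() override {
        if (idx_ < tokens_.size()) return next();  // keep spilling
        TokenType type = ToyReader::readNext();
        if (type == START_ELEMENT && cur_->name == cacheTag_) return load();
        return type;
    }

private:
    TokenType load() {
        int depth = 0;  // nesting depth of cacheTag elements
        for (;;) {
            if (cur_->name == cacheTag_) {
                if (cur_->type == START_ELEMENT) ++depth;
                else if (cur_->type == END_ELEMENT) --depth;
            }
            tokens_.push_back(*cur_);
            if (depth == 0 || ToyReader::atEnd()) break;  // matching end tag or end of input
            ToyReader::readNext();
        }
        token0_ = cur_;  // back up the reader's own current token
        idx_ = 0;
        return next();
    }

    TokenType next() {
        cur_ = &tokens_[idx_++];
        TokenType type = cur_->type;
        if (idx_ == tokens_.size()) {  // last cached token handed out
            tokens_.clear();
            cur_ = token0_;
        }
        return type;
    }

    std::string cacheTag_;
    std::vector<TokenInfo> tokens_;  // the bucket
    std::size_t idx_ = 0;
    const TokenInfo* token0_ = nullptr;
};
```

Running both over the same document and comparing token by token is the toy counterpart of streamCacher_test().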
-
Making it fun
Would it surprise you to get the content of a paragraph at the very moment you read its opening p (paragraph) tag?
Normally, you know a paragraph's content only after you have finished reading it. Here is a paragraph reader:
fwstring readParagraph(StreamReader *r)
{
    if (!r->isStartElement("p")) return L"";
    fwstring text;
    while (!r->atEnd()) {
        r->readNext();
        if (r->isEndElement("p")) break;
        if (r->isStartElement("t")) {
            text += r->readContent();
        }
    }
    return text;
}
|
With a StreamCacher whose cache tag is "p", you know the whole content as soon as you start reading the paragraph:
while (!r.atEnd()) {
    r.readNext();
    if (r.isStartElement("p")) {
        fwstring content = r.content(); // the whole paragraph, already known
    }
}
|
You stay aware of the whole content while reading the paragraph, and suddenly "forget" everything at its end, when you meet the closing p tag, because next() empties the bucket as soon as it hands out that last token.
while (!r.atEnd()) {
    r.readNext();
    if (r.isEndElement("p")) {
        fwstring nothing = r.content(); // too late to know
    }
}
|
The secret is the "water bucket", which quantizes the continuous stream:
fwstring StreamCacher::content()
{
    fwstring text;
    std::vector <TokenInfo>::const_iterator i,
        b = tokens.begin(),
        e = tokens.end();
    // Collect the text of every CHARACTERS token that directly follows an opening <t>.
    for (i = b; i != e; ++ i) {
        if (i != b && i->type == CHARACTERS) {   // i != b keeps (i-1) valid
            if ( (i-1)->type == START_ELEMENT
              && (i-1)->name == "t") {
                text += i->text;
            }
        }
    }
    return text;
}
|
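The same scan can be written over a plain token vector with index arithmetic. This sketch uses the hypothetical toy token types and assumes `std::string` in place of `fwstring`; starting the loop at index 1 keeps `tokens[i - 1]` in range, even for an empty bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the article's token types.
enum TokenType { START_ELEMENT, END_ELEMENT, CHARACTERS };

struct TokenInfo {
    TokenType type;
    std::string name;
    std::string text;
};

// Concatenate the text of every CHARACTERS token whose previous token opens a <t>.
// Indentation between elements is ignored because it does not follow an opening <t>.
std::string content(const std::vector<TokenInfo>& tokens) {
    std::string text;
    for (std::size_t i = 1; i < tokens.size(); ++i) {
        if (tokens[i].type == CHARACTERS &&
            tokens[i - 1].type == START_ELEMENT && tokens[i - 1].name == "t") {
            text += tokens[i].text;
        }
    }
    return text;
}
```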
-
Example
Input XML file:
Expected output:
Main Street has 5 homes.
	1 out of 5 is George Washington.
	2 out of 5 is John Adams.
	3 out of 5 is Thomas Jefferson.
	4 out of 5 is James Madison.
	5 out of 5 is James Monroe.
Liberty Street has 2 homes.
	1 out of 2 is George W. Bush.
	2 out of 2 is Barack Obama.
|
Code:
bool streamCacher_read()
{
    fstring fileName = "c:/yanglingfa/xml/concord.xml";
    StreamCacher r;
    if (!r.readFile(fileName)) return false;
    r.setCacheTag("street"); // Specify cached tag name
    StreamWriter * os = new StreamConsole;
    EndOfLine endl;
    fwstring streetName;
    uint32 numberOfHome = 0;
    uint32 count = 0;
    while (!r.atEnd()) {
        r.readNext();
        if (r.isStartElement()) {
            if (r.name() == "street") {
                streetName = r.attrs()[L"name"];
                // Construct a new StreamCacher, and use the StreamReader interface to read it.
                // A named object is needed: standard C++ does not allow taking the address of a temporary.
                StreamCacher streetCache(r);
                numberOfHome = homeCount(&streetCache); // count in advance !!!
                *os << streetName << " has " << numberOfHome << " homes." << endl;
                count = 0;
            }
            else if (r.name() == "home") {
                ++ count; // count in stream
                *os << "\t" << count << " out of " << numberOfHome
                    << " is " << r.attrs()[L"name"] << "." << endl;
            }
        }
    }
    delete os;
    return true;
}
|
Here, homeCount() still reads the token cache through the plain StreamReader interface:
uint32 homeCount(StreamReader * r)
{
    uint32 count = 0;
    while (!r->atEnd()) {
        r->readNext();
        if (r->isStartElement("home")) ++ count;
    }
    return count;
}
|
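The two-pass trick of this example (look ahead inside one cached street to count its homes, then stream through the same street while numbering them) can be modeled without an XML parser. In this sketch the document is a hypothetical pre-tokenized vector, the `name` attribute is kept in an extra `nameAttr` field, and streets are assumed not to nest, so a street's bucket simply ends at the first closing street tag. `homeCount` here is a toy analogue, not the article's function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the article's token types.
enum TokenType { START_ELEMENT, END_ELEMENT, CHARACTERS };

struct TokenInfo {
    TokenType type;
    std::string name;
    std::string text;
    std::string nameAttr;  // hypothetical: value of the name="..." attribute
};

// Count <home> start tags inside one cached element (nested ones included).
std::size_t homeCount(const std::vector<TokenInfo>& bucket) {
    std::size_t count = 0;
    for (const TokenInfo& t : bucket)
        if (t.type == START_ELEMENT && t.name == "home") ++count;
    return count;
}

// Produce the report lines: the total is known before any <home> is streamed.
std::vector<std::string> report(const std::vector<TokenInfo>& doc) {
    std::vector<std::string> lines;
    std::size_t total = 0, count = 0;
    for (std::size_t i = 0; i < doc.size(); ++i) {
        const TokenInfo& t = doc[i];
        if (t.type != START_ELEMENT) continue;
        if (t.name == "street") {
            // Look ahead: copy the whole <street> element into a bucket.
            std::vector<TokenInfo> bucket;
            for (std::size_t j = i; j < doc.size(); ++j) {
                bucket.push_back(doc[j]);
                if (doc[j].type == END_ELEMENT && doc[j].name == "street") break;
            }
            total = homeCount(bucket);  // count in advance
            count = 0;
            lines.push_back(t.nameAttr + " has " + std::to_string(total) + " homes.");
        } else if (t.name == "home") {
            ++count;                    // count in stream
            lines.push_back("\t" + std::to_string(count) + " out of " +
                            std::to_string(total) + " is " + t.nameAttr + ".");
        }
    }
    return lines;
}
```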
Behind the scenes, the cached street element is stored as 22 tokens.
You can reset to any token and read from there:
StreamCacher r1(r);
size_t size = r1.size(); // 22
bool ok = r1.reset(11); // reset to a place you want
fstring name = r1.name(); // apartment
bool isStartApt = r1.isStartElement("apartment"); // true
r1.readNext();
bool isCharacter = r1.isCharacters(); // true
|
r1.reset(); // reset to idx = 0 as default
fstring name1 = r1.name(); // "street"
size_t count1 = 1;
while (!r1.atEnd()) {
    r1.readNext();
    ++ count1;
}
// count1 should end up at 22
|
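A random-access cursor over the cached tokens is enough to model size() and reset(). This sketch uses a hypothetical `CacheCursor` class and toy token types, and assumes that atEnd() becomes true once the current token is the last cached one, which is what makes a count that starts at 1 end at size().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the article's token types.
enum TokenType { START_ELEMENT, END_ELEMENT, CHARACTERS };

struct TokenInfo {
    TokenType type;
    std::string name;
    std::string text;
};

// Random access over a non-empty vector of cached tokens.
class CacheCursor {
public:
    explicit CacheCursor(std::vector<TokenInfo> tokens) : tokens_(std::move(tokens)) {}
    std::size_t size() const { return tokens_.size(); }
    // Make token i the current token; fails (and changes nothing) if i is out of range.
    bool reset(std::size_t i = 0) {
        if (i >= tokens_.size()) return false;
        cur_ = i;
        return true;
    }
    // True once the current token is the last cached one.
    bool atEnd() const { return cur_ + 1 >= tokens_.size(); }
    // Precondition: !atEnd().
    TokenType readNext() { return tokens_[++cur_].type; }
    const std::string& name() const { return tokens_[cur_].name; }
    bool isStartElement(const std::string& n) const {
        return tokens_[cur_].type == START_ELEMENT && tokens_[cur_].name == n;
    }
    bool isCharacters() const { return tokens_[cur_].type == CHARACTERS; }

private:
    std::vector<TokenInfo> tokens_;
    std::size_t cur_ = 0;
};
```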
-
Tree?
If you prefer to work on a tree structure rather than on the cached token vector or through the stream-reader interface, you can build one this way:
TokenNode * st = StreamCacher(r).tree();
|
The result is a tree built from only the 7 start-element tokens.
The tree can also hold text nodes, if there are any, but indentation characters and end-element tokens do not appear in it.
That is why we use a vector to store all the tokens and use the tree for representation only.
The root of this tree is the street element:
fstring rootName = st->info.name; // "street"
|
A tree node's size() is the number of children it has.
Here, st has 4 children, and its 4th child (index 3) has 2 children.
size_t children = st->size(); // 4
size_t grandChildren = st->child(3)->size(); // 2
|
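Building such a tree from the token vector needs only a stack of open elements: each start tag becomes a child of the innermost open element, each end tag closes it, and character tokens are skipped. This is a sketch with a hypothetical `buildTree` function and a simplified `TokenNode`; unlike the article's tree, it drops text nodes entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the article's token types.
enum TokenType { START_ELEMENT, END_ELEMENT, CHARACTERS };

struct TokenInfo {
    TokenType type;
    std::string name;
    std::string text;
};

// Simplified tree node: the start-element token plus its child elements.
struct TokenNode {
    TokenInfo info;
    std::vector<std::unique_ptr<TokenNode>> children;
    std::size_t size() const { return children.size(); }
    TokenNode* child(std::size_t i) const { return children[i].get(); }
};

// Build an element tree from cached tokens that form a single root element.
std::unique_ptr<TokenNode> buildTree(const std::vector<TokenInfo>& tokens) {
    std::unique_ptr<TokenNode> root;
    std::vector<TokenNode*> open;  // currently open elements, innermost last
    for (const TokenInfo& t : tokens) {
        if (t.type == START_ELEMENT) {
            std::unique_ptr<TokenNode> node(new TokenNode{t, {}});
            TokenNode* raw = node.get();
            if (open.empty()) root = std::move(node);
            else open.back()->children.push_back(std::move(node));
            open.push_back(raw);
        } else if (t.type == END_ELEMENT && !open.empty()) {
            open.pop_back();
        }
        // CHARACTERS tokens (indentation, text) are not represented in this sketch.
    }
    return root;
}
```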