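Here are the instructions for running XPB4J out of the box:
Download and unzip the distribution to get the XPB4J home directory xpb4j-0.90.
Make sure the environment variable JAVA_HOME is set to the base directory of your J2SDK1.4.x installation.
Make sure PATH includes the bin directory of Ant as a component.
Go to the XPB4J home directory (xpb4j-0.90) and issue the command:
$ ant run
This processes the file res0.xml kept in directory xmldata using the parser and JAXP libraries of J2SDK, along with other XML parsing libraries included with XPB4J.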
If you want to learn more and experiment around, read on.
XML Processing Benchmark for Java (XPB4J) is a Java-based performance measurement and comparison program for XML processing software. XML operations such as parsing, transformation, validation, encryption/decryption, custom access/manipulation, or any combination of these, applied to one or more XML files and/or byte streams, are considered XML processing.
Specific examples of such processing include:
XPB4J can be used to measure the performance of any of these processing activities, provided you hook up the processing code to XPB4J. By default, XPB4J includes the processing code for XStat Processing, a kind of XML content analysis that collects certain statistical information about an XML document.
XPB4J doesn't define any benchmark standard; it simply defines a framework to execute Processing Activities and measure their performance characteristics. If the same operation can be performed with different Processing Methods (say, using different parsing APIs such as SAX, DOM, JDOM or the Pull Parser API), then the performance characteristics of these can be measured and compared. One could also use different parsers and/or transformers and compare the results for the same processing method.
I wrote XPB4J primarily to
I have exercised XPB4J for XStat Processing using different parsing APIs and parser implementations. The different processing methods used are:
You can find my observations and conclusions in XML Processing Measurements using XPB4J. You could also run XPB4J on your machine with your favourite parser/transformer and your typical input, and observe the results.
If your interest is in finding out performance and memory usage of your own custom processing, you can write your own classes using XPB4J Framework to invoke your processing and collect the relevant data. Unfortunately, documentation for this framework doesn't exist right now. For now, I would suggest that you look at the source code and figure it out by yourself. This is actually simpler than you might expect.
Note: The directory paths and execution script names in this document use the MS Windows convention. Their UNIX equivalents can be derived simply by replacing \ by / in path names.
You can either download the released software as a single .zip file or get the latest code from the CVS archive anonymously. If you download the zipped file xpb4j-0.90.zip, unzip it in your working directory to get the directory tree starting at the XPB4J home directory xpb4j-0.90.
For CVS access, you need CVS client software. If you are a Linux user, you should have it already. If you are a Windows user, you can either get the command line version as part of the Cygwin toolkit or install the excellent WinCVS.
The instructions given below are for the command line version on a Windows machine. Instructions for Linux and the different flavours of UNIX are very similar.
$ set CVSROOT=:pserver:anonymous@cvs.xpb4j.sourceforge.net:/cvsroot/xpb4j
$ cvs login
$ cvs -z8 co xpb4j
This gets you the latest snapshot from the CVS archive. But beware, it may not work (or even compile!).
As illustrated in section Guide For the Impatient, running XPB4J with default arguments is very simple.
By default, configurable parameters are taken from the build.properties file, jar files are loaded (besides those loaded by the JVM) from the lib directory, and the input data set is made up of all .xml files in the xmldata directory. To change these defaults, you can either modify the Ant script build.xml or change the content of the above-mentioned files and/or directories. It is also possible to set certain parameters as Ant properties on the command line.
I have found it convenient to
add .jar files to (and remove unwanted .jar files from) the lib directory to change parser and processor implementations.
add .xml data files to (and remove unwanted .xml files from) the xmldata directory to change input data files.
set xpb.loopcount and/or xpb.runcount on the command line itself for a specific invocation.
Examples of invocation commands include:
$ ant run
$ ant run-dom
$ ant run -Dxpb.loopcount=1000
$ ant run -Dxpb.loopcount=10 -Dxpb.runcount=10
$ ant help
Refer to Ant build script and the section on Measurement Process for more details on these.
A successful execution of XPB4J writes the measurements to file pdata.xml and the processing results to file results.xml.
Execution of the measurement program is started by invoking the command "ant run".
The measurement program executes the equivalent of the following loop for measurements:
// Code for illustration only. Won't compile.
for (int r = 0; r < runcount; r++) // 4 runs
{
    Runtime.gc(); // Hope that this will force garbage collection.
    long startMem = Runtime.totalMemory() - Runtime.freeMemory();
    long startTime = System.currentTimeMillis();
    for (int l = 0; l < loopcount; l++) // 100 loops
    {
        if (gc flag is on) // off by default
            System.gc();
        for (file f in input files ) // Do the processing.
            process f;
    }
    long endTime = System.currentTimeMillis();
    long endMem = Runtime.totalMemory() - Runtime.freeMemory();
    System.out.println("Processing Time: " + (endTime - startTime)/100 + " milli secs.");
    System.out.println("Memory Use: " + (endMem - startMem)/1024 + " KB.");
}
This loop is actually spread over two different classes in the code. You can find the corresponding source code in files PMethod.java and XPBmain.java under package org.xperf.xpb. The important thing to note is that all the runs are within the same execution of the JVM (so that the warmup overhead is incurred only once) and each run consists of a number of processing iterations (so that the measurement window is large enough to get a meaningful average processing time). Each processing iteration processes all the specified input files.
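The loop described above can be sketched as a small runnable program. This is only an illustration under stated assumptions, not the code in PMethod.java or XPBmain.java: the class name MeasureLoopSketch, the Processor interface and the measure method are hypothetical, inputs are plain strings rather than files, and System.nanoTime() replaces System.currentTimeMillis(), so it needs a newer JDK than J2SDK 1.4.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a run/loop measurement harness; not XPB4J's own code.
public class MeasureLoopSketch {

    /** Hypothetical stand-in for one processing activity applied to one input. */
    public interface Processor {
        void process(String input);
    }

    /** Heap bytes currently in use: total heap minus free heap. */
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Performs runcount runs inside the same JVM. Each run processes every
     * input loopcount times and reports the average time per iteration.
     * Returns the per-run averages in milliseconds.
     */
    public static double[] measure(Processor p, List<String> inputs,
                                   int runcount, int loopcount, boolean gcEachLoop) {
        double[] avgMillis = new double[runcount];
        for (int r = 0; r < runcount; r++) {
            System.gc(); // only a hint; the JVM is free to ignore it
            long startMem = usedMemory();
            long startTime = System.nanoTime();
            for (int l = 0; l < loopcount; l++) {
                if (gcEachLoop) {
                    System.gc();
                }
                for (String input : inputs) {
                    p.process(input);
                }
            }
            long endTime = System.nanoTime();
            long endMem = usedMemory();
            avgMillis[r] = (endTime - startTime) / 1000000.0 / loopcount;
            System.out.println("Run " + r + ": " + avgMillis[r] + " ms per iteration, "
                    + (endMem - startMem) / 1024 + " KB memory change");
        }
        return avgMillis;
    }

    public static void main(String[] args) {
        // Placeholder inputs and a trivial processor, just to exercise the loop.
        List<String> inputs = Arrays.asList("<a/>", "<b>text</b>");
        measure(in -> in.length(), inputs, 4, 100, false);
    }
}
```

Because the first run includes class loading and JIT warmup, its average is usually the least representative; comparing later runs is one way to see the effect the text describes.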
The XPB4J distribution includes input data files obtained from Google using Google's Web API by issuing SOAP requests with the search string "Bill Gates". Files res0.xml, res1.xml, ..., res9.xml, a total of ten, were created by saving the returned documents, each having 10 search result entries: res0.xml contains the first ten entries, res1.xml the next ten, and so on. Each file is a valid SOAP document and is approximately 10KB in size. A big file having all the entries, res.xml, was created by combining the entries of all the other files. Note that res.xml is not a textual concatenation of the files but contains the totality of the search result entries, thus preserving the structure of a valid SOAP document but with a total size that is slightly less than the sum of the individual file sizes.
File res0.xml can be found in directory xmldata and all the files, including res0.xml, can be found in directory xmldata\google.
Note: I collected these files sometime in the middle of May 2002. Due to the dynamic nature of the Web, if you try the same query now, you may not get exactly the same search results.
Random XML data can also be generated by the supplied rxgen utility. This program takes either an element count or an approximate size in KB as an argument and generates a somewhat random XML document. Look at the source file org.xperf.xpb.RandXMLGen.java (included in the distribution) to understand how the random file is generated.
Examples of rxgen invocation commands include:
$ rxgen -elemcount 10 > xmldata\rxgen100.xml
$ rxgen -datasize 100 > xmldata\rxgen100KB.xml
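To give a feel for what generating a "somewhat random" document by element count can look like, here is a minimal sketch. It is not the algorithm in RandXMLGen.java: the class RandomXmlSketch, the generate method, the element names e0 to e4 and the fixed root element are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

// Hypothetical sketch of a random XML generator; not the supplied rxgen utility.
public class RandomXmlSketch {

    /**
     * Builds a well-formed document with elemCount randomly named and randomly
     * nested elements below a fixed root. The same seed gives the same document.
     */
    public static String generate(int elemCount, long seed) {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<root>");
        Deque<String> open = new ArrayDeque<String>(); // names of elements still open
        int created = 0;
        while (created < elemCount) {
            // One time in three, close the innermost open element instead of nesting deeper.
            if (!open.isEmpty() && rnd.nextInt(3) == 0) {
                sb.append("</").append(open.pop()).append('>');
                continue;
            }
            String name = "e" + rnd.nextInt(5);
            sb.append('<').append(name).append('>');
            sb.append('t').append(rnd.nextInt(1000)); // a little random text content
            open.push(name);
            created++;
        }
        // Close whatever is still open so the document stays well-formed.
        while (!open.isEmpty()) {
            sb.append("</").append(open.pop()).append('>');
        }
        return sb.append("</root>\n").toString();
    }

    public static void main(String[] args) {
        int elemCount = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.print(generate(elemCount, System.currentTimeMillis()));
    }
}
```

Taking the seed as a parameter is a deliberate choice in this sketch: a benchmark input that can be regenerated exactly makes measurements easier to compare across machines.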
XStat processing consists of scanning one or more XML files and collecting the following statistical information for each element in the file:
Note that this processing does require some book-keeping and is not a simple parse of the file. Memory tree oriented APIs like DOM, JDOM and DOM4J make it simple to write the processing code, whereas linear scan APIs like SAX and XmlPull require extra data structures to be maintained. Acknowledgement: I have borrowed the idea behind this processing from the article Using The Perl XML::Parser Module.
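The extra book-keeping that a linear scan API needs can be illustrated with a small SAX handler. This is a sketch only, not the code in package org.xperf.xpb.xstat.sax: the class SaxStatSketch is hypothetical, and the two statistics it gathers (occurrences and character count per element name) are chosen for illustration and are not claimed to match what XStat collects.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of per-element statistics with SAX; not XPB4J's XStat code.
public class SaxStatSketch extends DefaultHandler {

    /** How many times each element name occurs. */
    public final Map<String, Integer> occurrences = new TreeMap<String, Integer>();
    /** How many characters of text appear directly inside each element name. */
    public final Map<String, Integer> charCounts = new TreeMap<String, Integer>();
    /** SAX does not say which element is current, so the handler tracks it. */
    private final Deque<String> open = new ArrayDeque<String>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        occurrences.merge(qName, 1, Integer::sum);
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            charCounts.merge(open.peek(), length, Integer::sum);
        }
    }

    /** Parses the given XML text and returns the handler holding the statistics. */
    public static SaxStatSketch analyze(String xml) {
        try {
            SaxStatSketch handler = new SaxStatSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        SaxStatSketch stats = analyze("<a><b>xy</b><b>z</b></a>");
        System.out.println("occurrences: " + stats.occurrences);
        System.out.println("characters:  " + stats.charCounts);
    }
}
```

With a tree API the current element is available from the node itself, so the explicit stack disappears; that is the trade-off the paragraph above refers to.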
See package org.xperf.xpb.xstat.sax for details.
See package org.xperf.xpb.xstat.dom for details.
See package org.xperf.xpb.xstat.pull for details.
See package org.xperf.xpb.xstat.jdom for details.
See package org.xperf.xpb.xstat.xslt for details.
See package org.xperf.xpb.xstat.dom4j for details.