The details on how everything works.
The unified content vision
In today’s world, there is an ever-growing demand for quick and easy access to all types of information, which is generally archived in a variety of media and formats such as paper, microfilm, microfiche, TIFF, HTML, and PDF. Several content management and search engine technologies have emerged to bring order and structure to this information chaos.
Olive greatly extends the range of services we can offer our customers. Olive’s XML-based approach furthers our mission of providing seamless integration of new and legacy documents across the enterprise.
President and CEO
Spectrum Information Services
Challenges with XML implementation
Many experts agree that XML is the future of our content. Why, then, is XML, which was invented more than 10 years ago and is now supported by all the top software vendors, so difficult to implement? Why is it in use at so few organizations?
The main reason is that implementing XML requires specialized professional tools to create it, a dedicated workflow to control it, and rigorous organizational procedures to keep a large repository consistent. Another problem is that most organizations depend on external content that is not under their control, so they must handle many formats and structures. In addition, organizations must deal with historical document repositories composed of hard copy, microfiche, and legacy digital files. Converting external and legacy documents into XML was virtually impossible due to the complexity and costs involved, until Olive…
Olive XML Automation concept
Olive’s automatic XML transformation solves most of these issues by implementing a “plug and play” XML architecture in any organization. Olive automatically transforms any document resource into XML, so organizations can continue their usual content creation routines and gain the advantages of XML with no additional effort. Olive transforms not only internal content into XML, but also content received from external sources and even complete historical and legacy collections.
Olive ViewPoint turns the vision of unified content into reality by automatically transforming documents into a single, unified, intelligent XML format. Olive enables a unified view, unified access interfaces on any platform, better search and knowledge utilization, content portability, and future-proof preservation.
Olive’s XML Schema
The architecture is based on a schema that is principally a comprehensive page and document description language. It maps the original document’s content, style, and hidden intelligence into an open XML format. This schema encapsulates and preserves text, metadata, structure, context, relationships, styling data, file properties, knowledge tags, hyperlinks, graphics, and image maps. The richly tagged nature of the schema enables the software to reproduce the look and feel of the original documents (as PDF does) while also supporting customized views on demand and advanced search capabilities (the main value of XML), all within any standard browser.
Olive’s XML schema has been designed as a physical “hyper-schema”: a raw XML schema that is not limited to a specific industry or domain but is situated “above” other schemas. By using a hyper-schema, users can transform content into other logical industry schema standards such as NITF, OAI, NewsML, AdsML, or METS. This is accomplished with common filters and XML transformations (XSLT). The main advantage of Olive’s hyper-schema model is that it creates an XML infrastructure that is not tied to a specific industry schema today, but can be adapted on the fly to specific schemas as needed, now or in the future.
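The mapping from a hyper-schema to an industry schema can be illustrated with a minimal sketch. It is written in Python with the standard library rather than in XSLT, and every element and attribute name in the source fragment (document, component, headline, text) is hypothetical and not taken from Olive’s actual schema. The output is only a simplified NITF-style structure, not a complete or validated NITF document.

```python
import xml.etree.ElementTree as ET

# Hypothetical hyper-schema fragment. The element and attribute names are
# invented for illustration and are not Olive's actual schema.
SOURCE = """\
<document title="Daily Gazette" date="2004-03-01">
  <component type="article" id="a1">
    <headline>Council approves budget</headline>
    <text>The city council voted on Monday.</text>
  </component>
  <component type="picture" id="p1">
    <caption>Council chamber</caption>
  </component>
</document>
"""


def to_nitf_like(source_xml: str) -> list[ET.Element]:
    """Map each article component to a simplified NITF-style element."""
    root = ET.fromstring(source_xml)
    results = []
    # Only components tagged as articles are mapped; pictures are skipped.
    for comp in root.findall("component[@type='article']"):
        nitf = ET.Element("nitf")
        head = ET.SubElement(nitf, "head")
        ET.SubElement(head, "title").text = comp.findtext("headline", default="")
        body = ET.SubElement(nitf, "body")
        content = ET.SubElement(body, "body.content")
        ET.SubElement(content, "p").text = comp.findtext("text", default="")
        results.append(nitf)
    return results


articles = to_nitf_like(SOURCE)
print(ET.tostring(articles[0], encoding="unicode"))
```

In a production setting the same mapping would more likely be expressed as an XSLT stylesheet, one per target schema, so that a new industry schema can be supported by adding a stylesheet instead of changing the stored content.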
Olive’s Contextual Information Components
Almost every document is a collection of different linked ideas. Contextual information components represent these ideas: they are the logical units (or atoms) of information within a given document.
For example, a component can be an article in a journal or newspaper, a picture in a magazine, or a section or a paragraph in a report. Most users are looking for specific ideas in documents and dislike searching through and downloading large documents just to read a single item in them. Contextual component extraction and tagging makes it possible to manage, search, and directly access any information nugget within a document, independent of other components. Each component still remains in the context of its original document, which readers can jump to with a single click.
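The idea of searching individual components while keeping a link back to the source document can be sketched as follows. This is an illustration under stated assumptions, not Olive’s implementation: the XML layout, the component element, and the id, type, and page attributes are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive document. All names here are assumptions made for
# this example, not Olive's actual schema.
ARCHIVE = """\
<document title="Daily Gazette" date="2004-03-01">
  <component type="article" id="a1" page="1">
    <headline>Council approves budget</headline>
    <text>The city council voted on Monday.</text>
  </component>
  <component type="picture" id="p1" page="1">
    <caption>Council chamber during the budget vote</caption>
  </component>
  <component type="article" id="a2" page="3">
    <headline>Local team wins</headline>
    <text>The match ended late on Sunday.</text>
  </component>
</document>
"""


def find_components(xml_text: str, keyword: str) -> list[dict]:
    """Return the components whose text mentions the keyword.

    Each hit carries the title and page of its source document, so a
    reader could jump from the component back to its original context.
    """
    root = ET.fromstring(xml_text)
    needle = keyword.lower()
    hits = []
    for comp in root.iter("component"):
        # Join all text inside the component, whatever its inner structure.
        text = " ".join(comp.itertext()).lower()
        if needle in text:
            hits.append({
                "id": comp.get("id"),
                "type": comp.get("type"),
                "page": comp.get("page"),
                "source_title": root.get("title"),
            })
    return hits


for hit in find_components(ARCHIVE, "budget"):
    print(hit)
```

Because each component is searched on its own, a query returns the matching article or picture directly instead of the whole issue, while the source title and page keep the result tied to its original document.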
The long-term content preservation challenge
The traditional problem that has plagued content preservation technologies is the dependence of content on the technology with which it was created, and more specifically on “proprietary binary document formats.” An organization’s ability to access and read archived digital content in the future depends mainly on the software vendor’s willingness and ability to maintain backward compatibility in the applications that read each document. Many software developers frequently introduce new proprietary document formats (such as MS Word and PDF), mainly to increase the customer’s dependency on the vendor’s technology.
Problems arise for a variety of reasons, such as when support for a version of a proprietary application is discontinued, or when an organization decides to stop using certain software. For example, it can be virtually impossible today to open a file created 15 years ago in an application that is no longer supported. The long-term digital preservation challenge concerns not only the binary files, but also the digital archive applications themselves. Most existing document management and archiving systems are based on a complex, multi-application, distributed model. In this model, a variety of proprietary file formats are stored in a proprietary repository, while the documents’ metadata is stored in a separate proprietary database or management application. It is virtually impossible to maintain such a complex, unsynchronized model over many years.
Olive’s digital preservation model is the only model that provides a solution for the critical challenge of long-term digital document preservation. The solution is based on two ideas: complete separation of content from the technology that created it, and database-free management of collections and metadata. XML in general is the only format that guarantees long-term preservation, since it is a plain-text, open-standard format. Olive XML improves content durability further by encapsulating in a single file all the data that is normally spread across multiple systems and binary file formats, such as metadata, structure, context, relationships, styling data, file properties, knowledge tags, hyperlinks, graphics, and image maps.
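The database-free idea described above can be sketched with a small example: each document is a self-contained XML file that carries its own metadata, and a catalog of the collection is rebuilt by reading the files with a standard XML parser. The file layout and the metadata, title, date, and text element names are assumptions made for this illustration only.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_document(folder: Path, name: str, title: str, date: str, body: str) -> Path:
    """Write one self-contained XML file holding both metadata and content."""
    root = ET.Element("document")
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "date").text = date
    ET.SubElement(root, "text").text = body
    path = folder / name
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
    return path


def catalog(folder: Path) -> list[dict]:
    """Rebuild a collection listing from the files alone, with no database."""
    entries = []
    for path in sorted(folder.glob("*.xml")):
        meta = ET.parse(str(path)).getroot().find("metadata")
        entries.append({
            "file": path.name,
            "title": meta.findtext("title"),
            "date": meta.findtext("date"),
        })
    return entries


# A temporary folder stands in for the archive in this example.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    write_document(archive, "b.xml", "Annual report", "1998-12-31", "Results for the year.")
    write_document(archive, "a.xml", "Daily Gazette", "2004-03-01", "Council approves budget.")
    listing = catalog(archive)

for entry in listing:
    print(entry)
```

Since the metadata travels inside each file, the listing can be regenerated at any time from the files themselves, and there is no separate database that has to stay synchronized with the repository.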