Choosing The Standard Document Format For Your Business
Choosing a standard document format has long been a question for corporates, because the answer never lay only in picking a format; it lay in picking the format that best solves business problems while also improving accessibility. The primary objective for corporates was not to choose the best format for its own sake, but to address the needs that existed: producing reports, advertising, and documentation; creating and distributing content easily over mail and the web; learning and training; and content management. The demand was growing, and the requirement was straightforward. Until mobile devices and digitization stepped in, the primary intent of corporates was to have a medium for creating content, and a format for distributing it, with the following capabilities:
- Easy to read: a document that is easy to read and adjusts itself to laptop and desktop screens.
- Easy to write: a content creator can easily write, edit, and share the content.
- Layout capabilities.
- Reformatting and rearranging.
- Highlighting trends and patterns.
- Support for charts and graphs.
- Professional appearance with visual impact.
- Small in size.
- Cost-effective.
- Easy to distribute and share.
1. MS Office
Microsoft came up with an initial release of a revolutionary suite named MS Office in mid-November 1990. It offered a plethora of document formats, the major ones being Word, PowerPoint, and Excel. We call them major in the context of the corporate needs listed above. This Microsoft suite addressed a large set of those problems, and the applications grew successfully and were widely adopted thanks to their broad range of features and ease of use. Softpedia later reported that Office is used by over a billion people worldwide.
MS Word offered corporates the following:
- Easy reading.
Easier on-screen reading, control over reading settings, tools that add value to reading, and a visually impactful, professional look.
- Easy writing.
MS Word offers the ability to create, edit, and share work quickly and easily. Most people can open and work with a Word document.
MS Excel was another program that helped corporates create and maintain data and reports.
- Data layout.
It provides the capability to organize numeric or text data in spreadsheets or workbooks, which helps in making more informed decisions.
- Reformatting and rearranging.
Excel makes reformatting and rearranging reports and data easier, and its filter-based options improve readability.
- Helps in analysis.
Excel performs complex analysis on the data and summarizes it, with previews of PivotTable options.
- Charts and graphs.
Excel recommends the charts and graphs that best illustrate the data patterns.
- Trends and patterns.
Excel makes it easy to spot trends and patterns in the data by using bars, colors, and icons to visually highlight important values.
PowerPoint offered on-the-go flexibility to create professional presentations.
- Professional design.
PowerPoint delivers high-quality, customized presentations by providing design options that help maximize their visual impact.
- Cinematic motion.
Slide transition and animation support.
MS Office took care of most of the needs, but when the PDF format stepped in, corporates and other industries such as publishing sooner or later realized what more was needed to overcome the challenges that stood in the way of adopting a standard. A few of the challenges that eventually emerged were:
- Security of the content.
- Digital Rights Management (DRM).
- Accessibility (for visually impaired professionals).
- Storage cost.
- More control over fonts.
- Size of the content produced.
- Print-friendly content.
- Software cost.
- Searchable documents.
- Quick and easy to create and transform.
- Platform-independent content and documents.
Earlier, MS Word was an option for creating, sharing, and distributing content in a portable format, but it depended heavily on costly software for both creating and reading documents, and it fell short in most of the areas listed above, which PDF documents took care of.
2. MOBI eBook Format
3. AZW eBook Format
4. AZW3 Or KF8 eBook Formats
"Early Kindle devices used AZW eBook format, which was a modified MOBI standard. But as Amazon developed even better versions of Kindle, the AZW eBook format also evolved with time. AZW3 or KF8 is the eBook format for the fourth generation Kindle devices. This format was introduced when Amazon released Kindle Fire in the market. KF8 stands for Kindle Format 8. AZW3 is a proprietary format of Amazon. It cannot be used by any other company. AZW3 is a Digital Rights Management (DRM) restricted format. It means that the AZW eBook files are locked to a particular device. These files cannot be transferred to another device. This is done to put a stop on piracy." (Difference Between EPUB, PDF, MOBI and AZW eBook Standards)
5. PDF (Portable Document Format)
Adobe first mentioned this format in 1991 at the Seybold conference held in San Jose. It was initially called ‘IPS’, i.e. ‘Interchange PostScript’, and version 1.0 of PDF was announced at Comdex Fall in 1992, where the technology won a ‘Best of Comdex’ award.
PDFs are essential for business and legal documents and forms that are intended to retain their exact visual appearance. Such documents must also retain their integrity and security. With the PDF format, corporates can secure their documents so that no one can change the wording of an application or the terms of an agreement.
PDF seemed to know exactly which challenges corporates faced in adopting a standard. It addressed the lingering needs, from security and reduced document size to portability and print-friendly content. Compared with MS Word, the advantages were straightforward. The following section covers a few of the advantages of PDF over Word and shows why not only corporates but also other industries, such as education and publishing, were inclined to adopt PDF.
PDF Vs. Word
PDFs maintain the quality of the content with reduced file size on disk.
- Universality.
PDFs are universal across platforms. Word documents are easy to create and edit, but a Word document viewed on a Mac, for example, may not render correctly, whereas PDF is platform-independent and preserves the view across platforms.
- Quick and easy to create and convert.
PDFs are easy to create and convert. MS Word, PowerPoint, and MS Excel documents can easily be converted to PDF format, and vice versa when needed.
- Trusted security.
Legal professionals also prefer PDF as a standard document format. According to Legalscans.com, for an electronic document to be admissible in a court of law, it must be created in a file format that cannot be modified. PDFs satisfy that need.
- Password protection.
Corporates deal in sensitive material and intellectual property that need a higher level of security. Password protection in PDF lets both the senders and the recipients of a file know that their information is secure.
- Reading is free.
Most PDF readers, including Adobe Reader, are free.
- Interactive documents.
PDFs also support interactive documents with embedded links, audio, or video.
- Searchability.
Search is fast and organized.
When it came to adopting Word or PDF, corporates preferred one or the other depending on the need. It was never a one-sided game.
PDFs were used more in the following scenarios:
- Online content.
PDF is useful for taking high-quality documents (newsletters, catalogs, manuals, technical papers, etc.) and making them available on the internet.
When compressed, PDF files can be very compact, which makes them ideal for storage.
- Business and legal documents.
- Combining multiple formats.
- No restriction on viewing.
- Document exchange.
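The storage point made under online content rests largely on compression: PDF page content streams are commonly compressed with the FlateDecode filter, which uses zlib/deflate data. The minimal sketch below, using only Python's standard `zlib` module, compresses a hypothetical, repetitive page content stream and wraps it in a stream object carrying the `/Length` and `/Filter /FlateDecode` entries. The helper name `make_flate_stream` and the sample text are illustrative assumptions, and a complete PDF would also need a header, other objects, a cross-reference table, and a trailer, all omitted here.

```python
import zlib

def make_flate_stream(obj_num: int, content: bytes) -> bytes:
    """Wrap content in a PDF stream object compressed with FlateDecode (illustrative sketch)."""
    data = zlib.compress(content)  # zlib/deflate data, the format FlateDecode expects
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Length {len(data)} /Filter /FlateDecode >>\n"
        "stream\n"
    ).encode("ascii")
    # /Length counts only the compressed bytes, not the end-of-line before "endstream".
    return header + data + b"\nendstream\nendobj\n"

# Hypothetical page content: the same text-drawing operators repeated many times.
content = b"BT /F1 12 Tf 72 720 Td (Quarterly report line) Tj ET\n" * 200
stream_obj = make_flate_stream(4, content)

print(f"raw content: {len(content)} bytes")
print(f"stream object: {len(stream_obj)} bytes")
```

Repetitive page descriptions like this compress very well, which is one reason text-heavy PDFs stay small on disk; image-heavy files depend far more on how their images were encoded.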
Corporates preferred MS Word usage in the following scenarios:
- Reusing images.
Images that are part of a Word document are easy to extract and reuse. With a PDF this is either impossible or tricky (it depends on additional tools), because its images are embedded.
There’s no doubt that Word is a powerful document editor.
Word documents were the clear choice for corporates when creating and editing documents, whereas PDFs were the choice for presenting and viewing content in a portable form.
With changing paradigms and the emerging trends of open standards, the web, and mobile, challenges that had gone unnoticed started to emerge. Though formats like PDF and Word were very powerful and had all the credentials to be adopted as a standard, they were bound by their own limitations when it came to producing more cost-effective, time-effective, storage-effective, accessible, and engaging content. The documents, reports, and presentations produced were lengthy, detailed, and time-consuming, which intentionally or unintentionally made readers lose interest, since the only way to get the information and knowledge shared was to read the whole document.
Consider a scenario where a client wants an analysis report on a project for the last year. The legacy approach would bind a professional to write a detailed document, create charts and analysis in the form of a 50- to 80-page document, and share it with the client over mail. This would certainly take a considerable amount of time to prepare: not only gathering the information, but also making it presentable, detailed, and professional. The client, in turn, is forced to download the document and spend at least half a day or more going through it to find the chunk of information they wanted. They may get tired of reading an 80-page document in one go and lose interest or, in the interest of time, postpone reading it. The question is: was all this required just to share some information? Could the information have been made much more interactive and engaging? What if the client asks the manager for one or more iterations of the report? What if the client is too busy to read such a lengthy document? What if they need to access the information on mobile devices? What would it cost to maintain storage and versions, and how much time and effort would it take to create the document? What if the report and analysis have to be shared with someone who is visually impaired? These questions are generic, and any corporate or professional can relate to this situation. The major question is: was this the best possible way to create, share, and distribute content and information? The new challenges that corporates started struggling with were:
- Content on device.
- Reflowability of content.
- Digital-first approach.
- Engaging content.
Corporates are producing an increasing amount of content, as witnessed by the storage capacity used worldwide, and that content is increasingly fighting for readers’ attention. So the same argument arises again: is the content engaging enough? If not, the return on the investment made to produce and distribute it is negative.
In today’s world, content has to be served in an engaging manner. Video, audio, interactivity, annotation, speech-to-text and text-to-speech, content that reflows and adapts to different device formats and sizes, accessibility and readability for user groups with different abilities, analytics, trackability, DRM, access control, and ease of distribution: these are some of the business factors that corporates care about when it comes to claiming to be digital.
The Publishing World And ePub3 Evolution
The publishing world has been at the forefront of addressing and solving some of these challenges, as publishers began the digital journey with their content many years back. That was when this evolution started. It has been no more than three years since the AAP (Association of American Publishers) stepped forward to announce a new project, the ePub3 Implementation Project, in July 2013. The intent behind this initiative was to reach out to more and more publishers to adopt ePub3, the new eBook format that the IDPF came up with in October 2011. As the latest version of ePub, ePub3 is based on modern web standards, including HTML5 and CSS3. In the short term, the migration from print to ePub3 was less about making books more interactive and accessible, and much more about cutting the cost of producing and delivering books across the wide range of devices, such as tablets and smartphones, that modern readers were using. Initially, HTML5 and CSS3 support was used as a marketing point for ePub3 adoption, but once the standard was considered for producing rich interactive content, which proved far more accessible and controllable than print media, the need was no longer limited to keeping prices down. In a few areas, ePub3 has gone beyond the browser baseline, for example in enabling global language support and rich media and interactivity. The overall value that ePub3 adds to digital publishing can be summarized into four major areas:
- Layout enhancements and styling.
- This includes CSS3 enhancements, font support, typography support, and layout support.
- Rich media and interactivity.
- HTML5 support for scripting.
- Media types like audio, video, and animation.
- Global language support.
- Universal language support, including vertical writing.
- Page progression directions.
- Phonetic annotation, etc.
- Accessibility features.
- Better semantics support.
- Synchronizing pre-recorded media with text display.
- Pronunciation hints, etc.
Of the four, the features that played the main role in U.S. adoption of the standard are “Layout enhancements and styling” and “Accessibility features”. The earlier version, ePub2, supported only a minimal subset of styling, layout, and CSS. Though ePub2 deliberately supported a reflowable format, which was indeed a game changer, it lacked reliable styling and embedded font support. Accessibility made its way not only into educational content, but also into other kinds of eBooks that are widely used in educational and government settings. Making content accessible can increase sales and reduce digital barriers. In fact, these were the two crucial features that led corporates to lean toward adopting the format, and the icing on the cake was device support and portability (which eventually turned out to benefit professionals).
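To make the accessibility point concrete: an ePub3 package can declare its accessibility characteristics in the package document using schema.org properties such as `schema:accessMode` and `schema:accessibilityFeature`, as described in the EPUB Accessibility specification. As a hedged illustration, the Python sketch below uses the standard `xml.etree.ElementTree` module to collect those declarations from a trimmed, hypothetical package document; the function name `accessibility_metadata` and the sample values are assumptions, and a real package document also needs a manifest and spine, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OPF_NS = "http://www.idpf.org/2007/opf"

def accessibility_metadata(opf_xml: str) -> dict:
    """Group schema.org accessibility declarations in an EPUB 3 package document by property."""
    root = ET.fromstring(opf_xml)
    found = defaultdict(list)
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        prop = meta.get("property", "")
        # Matches accessMode, accessModeSufficient, accessibilityFeature, accessibilityHazard, ...
        if prop.startswith("schema:access"):
            found[prop].append((meta.text or "").strip())
    return dict(found)

# Trimmed, hypothetical package document: only the metadata this sketch reads.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample report</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2016-01-01T00:00:00Z</meta>
    <meta property="schema:accessMode">textual</meta>
    <meta property="schema:accessMode">visual</meta>
    <meta property="schema:accessibilityFeature">structuralNavigation</meta>
    <meta property="schema:accessibilityFeature">alternativeText</meta>
    <meta property="schema:accessibilityHazard">none</meta>
    <meta property="schema:accessibilitySummary">Headings and image descriptions are provided.</meta>
  </metadata>
</package>"""

a11y = accessibility_metadata(SAMPLE_OPF)
for prop, values in a11y.items():
    print(prop, values)
```

Declarations like these only describe the content; the accessibility itself still comes from semantic markup, alternative text, and navigation inside the XHTML files.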
The benefits are not limited to ePub3 itself; they extend to EDUPUB, a profile built on top of it. EDUPUB is not a new file format; its purpose is to optimize the learning and teaching experience by adding specifications to ePub3 content for better interoperability and optimization.
Because there were no standards for creating good interactive content, reading systems also fell short in displaying it. The rationale behind the EDUPUB profile was not to create a new format like ePub3, but to define guidelines that include the following:
- Taking advantage of web standards that include CSS3 and HTML5.
- Creating content that is semantically well structured and relevant for education.
- Optimizing and focusing more on metadata.
- Creating highly accessible and distributable content.
- Supporting an authoring structure with useful education data.
EDUPUB is an implementation profile of ePub3 developed to cater to specific requirements for creating a standard ePub for education. Publishers and corporates can choose the sections of the profile, such as configurations built on scriptable objects, that best suit their content and business needs.
ePub3 Vs. PDF
"The ePub3 specification is more of a distribution and interchange format. ePub3 has proved to be a medium of representing, wrapping semantically advanced web content for distribution in a single-file format. Advanced web content includes Hypertext Markup Language 5 (HTML5), Cascading Style Sheets (CSS), Web Accessibility Initiative Accessible Rich Internet Applications (WAI-ARIA), Math Markup Language (MathML), Scalable Vector Graphics (SVG), images, and other resources." (ePub3 Overview)
When it comes to readability, PDF is the best option on Mac and Windows computers, but not on mobile devices. On devices, ePub is preferable because of its reflowable nature.
Adobe’s official website says “More than 150 million PDF documents publicly available on the web today, along with countless PDF files in government agencies and businesses around the world”. By this measure, ePub does not yet appear to be as popular as PDF.
PDF and ePub are both open standards and can be viewed on multiple platforms.
PDF is static in nature and retains the original layout, including in eBooks. ePub, by contrast, offers outstanding reflowable content, so the text display can be optimized for each ePub reading device.
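In an ePub3 package, the choice between reflowable and fixed layout is signalled in the package document: the `rendition:layout` property takes the value `pre-paginated` for a fixed layout, and content is treated as `reflowable` when the property is absent. The Python sketch below, using only `xml.etree.ElementTree`, reads that package-wide value from two trimmed, hypothetical package documents; the function name `rendition_layout` is an assumption, and the sketch deliberately ignores per-item overrides that can be set on individual spine entries.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

def rendition_layout(opf_xml: str) -> str:
    """Return the package-wide rendition:layout value, defaulting to "reflowable" when absent."""
    root = ET.fromstring(opf_xml)
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        if meta.get("property") == "rendition:layout":
            return (meta.text or "").strip()
    return "reflowable"

# Trimmed, hypothetical package documents (only the parts this sketch reads).
REFLOWABLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata><meta property="dcterms:modified">2016-01-01T00:00:00Z</meta></metadata>
</package>"""
FIXED_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata><meta property="rendition:layout">pre-paginated</meta></metadata>
</package>"""

print(rendition_layout(REFLOWABLE_OPF))  # reflowable
print(rendition_layout(FIXED_OPF))       # pre-paginated
```

Reflowable is the default, which is why most ePub content adapts to screen size without any extra declaration; pre-paginated output behaves more like a PDF page.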
Notes and highlights can be added to a PDF file with Adobe Acrobat. ePub, however, is more versatile in accommodating various needs.
"PDF is well known for its security feature. Not only the digital signature allows you to proclaim the authority of the PDF files, but also the open password and owner password set to the PDF can protect others from copying and printing, even opening. ePub files can be optionally containing DRM, but it is not a requisition." (7 Differences between PDF and ePub)
In short, ePub3 allows the creation of interactive, engaging digital content that is cross-platform and connects seamlessly to the educational and corporate ecosystem. ePub3 offers numerous features for tagging content at a very granular level, providing a digital solution to many of the problems related to portability, accessibility, and readability. Accessibility is not limited to the details of the content; the format itself is accessible to professionals who are blind or visually impaired. ePub3 promises broad access to employment, education, entertainment, information, and knowledge for people with disabilities.
PDF files, on the other hand, give more control over layout and fonts, and a PDF can be generated by various GUI-based tools. The major challenges with PDF are accessibility and device support. It is harder to produce a PDF in a web-friendly form that adapts well to various displays and devices, and PDFs have almost none of the reflowability that ePub3 provides. When it comes to print-friendly output, though, PDF can always be ranked first.
Comparison Of eBook Formats
IBM And ePub3 Adoption
IBM recently made a fundamental change in how it delivers documentation, choosing ePub3 over print media to advance accessibility, to ease use on smart devices such as tablets and mobile phones, and to provide a medium for delivering next-generation interactive, accessible content and documentation to its customers. The need behind this change was to reduce digital barriers and increase mobile support. In a press release issued in Seattle, WA, on 13 February 2014, the IDPF (International Digital Publishing Forum) and IBM announced a collaboration to create a white paper showcasing lessons learned from IBM’s decision to support ePub as its primary standard format. As a positive reinforcement of the “ePub3 Implementation Project” started by the AAP, IBM in 2016 looks forward to adopting ePub3 as its standard format for documentation.
IBM faced many challenges with print media that led it to adopt the ePub3 format. PDF, though it can be tagged, was very cumbersome to make accessible; the overall accessibility infrastructure was limited and outdated; and there was almost no access to math content, rich graphics, or interactivity. From the customer’s point of view, maintaining accessibility turned out to be very costly, and the documents were of almost no use to disabled users. Search was another issue, though not a constraint: the content needed to be more searchable, not only syntactically but also semantically. The limitation of the PDF format is that it is very hard to make interactive and accessible. Translation, control over fonts, programmatic accessibility, reflowability, and scrolling were a few of the challenges that hugely affected the readability of the documents.
The company’s focus on looking beyond the Windows desktop platform and going mobile-first led to the need to change its legacy documentation into something richer, more controllable, and more accessible, and the answer was ePub3. ePub3 does not guarantee print-friendly content, but mobile devices do not need it, which made the two a good fit for the desired solution. PDF was not a usable format for handhelds at all, and IBM also wanted to leverage accessibility work on the open web platform for all its documentation. This benefited the company by letting it reuse open web accessibility work and gain access to digital math through MathML. By overcoming these challenges, IBM’s vision was to provide highly interactive content and documentation to its customers. To meet this goal, IBM had an internal transformation strategy involving tools that converted documents stored in the OASIS Darwin Information Typing Architecture (DITA). Oxygen XML editors were used earlier to author the content. IBM is contributing new open source emitters that convert DITA to accessible ePub documents for others to use.
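The DITA conversion step can be pictured as a mapping from DITA topic elements to XHTML content documents that an ePub3 package can carry. The sketch below is not IBM’s emitters or the DITA Open Toolkit; it is a hypothetical, heavily simplified Python mapping using `xml.etree.ElementTree`, and the element mapping table, the function names, and the sample topic are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Hypothetical mapping from a few DITA elements to XHTML; anything else becomes <span>.
ELEMENT_MAP = {"p": "p", "ul": "ul", "ol": "ol", "li": "li", "b": "strong", "i": "em"}

def convert_children(src: ET.Element, dst: ET.Element) -> None:
    """Recursively copy src's children into dst, renaming tags and keeping text and tails."""
    for child in src:
        out = ET.SubElement(dst, ELEMENT_MAP.get(child.tag, "span"))
        out.text = child.text
        out.tail = child.tail
        convert_children(child, out)

def dita_topic_to_xhtml(dita_xml: str) -> str:
    """Turn a simple DITA <topic> into an XHTML content document string (sketch only)."""
    topic = ET.fromstring(dita_xml)
    title = topic.findtext("title", default="")
    html = ET.Element("html", {"xmlns": XHTML_NS, "lang": "en"})
    ET.SubElement(ET.SubElement(html, "head"), "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title  # the topic title becomes the page heading
    topic_body = topic.find("body")
    if topic_body is not None:
        convert_children(topic_body, body)
    return ET.tostring(html, encoding="unicode")

# Hypothetical DITA topic used only to exercise the sketch.
SAMPLE_TOPIC = (
    '<topic id="summary"><title>Annual summary</title><body>'
    "<p>Revenue <b>grew</b> year over year.</p>"
    "<ul><li>Costs were reviewed.</li></ul>"
    "</body></topic>"
)
xhtml = dita_topic_to_xhtml(SAMPLE_TOPIC)
print(xhtml)
```

A real pipeline would also handle cross-references, images, tables, MathML, and accessibility metadata; the point here is only that structured DITA source maps naturally onto the XHTML that ePub3 is built from.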
“AAP looks forward to the forthcoming IDPF/IBM white paper, which we believe will be a positive reinforcement of the work of our ePub3 Implementation Project cross-industry partnership created last year. The member publishers of AAP believe that broad adoption of ePub3 will offer countless benefits for all those who read, use, create, produce, distribute, and publish content. In particular, educators will have the ability to incorporate more feature-rich and interactive materials which will be inherently accessible to all students, including those with print disabilities. We look forward to this collaboration and to the resulting white paper, which should bring more valuable insight to our community.” - Edward McCoyd, Executive Director for Digital, Environmental and Accessibility Affairs, The Association of American Publishers
Can Corporates Reconsider The PDF Format?
Unlike ePub, PDF documents can also contain high-quality, resolution-independent content from any source. Unlike ePub, PDF technology includes all the basic features people expect in electronic documents, including the ability to add comments, redact text, use pages of differing sizes and orientations, navigate objects that span more than a single page, and more. Once software vendors have added features to PDF viewers that allow for an ePub-like experience, we can expect the appeal of ePub to continue to fade against PDF. Given that the US federal government’s Access Board has dramatically raised the profile of accessible PDF, this sort of development will not be long in coming.
Unlike ePub, PDF isn’t just a publication format, it’s a document format. It seems that people like their publications to have document features as well.
So there are certainly intrinsic factors that may lead a company to reconsider PDF for standardization. The corporate world is vast, and so is the range of document standards. On a case-by-case basis, corporate needs may favor different formats, and each choice may have a valid reason behind it.
Challenges In ePub3 Adoption
1. Reader Support.
One of the major challenges with ePub3 right now is that not all eBook reading applications support it fully. Companies such as Amazon, Kobo, and Sony currently have not widely adopted this standard in their iOS and Android applications. Apple, on the other hand, currently provides the best ePub3 support through iBooks, but invests no marketing effort in promoting it. The crux of the issue is that if corporates distribute their content in ePub3, it limits the number of apps readers can use to read it, undermining the full multimedia experience.
2. Conversion And Creation Tools Available.
There are many conversion and creation tools for ePub3 that claim not only Word, PDF, PowerPoint, Excel, and image-to-ePub3 conversion but also a digital-first approach to creating content, yet few of them actually do what they claim. The nature of content varies widely and is often unpredictable, which makes it tricky to convert into standard, IDPF-compliant ePub3. ePub3 is an open format and can also be created manually by anyone with some knowledge of HTML5 and the ePub3 package structure, but doing it manually takes a lot of time, hence the need for automation tools. Very few automation tools understand various input documents and process them into accessible, standard ePub3. Most tools run into issues with font extraction, image extraction, text extraction, and preserving styling, and thus lose track and disappoint their customers.
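To give a sense of what creating ePub3 by hand involves, the Python sketch below builds a minimal package in memory with the standard `zipfile` module: a `mimetype` entry (first and uncompressed, as the container format requires), `META-INF/container.xml` pointing to the package document, the package document itself, a navigation document, and one chapter. The file names, title, and identifier are hypothetical placeholders, and a production package would normally carry more metadata, styles, and assets.

```python
import io
import zipfile

# Hypothetical sample files; a real package would carry more content and metadata.
FILES = {
    "META-INF/container.xml": """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>""",
    "OEBPS/content.opf": """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:12345678-1234-5678-1234-567812345678</dc:identifier>
    <dc:title>Project Analysis Report</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2016-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>""",
    "OEBPS/nav.xhtml": """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en">
<head><title>Contents</title></head>
<body><nav epub:type="toc"><ol><li><a href="chapter1.xhtml">Summary</a></li></ol></nav></body>
</html>""",
    "OEBPS/chapter1.xhtml": """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head><title>Summary</title></head>
<body><h1>Summary</h1><p>Key findings for the year.</p></body>
</html>""",
}

def build_epub(files: dict) -> bytes:
    """Zip the files into an EPUB container, following the OCF ordering rule for "mimetype"."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The "mimetype" entry must come first and be stored without compression.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, text in files.items():
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()

epub_bytes = build_epub(FILES)
```

Writing `epub_bytes` to a file such as `report.epub` should give a package a reading system can open; validating it with EPUBCheck is the usual way to confirm conformance before distribution. Even this tiny example shows why automation matters once real documents with fonts, images, and styles are involved.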
3. A Bit Heavy Format.
As a package, ePub3 comprises a large set of files that may keep growing with needs and requirements, which makes an ePub somewhat heavy: all the resources and assets used in an ePub3 file have to be packaged inside it. Though this limitation persists, there is a greener side to it: it does not force content producers to create multiple copies of the content. A content producer can benefit by simply storing and distributing one ePub from a single repository on any cloud-based content distribution platform and continuing to work on it.
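Because an ePub3 file is a ZIP container holding every resource it uses, it can help to see where the bytes go. The Python sketch below, built on the standard `zipfile` module, summarizes a package by entry count, total uncompressed and compressed size, and bytes per file extension; the function name `package_inventory`, the sample package, and its file names are hypothetical, and the image entry is just filler bytes standing in for a chart.

```python
import io
import posixpath
import zipfile
from collections import Counter

def package_inventory(data: bytes) -> dict:
    """Summarize an EPUB (a ZIP container): entry count, total sizes, and bytes per extension."""
    by_ext = Counter()
    raw = packed = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        for info in infos:
            ext = posixpath.splitext(info.filename)[1].lstrip(".").lower() or "(none)"
            by_ext[ext] += info.file_size  # uncompressed bytes
            raw += info.file_size
            packed += info.compress_size   # bytes actually stored in the archive
    return {"entries": len(infos), "uncompressed": raw,
            "compressed": packed, "by_extension": dict(by_ext)}

# Hypothetical sample package: one chapter plus filler bytes standing in for a chart image.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
    zf.writestr("OEBPS/chapter1.xhtml", "<p>Summary</p>" * 50, compress_type=zipfile.ZIP_DEFLATED)
    zf.writestr("OEBPS/images/chart.png", bytes(range(256)) * 40, compress_type=zipfile.ZIP_STORED)

summary = package_inventory(buf.getvalue())
print(summary)
```

In typical packages, markup compresses well while media assets dominate the total, so an inventory like this points to where trimming or re-encoding assets would reduce the weight of the format.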
ePub3 provides a clear, comprehensive, practical specification that promises not just to enable cool things to be done in eBooks and content in a more consistent way but also to help corporates rationalize their workflows to save money and time and make their content more adaptable to reading systems of the future.
Taking a broader look at the Asia Pacific region, ePub is widely adopted and serves as a standard there. Since Amazon has not made much headway in this region, the ePub3 standard steals the show by providing rich, device-ready, accessible content. In Korea, all educational content is in ePub, and the format is widely adopted throughout. Built on the Open Web Platform, ePub will benefit from rapid changes in accessible technologies and provides a reflowable format that is best suited for mobile devices and lends itself to adaptation, embedded dynamic content, bookmarking, annotation, advanced learning, and various ways of improving educational material. This is the same reason corporates are keen to leverage the capabilities this standard provides.
The real victory for corporates lies in the next-generation flexibility to create and distribute rich, interactive, enhanced, and accessible mobile-ready content and documents.