Gaining Control Over Your Big Content - RhinoDox


Gaining Control Over Your Big Content

And Solving the Big Content Problem

This is part 2 of a two-part series on Big Content. If you missed part 1, you can read it here.

In my last post, I introduced the concept of Big Content. This time, we will talk about strategies for managing Big Content, and solving your Big Content problems.

Oceans of Content

In “What is Big Content?”, I defined the concept as the world of digital and physical content and documents you create and interact with every day. I described how, across most organizations, puddles, pools and oceans of content and documents are stored in a number of different places.

I also discussed how one of the most popular places to store content in an organization is an Enterprise Content Management (ECM) platform.

At one point, there were glimmers of hope that there could be “one ECM platform to rule them all”, and that enterprises may find a universal platform to fulfill all of their content management needs. That was an unrealistic vision, however.

Most enterprise companies have at least one legacy ECM, and many medium to large enterprises have several different ECM systems. Often these different systems are set up by department (e.g. Marketing & Sales using a different solution than Supply Chain & Logistics). Some of these different solutions may be tightly integrated with other solutions, but many are either poorly integrated or not integrated at all.

To get around the limitations of legacy systems, some teams use cloud document storage like Google Drive and Dropbox, while IT still manages network shares, not to mention all the content living inside your email servers. The result is a poorly stitched “patchwork quilt” of systems that should make content management easier but in fact reduce collaboration and productivity. At an enterprise company, this is what we call a Big Content problem.

And in order to solve the Big Content problem, organizations must find an effective way to connect these puddles, pools and oceans of documents and content…

This is no small task.

If you’re lucky enough to be at an organization with a modern software platform, the system should expose integration points and APIs that can begin to make the process easier. For legacy products, however, you may need to consider a modernization project to retire those systems in favor of newer and more connected platforms.

As you evaluate solutions to your Big Content needs, consider the puddles, pools and oceans in your organization. How will you connect them? Do the products you’re evaluating offer connectivity, or do they ignore the reality of the vastness of your content storage and workflows?

Content Analysis

With all of your content stores connected, you can begin deploying analysis tactics to extract value from all of the information your organization has been holding on to.

One tactic you might explore is Text Classification. If you’ve ever visited Google News, you’ve experienced Text Classification at work. As I write this, two major world events have just taken place: France won the World Cup, and the NATO summit concluded. Google News algorithms analyzed the content of the news articles published on the Internet and grouped them together based on similar topics. So while two news articles about NATO and the World Cup may both talk about France and French interests, the algorithms are smart enough to know one is about sports while the other is about politics.

By applying this kind of classification to your own content, your organization can begin to automatically organize content into appropriate categories. Perhaps you have contracts, invoices, project plans, policies & procedures and client documentation. A text classification algorithm can automatically identify, based on the content of a document, which category it belongs to.  
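To make the idea concrete, here is a minimal sketch of one classic approach, a Naive Bayes classifier, written with only the Python standard library. It is an illustration under stated assumptions, not a description of how Google News or any particular ECM product works: the class name, the two categories and the four training sentences are all hypothetical, and a real deployment would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training documents seen
        self.vocabulary = set()

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, float("-inf")
        for category, doc_count in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods for each word.
            score = math.log(doc_count / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training examples; real systems learn from many labeled documents.
classifier = NaiveBayesClassifier()
classifier.train("This agreement is entered into by the parties and governs "
                 "the term and termination of services", "contract")
classifier.train("The parties agree to the obligations and liability terms "
                 "set out in this agreement", "contract")
classifier.train("Invoice number 1042 amount due payment terms net 30 total balance",
                 "invoice")
classifier.train("Please remit payment for the total amount due on this invoice",
                 "invoice")

predicted = classifier.classify("Total amount due upon receipt of this invoice")
print(predicted)
```

The classifier never sees a rule such as "invoices mention amounts due"; it picks up that association from the word counts in the labeled examples, which is why the quality of the training set matters more than the code.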

Another tactic is to apply Natural Language Processing (NLP) to the text of your content. NLP has a variety of applications. It can convert a bit of human-written text into something better understood by a computer, for example turning a question into a database query. It can perform sentiment analysis: was the tone of this text positive, negative or neutral? And it can perform named entity recognition, identifying the people, places and things within text. Traditional NLP approaches have relatively low accuracy rates, but in recent years, neural network algorithms have been developed that dramatically improve accuracy and reliability.

Content Graphs

If you’ve used social media, you’ve used graph technology. Graphs are a way of connecting bits of related information. In the case of social media, that means connecting people who have some sort of relationship. As the graph grows, new and interesting insights can be inferred. For example, John knows Mary and Steve, and Sue knows Mary and Steve, so we can infer that Sue likely knows John.
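The John, Mary, Steve and Sue example can be sketched in a few lines of Python. This is a simplified illustration that counts shared connections; the function names and the `min_shared` threshold are assumptions made for the example, and real social networks use far richer signals.

```python
from collections import defaultdict


def build_graph(relationships):
    """Build an undirected graph (adjacency sets) from (person, person) pairs."""
    graph = defaultdict(set)
    for a, b in relationships:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def suggest_connections(graph, person, min_shared=2):
    """Return (name, shared_count) pairs for people `person` likely knows.

    A candidate qualifies when they are not already connected to `person`
    and share at least `min_shared` connections with them.
    """
    known = graph.get(person, set())
    shared = defaultdict(int)
    for friend in known:
        for friend_of_friend in graph.get(friend, set()):
            if friend_of_friend != person and friend_of_friend not in known:
                shared[friend_of_friend] += 1
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count) for name, count in ranked if count >= min_shared]


knows = [("John", "Mary"), ("John", "Steve"), ("Sue", "Mary"), ("Sue", "Steve")]
graph = build_graph(knows)
suggestions = suggest_connections(graph, "Sue")
print(suggestions)
```

Sue and John share two connections, Mary and Steve, so John is the only suggestion returned for Sue.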

Applying the graph concept to your Big Content problem can lead to interesting analyses. First, let’s automatically build a document graph that relates documents based on the similarity of their content. With that graph in place, while you’re viewing one document, the system can suggest other documents that may be of interest. This can surface documents you would have had to spend additional time searching for, or may not have known to look for at all.
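One common way to measure "similarity of content" is to weight each document's words with TF-IDF and compare documents by cosine similarity. The sketch below takes that approach using only the Python standard library. It is one possible technique, not necessarily what any given product does, and the document ids, sample text and `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Map each document id to a {term: tf-idf weight} dictionary."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    doc_frequency = Counter()
    for counts in term_counts.values():
        doc_frequency.update(counts.keys())
    n_docs = len(documents)
    return {
        doc_id: {term: count * math.log(n_docs / doc_frequency[term])
                 for term, count in counts.items()}
        for doc_id, counts in term_counts.items()
    }


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_document_graph(documents, threshold=0.1):
    """Link every pair of documents whose similarity meets the threshold."""
    vectors = tfidf_vectors(documents)
    graph = {doc_id: [] for doc_id in documents}
    ids = list(documents)
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = cosine_similarity(vectors[first], vectors[second])
            if score >= threshold:
                graph[first].append((second, score))
                graph[second].append((first, score))
    for neighbors in graph.values():
        neighbors.sort(key=lambda pair: -pair[1])  # most similar first
    return graph


# Hypothetical documents standing in for a content repository.
documents = {
    "acme_contract": "Master services agreement between Acme and the vendor "
                     "covering contract renewal terms",
    "acme_renewal": "Renewal notice for the Acme services agreement and "
                    "updated contract terms",
    "travel_policy": "Employee travel policy covering airfare hotel and "
                     "meal reimbursement",
}
doc_graph = build_document_graph(documents)
related = [doc_id for doc_id, _ in doc_graph["acme_contract"]]
print(related)
```

Comparing every pair of documents is fine for a sketch but grows quadratically with the size of the repository, so a production system would typically rely on an index or approximate nearest-neighbor search instead.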

Additionally, we can keep track of what content you view over time. By doing this, we can begin to build out a content graph specific to you, which helps the system understand the kinds of information you’re interested in. When new content is created, the system can suggest you review it based on what you have viewed before.
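As a rough sketch of that idea, the snippet below records which categories each user views and, when a new document arrives, lists the users for whom that category makes up a meaningful share of their history. The `ViewHistory` class, the `min_share` cutoff and the sample users are all hypothetical; a real system would likely track individual documents and topics, not just categories.

```python
from collections import Counter, defaultdict


class ViewHistory:
    """Tracks how often each user views content in each category."""

    def __init__(self):
        self.views = defaultdict(Counter)  # user -> category -> view count

    def record_view(self, user, category):
        self.views[user][category] += 1

    def interested_users(self, category, min_share=0.3):
        """Users for whom `category` is at least `min_share` of their views."""
        result = []
        for user, counts in self.views.items():
            total = sum(counts.values())
            if total and counts[category] / total >= min_share:
                result.append(user)
        return sorted(result)


history = ViewHistory()
for _ in range(3):
    history.record_view("alice", "contract")
history.record_view("alice", "invoice")
history.record_view("bob", "policy")
history.record_view("bob", "policy")

# A new contract is created: who should be prompted to review it?
to_notify = history.interested_users("contract")
print(to_notify)
```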


Everything we’ve described so far serves to generate metadata about our content. We now have valuable data that can be used in analytics. This could be as straightforward as analyzing usage patterns, such as the number of documents created in a given category over time, or documents viewed over time. Or we can begin to solve more complicated tasks, such as predicting contract renewals based on activity in the content repository for a given customer.

Once you have a strategy to manage your Big Content, you can begin to ask questions of it. Knowing which questions are important to ask, and which can actually be answered, depends on your business and the types of information you have available to you. In this case, the questions are less important than ensuring your content repository allows you to ask them in the first place.
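The simplest kind of question mentioned above, how many documents were created in each category over time, needs very little once the metadata exists. Here is a minimal sketch, assuming each document carries a `category` and a `created` date; the record layout and sample values are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical metadata records produced by the classification step.
records = [
    {"id": "doc-1", "category": "contract", "created": date(2018, 5, 14)},
    {"id": "doc-2", "category": "invoice", "created": date(2018, 5, 20)},
    {"id": "doc-3", "category": "contract", "created": date(2018, 6, 2)},
    {"id": "doc-4", "category": "contract", "created": date(2018, 6, 18)},
    {"id": "doc-5", "category": "invoice", "created": date(2018, 6, 25)},
]


def documents_created_per_month(records):
    """Count documents by (category, 'YYYY-MM') of their creation date."""
    counts = Counter()
    for record in records:
        month = record["created"].strftime("%Y-%m")
        counts[(record["category"], month)] += 1
    return counts


monthly = documents_created_per_month(records)
for (category, month), count in sorted(monthly.items()):
    print(f"{month}  {category:<10} {count}")
```

Harder questions, such as predicting contract renewals, build on the same foundation: consistent metadata that can be grouped, counted and compared over time.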

As organizations pursue Digital Transformation, Big Content should be on the radar. When you begin to evaluate solutions to Big Content problems, first seek out systems that allow for connectivity between your oceans of content. Next, look for content analysis: a truly modern enterprise content management platform will either already do it or have near-term plans to invest in it. Lastly, think about analytics in terms of your Big Content, and seek out platforms that are also analytics-minded.

Travis Whelan is the Principal Engineer at RhinoDox. He has been in the software industry for more than 15 years, and in the ECM industry for more than 10. When he’s not a busy Rhino, Travis is pretending he’s in the Chopped kitchen serving up creative meals to his family and friends. It’s unknown whether or not any of them have actually enjoyed his cooking.