In Part 4 of our Convergence or Collision? AI and Content Marketing series, we’re exploring Natural Language Generation.
Natural Language Generation 101
To begin, what is Natural Language Generation? Natural language generation, or NLG, “is a software process that automatically turns data into human-friendly prose,” according to Automated Insights.
There are two kinds of NLG:
- Template-based NLG is the earlier and simpler approach. It uses templates of canned text with placeholders into which data is inserted (see the sketch after this list). These systems depend on hard-coded rules, which makes them less flexible.
- Advanced NLG solutions use both supervised and unsupervised machine learning, which renders them more flexible. Rather than inserting structured data into templates, advanced NLG uses neural networks that learn linguistic, morphological, and grammatical patterns from a large body of written language.
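To make the contrast concrete, here is a minimal sketch of the template-based approach in Python. The sales record, template wording, and field names are all hypothetical; a production system would pull the data from a structured source and rely on many more hard-coded rules.

```python
# A minimal, hypothetical sketch of template-based NLG:
# canned text with placeholders, filled from a structured record.

template = (
    "In {quarter}, the {region} region generated ${revenue}M in revenue, "
    "{direction} {change}% from the previous quarter."
)

def render(record: dict) -> str:
    # A hard-coded rule picks the wording for positive vs. negative change.
    direction = "up" if record["change"] >= 0 else "down"
    values = {**record, "change": abs(record["change"]), "direction": direction}
    return template.format(**values)

sales_record = {"region": "Northeast", "quarter": "Q2", "revenue": 1.4, "change": 12}
print(render(sales_record))
# In Q2, the Northeast region generated $1.4M in revenue, up 12% from the previous quarter.
```

An advanced NLG system, by contrast, would learn the phrasing itself from a large corpus of written language rather than relying on a fixed template and an if/else rule like the one above.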
Alternative Terms Related to NLG
Natural Language Generation is a type of Natural Language Processing (NLP). NLP encompasses all software capable of interpreting or producing human language, written or spoken.
There are four main types of NLP:
- Speech recognition software understands or transcribes spoken language, such as Dragon NaturallySpeaking.
- Speech synthesis software speaks or reads text out loud, such as CereProc CereVoice.
- Natural language understanding (NLU) software extracts information from written text, such as some of the tools that comprise IBM Watson.
- Natural Language Generation (NLG) software produces narratives and reports in simple language, such as Arria’s NLG Platform.
In a sense, NLG can be viewed as the opposite process of natural language understanding (NLU).
In NLG, the system decides how to put a concept into words. In NLU, by contrast, the system disambiguates an input sentence to produce a machine-readable representation of its meaning. This is machine reading comprehension, and it is considered an AI-hard problem: disassembling and parsing input is more complex than the reverse process performed in NLG.
Why Do We Need NLG?
Natural language generation matters in a cognitive science sense: NLG helps us understand the human language production faculty more deeply. More practically, NLG saves organizations time and money by improving staff efficiency. Tasks such as mining vast arrays of data and writing descriptive reports can be handed off to NLG systems, freeing employees to devote their hours to more creative, valuable work.
Beyond this, NLG can also reduce the time required to communicate information such as credit card balances or financial reports to customers.
Perhaps most importantly, natural language generation can help us quickly evaluate data. The ability to automatically communicate analytical observations in clear, concise, understandable language can be very useful.
How NLG tools help us interpret and evaluate data:
- NLG tools can help us track the progress of real-time events and data, such as sports events, elections, live shows, and more. They keep us current by automatically updating published stories when new information becomes available.
- NLG can help supply focus for readers by highlighting the data facts that are most important.
- When it comes to business intelligence, NLG can automatically analyze data and translate its significant, meaningful parts into plain English.
- Next-generation NLG software can summarize large volumes of data, explain why numbers are what they are, and generate reports automatically. This is particularly valuable today in the age of big data, as the incredible amounts of data available to business users and IT departments can be overwhelming.
NLG: The History
Some of the most successful NLG applications to date have been data-to-text systems, which generate textual summaries of databases and data sets. These data-to-text systems generally perform data analysis as well as text generation.
Academic interest in natural language generation increased in the 1970s:
- In 1970, computer scientist Jaime Carbonell published a report on SCHOLAR, an interactive program for computer-aided instruction based on semantic nets. It is often regarded as the first Intelligent Tutoring System (ITS)
- In the early 1970s, Jane Robinson & Don Walker established the influential Natural Language Processing group at the Stanford Research Institute
- In 1971, Terry Winograd’s PhD thesis at the Massachusetts Institute of Technology demonstrated the ability of computers to understand English sentences in a “blocks world,” a simulated environment of children’s toy blocks. The SHRDLU program could also respond verbally and carry out actions with a robot arm
NLG: Notable Examples
Popular interest in NLG increased along with a desire to summarize financial and business data. The SPOTLIGHT system, for instance, was developed at A.C. Nielsen and automatically generated readable English text by analyzing large amounts of retail sales data.
Other notable examples of NLG include:
- The Pollen Forecast for Scotland system is a simple NLG system that takes six numbers as input. Using these numbers, which are the predicted pollen levels in different parts of Scotland, it generates a short textual summary of pollen levels.
- Dominion Dealer Solutions uses an NLG tool to generate vehicle descriptions based on data from multiple sources, such as automotive reviews, Kelley Blue Book, and CARFAX. A study showed that cars with these descriptions sell an average of twenty days earlier than those without one!
- Computational humor – joke-telling robots, so to speak – is considered by some specialists to be the ‘final frontier’ for AI, as achieving it successfully requires mastery of spontaneity, self-awareness, and linguistic subtlety.
- Template-based systems that generate form letters, such as a mail merge.
- Business Intelligence dashboard text explanations, as well as reporting, online analytical processing, data mining, predictive analytics, and more. These are the most popular applications for content creation purposes – that is, turning data into written reports
- Video creation – USA Today, for instance, uses an NLG service called Wibbitz to produce short news videos. The service condenses news articles into scripts, combines images and video footage, and adds machine voice as narration to create the videos.
How Does NLG Work?
According to Ehud Reiter and Robert Dale in “Building Natural Language Generation Systems,” the stages, or subtasks, of NLG are as follows (a toy end-to-end sketch in code appears after the list):
- Content determination: deciding what information to mention explicitly and communicate in the generated text
- Document structuring: overall organization of the information – the order and grouping of sentences
- Aggregation: merging similar sentences and phrases to improve readability and make it feel more natural
- Lexical choice: choosing the content words for the concepts
- Referring expression generation: creating referring expressions (noun phrases) to identify objects and regions – this subtask includes making decisions about other types of anaphora
- Realization: creating the actual text, in accordance with the rules of syntax, morphology, and orthography – how do you put the words together so they make sense?
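Here is a toy walk-through of those stages in Python, loosely echoing the Pollen Forecast example mentioned earlier. The data, thresholds, and function names are all made up for illustration; real systems implement each stage with far more sophistication.

```python
# A toy, illustrative walk-through of the NLG stages using made-up pollen data.

data = {"Aberdeen": 6, "Edinburgh": 6, "Glasgow": 2}  # predicted pollen levels (0-10)

def determine_content(data):
    # Content determination: mention only the highest and lowest readings.
    hi = max(data, key=data.get)
    lo = min(data, key=data.get)
    return [("high", hi, data[hi]), ("low", lo, data[lo])]

def structure_document(facts):
    # Document structuring: report the high reading before the low one.
    return sorted(facts, key=lambda fact: fact[0] != "high")

def lexicalise(level):
    # Lexical choice: map a number onto a content word.
    return "very high" if level >= 8 else "high" if level >= 5 else "low"

def realise(facts):
    # Aggregation + referring expressions + realization: merge the two facts
    # into one sentence and refer back to the forecast with "levels".
    parts = [f"{lexicalise(value)} in {place}" for _, place, value in facts]
    return "Pollen levels are " + " but ".join(parts) + " today."

print(realise(structure_document(determine_content(data))))
# Pollen levels are high in Aberdeen but low in Glasgow today.
```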
Evaluating NLG Effectiveness
Now that we know how NLG works, what measurements are used to evaluate its effectiveness? There are three key approaches:
- Task-based, or extrinsic, evaluation – a person assesses how well a piece of NLG-generated text helps them perform a task
- Human ratings – a person rates the quality and usefulness of NLG-generated text
- Metrics – using the same input data for both, compare the texts generated by NLG to those written by people, using automatic metrics such as BLEU (bilingual evaluation understudy). See the sketch after this list.
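As a concrete illustration of the metrics approach, the snippet below computes a BLEU score with the NLTK library (assuming it is installed via pip install nltk); the reference and generated sentences are invented for the example.

```python
# A small sketch of metric-based NLG evaluation using BLEU.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical human-written reference text and NLG output from the same input data.
human_text = "pollen levels are high in aberdeen but low in glasgow today".split()
nlg_text = "pollen levels are high in aberdeen and low in glasgow".split()

# BLEU measures n-gram overlap between the NLG output and one or more references.
score = sentence_bleu([human_text], nlg_text,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU score: {score:.2f}")
```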
Potential Problems with NLG
One of the biggest challenges with NLG systems is their inability to create prose on their own. NLG depends on templated formats and requires access to structured data sets.
Additionally, there is no guarantee of consistency in the program’s behavior, which means programmers must anticipate all possible questions and answers.
While the NLG system is able to “write,” it doesn’t have reading comprehension. This means NLG can’t analyze a news story and pull out the figures the way humans are able to. This is good news for content marketers: an NLG system can’t take your place.
Natural Language Generation and Content Creation
Now, for the million dollar question: how useful is natural language generation for content creation? That all depends on the situation and feasibility.
While NLG content automation can be very useful for organizations, it can also be quite costly. If your organization does not need content at scale, it might not be worth the time or effort to customize NLG technology. Brands must also be willing and able to invest beyond the initial setup, as future content improvements will be required.
Additionally, it’s important to note that NLG requires an entirely separate team of content specialists to manage the machinery. Beyond this talent, implementing NLG requires knowledge of rule-based and branching logic, as well as a solid grasp of basic narrative mechanics, all of which can add to your cost.
Bottom line? NLG can best benefit organizations that need large quantities of data evaluated and analyzed, then summarized with automated reports.
In our next post, we’ll answer the question: what’s compelling platform adoption?
Read more from our Convergence Or Collision? AI and Content Marketing series here: