Personify Individuality from Online Forums


By Derek Wang

Individuality Matters!

Every consumer is unique. People purchase your product for different reasons. In high unit-price industries, such as automotive and hospitality, personalizing the customer experience is critical.

Within the automotive industry, manufacturers offer dozens of vehicles with hundreds of additional options for consumers to choose from. Understanding the tastes of a customer has a direct impact on conversion. How else can a dealer know, before customers even walk through the door, how to guide them effectively towards a purchase?

Instead of asking your customers to fill in a lengthy survey, our team at Stratifyd is trying to proactively follow the "bread-crumbs" from their textual data (e.g. comments from their forum posts). We automatically highlight who your customers are and discern why they like and dislike your product. Social intelligence of this nature includes what makes them tick and what ticks them off. Stratifyd uses the results to tailor personalized offers that meet and exceed customer expectations from the first interaction.

Follow the Bread-Crumbs

The challenge we imposed on ourselves was to infer over 8 million user personas from the #1 Chinese online forum based on over 53 million forum posts. Why a Chinese forum, you may ask? It's simple: we wanted to see if our algorithm could be applied at a global scale in a matter of days!

Well, we are proud to say that we have done it! Take a look at the Spider Web Chart below. This is a sample of the output we have for every registered user who commented on the forum. Each user’s preferences around "Gas Efficiency", "Cost Efficiency", "Power", "Interiors", "Exteriors", "Comfort", "Space", and "Handling" are automatically inferred by our engine within a few hours. Comparing these outputs to the training set, we are achieving accuracy above 90% in our inference, and we are looking at expanding the persona to more granular dimensions.

Note: no personally identifying information is used in this inference.

Figure 1. Spider Web Chart. Customer auto feature preferences inferred by Stratifyd engine.


We hired a million Amazon Mechanical Turk workers and paid them to do the work. Just kidding! At this scale, even if we hired that many people, they would not be able to generate the desired result, because human readings of forum posts are too biased and the relevant knowledge is too scattered.

If we rely on the power of GPUs and machines, this feat becomes achievable through deep learning at its finest. To start, we collected all the comments and post data from Autohome Forum. In total, we collected over 53 million forum posts from the active users on that forum.

Next, we needed to learn the implicit connection between preferences and the textual data. To build our intent detection model, we collected another 1.5 million official vehicle reviews, as shown in the table below. Any product reviews with consumer rating data, such as JD Power data, could serve this purpose. When fed into our Stratifyd GPU-accelerated mixture neural network (MNN), this labeled set, less than 3% the size of the forum post collection (1.5 million versus 53 million), was enough to reach ~93% inference accuracy. We followed the 80-10-10 rule from machine learning (80% of the data for training, 10% for validation, and 10% for testing).

Table 1. Data Sample of a Detailed Owner Review
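
The 80-10-10 rule mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_80_10_10`, the fixed shuffle seed, and the placeholder review records are all hypothetical, and it is not Stratifyd's actual data pipeline.

```python
import random


def split_80_10_10(records, seed=0):
    """Shuffle records and split them into train/validation/test (80/10/10)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains (about 10%)
    return train, val, test


# Placeholder records standing in for labeled owner reviews (hypothetical data).
reviews = [f"review-{i}" for i in range(1000)]
train, val, test = split_80_10_10(reviews)

# Size of the labeled review set relative to the forum posts, from the figures in the text.
labeled_share = 1.5e6 / 53e6
print(len(train), len(val), len(test), round(labeled_share, 3))
```

The model would be fitted on `train`, tuned on `val`, and scored once on `test`, so the reported accuracy reflects reviews the model has not seen.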

Our inference model can predict the factor being discussed in each sentence or short phrase, even without context. For example, when you say “this car looks manly”, it infers that you are talking about “exterior”; when you mention “cut through corner”, it infers “handling”. This allows us to perform sentence-level classification on long texts such as posts and articles.

As illustrated below in Figure 2, during the inference phase, each user’s historical posts are decomposed into sentences before the model is applied. The model annotates each sentence based on its semantic meaning, and the annotations are then summarized into that user’s persona.

Figure 2. Inference Process. Posts are decomposed into sentences, annotated, and summarized.
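
The decompose, annotate, and summarize flow can be sketched roughly as follows. This is a minimal sketch, not the Stratifyd engine: the keyword lookup in `classify_sentence` is a hypothetical stand-in for the trained neural network, and the names `KEYWORDS`, `split_sentences`, and `infer_persona` are assumptions made for illustration.

```python
import re
from collections import Counter

# The eight factors named in the article.
FACTORS = ["Gas Efficiency", "Cost Efficiency", "Power", "Interiors",
           "Exteriors", "Comfort", "Space", "Handling"]

# Hypothetical keyword cues; a real system would use a trained classifier instead.
KEYWORDS = {
    "Exteriors": ["looks", "styling"],
    "Handling": ["corner", "steering"],
    "Power": ["horsepower", "acceleration"],
    "Gas Efficiency": ["mpg", "fuel"],
}


def split_sentences(post):
    """Decompose a post into sentences on common end-of-sentence punctuation."""
    parts = re.split(r"[.!?。！？]+", post)
    return [p.strip() for p in parts if p.strip()]


def classify_sentence(sentence):
    """Annotate one sentence with a factor, or None if no cue matches."""
    lowered = sentence.lower()
    for factor, cues in KEYWORDS.items():
        if any(cue in lowered for cue in cues):
            return factor
    return None


def infer_persona(posts):
    """Summarize sentence-level annotations into per-factor proportions."""
    counts = Counter()
    for post in posts:
        for sentence in split_sentences(post):
            factor = classify_sentence(sentence)
            if factor is not None:
                counts[factor] += 1
    total = sum(counts.values())
    return {f: (counts[f] / total if total else 0.0) for f in FACTORS}


# Made-up posts for one user.
posts = ["This car looks manly. It cuts through every corner!",
         "The styling is sharp. Fuel costs are low."]
persona = infer_persona(posts)
print(persona)
```

The resulting dictionary holds one proportion per factor, which is the kind of shape a spider web chart can plot directly.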


On the macro level, our results show that the top three factors influencing customers are the “appearance”, “gas efficiency”, and “horsepower” of a vehicle. We do observe these factors shifting over time when a temporal dimension is added to the analysis.

Figure 3. Overall Customer Intents. Factors influencing automotive consumers inferred from their text reviews.

Cars have stereotypical buyers too, for the simple reason that they are designed with that target audience in mind.

What looks good, feels good, and seems practical is completely relative, not universal. The more actionable insight is at the micro level: marketers can now start matching vehicle personas to customer preferences. For example, based on historical reviews, Customer A or Consumer Segment A is inferred to place a high value on horsepower when contemplating an automotive purchase. Marketers can use this insight to drive their touchpoints across all channels that interact with or reach the customer or consumer segment.

By visualizing the predicted proportions of a user’s historical posts across the 8 factors, it becomes intuitive to match those attributes to the specific user ID and to introduce vehicles that feature these factors more prominently than others.

Figure 4. Customer Opinion Visualization. User A’s customer intent scores per factor.

As shown in Figure 4, appearance and handling are the most mentioned aspects in user A’s texts. This indicates that the appearance and handling of a vehicle will be the most important features influencing user A’s purchasing decision. This customer may respond more favorably to cars with higher torque for better handling and more detailed designs that improve appearance.
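
One simple way to turn a persona like user A’s into a vehicle shortlist is sketched below. Everything here is assumed for illustration: the scores in `user_a` are invented rather than read from Figure 4, the vehicle profiles are hypothetical, and the weighted-overlap score is just one plausible matching rule, not necessarily the one Stratifyd uses.

```python
def top_factors(persona, k=2):
    """Return the k factors with the highest proportion in a persona."""
    return sorted(persona, key=persona.get, reverse=True)[:k]


def match_score(persona, vehicle_profile):
    """Weighted overlap: sum of persona weight times vehicle strength per factor."""
    return sum(weight * vehicle_profile.get(factor, 0.0)
               for factor, weight in persona.items())


def rank_vehicles(persona, vehicles):
    """Order vehicle names from best to worst match for this persona."""
    return sorted(vehicles,
                  key=lambda name: match_score(persona, vehicles[name]),
                  reverse=True)


# Invented persona proportions for illustration only.
user_a = {"Exteriors": 0.35, "Handling": 0.30, "Power": 0.10, "Comfort": 0.10,
          "Space": 0.05, "Interiors": 0.05, "Gas Efficiency": 0.03,
          "Cost Efficiency": 0.02}

# Hypothetical vehicle profiles: how strongly each vehicle features a factor (0 to 1).
vehicles = {
    "Hypothetical Family Van": {"Space": 0.9, "Comfort": 0.8, "Exteriors": 0.3},
    "Hypothetical Sport Coupe": {"Exteriors": 0.9, "Handling": 0.9, "Space": 0.2},
}

print(top_factors(user_a))
print(rank_vehicles(user_a, vehicles))
```

With these made-up numbers, the coupe ranks ahead of the van because it is strong on the two factors that dominate the persona.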


Forums are a natural place for people to freely discuss the most relevant opinions around a product or service. Their comments and posts provide tremendous value in guiding your engagement with them, from your marketing to your product design. Inferring their personas and matching them with your product features will significantly broaden your scope of client acquisition and also improve your customer satisfaction. Let us help: contact us for more details.

About Stratifyd

Stratifyd cares about the intersection between AI and Customer Analytics. We help companies like yours understand why your customers love you, how your customers want to buy your products, and which customers are at risk of churning. Our AI is tuned to analyze your consumer interactions and automatically derive their representative personas. We apply the same rigorous automation to improve your customer retention and acquisition on the first day of implementation.

About the Author

Derek Wang holds a Ph.D. in Computer Science from the University of North Carolina at Charlotte and is the Founder and CEO of Stratifyd. The company is the result of postdoctoral work involving government-funded research on how AI could ingest, analyze, and visualize unstructured data. The strong business and government use cases led to the emergence of Stratifyd as a leading innovator in analytics technology and AI. He was previously the Associate Director of the Charlotte Visualization Center at the University of North Carolina at Charlotte. Dr. Wang has also worked as a software engineer and research scientist for Bank of America, Microsoft, Xerox, and Motorola. He is highly involved in Charlotte's local tech and academic community.