Prospective onboarding screens from Sparky. June 2024.

CONTEXT

Customers needed questions answered about high-consideration products before they could commit to a purchase. That's why we made Sparky.

In-page customized suggestion chips and knowledge base switching in Sparky. November 2024.

In early 2024, Walmart had a problem. We sold half a billion products and had no good way for customers to ask questions about them. 

Savvy shoppers — the ones doing real research before dropping $400 on a robot vacuum or $200 on a kids' learning tablet — were stuck. They'd scroll through pages of specs, wade through hundreds of reviews, search the internet for answers that might not even be reliable. And they still weren't sure. One of the savvy shoppers we talked to put it best:

"Most of the products that I’ve spent the time to research and get out there, I’ve been satisfied with the results when I got them. Do I wish I could feel this good about things without spending the time on research? Sure. But I don't think that's feasible."

That resignation was the brief. My job was to give those savvy shoppers a trustworthy source of information that reflected the Walmart brand.

I joined the Sparky team at the very beginning — no product, no design system, no precedent. What we had was a problem worth solving and a greenfield to work in. Over the course of 2024 I established a north star vision for the product and led conversational design across multiple interconnected workstreams, shaping everything from Sparky's fundamental voice and personality to how it handles a blank page, surfaces reviews, and explains why it's recommending a product.

This is that story.

North Star prototype for Sparky showing chat interface with product carousel and suggestion chips. March 2024.

The situation

Walmart sells over 500 million products. Customers have questions about these items that need answers before they're confident enough to buy. The research process — scrolling specs, reading hundreds of reviews, searching the internet — is exhausting and unreliable.

The obstacle

We needed to define what Sparky actually was before we could build it. Agentic search would be a new behavior for most of our users, who were accustomed to a search bar. We had to invent the vision, align stakeholders across product, engineering, data science, brand, and design, and do it quickly enough that the work didn't stall.

Funnel framework for directing purchase conversation to more focused product. April 2024.

What I did

  • Led intensive discovery process with cross-functional team of 9

  • Constructed prototypes and designed new chat interface features including product carousels, suggestion chips, and review integration

  • Developed a funnel-based analytic framework

  • Defined the grand framework for care, sensitive topic handling, and fallback behavior

  • Coordinated with product and engineering on fundamental principles and guidelines

The funnel framework was a key contribution. By connecting what a user says to where they are in their decision journey, we could design responses that felt less like a search engine and more like a trusted friend who happened to know everything about the products.

Results

Sparky launched to 5% of Walmart app users and ramped to 100% — over 100 million users.

Authentic customer responses from 12 separate interviews with “Savvy Sams”. February 2024.

The situation

Sparky could genuinely help savvy customers during the planning phase of a high-consideration purchase. But there was a fundamental adoption problem: LLM-based conversational search is a new behavior. Most users had no mental model for it. One of the savvy shoppers we spoke to captured the anxiety perfectly:

"With anything with AI, you have to know how to ask a question right… I've got to ask the right question to get the correct answer."

That fear of asking the wrong question was keeping people from asking any question at all.

The obstacle

Users didn't understand what Sparky could do that standard search couldn't. Without that understanding, the blank input field was intimidating rather than inviting.

Prospective onboarding screens introducing Sparky's personality and suggested questions to first-time users. July 2024.

What I did

  • Performed competitive analysis across AI assistant onboarding patterns

  • Defined behaviors, barriers, and benefits for new users

  • Conceived non-obtrusive ways to surface conversation starters through contextually engineered prompts

  • Designed a two-screen onboarding flow introducing Sparky's personality and capabilities

  • Designed UI elements in conjunction with CxD North Star vision

Results

The flow was designed, approved, and scheduled for a later phase of the product roadmap. Sometimes the work you do creates the runway for the next team.

Side-by-side training examples showing verbose versus concise Sparky responses used to establish tone and readability guidelines. August 2024.

The situation

Users need answers about product nuances to make purchase decisions confidently, and they want those answers fast. Reading through thousands of reviews is time-consuming, and the internet isn't always a reliable source. But that doesn't mean savvy customers want long, drawn-out answers either. One of the savvy shoppers we spoke to put it like this:

"When it gives me too much information, it gets confusing for me. Just tell me what I need to know."

The obstacle

The chatbot was responding in ways that were too verbose, wishy-washy, and not in the right brand voice, leading to early chat abandonment.

Portion of the Conversational / Captivating / Confident rubric for Sparky. This was used in conjunction with tests on hallucinations and actionability to calculate the Conversational Quality Score. October 2024.

What I did

  • Led conversational design process and conceptualization

  • Conducted workshops with data science, brand, and product teams

  • Developed a rubric for Conversational Quality Score in conjunction with Data Science and Analytics

  • Mapped Walmart's brand voice 3 C's (Confident, Captivating, Conversational) onto H. P. Grice's conversational maxims (Be truthful, Be relevant, Be informative, Be clear) to ensure all messaging moved conversations forward

  • Wrote training content and LLM guidelines — including positive and negative response examples for each principle

  • Wrote sample responses within product types to fine-tune response variations

  • Performed intensive data analysis and presented guidelines to design and product leadership

The Conversational Quality Score measured how well responses embodied Walmart's brand voice (Confident, Captivating, Conversational) while also being truthful, actionable, and free of hallucinated details. It gave us a measurable baseline for the first time. 

Results

After Model Revision 1, the Conversational Quality Score improved dramatically. After Model Revision 2, it reached a 100% improvement over baseline.

Conversational Quality Score improvement across two model revision cycles, showing 100% lift from baseline. January 2025.

Experimentation with different kinds of suggestion chips for Sparky. September 2024.

The situation

When savvy shoppers are searching for high-consideration products, they often don't know enough about the product category to know the right questions to ask. That cognitive load — having to generate a question from scratch about something you're not an expert in — was a significant factor in bounce rate. One of the savvy shoppers we spoke to put it bluntly: 

"I don't want to think about TVs. I'm okay with thinking about something like candles, but TVs? No thank you. I definitely want someone else to think about that." 

The obstacle

Using a virtual assistant is a new behavior, and users don't understand what it can answer better than standard search. The blank input field asks too much of users who are already uncertain.

Portion of Universal Entry Point chip grouping framework defining ten contextual suggestion chip categories. November 2024.

What I did

  • Devised a funnel-based framework specifically for sales-related utterances

  • Used the framework to define a design strategy where suggestion chips adapt to where the user is in the funnel

  • Designed contextually generated chip categories: Seasonal Savings, Trends & Gifts, Personal Viewed Items, Wish List

  • Wrote guidelines and good/bad examples for model training

  • Collaborated with UX design and Product to define behavior and corresponding UI modules

  • Developed the Universal Entry Point chip grouping system

Results

Full chip system designed and documented. When I left, testing was scheduled for early 2025.

Recommendation reasoning in carousel options (headline vs. no headline). August 2024.

The situation

Savvy customers don't just want to know what to buy; they want to know why. They're sophisticated enough to recognize when they're being sold to, and they're sensitive to anything that resembles a sales tactic. One of the savvy shoppers we spoke to articulated the standard we had to meet:

"I'd prefer it to be more informative than persuasive… if it can present me enough information to make a decision, I feel like I can do that… rather than have the AI point me to a particular model."

The obstacle

We couldn't overload users with content or lose their trust by delivering irrelevant results. And there was a structural constraint: Sparky didn't actually make product choices — the search algorithm did. The reasoning had to be genuinely informative, not a sales wrapper around an algorithmic output.

Recommendation reasoning moderated user testing reactions. September 2024.

What I did

  • Trained the LLM to write relevant justifications based on the conversation context and the specific utterance

  • Designed principles and guidelines around recommendation content

  • Collaborated on research plan design with Consumer Insights lead

  • Ran desirability/viability/feasibility/ethics matrix experiments to gauge and derisk design assumptions 

Results 

This was in active experimentation when I left Walmart — derisking design assumptions before a full build. The work established the content principles and LLM prompting patterns that govern how Sparky explains its recommendations at scale.

What this project says about me

Sparky is the most technically complex, largest-scale project I've ever worked on, and I was there from day one when the answer to "what should Sparky do?" was genuinely "we don't know yet."

The thing I'm most proud of isn't any single deliverable. It's that I built frameworks — the funnel model, the quality rubric, the suggestion chip taxonomy — that other designers and data scientists could work from. Good conversation design at this scale has to be principled, documented, and teachable.

I also learned something about what it means to design for trust. Every decision we made — how verbose to be, how to frame a recommendation, how to handle a question Sparky couldn't answer well — was a trust decision. Get it wrong and users bounce. Get it right and you replace hours of research with a thirty-second conversation that gets them to a purchase decision. 

Building trust by serving the user is the work I want to keep doing.



Check out my other case studies or get in touch: robert@sosincerely.com