What’s New in Harper 4.6: A Deep Dive Into Vector Indexing, Logging, and Performance

Harper 4.6 introduces a powerful trio of enhancements designed for modern application builders: native vector indexing, granular developer-first logging, and performance improvements optimized for global scale. In this Q&A-style companion to our latest webinar, Harper Field CTO Jaxon Repp sat down with Nenne Nwodo to unpack what these upgrades mean in practice—and how they simplify complex tasks without sacrificing power.

How has search evolved over the last 18 months?

Search has shifted from rigid keyword matching to flexible, AI-powered retrieval. Queries that once relied on SQL LIKE pattern matching now rely on semantic search, which matches on meaning rather than on literal strings. As data grows and user expectations rise, developers need search systems that are both fast and meaning-aware.

What’s the difference between semantic search and vector search?

Semantic search is the umbrella term—it's about retrieving content based on meaning rather than exact matches. Vector search is a technique within that umbrella. By transforming data into high-dimensional vectors, it allows systems to compute "closeness" between concepts, not just keywords.
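To make "closeness" concrete, here is a minimal sketch of one common similarity measure, cosine similarity. The three-dimensional vectors and item names are invented for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", made up for this example only.
embeddings = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker": [0.0, 0.1, 0.9],
}

def rank_by_similarity(query_vector, items):
    """Return (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vector, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# A query vector that points in roughly the same direction as the footwear items.
ranking = rank_by_similarity([0.85, 0.15, 0.05], embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The two footwear items score close to 1.0 and the coffee maker scores close to 0, even though none of the names share a keyword with each other.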

Why does Harper 4.6 use HNSW for vector indexing?

HNSW (Hierarchical Navigable Small World) offers a strong trade-off between query speed and accuracy. It organizes vectors into a layered proximity graph: sparse upper layers cover long distances in a few hops, and denser lower layers refine the result. This lets Harper retrieve approximate nearest neighbors without comparing the query against every vector, which keeps search fast even as datasets grow.
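The sketch below illustrates the idea behind layered graph search with a deliberately simplified two-layer toy. It is not Harper's implementation and not full HNSW, which assigns layers randomly and keeps a candidate list during search. All function names here are hypothetical.

```python
import math

def exhaustive_nearest(query, points):
    """Baseline: compare the query against every point."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))

def build_knn_graph(points, ids, k):
    """Link each listed point to its k nearest neighbours among the listed points."""
    graph = {}
    for i in ids:
        others = sorted((j for j in ids if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        graph[i] = others[:k]
    return graph

def greedy_search(query, points, graph, entry):
    """Hop to whichever neighbour is closer to the query until no neighbour improves."""
    current = entry
    current_dist = math.dist(query, points[current])
    comparisons = 1
    while True:
        neighbours = graph[current]
        comparisons += len(neighbours)
        best = min(neighbours, key=lambda j: math.dist(query, points[j]))
        best_dist = math.dist(query, points[best])
        if best_dist >= current_dist:
            return current, comparisons
        current, current_dist = best, best_dist

def layered_search(query, points, layers, entry):
    """Search the sparsest layer first, then reuse the result as the entry point below."""
    total = 0
    current = entry
    for graph in layers:
        current, comparisons = greedy_search(query, points, graph, current)
        total += comparisons
    return current, total

# Toy data: 100 points on a line. The top layer holds every 10th point.
points = [(float(i), 0.0) for i in range(100)]
top_layer = build_knn_graph(points, list(range(0, 100, 10)), k=2)
bottom_layer = build_knn_graph(points, list(range(100)), k=2)

query = (42.3, 0.0)
nearest, comparisons = layered_search(query, points, [top_layer, bottom_layer], entry=0)
print(nearest, comparisons, "distance computations instead of", len(points))
```

On this toy data the layered walk reaches the same point as the exhaustive scan while computing far fewer distances. On real data the result is approximate, which is the trade-off the answer above describes.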

Are other indexing algorithms supported?

HNSW is the default because it’s well-established and broadly effective. However, Harper’s modular design makes it easy to add custom indexing algorithms if your data or search patterns demand it. Support for multiple embedding models—like OpenAI or local Ollama models—is already built in.

What are some real-world use cases for vector search?

E-commerce recommendations: Search by intent, not rigid product filters.
Code similarity detection: Identify related code snippets or modules.
Video indexing: Search by visual or spoken content in real time.
Personalized recommendations: Match patterns in taste or behavior, not just explicit metadata.

What makes Harper's hybrid search capabilities unique?

Harper’s query planner dynamically decides whether to apply vector or attribute filters first, based on dataset cardinality. This hybrid approach blends structured and unstructured search for more accurate and performant results—all abstracted behind a simple developer experience.
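As a rough illustration of that decision, the sketch below picks an execution order from the fraction of rows an attribute filter matches. The function names, the 10% threshold, and the record layout are assumptions made for this example, not Harper's planner or API.

```python
import math

def choose_plan(matching_rows, total_rows, selectivity_threshold=0.1):
    """Pick 'filter_first' when the attribute filter keeps only a small share of rows."""
    if total_rows == 0:
        return "filter_first"
    selectivity = matching_rows / total_rows
    return "filter_first" if selectivity <= selectivity_threshold else "vector_first"

def hybrid_search(records, query_vector, predicate, k, selectivity_threshold=0.1):
    """Return (plan, top-k records) for a combined attribute + vector query."""
    # A real planner would read this count from index statistics; the toy counts directly.
    matching_rows = sum(1 for r in records if predicate(r))
    plan = choose_plan(matching_rows, len(records), selectivity_threshold)

    def by_distance(record):
        return math.dist(record["vector"], query_vector)

    if plan == "filter_first":
        # Few rows pass the filter: narrow first, then rank the survivors by distance.
        ranked = sorted((r for r in records if predicate(r)), key=by_distance)
    else:
        # Most rows pass: rank by distance (standing in for a vector index), then filter.
        ranked = [r for r in sorted(records, key=by_distance) if predicate(r)]
    return plan, ranked[:k]

# Toy table: 5 of 100 rows are "shoes", the rest are "other".
records = [
    {"id": i, "category": "shoes" if i % 20 == 0 else "other", "vector": (float(i), 0.0)}
    for i in range(100)
]

rare_plan, rare_hits = hybrid_search(records, (41.0, 0.0), lambda r: r["category"] == "shoes", k=2)
common_plan, common_hits = hybrid_search(records, (41.0, 0.0), lambda r: r["category"] == "other", k=2)
print(rare_plan, [r["id"] for r in rare_hits])
print(common_plan, [r["id"] for r in common_hits])
```

Both plans return correct results in this toy. The difference is how much work each order does, which is what a cardinality-aware planner is trying to minimize.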

How does Harper handle updates or deletions in vector indexes?

Vector embeddings are recalculated automatically on record updates using Harper’s dynamic field feature. Deletions are cleanly removed from the index. Even bulk re-indexing is supported for cases like swapping embedding models or tuning algorithm parameters.
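The lifecycle described above can be sketched with a small in-memory table. The class, its method names, and the two toy embedding functions are hypothetical stand-ins, not Harper's dynamic field feature or a real embedding model.

```python
def embed_v1(text):
    """Toy embedding: [character count, vowel count]. Stands in for a real model."""
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]

def embed_v2(text):
    """A second toy embedding, used to show a model swap: [word count]."""
    return [float(len(text.split()))]

class EmbeddedTable:
    """Keeps a vector index in step with the records it is derived from."""

    def __init__(self, embed):
        self.embed = embed
        self.records = {}  # record_id -> source text
        self.index = {}    # record_id -> embedding vector

    def put(self, record_id, text):
        """Insert or update a record; the embedding is recomputed on every write."""
        self.records[record_id] = text
        self.index[record_id] = self.embed(text)

    def delete(self, record_id):
        """Remove the record and its vector together so the index holds no orphans."""
        self.records.pop(record_id, None)
        self.index.pop(record_id, None)

    def reindex(self, embed):
        """Bulk rebuild, for example after swapping the embedding model."""
        self.embed = embed
        self.index = {rid: embed(text) for rid, text in self.records.items()}

table = EmbeddedTable(embed_v1)
table.put("a", "hello")
table.put("b", "vector search")
table.put("a", "hello world")   # update: the vector for "a" is recalculated
table.delete("b")               # deletion: "b" leaves the index too
table.reindex(embed_v2)         # model swap: every remaining vector is rebuilt
print(table.index)
```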

What’s new in Harper 4.6 logging and why does it matter?

Harper 4.6 introduces a revamped, developer-friendly logging system:

Full HTTP request tracing through the entire stack
Per-component configurability
Log shape customization
Live updates to logging settings without restarts

This is especially valuable for debugging complex workflows in production, without compromising system stability.
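For readers who want a feel for per-component levels, custom log shape, and live changes, here is an analogy built on Python's standard logging module. It does not use Harper's configuration format, and the component names are made up.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Custom log shape: one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        })

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

def component_logger(name, level):
    """Create a logger whose level is independent of every other component."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(handler)
    logger.propagate = False  # keep output out of the root logger
    return logger

http_log = component_logger("demo.http", logging.WARNING)
replication_log = component_logger("demo.replication", logging.DEBUG)

http_log.info("GET /products")             # dropped: below this component's level
replication_log.debug("heartbeat sent")    # kept: this component logs at DEBUG

http_log.setLevel(logging.INFO)            # live change, no restart involved
http_log.info("GET /products/42")          # kept now

lines = [json.loads(line) for line in stream.getvalue().splitlines()]
print(stream.getvalue())
```

Only two entries are emitted: the replication heartbeat and the second HTTP request, which was logged after the level was lowered at runtime.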

What’s the new Plugins API, and how is it different from extensions?

Plugins replace extensions as the go-forward abstraction for reusable logic. Unlike extensions, plugins:

Are dynamically loaded
Register via a single-handle method
Avoid being loaded by components that don’t need them
Simplify how components interface with shared functionality

Extensions are still supported for now, but will eventually be deprecated.
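The general pattern of loading shared logic only when a component asks for it can be sketched as follows. Everything here, including the registry class, the plugin classes, and the handle method signature, is a hypothetical illustration of lazy loading and does not reflect Harper's actual Plugins API.

```python
class UppercasePlugin:
    """Hypothetical plugin exposing a single entry point."""

    def handle(self, component_name, payload):
        return f"{component_name}:{payload.upper()}"

class AuditPlugin:
    """A second hypothetical plugin that no component requests in this example."""

    def handle(self, component_name, payload):
        return f"audit[{component_name}]={payload}"

class PluginRegistry:
    """Registers plugin factories up front and instantiates them on first use."""

    def __init__(self):
        self._factories = {}
        self._loaded = {}

    def register(self, name, factory):
        """Record how to build the plugin without building it yet."""
        self._factories[name] = factory

    def get(self, name):
        """Load the plugin the first time a component asks for it, then reuse it."""
        if name not in self._loaded:
            self._loaded[name] = self._factories[name]()
        return self._loaded[name]

    def loaded_names(self):
        return sorted(self._loaded)

registry = PluginRegistry()
registry.register("uppercase", UppercasePlugin)
registry.register("audit", AuditPlugin)

# Only the "search" component needs a plugin, so only that plugin is loaded.
result = registry.get("uppercase").handle("search", "hello")
print(result, registry.loaded_names())
```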

How does Harper address performance degradation over time?

Two key challenges affect long-term performance: massive data growth and rising user expectations. Harper addresses both with:

Horizontal scale and intelligent sharding
Real-time indexing optimizations
Component-level replication
A query planner that adapts based on data and workload

All while preserving the simplicity developers expect from document databases like MongoDB.

What’s the performance cost of enhanced logging?

Virtually none—unless you deliberately configure it that way. Logging in 4.6 is opt-in, per component, and can be fine-tuned for shape and frequency. Harper minimizes disk writes and avoids logging overload, ensuring observability without sacrificing performance.

What are common gotchas when migrating from multi-system architectures?

The most common friction comes from mindset shifts, not technical blockers. Harper consolidates multi-service stacks into a single, composable platform, eliminating synchronization issues and latency from chained services. The challenge is unlearning old patterns—once users build their first endpoint, the benefits quickly become obvious.

How important are GPUs in Harper's vector pipeline?

GPUs accelerate the generation of vector embeddings, especially at scale. But Harper supports CPU-based embedding, local testing, and token-based APIs (like OpenAI) as well. You choose the performance/cost tradeoff that fits your use case.

How does Harper handle conflict resolution in active-active replication?

Harper uses CRDTs with versioning and last-writer-wins logic to resolve write conflicts, although simultaneous updates to the same record within the same microsecond are rare. This keeps data consistent across distributed nodes without degrading performance.
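A last-writer-wins register is one of the simplest CRDTs, and a minimal sketch of its merge rule looks like this. The field names and the node-ID tie-break are assumptions for illustration; the source does not describe how Harper breaks exact timestamp ties.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionedWrite:
    value: str
    timestamp_us: int  # microsecond timestamp of the write
    node_id: str       # which node accepted the write

def merge(a, b):
    """Keep the newer write; on an exact timestamp tie, the higher node_id wins.

    The tie-break is arbitrary but deterministic, so every replica picks the
    same winner regardless of the order in which it receives the writes.
    """
    return max(a, b, key=lambda w: (w.timestamp_us, w.node_id))

older = VersionedWrite("status=pending", 1_000_000, "node-a")
newer = VersionedWrite("status=shipped", 1_000_250, "node-b")
tied = VersionedWrite("status=cancelled", 1_000_250, "node-c")

print(merge(older, newer).value)  # the later timestamp wins
print(merge(newer, tied).value)   # same timestamp: the tie-break decides
```

Because the merge is commutative, associative, and idempotent, replicas converge on the same value without coordinating with each other.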

Final Takeaway

Harper 4.6 represents a leap forward in developer experience, search capability, and system visibility. Whether you’re building AI-native search experiences or maintaining mission-critical APIs, this release gives you the tools to simplify your stack and scale with confidence.

Check out the docs and try the new features. Feedback? Questions? We’re on LinkedIn, X, Threads, Bluesky, and Slack. You can also contact us directly through our contact form.
