AI Papers Reader

Personalized digests of latest AI research

View on GitHub

Sharing Conversations with Large Language Models: A New Resource for Researchers

As large language models (LLMs) like GPT-4 and LLAMA become increasingly sophisticated, they are used by a wider range of people, from experts to the general public, for various tasks. These interactions generate valuable data for training and improving LLMs. While for-profit companies collect user data through their model APIs, the open source and research community lags behind in accessing and using this data.

To bridge this gap, researchers from The Hebrew University of Jerusalem and MIT have developed the ShareLM collection, a unified set of human conversations with LLMs, and a Chrome plugin that allows users to contribute their own conversations.

The Need for Open Data

The ShareLM collection recognizes that the existing open datasets of human-model conversations are treated as static artifacts, lacking the dynamic nature of real-world interactions. These datasets also struggle with diversity and representativeness, as they often rely on specific demographics and platforms.

ShareLM Plugin: Empowering Users

The ShareLM plugin addresses these limitations by giving users control over their data and offering a simple way to contribute to the open-source community. Here’s how it works:

The ShareLM Collection: A Growing Resource

The ShareLM collection currently contains over 2.3 million conversations from more than 40 models, and the plugin continuously adds new data. This rich dataset provides researchers with a valuable resource for:

A Call for Community Effort

The ShareLM collection and plugin are a testament to the power of open data and community collaboration in advancing LLM research. By making this data publicly available, researchers hope to encourage others to contribute to the collection and build upon the existing work.

The ShareLM initiative represents a significant step towards a more open and collaborative approach to LLM development. By providing a framework for sharing human-model conversations, it empowers researchers and empowers users to contribute to the advancement of AI.