Workshop on Open-Source Generative AI
OSGAI is an NSF-sponsored workshop that will bring together technical leaders in open-source AI and research to define and address core challenges of generative AI. The focus will be on technical issues specific to open AI systems, with the goal of identifying impactful areas of study that are not currently being addressed in academic venues. These include:
- How can we improve approaches that facilitate adaptation of AI for a broader range of users and use-cases?
- How can we further develop an open ecosystem for open data curation and human feedback on generative AI?
- How can we develop evaluations that allow value-driven open-source organizations to ensure ethical, safe, and accurate systems?
- How can we create tooling to allow open-source organizations to build AI models in a decentralized manner?
Details
The workshop will bring together a small group of researchers and developers interested in defining the future challenges of open-source generative AI.
The event will be hosted at Cornell Tech in New York City on March 25-26, 2024. Two nights of lodging and travel for participants will be supported by the workshop.
Schedule
Day 1
| Time | Session | Speaker | Topic |
|---|---|---|---|
| 9:00 | Welcome | Sasha Rush | OSGAI |
| 9:10 | Introduction | Jeffrey Stanton, NSF | NSF POSE and Broader Goals |
| 9:30 | Session | Hanna Hajishirzi | OLMo: Accelerating the Science of Language Modeling |
| 10:00 | | Stella Biderman | "Ethical" != "Closed": Building a Better World in the Open |
| 10:30 | Coffee Break | | |
| 11:00 | | Sara Hooker | |
| 11:30 | Session | John Cook | Building an ecosystem, Not a Monolith |
| | | Tim Dettmers | Competitive advantages of open vs closed-source. |
| | | Kathleen Kenealy | Gemma |
| | | Eugene Cheah | Building a roadmap, from idea to large-scale model training |
| | | Leshem Choshen | Wiki-models through Natural Feedback |
| | | Ying Sheng | Bridging human and LLM systems: present and future |
| | | Hector Liu | From open source to collaborative LLM research |
| 12:45 | Lunch | | |
| 1:30 | Group walk | | |
| 2:30 | | Peter Henderson | What are the possibilities and limits for safety in open-source foundation models? |
| 3:00 | | Daphne Ippolito | Data Curation is not One-Size-Fits-All |
| 3:30 | Breakout | | |
| 4:30 | Breakout: Reporting | | |

Day 2
| Time | Session | Speaker | Topic |
|---|---|---|---|
| 9:00 | Session | Ludwig Schmidt | Open source AI for multimodality: OpenCLIP, LAION, and DataComp |
| 9:30 | | Hao Zhang | Some reflections after running Chatbot Arena for 1 year |
| 10:00 | | Tatsunori Hashimoto | Lessons learned from the Alpaca project |
| 10:30 | Coffee Break | | |
| 11:00 | Session | Greg Leppert | |
| | | Emma Strubell | Economics of Open Source |
| | | Yacine Jernite | The roles of data access and transparency |
| | | Swabha Swayamdipta | When all you have are Logits… Towards (Closed-Source) LLM Accountability via Logit Signatures. |
| | | Danqi Chen | TBD |
| | | Louis Castricato | RLAIF, user autonomy, and controllability |
| | | Irina Rish | Continual Training of Foundation Models |
| 12:30 | Lunch | | |
| 2:00 | Session | Tegan Maharaj | What are the possibilities and limits for safety in open-source foundation models |
| 2:30 | | Graham Neubig | Can we make building with open-source AI as simple as prompting ChatGPT? |
| 3:00 | | Soumith Chintala | We need to create a sinkhole: fixing the post-training data problem for open-source models |
| 3:30 | Wrap-Up Discussion: Next Steps | | |