Open-Source Generative AI

March 25-26, 2024

Workshop on Open-Source Generative AI

OSGAI is an NSF-sponsored workshop that will bring together technical leaders in open-source AI and research with the goal of defining and addressing core challenges of Generative AI. The focus will be on the technical issues that are specific to open AI systems, with the goal of targeting and defining impactful areas for study that are currently not being addressed in academic venues. These include:

  • How can we improve approaches that facilitate adaptation of AI for a broader range of users and use-cases?
  • How can we further develop an open ecosystem for open data curation and human feedback on generative AI?
  • How can we develop evaluations that allow value-driven open-source organizations to ensure ethical, safe, and accurate systems?
  • How can we create tooling to allow open-source organizations to build AI models in a decentralized manner?

OSGAI Report

Details

The event was hosted at Cornell Tech in New York City on March 25-26, 2024.

Schedule

The workshop will consist of two full days of invited talks and discussions. The full schedule will be posted after attendance is confirmed.

Time   Session  Speaker              Topic

Day 1 (March 25)
9:00   Session  Hanna Hajishirzi     OLMo: Accelerating the Science of Language Modeling
9:30            Stella Biderman      "Ethical" != "Closed": Building a Better World in the Open
10:00           Sara Hooker
10:30  Coffee Break
11:00  Session  John Cook            Building an ecosystem, Not a Monolith
                Tim Dettmers
                Kathleen Kinealy
                Eugene Cheah         Building a roadmap, from idea to large-scale model training
                Leshem Choshen       Wiki-models through Natural Feedback
                Ying Sheng
                Hector Liu           From open source to collaborative LLM research
                Justin Maier
12:30  Lunch
1:30   Group walk
2:30            Peter Henderson      What are the possibilities and limits for safety in open-source foundation models?
3:00            Daphne Ippolito      Data Curation is not One-Size-Fits-All
3:30   Breakout
4:30   Breakout: Reporting

Day 2 (March 26)
9:00   Session  Ludwig Schmidt       Open source AI for multimodality: OpenCLIP, LAION, and DataComp
9:30            Hao Zhang
10:00           Tatsunori Hashimoto
10:30  Coffee Break
11:00  Session  Greg Leppert
                Yacine Jernite       The roles of data access and transparency
                Swabha Swayamdipta   When all you have are Logits… Towards (Closed-Source) LLM Accountability via Logit Signatures.
                Danqi Chen
                Louis Castricato
                Irina Rish
                Suraj Patil
12:30  Lunch
2:00   Session  Tegan Maharaj        What are the possibilities and limits for safety in open-source foundation models
2:30            Graham Neubig
3:00            Soumith Chintala     We need to create a sinkhole: fixing the post-training data problem for open-source models
3:30   Wrap-Up Discussion: Next Steps

Organizers

Alexander Rush
Professor
Cornell Tech / @srush_nlp
Colin Raffel
Professor
University of Toronto