Scraping YouTube with OpenAI (Python, ChatGPT, YouTube Transcript API)

  • 00:00 Intro
  • 04:22 Demonstration
  • 10:10 Lab Setup
  • 15:33 Getting YouTube Transcripts with Python
  • 22:12 Summarize YouTube Transcript with ChatGPT
  • 28:34 Create topic Time codes with ChatGPT
  • 35:31 Final Thoughts

Setup:

Get OpenAI API Key - https://platform.openai.com/account/api-keys
pip3 install "openai<1"   # the labs call openai.ChatCompletion, which was removed in openai 1.0
pip3 install youtube-transcript-api
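
The labs below paste the key straight into the script as 'APIKEY', which makes it easy to leak by accident. A minimal sketch of one alternative, assuming the key has been exported as the OPENAI_API_KEY environment variable; the helper name load_api_key is made up for illustration and is not part of the openai package:

```python
import os

def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is not set."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key

# In the labs, this would replace the hardcoded placeholder:
# openai.api_key = load_api_key()
```

Failing early with a clear message is usually easier to debug than an authentication error from the API call later in the script.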

1-transcript.py

This lab pulls the YouTube transcript for a video and prints it twice: first as the raw segment data (a list of dicts with text, start, and duration), then as plain text.

from youtube_transcript_api import YouTubeTranscriptApi

url = 'https://www.youtube.com/watch?v=NXJqHVZJ9lI'
print(url)

# Strip the URL prefix to leave the video ID (only works for plain watch?v= URLs)
video_id = url.replace('https://www.youtube.com/watch?v=', '')
print(video_id)

transcript = YouTubeTranscriptApi.get_transcript(video_id)

print(transcript)

output=''
for x in transcript:
  sentence = x['text']
  output += f' {sentence}\n'

print(output)
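
The url.replace() call in this lab only works when the URL is exactly a plain watch?v= link. A rough sketch of a more tolerant approach using the standard library, assuming the common watch, youtu.be, embed, and shorts URL shapes; extract_video_id is a hypothetical helper name, not something provided by youtube-transcript-api:

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the video ID from a watch, youtu.be, embed, or shorts URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path component
        return parsed.path.lstrip("/").split("/")[0]
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v", [])
            if ids:
                return ids[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0]
    raise ValueError(f"Could not find a video ID in {url!r}")

print(extract_video_id("https://www.youtube.com/watch?v=NXJqHVZJ9lI&t=42s"))
```

Extra query parameters such as &t=42s are ignored, so the result can be passed to the transcript call the same way as before.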

2-gpt.py

This lab uses ChatGPT to write a 100-word summary of a YouTube video and to generate a list of tags for it.

from youtube_transcript_api import YouTubeTranscriptApi
import openai

openai.api_key = 'APIKEY'

url = 'https://www.youtube.com/watch?v=UCGaKvZpJYc'
print(url)

video_id = url.replace('https://www.youtube.com/watch?v=', '')
print(video_id)

transcript = YouTubeTranscriptApi.get_transcript(video_id)

output=''
for x in transcript:
  sentence = x['text']
  output += f' {sentence}\n'

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a journalist."},
    {"role": "assistant", "content": "write a 100 word summary of this video"},
    {"role": "user", "content": output}
  ]
)
summary = response["choices"][0]["message"]["content"]

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a journalist."},
    {"role": "assistant", "content": "output a list of tags for this blog post in a python list such as ['item1', 'item2','item3']"},
    {"role": "user", "content": output}
  ]
)
tag = response["choices"][0]["message"]["content"]

print('>>>SUMMARY:')
print(summary)
print('>>>TAGS:')
print(tag)
print('>>>OUTPUT:')
#print(output)

#print(transcript)
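
The tag prompt asks for a Python list, but the reply still arrives as a string, and the model may wrap it in extra words. A minimal sketch of turning that string into a real list, assuming the reply contains something shaped like ['item1', 'item2']; parse_tags is a hypothetical helper, and the comma-split fallback is a guess at what is useful when the model ignores the requested format:

```python
import ast

def parse_tags(reply):
    """Turn a model reply such as "['a', 'b']" into a list of strings."""
    start, end = reply.find("["), reply.rfind("]")
    if start != -1 and end > start:
        try:
            # literal_eval only accepts Python literals, so it is safe on model output
            value = ast.literal_eval(reply[start:end + 1])
            if isinstance(value, list):
                return [str(item).strip() for item in value]
        except (ValueError, SyntaxError):
            pass  # not a valid literal; fall through to the plain split
    # Fallback: treat the reply as comma-separated text
    cleaned = (part.strip(" '\"[]\n") for part in reply.split(","))
    return [part for part in cleaned if part]

print(parse_tags("Here are the tags: ['python', 'openai','youtube']"))
```

In the lab, parse_tags(tag) would be called on the string stored in the tag variable.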

3-timecode.py

This lab asks ChatGPT which subjects are discussed in a video and for the start time code of each one.

from youtube_transcript_api import YouTubeTranscriptApi
import openai

openai.api_key = 'APIKEY'

url = 'https://www.youtube.com/watch?v=UCGaKvZpJYc'
print(url)

video_id = url.replace('https://www.youtube.com/watch?v=', '')
print(video_id)

transcript = YouTubeTranscriptApi.get_transcript(video_id)

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo-16k",
  messages=[
    {"role": "system", "content": "You are a database computer"},
    {"role": "assistant", "content": "data is stored in JSON {text:'', start:'', duration:''}"},
    {"role": "assistant", "content": str(transcript)},
    {"role": "user", "content": "what are the topics discussed in this video. Provide start time codes in seconds and also in minutes and seconds"}
  ]
)
timecode = response["choices"][0]["message"]["content"]

print(timecode)
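
The prompt asks the model to report start times both in seconds and in minutes and seconds. That conversion is simple arithmetic, so it can be done locally to check the model's answer. A small sketch, assuming the segments have the {text, start, duration} shape described in the lab; format_timecode, with_timecodes, and the sample data are all made up for illustration:

```python
def format_timecode(seconds):
    """Convert a start time in seconds to M:SS (or H:MM:SS past an hour)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

def with_timecodes(transcript):
    """Render transcript segments as 'M:SS text' lines."""
    return "\n".join(
        f"{format_timecode(seg['start'])} {seg['text']}" for seg in transcript
    )

# Hypothetical sample in the same shape as the transcript data
sample = [
    {"text": "welcome back", "start": 0.0, "duration": 2.5},
    {"text": "lab setup", "start": 610.4, "duration": 3.1},
]
print(with_timecodes(sample))
```

Fractional seconds are truncated, so a segment starting at 610.4 seconds is shown as 10:10.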
