{
“cells”: [
{

“cell_type”: “markdown”, “metadata”: {}, “source”: [

“# Languagesn”, “n”, “This tutorial will explain how to set the language property for various nodes and file objects when using the ricecooker framework.”

]

}, {

“cell_type”: “markdown”, “metadata”: {}, “source”: [

“## Explore language objects and language codesn”, “n”, “First we must import the le-utils pacakge. The languages supported by Kolibri and the Content Curation Server are provided in le_utils.constants.languages.n”

]

}, {

“cell_type”: “code”, “execution_count”: 1, “metadata”: {}, “outputs”: [

{
“data”: {
“text/plain”: [

“Language(native_name=’English’, primary_code=’en’, subcode=None, name=’English’, ka_name=None)”

]

}, “execution_count”: 1, “metadata”: {}, “output_type”: “execute_result”

}

], “source”: [

“from le_utils.constants import languagesn”, “n”, “n”, “# can lookup language using language coden”, “language_obj = languages.getlang(‘en’)n”, “language_obj”

]

}, {

“cell_type”: “code”, “execution_count”: 2, “metadata”: {}, “outputs”: [

{
“data”: {
“text/plain”: [

“Language(native_name=’English’, primary_code=’en’, subcode=None, name=’English’, ka_name=None)”

]

}, “execution_count”: 2, “metadata”: {}, “output_type”: “execute_result”

}

], “source”: [

“# can lookup language using language name (the new le_utils version has not shipped yet)n”, “language_obj = languages.getlang_by_name(‘English’)n”, “language_obj”

]

}, {

“cell_type”: “code”, “execution_count”: 3, “metadata”: {}, “outputs”: [

{
“data”: {
“text/plain”: [

“‘en’”

]

}, “execution_count”: 3, “metadata”: {}, “output_type”: “execute_result”

}

], “source”: [

“# all language attributed (channel, nodes, and files) need to use language coden”, “language_obj.code”

]

}, {

“cell_type”: “code”, “execution_count”: 4, “metadata”: {}, “outputs”: [

{

“name”: “stdout”, “output_type”: “stream”, “text”: [

“Language(native_name=’Français, langue française’, primary_code=’fr’, subcode=None, name=’French’, ka_name=’francais’)n”, “frn”

]

}

], “source”: [

“from le_utils.constants.languages import getlang_by_native_namen”, “n”, “lang_obj = getlang_by_native_name(‘français’)n”, “print(lang_obj)n”, “print(lang_obj.code)n”

]

}, {

“cell_type”: “markdown”, “metadata”: {}, “source”: [

“The above language code is an internal representaiton that uses two-letter codes, and sometimes has locale information, e.g., pt-BR for Brazilian Portuiguese. Sometimes the internal code representaiton for a language is the three-letter vesion, e.g., zul for Zulu.”

]

}, {

“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: []

}, {

“cell_type”: “markdown”, “metadata”: {}, “source”: [

“## Create chef classn”, “n”, “We now create subclass of ricecooker.chefs.SushiChef and defined its get_channel and construct_channel methods.n”, “n”, “For the purpose of this example, we’ll create three topic nodes in different languages that contain one document in each.”

]

}, {

“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [

“from ricecooker.chefs import SushiChefn”, “from ricecooker.classes.nodes import ChannelNode, TopicNode, DocumentNoden”, “from ricecooker.classes.files import DocumentFilen”, “from le_utils.constants import licensesn”, “n”, “from le_utils.constants.languages import getlangn”, “n”, “n”, “n”, “class MultipleLanguagesChef(SushiChef):n”, “ """n”, “ A sushi chef that creates a channel with content in EN, FR, and SP.n”, “ """n”, “ channel_info = {n”, “ ‘CHANNEL_TITLE’: ‘Languages test channel’,n”, “ ‘CHANNEL_SOURCE_DOMAIN’: ‘<yourdomain.org>’, # where you got the contentn”, “ ‘CHANNEL_SOURCE_ID’: ‘<unique id for channel>’, # channel’s unique id CHANGE ME!!n”, “ ‘CHANNEL_LANGUAGE’: getlang(‘mul’).code, # set global language for channeln”, “ ‘CHANNEL_DESCRIPTION’: ‘This channel contains nodes in multiple languages’,n”, “ ‘CHANNEL_THUMBNAIL’: None, # (optional)n”, “ }n”, “n”, “ def construct_channel(self, **kwargs):n”, “ # create channeln”, “ channel = self.get_channel(**kwargs)n”, “n”, “ # create the English topic, add a DocumentNode to itn”, “ topic = TopicNode(n”, “ source_id="<en_topic_id>",n”, “ title="New Topic in English",n”, “ language=getlang(‘en’).code,n”, “ )n”, “ doc_node = DocumentNode(n”, “ source_id="<en_doc_id>",n”, “ title=’Some doc in English’,n”, “ description=’This is a sample document node in English’,n”, “ files=[DocumentFile(path=’samplefiles/documents/doc_EN.pdf’)],n”, “ license=licenses.PUBLIC_DOMAIN,n”, “ language=getlang(‘en’).code,n”, “ )n”, “ topic.add_child(doc_node)n”, “ channel.add_child(topic)n”, “n”, “ # create the Spanish topic, add a DocumentNode to itn”, “ topic = TopicNode(n”, “ source_id="<es_topic_id>",n”, “ title="Topic in Spanish",n”, “ language=getlang(‘es-MX’).code,n”, “ )n”, “ doc_node = DocumentNode(n”, “ source_id="<es_doc_id>",n”, “ title=’Some doc in Spanish’,n”, “ description=’This is a sample document node in Spanish’,n”, “ files=[DocumentFile(path=’samplefiles/documents/doc_ES.pdf’)],n”, “ license=licenses.PUBLIC_DOMAIN,n”, “ language=getlang(‘es-MX’).code,n”, “ )n”, “ topic.add_child(doc_node)n”, “ channel.add_child(topic)n”, “n”, “ # create the French topic, add a DocumentNode to itn”, “ topic = TopicNode(n”, “ source_id="<fr_topic_id>",n”, “ title="Topic in French",n”, “ language=languages.getlang(‘fr’).code,n”, “ )n”, “ doc_node = DocumentNode(n”, “ source_id="<fr_doc_id>",n”, “ title=’Some doc in French’,n”, “ description=’This is a sample document node in French’,n”, “ files=[DocumentFile(path=’samplefiles/documents/doc_FR.pdf’)],n”, “ license=licenses.PUBLIC_DOMAIN,n”, “ language=getlang(‘fr’).code,n”, “ )n”, “ topic.add_child(doc_node)n”, “ channel.add_child(topic)n”, “n”, “ return channeln”

]

}, {

“cell_type”: “markdown”, “metadata”: {}, “source”: [

“Run of you chef by creating an instance of the chef class and calling it’s run method:”

]

}, {

“cell_type”: “code”, “execution_count”: 6, “metadata”: {}, “outputs”: [

{

“name”: “stderr”, “output_type”: “stream”, “text”: [

“u001b[32mINFO u001b[0m u001b[34mIn SushiChef.run method. args={‘command’: ‘dryrun’, ‘reset’: True, ‘verbose’: True, ‘token’: ‘YOURTO…’} options={}u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mn”, “n”, “* Starting channel build process *n”, “n”, “u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mCalling construct_channel… u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Setting up initial channel structure… u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Validating channel structure…u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Languages test channel (ChannelNode): 6 descendantsu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m New Topic in English (TopicNode): 1 descendantu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Some doc in English (DocumentNode): 1 fileu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Topic in Spanish (TopicNode): 1 descendantu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Some doc in Spanish (DocumentNode): 1 fileu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Topic in French (TopicNode): 1 descendantu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Some doc in French (DocumentNode): 1 fileu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Tree is validn”, “u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mDownloading files…u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mProcessing content…u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mt— Downloaded e8b1fe37ce3da500241b4af4e018a2d7.pdfu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mt— Downloaded cef22cce0e1d3ba08861fc97476b8ccf.pdfu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mt— Downloaded 6c8730e3e2554e6eac0ad79304bbcc68.pdfu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m All files were successfully downloadedu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mCommand is dryrun so we are not uploading chanel.u001b[0mn”

]

}

], “source”: [

“mychef = MultipleLanguagesChef()n”, “args = {n”, “ ‘command’: ‘dryrun’, # use ‘uploadchannel’ for real runn”, “ ‘verbose’: True,n”, “ ‘token’: ‘YOURTOKENHERE9139139f3a23232’n”, “}n”, “options = {}n”, “mychef.run(args, options)”

]

}, {

“cell_type”: “markdown”, “metadata”: {}, “source”: [

“Congratulations, you put three languages on the internet!”

]

}, {

“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: []

}, {

“cell_type”: “markdown”, “metadata”: {}, “source”: [

“## Example 2: YouTube video with subtitles in multiple languagesn”, “n”, “You can use the library youtube_dl to get lots of useful metadata about videos and playlists, including the which language subtitle are vailable for a video.n”

]

}, {

“cell_type”: “code”, “execution_count”: 7, “metadata”: {}, “outputs”: [

{

“name”: “stdout”, “output_type”: “stream”, “text”: [

“[youtube] FN12ty5ztAs: Downloading webpagen”, “[youtube] FN12ty5ztAs: Downloading MPD manifestn”, “dict_keys([‘en’, ‘fr’, ‘zu’])n”

]

}

], “source”: [

“import youtube_dln”, “n”, “ydl = youtube_dl.YoutubeDL({n”, “ #’quiet’: True,n”, “ ‘no_warnings’: True,n”, “ ‘writesubtitles’: True,n”, “ ‘allsubtitles’: True,n”, “})n”, “n”, “n”, “youtube_id = ‘FN12ty5ztAs’n”, “n”, “info = ydl.extract_info(youtube_id, download=False)n”, “subtitle_languages = info["subtitles"].keys()n”, “n”, “print(subtitle_languages)”

]

}, {

“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [

“n”

]

}, {

“cell_type”: “markdown”, “metadata”: {}, “source”: [

“### Full sushi chef examplen”, “n”, “The YoutubeVideoWithSubtitlesSushiChef class below shows how to create a channel with youtube video and upload subtitles files with all available languages.”

]

}, {

“cell_type”: “code”, “execution_count”: 10, “metadata”: {}, “outputs”: [], “source”: [

“from ricecooker.chefs import SushiChefn”, “from ricecooker.classes import licensesn”, “from ricecooker.classes.nodes import ChannelNode, TopicNode, VideoNoden”, “from ricecooker.classes.files import YouTubeVideoFile, YouTubeSubtitleFilen”, “from ricecooker.classes.files import is_youtube_subtitle_file_supported_languagen”, “n”, “n”, “import youtube_dln”, “ydl = youtube_dl.YoutubeDL({n”, “ ‘quiet’: True,n”, “ ‘no_warnings’: True,n”, “ ‘writesubtitles’: True,n”, “ ‘allsubtitles’: True,n”, “})n”, “n”, “n”, “# Define the license object with necessary infon”, “TE_LICENSE = licenses.SpecialPermissionsLicense(n”, “ description=’Permission granted by Touchable Earth to distribute through Kolibri.’,n”, “ copyright_holder=’Touchable Earth Foundation (New Zealand)’n”, “)n”, “n”, “n”, “class YoutubeVideoWithSubtitlesSushiChef(SushiChef):n”, “ """n”, “ A sushi chef that creates a channel with content in EN, FR, and SP.n”, “ """n”, “ channel_info = {n”, “ ‘CHANNEL_SOURCE_DOMAIN’: ‘<yourdomain.org>’, # where you got the contentn”, “ ‘CHANNEL_SOURCE_ID’: ‘<unique id for channel>’, # channel’s unique id CHANGE ME!!n”, “ ‘CHANNEL_TITLE’: ‘Youtube subtitles downloading chef’,n”, “ ‘CHANNEL_LANGUAGE’: ‘en’,n”, “ ‘CHANNEL_THUMBNAIL’: ‘https://edoc.coe.int/4115/postcard-47-flags.jpg’,n”, “ ‘CHANNEL_DESCRIPTION’: ‘This is a test channel to make sure youtube subtitle languages lookup works’n”, “ }n”, “n”, “ def construct_channel(self, **kwargs):n”, “ # create channeln”, “ channel = self.get_channel(**kwargs)n”, “n”, “ # get all subtitles available for a sample videon”, “ youtube_id =’FN12ty5ztAs’n”, “ info = ydl.extract_info(youtube_id, download=False)n”, “ subtitle_languages = info["subtitles"].keys()n”, “ print(‘Found subtitle_languages = ‘, subtitle_languages)n”, “ n”, “ # create video noden”, “ video_node = VideoNode(n”, “ source_id=youtube_id,n”, “ title=’Youtube video’,n”, “ license=TE_LICENSE,n”, “ derive_thumbnail=True,n”, “ files=[YouTubeVideoFile(youtube_id=youtube_id)],n”, “ )n”, “n”, “ # add subtitles in whichever languages are available.n”, “ for lang_code in subtitle_languages:n”, “ if is_youtube_subtitle_file_supported_language(lang_code):n”, “ video_node.add_file(n”, “ YouTubeSubtitleFile(n”, “ youtube_id=youtube_id,n”, “ language=lang_coden”, “ )n”, “ )n”, “ else:n”, “ print(‘Unsupported subtitle language code:’, lang_code)n”, “n”, “ channel.add_child(video_node)n”, “n”, “ return channeln”, “n”, “ “

]

}, {

“cell_type”: “code”, “execution_count”: 11, “metadata”: {}, “outputs”: [

{

“name”: “stderr”, “output_type”: “stream”, “text”: [

“u001b[32mINFO u001b[0m u001b[34mIn SushiChef.run method. args={‘command’: ‘dryrun’, ‘reset’: True, ‘verbose’: True, ‘token’: ‘YOURTO…’} options={}u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mn”, “n”, “* Starting channel build process *n”, “n”, “u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mCalling construct_channel… u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Setting up initial channel structure… u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Validating channel structure…u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Youtube subtitles downloading chef (ChannelNode): 1 descendantu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Youtube video (VideoNode): 4 filesu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34m Tree is validn”, “u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mDownloading files…u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mProcessing content…u001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mt— Downloaded (YouTube) a144a6af6977247684d2a3977dc6f841.mp4u001b[0mn”

]

}, {

“name”: “stdout”, “output_type”: “stream”, “text”: [

“Found subtitle_languages = dict_keys([‘en’, ‘fr’, ‘zu’])n”

]

}, {

“name”: “stderr”, “output_type”: “stream”, “text”: [

“u001b[32mINFO u001b[0m u001b[34mt— Downloaded 5f22a71e53271eb2d2abe013457a625d.jpgu001b[0mn”, “u001b[31mERROR u001b[0m u001b[34m 3 file(s) have failed to downloadu001b[0mn”, “u001b[33mWARNING u001b[0m u001b[34mtVideo FN12ty5ztAs: http://www.youtube.com/watch?v=FN12ty5ztAs n”, “t Subtitle with langauge en is not available for http://www.youtube.com/watch?v=FN12ty5ztAsu001b[0mn”, “u001b[33mWARNING u001b[0m u001b[34mtVideo FN12ty5ztAs: http://www.youtube.com/watch?v=FN12ty5ztAs n”, “t Subtitle with langauge fr is not available for http://www.youtube.com/watch?v=FN12ty5ztAsu001b[0mn”, “u001b[33mWARNING u001b[0m u001b[34mtVideo FN12ty5ztAs: http://www.youtube.com/watch?v=FN12ty5ztAs n”, “t Subtitle with langauge zul is not available for http://www.youtube.com/watch?v=FN12ty5ztAsu001b[0mn”, “u001b[32mINFO u001b[0m u001b[34mCommand is dryrun so we are not uploading chanel.u001b[0mn”

]

}

], “source”: [

“chef = YoutubeVideoWithSubtitlesSushiChef()n”, “args = {n”, “ ‘command’: ‘dryrun’, # use ‘uploadchannel’ for real runn”, “ ‘verbose’: True,n”, “ ‘token’: ‘YOURTOKENHERE9139139f3a23232’n”, “}n”, “options = {}n”, “chef.run(args, options)”

]

}, {

“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: []

}, {

“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: []

}, {

“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: []

}, {

“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: []

}

], “metadata”: {

“kernelspec”: {

“display_name”: “Python 3”, “language”: “python”, “name”: “python3”

}, “language_info”: {

“codemirror_mode”: {

“name”: “ipython”, “version”: 3

}, “file_extension”: “.py”, “mimetype”: “text/x-python”, “name”: “python”, “nbconvert_exporter”: “python”, “pygments_lexer”: “ipython3”, “version”: “3.6.9”

}

}, “nbformat”: 4, “nbformat_minor”: 2

}