Web scraping GitHub with Power BI




Have you tried the GitHub connector in Power BI? You’ll find information on changes and statistics of the repo, but not much more. What if you need raw code or binary data for a report? Read on.


Goal: show Azure Icons and metadata based on a GitHub repo.

http://sanu.si/icons





Ben Coleman has a superb write-up and repo of Azure Icons (past and present). He also has some tools and artifacts for web scraping.

http://code.benco.io/icon-collection/


Ben also has a well written web gallery for viewing, searching and downloading Azure Icons:

http://code.benco.io/icon-collection/azure-icons/

In this post I’ll walk you through creating a Power BI report based on a GitHub repo of images and Power BI web data “by providing example” feature:


https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-connect-to-web-by-example


  1. locate the repo

  2. in Power BI desktop: connect to the Web page

  3. Click on “Add Table using Examples”

  4. scrape the URLs of the images

  5. create a calculated column to link to GitHub’s “raw” URL

  6. use table, image grid or other Power BI visuals to render image

  7. publish the report, schedule refreshes if desired.


Here's the M code. See AzureIcons2021.pbix on Github:


let
    Source = Web.BrowserContents("https://github.com/benc-uk/icon-collection/tree/master/azure-icons"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Icon Name", ".Link\-\-primary.js-navigation-open"}, {"URL", "[data-pjax=""\#repo-content-pjax-container""]", each [Attributes][href]?}}, [RowSelector=".Box-row + *"]),
    #"Inserted Replaced Text" = Table.AddColumn(#"Extracted Table From Html", "Icon URL", each Text.Replace([URL], "/benc-uk/icon-collection/blob", "https://raw.githubusercontent.com/benc-uk/icon-collection"), type text),
    #"Added Prefix" = Table.TransformColumns(#"Inserted Replaced Text", {{"URL", each "https://github.com" & _, type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Added Prefix",{{"Icon Name", "File Name"}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Renamed Columns", "File Name", "File Name - Copy"),
    #"Renamed Columns1" = Table.RenameColumns(#"Duplicated Column",{{"File Name - Copy", "Icon"}}),
    #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns1","-"," ",Replacer.ReplaceText,{"Icon"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",".svg","",Replacer.ReplaceText,{"Icon"})
in
    #"Replaced Value1"






45 views0 comments

Recent Posts

See All