top of page
Writer's pictureSeth Sanusi

Web scraping GitHub with Power BI




Have you tried the GitHub connector in Power BI? You’ll find information on changes and statistics of the repo, but not much more. What if you need raw code or binary data for a report? Read on.


Goal: show Azure Icons and metadata based on a GitHub repo.





Ben Coleman has a superb write-up and repo of Azure Icons (past and present). He also has some tools and artifacts for web scraping.


Ben also has a well written web gallery for viewing, searching and downloading Azure Icons:

In this post I’ll walk you through creating a Power BI report based on a GitHub repo of images and Power BI web data “by providing example” feature:



  1. locate the repo

  2. in Power BI desktop: connect to the Web page

  3. Click on “Add Table using Examples”

  4. scrape the URLs of the images

  5. create a calculated column to link to GitHub’s “raw” URL

  6. use table, image grid or other Power BI visuals to render image

  7. publish the report, schedule refreshes if desired.


Here's the M code. See AzureIcons2021.pbix on Github:


let
    Source = Web.BrowserContents("https://github.com/benc-uk/icon-collection/tree/master/azure-icons"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Icon Name", ".Link\-\-primary.js-navigation-open"}, {"URL", "[data-pjax=""\#repo-content-pjax-container""]", each [Attributes][href]?}}, [RowSelector=".Box-row + *"]),
    #"Inserted Replaced Text" = Table.AddColumn(#"Extracted Table From Html", "Icon URL", each Text.Replace([URL], "/benc-uk/icon-collection/blob", "https://raw.githubusercontent.com/benc-uk/icon-collection"), type text),
    #"Added Prefix" = Table.TransformColumns(#"Inserted Replaced Text", {{"URL", each "https://github.com" & _, type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Added Prefix",{{"Icon Name", "File Name"}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Renamed Columns", "File Name", "File Name - Copy"),
    #"Renamed Columns1" = Table.RenameColumns(#"Duplicated Column",{{"File Name - Copy", "Icon"}}),
    #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns1","-"," ",Replacer.ReplaceText,{"Icon"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",".svg","",Replacer.ReplaceText,{"Icon"})
in
    #"Replaced Value1"





122 views1 comment

Recent Posts

See All

1 Comment


Enhance your IT capabilities with top-tier Azure Cloud training in Pune. Our comprehensive program is designed to equip you with the skills needed to excel in the ever-evolving world of cloud computing. You'll gain a deep understanding of Azure services, architecture, and deployment through theoretical knowledge and hands-on practice.


Like
Post: Blog2_Post
bottom of page