Mangala Kader | TIL: Firefox Bookmarks Export

Today I Learned

A long time ago, I started using browser bookmarks, and carrying them around from system to system has been a pain ever since. I hope there are others like me who want to carry their legacy bookmarks forward.

I'm not going to give the steps for exporting the bookmarks from Firefox, as they are already straightforward. Still, if somebody needs help, please look here.

SED to the rescue

SED, the stream editor, is a popular choice among sysadmins and hardcore developers. Once you've exported, you'll have an HTML file at the destination of your choice.

In your terminal, use the following script to filter out only the href part from all those nasty large links:

sed -r -e '/.* HREF=.*$/!d' -e 's/.* HREF="(.*)\" ADD_DATE.*$/\1/g' {INPUT FILE NAME}.html | uniq > {OUTPUT FILE NAME}.txt

Let me break down the command:

sed is the command-line stream editor that we'll be using
-r for extended regex (GNU sed; -E is the equivalent on BSD/macOS sed),
-e for expression
'/.* HREF=.*$/!d' Any line not matching the pattern (that is, any line without an HREF attribute) is deleted by the first expression
's/.* HREF="(.*)\" ADD_DATE.*$/\1/g' Any line matching the pattern is replaced by the captured link
Input File (HTML from the export)
uniq for deleting duplicate links (it only removes adjacent duplicates; sort -u removes all of them, at the cost of reordering the lines)
> {OUTPUT FILE} - store the extracted text from the HTML file
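
To see the command in action, here is a minimal sketch run against a tiny stand-in for the export. The file names `sample.html` and `links.txt` and the sample URLs are made up for illustration; the line layout only approximates what a real Firefox export looks like.

```shell
# Build a tiny stand-in for the exported file (contents are made up).
cat > sample.html <<'EOF'
<DL><p>
    <DT><H3 ADD_DATE="1604102400">Reading</H3>
    <DT><A HREF="https://example.com/a" ADD_DATE="1604102400" LAST_MODIFIED="1604102400">Example A</A>
    <DT><A HREF="https://example.com/a" ADD_DATE="1604102401">Example A again</A>
    <DT><A HREF="https://example.org/b" ADD_DATE="1604102402">Example B</A>
</DL><p>
EOF

# Same two expressions as above: drop lines without HREF, then keep only the URL.
sed -r -e '/.* HREF=.*$/!d' -e 's/.* HREF="(.*)\" ADD_DATE.*$/\1/g' sample.html | uniq > links.txt

# Show the extracted links, one per line.
cat links.txt
```

The folder line and the `<DL>` lines are dropped because they carry no HREF, and the repeated link collapses into one because the two copies sit on adjacent lines.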

Now, the output file can be managed with any VCS out there.
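
If the file is going into version control, a sorted, duplicate-free list tends to give smaller and more stable diffs between exports. A minimal sketch, assuming made-up file names `links_demo.txt` and `links_sorted.txt` and made-up sample links:

```shell
# Made-up sample: the same link appears twice, but not on adjacent lines.
printf '%s\n' \
  'https://example.org/b' \
  'https://example.com/a' \
  'https://example.org/b' > links_demo.txt

# uniq alone keeps both copies, because they are not next to each other.
uniq links_demo.txt | wc -l

# sort -u sorts the lines and drops every duplicate.
# LC_ALL=C forces plain byte ordering so the result is the same on every machine.
LC_ALL=C sort -u links_demo.txt > links_sorted.txt
cat links_sorted.txt
```

The trade-off is that the original bookmark order is lost, which may or may not matter for your use.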

Improvements (December, 2020):

parallel --pipepart -a {INPUT FILE NAME}.html -j4 --roundrobin \
sed\ -r\ -e\ \'/.\*\ HREF\=\"\(.\*\)\\\"\ ADD_DATE.\*\$/\!d\'\ -e\ \'s/.\*\ HREF\=\"\(.\*\)\\\"\ ADD_DATE.\*\$/\\1/g\' | \
uniq > {OUTPUT FILE NAME}.txt

After some time, the exported bookmarks.html file had grown to around 50 MB. The browser took quite a while to export it, and sed took about 13 to 15 seconds to process the whole file. The improvement requires a package named parallel from GNU (sudo apt install parallel, or download it directly from the site).

Command breakdown:

parallel is the parallel job execution tool from GNU.
--pipepart tells parallel to split the file into chunks and pipe each chunk to the command
-a the input file for parallel, here the bookmarks.html file
-j4 --roundrobin run 4 parallel jobs, handing chunks to them in round-robin fashion
The SED command here is shell-quoted using the parallel utility as follows:

$ parallel --shellquote
parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
{PASTE the SED command from the above (before improvements)}
{SHELL QUOTED OUTPUT STRING}

Copy the shell-quoted output string and use it as the command passed to the parallel utility.

After using parallel, the time dropped to 6 to 7 seconds, roughly half the original time.
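
If the hand-quoted string feels fragile, one possible alternative is to keep the two sed expressions in a script file and pass it with `sed -f`, so nothing needs escaping. This is only a sketch and was not part of the timings above: `extract.sed`, `extract_links`, `sample_bookmarks.html`, and `parallel_links.txt` are made-up names, and the function falls back to plain sed when GNU parallel is not installed.

```shell
# Keep the sed expressions in a file so no shell quoting is needed (hypothetical name).
cat > extract.sed <<'EOF'
/ HREF=/!d
s/.* HREF="(.*)" ADD_DATE.*$/\1/
EOF

# extract_links FILE: print the HREF values found in FILE (hypothetical helper).
extract_links() {
  if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # Same parallel flags as above; each job runs sed on a chunk of the file.
    parallel --pipepart -a "$1" -j4 --roundrobin sed -r -f extract.sed
  else
    # Fallback when GNU parallel is missing: plain single-process sed.
    sed -r -f extract.sed "$1"
  fi
}

# Made-up sample input for illustration.
cat > sample_bookmarks.html <<'EOF'
<DT><A HREF="https://example.com/a" ADD_DATE="1604102400">Example A</A>
<DT><H3 ADD_DATE="1604102400">Folder</H3>
<DT><A HREF="https://example.org/b" ADD_DATE="1604102401">Example B</A>
EOF

# Chunks may finish in any order, so sort before de-duplicating.
extract_links sample_bookmarks.html | sort -u > parallel_links.txt
cat parallel_links.txt
```

Because the jobs can emit their chunks in any order, `sort -u` is used here instead of `uniq`, which would only catch duplicates that happen to land next to each other.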

Tags: til, sed, firefox, bookmarks, linux, unix, script, bash, sh, parallel

TIL: Firefox Bookmarks Export - HTML to PLAIN File

Oct 31, 2020
