AWS megatests outputs page#514

Merged
ewels merged 24 commits into nf-core:master from mashehu:add-aws-results-page
Oct 30, 2020

Conversation

@mashehu
Contributor

@mashehu mashehu commented Oct 2, 2020

Basically copy-pasted from aws-js-s3-explorer. Files are currently downloaded when clicking their name in the table and not yet rendered on the page, as was suggested.
I'm actually not 100% sure how to handle that. Should we basically have a download for non-HTML files, while HTML files are rendered on the page (with the file table above or below)?

We also need to figure out a way to handle the dev branch, since it doesn't appear in pipeline.json yet, so we don't have the commit hash for it (which is needed for the path in the AWS bucket). Similarly, for versions without AWS results, the "example output" tab should be hidden.

Closes #435.

add releases hashes to $releases array
@mashehu mashehu requested a review from ewels October 2, 2020 16:52
@ewels
Member

ewels commented Oct 6, 2020

Absolutely love it 🚀

For dev, could maybe show a list of unrecognised commits? Or even just query the GitHub API to find the commit for the latest dev? Not sure what would be best.

Rendering HTML / txt files etc. would definitely be nice. May be tricky, as I guess they download due to the header being set by Amazon though. Not sure how we can change / bypass that?

Minor stuff:

  • Missing path separators in the top path [screenshot] (needs some CSS :after magic for the breadcrumbs, I guess)
  • Full-width page might be nicer?
  • When I load a new pipeline the breadcrumbs show that I'm in the folder for the results, but the browser shows that top-level directory that I have to click to see inside
  • Would be nice to hide the top-level directories above that release perhaps, so that you can't (easily) browse off to other pipelines etc or lose your place if you click too high in the breadcrumbs

@ewels
Member

ewels commented Oct 7, 2020

Also, I have a feeling that in some cases files are being shown even when they should be hidden within a folder? eg. digging around various viralrecon runs.

There are also a crap-tonne of browser console logs (expected if still dev code I guess), plus a loading spinner would be quite nice when it's having a proper think about stuff 👀

Still totally love how well this works - is pretty much exactly what I was hoping for 😍

@ewels
Member

ewels commented Oct 8, 2020

Added a couple of very minor tweaks just so that I can take some screengrabs for a talk 😉

@mashehu mashehu marked this pull request as ready for review October 9, 2020 15:27
@ggabernet
Member

Great work, the viewer looks amazing! 😍 🚀

I just have some minor things that I saw when testing it with the dark website mode:

  • When using the dark view mode of the website, the HTML preview background is also dark, which makes it difficult to read.
  • In the dark theme, the AWS full path at the top is difficult to read because of the light colour.

It might be nice to have the possibility of downloading the whole results folder in addition to individual files, if that is possible.

Alright, now just the example outputs need to be in the bucket! I'll try to go over the most recent releases of the pipelines and execute the runs in AWS with the release commit, so that at least the "small" test results are there as a showcase.

Member

@ewels ewels left a comment


Just had another look at this and I absolutely love it 😍 The new file icons are an absolute winner!

The one thing I find a bit unintuitive still is that clicking a file opens it in a new tab to download. I expect it to open folders, but the download surprises me every time.

What do you think about a file click opening the preview box underneath for everything? Then we could remove the "Preview file" buttons and just have the box preview if it meets criteria. If it doesn't, we can get the preview box to say "Binary file, 20kb. Click here to download" or whatever. I'm thinking kind of like how GitHub does it:

[screenshot of GitHub's file preview box]

This then requires two clicks to download the file, which I think is appropriate given that some of the files are huge and that most people won't actually want to download files most of the time.

Comment thread includes/header.php Outdated
Comment thread includes/header.php Outdated
Comment thread public_html/assets/css/nf-core-dark.css Outdated
Comment thread public_html/assets/js/aws-s3-explorer.js Outdated
Co-authored-by: Phil Ewels <phil.ewels@scilifelab.se>
@mashehu
Contributor Author

mashehu commented Oct 28, 2020

After testing it, I am not so sure about the "Open in new tab" button, because clicking it directly starts a download for HTML and image files (on Firefox) and only opens text files in a new tab.
And since we now have a "download file" button in every file row, maybe we should remove it from there and rename the "Open in new tab" button to a "download file" button.

I couldn't figure out the URL to download a whole directory, so I just added a copy button for the s3 url of it (the button label "Copy Bucket S3 URL" needs some rephrasing I think 🙂)

…-aws-results-page

# Conflicts:
#	includes/pipeline_page/_index.php
@ewels ewels changed the title from "first proof-of-concept version of example outputs page" to "AWS megatests outputs page" Oct 28, 2020
@ewels
Member

ewels commented Oct 28, 2020

Ok nice - yes I agree, switch the text for the new tab button for a download button (as that's what it does most of the time) and remove the download button from the table 👍🏻

Can we also add some additional checks to prevent previewing the file in the browser in some cases? eg. very large files and certain file extensions (.xml). On the mhcquant page I nearly crashed my browser by clicking on a 6MB XML file that was loaded into the DOM (didn't show anything except a very very long empty <pre> element with lots of invisible content).

Checked with .bam and .bai files and it showed "No preview available for binary or compressed files." very nicely 🙌🏻

Only other request that I can think of is making the fontawesome file icon part of the link. I keep clicking on them and nothing happens 😉

@ewels
Member

ewels commented Oct 28, 2020

Just poking a bit and seems to be using <iframe srcdoc=""> with inline file contents. Any reason to do this instead of just <iframe src=""> with the URL?

@mashehu
Contributor Author

mashehu commented Oct 29, 2020

I used <iframe srcdoc=""> because some files, for example the execution_*.html files inside the pipeline_info directory, don't have the correct content-type when requested. So an <iframe src=""> just shows a white page and starts a file download instead.
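The srcdoc workaround could be sketched roughly like this (a hypothetical helper for illustration, not the actual aws-s3-explorer.js code): fetch the file contents ourselves and inline them whenever the Content-Type header would otherwise trigger a download.

```javascript
// Hypothetical sketch of the srcdoc workaround (not the actual site code).
// S3 serves some HTML files with Content-Type: application/octet-stream,
// which makes <iframe src="..."> download the file instead of rendering it.

function needsSrcdoc(contentType) {
  // Anything not explicitly served as HTML gets inlined via srcdoc.
  return !/^text\/html/.test(contentType || '');
}

async function renderInIframe(iframe, url) {
  const resp = await fetch(url);
  if (needsSrcdoc(resp.headers.get('content-type'))) {
    // Inline the fetched text, sidestepping the bad header entirely.
    iframe.srcdoc = await resp.text();
  } else {
    // Header is fine; let the browser render the URL directly.
    iframe.src = url;
  }
}
```

The trade-off discussed below is that inlined pages lose their base URL, so relative links inside the report break.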

matthiasho added 2 commits October 29, 2020 14:37
fix iframe src setting, remove file download buttons, add icons into link, fix xml preview.
@ewels
Member

ewels commented Oct 29, 2020

Right, gotcha - yes and it works beautifully to get around the HTML headers. So we're stuck between a rock and a hard place - either download some HTML files or break the rendering of others.

I can see the difference between these two HTML files in their metadata on s3:

execution_report.html (triggers download)

qualimapReport.html (renders in browser)

[screenshots of the S3 object metadata for both files, showing the different Content-Type values]

@pditommaso - do you have any idea why files with the same filetype would be uploaded to s3 by Nextflow with different Content-type headers?

As the in-browser rendering is working great for all other filetypes, we could simply have a cron-job that loops over all HTML files in the s3 bucket and fixes the content-type headers. Then we can use regular iframes and have the reports load properly. What do you think @mashehu?

@ewels
Member

ewels commented Oct 29, 2020

Confirmed - if you edit the content-type header to text/html, the execution HTML reports render in the browser (assuming that your browser doesn't have a cached copy of them).

@mashehu
Contributor Author

mashehu commented Oct 29, 2020

Yes, I think the majority should load fine by using src=. So whatever solution you find for the AWS side of this is fine for me. 🙂

I was playing around with how to handle the dev version. I couldn't figure out a way to find the most recent results- folder in a bucket on AWS (it looks like only files have a date, and the directory structures are a bit too complicated/inconsistent for me to sort them based on that). So for now, dev shows all available result directories for a pipeline.

I fixed the URL based rendering, so now we can deep link into a specific AWS directory, e.g. http://localhost:8888/mhcquant/example_output#sarek/results-bce378e09de25bb26c388b917f93f84806d3ba27/Reports/1234N_1234N_M1/bamQC/1234N_1234N_M1/

Oh and the XML file rendering works now as well (the large XML files killed your browser because it was trying to render them as HTML files 🙄).
I set a file size limit of 10MB, because some MultiQC reports can be quite large and I think they are one of the more important things to have a preview for. But I am happy to lower the limit if you think this is not necessary.
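The size and extension gating described above might look something like this. The 10 MB cap matches the comment; the extension allow-list and function name are assumptions for illustration, not the site's actual code.

```javascript
// Hedged sketch of the preview checks: a size cap plus an allow-list of
// text-like extensions. Anything else falls back to the download link.
const MAX_PREVIEW_BYTES = 10 * 1024 * 1024; // 10 MB, as in the comment above
const PREVIEWABLE = ['html', 'txt', 'log', 'tsv', 'csv', 'json', 'yml', 'xml', 'md'];

function canPreview(key, sizeBytes) {
  // Take the extension from the S3 object key and check both criteria.
  const ext = key.split('.').pop().toLowerCase();
  return sizeBytes <= MAX_PREVIEW_BYTES && PREVIEWABLE.includes(ext);
}

console.log(canPreview('multiqc_report.html', 5 * 1024 * 1024)); // true
console.log(canPreview('sample.bam', 1024)); // false: binary extension
```

Checking the extension before the size keeps .bam/.bai and similar binary files out of the preview box regardless of how small they are.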

@ewels
Member

ewels commented Oct 29, 2020

The other thing that we could do to avoid messing around with AWS content headers is to tunnel HTML files through a PHP script on the server. I've done this in other cases to strip annoying Access-Control-Allow-Origin headers that prevent files being loaded into the DOM.

Basically, just have a PHP file with the following:

<?php
header("Content-type: text/html; charset=utf-8");
echo file_get_contents($_GET['url']);

Then load that in the iframe instead. Might be easier than messing around with cron jobs and AWS authentication etc.

@pditommaso
Contributor

do you have any idea why files with the same filetype would be uploaded to s3 by Nextflow with different Content-type headers?

Because NF is not aware of support for MIME provided by S3

@mashehu
Contributor Author

mashehu commented Oct 29, 2020

The other thing that we could do to avoid messing around with AWS content headers is to tunnel HTML files through a PHP script on the server. I've done this in other cases to strip annoying Access-Control-Allow-Origin headers that prevent files being loaded into the DOM.

Basically, just have a PHP file with the following:

<?php
header("Content-type: text/html; charset=utf-8");
echo file_get_contents($_GET['url']);

Then load that in the iframe instead. Might be easier than messing around with cron jobs and AWS authentication etc.

The problem with this solution is that we generate the iframe dynamically with javascript, so I am not sure how to do this then...

@ewels
Member

ewels commented Oct 29, 2020

@pditommaso - Because NF is not aware of support for MIME provided by S3

So I guess there is some kind of AWS magic somewhere that is setting these automatically? They are there and they vary across different HTML files, so something is doing it 🤔

@mashehu - The problem with this solution is that we generate the iframe dynamically with javascript, so I am not sure how to do this then...

Just set the iframe src attribute (not srcdoc) to '/s3?url=' + url so that the URL ends up being something like:

https://nf-co.re/s3?url=http%3A%2F%2Fnf-core-awsmegatests.s3-eu-west-1.amazonaws.com%2Fsarek%2Fresults-bce378e09de25bb26c388b917f93f84806d3ba27%2FReports%2F1234N_1234N_M1%2FbamQC%2F1234N_1234N_M1%2FqualimapReport.html

Edit: updated the URL to use a link encoded with the JavaScript encodeURIComponent() function. PHP then decodes this with urldecode(), and it's safe to use in a URL.

Can start being more clever with nicer URLs and url / file extension / file size validation etc, but that's the core concept. Should work fine with the iframe with dynamic content I think.
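The encoding step above can be sketched as a one-line helper (the /s3 endpoint is taken from the example URL in this comment; the helper name is hypothetical):

```javascript
// Build the proxy URL for the PHP tunnel script described above.
// encodeURIComponent() escapes ':' and '/' so the full S3 URL survives
// as a single ?url= query parameter; PHP decodes it on the other side.
function proxyUrl(fileUrl) {
  return '/s3?url=' + encodeURIComponent(fileUrl);
}

const url = 'http://nf-core-awsmegatests.s3-eu-west-1.amazonaws.com/sarek/qualimapReport.html';
// → "/s3?url=http%3A%2F%2Fnf-core-awsmegatests.s3-eu-west-1.amazonaws.com%2Fsarek%2FqualimapReport.html"
console.log(proxyUrl(url));
```

The iframe's src attribute would then be set to the return value of this helper instead of the raw S3 URL.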

@mashehu
Contributor Author

mashehu commented Oct 29, 2020

I think we've reached the limits of my PHP skills here. 🙂
Where should I add handlers in the PHP code for these special URLs? Can you maybe point me to examples of what you are thinking of?

@ewels
Member

ewels commented Oct 29, 2020

Ok I had a quick play with the redirection thing and whilst it works (headers are stripped and the files display in the browser properly), the web pages with relative links still break in the same way as they do with the iframe srcdoc. So let's just stick with that. I'm thinking that there shouldn't be so many HTML reports that do this anyway - hopefully just Qualimap. If it's a big issue then we can always come back to it.

@ewels
Member

ewels commented Oct 29, 2020

@mashehu - are you happy to revert the changes back to srcdoc then? Sorry about this. Then I think that we're good to merge 👍🏻

Member

@ewels ewels left a comment


Just a couple of comments left which I think can be deleted, but then LGTM for merge - amazing work 🙌🏻

Ongoing discussion on Slack #aws-megatests about fixing the content-type headers for all HTML files, so we may revisit this in the future if we can get that to work. But I'd rather merge this now, as it is basically working, and come back to it if anything happens.

Comment thread includes/pipeline_page/example_output.php Outdated
Comment thread includes/pipeline_page/example_output.php Outdated
Comment thread public_html/assets/css/nf-core-dark.css Outdated
@ewels ewels merged commit 99aafb4 into nf-core:master Oct 30, 2020
@ewels
Member

ewels commented Oct 30, 2020

Hah, found my first bug as soon as I merged: the "Copy Bucket S3 URL" button doesn't seem to do anything:

[screenshot of the Copy Bucket S3 URL button]

Shows a toast saying it worked, but nothing changed in my clipboard..

@ewels
Member

ewels commented Oct 30, 2020

Other copy buttons for specific files are working..

@apeltzer
Member

It also doesn't seem to show anything for e.g. atacseq, eager, ...

@ggabernet
Member

@apeltzer, unfortunately the problem of the container not being built before the release AWS test is triggered persists, and we haven't found a good solution for that yet.

However, I've manually re-triggered the workflow again for the latest release commit of the most recently released pipelines (until two months ago). The tests were triggered again for these. In the case of eager, it failed and I made a PR to hopefully fix that: nf-core/eager#600.

Proteomicslfq and epitopeprediction also fail on AWS, so the output can't be shown.

I've re-triggered chipseq with Tower, which was passing in the past; it's running at the moment, so hopefully this solves it for this pipeline as well.

@ggabernet
Member

Hi @ewels @mashehu, I'll look into triggering a Lambda function now on all the html files. What exactly in the html needs to be converted to what?

This is one example of an HTML file that is only displayed when clicking download.

Is it this line? And exactly which tag?

<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

@mashehu
Contributor Author

mashehu commented Nov 3, 2020

Some have the content-type application/octet-stream, but they should all be text/html. See #514 (comment)

@ggabernet
Member

Hi, sorry I still don't get exactly the issue. For me, the execution_report.html of your example, which I guess would be this one, renders fine in the browser, even though the metadata is Content-type: application/octet-stream.

@ggabernet
Member

Ah gotcha, it triggers a download in a new window

@ggabernet
Member

ggabernet commented Nov 3, 2020

Unfortunately after reading about it, it does not seem to be so straight-forward:

Each Amazon S3 object has data, a key, and metadata. The object key (or key name) uniquely identifies the object in a bucket. Object metadata is a set of name-value pairs. You can set object metadata at the time you upload it. After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata.
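In practice, the "copy the object and set the metadata" step can be done in place with the AWS CLI, roughly like this (a sketch; the object path is a placeholder and the bucket name is taken from the URLs in this thread):

```shell
# Copy the object onto itself, replacing its metadata with a corrected
# Content-Type. --metadata-directive REPLACE is required; without it,
# S3 keeps the original metadata and the copy changes nothing.
aws s3 cp \
  s3://nf-core-awsmegatests/path/to/execution_report.html \
  s3://nf-core-awsmegatests/path/to/execution_report.html \
  --content-type "text/html" \
  --metadata-directive REPLACE
```

A Lambda function or cron job would just loop this over every *.html key in the bucket.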


Development

Successfully merging this pull request may close these issues.

Show #awsmegatests results on pipeline page

5 participants